1 00:00:04,190 --> 00:00:06,630 Hi everybody, welcome to 2 00:00:06,630 --> 00:00:08,099 the Systems Science Friday 3 00:00:08,099 --> 00:00:09,359 noon seminar series. 4 00:00:09,359 --> 00:00:11,129 And today we're glad to 5 00:00:11,129 --> 00:00:14,265 have one of our graduates, 6 00:00:14,265 --> 00:00:17,339 David Percy, goes by Percy, 7 00:00:17,339 --> 00:00:19,650 and he's been 8 00:00:19,650 --> 00:00:22,049 an instructor or assistant professor, 9 00:00:22,049 --> 00:00:23,925 whatever they call you nowadays, 10 00:00:23,925 --> 00:00:26,759 over in geology for many, 11 00:00:26,759 --> 00:00:29,535 many years with a focus on 12 00:00:29,535 --> 00:00:33,334 GIS instruction and research. 13 00:00:33,334 --> 00:00:35,569 And I'll let him complete his introduction. 14 00:00:35,569 --> 00:00:36,649 He can tell you lots more 15 00:00:36,649 --> 00:00:38,254 interesting things about himself. 16 00:00:38,254 --> 00:00:39,079 Take it away. 17 00:00:39,079 --> 00:00:40,655 Thank you, Wayne. 18 00:00:40,655 --> 00:00:43,279 I just want to say how happy 19 00:00:43,279 --> 00:00:44,030 I am to be here in 20 00:00:44,030 --> 00:00:45,619 the Systems Science seminar. 21 00:00:45,619 --> 00:00:48,049 This is my true home, 22 00:00:48,049 --> 00:00:50,435 in a lot of ways, at PSU, 23 00:00:50,435 --> 00:00:51,889 even though I have 24 00:00:51,889 --> 00:00:53,269 been in the geology department for 25 00:00:53,269 --> 00:00:54,589 over 20 years and got 26 00:00:54,589 --> 00:00:57,319 my undergraduate degree there as well. 27 00:00:57,319 --> 00:00:59,330 As pointed out, 28 00:00:59,330 --> 00:01:00,829 I've done a lot of graduate work 29 00:01:00,829 --> 00:01:05,165 in systems science and did a master's here, 30 00:01:05,165 --> 00:01:08,659 sort of just based on coursework. 
31 00:01:08,659 --> 00:01:11,165 And so I've been, 32 00:01:11,165 --> 00:01:13,800 I've been interested in 33 00:01:14,130 --> 00:01:17,200 all sorts of exotic 34 00:01:17,200 --> 00:01:18,925 computing, as I like to put it, 35 00:01:18,925 --> 00:01:21,639 with regards to cellular automata and 36 00:01:21,639 --> 00:01:22,870 fractals and chaos and 37 00:01:22,870 --> 00:01:24,850 artificial intelligence, of course. 38 00:01:24,850 --> 00:01:26,350 And I did do 39 00:01:26,350 --> 00:01:29,169 my first neural networks class in 40 00:01:29,169 --> 00:01:31,149 the Systems Science department right here 41 00:01:31,149 --> 00:01:34,389 with George Lendaris back in 1991. 42 00:01:34,389 --> 00:01:36,895 And I ended up 43 00:01:36,895 --> 00:01:40,240 retaking that class around 2015 or 2016, 44 00:01:40,240 --> 00:01:43,070 and it was remarkably similar. 45 00:01:45,390 --> 00:01:50,019 So that's the cool thing, 46 00:01:50,019 --> 00:01:51,879 is that we've had this, 47 00:01:51,879 --> 00:01:56,540 this AI and machine learning stuff in 48 00:01:56,540 --> 00:01:58,880 our toolkit for 49 00:01:58,880 --> 00:02:01,369 a long time in systems science. 50 00:02:01,369 --> 00:02:03,575 And it's just, it's fun. 51 00:02:03,575 --> 00:02:04,909 It's good to be able to 52 00:02:04,909 --> 00:02:06,605 apply it to different things. 53 00:02:06,605 --> 00:02:08,810 And so yeah, so 54 00:02:08,810 --> 00:02:12,350 what I'm doing with it with 55 00:02:12,350 --> 00:02:16,325 my work today is sort of 56 00:02:16,325 --> 00:02:18,200 piggybacking off of something that I 57 00:02:18,200 --> 00:02:20,929 do in the geology department. 58 00:02:20,929 --> 00:02:23,480 I'm a database guy and 59 00:02:23,480 --> 00:02:26,314 I've been dealing with large databases for 60 00:02:26,314 --> 00:02:28,550 decades. And I was doing 61 00:02:28,550 --> 00:02:30,605 medical research for a long time 62 00:02:30,605 --> 00:02:31,954 and then I got into geology. 
63 00:02:31,954 --> 00:02:34,369 And so the geologic database, 64 00:02:34,369 --> 00:02:36,290 of course, was one of 65 00:02:36,290 --> 00:02:37,160 the obvious things for 66 00:02:37,160 --> 00:02:38,675 me to get involved with. 67 00:02:38,675 --> 00:02:40,940 And I was actually involved back in 68 00:02:40,940 --> 00:02:42,470 the early 2000s in 69 00:02:42,470 --> 00:02:43,610 getting the standard that 70 00:02:43,610 --> 00:02:44,585 we're talking about today, 71 00:02:44,585 --> 00:02:47,359 this particular standard, adopted 72 00:02:47,359 --> 00:02:49,594 by the state then. 73 00:02:49,594 --> 00:02:52,670 And then I've also taken over as 74 00:02:52,670 --> 00:02:54,634 the framework implementation lead 75 00:02:54,634 --> 00:02:56,975 for this dataset for the state of Oregon. 76 00:02:56,975 --> 00:02:59,600 I'm really involved in 77 00:02:59,600 --> 00:03:01,759 Oregon's geologic map. 78 00:03:01,759 --> 00:03:03,740 And that's kinda, that's actually how 79 00:03:03,740 --> 00:03:06,810 I got involved with this stuff. 80 00:03:06,940 --> 00:03:10,220 So the takeaway from 81 00:03:10,220 --> 00:03:12,320 this whole presentation is 82 00:03:12,320 --> 00:03:13,955 right here on this slide. 83 00:03:13,955 --> 00:03:17,179 Oregon geology is mapped at different scales 84 00:03:17,179 --> 00:03:20,555 in similar terrains. Let's unpack that. 85 00:03:20,555 --> 00:03:24,904 Scales in mapping are like 1 to 24,000, 86 00:03:24,904 --> 00:03:26,945 where one unit on your map 87 00:03:26,945 --> 00:03:28,280 represents 24,000 88 00:03:28,280 --> 00:03:29,750 of those units out in real space, 89 00:03:29,750 --> 00:03:32,630 all the way out to, sort of, like 1 to 2 million or 1 to 5 90 00:03:32,630 --> 00:03:34,580 million, which would be at the scale of 91 00:03:34,580 --> 00:03:37,414 the nation or of North America. 
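[Editor's illustration] The scale arithmetic described above can be sketched in a few lines of Python. This is only a worked example of "one unit on the map represents N units on the ground"; the function name is made up for this sketch and is not from the talk.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: int) -> float:
    """Ground distance in metres for a distance measured on the map in centimetres.

    At a scale of 1:N, one map unit stands for N of the same units on the ground.
    """
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / 100.0  # centimetres to metres


# The two ends of the range mentioned in the talk.
for denominator in (24_000, 5_000_000):
    metres = ground_distance_m(1.0, denominator)
    print(f"1 cm at 1:{denominator:,} covers {metres:,.0f} m on the ground")
```

So one centimetre on a 1:24,000 quadrangle covers 240 m of ground, while the same centimetre on a 1:5,000,000 map covers 50 km.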
92 00:03:37,414 --> 00:03:40,909 So different scales are 93 00:03:40,909 --> 00:03:42,349 useful for different purposes 94 00:03:42,349 --> 00:03:43,790 and I'll talk about that more later. 95 00:03:43,790 --> 00:03:46,580 But this part about different scales in 96 00:03:46,580 --> 00:03:48,440 similar terrains is the key that's 97 00:03:48,440 --> 00:03:50,704 actually going to help us solve this problem. 98 00:03:50,704 --> 00:03:54,259 Because terrains are different kinds of 99 00:03:54,259 --> 00:03:58,489 packages of environmental variables. 100 00:03:58,489 --> 00:04:00,950 Mostly we're talking here about geology, 101 00:04:00,950 --> 00:04:03,050 but it also has to do with, 102 00:04:03,050 --> 00:04:05,570 well, the elevation and 103 00:04:05,570 --> 00:04:06,830 the slope, and all of that sort of 104 00:04:06,830 --> 00:04:08,165 stuff can come into what, 105 00:04:08,165 --> 00:04:10,024 what makes up a terrain. 106 00:04:10,024 --> 00:04:12,529 Sentinel data is satellite data, 107 00:04:12,529 --> 00:04:13,880 as I've got a picture up 108 00:04:13,880 --> 00:04:16,174 here, that are multispectral. 109 00:04:16,174 --> 00:04:17,510 Now that's important because 110 00:04:17,510 --> 00:04:19,609 most satellite data actually 111 00:04:19,609 --> 00:04:21,440 only have four bands of data, 112 00:04:21,440 --> 00:04:22,639 red, green, blue, 113 00:04:22,639 --> 00:04:25,114 and infrared, or near infrared. 114 00:04:25,114 --> 00:04:29,060 And so this Sentinel data 115 00:04:29,060 --> 00:04:30,769 here is important because it's 116 00:04:30,769 --> 00:04:33,350 actually 12 bands to start with, 117 00:04:33,350 --> 00:04:36,124 and we actually, I end up using ten later. 118 00:04:36,124 --> 00:04:39,499 But these extra bands 119 00:04:39,499 --> 00:04:40,789 of information are 120 00:04:40,789 --> 00:04:42,965 the key to this whole thing. 121 00:04:42,965 --> 00:04:44,929 They're freely available at a scale 122 00:04:44,929 --> 00:04:47,105 of 20 m. 
Now, 123 00:04:47,105 --> 00:04:48,919 a lot of you may not be able to 124 00:04:48,919 --> 00:04:50,659 picture how big 20 m is. 125 00:04:50,659 --> 00:04:53,794 I just want to sort of think about that. 126 00:04:53,794 --> 00:04:55,039 I don't really have something 127 00:04:55,039 --> 00:04:57,215 to give you as a scale. 128 00:04:57,215 --> 00:05:00,770 But like a football field is 100 m long, 129 00:05:00,770 --> 00:05:02,660 so kinda scale it to that, 130 00:05:02,660 --> 00:05:04,789 maybe. That gives you 131 00:05:04,789 --> 00:05:06,679 a sense that the resolution that we have is 132 00:05:06,679 --> 00:05:12,889 not down at really high resolution 133 00:05:12,889 --> 00:05:15,364 that we can make really detailed maps with. 134 00:05:15,364 --> 00:05:16,969 But it's good to get a good cut 135 00:05:16,969 --> 00:05:18,995 at stuff, and 20 m is pretty good. 136 00:05:18,995 --> 00:05:20,270 So using the areas that are 137 00:05:20,270 --> 00:05:21,980 mapped at high resolution 138 00:05:21,980 --> 00:05:24,410 that are adjacent to areas mapped at 139 00:05:24,410 --> 00:05:26,420 low resolution was the insight that 140 00:05:26,420 --> 00:05:28,805 I've had to do this. 141 00:05:28,805 --> 00:05:32,539 I'm using adjacent areas that are mapped 142 00:05:32,539 --> 00:05:36,184 at high resolution as training areas 143 00:05:36,184 --> 00:05:39,410 to then predict what 144 00:05:39,410 --> 00:05:42,080 the geologic units would 145 00:05:42,080 --> 00:05:43,310 be in areas that are 146 00:05:43,310 --> 00:05:44,959 mapped at lower resolution. 147 00:05:44,959 --> 00:05:46,369 And I'm using machine learning 148 00:05:46,369 --> 00:05:48,845 to do the classification. 149 00:05:48,845 --> 00:05:50,990 And my systems approach 150 00:05:50,990 --> 00:05:52,714 to this thing ties it all together. 151 00:05:52,714 --> 00:05:56,720 We were talking about how is this systemsy? 
152 00:05:56,720 --> 00:05:58,430 And I almost put in a slide, 153 00:05:58,430 --> 00:06:00,545 how is this systemsy? 154 00:06:00,545 --> 00:06:02,359 To try to encourage all of 155 00:06:02,359 --> 00:06:03,875 us that are giving talks in 156 00:06:03,875 --> 00:06:05,479 the Systems Science seminar and sort of 157 00:06:05,479 --> 00:06:07,549 point out the systemsy-ness of it. 158 00:06:07,549 --> 00:06:09,890 And the way I kinda feel, 159 00:06:09,890 --> 00:06:11,989 like, it's just me doing this, 160 00:06:11,989 --> 00:06:13,759 it makes it systemsy because I, 161 00:06:13,759 --> 00:06:15,670 like I just said, 162 00:06:15,670 --> 00:06:18,455 attack everything with a systems lens. 163 00:06:18,455 --> 00:06:20,059 And so I'm working at this 164 00:06:20,059 --> 00:06:22,040 at multiple scales and 165 00:06:22,040 --> 00:06:27,709 multiple lines of domains. 166 00:06:27,709 --> 00:06:29,570 So we're working with satellite data and 167 00:06:29,570 --> 00:06:31,519 geologic data and all of that sort of stuff. 168 00:06:31,519 --> 00:06:34,789 So I think that's what makes it systemsy, is 169 00:06:34,789 --> 00:06:36,170 this working 170 00:06:36,170 --> 00:06:39,245 across all the different boundaries. 171 00:06:39,245 --> 00:06:42,169 So there's another copy 172 00:06:42,169 --> 00:06:43,910 of the geologic map of Oregon. 173 00:06:43,910 --> 00:06:47,029 And you will see 174 00:06:47,029 --> 00:06:49,280 shortly that these areas over 175 00:06:49,280 --> 00:06:51,169 here in Western Oregon are mapped in 176 00:06:51,169 --> 00:06:52,880 much greater detail than 177 00:06:52,880 --> 00:06:55,460 these areas out here in Eastern Oregon. 
178 00:06:55,460 --> 00:06:58,099 And you'll actually see this on 179 00:06:58,099 --> 00:07:00,530 the map where there's actual places where 180 00:07:00,530 --> 00:07:02,449 there's solid straight lines 181 00:07:02,449 --> 00:07:04,235 which just don't look like 182 00:07:04,235 --> 00:07:07,399 the way that geology 183 00:07:07,399 --> 00:07:09,485 actually maps out in reality. 184 00:07:09,485 --> 00:07:11,390 So here's a scale, 185 00:07:11,390 --> 00:07:12,980 here's a map of the scale 186 00:07:12,980 --> 00:07:14,389 of the existing mapping. 187 00:07:14,389 --> 00:07:16,009 So what I've done here is I 188 00:07:16,009 --> 00:07:17,779 took the geologic map, and 189 00:07:17,779 --> 00:07:19,969 every actual unit in 190 00:07:19,969 --> 00:07:22,270 the geologic map is 191 00:07:22,270 --> 00:07:24,654 tagged by what scale it was mapped at. 192 00:07:24,654 --> 00:07:26,050 And I should have actually 193 00:07:26,050 --> 00:07:27,490 had a reference map here that 194 00:07:27,490 --> 00:07:30,279 shows you where all of these data come from. 195 00:07:30,279 --> 00:07:31,450 But the idea behind 196 00:07:31,450 --> 00:07:33,160 this map is that we were using 197 00:07:33,160 --> 00:07:36,520 the best available data for every location. 198 00:07:36,520 --> 00:07:39,520 So if we've got a 1 to 24,000 scale map 199 00:07:39,520 --> 00:07:41,020 at this area here where 200 00:07:41,020 --> 00:07:42,775 I've got green, we'll use that. 201 00:07:42,775 --> 00:07:47,514 If we've got a 1 to 250 K map 202 00:07:47,514 --> 00:07:50,230 just adjacent to it, then we'll use that. 203 00:07:50,230 --> 00:07:51,339 So you can see that there's 204 00:07:51,339 --> 00:07:52,809 areas right next to 205 00:07:52,809 --> 00:07:54,040 each other where there's stuff 206 00:07:54,040 --> 00:07:55,420 mapped in high resolution, 207 00:07:55,420 --> 00:07:56,739 right next to stuff mapped 208 00:07:56,739 --> 00:07:58,779 in low, lower resolution. 
209 00:07:58,779 --> 00:08:00,489 Here I've put the break for 210 00:08:00,489 --> 00:08:01,974 lower resolution at 211 00:08:01,974 --> 00:08:06,305 greater than 1 to 125 thousand scale. 212 00:08:06,305 --> 00:08:10,084 And I've said 1 to 24 K is our gold standard. 213 00:08:10,084 --> 00:08:12,590 And anything in between, up to 1 to 125 214 00:08:12,590 --> 00:08:15,109 K, is good, usable stuff. 215 00:08:15,109 --> 00:08:16,819 It's not as good as the 24 K, 216 00:08:16,819 --> 00:08:17,990 but it could work for us 217 00:08:17,990 --> 00:08:20,029 if that's what we need to fall back on. 218 00:08:20,029 --> 00:08:22,430 And what we'll see here in these parts of 219 00:08:22,430 --> 00:08:23,780 Eastern Oregon where I'm wiggling 220 00:08:23,780 --> 00:08:25,385 my mouse over right now, 221 00:08:25,385 --> 00:08:27,424 there's a lot of red. 222 00:08:27,424 --> 00:08:29,510 You can see all these red areas 223 00:08:29,510 --> 00:08:31,220 are mapped in low resolution. 224 00:08:31,220 --> 00:08:33,139 And there's a couple of little postage stamps 225 00:08:33,139 --> 00:08:34,580 in here, green stuff. 226 00:08:34,580 --> 00:08:36,560 And then some nice swaths 227 00:08:36,560 --> 00:08:39,964 of yellow that we can, we'll end up using. 228 00:08:39,964 --> 00:08:44,659 So this is, the motivation for 229 00:08:44,659 --> 00:08:46,940 this was actually part of 230 00:08:46,940 --> 00:08:49,625 my framework implementation role. 231 00:08:49,625 --> 00:08:52,159 I was talking to the director of 232 00:08:52,159 --> 00:08:54,319 the Institute for Natural 233 00:08:54,319 --> 00:08:55,699 Resources and he was 234 00:08:55,699 --> 00:08:58,610 bemoaning the fact that we don't have a map 235 00:08:58,610 --> 00:09:00,260 at the scale of 1 to 100,000 236 00:09:00,260 --> 00:09:01,849 for the entire state. 
237 00:09:01,849 --> 00:09:04,099 And that if we have that, we can 238 00:09:04,099 --> 00:09:06,485 do all sorts of better climate modeling 239 00:09:06,485 --> 00:09:10,340 and biodiversity modeling 240 00:09:10,340 --> 00:09:11,539 and all of that sort of stuff. 241 00:09:11,539 --> 00:09:14,329 So that was, the motivation was, 242 00:09:14,329 --> 00:09:15,590 well, if we can at least get to 243 00:09:15,590 --> 00:09:17,420 a scale of 1 to 100 K, 244 00:09:17,420 --> 00:09:20,269 then we could actually satisfy the needs of 245 00:09:20,269 --> 00:09:22,249 the people that are actually trying to use 246 00:09:22,249 --> 00:09:25,639 this map to do modeling and stuff. 247 00:09:25,639 --> 00:09:28,445 I was originally, when I, 248 00:09:28,445 --> 00:09:29,989 when I first started looking into this, 249 00:09:29,989 --> 00:09:32,570 I ran into this paper by Cracknell, 250 00:09:32,570 --> 00:09:34,145 where he basically did 251 00:09:34,145 --> 00:09:35,600 the exact thing that we're talking about. 252 00:09:35,600 --> 00:09:38,269 He used satellite data 253 00:09:38,269 --> 00:09:40,415 and machine learning algorithms 254 00:09:40,415 --> 00:09:46,145 to produce geologic maps in Tasmania. 255 00:09:46,145 --> 00:09:48,079 And his conclusion, 256 00:09:48,079 --> 00:09:50,210 after looking at all of these methods, 257 00:09:50,210 --> 00:09:52,249 was that random forest performed the 258 00:09:52,249 --> 00:09:54,605 best and was the easiest to understand. 259 00:09:54,605 --> 00:09:56,600 As we know from 260 00:09:56,600 --> 00:09:59,359 trying to build models of things, 261 00:09:59,359 --> 00:10:01,069 some of these things turn into 262 00:10:01,069 --> 00:10:03,245 black boxes that are hard to understand. 263 00:10:03,245 --> 00:10:06,530 And so random forest is nice. 
264 00:10:06,530 --> 00:10:09,530 So I went ahead and started there with 265 00:10:09,530 --> 00:10:11,419 the random forest. We'll just piggyback 266 00:10:11,419 --> 00:10:14,134 right off of Cracknell's work here. 267 00:10:14,134 --> 00:10:16,564 And first we'll take a look 268 00:10:16,564 --> 00:10:18,860 at the Sentinel data that we're going to use. 269 00:10:18,860 --> 00:10:21,005 So these are the 12 bands 270 00:10:21,005 --> 00:10:23,359 of Sentinel data that are coming down. 271 00:10:23,359 --> 00:10:28,264 It, it occupies each place on the Earth, 272 00:10:28,264 --> 00:10:29,480 I think something like every ten 273 00:10:29,480 --> 00:10:31,009 days or something like that. 274 00:10:31,009 --> 00:10:32,929 So there's data at 275 00:10:32,929 --> 00:10:35,059 different times of the year 276 00:10:35,059 --> 00:10:37,010 for particular places, which we'll 277 00:10:37,010 --> 00:10:39,754 see is important shortly. 278 00:10:39,754 --> 00:10:41,900 So the blue, green, 279 00:10:41,900 --> 00:10:43,489 and red are the ones that we're all 280 00:10:43,489 --> 00:10:45,320 familiar with, red, green, and blue. 281 00:10:45,320 --> 00:10:48,679 And then it's also got this nice band 282 00:10:48,679 --> 00:10:50,119 called ultra blue for 283 00:10:50,119 --> 00:10:52,250 coastal and aerosol, but that's a low, 284 00:10:52,250 --> 00:10:56,075 super low resolution of only 60 m. We get, 285 00:10:56,075 --> 00:10:57,560 we pick up some visible and near 286 00:10:57,560 --> 00:10:59,254 infrared bands in here. 287 00:10:59,254 --> 00:11:01,940 And then we pick up a couple of 288 00:11:01,940 --> 00:11:03,364 these 60 meter bands that 289 00:11:03,364 --> 00:11:06,169 I end up dropping later. 290 00:11:06,169 --> 00:11:07,940 Then these two bands are 291 00:11:07,940 --> 00:11:10,124 the interesting ones, because these, bands 292 00:11:10,124 --> 00:11:11,750 11 and 12, are known to be 293 00:11:11,750 --> 00:11:14,569 useful for geologic applications. 
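[Editor's illustration] The band bookkeeping described above (12 bands to start with, the 60 m bands dropped, ten kept) can be sketched as follows. This assumes the 12 bands on the slide are the standard Sentinel-2 Level-2A set with their nominal native resolutions; that mapping, the band labels, and the helper name `usable_bands` are assumptions of this sketch, since the talk does not list the bands explicitly.

```python
# Assumed band table: Sentinel-2 Level-2A bands with nominal native resolution in metres.
SENTINEL2_BANDS = {
    "B01": ("coastal aerosol", 60),
    "B02": ("blue", 10),
    "B03": ("green", 10),
    "B04": ("red", 10),
    "B05": ("red edge 1", 20),
    "B06": ("red edge 2", 20),
    "B07": ("red edge 3", 20),
    "B08": ("near infrared", 10),
    "B8A": ("narrow near infrared", 20),
    "B09": ("water vapour", 60),
    "B11": ("shortwave infrared 1", 20),
    "B12": ("shortwave infrared 2", 20),
}


def usable_bands(bands: dict, coarsest_m: int = 20) -> list:
    """Keep only bands whose native resolution is coarsest_m metres or finer."""
    return [name for name, (_, resolution_m) in bands.items() if resolution_m <= coarsest_m]


kept = usable_bands(SENTINEL2_BANDS)
print(len(SENTINEL2_BANDS), "bands to start with,", len(kept), "kept:", kept)
```

Under that assumption, dropping the two 60 m bands leaves ten, which is consistent with the count given in the talk, and bands 11 and 12 are the shortwave infrared pair.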
294 00:11:14,569 --> 00:11:17,149 So I know that those are probably going to be 295 00:11:17,149 --> 00:11:19,699 important in my model in trying to 296 00:11:19,699 --> 00:11:24,905 predict geology from satellite data. 297 00:11:24,905 --> 00:11:27,559 And the idea here 298 00:11:27,559 --> 00:11:29,974 is that with only 20 meter resolution, 299 00:11:29,974 --> 00:11:31,474 I'm not gonna be able to actually make 300 00:11:31,474 --> 00:11:33,470 a really beautiful geologic map 301 00:11:33,470 --> 00:11:34,744 that's all detailed and stuff. 302 00:11:34,744 --> 00:11:36,169 But what I can 303 00:11:36,169 --> 00:11:37,940 do is shorten the time that it takes to 304 00:11:37,940 --> 00:11:39,305 actually make the map by 305 00:11:39,305 --> 00:11:41,660 directing us to outcrops where we're 306 00:11:41,660 --> 00:11:43,654 going to find exposures 307 00:11:43,654 --> 00:11:47,029 that will help us with the mapping. 308 00:11:47,029 --> 00:11:51,200 So looking for a pilot project, 309 00:11:51,200 --> 00:11:53,989 I wanted a place that was not too far from 310 00:11:53,989 --> 00:11:55,970 roads with adjacent quads 311 00:11:55,970 --> 00:11:58,024 mapped at very different scales. 312 00:11:58,024 --> 00:12:00,020 For reference, we can see there's 313 00:12:00,020 --> 00:12:02,449 Eugene right there and there's Bend. 314 00:12:02,449 --> 00:12:03,350 And so we know that there's 315 00:12:03,350 --> 00:12:04,519 a nice road that goes 316 00:12:04,519 --> 00:12:06,830 through here from Bend to Burns. 317 00:12:06,830 --> 00:12:10,355 And so if I needed to go down here, 318 00:12:10,355 --> 00:12:12,424 I could do this drive. 319 00:12:12,424 --> 00:12:13,850 It takes about maybe 320 00:12:13,850 --> 00:12:15,739 two and a half, three hours to get here. 321 00:12:15,739 --> 00:12:17,315 So this is doable. 
322 00:12:17,315 --> 00:12:20,254 So now that being said, 323 00:12:20,254 --> 00:12:21,799 I haven't actually gone out and 324 00:12:21,799 --> 00:12:23,179 collected the samples 325 00:12:23,179 --> 00:12:24,514 that I thought I was going to. 326 00:12:24,514 --> 00:12:26,329 But this is, this is 327 00:12:26,329 --> 00:12:28,774 the area that I decided to focus on. 328 00:12:28,774 --> 00:12:31,054 And because I've got these, 329 00:12:31,054 --> 00:12:32,554 I got a whole bunch of red 330 00:12:32,554 --> 00:12:34,099 around here and then I've got a couple 331 00:12:34,099 --> 00:12:35,509 of little sections of 332 00:12:35,509 --> 00:12:36,799 green or 333 00:12:36,799 --> 00:12:38,810 yellow that are mapped in here, 334 00:12:38,810 --> 00:12:40,889 so we'll use those. 335 00:12:40,889 --> 00:12:43,329 So zooming in here, 336 00:12:43,329 --> 00:12:45,310 we've got Dickerson Flat. 337 00:12:45,310 --> 00:12:47,140 So I don't actually have 338 00:12:47,140 --> 00:12:50,005 a quad map overlaid on this particular thing. 339 00:12:50,005 --> 00:12:52,030 We can see the Dickerson and Soldiers 340 00:12:52,030 --> 00:12:53,349 quads are down here 341 00:12:53,349 --> 00:12:54,955 in the middle of the state. 342 00:12:54,955 --> 00:12:59,349 And Dickerson is mapped right now, 343 00:12:59,349 --> 00:13:02,125 clearly, as one unit, 344 00:13:02,125 --> 00:13:04,270 one geologic map 345 00:13:04,270 --> 00:13:06,220 unit for that entire area. 346 00:13:06,220 --> 00:13:07,705 Whereas right next door 347 00:13:07,705 --> 00:13:09,774 in Soldiers Gap quad, 348 00:13:09,774 --> 00:13:12,700 and this is what we do in mapping, 349 00:13:12,700 --> 00:13:13,959 is we divide things up into 350 00:13:13,959 --> 00:13:16,524 quadrangles and name them and stuff. 
351 00:13:16,524 --> 00:13:19,089 And in the Soldiers Gap 352 00:13:19,089 --> 00:13:20,860 quadrangle right next door to it, 353 00:13:20,860 --> 00:13:23,420 we've got a bunch of 354 00:13:23,420 --> 00:13:25,640 different geologic units mapped, 355 00:13:25,640 --> 00:13:27,650 and we can see over here to the right, 356 00:13:27,650 --> 00:13:30,050 this level of detail continues on over there. 357 00:13:30,050 --> 00:13:31,430 So we can see that we've got 358 00:13:31,430 --> 00:13:33,200 quite a few units mapped here. 359 00:13:33,200 --> 00:13:36,094 What I've done is a stratified sampling 360 00:13:36,094 --> 00:13:39,004 with 50 points per polygon. 361 00:13:39,004 --> 00:13:41,150 And then I'm using that 362 00:13:41,150 --> 00:13:44,570 to extract the satellite data. 363 00:13:44,570 --> 00:13:47,510 The map units from Soldiers Gap turned out to 364 00:13:47,510 --> 00:13:51,019 be this list that we see here. 365 00:13:51,019 --> 00:13:55,639 Basalts, dacites, more volcanic rocks, 366 00:13:55,639 --> 00:13:58,039 and these, these are of 367 00:13:58,039 --> 00:14:00,590 interest to any geologist. 368 00:14:00,590 --> 00:14:01,819 I see Nick is here, so he 369 00:14:01,819 --> 00:14:04,770 probably is appreciating this part. 370 00:14:04,770 --> 00:14:08,800 So what I'm trying to do 371 00:14:08,800 --> 00:14:10,465 then is find these units 372 00:14:10,465 --> 00:14:11,890 in Dickerson Flat, 373 00:14:11,890 --> 00:14:13,809 or at least train up a model that's going to 374 00:14:13,809 --> 00:14:14,889 potentially find these 375 00:14:14,889 --> 00:14:16,960 units in Dickerson Flat. 376 00:14:16,960 --> 00:14:19,794 And so we're gonna go ahead and use 377 00:14:19,794 --> 00:14:21,490 a supervised classification 378 00:14:21,490 --> 00:14:22,779 and random forests. 379 00:14:22,779 --> 00:14:26,290 So here's a brief depiction 380 00:14:26,290 --> 00:14:29,049 of what happens with random forests. 
381 00:14:29,049 --> 00:14:31,179 So we divide the space 382 00:14:31,179 --> 00:14:34,449 up with some sort of a partition and then 383 00:14:34,449 --> 00:14:36,249 continue partitioning and keep 384 00:14:36,249 --> 00:14:39,279 partitioning all the way down until we 385 00:14:39,279 --> 00:14:41,800 get to the level of depth 386 00:14:41,800 --> 00:14:44,740 that we have specified in the model. 387 00:14:44,740 --> 00:14:48,694 And then we can examine 388 00:14:48,694 --> 00:14:54,305 the tree for the random forest and see how, 389 00:14:54,305 --> 00:14:56,329 we can see how the actual choices 390 00:14:56,329 --> 00:14:57,649 are being made at 391 00:14:57,649 --> 00:14:59,149 each particular break and make 392 00:14:59,149 --> 00:15:02,069 sure that they actually make sense to us. 393 00:15:02,260 --> 00:15:06,185 So I'm dividing my data up into 394 00:15:06,185 --> 00:15:09,380 70% training data, 30% test data, 395 00:15:09,380 --> 00:15:11,000 which is pretty standard 396 00:15:11,000 --> 00:15:13,519 statistical modeling methodology, 397 00:15:13,519 --> 00:15:14,780 you can tell, because I just pulled 398 00:15:14,780 --> 00:15:16,249 this graphic straight off 399 00:15:16,249 --> 00:15:18,334 of somebody else's website. 400 00:15:18,334 --> 00:15:20,480 Then in Python, I'm taking 401 00:15:20,480 --> 00:15:22,310 my Soldiers Gap data and splitting it 402 00:15:22,310 --> 00:15:24,875 into training and testing sets and 403 00:15:24,875 --> 00:15:26,375 modeling it using 404 00:15:26,375 --> 00:15:28,520 scikit-learn's various methods, 405 00:15:28,520 --> 00:15:29,810 mostly focusing on 406 00:15:29,810 --> 00:15:32,344 support vector and random forest. 407 00:15:32,344 --> 00:15:33,860 And then the support, 408 00:15:33,860 --> 00:15:36,349 the Soldiers Gap data are 409 00:15:36,349 --> 00:15:39,860 then validated using model metrics. 
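[Editor's illustration] The workflow just described (a 70/30 split, a depth-limited random forest in scikit-learn, then validation on the held-out points) might look roughly like the sketch below. It is not the speaker's script: the data are synthetic stand-ins for the sampled points, and the number of units, the hyperparameters, the random seeds, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the sampled points: 3 made-up map units,
# 200 points per unit, 10 band values per point.
n_units, n_per_unit, n_bands = 3, 200, 10
unit_means = rng.uniform(0.1, 0.5, size=(n_units, n_bands))
X = np.vstack(
    [rng.normal(unit_means[u], 0.03, size=(n_per_unit, n_bands)) for u in range(n_units)]
)
y = np.repeat(np.arange(n_units), n_per_unit)

# 70% training, 30% test, keeping the unit proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# max_depth caps how far each tree keeps partitioning the band space.
model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
model.fit(X_train, y_train)

# Validate on the held-out 30% before applying the model anywhere else.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, `X` would hold the band values extracted at the sample points and `y` the mapped unit label for each point.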
410 00:15:39,860 --> 00:15:41,479 And once a model 411 00:15:41,479 --> 00:15:43,655 has reasonably high accuracy, 412 00:15:43,655 --> 00:15:45,829 I go ahead and fit that to 413 00:15:45,829 --> 00:15:48,200 new data from Dickerson Flat, 414 00:15:48,200 --> 00:15:51,064 export that to my GIS software, 415 00:15:51,064 --> 00:15:52,190 join it back up to 416 00:15:52,190 --> 00:15:54,950 the original data and then display 417 00:15:54,950 --> 00:15:56,569 the map and convert to 418 00:15:56,569 --> 00:16:00,064 raster for display purposes. 419 00:16:00,064 --> 00:16:06,065 And it turned out in the modeling 420 00:16:06,065 --> 00:16:09,515 that bands 11 and 12 did turn out to be 421 00:16:09,515 --> 00:16:11,840 of super high importance, and 422 00:16:11,840 --> 00:16:14,629 band 4, which is this, 423 00:16:14,629 --> 00:16:16,639 these go blue, green, red. 424 00:16:16,639 --> 00:16:18,439 So the red band from 425 00:16:18,439 --> 00:16:21,019 the visible spectrum also was important. 426 00:16:21,019 --> 00:16:23,149 So that's kind of interesting. 427 00:16:23,149 --> 00:16:26,029 And then for the results, 428 00:16:26,029 --> 00:16:28,400 here it is, it looks like a geologic map, 429 00:16:28,400 --> 00:16:30,620 if you know what a geologic map looks like. 430 00:16:30,620 --> 00:16:33,679 So what we've got, remember, let's see, 431 00:16:33,679 --> 00:16:37,220 let's go back to, yeah, right there. 432 00:16:37,220 --> 00:16:38,495 So there's Dickerson Flat 433 00:16:38,495 --> 00:16:39,649 in its original, all 434 00:16:39,649 --> 00:16:44,135 monolithic, homogeneous single unit 435 00:16:44,135 --> 00:16:47,090 again. Here we've got a couple 436 00:16:47,090 --> 00:16:50,150 of units poking in from outside. 437 00:16:50,150 --> 00:16:56,105 And then here's my new map with the results. 438 00:16:56,105 --> 00:16:57,920 And it's looking, 439 00:16:57,920 --> 00:16:59,329 it's looking quite coherent. 
440 00:16:59,329 --> 00:17:02,750 In fact, in the sense that units 441 00:17:02,750 --> 00:17:06,349 that are grouped together make sense. 442 00:17:06,349 --> 00:17:09,739 And so from a geological perspective, 443 00:17:09,739 --> 00:17:14,375 this map, actually, it's pretty good. 444 00:17:14,375 --> 00:17:17,300 This is, so that's where 445 00:17:17,300 --> 00:17:19,490 we ended up at the end 446 00:17:19,490 --> 00:17:24,185 of the first round of my processing. 447 00:17:24,185 --> 00:17:26,660 Then what I was looking at, oh, 448 00:17:26,660 --> 00:17:29,524 and then the idea is to overlay that on 449 00:17:29,524 --> 00:17:31,459 Google Earth and find 450 00:17:31,459 --> 00:17:34,219 an actual outcrop that we can go visit. 451 00:17:34,219 --> 00:17:36,680 This is my slide from 452 00:17:36,680 --> 00:17:39,019 last year when I presented 453 00:17:39,019 --> 00:17:41,254 the original work, and this was 454 00:17:41,254 --> 00:17:44,645 the quads to map this summer. 455 00:17:44,645 --> 00:17:47,014 And that was for the summer of 2022. 456 00:17:47,014 --> 00:17:48,679 So what happens in 457 00:17:48,679 --> 00:17:51,364 my department is that we run 458 00:17:51,364 --> 00:17:53,899 a field mapping class in 459 00:17:53,899 --> 00:17:56,735 the summertime and the, 460 00:17:56,735 --> 00:18:01,024 the main researcher on that, Martin, 461 00:18:01,024 --> 00:18:04,069 Martin picks 462 00:18:04,069 --> 00:18:06,379 the quads that we're gonna do. 463 00:18:06,379 --> 00:18:07,729 Then we get together all of 464 00:18:07,729 --> 00:18:09,140 the information for those quads. 465 00:18:09,140 --> 00:18:11,419 And then students go out and spend 466 00:18:11,419 --> 00:18:12,679 a couple of weeks in the field 467 00:18:12,679 --> 00:18:14,150 actually mapping those quads. 468 00:18:14,150 --> 00:18:16,190 So these are the quads that were on 469 00:18:16,190 --> 00:18:18,799 the docket to be mapped last summer. 
470 00:18:18,799 --> 00:18:21,754 So I went ahead and got started on that. 471 00:18:21,754 --> 00:18:23,509 Now the interesting thing about that, 472 00:18:23,509 --> 00:18:24,409 and this looks like it's 473 00:18:24,409 --> 00:18:26,150 a little bit out of order, 474 00:18:26,150 --> 00:18:31,999 is that there's physiographic provinces 475 00:18:31,999 --> 00:18:36,274 just defined everywhere. 476 00:18:36,274 --> 00:18:38,959 And physiographic provinces are packages 477 00:18:38,959 --> 00:18:41,149 of stuff like I 478 00:18:41,149 --> 00:18:43,370 was talking about earlier, terrains. 479 00:18:43,370 --> 00:18:45,215 These are packages of 480 00:18:45,215 --> 00:18:47,060 rocks that are similar and 481 00:18:47,060 --> 00:18:51,500 also topography and stuff like that. 482 00:18:51,500 --> 00:18:53,060 So geomorphically, 483 00:18:53,060 --> 00:18:54,560 these things are gonna be similar. 484 00:18:54,560 --> 00:18:55,639 So all of this stuff that I'm 485 00:18:55,639 --> 00:18:57,320 wiggling my mouse over down here in 486 00:18:57,320 --> 00:19:00,050 the lower right-hand corner is 487 00:19:00,050 --> 00:19:04,430 one defined physiographic province. 488 00:19:04,430 --> 00:19:05,839 And then over here on 489 00:19:05,839 --> 00:19:07,879 the left where I'm wiggling my mouse, 490 00:19:07,879 --> 00:19:09,770 that's another physiographic province. 491 00:19:09,770 --> 00:19:10,909 And finally, just up at 492 00:19:10,909 --> 00:19:12,740 the top here is the third one. 493 00:19:12,740 --> 00:19:14,509 So what we can see here is that 494 00:19:14,509 --> 00:19:16,490 my Van and Craft quads are 495 00:19:16,490 --> 00:19:18,694 actually split down the middle 496 00:19:18,694 --> 00:19:21,770 by two different physiographic provinces. 497 00:19:21,770 --> 00:19:23,629 So that's gonna be a little bit 498 00:19:23,629 --> 00:19:25,250 of a challenge for coming up with 499 00:19:25,250 --> 00:19:29,300 training data for a particular set. 
500 00:19:29,300 --> 00:19:31,070 And so what I ended up 501 00:19:31,070 --> 00:19:33,724 doing for the students to go out 502 00:19:33,724 --> 00:19:35,930 immediately in the summer was I just 503 00:19:35,930 --> 00:19:38,554 did an unsupervised classification. 504 00:19:38,554 --> 00:19:42,244 So this is what I got from just running. 505 00:19:42,244 --> 00:19:44,540 I mean, I tweaked it and stuff like that. 506 00:19:44,540 --> 00:19:46,099 But this was sort of what 507 00:19:46,099 --> 00:19:48,529 the machine-learning models came 508 00:19:48,529 --> 00:19:49,264 up with for an 509 00:19:49,264 --> 00:19:50,599 unclassified or 510 00:19:50,599 --> 00:19:53,179 an unsupervised classification of 511 00:19:53,179 --> 00:19:55,400 the Van and Craft quads, and Van is 512 00:19:55,400 --> 00:19:57,799 the top quad and Craft is the lower quad. 513 00:19:57,799 --> 00:20:00,920 And these are the two quads that we started 514 00:20:00,920 --> 00:20:05,149 off to try to use machine learning on. 515 00:20:05,149 --> 00:20:07,369 Here's a bigger picture of 516 00:20:07,369 --> 00:20:10,099 the physiographic regions of the area. 517 00:20:10,099 --> 00:20:12,980 And so outlined in green 518 00:20:12,980 --> 00:20:14,570 down here is the section 519 00:20:14,570 --> 00:20:17,129 that we're looking at. 520 00:20:18,820 --> 00:20:21,499 And that's 521 00:20:21,499 --> 00:20:23,405 where that slide was supposed to be. 522 00:20:23,405 --> 00:20:24,349 Okay. 523 00:20:24,349 --> 00:20:26,284 Then, so these are, 524 00:20:26,284 --> 00:20:28,070 this is the actual geologic map. 525 00:20:28,070 --> 00:20:28,309 This is 526 00:20:28,309 --> 00:20:30,695 the existing geologic map for that area. 527 00:20:30,695 --> 00:20:33,574 This is the Van quad and the Craft quad. 528 00:20:33,574 --> 00:20:35,689 And we did see in the Craft quad that there 529 00:20:35,689 --> 00:20:37,670 was a little section of yellow down in here.
530 00:20:37,670 --> 00:20:39,200 So somebody has previously 531 00:20:39,200 --> 00:20:40,534 mapped part of this. 532 00:20:40,534 --> 00:20:49,010 But what we see in this map is that 533 00:20:49,010 --> 00:20:53,090 there are several places that 534 00:20:53,090 --> 00:20:57,530 are potentially useful as training sites. 535 00:20:57,530 --> 00:20:58,910 Let me go back to this map. 536 00:20:58,910 --> 00:21:00,035 That's why this one is here. 537 00:21:00,035 --> 00:21:01,490 So Van, Van is 538 00:21:01,490 --> 00:21:04,625 red; Craft has some yellow in it. 539 00:21:04,625 --> 00:21:07,220 Down here to the lower left, 540 00:21:07,220 --> 00:21:08,840 I've got Burns Butte and Burns. 541 00:21:08,840 --> 00:21:10,700 These two quads are mapped in 542 00:21:10,700 --> 00:21:13,819 green; they're in this physiographic province. 543 00:21:13,819 --> 00:21:15,380 And then over here on this side I've got 544 00:21:15,380 --> 00:21:17,434 Warm Springs and Mosquito Mountain. 545 00:21:17,434 --> 00:21:18,710 And those are mapped in, in 546 00:21:18,710 --> 00:21:20,029 high resolution in this 547 00:21:20,029 --> 00:21:21,514 physiographic province. 548 00:21:21,514 --> 00:21:27,110 So those, those two training sites might be 549 00:21:27,110 --> 00:21:30,229 good candidates for training up 550 00:21:30,229 --> 00:21:34,039 to try to apply those to Van and Craft. 551 00:21:34,039 --> 00:21:37,115 But it turns out that 552 00:21:37,115 --> 00:21:40,309 even closer to Van and Craft, 553 00:21:40,309 --> 00:21:43,070 I've got Buchanan and 554 00:21:43,070 --> 00:21:45,994 Stinkingwater Pass right down here below. 555 00:21:45,994 --> 00:21:49,894 And so I ended up looking at these.
556 00:21:49,894 --> 00:21:52,070 I can see that the rocks on 557 00:21:52,070 --> 00:21:52,969 this side over here in 558 00:21:52,969 --> 00:21:54,380 Stinkingwater Pass do 559 00:21:54,380 --> 00:21:56,869 seem similar to the rocks on 560 00:21:56,869 --> 00:21:58,159 the right-hand side of 561 00:21:58,159 --> 00:21:59,989 our maps in Van and Craft. 562 00:21:59,989 --> 00:22:01,699 And the rocks on 563 00:22:01,699 --> 00:22:05,134 the left-hand side of this in Buchanan 564 00:22:05,134 --> 00:22:08,300 are more similar perhaps to the stuff on 565 00:22:08,300 --> 00:22:11,494 the left-hand side of this divide. 566 00:22:11,494 --> 00:22:12,920 Now, the thing about this 567 00:22:12,920 --> 00:22:14,390 physiographic provinces 568 00:22:14,390 --> 00:22:16,039 thing is that it's kind 569 00:22:16,039 --> 00:22:17,914 of a really big picture thing. 570 00:22:17,914 --> 00:22:20,059 Like this line that I'm looking at here, 571 00:22:20,059 --> 00:22:22,894 could literally be tens, 572 00:22:22,894 --> 00:22:24,559 tens of miles on, 573 00:22:24,559 --> 00:22:26,284 to the left or right. 574 00:22:26,284 --> 00:22:29,060 So this is kind of a general cut at it 575 00:22:29,060 --> 00:22:31,610 in order to nail it down. 576 00:22:31,610 --> 00:22:33,799 But this line could 577 00:22:33,799 --> 00:22:36,410 be over here or it could be over here. 578 00:22:36,410 --> 00:22:39,029 It could be all the way over here perhaps. 579 00:22:39,640 --> 00:22:41,779 But looking at this map 580 00:22:41,779 --> 00:22:42,994 and spending some time looking at it, 581 00:22:42,994 --> 00:22:44,329 I did notice there's 582 00:22:44,329 --> 00:22:46,654 some nice material down here in Burns Butte 583 00:22:46,654 --> 00:22:48,860 that I could sample and I may end up using 584 00:22:48,860 --> 00:22:50,360 that because I don't 585 00:22:50,360 --> 00:22:52,339 have to only sample from one quad.
586 00:22:52,339 --> 00:22:54,664 I can take samples from anywhere I want. 587 00:22:54,664 --> 00:22:57,515 So it's possible that we'll, 588 00:22:57,515 --> 00:23:00,259 that we will make 589 00:23:00,259 --> 00:23:01,429 this model a little bit more 590 00:23:01,429 --> 00:23:03,560 sophisticated by sampling from some areas 591 00:23:03,560 --> 00:23:05,719 that maybe are not, that don't exist. 592 00:23:05,719 --> 00:23:08,854 But for this effort, 593 00:23:08,854 --> 00:23:10,384 I just ended up using 594 00:23:10,384 --> 00:23:14,255 Buchanan and Stinkingwater Pass. 595 00:23:14,255 --> 00:23:16,370 I went ahead and did 596 00:23:16,370 --> 00:23:18,259 a lot of sampling on this one. 597 00:23:18,259 --> 00:23:22,579 So that's my stratified sampling strategy 598 00:23:22,579 --> 00:23:27,455 for my Buchanan and Stinkingwater Pass. 599 00:23:27,455 --> 00:23:32,845 And then I want to introduce some of you. 600 00:23:32,845 --> 00:23:34,329 I wasn't sure how many people 601 00:23:34,329 --> 00:23:35,635 would show up today, 602 00:23:35,635 --> 00:23:39,219 but I did a demo of this in the, 603 00:23:39,219 --> 00:23:40,420 in the geology department 604 00:23:40,420 --> 00:23:42,460 for my grad students. 605 00:23:42,460 --> 00:23:44,769 I do a lot of work in Python and 606 00:23:44,769 --> 00:23:46,885 R. And I'm gonna do 607 00:23:46,885 --> 00:23:49,030 any final work in Python or 608 00:23:49,030 --> 00:23:49,629 R because of 609 00:23:49,629 --> 00:23:52,119 the reproducibility of the results. 610 00:23:52,119 --> 00:23:54,339 Because you can actually step through it 611 00:23:54,339 --> 00:23:56,875 and understand exactly what's going on. 612 00:23:56,875 --> 00:23:58,824 But I want to encourage people to use this. 613 00:23:58,824 --> 00:24:00,310 It's called Orange, and 614 00:24:00,310 --> 00:24:01,405 it's for data mining.
615 00:24:01,405 --> 00:24:02,905 And this is an example 616 00:24:02,905 --> 00:24:05,740 of a workflow where I've got this, 617 00:24:05,740 --> 00:24:06,850 this is what we want. 618 00:24:06,850 --> 00:24:09,009 I ended up doing my training data 619 00:24:09,009 --> 00:24:10,420 coming into this. 620 00:24:10,420 --> 00:24:14,355 So this is a visual modeling program. 621 00:24:14,355 --> 00:24:16,084 I've got my training data coming in. 622 00:24:16,084 --> 00:24:17,720 I'm selecting my columns 623 00:24:17,720 --> 00:24:19,220 and selecting the map unit 624 00:24:19,220 --> 00:24:22,955 as the target that I'm trying to predict. 625 00:24:22,955 --> 00:24:25,249 So I've got all of my columns of 626 00:24:25,249 --> 00:24:27,740 data coming in in 627 00:24:27,740 --> 00:24:30,979 a CSV with one column that's got 628 00:24:30,979 --> 00:24:33,019 the map unit that goes along 629 00:24:33,019 --> 00:24:35,375 with those ten column, 630 00:24:35,375 --> 00:24:37,594 ten columns of satellite data. 631 00:24:37,594 --> 00:24:40,084 So I set the map unit as the target. 632 00:24:40,084 --> 00:24:41,779 I, I sample it down, 633 00:24:41,779 --> 00:24:44,719 so I'm not grabbing all 144,000. 634 00:24:44,719 --> 00:24:46,655 I'm using only 10%. 635 00:24:46,655 --> 00:24:48,380 Then I'm sending that off 636 00:24:48,380 --> 00:24:49,910 to a random forest model, 637 00:24:49,910 --> 00:24:51,589 a logistic regression model, 638 00:24:51,589 --> 00:24:52,955 and a neural network model. 639 00:24:52,955 --> 00:24:55,280 And then looking at the testing and scoring 640 00:24:55,280 --> 00:24:57,784 in the confusion matrix and all of that. 641 00:24:57,784 --> 00:24:59,645 And then down at the bottom of 642 00:24:59,645 --> 00:25:02,000 the workflow here I've got another dataset 643 00:25:02,000 --> 00:25:03,905 coming in and this is the prediction data 644 00:25:03,905 --> 00:25:07,355 where I had to rename some of the columns.
645 00:25:07,355 --> 00:25:09,170 And then I'm sending 646 00:25:09,170 --> 00:25:10,834 that through to the predictions. 647 00:25:10,834 --> 00:25:12,440 And that's grabbing the model 648 00:25:12,440 --> 00:25:13,880 from random forest over here, 649 00:25:13,880 --> 00:25:15,214 sending it to the predictions, 650 00:25:15,214 --> 00:25:17,569 merging that with the data that I brought in, 651 00:25:17,569 --> 00:25:19,505 and then creating 652 00:25:19,505 --> 00:25:22,640 a data table and then exporting 653 00:25:22,640 --> 00:25:24,830 it as a CSV that I can then 654 00:25:24,830 --> 00:25:25,880 integrate back with my 655 00:25:25,880 --> 00:25:28,354 GIS software to make a map. 656 00:25:28,354 --> 00:25:30,769 There's the confusion matrix 657 00:25:30,769 --> 00:25:33,589 from the different units. 658 00:25:33,589 --> 00:25:35,359 And we can see right 659 00:25:35,359 --> 00:25:37,129 away that this one unit here, 660 00:25:37,129 --> 00:25:40,204 TMBA, it has a lot of, 661 00:25:40,204 --> 00:25:42,830 a lot of problems and it also ends up 662 00:25:42,830 --> 00:25:47,450 being the most ubiquitous unit in the thing. 663 00:25:47,450 --> 00:25:48,620 So I'm going to have to do something 664 00:25:48,620 --> 00:25:49,730 to take care of that. 665 00:25:49,730 --> 00:25:50,660 And I haven't quite figured 666 00:25:50,660 --> 00:25:52,550 out what to do about that yet. 667 00:25:52,550 --> 00:25:54,830 But here it is. 668 00:25:54,830 --> 00:25:56,074 This is the actual map 669 00:25:56,074 --> 00:25:59,120 of the predictions from, 670 00:25:59,120 --> 00:26:00,679 from my random forest model. 671 00:26:00,679 --> 00:26:02,734 And we can see once again that this is, 672 00:26:02,734 --> 00:26:05,420 it's looking coherent like a geologic map. 673 00:26:05,420 --> 00:26:08,494 We've got stuff grouping together over here 674 00:26:08,494 --> 00:26:13,205 that is genetically related.
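The sample-down and confusion-matrix steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Orange workflow itself: the function names, the unit labels other than TMBA ("Qal", "Tb"), and the tiny label lists are all hypothetical, and the predictions are made up rather than produced by a real model.

```python
import random
from collections import Counter, defaultdict


def sample_fraction(rows, fraction, seed=0):
    """Return a random subset holding roughly `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def confusion_matrix(actual, predicted):
    """Count how often each actual map unit was predicted as each unit."""
    counts = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def recall_by_unit(matrix):
    """Fraction of each actual unit that the model labeled correctly."""
    return {unit: row[unit] / sum(row.values()) for unit, row in matrix.items()}


# Hypothetical labels: the dominant unit is confused with its neighbors.
actual = ["TMBA", "TMBA", "TMBA", "TMBA", "Qal", "Qal", "Tb", "Tb"]
predicted = ["TMBA", "Qal", "Tb", "TMBA", "Qal", "Qal", "Tb", "TMBA"]

matrix = confusion_matrix(actual, predicted)
recall = recall_by_unit(matrix)

# Sampling 10% of 144,000 training rows leaves 14,400 rows.
subset = sample_fraction(list(range(144_000)), 0.10)

for unit, value in sorted(recall.items()):
    print(unit, round(value, 2))
```

Reading the matrix row by row like this is one way to see which unit is absorbing or leaking predictions before deciding how to handle it.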
675 00:26:13,205 --> 00:26:14,870 We've got stuff down in here 676 00:26:14,870 --> 00:26:16,429 that looks quite similar. 677 00:26:16,429 --> 00:26:17,120 And then we've got 678 00:26:17,120 --> 00:26:18,559 a streak of stuff going through 679 00:26:18,559 --> 00:26:21,499 here and some nice groupings down in here. 680 00:26:21,499 --> 00:26:23,885 So it seems to be pretty good. 681 00:26:23,885 --> 00:26:27,530 Then if we overlay that on the, 682 00:26:27,530 --> 00:26:29,524 with the hillshade map, 683 00:26:29,524 --> 00:26:31,729 we can see that we can actually 684 00:26:31,729 --> 00:26:33,994 find outcrops along in here. 685 00:26:33,994 --> 00:26:34,670 Now I need to do 686 00:26:34,670 --> 00:26:36,425 a little bit more sophisticated work 687 00:26:36,425 --> 00:26:38,495 on finding the outcrops. 688 00:26:38,495 --> 00:26:41,809 And that is just really a matter of time. 689 00:26:41,809 --> 00:26:44,315 Because what I can do 690 00:26:44,315 --> 00:26:47,239 is look for exposures where there's 691 00:26:47,239 --> 00:26:49,769 a lack of vegetation and hopefully you 692 00:26:49,769 --> 00:26:51,829 see that that would end up 693 00:26:51,829 --> 00:26:54,949 being a good outcrop there. 694 00:26:54,949 --> 00:26:57,664 Now, this summer we did, 695 00:26:57,664 --> 00:27:00,199 the students did go out and 696 00:27:00,199 --> 00:27:03,364 they went ahead and sampled the data 697 00:27:03,364 --> 00:27:05,645 from the field. And so this is 698 00:27:05,645 --> 00:27:07,625 the data from Craft Point from 699 00:27:07,625 --> 00:27:12,605 Angela stepson who shared her data with me. 700 00:27:12,605 --> 00:27:15,079 And I'm still integrating 701 00:27:15,079 --> 00:27:17,989 this with the data from the model, 702 00:27:17,989 --> 00:27:19,580 but it's looking quite good. 703 00:27:19,580 --> 00:27:22,024 We're starting to see similar patterns.
704 00:27:22,024 --> 00:27:24,770 So this idea of integrating the, 705 00:27:24,770 --> 00:27:26,614 the modeling with the field work 706 00:27:26,614 --> 00:27:29,509 is really where I'd like to go with this. 707 00:27:29,509 --> 00:27:32,479 And that is, I would like us to be able 708 00:27:32,479 --> 00:27:35,659 to have these models in 709 00:27:35,659 --> 00:27:37,834 a Cloud-based system and then put 710 00:27:37,834 --> 00:27:40,220 data into it at night after we 711 00:27:40,220 --> 00:27:41,359 get back from mapping in 712 00:27:41,359 --> 00:27:42,830 the field and then be able 713 00:27:42,830 --> 00:27:46,559 to update the model appropriately. 714 00:27:46,559 --> 00:27:49,420 And I guess I, I guess I got through that 715 00:27:49,420 --> 00:27:50,560 fairly quickly because this 716 00:27:50,560 --> 00:27:52,525 is my final thoughts slide. 717 00:27:52,525 --> 00:27:54,550 So the idea is that 718 00:27:54,550 --> 00:27:57,924 this approach is yielding promising results. 719 00:27:57,924 --> 00:28:00,534 But other data can be added 720 00:28:00,534 --> 00:28:03,280 to increase the accuracy. I need to deal with 721 00:28:03,280 --> 00:28:07,315 that TMBA unit because 722 00:28:07,315 --> 00:28:10,060 that one is actually about half of 723 00:28:10,060 --> 00:28:13,389 the data that I'm 724 00:28:13,389 --> 00:28:15,489 using is of that particular type. 725 00:28:15,489 --> 00:28:15,790 So there's 726 00:28:15,790 --> 00:28:18,355 something interesting going on there. 727 00:28:18,355 --> 00:28:20,425 Then I need to tackle. 728 00:28:20,425 --> 00:28:22,270 The most important thing though, of course, 729 00:28:22,270 --> 00:28:24,610 is I want to use this to go 730 00:28:24,610 --> 00:28:28,620 after funding for getting better data.
731 00:28:28,620 --> 00:28:31,054 Because right now I'm limited to 732 00:28:31,054 --> 00:28:33,139 just the freely available Sentinel data 733 00:28:33,139 --> 00:28:34,339 at the scale of 20 734 00:28:34,339 --> 00:28:36,350 m. But there's 735 00:28:36,350 --> 00:28:40,324 much higher resolution data available. 736 00:28:40,324 --> 00:28:42,844 And with some funding from the state, 737 00:28:42,844 --> 00:28:44,329 we could probably get 738 00:28:44,329 --> 00:28:47,539 some pretty good results from that. 739 00:28:47,539 --> 00:28:50,540 And the idea is still, though, 740 00:28:50,540 --> 00:28:52,489 that integrating this with field work is 741 00:28:52,489 --> 00:28:54,529 going to be the best. We don't want, 742 00:28:54,529 --> 00:28:58,144 I am not proposing in any way that we 743 00:28:58,144 --> 00:29:02,015 start to replace geologic mapping. 744 00:29:02,015 --> 00:29:04,160 The actual activity of geologic mapping, 745 00:29:04,160 --> 00:29:05,509 which involves going out and hitting 746 00:29:05,509 --> 00:29:07,174 rocks with hammers and all of that. 747 00:29:07,174 --> 00:29:08,210 We're not going to replace 748 00:29:08,210 --> 00:29:09,215 that with satellite data, 749 00:29:09,215 --> 00:29:11,180 but we're going to enhance it. 750 00:29:11,180 --> 00:29:12,410 So there we go. 751 00:29:12,410 --> 00:29:13,775 That's my talk. 752 00:29:13,775 --> 00:29:15,559 Probably, maybe, one of 753 00:29:15,559 --> 00:29:17,060 the shortest system science talks ever, 754 00:29:17,060 --> 00:29:18,359 right? 755 00:29:20,980 --> 00:29:23,999 That's terrific. 756 00:29:25,570 --> 00:29:27,994 So yeah, you all got back 757 00:29:27,994 --> 00:29:30,560 an extra half-hour except 758 00:29:30,560 --> 00:29:31,580 that we're probably going to want 759 00:29:31,580 --> 00:29:32,675 to ask some questions. 760 00:29:32,675 --> 00:29:35,189 Yes, please ask questions. 761 00:29:35,800 --> 00:29:38,600 You mentioned Python a little while ago.
762 00:29:38,600 --> 00:29:39,800 How critical is Python 763 00:29:39,800 --> 00:29:41,119 to the work that you do? 764 00:29:41,119 --> 00:29:45,364 Oh, good question. It's not. 765 00:29:45,364 --> 00:29:47,554 I'm, I'm agnostic. 766 00:29:47,554 --> 00:29:50,480 I use R, I 767 00:29:50,480 --> 00:29:53,134 actually prefer R for some stuff. 768 00:29:53,134 --> 00:29:56,239 Because the way that Python works, 769 00:29:56,239 --> 00:29:57,890 some of the ways that some of 770 00:29:57,890 --> 00:30:00,260 the idioms in Python don't work as 771 00:30:00,260 --> 00:30:03,139 well for modeling as R. 772 00:30:03,139 --> 00:30:05,045 But it's one of those things where 773 00:30:05,045 --> 00:30:06,544 the Orange though, 774 00:30:06,544 --> 00:30:08,089 that Orange thing that I was showing you, 775 00:30:08,089 --> 00:30:09,094 the Orange data mining, 776 00:30:09,094 --> 00:30:10,519 that's a little Python. 777 00:30:10,519 --> 00:30:13,820 So what you can do, that's kinda cool, 778 00:30:13,820 --> 00:30:15,559 and I haven't actually tried this yet, 779 00:30:15,559 --> 00:30:17,119 is you can get your model all 780 00:30:17,119 --> 00:30:18,769 working there in Orange in 781 00:30:18,769 --> 00:30:20,240 the visual thing and then 782 00:30:20,240 --> 00:30:21,920 export it to Python. 783 00:30:21,920 --> 00:30:23,750 And all of your stuff is documented 784 00:30:23,750 --> 00:30:25,640 as far as which widgets you're using and stuff. 785 00:30:25,640 --> 00:30:27,215 And then you can go in and tweak stuff. 786 00:30:27,215 --> 00:30:29,824 So that's actually kinda promising. 787 00:30:29,824 --> 00:30:31,955 I guess, where I was going with this:
788 00:30:31,955 --> 00:30:33,349 What kind of advice would you 789 00:30:33,349 --> 00:30:34,970 recommend to our students who 790 00:30:34,970 --> 00:30:36,229 want to be well-prepared 791 00:30:36,229 --> 00:30:38,300 for the research world, 792 00:30:38,300 --> 00:30:42,065 a computationally oriented research world? 793 00:30:42,065 --> 00:30:43,220 Most of them are wondering, should 794 00:30:43,220 --> 00:30:44,479 I learn R, Python, 795 00:30:44,479 --> 00:30:46,279 both? Are there other tools I should learn? 796 00:30:46,279 --> 00:30:47,660 Should I wait until the latest thing 797 00:30:47,660 --> 00:30:49,714 emerges and I'm ready to learn? 798 00:30:49,714 --> 00:30:53,959 Yeah, I know. You can't. 799 00:30:53,959 --> 00:30:56,344 I think everybody needs to learn Python. 800 00:30:56,344 --> 00:30:58,130 And the thing is, what I've 801 00:30:58,130 --> 00:31:00,199 been telling students lately is 802 00:31:00,199 --> 00:31:02,269 that you need to 803 00:31:02,269 --> 00:31:04,580 learn these things well enough that you can 804 00:31:04,580 --> 00:31:06,439 understand other people's code 805 00:31:06,439 --> 00:31:07,609 so that when you run 806 00:31:07,609 --> 00:31:09,244 into somebody else's code 807 00:31:09,244 --> 00:31:10,985 and you want to use it yourself, 808 00:31:10,985 --> 00:31:12,049 you can figure out where the 809 00:31:12,049 --> 00:31:13,550 variables are and how to change 810 00:31:13,550 --> 00:31:14,899 the data that it's 811 00:31:14,899 --> 00:31:17,639 pointing to and all of that sort of stuff. 812 00:31:17,710 --> 00:31:21,499 That, to me, most, 813 00:31:21,499 --> 00:31:23,569 most working scientists these 814 00:31:23,569 --> 00:31:25,759 days need to be at that level, 815 00:31:25,759 --> 00:31:28,220 at least. I don't expect people to be able 816 00:31:28,220 --> 00:31:29,075 to sort of like 817 00:31:29,075 --> 00:31:31,519 write a program from scratch.
818 00:31:31,519 --> 00:31:33,650 Because that actually is much, 819 00:31:33,650 --> 00:31:35,119 much harder than 820 00:31:35,119 --> 00:31:36,814 tweaking other people's code. 821 00:31:36,814 --> 00:31:38,719 And tweaking other people's code is actually 822 00:31:38,719 --> 00:31:41,479 a more valuable skill in a lot of 823 00:31:41,479 --> 00:31:43,160 ways than being able to just 824 00:31:43,160 --> 00:31:45,769 write a program, a script, from scratch. 825 00:31:45,769 --> 00:31:49,444 But I also have, I come from being a hacker. 826 00:31:49,444 --> 00:31:51,920 I'm a hacker from the '80s and 827 00:31:51,920 --> 00:31:55,505 I did IT and stuff. 828 00:31:55,505 --> 00:31:56,855 So I haven't really, 829 00:31:56,855 --> 00:31:59,960 I have a very different attitude 830 00:31:59,960 --> 00:32:01,220 towards using software. 831 00:32:01,220 --> 00:32:02,254 I'm just like, oh, 832 00:32:02,254 --> 00:32:03,500 look, it's another piece of software. 833 00:32:03,500 --> 00:32:03,740 Cool. 834 00:32:03,740 --> 00:32:05,359 Let's just dive in and see what it does. 835 00:32:05,359 --> 00:32:07,325 You know, a lot of people go. 836 00:32:07,325 --> 00:32:08,149 I know. 837 00:32:08,149 --> 00:32:12,649 I do think when you guys are accessing data, 838 00:32:12,649 --> 00:32:13,670 you obviously have to use 839 00:32:13,670 --> 00:32:15,949 GIS type of tools primarily 840 00:32:15,949 --> 00:32:19,700 or GIS compatible use. 841 00:32:19,700 --> 00:32:21,544 I'm not sure what I'm saying here, 842 00:32:21,544 --> 00:32:22,849 but I've had people ask me, 843 00:32:22,849 --> 00:32:24,139 should I learn SQL, which is 844 00:32:24,139 --> 00:32:26,210 this ancient language for dealing 845 00:32:26,210 --> 00:32:28,070 with databases, and I 846 00:32:28,070 --> 00:32:29,630 pretty much say, why not?
847 00:32:29,630 --> 00:32:30,769 And I'm just wondering what 848 00:32:30,769 --> 00:32:32,239 other people think and you could comment, 849 00:32:32,239 --> 00:32:33,529 but others in the room could comment 850 00:32:33,529 --> 00:32:35,570 as well, right? 851 00:32:35,570 --> 00:32:36,500 Yeah. 852 00:32:36,500 --> 00:32:38,480 Rick, what do you tell 853 00:32:38,480 --> 00:32:41,570 your students to do? 854 00:32:41,570 --> 00:32:46,145 Well, so overall, we don't have, 855 00:32:46,145 --> 00:32:49,294 I don't see the students who are 856 00:32:49,294 --> 00:32:53,060 on a trajectory to do computational work. 857 00:32:53,060 --> 00:32:54,140 So I'm kind of 858 00:32:54,140 --> 00:32:56,165 absolved of that responsibility. 859 00:32:56,165 --> 00:32:57,589 However, there is, within 860 00:32:57,589 --> 00:32:58,850 the culture of public health, 861 00:32:58,850 --> 00:33:02,089 the distinction between what 862 00:33:02,089 --> 00:33:03,379 the working scientists do, 863 00:33:03,379 --> 00:33:05,345 perhaps at the state level, 864 00:33:05,345 --> 00:33:09,350 versus what students will 865 00:33:09,350 --> 00:33:11,000 pick up on their own through 866 00:33:11,000 --> 00:33:13,969 their own maybe hobbyist approach 867 00:33:13,969 --> 00:33:17,029 to learning coding in particular. 868 00:33:17,029 --> 00:33:19,940 And so I've seen 869 00:33:19,940 --> 00:33:21,830 among my colleagues that there really 870 00:33:21,830 --> 00:33:25,369 is a teachable moment 871 00:33:25,369 --> 00:33:27,875 related to career development 872 00:33:27,875 --> 00:33:28,715 where they say like, 873 00:33:28,715 --> 00:33:29,960 you know, it's 874 00:33:29,960 --> 00:33:31,999 great that you can do these things. 875 00:33:31,999 --> 00:33:36,334 But if you really want to get a job, 876 00:33:36,334 --> 00:33:37,760 it's important that you learn 877 00:33:37,760 --> 00:33:39,485 these other platforms.
878 00:33:39,485 --> 00:33:41,870 That's kinda the guidance of the faculty. 879 00:33:41,870 --> 00:33:44,285 Use what other platforms 880 00:33:44,285 --> 00:33:45,620 just listed for you. 881 00:33:45,620 --> 00:33:48,680 Well, so I know I'm dating myself here 882 00:33:48,680 --> 00:33:50,149 because I haven't really 883 00:33:50,149 --> 00:33:52,715 like paid attention too much. 884 00:33:52,715 --> 00:33:55,429 Overall. 885 00:33:55,429 --> 00:33:59,179 The big distinction when I was paying 886 00:33:59,179 --> 00:34:02,840 attention was whether students were ready. 887 00:34:02,840 --> 00:34:03,890 This was actually before 888 00:34:03,890 --> 00:34:05,330 the revolution happened 889 00:34:05,330 --> 00:34:07,069 and encompasses whether students 890 00:34:07,069 --> 00:34:08,989 were ready to use. 891 00:34:08,989 --> 00:34:12,140 SUDAAN is, 892 00:34:12,140 --> 00:34:13,730 was popular in 893 00:34:13,730 --> 00:34:16,939 public health epidemiological studies. 894 00:34:16,939 --> 00:34:19,744 And let's see. 895 00:34:19,744 --> 00:34:21,575 I'm kinda blocked in now. 896 00:34:21,575 --> 00:34:23,060 That's okay. I didn't mean to put you 897 00:34:23,060 --> 00:34:24,574 on the spot really. I didn't know. 898 00:34:24,574 --> 00:34:25,865 If people have advice, 899 00:34:25,865 --> 00:34:27,109 just one of the questions that we 900 00:34:27,109 --> 00:34:28,760 have to try to answer a lot. 901 00:34:28,760 --> 00:34:30,170 How technical should I go? 902 00:34:30,170 --> 00:34:31,925 And if I move in a technical direction, 903 00:34:31,925 --> 00:34:32,810 what's gonna be the most 904 00:34:32,810 --> 00:34:34,009 useful in this field, 905 00:34:34,009 --> 00:34:35,300 that field, et cetera. 906 00:34:35,300 --> 00:34:37,310 Totally. Well, I do have a corollary bit 907 00:34:37,310 --> 00:34:38,434 of information though.
908 00:34:38,434 --> 00:34:40,549 I went to a conference that 909 00:34:40,549 --> 00:34:43,790 was populated by a number of 910 00:34:43,790 --> 00:34:47,225 researchers from this great place at 911 00:34:47,225 --> 00:34:48,679 the U-Dub called 912 00:34:48,679 --> 00:34:49,790 the Institute for Health 913 00:34:49,790 --> 00:34:51,470 Metrics and Evaluation. 914 00:34:51,470 --> 00:34:53,389 But the conference was labeled as 915 00:34:53,389 --> 00:34:54,680 a global health conference 916 00:34:54,680 --> 00:34:55,700 and I teach global health. 917 00:34:55,700 --> 00:34:57,860 And so I pulled aside one of 918 00:34:57,860 --> 00:35:00,290 the presenters and asked him 919 00:35:00,290 --> 00:35:01,399 how I could get 920 00:35:01,399 --> 00:35:05,449 my students employed after school. 921 00:35:05,449 --> 00:35:08,389 And his first question was, 922 00:35:08,389 --> 00:35:11,719 Can they program databases? Right? 923 00:35:11,719 --> 00:35:14,074 And he said, we would 924 00:35:14,074 --> 00:35:16,460 much more quickly hire someone 925 00:35:16,460 --> 00:35:18,545 that could program a database 926 00:35:18,545 --> 00:35:20,479 than someone who has 927 00:35:20,479 --> 00:35:22,039 substantive knowledge about global health. 928 00:35:22,039 --> 00:35:22,819 And we'll just teach 929 00:35:22,819 --> 00:35:23,900 them all about global health. 930 00:35:23,900 --> 00:35:27,350 So that was deflating, realistic. 931 00:35:27,350 --> 00:35:30,620 So good for System Science, right? 932 00:35:30,620 --> 00:35:32,075 And good for data science. 933 00:35:32,075 --> 00:35:33,199 Well, our students are 934 00:35:33,199 --> 00:35:34,519 learning and I'll stop talking. 935 00:35:34,519 --> 00:35:35,959 I know there might be other questions.
936 00:35:35,959 --> 00:35:37,339 Our students are realizing 937 00:35:37,339 --> 00:35:39,859 that if they can't claim to be, 938 00:35:39,859 --> 00:35:41,600 to some degree, reasonably 939 00:35:41,600 --> 00:35:42,904 well-trained in data science, 940 00:35:42,904 --> 00:35:45,120 they're going to have a hard time for 80, 941 00:35:45,120 --> 00:35:46,835 90% of the jobs out there. 942 00:35:46,835 --> 00:35:48,995 Knowing system science isn't enough. 943 00:35:48,995 --> 00:35:51,425 It's, it's positive on their resume. 944 00:35:51,425 --> 00:35:53,180 And so a lot of our students are spending 945 00:35:53,180 --> 00:35:54,874 some time over in CS 946 00:35:54,874 --> 00:35:58,609 getting at least a credible list of 947 00:35:58,609 --> 00:36:00,979 data science credentials one way 948 00:36:00,979 --> 00:36:02,029 or the other to 949 00:36:02,029 --> 00:36:03,259 complement their system science. 950 00:36:03,259 --> 00:36:04,820 I think that's smart that they're doing 951 00:36:04,820 --> 00:36:06,739 that. We haven't actually 952 00:36:06,739 --> 00:36:08,629 led that maybe as 953 00:36:08,629 --> 00:36:10,430 much as we should in terms of advising. 954 00:36:10,430 --> 00:36:11,900 But that's why I'm asking these questions 955 00:36:11,900 --> 00:36:14,099 of experienced people. 956 00:36:14,140 --> 00:36:16,819 I see people trying to do 957 00:36:16,819 --> 00:36:18,830 stuff in SQL, but that, 958 00:36:18,830 --> 00:36:21,020 that strikes me as just being cute 959 00:36:21,020 --> 00:36:24,200 because there's other tools that you can use 960 00:36:24,200 --> 00:36:25,669 that will do it much better. 961 00:36:25,669 --> 00:36:27,289 And proving that you can do 962 00:36:27,289 --> 00:36:29,164 it in SQL is kinda cute. 963 00:36:29,164 --> 00:36:31,399 But it's not the way to do it.
964 00:36:31,399 --> 00:36:34,699 It's like, yeah, I could do that with SQL, 965 00:36:34,699 --> 00:36:36,050 but I can do it also in 966 00:36:36,050 --> 00:36:38,555 one line in Python. You know. 967 00:36:38,555 --> 00:36:40,759 I mean, yeah, there's 968 00:36:40,759 --> 00:36:43,204 also like there's a whole thing in GIS, 969 00:36:43,204 --> 00:36:44,629 like where people try to prove that they 970 00:36:44,629 --> 00:36:46,549 can do stuff using command line. 971 00:36:46,549 --> 00:36:48,695 Like I can do Command Line GIS. 972 00:36:48,695 --> 00:36:49,580 And it's like, it's 973 00:36:49,580 --> 00:36:50,884 cute that you can do that. 974 00:36:50,884 --> 00:36:52,099 But in the real world, 975 00:36:52,099 --> 00:36:53,299 you're going to grab your mouse and go 976 00:36:53,299 --> 00:36:55,920 drag the thing on the screen. 977 00:36:55,930 --> 00:36:58,879 So I see a lot of people 978 00:36:58,879 --> 00:37:01,174 doing weird stuff out there, 979 00:37:01,174 --> 00:37:03,470 like with funny programming languages 980 00:37:03,470 --> 00:37:05,750 or stuff just to kinda prove that they can. 981 00:37:05,750 --> 00:37:09,709 But Python and R are going to be here. 982 00:37:09,709 --> 00:37:11,329 I mean, those are, those are the 983 00:37:11,329 --> 00:37:12,560 hardcore, like if 984 00:37:12,560 --> 00:37:15,080 you know Python and R and system science, 985 00:37:15,080 --> 00:37:17,344 then you can go out and do amazing things. 986 00:37:17,344 --> 00:37:19,415 They personally, yeah. 987 00:37:19,415 --> 00:37:22,325 So maybe, maybe you mentioned this. 988 00:37:22,325 --> 00:37:23,869 I got a bit distracted there in the middle, 989 00:37:23,869 --> 00:37:25,490 but how big, how big 990 00:37:25,490 --> 00:37:27,319 are your datasets for this? 991 00:37:27,319 --> 00:37:28,610 And did you, did you execute it 992 00:37:28,610 --> 00:37:30,254 all on a local computer? 993 00:37:30,254 --> 00:37:32,180 Did you run some of this in the cloud?
994 00:37:32,180 --> 00:37:33,800 Yeah. Great question. 995 00:37:33,800 --> 00:37:35,270 I actually did. I kinda hit 996 00:37:35,270 --> 00:37:37,355 the limits on my local computer. 997 00:37:37,355 --> 00:37:38,554 I'm actually remoted in. 998 00:37:38,554 --> 00:37:40,520 I've got a really big workstation 999 00:37:40,520 --> 00:37:43,055 at Portland State in my office. 1000 00:37:43,055 --> 00:37:45,109 And I remote into it from home and 1001 00:37:45,109 --> 00:37:47,555 run all of my modeling and stuff there. 1002 00:37:47,555 --> 00:37:49,789 And some of these datasets 1003 00:37:49,789 --> 00:37:52,414 are enough to even slow that sucker down. 1004 00:37:52,414 --> 00:37:55,550 There were 300,000 points 1005 00:37:55,550 --> 00:37:58,175 in the final prediction. 1006 00:37:58,175 --> 00:38:07,319 So, and then I trained on 144,000 points. 1007 00:38:07,660 --> 00:38:10,414 And I did some random sampling 1008 00:38:10,414 --> 00:38:13,444 to get the computation down. 1009 00:38:13,444 --> 00:38:15,200 But you're absolutely right, Glenn, 1010 00:38:15,200 --> 00:38:17,540 what I need to do next is actually move 1011 00:38:17,540 --> 00:38:19,280 this entire process out 1012 00:38:19,280 --> 00:38:20,644 to Google Earth Engine. 1013 00:38:20,644 --> 00:38:22,774 Because Google Earth Engine has access 1014 00:38:22,774 --> 00:38:25,129 in the Cloud to all that Sentinel data. 1015 00:38:25,129 --> 00:38:26,660 And then I can just run my code 1016 00:38:26,660 --> 00:38:27,800 in Python right out 1017 00:38:27,800 --> 00:38:29,720 there and not have 1018 00:38:29,720 --> 00:38:31,999 to transfer any data back and forth. 1019 00:38:31,999 --> 00:38:34,175 Yeah, that sounds, that sounds great. 1020 00:38:34,175 --> 00:38:35,930 Yeah, that's, that's the way, that's 1021 00:38:35,930 --> 00:38:38,640 the way we need to go with this. 1022 00:38:38,650 --> 00:38:43,340 I see only one small, I have a qualm. 1023 00:38:43,340 --> 00:38:44,734 Let's put it that way.
1024 00:38:44,734 --> 00:38:46,744 Oh, good to hear from you, Steve. 1025 00:38:46,744 --> 00:38:52,715 Well, maybe lidar data 1026 00:38:52,715 --> 00:38:53,870 should be available for much 1027 00:38:53,870 --> 00:38:55,159 of the state, right? 1028 00:38:55,159 --> 00:38:58,594 Oh, actually, no. We're working on that now. 1029 00:38:58,594 --> 00:38:59,840 Yeah, but yeah. 1030 00:38:59,840 --> 00:39:00,274 Go ahead. 1031 00:39:00,274 --> 00:39:02,240 It seems to me that's 1032 00:39:02,240 --> 00:39:04,220 your primary tool for 1033 00:39:04,220 --> 00:39:07,505 location of spaces, slopes. 1034 00:39:07,505 --> 00:39:08,735 Oh, yeah, definitely. 1035 00:39:08,735 --> 00:39:11,614 And watersheds, etc. 1036 00:39:11,614 --> 00:39:14,060 Yeah. So once you have that data 1037 00:39:14,060 --> 00:39:17,344 in and you organize 1038 00:39:17,344 --> 00:39:19,984 data in terms of 1039 00:39:19,984 --> 00:39:21,710 the Lidar slopes that 1040 00:39:21,710 --> 00:39:22,189 are because you're going 1041 00:39:22,189 --> 00:39:24,094 to end up with between two spaces, 1042 00:39:24,094 --> 00:39:26,224 because you've got pixels. 1043 00:39:26,224 --> 00:39:28,369 You now I've got a wave 1044 00:39:28,369 --> 00:39:29,869 spatially organizing it would 1045 00:39:29,869 --> 00:39:33,394 seem to me you're currently separated pixels. 1046 00:39:33,394 --> 00:39:35,749 The problem I see for mapping 1047 00:39:35,749 --> 00:39:38,119 a geological problem with a set of 1048 00:39:38,119 --> 00:39:40,864 separated locations is that 1049 00:39:40,864 --> 00:39:43,100 the space in Portland is 1050 00:39:43,100 --> 00:39:45,724 not the space in Eastern Oregon, 1051 00:39:45,724 --> 00:39:48,049 but you're being treating basically them 1052 00:39:48,049 --> 00:39:49,579 each is separated pixels 1053 00:39:49,579 --> 00:39:51,560 we characteristics based on light. 
1054 00:39:51,560 --> 00:39:53,555 And now you're doing a little bit of large, 1055 00:39:53,555 --> 00:39:55,834 of a large area geometry. 1056 00:39:55,834 --> 00:39:58,219 But you really want to handle 1057 00:39:58,219 --> 00:39:59,930 two pixels that are side-by-side 1058 00:39:59,930 --> 00:40:01,970 as though they're side-by-side. 1059 00:40:01,970 --> 00:40:04,174 And solving that problem, 1060 00:40:04,174 --> 00:40:05,749 which I haven't given 1061 00:40:05,749 --> 00:40:09,239 enough thought to know how confused I am, 1062 00:40:10,150 --> 00:40:12,770 would seem to me to be 1063 00:40:12,770 --> 00:40:15,590 a key aspect of 1064 00:40:15,590 --> 00:40:18,065 coming up with a geological map. 1065 00:40:18,065 --> 00:40:19,850 That's a really good point, Steve. 1066 00:40:19,850 --> 00:40:20,929 And actually that it totally 1067 00:40:20,929 --> 00:40:22,520 ties into the work that Marty and I are 1068 00:40:22,520 --> 00:40:25,834 doing together because we're taking pixels 1069 00:40:25,834 --> 00:40:29,555 of land use data from a satellite map, 1070 00:40:29,555 --> 00:40:31,204 the national land cover map. 1071 00:40:31,204 --> 00:40:32,660 And we're looking at that pixel 1072 00:40:32,660 --> 00:40:34,009 plus the pixels of the north, 1073 00:40:34,009 --> 00:40:35,464 south, east, and west. 1074 00:40:35,464 --> 00:40:36,980 And we're treating those as 1075 00:40:36,980 --> 00:40:39,785 new variables in our model. 1076 00:40:39,785 --> 00:40:43,235 Yes, That's the general idea. 1077 00:40:43,235 --> 00:40:44,929 I'm not sure because you're 1078 00:40:44,929 --> 00:40:46,775 dealing with slopes and gravity, 1079 00:40:46,775 --> 00:40:50,330 whether or not you want to make it that 1080 00:40:50,330 --> 00:40:55,460 unconstrained in a geoloc mapping problem. 1081 00:40:55,460 --> 00:40:59,240 But it seems to me that you have to 1082 00:40:59,240 --> 00:41:00,575 do that in order to have 1083 00:41:00,575 --> 00:41:03,184 any possibility of success. 
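The neighborhood idea described above (a pixel plus its north, south, east, and west neighbors treated as extra variables) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the project: the function name `neighbor_features`, the list-of-lists grid, and the choice to skip edge pixels are all assumptions made for the example.

```python
def neighbor_features(grid):
    """Build one feature row per interior pixel of a 2-D grid.

    Each row is [center, north, south, east, west], so a model sees a
    pixel together with its four side-by-side neighbors instead of
    treating every pixel independently.
    (Hypothetical helper; edge pixels are skipped for simplicity.)
    """
    n_rows, n_cols = len(grid), len(grid[0])
    features = {}
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            features[(r, c)] = [
                grid[r][c],      # the pixel itself
                grid[r - 1][c],  # north
                grid[r + 1][c],  # south
                grid[r][c + 1],  # east
                grid[r][c - 1],  # west
            ]
    return features


# Toy 3x3 grid of made-up land-cover codes.
toy_grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
toy_features = neighbor_features(toy_grid)
```

Only the center pixel of the toy grid has all four neighbors, so `toy_features` holds a single row for position `(1, 1)`.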
1084 00:41:03,184 --> 00:41:04,715 And I'd wonder about 1085 00:41:04,715 --> 00:41:08,119 the geological relationship of 1086 00:41:08,119 --> 00:41:09,890 that set of pixels, 1087 00:41:09,890 --> 00:41:11,959 which gave us his poor results. 1088 00:41:11,959 --> 00:41:17,074 Slope angle boundaries, right? 1089 00:41:17,074 --> 00:41:19,160 And that definitely, So slope is definitely 1090 00:41:19,160 --> 00:41:20,300 one of the other variables that 1091 00:41:20,300 --> 00:41:22,055 I want to put into the model. 1092 00:41:22,055 --> 00:41:25,189 Because certain types of units are gonna be 1093 00:41:25,189 --> 00:41:30,769 more inclined to be 1094 00:41:30,769 --> 00:41:34,535 flatline versus being actually inclined. 1095 00:41:34,535 --> 00:41:37,730 That's actually an attribute of 1096 00:41:37,730 --> 00:41:41,809 a geologic unit is how Angular does it yet. 1097 00:41:41,809 --> 00:41:44,360 So, yeah, that's a, that's a great, 1098 00:41:44,360 --> 00:41:46,670 That's definitely part of it. 1099 00:41:46,670 --> 00:41:48,829 The pixels are close enough 1100 00:41:48,829 --> 00:41:50,300 together that it's not a problem. 1101 00:41:50,300 --> 00:41:52,295 But what I think you might be saying is that 1102 00:41:52,295 --> 00:41:54,799 each pixel is being treated independently, 1103 00:41:54,799 --> 00:41:56,495 whereas it should be 1104 00:41:56,495 --> 00:41:58,414 treated as part of a neighborhood? 1105 00:41:58,414 --> 00:42:01,489 Yes. And perhaps I'm not 1106 00:42:01,489 --> 00:42:02,540 quite sure how to define 1107 00:42:02,540 --> 00:42:04,564 neighborhood for geological purposes. 1108 00:42:04,564 --> 00:42:07,564 I'm not a geologist, but nevertheless, 1109 00:42:07,564 --> 00:42:08,750 it would seem to me that that 1110 00:42:08,750 --> 00:42:10,849 would be important for 1111 00:42:10,849 --> 00:42:15,395 extracting what a compact, what would it be? 
1112 00:42:15,395 --> 00:42:18,529 A compact set of symbol of where 1113 00:42:18,529 --> 00:42:20,060 our compact set of pixels 1114 00:42:20,060 --> 00:42:22,310 should be similar, right? 1115 00:42:22,310 --> 00:42:24,110 And that would seem to me to be 1116 00:42:24,110 --> 00:42:26,509 an important thing to try to get away from. 1117 00:42:26,509 --> 00:42:28,220 Part of what we have basically with 1118 00:42:28,220 --> 00:42:30,829 qualia is that we see things as similar, 1119 00:42:30,829 --> 00:42:33,289 somehow unlinked them in space. 1120 00:42:33,289 --> 00:42:35,089 We don't have a means of doing that yet 1121 00:42:35,089 --> 00:42:38,434 in in datasets as far as I know. 1122 00:42:38,434 --> 00:42:43,940 But that was that was that stood out to me. 1123 00:42:43,940 --> 00:42:45,410 And it would seem to me that that 1124 00:42:45,410 --> 00:42:47,389 would be an essential 1125 00:42:47,389 --> 00:42:49,384 to use that and trying to predict 1126 00:42:49,384 --> 00:42:51,305 your predictive model without that, 1127 00:42:51,305 --> 00:42:53,240 it seems to me to be. 1128 00:42:53,240 --> 00:42:55,895 It's hard to tell how good your data 1129 00:42:55,895 --> 00:42:58,789 is without that in terms of your prediction. 1130 00:42:58,789 --> 00:43:02,300 Yes, I agree. Steve, I think, 1131 00:43:02,300 --> 00:43:04,400 I think bringing slope in 1132 00:43:04,400 --> 00:43:07,910 to the model is going to help it quite a bit. 1133 00:43:07,910 --> 00:43:09,800 The two other things 1134 00:43:09,800 --> 00:43:11,539 I'd bring up is you probably have got 1135 00:43:11,539 --> 00:43:15,934 data from other industries on 1136 00:43:15,934 --> 00:43:19,399 some land-use questions which may affect what 1137 00:43:19,399 --> 00:43:20,929 your pixels are saying as far as 1138 00:43:20,929 --> 00:43:23,164 the frequencies that are displaying. 1139 00:43:23,164 --> 00:43:26,554 Roads e.g. 
will, will 1140 00:43:26,554 --> 00:43:28,279 definitely affect the way that picked 1141 00:43:28,279 --> 00:43:32,045 up a road would 1142 00:43:32,045 --> 00:43:34,580 have a reflection right next to him. 1143 00:43:34,580 --> 00:43:36,110 Piece of non-road which 1144 00:43:36,110 --> 00:43:37,579 were basically you'd want to 1145 00:43:37,579 --> 00:43:39,710 take the road out of your geological map. 1146 00:43:39,710 --> 00:43:40,400 Oh, good. 1147 00:43:40,400 --> 00:43:41,719 I'm so glad that you mentioned that, 1148 00:43:41,719 --> 00:43:42,620 Steve, that's one of 1149 00:43:42,620 --> 00:43:43,595 the things that I've been, 1150 00:43:43,595 --> 00:43:45,649 that that's another improvement in the model 1151 00:43:45,649 --> 00:43:46,490 and I forgot to add to 1152 00:43:46,490 --> 00:43:47,975 my final thoughts slide. 1153 00:43:47,975 --> 00:43:49,399 And that is that I can use 1154 00:43:49,399 --> 00:43:51,020 masking to actually 1155 00:43:51,020 --> 00:43:52,430 mask out areas that 1156 00:43:52,430 --> 00:43:54,064 I don't want to include in the model. 1157 00:43:54,064 --> 00:43:56,584 So I can say, take out this road, 1158 00:43:56,584 --> 00:43:57,935 take out this forest. 1159 00:43:57,935 --> 00:44:00,139 This forest is just confusing my models, 1160 00:44:00,139 --> 00:44:01,550 so let's just mask it out. 1161 00:44:01,550 --> 00:44:03,559 So yeah, we can totally do stuff like that 1162 00:44:03,559 --> 00:44:05,765 to an agricultural areas. 1163 00:44:05,765 --> 00:44:06,770 Yep. Yep. 1164 00:44:06,770 --> 00:44:08,569 That would, I think dramatically 1165 00:44:08,569 --> 00:44:10,040 improve things to begin with. 1166 00:44:10,040 --> 00:44:10,805 Yeah. 1167 00:44:10,805 --> 00:44:12,859 That's I don't think I 1168 00:44:12,859 --> 00:44:14,675 have anything else to say to 1169 00:44:14,675 --> 00:44:16,550 suggest except that when you 1170 00:44:16,550 --> 00:44:19,535 mentioned wanna give people a sense of size. 
1171 00:44:19,535 --> 00:44:21,770 A school bus is 40 ft long. 1172 00:44:21,770 --> 00:44:23,794 Oh, good. Thank you. 1173 00:44:23,794 --> 00:44:25,894 So, school bus: a little bit, about 1174 00:44:25,894 --> 00:44:28,610 two-thirds to three-quarters of 1175 00:44:28,610 --> 00:44:30,290 two school buses in length. 1176 00:44:30,290 --> 00:44:36,319 Hey, hey, Percy, tennis courts. 1177 00:44:36,319 --> 00:44:43,715 The length of a tennis court is 23.8 m. Okay. 1178 00:44:43,715 --> 00:44:48,485 So that's a good illustration. 1179 00:44:48,485 --> 00:44:49,520 You were talking about what's 1180 00:44:49,520 --> 00:44:51,559 20 m? Yeah, that's perfect. 1181 00:44:51,559 --> 00:44:53,749 Yeah. A little bit shorter than 1182 00:44:53,749 --> 00:44:56,150 the length of a tennis court, right? 1183 00:44:56,150 --> 00:44:57,365 That's perfect. Yeah. 1184 00:44:57,365 --> 00:44:58,264 Thank you. Alright. 1185 00:44:58,264 --> 00:44:59,615 Now that I have the floor, 1186 00:44:59,615 --> 00:45:01,830 let me ask you another question. 1187 00:45:01,870 --> 00:45:06,359 Could you have used OCCAM 1188 00:45:06,490 --> 00:45:09,875 among the machine-learning 1189 00:45:09,875 --> 00:45:11,405 methods that you used? 1190 00:45:11,405 --> 00:45:13,894 Or is it the case that 1191 00:45:13,894 --> 00:45:15,080 because OCCAM isn't 1192 00:45:15,080 --> 00:45:17,090 in scikit or isn't 1193 00:45:17,090 --> 00:45:19,759 in the R system, that it isn't 1194 00:45:19,759 --> 00:45:20,840 integrated into 1195 00:45:20,840 --> 00:45:23,705 this general machine-learning 1196 00:45:23,705 --> 00:45:25,864 suite that you use, 1197 00:45:25,864 --> 00:45:27,484 that it'll be difficult to use 1198 00:45:27,484 --> 00:45:29,585 OCCAM to compare it to, say, 1199 00:45:29,585 --> 00:45:32,299 random forests, neural nets 1200 00:45:32,299 --> 00:45:34,385 and support vector machines?
1201 00:45:34,385 --> 00:45:36,349 I think the tricky thing with using 1202 00:45:36,349 --> 00:45:38,120 OCCAM on this would be coming up 1203 00:45:38,120 --> 00:45:39,530 with a rational way to 1204 00:45:39,530 --> 00:45:44,120 discretize the continuous satellite signal. 1205 00:45:44,120 --> 00:45:46,249 Because we don't know 1206 00:45:46,249 --> 00:45:48,755 where to put the class breaks in. 1207 00:45:48,755 --> 00:45:50,975 Okay. So, but if we did, 1208 00:45:50,975 --> 00:45:53,120 if we did some analysis on it and looked at 1209 00:45:53,120 --> 00:45:56,149 the distributions of each of those signals, 1210 00:45:56,149 --> 00:45:57,379 we might be able to come up with 1211 00:45:57,379 --> 00:45:58,715 reasonable class breaks and 1212 00:45:58,715 --> 00:46:00,080 divide each one into like 1213 00:46:00,080 --> 00:46:05,074 55 things, then we could do that. 1214 00:46:05,074 --> 00:46:06,499 And I did think 1215 00:46:06,499 --> 00:46:07,249 about that a couple of 1216 00:46:07,249 --> 00:46:08,149 days ago. I was thinking, you know, 1217 00:46:08,149 --> 00:46:11,299 OCCAM might do a pretty good job of predicting 1218 00:46:11,299 --> 00:46:13,639 these units if I can get 1219 00:46:13,639 --> 00:46:17,405 the data munged around the right way. 1220 00:46:17,405 --> 00:46:20,869 You know, the methods that you 1221 00:46:20,869 --> 00:46:25,475 use all treated the data as continuous data. 1222 00:46:25,475 --> 00:46:27,499 Is that right? Yeah, yeah, 1223 00:46:27,499 --> 00:46:30,365 That market is higher. 1224 00:46:30,365 --> 00:46:32,089 And I just finished 1225 00:46:32,089 --> 00:46:35,749 this study where we compared OCCAM 1226 00:46:35,749 --> 00:46:38,210 to support vector machines and 1227 00:46:38,210 --> 00:46:41,330 neural nets. OCCAM did better. 1228 00:46:41,330 --> 00:46:44,240 And we had to discretize the 1229 00:46:44,240 --> 00:46:47,240 data for OCCAM and Bayesian networks.
1230 00:46:47,240 --> 00:46:49,129 And the support vector machines 1231 00:46:49,129 --> 00:46:50,690 and neural nets treated 1232 00:46:50,690 --> 00:46:52,265 the data as continuous 1233 00:46:52,265 --> 00:46:54,529 and still OCCAM did better. 1234 00:46:54,529 --> 00:46:57,020 So it's worth thinking about. 1235 00:46:57,020 --> 00:46:59,135 It is, yeah. 1236 00:46:59,135 --> 00:47:03,290 The question is, you 1237 00:47:03,290 --> 00:47:07,579 took training data on areas of 1238 00:47:07,579 --> 00:47:09,259 high resolution and then 1239 00:47:09,259 --> 00:47:11,794 you try to use that to 1240 00:47:11,794 --> 00:47:14,120 classify areas that are 1241 00:47:14,120 --> 00:47:16,654 at low resolution, right? Or two, yes. 1242 00:47:16,654 --> 00:47:21,319 So wouldn't it be desirable to kind of 1243 00:47:21,319 --> 00:47:25,160 test what you've learned 1244 00:47:25,160 --> 00:47:26,405 on the training data, 1245 00:47:26,405 --> 00:47:27,979 on something else that 1246 00:47:27,979 --> 00:47:29,539 you know the answers to 1247 00:47:29,539 --> 00:47:34,399 just to verify that in fact, 1248 00:47:34,399 --> 00:47:39,170 you're validly applying it to 1249 00:47:39,170 --> 00:47:44,029 new, to different terrain? Yes. 1250 00:47:44,029 --> 00:47:46,429 But so you mean like 1251 00:47:46,429 --> 00:47:48,770 samples from the actual... Well, 1252 00:47:48,770 --> 00:47:50,179 let's say if, if the 1253 00:47:50,179 --> 00:47:51,740 area used for the training data, 1254 00:47:51,740 --> 00:47:54,305 if you divide that into two, 1255 00:47:54,305 --> 00:47:57,200 you train on one part and you test on 1256 00:47:57,200 --> 00:48:00,229 the other part of the high resolution just 1257 00:48:00,229 --> 00:48:01,909 to be sure that in 1258 00:48:01,909 --> 00:48:03,799 fact it does generalize even 1259 00:48:03,799 --> 00:48:08,239 within that, that similar terrain. 1260 00:48:08,239 --> 00:48:09,095 Right.
1261 00:48:09,095 --> 00:48:10,670 Maybe I should do a train/test/ 1262 00:48:10,670 --> 00:48:12,035 validate split. 1263 00:48:12,035 --> 00:48:14,989 Yeah. Yeah. Yeah, that's a good idea. 1264 00:48:14,989 --> 00:48:17,420 I should do that because right now, yeah, 1265 00:48:17,420 --> 00:48:19,264 I'm just using the train/test split 1266 00:48:19,264 --> 00:48:23,089 and getting, getting in 1267 00:48:23,089 --> 00:48:29,330 the range of 70% accuracy in the train. Yeah. 1268 00:48:29,330 --> 00:48:30,289 Right. 1269 00:48:30,289 --> 00:48:32,270 But, you know, you're applying it to 1270 00:48:32,270 --> 00:48:34,624 a totally different kind of terrain. 1271 00:48:34,624 --> 00:48:36,319 So you don't know, 1272 00:48:36,319 --> 00:48:38,270 you haven't checked that 1273 00:48:38,270 --> 00:48:40,219 your training will 1274 00:48:40,219 --> 00:48:44,089 generalize well, even to similar terrain? 1275 00:48:44,089 --> 00:48:45,739 Well, no, I think I'm 1276 00:48:45,739 --> 00:48:47,930 verifying it within the training set. 1277 00:48:47,930 --> 00:48:49,340 I'm validating it. 1278 00:48:49,340 --> 00:48:50,614 Very right. 1279 00:48:50,614 --> 00:48:52,880 And I'm making sure that the place that I'm 1280 00:48:52,880 --> 00:48:55,580 going to next is related to it. 1281 00:48:55,580 --> 00:48:56,810 So I should be seeing 1282 00:48:56,810 --> 00:48:59,249 the same kinds of stuff up there. 1283 00:48:59,920 --> 00:49:02,180 That was that physiographic 1284 00:49:02,180 --> 00:49:03,649 Gavin terrains where I had 1285 00:49:03,649 --> 00:49:06,499 one tree over here and one over here. Yeah. 1286 00:49:06,499 --> 00:49:09,484 So I am trying to, I'm trying to do that. 1287 00:49:09,484 --> 00:49:11,614 That's a good idea. I should, 1288 00:49:11,614 --> 00:49:13,820 I should actually spend more time 1289 00:49:13,820 --> 00:49:17,370 validating the data from the training sets. 1290 00:49:21,610 --> 00:49:23,509 By Rick. 1291 00:49:23,509 --> 00:49:26,285 Oh, yeah.
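The train/validate/test split discussed above can be sketched with the Python standard library alone. This is a hedged illustration rather than the speaker's actual workflow: the function name `train_val_test_split`, the 15% fractions, and the fixed seed are assumptions chosen for the example, and a purely random split like this one does not by itself address the concern about applying the model to a different kind of terrain.

```python
import random


def train_val_test_split(n_rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into three disjoint groups.

    The model is fitted on `train`, tuned against `val`, and scored
    once on `test`, which it never saw during fitting or tuning.
    (Hypothetical helper; the fractions and seed are arbitrary.)
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# Example with a small made-up table of 1,000 labelled pixels.
train_idx, val_idx, test_idx = train_val_test_split(1000)
```

To check generalization across space rather than across random rows, the same idea could be applied to whole map regions, holding out one well-mapped area entirely instead of individual pixels.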
I have a question. 1292 00:49:26,285 --> 00:49:27,244 Yeah. Oh, okay. 1293 00:49:27,244 --> 00:49:28,760 Yeah. 1294 00:49:28,760 --> 00:49:31,684 Did I read correctly in your abstract that 1295 00:49:31,684 --> 00:49:33,829 part of, part of it 1296 00:49:33,829 --> 00:49:36,319 included an ask, or raising awareness that 1297 00:49:36,319 --> 00:49:39,260 you all could use help with this project? 1298 00:49:39,260 --> 00:49:41,585 Are you trying to recruit students? 1299 00:49:41,585 --> 00:49:42,800 I would love to have 1300 00:49:42,800 --> 00:49:44,764 some students involved with it. 1301 00:49:44,764 --> 00:49:47,359 So as I look at the red, 1302 00:49:47,359 --> 00:49:49,520 right, territory here, it seems 1303 00:49:49,520 --> 00:49:51,530 like you've got a lot of work to be done. 1304 00:49:51,530 --> 00:49:54,679 How long do you think this will take to flesh 1305 00:49:54,679 --> 00:49:57,799 out the red areas 1306 00:49:57,799 --> 00:50:01,529 with higher res information? 1307 00:50:04,480 --> 00:50:06,739 What's the rate of progress, 1308 00:50:06,739 --> 00:50:07,850 I guess is what I'm saying. 1309 00:50:07,850 --> 00:50:09,559 I know. Yeah, that's the problem. 1310 00:50:09,559 --> 00:50:10,174 Right. 1311 00:50:10,174 --> 00:50:11,870 Like, so part of my motivation 1312 00:50:11,870 --> 00:50:13,564 for this too is that I've been 1313 00:50:13,564 --> 00:50:15,890 working with the geologic map of Oregon for 1314 00:50:15,890 --> 00:50:17,120 like 20 years now since 1315 00:50:17,120 --> 00:50:18,709 I've been in the department. 1316 00:50:18,709 --> 00:50:21,754 And it hasn't changed much, like 1317 00:50:21,754 --> 00:50:25,264 this level of poorly mapped stuff. 1318 00:50:25,264 --> 00:50:27,245 And I'm not supposed to say poorly mapped.
1319 00:50:27,245 --> 00:50:29,629 I'm actually, I'm supposed to say mapped at 1320 00:50:29,629 --> 00:50:32,119 low resolution because the mapping 1321 00:50:32,119 --> 00:50:34,295 that was done was high-quality mapping. 1322 00:50:34,295 --> 00:50:36,199 It's just that it was at a scale that's 1323 00:50:36,199 --> 00:50:38,510 not as good as what we need it to be now. 1324 00:50:38,510 --> 00:50:40,805 So it's not that it was poorly mapped. 1325 00:50:40,805 --> 00:50:42,650 It's just that it's mapped at low resolution. 1326 00:50:42,650 --> 00:50:44,165 But anyway, yeah, but the, the, 1327 00:50:44,165 --> 00:50:45,499 the low-resolution areas of 1328 00:50:45,499 --> 00:50:47,059 the map have not changed much. 1329 00:50:47,059 --> 00:50:49,519 And the priorities for the state, of course, 1330 00:50:49,519 --> 00:50:50,599 are the places where there's 1331 00:50:50,599 --> 00:50:52,639 high population density. 1332 00:50:52,639 --> 00:50:54,829 So I don't see us 1333 00:50:54,829 --> 00:50:56,509 getting the rest of this map done 1334 00:50:56,509 --> 00:50:58,235 using our current methods 1335 00:50:58,235 --> 00:51:00,365 for 20 years or more. 1336 00:51:00,365 --> 00:51:02,270 So I'm hoping, I'm hoping with 1337 00:51:02,270 --> 00:51:03,709 my satellite approach that maybe 1338 00:51:03,709 --> 00:51:06,329 we can get it done in like five years. 1339 00:51:06,430 --> 00:51:08,960 Well, my question is 1340 00:51:08,960 --> 00:51:10,129 a little self-serving, and I'll be 1341 00:51:10,129 --> 00:51:11,329 turning 60 years old 1342 00:51:11,329 --> 00:51:12,919 this year, in a couple of months. 1343 00:51:12,919 --> 00:51:14,179 And so I'm starting to think 1344 00:51:14,179 --> 00:51:15,499 about retirement, 1345 00:51:15,499 --> 00:51:16,804 right down the road.
1346 00:51:16,804 --> 00:51:20,389 And I think I've harbored this interest in 1347 00:51:20,389 --> 00:51:22,070 my retirement of being one of 1348 00:51:22,070 --> 00:51:24,019 those volunteer scientists that 1349 00:51:24,019 --> 00:51:25,819 goes out and does fieldwork, 1350 00:51:25,819 --> 00:51:29,434 go help with biology transects or so. 1351 00:51:29,434 --> 00:51:31,490 On Wednesday, I chaired a big meeting 1352 00:51:31,490 --> 00:51:32,509 and it was nerve-wracking, 1353 00:51:32,509 --> 00:51:33,650 and to reward myself, 1354 00:51:33,650 --> 00:51:36,425 I attended the Wednesday geology talk 1355 00:51:36,425 --> 00:51:39,199 on local fault lines, and 1356 00:51:39,199 --> 00:51:40,745 it was very good actually. 1357 00:51:40,745 --> 00:51:41,314 Yeah. 1358 00:51:41,314 --> 00:51:42,530 And she was so great. 1359 00:51:42,530 --> 00:51:44,569 It was just really a rich talk and it 1360 00:51:44,569 --> 00:51:47,569 made me think about whether 1361 00:51:47,569 --> 00:51:49,190 there wouldn't be room for 1362 00:51:49,190 --> 00:51:50,450 someone not trained in 1363 00:51:50,450 --> 00:51:51,620 geology to go help you 1364 00:51:51,620 --> 00:51:53,450 with data collection. 1365 00:51:53,450 --> 00:51:55,954 There's lots of room for that, okay? 1366 00:51:55,954 --> 00:51:58,009 Yeah. In fact, actually 1367 00:51:58,009 --> 00:52:00,169 the US Geological Survey has had 1368 00:52:00,169 --> 00:52:02,434 a thing for a while called like 1369 00:52:02,434 --> 00:52:03,829 mapping corps or geo 1370 00:52:03,829 --> 00:52:05,165 corps or something like that. 1371 00:52:05,165 --> 00:52:08,059 Where they just, like, would ask volunteers 1372 00:52:08,059 --> 00:52:11,450 to go out and collect samples or GPS 1373 00:52:11,450 --> 00:52:14,610 something in or whatever needed to be done. 1374 00:52:23,860 --> 00:52:26,915 Percy, one of your interests 1375 00:52:26,915 --> 00:52:30,184 you said was fractals.
1376 00:52:30,184 --> 00:52:35,659 What have you thought at all? 1377 00:52:35,659 --> 00:52:38,480 To what extent that these maps 1378 00:52:38,480 --> 00:52:40,474 fractal and can you, 1379 00:52:40,474 --> 00:52:43,070 can you use fractal kind of 1380 00:52:43,070 --> 00:52:45,709 metrics to talk about regions? 1381 00:52:45,709 --> 00:52:47,630 And is fractal? 1382 00:52:47,630 --> 00:52:51,889 Is the idea of fractals relevant as possible? 1383 00:52:51,889 --> 00:52:54,845 I mean, we see fractals a lot in 1384 00:52:54,845 --> 00:52:57,215 GIS in terms of measuring coast 1385 00:52:57,215 --> 00:52:58,699 near the classic example 1386 00:52:58,699 --> 00:52:59,930 of the coastline, right? 1387 00:52:59,930 --> 00:53:01,099 Right. 1388 00:53:01,099 --> 00:53:05,824 And so what we would probably, 1389 00:53:05,824 --> 00:53:07,850 we've seen this actually in 1390 00:53:07,850 --> 00:53:09,845 our work with the national land cover data 1391 00:53:09,845 --> 00:53:12,199 when we went from the 90 meter grid cell size 1392 00:53:12,199 --> 00:53:13,835 down to the 30-meter grid cell size. 1393 00:53:13,835 --> 00:53:14,900 The patterns that we were 1394 00:53:14,900 --> 00:53:16,040 picking up when we had 1395 00:53:16,040 --> 00:53:17,734 90 meter resolution were 1396 00:53:17,734 --> 00:53:19,069 the same patterns that we saw at 1397 00:53:19,069 --> 00:53:20,450 the 30 meter resolution, 1398 00:53:20,450 --> 00:53:22,190 but the model is a little bit different. 1399 00:53:22,190 --> 00:53:23,899 But it was still basically it 1400 00:53:23,899 --> 00:53:25,534 was that sort of thing. 1401 00:53:25,534 --> 00:53:29,105 So I would expect that 1402 00:53:29,105 --> 00:53:33,619 if we change the resolution of the data, 1403 00:53:33,619 --> 00:53:37,669 that we would get similar patterns. 1404 00:53:37,669 --> 00:53:38,570 Yeah. 1405 00:53:38,570 --> 00:53:41,130 But I'm not sure. 
1406 00:53:41,470 --> 00:53:43,849 I don't know if that would show up in 1407 00:53:43,849 --> 00:53:47,299 the actual band signals. 1408 00:53:47,299 --> 00:53:48,350 It's hard to see how that 1409 00:53:48,350 --> 00:53:50,164 would actually express itself. 1410 00:53:50,164 --> 00:53:52,339 I mean, I'm I'm guessing it would be there, 1411 00:53:52,339 --> 00:53:53,299 but I'm not sure how it 1412 00:53:53,299 --> 00:53:56,945 would it would show itself. 1413 00:53:56,945 --> 00:53:58,699 Just because something is 1414 00:53:58,699 --> 00:54:01,370 observed is not mean, it's predictive. 1415 00:54:01,370 --> 00:54:02,974 True. 1416 00:54:02,974 --> 00:54:06,290 And so we often see fractals, 1417 00:54:06,290 --> 00:54:07,580 but it's not clear how you'd use 1418 00:54:07,580 --> 00:54:09,960 the information for prediction. 1419 00:54:13,360 --> 00:54:15,244 Yeah. 1420 00:54:15,244 --> 00:54:17,720 And I'm I'm, I'm curious 1421 00:54:17,720 --> 00:54:20,989 about you mentioned the forest model. 1422 00:54:20,989 --> 00:54:22,220 Yeah. 1423 00:54:22,220 --> 00:54:27,035 Now, what is the mathematical transform 1424 00:54:27,035 --> 00:54:29,195 it's talking about and how is it used? 1425 00:54:29,195 --> 00:54:30,964 I haven't heard of this. 1426 00:54:30,964 --> 00:54:33,454 So random forest is 1427 00:54:33,454 --> 00:54:36,870 a collection of decision trees. 1428 00:54:37,720 --> 00:54:41,104 So you start with a decision tree. 1429 00:54:41,104 --> 00:54:43,669 You look at say, band one and you say, 1430 00:54:43,669 --> 00:54:46,699 at what point in band one does this 1431 00:54:46,699 --> 00:54:50,479 divide into unit a versus unit B? 
1432 00:54:50,479 --> 00:54:52,580 And then you do that with band two 1433 00:54:52,580 --> 00:54:54,244 and the other bands; you make 1434 00:54:54,244 --> 00:54:55,939 decision cuts on each of 1435 00:54:55,939 --> 00:54:58,569 the variables and say, 1436 00:54:58,569 --> 00:55:03,950 I could show you a decision tree. 1437 00:55:03,950 --> 00:55:06,080 Right, so there's a binary 1438 00:55:06,080 --> 00:55:07,399 choice in each one. 1439 00:55:07,399 --> 00:55:09,380 And then you combine the binary choices. 1440 00:55:09,380 --> 00:55:10,970 Yeah, basically, yeah, it's like a, 1441 00:55:10,970 --> 00:55:12,995 it's a, it's a, it's an ensemble. 1442 00:55:12,995 --> 00:55:14,990 It's an ensemble technique. 1443 00:55:14,990 --> 00:55:16,760 So each decision tree, and 1444 00:55:16,760 --> 00:55:18,109 each decision tree is actually 1445 00:55:18,109 --> 00:55:20,120 based on a subset of the data. 1446 00:55:20,120 --> 00:55:22,430 So it doesn't use the full dataset. 1447 00:55:22,430 --> 00:55:25,340 It uses maybe three columns out of the ten. 1448 00:55:25,340 --> 00:55:27,620 And instead of using all 1449 00:55:27,620 --> 00:55:28,999 144,000 rows, it only 1450 00:55:28,999 --> 00:55:30,785 uses the first 20,000 rows. 1451 00:55:30,785 --> 00:55:32,555 And so each little random forest, 1452 00:55:32,555 --> 00:55:34,670 each decision tree is based on 1453 00:55:34,670 --> 00:55:37,070 some micro set of the data. 1454 00:55:37,070 --> 00:55:38,599 And then each of 1455 00:55:38,599 --> 00:55:40,190 the little decision trees votes 1456 00:55:40,190 --> 00:55:41,480 when it sees new data. 1457 00:55:41,480 --> 00:55:44,990 And the ensemble aggregates 1458 00:55:44,990 --> 00:55:46,010 those votes and then 1459 00:55:46,010 --> 00:55:47,689 comes up with a final decision.
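The ensemble idea described above (many small trees, each built on a subset of rows and columns, voting on new data) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the model used in the talk: each "tree" here is only a one-cut decision stump, rows are drawn at random with replacement (the usual bootstrap approach, rather than "the first 20,000 rows"), and the two-band toy data and all function names are invented for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Find the single (feature, threshold) cut with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a cut must leave data on both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable cut: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each stump on a random sample of rows and of columns."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # rows, with replacement
        columns = rng.sample(range(len(rows[0])), n_features)  # column subset
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], columns))
    return forest


def predict(forest, row):
    """Every stump votes; the most common label is the final decision."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Made-up (band one, band two) values for two hypothetical units.
toy_rows = [(0.10, 0.80), (0.20, 0.70), (0.15, 0.90), (0.30, 0.75),
            (0.70, 0.20), (0.80, 0.30), (0.90, 0.10), (0.75, 0.25)]
toy_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
toy_forest = fit_forest(toy_rows, toy_labels)
```

Calling `predict(toy_forest, (0.05, 0.95))` asks all 25 stumps to vote on a new pixel. A production model would grow deeper trees and would more likely come from an established library than from a hand-rolled loop like this.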
1460 00:55:47,689 --> 00:55:51,485 Does it use, does it redo the, 1461 00:55:51,485 --> 00:55:52,940 the random development of 1462 00:55:52,940 --> 00:55:56,194 the random forest repeatedly? 1463 00:55:56,194 --> 00:55:58,385 No, you train it up 1464 00:55:58,385 --> 00:56:00,769 iteratively until it gets to the point 1465 00:56:00,769 --> 00:56:01,939 where it's a stable model. On 1466 00:56:01,939 --> 00:56:03,230 each iteration it starts 1467 00:56:03,230 --> 00:56:05,030 with a different random forest. 1468 00:56:05,030 --> 00:56:08,195 Yeah. Okay. 1469 00:56:08,195 --> 00:56:10,759 And so then you basically, 1470 00:56:10,759 --> 00:56:12,139 so you could do that 1471 00:56:12,139 --> 00:56:13,670 on a relatively small subset of 1472 00:56:13,670 --> 00:56:14,959 data and see if it would, it 1473 00:56:14,959 --> 00:56:16,670 would go to a larger subset? 1474 00:56:16,670 --> 00:56:18,330 Definitely. 1475 00:56:19,150 --> 00:56:21,349 Okay. And whether or not 1476 00:56:21,349 --> 00:56:22,879 the, whether, I'm sorry, 1477 00:56:22,879 --> 00:56:24,559 whether or not the random forest first 1478 00:56:24,559 --> 00:56:27,110 developed would extend is what I meant. 1479 00:56:27,110 --> 00:56:28,744 Yes. Okay, thanks. 1480 00:56:28,744 --> 00:56:30,125 Yeah. 1481 00:56:30,125 --> 00:56:32,720 So is, is the random forest 1482 00:56:32,720 --> 00:56:36,395 just discretizing the data? 1483 00:56:36,395 --> 00:56:43,279 Now, notice that it decides where to cut. 1484 00:56:43,279 --> 00:56:45,590 Where to cut continua. 1485 00:56:45,590 --> 00:56:48,755 Yeah, that's discretizing, 1486 00:56:48,755 --> 00:56:51,380 in effect, discretizing the data. 1487 00:56:51,380 --> 00:56:52,879 Yeah, you're right, it does actually 1488 00:56:52,879 --> 00:56:54,200 kinda, well. 1489 00:56:54,200 --> 00:56:55,550 And then could you use random, 1490 00:56:55,550 --> 00:56:58,550 could you use the random forest cuts 1491 00:56:58,550 --> 00:57:00,424 to discretize it for OCCAM?
1492 00:57:00,424 --> 00:57:04,410 Exactly, Yeah, that's a great idea. 1493 00:57:05,830 --> 00:57:08,029 Hey guys, I'm gonna go ahead 1494 00:57:08,029 --> 00:57:09,244 and turn the recording off. 1495 00:57:09,244 --> 00:57:10,219 You can hang around, 1496 00:57:10,219 --> 00:57:11,360 but this way we've got a kind of 1497 00:57:11,360 --> 00:57:13,790 a nice stopping point right here. Exactly. 1498 00:57:13,790 --> 00:57:15,480 Thanks, Wayne.