1 00:00:02,670 --> 00:00:06,414 Welcome everyone to this system science 2 00:00:06,414 --> 00:00:08,109 Friday noon seminar series. 3 00:00:08,109 --> 00:00:11,770 And today we have Guy Cutting. 4 00:00:11,770 --> 00:00:15,775 How long ago did you graduate? 5 00:00:15,775 --> 00:00:17,365 In 2018? 6 00:00:17,365 --> 00:00:19,210 Yeah. So he's been 7 00:00:19,210 --> 00:00:23,379 out doing other things 8 00:00:23,379 --> 00:00:24,519 now for several years. 9 00:00:24,519 --> 00:00:26,020 But there's still a strong connection 10 00:00:26,020 --> 00:00:28,104 to System Science. 11 00:00:28,104 --> 00:00:30,549 And we're looking forward to 12 00:00:30,549 --> 00:00:33,580 a more pragmatic kind of conversation 13 00:00:33,580 --> 00:00:35,965 about the kind of work that's becoming 14 00:00:35,965 --> 00:00:38,710 increasingly in demand, 15 00:00:38,710 --> 00:00:40,030 related to data science, 16 00:00:40,030 --> 00:00:40,659 but a little 17 00:00:40,659 --> 00:00:41,559 different than data science, 18 00:00:41,559 --> 00:00:44,020 and I'll let him explain those differences. 19 00:00:44,020 --> 00:00:46,345 Yeah, perfect. Hey everybody, thanks for 20 00:00:46,345 --> 00:00:48,400 letting me come in and for being here. 21 00:00:48,400 --> 00:00:50,514 You set it up pretty perfectly. 22 00:00:50,514 --> 00:00:51,520 My name's Guy. 23 00:00:51,520 --> 00:00:52,599 I finished my master's in 24 00:00:52,599 --> 00:00:55,554 system science at the end of 2018. 25 00:00:55,554 --> 00:00:57,580 And system science was one 26 00:00:57,580 --> 00:00:58,585 of the best choices 27 00:00:58,585 --> 00:00:59,679 that I ever made for myself. 28 00:00:59,679 --> 00:01:00,549 I was really happy with that. 
29 00:01:00,549 --> 00:01:03,070 When I did it, I was kinda data science focused, 30 00:01:03,070 --> 00:01:05,665 took a lot of AI and machine learning 31 00:01:05,665 --> 00:01:08,889 stuff in the CS department and a lot 32 00:01:08,889 --> 00:01:10,720 of the system 33 00:01:10,720 --> 00:01:12,310 science core stuff and some extra 34 00:01:12,310 --> 00:01:13,480 modeling classes. 35 00:01:13,480 --> 00:01:14,949 I was really interested in agent-based 36 00:01:14,949 --> 00:01:16,645 and some of those types of modeling. 37 00:01:16,645 --> 00:01:18,250 And I worked mostly in 38 00:01:18,250 --> 00:01:21,295 the data science area 39 00:01:21,295 --> 00:01:23,544 during and after getting my degree 40 00:01:23,544 --> 00:01:26,475 and enjoyed it and had some success with it. 41 00:01:26,475 --> 00:01:28,159 One thing I realized, 42 00:01:28,159 --> 00:01:29,540 the realization that I had coming out of that, 43 00:01:29,540 --> 00:01:31,730 was, besides that I enjoy working 44 00:01:31,730 --> 00:01:32,930 more on the backend and 45 00:01:32,930 --> 00:01:34,669 kind of building data infrastructure, 46 00:01:34,669 --> 00:01:36,140 that's where some 47 00:01:36,140 --> 00:01:37,400 of the biggest need is arising 48 00:01:37,400 --> 00:01:39,020 with a lot of companies that 49 00:01:39,020 --> 00:01:41,089 are embracing the data science thing. 50 00:01:41,089 --> 00:01:42,949 So yeah, this talk is 51 00:01:42,949 --> 00:01:44,390 much more 52 00:01:44,390 --> 00:01:46,220 pragmatically than academically focused. 53 00:01:46,220 --> 00:01:47,239 I just wanted to talk about 54 00:01:47,239 --> 00:01:48,679 my sort of experience 55 00:01:48,679 --> 00:01:49,969 and give people 56 00:01:49,969 --> 00:01:51,289 some insight into data engineering, 57 00:01:51,289 --> 00:01:53,419 which is a fast-growing field. 
58 00:01:53,419 --> 00:01:54,379 But that term 59 00:01:54,379 --> 00:01:55,400 hasn't been around as long and 60 00:01:55,400 --> 00:01:56,629 probably hasn't received as much attention 61 00:01:56,629 --> 00:01:58,174 as data science. 62 00:01:58,174 --> 00:02:02,209 So yeah, so let me give you a brief overview. 63 00:02:02,209 --> 00:02:04,200 A roadmap for this one. 64 00:02:04,450 --> 00:02:08,089 So I'll talk a little bit about why I'm here. 65 00:02:08,089 --> 00:02:09,890 A philosophical question, obviously, 66 00:02:09,890 --> 00:02:11,540 but more practically in 67 00:02:11,540 --> 00:02:12,380 terms of 68 00:02:12,380 --> 00:02:13,730 what I'd like to share with you. 69 00:02:13,730 --> 00:02:15,215 A little bit about demand for 70 00:02:15,215 --> 00:02:16,849 data engineers as a role. 71 00:02:16,849 --> 00:02:18,559 And some career statistics. 72 00:02:18,559 --> 00:02:19,985 I'll also frame this in terms 73 00:02:19,985 --> 00:02:23,180 of the other more commonly understood roles, 74 00:02:23,180 --> 00:02:25,340 data science and software development, 75 00:02:25,340 --> 00:02:26,540 then take 76 00:02:26,540 --> 00:02:27,800 a little bit of a look at the skills 77 00:02:27,800 --> 00:02:30,545 and tools that data engineers use every day. 78 00:02:30,545 --> 00:02:32,510 And a little bit about 79 00:02:32,510 --> 00:02:34,639 learning paths and some technical resources 80 00:02:34,639 --> 00:02:36,020 to acquire some of those skills because 81 00:02:36,020 --> 00:02:37,099 the data engineering stack 82 00:02:37,099 --> 00:02:39,020 is one of the bigger ones. 83 00:02:39,020 --> 00:02:40,820 If we have time, I have a little bit of 84 00:02:40,820 --> 00:02:42,619 a demo pipeline application. 85 00:02:42,619 --> 00:02:43,669 Let me see. 
86 00:02:43,669 --> 00:02:45,620 I do want to leave some time 87 00:02:45,620 --> 00:02:46,640 at the end for question and 88 00:02:46,640 --> 00:02:47,840 answer and for 89 00:02:47,840 --> 00:02:50,345 conversation for sure. So let's see. 90 00:02:50,345 --> 00:02:52,729 Slides are running and 91 00:02:52,729 --> 00:02:54,274 depending on if we have time, 92 00:02:54,274 --> 00:02:55,400 I'll do this demo. 93 00:02:55,400 --> 00:02:56,630 I think I can just do it in a few minutes, 94 00:02:56,630 --> 00:02:59,390 so okay. 95 00:02:59,390 --> 00:03:01,610 So I'm going to just put this 96 00:03:01,610 --> 00:03:02,690 up and then say a little bit 97 00:03:02,690 --> 00:03:03,829 more again about my experience. 98 00:03:03,829 --> 00:03:05,509 So I worked, 99 00:03:05,509 --> 00:03:07,689 did some work, as a data scientist. 100 00:03:07,689 --> 00:03:09,800 My degree was kinda data science 101 00:03:09,800 --> 00:03:11,000 focused when I 102 00:03:11,000 --> 00:03:14,269 was in system science and getting my master's. 103 00:03:14,269 --> 00:03:16,009 I feel like that was, starting 104 00:03:16,009 --> 00:03:18,229 in 2015 to 2018, 105 00:03:18,229 --> 00:03:19,040 I feel like that was kind 106 00:03:19,040 --> 00:03:20,239 of peak data science. 107 00:03:20,239 --> 00:03:21,680 There was a lot of attention around 108 00:03:21,680 --> 00:03:23,780 that role and a lot of 109 00:03:23,780 --> 00:03:25,280 headlines about the kind of eye-popping 110 00:03:25,280 --> 00:03:28,069 starting salaries for data scientists. 111 00:03:28,069 --> 00:03:29,810 In retrospect, I think 112 00:03:29,810 --> 00:03:31,519 a little bit of that was oversold. 113 00:03:31,519 --> 00:03:32,570 I mean, I remember seeing 114 00:03:32,570 --> 00:03:34,310 some projections of 115 00:03:34,310 --> 00:03:35,870 how many millions of data scientists would be 116 00:03:35,870 --> 00:03:38,060 needed by 2021 or 2024. 
117 00:03:38,060 --> 00:03:39,980 And while obviously a lot of companies are 118 00:03:39,980 --> 00:03:42,289 embracing data science, 119 00:03:42,289 --> 00:03:43,460 I think there were a lot of people that kind 120 00:03:43,460 --> 00:03:44,660 of piled into that and have been 121 00:03:44,660 --> 00:03:47,270 having trouble finding full-time work. 122 00:03:47,270 --> 00:03:48,080 I had a little bit of 123 00:03:48,080 --> 00:03:49,610 that experience when I was 124 00:03:49,610 --> 00:03:50,899 applying for 125 00:03:50,899 --> 00:03:52,130 full-time data science jobs. 126 00:03:52,130 --> 00:03:53,300 I mean, 127 00:03:53,300 --> 00:03:55,130 the minimum number of applicants 128 00:03:55,130 --> 00:03:56,840 for any role when I was applying 129 00:03:56,840 --> 00:03:59,149 for data science jobs was like 200. 130 00:03:59,149 --> 00:04:01,580 So it's definitely a crowded market for that. 131 00:04:01,580 --> 00:04:03,410 And with data engineers, there's 132 00:04:03,410 --> 00:04:04,250 just not as many people 133 00:04:04,250 --> 00:04:05,090 who have the skills yet. 134 00:04:05,090 --> 00:04:06,080 So the supply and demand 135 00:04:06,080 --> 00:04:07,520 relationship is kind of different. 136 00:04:07,520 --> 00:04:09,035 Okay, so let's start with, 137 00:04:09,035 --> 00:04:10,700 this is probably, you know, 138 00:04:10,700 --> 00:04:12,890 a big part of the reason why I'm here is just 139 00:04:12,890 --> 00:04:15,469 to talk about the demand for data engineers. 140 00:04:15,469 --> 00:04:17,270 And also we'll talk a little bit about 141 00:04:17,270 --> 00:04:19,099 why system science in particular, 142 00:04:19,099 --> 00:04:19,790 because I think a lot of 143 00:04:19,790 --> 00:04:21,019 system science people might be 144 00:04:21,019 --> 00:04:22,040 interested in this kind of 145 00:04:22,040 --> 00:04:23,555 role, and it might be a good fit. 
146 00:04:23,555 --> 00:04:25,384 So as you can see here, 147 00:04:25,384 --> 00:04:26,810 data engineers, and this 148 00:04:26,810 --> 00:04:30,019 data is from either 2020 or 2021, 149 00:04:30,019 --> 00:04:31,039 but this is very recent and 150 00:04:31,039 --> 00:04:32,344 this is during the pandemic. 151 00:04:32,344 --> 00:04:33,589 You can see here that data 152 00:04:33,589 --> 00:04:35,404 engineers, this is from, 153 00:04:35,404 --> 00:04:38,870 I believe, some Dice jobs report data. 154 00:04:38,870 --> 00:04:41,824 And you can see that the data engineer role 155 00:04:41,824 --> 00:04:44,705 is growing very, very quickly. 156 00:04:44,705 --> 00:04:46,069 This has changed a lot in 157 00:04:46,069 --> 00:04:47,120 the last two or three years. 158 00:04:47,120 --> 00:04:48,200 The first time I even saw the 159 00:04:48,200 --> 00:04:49,549 label data engineer as 160 00:04:49,549 --> 00:04:51,199 a job title was two or three 161 00:04:51,199 --> 00:04:52,099 years ago and I had 162 00:04:52,099 --> 00:04:53,089 a good idea of what that was, 163 00:04:53,089 --> 00:04:54,394 but not quite. 164 00:04:54,394 --> 00:04:56,434 And in those two or three years 165 00:04:56,434 --> 00:04:57,769 the growth of data engineering 166 00:04:57,769 --> 00:04:59,225 has been very rapid. 167 00:04:59,225 --> 00:05:02,239 So you can see the growth right 168 00:05:02,239 --> 00:05:03,395 now is bigger than in 169 00:05:03,395 --> 00:05:05,405 back-end development, data science, 170 00:05:05,405 --> 00:05:06,829 Python developer, even some of 171 00:05:06,829 --> 00:05:09,109 these other in-demand positions. 172 00:05:09,109 --> 00:05:11,494 Here's just some career statistics. 173 00:05:11,494 --> 00:05:13,969 So this is from 174 00:05:13,969 --> 00:05:15,469 a combination of the Burtch Works 175 00:05:15,469 --> 00:05:18,630 professional survey and the Dice jobs report. 176 00:05:18,700 --> 00:05:21,185 81% of companies 
177 00:05:21,185 --> 00:05:23,359 say they plan to hire more data engineers. 178 00:05:23,359 --> 00:05:24,665 So if you think about the 179 00:05:24,665 --> 00:05:25,790 technical environment that we 180 00:05:25,790 --> 00:05:26,284 live in, I mean, 181 00:05:26,284 --> 00:05:28,010 companies are dealing 182 00:05:28,010 --> 00:05:30,590 with ever-increasing volumes of data. 183 00:05:30,590 --> 00:05:32,089 What counted as big data 184 00:05:32,089 --> 00:05:33,139 a few years ago now has 185 00:05:33,139 --> 00:05:34,820 just become very common. 186 00:05:34,820 --> 00:05:37,819 We're well beyond terabytes and petabytes, 187 00:05:37,819 --> 00:05:39,530 and into exabyte-scale data at this point. 188 00:05:39,530 --> 00:05:41,119 So this 189 00:05:41,119 --> 00:05:42,994 is a big challenge for a lot of companies. 190 00:05:42,994 --> 00:05:44,989 A lot of companies have data and 191 00:05:44,989 --> 00:05:46,250 they're still trying to figure out the best 192 00:05:46,250 --> 00:05:48,140 way to leverage that. 193 00:05:48,140 --> 00:05:51,620 And so there's a lot of 194 00:05:51,620 --> 00:05:52,969 demand for data engineers right 195 00:05:52,969 --> 00:05:54,679 now to build 196 00:05:54,679 --> 00:05:57,500 the infrastructure for processing data 197 00:05:57,500 --> 00:05:58,519 and for delivering that to 198 00:05:58,519 --> 00:05:59,645 analysts and data scientists. 199 00:05:59,645 --> 00:06:00,800 So I'm going to get a little bit more in 200 00:06:00,800 --> 00:06:02,210 detail about those relationships 201 00:06:02,210 --> 00:06:02,689 here in a minute. 202 00:06:02,689 --> 00:06:04,549 But yeah, 203 00:06:04,549 --> 00:06:05,599 a lot of companies are 204 00:06:05,599 --> 00:06:07,760 hiring for data engineers. 205 00:06:07,760 --> 00:06:09,079 If you go on 206 00:06:09,079 --> 00:06:13,700 Indeed, there's a lot of active jobs. 207 00:06:13,700 --> 00:06:16,864 You know, it's also a lucrative field. 
208 00:06:16,864 --> 00:06:17,960 I mean, that just goes 209 00:06:17,960 --> 00:06:19,550 with supply and demand. 210 00:06:19,550 --> 00:06:20,674 You know, 211 00:06:20,674 --> 00:06:22,940 there's a lot of interest right now, 212 00:06:22,940 --> 00:06:24,574 a lot of companies hiring and of course it's 213 00:06:24,574 --> 00:06:26,150 a tighter labor market having to 214 00:06:26,150 --> 00:06:27,709 do with the pandemic and some other reasons. 215 00:06:27,709 --> 00:06:30,169 So, you know, I mean, 216 00:06:30,169 --> 00:06:31,129 I feel like workers have 217 00:06:31,129 --> 00:06:32,449 gotten some of the leverage back 218 00:06:32,449 --> 00:06:33,799 in the 219 00:06:33,799 --> 00:06:35,105 last year, in recent months. 220 00:06:35,105 --> 00:06:36,500 And combined with 221 00:06:36,500 --> 00:06:39,050 the demand for data engineers, 222 00:06:39,050 --> 00:06:40,249 it's, you know, 223 00:06:40,249 --> 00:06:41,704 it's a good time to be in the field. 224 00:06:41,704 --> 00:06:43,459 And you can see again that 50% 225 00:06:43,459 --> 00:06:45,469 growth-in-one-year number. 226 00:06:45,469 --> 00:06:48,049 Another thing that's appealing 227 00:06:48,049 --> 00:06:48,889 to a lot of people 228 00:06:48,889 --> 00:06:50,464 about the role is that 229 00:06:50,464 --> 00:06:52,340 even more than a lot of other tech roles, 230 00:06:52,340 --> 00:06:54,020 most of the DE jobs, I mean, 231 00:06:54,020 --> 00:06:55,550 almost all the jobs that I see 232 00:06:55,550 --> 00:06:57,335 posted are fully remote. 233 00:06:57,335 --> 00:06:58,669 And 234 00:06:58,669 --> 00:06:59,719 some companies say that 235 00:06:59,719 --> 00:07:00,709 once the pandemic is over, 236 00:07:00,709 --> 00:07:01,759 they're going to want people to 237 00:07:01,759 --> 00:07:02,840 transition back to the office. 
238 00:07:02,840 --> 00:07:03,980 But a lot of these jobs, 239 00:07:03,980 --> 00:07:05,899 they state up front, this is fully remote. 240 00:07:05,899 --> 00:07:07,475 We don't plan to ever 241 00:07:07,475 --> 00:07:08,600 have you come back to the office. 242 00:07:08,600 --> 00:07:10,024 Some of that is again, a reflection 243 00:07:10,024 --> 00:07:11,750 of demand for data engineers and some 244 00:07:11,750 --> 00:07:14,194 of that is the nature of the job because 245 00:07:14,194 --> 00:07:16,699 all the day-to-day skills 246 00:07:16,699 --> 00:07:17,749 that you use are things 247 00:07:17,749 --> 00:07:19,550 that can be done remotely 248 00:07:19,550 --> 00:07:22,144 through cloud tools or whatever else. 249 00:07:22,144 --> 00:07:24,440 And while you interface with 250 00:07:24,440 --> 00:07:26,420 a lot of people as a data engineer, 251 00:07:26,420 --> 00:07:27,740 there's not as much of a forward-facing, 252 00:07:27,740 --> 00:07:28,879 like customer-facing 253 00:07:28,879 --> 00:07:31,565 or public-facing, role to it. 254 00:07:31,565 --> 00:07:33,784 So it's very compatible 255 00:07:33,784 --> 00:07:35,795 with a remote setup. 256 00:07:35,795 --> 00:07:37,070 And then you can see 257 00:07:37,070 --> 00:07:38,330 the last thing on this slide here. 258 00:07:38,330 --> 00:07:39,320 As I've said, demand 259 00:07:39,320 --> 00:07:40,430 is high and supply is low, 260 00:07:40,430 --> 00:07:43,010 which translates to making it easier 261 00:07:43,010 --> 00:07:46,054 to get a job and also to get paid. 262 00:07:46,054 --> 00:07:47,824 Part of 263 00:07:47,824 --> 00:07:49,070 that difference between 264 00:07:49,070 --> 00:07:50,495 demand and supply, though, 265 00:07:50,495 --> 00:07:52,009 reflects a gap in skills. 
266 00:07:52,009 --> 00:07:52,999 And that's a big part of 267 00:07:52,999 --> 00:07:54,019 what I want to talk about today, 268 00:07:54,019 --> 00:07:55,699 which is that data engineer 269 00:07:55,699 --> 00:07:57,395 as a role has only been defined, 270 00:07:57,395 --> 00:08:00,155 kind of well-defined, more recently. 271 00:08:00,155 --> 00:08:02,330 So there's fewer people 272 00:08:02,330 --> 00:08:03,830 that have the skills, 273 00:08:03,830 --> 00:08:05,359 partly because it's just a new role 274 00:08:05,359 --> 00:08:06,440 and partly because there 275 00:08:06,440 --> 00:08:07,699 are a number of different 276 00:08:07,699 --> 00:08:09,409 skills involved with it. 277 00:08:09,409 --> 00:08:11,090 So part of what I want to talk about 278 00:08:11,090 --> 00:08:13,730 is sort of a short path to getting some of 279 00:08:13,730 --> 00:08:15,769 those skills and what the tools are 280 00:08:15,769 --> 00:08:18,725 that employers are most interested in. 281 00:08:18,725 --> 00:08:21,079 Okay, so let's start to talk about, 282 00:08:21,079 --> 00:08:22,564 I'd like to frame this in terms of 283 00:08:22,564 --> 00:08:24,380 data engineers versus other roles. 284 00:08:24,380 --> 00:08:28,189 I think a lot of system science people, 285 00:08:28,189 --> 00:08:29,330 even if you're 286 00:08:29,330 --> 00:08:30,829 not doing a lot of data science, 287 00:08:30,829 --> 00:08:32,120 I think most systems science people are 288 00:08:32,120 --> 00:08:34,459 somewhat familiar with 289 00:08:34,459 --> 00:08:37,849 data science and with working with data. 290 00:08:37,849 --> 00:08:39,020 I'd like to just talk 291 00:08:39,020 --> 00:08:40,054 about this 292 00:08:40,054 --> 00:08:41,390 in terms of an example 293 00:08:41,390 --> 00:08:42,559 that I think everybody can relate to, 294 00:08:42,559 --> 00:08:44,465 which is right here. 
295 00:08:44,465 --> 00:08:46,070 So if you think about what's going 296 00:08:46,070 --> 00:08:47,855 on when you take a ride, 297 00:08:47,855 --> 00:08:49,294 the rideshare companies are 298 00:08:49,294 --> 00:08:50,300 some of 299 00:08:50,300 --> 00:08:52,280 the biggest users of cloud infrastructure, 300 00:08:52,280 --> 00:08:53,869 and they have a lot of data. 301 00:08:53,869 --> 00:08:54,665 So if you think about what's 302 00:08:54,665 --> 00:08:57,559 going on with rideshare, 303 00:08:57,559 --> 00:08:59,480 whenever you take a ride, 304 00:08:59,480 --> 00:08:59,929 there's 305 00:08:59,929 --> 00:09:01,190 multiple different kinds of data. 306 00:09:01,190 --> 00:09:02,989 There's geographical data which is coming 307 00:09:02,989 --> 00:09:05,914 in at a high velocity and in small chunks. 308 00:09:05,914 --> 00:09:08,014 There's transaction data, 309 00:09:08,014 --> 00:09:10,234 there's rider and driver data. 310 00:09:10,234 --> 00:09:12,260 So that data is going to 311 00:09:12,260 --> 00:09:14,330 come into some data environment, 312 00:09:14,330 --> 00:09:15,680 probably from multiple sources. 313 00:09:15,680 --> 00:09:17,525 There's probably 314 00:09:17,525 --> 00:09:20,210 some standard relational SQL databases 315 00:09:20,210 --> 00:09:21,695 and some NoSQL databases. 316 00:09:21,695 --> 00:09:23,599 There might be other sources of data. 317 00:09:23,599 --> 00:09:25,220 Those things are ingested 318 00:09:25,220 --> 00:09:26,584 in some kind of raw form, 319 00:09:26,584 --> 00:09:28,939 then they're probably going to be processed 320 00:09:28,939 --> 00:09:32,119 and some other data extracted to get that 321 00:09:32,119 --> 00:09:34,550 more into a form that analysts and 322 00:09:34,550 --> 00:09:38,224 data scientists can consume on the front end. 
323 00:09:38,224 --> 00:09:39,830 And so that middle part that kind of 324 00:09:39,830 --> 00:09:42,184 connects the application development and 325 00:09:42,184 --> 00:09:43,669 the data sources and 326 00:09:43,669 --> 00:09:46,250 then builds that infrastructure to 327 00:09:46,250 --> 00:09:47,360 deliver it on the front 328 00:09:47,360 --> 00:09:48,410 end to data scientists and 329 00:09:48,410 --> 00:09:49,505 data analysts is where 330 00:09:49,505 --> 00:09:50,960 data engineers come in. 331 00:09:50,960 --> 00:09:53,360 And I think a big part of 332 00:09:53,360 --> 00:09:55,670 the emergence of the data engineer role in 333 00:09:55,670 --> 00:09:57,530 the last few years has been that 334 00:09:57,530 --> 00:10:00,485 companies have heard about data science, 335 00:10:00,485 --> 00:10:02,479 all the potential benefits and 336 00:10:02,479 --> 00:10:03,349 how important it 337 00:10:03,349 --> 00:10:04,460 is to leverage that, 338 00:10:04,460 --> 00:10:05,989 whether it's 339 00:10:05,989 --> 00:10:07,144 machine learning, strictly speaking, 340 00:10:07,144 --> 00:10:08,240 or at least having 341 00:10:08,240 --> 00:10:10,460 strong analytics in your organization. 342 00:10:10,460 --> 00:10:12,829 A lot of organizations 343 00:10:12,829 --> 00:10:16,070 have been trying to embrace 344 00:10:16,070 --> 00:10:19,025 data science and leverage their data. 345 00:10:19,025 --> 00:10:20,569 But as they've done that and they're dealing 346 00:10:20,569 --> 00:10:22,265 with these larger volumes of data, 347 00:10:22,265 --> 00:10:25,549 they're finding that there's a lot of 348 00:10:25,549 --> 00:10:28,189 data infrastructure that's needed 349 00:10:28,189 --> 00:10:29,540 to make data available for 350 00:10:29,540 --> 00:10:31,310 data scientists and data analysts. 
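A minimal sketch of the rideshare flow described above (raw ride records ingested from several sources, processed, then delivered in a form analysts can query), assuming made-up ride records and an in-memory SQLite table. The record fields, the function names (extract, transform, load, run_pipeline), and the rides table are all hypothetical illustrations and were not shown in the talk:

```python
import sqlite3

# Hypothetical raw ride events, standing in for records arriving from several sources.
RAW_EVENTS = [
    {"ride_id": "r1", "rider": "ana", "driver": "d7", "fare": "12.50", "miles": "3.1"},
    {"ride_id": "r2", "rider": "ben", "driver": "d7", "fare": "8.00", "miles": "1.9"},
    {"ride_id": "r3", "rider": "ana", "driver": "d2", "fare": None, "miles": "4.0"},  # incomplete
]


def extract():
    """Extract: read the raw records (an in-memory list stands in for real sources)."""
    return list(RAW_EVENTS)


def transform(records):
    """Transform: drop incomplete rows and convert text fields to numbers."""
    clean = []
    for rec in records:
        if rec["fare"] is None or rec["miles"] is None:
            continue  # skip records that cannot be analyzed yet
        clean.append(
            (rec["ride_id"], rec["rider"], rec["driver"],
             float(rec["fare"]), float(rec["miles"]))
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a table that analysts can query with SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rides ("
        "ride_id TEXT PRIMARY KEY, rider TEXT, driver TEXT, fare REAL, miles REAL)"
    )
    conn.executemany("INSERT INTO rides VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three stages in order: extract, transform, load."""
    load(transform(extract()), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    # The kind of question an analyst might ask of the delivered data.
    query = "SELECT driver, COUNT(*), SUM(fare) FROM rides GROUP BY driver ORDER BY driver"
    for row in conn.execute(query):
        print(row)
```

In a real deployment each stage would typically be a separate, monitored job reading from production databases or event streams; the single-process version here only shows how the stages hand data to each other.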
351 00:10:31,310 --> 00:10:33,980 My experience studying 352 00:10:33,980 --> 00:10:35,779 data science, and I think a lot 353 00:10:35,779 --> 00:10:37,980 of people had this experience, is that 354 00:10:37,980 --> 00:10:39,380 you learn to build 355 00:10:39,380 --> 00:10:41,630 a model, and maybe 356 00:10:41,630 --> 00:10:41,839 you're 357 00:10:41,839 --> 00:10:42,739 writing some Python 358 00:10:42,739 --> 00:10:43,745 in a Jupyter notebook 359 00:10:43,745 --> 00:10:47,209 and you're reading data from a CSV file. 360 00:10:47,209 --> 00:10:50,359 That's a great way to learn data science. 361 00:10:50,359 --> 00:10:51,649 But if you think about 362 00:10:51,649 --> 00:10:52,835 the issues involved with, 363 00:10:52,835 --> 00:10:53,869 let's say you identify a 364 00:10:53,869 --> 00:10:55,594 really good model that 365 00:10:55,594 --> 00:10:58,924 has great predictive accuracy. 366 00:10:58,924 --> 00:11:00,904 But then you're going to say, how do we 367 00:11:00,904 --> 00:11:02,150 scale this up to be 368 00:11:02,150 --> 00:11:03,319 able to deliver it to many, 369 00:11:03,319 --> 00:11:04,625 many more users and 370 00:11:04,625 --> 00:11:07,399 operationalize that when data is coming 371 00:11:07,399 --> 00:11:09,830 in in real time and we have to frequently 372 00:11:09,830 --> 00:11:12,889 retrain models on new data? 373 00:11:12,889 --> 00:11:14,780 That set of questions is 374 00:11:14,780 --> 00:11:16,670 very closely related to data science, 375 00:11:16,670 --> 00:11:18,784 but it's also a separate set of concerns, 376 00:11:18,784 --> 00:11:20,240 and that's again where 377 00:11:20,240 --> 00:11:22,399 data engineers come in. 378 00:11:22,399 --> 00:11:24,919 This breaks it down a little bit more 379 00:11:24,919 --> 00:11:27,770 into the different 380 00:11:27,770 --> 00:11:29,180 sorts of concerns of 381 00:11:29,180 --> 00:11:30,425 different roles. Oops, sorry. 
382 00:11:30,425 --> 00:11:34,069 So data engineer is the bottom, 383 00:11:34,069 --> 00:11:36,830 the bottom four slots on 384 00:11:36,830 --> 00:11:38,239 this pyramid are data engineering 385 00:11:38,239 --> 00:11:40,219 and data science is the top two. 386 00:11:40,219 --> 00:11:41,390 You can see the data engineering role 387 00:11:41,390 --> 00:11:43,295 includes, on the very bottom, 388 00:11:43,295 --> 00:11:45,320 instrumentation and logging, things 389 00:11:45,320 --> 00:11:47,569 that typically in the past were more 390 00:11:47,569 --> 00:11:48,619 of the domain of 391 00:11:48,619 --> 00:11:52,595 like an administrator, a systems administrator. 392 00:11:52,595 --> 00:11:54,080 And now, especially with the 393 00:11:54,080 --> 00:11:55,250 rise of cloud technology, 394 00:11:55,250 --> 00:11:56,749 those have become, 395 00:11:56,749 --> 00:11:58,249 as the technology has gotten 396 00:11:58,249 --> 00:11:59,929 more available to everyone 397 00:11:59,929 --> 00:12:01,805 and it's easier to build the tools, 398 00:12:01,805 --> 00:12:03,979 a lot of those monitoring tools are 399 00:12:03,979 --> 00:12:05,929 more easily available, and a big part 400 00:12:05,929 --> 00:12:07,249 of the data engineering job is 401 00:12:07,249 --> 00:12:10,264 monitoring the deployed data processing. 402 00:12:10,264 --> 00:12:11,779 So there's collection, 403 00:12:11,779 --> 00:12:12,964 there's move and store: 404 00:12:12,964 --> 00:12:15,019 data infrastructure, data pipelines, ETL, 405 00:12:15,019 --> 00:12:17,495 which stands for extract, transform, and load, 406 00:12:17,495 --> 00:12:19,429 choice of data processing systems, 407 00:12:19,429 --> 00:12:20,329 structured or unstructured, 408 00:12:20,329 --> 00:12:21,830 structured or unstructured. 409 00:12:21,830 --> 00:12:23,659 Then there's cleaning and anomaly 410 00:12:23,659 --> 00:12:25,790 detection, data prep, data exploration. 
411 00:12:25,790 --> 00:12:27,364 This is something that data scientists and 412 00:12:27,364 --> 00:12:28,669 data engineers both do, 413 00:12:28,669 --> 00:12:30,230 and there's a lot of crossover here. 414 00:12:30,230 --> 00:12:31,609 Sometimes the focus is a little 415 00:12:31,609 --> 00:12:33,019 bit different in the cleaning process, 416 00:12:33,019 --> 00:12:34,879 but that's another thing. 417 00:12:34,879 --> 00:12:35,689 A lot of times when you do 418 00:12:35,689 --> 00:12:37,130 a data science exercise, 419 00:12:37,130 --> 00:12:39,650 you start with relatively clean data. 420 00:12:39,650 --> 00:12:42,484 And a lot of times 421 00:12:42,484 --> 00:12:43,579 the data becomes clean 422 00:12:43,579 --> 00:12:45,169 because a data engineer has done 423 00:12:45,169 --> 00:12:46,175 a lot of prep work and 424 00:12:46,175 --> 00:12:48,515 transformation to get it to that point. 425 00:12:48,515 --> 00:12:51,019 And then the top thing that data engineers 426 00:12:51,019 --> 00:12:53,000 do is metrics, 427 00:12:53,000 --> 00:12:54,560 segmenting. This is again 428 00:12:54,560 --> 00:12:56,360 closer to what data scientists do in 429 00:12:56,360 --> 00:12:58,129 terms of putting 430 00:12:58,129 --> 00:12:59,464 data in a structure that's going to 431 00:12:59,464 --> 00:13:03,185 be easily usable and consumable by data 432 00:13:03,185 --> 00:13:05,599 science models, analytics models. 433 00:13:05,599 --> 00:13:07,760 And then you can see at the top are 434 00:13:07,760 --> 00:13:09,049 the things that we usually 435 00:13:09,049 --> 00:13:10,939 associate with data science. 436 00:13:10,939 --> 00:13:12,200 So model testing, 437 00:13:12,200 --> 00:13:14,929 experimentation, algorithms. 438 00:13:14,929 --> 00:13:17,074 And then at the top, you know, 439 00:13:17,074 --> 00:13:18,439 deep learning and then 440 00:13:18,439 --> 00:13:19,805 more sophisticated types of models. 
441 00:13:19,805 --> 00:13:20,930 So you can see from this 442 00:13:20,930 --> 00:13:22,310 that 443 00:13:22,310 --> 00:13:23,719 one of the things I enjoy about being 444 00:13:23,719 --> 00:13:25,580 a data engineer is that 445 00:13:25,580 --> 00:13:27,560 having a good data science background 446 00:13:27,560 --> 00:13:28,909 is really helpful with that, and needed. 447 00:13:28,909 --> 00:13:29,929 And you do spend a lot of 448 00:13:29,929 --> 00:13:32,420 time interfacing with data scientists and 449 00:13:32,420 --> 00:13:33,860 being able to have a conversation 450 00:13:33,860 --> 00:13:37,084 about that is very helpful for the role. 451 00:13:37,084 --> 00:13:38,450 I've just always been 452 00:13:38,450 --> 00:13:39,589 more of a back-end person. 453 00:13:39,589 --> 00:13:40,280 I like working with 454 00:13:40,280 --> 00:13:41,539 databases and building things 455 00:13:41,539 --> 00:13:42,829 so the data engineer role's 456 00:13:42,829 --> 00:13:44,150 a little bit better fit for me. 457 00:13:44,150 --> 00:13:46,340 But having 458 00:13:46,340 --> 00:13:48,005 some data science background 459 00:13:48,005 --> 00:13:50,224 has been really useful for that. 460 00:13:50,224 --> 00:13:51,679 And those conversations can 461 00:13:51,679 --> 00:13:52,909 be really fun and interesting because 462 00:13:52,909 --> 00:13:54,575 you might not necessarily be developing 463 00:13:54,575 --> 00:13:56,134 the models or doing all 464 00:13:56,134 --> 00:13:57,500 the testing, but then questions 465 00:13:57,500 --> 00:13:59,690 about model selection and then 466 00:13:59,690 --> 00:14:01,159 how to deploy those and how to scale 467 00:14:01,159 --> 00:14:04,279 those are very closely related to what 468 00:14:04,279 --> 00:14:06,050 data scientists do and those are 469 00:14:06,050 --> 00:14:07,519 interesting and challenging 470 00:14:07,519 --> 00:14:09,570 problems to work on. 
471 00:14:10,060 --> 00:14:11,629 Okay. 472 00:14:11,629 --> 00:14:13,415 This is another good way. This is 473 00:14:13,415 --> 00:14:14,599 a little bit more in terms 474 00:14:14,599 --> 00:14:17,829 of specific skills. 475 00:14:17,829 --> 00:14:19,519 I like this graphic because 476 00:14:19,519 --> 00:14:21,920 it relates software engineering, 477 00:14:21,920 --> 00:14:23,509 data engineering, and data science. 478 00:14:23,509 --> 00:14:25,969 And you can see kind of the different overlaps. 479 00:14:25,969 --> 00:14:27,499 So you can see that 480 00:14:27,499 --> 00:14:29,540 data engineers, as 481 00:14:29,540 --> 00:14:30,380 we already kind of 482 00:14:30,380 --> 00:14:31,369 discussed in the rideshare 483 00:14:31,369 --> 00:14:33,529 example, data engineers definitely 484 00:14:33,529 --> 00:14:35,990 sort of fall between these two other roles. 485 00:14:35,990 --> 00:14:37,489 Data engineering definitely 486 00:14:37,489 --> 00:14:39,620 has a strong component that 487 00:14:39,620 --> 00:14:40,279 looks a lot more 488 00:14:40,279 --> 00:14:41,510 like software engineering and 489 00:14:41,510 --> 00:14:42,619 a strong component that looks a 490 00:14:42,619 --> 00:14:44,150 lot more like data science. 491 00:14:44,150 --> 00:14:46,189 And you're definitely kind of the interface 492 00:14:46,189 --> 00:14:48,935 between those two. 493 00:14:48,935 --> 00:14:51,860 So you can see that the connection with 494 00:14:51,860 --> 00:14:53,449 software engineers obviously is more 495 00:14:53,449 --> 00:14:55,670 like the databases and administration, 496 00:14:55,670 --> 00:14:57,410 Linux, Java, JavaScript, 497 00:14:57,410 --> 00:14:59,674 programming languages in general. 498 00:14:59,674 --> 00:15:01,760 The connection with data scientists is 499 00:15:01,760 --> 00:15:04,805 more the business intelligence 500 00:15:04,805 --> 00:15:06,980 and analysis side of things. 
501 00:15:06,980 --> 00:15:10,400 Python, I would say is a crossover between, 502 00:15:10,400 --> 00:15:11,719 between both because it's a language, 503 00:15:11,719 --> 00:15:13,370 but definitely Python is 504 00:15:13,370 --> 00:15:14,599 the language of data scientists and 505 00:15:14,599 --> 00:15:16,024 data engineers. 506 00:15:16,024 --> 00:15:17,539 Big data. 507 00:15:17,539 --> 00:15:18,859 This is, I don't even 508 00:15:18,859 --> 00:15:19,939 use the term big data so much 509 00:15:19,939 --> 00:15:22,205 anymore because I feel like even, 510 00:15:22,205 --> 00:15:24,710 even companies that are dealing with, I mean, 511 00:15:24,710 --> 00:15:25,820 most companies are dealing with 512 00:15:25,820 --> 00:15:27,439 huge volumes of data now, so a lot of 513 00:15:27,439 --> 00:15:29,119 the tools like the Hadoop ecosystem 514 00:15:29,119 --> 00:15:30,319 and Spark that 515 00:15:30,319 --> 00:15:32,270 were very advanced a few years ago, 516 00:15:32,270 --> 00:15:33,769 that were not applicable to most companies, 517 00:15:33,769 --> 00:15:36,019 now, most companies have 518 00:15:36,019 --> 00:15:37,699 enough data to use those. 519 00:15:37,699 --> 00:15:40,340 So this, this I think helps kind of relate. 520 00:15:40,340 --> 00:15:42,724 I think more people are more familiar, 521 00:15:42,724 --> 00:15:44,269 a lot of people are more familiar with like 522 00:15:44,269 --> 00:15:45,080 a software development 523 00:15:45,080 --> 00:15:46,595 or software engineer role, 524 00:15:46,595 --> 00:15:47,900 particularly in system science. 525 00:15:47,900 --> 00:15:48,589 Now I think people have 526 00:15:48,589 --> 00:15:49,399 more of an understanding 527 00:15:49,399 --> 00:15:50,509 of kind of what data scientists do. 528 00:15:50,509 --> 00:15:51,980 But I like this graphic because it kind of 529 00:15:51,980 --> 00:15:54,929 helps relate all three of those things. 530 00:15:55,000 --> 00:15:56,480 Okay.
531 00:15:56,480 --> 00:15:58,400 This is some material, 532 00:15:58,400 --> 00:15:59,735 this is a little bit more, 533 00:15:59,735 --> 00:16:01,114 a little bit different way to look at like 534 00:16:01,114 --> 00:16:03,049 what DEs actually do day to day. 535 00:16:03,049 --> 00:16:04,609 This is actually from the Google 536 00:16:04,609 --> 00:16:06,860 Professional Data Engineer certification. 537 00:16:06,860 --> 00:16:07,744 And I'll talk more about 538 00:16:07,744 --> 00:16:08,930 certifications in a little while. 539 00:16:08,930 --> 00:16:11,179 We'll talk about kind of learning paths. 540 00:16:11,179 --> 00:16:12,745 And I list these, 541 00:16:12,745 --> 00:16:13,429 we're just, we're not going to 542 00:16:13,429 --> 00:16:14,209 get into detail on these. 543 00:16:14,209 --> 00:16:15,560 I list these just as a, just 544 00:16:15,560 --> 00:16:17,044 as an idea of the range, 545 00:16:17,044 --> 00:16:18,200 the wide range of things 546 00:16:18,200 --> 00:16:20,135 that data engineers are responsible for: 547 00:16:20,135 --> 00:16:22,114 selecting storage technologies, 548 00:16:22,114 --> 00:16:24,215 operationalizing storage systems, 549 00:16:24,215 --> 00:16:26,585 designing data pipelines and infrastructure, 550 00:16:26,585 --> 00:16:28,249 designing for security and compliance. 551 00:16:28,249 --> 00:16:28,850 Compliance, 552 00:16:28,850 --> 00:16:30,020 this is a big one. 553 00:16:30,020 --> 00:16:31,999 If you, if you're dealing with data, there's, 554 00:16:31,999 --> 00:16:32,930 there's almost always going 555 00:16:32,930 --> 00:16:33,409 to be some kind of 556 00:16:33,409 --> 00:16:35,284 regulatory issues and understanding that. 557 00:16:35,284 --> 00:16:36,049 That's another thing.
558 00:16:36,049 --> 00:16:36,394 Data, 559 00:16:36,394 --> 00:16:38,240 data engineers need to have some awareness of. 560 00:16:38,240 --> 00:16:40,729 Designing for reliability, 561 00:16:40,729 --> 00:16:42,619 scalability, and availability particularly. 562 00:16:42,619 --> 00:16:44,840 And this is a big part 563 00:16:44,840 --> 00:16:46,819 of the benefit of cloud tools, 564 00:16:46,819 --> 00:16:48,680 but it's, building 565 00:16:48,680 --> 00:16:50,269 robust and scalable 566 00:16:50,269 --> 00:16:51,844 applications is really important. 567 00:16:51,844 --> 00:16:54,184 Migrations and integrations is another, 568 00:16:54,184 --> 00:16:56,300 another big area for, for data 569 00:16:56,300 --> 00:16:57,800 engineers, as we interface with 570 00:16:57,800 --> 00:16:59,780 a lot of different people in an organization. 571 00:16:59,780 --> 00:17:00,649 And so you think 572 00:17:00,649 --> 00:17:02,179 about the tools that companies are 573 00:17:02,179 --> 00:17:03,199 using now, they probably 574 00:17:03,199 --> 00:17:04,790 have their data warehouse, 575 00:17:04,790 --> 00:17:05,900 but then there's probably 576 00:17:05,900 --> 00:17:07,910 customer relationship management software. 577 00:17:07,910 --> 00:17:09,050 There's, you know, there's 578 00:17:09,050 --> 00:17:10,640 multiple different applications 579 00:17:10,640 --> 00:17:11,000 that are being 580 00:17:11,000 --> 00:17:12,634 used throughout the organization 581 00:17:12,634 --> 00:17:14,719 that all need data 582 00:17:14,719 --> 00:17:17,600 from that. A lot of times you need to share data. 583 00:17:17,600 --> 00:17:19,189 And so being able to have data 584 00:17:19,189 --> 00:17:21,199 centralized and 585 00:17:21,199 --> 00:17:22,610 available to different applications, 586 00:17:22,610 --> 00:17:23,749 but also secure and 587 00:17:23,749 --> 00:17:25,054 have proper access controls, 588 00:17:25,054 --> 00:17:26,869 that's a big part of the role. 589 00:17:26,869 --> 00:17:29,179 Here.
See, here you can see some more of 590 00:17:29,179 --> 00:17:30,680 the crossover 591 00:17:30,680 --> 00:17:32,540 between DEs and data scientists. 592 00:17:32,540 --> 00:17:35,390 So deploying machine learning pipelines, 593 00:17:35,390 --> 00:17:37,009 choosing the infrastructure for 594 00:17:37,009 --> 00:17:38,029 that and then measuring, 595 00:17:38,029 --> 00:17:38,659 monitoring and 596 00:17:38,659 --> 00:17:40,069 troubleshooting machine learning models. 597 00:17:40,069 --> 00:17:41,780 These things all kind of go together. 598 00:17:41,780 --> 00:17:43,609 And as I said, data 599 00:17:43,609 --> 00:17:44,839 engineers are focused more 600 00:17:44,839 --> 00:17:48,364 on the deployment and the specific choice of, 601 00:17:48,364 --> 00:17:50,779 of, of which tools to use to do that, and 602 00:17:50,779 --> 00:17:51,889 also scaling that up to 603 00:17:51,889 --> 00:17:53,764 a larger number of users. 604 00:17:53,764 --> 00:17:55,279 So this is again 605 00:17:55,279 --> 00:17:56,239 an area where there's a lot of 606 00:17:56,239 --> 00:17:58,940 crossover between the two roles. 607 00:17:58,940 --> 00:18:00,590 And then the last thing on this list, 608 00:18:00,590 --> 00:18:01,774 leveraging pre-built models 609 00:18:01,774 --> 00:18:03,709 as a service, this is a good thing, 610 00:18:03,709 --> 00:18:05,600 something that's really important now 611 00:18:05,600 --> 00:18:07,430 in the cloud computing environment. 612 00:18:07,430 --> 00:18:08,779 Cloud computing has just become 613 00:18:08,779 --> 00:18:09,980 sort of the dominant paradigm 614 00:18:09,980 --> 00:18:10,865 in recent years.
615 00:18:10,865 --> 00:18:12,709 And this is another reason why, for DEs, 616 00:18:12,709 --> 00:18:13,129 there's a lot of 617 00:18:13,129 --> 00:18:14,269 good opportunities, because a lot of 618 00:18:14,269 --> 00:18:16,415 companies are trying to, even, 619 00:18:16,415 --> 00:18:17,944 even if they've started to 620 00:18:17,944 --> 00:18:19,700 or have mostly made the move to the cloud, 621 00:18:19,700 --> 00:18:21,200 a lot of companies are still trying to best 622 00:18:21,200 --> 00:18:25,190 understand all the available tools, 623 00:18:25,190 --> 00:18:27,710 how to make those pieces talk to each other. 624 00:18:27,710 --> 00:18:29,660 And, you know, just, just to take 625 00:18:29,660 --> 00:18:31,040 the Google Cloud example, I mean, 626 00:18:31,040 --> 00:18:37,414 they have built-in auto, 627 00:18:37,414 --> 00:18:39,064 auto machine learning, 628 00:18:39,064 --> 00:18:41,330 the different APIs, the Vision 629 00:18:41,330 --> 00:18:43,399 and Video Intelligence APIs. 630 00:18:43,399 --> 00:18:44,119 I mean, there's a lot, 631 00:18:44,119 --> 00:18:46,700 the AI and machine learning 632 00:18:46,700 --> 00:18:48,770 has come really far in the last few years. 633 00:18:48,770 --> 00:18:49,189 There's a lot of 634 00:18:49,189 --> 00:18:50,749 pre-built tools so that you don't have to 635 00:18:50,749 --> 00:18:52,009 constantly build and train 636 00:18:52,009 --> 00:18:53,615 your own models by hand. 637 00:18:53,615 --> 00:18:55,789 And so those make the data science 638 00:18:55,789 --> 00:18:57,845 more accessible, but at the same time 639 00:18:57,845 --> 00:19:00,109 it's more to know in terms of what tools are 640 00:19:00,109 --> 00:19:01,400 available out there in this kind 641 00:19:01,400 --> 00:19:03,305 of complicated tech landscape. 642 00:19:03,305 --> 00:19:05,299 So anyway, this gives you an idea of some of 643 00:19:05,299 --> 00:19:08,600 the major themes for data engineers.
644 00:19:08,600 --> 00:19:10,189 This is, I'm going 645 00:19:10,189 --> 00:19:11,209 to just go through this quickly. 646 00:19:11,209 --> 00:19:12,200 This just gives you a good idea. 647 00:19:12,200 --> 00:19:13,100 This is going to segue 648 00:19:13,100 --> 00:19:15,619 into why system science 649 00:19:15,619 --> 00:19:16,610 people in particular might 650 00:19:16,610 --> 00:19:18,169 be, might be interested in 651 00:19:18,169 --> 00:19:20,389 data engineering. I like this because it shows 652 00:19:20,389 --> 00:19:21,739 just a lot of 653 00:19:21,739 --> 00:19:23,555 the different areas that you have to be, 654 00:19:23,555 --> 00:19:25,790 that you have to be, have some, 655 00:19:25,790 --> 00:19:27,260 some proficiency in: 656 00:19:27,260 --> 00:19:29,000 some foundational tech things, 657 00:19:29,000 --> 00:19:30,469 deployment environments, 658 00:19:30,469 --> 00:19:32,675 organization-specific, meaning 659 00:19:32,675 --> 00:19:34,010 some, some knowledge, some 660 00:19:34,010 --> 00:19:36,035 domain knowledge about different businesses. 661 00:19:36,035 --> 00:19:40,849 As I said, governance and regulation, 662 00:19:40,849 --> 00:19:42,154 legal environment. 663 00:19:42,154 --> 00:19:44,780 Interpersonal is really important. 664 00:19:44,780 --> 00:19:46,910 Again, data engineers interface 665 00:19:46,910 --> 00:19:48,005 with a lot of different people. 666 00:19:48,005 --> 00:19:50,810 So being able to speak multiple languages, 667 00:19:50,810 --> 00:19:53,375 some very technical, some not so technical, to 668 00:19:53,375 --> 00:19:54,980 decision-makers and managers, 669 00:19:54,980 --> 00:19:56,840 is really, really important.
670 00:19:56,840 --> 00:19:59,330 Being creative, being collaborative, 671 00:19:59,330 --> 00:20:00,829 having some experience more on 672 00:20:00,829 --> 00:20:02,690 the development side of things, which is very 673 00:20:02,690 --> 00:20:04,039 closely related to just 674 00:20:04,039 --> 00:20:06,349 software engineering. 675 00:20:06,349 --> 00:20:08,089 And then obviously these emerging things, 676 00:20:08,089 --> 00:20:09,109 machine learning and AI. 677 00:20:09,109 --> 00:20:10,339 And we've already talked about that one, 678 00:20:10,339 --> 00:20:11,360 a little more obvious, but then 679 00:20:11,360 --> 00:20:12,679 streaming and real-time data, 680 00:20:12,679 --> 00:20:15,244 which is increasingly a huge, 681 00:20:15,244 --> 00:20:17,749 a major domain, particularly 682 00:20:17,749 --> 00:20:19,160 for data engineers. 683 00:20:19,160 --> 00:20:20,840 Because when you have 684 00:20:20,840 --> 00:20:21,920 an application like ride share, 685 00:20:21,920 --> 00:20:23,599 where you have large amounts of 686 00:20:23,599 --> 00:20:25,835 data streaming in real time, 687 00:20:25,835 --> 00:20:27,349 that implies a very, 688 00:20:27,349 --> 00:20:28,849 very different processing and 689 00:20:28,849 --> 00:20:31,324 often storage model from 690 00:20:31,324 --> 00:20:34,129 an environment where you have batch data that 691 00:20:34,129 --> 00:20:35,180 can be processed in 692 00:20:35,180 --> 00:20:36,755 batches at a certain time. 693 00:20:36,755 --> 00:20:38,209 And the tools, the tools and 694 00:20:38,209 --> 00:20:39,679 techniques are actually much 695 00:20:39,679 --> 00:20:40,849 different between those, 696 00:20:40,849 --> 00:20:42,845 and understanding 697 00:20:42,845 --> 00:20:45,230 what situation you're in 698 00:20:45,230 --> 00:20:48,785 and which tools to choose is really important.
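The batch versus streaming distinction described here can be illustrated with a minimal sketch, assuming a made-up list of ride fares as the data. The function names `batch_average` and `streaming_average` are hypothetical and do not come from any tool named in the talk; real streaming systems add windowing, fault tolerance, and distribution on top of this basic idea.

```python
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch model: the whole dataset is available before processing starts."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming model: emit an updated result after every event,
    keeping only a small running state (count and total)."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    trip_fares = [12.0, 8.0, 10.0, 14.0]  # hypothetical ride fares
    print("batch:", batch_average(trip_fares))
    print("streaming:", list(streaming_average(iter(trip_fares))))
```

The batch function needs the full list in memory and answers once; the streaming function never sees the full dataset and produces a fresh answer per event, which is why the two situations tend to call for different storage and processing tools.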
699 00:20:48,785 --> 00:20:50,224 So this gives you a kind of a good, 700 00:20:50,224 --> 00:20:51,619 a good overview of a lot of, 701 00:20:51,619 --> 00:20:52,879 a wide range of things that 702 00:20:52,879 --> 00:20:55,129 data engineers have to be able to do. 703 00:20:55,129 --> 00:20:57,815 You know, we also like to be honest about, 704 00:20:57,815 --> 00:21:00,695 about data engineering. I should also say, so, 705 00:21:00,695 --> 00:21:03,374 I work for a small data 706 00:21:03,374 --> 00:21:05,149 engineering-focused software consultancy here in 707 00:21:05,149 --> 00:21:06,410 Portland. We work actually pretty 708 00:21:06,410 --> 00:21:07,924 tightly with PSU; my office 709 00:21:07,924 --> 00:21:09,094 that I'm here at today is at 710 00:21:09,094 --> 00:21:09,830 the Portland State 711 00:21:09,830 --> 00:21:11,884 Business Accelerator building. 712 00:21:11,884 --> 00:21:15,529 So we like to, 713 00:21:15,529 --> 00:21:16,985 we like to do 714 00:21:16,985 --> 00:21:18,199 consulting, technical training. 715 00:21:18,199 --> 00:21:19,340 We like to talk to people about data 716 00:21:19,340 --> 00:21:20,599 engineering because there are still a lot of 717 00:21:20,599 --> 00:21:21,950 people that don't really have 718 00:21:21,950 --> 00:21:23,464 as much information about that role. 719 00:21:23,464 --> 00:21:25,039 And, you know, we think it's good to be 720 00:21:25,039 --> 00:21:27,170 honest about data engineering. 721 00:21:27,170 --> 00:21:28,310 I mean, it's, it's a good, 722 00:21:28,310 --> 00:21:29,510 it's a good field in a lot of ways. 723 00:21:29,510 --> 00:21:31,910 It's really interesting. It's getting 724 00:21:31,910 --> 00:21:32,854 a lot of attention right now, 725 00:21:32,854 --> 00:21:34,444 but it's not for everybody. 726 00:21:34,444 --> 00:21:36,140 You know, a big, a big 727 00:21:36,140 --> 00:21:37,460 thing with data engineering is
728 00:21:37,460 --> 00:21:38,720 it's definitely much more of a 729 00:21:38,720 --> 00:21:40,309 kind of behind-the-scenes role. 730 00:21:40,309 --> 00:21:42,154 For me personally, I like that. 731 00:21:42,154 --> 00:21:43,910 I'm not as much of a customer- 732 00:21:43,910 --> 00:21:45,860 facing or forward-facing person. I like 733 00:21:45,860 --> 00:21:47,179 sort of just putting my head down and 734 00:21:47,179 --> 00:21:49,249 doing things that are more technical. 735 00:21:49,249 --> 00:21:51,559 Silent hero: you can 736 00:21:51,559 --> 00:21:52,760 see that phrase here on the slides. 737 00:21:52,760 --> 00:21:54,019 It's, data scientists have 738 00:21:54,019 --> 00:21:55,039 gotten a lot of attention and 739 00:21:55,039 --> 00:21:57,409 that role is kind of a lot more visible. 740 00:21:57,409 --> 00:21:58,729 You see these like kind of miracle 741 00:21:58,729 --> 00:21:59,960 stories of companies getting 742 00:21:59,960 --> 00:22:01,370 this insight into their data, so 743 00:22:01,370 --> 00:22:02,810 that the data scientists, 744 00:22:02,810 --> 00:22:04,039 I think, get more of the, 745 00:22:04,039 --> 00:22:05,030 get more credit for that. 746 00:22:05,030 --> 00:22:06,080 But in a lot of ways, data 747 00:22:06,080 --> 00:22:07,250 engineers are just as important 748 00:22:07,250 --> 00:22:11,089 because we provide a lot 749 00:22:11,089 --> 00:22:12,379 of the infrastructure that makes 750 00:22:12,379 --> 00:22:14,810 that front-end analysis possible. 751 00:22:14,810 --> 00:22:16,070 But data engineering, 752 00:22:16,070 --> 00:22:17,344 it's just not something that you, 753 00:22:17,344 --> 00:22:18,560 that there's not as much attention 754 00:22:18,560 --> 00:22:20,699 on, even though there's a lot of demand. 755 00:22:20,699 --> 00:22:23,905 A good data engineer is detail-oriented. 756 00:22:23,905 --> 00:22:24,970 I mean, I think that goes without 757 00:22:24,970 --> 00:22:26,334 saying with technical work.
758 00:22:26,334 --> 00:22:29,319 But there are a lot 759 00:22:29,319 --> 00:22:31,899 of, a wide range of details, 760 00:22:31,899 --> 00:22:33,339 a lot of, a lot of different things 761 00:22:33,339 --> 00:22:34,645 that you have to be good at. 762 00:22:34,645 --> 00:22:36,069 This slide also says jack 763 00:22:36,069 --> 00:22:37,090 of all trades, master of none. 764 00:22:37,090 --> 00:22:38,379 It's, you know, 765 00:22:38,379 --> 00:22:40,449 we live in a world of specialization. 766 00:22:40,449 --> 00:22:43,945 And for data engineers, 767 00:22:43,945 --> 00:22:45,834 it's a little bit harder because there, 768 00:22:45,834 --> 00:22:46,599 there are, there are 769 00:22:46,599 --> 00:22:48,160 some tools that you have to focus on. 770 00:22:48,160 --> 00:22:49,269 But in general, it's a very, 771 00:22:49,269 --> 00:22:51,220 very wide toolset. 772 00:22:51,220 --> 00:22:55,510 So it's, it's a world of specialization. 773 00:22:55,510 --> 00:22:57,639 Generalists aren't quite as much appreciated. 774 00:22:57,639 --> 00:23:00,189 So that's, that's another potential drawback. 775 00:23:00,189 --> 00:23:03,240 There's no entry-level jobs in the center. 776 00:23:03,240 --> 00:23:05,195 You know, obviously there's entry-level, 777 00:23:05,195 --> 00:23:06,380 entry-level jobs in every 778 00:23:06,380 --> 00:23:07,549 position, it's just, I 779 00:23:07,549 --> 00:23:09,199 think the reason that we say it this way, 780 00:23:09,199 --> 00:23:11,224 it's, there's, it's not as well-defined 781 00:23:11,224 --> 00:23:12,349 how to get your start. 782 00:23:12,349 --> 00:23:13,580 You know, if you're a software developer, 783 00:23:13,580 --> 00:23:15,049 a lot of people start out like maybe in 784 00:23:15,049 --> 00:23:17,884 quality assurance or a role like that. 785 00:23:17,884 --> 00:23:19,759 You know, there's a better defined path 786 00:23:19,759 --> 00:23:21,784 to sort of climbing that ladder.
787 00:23:21,784 --> 00:23:23,269 And with data engineering, because 788 00:23:23,269 --> 00:23:24,349 it's a newer role, 789 00:23:24,349 --> 00:23:26,225 it's just, it's not as well defined yet. 790 00:23:26,225 --> 00:23:28,760 And then the last thing here is, 791 00:23:28,760 --> 00:23:31,549 I've kind of already said, is 792 00:23:31,549 --> 00:23:34,129 that the list of 793 00:23:34,129 --> 00:23:36,184 tools is long with data engineers. 794 00:23:36,184 --> 00:23:39,035 If you look at, if you look at job, 795 00:23:39,035 --> 00:23:40,940 if you look at like job postings, 796 00:23:40,940 --> 00:23:42,050 I mean, there's such a wide range 797 00:23:42,050 --> 00:23:43,355 of tools that you see. 798 00:23:43,355 --> 00:23:44,389 But I do want to 799 00:23:44,389 --> 00:23:45,350 talk a little bit about, though, 800 00:23:45,350 --> 00:23:46,790 how to simplify that and 801 00:23:46,790 --> 00:23:49,710 make it seem a little bit less complicated. 802 00:23:49,960 --> 00:23:52,625 So this is kind of, 803 00:23:52,625 --> 00:23:53,945 this is our short version 804 00:23:53,945 --> 00:23:55,459 of what we tell people. 805 00:23:55,459 --> 00:23:56,570 It's like if you're interested 806 00:23:56,570 --> 00:23:58,234 in data engineering, the, 807 00:23:58,234 --> 00:23:59,554 the kind of the five, 808 00:23:59,554 --> 00:24:02,629 the five important points of things to know. 809 00:24:02,629 --> 00:24:04,804 If you look at, what I'm going to, 810 00:24:04,804 --> 00:24:06,229 I'll pull up a job posting here in 811 00:24:06,229 --> 00:24:07,774 a minute so you can see what, 812 00:24:07,774 --> 00:24:09,440 what employers are typically looking for.
813 00:24:09,440 --> 00:24:12,649 But first is Bash or 814 00:24:12,649 --> 00:24:14,254 shell scripting, working in a command line, 815 00:24:14,254 --> 00:24:15,589 being able, being able to have 816 00:24:15,589 --> 00:24:16,940 some familiarity with how to, 817 00:24:16,940 --> 00:24:18,259 how to do command-line work. 818 00:24:18,259 --> 00:24:19,699 A lot of the cloud tools now 819 00:24:19,699 --> 00:24:21,409 obviously have, like, graphical interfaces, 820 00:24:21,409 --> 00:24:22,939 but having some, some 821 00:24:22,939 --> 00:24:24,695 command-line proficiency is really important. 822 00:24:24,695 --> 00:24:25,940 Python is definitely the 823 00:24:25,940 --> 00:24:27,169 language of working with data, 824 00:24:27,169 --> 00:24:28,264 whether it's on the data science or 825 00:24:28,264 --> 00:24:29,164 data engineering side. 826 00:24:29,164 --> 00:24:30,770 So that's a no-brainer 827 00:24:30,770 --> 00:24:33,110 there. SQL, Structured Query Language, 828 00:24:33,110 --> 00:24:34,099 which for many decades 829 00:24:34,099 --> 00:24:35,059 has been the language of 830 00:24:35,059 --> 00:24:35,960 interacting with 831 00:24:35,960 --> 00:24:37,504 relational data and databases. 832 00:24:37,504 --> 00:24:39,409 So that one, that 833 00:24:39,409 --> 00:24:40,505 one's a big one on the list. 834 00:24:40,505 --> 00:24:42,540 Second is Docker and Kubernetes. 835 00:24:42,540 --> 00:24:43,879 These are both container technologies. 836 00:24:43,879 --> 00:24:46,024 So containerization has really changed 837 00:24:46,024 --> 00:24:48,320 the landscape of software development 838 00:24:48,320 --> 00:24:49,459 the last few years. 839 00:24:49,459 --> 00:24:52,160 Containers package code 840 00:24:52,160 --> 00:24:53,430 with dependencies and 841 00:24:53,430 --> 00:24:55,149 make it a lot more portable 842 00:24:55,149 --> 00:24:56,785 across different platforms.
843 00:24:56,785 --> 00:24:58,599 So Docker has definitely 844 00:24:58,599 --> 00:24:59,815 taken over the world, and then Kubernetes, 845 00:24:59,815 --> 00:25:01,269 it's kind of the next level of 846 00:25:01,269 --> 00:25:03,040 containerization technology. Kubernetes is 847 00:25:03,040 --> 00:25:03,310 just like 848 00:25:03,310 --> 00:25:04,809 the container orchestration system 849 00:25:04,809 --> 00:25:05,260 that Google 850 00:25:05,260 --> 00:25:06,339 is heavily invested in, and 851 00:25:06,339 --> 00:25:07,704 that's a big open source project. 852 00:25:07,704 --> 00:25:09,459 Kubernetes actually started 853 00:25:09,459 --> 00:25:10,600 as a Google project. 854 00:25:10,600 --> 00:25:13,000 They had an internal project called Borg, 855 00:25:13,000 --> 00:25:14,875 in the early days, 856 00:25:14,875 --> 00:25:17,739 that was their own orchestration for things 857 00:25:17,739 --> 00:25:19,060 like Google Search and 858 00:25:19,060 --> 00:25:21,264 Maps, at these massive scales. 859 00:25:21,264 --> 00:25:22,809 I saw, one of 860 00:25:22,809 --> 00:25:24,399 the Google people talking 861 00:25:24,399 --> 00:25:25,630 one time, and he said 862 00:25:25,630 --> 00:25:27,729 just for one of their, one, 863 00:25:27,729 --> 00:25:29,619 just for a single one of their applications, 864 00:25:29,619 --> 00:25:30,729 it was like they were doing 865 00:25:30,729 --> 00:25:32,740 several billion container pulls a week 866 00:25:32,740 --> 00:25:34,600 in their deployment operations 867 00:25:34,600 --> 00:25:36,405 for that. So they're working at 868 00:25:36,405 --> 00:25:39,170 this massive scale, and Kubernetes comes 869 00:25:39,170 --> 00:25:40,399 in with that because it does 870 00:25:40,399 --> 00:25:41,645 the orchestration for that, 871 00:25:41,645 --> 00:25:44,375 that containerized delivery.
872 00:25:44,375 --> 00:25:46,729 Okay, third is Apache Spark. 873 00:25:46,729 --> 00:25:47,765 This is probably something that 874 00:25:47,765 --> 00:25:48,859 maybe some data science people 875 00:25:48,859 --> 00:25:49,969 are more familiar with. This falls 876 00:25:49,969 --> 00:25:51,500 firmly under the Hadoop ecosystem, 877 00:25:51,500 --> 00:25:54,169 if you've heard that term. Spark is for 878 00:25:54,169 --> 00:25:56,960 like data, data processing. 879 00:25:56,960 --> 00:25:58,129 And then also now there's 880 00:25:58,129 --> 00:25:59,674 a streaming component. 881 00:25:59,674 --> 00:26:01,489 Spark is 882 00:26:01,489 --> 00:26:03,200 one of the key big data tools because it lets 883 00:26:03,200 --> 00:26:05,299 you parallelize 884 00:26:05,299 --> 00:26:08,510 your data processing and get really, 885 00:26:08,510 --> 00:26:10,819 really large jobs distributed across 886 00:26:10,819 --> 00:26:12,365 multiple different compute nodes 887 00:26:12,365 --> 00:26:13,849 and get the processing 888 00:26:13,849 --> 00:26:15,200 done efficiently, and it handles a lot 889 00:26:15,200 --> 00:26:16,550 of the fault tolerance and things so 890 00:26:16,550 --> 00:26:20,795 that the developer, the engineer doesn't have to. 891 00:26:20,795 --> 00:26:23,239 Next thing on here is Kafka, Apache Kafka. 892 00:26:23,239 --> 00:26:24,259 And these last three, by the way, are 893 00:26:24,259 --> 00:26:26,540 all Apache open-source projects. 894 00:26:26,540 --> 00:26:28,729 Kafka is like used 895 00:26:28,729 --> 00:26:30,230 for event messaging and queuing.
896 00:26:30,230 --> 00:26:32,269 So in the, excuse me, 897 00:26:32,269 --> 00:26:34,039 in the cloud architecture 898 00:26:34,039 --> 00:26:35,149 that's gotten really popular, 899 00:26:35,149 --> 00:26:37,490 more of a decoupled paradigm 900 00:26:37,490 --> 00:26:38,540 for software development, 901 00:26:38,540 --> 00:26:39,199 which I'll talk a 902 00:26:39,199 --> 00:26:41,159 little bit more about in a minute, 903 00:26:41,470 --> 00:26:43,565 Kafka, and 904 00:26:43,565 --> 00:26:44,389 event messaging in 905 00:26:44,389 --> 00:26:45,259 general, is really important 906 00:26:45,259 --> 00:26:48,739 because it can connect a lot of 907 00:26:48,739 --> 00:26:50,839 different individual cloud services 908 00:26:50,839 --> 00:26:52,730 like databases and storage, 909 00:26:52,730 --> 00:26:54,230 the actual processing in something 910 00:26:54,230 --> 00:26:55,745 like Spark, and then 911 00:26:55,745 --> 00:26:57,770 going into a different form of storage like 912 00:26:57,770 --> 00:27:00,290 a data warehouse or analytical database. 913 00:27:00,290 --> 00:27:02,284 And then the last thing on the list 914 00:27:02,284 --> 00:27:04,129 that's also an Apache tool is Airflow, 915 00:27:04,129 --> 00:27:05,509 which is used for orchestration. 916 00:27:05,509 --> 00:27:06,860 So as data engineers, 917 00:27:06,860 --> 00:27:08,149 a lot of what we do is build 918 00:27:08,149 --> 00:27:10,880 data pipelines, data processing systems. 919 00:27:10,880 --> 00:27:13,669 And typically those are multi-stage, 920 00:27:13,669 --> 00:27:15,185 you know, it's like there's data ingestion, 921 00:27:15,185 --> 00:27:16,850 there's several stages of processing, 922 00:27:16,850 --> 00:27:18,470 there's data flowing between 923 00:27:18,470 --> 00:27:20,284 different points of storage, 924 00:27:20,284 --> 00:27:21,620 like different sources, 925 00:27:21,620 --> 00:27:22,850 different intermediary points, 926 00:27:22,850 --> 00:27:24,829 different target endpoints.
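A multi-stage pipeline like the one just described can be sketched as a set of stages with dependencies between them. The example below is only an illustration using Python's standard-library `graphlib` (Python 3.9+); it is not Airflow's API, and the stage names (`ingest`, `clean`, `validate`, `aggregate`, `load_warehouse`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
PIPELINE = {
    "ingest": set(),
    "clean": {"ingest"},
    "validate": {"clean"},
    "aggregate": {"clean"},
    "load_warehouse": {"validate", "aggregate"},
}


def run_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order in which every stage
    runs only after all of its dependencies have run."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for stage in run_order(PIPELINE):
        print("running", stage)
```

An orchestrator does much more than compute this order (scheduling, retries, monitoring), but the core idea of declaring which job depends on which is the same.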
927 00:27:24,829 --> 00:27:26,840 So Airflow is a great tool. 928 00:27:26,840 --> 00:27:28,100 It's Python-based. 929 00:27:28,100 --> 00:27:32,269 It uses DAGs, or directed acyclic graphs. 930 00:27:32,269 --> 00:27:33,139 So some of you have 931 00:27:33,139 --> 00:27:34,940 probably heard that term before. 932 00:27:34,940 --> 00:27:36,109 It's a 933 00:27:36,109 --> 00:27:37,789 conceptual model for 934 00:27:37,789 --> 00:27:39,560 computational structures. 935 00:27:39,560 --> 00:27:42,079 And you can define dependencies between 936 00:27:42,079 --> 00:27:43,370 different jobs and specify 937 00:27:43,370 --> 00:27:44,570 the order that you want to do things in. 938 00:27:44,570 --> 00:27:46,190 So Airflow is great for orchestrating 939 00:27:46,190 --> 00:27:48,830 these big complex data processing jobs. 940 00:27:48,830 --> 00:27:50,720 So these things, this 941 00:27:50,720 --> 00:27:52,265 is kind of a shortlist of things that, 942 00:27:52,265 --> 00:27:53,750 you know, if you, if you have 943 00:27:53,750 --> 00:27:55,340 some familiarity with these tools, 944 00:27:55,340 --> 00:27:56,569 you can't go wrong in terms 945 00:27:56,569 --> 00:27:58,370 of, in terms of data engineering 946 00:27:58,370 --> 00:28:00,365 and being able to 947 00:28:00,365 --> 00:28:02,480 get started working as a data engineer. 948 00:28:02,480 --> 00:28:04,565 And then the last point here is 949 00:28:04,565 --> 00:28:06,290 be very aware of cloud 950 00:28:06,290 --> 00:28:07,489 and learning these things on the cloud. 951 00:28:07,489 --> 00:28:10,039 So at our shop, we're Google Cloud focused. 952 00:28:10,039 --> 00:28:11,630 So it's like Google Cloud has 953 00:28:11,630 --> 00:28:13,399 really good implementations of 954 00:28:13,399 --> 00:28:14,210 all these things that are 955 00:28:14,210 --> 00:28:16,100 close to the open source.
956 00:28:16,100 --> 00:28:19,009 But all the major cloud vendors have 957 00:28:19,009 --> 00:28:20,750 really strong implementations of 958 00:28:20,750 --> 00:28:22,459 all the Apache tools. 959 00:28:22,459 --> 00:28:26,810 It's, you know, it's, it's definitely, it's, 960 00:28:26,810 --> 00:28:27,949 it also makes it easier to get 961 00:28:27,949 --> 00:28:29,345 started because if you're interested, 962 00:28:29,345 --> 00:28:31,100 say, in learning about Spark, 963 00:28:31,100 --> 00:28:32,464 so the, the Google version of that is, 964 00:28:32,464 --> 00:28:34,159 their tool is called Dataproc. 965 00:28:34,159 --> 00:28:36,875 If you go to the Google Cloud 966 00:28:36,875 --> 00:28:38,809 website, if you sign up for free, 967 00:28:38,809 --> 00:28:40,669 you can get $300 free trial credit 968 00:28:40,669 --> 00:28:41,779 and start playing around with some of 969 00:28:41,779 --> 00:28:43,760 the cloud tools 970 00:28:43,760 --> 00:28:45,050 and use their tutorials. 971 00:28:45,050 --> 00:28:46,400 It's a really good way to learn. 972 00:28:46,400 --> 00:28:47,300 And I'll talk more about 973 00:28:47,300 --> 00:28:48,784 that in a minute here. 974 00:28:48,784 --> 00:28:50,449 Okay. 975 00:28:50,449 --> 00:28:52,970 Now, why, why am I here 976 00:28:52,970 --> 00:28:53,945 in system science, 977 00:28:53,945 --> 00:28:55,955 particularly to give this talk today? 978 00:28:55,955 --> 00:28:58,250 I have a background in system science. 979 00:28:58,250 --> 00:29:00,739 As a system science person, 980 00:29:00,739 --> 00:29:02,209 I think there's a number of things, 981 00:29:02,209 --> 00:29:04,130 some characteristics that we might 982 00:29:04,130 --> 00:29:05,990 all have in common that would make 983 00:29:05,990 --> 00:29:07,100 this role of interest 984 00:29:07,100 --> 00:29:08,870 to system science people. 985 00:29:08,870 --> 00:29:10,010 One of them is 986 00:29:10,010 --> 00:29:11,479 just solving interesting problems.
987 00:29:11,479 --> 00:29:12,619 I mean, I think a lot of system 988 00:29:12,619 --> 00:29:14,060 science people in particular are 989 00:29:14,060 --> 00:29:15,290 really curious and looking 990 00:29:15,290 --> 00:29:17,450 for interesting and new ways to do things. 991 00:29:17,450 --> 00:29:19,129 And data engineering is definitely a field 992 00:29:19,129 --> 00:29:20,839 where it can be intense, 993 00:29:20,839 --> 00:29:23,495 it can be challenging, but you're always, 994 00:29:23,495 --> 00:29:25,535 you're always solving interesting problems. 995 00:29:25,535 --> 00:29:27,499 The next one is just building things, 996 00:29:27,499 --> 00:29:29,194 you know, and, and I, 997 00:29:29,194 --> 00:29:30,890 and I say cloud is fun because 998 00:29:30,890 --> 00:29:32,419 the last few years, when I started to get 999 00:29:32,419 --> 00:29:34,220 more experience with developing 1000 00:29:34,220 --> 00:29:35,060 software in the cloud 1001 00:29:35,060 --> 00:29:36,455 and with the cloud tools, 1002 00:29:36,455 --> 00:29:38,510 it can be really overwhelming at first, 1003 00:29:38,510 --> 00:29:39,559 but then when you start to 1004 00:29:39,559 --> 00:29:40,849 get some familiarity with it, 1005 00:29:40,849 --> 00:29:42,425 it's really fun because, 1006 00:29:42,425 --> 00:29:44,600 unlike many years ago 1007 00:29:44,600 --> 00:29:45,740 when I got my start in tech, 1008 00:29:45,740 --> 00:29:47,885 if you were going to deploy an application, 1009 00:29:47,885 --> 00:29:50,510 companies would buy a bunch of servers, 1010 00:29:50,510 --> 00:29:52,430 deploy these proprietary applications. 1011 00:29:52,430 --> 00:29:54,274 That process would take months and months. 1012 00:29:54,274 --> 00:29:56,090 And now it's like you can 1013 00:29:56,090 --> 00:29:58,160 log into one of the cloud vendors. 1014 00:29:58,160 --> 00:30:01,144 You can provision some, some resources.
1015 00:30:01,144 --> 00:30:03,800 And you can upload some code and build a, 1016 00:30:03,800 --> 00:30:05,630 build an application that will 1017 00:30:05,630 --> 00:30:07,880 automatically scale in a matter of minutes. 1018 00:30:07,880 --> 00:30:09,080 I mean, it's really actually, 1019 00:30:09,080 --> 00:30:11,284 it's really actually amazing, and it's fun. 1020 00:30:11,284 --> 00:30:13,400 The next thing on the list, this 1021 00:30:13,400 --> 00:30:14,480 is definitely 1022 00:30:14,480 --> 00:30:15,590 a system science topic, is 1023 00:30:15,590 --> 00:30:17,239 complexity management, right? 1024 00:30:17,239 --> 00:30:20,390 Because I think anybody who 1025 00:30:20,390 --> 00:30:22,220 has any experience in tech, 1026 00:30:22,220 --> 00:30:24,839 we probably all know at this point that, 1027 00:30:24,839 --> 00:30:26,200 you know, it's a very, 1028 00:30:26,200 --> 00:30:27,759 it's a very complicated landscape. 1029 00:30:27,759 --> 00:30:30,759 It can be, it cuts both ways because there's 1030 00:30:30,759 --> 00:30:32,229 just an amazing set of 1031 00:30:32,229 --> 00:30:33,669 tools and technologies that are 1032 00:30:33,669 --> 00:30:35,079 out there, like anything that you 1033 00:30:35,079 --> 00:30:36,565 can think about at this point, 1034 00:30:36,565 --> 00:30:38,755 basically in principle, you can build. 1035 00:30:38,755 --> 00:30:40,989 But that, that, 1036 00:30:40,989 --> 00:30:42,550 that tool landscape though 1037 00:30:42,550 --> 00:30:43,300 sometimes 1038 00:30:43,300 --> 00:30:44,440 works against you because there's, 1039 00:30:44,440 --> 00:30:45,399 there's so many things 1040 00:30:45,399 --> 00:30:46,510 competing for your attention.
1041 00:30:46,510 --> 00:30:47,649 There's so many different options 1042 00:30:47,649 --> 00:30:49,029 out there, it can, it can be very, 1043 00:30:49,029 --> 00:30:50,559 very far from obvious, 1044 00:30:50,559 --> 00:30:52,659 like what the proper choice of tools is, 1045 00:30:52,659 --> 00:30:53,409 where you should focus 1046 00:30:53,409 --> 00:30:54,909 your time and attention. 1047 00:30:54,909 --> 00:30:56,529 And that's where the complexity 1048 00:30:56,529 --> 00:30:58,539 management aspect comes in. 1049 00:30:58,539 --> 00:31:00,610 So data engineering is a role that's, 1050 00:31:00,610 --> 00:31:02,065 that's not as well-defined. 1051 00:31:02,065 --> 00:31:03,340 For people that are looking for 1052 00:31:03,340 --> 00:31:05,199 something that's like got 1053 00:31:05,199 --> 00:31:06,549 much more clear boundaries, 1054 00:31:06,549 --> 00:31:08,779 that's, you know, more of a role 1055 00:31:08,779 --> 00:31:09,859 that's been defined for a while, 1056 00:31:09,859 --> 00:31:10,789 data engineering is probably 1057 00:31:10,789 --> 00:31:11,749 not of as much interest, 1058 00:31:11,749 --> 00:31:14,359 but for somebody who's looking for something 1059 00:31:14,359 --> 00:31:15,620 that is going to be 1060 00:31:15,620 --> 00:31:17,299 maybe somewhat different day-to-day, 1061 00:31:17,299 --> 00:31:18,830 and where you're going 1062 00:31:18,830 --> 00:31:19,339 to have to think a 1063 00:31:19,339 --> 00:31:20,330 little bit more creatively, 1064 00:31:20,330 --> 00:31:23,015 data engineering is definitely a great role.
1065 00:31:23,015 --> 00:31:25,235 Okay, and the last thing on this list is, 1066 00:31:25,235 --> 00:31:26,119 as I've already kind of 1067 00:31:26,119 --> 00:31:26,930 talked about a little bit, 1068 00:31:26,930 --> 00:31:28,939 generalism and having a broad skill 1069 00:31:28,939 --> 00:31:30,500 set. We live 1070 00:31:30,500 --> 00:31:32,404 in a world of specialization and that's, 1071 00:31:32,404 --> 00:31:34,504 there are a lot of benefits to that. 1072 00:31:34,504 --> 00:31:36,499 But it's partly kind of 1073 00:31:36,499 --> 00:31:38,359 why it's hard to get an appointment 1074 00:31:38,359 --> 00:31:39,620 with a general practitioner 1075 00:31:39,620 --> 00:31:40,730 doctor at this point, because 1076 00:31:40,730 --> 00:31:41,884 everybody wants to be 1077 00:31:41,884 --> 00:31:44,014 a surgeon or some kind of specialist. 1078 00:31:44,014 --> 00:31:45,860 And it's similar with data engineers. 1079 00:31:45,860 --> 00:31:47,150 I think that it's 1080 00:31:47,150 --> 00:31:49,759 like some of the more specialized areas, like data 1081 00:31:49,759 --> 00:31:51,230 science, receive more attention, 1082 00:31:51,230 --> 00:31:53,540 but data engineering is 1083 00:31:53,540 --> 00:31:54,949 rapidly becoming one of 1084 00:31:54,949 --> 00:31:56,570 the most important roles. 1085 00:31:56,570 --> 00:31:59,074 Data engineer, that term is relatively new. 1086 00:31:59,074 --> 00:31:59,809 People have been doing 1087 00:31:59,809 --> 00:32:01,280 that kind of work for a while. 1088 00:32:01,280 --> 00:32:03,109 But it's only in recent years that 1089 00:32:03,109 --> 00:32:05,089 companies have really strongly, 1090 00:32:05,089 --> 00:32:07,580 strongly codified that as its own, 1091 00:32:07,580 --> 00:32:09,800 its own, its own role. 1092 00:32:09,800 --> 00:32:12,769 And that's part of why I'm here, to sort of 1093 00:32:12,769 --> 00:32:13,969 fill in some of the details on 1094 00:32:13,969 --> 00:32:16,399 that and what that involves.
1095 00:32:16,399 --> 00:32:20,300 Okay, so, path to data engineering, 1096 00:32:20,300 --> 00:32:22,325 the, there's, 1097 00:32:22,325 --> 00:32:23,749 the nice thing is there's an amazing 1098 00:32:23,749 --> 00:32:25,744 amount of resources out there. 1099 00:32:25,744 --> 00:32:28,880 So the first thing that I'm going to say 1100 00:32:28,880 --> 00:32:32,000 here is, if you're, 1101 00:32:32,000 --> 00:32:32,929 if data engineering is 1102 00:32:32,929 --> 00:32:34,009 something that interests you, and this is 1103 00:32:34,009 --> 00:32:35,345 just general career advice 1104 00:32:35,345 --> 00:32:36,935 independent of any role, is, 1105 00:32:36,935 --> 00:32:38,630 you know, just go, even 1106 00:32:38,630 --> 00:32:40,235 if you're not actively looking for a job, 1107 00:32:40,235 --> 00:32:41,690 but down the road, you're like trying to, 1108 00:32:41,690 --> 00:32:42,034 you know, 1109 00:32:42,034 --> 00:32:43,489 you're interested in a particular field, 1110 00:32:43,489 --> 00:32:45,920 just pay close attention to like 1111 00:32:45,920 --> 00:32:47,210 the job descriptions 1112 00:32:47,210 --> 00:32:49,009 that organizations are posting. 1113 00:32:49,009 --> 00:32:50,825 This is actually a really good way. 1114 00:32:50,825 --> 00:32:52,474 Even if you're not looking for a job, 1115 00:32:52,474 --> 00:32:53,899 you're just sort of forming 1116 00:32:53,899 --> 00:32:55,625 your own self-learning path, 1117 00:32:55,625 --> 00:32:58,549 is stay in close touch 1118 00:32:58,549 --> 00:33:00,019 with what organizations are 1119 00:33:00,019 --> 00:33:01,670 looking for, like go on Indeed, 1120 00:33:01,670 --> 00:33:03,019 go on LinkedIn, go on 1121 00:33:03,019 --> 00:33:04,850 whatever your favorite job website is. 1122 00:33:04,850 --> 00:33:05,735 Find, 1123 00:33:05,735 --> 00:33:07,265 find jobs in that area. 1124 00:33:07,265 --> 00:33:09,020 See exactly what skills 1125 00:33:09,020 --> 00:33:10,610 and responsibilities are listed.
1126 00:33:10,610 --> 00:33:13,460 Keep that in mind as you grow your skills. 1127 00:33:13,460 --> 00:33:14,855 And to step back for a minute, 1128 00:33:14,855 --> 00:33:15,709 I mean, part of the reason 1129 00:33:15,709 --> 00:33:16,639 I talk about this too 1130 00:33:16,639 --> 00:33:20,104 is that my experience 1131 00:33:20,104 --> 00:33:24,079 with graduate education at PSU 1132 00:33:24,079 --> 00:33:26,179 was that, particularly 1133 00:33:26,179 --> 00:33:27,260 on the system science side, 1134 00:33:27,260 --> 00:33:29,255 I feel like, or on the computer science side, 1135 00:33:29,255 --> 00:33:30,410 system science does get 1136 00:33:30,410 --> 00:33:32,359 more sort of practically focused. 1137 00:33:32,359 --> 00:33:35,120 My experience in system science was great, 1138 00:33:35,120 --> 00:33:36,500 but it also, particularly 1139 00:33:36,500 --> 00:33:37,639 on the like machine learning and 1140 00:33:37,639 --> 00:33:41,149 AI side, was that it was 1141 00:33:41,149 --> 00:33:42,440 a lot more theoretically 1142 00:33:42,440 --> 00:33:44,210 focused and focused on like 1143 00:33:44,210 --> 00:33:46,654 the mathematical foundations 1144 00:33:46,654 --> 00:33:48,110 of machine learning and AI, 1145 00:33:48,110 --> 00:33:49,789 which is very important 1146 00:33:49,789 --> 00:33:51,349 if you're going to actually work 1147 00:33:51,349 --> 00:33:52,730 in those areas and do more than 1148 00:33:52,730 --> 00:33:56,465 just use models that other people have built. 1149 00:33:56,465 --> 00:33:58,940 But, and there were, there was, 1150 00:33:58,940 --> 00:34:00,334 there were obviously, in my 1151 00:34:00,334 --> 00:34:01,730 CS machine learning classes,
1152 00:34:01,730 --> 00:34:03,530 there were a lot of hands-on exercises, 1153 00:34:03,530 --> 00:34:04,790 but those typically focused 1154 00:34:04,790 --> 00:34:08,525 more on like implementing 1155 00:34:08,525 --> 00:34:11,224 textbook machine learning algorithms 1156 00:34:11,224 --> 00:34:12,739 from scratch to make sure 1157 00:34:12,739 --> 00:34:13,774 that you had a really good 1158 00:34:13,774 --> 00:34:14,989 under-the-hood understanding 1159 00:34:14,989 --> 00:34:16,610 of what those algorithms were doing, 1160 00:34:16,610 --> 00:34:19,474 as opposed to using 1161 00:34:19,474 --> 00:34:21,650 the most recent Python libraries 1162 00:34:21,650 --> 00:34:23,359 or the cloud tools, 1163 00:34:23,359 --> 00:34:26,359 the, the kind of more everyday tools 1164 00:34:26,359 --> 00:34:27,679 that working machine 1165 00:34:27,679 --> 00:34:28,745 learning or data engineers, 1166 00:34:28,745 --> 00:34:30,905 data scientists were using. 1167 00:34:30,905 --> 00:34:34,309 So I would say that I 1168 00:34:34,309 --> 00:34:35,540 definitely had the experience 1169 00:34:35,540 --> 00:34:36,830 of, for as much 1170 00:34:36,830 --> 00:34:38,315 as I learned in graduate school, 1171 00:34:38,315 --> 00:34:40,160 then still realizing at some point that in 1172 00:34:40,160 --> 00:34:42,710 terms of practical skills and 1173 00:34:42,710 --> 00:34:44,060 particularly like the most 1174 00:34:44,060 --> 00:34:45,620 recent developments in 1175 00:34:45,620 --> 00:34:47,210 the tools that people are using out 1176 00:34:47,210 --> 00:34:49,580 there in industry, in the professional world, 1177 00:34:49,580 --> 00:34:52,854 there is, you know, 1178 00:34:52,854 --> 00:34:54,099 there, there were, there were, 1179 00:34:54,099 --> 00:34:55,239 there were more things that I was going to 1180 00:34:55,239 --> 00:34:56,874 have to learn on my own.
1181 00:34:56,874 --> 00:34:58,839 So that's, and that's 1182 00:34:58,839 --> 00:35:00,100 part of what I want to fill in for 1183 00:35:00,100 --> 00:35:01,749 everyone here is sort of like a good way 1184 00:35:01,749 --> 00:35:02,319 to do that because it 1185 00:35:02,319 --> 00:35:03,670 can be really overwhelming. 1186 00:35:03,670 --> 00:35:05,170 The cloud certifications, or 1187 00:35:05,170 --> 00:35:06,459 just tech certifications in 1188 00:35:06,459 --> 00:35:08,185 general, are also a really good way 1189 00:35:08,185 --> 00:35:09,775 to form your learning path. 1190 00:35:09,775 --> 00:35:11,305 All the major cloud vendors, 1191 00:35:11,305 --> 00:35:12,099 even if you're not 1192 00:35:12,099 --> 00:35:13,165 talking about data engineering 1193 00:35:13,165 --> 00:35:15,235 or something that's not as much on the cloud, 1194 00:35:15,235 --> 00:35:17,169 if you're interested in cybersecurity, 1195 00:35:17,169 --> 00:35:19,539 if you're interested in purely networks, 1196 00:35:19,539 --> 00:35:21,280 it's like there's the Cloud 1197 00:35:21,280 --> 00:35:22,840 Native Computing Foundation, which is one 1198 00:35:22,840 --> 00:35:24,249 that has other certifications 1199 00:35:24,249 --> 00:35:25,299 that are separate from 1200 00:35:25,299 --> 00:35:26,214 what I'm talking about. 1201 00:35:26,214 --> 00:35:27,444 It's like whatever, 1202 00:35:27,444 --> 00:35:28,765 whatever your interests are, 1203 00:35:28,765 --> 00:35:31,540 the certification path can 1204 00:35:31,540 --> 00:35:32,590 be, can be really helpful 1205 00:35:32,590 --> 00:35:33,669 because it helps you learn, 1206 00:35:33,669 --> 00:35:35,049 but also gives you 1207 00:35:35,049 --> 00:35:36,820 a credential that helps distinguish you 1208 00:35:36,820 --> 00:35:38,470 from a lot of 1209 00:35:38,470 --> 00:35:41,200 these other people in a crowded job market.
1210 00:35:41,200 --> 00:35:42,639 The other thing that's great 1211 00:35:42,639 --> 00:35:43,839 right now is that sort of 1212 00:35:43,839 --> 00:35:45,699 the certification process has 1213 00:35:45,699 --> 00:35:47,110 gotten a lot easier. 1214 00:35:47,110 --> 00:35:48,399 That used to involve, 1215 00:35:48,399 --> 00:35:50,079 that was more like the LSAT, where you 1216 00:35:50,079 --> 00:35:50,890 would have to schedule 1217 00:35:50,890 --> 00:35:51,999 a test like weeks ahead of 1218 00:35:51,999 --> 00:35:54,430 time and go to an on-site proctored thing. 1219 00:35:54,430 --> 00:35:55,539 And it was, it was 1220 00:35:55,539 --> 00:35:56,605 something that took a while, 1221 00:35:56,605 --> 00:35:59,124 you know, it's kind of a long, a long game. 1222 00:35:59,124 --> 00:36:00,700 And you had to think ahead a little bit. 1223 00:36:00,700 --> 00:36:04,194 Whereas now, particularly with the pandemic, 1224 00:36:04,194 --> 00:36:05,559 most, I know at least 1225 00:36:05,559 --> 00:36:07,150 with the cloud stuff, like 1226 00:36:07,150 --> 00:36:09,189 all the major cloud vendors have shifted 1227 00:36:09,189 --> 00:36:10,809 their certifications to having 1228 00:36:10,809 --> 00:36:12,355 an online proctoring option. 1229 00:36:12,355 --> 00:36:13,749 So you can usually register 1230 00:36:13,749 --> 00:36:15,219 a day or two ahead of 1231 00:36:15,219 --> 00:36:15,940 time and take 1232 00:36:15,940 --> 00:36:18,235 a certification from your house. 1233 00:36:18,235 --> 00:36:19,599 They're still proctored, but 1234 00:36:19,599 --> 00:36:20,920 it's, the whole process, 1235 00:36:20,920 --> 00:36:21,939 it really kinda makes 1236 00:36:21,939 --> 00:36:23,500 the process a lot easier.
1237 00:36:23,500 --> 00:36:26,770 And most of these certifications have 1238 00:36:26,770 --> 00:36:29,319 some very clear study guide that will walk 1239 00:36:29,319 --> 00:36:30,759 you through exactly the material 1240 00:36:30,759 --> 00:36:31,735 you need to know, 1241 00:36:31,735 --> 00:36:33,145 theoretical and practical, 1242 00:36:33,145 --> 00:36:34,090 give you sample questions. 1243 00:36:34,090 --> 00:36:36,069 So that's a great, that's a great way 1244 00:36:36,069 --> 00:36:37,629 to really sort of boost 1245 00:36:37,629 --> 00:36:39,160 your practical understanding of 1246 00:36:39,160 --> 00:36:39,969 the tools that 1247 00:36:39,969 --> 00:36:41,470 are being used in industry. 1248 00:36:41,470 --> 00:36:42,580 A few more things here real 1249 00:36:42,580 --> 00:36:43,839 quick on this, on self-learning. 1250 00:36:43,839 --> 00:36:46,899 It's like anything that you 1251 00:36:46,899 --> 00:36:48,459 do when you're writing code or building 1252 00:36:48,459 --> 00:36:50,425 projects, do it on GitHub. 1253 00:36:50,425 --> 00:36:52,075 This is something that 1254 00:36:52,075 --> 00:36:54,490 employers really look for very strongly. 1255 00:36:54,490 --> 00:36:55,479 I mean, this has become one of 1256 00:36:55,479 --> 00:36:56,829 the most prominent ways for 1257 00:36:56,829 --> 00:36:58,540 employers to vet employees 1258 00:36:58,540 --> 00:37:00,190 even before they interview them.
1259 00:37:00,190 --> 00:37:02,770 Because, you know, if, 1260 00:37:02,770 --> 00:37:04,000 if you're in a tech area 1261 00:37:04,000 --> 00:37:05,920 and it involves coding, 1262 00:37:05,920 --> 00:37:08,590 your GitHub is sort of a history 1263 00:37:08,590 --> 00:37:10,704 of what you've been doing and, you know, 1264 00:37:10,704 --> 00:37:11,859 before they even talk to you, 1265 00:37:11,859 --> 00:37:13,389 employers can go and see if 1266 00:37:13,389 --> 00:37:14,050 you've actually been 1267 00:37:14,050 --> 00:37:15,130 doing something interesting, 1268 00:37:15,130 --> 00:37:16,870 and if you, if you've been 1269 00:37:16,870 --> 00:37:17,830 writing code and what 1270 00:37:17,830 --> 00:37:18,939 kind of things you've been working on. 1271 00:37:18,939 --> 00:37:20,709 And if they don't see much there, 1272 00:37:20,709 --> 00:37:22,374 then, you know, they, 1273 00:37:22,374 --> 00:37:23,350 they might go and 1274 00:37:23,350 --> 00:37:24,370 look for somebody else who has 1275 00:37:24,370 --> 00:37:25,750 a better GitHub portfolio, 1276 00:37:25,750 --> 00:37:27,700 so that one's big. 1277 00:37:27,700 --> 00:37:29,425 There's a lot of resources. I think 1278 00:37:29,425 --> 00:37:31,344 probably most people here know about Kaggle. 1279 00:37:31,344 --> 00:37:32,950 Kaggle is more data science oriented, 1280 00:37:32,950 --> 00:37:34,150 but there's a lot of great data sets, 1281 00:37:34,150 --> 00:37:35,920 interactive tutorials, data hub, 1282 00:37:35,920 --> 00:37:38,480 bio, those sorts of things. 1283 00:37:39,080 --> 00:37:41,160 Free online resources. 1284 00:37:41,160 --> 00:37:42,570 YouTube is actually a great place for 1285 00:37:42,570 --> 00:37:45,909 learning coding and tech skills. 1286 00:37:46,700 --> 00:37:50,910 Just a quick Google search will usually 1287 00:37:50,910 --> 00:37:52,499 give you what you need to know if you're 1288 00:37:52,499 --> 00:37:54,570 looking for some tutorials on learn.
1289 00:37:54,570 --> 00:37:56,249 Also, there's some, some paid 1290 00:37:56,249 --> 00:37:57,540 online programs 1291 00:37:57,540 --> 00:37:59,520 like Udemy, DataCamp, Coursera. 1292 00:37:59,520 --> 00:38:00,269 I think we're probably all 1293 00:38:00,269 --> 00:38:01,934 familiar with those at this point. 1294 00:38:01,934 --> 00:38:03,570 Let me show you real quick, 1295 00:38:03,570 --> 00:38:10,740 this is just to take Google as an example, 1296 00:38:10,740 --> 00:38:12,359 but particularly with 1297 00:38:12,359 --> 00:38:14,250 clouds, with cloud stuff, 1298 00:38:14,250 --> 00:38:15,539 one of the great things about, 1299 00:38:15,539 --> 00:38:16,829 about it is that it makes it 1300 00:38:16,829 --> 00:38:19,120 really easy to learn. 1301 00:38:19,160 --> 00:38:23,725 And let's see. 1302 00:38:23,725 --> 00:38:27,534 So if you go, if you go, 1303 00:38:27,534 --> 00:38:29,230 particularly if you go on 1304 00:38:29,230 --> 00:38:30,669 the Google Cloud Console, or any of 1305 00:38:30,669 --> 00:38:33,370 the major vendors are like this. 1306 00:38:33,370 --> 00:38:34,059 Right? 1307 00:38:34,059 --> 00:38:37,300 I'm looking for, yeah, 1308 00:38:37,300 --> 00:38:39,099 so Google has this great thing, and most 1309 00:38:39,099 --> 00:38:40,900 of the cloud vendors do, is like any, 1310 00:38:40,900 --> 00:38:42,309 any of the cloud services, 1311 00:38:42,309 --> 00:38:44,470 you can sign up and start a free trial. 1312 00:38:44,470 --> 00:38:46,435 There's these great built-in tutorials 1313 00:38:46,435 --> 00:38:49,210 where, like in the Cloud Console, 1314 00:38:49,210 --> 00:38:51,175 the tools that you're going to run, 1315 00:38:51,175 --> 00:38:54,250 it has these tutorials 1316 00:38:54,250 --> 00:38:58,284 that literally will walk you through. 1317 00:38:58,284 --> 00:38:59,799 You choose what service you want to 1318 00:38:59,799 --> 00:39:05,019 start and you tutorial.
1319 00:39:05,019 --> 00:39:06,669 And it's like you can walk through in 1320 00:39:06,669 --> 00:39:08,590 the actual Cloud Console 1321 00:39:08,590 --> 00:39:10,060 and go deploy 1322 00:39:10,060 --> 00:39:11,080 resources and build an 1323 00:39:11,080 --> 00:39:12,490 application or its units. 1324 00:39:12,490 --> 00:39:14,410 So this is a great way, I mean, 1325 00:39:14,410 --> 00:39:15,429 this is probably one of the 1326 00:39:15,429 --> 00:39:17,410 most helpful things that, 1327 00:39:17,410 --> 00:39:18,849 that I did when I 1328 00:39:18,849 --> 00:39:21,039 was learning a lot of this cloud stuff, 1329 00:39:21,039 --> 00:39:23,275 boosting my skills, because it's just, 1330 00:39:23,275 --> 00:39:25,600 most of them take 10 minutes or five minutes. 1331 00:39:25,600 --> 00:39:26,740 So it's something you can, you can 1332 00:39:26,740 --> 00:39:28,480 squeeze in between other things. 1333 00:39:28,480 --> 00:39:30,504 You don't have to have two hours to do it. 1334 00:39:30,504 --> 00:39:32,530 So this is another great way, 1335 00:39:32,530 --> 00:39:33,820 just particularly for like 1336 00:39:33,820 --> 00:39:35,740 technically focused education and 1337 00:39:35,740 --> 00:39:37,780 getting, practically getting that familiarity. 1338 00:39:37,780 --> 00:39:40,105 Because I know with cloud in particular, 1339 00:39:40,105 --> 00:39:43,209 when I was getting my masters and starting 1340 00:39:43,209 --> 00:39:44,620 to become more familiar with 1341 00:39:44,620 --> 00:39:46,509 the cloud platforms and the cloud tools, 1342 00:39:46,509 --> 00:39:48,429 I mean, it was really overwhelming. 1343 00:39:48,429 --> 00:39:49,749 You log in and there's 1344 00:39:49,749 --> 00:39:51,190 like all these services, 1345 00:39:51,190 --> 00:39:52,824 you know, it's like, oh, okay.
1346 00:39:52,824 --> 00:39:55,120 Like administration, 1347 00:39:55,120 --> 00:39:56,590 you know, different compute options, 1348 00:39:56,590 --> 00:39:58,164 different storage options, 1349 00:39:58,164 --> 00:40:00,204 multiple databases, 1350 00:40:00,204 --> 00:40:03,939 network layer, different operations stuff. 1351 00:40:03,939 --> 00:40:07,180 You know, it's that continuous integration, 1352 00:40:07,180 --> 00:40:09,249 continuous development tools, 1353 00:40:09,249 --> 00:40:11,740 then all these big data tools, 1354 00:40:11,740 --> 00:40:13,674 multiple different processing things. 1355 00:40:13,674 --> 00:40:15,955 So it's super overwhelming. 1356 00:40:15,955 --> 00:40:19,059 And so when you first get started, I mean, 1357 00:40:19,059 --> 00:40:20,680 it can be really hard to know, 1358 00:40:20,680 --> 00:40:23,380 like, where to put your time and attention. 1359 00:40:23,380 --> 00:40:25,119 And the built-in tutorials 1360 00:40:25,119 --> 00:40:26,350 and that kind of stuff are great because 1361 00:40:26,350 --> 00:40:27,610 they'll help you just really get up 1362 00:40:27,610 --> 00:40:28,930 to speed and start to get kind of 1363 00:40:28,930 --> 00:40:30,369 a mental map of what 1364 00:40:30,369 --> 00:40:33,325 the different tools are, what you're doing. 1365 00:40:33,325 --> 00:40:35,260 I would say one last thing 1366 00:40:35,260 --> 00:40:37,645 about the certification path 1367 00:40:37,645 --> 00:40:41,200 that's really good is most of 1368 00:40:41,200 --> 00:40:42,909 the, most of 1369 00:40:42,909 --> 00:40:44,680 the, or all the cloud vendors 1370 00:40:44,680 --> 00:40:46,240 offer certifications 1371 00:40:46,240 --> 00:40:46,900 at different levels. 1372 00:40:46,900 --> 00:40:48,249 So there's usually some kind of 1373 00:40:48,249 --> 00:40:49,989 like a cloud engineering.
1374 00:40:49,989 --> 00:40:52,570 Or sort of a starting 1375 00:40:52,570 --> 00:40:55,224 level certification, and those are still very, 1376 00:40:55,224 --> 00:40:56,995 very useful, very in demand. 1377 00:40:56,995 --> 00:40:58,390 So it's like starting with an associate 1378 00:40:58,390 --> 00:40:59,980 level certification can be a really good way 1379 00:40:59,980 --> 00:41:03,009 to practically get some hands-on skills with, 1380 00:41:03,009 --> 00:41:04,389 with cloud or some of 1381 00:41:04,389 --> 00:41:05,740 these other modern tools 1382 00:41:05,740 --> 00:41:08,229 and distinguish yourself from the pack. 1383 00:41:08,229 --> 00:41:09,490 This other part of 1384 00:41:09,490 --> 00:41:10,240 the slide, I'm just 1385 00:41:10,240 --> 00:41:10,990 going to step through. 1386 00:41:10,990 --> 00:41:12,519 We also are very focused at 1387 00:41:12,519 --> 00:41:13,960 my company on technical education 1388 00:41:13,960 --> 00:41:15,100 and we're getting ready to roll 1389 00:41:15,100 --> 00:41:17,514 out a data engineering focused bootcamp. 1390 00:41:17,514 --> 00:41:18,369 I'm not going to get too 1391 00:41:18,369 --> 00:41:19,149 much into the details 1392 00:41:19,149 --> 00:41:19,809 of this because I'll just 1393 00:41:19,809 --> 00:41:20,589 give the link at the end. 1394 00:41:20,589 --> 00:41:22,389 But that's an option for a lot of people. 1395 00:41:22,389 --> 00:41:23,799 I think with system science, people 1396 00:41:23,799 --> 00:41:25,930 are probably better on 1397 00:41:25,930 --> 00:41:26,889 that self-learning side of 1398 00:41:26,889 --> 00:41:27,549 things because you already 1399 00:41:27,549 --> 00:41:30,984 have some technical background. 1400 00:41:30,984 --> 00:41:33,505 So let's, let me pause for a minute. 1401 00:41:33,505 --> 00:41:34,825 I do have this demo, 1402 00:41:34,825 --> 00:41:36,595 but it's, it's quarter till.
1403 00:41:36,595 --> 00:41:38,109 So I want to pause for 1404 00:41:38,109 --> 00:41:39,880 a minute and see just if 1405 00:41:39,880 --> 00:41:41,409 there's questions because I'd love to 1406 00:41:41,409 --> 00:41:43,914 answer any questions or have a conversation. 1407 00:41:43,914 --> 00:41:45,490 So let, let me, let me take 1408 00:41:45,490 --> 00:41:46,780 a break just for a minute and see 1409 00:41:46,780 --> 00:41:49,269 if anybody, anybody has questions. 1410 00:41:49,269 --> 00:41:51,279 And if you do, I'll be glad to answer them. 1411 00:41:51,279 --> 00:41:52,990 And then if we have any time 1412 00:41:52,990 --> 00:41:55,015 left, I'll do this demo. 1413 00:41:55,015 --> 00:41:56,650 Well, I have a question. For 1414 00:41:56,650 --> 00:41:59,034 people who like working in groups, 1415 00:41:59,034 --> 00:42:00,640 can you get started 1416 00:42:00,640 --> 00:42:02,679 with other people at your level of 1417 00:42:02,679 --> 00:42:04,930 ignorance and kind of help each other, 1418 00:42:04,930 --> 00:42:06,040 as opposed to doing this 1419 00:42:06,040 --> 00:42:08,184 kind of all by yourself? 1420 00:42:08,184 --> 00:42:10,209 That's the question. Yeah. 1421 00:42:10,209 --> 00:42:11,739 Oh, no way. And that's, that's, that's great. 1422 00:42:11,739 --> 00:42:15,415 So this, this speaks to learning paths. 1423 00:42:15,415 --> 00:42:16,990 Let me go back to this other 1424 00:42:16,990 --> 00:42:19,014 slide here, just 1 second.
1425 00:42:19,014 --> 00:42:21,040 Yeah, I mean, this is definitely 1426 00:42:21,040 --> 00:42:23,259 like one of the best, you know, 1427 00:42:23,259 --> 00:42:24,399 one of the best things 1428 00:42:24,399 --> 00:42:25,510 that I would add to this 1429 00:42:25,510 --> 00:42:28,659 is, one of the best things, 1430 00:42:28,659 --> 00:42:29,799 and we put 1431 00:42:29,799 --> 00:42:30,910 this on the bootcamp side of the slide, 1432 00:42:30,910 --> 00:42:32,544 but this is true in general, is like coding with 1433 00:42:32,544 --> 00:42:35,260 others or just developing with others. 1434 00:42:35,260 --> 00:42:37,480 So there's a lot of good resources for that. 1435 00:42:37,480 --> 00:42:38,485 I don't have those 1436 00:42:38,485 --> 00:42:39,549 particularly on the slide, 1437 00:42:39,549 --> 00:42:41,199 but I'll, I'll put these in the notes 1438 00:42:41,199 --> 00:42:42,880 and then I'll just speak to them right now, 1439 00:42:42,880 --> 00:42:45,339 which is that like there are a lot of 1440 00:42:45,339 --> 00:42:48,640 good, a lot of good online resources. 1441 00:42:48,640 --> 00:42:49,750 There's, there's some different, 1442 00:42:49,750 --> 00:42:53,335 there's things like Replit that is, 1443 00:42:53,335 --> 00:42:55,329 here, I'll open some of these while I'm 1444 00:42:55,329 --> 00:42:56,439 talking about it. There's things like 1445 00:42:56,439 --> 00:43:00,519 Replit that are like a browser, 1446 00:43:00,519 --> 00:43:03,129 browser-based coding that let you 1447 00:43:03,129 --> 00:43:07,615 do like, that also let you do collaboration. 1448 00:43:07,615 --> 00:43:09,699 There's things like, I think 1449 00:43:09,699 --> 00:43:14,599 it's called Glitch. 1450 00:43:14,910 --> 00:43:21,219 Which lets you do. Yeah, This 1451 00:43:21,219 --> 00:43:22,510 is, yeah, there's calculus.
1452 00:43:22,510 --> 00:43:24,025 So this, this is more, 1453 00:43:24,025 --> 00:43:25,719 this is, this is Glitch, it's 1454 00:43:25,719 --> 00:43:29,349 a platform that has a lot of interactive and 1455 00:43:29,349 --> 00:43:32,649 like collaborative tutorials, and 1456 00:43:32,649 --> 00:43:34,629 you can work with people. 1457 00:43:34,629 --> 00:43:37,509 And so there's definitely 1458 00:43:37,509 --> 00:43:41,049 like easy-to-use tools specifically. 1459 00:43:41,049 --> 00:43:42,699 And then as far as kind of like the 1460 00:43:42,699 --> 00:43:44,559 organizational and the social part of it, 1461 00:43:44,559 --> 00:43:46,120 it's like even though 1462 00:43:46,120 --> 00:43:47,994 the pandemic has scrambled a lot of things, 1463 00:43:47,994 --> 00:43:49,899 the fact that everybody has really 1464 00:43:49,899 --> 00:43:52,854 embraced sort of virtual meetings 1465 00:43:52,854 --> 00:43:54,130 kind of makes it easier. 1466 00:43:54,130 --> 00:43:56,424 It's like, you know, 1467 00:43:56,424 --> 00:43:58,525 if you, definitely, like, 1468 00:43:58,525 --> 00:43:59,830 the thing is it's, it can be 1469 00:43:59,830 --> 00:44:00,939 overwhelming when you're, when 1470 00:44:00,939 --> 00:44:02,109 you're trying to break into tech, 1471 00:44:02,109 --> 00:44:03,219 because a lot of times it 1472 00:44:03,219 --> 00:44:04,720 feels sort of lonely, but definitely 1473 00:44:04,720 --> 00:44:06,039 take advantage or make use 1474 00:44:06,039 --> 00:44:07,420 of your, you know, your, 1475 00:44:07,420 --> 00:44:10,150 your social resources. It's like find, 1476 00:44:10,150 --> 00:44:10,959 find those people in 1477 00:44:10,959 --> 00:44:12,249 your classes that are interested in 1478 00:44:12,249 --> 00:44:13,629 the same thing and find a project 1479 00:44:13,629 --> 00:44:15,025 to work on together.
1480 00:44:15,025 --> 00:44:15,940 You know, that's something that 1481 00:44:15,940 --> 00:44:16,809 a lot of times happens 1482 00:44:16,809 --> 00:44:19,210 in system science or computer science, I mean, 1483 00:44:19,210 --> 00:44:20,890 I did a number of kind of projects that 1484 00:44:20,890 --> 00:44:24,340 were group projects and that's really useful. 1485 00:44:24,340 --> 00:44:25,540 Sometimes, sometimes you have to 1486 00:44:25,540 --> 00:44:26,859 organize that a little bit better yourself. 1487 00:44:26,859 --> 00:44:28,150 But then with a lot 1488 00:44:28,150 --> 00:44:29,290 of these online platforms, that 1489 00:44:29,290 --> 00:44:30,700 makes it even easier, because even if you 1490 00:44:30,700 --> 00:44:31,360 don't know somebody 1491 00:44:31,360 --> 00:44:32,170 in your same department, 1492 00:44:32,170 --> 00:44:33,369 you can go find somebody 1493 00:44:33,369 --> 00:44:34,749 that wants to team up 1494 00:44:34,749 --> 00:44:37,824 on, on a, on a Glitch project or whatever. 1495 00:44:37,824 --> 00:44:39,970 So yes, it's doing it that way. 1496 00:44:39,970 --> 00:44:41,080 And the other advantage of 1497 00:44:41,080 --> 00:44:42,639 that is that simulates 1498 00:44:42,639 --> 00:44:44,049 the experience that you're 1499 00:44:44,049 --> 00:44:44,920 going to have when you 1500 00:44:44,920 --> 00:44:45,759 get out there and start 1501 00:44:45,759 --> 00:44:47,154 working in the real world, 1502 00:44:47,154 --> 00:44:48,729 you know, like when you're working 1503 00:44:48,729 --> 00:44:50,259 for a company and particularly a tech company, 1504 00:44:50,259 --> 00:44:51,459 you're probably going to be working with 1505 00:44:51,459 --> 00:44:52,270 a team that's trying 1506 00:44:52,270 --> 00:44:53,649 to solve problems together.
1507 00:44:53,649 --> 00:44:55,449 So having some of that experience, 1508 00:44:55,449 --> 00:44:57,459 that collaborative experience before you go 1509 00:44:57,459 --> 00:45:00,470 into that environment is invaluable. 1510 00:45:05,810 --> 00:45:09,029 Other questions please, guys, 1511 00:45:09,029 --> 00:45:10,169 inviting us to chime in. 1512 00:45:10,169 --> 00:45:12,669 If we add thoughts or questions. 1513 00:45:14,450 --> 00:45:17,294 I have a question regarding, 1514 00:45:17,294 --> 00:45:18,435 I guess likes 1515 00:45:18,435 --> 00:45:23,340 skill set and pathway landscape, I suppose. 1516 00:45:23,340 --> 00:45:25,560 Like, are you seeing that 1517 00:45:25,560 --> 00:45:28,260 there's a lot of diversification 1518 00:45:28,260 --> 00:45:30,269 and specialization in 1519 00:45:30,269 --> 00:45:33,945 the code development data environment. 1520 00:45:33,945 --> 00:45:35,459 Like are you seeing a lot of people 1521 00:45:35,459 --> 00:45:38,550 who were Web developers transition to? 1522 00:45:38,550 --> 00:45:41,310 I had an engineering for are you seeing like 1523 00:45:41,310 --> 00:45:43,329 tech biologists transferring 1524 00:45:43,329 --> 00:45:45,220 directly to data engineering? 1525 00:45:45,220 --> 00:45:48,039 Is it's a, it's a mix. 1526 00:45:48,039 --> 00:45:50,380 I would say that there are, 1527 00:45:50,380 --> 00:45:52,435 there are a lot of people. 1528 00:45:52,435 --> 00:45:53,350 It cuts both ways 1529 00:45:53,350 --> 00:45:54,924 with data engineering because, 1530 00:45:54,924 --> 00:45:56,799 because it's a new role and 1531 00:45:56,799 --> 00:45:58,690 a lot of people are finding out about it. 
1532 00:45:58,690 --> 00:46:00,729 There definitely are people 1533 00:46:00,729 --> 00:46:02,439 like there's a lot of 1534 00:46:02,439 --> 00:46:04,750 people that are coming from other fields, 1535 00:46:04,750 --> 00:46:06,759 like either either data scientists, 1536 00:46:06,759 --> 00:46:08,079 people like me who've been doing 1537 00:46:08,079 --> 00:46:09,955 the data science thing but have 1538 00:46:09,955 --> 00:46:11,710 interest in like a different part of 1539 00:46:11,710 --> 00:46:13,794 the tech stack or just in a different, 1540 00:46:13,794 --> 00:46:16,660 like a role, a different set of interfaces. 1541 00:46:16,660 --> 00:46:18,515 I would say that it's, it, 1542 00:46:18,515 --> 00:46:20,529 it is probably more people that are coming 1543 00:46:20,529 --> 00:46:23,125 to it with some existing tech background. 1544 00:46:23,125 --> 00:46:25,179 Because like 1545 00:46:25,179 --> 00:46:26,439 I said, with, with data engineering, 1546 00:46:26,439 --> 00:46:28,719 there's, it's, it, is it a little bit, 1547 00:46:28,719 --> 00:46:30,279 There's a little bit harder. 1548 00:46:30,279 --> 00:46:32,079 There's just not as well, but as well, 1549 00:46:32,079 --> 00:46:34,194 I'm a defined path yet at the, 1550 00:46:34,194 --> 00:46:35,589 at the entry level. 1551 00:46:35,589 --> 00:46:36,639 But then again that, that, 1552 00:46:36,639 --> 00:46:38,754 that in itself creates opportunities. 1553 00:46:38,754 --> 00:46:40,359 Because even though there are a lot of 1554 00:46:40,359 --> 00:46:42,144 tools to learn at the same time, 1555 00:46:42,144 --> 00:46:44,979 the companies are very aware of that. 1556 00:46:44,979 --> 00:46:47,574 And like when I've talked to recruiters, 1557 00:46:47,574 --> 00:46:49,389 a lot of times when you look 1558 00:46:49,389 --> 00:46:52,074 at when you look at, 1559 00:46:52,074 --> 00:46:53,740 and I'll show you this example here. 
1560 00:46:53,740 --> 00:46:55,240 It's like if you look at 1561 00:46:55,240 --> 00:47:06,054 job postings, you know. 1562 00:47:06,054 --> 00:47:13,630 So if you look at job postings, 1563 00:47:13,630 --> 00:47:15,669 I mean this is just one example. 1564 00:47:15,669 --> 00:47:18,354 This is on Indeed, 1565 00:47:18,354 --> 00:47:20,469 for a remote data engineer. 1566 00:47:20,469 --> 00:47:22,135 It's like, strong SQL skills. 1567 00:47:22,135 --> 00:47:22,929 These people are obviously 1568 00:47:22,929 --> 00:47:24,130 on the Microsoft side 1569 00:47:24,130 --> 00:47:25,989 because it's Power BI, 1570 00:47:25,989 --> 00:47:28,690 Microsoft SQL, and Azure, 1571 00:47:28,690 --> 00:47:31,389 which is Microsoft's cloud platform. 1572 00:47:31,389 --> 00:47:33,730 You know, technical knowledge of 1573 00:47:33,730 --> 00:47:35,049 ETL systems, visualization 1574 00:47:35,049 --> 00:47:37,045 and business intelligence solutions. 1575 00:47:37,045 --> 00:47:39,070 This is one of the, this is one of 1576 00:47:39,070 --> 00:47:41,170 the kinda less specific job postings. 1577 00:47:41,170 --> 00:47:42,699 Sometimes you see ones that have a long, 1578 00:47:42,699 --> 00:47:44,920 long list of particular tech skills. 1579 00:47:44,920 --> 00:47:46,089 But then my experience 1580 00:47:46,089 --> 00:47:47,350 talking to like recruiters and 1581 00:47:47,350 --> 00:47:48,940 people who are hiring is that 1582 00:47:48,940 --> 00:47:50,409 a lot of times, like, 1583 00:47:50,409 --> 00:47:52,179 especially with a lot of unfilled data 1584 00:47:52,179 --> 00:47:53,500 engineering positions now, 1585 00:47:53,500 --> 00:47:56,079 there's a lot more flexibility 1586 00:47:56,079 --> 00:47:58,134 around like individual tools.
1587 00:47:58,134 --> 00:47:59,379 And they're just trying to find 1588 00:47:59,379 --> 00:48:00,850 people who have like in general, 1589 00:48:00,850 --> 00:48:04,390 a big picture understanding 1590 00:48:04,390 --> 00:48:05,529 of like how a lot of 1591 00:48:05,529 --> 00:48:08,290 the tools and techniques fit together. 1592 00:48:08,290 --> 00:48:10,149 And I think this is where Cloud is 1593 00:48:10,149 --> 00:48:11,409 important because if you have like 1594 00:48:11,409 --> 00:48:13,644 a good understanding of 1595 00:48:13,644 --> 00:48:17,365 cloud architecture and you know how, 1596 00:48:17,365 --> 00:48:19,509 how, how software works 1597 00:48:19,509 --> 00:48:20,575 in the Cloud environment. 1598 00:48:20,575 --> 00:48:21,745 That, that can definitely, 1599 00:48:21,745 --> 00:48:22,854 that can definitely really, 1600 00:48:22,854 --> 00:48:23,830 really set you apart. 1601 00:48:23,830 --> 00:48:24,639 And there's a lot of ways 1602 00:48:24,639 --> 00:48:25,569 is more important than 1603 00:48:25,569 --> 00:48:29,515 any particular choice of vendor or tools. 1604 00:48:29,515 --> 00:48:31,359 So I would say, 1605 00:48:31,359 --> 00:48:33,280 I would say we see people from, 1606 00:48:33,280 --> 00:48:34,630 from a lot of different backgrounds. 1607 00:48:34,630 --> 00:48:35,890 I mean, their depth there definitely 1608 00:48:35,890 --> 00:48:37,449 are new peoples attack that 1609 00:48:37,449 --> 00:48:39,789 or they're getting into 1610 00:48:39,789 --> 00:48:41,245 it and finding out about it. 1611 00:48:41,245 --> 00:48:42,670 But there's also a lot of 1612 00:48:42,670 --> 00:48:43,960 people who have been in 1613 00:48:43,960 --> 00:48:46,629 other roles as developers, 1614 00:48:46,629 --> 00:48:50,695 as data scientists, as analysts. 
1615 00:48:50,695 --> 00:48:52,269 I mean, you used to see the term 1616 00:48:52,269 --> 00:48:54,010 like Database Developer, 1617 00:48:54,010 --> 00:48:55,660 database administrator a lot 1618 00:48:55,660 --> 00:48:57,160 more often and now 1619 00:48:57,160 --> 00:48:58,930 the data engineer role 1620 00:48:58,930 --> 00:49:02,244 incorporates some, some of that. 1621 00:49:02,244 --> 00:49:04,974 So we see people from, 1622 00:49:04,974 --> 00:49:07,090 from a lot of different skill levels and 1623 00:49:07,090 --> 00:49:09,174 a lot of different roles crossing over. 1624 00:49:09,174 --> 00:49:11,469 Partly just because it is, You know, 1625 00:49:11,469 --> 00:49:12,969 it's, It's, it's a 1626 00:49:12,969 --> 00:49:14,545 little bit less well-defined. 1627 00:49:14,545 --> 00:49:16,719 And even though there are a lot of skills, 1628 00:49:16,719 --> 00:49:19,000 there's there's not enough people 1629 00:49:19,000 --> 00:49:20,155 to fill the positions. 1630 00:49:20,155 --> 00:49:21,265 So at this point, 1631 00:49:21,265 --> 00:49:22,645 a lot of companies are willing, 1632 00:49:22,645 --> 00:49:24,489 even if you don't have every single skill, 1633 00:49:24,489 --> 00:49:27,279 if you have a pretty good understanding 1634 00:49:27,279 --> 00:49:29,139 of just the like how the, 1635 00:49:29,139 --> 00:49:31,610 how the pieces fit together. 1636 00:49:32,250 --> 00:49:34,269 They're willing to hire you just 1637 00:49:34,269 --> 00:49:35,694 because they've been trying to fill 1638 00:49:35,694 --> 00:49:36,879 a data engineer role for 1639 00:49:36,879 --> 00:49:39,354 three months and they could get five people. 1640 00:49:39,354 --> 00:49:41,649 So it's even, even though there are a lot of 1641 00:49:41,649 --> 00:49:43,735 tools potentially to learn, 1642 00:49:43,735 --> 00:49:44,980 there's also a lot of 1643 00:49:44,980 --> 00:49:47,659 opportunities just because of the demand. 1644 00:49:51,900 --> 00:49:55,970 Anymore questions, feel free. 
1645 00:50:01,590 --> 00:50:03,250 We're going to go ahead and 1646 00:50:03,250 --> 00:50:04,300 show us your little demo 1647 00:50:04,300 --> 00:50:06,519 of somebody who's in Yeah. 1648 00:50:06,519 --> 00:50:08,139 Question. Yeah. If nobody 1649 00:50:08,139 --> 00:50:10,450 else has a question, I got one more for you. 1650 00:50:10,450 --> 00:50:12,970 So in terms of 1651 00:50:12,970 --> 00:50:15,985 the work landscape and the jobs landscape, 1652 00:50:15,985 --> 00:50:18,850 are you seeing data engineering be sort 1653 00:50:18,850 --> 00:50:23,439 of siloed or departmentalize? 1654 00:50:23,439 --> 00:50:25,600 Whereas is a kind of a jack 1655 00:50:25,600 --> 00:50:27,550 of all trades skill where every company 1656 00:50:27,550 --> 00:50:28,750 is trying to fill 1657 00:50:28,750 --> 00:50:32,980 this new role and it's pretty amorphous. 1658 00:50:32,980 --> 00:50:34,365 Yeah, definitely more. 1659 00:50:34,365 --> 00:50:37,570 The latter, I mean, I 1660 00:50:37,570 --> 00:50:38,980 would say that actually. 1661 00:50:38,980 --> 00:50:40,254 And this is part of why I like to 1662 00:50:40,254 --> 00:50:42,070 data engineering is it's probably one of 1663 00:50:42,070 --> 00:50:43,810 the least siloed positions 1664 00:50:43,810 --> 00:50:45,790 and tech because I mean, 1665 00:50:45,790 --> 00:50:46,930 one of the one thing I 1666 00:50:46,930 --> 00:50:48,340 haven't set explicitly yet, 1667 00:50:48,340 --> 00:50:49,869 but that is good 1668 00:50:49,869 --> 00:50:51,160 to know about being a data engineer. 1669 00:50:51,160 --> 00:50:52,299 Is it like beside any 1670 00:50:52,299 --> 00:50:55,899 particular school skill or, 1671 00:50:55,899 --> 00:50:59,020 or like any particular technical requirement? 
1672 00:50:59,020 --> 00:51:00,280 I mean, one of 1673 00:51:00,280 --> 00:51:01,539 the biggest responsibilities of 1674 00:51:01,539 --> 00:51:03,655 a data engineer is that as a data engineer, 1675 00:51:03,655 --> 00:51:05,905 you are the person in an organization, 1676 00:51:05,905 --> 00:51:07,600 like in the big picture that is most 1677 00:51:07,600 --> 00:51:09,879 responsible for organizations data from 1678 00:51:09,879 --> 00:51:13,659 ingestion to storage and processing 1679 00:51:13,659 --> 00:51:15,954 to delivering that to 1680 00:51:15,954 --> 00:51:18,069 internal or external users. 1681 00:51:18,069 --> 00:51:20,920 So because of the nature of the role, 1682 00:51:20,920 --> 00:51:23,440 you interface with developers, 1683 00:51:23,440 --> 00:51:24,775 data scientists, with 1684 00:51:24,775 --> 00:51:26,395 different kinds of managers. 1685 00:51:26,395 --> 00:51:27,039 I mean, you pretty 1686 00:51:27,039 --> 00:51:28,150 much interfaced with almost 1687 00:51:28,150 --> 00:51:28,329 all of 1688 00:51:28,329 --> 00:51:30,295 the different technical people at some point. 1689 00:51:30,295 --> 00:51:32,470 So that's something to keep in 1690 00:51:32,470 --> 00:51:33,340 mind because a lot 1691 00:51:33,340 --> 00:51:34,464 of people wouldn't like that. 1692 00:51:34,464 --> 00:51:35,875 A lot of people like being 1693 00:51:35,875 --> 00:51:38,019 more in their in 1694 00:51:38,019 --> 00:51:39,940 their one area that they work in. 1695 00:51:39,940 --> 00:51:41,514 But for people who are, 1696 00:51:41,514 --> 00:51:43,915 as you said, we're kinda jack of all trades. 1697 00:51:43,915 --> 00:51:47,034 And people who don't like to just live in, 1698 00:51:47,034 --> 00:51:48,834 you know, in one place all the time. 
1699 00:51:48,834 --> 00:51:50,409 The data engineering role is great 1700 00:51:50,409 --> 00:51:53,529 because it does involve having, 1701 00:51:53,529 --> 00:51:54,730 having an understanding of 1702 00:51:54,730 --> 00:51:56,635 a lot of different too. 1703 00:51:56,635 --> 00:51:58,000 Is it the tech and business 1704 00:51:58,000 --> 00:51:59,874 landscape and being able to, 1705 00:51:59,874 --> 00:52:02,319 being able to have substantive conversations 1706 00:52:02,319 --> 00:52:06,639 with technical people in different areas, 1707 00:52:06,639 --> 00:52:10,689 but also with decision-makers 1708 00:52:10,689 --> 00:52:11,860 and people who might be less 1709 00:52:11,860 --> 00:52:13,315 technical and might be more concerned, 1710 00:52:13,315 --> 00:52:16,044 just about like business goals. 1711 00:52:16,044 --> 00:52:19,465 So it's, yeah, it's definitely more of it, 1712 00:52:19,465 --> 00:52:20,920 more of a jack of all trades role, 1713 00:52:20,920 --> 00:52:23,530 but it's something that in some ways 1714 00:52:23,530 --> 00:52:26,409 that's a plus because as you say you're not, 1715 00:52:26,409 --> 00:52:27,715 you're not a siloed. 1716 00:52:27,715 --> 00:52:29,559 You can get, you know, get 1717 00:52:29,559 --> 00:52:31,240 out of get out of fear. 1718 00:52:31,240 --> 00:52:33,070 Your narrow, you're near a little window. 1719 00:52:33,070 --> 00:52:34,299 So yeah, it's sort of 1720 00:52:34,299 --> 00:52:35,755 some of that is just a matter of taste. 1721 00:52:35,755 --> 00:52:41,049 Yeah. And then I 1722 00:52:41,049 --> 00:52:42,550 guess I have one more quick Yeah. 1723 00:52:42,550 --> 00:52:45,099 Go for it. Plan about job progression. 1724 00:52:45,099 --> 00:52:46,165 Sorry, I got a lot of questions. 1725 00:52:46,165 --> 00:52:47,845 I know it's I this is awesome. 1726 00:52:47,845 --> 00:52:49,510 What is kind of the 1727 00:52:49,510 --> 00:52:52,734 career progression track looking like? 
1728 00:52:52,734 --> 00:52:54,490 Like is data engineering 1729 00:52:54,490 --> 00:52:55,749 looking like a transfer 1730 00:52:55,749 --> 00:52:59,109 to a CIO-level position in companies? 1731 00:52:59,109 --> 00:53:00,490 Or that 1732 00:53:00,490 --> 00:53:01,749 this is actually a great question too. 1733 00:53:01,749 --> 00:53:03,520 So one thing I should say, 1734 00:53:03,520 --> 00:53:05,454 one thing I should say in general 1735 00:53:05,454 --> 00:53:07,779 is the long-term outlook. 1736 00:53:07,779 --> 00:53:08,350 You know, there's a lot 1737 00:53:08,350 --> 00:53:09,805 of fads in tech, right? 1738 00:53:09,805 --> 00:53:10,929 But the long-term outlook for 1739 00:53:10,929 --> 00:53:12,684 data engineering is really solid 1740 00:53:12,684 --> 00:53:14,860 because companies aren't 1741 00:53:14,860 --> 00:53:16,555 going to have less data anytime soon. 1742 00:53:16,555 --> 00:53:17,860 I mean, the trends that we're seeing, 1743 00:53:17,860 --> 00:53:19,074 I think we can expect these to continue. 1744 00:53:19,074 --> 00:53:21,175 I think these are really structural trends. 1745 00:53:21,175 --> 00:53:21,999 To speak more to 1746 00:53:21,999 --> 00:53:24,024 the specifics of your question. 1747 00:53:24,024 --> 00:53:26,724 So the path, as I said, like on the, 1748 00:53:26,724 --> 00:53:28,809 on the sort of introductory level 1749 00:53:28,809 --> 00:53:30,264 is a little less well-defined.
1750 00:53:30,264 --> 00:53:32,619 If you're a web developer 1751 00:53:32,619 --> 00:53:34,284 or a software engineer, 1752 00:53:34,284 --> 00:53:35,829 some of these other roles, There's a little 1753 00:53:35,829 --> 00:53:38,289 bit better, like there's, 1754 00:53:38,289 --> 00:53:39,880 there's more internships and 1755 00:53:39,880 --> 00:53:41,335 sort of low-level positions 1756 00:53:41,335 --> 00:53:42,400 that are a little bit more 1757 00:53:42,400 --> 00:53:43,450 easily available for 1758 00:53:43,450 --> 00:53:44,620 people that are just getting 1759 00:53:44,620 --> 00:53:45,849 started to then work their way 1760 00:53:45,849 --> 00:53:47,424 up into that role. 1761 00:53:47,424 --> 00:53:49,839 So that data engineering, 1762 00:53:49,839 --> 00:53:51,700 you don't see like 1763 00:53:51,700 --> 00:53:53,259 What we say on our slide about 1764 00:53:53,259 --> 00:53:54,760 like there's no entry level positions. 1765 00:53:54,760 --> 00:53:56,349 Obviously people have to start somewhere. 1766 00:53:56,349 --> 00:53:57,610 It's just that that kind of 1767 00:53:57,610 --> 00:53:59,080 ladder is not as well. 1768 00:53:59,080 --> 00:54:00,250 You know, a lot of companies, it's like 1769 00:54:00,250 --> 00:54:01,690 they have for their, 1770 00:54:01,690 --> 00:54:03,084 for their software engineers, 1771 00:54:03,084 --> 00:54:04,345 software developers. 1772 00:54:04,345 --> 00:54:06,160 They have QA people and then they 1773 00:54:06,160 --> 00:54:09,370 have multiple different levels of developers. 1774 00:54:09,370 --> 00:54:10,359 Up to management. 1775 00:54:10,359 --> 00:54:11,514 You know, that like 1776 00:54:11,514 --> 00:54:14,169 that hierarchy is better defined. 1777 00:54:14,169 --> 00:54:17,335 There's, there's definitely, you know, 1778 00:54:17,335 --> 00:54:19,255 Junior and medium and sort of 1779 00:54:19,255 --> 00:54:22,060 senior level data engineers. 
1780 00:54:22,060 --> 00:54:23,560 And then I think, 1781 00:54:23,560 --> 00:54:27,099 you know from the from the, 1782 00:54:27,099 --> 00:54:29,290 from the from the senior level, 1783 00:54:29,290 --> 00:54:30,369 I think that the next 1784 00:54:30,369 --> 00:54:31,509 progression from that, yeah, 1785 00:54:31,509 --> 00:54:36,479 it would be some sort of like yeah, 1786 00:54:36,479 --> 00:54:38,985 tech related executive position. 1787 00:54:38,985 --> 00:54:42,299 And but yeah, so and another let's see, 1788 00:54:42,299 --> 00:54:43,229 one more thing I would I 1789 00:54:43,229 --> 00:54:44,339 would say an answer to 1790 00:54:44,339 --> 00:54:48,494 that question is just another way. 1791 00:54:48,494 --> 00:54:50,415 On the other side, 1792 00:54:50,415 --> 00:54:52,799 another way that I 1793 00:54:52,799 --> 00:54:53,820 AND logic block I know that 1794 00:54:53,820 --> 00:54:54,840 have had success with this. 1795 00:54:54,840 --> 00:54:58,290 Just like to get some hands-on experience. 1796 00:54:58,290 --> 00:55:01,334 Just to get sort of a job. 1797 00:55:01,334 --> 00:55:03,209 Track is like, Well, system science. 1798 00:55:03,209 --> 00:55:05,069 System science is great. I wait, 1799 00:55:05,069 --> 00:55:05,850 I remember when we did 1800 00:55:05,850 --> 00:55:07,635 that Daimler project for dime where 1801 00:55:07,635 --> 00:55:09,104 when we're taking 1802 00:55:09,104 --> 00:55:11,805 the discrete system simulation class, 1803 00:55:11,805 --> 00:55:13,619 that kind of thing, it's great, but also 1804 00:55:13,619 --> 00:55:14,849 just if there's accompany 1805 00:55:14,849 --> 00:55:15,750 the Earth-Sun working in, 1806 00:55:15,750 --> 00:55:16,320 do you do 1807 00:55:16,320 --> 00:55:19,929 a little bit eerie contact somebody about? 
1808 00:55:19,929 --> 00:55:22,914 Just say, Hey, like I'm learning and 1809 00:55:22,914 --> 00:55:24,670 I'm just like a kind of 1810 00:55:24,670 --> 00:55:25,750 a hands-on project to work 1811 00:55:25,750 --> 00:55:27,145 within some real data. 1812 00:55:27,145 --> 00:55:28,749 And if you're willing to just 1813 00:55:28,749 --> 00:55:30,189 put a little bit of time in that way, 1814 00:55:30,189 --> 00:55:31,539 but you can demonstrate your skills. 1815 00:55:31,539 --> 00:55:33,279 I mean, that's that's also a really, 1816 00:55:33,279 --> 00:55:33,970 really great way to 1817 00:55:33,970 --> 00:55:35,200 get your foot in the door. 1818 00:55:35,200 --> 00:55:36,879 So as I said with 1819 00:55:36,879 --> 00:55:38,814 data engineering that it's a little bit, 1820 00:55:38,814 --> 00:55:41,420 the path is a little bit less well-defined. 1821 00:55:41,420 --> 00:55:43,180 But, but on the other side, 1822 00:55:43,180 --> 00:55:45,339 that creates opportunities because 1823 00:55:45,339 --> 00:55:47,664 there's not as much of a strictly 1824 00:55:47,664 --> 00:55:50,259 like canonical resume progression 1825 00:55:50,259 --> 00:55:51,520 that people expect to see with 1826 00:55:51,520 --> 00:55:53,034 that, with data engineers. 1827 00:55:53,034 --> 00:55:54,100 You know, it's like if you're 1828 00:55:54,100 --> 00:55:55,420 in the development world 1829 00:55:55,420 --> 00:55:56,980 and you haven't followed that, 1830 00:55:56,980 --> 00:55:59,349 that, that path as closely, 1831 00:55:59,349 --> 00:56:00,385 a lot of people won't even 1832 00:56:00,385 --> 00:56:01,359 look at your resume. 
1833 00:56:01,359 --> 00:56:02,829 Whereas with data engineering, 1834 00:56:02,829 --> 00:56:06,249 there's an understanding that, you know, 1835 00:56:06,249 --> 00:56:07,210 everybody's taken 1836 00:56:07,210 --> 00:56:08,499 a different path to get there, 1837 00:56:08,499 --> 00:56:10,689 especially at this early phase so that, 1838 00:56:10,689 --> 00:56:12,444 that in itself creates 1839 00:56:12,444 --> 00:56:14,365 an extra set of opportunities. 1840 00:56:14,365 --> 00:56:16,090 Most applicants won't already know 1841 00:56:16,090 --> 00:56:18,159 everything they need to know exactly. 1842 00:56:18,159 --> 00:56:19,090 There's, there's much more 1843 00:56:19,090 --> 00:56:20,049 of an understanding that 1844 00:56:20,049 --> 00:56:22,299 like anybody we bring into this organization. 1845 00:56:22,299 --> 00:56:23,889 And the thing is the tool landscape is 1846 00:56:23,889 --> 00:56:25,929 like so broad and complex that like even, 1847 00:56:25,929 --> 00:56:27,369 even at the senior level, 1848 00:56:27,369 --> 00:56:28,659 I've had a discussion with 1849 00:56:28,659 --> 00:56:29,709 like recruiters and hiring 1850 00:56:29,709 --> 00:56:31,104 people or it's like, 1851 00:56:31,104 --> 00:56:33,519 you know, even at the senior level, 1852 00:56:33,519 --> 00:56:34,660 it's like they're not going to fight. 1853 00:56:34,660 --> 00:56:35,739 Especially maybe at senior 1854 00:56:35,739 --> 00:56:36,850 levels like they're not going to find 1855 00:56:36,850 --> 00:56:39,324 somebody who probably has 1856 00:56:39,324 --> 00:56:42,729 every single exact skill 1857 00:56:42,729 --> 00:56:43,660 that they're looking for. 1858 00:56:43,660 --> 00:56:45,039 So there's a lot more willingness 1859 00:56:45,039 --> 00:56:46,750 to say, like I said, 1860 00:56:46,750 --> 00:56:48,790 as long as there's some familiarity 1861 00:56:48,790 --> 00:56:49,990 with the exact tools we're using. 
1862 00:56:49,990 --> 00:56:50,800 But then the person has 1863 00:56:50,800 --> 00:56:51,999 a big picture understanding 1864 00:56:51,999 --> 00:56:54,415 of how these systems work together. 1865 00:56:54,415 --> 00:56:56,739 And not just the technical systems 1866 00:56:56,739 --> 00:56:58,090 and the data processing systems, 1867 00:56:58,090 --> 00:56:58,644 but the kind of 1868 00:56:58,644 --> 00:57:00,955 overall business environment system 1869 00:57:00,955 --> 00:57:02,529 and the different how 1870 00:57:02,529 --> 00:57:04,150 the different departments interact. 1871 00:57:04,150 --> 00:57:05,379 Because that's another big piece 1872 00:57:05,379 --> 00:57:06,100 of the data engineering. 1873 00:57:06,100 --> 00:57:07,810 It's, it's like it's not, you're not even, 1874 00:57:07,810 --> 00:57:09,339 you're not necessarily just talking 1875 00:57:09,339 --> 00:57:11,710 about data processing system. 1876 00:57:11,710 --> 00:57:12,759 It's like that, that has 1877 00:57:12,759 --> 00:57:14,349 a lot of different components because you're 1878 00:57:14,349 --> 00:57:16,119 probably sharing data 1879 00:57:16,119 --> 00:57:17,934 from from different departments. 1880 00:57:17,934 --> 00:57:19,180 So then there's there's 1881 00:57:19,180 --> 00:57:21,010 organizational considerations 1882 00:57:21,010 --> 00:57:22,360 and those sorts of things. 1883 00:57:22,360 --> 00:57:24,010 But yeah, when you make 1884 00:57:24,010 --> 00:57:26,200 a good point that like it's there's 1885 00:57:26,200 --> 00:57:27,790 a lot less expectation of you having 1886 00:57:27,790 --> 00:57:28,899 every single skill on 1887 00:57:28,899 --> 00:57:30,220 the list and that makes it, 1888 00:57:30,220 --> 00:57:31,540 that makes it a little bit where 1889 00:57:31,540 --> 00:57:32,949 he can actually, yes. 
1890 00:57:32,949 --> 00:57:34,359 I wanted to just draw 1891 00:57:34,359 --> 00:57:35,919 your attention to the comment from 1892 00:57:35,919 --> 00:57:37,510 Kathleen. And Guy, 1893 00:57:37,510 --> 00:57:38,095 you probably didn't get 1894 00:57:38,095 --> 00:57:38,920 a chance to look at that, 1895 00:57:38,920 --> 00:57:41,185 but if you could pop that up. Oh, yes, sorry. 1896 00:57:41,185 --> 00:57:41,650 You'll see there's 1897 00:57:41,650 --> 00:57:42,879 a really interesting comment 1898 00:57:42,879 --> 00:57:44,770 sort of trying to give a 1899 00:57:44,770 --> 00:57:46,435 little more distinction between 1900 00:57:46,435 --> 00:57:48,099 what data scientists might be having to 1901 00:57:48,099 --> 00:57:49,660 deal with vs. data engineers, 1902 00:57:49,660 --> 00:57:52,459 and it probably will resonate for you a bit. 1903 00:57:53,850 --> 00:57:55,420 Yeah. 1904 00:57:55,420 --> 00:57:56,664 Catherine, this is great. 1905 00:57:56,664 --> 00:57:58,780 I mean, you bring up a couple of 1906 00:57:58,780 --> 00:58:00,039 specific categories 1907 00:58:00,039 --> 00:58:01,209 that I haven't mentioned here, 1908 00:58:01,209 --> 00:58:02,950 but this is the big picture. 1909 00:58:02,950 --> 00:58:04,570 This is the exact same point. It's like, yeah, 1910 00:58:04,570 --> 00:58:06,819 Catherine's saying: doing 1911 00:58:06,819 --> 00:58:08,410 some data engineering, also 1912 00:58:08,410 --> 00:58:09,640 some project management, 1913 00:58:09,640 --> 00:58:11,170 coordinating vendors, 1914 00:58:11,170 --> 00:58:14,560 trying to communicate modelling ideas 1915 00:58:14,560 --> 00:58:16,509 for managers and execs. Yeah, absolutely. 1916 00:58:16,509 --> 00:58:17,185 All of the above.
1917 00:58:17,185 --> 00:58:18,670 I mean, this is that 1918 00:58:18,670 --> 00:58:20,544 there's a long list of things that, 1919 00:58:20,544 --> 00:58:21,880 that you might, might do 1920 00:58:21,880 --> 00:58:23,469 as a data engineer and Catherine, 1921 00:58:23,469 --> 00:58:24,820 I like your 0.1 of those is 1922 00:58:24,820 --> 00:58:27,429 just the communication element because 1923 00:58:27,429 --> 00:58:30,190 it's like something that even, 1924 00:58:30,190 --> 00:58:32,109 even not at the senior level 1925 00:58:32,109 --> 00:58:33,475 is as a data engineer, I mean, 1926 00:58:33,475 --> 00:58:34,629 something that you're going to be doing 1927 00:58:34,629 --> 00:58:36,250 a lot is communicating 1928 00:58:36,250 --> 00:58:37,750 technical concepts to other people 1929 00:58:37,750 --> 00:58:39,640 who might have a technical background, 1930 00:58:39,640 --> 00:58:40,990 but it's in a different area. 1931 00:58:40,990 --> 00:58:42,939 Or maybe two people who 1932 00:58:42,939 --> 00:58:44,320 maybe to technical managers 1933 00:58:44,320 --> 00:58:45,459 that have some technical background, 1934 00:58:45,459 --> 00:58:46,150 but then maybe to 1935 00:58:46,150 --> 00:58:47,679 like executives and decision-makers 1936 00:58:47,679 --> 00:58:48,820 who really don't know that 1937 00:58:48,820 --> 00:58:50,185 much about the technology. 1938 00:58:50,185 --> 00:58:51,564 So there's a lot of 1939 00:58:51,564 --> 00:58:54,700 communication to people who are like 1940 00:58:54,700 --> 00:58:56,140 different points in their 1941 00:58:56,140 --> 00:58:58,030 technical advancement 1942 00:58:58,030 --> 00:58:59,214 or technical understanding. 1943 00:58:59,214 --> 00:59:01,659 And so that's that's 1944 00:59:01,659 --> 00:59:03,639 something that is very much, 1945 00:59:03,639 --> 00:59:05,649 very much a part of the job. 1946 00:59:05,649 --> 00:59:08,800 So we are technically out of time. 1947 00:59:08,800 --> 00:59:10,359 And Guy, if you want. 
1948 00:59:10,359 --> 00:59:11,755 To spend a few more minutes, 1949 00:59:11,755 --> 00:59:12,880 I kind of invite people who 1950 00:59:12,880 --> 00:59:14,079 have to get going to feel 1951 00:59:14,079 --> 00:59:17,079 this is a graceful time to exit. 1952 00:59:17,079 --> 00:59:18,699 I can either leave the recording 1953 00:59:18,699 --> 00:59:20,334 on if people oh, 1954 00:59:20,334 --> 00:59:22,029 living or if you don't have to go, 1955 00:59:22,029 --> 00:59:23,364 then we just need no, no, no. 1956 00:59:23,364 --> 00:59:24,729 I'm not in a hurry. 1957 00:59:24,729 --> 00:59:26,499 So just real quick. 1958 00:59:26,499 --> 00:59:27,609 There's more if you're 1959 00:59:27,609 --> 00:59:29,064 curious about this sort of thing. 1960 00:59:29,064 --> 00:59:30,310 There are these two websites: 1961 00:59:30,310 --> 00:59:31,809 our consultancy is two 1962 00:59:31,809 --> 00:59:33,055 r.io and 1963 00:59:33,055 --> 00:59:35,260 our bootcamp is data stacked I Academy. 1964 00:59:35,260 --> 00:59:36,879 Both of those will give you more information 1965 00:59:36,879 --> 00:59:38,620 about, like, the data engineer path. 1966 00:59:38,620 --> 00:59:39,954 So if you're curious, 1967 00:59:39,954 --> 00:59:41,110 go there and check it out. 1968 00:59:41,110 --> 00:59:43,930 And let me also put, let me 1969 00:59:43,930 --> 00:59:45,069 also put, I'll put 1970 00:59:45,069 --> 00:59:47,395 my e-mail in the chat. 1971 00:59:47,395 --> 00:59:51,280 Like if anybody, you know, 1972 00:59:51,280 --> 00:59:54,295 if you have questions about any part of this, 1973 00:59:54,295 --> 00:59:55,840 feel free to reach out to me. 1974 00:59:55,840 --> 00:59:56,769 I love talking about 1975 00:59:56,769 --> 00:59:57,639 not just data engineering, 1976 00:59:57,639 --> 00:59:58,929 but about tech stuff, 1977 00:59:58,929 --> 01:00:01,420 about system science, any or all of the above.
1978 01:00:01,420 --> 01:00:02,439 So yeah, feel free to 1979 01:00:02,439 --> 01:00:03,399 reach out, and people that need 1980 01:00:03,399 --> 01:00:05,290 to jump off, do your thing. 1981 01:00:05,290 --> 01:00:06,669 I'll stay around at least for 1982 01:00:06,669 --> 01:00:07,480 a few more minutes or as 1983 01:00:07,480 --> 01:00:08,500 long as there are any questions. 1984 01:00:08,500 --> 01:00:12,165 So keep asking if you will. 1985 01:00:12,165 --> 01:00:14,124 So I would invite you, 1986 01:00:14,124 --> 01:00:15,159 because a demo could be pretty 1987 01:00:15,159 --> 01:00:16,405 interesting and show you things 1988 01:00:16,405 --> 01:00:18,370 you don't get from just a verbal description, 1989 01:00:18,370 --> 01:00:19,689 so yeah, you want to do that? 1990 01:00:19,689 --> 01:00:21,669 I'll leave the recorder on and then anybody 1991 01:00:21,669 --> 01:00:23,830 who has to go could come back and, you know, 1992 01:00:23,830 --> 01:00:25,839 fast-forward through the recording and find 1993 01:00:25,839 --> 01:00:27,100 the demo if they wanted to watch it. 1994 01:00:27,100 --> 01:00:29,319 That way they get the best of both worlds. 1995 01:00:29,319 --> 01:00:31,450 Yeah, give me just one minute 1996 01:00:31,450 --> 01:00:34,100 to set this up here. 1997 01:00:35,400 --> 01:00:37,854 So this is a demo of 1998 01:00:37,854 --> 01:00:41,199 something that is like very much. 1999 01:00:41,199 --> 01:00:43,750 This is, this is a demo of 2000 01:00:43,750 --> 01:00:44,829 some core data engineering stuff. 2001 01:00:44,829 --> 01:00:46,884 So this is what we call a data pipeline. 2002 01:00:46,884 --> 01:00:49,330 So I'll show you here a kind of overview, 2003 01:00:49,330 --> 01:00:50,785 a schematic overview of this. 2004 01:00:50,785 --> 01:00:53,050 So this is a data processing pipeline 2005 01:00:53,050 --> 01:00:53,649 that we built on 2006 01:00:53,649 --> 01:00:55,150 Google Cloud, kind of as a demo.
2007 01:00:55,150 --> 01:00:56,200 And this is a good example of 2008 01:00:56,200 --> 01:00:58,900 what's called serverless architecture, 2009 01:00:58,900 --> 01:01:02,350 which is, serverless Cloud services 2010 01:01:02,350 --> 01:01:03,519 refers to ones that 2011 01:01:03,519 --> 01:01:05,695 don't have specific instances. 2012 01:01:05,695 --> 01:01:06,415 You don't have to 2013 01:01:06,415 --> 01:01:09,189 instantiate a virtual machine. 2014 01:01:09,189 --> 01:01:10,749 It's, it's all, all of the 2015 01:01:10,749 --> 01:01:12,190 kind of provisioning and 2016 01:01:12,190 --> 01:01:13,899 management is done behind 2017 01:01:13,899 --> 01:01:15,715 the scenes by the Cloud vendor. 2018 01:01:15,715 --> 01:01:17,920 So you have to do a bit of setup, and then, 2019 01:01:17,920 --> 01:01:18,940 you know, there's a lot of 2020 01:01:18,940 --> 01:01:20,620 automated processing that happens. 2021 01:01:20,620 --> 01:01:22,960 So two things in particular are the ones 2022 01:01:22,960 --> 01:01:24,309 to point out. Cloud Functions is 2023 01:01:24,309 --> 01:01:25,479 an event processing thing. 2024 01:01:25,479 --> 01:01:27,524 So those are Python; 2025 01:01:27,524 --> 01:01:28,990 a lot of other languages can be used for those, 2026 01:01:28,990 --> 01:01:30,340 but these are used for like short 2027 01:01:30,340 --> 01:01:32,065 processing applications where, like, 2028 01:01:32,065 --> 01:01:32,800 if you go to 2029 01:01:32,800 --> 01:01:34,149 a website where somebody uploads 2030 01:01:34,149 --> 01:01:36,039 a PDF and there's some automatic processing, 2031 01:01:36,039 --> 01:01:37,089 there's a good chance they're using 2032 01:01:37,089 --> 01:01:38,469 something like Cloud Functions, where it's 2033 01:01:38,469 --> 01:01:40,585 these, like, short, single-use, 2034 01:01:40,585 --> 01:01:43,149 event-triggered functions.
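As a rough illustration of the "short, single-use, event-triggered function" idea described here, the following is a minimal Python sketch. It assumes the upload event arrives as a dictionary carrying `bucket` and `name` fields, with a second `context` argument, which is the shape used by storage-triggered background functions. The function name `on_file_uploaded` and the bucket and file names are hypothetical, not the ones used in the demo.

```python
def on_file_uploaded(event, context=None):
    """Handle one 'file uploaded' event and decide whether to process it.

    `event` is assumed to be a dict with 'bucket' and 'name' keys.
    `context` (event metadata) is accepted but unused in this sketch.
    """
    bucket = event["bucket"]
    name = event["name"]

    # Only JSON uploads are of interest; anything else is ignored.
    if not name.endswith(".json"):
        return None

    # A real function would now read the file and publish its records.
    # Here we just return the location that would be processed.
    return f"gs://{bucket}/{name}"
```

Calling `on_file_uploaded({"bucket": "demo-bucket", "name": "trips.json"})` returns the object location, while a non-JSON upload returns `None`. The point is that the function holds no state and runs only when an event arrives.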
2035 01:01:43,149 --> 01:01:45,400 And then Pub/Sub is Google's equivalent 2036 01:01:45,400 --> 01:01:47,920 of Apache Kafka, a messaging, 2037 01:01:47,920 --> 01:01:50,259 buffering and queuing system in 2038 01:01:50,259 --> 01:01:53,305 the Cloud development paradigm, 2039 01:01:53,305 --> 01:01:54,699 which is, I don't know if you've heard the terms 2040 01:01:54,699 --> 01:01:56,650 microservices or decoupled 2041 01:01:56,650 --> 01:01:58,360 architecture, where instead of 2042 01:01:58,360 --> 01:02:00,099 having like a monolithic application, 2043 01:02:00,099 --> 01:02:01,630 you have separate services 2044 01:02:01,630 --> 01:02:02,980 that need to talk to each other. 2045 01:02:02,980 --> 01:02:04,809 So the messaging layer 2046 01:02:04,809 --> 01:02:05,829 is really important in that. 2047 01:02:05,829 --> 01:02:07,299 So just real quickly you can see we have 2048 01:02:07,299 --> 01:02:08,995 like an ingestion portion of this. 2049 01:02:08,995 --> 01:02:11,230 We have some data in JSON files. 2050 01:02:11,230 --> 01:02:12,790 Those are processed with 2051 01:02:12,790 --> 01:02:13,840 an automatic function 2052 01:02:13,840 --> 01:02:15,385 through a messaging layer, 2053 01:02:15,385 --> 01:02:17,860 with some more automatic processing through 2054 01:02:17,860 --> 01:02:19,149 what we might call ETL 2055 01:02:19,149 --> 01:02:20,619 or data transformation. 2056 01:02:20,619 --> 01:02:23,380 And then at the end, to 2057 01:02:23,380 --> 01:02:25,240 BigQuery, which is Google's 2058 01:02:25,240 --> 01:02:28,735 analytical database system. 2059 01:02:28,735 --> 01:02:31,960 So what I can do here is show you, 2060 01:02:31,960 --> 01:02:34,390 let me pull the logs up for this.
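The stages in this overview (ingest JSON, pass records through a messaging layer, transform, load into a table) can be sketched in plain Python. This is not the demo's code. An in-memory `deque` stands in for the messaging topic and a list stands in for the analytical table. The record shape (`trip_id`, `passengers`, `name`, `seat`) is a made-up assumption for illustration.

```python
import json
from collections import deque


def ingest(json_text, topic):
    """Parse a JSON array of trips and publish each trip as its own message."""
    trips = json.loads(json_text)
    for trip in trips:
        topic.append(json.dumps(trip).encode("utf-8"))
    return len(trips)


def transform(message):
    """Flatten one trip message into one row per passenger."""
    trip = json.loads(message.decode("utf-8"))
    return [
        {"trip_id": trip["trip_id"], "passenger": p["name"], "seat": p.get("seat")}
        for p in trip.get("passengers", [])
    ]


def run_pipeline(json_text):
    """Run ingest -> queue -> transform -> load with in-memory stand-ins."""
    topic = deque()  # stand-in for the messaging layer
    table = []       # stand-in for the analytical table
    ingest(json_text, topic)
    while topic:
        table.extend(transform(topic.popleft()))
    return table
```

Each stage only knows about the message format, not about the other stages, which is the decoupling the messaging layer is meant to provide.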
2061 01:02:34,390 --> 01:02:37,450 So I'm going to have, 2062 01:02:37,450 --> 01:02:39,489 what I'm gonna do here is like 2063 01:02:39,489 --> 01:02:42,579 I'm going to say this all gets started when, 2064 01:02:42,579 --> 01:02:43,929 if I upload this file, 2065 01:02:43,929 --> 01:02:45,280 I have a JSON file, 2066 01:02:45,280 --> 01:02:46,329 some data that's got 2067 01:02:46,329 --> 01:02:49,164 like airline ticket data in it. 2068 01:02:49,164 --> 01:02:51,429 And you can see like if 2069 01:02:51,429 --> 01:02:53,289 I, when I kick this off, 2070 01:02:53,289 --> 01:02:55,819 when I upload this file, 2071 01:02:55,980 --> 01:02:57,804 sorry, let me see if I can get 2072 01:02:57,804 --> 01:02:59,515 my terminal over here. 2073 01:02:59,515 --> 01:03:00,999 So I just, sorry, 2074 01:03:00,999 --> 01:03:02,439 I didn't have my terminal on the screen, 2075 01:03:02,439 --> 01:03:03,940 but I just did a, an upload, 2076 01:03:03,940 --> 01:03:07,390 a terminal upload of 2077 01:03:07,390 --> 01:03:09,175 a data file 2078 01:03:09,175 --> 01:03:11,290 to a Google Cloud Storage bucket. 2079 01:03:11,290 --> 01:03:13,059 So Cloud Storage is, like, just 2080 01:03:13,059 --> 01:03:14,814 their standard cloud storage 2081 01:03:14,814 --> 01:03:17,110 for, for file storage. 2082 01:03:17,110 --> 01:03:20,450 Give this a minute to refresh. 2083 01:03:21,660 --> 01:03:24,310 So you can see, 2084 01:03:24,310 --> 01:03:25,629 okay, so this is the log. 2085 01:03:25,629 --> 01:03:27,220 This is in the Google Cloud logs, 2086 01:03:27,220 --> 01:03:27,699 it's four. 2087 01:03:27,699 --> 01:03:29,859 So you can see this happening in real time.
2088 01:03:29,859 --> 01:03:31,479 So it's like before I 2089 01:03:31,479 --> 01:03:33,160 triggered, before I kicked this off a minute, 2090 01:03:33,160 --> 01:03:33,939 five minutes ago, 2091 01:03:33,939 --> 01:03:34,929 there was no new trip data, 2092 01:03:34,929 --> 01:03:36,849 so I uploaded this file, and automatically, 2093 01:03:36,849 --> 01:03:38,515 now you can see the Cloud Function, 2094 01:03:38,515 --> 01:03:40,465 the Cloud Function got triggered. 2095 01:03:40,465 --> 01:03:43,119 It recognizes that there's, 2096 01:03:43,119 --> 01:03:45,249 that there's data in there. 2097 01:03:45,249 --> 01:03:49,045 It's pulling trips from the trip planning topic. 2098 01:03:49,045 --> 01:03:50,439 So now this is using like 2099 01:03:50,439 --> 01:03:52,450 the messaging queue layer to 2100 01:03:52,450 --> 01:03:53,740 automatically process 2101 01:03:53,740 --> 01:03:56,964 those messages into another layer. 2102 01:03:56,964 --> 01:04:00,084 So you can see this as the, like, 2103 01:04:00,084 --> 01:04:01,855 as the logs are updating. 2104 01:04:01,855 --> 01:04:03,324 You can see in real time 2105 01:04:03,324 --> 01:04:05,485 these different functions getting kicked off. 2106 01:04:05,485 --> 01:04:09,999 And then that data, 2107 01:04:09,999 --> 01:04:11,889 which comes from a JSON file, 2108 01:04:11,889 --> 01:04:14,919 then goes to BigQuery. 2109 01:04:14,919 --> 01:04:17,740 So I'll show you this. 2110 01:04:17,740 --> 01:04:20,439 So now we have tables. 2111 01:04:20,439 --> 01:04:22,479 There's a couple of different tables 2112 01:04:22,479 --> 01:04:24,234 because we have the raw, 2113 01:04:24,234 --> 01:04:27,744 the raw data goes into a table. 2114 01:04:27,744 --> 01:04:29,409 A table where 2115 01:04:29,409 --> 01:04:32,679 just the whole JSON block is stored.
2116 01:04:32,679 --> 01:04:34,555 And then from there there's some, 2117 01:04:34,555 --> 01:04:36,759 some additional processing 2118 01:04:36,759 --> 01:04:37,989 that goes to pull out 2119 01:04:37,989 --> 01:04:41,949 this like specific passenger information. 2120 01:04:41,949 --> 01:04:45,520 So this file 2121 01:04:45,520 --> 01:04:47,230 is not really a database. 2122 01:04:47,230 --> 01:04:48,850 You're having to extract 2123 01:04:48,850 --> 01:04:49,944 it knowing 2124 01:04:49,944 --> 01:04:51,595 something about its structure, not, 2125 01:04:51,595 --> 01:04:53,320 not in the classic database sort 2126 01:04:53,320 --> 01:04:55,120 of way. Exactly, well, 2127 01:04:55,120 --> 01:04:56,620 this is, and this is, we give this 2128 01:04:56,620 --> 01:04:57,340 demo because this is 2129 01:04:57,340 --> 01:04:58,210 a really common thing. 2130 01:04:58,210 --> 01:05:00,460 I mean, a lot of times, depending on, 2131 01:05:00,460 --> 01:05:01,359 you know, how technical 2132 01:05:01,359 --> 01:05:02,859 your organization is, 2133 01:05:02,859 --> 01:05:04,435 a lot of times your data 2134 01:05:04,435 --> 01:05:06,220 starts in a database 2135 01:05:06,220 --> 01:05:07,539 and then you do transformations there. 2136 01:05:07,539 --> 01:05:09,324 But it's still super common 2137 01:05:09,324 --> 01:05:11,664 that you might be working with file data. 2138 01:05:11,664 --> 01:05:13,284 So this is, like, 2139 01:05:13,284 --> 01:05:16,164 starting from data that's not in a database, 2140 01:05:16,164 --> 01:05:17,379 just in a file, but it 2141 01:05:17,379 --> 01:05:19,255 is in a structured format. 2142 01:05:19,255 --> 01:05:22,315 And then reading, reading that data, 2143 01:05:22,315 --> 01:05:24,654 reading that data out of a file. 2144 01:05:24,654 --> 01:05:27,145 The messaging layer is important in this 2145 01:05:27,145 --> 01:05:29,649 because this is a little bit of a toy model.
2146 01:05:29,649 --> 01:05:30,700 I mean, this is only like 2147 01:05:30,700 --> 01:05:32,184 a few thousand records, 2148 01:05:32,184 --> 01:05:34,179 but it's not a toy model 2149 01:05:34,179 --> 01:05:35,379 in the sense that this is 2150 01:05:35,379 --> 01:05:37,540 built with an architecture 2151 01:05:37,540 --> 01:05:39,444 that is designed to be scalable. 2152 01:05:39,444 --> 01:05:41,079 So it's like even though this file 2153 01:05:41,079 --> 01:05:42,820 that I'm uploading only has like, 2154 01:05:42,820 --> 01:05:43,299 I don't know, 2155 01:05:43,299 --> 01:05:45,459 100 megabytes of data or something, 2156 01:05:45,459 --> 01:05:48,880 this is built so that if you had hundreds 2157 01:05:48,880 --> 01:05:52,000 and hundreds of gigabytes or some terabytes of data, 2158 01:05:52,000 --> 01:05:53,814 that you would, that 2159 01:05:53,814 --> 01:05:56,199 the process would work exactly the same, 2160 01:05:56,199 --> 01:05:57,910 like the data ingestion 2161 01:05:57,910 --> 01:05:59,379 would happen automatically. 2162 01:05:59,379 --> 01:06:01,675 That data would go, 2163 01:06:01,675 --> 01:06:05,154 would get separated into individual messages, 2164 01:06:05,154 --> 01:06:06,429 through the messaging layer 2165 01:06:06,429 --> 01:06:07,585 and into the processing. 2166 01:06:07,585 --> 01:06:08,709 So that if you had 2167 01:06:08,709 --> 01:06:10,240 a very large amount of data, 2168 01:06:10,240 --> 01:06:12,549 it wouldn't overwhelm any one, you 2169 01:06:12,549 --> 01:06:13,600 wouldn't have a bottleneck at 2170 01:06:13,600 --> 01:06:14,969 any one of these particular steps. 2171 01:06:14,969 --> 01:06:16,030 It would take longer 2172 01:06:16,030 --> 01:06:18,115 to ingest and process that data, 2173 01:06:18,115 --> 01:06:20,440 but this architecture would 2174 01:06:20,440 --> 01:06:21,625 be able to handle that.
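One way to picture why separating a file into individual messages keeps any single step from being overwhelmed is to stream records and hand them on in bounded batches, so the work in flight stays the same size no matter how large the input is. The sketch below is an illustration under assumptions, not the demo's implementation. It assumes newline-delimited JSON input, and the helper names `iter_messages` and `batched` are hypothetical.

```python
import json
from itertools import islice


def iter_messages(lines):
    """Yield one encoded message per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            json.loads(line)  # fail early on a malformed record
            yield line.encode("utf-8")


def batched(messages, size):
    """Group a stream of messages into lists of at most `size` items."""
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Because both helpers are generators, a file object can be passed as `lines` and only one batch is held in memory at a time. A larger file takes longer, but each step sees the same bounded amount of work.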
2175 01:06:21,625 --> 01:06:23,980 So this is a little bit scaled down, 2176 01:06:23,980 --> 01:06:26,139 but this is, this is a very typical example 2177 01:06:26,139 --> 01:06:27,790 of what data engineers do all the time, 2178 01:06:27,790 --> 01:06:30,594 which is read data from one or more sources, 2179 01:06:30,594 --> 01:06:33,114 handle the messaging, 2180 01:06:33,114 --> 01:06:34,540 deal with the processing, which is 2181 01:06:34,540 --> 01:06:37,915 often a multistage process on its own, 2182 01:06:37,915 --> 01:06:40,150 and deliver that to some, you know, 2183 01:06:40,150 --> 01:06:41,379 some database, 2184 01:06:41,379 --> 01:06:43,569 or, usually, to some kind of 2185 01:06:43,569 --> 01:06:45,580 database that's more of 2186 01:06:45,580 --> 01:06:47,289 an analytical database where 2187 01:06:47,289 --> 01:06:49,419 the data scientists and data analysts 2188 01:06:49,419 --> 01:06:52,090 have their Tableau or their Power BI, 2189 01:06:52,090 --> 01:06:55,299 or the data scientists 2190 01:06:55,299 --> 01:06:56,380 might be working with the 2191 01:06:56,380 --> 01:06:57,835 data in a little more raw form, 2192 01:06:57,835 --> 01:06:58,689 like they might be using 2193 01:06:58,689 --> 01:07:00,399 pandas in a notebook. 2194 01:07:00,399 --> 01:07:02,619 BigQuery also is a great tool 2195 01:07:02,619 --> 01:07:04,524 because Google BigQuery has 2196 01:07:04,524 --> 01:07:05,620 BigQuery ML, which is like 2197 01:07:05,620 --> 01:07:09,130 a built-in machine learning system, 2198 01:07:09,130 --> 01:07:10,780 so you can basically set up 2199 01:07:10,780 --> 01:07:11,949 machine learning models with 2200 01:07:11,949 --> 01:07:13,600 SQL-style queries and not 2201 01:07:13,600 --> 01:07:15,310 have to then go into some other tool like 2202 01:07:15,310 --> 01:07:16,674 a Jupyter Notebook to 2203 01:07:16,674 --> 01:07:18,940 like process and train your models.
2204 01:07:18,940 --> 01:07:21,100 So yeah, so anyway, this is kind of a, 2205 01:07:21,100 --> 01:07:21,909 this is kind of 2206 01:07:21,909 --> 01:07:23,350 just a demo of something, you know, 2207 01:07:23,350 --> 01:07:24,400 a common data engineering 2208 01:07:24,400 --> 01:07:25,810 pipeline. I showed you 2209 01:07:25,810 --> 01:07:27,249 the cloud component because this is 2210 01:07:27,249 --> 01:07:29,379 like mostly, I mean, 2211 01:07:29,379 --> 01:07:29,739 there are still 2212 01:07:29,739 --> 01:07:31,180 organizations that are dealing with 2213 01:07:31,180 --> 01:07:33,909 a lot of on-premises data and migrating it. 2214 01:07:33,909 --> 01:07:35,920 But, but most organizations now 2215 01:07:35,920 --> 01:07:37,029 are aware of the importance 2216 01:07:37,029 --> 01:07:37,854 of the Cloud, and a lot, 2217 01:07:37,854 --> 01:07:39,700 a lot of this type of data engineering work 2218 01:07:39,700 --> 01:07:42,459 involves the Cloud tools specifically. 2219 01:07:42,459 --> 01:07:43,360 So that's why I 2220 01:07:43,360 --> 01:07:45,410 showed that component of it. 2221 01:07:46,260 --> 01:07:48,295 Okay. 2222 01:07:48,295 --> 01:07:49,749 Is that enough? 2223 01:07:49,749 --> 01:07:50,079 Yeah. 2224 01:07:50,079 --> 01:07:51,880 I mean, I didn't actually see 2225 01:07:51,880 --> 01:07:53,995 the final outcome, what you said. 2226 01:07:53,995 --> 01:07:55,419 Sorry, the final outcome is that 2227 01:07:55,419 --> 01:07:57,130 the data lands in BigQuery. 2228 01:07:57,130 --> 01:07:58,960 And so then we get the data into this. 2229 01:07:58,960 --> 01:08:01,435 Actually, probably in the, 2230 01:08:01,435 --> 01:08:02,800 like in the real world this would 2231 01:08:02,800 --> 01:08:04,254 probably not be the final outcome 2232 01:08:04,254 --> 01:08:04,870 because then you would 2233 01:08:04,870 --> 01:08:07,119 have data scientists or analysts.
2234 01:08:07,119 --> 01:08:08,229 They would have their own tools, 2235 01:08:08,229 --> 01:08:09,205 and you would, 2236 01:08:09,205 --> 01:08:10,119 you would talk to them 2237 01:08:10,119 --> 01:08:11,140 and make sure they know how to 2238 01:08:11,140 --> 01:08:12,280 connect to the BigQuery 2239 01:08:12,280 --> 01:08:13,570 database to get their data. 2240 01:08:13,570 --> 01:08:16,029 But then the data is in a database in 2241 01:08:16,029 --> 01:08:17,860 a structured form that somebody can easily 2242 01:08:17,860 --> 01:08:19,959 pull and use in their model. 2243 01:08:19,959 --> 01:08:21,685 But then what you're saying is this 2244 01:08:21,685 --> 01:08:25,329 features the data engineer's part of the job 2245 01:08:25,329 --> 01:08:27,189 more than it features, like, what 2246 01:08:27,189 --> 01:08:29,139 the data scientists can do with that. Exactly. 2247 01:08:29,139 --> 01:08:30,594 This is the part that like 2248 01:08:30,594 --> 01:08:32,050 this is the part that a lot of 2249 01:08:32,050 --> 01:08:33,310 people have less visibility 2250 01:08:33,310 --> 01:08:35,170 into because, you know, 2251 01:08:35,170 --> 01:08:36,789 as a data scientist, you usually just, 2252 01:08:36,789 --> 01:08:38,154 there's some data that's already, 2253 01:08:38,154 --> 01:08:39,310 even if it needs some cleaning, 2254 01:08:39,310 --> 01:08:40,704 there's some data that's 2255 01:08:40,704 --> 01:08:42,774 already there to start with. 2256 01:08:42,774 --> 01:08:44,499 And this is like, this is 2257 01:08:44,499 --> 01:08:45,415 a good kind of 2258 01:08:45,415 --> 01:08:46,900 demonstration of the process of 2259 01:08:46,900 --> 01:08:48,039 how you might have 2260 01:08:48,039 --> 01:08:50,229 some raw data that's coming in in 2261 01:08:50,229 --> 01:08:52,119 a form that's like nowhere near being 2262 01:08:52,119 --> 01:08:54,310 able to be used for analysis yet.
2263 01:08:54,310 --> 01:08:56,200 And then there's some different processing 2264 01:08:56,200 --> 01:08:57,790 required to get it to that form. 2265 01:08:57,790 --> 01:08:58,735 So yeah. 2266 01:08:58,735 --> 01:08:59,349 Thanks. 2267 01:08:59,349 --> 01:09:01,075 And I'm going to turn the recorder off just 2268 01:09:01,075 --> 01:09:03,949 to keep us from getting too long.