[00:00:00.000 --> 00:00:10.000] [MUSIC]
[00:00:10.000 --> 00:00:12.000] Welcome back to Knowledge Distillation,
[00:00:12.000 --> 00:00:16.440] where we explore how AI is reshaping the role of the data analyst.
[00:00:16.440 --> 00:00:20.880] I'm your host, Katrin Ribant, CEO and founder of Ask-Y.
[00:00:20.880 --> 00:00:25.240] My guest today is someone I've known for 15 years,
[00:00:25.240 --> 00:00:29.240] back when we were both riding the first wave of what would become the data
[00:00:29.240 --> 00:00:30.800] revolution.
[00:00:30.800 --> 00:00:35.160] Mike Driscoll is the co-founder and CEO of Rill Data, but
[00:00:35.160 --> 00:00:39.520] before that Mike founded MetaMarkets in 2010,
[00:00:39.520 --> 00:00:43.560] where he co-invented Apache Druid, one of the most important
[00:00:43.560 --> 00:00:46.720] real-time analytics databases of the last decade.
[00:00:46.720 --> 00:00:48.320] And we'll talk about that.
[00:00:48.320 --> 00:00:52.640] Mike, you're also a founding partner of Data Collective,
[00:00:52.640 --> 00:00:57.680] a VC fund that raised over a billion dollars to fund data ventures.
[00:00:57.680 --> 00:01:03.680] And before that, when we met in 2010, you founded Dataspora,
[00:01:03.680 --> 00:01:08.320] a data science consultancy back when data scientist was all the rage.
[00:01:08.320 --> 00:01:12.520] I mean, technically before data scientist was all the rage.
[00:01:12.520 --> 00:01:17.160] I should also mention that you hold a PhD in bioinformatics.
[00:01:17.160 --> 00:01:19.720] So, I have to start with that actually.
[00:01:19.720 --> 00:01:21.120] Can you tell us a bit about that?
[00:01:21.120 --> 00:01:22.960] What is bioinformatics?
[00:01:22.960 --> 00:01:26.800] And how did you end up deciding to have an entrepreneurial career and
[00:01:26.800 --> 00:01:28.720] a VC career, both in analytics?
[00:01:28.720 --> 00:01:37.400] Well, actually, there's one company that I started before I did my PhD.
[00:01:37.400 --> 00:01:40.440] And it was a retailer,
[00:01:40.440 --> 00:01:45.400] one of the early online retailers called CustomInk.com.
[00:01:45.400 --> 00:01:47.040] It's a t-shirt business.
[00:01:47.040 --> 00:01:52.120] And I think in some ways the way I ended up doing a PhD in bioinformatics is
[00:01:52.120 --> 00:02:00.080] that I had gone very deep into custom apparel as an entrepreneur.
[00:02:00.080 --> 00:02:05.320] And almost a year into that, I realized that if I continued on this path,
[00:02:05.320 --> 00:02:09.440] I would be kind of in the t-shirt world potentially for decades.
[00:02:09.440 --> 00:02:12.040] And I realized that I didn't just want to be an entrepreneur.
[00:02:12.040 --> 00:02:14.320] I wanted to... I loved data.
[00:02:14.320 --> 00:02:16.960] I loved computer science.
[00:02:16.960 --> 00:02:19.520] I was sort of a frustrated computer scientist.
[00:02:19.520 --> 00:02:22.640] I'd never really been technical as an undergraduate.
[00:02:22.640 --> 00:02:31.240] So I sold that business to my college roommate who went on to run it for 25 years.
[00:02:31.240 --> 00:02:35.440] If anyone's ever bought t-shirts online, you probably have heard of CustomInk.
[00:02:35.440 --> 00:02:38.320] And I left for grad school.
[00:02:38.320 --> 00:02:40.480] And so what is bioinformatics?
[00:02:40.480 --> 00:02:43.320] It's really computer science and biology.
[00:02:43.320 --> 00:02:45.920] Some people call it computational biology.
[00:02:45.920 --> 00:02:51.160] But the way I got into it is I was a self-taught programmer and was working
[00:02:51.160 --> 00:02:56.400] for the Human Genome Project as a developer.
[00:02:56.400 --> 00:02:58.480] And it was just fascinating to me.
[00:02:58.480 --> 00:02:59.520] I've always loved data.
[00:02:59.520 --> 00:03:04.520] And the origins of big data, actually, that term really
[00:03:04.520 --> 00:03:06.040] comes from the life sciences.
[00:03:06.040 --> 00:03:14.920] It was this recognition that our DNA is this incredibly rich trove of information.
[00:03:14.920 --> 00:03:20.560] And so at that point, this is the late 1990s, early 2000s,
[00:03:20.560 --> 00:03:22.760] we were sequencing the human genome.
[00:03:22.760 --> 00:03:26.160] We'd started with Drosophila, which is the fruit fly.
[00:03:26.160 --> 00:03:28.960] And we were doing some really interesting techniques
[00:03:28.960 --> 00:03:31.160] to sequence the human genome.
[00:03:31.160 --> 00:03:33.880] And I found that absolutely fascinating.
[00:03:33.880 --> 00:03:38.960] And I decided rather than become a t-shirt entrepreneur for the rest of my life,
[00:03:38.960 --> 00:03:44.880] I actually went to that PhD program thinking I would become a biotech entrepreneur.
[00:03:44.880 --> 00:03:51.360] Of course, as the saying goes, man plans and God laughs.
[00:03:51.360 --> 00:03:53.160] So I had a plan.
[00:03:53.160 --> 00:03:54.160] Yes.
[00:03:54.160 --> 00:03:56.120] I ended up going in a different direction.
[00:03:56.120 --> 00:03:58.320] But yeah, that's how I ended up in computational biology,
[00:03:58.320 --> 00:04:04.480] in bioinformatics: leaving e-commerce and just being inspired
[00:04:04.480 --> 00:04:07.160] by the data in the life sciences.
[00:04:07.160 --> 00:04:13.520] So Mike, you and I met at the onset of the rise of the data scientists around 2010.
[00:04:13.520 --> 00:04:17.440] This was before Harvard Business Review declared it the sexiest job
[00:04:17.440 --> 00:04:21.200] of the 21st century in 2012.
[00:04:21.200 --> 00:04:25.880] And when everyone was scrambling to figure out really,
[00:04:25.880 --> 00:04:30.280] what it actually meant. I remember those debates at the time.
[00:04:30.280 --> 00:04:32.680] This is not about that debate, by the way.
[00:04:32.680 --> 00:04:33.440] Yes.
[00:04:33.440 --> 00:04:39.040] At the time, I was migrating Havas's data platform from Oracle, basically,
[00:04:39.040 --> 00:04:41.280] to an MPP architecture.
[00:04:41.280 --> 00:04:45.640] And I was in the process of selecting a vendor amongst the MPP sort of darlings
[00:04:45.640 --> 00:04:46.840] of the time.
[00:04:46.840 --> 00:04:51.720] My POC included Netezza, Vertica, Greenplum, and a few others.
[00:04:51.720 --> 00:04:56.280] And the founder of Greenplum, Scott Yara (I ended up selecting
[00:04:56.280 --> 00:05:00.320] Greenplum, actually), introduced us during the workshop.
[00:05:00.320 --> 00:05:04.160] I remember you were really in the trenches solving the hardest technical
[00:05:04.160 --> 00:05:07.840] problems in high scale real time analytics.
[00:05:07.840 --> 00:05:12.600] So tell us about the life of an analyst in 2010, like the issues with managing
[00:05:12.600 --> 00:05:17.280] queries at scale and what using AI/ML really looked like at that point.
[00:05:17.280 --> 00:05:17.780] Sure.
[00:05:17.780 --> 00:05:24.480] I mean, it's incredible sometimes to look back and--
[00:05:24.480 --> 00:05:25.920] It's only 15 years.
[00:05:25.920 --> 00:05:26.420] Right.
[00:05:26.420 --> 00:05:30.360] And how many cycles of Moore's law have we had, right?
[00:05:30.360 --> 00:05:34.560] 10 cycles of Moore's law in 15 years.
[00:05:34.560 --> 00:05:39.920] And so at the time, the data seemed enormous.
[00:05:39.920 --> 00:05:43.480] I distinctly remember being in New York when we met.
[00:05:43.480 --> 00:05:49.980] And I was brought in as a consultant to the Greenplum team as their kind of data
[00:05:49.980 --> 00:05:51.080] scientist for hire.
[00:05:51.080 --> 00:05:58.080] Gosh, everything back then was new.
[00:05:58.080 --> 00:06:02.880] AWS was nascent.
[00:06:02.880 --> 00:06:09.320] The idea of massively parallel processing (MPP) databases was new.
[00:06:09.320 --> 00:06:12.280] Greenplum was based on Postgres.
[00:06:12.280 --> 00:06:14.040] It was a distributed Postgres engine.
[00:06:14.040 --> 00:06:19.200] Postgres, of course, continues to thrive today.
[00:06:19.200 --> 00:06:22.560] I mean, in some ways, things change, but they stay the same.
[00:06:22.560 --> 00:06:27.040] I think one of the things that has always been a challenge in the world when
[00:06:27.040 --> 00:06:32.800] you're a data analyst is really the data transformation and the data preparation.
[00:06:32.800 --> 00:06:36.800] So what I recall from those early days working with Greenplum, and I think
[00:06:36.800 --> 00:06:42.920] Havas was one of the clients we worked with, boy, moving data out of Oracle,
[00:06:42.920 --> 00:06:48.360] for instance, into another system was always a huge headache.
[00:06:48.360 --> 00:06:50.960] I remember working with one of Scott's colleagues.
[00:06:50.960 --> 00:06:58.520] And so data movement, orchestration, ETL was often 80% of the effort.
[00:06:58.520 --> 00:07:02.600] That project cost me a few years of my life, for sure.
[00:07:02.600 --> 00:07:06.240] My first gray hairs came in at that point.
[00:07:06.240 --> 00:07:09.280] But I also recall, again, similar trends.
[00:07:09.280 --> 00:07:15.400] One of the very common themes we saw was that while we could design analytics
[00:07:15.400 --> 00:07:19.520] and run them, and back then I was very heavily involved in the R community,
[00:07:19.520 --> 00:07:24.720] we were using the open source R programming language for doing statistical
[00:07:24.720 --> 00:07:28.640] analysis, which was very popular among the data scientists at the time.
[00:07:28.640 --> 00:07:32.120] While we could get something running on a laptop with a sample of data,
[00:07:32.120 --> 00:07:37.040] again, scaling up that analysis and having it run in a system like Greenplum
[00:07:37.040 --> 00:07:39.280] was also a major hurdle.
[00:07:39.280 --> 00:07:44.240] And so what I recall, again, working with Havas was that just translating
[00:07:44.240 --> 00:07:49.040] those models from something you could get to work on a small sample, an hour or a day
[00:07:49.040 --> 00:07:56.200] of data, and then migrating it to something that could work on a month or a year of data,
[00:07:56.200 --> 00:07:57.280] that was always a challenge.
[00:07:57.280 --> 00:08:00.760] We were kind of rewriting our scripts and rewriting kind of linear regressions.
[00:08:00.760 --> 00:08:03.040] We were doing sophisticated things for the time.
[00:08:03.040 --> 00:08:08.320] I mean, we had user-level data at the hit level, and we were doing
[00:08:08.320 --> 00:08:12.760] dynamic attribution modeling on it in 2010, right?
[00:08:12.760 --> 00:08:13.320] Yes.
[00:08:13.320 --> 00:08:15.200] That wasn't popular at the time.
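(To make that concrete for listeners: below is a minimal sketch of what multi-touch attribution over hit-level data can look like, in Python with pandas. The column names and the equal-credit "linear" rule are illustrative assumptions; the actual Havas models Katrin describes were dynamic and ran inside the MPP database.)

```python
# Illustrative sketch: linear multi-touch attribution over hit-level data.
# Columns are hypothetical; a 2010-era pipeline would run this logic in SQL
# inside the warehouse rather than in pandas on a laptop.
import pandas as pd

hits = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2],
    "channel":   ["display", "search", "email", "search", "display"],
    "converted": [0, 0, 1, 0, 1],  # 1 marks the conversion hit
})

# Keep only journeys that end in a conversion.
converters = hits[hits.groupby("user_id")["converted"].transform("max") == 1]

# Linear attribution: split each conversion's credit equally across
# every touchpoint in that user's journey.
touches = converters.groupby("user_id")["channel"].transform("count")
credit = (
    converters.assign(weight=1.0 / touches)
    .groupby("channel")["weight"]
    .sum()
)
print(credit)  # fractional conversions credited to each channel
```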
[00:08:15.200 --> 00:08:20.360] Yeah, there were really no out of the box solutions, I think,
[00:08:20.360 --> 00:08:23.120] for the things we were working on.
[00:08:23.120 --> 00:08:27.520] And yeah, so it was definitely the early days.
[00:08:27.520 --> 00:08:30.000] And I think big data was not that big, right?
[00:08:30.000 --> 00:08:32.960] It really wasn't that big, no.
[00:08:32.960 --> 00:08:37.200] But back then, the challenge was simply having tools powerful enough to handle
[00:08:37.200 --> 00:08:41.120] vast amounts of data, which, you know, sure, weren't that big at the time,
[00:08:41.120 --> 00:08:43.840] but still too big for the tools, right?
[00:08:43.840 --> 00:08:50.880] Today, that problem is largely solved-ish, or at least reasonably good enough, I would say.
[00:08:50.880 --> 00:08:55.640] We have incredible databases, transformation pipelines, BI platforms.
[00:08:55.640 --> 00:08:59.960] But now we face something, I think, far more subtle and far more critical.
[00:08:59.960 --> 00:09:04.160] Maintaining context throughout the entire analytics process.
[00:09:04.160 --> 00:09:09.680] So let's have a look at that evolution and how we get to where we are today.
[00:09:09.680 --> 00:09:15.040] So do you want to talk to us a little bit about Druid and MetaMarkets?
[00:09:15.040 --> 00:09:16.560] Sure, sure.
[00:09:16.560 --> 00:09:22.600] So I think, and maybe I'll try to embed it in, I think, a larger context,
[00:09:22.600 --> 00:09:28.360] which is kind of a through line as we look back at, you know, 2010 to the present.
[00:09:28.360 --> 00:09:36.000] I think that one of the reasons why many of us who love data analysis end up
[00:09:36.000 --> 00:09:44.760] in the world of media and publishing and advertising is because that's where the data is.
[00:09:44.760 --> 00:09:47.080] Lots of data and lots of analysis.
[00:09:47.080 --> 00:09:52.000] You know, the media business was one of the first to digitally transform.
[00:09:52.000 --> 00:09:54.280] The products are digital.
[00:09:54.280 --> 00:10:02.160] And so, you know, kind of getting digital signal and exhaust from the consumption of digital media.
[00:10:02.160 --> 00:10:05.520] So it's sort of natural, a very natural thing.
[00:10:05.520 --> 00:10:13.760] I think outside of essentially IT observability, you know, the places where folks like Splunk play,
[00:10:13.760 --> 00:10:20.200] I think digital media has always really been the tip of the spear for innovation and analytics and data infrastructure.
[00:10:20.200 --> 00:10:31.560] And so when we started MetaMarkets, there was an issue, even though Hadoop was solving the problem of scale.
[00:10:31.560 --> 00:10:36.880] Hadoop had come on the scene and obviously now we've evolved to things like Spark.
[00:10:36.880 --> 00:10:41.240] But scale was something that was at least solved.
[00:10:41.240 --> 00:10:44.440] What was not solved was speed at scale.
[00:10:44.440 --> 00:10:55.920] And the vision that I had was I really wanted end users who were doing analytics and observing trends
[00:10:55.920 --> 00:11:05.240] for advertising campaigns, or trying to sort of go deep and do user-level analysis, cohort analysis,
[00:11:05.240 --> 00:11:08.520] to understand how different cohorts were performing.
[00:11:08.520 --> 00:11:12.960] All of the questions that many folks want to ask of their data,
[00:11:12.960 --> 00:11:19.520] those questions frankly would take too long to answer in an interactive manner with something like Hadoop on the back end.
[00:11:19.520 --> 00:11:25.280] And so my experience with Greenplum really inspired me.
[00:11:25.280 --> 00:11:29.840] And Greenplum was a very powerful engine for delivering speed at scale.
[00:11:29.840 --> 00:11:32.800] It was one of the first.
[00:11:32.800 --> 00:11:41.840] But it still really wasn't designed for the level of concurrency that we were looking to deliver for kind of user facing dashboards.
[00:11:41.840 --> 00:11:49.720] So our early customers at MetaMarkets were digital advertising platforms. Jim Payne's MoPub was an early customer,
[00:11:49.720 --> 00:11:52.720] ultimately acquired by Twitter.
[00:11:52.720 --> 00:11:57.520] OpenX, which is still around, was one of the early platforms.
[00:11:57.520 --> 00:12:05.560] And so we did what is often a crazy thing: we decided to roll our own database.
[00:12:05.560 --> 00:12:15.680] We had a very narrow set of use cases and requirements for powering interactive dashboards at,
[00:12:15.680 --> 00:12:20.400] let's say, multi-hundred-gigabyte scale, which at the time was a lot of data.
[00:12:20.400 --> 00:12:28.200] And so one of our engineers had an idea for an architecture that I think he always wanted to build.
[00:12:28.200 --> 00:12:36.640] And so we basically developed Druid as an in-memory columnar distributed data store.
[00:12:36.640 --> 00:12:39.840] It was a NoSQL data store. It didn't have SQL support.
[00:12:39.840 --> 00:12:48.000] And we used that for probably the next eight years until we were acquired by Snap.
[00:12:48.000 --> 00:12:57.120] Druid powered all of our interactive dashboards for all of the leading digital media platforms out there.
[00:12:57.120 --> 00:13:06.520] So that was the genesis: just trying to build something that was fast, not just scalable, but with high performance at scale.
[00:13:06.520 --> 00:13:11.720] And I remember at the time you had to be very specific about the use case that you serve because,
[00:13:11.720 --> 00:13:15.120] you know, you remember we competed a few times in pitches, right?
[00:13:15.120 --> 00:13:19.400] And we were asking each other, we were like, but why are we in the same pitch?
[00:13:19.400 --> 00:13:21.040] Right. You were at Datorama, right?
[00:13:21.040 --> 00:13:28.360] Yes. You know, and why did they put MetaMarkets and Datorama in the same pitch?
[00:13:28.360 --> 00:13:36.320] Because, I mean, MetaMarkets served one type of use case and Datorama served pretty much exactly the opposite type of use case.
[00:13:36.320 --> 00:13:39.960] Right. How is that confusing to anyone?
[00:13:39.960 --> 00:13:46.640] And I suppose that really speaks to the type of confusion there is in any emerging technology.
[00:13:46.640 --> 00:13:52.320] When something is new, the market doesn't really understand the nuances of what is what.
[00:13:52.320 --> 00:13:58.880] And you end up lumped into categories that you really don't belong to.
[00:13:58.880 --> 00:14:07.760] Right. And so then, from MetaMarkets' acquisition by Snap in 2017, I think, right?
[00:14:07.760 --> 00:14:15.120] Yes. To Rill today, how did your thinking about the analyst's role evolve?
[00:14:15.120 --> 00:14:20.000] And how did that lead you to dashboards as code?
[00:14:20.000 --> 00:14:34.360] So I think the experience of going to Snap was illuminating because I think one of the questions we always asked ourselves at MetaMarkets was,
[00:14:34.360 --> 00:14:41.440] you know, this very powerful, interactive, exploratory tool that we've built,
[00:14:41.440 --> 00:14:45.800] does it have use cases beyond digital media platforms?
[00:14:45.800 --> 00:14:52.760] That's where we had been quite successful in building tens of millions of dollars of recurring revenue in that vertical.
[00:14:52.760 --> 00:14:56.720] And we ultimately made the decision to exit the business.
[00:14:56.720 --> 00:15:06.960] But I think I always, in the back of my mind, thought, you know, could this be valuable beyond the set of vertical use cases we had defined?
[00:15:06.960 --> 00:15:12.720] I think when we got to Snap, we saw that that analytics platform was valuable.
[00:15:12.720 --> 00:15:23.600] Snap began with an Elasticsearch stack that they had built, and they migrated a lot of their analytics onto the platform that we brought in through MetaMarkets. It was involved not just in
[00:15:23.600 --> 00:15:35.040] advertising optimization, obviously, which was critical for Snap at that period, but also in experimentation and in crash analytics; at the time, they were trying to launch an Android app.
[00:15:35.040 --> 00:15:48.720] And so quite literally looking at, you know, a trillion-plus events a day coming back from all of the telemetry of their user base at scale, and trying to get an Android app built that didn't crash.
[00:15:48.720 --> 00:16:01.320] So that led me to confirm that this tech, and our philosophy, was valuable beyond the use cases we had at MetaMarkets.
[00:16:01.320 --> 00:16:09.960] And in terms of the role of the analyst: I think the hardest thing for a lot of technologies is their adoption.
[00:16:09.960 --> 00:16:15.280] And so MetaMarkets was an enterprise sales motion.
[00:16:15.280 --> 00:16:19.960] It took weeks to months to get customers on board.
[00:16:19.960 --> 00:16:30.680] And around, you know, I guess 2017, 2018, you know, I looked at some of the emerging platforms that developers were adopting.
[00:16:30.680 --> 00:16:35.840] And I think we saw things like Next.js and what ultimately became Vercel.
[00:16:35.840 --> 00:16:42.400] I saw the rise of infrastructure-as-code companies like HashiCorp doing quite well.
[00:16:42.400 --> 00:16:47.400] Grafana, ultimately Grafana Labs, I think was very inspiring.
[00:16:47.400 --> 00:16:59.360] And so I think there was this set of developer-led growth companies that made it very easy to adopt their tool.
[00:16:59.360 --> 00:17:04.480] And Druid was never that easy to adopt, even as an open source database.
[00:17:04.480 --> 00:17:17.640] And so really, I think my shift was to recognize that analysts were becoming more technical or at least there was a cohort of analysts that were becoming more technical.
[00:17:17.640 --> 00:17:20.800] I think it's really a general movement.
[00:17:20.800 --> 00:17:24.560] Analysts have become considerably more technical.
[00:17:24.560 --> 00:17:31.840] I think across the span of, you know, the analytics roles, there's definitely been a strong shift.
[00:17:31.840 --> 00:17:35.440] And so we saw people talk about this term, analytics engineer.
[00:17:35.440 --> 00:17:53.320] We saw the rise of, you know, dbt out of Fishtown Analytics, where again, you had a group of analysts who were comfortable not just writing SQL but, you know, using something like Git, with Python becoming more and more used in the analyst community.
[00:17:53.320 --> 00:18:23.280] And so what I've sort of witnessed was that, from analysts on one end of the continuum to analytics engineers to data engineers, there's been this compression, where one person could be a data engineer writing maybe an ETL pipeline, but also able to write some Python to do transformation and get their data from, you know, an object store into a database, and then even, you know, write some SQL to get a dashboard
[00:18:23.280 --> 00:18:30.120] built, whether that be a Grafana dashboard or something like Superset or Tableau or Looker.
[00:18:30.120 --> 00:18:38.120] And so that was kind of the thesis: could we build something similar, inspired by MetaMarkets?
[00:18:38.120 --> 00:18:45.520] We actually ended up spinning the core tech, the MetaMarkets technology and stack, out of Snap and rebuilding it from the ground up.
[00:18:45.520 --> 00:19:00.760] But what we added was this layer of BI-as-code, to say: can we let analysts define an entire stack, from data to dashboard, in a single GitHub repo?
[00:19:00.760 --> 00:19:04.640] And so that's the journey we've been on.
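(A rough illustration of that "data to dashboard in a single repo" idea. The structure below is a hypothetical Python sketch, not Rill's actual spec; tools in this space typically express the same thing as YAML and SQL files that get versioned in Git and deployed by CI.)

```python
# BI-as-code sketch: the whole stack, source to dashboard, declared as
# reviewable, versionable code. All field names here are invented.
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str
    uri: str  # e.g. an object-store path the pipeline reads from

@dataclass
class Dashboard:
    title: str
    source: Source
    dimensions: list[str] = field(default_factory=list)
    measures: dict[str, str] = field(default_factory=dict)  # name -> SQL expr

events = Source("campaign_events", "s3://bucket/events/*.parquet")

dashboard = Dashboard(
    title="Campaign Performance",
    source=events,
    dimensions=["campaign_id", "country", "device"],
    measures={
        "impressions": "count(*)",
        "spend": "sum(bid_price)",
        "ctr": "sum(clicks) / count(*)",
    },
)
```

Checked into a GitHub repo, a definition like this can be diffed, reviewed in a pull request, and deployed by CI, which is the point of the approach.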
[00:19:04.640 --> 00:19:13.560] And then there was an observation. Druid was a great first-generation analytics database.
[00:19:13.560 --> 00:19:34.600] But I think there's a new class of real-time analytical databases, like ClickHouse and DuckDB and StarRocks and Pinot, that actually deliver the performance at scale that we felt was necessary for the kinds of real-time data applications we want to deliver.
[00:19:34.600 --> 00:19:45.600] So, so yeah, I think the evolution was just thinking that analysts are much more capable than they were, you know, 10 or 15 years ago.
[00:19:45.600 --> 00:19:56.360] And could we lean into that and lean into this sort of code first trend that we were observing having success in a lot of other areas of software?
[00:19:56.360 --> 00:20:07.360] It's fascinating, right? Because I so agree with you about the shift to the left of the skill set of the average analyst, across the board from data engineer to business analyst.
[00:20:07.360 --> 00:20:10.840] Right. That really has happened.
[00:20:10.840 --> 00:20:21.600] I think that that is reinforced by the rise of the AI analysts, because having code generated for you really does help that shift to the left.
[00:20:21.600 --> 00:20:22.040] Yes.
[00:20:22.040 --> 00:20:43.200] I also do think that LLMs help the shift to the right, because you have the ability to bring in more business context and to have, you know, better visualization skills, better storytelling skills, with the help of, you know, Claude or whichever model you like to use.
[00:20:43.200 --> 00:20:46.520] In my case, it's Claude, who's my favorite for that.
[00:20:46.520 --> 00:20:52.200] But it really does help a shift to the right as well.
[00:20:52.200 --> 00:21:02.760] And so our thesis is that the rise of the AI analysts basically powers a full stack shift of the skill set.
[00:21:02.760 --> 00:21:17.200] Not that everybody will become a specialist in everything, but I think that your general skills as an analyst become the foundation of you being able to leverage additional help to expand your skill set.
[00:21:17.200 --> 00:21:17.480] Right.
[00:21:17.480 --> 00:21:22.520] And hence fight commoditization of your jobs, etc.
[00:21:22.520 --> 00:21:26.640] I think that happened to software engineers first.
[00:21:26.640 --> 00:21:28.440] We've seen, we've observed that.
[00:21:28.440 --> 00:21:28.640] Right.
[00:21:28.640 --> 00:21:33.960] I mean, we've got the rise of the AI engineer that we've been able to observe.
[00:21:33.960 --> 00:21:51.560] I think that also fuels a lot of confusion in the market, because when I look at AI analyst jobs currently on the market, there is this big confusion: we want a software engineer who will set up infrastructure, but they should also be a data scientist.
[00:21:51.560 --> 00:21:52.000] Right.
[00:21:52.000 --> 00:21:57.840] And they should also be a data scientist who has built LLMs, preferably.
[00:21:57.840 --> 00:21:58.640] Right.
[00:21:58.640 --> 00:22:01.440] Or been close to somebody who built LLMs.
[00:22:01.440 --> 00:22:03.160] Maybe that counts.
[00:22:03.160 --> 00:22:10.720] But they should also be able to do like just data analysis and present dashboards to stakeholders.
[00:22:10.720 --> 00:22:11.640] Right.
[00:22:11.640 --> 00:22:17.280] I mean, it's kind of really funny in a way when you look at those job descriptions.
[00:22:17.280 --> 00:22:18.880] It's also really not funny.
[00:22:18.880 --> 00:22:24.200] Again, it speaks to the confusion in the market at the beginning of any big trend.
[00:22:24.200 --> 00:22:28.720] Do you see some of that with your clients, with the people that you work with?
[00:22:28.720 --> 00:22:32.440] Like you work with a very technical user, right?
[00:22:32.440 --> 00:22:33.440] What do you see?
[00:22:33.440 --> 00:22:55.480] So first, I think you put your finger on it when you said we're kind of in the early chapters of this AI revolution that's changing every aspect of both software development and business processes.
[00:22:55.480 --> 00:23:19.400] I think we still effectively have two stakeholders that we work with, and that I think many companies work with: you have developers that are implementing tools, and analysts, to some extent, are increasingly moving towards that sort of developer persona.
[00:23:19.400 --> 00:23:29.880] And then you have business users who are using the tools that the AI analyst or AI engineer kind of sets up for them.
[00:23:29.880 --> 00:23:39.040] I think that there's a gap, obviously, that's always been there where the business users often have the domain expertise.
[00:23:39.040 --> 00:23:40.320] They have the business questions.
[00:23:40.320 --> 00:23:51.400] They have that sort of tacit knowledge of what metrics matter and what questions to ask.
[00:23:51.400 --> 00:23:59.600] I think the opportunity, you talk about sort of bridging this gap, this gap has always existed and been a huge issue.
[00:23:59.600 --> 00:24:10.920] I think if you think about the way this analytics supply chain has traditionally worked, it has been, I mean, frankly, broken for years, right?
[00:24:10.920 --> 00:24:19.640] You have this idea of a business user, like the chief marketing officer, who says, hey, why are conversions down today?
[00:24:19.640 --> 00:24:22.600] And they kind of throw that question over the wall.
[00:24:22.600 --> 00:24:32.720] Maybe they write a JIRA ticket and there's a data team that picks it up and maybe they build a dashboard and that dashboard gets sent back to the CMO.
[00:24:32.720 --> 00:24:37.040] And then, of course, it's never the end of the discussion.
[00:24:37.040 --> 00:24:39.560] The CMO says, I want to know why
[00:24:39.560 --> 00:24:47.880] in Phoenix we saw this particular drop, or why in APAC Android devices were not converting as well.
[00:24:47.880 --> 00:24:50.800] There's always a question that follows every question.
[00:24:50.800 --> 00:25:00.280] And that ping pong ball of JIRA tickets or that back and forth today is just so slow.
[00:25:00.280 --> 00:25:20.240] And so I think the promise and the excitement of AI-powered analytics is really to maybe bridge this gap, where instead of humans on the data team
[00:25:20.240 --> 00:25:28.520] getting that ticket when they wake up in the morning and then building a pipeline and building a dashboard, we are probably going to see something different.
[00:25:28.520 --> 00:25:34.000] And by the way, I think the employment numbers already sort of back this up.
[00:25:34.000 --> 00:25:40.720] I do think that a lot of data teams are going to drastically reduce their size.
[00:25:40.720 --> 00:25:52.440] We are going to be replacing data teams with analyst agents. Instead of that JIRA ticket waiting to be picked up and taking a few hours,
[00:25:52.440 --> 00:26:02.640] I think we now have the opportunity for the CMO to get answers from a conversational prompt about why conversions are down in APAC for Android devices.
[00:26:02.640 --> 00:26:07.120] That answer could come back maybe in minutes.
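(For illustration, the kind of query an analyst agent might generate behind that conversational prompt: a week-over-week conversion comparison sliced by region and device, so the APAC/Android drop surfaces first. The data and names are invented; DuckDB stands in for whatever engine the agent can reach.)

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE conversions AS
    SELECT * FROM (VALUES
        ('APAC', 'android', DATE '2025-02-01', 0.031),
        ('APAC', 'android', DATE '2025-02-08', 0.019),
        ('APAC', 'ios',     DATE '2025-02-01', 0.042),
        ('APAC', 'ios',     DATE '2025-02-08', 0.041),
        ('EMEA', 'android', DATE '2025-02-01', 0.028),
        ('EMEA', 'android', DATE '2025-02-08', 0.027)
    ) AS t(region, device, week, cr)
""")

-- the biggest drop sorts first, which is exactly the slice the CMO wants
print(con.execute("""
    SELECT region, device,
           max(cr) FILTER (WHERE week = DATE '2025-02-08')
         - max(cr) FILTER (WHERE week = DATE '2025-02-01') AS wow_delta
    FROM conversions
    GROUP BY region, device
    ORDER BY wow_delta
""").df())
```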
[00:26:07.120 --> 00:26:11.200] And so then the question is, what's the role of the analyst in this new world?
[00:26:11.200 --> 00:26:15.680] I think the analyst is no longer the person doing the fishing.
[00:26:15.680 --> 00:26:23.640] You know, the old saying: give a person a fish and they eat for a day; teach them to fish and they eat for a lifetime.
[00:26:23.640 --> 00:26:33.840] I think that the real role of these AI engineers, AI analysts is to set up the systems that the business users interact with.
[00:26:33.840 --> 00:26:37.280] And so data teams are not giving answers.
[00:26:37.280 --> 00:26:40.400] They're building tool chains.
[00:26:40.400 --> 00:26:46.040] They're creating infrastructure that the business can use.
[00:26:46.040 --> 00:26:49.120] And so what does that infrastructure look like?
[00:26:49.120 --> 00:27:00.400] I've got some theories about what that infrastructure looks like; you and I are both building tools that I think this next generation of teams should stand up.
[00:27:00.400 --> 00:27:14.920] But I think there is promise finally for companies to go beyond dashboards and go beyond kind of JIRA tickets and data teams and this sort of almost sclerotic,
[00:27:14.920 --> 00:27:28.520] slow, bureaucratic way of being data driven, and replace it with something much more fluid and natural in terms of how business users interface
[00:27:28.520 --> 00:27:34.160] with the most critical data and metrics in their business.
[00:27:34.160 --> 00:27:36.080] I think we're on the cusp of that.
[00:27:36.080 --> 00:27:48.920] And that's really the role of the analyst, I think, is to set up these internal tools, which will streamline for the first time self-serve analytics inside of businesses.
[00:27:48.920 --> 00:27:53.640] So you and I are both on the tool building side of that revolution, right?
[00:27:53.640 --> 00:28:04.160] But for the AI analysts that are listening: in your point of view, with the types of users that you're obviously interacting with,
[00:28:04.160 --> 00:28:07.160] but also, you know, with a very long career behind you...
[00:28:07.160 --> 00:28:08.560] You've hired a lot of people.
[00:28:08.560 --> 00:28:18.360] You've seen a lot of different eras and different evolutions, different crazes for different languages and different technologies and all of that stuff.
[00:28:18.360 --> 00:28:21.840] We've all seen quite a bit of that.
[00:28:21.840 --> 00:28:36.240] If you think about being a data or digital analyst today, shifting to become an AI analyst, upskilling to get the best out of this new technology:
[00:28:36.240 --> 00:28:42.360] what would you say would be the areas that they should focus on in terms of upskilling?
[00:28:42.360 --> 00:28:45.600] Is it prompt engineering?
[00:28:45.600 --> 00:28:49.080] Is it? What is it?
[00:28:49.080 --> 00:28:57.040] I am a strong believer that data engineering remains an absolutely critical skill for...
[00:28:57.040 --> 00:28:58.200] I agree.
[00:28:58.200 --> 00:28:59.400] I agree.
[00:28:59.400 --> 00:29:08.080] And it's a superpower for analysts. I'll tell a story harkening back to the early days of big data.
[00:29:08.080 --> 00:29:27.960] One of the signature events of that period was when Netflix open sourced a portion of their ratings data and initiated what was called the Netflix Prize.
[00:29:27.960 --> 00:29:48.280] They were going to award, I think it was a million dollars, to the machine learning scientist who could best predict which movies Netflix users were likely to watch based on their historical rating patterns.
[00:29:48.280 --> 00:29:55.800] And it was a great corpus of real data to test this idea that you could do predictive analytics at scale.
[00:29:55.800 --> 00:30:12.520] And Netflix, being smart, said, yeah, we'll open source this and we'll have a Netflix Prize where, you know, I think every week they released a new data set and folks would try to predict what Netflix users were watching.
[00:30:12.520 --> 00:30:21.680] And I remember talking to some of the prize contestants; one of them was a PhD student at Berkeley.
[00:30:21.680 --> 00:30:29.560] He was in the statistics department, and so many statisticians would come to him and say, oh, David, you should try this algorithm.
[00:30:29.560 --> 00:30:31.560] This algorithm would be a lot more powerful.
[00:30:31.560 --> 00:30:33.800] And he would say, sure, well, you could try it yourself.
[00:30:33.800 --> 00:30:43.920] But they had no ability to do so much of the data prep work: getting the data from Netflix, putting it into a matrix that would fit in memory.
[00:30:43.920 --> 00:30:48.600] There was just a lot of work to prepare and get that data amenable for analysis.
[00:30:48.600 --> 00:30:50.080] That was ETL work.
[00:30:50.080 --> 00:30:55.080] And the vast majority of people who wanted to compete in the Netflix Prize didn't have any data engineering skills.
[00:30:55.080 --> 00:30:58.960] They needed that data to be put on a silver platter for them.
[00:30:58.960 --> 00:31:01.280] And that's a huge crutch for folks.
[00:31:01.280 --> 00:31:20.040] And so I think, especially in the era of, you know, Claude and Copilot and, you know, now we've got Codex, I think that if an analyst is able not just to build, you know, a metrics layer or semantic layer or,
[00:31:20.040 --> 00:31:37.840] you know, build dashboards or visualizations or prompts, but to go down the stack and actually be able to orchestrate raw logs of data, get close to the source and actually get that data all the way through the stack to business users.
[00:31:37.840 --> 00:31:39.160] I think that's a superpower.
[00:31:39.160 --> 00:31:41.800] And that does require data engineering skill.
[00:31:41.800 --> 00:31:43.680] It's getting easier.
[00:31:43.680 --> 00:31:58.120] Again, the AI agents can help you write transformation code to extract from a data lake or, you know, object storage and get it into a database like ClickHouse or Redshift or Snowflake as you desire.
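(A hedged sketch of that kind of agent-assisted, last-mile transformation, using DuckDB's Python API. The schema is invented for illustration; in production the raw events would typically be read straight from object storage, for example with read_parquet over an s3:// path and the httpfs extension loaded.)

```python
import duckdb

con = duckdb.connect("analytics.db")

# Stand-in for raw events that would normally land in a data lake.
con.execute("""
    CREATE OR REPLACE TABLE raw_events AS
    SELECT * FROM (VALUES
        (101, TIMESTAMP '2025-02-01 10:00', 'US', 'ios',     'conversion', 19.99),
        (102, TIMESTAMP '2025-02-01 10:05', 'SG', 'android', 'pageview',    0.00),
        (101, TIMESTAMP '2025-02-02 09:30', 'US', 'ios',     'conversion',  9.99)
    ) AS t(user_id, event_time, country, device, event_type, revenue)
""")

# Last-mile transformation: keep only conversions, typed and tidy,
# ready for a metrics layer or dashboard to sit on top of.
con.execute("""
    CREATE OR REPLACE TABLE conversions AS
    SELECT user_id,
           CAST(event_time AS DATE) AS event_date,
           country, device, revenue
    FROM raw_events
    WHERE event_type = 'conversion'
""")
print(con.execute("SELECT count(*) FROM conversions").fetchone())
```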
[00:31:58.120 --> 00:32:08.840] So I think data engineering is critical, because what you actually have the opportunity to make come true there is the one-human data team.
[00:32:08.840 --> 00:32:14.960] I think people talk about the one-person, you know, billion-dollar company, which I think we're on the cusp of seeing.
[00:32:14.960 --> 00:32:18.360] But I think AI provides leverage.
[00:32:18.360 --> 00:32:29.720] And so I think we're on the cusp of seeing one-person data teams that can really span the gamut from data engineer to analyst.
[00:32:29.720 --> 00:32:30.800] And I've seen it actually.
[00:32:30.800 --> 00:32:39.600] I've seen some of our users at Rill, CTOs and VPs of engineering, who on the one hand have business context because they're in the meetings with the CMO and the CFO.
[00:32:39.600 --> 00:32:42.720] And on the other hand, they know where the data lives.
[00:32:42.720 --> 00:32:43.960] They've got the credentials.
[00:32:43.960 --> 00:32:47.520] They know where their Cloudflare logs are located.
[00:32:47.520 --> 00:33:13.680] And they are capable, in a couple of hours, of essentially getting to the bottom of really mission-critical business questions, of building a tool that can answer key business questions, without having to bring in a team of folks and, frankly, without having to bring, you know, 17 SaaS vendors of the modern data stack into the picture.
[00:33:13.680 --> 00:33:18.000] Well, that really is a dream skill set, right?
[00:33:18.000 --> 00:33:25.560] Being able to go, like, fully to the left and then extend fully across the stack.
[00:33:25.560 --> 00:33:34.680] And I think the critical aspect in that is, obviously, you have to have the technical skills, because you can generate all the code you want, but if you don't know how to architect it,
[00:33:34.680 --> 00:33:36.440] it's not going to be very good.
[00:33:36.440 --> 00:33:50.480] So you really do need to have skills there. And so much of the data engineering, the pipeline creation is about the business context because you can put that data together in many ways.
[00:33:50.480 --> 00:33:53.680] And most of them are not very useful.
[00:33:53.680 --> 00:34:02.400] You really do need to understand something about how it's going to be used in order to create the data pipeline correctly.
[00:34:02.400 --> 00:34:03.000] Right.
[00:34:03.000 --> 00:34:03.560] Absolutely.
[00:34:03.560 --> 00:34:04.960] Which is why it's so valuable.
[00:34:04.960 --> 00:34:27.600] If you can, you know, traverse all the way from the left of data engineering and pipelines to the right of the business use cases, it is so powerful. We use this term predicate pushdown in analytics, where, you know, if you're going to filter data out, you'd like to filter it out at the database level,
[00:34:27.600 --> 00:34:31.560] not after you've, you know, moved all that data out of the database.
[00:34:31.560 --> 00:34:37.280] It's much more efficient to push logic down the stack.
[00:34:37.280 --> 00:34:48.360] Similarly, in our world, for example, in digital media, not every advertising event is created equal.
[00:34:48.360 --> 00:35:03.400] And we know that in online programmatic auctions, as in any auction marketplace, you'll have many, many bidders and only one bidder will win an auction.
[00:35:03.400 --> 00:35:20.480] And if you don't have that business context to know that it's the winning bid that matters, you can end up carrying around, you know, two orders of magnitude more data than you need to answer a business question at the end of that pipeline.
[00:35:20.480 --> 00:35:40.960] And so the ability to, I would say, filter out a lot of the noise and enrich the signal requires business expertise: knowing what signal is really important, and what log files you might be able to sample or even just throw away at the beginning of your analysis.
[00:35:40.960 --> 00:35:44.480] A lot of engineers don't have that business context.
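(The winning-bid example, written out as a predicate-pushdown sketch in DuckDB. Table and column names are assumptions; the point is that the filter runs inside the engine's scan, so only winners ever leave the database.)

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE auction_events AS
    SELECT * FROM (VALUES
        ('c1', 2.10, TRUE), ('c1', 1.95, FALSE), ('c1', 0.40, FALSE),
        ('c2', 3.25, TRUE), ('c2', 3.10, FALSE), ('c2', 1.10, FALSE)
    ) AS t(campaign_id, bid_price, is_winning_bid)
""")

# Anti-pattern: drag every bid out of the database, filter in Python.
# rows = con.execute("SELECT * FROM auction_events").fetchall()
# wins = [r for r in rows if r[2]]   # moves far more data than needed

# Pushed down: the engine filters during the scan; only winners leave.
print(con.execute("""
    SELECT campaign_id, sum(bid_price) AS spend
    FROM auction_events
    WHERE is_winning_bid               -- the predicate, pushed to the scan
    GROUP BY campaign_id
""").df())
```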
[00:35:44.480 --> 00:35:49.720] That's fascinating because that almost answers what was going to be my next question.
[00:35:49.720 --> 00:35:58.600] I think really the, you know, the main challenge is the context and the continuity of the context across the entire process.
[00:35:58.600 --> 00:35:59.280] Right.
[00:35:59.280 --> 00:36:07.880] And if you have only one person managing that whole process, then you can consider that that person has all the context.
[00:36:07.880 --> 00:36:12.080] But then if you have questions, you also always have to go to that person.
[00:36:12.080 --> 00:36:12.680] Right.
[00:36:12.680 --> 00:36:14.560] And that person has limited time.
[00:36:14.560 --> 00:36:17.400] Yes, like everybody.
[00:36:17.400 --> 00:36:22.400] Embedding the context in the process, in the tools.
[00:36:22.400 --> 00:36:22.880] Yes.
[00:36:22.880 --> 00:36:30.000] You know, we used to hear, oh, you have to do documentation, which is very funny when you've been in this for 20 years.
[00:36:30.000 --> 00:36:34.920] Right. You're like, yeah, show me one instance when that actually works.
[00:36:34.920 --> 00:36:49.000] The context has to be something that is alive, that is, you know, updated automatically with every interaction, and that is intelligent, because it has to handle exactly the kinds of scenarios that you just mentioned.
[00:36:49.000 --> 00:36:51.160] Right.
[00:36:51.160 --> 00:36:56.680] How do you see that context question coming up in teams?
[00:36:56.680 --> 00:36:59.120] I'm not talking about the one-person data team, right?
[00:36:59.120 --> 00:37:04.000] But more common teams, the type of teams that Rill works with.
[00:37:04.000 --> 00:37:10.200] Well, I think there are two broad categories of context that really matter.
[00:37:10.200 --> 00:37:12.400] And one is a lot easier to solve for than the other,
[00:37:12.400 --> 00:37:15.360] but both are very valuable.
[00:37:15.360 --> 00:37:20.400] The first, I would say, category of context is sort of macro context.
[00:37:20.400 --> 00:37:24.040] This is what the large language models are actually great at.
[00:37:24.040 --> 00:37:29.960] They have a notion of what an advertising campaign is.
[00:37:29.960 --> 00:37:33.000] They have a notion of what an auction is.
[00:37:33.000 --> 00:37:45.960] They understand, you know, conceptual entities, frankly, better than any junior or even senior data engineer might.
[00:37:45.960 --> 00:38:00.640] And so, I've seen this with great success: if you give a data set to an LLM and you say, can you, you know, sort of define an ontology for this data set?
[00:38:00.640 --> 00:38:17.280] Can you provide me a set of dimensions and measures that are useful for a tool, from this data set? It will do a very good job, defining a metric like MAU or DAU given user session logs.
[00:38:17.280 --> 00:38:19.400] Which is fantastic.
[00:38:19.400 --> 00:38:21.880] That's often better than most data engineers would do.
[00:38:21.880 --> 00:38:25.440] It's got a world model.
[00:38:25.440 --> 00:38:28.640] And so that's great.
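(What such a measure definition cashes out to, sketched with DuckDB: DAU and MAU are just distinct-user aggregates over session logs. The sessions schema is assumed from Mike's example.)

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE sessions AS
    SELECT * FROM (VALUES
        (1, TIMESTAMP '2025-01-30 09:00'),
        (2, TIMESTAMP '2025-01-30 10:00'),
        (1, TIMESTAMP '2025-01-31 09:15'),
        (3, TIMESTAMP '2025-01-31 11:00')
    ) AS t(user_id, session_start)
""")

# DAU: distinct users per calendar day.
print(con.execute("""
    SELECT CAST(session_start AS DATE) AS day,
           count(DISTINCT user_id)     AS dau
    FROM sessions GROUP BY 1 ORDER BY 1
""").df())

# MAU: distinct users over a trailing 30-day window ending on a given day.
print(con.execute("""
    SELECT count(DISTINCT user_id) AS mau
    FROM sessions
    WHERE session_start >= TIMESTAMP '2025-02-01 00:00' - INTERVAL 30 DAY
""").fetchone())
```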
[00:38:28.640 --> 00:38:51.000] The harder, second category of context (I don't know if I would call it macro or micro context, but I might call it, you know, company domain context) is that while there's a set of concepts out there in the world, like MAU and DAU, that everyone uses every week,
[00:38:51.000 --> 00:39:09.520] you and I know from working with lots of companies that every company has its own bespoke custom concepts and ontologies, and generally every department has its own variation and takes the source for different fields from different systems.
[00:39:09.520 --> 00:39:10.520] Right.
[00:39:10.520 --> 00:39:15.720] And this is the messy reality.
[00:39:15.720 --> 00:39:33.680] You know, probably one of the only areas where things have gotten a little cleaned up, after centuries, is accounting, right? Which is in some ways where the first real data scientists and data analysts worked: accounting, money.
[00:39:33.680 --> 00:39:40.440] And we have the Generally Accepted Accounting Principles, GAAP, you know, that defines a set of standards.
[00:39:40.440 --> 00:39:50.760] After centuries, we can finally mostly agree on what EBITDA and profit and bookings are, you know, what these metrics are. Only mostly, yes.
[00:39:50.760 --> 00:39:51.440] Mostly, right.
[00:39:51.440 --> 00:39:55.920] We still see people on Wall Street debating, you know, and gaming these metrics.
[00:39:55.920 --> 00:40:02.920] But when we talk about what is a customer, how do we count active customers?
[00:40:02.920 --> 00:40:07.120] You know, there's still a lot of ambiguity there, right?
[00:40:07.120 --> 00:40:11.120] And forget customers. How do we count whether, you know, an item was shipped?
[00:40:11.120 --> 00:40:18.800] Or whether, you know, an item was returned? All of these bespoke concepts that every company has.
[00:40:18.800 --> 00:40:20.880] That, I think, can be extremely challenging.
[00:40:20.880 --> 00:40:25.120] And that's where I think there's opportunity.
[00:40:25.120 --> 00:40:34.040] And so sometimes it's just very operationally sound to have differences, like: what is revenue?
[00:40:34.040 --> 00:40:40.800] If I want to optimize my ads, it's going to be revenue that is counted by my ad pixel.
[00:40:40.800 --> 00:40:46.320] If I want to optimize overall, it's going to be revenue, deduplicated, in GA4, right?
[00:40:46.320 --> 00:40:51.400] Or whatever counts. Operationally, it makes complete sense to have that flexibility.
[00:40:51.400 --> 00:40:59.200] Yes, sure. And I think we've seen this again in the world of, you know, let's say retail,
[00:40:59.200 --> 00:41:00.960] you know, retail advertisers.
[00:41:00.960 --> 00:41:09.400] If you're Warby Parker, or if you're Netflix, and you're trying to optimize an advertising campaign,
[00:41:09.400 --> 00:41:14.120] there are different ways to measure the return on that advertising spend.
[00:41:14.120 --> 00:41:18.360] And so, like you said, you want that flexibility.
[00:41:18.360 --> 00:41:25.520] Some retailers might say we want to optimize for number of glasses sold.
[00:41:25.520 --> 00:41:30.240] Some people want to optimize for number of new customers that they sign up.
[00:41:30.240 --> 00:41:35.400] Some optimize for just pure revenue.
[00:41:35.400 --> 00:41:43.840] So there are so many metrics that I think companies may decide they want to optimize,
[00:41:43.840 --> 00:41:49.520] you know, advertising spend for, as an example, that you need that flexibility.
[00:41:49.520 --> 00:41:58.480] Right. And if you're running a platform, you may be working with a hundred different advertisers who all have different metrics that they want.
[00:41:58.480 --> 00:42:01.160] They actually want to pay you based on a different set of metrics.
[00:42:01.160 --> 00:42:08.280] So you need to be able to flex your measurement for all of those kind of combinations.
[00:42:08.280 --> 00:42:20.880] And so, talking about, you know, building this and automating those processes in a world where we have analytics agents that allow us to automate some parts of our workflows,
[00:42:20.880 --> 00:42:30.320] and where analysts become sort of more orchestrators of these agents than what we are today as analysts,
[00:42:30.320 --> 00:42:36.720] which is mostly people who type on a keyboard and are operators of different platforms.
[00:42:36.720 --> 00:42:40.920] Do you already see some of that within your users?
[00:42:40.920 --> 00:42:45.080] And if you do, how does that manifest?
[00:42:45.080 --> 00:42:55.640] So maybe flesh that out for me for a moment. You're suggesting a shift in how analysts sort of do their role?
[00:42:55.640 --> 00:43:04.280] Is that what you mean? Yeah. Well, basically, I am thinking, you know, currently... let's take a very simple example.
[00:43:04.280 --> 00:43:14.200] Code generation. Right. What do you do when you get an LLM, or Cursor or whatever it is, to generate code for you? You prompt it.
[00:43:14.200 --> 00:43:22.040] It starts working. And depending on how good your prompt is, how good the agent is, and how complex the task is, it can take a while.
[00:43:22.040 --> 00:43:25.120] You're not going to stare at your screen during that time. Right.
[00:43:25.120 --> 00:43:29.720] You're going to in parallel do another process and in parallel do another process.
[00:43:29.720 --> 00:43:38.920] And then at some point, not right now, right, but you hope that at some point, we get to a stage where process one is going to call you saying, hi.
[00:43:38.920 --> 00:43:41.760] Right. Decision point. You know, checkpoint here.
[00:43:41.760 --> 00:43:52.320] Please make a decision. And in analytics, this is probably a little more prevalent than in software engineering, because of that exact iteration loop that you mentioned: you know, we have a question,
[00:43:52.320 --> 00:43:56.320] we have an answer, and that basically means three more questions. Yes.
[00:43:56.320 --> 00:44:02.360] Because every data point becomes a decision and a fork, right,
[00:44:02.360 --> 00:44:07.080] in the process. And there is no unit test that tells you everything you did is valid. Right.
[00:44:07.080 --> 00:44:13.680] So within a world like that, you end up spending more time sort of orchestrating different processes.
[00:44:13.680 --> 00:44:15.600] Yes. Architecting. Yes.
[00:44:27.120 --> 00:44:38.520] The overall processes. And this is something that, talking to, you know, somebody who handles a large data team in a company that actually builds tools as well...
[00:44:38.520 --> 00:44:44.480] Yeah. She told me, you know, that project planning, that project management, that chunking and planning aspect is something that is difficult for a lot of people.
[00:44:38.520 --> 00:44:44.480] So I'm thinking this is something, you know, an aspect that we need upskilling in. Right. Right.
[00:44:44.480 --> 00:44:49.880] It's sort of, I don't want to call it analytics architecture, because it's not about architecting tools.
[00:44:49.880 --> 00:44:57.320] It's about architecting processes and sort of managing a virtual team that does part of these processes for you.
[00:44:57.320 --> 00:45:10.760] So what I'm asking is: within the users that you have, you know, because they're deeply technical and they're used to thinking architecturally, because they have to architect complex systems for big data analytics.
[00:45:10.760 --> 00:45:17.200] It's not simple. Right. Yes. Do you see some of that already emerging?
[00:45:17.200 --> 00:45:32.880] For sure. I think, you know, in some ways, we're seeing a shift in the nature of knowledge work. As you mentioned, for a long time, you had kind of the managers and then, you know, the doers.
[00:45:32.880 --> 00:45:36.120] Right. And so a manager would say, we're going to break up this analytics project.
[00:45:36.120 --> 00:45:43.400] I'm going to have my data engineer go write the pipeline. I'm going to have my analyst define a set of measures and dimensions on that data.
[00:45:43.400 --> 00:45:52.000] And I'm going to have my, you know, collaborator in the marketing team give us input on the, you know, the dashboards that we're going to design.
[00:45:52.000 --> 00:45:55.160] Right. For the business stakeholders.
[00:45:55.160 --> 00:46:07.880] And I think now, increasingly, a lot of the work will be done by agents. And so any of us who are workers really become managers; everyone becomes a manager.
[00:46:07.880 --> 00:46:20.920] Everyone needs to learn how to break work up and, frankly, you know, prompting is a form of managing: how to clearly communicate to agents the task that needs doing.
[00:46:20.920 --> 00:46:35.600] And so absolutely, I think those who are good at thinking about the architecture of a project and how to break it up into a set of smaller tasks that can be delegated,
[00:46:35.600 --> 00:46:43.880] whether that was previously to humans and increasingly is to agents, I think will be much more successful.
[00:46:43.880 --> 00:47:02.320] I think also, not unlike management generally, there's something people talk about called task-relevant maturity, where depending on the maturity of a team member you're working with,
[00:47:02.320 --> 00:47:06.480] do you let them work for days before you kind of check in and see how they're doing?
[00:47:06.480 --> 00:47:11.800] Or do you kind of have a daily check-in, or even, you know, multiple check-ins a day?
[00:47:11.800 --> 00:47:18.240] I think right now, in the current state of agents, I do think they're not very mature.
[00:47:18.240 --> 00:47:23.040] And so the idea is that they're toddlers, yes; they're very junior.
[00:47:23.040 --> 00:47:30.400] And so, just like you would never have an intern go off for two weeks and only then come back and give you the results of their work.
[00:47:30.400 --> 00:47:36.280] And, you know, a lot of people don't hire interns because they do require a lot of management, a lot of check in.
[00:47:36.280 --> 00:47:42.280] I would say that a lot of agents today are at, like, intern-level capability.
[00:47:42.280 --> 00:47:46.680] Now, there's lots of them, but you do need to check in several times a day.
[00:47:46.680 --> 00:47:51.000] And that's going to change.
[00:47:51.000 --> 00:47:52.480] We're going to see more and more maturity.
[00:47:52.480 --> 00:47:56.560] But I think the nature of work is shifting.
[00:47:56.560 --> 00:48:05.240] And yes, the upskilling for people is: how do you manage workers, communicate with them, evaluate their work, give good feedback?
[00:48:05.240 --> 00:48:08.880] All of these things that, frankly, not everyone loves to do.
[00:48:08.880 --> 00:48:20.000] Right. Some people just like to write code, but writing code might go the way of data entry and, you know, digging holes with a shovel.
[00:48:20.000 --> 00:48:25.360] You know, writing code may eventually not be a job that any human does.
[00:48:25.360 --> 00:48:28.680] No, but I think reading code will be, right?
[00:48:28.680 --> 00:48:35.120] Because what you expressed, I love this analogy of management, because basically, as you said,
[00:48:35.120 --> 00:48:37.880] prompting is like talking to your team.
[00:48:37.880 --> 00:48:39.800] You need to know the LLM.
[00:48:39.800 --> 00:48:42.320] You need to really understand LLMs.
[00:48:42.320 --> 00:48:43.880] They're not magic. They're tools.
[00:48:43.880 --> 00:48:47.760] You need to really understand how they work and how they're different.
[00:48:47.760 --> 00:48:51.360] And that's, you know, know your team.
[00:48:51.360 --> 00:48:57.920] Conversely, like, you know, this is probably the first time in history where we can actually say, I wrote this, but I didn't read it.
[00:48:57.920 --> 00:48:59.400] And a lot of people do.
[00:48:59.400 --> 00:49:02.480] That's generally not a good result.
[00:49:02.480 --> 00:49:07.560] If you generate code, you have to learn how to read it and how to evaluate it.
[00:49:07.560 --> 00:49:08.360] Absolutely.
[00:49:08.360 --> 00:49:12.320] Because it still has to integrate with the rest of your system, etc.
[00:49:12.320 --> 00:49:14.840] All of that still has to happen.
[00:49:14.840 --> 00:49:19.640] I think most importantly, you have to keep your critical sense about you.
[00:49:19.640 --> 00:49:22.680] You have to actually check everything.
[00:49:22.680 --> 00:49:25.240] Right. Right. Yes, absolutely.
[00:49:25.240 --> 00:49:32.920] I think the role of architects is actually going to be even more important than ever.
[00:49:32.920 --> 00:49:39.200] Because I think, you know, it's not enough just to ask an agent to solve a problem with code.
[00:49:39.200 --> 00:49:43.560] You really do need to hint about how you would like it to solve the problem.
[00:49:43.560 --> 00:49:46.120] And you need to have enough domain expertise,
[00:49:46.120 --> 00:49:52.880] enough expertise that you guide that agent to solve it, I think, in a thoughtful way.
[00:49:52.880 --> 00:50:02.480] Because I'll just observe that, just like with junior developers: you know, when I was a teaching assistant in graduate school,
[00:50:02.480 --> 00:50:07.160] we observed that the best programmers would often write the shortest solutions.
[00:50:07.160 --> 00:50:13.040] Their homework assignments would have the fewest lines of code to solve the problem.
[00:50:13.040 --> 00:50:21.040] The least mature developers often, you know, solved problems with many more lines of code.
[00:50:21.040 --> 00:50:27.960] And so if you look at AI agents writing software today, they tend to be on the verbose side.
[00:50:27.960 --> 00:50:37.000] And that can be a problem. Unmanaged, you end up with a lot of just, you know, slop essentially in your code base.
[00:50:37.000 --> 00:50:41.720] So, Mike, as usual when we talk, I could do this for hours.
[00:50:41.720 --> 00:50:47.080] However, we are running out of time and we sort of have to wrap up a little bit, unfortunately.
[00:50:47.080 --> 00:50:49.520] So, Rill is in public beta.
[00:50:49.520 --> 00:50:53.760] You're tackling this data-lake-to-dashboard-in-minutes vision.
[00:50:53.760 --> 00:51:01.040] And you've got this unique architecture combining last-mile ETL, an in-memory database, and operational dashboards, all in one tool.
[00:51:01.040 --> 00:51:05.760] Where can people find Rill? So, shameless plug time.
[00:51:05.760 --> 00:51:09.400] Where should they go to try it out? And where should they follow your work?
[00:51:09.400 --> 00:51:13.720] You know, keep up with what you're building. Also, are you hiring?
[00:51:13.720 --> 00:51:17.960] You know, anything you want to sort of put out as a message?
[00:51:17.960 --> 00:51:26.520] Sure. Well, first, if you are a data engineer or a data-engineering-biased,
[00:51:26.520 --> 00:51:35.880] you know, data analyst, you can use our tool and get it running on your local MacBook in literally seconds.
[00:51:35.880 --> 00:51:40.320] Rilldata.com. That's R-I-L-L data dot com.
[00:51:40.320 --> 00:51:47.920] We are actually beyond public beta. At this point, dozens of the largest enterprises
[00:51:47.920 --> 00:51:54.880] in the world are using Rill today. Folks like Comcast, AT&T, and some of the largest fintech firms are leveraging our tool.
[00:51:54.880 --> 00:52:01.160] But we have a free, open source Rill Developer tool that you can run just, you know, on your own.
[00:52:01.160 --> 00:52:07.920] And so, folks, this is just a download. You don't even have to pay with your personal data.
[00:52:07.920 --> 00:52:12.000] That's right. You can run it locally and securely, which is amazing.
[00:52:12.000 --> 00:52:15.680] And yeah, we support DuckDB and ClickHouse on that tool.
[00:52:15.680 --> 00:52:23.800] So, yeah, I would encourage any AI-native data engineer who wants to impress their colleagues:
[00:52:23.800 --> 00:52:28.800] you can build not just dashboards but conversational analytics in minutes with the tool.
[00:52:28.800 --> 00:52:37.480] So we would love that. And we have a Discord channel, if you want to hang out with a bunch of other data geeks that are trying Rill,
[00:52:37.480 --> 00:52:40.880] which you can sign up for. It's fun. We're in that channel.
[00:52:40.880 --> 00:52:46.480] Yes, it's a lot of fun. You know, shameless plug: we use Rill as well at Ask-Y.
[00:52:46.480 --> 00:52:50.240] Of course. Of course. We love having you.
[00:52:50.240 --> 00:52:53.800] And so, well, thank you for that, Mike. It's been a pleasure.
[00:52:53.800 --> 00:52:57.720] So that's episode two of Knowledge Distillation.
[00:52:57.720 --> 00:53:04.400] If you're a data analyst trying to navigate this evolution from worrying about scale to worrying about context,
[00:53:04.400 --> 00:53:08.000] check out ask-y.ai and try Prism.
[00:53:08.000 --> 00:53:15.920] Thank you for listening. And remember, bots handle the what, A.I. analysts handle the why.
[00:53:15.920 --> 00:53:19.760] Thanks to Tom Fuller for the editing magic on this episode.
[00:53:19.760 --> 00:53:26.080] If you want to work with Tom, head to ask-y.ai and check out the show notes for his contact info.
[00:53:26.080 --> 00:53:29.520] [MUSIC]