The Role Of Model Development In Machine Learning Systems

Hello, and welcome to The Machine Learning Podcast. The podcast about going from idea idea to delivery with machine

learning.

Your host is Tobias Macy. And today, I'm interviewing Josh Tobin about the state of industry best practices for designing and building ML models and the work that he's doing at Gantry to help support teams who are taking that journey. So, Josh, can you start by introducing yourself?

Hi. I'm Josh Tobin. I'm cofounder and CEO of a machine learning tooling and infrastructure

startup called Gantry.

My background is in machine learning research. I was a research scientist at OpenAI for a few years in the early days working on deep learning robotics.

Did a PhD at Berkeley doing similar things, and I'm really excited to be here. And do you remember how you first got started in machine learning? Yeah. I do.

I actually

was a grad student at Berkeley in applied math and,

you know, was trying to figure out what exactly I was going to apply math to. And so I was taking a bunch of classes,

in different departments and stumbled into,

an incredible class that I was

wholly unprepared for on, machine learning applied to robotics,

but just pretty quickly fell in love with it and saw the potential for this technology to have a huge impact and, for robotics to have a huge impact as well. And so that's that's,

it was all downhill from there, I guess.

And so

the topic at hand is around the practice of modeling and what that really means in the current ML ecosystem and climate.

And before we get too far down the road, I'm wondering if you can just start by giving what your sense of a so called traditional process for building a model looks

like and some of the forces that helped to shape those,

so called best practices at the time that they were established.

Yeah. Absolutely. So when I learned how to do machine learning in school,

the process that I learned was, you know, someone gives you a dataset that you're supposed to work on, and

they tell you this is the metric that you need to optimize on that dataset. And then your job as a machine learning practitioner is to make that metric go up or make it go down.

And machine learning as a field evolved to be really, really good at that process of taking a dataset and a metric and

optimizing it super well to get to the the best possible outcome for that combination.

And I think in industry,

you know, in the early days of machine learning adoption, it largely followed the same pattern. I think, you know, Kaggle was a big driver of that, right, where, you know, sort of Kaggle reproduces that that pattern of, you know, assuming you have a fixed dataset and a fixed metric. But I I see that pretty quickly changing in the companies that we work with.

In terms of that kind of fixed metric and a fixed dataset, what are some of the

aspects of the current climate around data and and machine learning and the ways that it's being applied that are starting to

unravel some of those assumptions and push teams into new and different what

machine

the what machine learning teams are being tasked to to do.

So, you know, in academia, again, your your your goal is to beat a benchmark and write a paper. Right? And and if you, are able to make your metric go below a certain number, then, you know, you get a NERICS paper and you get a PhD and you get to go work at Google and, you know, make a $1, 000, 000. You know, and in the early days of machine learning getting applied in most companies, like, let's call it, like, the 2010 to 2020 era,

outside of, you know, the largest and most successful tech companies.

The machine learning teams were kind of tasked with applying modeling in a similar way, often in an r and d function or sometimes in an analytics function where your job is to produce an answer once And, you know, maybe that answer is used to do something,

to do something strategic, like decide where to build the next McDonald's,

or decide, you know, which leads your sales team should spend most of their time on. But that answer was, you know, rarely being applied in kind of, like, a real time setting in front of actual end users.

So as

machine learning kind of moves into

the 20 twenties,

I see machine learning teams increasingly

getting tasked with doing solving a different problem, which is, you know, rather than trying to produce the right answer once, they're being tasked with actually building

real products that are

out in the world interacting with the, end users of the company, whether that's, you know, their actual customers, their their actual users, or internal end users.

And that difference has pretty profound implication

because

end users

don't have, like, the the data that end users, interact with is, the data that end users produce when they interact with the model is not static.

They, end users'

behavior tends to change all the time

and,

the world changes around them. And so this this assumption that we make in machine learning academia or machine learning as an analytics function that

we're, it's okay to make static assumptions about the data that we work with,

Doesn't really hold true in the sort of new practice of building products with ML.

And there are a couple of things that I'm interested in digging into from what you were talking about. 1 that I'll add a note for and maybe we'll come back to it is that topic of real time data, which has been gaining a lot of ground in the data engineering space as well. And I've talked about it in a bunch in my other show, the data engineering podcast.

And

I'm just curious, maybe we'll circle back to it, whether you see the adoption of real time data

being driven by

machine learning teams to be able to incorporate that into their models or

the driving force of real time machine learning being the fact that there is availability of that data now. Because for a long time, streaming was again kind of the the realm of the elite tech companies and everybody else just wished they were doing it. Yeah. And then the other piece that I'm interested in digging into

is on that question of machine learning

kind of as an analytics function versus machine learning being the product. And I think that might be the more interesting angle to go down right now and kind of what you see as being some of the

pivotal forces that have moved us from

the the idea of, you know, static analytics to predictive analytics to prescriptive analytics to now machine learning being the actual product that people are interacting with versus just a kind of sidecar function.

Yeah. Sounds good.

What's, like, what's most interesting for me to dig into there?

I guess, just what you see as being some of the factors that have enabled that as a consideration

and have pushed people to move out of that state of machine learning is just there to give me answers to questions that I have as a business owner versus

I actually want machine learning to be the core front and center element of what I'm offering to my customers.

Yeah. Absolutely.

So I I 1 factor is data availability. So, as organizations mature

their underlying data infrastructure,

I think they go through a maturity curve where, you know, early on in that process,

if, you know, data is completely spread out in different silos in the organization and

data access is limited, data quality is poor. It's really difficult to build machine learning systems on top of that. But as companies move through that data maturity curve

and, you know, start to centralize data and have better data access, better data quality,

better better data governance,

then, that reduces the barrier to teams, you know, trying to build stuff with that data. And, you know, 1 of the more interesting things that they can build with it is, you know, tends to be machine learning applications.

The other factor that's driving it,

well, a second factor I think that's driving it is

the,

a reduced

barrier to adoption of machine learning as a technology.

So, you know, when I first do it started doing machine learning, if you wanted to be on the cutting edge, like, if you wanted to be,

building state of the art machine learning systems,

it was

an exercise in pain tolerance, I would say. You know, I remember spending a lot of time,

setting up a GPU machine under my desk and making sure I always had the latest CUDA drivers installed and,

you know, downloading

these

somewhat questionable,

very difficult to use,

frameworks for building deep learning systems like, Caffe and Theano back in the day.

Now if you wanna access state of the art in machine learning, it's just an API call away.

So state of the art is getting easier and easier to use. More and more people know that know enough of the basics in order to be able to use it. So, I taught a a,

production machine learning class at Berkeley,

couple years back, and I was asking 1 of the the undergrad students,

you know, what fraction of Berkeley CS undergrads

these days take at least 1 machine learning class? And his answer was, yeah. I mean, it seems like it's probably at least 70%. Right? So the next generation of software engineers has at least the fundamental skill set to do this,

And the algorithms and the frameworks are are converging. So the second factor that's driving this, in addition to data availability, is just, like, lowering the barrier to entry to try this technology.

And then the third is,

like, rapidly,

expanding capabilities. Right? So, you know, GPT

3 and,

but even going beyond that, like state of the art models and other domains

are becoming more and more capable. And so the types of tasks that you can apply them to, are wider and wider, and I'm seeing, you know, businesses that otherwise

would be probably years away from adopting machine learning where the technology was 4 or 5 years ago that are experimenting

with putting it into their products,

much earlier in their company life cycle than they otherwise would have. And as people are moving into this realm now of deep learning being probably the default way that people are thinking about building their machine learning systems

and the general availability of the tooling to be able to make use of it and the improved sophistication of the underlying data capabilities,

what are some of the practices and lessons that are still necessary and useful in this modern age of modeling that people maybe are starting to

fall out of the habit of doing and that they should be reminded of as they are kind of exploring the new and evolving frontier of what machine learning and machine and and model building looks like.

Yeah. Absolutely.

I I have a bunch of thoughts on that, but I think maybe it'd be productive first

to break down I I think it's actually productive to think about 3 different paradigms for doing machine learning. So the the first paradigm is, like, you can think of it as, like, ML 1. Right? And that's traditional machine learning. It's building regression models. It's building,

tree based models. Basically, your your your scikit learn toolkit. And that's usually,

applying it to tabular data.

It's,

and it's usually pretty,

pretty bespoke to the to the actual, like, business application that you're doing. It's highly

reliant on the,

specifics of your problem domain, and a lot of the work involved in it is,

feature engineering because, you know, you have to really understand

your data and your problem area to make the algorithms work well, you know, by designing the features that will allow them to solve the prediction problem that you need them to solve.

Then in starting in around 2012, like, deep learning emerged as a second paradigm.

Deep learning makes a lot of things easier, but also it makes some things harder. The things that it makes easier are that it's,

it's more standardized in the sense that, you know, typically it operates on unstructured data, text images, things like that. And, because of that, the the effort involved in

needing to kind of tell the algorithm really, really specifically,

you know, these are the features that you need to look at in order to solve this problem, is reduced. You You know, oftentimes,

the these models take a pretty standard set of inputs, just, you know, some some text or some images

or, or what have you.

So that's the thing that's a lot easier about deep learning than,

than, like, ML 1 dot o.

The thing that's harder is that machine learning or is that deep learning

is typically more resource hungry. It's more data hungry. It's more compute hungry, and it's a bit more of a specialized skill set. So it's, like, maybe more expensive to get started, but then once you have a model up and running, it can be more standardized.

And then I think what's really interesting is that there's

a third paradigm emerging for building machine learning applications,

which you can think of as, like, machine learning 3.0.

And what this is is, like,

building machine learning powered products on top of,

foundation models, on top of large language models like GPT 3, ChatGPT,

and Equivalentsmart from, you know, an open source or from other companies.

And I think

it's you know, a lot of people who who haven't built with this technology yet don't realize how different it is from that machine learning 2.0,

toolkit and skill set. And the the big, big difference is that you can get really far in this, like, 3.0 paradigm

without needing to train models at all.

These models have become capable enough that just by

engineering the context of what you sent to the model, the prompt of what you send to the model, you can usually get to a prototype that works pretty well,

without needing more than, like, maybe a handful of of, of data points.

And, you know, that doesn't mean that you won't eventually need to train the model or eventually need to put a lot of engineering work into making the model work well if you wanted to, you know, solve all of the problems that you want to solve. But

the difference between, you know, needing to have, like, a a dataset and,

a training infrastructure setup and,

you know, all of the work that goes into being ready to even run a deep learning experiment, the difference between that and being able to just, you know, fire off an API call and hack around with a prompt is, like, I think, multiple orders of magnitude

improvement in the speed of getting to an MVP.

And that I think is a pretty profound implication for how companies are are, are building applications

with this, like, 3 0 stack. And

for teams who

are using this large language model or foundation model approach,

As you said, it's definitely a lot faster to get started. It's a lot faster to get some get to something that is functional.

But I'm curious what you see as some of the risk factors

in evaluating

which model do you build your foundation or do you use as your foundation?

What are some of the

potential customizations that you might need to do? How do you identify whether or not you're going to be able to get all the way to your desired end state with that foundation versus having to get, you know, halfway there or 3 quarters of the way there, then having to go all the way back to the beginning to actually build your own custom model because it doesn't quite have all the right kind of source information or source parameters that you need to be able to get to your kind of final desired end state.

I think there's like, the the 2 biggest risk factors that I see are, like, a platform risk and a technology risk.

The platform risk is that

if you are building on top of

these foundation model providers,

if you're building on top of,

OpenAI,

Cohere, really any of these providers,

the these platforms

introduce constraints.

They have outages.

They have, you know, latency requirements that you can't really work around.

And the the access that they give you to the models themselves is

somewhat limited.

You can do a lot of things with them, but you can't do everything with them. And so that, you know, that introduces the risk that at some point down the line, you're gonna realize, hey,

this

the the offering from this provider is not

flexible enough for me to really do the thing the exact thing that I want to do, and so I need to go backward.

I think that's that's a risk, but I think the bigger risk is the, like, underlying technology risk, which is that, you know, these

these systems,

these

large language model based systems are incredibly capable and

very easy to get started with relative to other machine learning systems, but they're not silver

bullets.

They

have

very real and very pronounced failure modes as we've all seen at this point with,

Bing and Sydney and things like that. And

getting from an MVP

to an actual real

reliable working product that runs in production

and, you know, consistently produces the output

that you feel comfortable putting in front of your users is very difficult.

And that is, at this point, an inherent limit, limitation of the technology.

It's overcomeable, but it's,

you know, I I think, like, the biggest failure mode that I see with companies applying these technologies right now is

you can get to, you can get to something that works 80 or 90% of the time extremely quickly, which produces a lot of confidence that, like, hey. We can you know, we've got a product here. We can ship this thing.

But then getting from that, like, 80 or 90% level to something that's as reliable as you'd want is often the hardest part. And for teams who are

either

new to machine learning or maybe they've been doing it for a while, I'm curious what you see as some of

the variance in terms of the necessary skills and

internal capabilities and tooling and some of the types of investment that they need to be thinking about making to

bring themselves into this modern era of model building and validation and operation?

Yeah.

I I think the interesting thing

about this, like, 3 0 foundation model stack is that you really don't need

traditional machine learning skills to get started.

A lot of the companies that I see building with this,

they went out and tried to hire someone who had experience with

this

skill set and couldn't, you know, couldn't find anyone because it's brand new. And so they just had their engineering teams try to get started and were able to figure things out pretty quickly. So that's, I think, 1 of the big differences.

You know, in in the old world for deep learning, you kind of do need people who are pretty specialized in understanding the technology

more or less right away,

in order to have, you know, in order to, like, have a highlight of success for your projects.

But I think that the risk there, the the thing that a lot of these companies miss

is that,

you know, just because this is a new

and easier way of building machine learning applications doesn't mean that the fundamentals of machine learning don't apply. And in particular, I think a lot of,

a lot of companies

that are, you know, that are

that are starting now and are building things,

with this new stack

are underestimating the value of

a lot of what, you know, the previous 2 generations of machine learning teams, realized a long time ago, which is,

you know, reproducibility,

evaluation,

and, the ability to, you know, iterate quickly,

using

the the feedback that you're getting on your model's predictions from your end users in production.

And so I think that a lot of the companies that are you know, that don't have experience in earlier paradigms of machine learning that are adopting

a foundation model based stack now

are,

going to, you know, reinvent the wheel and, like, find their way full circle back to a lot of the, you know, production ML and MLOps practices that,

experienced ML practitioners are already familiar with.

And

another piece of this that I'm curious about is for teams who don't have that existing

expertise in machine learning.

What are some of the kind of initial pieces of education that they need to consider around how to even determine

which foundation model is going to give them the the,

capabilities that they want or if it's really just a matter of try it out and see what works and just iterate until you find something that sticks?

Yeah. It's it's pretty easy to try it out and see if it works these days.

I think,

you know, that there is

there is a bit of a skill set around

getting these models to like, getting quickly to

figure out the limitations of the capabilities of any of these models,

and that skill set is not really very well codified today. A lot of people who learn how to do this learn by, you know, reading

the dark corners of Twitter and Reddit and finding, you know, finding the the toolkit of of tips and tricks

in, in the deep recesses of the Internet.

We're actually teaching a class.

I help organize a machine learning education

series called full stack deep learning, which started at started off as kind of like the the first MLOps class or the at least the first MLOps class that was focused on deep learning.

And,

we are teaching a

new class,

coming up here in a couple of months

on the on building with the foundation model stack. And so, hopefully, we'll produce some materials there that will be, you know, kind of that, like, 101 or 201 level material that you need in order to, you know, get to your first MEP or your first product.

And so

for teams who are just saying, we wanna be able to use machine learning for x. I'm curious

what your typical

first recommendation

is now and whether you view

kind of modeling as a discipline

as being something that is on the way out, or is it 1 of those things that is evergreen, but it's just going to be an increasingly specialized role similar to something like a database administrators

database administrator in the era of the cloud?

Yeah.

As sad as this is for me to say as someone who is trained, you know,

in how to be you know, I spent a lot of time building machine learning models from scratch. I think that,

the skill set of training machine learning models is never gonna go anywhere, but it's becoming an increasingly smaller part of the pie. And I think the way that most companies are going to consume machine learning is

not by,

you know, not by hiring people who are experts in the modeling side of the house. It's by, you know, consuming

a higher level abstraction that uses machine learning under the hood and, like, does the optimization process for them under the hood. And what they're going to be specialized in is

applying machine learning,

not at the algorithmic level, but at the process level

and at the data and data analytics level to the specific business problems that they're being tasked with.

So, I I think, you know, the the,

the the role of modeling in most companies building products with ML

is rapidly diminishing.

But, you know, machine learning as a whole is also rapidly growing in importance.

So the,

you know, maybe now is not the best time to get into machine learning modeling, but, I I think there will always be a role for people who know how to build models.

Over the past

decade or so, there has been a lot of investment put into all of the tooling and education and resources around around how to manage that modeling life cycle of how do you get the data into a state where it's clean? How do you,

develop the model initially? What are some of the algorithmic

concepts that you need to be aware of? How do you manage the experimentation

flow and figuring out kind of what are the optimal parameters or hyperparameter

tuning?

And so now that we've built up to this level where we can say, actually, you don't need to worry about any of that. Most of that's already done. You just take this model and then tweak the bits that you care about.

Now that we are at that point, what are the kind of next steps? What are the what are the resources that teams do need when they say, I don't need to be an expert in modeling. I just need to know which model does the thing that I want, you know, but there's still a lot that goes on and around

actually operationalizing those models, managing them,

figuring out how to identify when they are in some sort of failure mode. And I'm curious

what those additional capabilities are and that the areas of expertise that teams need to be investing in to be able to

consume those models and use them as an operational part of their products.

Well, a couple of things I I wanna say here. 1st, you know, just like

you don't need to go study

operating systems or database systems in order to get a job and, you know, be an effective software engineer today.

But if your goal was to be the best possible software engineer, it'd still be probably be worth studying those things because understanding the fundamentals

is still often very helpful even if you're working at a higher level of abstraction.

I think the same is true with ML. So if you you know, even if you don't expect to be needing to spend all of your time, you know, fuddling with model architectures and tuning hyperparameters and things like that, Understanding those things at a deep level will help you

when you when the abstractions that you're using fail you.

So I I do still think there's value in learning the fundamentals of training models.

And the other thing I'll say is that this is this is all changing very quickly.

So I think if, you know, if if I were running a machine learning team today,

or running a company that was looking to adopt machine learning today,

I don't know that I would make the bet that we don't need to hire ML specialists

because,

I think this is you know, being able to build ML without with with no machine learning specialists is

just starting to be pot it's, like, at the edge of the possible. And so I think a much more reliable path would still be to go hire people who know how to do the modeling piece as, you know, because there there might be pieces that you need to figure out. There probably will be pieces that you need to figure out that have not been baked into these abstractions yet. That being said, you know, I think,

I don't think there is a single place that you can go today to learn how to

apply machine learning without,

you know, from, from the perspective of a practitioner who is not going to be a machine learning expert.

But hopefully, the this, like, new full stack deep learning LLM class will be a helpful starting point for folks.

And so in terms of what you're building at Gantry, I'm curious if you can talk through some of the capabilities that you're providing and what you see as being the

support structure that is necessary and useful for teams to be able to actually

incorporate these models into their systems

and be confident that they are doing what they want them to do?

So let let me start with, like, a bit of a longer term view. So we've been talking about this idea that, hey. You know, in the future,

a lot of teams that are tasked with building products with machine learning

might not need to actually understand the modeling piece

of machine learning,

at the level that they do today or that they did 2 or 3 years ago. And so in that future world,

we can ask ourselves the question,

what are those teams going to be doing? Like, what is the main activity that's going to go into building production machine learning systems, if not training models?

And I think the irreducible complexity

in building,

applications of machine learning is that,

is actually

taking,

a model which is which, you know, models themselves are not products. Models are,

are are,

algorithms that can make predictions on input data

And figuring out how to apply

that model to

the the business context, the the actual problem that you, as an organization, need to solve. So what does that look like? It looks like

understanding the data really well, understanding,

the problem really well,

and also understanding how the model maps onto the data and to the problem. Where is the model performing well? Where is it not performing well? Where do we trust it? Where don't we trust it?

And what does the roadmap look like for us

to make this model

more reliable or perform better for the critical

use cases that we have for it, not just, like, general performance,

but actually

task specific performance.

And so that work, I to me, looks less like being a,

machine learning researcher today and more like being, like, a traditional data analyst or, you know, someone or, or in some ways, even, like, overlapping with a product manager where your goal is, like, to understand the current state and help guide development towards something that that looks better.

So in the longer term, if that's the way that organizations

are building,

machine learning applications,

then Gantry is the tool that these folks are gonna use to actually do this, like, this process of taking a taking a generic model that works pretty well

and adapting it over time to be more and more task specific.

What does that look like?

It looks like well, you know, first, you need to be able to actually

understand

whether the model is doing what it's supposed to be doing and what its failure modes are and what are the opportunities to make it better. So the first part of our product offering is sort of an analytics suite that helps you to understand model performance, diagnose failures, and find opportunities to make your model better. And then the second thing that you need to be able to do is

not just detect problems. Right? Because,

you know, if you if you figure out a problem with your machine learning system, but you can't fix it, that's not very useful. So you also need to be able to actually use those insights that you get from looking at how your model performs in production and use that to make your model better over time. And so that that involves, you know, basically folding your training data back into your,

production model

and retraining your model to evolve it over time to adapt to the needs of your product contacts or your end users.

And so that's that's essentially what our our product is designed to do. And in the shorter term, you know, the the users of this product are not, like, necessarily the folks that are, non machine learning experts who are just tasked with, like, taking it off the shelf model and adapting it. Our product is really useful for machine learning experts and machine learning teams as well to perform that same function of, you know, effectively,

taking what we know, like, how what we know about the problem that we're trying to solve

and,

and then adapting our model over time to solve that problem better. So the benefit that, that companies tend to see from this is,

models that perform better, for less cost

because, you know, you have to put less manual work into actually getting them to that level of performance

and the reduced cost of maintenance because, you know, oftentimes,

it's easier it's easier today than ever to get models into production, but keeping them there is is oftentimes,

what ends up being more difficult. And

in terms of your product management, product design, I'm curious what you what you've encountered as some of the,

unique challenges of building a

platform and tool chain

focused on supporting

machine learning use cases,

but with the anticipation that a lot of the people who are using it aren't necessarily going to be machine learning experts in their own. Right? And being able to

surface useful information and guide them in the proper direction to be able

to understand the kind of scope and ramifications of the errors that they're experiencing

and

help to direct them along the path of finding and implementing a resolution for those problems.

Yeah. And, you know, today, even though today it is mostly machine learning experts or at least, like, very machine learning familiar folks that use our platform and do this job in most companies,

we're definitely building towards a future where,

not everyone working on this process in companies needs to be machine learning experts. And so,

the main way that we try to do that is by

abstracting away a lot of the sort of things that are specific to models in our platform. And so,

oftentimes our platform, you know, feels more like using a data analytics platform where, you know, the thing that you really need to be able to understand is the data that's going to the model

and the sort of outcome that you're aiming for And then we help you, like, find subsets of data or find

subsets of users on which the model is not doing what it's supposed to be doing and then make it really easy to fold

those insights into training without necessarily needing to be the 1 who is, like,

designing the model architecture or writing the training loop yourself. And

in terms of the implementation, I'm wondering if you can talk through some of the kind of engineering that has gone into building Gantry

and some of the process for teams to incorporate it into their systems and workflow so that they can take advantage of the services that you're providing?

Yeah. For sure.

Under the surface of building something that,

is designed to feel to be simple to adopt and feel simple to use, there is a lot of, like, really complex

data engineering, and data infrastructure work that goes into that.

In particular, you know, we need to be able to ingest

large

streams of data in near real time. We've had we have customers

who send us, you know, 100 of millions or bordering on billions of records per day. We need to be able to process all that data efficiently.

We need to be able to handle delayed feedback.

So, you know, oftentimes in machine learning, you have a model that takes an input from a user, it makes a prediction, And then at some point in the future, you get feedback on whether that was the correct prediction or not. Maybe maybe it's, not too long in the future, like, a few seconds after you make the prediction, the user gives you implicit feedback by clicking on your recommendation or not clicking on your recommendation.

But in other cases, that feedback happens long in the future,

you know, weeks or sometimes even months later, like in the case of, you know, fraud detection, where you might find out if there was a chargeback on the the prediction that this transaction was fraudulent,

you know,

weeks after the prediction was made. So the second, like, really big sort of data infrastructure technical challenge is how do we join

these massive streams of data together,

and compute metrics on the fly,

even when we don't know when the feedback is going to occur.

Then, you know, a third big challenge is, you know, like, you have all of this, like, massive amount of data that's coming in. Some of that data is tabular data, but it's often also unstructured data. It's images, it's text,

it's things of that nature and we need to be able to allow users to query that data rapidly in real time to be able to analyze, you know, hey, we we we saw this, like, new failure mode occur all of a sudden. What's the cause of that? Like, what is what subset of our data is driving that? What,

you know, what user group is is driving that?

And so the the query side of it is, is a big challenge as well. And then lastly, you know, the there's just a ton of complexity in the machine learning

workflow.

So, you know, you have

model training, which is often done by different people than the people who deploy the model.

When you deploy models, you have to keep track of the dataset that those models are trained on. You have to,

understand, like, when the models were deployed, when you're rolling out a new version,

and so there's a there's a really difficult sort of metadata management challenge as well, which involves, like, basically gathering metadata from, all these different, often disparate parts of the system, putting it all into 1 place, so that the people that need to be able to manage this workflow have access to it and can also start to build automations and, like, orchestrations on top of it. So those are some of the big technical challenges that we're, that we're wrangling with. Not to mention things on the algorithmic side, like active learning, drift detection, borderline research questions as well.

And

1 of the trends that I've been seeing as I talk to folks who are working in the machine learning, in particular, ML operations and some of the kind of supporting infrastructure space

is the risk

of starting with a particular problem that you're trying to solve and then ending up branching into all of the other adjacent problems because it's still a very young space, and so it's not always clear kind of what are the appropriate

boundaries for a given product category.

And I'm curious as you've been going through this process,

what what are some of the kind of clear boundaries that you're setting for yourself to avoid some of that tendency to sprawl into

neighboring capabilities

and some of the pieces that you are consciously

not trying to target because of the need to be able to stay focused and deliver on the core problem that you're trying to solve?

Yeah. So we are entirely focused on the problem of what happens to models after you deploy the first model. Right? So our our thesis is that,

modeling is no longer the hard part for most companies.

You, as a data science team, you probably have figured out already how to build the machine learning model that solves your problem offline in the abstract on a fixed dataset with a fixed metric.

Our goal is to help you once you've deployed that model, help you maintain it, and ideally help you improve it over time

by using signals from production

like outcome data or user feedback data.

And so, you know, we help with, like, continual learning, active learning,

model maintenance, model retraining, reinforcement learning from human feedback,

problems like that that, you know, where you really only start to face them after the first model is deployed.

And so the parts of the workflow that we don't touch

are the experimentation workflow,

so, how do you,

build your first dataset? How do you get that data cleaned? How do you figure out what model works best on that dataset? How do you keep track of all those experiments?

And how do you deploy that initial model into production?

And, you know, I think there's a a relatively clean break there, but there's also some overlap. Right? Because,

you know, for example,

when you're doing continual learning,

you need to be able to retrain models.

Our assumption is that our users are gonna tell us how to retrain the models,

and we're just going to be, like, relative to that, we're just going to be an orchestrator and a data provider.

But that distinction, you know, is mostly pretty clean, but that that's sort of where we start to get into some overlap with

the the training side of the house.

Yeah. And particularly, as you get into some of the more specialized approaches to modeling, I'm curious to what you are focusing on as the

core capability that you want to support. Thinking in terms of things like,

you know, batch versus

continual training,

federated versus centralized learning, things like that?

Yeah.

We you know, for better or for worse,

mostly for better, we have not made a lot of assumptions in our platform about,

what is the type of machine learning that you're doing. So we have customers doing everything from audio processing to computer vision,

to,

you know, traditional ML use cases, recommender systems, search.

And then but I would say, like, probably the most common use case for our platform right now is NLP, And I think there's a there's a few reasons for that, but,

that's that's, I think, like, the most common thing that people are using us for. So, you know, we

really don't

make a lot of assumptions about what type of ML you're doing, and we've designed our platform in such a way that it can support a lot of different use cases.

The big advantage of that is for a lot of teams,

especially teams in larger companies,

you know, you you have you probably have a lot of different types of machine learning problems that you need to support.

For example, you might have some document digitization problems,

some, you know, internal knowledge managements, NLP, or search problems,

you know, as well as some, like, analytics, ML,

or recommendation

system problems. And so being able to have 1 system that supports all of those use cases, we found is is, is super valuable for teams

as they try to scale up to,

you know, maintaining a larger number of of production applications without necessarily ballooning the size of their team.

And as you have been building the system and working with some of your customers

and keeping track of what's happening in the broader environment, what are some of the most interesting or innovative or unexpected ways that you've seen teams approach this challenge of

building their models, deploying their models, and making sure that they're generating appropriate feedback to figure out what

Yeah. It's tough. I mean, nothing surprises me anymore.

So

yeah. I mean, I I guess 1 thing is interesting that's interesting is, I've started to see some a lot of teams

become,

really scrappy on the the infrastructure side.

So, you know, for example, like, there are a lot of companies that we talk to try to just do as much of their workload in their data warehouse as they can.

They, you know, maybe rather than, like, buy going out and

building or buying a feature store, they just use,

DBT models

to define feature transformations and then dump the output of those into Redis,

and that turns out to be good enough for them, to start out with. And,

Yeah. So I think, like, the thing I'm often impressed with is just the scrappiness of machine learning teams,

who often

have, you know, really should have, like, 10 different roles, but it's, you know, a small and lean team that's sort of forced to do all all the the different components of the job. Absolutely.

And

in terms of your experience of building gantry, I'm wondering what are some of the most interesting or unexpected or not

a software engineer,

not a product manager. And so I think,

I I think I knew this is gonna be going

hard going in, but I have gained an a new admiration

and respect,

for

people who build really great, like, delightful products.

Because, you know, if if you've never tried to do something like this before, I think the the amount of work that and, thoughtfulness that goes into making all the little decisions that you take for granted when you're using,

you know, a great developer tool or a great product of any kind,

there's just it's it's enormous in scale and magnitude. And so,

it's been it's been humbling to learn how to do that, hopefully learn how to do that well. And, it's definitely given me a lot of respect for, you know, some of the products that we use on a day to day basis.

And so for teams who are in the process of getting the models into production and figuring out an appropriate workflow to be able to continue to evolve and evaluate and

improve their models? What are the cases where Gantry is the wrong choice?

Yeah. I think, Gantry is the wrong choice if you're

doing machine learning more in this analytics

driven paradigm.

If you have models that are going to be used as part of a decision making process that don't need to be refreshed very often,

I I just don't think you really need something as complex as Gantry to

monitor and maintain those models because

you already have, you know, humans that are sort of evaluating,

the quality of those models as you go. But I think anytime,

especially that you have models that are gonna be interacting with end users,

and where your predictions are gonna be in in any way part of your product, whether that's, like, your actual,

customer facing product or whether that's a product that you're gonna be using internally to make things more efficient.

My recommend my strong recommendation is to deploy Gantry,

or a system like it as part of the first deployment that you do because there's just so much benefit. Like, most of the most of the gains that you'll see in model performance and,

more importantly, like, downstream task performance will come after you deploy the initial MVP model if you're doing things right.

And so I think, you know, having the right tooling set up for that and, being able to to do that process more efficiently is

incredibly valuable.

And

for people who are interested in staying apprised of the so so called state of the art or best practices around model development, model maintenance, what are some of the resources that you have found most helpful to be able to stay up to date, particularly since this is a very nascent and constantly moving space?

Yeah.

I I get most of my, like, I, you know, I what I found to be the most helpful, honestly, is just social media, following,

people who I respect on Twitter and LinkedIn.

I do my best to try to aggregate some of the things that I'm reading,

as part of our newsletter, and then we also teach classes on this through full stack deep learning. So the, you know, previous full stack deep learning that's focused on more traditional model development,

and, you know, how to how to do that effectively is all available for free online.

And if you're interested in learning how to do more of this, like,

ML, you know, 3 0 paradigm, this, like, large foundation model based approach to building AI applications, then, we'd encourage you to check out our upcoming boot camp in April.

Are there any other aspects of this space

of modeling as a practice and the ways that it's evolving or the work that you're doing at Gantry to support that that we didn't discuss yet that you'd like to cover before we close out the show?

I think that covered it pretty well. Okay. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to

the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today. I think the biggest barrier for adoption to machine learning today is,

going from a prototype to something that you can trust and that you can rely on to work well and consistently for your for your end users in production.

I think it's never been easier

than now to get started and to build something that, you know, makes a really cool demo,

or a really cool proof of concept,

and where you can start to see the promise of the technology.

And that's great because that's,

opening a lot of people's eyes to how,

how much that these technologies can improve your product experience. But the hard part for most companies today is how do you go from that to something that, you you know, that you can that you can rely on and that you feel comfortable with your customers interacting with on a day to day basis.

Awesome.

Well, thank you very much for taking the time today to join me and share the work that you're doing at Gantry and your perspective

on the

necessary skills and capabilities and,

areas of focus for machine learning model development and deployment. Appreciate all the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Thank you. I enjoyed it, and, I hope you enjoy the rest of your day as well.

Thank you for listening, and don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management,

and podcasts.init,

which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at the machine learning podcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts at themachinelearningpodcast.com

with your story. To help other people find the show, please leave a a review on Apple Podcasts and tell your friends and coworkers.

AI Engineering Podcast