Summary
In this episode of the AI Engineering Podcast Ron Green, co-founder and CTO of KungFu AI, talks about the evolving landscape of AI systems and the challenges of harnessing generative AI engines. Ron shares his insights on the limitations of large language models (LLMs) as standalone solutions and emphasizes the need for human oversight, multi-agent systems, and robust data management to support AI initiatives. He discusses the potential of domain-specific AI solutions, RAG approaches, and mixture of experts to enhance AI capabilities while addressing risks. The conversation also explores the evolving AI ecosystem, including tooling and frameworks, strategic planning, and the importance of interpretability and control in AI systems. Ron expresses optimism about the future of AI, predicting significant advancements in the next 20 years and the integration of AI capabilities into everyday software applications.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Seamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents.
- Your host is Tobias Macey and today I'm interviewing Ron Green about the wheels that we need for harnessing the power of the generative AI engine
- Introduction
- How did you get involved in machine learning?
- Can you describe what you see as the main shortcomings of LLMs as a stand-alone solution (to anything)?
- The most established vehicle for harnessing LLM capabilities is the RAG pattern. What are the main limitations of that as a "product" solution?
- The idea of multi-agent or mixture-of-experts systems is a more sophisticated approach that is gaining some attention. What do you see as the pro/con conversation around that pattern?
- Beyond the system patterns that are being developed, there is also a rapidly shifting ecosystem of frameworks, tools, and point solutions that plug in to various points of the AI lifecycle. How does that volatility hinder the adoption of generative AI in different contexts?
- In addition to the tooling, the models themselves are rapidly changing. How much does that influence the ways that organizations are thinking about whether and when to test the waters of AI?
- Continuing on the metaphor of LLMs and engines and the need for vehicles, where are we on the timeline in relation to the model T Ford?
- What are the vehicle categories that we still need to design and develop? (e.g. sedans, mini-vans, freight trucks, etc.)
- The current transformer architecture is starting to reach scaling limits that lead to diminishing returns. Given your perspective as an industry veteran, what are your thoughts on the future trajectory of AI model architectures?
- What is the ongoing role of regression style ML in the landscape of generative AI?
- What are the most interesting, innovative, or unexpected ways that you have seen LLMs used to power a "vehicle"?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working in this phase of AI?
- When is generative AI/LLMs the wrong choice?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Kungfu.ai
- Llama open generative AI models
- ChatGPT
- Copilot
- Cursor
- RAG == Retrieval Augmented Generation
- Mixture of Experts
- Deep Learning
- Random Forest
- Supervised Learning
- Active Learning
- Yann LeCun
- RLHF == Reinforcement Learning from Human Feedback
- Model T Ford
- Mamba selective state space
- Liquid Network
- Chain of thought
- OpenAI o1
- Marvin Minsky
- Von Neumann Architecture
- Attention Is All You Need
- Multilayer Perceptron
- Dot Product
- Diffusion Model
- Gaussian Noise
- AlphaFold 3
- Anthropic
- Sparse Autoencoder
[00:00:05]
Tobias Macey:
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Seamless data integration into AI applications often falls short, leading many to adopt RAG methods which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead.
Visit aiengineeringpodcast.com/cognee, that's c-o-g-n-e-e, today to learn more and elevate your AI apps and agents. Your host is Tobias Macey. And today, I'm interviewing Ron Green about the wheels that we need for harnessing the power of the generative AI engine. So, Ron, can you start by introducing yourself?
[00:01:10] Ron Green:
Yeah. I'm Ron Green. I'm cofounder and chief technology officer of KungFu AI.
[00:01:15] Tobias Macey:
And do you remember how you first got started working in the ML and AI space?
[00:01:19] Ron Green:
I do. I remember vividly. I was actually a computer science major at the University of Texas at Austin. And I was in my last semester, and it was, you know, just such a grind going through all of those, you know, really deeply technical courses. And I remember my last semester, heading into it, I was pretty burned out. And I remember thinking to myself, I don't know what I'm gonna do professionally, but it's probably not gonna involve software. Because I was at that point just I was thinking I needed a big change. And I had, like, one elective left, and I took an introduction to artificial intelligence.
And instantly, I mean, within, like, 2 weeks, I knew exactly what I wanted to do with the rest of my life. And, what's funny about this is that this is in the nineties, and, I remember thinking, oh, man. I'm too late. They've they've got everything figured out. You know, there were textbooks and all these, you know, really deep theorems and perspectives, and I thought, you know, shoot. Am I smart enough to do this? That probably doesn't even matter. I'm too late to do this anyway. You know, little did I know it was gonna be a, you know, like, another 20 years before it really took off, but that's how I got involved.
[00:02:40] Tobias Macey:
Yeah. It's definitely funny that the cycles that the industry goes through of we hit a certain peak and we think, oh, we've done as much as we can here, and we're on a glide path and then everything stalls out. And we have to go through another cycle of discovery to realize, oh, this actually was just a local maxima. There's a whole other mountain range to climb.
[00:03:03] Ron Green:
That's exactly right. I mean, funny story is I did, I did a master's in artificial intelligence at the University of Sussex in England. And I remember probably in, like, 2005, I was talking to a colleague. And I'd gotten out of AI at this point because, you know, in 2005, there was almost nothing happening outside academia. And I was talking to a professor a colleague of mine and mentioned that I'd specialized in artificial intelligence. And he his reaction was like, oh, you know, oh, tough, tough choice, man. That was a a a big waste of time. And in, o five, it kinda felt like that might have been the case.
[00:03:41] Tobias Macey:
Now here we are where AI is on the tips of everyone's tongues. It has bridged the divide and has now entered into the consumer arena where everybody is talking about AI in different contexts. And I'm wondering given the current arena of AI in the space of generative AI and LLMs, what you see as the main shortcomings of those LLMs as a standalone solution to basically anything?
[00:04:08] Ron Green:
Yeah. The the biggest the biggest problem we're having right now with large language models as a production tool is control. If you are using a chatbot, if you're inter you know, if you're using ChatGPT or you're using, you know, Llama or something like that, and you're interacting with it, if you are prompting it, looking at the output, assessing it, it works great. And the same thing goes for code assistant tools. Let's say, for example, Copilot or Cursor. You can prompt it to, you know, refactor something or generate some some code from scratch. But in all of these instances, nobody at this point in time would just take the output and and use it sight unseen. Right? You wouldn't have it write an email, generate code, and just push it commit it and push it up. And so control is the issue. And it and it's not it's it's not just, like, hallucinations, which I think are probably the biggest risks.
But we've we've done many generative AI production engagements at Kung Fu AI. And the main challenge is you may want to steer the model away from certain domains. Right? As a company, you may, I won't name the company we're working with, but we're doing this generative AI project for, sort of a photo, site where you could put together, you know, scrapbooks and and photo books and things like that. And the generative solution was able to auto organize the photos, understand what was in the photos, put them together, in a sort of themed flows, and then caption those with, you know, really great, examples. I remember there was one. There were a bunch of dogs on the page, and it and the the the caption it put was, like, furry friends forever. And that's just terrific.
But there were photos that us the people had taken, like, on vacation in Europe, and they they were around, churches and mosques, and it started outputting content around religion. And, you know, understandably, the client was like, I don't wanna touch that. Let's completely steer away from religion. And getting models aligned where they will say what you want and steer away from things you don't want is the hardest problem right now with LLMs in production.
[00:06:31] Tobias Macey:
I think that in many ways, they are effectively very talkative 5 year olds where they'll say lots of things. You can get them to do interesting stuff, but they're also gonna say things that you never saw coming.
[00:06:44] Ron Green:
That's exactly right. And so what we we we always tell our clients is generative AI, incredibly powerful, but at this stage, it really should be viewed as a human augmenter. So you can take things and transform content or generate new content, whatever it may be. But there almost always needs to be either a human in the loop or another model in the loop performing some type of assessment on that, because the lack of control, the lack of explicit hard control, is the challenging part about putting generative AI into production.
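A minimal Python sketch of the "another model in the loop" pattern Ron describes: screen each generated caption against topics the client wants avoided before anything ships. The topic list, prompts, and function names here are illustrative assumptions, not details from the engagement he mentions.

```python
RESTRICTED_TOPICS = ["religion", "politics", "medical advice"]  # illustrative list

def screen_output(generated_text: str, moderator_llm) -> bool:
    # moderator_llm: any callable that maps a prompt string to a completion string.
    prompt = (
        "Does the following text touch on any of these topics: "
        f"{', '.join(RESTRICTED_TOPICS)}? Answer YES or NO.\n\n{generated_text}"
    )
    return moderator_llm(prompt).strip().upper().startswith("NO")

def caption_or_escalate(photo_description: str, captioner_llm, moderator_llm) -> str:
    caption = captioner_llm(f"Write a short, upbeat caption for: {photo_description}")
    # Anything the moderator flags goes to a human instead of shipping automatically.
    return caption if screen_output(caption, moderator_llm) else "[needs human review]"
```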
[00:07:22] Tobias Macey:
And in preparing for this conversation, I came across a blog post that you have on your site that framed this in the metaphor of LLMs are very powerful engine that are in search of a vehicle to add that sense of control and steering. And so given that framing, the most established vehicle that we have for putting that LLM engine into a guiding context is the rag stack. And I'm curious what you see as some of the main limitations or shortcomings of that as a product and production oriented solution.
[00:07:56] Ron Green:
Yeah. The RAG approach, you know, retrieval augmented generation, has been fantastically successful. It's it's probably the number one generative approach that companies are taking out there in the world as sort of their first step into AI. And it it works, I would say, holistically pretty well most of the time. The the biggest challenges are, one, it's really only as good as the data you have. And so we will occasionally work with clients who, you know, may may have a little bit of misconceptions based upon their use of, like, ChatGPT and things like that. They may not understand that, you know, if the data is outdated or incomplete or poorly organized within their own infrastructure, you know, that is not something that a RAG pipeline can fix. Another limitation is large documents, I mean, really, really large documents can still present a problem. Because the context windows for LLMs are growing, but they struggle beyond certain sizes. And so if you're dealing with documents that have, you know, hundreds of thousands of words within them, that that can present a problem. And, of course, hallucinations, you know, and control, like we talked about, even with RAG, you do you do not have certitude that everything the model produces, even if it produces it with citations, will be accurate. We're finding this is actually really interesting. We're finding there's more and more evidence that using richer content markup is more effective. So for example, if you have HTML documents, use them as is. Don't don't pre process them into text. That additional formatting structure, there's increasing evidence, actually improves the outputs of these RAG pipelines.
So it's early days, but, and there are challenges there, but I would definitely recommend for most companies some type of RAG solution is a great, first entry into AI.
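For listeners who want the retrieval augmented generation pattern made concrete, here is a minimal Python sketch of the core loop: embed documents, retrieve the most similar chunks for a question, and ground the model's answer in them. The embedding function, vector store, and llm callable are placeholders for illustration, not tools named in the episode.

```python
import numpy as np

def embed(texts):
    # Stand-in embedding: hash-seeded random vectors. A real pipeline would
    # call an embedding model here instead.
    vectors = []
    for text in texts:
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vectors.append(rng.normal(size=384))
    return np.array(vectors)

class SimpleVectorStore:
    """Toy in-memory store: cosine similarity over document embeddings."""
    def __init__(self, documents):
        self.documents = documents
        self.vectors = embed(documents)

    def top_k(self, query, k=3):
        q = embed([query])[0]
        sims = self.vectors @ q / (
            np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9
        )
        return [self.documents[i] for i in np.argsort(-sims)[:k]]

def answer_with_rag(question, store, llm):
    # Retrieve relevant chunks, then ask the model to answer only from them.
    context = "\n\n".join(store.top_k(question))
    prompt = (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)  # llm: any callable that maps a prompt string to a completion
```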
[00:10:09] Tobias Macey:
And another more sophisticated approach to that guidance system for the LLMs is the idea of multiagent or mixture of experts where you have multiple LLMs working in concert to try and keep each other in check, which conceptually sounds reasonable. And it sounds like it would be effective, but still is subject to the challenge of hallucinations where if one of those models does go off the rails, then maybe it acts as a compounding factor to bring the whole system further afield than it would have gone on its own. And I'm wondering how you see the pro versus con conversation happening around that pattern and also the way that it exists in conjunction with that rag pattern.
[00:10:54] Ron Green:
Yeah. That's a great question. I'm I'm really excited about multi agent and mixture of expert approaches. This also obviously is sort of very close to the the momentum that is growing around agentic AI. So, you know, the pros are if you can if you can deal with sort of a mixture of expert or a multi agent scenario, you do get improved performance in those individual agents because you're essentially not asking for for one model to be good at everything. You can specialize and have experts or agents that are refined on just performing one set of tasks really, really well. It also means that you can scale a little bit more easily because it reduces the computational overhead and latency associated with that. It can be cost effective because each of those smaller models will cost less to train, will cost less on inference. And this matters probably less, but you do get improved interpretability if you need to because each of those smaller models could be designed in isolation to maximize or even to be explicitly interpretable.
And that can that can vary. If you're dealing with, like, product recommendations, it's probably not really critical. If you're dealing with loan decisioning, you might have regulatory requirements around, you know, explainability, interpretability. The the cons of this approach are, you know, the complexity. It's it's hard to orchestrate these complex systems. Latency sometimes can become an issue too because you you have all this task routing and this inter agent communication. Because the agents themselves are typically pretty lightweight, that's not going to be a deal breaker.
And then the last one is it's a little bit more of a wildcard, but, you know, you do have emergent behavior risk. Orchestration is complicated, and you are also dealing with agents acting, you know, potentially in unpredictable ways. And this, you know, kinda comes full circle to our original topic, which is we're at this really interesting stage of AI where the systems are incredibly powerful, but the fact that they're kinda black box and the fact that they do have these very impressive emergent behaviors makes control a little bit more difficult. And so I'm excited about this move, but it's definitely early days still.
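A minimal Python sketch of the routing idea behind a multi agent or mixture of experts setup as described above: a lightweight router hands each request to a narrow specialist, and a separate reviewer acts as the second model in the loop. The agent names, keyword router, and prompts are illustrative assumptions rather than any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Agent:
    name: str
    system_prompt: str
    llm: Callable[[str], str]  # any callable that maps a prompt string to a completion

    def run(self, task: str) -> str:
        return self.llm(f"{self.system_prompt}\n\nTask: {task}")

def route(task: str, agents: Dict[str, Agent], default: str = "general") -> Agent:
    # Toy keyword router; a real system might use a classifier or another LLM call.
    for keyword, name in [("caption", "captioning"), ("refactor", "coding")]:
        if keyword in task.lower() and name in agents:
            return agents[name]
    return agents[default]

def handle(task: str, agents: Dict[str, Agent], reviewer: Agent) -> str:
    draft = route(task, agents).run(task)
    # Second model in the loop: check the draft before anything is shipped.
    verdict = reviewer.run(
        "Review this draft for policy violations or restricted topics. "
        f"Reply APPROVE or REJECT with a reason.\n\nDraft:\n{draft}"
    )
    return draft if verdict.strip().upper().startswith("APPROVE") else "[escalated to a human]"
```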
[00:13:39] Tobias Macey:
Another aspect of the current ecosystem that we're in is that there's all this excitement around generative AI of, oh, it's so powerful. It will solve all of my problems that I think it also causes us to overlook a lot of the more well established ML patterns, whether that is something like a linear regression or a random forest or even deep learning in favor of these transformer models. And I also am curious how you're seeing some of the challenges around the technical and organizational sophistication required for generative AI, maybe leapfrogging the organization's actual capabilities where they never actually established that capacity for some of the earlier generations of ML.
[00:14:24] Ron Green:
I this is actually a a topic I'm pretty passionate about because I'm a big believer in the power of generative AI. I absolutely think it's a transformative capability. But I personally think at this stage in our maturation, most companies should be looking at what what I call domain specific AIs. And I I you know, it's really kind of immaterial whether you're you're looking at, like, as you said, deep learning or random forest or or, hell, you know, even even linear regression or something like that. The bigger point is that generative AI, as powerful as it is, is, as we've talked about, more difficult to control. And so the investment can be quite high aside from, you know, sort of like rag type systems.
What we typically advise our companies to do is really look at domain specific AIs. So for example, very, very often the best first step that companies can take into adopting artificial intelligence is build a capability that is very narrowly focused with a really high ROI. Like, we'll advise our companies, you know, at Kung Fu AI, we do just a ton of, you know, custom engineering and strategy development. We won't recommend any projects that don't have at least a 10x ROI. So, for example, we built a loan decision system for one of our clients. It went live earlier this summer. That thing is now trading 60% of their 2.6 billion in transactions per month, and that's all it can do. It's very narrow. It knows how to do one thing. It's, you know, it's not generative.
It's not broadly capable. It has no emergent capabilities, but this is going to transform their business. Their stock is up, like, 36% since that system was released. This is a publicly traded company. And so I would encourage everybody that's listening to this to absolutely explore generative AI approaches, but don't miss out on the opportunity for more narrow domain specific AI that will, frankly, cost less to implement and operate and may deliver many, many times more ROI than some type of LLM approach.
[00:16:51] Tobias Macey:
The other aspect of the generative AI ecosystem beyond the models and their capabilities and the patterns around them is also the ecosystem of tooling and frameworks and point solutions to the various problems in productionizing these LLMs. And I'm curious how you're seeing that volatility in the market, the current lack of maturity for some of those solutions, and the rapid pace of change influencing the ways that organizations are thinking about adoption of generative AI or their willingness to actually invest in a more generalized framework for LLM usage versus just let me just pay company x to do it all for me.
[00:17:39] Ron Green:
Right. Right. It's a it's a great question also. We deal with companies every day that are, you know, sort of early in their AI adoption curve. And we see a lot of the same things that you might expect, sort of decision paralysis. Like, where do you even start? Like, how do you assess how do you assess AI products? How do you assess the feasibility of different potential initiatives that the company might take on? How do you even how do you even figure out what AI initiatives might be feasible? And so one of the things we really recommend is a strategy to holistically look at your business and make assessments that are geared to the domain and the context that your company's within, and come up with a roadmap. And so, you know, at Kung Fu AI, we've been around 7 years now. I'd say in the first 5 years, we were mostly working with really early adopters, you know, people that were on the cutting edge, who had specific problems that they wanted to try to solve.
More and more now, we're talking with companies, and they say things like, we need some AI. Don't care what it is, but we've gotta have some AI to, you know, make Wall Street happy. And and that's a dangerous perspective. So start with strategy, look holistically, and be aware that, you know, AI products can be difficult to integrate. And the the reason is that almost all of the dominant powerful techniques within AI right now are deep learning based supervised learning algorithms. And so that requires, you know, strong data. And, you know, one of the one of the, challenges with with current AI is that, you know, garbage in, garbage out as far as data. And so it can take quite a bit to productionalize systems, if your data if your data story, if your data context is not very clean. And so custom solutions are very often the way to go initially on versus some productized solutions where you might be sold something that actually cannot quite live up to the hype, if that makes sense.
[00:20:03] Tobias Macey:
Given that context of supervised learning and your point about the challenges of data for these systems, what do you see as the viability of using the LLMs more for that data labeling, synthetic data generation method, feeding into maybe a deep learning system that is the actual production unit that you deploy, where you're using the engine to power your tooling system and run your assembly line so that you can build a bicycle?
[00:20:33] Ron Green:
Yeah. I love that. I love that idea. That is a powerful approach. It can actually work quite well for I'll give you some examples that I that I think everybody would you maybe enjoy hearing about. So for example, we've seen situations where companies are dealing with datasets that are really heterogeneous, and they need they they literally had, you know, hundreds and hundreds of different predicate rules that they had to manage and and keep up to date. We were able to build an LLM and then fine tune that, and it can make contextual decisions on extracting information and formatting the data, not only in all the situations that they were able to explicitly, you know, state with predicates before, but for analogous situations or situations they'd never even seen before. And that's, you know, again, the power of these AI techniques is that they will generalize.
And so that that, you know, really makes a a big difference. The other really interesting thing about these language models is that as you train them, like you said, you can use them to kind of bootstrap yourself into a more powerful net net solution. But you can also do that with a technique called active learning, where you you take a model, it may know nothing. You point it at a bunch of data, and you have a user evaluate the model's predictions on that data. And so imagine you're, you're trying to detect fraud as an example. And so the model will start off, you know, with just nothing more than a random guess. And as the user corrects the model's predictions, the model will then retrain on the feedback that the humans have given it and then go run its predictions on the dataset. And this is where the clever part happens.
It will go look at the entire dataset and find all of the inputs where it has the most uncertainty, where it's like 50/50. It'll flip a coin like, I'm 50% sure that's fraud, and I'm 50% sure it's not. And it has the highest entropy. And it will ask the humans to label those. And then by doing that with this active learning sort of feedback loop, you're essentially maximizing the amount of information that the model's learning on. And you can speed through datasets like that, and you're essentially bootstrapping the model. And it can auto label more and more of that dataset as it learns to generalize with the human.
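A minimal Python sketch of the uncertainty-sampling loop Ron just walked through, using a generic scikit-learn-style classifier on a fraud-like binary task. The dataset, the oracle function standing in for the human labeler, and the batch sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def entropy(probs):
    # Shannon entropy per example; highest where the model is closest to 50/50.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def active_learning_loop(X, oracle_label, n_rounds=5, batch_size=20, seed=0):
    # oracle_label stands in for the human reviewer correcting predictions.
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=batch_size, replace=False))
    labels = {i: oracle_label(X[i]) for i in labeled}  # seed batch should contain both classes
    model = LogisticRegression(max_iter=1000)

    for _ in range(n_rounds):
        model.fit(X[labeled], [labels[i] for i in labeled])
        probs = model.predict_proba(X)
        # Ask the human about the examples the model is least certain of.
        ranked = np.argsort(-entropy(probs))
        new = [int(i) for i in ranked if int(i) not in labels][:batch_size]
        for i in new:
            labels[i] = oracle_label(X[i])
        labeled.extend(new)
    return model, labeled
```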
[00:23:07] Tobias Macey:
That aspect of bringing the models into the process of building the models is interesting. I'm also seeing some of that being applied in the data engineering context of using the models to understand how to build the pipelines that feed into the data that powers the model. So it it's turning into the the Ouroboros system where I I also see some of the challenge there too of any error in those models act as a compounding factor where you need to be able to identify early on in the process where it's starting to go wrong because, otherwise, it's going to amplify that problem. And I know that I'm seeing that in the training of these transformer models and the foundation models too where as we're consuming more of the web to power the data that goes into the models, a lot of the data on the web is now being generated by those models. And so the models are sort of working together to make themselves dumber.
[00:24:03] Ron Green:
Yeah. Yeah. The, you know, this this whole idea of, of, you know, the entire world being drowned in synthetic data and the models, you know, kinda losing their way. I I'm I'm largely optimistic there on the on the grand scale because I I think that I think what we're gonna see and you you it's exactly what you articulated is we are finally at the stage with AI where models can now be used to train the next generation. And we you know, we've seen things analogous to this in technology before. Like, if you look at CPUs, you know, you know, the CPUs from the previous generations could be used to design more powerful chips.
And there was this sort of positive feedback loop. Every generation of chip was more powerful and allowed us to design more powerful chips. The difference with AI is that it can do this in a much, much tighter loop, and it can do it to itself. So these AI systems can actually be used to train the next AI system without a human in the loop. And, for example, you know, the Meta open source models, the Llama open source models from Meta: Llama 1 was used to train Llama 2, Llama 2 was used to train Llama 3, Llama 3.1 was used to train Llama 3.2. And right now, Yann LeCun, the chief AI scientist at Meta, actually just last week said they're training Llama 4 right now.
And, I remember this, he said 100,000 H100 GPUs. That's somewhere on the order of about $2,000,000,000 worth of GPUs. But much of that knowledge and guidance, especially on the reinforcement learning with human feedback phase, the sort of alignment phase after the pretraining phase, a lot of that is gonna be done by the previous models, by the Llama 3 family of models. It's amazing.
[00:26:10] Tobias Macey:
And to that point, beyond just the tooling and the frameworks, the models themselves are in a rapid state of flux with either larger models or more specific models being introduced constantly. How does that change the way that businesses are thinking about whether and when to invest in that AI capability, because of the fact that, oh, well, whatever model I select now is going to be outdated by next week?
[00:26:38] Ron Green:
It's true. And we see it even with the techniques, meaning problems that we solved 3 years ago that might have taken us 4 months, we could approach with, you know, an entirely new class of algorithm or modeling techniques. And not only achieve much, much better accuracy at the top line, but we probably could have done it much more quickly and more easily. You know, I think this is a classic sort of technological progression question, which is like, when is it too late to jump in? When is it too early? The way I think about it is there's gonna be a certain amount of investment that you have to write off in the long term just simply because things are moving too fast as a business. And I think businesses have to think about it that way because the benefits you're gonna get in the short term are gonna be more than sufficient to accommodate that write off. And the other fact is, you know, if you ignore AI, your competitors aren't. And so that is gonna put you sort of at a massive competitive disadvantage. And, again, this is the reason I would encourage people, don't jump in and just do something in AI because you feel like you have to or you feel like there's so much pressure. Be really thoughtful about it and make sure that there is a really, really strong ROI associated with any initiative. Because most companies haven't done anything, there's an enormous amount of low hanging fruit for almost any company to embrace AI in a way where it will really be immaterial if you have to go and replace some modeling system 3 years from now; you won't care because the return on that investment would have been so high. And I would just encourage companies to go in open eyed like this and move forward with the understanding that it's a rapidly advancing field.
[00:28:30] Tobias Macey:
And to that point of where we are in the timeline of AI and bringing us back around to that metaphor of needing vehicles, where do you see us on the timeline of the automobile? Are we at the point of the Model T yet? Or are we before that? Are we past that? I feel like to some degree, we're maybe at the point where we're at the Model A where everybody's building their own special hot rods.
[00:28:53] Ron Green:
I I think I think you're right about that. I don't think we're at the Model T yet. And the reason is that, you know, like we like we said at the beginning, I've been doing this a long time. And I get asked every now and then. There have been 2 AI winters. You know, why am I so confident that there won't be a 3rd winter? And it's and it's really simple. It's because it's a few things. One is we were always we were always overpromising on what AI could do before. We we would we would get good results, and we would extrapolate out, but the the curves didn't hold. And so then we would end up having overpromised and underdelivered. And you do that too many times with investors and adoption just stops.
We finally now have AI systems that can operate at the human level or superhuman level across almost all the tasks that you might care to think about, whether it's vision or speech or generative capabilities across almost any domain. Right? So we're not going back. That said, we're basically day 0 because there are really simple things we haven't even tried yet. Like, if you take the transformer architecture, it's got this quadratic computational complexity, which is really powerful, but it is not gonna scale. We're not gonna get to, context sizes in the trillions with that that type of architecture.
And there are simpler approaches coming out almost daily that are showing really, really great capabilities, like, I think, Mamba with its sort of selective state space model approach. And so the lack of control that we've mentioned as well, I think, is the reason. We'll be at the Model T stage once we've sussed out these control and interpretability issues, and then things are really gonna take off. And I genuinely think that most people have no idea how much AI is gonna mature in the next 20 years. It's going to be mind blowing. And to take one example, software development: it will be baked into every piece of software. Right? Because why would you not wanna have the ability for the tools you're working with to understand speech and have sophisticated vision capabilities and all that stuff? Right now it's the exception; it will become the rule. Just like every piece of software now has networking, Internet capabilities baked in and it would be silly to think that they would operate in isolation, we're gonna see the same thing with AI adoption. So I believe we're really at the early stages.
[00:31:26] Tobias Macey:
To that point of transformers being the dominant architecture for this current generation of Gen AI models, I know that we have been seeing a lot of reports recently of starting to hit the scaling limits of that transformer architecture where feeding more data, feeding more tokens is having diminishing returns in terms of the successive capabilities of those models. And given your perspective as somebody who's been in this industry for a while and seeing the successive generations of machine learning techniques and architectures, what are your thoughts on some of the future trajectory of AI model architectures? Are we going to continue trying to push those limits of the transformer architecture by throwing better hardware at it, or are we at an inflection point where we need to be looking at other approaches? I'm thinking in particular in terms of the liquid network techniques that came out of MIT recently.
[00:32:23] Ron Green:
You know, I'm I'm not convinced we're at the end of the scaling. I think I think it's I think we're seeing some slowdown, but it's not clear to me exactly how much slowdown and where we're on that curve. I I I I could be wrong. My guess is we're probably gonna see one more order of magnitude increase before we really have the slope shift downward. The the the things that I'm really excited about right now, though, and the reason I think that we're we're gonna continue to see really big performance improvements are we are just just now starting to look at sort of inference time, investments. So to date, it's all been about how big can we make these models, how much data can we pump into these models, and the scaling laws have held for about 10 orders of magnitude. You can go back over 20 years, and the scaling laws hold hold pretty well, hold pretty predictably.
Just in the last, you know, 18, 24 months have we started looking at the inference time and started focusing on and exploring the idea of, like, well, what if the model's inference compute wasn't fixed? What if the model was able to use techniques like chain of thought where, you know, you can think of it almost like the model's talking to itself, producing output, assessing whether it's on the right track, altering approaches, and iterating in that inference time compute cycle in ways that will allow it to improve itself and not just have some sort of, you know, fixed finite deterministic output. And the early results we're seeing from probably the leader on this, OpenAI, with their o1 models: the preview models are already showing much improved reasoning capabilities, and OpenAI claims the full o1 model will be staggeringly capable on that side. And, again, it's early days there. We've barely begun exploring this part of the spectrum.
So I think we're gonna see, if anything, modest slowdowns on the scaling, at least probably for the next 2 or 3 years before before we need to go back to the well.
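A minimal Python sketch of the kind of inference time loop described above: draft an answer, have the model critique it, and revise, spending more compute per query instead of a single fixed pass. The prompts and stopping rule are illustrative assumptions, not a description of how any particular lab implements this.

```python
def iterative_answer(question: str, llm, max_rounds: int = 3) -> str:
    # llm: any callable that maps a prompt string to a completion string.
    answer = llm(f"Think step by step, then answer:\n{question}")
    for _ in range(max_rounds):
        critique = llm(
            "Check the reasoning below for mistakes. Reply 'OK' if it is sound, "
            f"otherwise explain the flaw.\n\nQuestion: {question}\nAnswer: {answer}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judges its own answer to be on the right track
        answer = llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite a corrected answer."
        )
    return answer
```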
[00:34:44] Tobias Macey:
Another interesting aspect of all of the conversations that happen around AI is the language that we use to talk about how it operates, where you use the concept of reasoning in that example of chain of thought where there's also a lot of debate around the level of actual understanding or sentience or etcetera, whatever terminology you want to use to anthropomorphize these models. What are some of the challenges that that imposes in terms of how we actually think about applying these models where because we want to anthropomorphize things, we say, oh, well, the model understands the input that I'm giving it, so it gives me this output where, really, it's just sophisticated statistics, and the model has no concrete understanding of it in the way that we think about our understanding of the world around us. And so there have been investments in terms of things like cognitive AI where we start with maybe a more simplistic model, but we use means of trying to generate these contextual maps of the environment that it's executing in, the idea of GraphRAG where you have an underlying knowledge graph for being able to give some sort of semantic framing of the context that is being fed, the idea of memory being bolted onto the models in terms of the runtime to be able to contextualize things a bit better. And I'm wondering how you see some of those aspects of cognitive science and conceptual understanding being folded back into the ways that the models are built versus being just a bolt on to the runtime environment.
[00:36:27] Ron Green:
I love that question. I personally think that these large language models are hands down the most important scientific discovery of the 21st century. And what I mean by that is the emergent behavior that we get out of these large language models, which, again, you know, all they were trained to do is given some input, predict the next, you know, token, predict the next word. I don't think there's anybody on the planet who anticipated the type of capabilities we would see that that are that emerge at scale in these large language models. In fact, I have colleagues I've worked with, like, when the GPT 2 paper came out in 2020, didn't believe it. Thought, you know, some of the few shot examples within the within the paper were were impossible. It just couldn't be true. And so I say I say that I think that this is the most important scientific discovery of the 21st century because the emergent capabilities weren't predicted. And I think it tells us a lot about intelligence.
You know, if I say that the model is, you know, quote, unquote reasoning during inference time, I don't really mean that it's reasoning exactly in the same way we do. But that presupposes we even know how we reason, and we don't. And, you know, if you go back to the history of AI, it's really kinda funny. You know, in the fifties and sixties, they thought, oh, if we could build a computer, and that computer could play chess at, you know, the grand master level, it would certainly have, you know, AGI capabilities. And it turned out not to be true. We we solved that problem in the nineties, and things that we didn't think were complicated, things that we took for granted like our vision systems or our speech systems and our auditory systems, we just thought were relatively simple problems to solve. In fact, Minsky famously in the sixties gave, like, an undergrad at MIT a summer project to build a computer vision system because they didn't think it was that complicated.
And the reason is that we can't introspect our cognitive processes. And so, you know, our visual cortex is unbelievably complicated. So the point I'm trying to make is this. We don't really know how we see in a deep way. We can't introspect our consciousness or our thought process. So I don't know exactly how my own brain works. So it's kinda hard to speak deeply about the differences in what might be consciousness or what might be intelligence, what might be reasoning within AI when we can't even speak deeply about it with humans. All I know is that it is absolutely stunning that large language models have these emergent capabilities at scale, and I think we should keep exploring that and see how far we can push this.
[00:39:26] Tobias Macey:
And another pressure that AI is having on the world that we live in is in terms of the computing systems that we build where for a long time, we've had the Von Neumann architecture that has served us well. And now with the growth of AI both on the training side, but in particular on inference, which is from a distribution perspective, more ubiquitous, everybody needs to be able to do inference and particularly as we start to push things into the edge and on mobile devices. And I'm wondering how you see the engine of AI forcing us to rethink how we construct the drive train to be able to actually harness that power and some of the effect that it's having on the systems architecture at the compute level and how we think about actually building our computing systems?
[00:40:19] Ron Green:
That is a really difficult question to answer. There are all kinds of examples within AI right now where the techniques bend to accommodate the hardware. And then there are instances where the hardware will be modified to specialize in optimizing for some algorithmic advancement, the transformer being, you know, the best example of that. Right now, you know, it is absolutely fair to say that deep learning is the dominant approach, and within deep learning, transformers are the dominant approach. And if you look at a transformer, you know, one of the funny jokes is that the famous paper that the transformer architecture came out of was called Attention Is All You Need. But if you actually look at the amount of parameters within any transformer model, most of the parameters are still in the multilayer perceptrons that are at the end of each of the attention blocks. And so that's just linear algebra. That's just matrix multiplication. And so, I think for at least the foreseeable future, the bottleneck within AI is going to be that ability to do dot products at scale. And I think we're gonna see companies like NVIDIA just pouring more and more money and, you know, resources and time into seeing how much they can scale up and move to concurrent, parallel computation of these enormous, you know, matrix operations.
Beyond that, candidly, I just don't I just don't have a lot of visibility.
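A quick back-of-the-envelope check on that point about where the parameters live: for a textbook transformer block with model width d and an MLP expansion factor of 4, attention contributes roughly 4 d^2 weights while the feed-forward MLP contributes roughly 8 d^2, so the plain matrix multiply side dominates. Exact layouts vary by model; this Python sketch is illustrative, not a count for any specific architecture.

```python
def transformer_block_params(d_model: int, mlp_ratio: int = 4) -> dict:
    # Attention: Q, K, V, and output projections, each d_model x d_model (biases ignored).
    attention = 4 * d_model * d_model
    # MLP: up-projection to mlp_ratio * d_model, then back down to d_model.
    mlp = 2 * mlp_ratio * d_model * d_model
    total = attention + mlp
    return {"attention": attention, "mlp": mlp, "mlp_share": mlp / total}

# For example, transformer_block_params(4096) puts about two thirds of the
# block's weights in the MLP, matching the observation above.
```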
[00:42:00] Tobias Macey:
In your work of investing in this ecosystem of generative AI and helping organizations figure out how best to harness that motivating force of the LLM as engine, what are some of the most interesting or challenging or innovative ways that you have seen people try to conceive of the ways that those LLMs are able to have a transformative force on their organization or on the ecosystem in which they're operating?
[00:42:33] Ron Green:
Okay. I think probably the the thing that I'm most excited about within sort of that domain, the way you described it there, are not just pure LLMs, but these sort of multimodal language models. So these large language vision models. And we're seeing more and more examples of sort of multimodal models that are conditioned in a way that allow them to provide outputs and capabilities that, you know, frankly, it just seems like magic to me. So I'll give you maybe a couple examples. We're seeing companies take multimodal language models and condition them on 3D, sort of CAD-space-like problems. And then you can literally write in English, in text, what you want the CAD to generate and manipulate it with really pretty high success, you know, these AI generated meshes.
We're we're seeing this also at the intersection of health care on health care data for assessing that. There was there was actually, an article just, like 2 days ago in the New York Times talking about how LLMs were dramatically beating doctors in this relatively small case study of doing patient assessment. And even when, even when the doctors were paired with the language models and they were able to collaborate with them, language models actually outperform the doctors. And in their sort of, like, post evaluation of why, it was because the doctors came in with some preconceptions.
And when the language models pointed out flaws in that, they basically ignored it. And another example and again, this is why I say we're at, like, day 0. We are very early into this. Is, you know, there are now these multimodal models that that are capable of on the fly game generation. So there was an example of, like, a sort of a Minecraft generation game that you can type in and build the world, but its world model is really weak. So, like, if you're looking at a view and you turn and you do a 360, when you come back, it's changed. Like, in the moment, right, its world view is just very ephemeral. But it was conditioned on those Minecraft, contexts and and can generate, you know, at a at a high frame rate, you know, this imagined world already.
So I think that those are probably maybe the most radical examples, and you'll notice all those are kinda mostly toys still, and that's because it's just really, really early days.
[00:45:06] Tobias Macey:
And in your own work of navigating this space and trying to grasp the current phase of AI that we're in, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:45:20] Ron Green:
I think that I am continually surprised at the power of the diffusion approach. I think that may be the thing that I'm most excited about right now overall. You know, the diffusion approach, just for our listeners, is this idea of taking some input and adding some perturbation to it, typically noise. And so if you take maybe the canonical example of images, you take an image, you gradually add, let's say, Gaussian noise, and you train a model to be able to remove that noise at different stages of that process.
And at the end of the process with images, you know, you've just got an image that's just full noise. There's nothing there that's even remotely recognizable. But you've conditioned that model throughout this whole process on a text input that was embedded in such a way that the model can learn what the image contains semantically. And at the end of this, you can literally take a text string of something you wanna create that maybe has never existed in the universe and give that model an image with just pure noise and that string describing what you want, and you lie to the model. And you say, this actually is that image, it's just got a bunch of noise in it, and it will denoise it. That approach, we're seeing that work in robotics. We're seeing that work in protein folding. For example, AlphaFold 3, which is the just breathtakingly powerful computational biology model released by Google DeepMind this year. In fact, DeepMind CEO, Demis Hassabis, and John Jumper both won Nobel Prizes in chemistry for this work. It uses a diffusion model. What they do is they basically put in coordinates of the atoms, the different atoms within protein molecules, and they perturb it. And what this allows them to do is use what's called a Pairformer, a variation on a transformer, to generate the potential proteins that amino acid sequences will generate and then use the diffusion models to refine them, and they're getting fantastic accuracy on this. And so we're gonna be able to do, you know, genetic therapies, drug therapies, infectious disease therapies that are all going to be AI generated approaches, each one of which might have been a PhD dissertation. Right? You would have spent maybe 5 years trying to figure out how that protein folded. Now you can enter the amino acid sequence and go get a cup of coffee and come back and have the answer. So I think the diffusion approach right now is the most important thing happening within sort of architectural advancements within AI.
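A minimal Python sketch of the forward noising and training objective behind the diffusion approach just described: gradually add Gaussian noise to an input and train a model to predict that noise, conditioned on a description. The noise schedule and the placeholder denoising network are illustrative assumptions; real systems, including AlphaFold 3's diffusion module, are far more elaborate.

```python
import numpy as np

def make_noise_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative product of (1 - beta_t), the standard DDPM-style schedule.
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)

def noisy_sample(x0, t, alphas_cumprod, rng):
    # Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise
    noise = rng.normal(size=x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise, noise

def diffusion_training_step(model, x0, condition, alphas_cumprod, rng):
    # Pick a random timestep, noise the input, and ask the model to predict the
    # added noise given the noisy input, the timestep, and the text condition.
    t = int(rng.integers(len(alphas_cumprod)))
    x_t, noise = noisy_sample(x0, t, alphas_cumprod, rng)
    predicted = model(x_t, t, condition)      # placeholder denoising network
    return np.mean((predicted - noise) ** 2)  # simple noise-prediction MSE loss
```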
[00:48:17] Tobias Macey:
Given all of the excitement and fervor over generative AI as a solution to whatever problem domain you want to introduce it to, what are the cases where you would advise against the application of generative AI or LLMs?
[00:48:34] Ron Green:
Anytime you need absolute certitude, I would I would say you need to be very careful. Now if you're willing to have a human in the loop, which I would argue, you absolutely should with almost almost any generative approach right now then you're fine. But, you know, you you you definitely would not wanna live in a world where, you know, the doctor comes to you and says, well, we need to perform surgery. And you say, why? And the doctor says, I don't know. But, you know, the AI model told me that's what we need to do. So generative language models, etcetera, incredibly powerful.
At this stage, treat them as human augmentations, and you can go to town. You you can you can build really, really powerful systems. Just avoid them as sources of truth at this point because we're still struggling with control.
[00:49:26] Tobias Macey:
Are there any other aspects of LLMs and the vehicles that we need to build for them or the aspects of control and challenges around that, or just your experience working in this space, that we didn't discuss that you would like to cover before we close out the show?
[00:49:41] Ron Green:
Probably the probably the only area that we didn't discuss that I'm pretty excited about is interpretability. And in particular, I think the work from Anthropic over the last year has been fascinating. They're using sparse autoencoders to really dig in and try to understand how these large language models are representing different concepts inside the parameter space. And they have the famous example where they were able to isolate, like, a concept like the Golden Gate Bridge in San Francisco. And they found some really fascinating things. One, that that concept was spread out across many neurons within the model.
Two, that it didn't matter what language you were operating in, whether it was English or Korean or Russian, the same representation was used across those languages, including images. So for a multimodal language model, they found that the Golden Gate Bridge image capabilities also use the same neurons. And then lastly, they did this I just think this is so fascinating. What they did is they asked the model, you know, to describe what it looked like physically. And the model said, well, you know, I have no physical form. I'm an AI program, etcetera. And then they manipulated the model, and they took the neurons that they'd learned encoded the concept of the Golden Gate Bridge, and they forced those to output at 10 times their normal level.
And asked the question again. And the model came back and said, oh, I'm the Golden Gate Bridge, and I have, you know, this shape and this form and this color. And so you could manipulate the model to say things you want. And so this is, I think, a very, very major step forward in interpretability and explainability, and I think that this will bear fruit over the next 5 years in a big way. And it will allow us to not only get around some of the control issues we're seeing right now, but it will also make these models much more likely to be used in domains where explainability and interpretability, like medical cases, are just, you know, nonnegotiable.
It absolutely has to be there. So I'm I'm super excited about that stuff.
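A minimal Python sketch of the sparse autoencoder idea behind that interpretability work: train an overcomplete autoencoder with an L1 sparsity penalty on a model's internal activations so that individual learned features line up with concepts, then amplify a feature to steer behavior, in the spirit of the Golden Gate Bridge example. Dimensions, hyperparameters, and the steering helper are illustrative assumptions, not Anthropic's actual implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_activation: int, d_features: int):
        super().__init__()
        # Overcomplete: many more learned features than activation dimensions.
        self.encoder = nn.Linear(d_activation, d_features)
        self.decoder = nn.Linear(d_features, d_activation)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse, mostly-zero feature activations
        return self.decoder(features), features

def train_step(sae, activations, optimizer, l1_weight=1e-3):
    # activations: a batch of hidden-state vectors captured from an LLM's forward pass.
    reconstruction, features = sae(activations)
    loss = ((reconstruction - activations) ** 2).mean() + l1_weight * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def amplify_feature(sae, activation, feature_index, scale=10.0):
    # Once a feature is identified with a concept, scale it up and decode the
    # edited activation, which can then be fed back into the model's forward pass.
    _, features = sae(activation)
    features = features.clone()
    features[..., feature_index] *= scale
    return sae.decoder(features)
```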
[00:51:57] Tobias Macey:
Yeah. The visibility into the internal state, I think, is definitely a very important area of investment that we need to dig into. So I'm excited to see more progress in that space. So, yeah, definitely excited to see where things go from here when we get to the point of the Model T and when we progress to the point where we actually have some of the current generation of vehicles where they have all of the bells and whistles of safety features, and it knows when I'm about to park too close to the guardrail or what have you and starts beeping at me. So
[00:52:31] Ron Green:
Exactly. Yeah. The the these models are really powerful and smart. We need to we need to, get them to be a little more reliable.
[00:52:40] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today?
[00:52:57] Ron Green:
I think the biggest limitations are the 2 things we've hit on, which are control and interpretability. And they are not deal breakers, but they are, I think, limiting the velocity of adoption in different domains where we really need them. But I'm absolutely optimistic that we'll figure that out. It is not an exaggeration to say that I think as a part of this journey towards understanding in a deeper way the way these large deep learning systems work, and as we make them less of a black box, we are simultaneously probably going to start understanding how our own brain works. It'll probably go in tandem. And even though, you know, we can build jets and they don't flap their wings, you know, there are many different ways to fly. I think that's also true with intelligence, but I think we'll probably be surprised to find there are going to be a lot more overlaps than we initially suspected.
[00:54:05] Tobias Macey:
Well, thank you very much for taking the time today to join me and share your experience and expertise in the space and your perspective on where we are in the journey of AI adoption and AI capabilities and some of the areas of investment that we need to make to improve the operability of these models. So thank you again for taking the time and for the work that you're doing to help organizations tackle those problems, and I hope you enjoy the rest of your day.
[00:54:30] Ron Green:
Thank you so much. This was a really, really fun conversation.
[00:54:38] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Seamless data integration into AI applications often falls short, leading many to adopt RAG methods which come with high costs, complexity, and limited scalability. Cogni offers a better solution with its open source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cogni enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data and LLM apps without unnecessary overhead.
Visit aiengineeringpodcast.com/cognee, that's c-o-g-n-e-e, today to learn more and elevate your AI apps and agents. Your host is Tobias Macey. And today, I'm interviewing Ron Green about the wheels that we need for harnessing the power of the generative AI engine. So, Ron, can you start by introducing yourself?
[00:01:10] Ron Green:
Yeah. I'm Ron Green. I'm cofounder and chief technology officer of KungFu AI.
[00:01:15] Tobias Macey:
And do you remember how you first got started working in the ML and AI space?
[00:01:19] Ron Green:
I do. I remember vividly. I was actually a computer science major at the University of Texas at Austin. And I was in my last semester, and it was, you know, just such a grind going through all of those, you know, really deeply technical courses. And I remember my last semester, heading into it, I was pretty burned out. And I remember thinking to myself, I don't know what I'm gonna do professionally, but it's probably not gonna involve software. Because I was at that point just I was thinking I needed a big change. And I had, like, one elective left, and I took an introduction to artificial intelligence.
And instantly, I mean, within, like, 2 weeks, I knew exactly what I wanted to do with the rest of my life. And what's funny about this is that this was in the nineties, and I remember thinking, oh, man, I'm too late. They've got everything figured out. You know, there were textbooks and all these, you know, really deep theorems and perspectives, and I thought, you know, shoot, am I smart enough to do this? That probably doesn't even matter. I'm too late to do this anyway. You know, little did I know it was gonna be, you know, like, another 20 years before it really took off, but that's how I got involved.
[00:02:40] Tobias Macey:
Yeah. It's definitely funny that the cycles that the industry goes through of we hit a certain peak and we think, oh, we've done as much as we can here, and we're on a glide path and then everything stalls out. And we have to go through another cycle of discovery to realize, oh, this actually was just a local maxima. There's a whole other mountain range to climb.
[00:03:03] Ron Green:
That's exactly right. I mean, funny story is I did a master's in artificial intelligence at the University of Sussex in England. And I remember, probably in, like, 2005, I was talking to a colleague. And I'd gotten out of AI at this point because, you know, in 2005, there was almost nothing happening outside academia. And I was talking to a professor, a colleague of mine, and mentioned that I'd specialized in artificial intelligence. And his reaction was like, oh, you know, tough choice, man. That was a big waste of time. And in '05, it kinda felt like that might have been the case.
[00:03:41] Tobias Macey:
Now here we are where AI is on the tips of everyone's tongues. It has bridged the divide and has now entered into the consumer arena where everybody is talking about AI in different contexts. And I'm wondering given the current arena of AI in the space of generative AI and LLMs, what you see as the main shortcomings of those LLMs as a standalone solution to basically anything?
[00:04:08] Ron Green:
Yeah. The biggest problem we're having right now with large language models as a production tool is control. If you are using a chatbot, if you're using ChatGPT or you're using, you know, Llama or something like that, and you're interacting with it, if you are prompting it, looking at the output, assessing it, it works great. And the same thing goes for code assistant tools, let's say, for example, Copilot or Cursor. You can prompt it to, you know, refactor something or generate some code from scratch. But in all of these instances, nobody at this point in time would just take the output and use it sight unseen. Right? You wouldn't have it write an email or generate code and just commit it and push it up. And so control is the issue. And it's not just, like, hallucinations, which I think are probably the biggest risks.
But we've done many generative AI production engagements at KungFu AI. And the main challenge is you may want to steer the model away from certain domains. Right? As a company, you may, I won't name the company we're working with, but we're doing this generative AI project for sort of a photo site where you could put together, you know, scrapbooks and photo books and things like that. And the generative solution was able to auto organize the photos, understand what was in the photos, put them together in sort of themed flows, and then caption those with, you know, really great examples. I remember there was one. There were a bunch of dogs on the page, and the caption it put was, like, furry friends forever. And that's just terrific.
But there were photos that the people had taken, like, on vacation in Europe, and they were around churches and mosques, and it started outputting content around religion. And, you know, understandably, the client was like, I don't wanna touch that. Let's completely steer away from religion. And getting models aligned where they will say what you want and steer away from things you don't want is the hardest problem right now with LLMs in production.
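In practice, this kind of topic steering is often approximated with a lightweight check on the model's output before it reaches a user. Here is a minimal sketch under stated assumptions: `generate_caption` is a hypothetical stand-in for whatever LLM call produces the caption, and the keyword blocklist is purely illustrative, not the approach used on the project Ron describes (a production system would typically use a trained classifier or a second model as the judge).

```python
# Minimal sketch of steering generated captions away from sensitive topics.
# The blocklist and fallback behavior are illustrative assumptions only.

BLOCKED_TOPICS = {"religion", "church", "mosque", "temple", "politics"}

def generate_caption(photos: list[str]) -> str:
    # Placeholder for an LLM call that captions a themed page of photos.
    return "Furry friends forever"

def safe_caption(photos: list[str], fallback: str = "Memories from our trip") -> str:
    caption = generate_caption(photos)
    lowered = caption.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        # Steer away: fall back to a neutral caption or route to human review.
        return fallback
    return caption

if __name__ == "__main__":
    print(safe_caption(["dog1.jpg", "dog2.jpg"]))
```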
[00:06:31] Tobias Macey:
I think that in many ways, they are effectively very talkative 5 year olds where they'll say lots of things. You can get them to do interesting stuff, but they're also gonna say things that you never saw coming.
[00:06:44] Ron Green:
That's exactly right. And so what we always tell our clients is generative AI is incredibly powerful, but at this stage, it really should be viewed as a human augmenter. So you can take things and transform content or generate new content, whatever it may be. But there almost always needs to be either a human in the loop or another model in the loop performing some type of assessment on that, because the lack of control, the lack of explicit hard control, is the challenging part about putting generative AI into production.
[00:07:22] Tobias Macey:
And in preparing for this conversation, I came across a blog post that you have on your site that framed this in the metaphor of LLMs as a very powerful engine in search of a vehicle to add that sense of control and steering. And so given that framing, the most established vehicle that we have for putting that LLM engine into a guiding context is the RAG stack. And I'm curious what you see as some of the main limitations or shortcomings of that as a product and production oriented solution.
[00:07:56] Ron Green:
Yeah. The RAG approach, you know, retrieval augmented generation, has been fantastically successful. It's probably the number one generative approach that companies are taking out there in the world as sort of their first step into AI. And it works, I would say, holistically pretty well most of the time. The biggest challenges are, one, it's really only as good as the data you have. And so we will occasionally work with clients who, you know, may have a little bit of misconceptions based upon their use of, like, ChatGPT and things like that. They may not understand that, you know, if the data is outdated or incomplete or poorly organized within their own infrastructure, that is not something that a RAG pipeline can fix. Another limitation is large documents, I mean, really, really large documents can still present a problem. The context windows for LLMs are growing, but they struggle beyond certain sizes. And so if you're dealing with documents that have, you know, hundreds of thousands of words within them, that can present a problem. And, of course, hallucinations, you know, and control, like we talked about. Even with RAG, you do not have certitude that everything the model produces, even if it produces it with citations, will be accurate. We're also finding, and this is actually really interesting, there's more and more evidence that using richer content markup is more effective. So, for example, if you have HTML documents, use them as is. Don't preprocess them into text. That additional formatting structure, there's increasing evidence, actually improves the outputs of these RAG pipelines.
So it's early days, and there are challenges there, but I would definitely recommend for most companies some type of RAG solution as a great first entry into AI.
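For readers who have not built one, the basic shape of a RAG pipeline is fairly compact. The following is a minimal sketch, with a toy bag-of-words retriever purely for illustration; a production system would use an embedding model and a vector store, and `call_llm` is a hypothetical stand-in for whatever model API you actually use.

```python
import math
from collections import Counter

DOCS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 5pm Central, Monday through Friday.",
    "Enterprise plans include single sign-on and audit logging.",
]

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real systems use dense embedding models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    # Hypothetical model call; swap in your provider's client here.
    return f"[LLM answer grounded in:\n{prompt}]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("When can I get a refund?"))
```

The key design point is that the model only ever sees the retrieved context, which is why the quality of the underlying documents bounds the quality of the answers.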
[00:10:09] Tobias Macey:
And another more sophisticated approach to that guidance system for the LLMs is the idea of multiagent or mixture of experts where you have multiple LLMs working in concert to try and keep each other in check, which conceptually sounds reasonable. And it sounds like it would be effective, but still is subject to the challenge of hallucinations where if one of those models does go off the rails, then maybe it acts as a compounding factor to bring the whole system further afield than it would have gone on its own. And I'm wondering how you see the pro versus con conversation happening around that pattern and also the way that it exists in conjunction with that rag pattern.
[00:10:54] Ron Green:
Yeah. That's a great question. I'm really excited about multi agent and mixture of experts approaches. This also obviously is very close to the momentum that is growing around agentic AI. So, you know, the pros are, if you can deal with sort of a mixture of experts or a multi agent scenario, you do get improved performance in those individual agents because you're essentially not asking for one model to be good at everything. You can specialize and have experts or agents that are refined on just performing one set of tasks really, really well. It also means that you can scale a little bit more easily because it reduces the computational overhead and latency associated with that. It can be cost effective because each of those smaller models will cost less to train and will cost less on inference. And this matters probably less, but you do get improved interpretability if you need it, because each of those smaller models could be designed in isolation to maximize or even to be explicitly interpretable.
And that can vary. If you're dealing with, like, product recommendations, it's probably not really critical. If you're dealing with loan decisioning, you might have regulatory requirements around, you know, explainability and interpretability. The cons of this approach are, you know, the complexity. It's hard to orchestrate these complex systems. Latency sometimes can become an issue too, because you have all this task routing and this inter agent communication. Because the agents themselves are typically pretty lightweight, that's not going to be a deal breaker.
And then the last one is a little bit more of a wildcard, but, you know, you do have emergent behavior risk. Orchestration is complicated, and you are also dealing with agents acting, you know, potentially in unpredictable ways. And this kinda comes full circle to our original topic, which is we're at this really interesting stage of AI where the systems are incredibly powerful, but the fact that they're kinda black box and the fact that they do have these very impressive emergent behaviors makes control a little bit more difficult. And so I'm excited about this move, but it's definitely early days still.
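As a rough illustration of the orchestration being described here, the following is a minimal sketch of a router that dispatches a request to a specialized agent and then has a second check assess the draft before it is released. The agent functions, the keyword router, and the trivial critic are all hypothetical placeholders; real multi agent systems add retries, tool use, shared state, and LLM-based routing and evaluation.

```python
from typing import Callable

# Hypothetical specialist "agents"; in a real system each would be its own
# model, prompt, or service rather than a plain function.
def billing_agent(task: str) -> str:
    return f"Billing answer for: {task}"

def tech_support_agent(task: str) -> str:
    return f"Troubleshooting steps for: {task}"

AGENTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": tech_support_agent,
}

def route(task: str) -> str:
    # Toy keyword router; real routers are usually an LLM call or a trained classifier.
    billing_words = ("invoice", "refund", "charge", "payment")
    return "billing" if any(w in task.lower() for w in billing_words) else "support"

def critic(draft: str) -> bool:
    # Stand-in for a second model (or human) that assesses the draft before release.
    return bool(draft.strip())

def handle(task: str) -> str:
    draft = AGENTS[route(task)](task)
    return draft if critic(draft) else "Escalated to a human reviewer."

print(handle("I was double charged on my last invoice"))
```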
[00:13:39] Tobias Macey:
Another aspect of the current ecosystem that we're in is that there's all this excitement around generative AI, of, oh, it's so powerful, it will solve all of my problems, that I think it also causes us to overlook a lot of the more well established ML patterns, whether that is something like a linear regression or a random forest or even deep learning, in favor of these transformer models. And I'm also curious how you're seeing some of the challenges around the technical and organizational sophistication required for generative AI maybe leapfrogging the organization's actual capabilities, where they never actually established that capacity for some of the earlier generations of ML.
[00:14:24] Ron Green:
This is actually a topic I'm pretty passionate about, because I'm a big believer in the power of generative AI. I absolutely think it's a transformative capability. But I personally think at this stage in our maturation, most companies should be looking at what I call domain specific AIs. And, you know, it's really kind of immaterial whether you're looking at, as you said, deep learning or random forests or, hell, even linear regression or something like that. The bigger point is that generative AI, as powerful as it is, is, as we've talked about, more difficult to control. And so the investment can be quite high aside from, you know, sort of RAG type systems.
What we typically advise our companies to do is really look at domain specific AIs. So, for example, very often the best first step that companies can take into adopting artificial intelligence is to build a capability that is very narrowly focused with a really high ROI. At KungFu AI, we do just a ton of, you know, custom engineering and strategy development, and we won't recommend any projects that don't have at least a 10x ROI. So, for example, we built a loan decision system for one of our clients. It went live earlier this summer. That thing is now handling 60% of their $2.6 billion in transactions per month. And that's all it can do. It's very narrow. It knows how to do one thing. It's, you know, it's not generative.
It's not broadly capable. It has no emergent capabilities, but this is going to transform their business. Their stock is up, like, 36% since that system was released. This is a publicly traded company. And so I would encourage everybody that's listening to this to absolutely explore generative AI approaches, but don't miss out on the opportunity for more narrow, domain specific AI that will, frankly, cost less to implement and operate and may deliver many, many times more ROI than some type of LLM approach.
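By way of contrast with the generative systems discussed above, a narrow, domain specific model is often just a well scoped classifier. The following is a minimal sketch using scikit-learn on invented features and synthetic data; the feature names and labels are illustrative assumptions only and have no connection to the client system mentioned here.

```python
# Hypothetical, minimal domain-specific model: approve/decline loan applications.
# All data and feature definitions are synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(650, 60, n),         # credit score
    rng.normal(0.3, 0.1, n),        # debt-to-income ratio
    rng.normal(55_000, 15_000, n),  # annual income
])
# Synthetic label: higher score and lower DTI make approval more likely.
y = ((X[:, 0] - 600) / 100 - X[:, 1] * 2 + rng.normal(0, 0.5, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```

The point of the sketch is scope, not sophistication: the model answers exactly one question, which is what makes it cheap to validate, monitor, and operate.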
[00:16:51] Tobias Macey:
The other aspect of the generative AI ecosystem, beyond the models and their capabilities and the patterns around them, is the ecosystem of tooling and frameworks and point solutions to the various problems in productionizing these LLMs. And I'm curious how you're seeing that volatility in the market, the current lack of maturity for some of those solutions, and the rapid pace of change influencing the ways that organizations are thinking about adoption of generative AI, or their willingness to actually invest in a more generalized framework for LLM usage versus just, let me just pay company X to do it all for me.
[00:17:39] Ron Green:
Right. Right. It's a great question also. We deal with companies every day that are, you know, sort of early in their AI adoption curve. And we see a lot of the same things that you might expect, sort of decision paralysis. Like, where do you even start? How do you assess AI products? How do you assess the feasibility of different potential initiatives that the company might take on? How do you even figure out what AI initiatives might be feasible? And so one of the things we really recommend is a strategy to holistically look at your business and make assessments that are geared to the domain and the context that your company's within, and come up with a roadmap. And, you know, at KungFu AI, we've been around 7 years now. I'd say in the first 5 years, we were mostly working with really early adopters, you know, people that were on the cutting edge, who had specific problems that they wanted to try to solve.
More and more now, we're talking with companies, and they say things like, we need some AI. We don't care what it is, but we've gotta have some AI to, you know, make Wall Street happy. And that's a dangerous perspective. So start with strategy, look holistically, and be aware that, you know, AI products can be difficult to integrate. And the reason is that almost all of the dominant, powerful techniques within AI right now are deep learning based, supervised learning algorithms. And so that requires, you know, strong data. One of the challenges with current AI is that it's garbage in, garbage out as far as data. And so it can take quite a bit to productionalize systems if your data story, if your data context, is not very clean. And so custom solutions are very often the way to go initially, versus some productized solutions where you might be sold something that actually cannot quite live up to the hype, if that makes sense.
[00:20:03] Tobias Macey:
Given that context of supervised learning and your point about the challenges of data for these systems, what do you see as the viability of using the LLMs more for that data labeling or synthetic data generation method, feeding into maybe a deep learning system that is the actual production unit that you deploy, where you're using the engine to power your tooling system and run your assembly line so that you can build a bicycle?
[00:20:33] Ron Green:
Yeah. I love that idea. That is a powerful approach. It can actually work quite well, and I'll give you some examples that I think everybody would maybe enjoy hearing about. So, for example, we've seen situations where companies are dealing with datasets that are really heterogeneous, and they literally had, you know, hundreds and hundreds of different predicate rules that they had to manage and keep up to date. We were able to take an LLM and fine tune it, and it can make contextual decisions on extracting information and formatting the data, not only in all the situations that they were able to explicitly, you know, state with predicates before, but for analogous situations or situations they'd never even seen before. And that's, you know, again, the power of these AI techniques is that they will generalize.
And so that, you know, really makes a big difference. The other really interesting thing about these language models is that as you train them, like you said, you can use them to kind of bootstrap yourself into a more powerful solution. But you can also do that with a technique called active learning, where you take a model, and it may know nothing. You point it at a bunch of data, and you have a user evaluate the model's predictions on that data. So imagine you're trying to detect fraud, as an example. The model will start off, you know, with nothing more than a random guess. And as the user corrects the model's predictions, the model will then retrain on the feedback that the humans have given it and then go run its predictions on the dataset. And this is where the clever part happens.
It will go look at the entire dataset and find all of the inputs where it has the most uncertainty, where it's, like, 50/50. It's essentially flipping a coin: I'm 50% sure that's fraud, and I'm 50% sure it's not. Those are the examples with the highest entropy, and it will ask the humans to label those. And then by doing that with this active learning feedback loop, you're essentially maximizing the amount of information that the model's learning on. And you can speed through datasets like that, and you're essentially bootstrapping the model. And it can auto label more and more of that dataset as it learns to generalize with the human.
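The uncertainty sampling loop described here can be written down in a few lines. Below is a minimal sketch with a logistic regression on synthetic data, where the human labeler is simulated by an oracle derived from the data generating process; the specifics are illustrative, not a production active learning setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 5))
true_w = rng.normal(size=5)
y_true = (X @ true_w + rng.normal(0, 0.5, 5000) > 0).astype(int)  # oracle ("human") labels

labeled = list(rng.choice(len(X), size=20, replace=False))        # tiny seed set
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

model = LogisticRegression()
for round_ in range(10):
    model.fit(X[labeled], y_true[labeled])            # retrain on what the "human" labeled
    probs = model.predict_proba(X[unlabeled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)                # closest to 50/50 = highest entropy
    query = [unlabeled[i] for i in np.argsort(uncertainty)[-25:]]  # most uncertain examples
    labeled.extend(query)                             # the oracle (human) labels them
    unlabeled = [i for i in unlabeled if i not in set(query)]
    print(f"round {round_}: {len(labeled)} labels, accuracy {model.score(X, y_true):.3f}")
```

Each round, the model spends its labeling budget where it is most confused, which is why accuracy tends to climb faster than it would with randomly chosen labels.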
[00:23:07] Tobias Macey:
That aspect of bringing the models into the process of building the models is interesting. I'm also seeing some of that being applied in the data engineering context of using the models to understand how to build the pipelines that feed into the data that powers the model. So it's turning into an Ouroboros system, where I also see some of the challenge that any error in those models acts as a compounding factor, where you need to be able to identify early on in the process where it's starting to go wrong, because, otherwise, it's going to amplify that problem. And I know that I'm seeing that in the training of these transformer models and the foundation models too, where as we're consuming more of the web to power the data that goes into the models, a lot of the data on the web is now being generated by those models. And so the models are sort of working together to make themselves dumber.
[00:24:03] Ron Green:
Yeah. Yeah. You know, this whole idea of the entire world being drowned in synthetic data and the models, you know, kinda losing their way. I'm largely optimistic there on the grand scale, because I think what we're gonna see, and it's exactly what you articulated, is that we are finally at the stage with AI where models can now be used to train the next generation. And we've seen things analogous to this in technology before. Like, if you look at CPUs, you know, the CPUs from the previous generations could be used to design more powerful chips.
And there was this sort of positive feedback loop. Every generation of chip was more powerful and allowed us to design more powerful chips. The difference with AI is that it can do this in a much, much tighter loop, and it can do it to itself. So these AI systems can actually be used to train the next AI system without a human in the loop. For example, with the Meta open source models, the Llama family: Llama 1 was used to train Llama 2, Llama 2 was used to train Llama 3, and Llama 3.1 was used to train Llama 3.2. And right now, Yann LeCun, the chief AI scientist at Meta, actually just last week said they're training Llama 4 right now.
And, I remember this, he said on 100,000 H100 GPUs. That's somewhere in the order of about $2 billion worth of GPUs. But much of that knowledge and guidance, especially on the reinforcement learning with human feedback phase, the sort of alignment phase after the pretraining phase, a lot of that is gonna be done by the previous models, by the Llama 3 family of models. It's amazing.
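The "models training the next generation" idea can be sketched at a toy scale as distillation: a stronger teacher labels unlabeled data, and a student trains on those synthetic labels. This is only an analogy to the frontier model pipelines mentioned above, not their actual recipe, and the models and data here are illustrative stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_labeled = rng.normal(size=(500, 10))
y_labeled = (X_labeled[:, 0] + X_labeled[:, 1] > 0).astype(int)   # small human-labeled set
X_unlabeled = rng.normal(size=(20_000, 10))                       # large unlabeled pool

# "Previous generation" teacher learns from the human labels...
teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_labeled, y_labeled)
synthetic_labels = teacher.predict(X_unlabeled)                   # ...then labels new data

# "Next generation" student trains on the teacher's synthetic labels.
student = LogisticRegression().fit(X_unlabeled, synthetic_labels)
print("student agreement with teacher:", student.score(X_unlabeled, synthetic_labels))
```

The compounding risk from the earlier question shows up directly here: any systematic error the teacher makes is inherited, at scale, by the student.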
[00:26:10] Tobias Macey:
And to that point, beyond just the tooling and the frameworks, the models themselves are in a rapid state of flux, with either larger models or more specific models being introduced constantly. How does that change the way that businesses are thinking about whether and when to invest in that AI capability, given the fact that, oh, well, whatever model I select now is going to be outdated by next week?
[00:26:38] Ron Green:
It's true. And we see it even with the techniques, meaning problems that we solved 3 years ago that might have taken us 4 months, we could approach with, you know, an entirely new class of algorithm or modeling techniques, and not only achieve much, much better accuracy at the top line, but we probably could have done it much more quickly and more easily. You know, I think this is a classic sort of technological progression question, which is, like, when is it too late to jump in? When is it too early? The way I think about it is there's gonna be a certain amount of investment that you have to write off in the long term simply because things are moving too fast, and as a business I think you have to think about it that way, because the benefits you're gonna get in the short term are gonna be more than sufficient to accommodate that write off. And the other fact is, you know, if you ignore AI, your competitors aren't, and that is gonna put you at a massive competitive disadvantage. And, again, this is the reason I would encourage people, don't jump in and just do something in AI because you feel like you have to or you feel like there's too much pressure. Be really thoughtful about it and make sure that there is a really, really strong ROI associated with any initiative. Because most companies haven't done anything, there's an enormous amount of low hanging fruit for almost any company to embrace AI in a way where that write off will really be immaterial. If you have to go and replace some modeling system 3 years from now, you won't care, because the return on that investment will have been so high. And I would just encourage companies to go in open eyed like this and move forward with the understanding that it's a rapidly advancing field.
[00:28:30] Tobias Macey:
And to that point of where we are in the timeline of AI, and bringing us back around to that metaphor of needing vehicles, where do you see us on the timeline of the automobile? Are we at the point of the Model T yet? Or are we before that? Are we past that? I feel like, to some degree, we're maybe at the point of the Model A, where everybody's building their own special hot rods.
[00:28:53] Ron Green:
I think you're right about that. I don't think we're at the Model T yet. And the reason is that, you know, like we said at the beginning, I've been doing this a long time. And I get asked every now and then, there have been 2 AI winters, so why am I so confident that there won't be a 3rd winter? And it's really simple. It's a few things. One is we were always overpromising on what AI could do before. We would get good results, and we would extrapolate out, but the curves didn't hold. And so then we would end up having overpromised and underdelivered. And you do that too many times and investment and adoption just stop.
We finally now have AI systems that can operate at the human level or superhuman level across almost all the tasks that you might care to think about, whether it's vision or speech or generative capabilities across almost any domain. Right? So we're not going back. That said, we're basically at day 0, because there are really simple things we haven't even tried yet. Like, if you take the transformer architecture, it's got this quadratic computational complexity, which is really powerful, but it is not gonna scale. We're not gonna get to context sizes in the trillions with that type of architecture.
And there are simpler approaches coming out almost daily that are showing really, really great capabilities, like, I think, Mamba, with its state space model approach. And the lack of control that we've mentioned as well, I think, is the other reason. We'll be at the Model T stage once we've sussed out these control and interpretability issues, and then things are really gonna take off. And I genuinely think that most people have no idea how much AI is gonna mature in the next 20 years. It's going to be mind blowing. And to take one example, in software development, it will be baked into every piece of software. Right? Because why would you not wanna have the ability for the tools you're working with to understand speech and have sophisticated vision capabilities and all that stuff? Right now it's the exception; it will become the rule. Just like every piece of software now has networking and Internet capabilities baked in, and it would be silly to think that it would operate in isolation, we're gonna see the same thing with AI adoption. So I believe we're really at the early stages.
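The quadratic cost mentioned above comes straight from the shape of the attention score matrix: every token attends to every other token, so compute and memory grow with the square of the sequence length. A tiny numpy illustration of that scaling, with projection weights omitted to keep the sketch short:

```python
import numpy as np

def naive_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention with learned projections omitted, to expose the n x n term."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                         # n x n matrix: the quadratic part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

_ = naive_attention(np.random.default_rng(0).normal(size=(64, 32)))  # small demo run

for n in (1_000, 10_000, 100_000):
    print(f"sequence length {n:>7,}: score matrix holds {n * n:,} entries")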
[00:31:26] Tobias Macey:
To that point of transformers being the dominant architecture for this current generation of Gen AI models, I know that we have been seeing a lot of reports recently of starting to hit the scaling limits of that transformer architecture, where feeding more data, feeding more tokens, is having diminishing returns in terms of the successive capabilities of those models. And given your perspective as somebody who's been in this industry for a while and seen the successive generations of machine learning techniques and architectures, what are your thoughts on some of the future trajectory of AI model architectures? Are we going to continue trying to push those limits of the transformer architecture by throwing better hardware at it, or are we at an inflection point where we need to be looking at other approaches? I'm thinking in particular in terms of the liquid network techniques that came out of MIT recently.
[00:32:23] Ron Green:
You know, I'm not convinced we're at the end of the scaling. I think we're seeing some slowdown, but it's not clear to me exactly how much slowdown and where we are on that curve. I could be wrong. My guess is we're probably gonna see one more order of magnitude increase before we really have the slope shift downward. The things that I'm really excited about right now, though, and the reason I think that we're gonna continue to see really big performance improvements, are that we are just now starting to look at inference-time investments. So to date, it's all been about how big can we make these models, how much data can we pump into these models, and the scaling laws have held for about 10 orders of magnitude. You can go back over 20 years, and the scaling laws hold pretty well, pretty predictably.
Just in the last, you know, 18 to 24 months have we started looking at inference time and started exploring the idea of, well, what if the model's inference compute wasn't fixed? What if the model was able to use techniques like chain of thought, where, you know, you can think of it almost like the model's talking to itself, producing output, assessing whether it's on the right track, altering approaches, and iterating in that inference-time compute cycle in ways that will allow it to improve itself and not just have some sort of, you know, fixed, finite, deterministic output? And the early results we're seeing from probably the leader on this, OpenAI with their o1 models, the preview models are already showing much improved reasoning capabilities, and OpenAI claims the full o1 model will be staggeringly capable on that side. And, again, it's early days there. We've barely begun exploring this part of the spectrum.
So I think we're gonna see, if anything, modest slowdowns on the scaling, at least probably for the next 2 or 3 years, before we need to go back to the well.
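A very rough sketch of the inference-time loop described above: draft, critique, and revise until a check passes or a compute budget runs out. `call_llm` and `looks_good` are hypothetical stand-ins for a model API and a verifier, and real reasoning models fold this behavior into training rather than running it as an external loop like this.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model call; replace with your provider's client.
    return f"draft response to: {prompt}"

def looks_good(answer: str) -> bool:
    # Stand-in for a verifier: a second model, unit tests, a schema check, etc.
    return "draft" not in answer

def solve(task: str, budget: int = 4) -> str:
    answer = call_llm(task)
    for _ in range(budget):                      # spend extra inference-time compute
        if looks_good(answer):
            break
        critique = call_llm(f"Critique this answer to '{task}': {answer}")
        answer = call_llm(f"Revise the answer using this critique: {critique}")
    return answer

print(solve("Plan a 3-step migration from Postgres 12 to 16"))
```

The design choice being explored industry-wide is exactly the `budget` parameter: letting harder problems consume more forward passes instead of giving every prompt the same fixed amount of compute.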
[00:34:44] Tobias Macey:
Another interesting aspect of all of the conversations that happen around AI is the language that we use to talk about how it operates, where you used the concept of reasoning in that example of chain of thought, and where there's also a lot of debate around the level of actual understanding or sentience or whatever terminology you want to use to anthropomorphize these models. What are some of the challenges that that imposes in terms of how we actually think about applying these models, where, because we want to anthropomorphize things, we say, oh, well, the model understands the input that I'm giving it, so it gives me this output, where, really, it's just sophisticated statistics, and the model has no concrete understanding of it in the way that we think about our understanding of the world around us? And so there have been investments in terms of things like cognitive AI, where we start with maybe a more simplistic model but we use means of trying to generate these contextual maps of the environment that it's executing in; the idea of GraphRAG, where you have an underlying knowledge graph for being able to give some sort of semantic framing of the context that is being fed; and the idea of memory being bolted onto the models in terms of the runtime to be able to contextualize things a bit better. And I'm wondering how you see some of those aspects of cognitive science and conceptual understanding being folded back into the ways that the models are built, versus being just a bolt-on to the runtime environment.
[00:36:27] Ron Green:
I love that question. I personally think that these large language models are hands down the most important scientific discovery of the 21st century. And what I mean by that is the emergent behavior that we get out of these large language models, which, again, you know, all they were trained to do is, given some input, predict the next token, predict the next word. I don't think there's anybody on the planet who anticipated the type of capabilities that emerge at scale in these large language models. In fact, I have colleagues I've worked with who, when the GPT-3 paper came out in 2020, didn't believe it. They thought some of the few shot examples within the paper were impossible, that it just couldn't be true. And so I say that I think that this is the most important scientific discovery of the 21st century because the emergent capabilities weren't predicted. And I think it tells us a lot about intelligence.
You know, if I say that the model is, quote, unquote, reasoning during inference time, I don't really mean that it's reasoning exactly in the same way we do. But that presupposes we even know how we reason, and we don't. And, you know, if you go back to the history of AI, it's really kinda funny. In the fifties and sixties, they thought, oh, if we could build a computer, and that computer could play chess at, you know, the grandmaster level, it would certainly have, you know, AGI capabilities. And it turned out not to be true. We solved that problem in the nineties, and things that we didn't think were complicated, things that we took for granted like our vision systems or our speech and auditory systems, we just thought were relatively simple problems to solve. In fact, Minsky famously in the sixties gave an undergrad at MIT a summer project to build a computer vision system, because they didn't think it was that complicated.
And the reason is that we can't introspect our cognitive processes. Our visual cortex is unbelievably complicated. So the point I'm trying to make is this. We don't really know how we see in a deep way. We can't introspect our consciousness or our thought process. I don't know exactly how my own brain works. So it's kinda hard to speak deeply about the differences in what might be consciousness or what might be intelligence or what might be reasoning within AI when we can't even speak deeply about it with humans. All I know is that it is absolutely stunning that large language models have these emergent capabilities at scale, and I think we should keep exploring that and see how far we can push this.
[00:39:26] Tobias Macey:
And another pressure that AI is having on the world that we live in is in terms of the computing systems that we build where for a long time, we've had the Von Neumann architecture that has served us well. And now with the growth of AI both on the training side, but in particular on inference, which is from a distribution perspective, more ubiquitous, everybody needs to be able to do inference and particularly as we start to push things into the edge and on mobile devices. And I'm wondering how you see the engine of AI forcing us to rethink how we construct the drive train to be able to actually harness that power and some of the effect that it's having on the systems architecture at the compute level and how we think about actually building our computing systems?
[00:40:19] Ron Green:
That is a really difficult question to answer. There are all kinds of examples within AI right now where the techniques bend to accommodate the hardware. And then there are instances where the hardware will be modified to specialize in optimizing for some algorithmic advancement, the transformer being, you know, the best example of that. Right now, it is absolutely fair to say that deep learning is the dominant approach, and within deep learning, transformers are the dominant approach. And if you look at a transformer, one of the funny jokes is that the famous paper that the transformer architecture came out of was called Attention Is All You Need, but if you actually look at the number of parameters within any transformer model, most of the parameters are still in the multilayer perceptrons that are at the end of each of the attention blocks. And that's just linear algebra. That's just matrix multiplication. And so I think for at least the foreseeable future, the bottleneck within AI is going to be that ability to do dot products at scale. And I think we're gonna see companies like NVIDIA just pouring more and more money and, you know, resources and time into seeing how much they can scale up and move to concurrent, parallel computation of these enormous matrix operations.
Beyond that, candidly, I just don't have a lot of visibility.
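The point about where the parameters live is easy to check with back-of-the-envelope arithmetic: in a standard transformer block with model width d and the common feed-forward expansion factor of 4, the two MLP matrices hold roughly 8d squared parameters against roughly 4d squared for the attention projections, so the MLP is about two thirds of the block. A quick sketch (ignoring biases and assuming the usual head-dimension bookkeeping):

```python
def block_params(d_model: int, ff_mult: int = 4) -> tuple[int, int]:
    attention = 4 * d_model * d_model            # Q, K, V, and output projections
    mlp = 2 * d_model * (ff_mult * d_model)      # up-projection and down-projection
    return attention, mlp

for d in (768, 4096, 8192):
    attn, mlp = block_params(d)
    share = mlp / (attn + mlp)
    print(f"d_model={d}: attention {attn:,} vs MLP {mlp:,} ({share:.0%} of block in the MLP)")
```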
[00:42:00] Tobias Macey:
In your work of investing in this ecosystem of generative AI and helping organizations figure out how best to harness that motivating force of the LLM as engine, what are some of the most interesting or challenging or innovative ways that you have seen people try to conceive of the ways that those LLMs are able to have a transformative force on their organization or on the ecosystem in which they're operating?
[00:42:33] Ron Green:
Okay. I think probably the thing that I'm most excited about within that domain, the way you described it there, is not just pure LLMs, but these sort of multimodal language models, so these large language vision models. And we're seeing more and more examples of multimodal models that are conditioned in a way that allows them to provide outputs and capabilities that, you know, frankly, just seem like magic to me. So I'll give you maybe a couple examples. We're seeing companies take multimodal language models and condition them on 3D, sort of CAD-space-like problems. And then you can literally write in English, in text, what you want the CAD to generate and manipulate it with really pretty high success, you know, these AI generated meshes.
We're seeing this also at the intersection of health care, on health care data, for assessment. There was actually an article just, like, 2 days ago in the New York Times talking about how LLMs were dramatically beating doctors in this relatively small case study of doing patient assessment. And even when the doctors were paired with the language models and were able to collaborate with them, the language models actually outperformed the doctors. And in their sort of post evaluation of why, it was because the doctors came in with some preconceptions.
And when the language models pointed out flaws in that, they basically ignored it. And another example, and again, this is why I say we're at, like, day 0, we are very early into this, is, you know, there are now these multimodal models that are capable of on-the-fly game generation. So there was an example of, like, a sort of Minecraft generation game that you can type into and build the world, but its world model is really weak. So, like, if you're looking at a view and you turn and you do a 360, when you come back, it's changed. Like, in the moment, right, its world view is just very ephemeral. But it was conditioned on those Minecraft contexts and can generate, you know, at a high frame rate, you know, this imagined world already.
So I think that those are probably maybe the most radical examples, and you'll notice all those are kinda mostly toys still, and that's because it's just really, really early days.
[00:45:06] Tobias Macey:
And in your own work of navigating this space and trying to grasp the current phase of AI that we're in, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:45:20] Ron Green:
I think that I am continually surprised at the power of the diffusion approach. I think that may be the thing that I'm most excited about right now overall. The diffusion approach, just for our listeners, is this idea of taking some input and adding some perturbation to it, typically noise. And so if you take maybe the canonical example of images, you take an image, you gradually add, let's say, Gaussian noise, and you train a model to be able to remove that noise at different stages of that process.
And at the end of the process with images, you know, you've just got an image that's full noise. There's nothing there that's even remotely recognizable. But you've conditioned that model throughout this whole process on a text input that was embedded in such a way that the model can learn what the image contains semantically. And at the end of this, you can literally take a text string of something you wanna create that maybe has never existed in the universe, give that model an image of pure noise and that string describing what you want, and you lie to the model. You say, this actually is that image, it's just got a bunch of noise in it, and it will denoise it. That approach, we're seeing that work in robotics. We're seeing that work in protein folding. For example, AlphaFold 3, which is the just breathtakingly powerful computational biology model released by Google DeepMind this year. In fact, DeepMind CEO Demis Hassabis and John Jumper both won the Nobel Prize in chemistry for this work. It uses a diffusion model. What they do is they basically put in the coordinates of the atoms, the different atoms within protein molecules, and they perturb them. And what this allows them to do is use what's called a Pairformer, a variation on a transformer, to generate the potential structures that amino acid sequences will produce, and then use the diffusion model to refine them, and they're getting fantastic accuracy on this. And so we're gonna be able to do, you know, genetic therapies, drug therapies, infectious disease therapies that are all going to be AI generated approaches, each one of which might have been a PhD dissertation. Right? You would have spent maybe 5 years trying to figure out how that protein folded. Now you can enter the amino acid sequence, go get a cup of coffee, and come back and have the answer. So I think the diffusion approach right now is the most important thing happening within sort of architectural advancements within AI.
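The training objective described here fits in a few lines: corrupt the input with Gaussian noise at a random step and train the model to predict that noise. The sketch below is unconditioned and uses a toy MLP on 2-D points instead of a U-Net on images or anything like AlphaFold's architecture, purely to show the shape of the loop; it assumes PyTorch is installed.

```python
import torch
import torch.nn as nn

T = 100                                          # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative noise schedule

model = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

data = torch.randn(4096, 2) * 0.3 + torch.tensor([1.0, -1.0])  # toy 2-D "images"

for step in range(200):
    x0 = data[torch.randint(0, len(data), (128,))]
    t = torch.randint(0, T, (128,))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise              # forward process: add noise
    pred = model(torch.cat([xt, t.unsqueeze(1) / T], dim=1))  # predict the added noise
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final denoising loss:", loss.item())
```

Text conditioning, as described in the interview, amounts to feeding an embedding of the prompt into the model alongside the noisy input so the denoising is steered toward what the text describes.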
[00:48:17] Tobias Macey:
Given all of the excitement and fervor over generative AI as a solution to whatever problem domain you want to introduce it to, what are the cases where you would advise against the application of generative AI or LLMs?
[00:48:34] Ron Green:
Anytime you need absolute certitude, I would say you need to be very careful. Now, if you're willing to have a human in the loop, which I would argue you absolutely should have with almost any generative approach right now, then you're fine. But, you know, you definitely would not wanna live in a world where, you know, the doctor comes to you and says, well, we need to perform surgery. And you say, why? And the doctor says, I don't know, but the AI model told me that's what we need to do. So generative language models, etcetera, are incredibly powerful.
At this stage, treat them as human augmentations, and you can go to town. You can build really, really powerful systems. Just avoid them as sources of truth at this point, because we're still struggling with control.
[00:49:26] Tobias Macey:
Are there any other aspects of LLMs and the vehicles that we need to build for them, or the aspects of control and the challenges around that, or your experience working in this space, that we didn't discuss that you would like to cover before we close out the show?
[00:49:41] Ron Green:
Probably the only area that we didn't discuss that I'm pretty excited about is interpretability. And in particular, I think the work from Anthropic over the last year has been fascinating. They're using sparse autoencoders to really dig in and try to understand how these large language models are representing different concepts inside the parameter space. And they have the famous example where they were able to isolate a concept like the Golden Gate Bridge in San Francisco. And they found some really fascinating things. One, that that concept was spread out across many neurons within the model.
Two, that it didn't matter what language you were operating in, whether it was English or Korean or Russian, the same representation was used across those languages, including images. So for a multimodal language model, they found that the Golden Gate Bridge image capabilities also use the same neurons. And then lastly, they did this, and I just think this is so fascinating: they asked the model, you know, to describe what it looked like physically. And the model said, well, you know, I have no physical form. I'm an AI program, etcetera. And then they manipulated the model, and they took the neurons that they'd learned encoded the concept of the Golden Gate Bridge, and they forced those to activate at 10 times their normal level.
And they asked the question again. And the model came back and said, oh, I'm the Golden Gate Bridge, and I have, you know, this shape and this form and this color. And so you can manipulate the model to say the things you want. And so this is, I think, a very, very major step forward in interpretability and explainability, and I think that this will bear fruit over the next 5 years in a big way. And it will allow us to not only get around some of the control issues we're seeing right now, but it will also make these models much more likely to be used in domains, like medical cases, where explainability and interpretability are just, you know, nonnegotiable.
It absolutely has to be there. So I'm super excited about that stuff.
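The interpretability work referenced here trains sparse autoencoders on a model's internal activations so that individual learned features line up with human-interpretable concepts. Below is a minimal sketch of just the objective, run on random stand-in "activations" with an L1 penalty to encourage sparsity; it shows the training loop only, not Anthropic's actual setup, data, or scale, and it assumes PyTorch.

```python
import torch
import torch.nn as nn

d_act, d_dict = 512, 4096                        # activation width, feature dictionary size
encoder = nn.Linear(d_act, d_dict)
decoder = nn.Linear(d_dict, d_act, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

activations = torch.randn(10_000, d_act)         # stand-in for residual-stream activations

for step in range(300):
    batch = activations[torch.randint(0, len(activations), (256,))]
    features = torch.relu(encoder(batch))        # sparse, overcomplete feature activations
    recon = decoder(features)
    # Reconstruction error plus an L1 sparsity penalty on the feature activations.
    loss = ((recon - batch) ** 2).mean() + 1e-3 * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("fraction of features active:", (features > 0).float().mean().item())
```

The "clamping" experiment described in the interview corresponds to scaling one of these learned feature directions up before decoding it back into the model's activations.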
[00:51:57] Tobias Macey:
Yeah. The visibility into the internal state, I think, is definitely a very important area of investment that we need to dig into, so I'm excited to see more progress in that space. And, yeah, definitely excited to see where things go from here, when we get to the point of the Model T, and when we progress to the point where we actually have some of the current generation of vehicles with all of the bells and whistles of safety features, and it knows when I'm about to park too close to the guardrail or what have you and starts beeping at me.
[00:52:31] Ron Green:
Exactly. Yeah. These models are really powerful and smart. We just need to get them to be a little more reliable.
[00:52:40] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today?
[00:52:57] Ron Green:
I think the biggest limitations are the 2 things we've hit on, which are control and interpretability. They are not deal breakers, but they are, I think, limiting the velocity of adoption in different domains where we really need them. But I'm absolutely optimistic that we'll figure that out. It is not an exaggeration to say that I think, as a part of this journey towards understanding in a deeper way how these large deep learning systems work, and as we make them less of a black box, we are simultaneously probably going to start understanding how our own brain works. It'll probably go in tandem. And even though, you know, we can build jets and they don't flap their wings, you know, there are many different ways to fly, I think that's also true with intelligence. But I think we'll probably be surprised to find there are going to be a lot more overlaps than we initially suspected.
[00:54:05] Tobias Macey:
Well, thank you very much for taking the time today to join me and share your experience and expertise in the space and your perspective on where we are in the journey of AI adoption and AI capabilities and some of the areas of investment that we need to make to improve the operability of these models. So thank you again for taking the time and for the work that you're doing to help organizations tackle those problems, and I hope you enjoy the rest of your day.
[00:54:30] Ron Green:
Thank you so much. This was a really, really fun conversation.
[00:54:38] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to AI Engineering Podcast
Interview with Ron Green: AI Beginnings
AI Cycles and Industry Evolution
Challenges with Large Language Models
RAG Stack and Its Limitations
Multi-Agent Systems in AI
Domain-Specific AI vs. Generative AI
AI Tooling and Frameworks
Data Labeling and Synthetic Data
AI's Impact on Computing Systems
Future of AI Model Architectures
Cognitive Science in AI Models
Innovative Uses of LLMs
When Not to Use Generative AI
Interpretability and Control in AI