The Power of Community in AI Development with Oumi

Hello, and welcome to the AI Engineering podcast,

your guide to the fast moving world of building scalable and maintainable

AI systems.

Your host is Tobias Macy, and today, I'm interviewing Manos Kokamidis about Oumi, an all in one production ready open platform to build, evaluate, and deploy AI models. So, Manos, can you start by introducing yourself?

Yeah. Thank you very much to Obias for, having me here today. I'm Manos.

I'm the CEO of Oumi.

Until about nine months ago, I was at Google Cloud. We're supporting, all the natural language AI services.

Also, Bootstrap had boosted the efforts for

Palm. That was the model as we called it before Gemini.

Let all the whole v team until we'll, went to general availability in May of twenty three before it was,

rebranded as Gemini and moved, for the DeepMind.

And, yeah, before that, spent some time at a start up at Meta working on conversational AI,

Microsoft where I was building something like 02/2016.

All these things now that we say embedding based retrieval and drag, who were doing them back then?

The only difference that we're using kind of with older technologies, LSTM, third generation, they were not transformers at the time.

And, yeah, before that, I was doing a PhD on,

on device AI.

And do you remember how you first got started working in the ML and AI space and why you've decided to stick with it?

Yeah. That's a great question. So, I started actually my PhD research on

what was called, Internet of of Things,

distributed sensor networks, and things like that.

But, right when I was starting was when the first iPhone was coming out, and I go back to my advisor and say, you know what? One day everybody's gonna have one device like this. It was way before iPhone became so popular.

And all this device that had all these sensors and all these ways to collect data, I was like, you know what? You need to be able to do something with all this data. And that's how I got into machine learning. And then soon after I graduated,

and I went to Microsoft then, I I just started working with NLP

way before

Google,

Microsoft, and many others started announcing that we're an AI first company. I was already working on AI

between PhD for four, five years and then already at Microsoft for another three, four. So I was already,

very deep into AI and, you know, it was clear to me that's gonna be the way things go. There's all this data. You need to do something with it, and

it was just

fascinating or intoxicating to have such a powerful tool to do something with data.

You've mentioned a lot of your background

and the fact that you're building Oumi. Wondering if you can just give a bit of an overview about what it is that you're building and why you decided to leave what you were doing at Google and start this new venture at this time.

Yeah. Yeah. So Oumi is,

a platform and a community

to advance Frontier AI in the open. This is,

exactly the opposite of what I was, doing

before at Google even before JetCBT started, like, a couple before, I was announced. A couple months before that, I was starting the effort for what became Cloud Palm and then again, rebranded later as Gemini.

And even within a couple months into developing,

SaaSipity coming out, there was quite a few enterprise. They're like, okay.

I've I've tried to SaaSipity from OpenAI.

I'm happy to try Palma as well. But can I get the model myself so I can use it the way I want and change it the way I want? And we're like, no. Sorry. It's a it's a black box. You can call it through an API. So I started realizing back then, you know, how limiting it is for many enterprises to

expose this powerful technologies as a black box behind an API. But then where kind of what made an even bigger impact on me is that I started realizing that I mean, I knew for many years that AI is gonna be very powerful. But I started realizing that these foundation models are being used now for and they're gonna be used increasingly for almost anything.

People use them for material science, climate science. People use pre trained language or multimodal models to solve partial differential equations

and, even health care or even I mean, everything. Like, again, including health care. And when you have something that is so foundational,

it should be like a utility, right, something that's gonna be powering everything in science and industry. It's just a disservice to humanity to put in a black box just so that some companies can freely monetize

and then impede the progress of science and all these other enterprises. And I was like, okay. Philosophically, it's it's it's it's a bad thing. Yeah. We we should be a more accessible technology.

And where but I was okay. This is philosophically the right thing to do, but what about practically? You know, would that ever be a reality? Because all these large companies brag about how many GPUs they have, and that's their mode, and it's very hard for anybody else other than them to do it. And then I started realizing that this was a lot of tailwinds. There's a lot of all the cloud providers, unless they are, as I say, unless they're one of the few aspiring AI oligarchs, you know, then those few companies that know who they are, nobody else wants closed source AI. Everybody wants

open source AI to succeed. All the other different cloud providers, they want there to be a competitive open source model so they have something to serve on their clouds. All the accelerator providers including NVIDIA and others, they want there to be strong open source model so they can optimize also their hardware on them. Because otherwise, everybody's gonna go use the closed source model and the specific accelerator by the closed source model provider that is optimized across software and hardware. So and then even consumer companies, like Mark Zuckerberg said, it will be very problematic for us, even a company like Meta, if we have to go to one of these closed model providers to get access to the AI that we need. Very problematic. It's a key powerful technology. We it's gonna be used everything what we do. We need to have free access, unconstrained access to it. So when I realized all these stalewinds, so, okay, philosophically, it's bad for humanity. There's all these stalewinds, all these big players in the ecosystem of an open source AI succeed. And then on top of that, I started realizing that it's actually such a complex technology. It's not just about the GPUs.

And, arguably, if you have somebody like Meta or somebody or or one of these big companies putting in the GPUs to pretrain the model, then it's more about

how many creative minds do you have that could spend time to improve those models especially with post training

on all these different capabilities across all these different modalities.

And then it's more a matter of

a community about having many smart people that can iterate and improve these things in parallel

and less about having a huge cluster. So,

so, yeah, start realizing that actually community will be a more powerful mode,

more powerful thing than having more access to GPUs

within a single data center. And, yeah, combination of those. It's

philosophically bad for humanity.

Practically, it can happen. Actually, it's the most plausible scenario that the best AI could be developed in the open by a community

with, all our nice on it,

to both advance faster and safer. So, you know, this is just way too compelling. I should just go build it. On that note of open versus closed in terms of models, there's been a lot of conversation around what that even means where the initial

batch of models that were available for people to download and run were termed open source, and then there is a lot of debate about what does that even mean for a model to be open source because it's not just the code. The data is equally important as well as the parameters that go into training and tuning, etcetera.

And

in terms of models that can be truly termed open source, there have been a handful. The most notable one that I'm familiar with is the OMO and and Omo two models from the Allen Institute for AI. And I'm curious what your

thinking is, particularly in the context of what you're building at Oumi, what it really means for a model to be open and open source, and what you're focused on

enabling along that spectrum and any lessons that you've learned from some of those precursors such as Olmo and, I think the NEEMO model, the the embedding model as well at least.

Also, really, really great question. So starting from first from the definition, so we're on the same page. You know what we mean by open source? In my mind, you know, to to suffer for something to qualify as open source,

it means that somebody else could reproduce

what you built and extend it and make it better.

Which means, at least by the OSI definition, which is very comprehensive, that it should be open data,

open code, open models, open weights. Because quite often, as you mentioned,

most of the companies, they just released the weights. They just released the model, and they say this is open source. But this is not really open source, again, not by the OSI definition, which I think is very accurate because to be, you know, something that, translates well to AI, again, you need to have all these three ingredients so that you can reproduce,

that work. So I think this is the very bare minimum. And, as you mentioned, AI too with, almost one of the very few organizations that hit all these three points.

One other thing, though, that we have as a key goal is we say, okay. It needs to be open

data, open,

code, and then what you develop the models the way it should be open as well. But also should be, open collaboration. And we mean kind of like two things,

when we talk about this. It means that, as I mentioned before, that it's not necessary

for somebody to theoretically

be able to reproduce their work and extend it. It needs to be easy and practical,

and it needs to be inclusive. So what I mean by this is that there's quite a few open source solutions, open source.

Actually, a handful, not quite a few, including a two. But

it's very important that

any of these things are very easy for anybody else to reuse

and experiment

to continue advancing those technologies.

If you don't make it easy and you say, you know, it's okay. I released my code, my data, everything, and, yeah, if you try hard, maybe you're gonna be able to reproduce it as well. It still impedes greatly the community to make progress. So that's one key element, making easy starting from there. You know what? It's it should be very easy for somebody to go and reproduce

and then extend what I built. And the other aspect is that because actually it's such a complex technology,

we and actually our aspiration is to do, and we're already starting to do that, anything that it would take to help the community to collaborate and contribute. For example, this means encouraging and having efforts that encourage

anybody in the community, not just a few selected contributors or partners that we have from a university or two, but anybody in the community. So, you know, you want to contribute, here's all the different ways. And we're gonna structure the efforts in a way so that depending on your background, you're more technical, less technical, you can still find ways to contribute. Because again, it's a technology that needs all hands on deck. It's extremely complex, and that's why I think many of the existing closed model providers, they are failing us. You know, some areas they're not or or many areas they're not as advanced or as safe as they should be. So that's what we aim to do. Again, they open data, open code, open models, but also open collaboration.

Collaboration is another interesting aspect in this ecosystem

because it's not as straightforward as just opening a pull request on GitHub and offering a patch and waiting for it to get merged. Because in order to collaborate on the model itself,

you need to be able to, as you said, build it, test it, evaluate it, and

it's a much more lengthy process to doing all of that versus just let me clone a repo, make a couple lines change, and push it back up where you can have that rapid cycle of iteration and adoption.

And Hugging Face has been

the focal point for a lot of the community building around

the kind of open models, both in the era of generative AI and large language models that we're in and multimodal models, but also even leading up to that. And I'm wondering

how you think about that aspect of community building, collaboration,

and the role of the community in terms of being able to actually iterate on those things and any of the supporting infrastructure that's necessary to be able to facilitate that

rapid adoption and collaboration and testing and evaluation?

Yeah. Yeah.

Very interesting question as well. So quite often when people talk about, and we talk about all this frontier AI and doing a collaborative open source way, indeed, models are a big part of this, but actually, I would argue it's about the overall ecosystem. Because you can develop the models, but unless, again, you have all the other tools, you know, better data preprocessing,

you improve all the pre training, especially the scalable

distributed training algorithms.

You have great implementations for different techniques like the g the new GRPO that DeepSec use. All of these things are actually just a you could say pull request. Somebody can send the pull request and contribute. And having such a powerful ecosystem,

that helps the community to do this research end to end, and I'll I can elaborate a little bit more of what I'm a mean about this later. It's also as I would say as important as the actual model itself. Now when you go to the model development, it's actually also something that, and this may be less intuitive to at least most people that I talk to, but it's also something that can be easily done across a a big community. The good news is that quite often for many of these improvements, especially when you talk about post training, which, you know, is massive opportunities. Most of the latest improvements

for the models have come from post training improvements is that somebody could test those ideas at reasonably small scales like a seven, eight b model, which you don't need so many GPUs to do. And then, sure, it may be up to a bigger organization

like Meta, perhaps ourselves, or some of our partners. So you know what? Okay. We're gonna take all these great recipes that the community tested at small medium scale

across all these different capabilities, across all these different aspects, and then combine them together to put them in a then train a bigger model just because, you know, perhaps some university didn't have enough resources. So anyway, so the meta point is that the improvements don't need to be done directly to the Frontier models. They can be done at small to medium scales which are very accessible to the community.

And those improvements can be, if not, let's say, like, deeper architecture improvements, even just

data improvements. This can go a long way. Or algorithmic approach, like, hey. Do GRPO, and here's how I tested this. And and now, you know, go test it on many more,

you know, at a much bigger scale. So that's why

there's there's so many sales contributions that an open community can do, again, at a small scale, and then you can combine them to train a bigger model one day by some bigger organization. And the things that you need is, again, to have an infrastructure that lets you go through all the different steps that you need, whether you

are curating your data, synthesizing

and curating your data. You're experimenting with some new training algorithms. So it's good if these are implemented and readily available to you. Also reasonably well optimized so you're not wasting your limited perhaps GPU cycles. And then even things like, okay, you test it. You build what you want to do, and you want to evaluate it to make sure it improved on a certain capability you're working on, but didn't regress on everything else because then it's

not as meaningful.

But even that currently, you know, it takes a lot of effort. You have to integrate, I don't know, against how many repositories so you can test foundation models comprehensively.

And it it shouldn't be like that because all that, it's impeding

anybody in the community to make progress.

And by the way, that's actually what, what we're doing with Oumi, to make all these steps across end to end easy. So it is easy for anybody in the community

to make these small contributions that in a way that they adopt. Because they're all done in the same end to end platform that all the contributions they did, they're fully recordable, and then easy for anybody else to go and combine them. So, you know, I'm gonna take that data, make sure that this guy created without that, and then combine them together.

So digging into

Oumi itself, as I was preparing for this interview, I saw a couple of references to the fact that you're aiming for it to become the Linux of the AI ecosystem where it is the common substrate that

becomes as natural as,

water to fish or air to people living on land. And Linux, even itself, never actually set out with that mission. It was just, hey. Here's something, and then it just gained adoption because of the fact that it was accessible. People could use it. I'm wondering what you see as some of the challenges to building that type of momentum for Oumi and some of the ways that you're thinking about greasing the skids to be able to actually start that snowball rolling and build up the momentum needed for it to become as much of a de facto standard as Linux has become? As you mentioned, it's hard to plan and say, hey. We're gonna be the Linux of AI. But that's indeed,

the aspiration. And more than, you know, hey. You know, we think that, you know, we should be the Linux of AI. What I tell people is it's really important that somebody, like, that has our strategy let's say, if we fail, I really hope that somebody else with our strategy succeed because it's very important for humanity, every single enterprise, sciences, humanity for something like that to exist. And usually when I use the thermostat Linux is for people to to relate to something they understand that, you know what, the same way

that UNIX had come out, everybody said, you know what, this is the best technology developed by, I think, the best engineers in Bell Labs, I think it was, until Linux came out. So you know what? It was not actually as good at the beginning, but because it was flexible, open source, people say, you know what? I can take this, customize to my needs, and it's gonna be actually as good, if not better than the black box because I can change that. And then I have all the benefits of open source. Again, I can full control of my own destiny. It's actually lower cost. So, and that's what we want to suggest to people that, you know what? You can use the UNIX because now you think it's easy. But if there's a Linux that is almost as good and actually are you better and, you have all the benefits of open source, you know, wouldn't you not want that? And the way to success because, usually, you can't say, hey. This is the Linux and the whole community just flocks. You know, it just doesn't work that way. Is to make sure that what you build the way, again, when Linux first came out, that it provides value

to solo researchers or solo organizations to use it. And again, like Unix, you know what, this is a more flexible alternative. It's as good, even if it's a little bit less let's say less good. Just because you can customize, you have all the control, you have all these benefits, you can make it at the end to become a better tool for you than the black box. And then more and more organizations use it. Again, the same thing happened with Linux to the point that the community slowly, builds up. Just also at the same time, as I mentioned before, we're very intentional that while we want this to be a powerful tool that makes sense for solo researchers or for solo organizations,

that we have all these efforts that encourage participation,

that make it easy for people to contribute and others to build on the work of each other. Because these things may not be intuitive if you don't design for them, you know, it don't, you know, happen by by themselves. And, yeah, hopefully,

that's the hope, and we can see we have seen very good traction so far that more and more people find it useful, and they also make it better for the benefit of all of us. In the AI ecosystem

as well, for a long time, there's been a large number of tools available that you can pick and choose and cobble together even more so in the Gen AI ecosystem where there seems to be a new tool every day, especially with the fact that so many of those tools can be generated by AI.

And there have also been some efforts at managing that end to end flow of user experience. I'm thinking in particular in terms of meta flow and ML flow for being able to manage that

bridging from local development through testing and evaluation and experimentation

to infrastructure and deployment. And I'm curious how you have

approached the work that you're doing in Oumi to address that end to end need and figuring out what are the shortcomings in the existing solutions that need to be addressed as you build out this platform.

So quite often people ask me about this also, you know, in relation also to other, frameworks like Hugging Face, you know, how you know, what is exactly the things that you're trying to do different? What's the problem you're trying to solve? And, parts and lots of the the motivation for how we and why we started this work besides all the reasons I mentioned earlier were discussions with two academics,

from CMU that then became over a dozen, actually two dozens of academics, and they were all telling me the same thing. That, yeah, there's tools like Hugging Face. Yeah. You can also talk about high level orchestration tools like Mlflow, Airflow, and things like that. But, for my students to be able to do this research on foundation models, it's way harder than it was two years ago. For one, none of my students has done multi node distributed training, meaning to scale a bigger model. And it's not that we have a couple dozen GPUs. We don't have too many. So but we could do it. It's just from the framework perspective, from the software perspective, it's just too much friction for them to figure out how to do all these things. And then I mentioned even just evaluation right now. You can say, yes, I support evaluation, but all the different, benchmarks that are out there, you have to integrate with 10 different repositories or many more to to be able to access all of them and use them and evaluate what you are doing. So,

and that's what we set out to do that we said, okay, we're not gonna reinvent the wheel where we don't have to. It just doesn't make sense. But we're gonna make sure that there's an end to end fully flexible open source platform

that provides all the tools that somebody typically needs. So they're away from data that is standardized. Because, yeah, you could say datasets are on hygiene phase, which is great. Actually, we really love hygiene phase as well. But, quite often they come in many different formats. So, you know what? It wasn't a matter. We're going to standardize them to the same format. At least we have converters to put them to the same format. So, if you want to use them for your research, you don't have to, every person out there, they don't have to reinvent the same converters and then you can easily consume them. And then for different training libraries, whether you're using TRL from Hugging Face or the custom training loop we created that is more easily extensible. Or let's say Torchune or any of them, you can still try all of them within the same unified API.

You don't have to figure out how you can integrate with each one of them every single time. Again, all the different post training tech techniques. Again, readily available,

evaluations. Again, you can have all evaluations without figuring out how you can integrate with all these different libraries. Actually, to to to give you an idea, now to buy, we even go a step further that says, hey, you may want to evaluate close models like Gemini and Tropic.

Yeah. You can do that as well. You can just define when you want to evaluate once. Just put your API API key from OpenAI or in Gemini. You can even try to evaluate those as well. So anyway, that was the idea that, that there was way too many friction points that should not exist. And, there was a huge opportunity to make it easier for the community to do this type of research. And at the same time, because you have something that works end to end, you don't have to integrate with 10 different repositories. The key thing is it also facilitates a lot to have contributions that are fully recordable and reproducible. Because that was happening before and what we're seeing from many PhD students is that, also in academia, because they will struggle the same is that for some research that they saw that they want to reuse, the authors,

use this repository then some ad hoc hacky script to integrate with something else and do the next step with something else, something else, but there's no no way the whole process recorded, let alone in a way that's more standardized and easier for everybody else to use. So that that's kind of like a a little bit of the high level goal. And we had people that said, oh, you know, there's all these tools for John Distributed Training. I haven't managed to make it work for a year. So, okay, try this. It shouldn't be any harder. And you can run it on your cluster, just change your deployment. You can also put a GCP, AWS, Azure, whatever together, Lambda, whatever you get your compute. As a result, people shouldn't be any harder than this. So that's been the the goal. In your work of building that end to end experience,

managing that flow, the experimentation

tracking, contribution tracking,

how did you approach the

design and architecture and component selection to make sure that you are building it in a way that is sustainable and maintainable

as well as adaptable as those components either cease to be maintained or get replaced by newer and better, etcetera?

Yeah. Yeah. Really good questions. Right now, as it stands, we're in an organization of 11 people. Because of the constraints, but also at the same time because it's the, I would say, the right thing to do, we said we're not gonna reinvent things that we don't have to. If there's already established tools that work well, we're gonna reuse them. We're gonna try to fill in the gaps or,

you know, grease things wherever there's friction points, standardize things wherever they have to be better standardized like the datasets, for example. But we're gonna build on the Solrash of giants. We're gonna build on the, on top of all the great things that already have happened in open source. Now to give you an idea, we I mean, yeah, this is,

built based on PyTorch.

For,

core capabilities like inference, we use the most common inference frameworks like a ST Lank or VLLM.

For training, as I mentioned before, we use, TRL for Honeywells because it's very commonly used with our own custom training loop. Torch tune is also now being integrated

for deployment. You know? Again, we use all the most common technologies like,

VMs with, Kubernetes.

We use a lot of Skypilot wherever we can to deploy to different clusters, whatever not we use our own implementation. Yeah. MLflow Airflow for

orchestration.

We have integrations with weights and biases,

tensor board for observability. And all the things that work well and people like, we say, you know, we again, we're not gonna reinvent the wheel. But as I mentioned before, there's still still a lot of gaps

or friction points that had to be smoothened out. At the same time though, there is some things I would argue that they're still missing, in the ecosystem.

For example, while there might be some solutions, there was no great solutions that help enterprises

use this AI technology as a reliable production. I'm sure you've said about things like hallucinations,

safety guard drills. For some, there are some good solutions out there, but we think much more needs to be done to help organizations because organizations are still struggling with these things. And so, you know, this is still the reason why we cannot trust these things to put them in production. So, yeah, again, all in all, you know, build on the surface of giants. Do not reinvent the wheel unless we should only build something ourselves from scratch unless there's a very good reason

to yeah.

Just put the focus on those specific things that, again, are missing right now. They're not very strong offerings. As far as

the adoption,

both of Oumi, but also of

building and maintaining your own models, A lot of the work that has been done till now by different teams is to take a model off the shelf, whether that's a llama or mistrial or what have you, maybe do a little bit of fine tuning, but typically just build a rag pipeline, maybe

hosted

on something like Amazon Bedrock

or base 10, or just use

one of the APIs and then build all the scaffolding around it. Whereas you're focused specifically

on being able to enable the actual building and,

evolution of those foundation models. So we've already addressed the fact that there are challenges beyond just the code and access to data because you have to have access to the necessary hardware to be able to train these models because they're very compute intensive.

You need to be able to have access

to

that underlying data, which there are corpuses available. But if you want it to be specific to your business, you need to have your own data to make it actually useful. And

I'm wondering how you're seeing

those barriers to adoption as far as hardware access, data access,

and and just the overall know how to manage that, impede the potential for adoption and building momentum in those organizations?

So,

arguably, many organizations right now may not have the know how, how they can best leverage these technologies. And I think that's why quite often they may just go use something like, you know, JWT or the GPT for all the or any of the newer models just maybe change the prompt and play with it. Because actually, may many of them realize, but I, perhaps I would argue the majority of them do not, that you can

use an existing open model, and especially the moment you customize it with your own data on your own domain, you can get a way better quality than the model out of OpenAI, Entropic, or Google as it comes out of the box. I had talked to many customers saying, no. No. Actually, you don't, you know, you don't need to convince me. I've already tested myself. But I just struggle to experiment with open models because each one behaves differently and now there's a new model that came out of for example, from QAN, and I want to move from PHY to and it's so hard and that's why, you know, we help them with this. But there's some other ones actually that they just don't realize also about the potential and what they they could achieve. And they are also daunted by the friction. Because again, what we hear from many of them is that it's too daunting to figure out what works in open source, how they should best combine all these different components to get their job done. And, again, that's also one of the things we seek to make very easy. And as you said, we may not have a RAG in such compound systems right now, built, but we have a growing set of, I think, over 200 recipes that says, you know what? If you want to train this specific model, let's say any llama, quen, dipstick, whatever that is, or phi of this specific size, here's a recipe, a configuration that says exactly what the best parameters and how you should go about training this. Even if you don't know what you're doing

or you have no ML background, if you could just put your data on the same format, which any application developers should be able to do and you just hit the only training with the same configuration file, you should be able to get very good results. I would argue, I don't know, perhaps

95%

of what an expert would get, hopefully, if they play a lot and that you know the parameters themselves.

And

very likely we're better than getting that black box that is not customized to your domain and task. So that's what's our goal to make it very easy because, again, it shouldn't be hard even for those people to get to a good enough or much, again, much better solution than the black boxes. But then the more experienced you are in as different organization build up more and more technical expertise, then they can turn the big knobs, the small knobs, or even go to the source codes and adapt it. Okay. That being said, you also mentioned about data. Besides just saying, okay. Put your data in the right format and train. We're also building utilities to help them synthesize and create better data to the point that, you know, actually, right now, you may need to know a little bit what you're doing. But if you have a set of prompts, you can synthesize responses for even the biggest models. You don't have to worry about how do you do inference with these very large models. Even curate and clean up your data using LLM judges and other automatic approaches, which actually arguably doesn't take a lot of, expertise. But there's more and more things that we're doing to make it easy because our principle is it should be as easy as the black boxes,

for those who don't have the expertise. And as they gain more expertise, then they can go deeper and deeper and deeper and squeeze out more and more color because I think that's also the winning risk before your organization. I talk to kind of people I talk to that if you consider yourself an AI first company, don't stay in the shallow waters of just playing with a prom because your competitor may not do that and then you're gonna end up on the wrong side of history. Start flexing your AI muscles. You could start for something like this. It's easy and then go deeper as, you know, as you learn as you learn and learn more. And on that data

portion as well, it's also difficult to know unless you have already built up that expertise,

how much data you need, what data you need to be able to actually build something that's worthwhile. And so I'm wondering what you generally use as a heuristic when you're talking to people

of you need either x gigabytes

or this many samples or this format of data to be able to either build your own foundation model or take an existing foundation model and fine tune it or etcetera, and just some of the ways that you think about those different gradations

of capability?

Yeah. Glad you asked. So this arguably something where we need to do better right now, or at least we should have some guide or something that makes it easy for people because there might be something out there. But think there's a lot of value. Tell somebody, you know what? Go to this one place. You're gonna get some high level guidance about what you need to do and then even the commands, anything you need to do to get going because that's where people struggle. So, you know, I I don't know where to start. It's just chaotic out there to figure out what is the right guidelines with which tool exactly to get my goal done. But overall, the the, you know, if I were to tell somebody in short what the guidance would be is I mean, typically, the idea is that the bigger the foundation model, the less data it needs to to be aligned or to learn how to do a new task. But now you may ask how many. You know, it could be as little as hundred, a couple hundred. Yeah. Maybe we go to thousands, maybe even better quality. But the best way is to do, a learning curve and say, you know what? I'm gonna start with hundred example 1,000 examples and see,

how, let's even start with 10. You know, what is the quality I get with out of the box with zero? What is the quality I get with 10 samples? What is the quality I get with a hundred? And then go to 500,000.

And then you can see how the quality improves or how it starts to plateau and say, okay, I'm not getting more benefits. I should stop here. So, yeah, the best the best you know, the best ways to do it incrementally.

So for somebody who wants to start experimenting with Oumi, incorporate that into their development workflow, maybe even their

inference and serving

use case? What's the process for getting started? What are the hardware capabilities that they need? Where do they look to for data sources to start that experimentation

and playing around if they don't already have their own datasets that they want to build from? Yeah. Yeah. So currently, Oumi is, an open source library on GitHub, which means anybody can take it, download it even on their local map MacBook. You can, if you don't want the to get clone the code, you can just install the library. And, you could even,

train a tiny model on a CPU. Actually, we have some recipes because, you know, maybe somebody wants to learn how they can go through all these pros, but they don't have access right now to some of the bigger CPUs. So even on CPUs, you can train a tiny model if you want to. If you have an actual GPU, even better. And the good thing is that you can define, for example, the training recipe. What do you want to train or what do you want to evaluate once? And then all it takes is to change the deployment config. Again, not code, just the configuration. Where you say, for example, here's my DCP credentials and go deploy to 10, VMs, 10 GPU user. And that's that's that's all that's all that it takes. At least for all the major clouds,

and we support again AWS, TCP, Azure, Together, Lambda, Rampodes, and a few more perhaps I'm forgetting. Or you can put on your HPC.

We have tested this over a thousand GPUs. And again, all it takes is just change the command. You don't need to or the configuration. You don't need to write any code. And the idea was that assuming that you have the GPUs, the recipe should be have have to be recorded once. And then again, the more GPUs you have, the more you can change the deployment to scale and do more and more. And also in terms of datasets and all these things, somebody gave us actually two, two different academics. A post doc and another professor gave us the feedback. You know what? Actually, this seems to be the best way to learn how to, experiment with foundation models because it's very easy to use. And I think, actually, it it is easy. That was the goal, the design goal from the beginning. That it should be easy, but still extensible. That should not inhibit anyone from going beyond what's over there right now. And, yeah, there's already existing datasets that are integrated. You can just say, I want to use this that alpaca dataset or these other datasets. You just you just specify with a name and then just automatically download from Hugging Face and you can start training and or you can specify also your own if you want to, like, if you have something in your local directory, directory, for example. And to that point of extensibility,

evolvability,

what are some of the

places that you have added some of those escape hatches for? I need to go deeper, build my own custom capabilities,

or add in some additional dependencies, or I'm already

using something like a, you know, lane graph or maybe I'm building my own deep learning models, and I wanna use that as the evaluator. Just some of the ways that you're thinking about the extensibility

of the framework to allow for people to pull it in the directions that they want, and then also some of the ways that you're thinking about

contribution back to Oumi for from people who have done that extension and, experimentation.

Yeah. That was a very important design goal. And also relates to the previous question you, you had asked about, hey. You know, components come in and they go. Maybe there's some new technology in the future. Right? Or somebody some you should just said something that somebody else wants to contribute. That's why making sure that we have a very good design that is very with well defined components, with good abstractions was very important so that we can integrate again with all the different, inference engines, you know, with HSTLANG or VLLM or any other one so that we can have both TRL or any other training loops that we define. So all if you go all across the stack step by step from the the dataset definition and standardizing also that format so that anybody can introduce their own datasets with the same abstraction

to

again, only the steps, tokenization,

the trainers,

the evaluation, even that is well, define how somebody can extend it and create their own evaluation library or their own benchmark or their own evaluator,

based on the specific metric they want to define. Yeah. Even for high level tools like, hey. Here's how you can define your, LLM jobs so you can do your own auto evaluations or your own automatic data curation. All of this if you look at every I may be forgotten somebody. If you look at every every step of the way, we say, you know what? This needs to be a very well obstructed

components so that it's like Lego blocks. People can easily swap them in, swap them out with their own implementation,

and they don't feel that, you know, it's too hard to move away from the set of options that we defined. And then that was, again, for many reasons. As you mentioned before, because technology is coming and go, maybe there's a new one that we need to integrate ourselves in the future Or

because the goal was to appeal not just to enterprise, but also researchers, and we know they typically want to push the state of the art. They wanna go beyond what's there. So Umee is very focused on

building an open ecosystem to enable people to build open models

because the overall

bend of the ecosystem and software and now in AI is that open yields better results and allows for more experimentation and innovation in the space.

Also, Oumi itself is a business. You've taken some funding, so, obviously, that comes with some expectations. I'm wondering how you're thinking about the commercialization

aspect and some of the ways that you're also making sure that the Oumi project and core capabilities

are sustainable

and are able to continue on in the event that the business behind it ultimately

collapses or gets acquired or whatever the future might hold? Yeah. Yeah. So there was somebody else I was talking to recently from one of the national labs that was asking me exactly about this. And I mentioned that, you know what? It's an Apache two effort. Let's say something happens to, OMI, the effort can still continue in the open. But also going to a little bit to your own question, you know, the goal was it was very intentional. We said, you know what? What type of organizations would this be? And so, you know what? The best way to have impact is if you're able to pull the right funds to make this succeed and at the same time, you are an entity that don't doesn't just appeal to academic research, but also to enterprises. Because there's one thing I learned by a leading enterprise AI service at Google Cloud for almost four years before this effort was that enterprises,

they will use an open source effort,

an open source project if, you know, that's the best alternative. But they strongly prefer to use offerings, open source offerings that are backed by some organization. Let's say that I had compared to the Linux for example. Because they know that if they get stuck, they're gonna have support. It's gonna be there to help them,

and there's gonna be somebody that will likely maintain it. Because quite often, there's so many other open source projects that are maybe started by different academic institutions. When the student graduates, you know, they are the projects are mostly stopped. They lose the core people that were moving them forward. So, anyway, that that was the intention. So, you know what? We need to able to support both the academic community and enterprises.

So that should be a sustainable,

company. And but I think, you know, these two, you know, serving the open community, by the way, we're a PBC, a public benefit corporation, are not at odds. Our goal is to

by truly embracing open source, not just open weights, with, like, a very, very pseudo open source approach, that we help promote this,

we shall promote o Open Frontier AI in a way where our costs are not even a fraction, not even 1% of the cost of OpenAI and all these other organizations, which means also we don't have that pressure to monetize so aggressively out of these technologies. Technologies. Actually, our key principle is that the core foundation and the general foundation models should be a utility,

like electricity, like water. It should be something that is freely accessible to everyone. Everyone can contribute to make it better. You know, the Nvidia's, the Metas, the all of these very already powerful entities. And then it's freely accessible to everybody else. And then if that op if it's open like this and you don't claim it's your own IP and you are licensing it, you're trying to make money out of it, then this means that everybody can sign me into the same, you know, into can contribute to the same goal. Whereas if you contrast it with OpenAI

or Anthropic,

they have to shoulder the full economic and human investment cost to advance the technology, which means then they have all the economic pressure to monetize

and enough so they can be sustainable. So that's a very so that's why it's a very different model. You know, we want to promote this model where it's easier for everybody to chime into into the same, effort. And then we all soldered the cost. We all put in the same human contributions, and then everybody benefits. And then for us, the things we plan to monetize is just the enterprise features that enterprises need. If it's something that the open community needs to promote this as a better technology for everyone, our thesis that this should be unconstrained,

no paywalls, nothing, and we help and support them. Because it helps us also have a better enterprise offering at the end of the day. Absolutely.

Another challenge in the overall space of building these models, evaluating them is that there is a lot that we still don't know about them. Obviously, having an open ecosystem

to have broader collaboration

to bringing that forward and evolving the state of the art is useful to address some of those challenges. But for organizations

who are contemplating

building their own model, It also brings with it those risks of, well, what if the model says something that is embarrassing to my organization

or gives completely wrong information or in some way harms my reputation or my business or my operations?

And I'm curious how you are thinking about

that aspect of model development, model of model evaluation

in the context of what you're building at Oumi and the community elements of it as well. Yeah. As you mentioned, Tobias, this is one of the arguably the biggest concern that we hear from enterprises about what's blocking them to using,

this generic technologies in production. They say, you know, we can't trust them. One, as you said, you know, they they may hallucinate or or, they may just say something offensive. They may go,

over the rails. They may just, you know, maybe they just provide customer support, but they may suddenly start generating,

an objectionable

poem. It has happened before it was on the news. Or, again, just let me just say offensive things. Or maybe, yeah and again, we don't want them to start generating,

financial and medical advice. So there's all these,

this very there's a sizable set of risks that come with foundation models because they are designed to be

generic and powerful and generic,

answer any question you throw at them. And that's why and you need that's why for, actually, you need to build these extra guardrails around them or inside them to make sure that they don't go off track. And

also, you will find a way to mitigate hallucinations because, again, their foundation model is always gonna try to say something. And as I mentioned before, these are the main ways in which we are helping enterprise. So you know what? We can help you build your model. If you have the expertise, you can use the open source platform yourself. That's great. If you need extra help, we can be there. We can help you. But at the same time, all these extra systems that you need to deploy this is in, in production reliably

to make sure they don't hallucinate

or, again, they don't go off, of the rails? That's are things we can help. Actually, by the way, we have developed. We haven't announced it yet. A hallucination detection,

solution that is by far the state of the art in the industry, also the most ergonomic one.

And, yeah, unsurprising has been raised or anything very well because it's a it's a key talent for enterprises.

As you have been building Oumi, onboarding people, building the community around it, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?

I think it may be a little bit early to say because we only announced it a couple weeks ago, and there's so many people that are using it. And I would say we're just reaching out to say, okay. How exactly are we using it? How exactly are you building with this? Because we know from the questions we get from them that they must be getting already very deep, into using it. But, yeah, I was gonna say, you know, there's already

a lot of impressive

things that are being developed. There are some students that said, you know what? Yeah. I want to use it to build upon it better partial differential equation solvers as I was mentioning. There was, actually an effort,

together,

that it was led by the,

by you researchers at UIUC. We also helped them a little bit to in that to train models that to develop a better agentic model that actually was beating g two four o in the leaderboards

even

actually, it was beating g p four o in several benchmarks, and it was beating mister Alard, Gemini, Claude

across the whole leaderboard of of,

of tool use. It was the the Berkeley

function calling, I think it's called, function calling, tool use leaderboard. And, you know, anyway, there's, like, some very impressive results that, that, that they got. There's also, you know, other enterprises we have talked to and they say, you know what? As I mentioned before, that

we were able to

use

an existing open model, like five from Microsoft, and develop a better custom model than GPT four o. But,

we now want to test. We want to test all these other, new models, and it's too hard for us. And it's like, okay. Great. Actually, Oumi does exactly what we need because now we can go experiment and play with those other models. And actually, that's another

company that we're working with right now. Yeah. I think,

and actually and this actually makes me very excited that we're gonna be think, soon surprised by what we see over the community bills because once you make it easy for somebody, it's just a matter of how far can they go with their own, creativity.

And actually, there's some more things that we plan to do very soon,

related to DeepSig and capabilities that DeepSig released to make this for other people to play around and do similar things. So, anyway, yeah, I'm I'm very excited to see what people will do with it.

And one other potential downside

of opening up the floodgates as it were for more people to be able to contribute to build more models and do more experimentation

and testing is that you then expand the paradox of choice where right now, there's already a plethora of models to choose from. And if you have a given task or if you're just trying to say, I want something general purpose, there's a lot of

question about, okay. Well, which model do I use? And we have leaderboards for different use cases, but benchmarks

are perennially hard to actually get any real insight from. Obviously, I can play with the different models, but that takes time. And so I'm wondering how you're thinking about that aspect of it as well of there are thousands and thousands more models that are available. They all do great for your one use case. How do I figure out which one to build or which one to start from when I wanna build my own and just some of that aspect of choice

and understanding

the differentiation

between those models.

Yeah. As you said, there is definitely that paradox of choice where so many different options are like, okay. Where do I start? I'm not sure. And that may even,

kind of

make them,

unable to act and move forward because, like, okay. Where do they start? And they feel daunted by all the selection and the options.

But besides the fact that there's a lot of options, I think the thing that makes it even, more problematic, as I was mentioning earlier, is that it's even hard for them to assess all these different options. So if you know you have so many options or so many things you should be trying out, but it's too hard because all of them they need a different way to integrate with them, then, becomes even a harder, problem for somebody to solve. And that's why, you know, our goal was to say that, yeah, there's many open models, but you don't have to do all this

additional,

effort to test every single one of them. You can just say the names, and you can test them. We have some more thoughts in the future as you mentioned. You know, if we know a little bit more about the scenario that somebody has, we could say, you know what? Just don't try these hundred options. Right? I mean, this is your latency or this is your cost quality trade off. Just try one of these five. And it's easy. Right? You just specify the names of the models and you can test them. And And as I mentioned, you can even test them against the the closed ones. And then, yeah, hopefully,

that becomes a much clearer,

option. The other challenge I think you already alluded to this is the benchmarks. Is that because I what I described assumes that you have the benchmark available and it's just a matter of testing things out. That's definitely a challenge. There's many academic benchmarks

that have been contaminated.

Quite often, models have

trained in some fashion on these benchmarks.

And, actually, the other problem is that sometimes it also align very well with the enterprise use cases.

That's why one of the efforts that we plan to have in the future is to help enterprises

build their own benchmarks for specifically their own use case.

And in your own work of starting this venture,

building the business and technology

of Oumi,

working on bootstrapping the community around it and engaging with some of the early users? What are some of the most interesting or unexpected or challenging lessons that you've learned personally?

Well, for one, I would I would say has been very

exciting and rewarding.

Because even when I first reached out to the first academics, again, these are some of the most reputable people

worldwide, and they were so encouraging. Like, you know what? This needs to exist, but it's so hard for us to build it. You know? Please go build it, and we'll support you. I'll say I I was

hopeful that academia would be supportive,

but I was still perhaps somewhat

surprised by how kind of unanimously excited they were about this. There's only one or two academics out of the, I don't know, 20 or something that said, no. I can't I don't want to contribute. And it was because they had already some other affiliation with an enterprise and that was blocking them. But there was even some that had this affiliation said, you know what? I have this affiliation. I I I won't name the company, so the academics. But at the same time, I'm a professor.

Right? So during my time as a professor, I can still contribute. So I'm happy to do that because, you know, it's part of my my academic research. So, anyway, yeah, I was very surprised to see how,

eager people were to to embrace this. At the same time, I think as

let's see.

More on the negative or the more constructive side of things is

yeah. I would say, yeah, just the value you provide needs to be clear to people. We had quite a few people say, oh, I'm using this other library. You know, why should they use this? And, and like, for a better argument, actually, it was an academic project called Llamafactory.

I told the student, you know, give it a try and you tell me. And then they came back saying, oh, yeah. I could see that it was easier to use. With this other one, I was never able to do multi node distributed training. I just got to work right away. But, yeah.

And for people who are interested

in building with AI or building their own AIs, what are the cases where Oumi is the wrong choice?

It's hard

to imagine

Well, there might be some very narrow use cases where you may say, you know what? Actually,

yeah, for that use case, maybe

that specific closed model is the best one. For example, if you're doing coding copilots,

I think the models from Anthropic may still have a little bit of the lead there. I won't be confident for how long this is gonna be sustained. But if you're like, okay, right now, what is the best coding model? I think likely it's gonna be, one of the Anthropic ones. And, specifically, we want something that is very generic. Though, if you start going and say, you know what? No. I have my own code base.

I have my own ways of, doing software development, and I want something that is more customized

to my own way of doing things. I would say, yeah. Most likely or almost certainly, you can get an open model and and make it become a lot better even than a tropic that is generally trained to be a general a general coding copilot. But besides this very specific user, I would say, yeah, again, for most other ones, open models would be very competitive. And especially if you want to say, you know what? It's great this this black box. It's a good starting point for me to see that this is a technology makes sense for me. But now I really want to differentiate compared to my competitors. I really want to maximize the value I can get out of this technology. Then I would say it's extremely likely that you move to an open model and you can get better quality, lower latency, lower cost, more flexibility, better private security. You can deploy whatever you want. I think it's gonna be very few reasons why somebody should not be using the Linux of AI and they should stick with the UNIX

And, increasingly, they're gonna be even less and less, if any. And as you continue to build and iterate on and evolve Oumi, what are some of the things you have planned for the near to medium term or particular projects or problem areas you're excited to dig into?

I would say that for one, the platform itself still needs,

is it can still become better through our own contributions and contributions by others already doing. So the people who contribute in your datasets and, you know, all sorts of different things. So there's still a lot of things to do to make it a better platform for all of us. For example,

having better RL capabilities

with, GRPO. We just complete the integration just a couple days ago, but there's still more things that we need to do to make it better usable and more powerful,

RL platform for everybody else. And actually, that's something we're investing on very heavily. The reliability features like hallucinations,

especially those actually work really well right now, but the rest of the guardrails, these These are things that we still need to put more investment to help enterprises. Oh, yeah. And the thing actually that makes me actually very excited is we just recently started research efforts.

When we launched, we had over a hundred people that volunteered and say, hey, I want to help as a research collaborator

besides all these academics and students we're already working with. And we're starting now projects one after another

so they can help collaborate and advance the platform. So it's actually something I'm very excited with just starting this right now. And day by day, we're gonna be starting more and more projects. And, yeah, I'm very excited about all the things that, we're gonna be doing. Are there any other aspects of the work that you're doing at Ume, the overall space of open models and open source collaboration

around these foundation models and model training, model serving that we didn't discuss yet that you would like to cover before you close out the show? I think we covered a lot of ground. The only thing I would say which may be, again, not as intuitive to many people especially when they hear the some elements of the world saying, hey. We have that many GPUs is that there's a lot that an open community can do, especially when you have already these massive pretrained models that companies like Meta generously contribute to the open community. At the post training, especially, there's so many things that an open community can do, and they can come and this contribution can be combined to develop jointly

a better model and ecosystem. Again, the platform, the whole ecosystem.

That's gonna be way better than,

OpenAI

and all the other ones out there. So that's what I would say. I called the community that, you know, there's,

you're not, you know, powerless as perhaps, again, the some elements of the world may want may want to make you feel. We're only, you know, powerless if we let ourselves believe that, and that comes from somebody who was leading the efforts inside of this large organizations.

And I saw that it's

more of a matter of

how many intelligent people can you have that can contribute to such an effort,

and less about, you know, you don't need

stockpiles of GPUs for you to be able to contribute. And I think this is good news for all of us, and it's good news for enterprises, the sciences, humanity,

because it's critical that

this succeeds. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling technology or human training that's available for AI systems today. Yeah. I think this is gonna be,

in terms of the technologies, there's a lot of powerful technologies, but the main problem again we're seeing is that it was very hard to

to make them more easily

accessible or especially in an end to end way. That, you know what? I can go there. Datasets are standardized.

I can maybe for this use case, it's better to use this inference engine. For this one, it's better than other one. Or maybe I need to test both of them. Because maybe for this specific model, this works better. For the other one, maybe something else works better. But putting all these things together so it's easy for somebody to experiment, so it doesn't become daunting and a huge leaf for every single researcher or for every single enterprise,

I would argue that was the one of the main things that we're missing.

The the the only very point solutions or very specific gaps were,

definitely the ones around reliability before for enterprise where these areas we were trying to to help. But, yeah, I would say this is the main thing. It was clear that the

enterprises community

are like they were missing

a way to go through the whole workflow. Because when you build foundation models, when you do these research, it's not just training. It's not just data. You have to work through the whole work, and it's important you can navigate the whole workflow easily.

Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing on Oumi. It's definitely a very interesting and exciting project. I'm very happy to see it out there in the open and in the ecosystem and excited to see where it takes all of us. I appreciate the time and energy that you're all putting into that, and I hope you enjoy the rest of your day. The pleasure was all mine. Thank you very much, Tobias.

Thank you for listening, and don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management,

and podcast dot in it, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at the machine learning podcast dot com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hoststhemachinelearningpodcast

dot com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.

AI Engineering Podcast