Summary
In this episode of the AI Engineering Podcast, Adil Hafeez talks about the Arch project, a gateway designed to simplify the integration of AI agents into business systems. He discusses how the gateway uses Rust and Envoy to provide a unified interface for handling prompts and integrating large language models (LLMs), allowing developers to focus on core business logic rather than AI complexities. The conversation also touches on the target audience, challenges, and future directions for the project, including plans to develop a leading planning LLM and enhance agent interoperability.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Adil Hafeez about the Arch project, a gateway for your AI agents
- Introduction
- How did you get involved in machine learning?
- Can you describe what Arch is and the story behind it?
- How do you think about the target audience for Arch and the types of problems/projects that they are responsible for?
- The general category of LLM gateways is largely oriented toward abstracting the specific model provider being called. What are the areas of overlap and differentiation in Arch?
- Many of the features in Arch are also available in AI frameworks (e.g. LangChain, LlamaIndex, etc.), such as request routing, guardrails, and tool calling. How do you think about the architectural tradeoffs of having that functionality in a gateway service?
- What is the workflow for someone building an application with Arch?
- Can you describe the architecture and components of the Arch gateway?
- With the pace of change in the AI/LLM ecosystem, how have you designed the Arch project to allow for rapid evolution and extensibility?
- What are the most interesting, innovative, or unexpected ways that you have seen Arch used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Arch?
- When is Arch the wrong choice?
- What do you have planned for the future of Arch?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
[00:00:05] Tobias Macey:
Hello, and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey, and today I'm interviewing Adil Hafeez about the Arch project, a gateway for your AI agents. So, Adil, can you start by introducing yourself?
[00:00:29] Adil Hafeez:
Yeah, of course. Hi, my name is Adil Hafeez, and I have been working with software for as long as I can remember. I got hold of computers at a really early age and have been in love with them ever since. In 2006, I moved to the US to work for Microsoft, where I worked on the Bing Relevance team. After a few years at Microsoft, I went to work at Amazon on S3, then spent a few years at Dropbox and Lyft, and finally joined hands with Salman to bootstrap the startup I'm currently working on, Katanemo, which is developing the Arch gateway. Do you remember how you first got started working in the ML and AI space? Yes. My time at Bing was the first time I was introduced to ML. That's a long time ago, around 2008 or 2010. I was working on Bing Relevance, and that's where I learned the importance of labeled data. We had a whole team of human labelers, and all they did the entire day was look at query results and label them good, bad, or excellent. We would use that data to train our model, which was based on gradient boosted decision trees, to improve the ranking. We also fed in query logs as a representation of human interest, which captured users' preferences. During my time at Bing, I also trained a ranker and classifier for Bing Shopping to show you related products. That was my first introduction to hardcore machine learning. Of course, the algorithms of today didn't exist back then, so it was all about classifiers and rankers.
[00:02:11] Tobias Macey:
And so bringing us now to what you're building at Katanemo with the Arch gateway, I'm wondering if you can start by describing a bit about what it is, some of the story behind how it got started, and the problems you're trying to solve with it.
[00:02:26] Adil Hafeez:
Yeah, a hundred percent. So Arch Gateway is an open source agentic edge and LLM proxy designed for prompts. We talked to many developers, hundreds of developers, and one consistent theme that emerged in those conversations was that they wanted to quickly build apps tailored to their business systems and APIs, to support knowledge-based applications and solve agentic tasks. The other thing was they wanted to focus on core business logic and not be left on their own to build features like guardrails, routing, and observability. Those are important things, but not core to the business logic. So Arch integrates several of these related capabilities for handling and processing prompts, so developers can focus on high-level objectives and accelerate time to market. This means developers get cycles back from undifferentiated prompt engineering work: detecting intent and extracting data from user queries to build high-quality agentic tasks, building and maintaining guardrails, and getting a unified interface to LLM calls to improve observability and resiliency.
So all that is built and packed in our gateway.
[00:03:43] Tobias Macey:
And in terms of the target audience, you mentioned that it's largely engineers who are focused on paying attention to the business problems more than all of the wiring and scaffolding around that. But I'm wondering if there are any particular personas, categories of industry, or types of engineers that you're seeing who are most attracted to the capabilities of Arch, who are most invested in the way that it thinks about the overall architecture of the problem?
[00:04:16] Adil Hafeez:
Yeah, that's a very good question. Right now there are many frameworks and many APIs out there that people can use to onboard and integrate LLMs and AI/ML in their applications. Our gateway is designed for developers who may have zero or some knowledge of AI but want to quickly build LLM-supported elements for knowledge and agentic scenarios. I'll give you a backstory: during my time at Lyft, I worked on deploying Envoy to manage our service deployments. The reason it quickly became the de facto standard for cloud native applications is that developers could use it to handle several complex problems, like rate limiting, traffic management, and retries, without having to implement those scenarios themselves.
This meant developers could move faster, services became more resilient and easier to maintain, and the business profited. One quick example: when we were scaling services at Lyft, we would see services browning out or returning 5xx errors under load, and it was hard to pinpoint which endpoint or which service was failing, or why. The obvious approach was to go to the machine and tail the logs to see what's going on, but we were emitting logs to S3 and other services, so it was hard to pull the logs down and query them instantly. With Envoy, we were able to see service-to-service call details, like which services were browning out, which gave us a very good overview of the entire network without having to write any additional code. Envoy gave a lot of those auxiliary features to our services. Similarly, Arch is engineered with purpose-built LLMs to handle critical but undifferentiated tasks related to handling and processing prompts. This includes detecting and rejecting jailbreak attempts, task routing to improve agent performance, mapping simple user requests directly to backend APIs to improve responsiveness, and, in doing all of that, managing access and observability of LLMs in a centralized way. Backend API exposure is one of the key things we do with Arch gateway: we expose your APIs through LLMs, so it's easier for users and developers to get a conversational interface to your APIs.
[00:06:42] Tobias Macey:
As you mentioned, there are a number of different components being built out in the LLM, AI, and agent-based ecosystem. One of the first ones that comes to mind when I look at Arch and some of the positioning around it is the category of LLM gateways, which are largely built as a means of proxying to many different models while giving a single interface to the calling application. And I'm wondering what are some of the ways that you think about the areas of overlap and differentiation between the broad category of LLM gateways, and even maybe some specific ones that you want to call out, in comparison to how you think about the capabilities of Arch and the areas where it's applied.
[00:07:23] Adil Hafeez:
Yeah. I think there are many benefits to offering a unified access layer to different providers, but we don't think of it as a means of abstracting away LLMs. In fact, the task routing work we're doing at the moment is specifically so developers can leverage the strengths of any LLM to maximize task performance while lowering latency. We are continuously improving our models. I don't know if you know, but they're open source, the models we have put on Hugging Face, and we can't wait to release the next ones, hopefully very soon, probably next month, which will be better at task performance, parameter extraction, summarization, and intent recognition. We already have some models out there on Hugging Face, but more models will come next month.
[00:08:14] Tobias Macey:
The other broad category of tooling that I see Arch Gateway as having some measure of overlap with is the space of agentic frameworks like LangChain, LlamaIndex, Haystack, etcetera, where they have a lot of capabilities around things like task routing, incorporating guardrails, and tool calling, which are some of the things that Arch Gateway has in its feature set. And I'm wondering how you think about the use cases for Arch Gateway in that context, and how the presence of those features in Arch Gateway maybe shifts the way that teams think about the overall architecture of their agentic applications.
[00:08:58] Adil Hafeez:
There are a few things I want to highlight here. First, you can try and DIY this whole thing yourself: you can prompt the LLMs and bring in all the SDKs that are out there to deliver the features we already offer, or you can push that responsibility to a software component exclusively designed for that scenario. Second, all the DIY effort you put in could go wrong in many ways, so you end up wasting time and cycles integrating and maintaining different abstractions, which may or may not work together with the other dependencies you bring in. These are all open source projects, and a lot of the time they work, but sometimes they don't, so you end up wasting time debugging and fixing issues like version incompatibilities or other access issues. Lastly, there's the separation of concerns and a neat side effect: you can invite other members of your team. If you use our framework, other members of your team, like platform engineers, can come and collaborate along with you.
One more thing I want to say is that Arch is the only agentic proxy designed for prompts today. It knows what to do with prompts in a real, actionable way. For example, we're working with Red Hat right now. They have a product called ACM, which has many REST-based APIs, and their problem is: how do we expose these APIs to operators? What we're doing there is providing a conversational interface to the ACM APIs. So operators, without having to dig through the UI trying to figure out which options to click to get the details, can simply converse with ACM now. They can say: what can you do? Can you show me the details of that machine? Can you restart that machine? Things like that.
[00:10:58] Tobias Macey:
You mentioned the prompt understanding and the guardrail capabilities. I'm wondering if you can talk through some of the ways that, architecturally, you've designed the Arch Gateway to be able to manage that type of processing. I know that it's a layer seven gateway, but what are some of the ways that you think about the system design of Arch Gateway, the components that are necessary for it to operate, and in particular, the ways that you have designed it to evolve and adapt to the constantly shifting ecosystem that it's built within?
[00:11:34] Adil Hafeez:
Salman, my cofounder, and I have spent many years at infrastructure companies. He spent many years at Amazon on S3 and EC2; I spent years at Amazon and Lyft. Our core principle when developing this gateway was to build it in such a way that it brings ease of use to AI developers while being scalable and maintainable. With that, we have two major components. The primary component of our gateway is written in Rust and is designed to handle, forward, and process prompts. The other part is the language model. The language model is something we have fine-tuned in house; we have ML scientists training this model. The model is based on Qwen 2.5, and we have open sourced the 1.5 billion parameter model on Hugging Face. You can go there and try it yourself. It's trending number one on function calling tasks right now. It's quite powerful and has also ranked in the top five on BFCL, the Berkeley Function Calling Leaderboard. The core capabilities of this model are function calling, which drives the task routing, parameter extraction, lightweight multi-turn dialogue, and summarization of results. Another very important part of the architecture is the gateway itself. As I mentioned earlier in our discussion, I spent quite some time working on Envoy at Lyft and helped manage and deploy it as a distributed mesh proxy to handle service-to-service calls and observability. The gateway is built on top of Envoy: we have extended Envoy to add first class support for LLMs, which helps us make calls to REST endpoints. Because we're making calls through Envoy, we get tracing, observability, rate limiting, logging, all those amazing things Envoy already has, right in the gateway. And then there are more things like intelligent retries, circuit breaking, and access logs, all offered to the Arch gateway user. So, yeah, that's about it. I've talked a lot, but I don't know if you have any questions there. We can go deeper.
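[Editor's note: to make the function-calling capability concrete, here is a minimal sketch of the kind of exchange such a model handles, expressed against an OpenAI-compatible chat API. The endpoint URL, model name, and tool schema are illustrative assumptions, not Arch's actual internal interface.]

```python
import json
from openai import OpenAI

# Hypothetical endpoint serving a function-calling model behind an
# OpenAI-compatible API; the URL and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "restart_machine",
        "description": "Restart a machine in the cluster by its ID",
        "parameters": {
            "type": "object",
            "properties": {
                "machine_id": {"type": "string", "description": "ID of the machine"}
            },
            "required": ["machine_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="function-calling-model",  # placeholder name
    messages=[{"role": "user", "content": "Can you restart machine m-42?"}],
    tools=tools,
)

# The model responds with a structured call rather than free text,
# e.g. restart_machine({"machine_id": "m-42"}), which a gateway can
# then map onto a backend REST endpoint.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```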
[00:14:09] Tobias Macey:
Specifically, in terms of teams who are trying to address a specific problem where they've identified agentic systems and LLMs as a component of the solution, I'm wondering if you can talk through some of the workflow aspects of incorporating Arch gateway into that design process, and then maybe where it sits in the overall application stack. A lot of LLM gateways sit between the code that the team writes and the actual calls to the LLM, whereas looking at some of the sequence diagrams for Arch, it looks like it would more logically sit in front of the code that the team writes. What does that overall design process look like, and what is the engineering effort involved in using Arch to build a custom application with an agentic basis?
[00:14:51] Adil Hafeez:
Yeah. So Arch gateway sits between the developer's application and the backend APIs, and acts as the interface to the upstream LLMs. We sit in the middle. In the simplest form, you can use our gateway to provide observability for your LLM calls. That's the very simplest use case: say you're talking to OpenAI or Mistral or any other LLM. Instead of going directly to them, you put Arch gateway in the middle, and you don't need to change any code at all; we just transparently send requests to the upstream LLMs. By doing so, you get a lot of benefits: observability, tracing, access logs, rate limiting, token-based rate limiting, all those things. But that's just the simplest use case. The more complex use case, which AI developers really want, is task-based routing, where you have a bunch of REST-based APIs in your system and you want a way to provide a conversational interface to them. You tell our gateway about your REST APIs, with descriptions of those APIs, and then we manage when to invoke those APIs based on the user's conversation.
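[Editor's note: to give a rough sense of what "telling the gateway about your APIs" looks like, here is a sketch of a gateway configuration with one upstream LLM and one prompt target. The field names and values are illustrative only; consult the Arch documentation for the actual arch_config.yaml schema.]

```yaml
# Illustrative sketch, not the exact Arch schema.
version: v0.1

listener:
  address: 0.0.0.0
  port: 10000          # hypothetical port the gateway listens on

llm_providers:         # upstream LLMs the gateway can transparently proxy to
  - name: openai-gpt-4o
    provider: openai
    model: gpt-4o
    access_key: $OPENAI_API_KEY

prompt_targets:        # REST APIs described to the gateway for task routing
  - name: currency_exchange
    description: Convert an amount from one currency to another
    parameters:
      - name: amount
        type: number
        description: The amount to convert
        required: true
      - name: to_currency
        type: string
        description: Target currency code, e.g. USD
        required: true
    endpoint:
      name: api_server
      path: /convert
```

With a config along these lines, the gateway's function-calling model would decide when a user utterance maps to the currency_exchange target and extract the parameters before calling the backend.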
[00:16:00] Tobias Macey:
And so, generally speaking, it sounds like Arch Gateway is a system that you would use maybe in place of doing custom development with a LangChain or a LlamaIndex. Rather than saying, I'm going to write some Python code and tell the agent about these APIs over here, and this is the way I want to manage it, all in Python, you would instead say: here's Arch Gateway, this is the config file that tells it where the endpoints are for these APIs, and maybe this is the model I want to use for any LLM calls, and then Arch Gateway will handle all of that agent-based routing. So you wouldn't use LangChain or LlamaIndex at all in that case, or it would at least remove a lot of the need for those frameworks,
[00:16:44] Adil Hafeez:
at least for the general case of an AI agent doing tool calling? Yeah. So our goal is to provide the framework to AI developers and non-AI developers so they can get started developing AI agents without having to know all the details about how to make LLM calls, how to build guardrails, how to do access logging, or how to do rate limiting. We provide the basic toolset to developers without them having to worry about integrating
[00:17:14] Tobias Macey:
SDKs and other libraries. As I mentioned earlier, to onboard onto Arch gateway, we don't expect you to know AI/ML deeply. You understand your own APIs, and we help you provide a conversational interface to those APIs using our gateway. So it seems like in a lot of cases, at least for teams who maybe have a web application, they understand how to build APIs and CRUD systems, but maybe they don't have any in-house AI expertise, and they want to be able to provide some sort of chat-based interface to their end consumers. They would bring in Arch Gateway, and it would do all the heavy lifting on their behalf. But for a case where they do have some AI expertise in house, that team could focus on more complex use cases and still allow the web engineering team to use Arch Gateway for their use cases, and maybe the AI-focused team can work on model fine-tuning or more complex ML use cases?
[00:18:14] Adil Hafeez:
Yeah. With our architecture, we allow developers, engineers, and AI scientists to experiment with different models. We have separation of concerns in how we implement our services: we have a gateway service and a model service, and the two are separate. The model service hosts our function-calling model. It also hosts our guardrails model, which detects whether there is jailbreaking intent and things like that. For the common developer, they don't need to mess with any of those details. But an engineer or a scientist can start tweaking the parameters to fine-tune things according to the requirements of their applications, and they can also choose which upstream LLM they want to use to summarize the task. For example, you could build a currency converter; that's the demo we have on our website. You build a currency converter API and integrate it through the conversational interface by asking, say, can you convert a hundred rupees to US dollars? The API gives you a JSON response. But now, how do you convert that JSON response into a human-readable format? You can use an upstream LLM, like OpenAI or Mistral, or even our own model, and say: hey, can you summarize this response that came from this API into a human-readable format? Then we can present the end user a nice summary of the response that came from the API, converting JSON into a human-readable format.
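[Editor's note: as a concrete illustration of that last step, here is a minimal sketch of handing a raw JSON API response to an upstream, OpenAI-compatible LLM for summarization. The base URL, model name, and response fields are assumptions for the example.]

```python
import json
from openai import OpenAI

# Point the client at the gateway or any OpenAI-compatible endpoint.
# The base_url and model name here are illustrative assumptions.
client = OpenAI(base_url="http://localhost:10000/v1", api_key="not-needed")

# Raw JSON as it might come back from a hypothetical currency API.
api_response = {"from": "USD", "to": "EUR", "rate": 0.92,
                "amount": 100, "converted": 92.0}

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; could be any upstream model
    messages=[
        {"role": "system",
         "content": "Summarize API responses for end users in one short sentence."},
        {"role": "user", "content": json.dumps(api_response)},
    ],
)

# e.g. "100 US dollars is about 92 euros at a rate of 0.92."
print(completion.choices[0].message.content)
```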
[00:19:41] Tobias Macey:
In terms of that model customization piece, it seems like Arch is very useful in that regard as well, because it gives you a way to slot in a customized or fine-tuned model for that tool calling, in the event that you have a team that wants to do something a little more explicit, or restrict the types of tools that can be called based on whatever parameters they want to set. Or maybe they want to load in their own custom guardrails model to add additional types of checks; for instance, in some sort of educational context: I don't want the model to be able to actually give the answer directly, I want it to only be able to engage in Socratic dialogue. I'm wondering about some of the ways that Arch is engineered to allow for that extensibility for the end user as well.
[00:20:34] Adil Hafeez:
A hundred percent. Our infrastructure allows operators and developers to intercept at many different places. Like you said, there are models you can swap out: there's the function-calling model you can swap out, and there's the upstream model. Today you have many options; you can use a local model for summarization, or Mistral, or OpenAI. You can use any of these models to summarize the response from the API. So we do provide key interfaces at various points of the pipeline for developers to customize it. In your experience of building the Arch Gateway, building it in the open, and allowing the community to experiment and tinker with it, what are some of the most interesting or innovative or unexpected ways that you've seen it applied? So, we're still learning. We are reaching out to various communities, communities on Reddit and on Discord channels, to showcase our capabilities and invite others to partner and contribute with us. By doing so, we are learning a lot more about what the potential could be for this gateway we're building. There are a lot of developers and a lot of use cases out there. There are a lot of companies who want to onboard LLMs into their infrastructure, but they're faced with the challenge of what SDK to use, what API to use, what model to use, and that slows down their experimentation quite a lot. Our aim is to help those developers experiment and onboard these LLMs faster, to achieve whatever goal they have in mind.
[00:22:13] Tobias Macey:
And in your experience of building the Arch gateway and the technology, and investing in this overall space of LLMs and agentic applications, what are some of the most interesting or unexpected or challenging lessons that you've learned personally in the process?
[00:22:28] Adil Hafeez:
I think the primary thing is to trust the community and learn from their experiences. Whenever we are experimenting with a new feature, we tend to seek help from the community, like the community on Hacker News, and see what they like or don't like. That instant feedback has helped us a lot in shaping the roadmap and the feature set for our gateway. So I would say: go and talk to the people around you, the companies you're working with, and the communities on Reddit and Hacker News. They're amazing resources. Share your ideas, get feedback, and then incorporate those learnings into your roadmap. The open source community is amazing.
[00:23:09] Tobias Macey:
And for people who are interested in building some sort of AI-based interface or application, what are the cases where Arch is the wrong choice, and maybe they're better off using one of these frameworks, or a different gateway, or just a different architecture overall?
[00:23:22] Adil Hafeez:
Yeah, good question. In its current form, Arch Gateway needs a GPU to run the function-calling model, and Docker to operate. We understand that not everyone has a GPU, so we have hosted our model in the cloud to help developers try things out quickly. Also, the model we developed is not general purpose, so you shouldn't ask it for, say, the list of US presidents since 1950. It may answer those questions, but it's not designed for that task. Our model is designed for function calling, for task-oriented scenarios. That's what we focus on. We're not a generic model, and we're not competing with all the big companies out there; we're a very fine-tuned, small model, able to route tasks appropriately to the right function.
[00:24:16] Tobias Macey:
As you continue to build and invest in the Arch gateway and some of the other technologies that you're building, what are some of the things you have planned for the near to medium term, or any particular industry trends that you're keeping an eye on to help direct your focus?
[00:24:32] Adil Hafeez:
There are two key areas of investment that are very important to us. Number one is to build the world's leading planning LLM; that will be an adaptation of our function-calling model. Agent-to-agent interoperability standards are the other one. We want to open up use cases for developers, providing the basic tools and knowledge so they can build agents that communicate with each other. These are the two scenarios that come up a lot in our customer conversations, and we are excited about this work. We're actually working right now on adapting our function-calling model for planning.
[00:25:06] Tobias Macey:
Are there any other aspects of the work that you're doing on Arch Gateway, the architectural design of the system, and the ways that you think about how to build some of these agentic applications that we didn't discuss yet that you'd like to cover before we close out the show? I think we could talk about the choice of Rust and the choice of Envoy, why we went with Envoy. I guess to that point, you did mention those key technologies in terms of incorporating them into the gateway. So what are some of the aspects of Rust as a language and Envoy as a networking technology that lend themselves well to this use case of building a layer seven prompt gateway?
[00:25:47] Adil Hafeez:
So Envoy has become the de facto standard for service-to-service communication in cloud native applications, and it provides a wealth of information: stats, logs, and tracing. If we were to rewrite all of that, it would take tons of time to do things right, and we didn't want to reinvent the wheel. We wanted to build on top of it so we could reuse what Envoy has built in the open source and make it available to open source developers. The primary reason for Rust was the open source community. When you compare the number of C++ developers to Rust developers in that community, there's no comparison; there are a lot more Rust developers than C++ developers.
And that was one of the reasons for choosing the language: we can open source our filter, and more of the open source community can come and contribute along with us. Rust as a language is amazing; it has been the most loved language on Stack Overflow for many years now, and it is quite stable. It is now supported in the Linux kernel, which shows its importance, and it has amazing tooling. It's a pretty good language.
[00:27:09] Tobias Macey:
It's definitely a very widely adopted language, and it's also been gaining a lot of popularity in the Python ecosystem because of its ability to be easily built as an extension module for Python use cases. Particularly given the AI use case, I can definitely see the benefits of using Rust, particularly for being able to manage that bridging from the operational and network-oriented layers, which Rust can handle performantly, to the AI and ML interaction that you're doing in Python.
[00:27:26] Adil Hafeez:
One of the key differentiators was memory ownership. That was a very different concept that Rust productionized. Coming from Java and Python, you don't worry about memory at all, but you get bitten by the GC from time to time, especially in complex scenarios like serving a high-performance server: if you're not careful about memory usage, your p99 or p99.9 latency may suffer a lot because of the GC. In fact, here's a background story. I was developing a high-performance router at S3 to distribute traffic from one region to another. In performance tests we saw that p50 was fine and p99 was okay, but p99.9 was suffering a lot, and we didn't know what was going on. Upon further investigation, we found that the GC was kicking in every few seconds, so we went back and reworked a few of those pieces to improve the GC behavior, and that helped a lot. But coming back to Rust: we don't have the GC problem anymore, because as soon as ownership of a variable ends, the memory is reclaimed right away. It's like C++ memory management, but without developers having to worry about new and delete. The compiler also makes a very good effort at giving you the right set of error messages; people in the community say that Rust compiler errors are actually actionable, and they give you further information: hey, go click on this link to see more background about why I gave you this error. And there are more tools, like Clippy and rustfmt. Those are pretty nice tools that I use all the time for static analysis and finding errors and bugs in my code.
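[Editor's note: as a small illustration of the ownership model described here (an editorial sketch, not code from the Arch codebase): when a value's owner goes out of scope, it is freed immediately and deterministically, with no garbage collector involved.]

```rust
// Minimal sketch of Rust ownership: deterministic cleanup, no GC pauses.
fn route(prompt: String) -> String {
    // `route` now owns `prompt`; when this function returns, the
    // String's heap buffer is freed right away (no collector needed).
    format!("target=currency_exchange input={prompt}")
}

fn main() {
    let prompt = String::from("convert 100 USD to EUR");
    let routed = route(prompt); // ownership of `prompt` moves into `route`

    // println!("{prompt}"); // compile error: value moved; the compiler
    //                       // enforces ownership instead of a runtime GC
    println!("{routed}");
} // `routed` is dropped here, deterministically
```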
[00:29:11] Tobias Macey:
Pretty cool. Well, for anybody who wants to get in touch with you and the rest of your team and follow along with the work that you're doing and the evolution of the Arch Gateway, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as the biggest gaps in the tooling, technology, or human training that's available for AI systems today.
[00:29:40] Adil Hafeez:
What we see is that things are all over the place. To build a basic application, you have to import multiple libraries and pray that they work together. So we are attempting to simplify writing AI/ML applications by providing a simple interface, which is the Arch config YAML file. I think open source models like Llama, Mistral, and others are providing amazing value to developers, but onboarding them is a challenge. As I mentioned earlier: which SDK to use, how to do rate limiting, how to switch between models, which model performs better, how to do A/B testing, how to do failover and load balancing among these models. All these things are tricky and hard to implement, but as more and more use cases are productionized, we'll see the need for these key features become more and more important. And I think something like Envoy can really help productionize AI/ML use cases faster in the near future.
[00:30:30] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing on the Arch Gateway. It's definitely a very interesting project and a very valuable entrant into the space of making it easier for teams to incorporate AI and prompting into their overall stack without having to invest in custom application development. So I appreciate all the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Thank you very much. Thanks for having me. Bye now. Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Your host is Tobias Macy, and today I'm interviewing Adil Hafiz about the Arch project, a gateway for your AI agents. So, Adil, can you start by introducing yourself?
[00:00:29] Adil Hafeez:
Yeah. Of course. Hi. My name is Adil Hafiz, and I have been working with software for as long as I can remember. I got a hold of, computers when I was really early, early age and fell in love with computers ever since. So in 02/2006, I moved to US to work for Microsoft. At Microsoft, I've worked for Bing Relevance team. And there then after a few years at Microsoft, I I went to work with Amazon, s three, then a few years at Dropbox and Lyft, and finally joined hand with Salman to bootstrap this startup that I'm working currently, which is Adremo, which is developing our gateway. Do you remember how you first got started working in the ML and AI space? Yes. During my time at Bing is the first time I got, introduced to ML. That's a long time ago, like, in '20 '2 thousand '8 or 02/2010, around that time. I was working, for Bing Relevance. That's, that's where I learned the importance of label data. We had a whole, team of human labelers, and all they did the whole day the entire day was to they were shown query results. All they they would do is label the results good, bad, or excellent. And we would use that data to train our model, which was based on gradient boosted entry to improve the ranking. We also fed in query logs to also representation of, human interest, which also represented humans likeness of the human liking preferences. And during my time at Bing, I also trained a ranker and classifier for Bing Shopping to show you the related products. That was the my, first introduction to hardcore, AIML and machine learning at that time. Of course, at that time, algorithms were didn't exist, so it was all about classifier and rankers.
[00:02:11] Tobias Macey:
And so bringing us now to what you're building at Catanema with the Arch gateway, I'm wondering if you can just start by describing a bit about what it is and some of the story behind how it got started and the problems you're trying to solve with it. Yeah. %.
[00:02:26] Adil Hafeez:
So Arch Arch Gateway is an open source, agentic, edge and SLM proxy designed for prompts. So we talked to, like, many developers, like hundreds of developers, and one consistent theme that emerged in our conversation was that they wanted to build, quickly they they wanted to build apps quickly tailored to their business systems and APIs to support, knowledge based applications and agentic, and solve agentic tasks. The another thing was they wanted to focus on core business logic and not left on their own to, you know, build features like, guardrails, routing, observability, features. Those those are those are important things, but not core to the business logic. So Arch integrates several of these, related capabilities in handling and processing prompts so the developers can focus on high level objects objectives and extra time to market. This means developers get cycles back to build undifferentiated prompt engineering work and detect intent and extract data from the user queries to build high quality agentic task or to, you know, build and maintain card rails and to get unified interface to other lens Calls to, improve, you know, and observability and resiliency.
So all that is built and packed in our gateway.
[00:03:43] Tobias Macey:
And in terms of the target audience, you mentioned that it's largely engineers. They're focused on trying to be paying attention to the business problems more than all of the wiring and scaffolding around that. But I'm wondering if there are any particular personas or categories of industry or just types of engineers that you're seeing who are most attracted to the capabilities of Archer, who are most invested in the ways that it thinks about the overall architecture of this of the problem? Yeah. That's a very good question. Right now, today,
[00:04:16] Adil Hafeez:
there are many frameworks and many APIs out there that people can use to onboard and integrate LLMs and AIML in their, applications. So our gateway, is designed for developers who may have zero knowledge of AI, but they wanted to build but the the zero or some knowledge of AI, but they wanted to quickly build LLM supports, elements to support knowledge and agentic scenarios. So actually, during my time at Lyft, I'll give you a backstory. During my time at Lyft, I worked at on worked worked to deploy Envoy to manage our service deployments. The reason it quickly became de facto standard for Cloud native application is because developers could use it to handle several complex problems like threat limiting, traffic management, tree trials without having to worry about actually implementing those scenarios.
This meant developers could could move faster and has become more resilient and easy to maintain and business profited. One quick example there is that we when we were scaling services at Lyft, we we would see services getting browned out or services getting five x x due to the the load. So it was hard to pinpoint which in endpoint or which service or which or why why why they're, you know, failing. So one quick thing was, you know, why don't you go to the machine and tell their logs to see what's going on there? But we were meeting logs to, you know, s three and, other services. So it was hard to pull logs down to the machine to query those logs instantly. But with with with Envoy, what we were able to do is that we were able to see service to service call details, like, what services are grounding out with services. So that give us a very good overview of the entire network without having us to write additional piece of code to do that. Onway, we gave a lot of those auxiliary features, to our services. Similarly, Arch is engineered with purpose built other than to handle critical but undifferentiated tasks related to handling and processing of prompt. This includes detecting and rejecting jailbreak attempts, testing task routing to improve agent, performance, mapping simple user requests directly to the back end APIs to improve responsiveness, and, you know, and entire, all of that doing all of that, managing access and observability of other lens in a centralized way. So so the so the back end API exposure is one of the key thing we do at using arch gateway is we expose our APIs through other lens. So it's easier for the users and developers to converse or vertical orchestration interface to your APIs. As you mentioned,
[00:06:42] Tobias Macey:
there are a number of different components that are being built out in the LLM and AI and agent based ecosystem. One of the first ones that comes to mind when I look at Arch and look at some of the positioning around it is the category of LLM gateways, which are largely built as a means of proxying to many different models, but giving a single interface to the calling application. And I'm wondering what are some of the ways that you think about the areas of overlap and differentiation between the broad category of LLM gateways and even maybe some specific ones that you want to call out in comparison to how you think about the capabilities of Arch and its,
[00:07:23] Adil Hafeez:
the areas where it's applied. Yeah. Yeah. I think I think there are many benefits on offering unified access layer to different providers, But we don't think of it as a means of abstract away LLMs. In fact, task routing, the work we are working at the moment is especially so developers can leverage trends of any LLM to maximize task performance by lowering the latency. We are developing, we are continuously improving our models. I don't know if you know they're open source, the models we have put on Hugging Face and we can't wait to release those models which are going to be hopefully very soon, maybe probably next month, which will be more better at task performance, parameter extraction, summarization, and, Internet recognition. Those models are gonna we already have some models which are out already out there in the interface, but more more models will, come, next month.
[00:08:14] Tobias Macey:
The other broad category of tooling that I see Arch Gateway is having some measure of overlap with is the space of these agentic frameworks like, Langchain, LAMA index, Haystack, etcetera, where they have a lot of the capabilities around things like task routing, being able to incorporate guardrails, tool calling, which are some of the things that Arch Gateway has in its feature set. And I'm wondering how you about the use cases for Arch Gateway in that context and how the presence of those features in Arch Gateway maybe shifts the way that teams think about the overall architecture of their agentic applications.
[00:08:58] Adil Hafeez:
So there there are there are few things I wanna highlight here. First thing, you can try and DIY this whole thing yourself. Like, you can prompt the LLMs and you can bring in all the SDKs that are out there to showcase to to deliver the features that we we already are offering. And you can push that or you can push that responsibility to a software component exclusively designed for that scenario. Second, all the DIY effort you put in could go wrong in many ways. That way, thereby, you're wasting time and cycles in indicating and maintaining different abstractions, which may or may not work together with other dependencies that you may bring in. So, these these are all open source projects and they lot of time, they work, but sometimes they don't work. So you will end up wasting some time, debating and fixing issues that, you know, version incompatibilities or, you know, other access issues. Lastly, the the the suppression of concern and the need side effect. You can invite other members of your team. If you use our framework, you can have other members of your team. They can come and collaborate along with you to, you know, build build like platform engineers.
One more thing I wanna say is that Arch is the only agentic proxy designed for prompts today. It knows what to do with prompts in a real actionable way. For example, we're working with a company called Red Hat, right now. They have a product called ACM, which has many APIs out there, less based APIs. And their their problem is how how do the how do we expose these APIs to the operators? And what we don't have there is that we're gonna provide a conversational interface, to the ACM APIs. So the operators, without having to worry about, with the UI, trying to find where what options to click, what options to, you know, click and get the details, they can simply converse, with this, with the ACM now. They can say, you know, what can you do? Can you show me the detail of that machine? Can you restart that machine? Or, you know, things like that.
[00:10:58] Tobias Macey:
You mentioned the prompt understanding, the guardrail capabilities. I'm wondering if you can talk to some of the ways that, architecturally, you've designed the Arch Gateway to be able to manage that type of processing. I know that it's a layer seven gateway, but some of the ways that you think about the system design of Arch Gateway, the components that are necessary for it to be able to operate, and in particular, the ways that you have designed it for being able to evolve and adapt to the constantly shifting ecosystem that it's built within?
[00:11:34] Adil Hafeez:
So, Salman, my other, cofounder, he and I have spent many years in infrastructure, companies. He spent many years at Amazon, s three EC two. I spent a lot of years Amazon and Lyft. Our core principle when you were developing this gateway was to, you know, develop it such such a way that it brings ease of use to the AI developers, and it's scalable and it's maintainable. So with that, we have two major components. The primary component of our gateways as a Rust component, certain in Rust, designed to handle forward and process prompts from the elements. We use for, certain language certain language tasks. And other part is the language model. So first part is the Rust component that manages the prompts. Other part is the language model. The language model is something we have fine tuned in house. We have ML scientists that we who are training this this model. So the model is based off from, Quench 2.5, and we have, open source. It fits for one point five billion one point five billion model parameter, 10,000,000,000 model on Hugging Face. You can you can go ahead there and try it yourself. And it's, it's trending number one on function calling calling task right now. It's quite powerful and has also ranked top five in BFCL, which is Berkeley function calling, leaderboard. The core capabilities of this model are, function calling, which is the task resulting, parameter extraction, and also engages in lightweight mode dialogue dialogue. And then the it also can do a summarization of, you know, the results. So another very important part of the architecture is the the gateway itself. As I mentioned earlier, in our discussion, I spent quite quite some time working on Envoy at Lyft and helped manage it and deploy it as a distributed mesh proxy at Lyft to handle service to service calls and observability. The gateway is built on top of Envoy. We have extended Envoy to add support for LLMs as a first class support for LLMs, making it helps it helps us making our phone calls to arrest endpoints. So if we are making calls to Envoy, we get, you know, tracing and observability and, you know, rate limiting, logging, all those amazing things that Onway already has, we are getting all those features in the gateway. And then there there are more things like, intelligent retries, circuit breaking, and access logs. All those things are, getting offered through the, to the benefit of one word to the arch gateway user. So, yeah. That's that's about it. But I've I've talked a lot, but I don't know if you have any questions there. You can go more deeper. Specifically, in terms of teams who are trying to address a specific problem where they identified
[00:14:09] Tobias Macey:
agentic systems and LLMs as a component to that solution. I'm wondering if you can talk to some of the workflow aspects of incorporating arch gateway into that design process and then maybe where it sits in the overall application stack where a lot of LLM gateways sit between the code that the team writes and the actual calls to the LLM. Whereas looking at some of the sequence diagrams for Arch, it looks like it actually would more logically sit in front of the code that the team writes and just some of the ways that that overall design process looks like and some of the engineering effort that's involved in using Arch to build some custom application with an agentic basis.
[00:14:51] Adil Hafeez:
Yeah. So Arch Arch gateway sits in between developer APIs, the developer, and then the, backend APIs and Gauss interface to the upstream SLMs. So we sit in the middle. In the simplest form, you can use our gateway to provide observability to your other calls. That's very simplest use case where you are, let's say, talking to OpenAI or Mistral or any other SLM. Instead of going to SLM or instead of going directly to them, you can put RSCAD in the middle and you don't need to change any code at all. We just transparently, you know, send requests to upstream SLMs. And by doing so, you will give a lot of benefit of, you know, observability, tracing, access logs, rate limiting, token based rate limiting, all those things. But that's just a very simplest use case. The more complex use cases which, AI developers really want is the task based routing where you have a bunch of APIs that best based APIs in your system and you want a way of provide a conversation interface to your APIs. So you you can tell our gateway about your, REST APIs, like description of those APIs. And then we will manage when to invoke those APIs based on your based on the the user's conversation.
[00:16:00] Tobias Macey:
And so generally speaking, it sounds like Arch Gateway is a system that you would use maybe in place of doing custom development with a lang chain or a llama index where rather than saying I'm going to write some Python code and tell the agent about these APIs over here, and this is the way that I wanna manage it and doing that all in Python, you would instead say, here's Arch Gateway. This is the config file that tells it where the endpoints are for these APIs, and maybe this is the model that I wanna use for any LLM calls, and then Archgateway will handle all of that sort of agent based routing. And so you wouldn't use the blank chain or the LOM index at all in that case, or it would at least remove a lot of the need for those frameworks
[00:16:44] Adil Hafeez:
for at least a general case of an AI agent for being able to do the specifically tool calling. Yeah. Yeah. So our goal is to provide the the framework to AI developers and non AI developers so they can get started with developing AI agents without having to without having to know about, you know, all the details about how to make calls, how to make guardrails, how to do access login, or how to do, you know, rate limiting. So we provide, the basic toolset the developers without having to worry about integration
[00:17:14] Tobias Macey:
of SDKs and, you know, other libraries. So as I mentioned earlier, to onboard on arch gateway, we don't we don't expect you to know deep about AML. We hope to understand your own APIs, and then we would help you, you know, provide a conversation interface to your APIs using our gateway. So it it seems like in a lot of cases, at least for teams who maybe have a web application, they understand how to build APIs, they understand how to build CRUD systems, but maybe they don't have any in house AI expertise, but they want to be able to provide some sort of chat based interface to their end consumers. They would bring in Arch Gateway, and that would do all the heavy lifting on their behalf. But for a case where maybe they do have some AI expertise in house, that team could focus on maybe more complex use cases and still allow the web engineering team to use Arch Gateway for their use cases and maybe the AI focus team can work on doing model fine tuning or more complex ML use cases?
[00:18:14] Adil Hafeez:
Yeah. With our architecture, we allow, developers and engineers, AI scientists to experiment with different models. So so we have separation of concerns implementing our services where we have a gateway service, we have a model service, and we have, a gateway service, and we have a model service. So both are separate. The model service host, our, a model which is a functionality model. It also hosts our guardrails guardrails model, which, you know, detects if there is jailbreaking intent or not and stuff like that. So for for the common developer, they don't need to mess up with any of those details. But for a engineer or a scientist, they can start tweaking the parameters, to fine tune according to the, you know, the requirements of applications. And they can also tweak a what upstream other than they wanna use for to summarize the task. So for example, you could build a currency converter. That's what we have on our website. You can build a build a currency curve converter API. Now there are many APIs out there. You can, integrate that through conversation interface by thinking, you know, can you convert hundred response to US dollar? And we give you a JSON response. But now how do you convert those JSON response to human readable format? So you can use upstream LLM like Oven or Mistral or even over model to say, hey, you know, can you summarize this response that came from this API into a human readable format or a consumer response? But then we can present the user end user a nice summary of response that came from the API. Convert JSON into human readable readable,
[00:19:41] Tobias Macey:
In terms of that model customization piece, it seems like Arch is very useful in that regard as well, because it gives you a way to slot in maybe a customized model or a fine-tuned model for that tool calling, in the event that you have a team that wants to do something a little more explicit, or maybe restrict the types of tools that can be called based on whatever parameters they want to set, or maybe they want to load in their own custom guardrails model to add some additional types of checks. For instance, in some sort of educational context, I don't want the model to be able to actually give the answer directly. I want it to only be able to engage in Socratic dialogue. And I'm just wondering about some of the ways that Arch is engineered to allow for that extensibility for the end user as well.
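As a sketch of the kind of extensibility being asked about, the snippet below imagines swapping in a fine-tuned routing model and a custom guardrails policy. It is written as a Python dict for illustration, and every key in it is a hypothetical assumption rather than Arch's actual configuration schema.

```python
# A hypothetical sketch of swapping pipeline components; these keys are
# illustrative assumptions, not Arch's documented configuration schema.
custom_pipeline = {
    # Replace the built-in function calling (routing) model with a fine-tuned one.
    "function_calling_model": "my-org/finetuned-router-v2",
    # Point guardrails at a custom classifier, e.g. one that blocks direct
    # answers in an educational setting and permits only Socratic dialogue.
    "guardrails": {
        "input_model": "my-org/socratic-policy-guard",
        "on_violation": "reject",
    },
    # Choose which upstream LLM summarizes tool responses.
    "summarization_model": {"provider": "mistral", "model": "mistral-large-latest"},
}
```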
[00:20:34] Adil Hafeez:
Our infrastructure allows operators and developers to intercept at many different places. Like you said, there are models you can swap out. There's the function calling model you could swap out, and there is the upstream model. Today, you have many options to try: either a local model for summarization, or Mistral, or OpenAI. You can use any of these models to summarize the response from the API. So we do provide key interfaces at various points of the pipeline for developers to, you know, customize the pipeline. In your experience of building the Arch Gateway and the project, and building it in the open and allowing the community to experiment with it and tinker, what are some of the most interesting or innovative or unexpected ways that you've seen it applied? So we're still learning. We are reaching out to various communities, like communities on Reddit and on Discord channels, to showcase our capabilities and, you know, invite others to partner with us and contribute with us. And by doing so, we are learning a lot more about what the potential could be for this gateway that we're building. There are a lot of developers, a lot of use cases out there, a lot of companies who want to onboard LLMs into their infrastructure, but they are faced with this challenge of what SDK to use, what API to use, what model to use. And this slows down their experimentation by quite a lot. Our aim there is to, you know, help those developers experiment and onboard these LLMs faster to achieve whatever goal they have in mind. And so in your experience of building the Arch
[00:22:13] Tobias Macey:
Gateway and the technology, and investing in this overall space of LLMs and agentic systems, what are some of the most interesting or unexpected or challenging lessons that you've learned personally in the process? I think the primary thing is to trust the community.
[00:22:28] Adil Hafeez:
We learn from their experiences. So whenever we are experimenting with a new feature, we tend to seek help from the community, communities like Hacker News, and see what they like or what they don't like. That instant feedback has helped us a lot in shaping the roadmap and the feature set for our gateway. So I would say go and talk to the people around you, the companies you're working with, and, you know, the communities on Reddit and Hacker News. They're amazing resources. Share your ideas, get feedback, and then incorporate those learnings into your roadmap. The open source community is amazing. And for people who are interested in building some sort of
[00:23:09] Tobias Macey:
AI-based interface or application, what are the cases where Arch is the wrong choice, and maybe they're better off using one of these frameworks, or a different gateway, or just a different architecture overall?
[00:23:22] Adil Hafeez:
Yeah. Good question. So in its current form, Arch Gateway needs a GPU to run the LLM function calling model, and Docker to operate. We understand that not everyone has a GPU, so we have hosted our model in the cloud to help developers, you know, try it out quickly. Also, the model that we developed is not general purpose, so you can't ask it questions like, what is the list of US presidents since 1950? It may answer those questions, but it's not designed for that task. Our model is designed for function calling, for, you know, task-oriented scenarios. That's what we focus on. We're not a generic model, and we're not competing with all the big companies out there. We're a very fine-tuned, small model, able to route tasks appropriately to the right function.
[00:24:16] Tobias Macey:
As you continue to build and invest in the Arch Gateway and some of the other technologies that you're building, what are some of the things you have planned for the near to medium term, or any particular industry trends that you're keeping an eye on to help direct your focus? So there are two key
[00:24:32] Adil Hafeez:
areas of investment which are very important to us. Number one is to build the world's leading planning LLM. That will be an adaptation of our function calling model today. And agent-to-agent interoperability standards are the other one. We want to open up use cases where developers can provide basic tools and knowledge and drive agents that can communicate with each other. These are the two scenarios that are coming up a lot in our customer conversations, and we are excited about this work. So we are actually working right now on adapting our function calling model for planning.
[00:25:06] Tobias Macey:
Are there any other aspects of the work that you're doing on Arch Gateway, the architectural design of the system, and the ways that you think about how to build some of these agentic applications that we didn't discuss yet that you'd like to cover before we close out the show? I think we could talk about the choice of Rust and the choice of Envoy, and why we went with Envoy. I guess to that point, you did mention those key technologies in terms of incorporating them into the gateway. So what are some of the aspects of Rust as a language and Envoy as a networking technology that lend themselves well to this use case of building the layer seven prompt gateway?
[00:25:47] Adil Hafeez:
So Envoy has become the de facto standard for service-to-service communication in cloud native applications. And it provides a wealth of information, stats, logs, and tracing that, if we were to rewrite all of that again, would take a ton of time to get right. We didn't want to reinvent the wheel, so we built something on top of it, so we can reuse whatever Envoy has built in the open source and make it available to open source developers. And the primary reason for Rust was the open source community. We saw that there's no comparison between the number of C++ developers and the number of Rust developers; there are a lot more Rust developers than C++ developers.
And that was one of the reasons for choosing the language: we can, you know, open source our filter, and then we'll have more of the open source community that can come and contribute along with us. Rust as a language is amazing. It has been the most loved language on Stack Overflow for many years now. And it is quite stable. It is supported in the Linux kernel, which shows you, you know, its importance, and it has amazing tooling. It's a pretty good language. It's definitely a very widely adopted language, and it's also been gaining a lot of popularity in the Python ecosystem because of its ability to be easily built as an extension module for Python use cases.
[00:27:09] Tobias Macey:
Particularly given the AI use case, I can definitely see the benefits of using Rust, particularly for being able to manage that bridging from the operational and network-oriented layers, which Rust can handle performantly, to the AI and ML interaction that you're doing in Python.
[00:27:26] Adil Hafeez:
One of the key differentiators was the memory ownership. That was a very different concept that Rust productionized. Coming from Java and Python, you don't care or worry about memory at all, but you are bitten by the GC from time to time, even in complex scenarios, like when you're serving a high performance server. If you're not careful about memory usage, your p99 or p99.9 may suffer a lot because of the GC. In fact, as a background story, I was developing a high performance router at S3 to distribute traffic from one region to another. In performance tests, we saw p50 was fine and p99 was okay, but p99.9 was suffering a lot. We didn't know what was going on. Upon further investigation, we found out that the GC was kicking in, with some call getting triggered every few seconds or so to do the GC. So we went back and, you know, redid a few of those pieces to improve the GC behavior. That helped a lot. But coming back to the Rust side, we don't have the GC problem anymore, because as soon as the ownership of a variable is taken away, the variable is discarded right away. Its memory is, you know, reclaimed right away. It's like C++ memory management, but without the developer having to worry about new and delete. And I think the compiler makes a very good effort in providing you with the right set of error codes. They say in the community that Rust compiler errors are actually actionable. They give you, you know, further information: hey, go click on this link to see more background about why I gave you this error. And there are a lot more tools, like there's a tool called Clippy, and there's rustfmt for formatting. Those are pretty nice tools that I use all the time to find errors and bugs in my code through static analysis.
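To make that tail-latency point concrete, here is a small toy simulation, with entirely made-up numbers, of how rare stop-the-world GC pauses can leave p50 and p99 looking healthy while p99.9 suffers:

```python
# A toy simulation of the tail-latency story above: rare stop-the-world GC
# pauses barely move p50 and p99 but wreck p99.9. All numbers are made up.
import random

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

random.seed(0)
# 100,000 ordinary requests clustered around 5 ms.
latencies = [random.gauss(5.0, 1.0) for _ in range(100_000)]
# Simulate a ~120 ms GC pause hitting roughly 1 in 500 requests.
for i in range(0, len(latencies), 500):
    latencies[i] += 120.0

for p in (50, 99, 99.9):
    print(f"p{p}: {percentile(latencies, p):.1f} ms")
# Typical output: p50 ~5 ms and p99 ~8 ms look fine, while p99.9 jumps past
# 100 ms -- exactly the pattern the GC investigation uncovered.
```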
[00:29:11] Tobias Macey:
Pretty cool. Well, for anybody who wants to get in touch with you and the rest of your team and follow along with the work that you're doing and the evolution of the Arch Gateway, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today. What we see is that they're all over the place. To build a basic application,
[00:29:40] Adil Hafeez:
you'd have to import multiple libraries and pray that they work together. So we are attempting to simplify writing AI/ML applications by providing a simple interface, which is the Arch config YAML file. I think open source models like Llama, Mistral, and others are providing amazing value to developers, but how you onboard them is a challenge. As I mentioned earlier: which SDK to use, how to do rate limiting, how to switch between models, which model performs better, how to do A/B testing, how to do, you know, load balancing among these models. All these things are tricky and hard to implement. But as more and more use cases are productionized, we will see the need for these key features become more and more important. And I think something like Envoy can really help productionize
[00:30:30] Tobias Macey:
AI/ML use cases faster in the near future. Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing on the Arch Gateway. It's definitely a very interesting project and a very valuable entrant into the space of making it easier for teams to incorporate AI and prompting into their overall stack without having to invest in custom application development. So I appreciate all the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Thank you very much. Thanks for having me. Bye now. Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Introduction to the AI Engineering Podcast
Meet Adil Hafeez: A Journey Through Tech Giants
First Steps in Machine Learning at Bing
The Arch Gateway: Solving Developer Challenges
Target Audience and Use Cases for Arch
Differentiation from Other LLM Gateways
DIY vs. Arch Gateway: Benefits and Challenges
Architectural Design of Arch Gateway
Incorporating Arch Gateway into Development Workflows
Customizing Models with Arch Gateway
Community Engagement and Lessons Learned
Future Plans and Industry Trends
Technical Choices: Rust and Envoy
Closing Thoughts and Contact Information