Summary
Generative AI promises to accelerate the productivity of human collaborators. Currently, the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. To simplify the integration of AI capabilities into developer workflows, Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Tsavo Knott about Pieces, a personal AI toolkit to improve the efficiency of developers
- Introduction
- How did you get involved in machine learning?
- Can you describe what Pieces is and the story behind it?
- The past few months have seen an endless series of personalized AI tools launched. What are the features and focus of Pieces that might encourage someone to use it over the alternatives?
- Model selection
- Architecture of the Pieces application
- Local vs. hybrid vs. online models
- Model update/delivery process
- Data preparation/serving for models in the context of the Pieces app
- Application of AI to developer workflows
- Types of workflows that people are building with Pieces
- What are the most interesting, innovative, or unexpected ways that you have seen Pieces used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pieces?
- When is Pieces the wrong choice?
- What do you have planned for the future of Pieces?
Parting Question
- From your perspective, what is the biggest barrier to adoption of machine learning today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Pieces
- NPU == Neural Processing Unit
- Tensor Chip
- LoRA == Low Rank Adaptation
- Generative Adversarial Networks
- Mistral
- Emacs
- Vim
- NeoVim
- Dart
- Flutter
- Typescript
- Lua
- Retrieval Augmented Generation
- ONNX
- LSTM == Long Short-Term Memory
- Llama 2
- GitHub Copilot
- Tabnine
- Podcast Episode
[00:00:05] Tobias Macey:
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Your host is Tobias Macey, and today, I'm interviewing Tsavo Knott about Pieces, a personal AI toolkit to improve the efficiency of developers. So, Tsavo, can you start by introducing yourself?
[00:00:29] Tsavo Knott:
Hey. Yeah. Thanks for having me on, Tobias. Tsavo Knott here. I'm a technical cofounder and CEO at Pieces for developers. And, yeah, I mean, as you mentioned, we're gonna talk about, you know, all things developer tooling, developer productivity, and just AI capabilities. What it means for our product, what it means for, you know, developer experiences at large, and, yeah, pretty excited to get into it. And do you remember how you first got involved in machine learning? Oh, yeah. I mean, I I was actually, like, I think 17 or 18, and we were starting a language learning company. And basically, we we kind of broke it down into this idea that there were, you know, visual learners, auditorial learners, and textual learners. Right? And this company was called Accent.ai. Right? We had the .ai domain, like, way back in the day. And, basically, it would learn how you would learn most efficiently. Right? If you interacted with visual content, textual content, or auditorial content, it would determine, like, what is the most efficient way for you to process this material. And then it would continually, like, reweight future content of lessons so that your efficiencies were going up. Right? And and so that was, like, kind of first early days application of AI, and and we were thinking about things. I was, like, that was almost a decade ago. You know, I'm 27 now. I was 17. Like, that was company number 1. And, you know, we we just we just always have been thinking about like, at least in in a lot of my, you know, experience with technology, I've always been thinking about how it conforms to the way we think. Right? And and building systems that feel natural to interact with and and just, you know, if I was teaching someone, that's how I would do it. Right? So why can't I build that into a system? But, yeah, that was a good question. That was way back in the day, and and we've been doing work around that space ever since. Now digging into the Pieces project, which is what you are dedicating a lot of your time to currently, can you give a bit of an overview about what it is, some of the story behind how it came to be, and why this is where you decided to spend your time and energy?
Yeah. Absolutely. I would say, you know, fundamentally, pieces is about changing the way that developers or professional creators at large interact with small bits of workflow material throughout that work in progress journey. Right? So if you think about it like developers, they're in 3 major locations, the browser, the IDE, and the collaborative environment. Right? So they're researching problem solving. They have 50 tabs open. They're in their IDE. You got a bunch of repositories open, jumping around, writing code, doing whatever else. And then, of course, they're in that collaborative space doing cross functional team collaboration, going back and forth with team members, you know, commenting here and there in in a sync or or a synchronous regard.
And so, you know, this work in progress journey actually drums up a lot of nuanced small details. And that's actually kind of why our job, I always like to describe it as half remembering something and looking up the rest. Right? And so these these nuanced details of workflow context, they're always lost. Right? I feel like at the end of the day, I commit that PR or I send some stuff up, and I close all my tabs. And, you know, that whole work in progress journey is is really uncaptured. Right? And it also takes a lot of time to to really document it myself. I wanna move quickly. I'm doing some context switching. I'm like, hey, I'll probably figure it out later. So pieces really started out as a place to just be a home for the small things that you wanna fire and forget, notably, like code snippets or configs or links or screenshots or just kind of like small bits of material where, like, I don't need to formally document this right now, but I know I might need this later to pick up where I left off or backtrack. I'm just gonna select some stuff, save the pieces, and hopefully be able to to find it later. But once you start to fire and forget all these materials over to pieces, you need some type of enrichment. Right? What else is it related to? Can I tag this stuff automatically? Can I associate documentation links and people?
Can I relate 2 materials together? And so that's where, you know, a lot of the enrichment kind of came into places. Why do I have to label this, title it, classify what code language it is, classify, you know, who it's related to and all that stuff? That should all be done for me because I wanna move quick. Right? And then, you know, the the other component now is, like, you know, I'm also generating a lot of code, and I'm going back and forth. And so I should be able to conversationally interact with the things I have in pieces to both generate new materials, but also find, discover, and iterate on existing ones. And can we use the context of your workflow, the things that pieces is aware about, to ground those models and give you better responses? And then the final component of pieces is really that developers, you know, it's funny, they always realize they should have saved something. At the moment, they realize they didn't save it. And that's right about the time they've gone to look something up again, and they're like, oh, man. It is lost. I know, like, this framework upgrade, I did it 6 months ago or 8 months ago. There was one small nuance that's just gone. Right? And you're, like, back at it square 1, Google searches, GitHub issues, team member questions, all that stuff. And so we want pieces to solve that problem where you don't have to think to save important things. Right? Can pieces pick up on what's important, triangulate the overlap between the browser, the IDE, and that collaborative space, and capture that stuff for you? And the goal is not to capture everything, but also to to decay things that are less relevant and to promote things that are relevant.
And that's kind of like the proactive phase that we're in right now where, you know, a lot of our users will start to see, you know, this kind of idea of pieces doing a lot more proactive work and also starting to serve that back to you in what we're calling the the Copilot feed, which will roll out this quarter. So I know I threw a lot at you right there, but that's the the essence is just that work in progress journey. Can you have a little sidekick that's just capturing the small details?
[00:06:09] Tobias Macey:
There has been a lot of development and progress and churn in the overall space of personalized AI tools over the past few months to a year, particularly since the launch of ChatGPT and some of its different subsequent iterations. And I'm wondering what are some of the areas of focus of Pieces and the capabilities that you're building into that that make it stand out from the alternatives and would encourage somebody to invest in integrating that into their workflow. And particularly for developers who are very particular about their workflows, they don't want to invest the time and muscle memory in learning a new thing that's going to become part of their core inner loop if there's the risk that it's going to go away in a few weeks or a few months.
[00:07:01] Tsavo Knott:
Yeah. I I I think that's a really great point. And and, you know, so two things there. 1st and foremost, you you brought up this idea of developers having to invest time. Right? And I think that's the the problem with many tools today is that there's a learning curve. Right? You have to really try to understand how to use it, how to integrate it, things like that. And so, you know, our approach with pieces, and this is really rapidly where we've been building towards, is can it be a tool between tools where you actually don't have to think to use it? Right? The system is just aware. And then when you you need something, either it's giving it to you or you can go and find it. Right? And it's already captured. And so, you know, that that learning curve where it's like, oh, I have to build a new habit, I think that AI enables us to kind of remove that from the product adoption process. Right? But then the second thing is this, you know, what makes pieces unique is is really like a couple of things. 1st and foremost, respecting the privacy of a developer or really any individual's workflow is first and foremost. Right? This also goes along with the speed of the AI tool itself.
So we invested a long time ago in kind of local on device models. Right? All the models that tag and enrich and organize and embed, all of that runs completely on device. Like, I have a MacBook Air and, you know, it runs an M2 chip, and we saw this. Right? We saw this with the NPUs. We saw this with the Tensor chips. We saw this with the M series. We knew that client side devices should be able to run ML continuously, quickly, and respect the privacy that you would expect, you know, to date. Right? And so that also leads us to to be able to integrate small bits of ML that are that are task oriented and purpose driven throughout the workflow. And I think, like, you know, you have these large language models like ChatGPT and Gemini and stuff like that, and, of course, you can bring those into Pieces.
But you also need, like, smaller models that just do the simple things. Right? Like, how about titling files or tagging stuff. Right? Or just, you know, giving you search results that are that are interesting. We've kind of heavily invested into both. Now when it comes to these large language models, you know, I think the key is context. Right? And the thing about pieces is that we sit across the browser, the IDE, and that collaborative environment to give your large language model the awareness of the entire workflow. Right now, for example, if you wanna use OpenAI, you gotta go to a browser tab. You gotta say, here's a bunch of context. Here's what's important to me, and then use that to then prompt the question you're looking for. You have to provide that context. Pieces, for example, can take an enterprise OpenAI license, take it out of that single browser tab, and put it everywhere that that developer is. So I think, like, to get the most out of these large language models, they do they do need context. They do need relevant, kind of things to look at before you prompt it for a response. So the integration kind of, like, efforts that we put forth are really about grounding large language models in the most amount of horizontal context as opposed to going very deep within, one domain.
And then, yeah, I mean, I think the the final thing is conversational copilots, I think that's a little laborious, to be completely honest. Like, we we built 1, you know, primarily because we think that it's gonna be a paradigm shift and it and it's a must have. Right? Staple stakes. You need to be able to talk to your your your materials. But also too, like, no one wants to have a conversation back and forth, right, to be completely It's, like, probably not right the first time, the second time, or even the third time. And then when it is right, you're like, okay. I definitely wanna, like, save this output somewhere. Right? So, of course, we integrated the generative and the the curative processes, like generate, curate, and iterate, but we are jumping right over that. And we're saying, like, it should really just be a stream of saved, generated, captured materials elevated to you at all times. Right? And I shouldn't have to do these searches. I shouldn't have to do these generative cycles.
I mean, you look at systems like YouTube and TikTok and Instagram and, like, all these algorithms are optimized to give you things that are interesting. Why don't we apply that to workflow materials? Right? And that's that's exactly what we're doing. You know, we're saying, like, this should just be, you know, aware of what's important to save, but then also aware of what I need and when I need it from your workflow context. And I think that's really the route we're going. And so, you know, while we're gonna see a lot more efforts in these copilots and a lot more competition from the big players and things like that, we're gonna continue to play Switzerland, continue to integrate, continue, continue to contextualize these systems, but also to just change the experience at large. So, yeah, I think we'll be around for for a good bit more time. Let's say that.
[00:11:30] Tobias Macey:
Another interesting aspect of this problem is, to your point, you're not just leading on large language models. You're also thinking about what are some of these curative models or smaller models that we can run on device and in low power or embedded contexts. And I'm wondering what your process looks like for determining which models to integrate, which models to expose, and some of the ways to think about tuning and contextualizing of those models to ensure that you're giving useful results to the end user without forcing them to go through their own iterative cycle of saying, okay. Well, I want this to be useful. Now I need to spend a bunch of time on making it that way.
[00:12:14] Tsavo Knott:
Yeah. I would say, you know, we can get a little technical here, and and I'm sure you've got some some technical listeners. But, you know, what I would say is there's a couple of right tools for the right job here. Right? And what I would say is first is your embedding space. So early days, we we started out with this kind of, like, pre shipped spherical embedding space that represented tags and titles and suggested searches and things like that. And that wasn't very helpful because when a user added a tag manually, we wanted that user added tag to now be integrated dynamically into that embedding space. So the next time that user added tag can actually be, you know, suggested and and automatically added to a related material.
Same for websites and links and things like that. So just, like, picking the right embedding space is super important. We want the one that expands over time as opposed to staying static. And the second thing is there are models for classifying, like, what type of language something is. Right? I think we support, like, 50 or so language classifications. That was that was, like, a pretty primitive type of model. I think it was based on the T5 from Salesforce that came out a while ago. And then we just fine tune that, and we kind of dynamically inject these embedding layers, these LoRA layers, if you will, at runtime. And we can actually say the core of the T5 is pretty good at generating a title or classifying a language or generating tags. But to make it just a little bit better, we can say, here's that extra layer of embeddings to do that task specifically.
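To make the runtime adapter idea concrete, here is a minimal PyTorch sketch of a frozen shared layer with a low-rank adapter that can be swapped in per task. The layer size, rank, and task names are illustrative assumptions, not the actual Pieces models.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a low-rank adapter that can be
    injected at runtime for a specific task (titling, tagging, ...)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # shared core stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# One shared backbone, tiny task-specific adapters layered on top.
backbone = nn.Linear(512, 512)
tagging_head = LoRALinear(backbone)   # would load tagging adapter weights here
titling_head = LoRALinear(backbone)   # would load titling adapter weights here
x = torch.randn(1, 512)
print(tagging_head(x).shape, titling_head(x).shape)
```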
And then when it comes to large language models, I think the biggest challenge right now is context window and then just, like, you know, in-memory usage. Right? And so if you use, for example, Llama 2 or Mistral with Pieces, it's gonna use about 4 to 5 gigs of RAM. Your tokens per second, they're pretty decent, and, you know, I will say it's it's nice because it's offline, so it's always gonna be constant. But the thing is, how do you fit into Llama 2, you know, the proper amount of tokens for context and the proper amount of tokens for prompting versus if you have a large language model like, you know, the 128k GPT-3.5 Turbo. Well, of course, you can just yeet all types of stuff up there. But anything over the wire, you also have to consider your payload size. Right? If I'm uploading 5 files every single time I'm asking a question, maybe that's not very efficient either.
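As a rough sketch of that context-window tradeoff, the function below greedily packs retrieved workflow context around a prompt until a fixed window is exhausted. The 4,096-token window and the reply reserve are made-up illustrative numbers, not Pieces' actual settings.

```python
def budget_tokens(prompt_tokens: int,
                  context_chunks: list[list[int]],
                  window: int = 4096,
                  reply_reserve: int = 512) -> list[list[int]]:
    """Greedily pack context chunks into whatever room is left after the
    prompt and a reserve for the model's reply; the rest is dropped."""
    remaining = window - reply_reserve - prompt_tokens
    packed = []
    for chunk in context_chunks:
        if len(chunk) > remaining:
            break
        packed.append(chunk)
        remaining -= len(chunk)
    return packed

# Example: a 200-token prompt leaves room for the first few context chunks.
chunks = [[0] * 1500, [0] * 1200, [0] * 2000]
print(len(budget_tokens(200, chunks)))  # -> 2
```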
So all the considerations, you know, just just kinda coming down to it, it's really about thinking, you know, what what are what are the lowest common denominator themes that we have to be aware of, and those are your constraints and building up from there given the the relaxation of those constraints. But but also too, like, do we need to bring a sledgehammer to a job that we can just use a, you know, a normal hammer. Right? We don't need a large language model to classify a title for something, right, or a tag for something. And I think, like, that's where we've decided to distill down, you know, large models into smaller models for certain tasks to to use classical kind of algorithmic approaches and graphical approaches and and not just throw large language models at everything. That would be the the summation of that.
[00:15:18] Tobias Macey:
The other challenge when you're dealing with machine learning in any context is that it's the whole principle of garbage in, garbage out, where if you're not feeding it useful data or you're not feeding the data to the models in a format that they can understand, then you're not gonna get any useful results from it. And given the fact that you are dealing with an uncontrolled environment where it's somebody else's laptop or desktop, you have no idea what software they're going to be running on it. You have no idea exactly how data is gonna be populated into the Pieces context. What are some of the approaches that you're taking to manage some of that data preparation, data cleaning, data serving to the models to ensure that you're providing some useful outputs to the end user?
[00:16:00] Tsavo Knott:
Yeah. And and also too, like, talking about model drift. I think a lot of people, like, you know, they don't really talk about model drift sometimes, but, like, models act up in the wild all the time. Right? And I think for the most part, our users will, you know, tell us if something's, like, way off. Right? Like, we had a model that just somewhere, somehow, just decided to start putting out Spanish. Right? I think this is, like, llama 2, and, like, overnight, this user's, like, kind of Copilot just decided it wanted to, you know, start outputting in Spanish. We're like, what is going on there? And and I think, like, at that point, you know, we're still trying to understand what causes model drift. You know, there's a little bit of, like, black box nuances to all of large language models, even OpenAI. They're like, why is, you know, chat gpt 3 getting worse over time or something? Like, it's a lot of things to understand. But what I would say is the things we can control, we do control. Right? And so, you know, we kinda use a lot of double check systems where we have a model that will kind of adversarially evaluate the output of a different model and saying, like, is this good? Is this bad? And I think the double check systems are pretty good. We're working on a model right now where, it can determine if the the Copilot conversation has gotten too far off topic or if the output is just completely unrelated.
And it's kind of like a binary, like, is this good or is this bad, you know, and and that actually you can actually start to see that pretty clearly. You know, for us humans, it's trivial to say, okay, we're off topic or this is a bad output, but training kind of that double check small model across all types of outputs, you know, both conversational ones as well as just like labels and tags and titles. That's really where we're putting in a lot of, like, quality assurance work, if you will. It's not the best. It's not the most interesting, not the most fun work to do, but at scale, like, you know, we're at, like, 10 k daily active users. I can't even imagine a 100 k, like, the amount of random support tickets coming in. But, you know, at scale, we're gonna have to do a lot of work around that.
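The double-check idea can be sketched in a few lines. The cosine-similarity gate below is a stand-in for the small binary on-topic/off-topic model described here, and `generate` and `embed` are placeholder callables rather than real Pieces APIs.

```python
import numpy as np

def on_topic(question_vec: np.ndarray, answer_vec: np.ndarray,
             threshold: float = 0.35) -> bool:
    """Accept the answer only if its embedding stays close to the question."""
    sim = float(np.dot(question_vec, answer_vec) /
                (np.linalg.norm(question_vec) * np.linalg.norm(answer_vec) + 1e-9))
    return sim >= threshold

def checked_generate(generate, embed, question: str, max_retries: int = 2) -> str:
    """Run the primary model, then let the cheaper check gate the output,
    regenerating a couple of times before falling through with the last attempt."""
    q_vec = embed(question)
    answer = generate(question)
    for _ in range(max_retries):
        if on_topic(q_vec, embed(answer)):
            break
        answer = generate(question)   # off topic: try again
    return answer
```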
[00:17:58] Tobias Macey:
And so digging now into the design and development of the Pieces product. I'm wondering if you can talk to some of the ways that you think about how to approach the architecture, the integration paths, the types of integrations and inputs that you need to be able to support, and just some of the ways that the overall design and goals of the project have changed from when you first started thinking about it?
[00:18:22] Tsavo Knott:
Yeah. I would say, you know, we we specifically, when it comes to architecture, you know, we talk about on device first edge ML. You know, that was that was always, like, you know, table stakes for us. We knew we'd be going into large companies. We knew we'd be dealing with code that's sensitive or has a lot of IP associated with it or things like that. So we had to say, I should be able to do this entire demo, Wi Fi off. Right? And that was kind of first and foremost. So then you're saying, okay. Without the Internet, not not networking, but without the Internet itself, how do our integrations communicate? Right? And so a lot of users have always been curious about why we ship the Pieces OS service in addition to, like, the desktop app or some of the others, and that's because Pieces OS basically hosts a local server on your computer, and that's, you know, again, air gapped where all the integrations talk to it. Pieces OS will host all the machine learning. It'll host your Llama instance. It'll host whatever you need. It's also the kind of on device database where these other systems can be powered by it. Now the funny thing there is, of course, we're building a whole bunch of integrations, but we have a large open source initiative going on where we're opening up our SDKs. Right? We have over 200 endpoints where if you wanna build a Copilot agent on a Mistral large language model, you can absolutely do that with our endpoints. And so a lot of interesting things are coming out from the community about just integrations like Emacs or Vim or, you know, some of the others where we might not be experts in, you know, those languages.
I don't I can't even remember what what Vim is written in. I think it's, like, Erlang, but not Erlang. Do you know by any chance?
[00:19:57] Tobias Macey:
So Vim, the core application, I think, is largely written in C, and then the language that you use to configure it is its own particular dialect called Vimscript. And now there's the nvim project, or Neovim, that also incorporates Lua capabilities. So it's a, that's what it was, a complicated hodgepodge of things.
[00:20:18] Tsavo Knott:
Yeah. So like I said, like, I'm not an expert in that. You know, I'm I'm pretty good at Dart and and Flutter and TypeScript and things, but, you know, our community is is, you know, much better at those things. So I think by being an on device, you know, kind of server type of architecture, it's very extensible. It's very easy to, like, triple validate, like, hey, this is running offline and and, oh, by the way, you know, we can we can really scale the amount of things that Pieces can communicate with. Right? So it can communicate with our Chrome extension or VS Code extension, JetBrains, you name it, and truly be kinda local first. That said, now going from that architecture into, like, teams and enterprise architectures, that's really interesting for us. You're dealing with retrieval augmented generation across an entire team, across an entire, you know, body of of documentation or code bases or things like that.
You're dealing with a lot more kind of peer to peer real time stuff. And so those are the things we're looking at in 2024 is how do we compound the the productivity benefits, the suggestive capabilities that pieces provides when it's running on your computer, but to the rest of the team. How does that, you know, kind of look scaled up?
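As a rough illustration of that local-server architecture, here is what a client talking to an on-device, Pieces OS-style service might look like. The base URL and the /assets route are placeholders, not the real routes; the actual endpoints live in the generated SDKs mentioned above.

```python
import requests

PIECES_OS = "http://localhost:1000"   # placeholder base URL for the local service

def save_snippet(code: str, language: str) -> dict:
    """Fire-and-forget a snippet to the local service; nothing leaves the device."""
    resp = requests.post(
        f"{PIECES_OS}/assets",          # hypothetical endpoint
        json={"value": code, "classification": language},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(save_snippet("SELECT * FROM users;", "sql"))
```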
[00:21:25] Tobias Macey:
And in terms of the model deployment, model updates, what are the mechanisms that you have in place to manage some of that selection process, the capabilities of being able to split execution across either local context, online context, hybrid, some some of the ways that you're thinking about that overall model serving architecture given the fact that your target use case is this distributed sea of arbitrary devices?
[00:21:59] Tsavo Knott:
Yeah. It's it's crazy. I mean, like, we are doing a lot of work. I think, Brian Lambert on our team, like, he he PRs to ONNX a couple of times. ONNX Runtime is basically a pretty nice, like, uniform format to run certain model types, large language models, as well as, you know, LSTMs and and transformer models and things like that. So ONNX Runtime is pretty ex pretty excellent. I think it actually is now shipping with Windows 11 natively, so it's a big Windows project. But that said, like, how you get a model to run in a GPU or CPU, aka hardware accelerator regard, making sure it's not taking up too much RAM, making sure you're releasing the model from from memory. We do a lot of kind of Dart-to-C FFI. And I would say even too, like, keeping a T5 in memory or keeping an LSTM in memory, you know, we get a lot of lot of complaints that, like, Pieces is using, you know, a good amount of memory. It's like, it it will, you know, and and hopefully, hopefully, the devices get better and the models get smaller and more efficient. But that's at the end of the day, you're putting a small brain onto your computer. Right? A brain that can arbitrarily process data.
But, yeah, for the for the model, you know, kind of pipeline itself, we train primarily, you know, in house on on a stack of MacBook M1 Maxes that we melt. I will say being in Ohio in the winter, we have no shortage of heating. That happens to be from our ML team. And, you know, I think, like, you know, even even getting the models, the small models are hashed, but the big models, like, you have to go in and download. Right? You have to say, hey, I wanna use llama 2. You go choose the model, it downloads. We don't ship any of those in the in the binary or the initial packaging because, again, it's like each user, you know, they they have different requirements. They have different systems, you know, and and they wanna opt into some things that are massive and some things that are small. So I would say we're still feeling out the bring your own model stuff. But for the small things, like, you know, those are all hash. They install and update and upgrade, you know, on their own. The large ones, that's still pretty curious.
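A minimal sketch of that load, run, release cycle with ONNX Runtime is shown below. The model path, input shape, and execution provider are placeholder assumptions, not the models Pieces actually ships.

```python
import numpy as np
import onnxruntime as ort

def run_once(model_path: str, features: np.ndarray):
    """Load a small task model, run it once, and release it from memory."""
    session = ort.InferenceSession(
        model_path,
        providers=["CPUExecutionProvider"],   # swap in CoreML/CUDA where available
    )
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: features})
    del session   # drop the weights so the model doesn't sit in RAM between tasks
    return outputs

if __name__ == "__main__":
    # "tagger.onnx" and the (1, 128) float input are placeholders.
    print(run_once("tagger.onnx", np.zeros((1, 128), dtype=np.float32)))
```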
[00:24:03] Tobias Macey:
In terms of the actual workflow integration, you mentioned some of the different specific applications that people might want to integrate into their overall experience with pieces. But I'm wondering what are some of the types of workflows that you're looking to drive with this new capability and some of the ways that some of your early users are thinking about how to integrate this assistant into their day to day work?
[00:24:29] Tsavo Knott:
Yeah. I would say as Pieces stands today, the largest kinda workflow, at least for me, is when I'm looking at open source packages. You know, I'm, like, on crates.io. I'm on, you know, pub.dev. I'm on NPM. And I'm always, you know, I'm always looking at example code. Right? And I'm always jumping over to GitHub issues. I'm looking at installation scripts. I'm looking at configs. And I'm really just trying to figure out, like, okay. This is an interesting package. I may use this. I may not, but I just wanna save this somewhere. And so oftentimes, you know, I'll be looking at, like, oh, like an image picker or, like, you know, a notification system, and I'll see, like, oh, that's some good boilerplate.
Select, save to Pieces, and I know Pieces captures, like, the website, you know, the other context, associated related links, all that stuff. And that was kind of what I was talking about, this concept of, like, fire and forget. There are a lot of things as a developer in that research and problem solving motion that are interesting, but not guaranteed to end up in production. Similarly, like, when I'm, you know, kind of in a large code base or I'm trying to find something, like a way I did a promise, in tons of code. And also too, I don't necessarily wanna generate this, you know, from scratch. I gotta type out a sentence and describe it, you know, go back and forth. So I just wanna select, save to Pieces, and it'll capture who's related to it, you know, what project it came from, what file it was, all of that. And so that's just like, how can I do the copy paste, the select save, you know, all of that faster, easier, and with less mental overhead?
Similarly too, like, you know, the generative process, I think, you know, our largest challenge is I always have to set up whether I'm using, you know, ChatGPT or Bard. I always have to set up the context for it. GitHub Copilot is pretty good, and, you know, I would say they're doing a lot of work on context as well. But even then, it doesn't know, like, who or when or what I was doing in the browser or who I was talking to in G Chat. It's not very horizontal. And that's been a big problem for us is, you know, saying, who should I talk to about this? Right? Or what was I doing yesterday? And that's a lot of the stuff that the Pieces Copilot does offer. And so, you know, we're we're an early technology partner with GitHub. In the future, you'll sign in with GitHub. We'll use some of the GitHub large language models and GitHub Copilots behind the scenes, but also, too, we're providing context that maybe GitHub doesn't have.
And I would say, like, you know, for us, these are the ways that, you know, you you kinda use it in a in a very nuanced and subtle regard across the workflow. Right? Researching, generating code, talking to team members, and these are ways that, you know, you maybe don't think are actually, like, value additive to capture, but it's actually 90% of your workflow. You know, you maybe write code 10%, but the other times, like, talking to people about what you're writing, talking to customers about how it went, even researching, you know, how to do something or how to upgrade. That's that's really what we're targeting.
[00:27:20] Tobias Macey:
Because of the fact that you are a small team, you're early in the product life cycle, what is your process for determining the integration targets and the specific applications that you want to invest the time and energy into getting working and working well due to the fact that there is this wide open set of options that you could integrate with and how to think about reducing the surface area of the problem space, figuring out what are the interfaces that if we expose or that we implement are going to give us the largest possible set of capabilities, thinking in particular in terms of things like the Language Server Protocol for text editors and IDEs?
[00:28:08] Tsavo Knott:
Yeah. I would say, you know, this is this is a good one. It's just like stack ranking. You know, in the beginning, we just we wanted to you know, we're we're a team of, like, 18 or so, you know, very technical team members. And in the beginning, it was like, what are the tools we're using, and can we dogfood our own product? Right? Like, JetBrains, VS Code, you know, like, you know, G Chat, you know, Teams, stuff like that. Like, what are the tools we're using, and can we build integrations and experiences around those? And now it's really starting to scale to, okay, those are pretty interesting, but what can we do to get Pieces, like, in a CLI? Right? Like, wherever I am, whatever terminal I'm using, like, I would love to be able to, you know, do a pieces save or a pieces list or something like that. Just getting it in as many locations as possible. Now the problem with this is everyone wants a different integration. You know how many dev tools there are out there, and, you know, it's hard to do all that. And it also costs a lot of dollars and introduces a lot of technical debt, right, things you have to manage and maintain and and so on. So there's 2 things that we're working on regarding that. I think that, first and foremost, let's open up these wonderful APIs that are literally just running in Pieces OS on your computer, you know, localhost, whatever, and to our users. Right? Like, people are are crafty, and they can build, you know, all types of stuff. We've seen that real quick out of the gate.
And that's, like, the Python SDK, the, you know, TypeScript and the Kotlin ones and the Dart ones. Like, all of our code, all of our SDKs are generated, so that was a no brainer for us. Right? Generate them, open source them, put some content out there, and let's see what happens. The second one is a larger kind of effort, which is primarily around why do we need to build integrations. Right? How about we start to do a good bit more vision processing? And I say this, like, in in in the most, like, non creepy way possible. Right? But, like, you know, how do humans interact with their computer? Right? It's the same way that, like, Tesla built their, you know, autonomous systems. Like, how do humans drive while they look around? Right? They make decisions like that. So we started out on this vision processing journey almost a year ago now, and we are about to debut our first kind of really awesome vision processing model, which will enable you to take a YouTube video. Maybe it's a tutorial of how to code something or something like that. And we've gotten our kind of, like, code OCR model down to the point where it can run frame by frame, 30 milliseconds, 50 milliseconds, repair the code if it's broken, pull it out, stitch it together, and this will let you basically say, hey, Copilot, watch this tutorial before we have a conversation with it, and it'll pull all the code out. It'll have all that. So from there, we're saying, okay. We can understand what the URLs are. We can understand what's code, what's not code, what's related, what's not related, and we can do this in a way that's cheap enough to run completely on device, right, to run-in the background in in less than, you know, a a large usage of RAM, let's say, less than 200 megs. And I think that's what it's gonna take to have adoption of systems that see your world like you do. They need to be private. They need to be secure. They need to be performant.
And and I think, like, for us, you know, what you're looking at, we wouldn't necessarily need to build an Emacs or a Vim plug-in because we can understand, okay, this is what's on the screen. Take, like, you know, a clip or something and process that, and now suggest that to you in your feed. Or now, what are you looking at in the browser? What was in the IDE? Are they related? If so, coordinate those, shadow save them, uprank them, and downrank them over time, and eventually, they'll surface to your feed. Do this in a quick closed loop regard. That's something we're experimenting with, and you're you're gonna start to see a couple of companies that are doing similar things like that. But I think that, you know, for AI to to really be ubiquitous across the the workflow, across the operating system, it needs that level of of insight. Pun intended.
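The frame-by-frame pipeline described here can be sketched roughly as follows: sample frames from a tutorial video, run a code-OCR model on each, and stitch the non-duplicate results together. The `ocr_code_frame` callable is a hypothetical stand-in for Pieces' actual on-device OCR model, and the sampling rate is an assumption.

```python
import cv2

def extract_code(video_path: str, ocr_code_frame, every_n_frames: int = 30) -> str:
    """Sample roughly one frame per second (at 30 fps), OCR each sampled
    frame with the supplied model, and stitch the distinct snippets."""
    cap = cv2.VideoCapture(video_path)
    snippets, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            text = ocr_code_frame(frame)        # hypothetical code-OCR call
            if text and (not snippets or text != snippets[-1]):
                snippets.append(text)           # skip consecutive duplicates
        idx += 1
    cap.release()
    return "\n".join(snippets)
```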
[00:31:58] Tobias Macey:
Another aspect of this project and the way that you think about targeting developers and engineers is thinking about what are the what what is the level of control that the end user has, and how does that fit with your overall vision for the product and the sustainability? How do you think about what are the means that you are looking to for making this, monetizable so that you can actually justify the time and energy that it requires to build this whole system, and what are some of the ways that you can encourage engineers to use the platform without having fear that they are going to either end up getting priced out of it eventually or become the product eventually?
[00:32:48] Tsavo Knott:
That's right. So first and foremost, if if people haven't gotten it by now, we are offline first everything. Right? And I think that's that's super important. So, I'll say selfishly that's that saves us a lot of server side compute costs. Right? Let's just say that, you know, we don't have these massive models processing everyone's, you know, stuff all the time. So, of course, it's efficient from that regard. But the second thing is, like, you should be able to opt out of everything. Right? I I think, like, you know, if you don't want it, you don't have to have it. Right? And I actually think this this next update has a lot of controls even around, like, tags and links and things like that of how much you wanna generate, if you want it, if you don't, you know, that type of thing. And so I think, like, opt in nature, on device nature, you know, that's that's gonna be, you know, table stakes for us. That that must be the case. But the second thing is, you know, everyone I think our goal as a venture backed company is to build something that's a true step function in experience. Right? And and systems that start to feel like an extension of yourself. Right? That start to just be aware of what's interesting to you at that moment in time and also to what you need from the past at a certain moment in time, like, what you need when you need it. Those are systems we are, you know, kind of embarked on to build, if you will. And that's that's our job as a venture backed company. If it's not a 10 x step function, then, you know, what is Right? Like, there there are other ways to to build. So we have to be extremely ambitious in that. When it comes to, like, pricing of this stuff, I I think that, you know, you look at, like, GitHub Copilot or, like, Tab 9 or some of the others. I think that, you know, around that 8 to $12 a month type of thing, maybe some add ons, maybe some extras, that's probably, you know, gonna be the pricing range. Right? We can make that efficient, you know, by doing some of these kinda on device things and not having it all centralized and processing everything. There's ways to get smart about that. But what we're really building is, like, you know, to use the Ironman analogy, it's Jarvis for your workflow. Right? It it it sees you know, it's it's aware of of what you're working on, and it's able to give you those things as, like, a heads up display, you know, real time feed. And that's kind of autocomplete beyond the line of code. That's autocomplete at the what tab is I looking at? Who should I talk to? What was I doing yesterday?
That's where we're we're really building towards. And I'll say this, we have the capabilities to do this not only for developers, but also for designers and animators and audio engineers and things like that, because pieces is an extension of the operating system. You know, it it it very deeply integrates with macOS, Linux, and Windows, and it understands not only what you're interacting with, but also that that object type, that data type in memory on the OS, and we can capture that as well. So I think that people need tools like this. I think they should be affordable. I think they should always be on, you know, and and always be helpful. I don't think you should have to use them. I think it's it's nice to be able to tap into them when when you need them, but it's nice to have that insurance policy, if you will.
From there, you know, we'll we'll see. But
[00:35:44] Tobias Macey:
I think the experience with technology, the experience with your operating system will begin to feel a lot more of an extension of your thought process, and that's what we're we're trying to do. As you have been building this product, as you have been getting it in front of people and early adopters, what are some of the most interesting or innovative or unexpected ways that you've seen pieces used?
[00:36:06] Tsavo Knott:
Yeah. It's funny. You know, we have the, like, the super organizers, which are, like, you know, they they want every single level of control, you know, over how things are organized in pieces. And, you know, our philosophy, I'd say candidly has been, you know, does the Internet, you know when you do a Google search, does it give you a list of directories back, you know, and, like, you know, topics and stuff like that? No. It's it's pretty flat. Right? So our our kind of dream has always been, like, just dump everything in there, and let's give you a flat list of results, you know, in your feed or in your search, you know, results or things like that. So you have the super organizers, but then you also have the people that just, like, literally save everything. Right? They're like, anything they look at. I'm not even kidding you. They just right click, save to pieces. We have users with over, like, 6000, 7000 snippets and and links and tags and stuff. We're like, how do we even make this thing performant for them again? Right? Because they're just, like, saving everything.
So kind of, like, threading the needle between the needs of the savers. But then also too, like, we did a bunch of work around the copilot at the end of last year, and we saw, like, a lot of people came in, and they used the Copilot to, like, generate a ton of stuff. Right? But then very quickly, we saw them convert to savers. So that kind of proved out the thesis of, like, oh, like, our value prop number 1, simply being able to save something for later, find it, reference, and use it, deeply correlates and and and, you know, complements the value prop number 2, which is to be able to generate and explain and and, you know, interact in a conversational regard with pieces.
So I think that was, more surprising than expected is they're informed value props. Right? The generate, curate, iterate, they're deeply related. And then I think, you know, the value prop number 3, we will see how users like it, but we're really, really just putting in so much work. It's, like, two and a half years in the making of the the Copilot feed, right, the workflow feed. And that is just identifying, 1st and foremost, what's important, what's it related to, and when we should serve it back to you. And those are, like, the big, big things that we're trying to do. I'll be interested to see what users think of that because it is not a feature they have to use. It's a feature, like, they can consume. Right? It's like, hey, here's this link. Open it. You know, that's a consumption event as opposed to here's pieces, type out what you want, generate some stuff, and then, you know, see if it's good. So, like, the amount of input that a user has to put in to get value out of the tool is, is about to decrease, dramatically.
So we'll see what that looks like, right, and how that informs behaviors. But users use it all types of ways, senior engineers, junior engineers, you know, intermediate, the full stack, researchers, you know, full blown, like, architects. You know, we we see all types of customers. Yeah.
[00:38:51] Tobias Macey:
And as you have been building this project, exploring the ways that people are using it, thinking through how do we apply machine learning to this problem of day to day workflow for developers? What are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:39:09] Tsavo Knott:
Oh, man. I would say getting machine learning systems to be integrated, fast, performant, and accurate, and consistent. You know? I I I think, like, those are the largest challenges, you know, to what we were talking about earlier, like model drift or just unexpected outputs. You know, I I know a lot of the industry calls them hallucinations, but, like, the statistical and probabilistic, you know, outputs from these models is just so hard to wrangle, if you will, especially in the wild. And that's gonna be even more challenging when we we make them multilingual. But, you know, I also think consumer education around that stuff. Understanding, like, hey, this is helpful, and it can be secure and private, and, you know, maybe this is something you should consider, you know, using. There's a lot of work being done around consumer education regarding AI and how it should impact your efficiency.
So those have been challenges. And then, you know, because the tool we're building is, like, in obvious at first, you know, in in the idea of, like, why you would use it, that is also challenging. Right? We're we're trying to figure out how to communicate this system to people where it's, like, just save small things or let pieces save things for you because I promise you, you're losing 90% of the things that you'll need later. And you might not realize that as a developer, but, it always happens. You know, you're always like, man, what was that thing from, like, 2 weeks ago? Right? And that adds up over time. It's like death by a 1000 cuts to your efficiency.
[00:40:39] Tobias Macey:
Yeah. And for people who are interested in having some sort of assistant for their development process, what are the cases where pieces is the wrong choice?
[00:40:55] Tsavo Knott:
Yeah. I think, you know, if you're looking for code generation or autocomplete, that is not us. You know, we are not, you know, we're not trying to be the best in the world at generating that next line of code. Right? What we would like to be the best in the world at is telling you who to talk to about this code or telling you what documentation is related or maybe even what you were doing a couple of weeks ago that's related to this file. Right? All the other components of your workflow context. I think we wanna complement systems like GitHub Copilot and Tabnine and things like that, but we wanna be very horizontal as opposed to very vertical.
So I'd say, you know, it's not the right tool to to generate, like, in real time the line of code you're writing. That said, it is really excellent at generating solutions, you know, where you need a promise resolution with a callback to cancel it, right, or something like that. Or you're trying to explain some code or you're trying to, you know, understand, again, like, naturally, who is this related to or or things like that. So, you know, I would say when it's not the right tool is when you have a very, very specific problem to solve. When it is the right tool is when you just need to go through your workflow as you naturally will and save small bits of things that, you know, you know you might need later, but you're not guaranteed to need. It's in that exploratory phase. Right? That's that's kind of what I would say is, like, when you're when you're going wide across the workflow, you're in the browser, you got all the tabs, you're jumping between your repositories, you're referencing existing code, you're talking to people, that's when you should use pieces. It's excellent for connecting the dots and capturing those connections. But when you're focused on a singular dot, pieces is not the right tool.
[00:42:37] Tobias Macey:
And as you continue to build and iterate on the pieces product, What are some of the things you have planned for the near to medium term or any particular projects or problem areas you're excited to dig into?
[00:42:49] Tsavo Knott:
As I said, you know, Jarvis for developers. Right? That that auto complete at at at a level beyond the code. I think that, you know, the the funny thing is that developer workflows are about to become absolutely insane. If they weren't already, they're about to become crazy. And what I mean by that is, you know, you're gonna be writing 2 times, 3 times the volume of code. You're gonna PR way more code. You're gonna ship way more features. You're gonna work on a lot higher, you know, cross functional team environments. Right? They're gonna they're gonna be, like, way more intricate, with who you're working on, and that's because Copilot's let you be a generalist now as opposed to an expert. I can write c plus plus, Python, Rust, and, you know, whatever else, Kotlin, because I can, you know, really go and get the fine details worked out by Copilot.
But what does that mean? That means I'm also looking at all the docs for all those things. I'm also looking at all the repositories for all those projects. I'm also talking to all the people that worked on those. So your your volume of material that you're interacting with is about to, you know, probably tenfold, right, if you will. And that's why pieces is on the receiving side trying to capture, coordinate, and then serve back to you what's actually critical. And that's why I think, like, this idea of it's always on, it's it's proactive, it's understanding what what's relevant from a, you know, in moment perspective and serve that back to you when you need it, that's really what I'm excited about. I feel this as a cofounder.
I feel this as a technologist. You know, I'm always researching the stuff that my team members are sending to me, my, you know, head of engineering is sending to me, my, you know, plug-in team members are sending to me. And it's really hard to, like, capture and connect all those dots, but this is about to be the case for every developer out there.
[00:44:35] Tobias Macey:
Are there any other aspects of the pieces, project, the overall space of machine learning and AI capabilities as a developer enhancement or workflow enhancement or the overall space of the ML evolution in the context that we didn't discuss yet that you'd like to cover before we close out the show?
[00:44:57] Tsavo Knott:
I would say, you know, we're we're looking at this evolution of agents, which is pretty pretty cool in the market. You know, a lot of people are talking about autonomous agents, agents that are, you know, able to to coordinate with each other and do certain tasks. I think that in the same kind of, you know, thread of thought, always on large language models that are efficient and aware and and capable, that is something that is really interesting and exciting for us that we're trying to push. Like, how do we get a 1-billion-parameter model? Not like a 200-billion one, right, but like a 1-billion-parameter model to be always on, very aware, and run in less than, you know, 500 gigs of, or rather 500 megs of, RAM. Right? Like, that is something that, you know, we didn't talk too too much about, but a lot of the large language models today are prompted. And then, you know, you get a response, and they probably shut down, go to zero.
In our world, we're we're super interested in just always on systems.
[00:45:53] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today.
[00:46:09] Tsavo Knott:
I think its ability to be integrated everywhere and and and understand the largest level of context. Right? The level that, like, we operate at. I think, you know, they're trying to increase these context window sizes. They're trying to, you know, retrieve all the relevant documents, embed everything under the sun, you know, all to just give this stuff the proper, you know, information it needs. And I think the the largest challenge to machine learning today is if we are going to generate and really, like, you know, have these massive LLMs running, I don't think we're ready for the compute requirements or, you know, the the the scale that that's gonna be, you know, compounded. Like I said, if you're generating, you know, docs and images and code and everything like that, like, 2023 and prior, it was generated by humans, and it had, like, a a velocity of output. Now you you put all these large language models behind everything and, like, the data in the world is gonna, you know, almost tenfold, if not 100 fold. It goes almost non polynomial.
And so now you're like, okay. We're building systems that can understand data that was already existing by humans. How do we build systems that's going to understand data that was generated by large language models and keep up with the pace? I think that'll be the challenge to truly globally adopting AI systems in every capacity of our our digital world. It's just gonna be so much data and very expensive to, you know, throw large language models at every single one of these problems. So that that's that's kind of my challenge. It's a compute and data problem.
[00:47:39] Tobias Macey:
Well, thank you very much for taking the time today to join me and share the work that you're doing on the Pieces project. It's definitely a very interesting application of ML and AI, and I'm definitely very interested to see how it continues to evolve, particularly once you get Emacs support figured out. Absolutely. Thank you. Thank you for the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Thanks, Tobias. Thanks for having me. Thank you for listening. Don't forget to check out our other shows. The Data Engineering podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Introduction and Guest Introduction
Early Days in Machine Learning
Overview of the Pieces Project
Differentiating Pieces from Other AI Tools
Model Integration and Contextualization
Design and Development of Pieces
Model Deployment and Updates
Workflow Integration and Use Cases
User Control and Monetization
Unexpected Uses and Lessons Learned
When Pieces is the Wrong Choice
Future Plans and Exciting Projects
Challenges in AI and ML Adoption
Closing Remarks