LangChain’s Harrison Chase on Building the Orchestration Layer for AI Agents
Last year, AutoGPT and Baby AGI captured our imaginations. Agents quickly became the buzzword of the day…and then things went quiet. AutoGPT and Baby AGI may have marked a peak in the hype cycle, but this year has seen a wave of agentic breakouts on the product side, from Klarna’s customer support AI to Cognition’s Devin. Harrison Chase of LangChain is focused on enabling the orchestration layer for agents. In this conversation, he explains what’s changed that’s allowing agents to improve performance and find traction. Harrison shares what he’s optimistic about, where he sees promise for agents vs. what he thinks will be trained into models themselves, and discusses novel kinds of UX that he imagines might transform how we experience agents in the future.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned:
- ReAct: Synergizing Reasoning and Acting in Language Models, the first cognitive architecture for agents
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent from researchers at Princeton
- Devin, autonomous software engineering from Cognition
- V0: Generative UI agent from Vercel
- GPT Researcher, a research agent
- Language Model Cascades: 2022 paper by Google Brain and now OpenAI researcher David Dohan that was influential for Harrison in developing LangChain

Transcript: https://www.sequoiacap.com/podcast/training-data-harrison-chase/

00:00 Introduction
01:21 What are agents?
05:00 What is LangChain’s role in the agent ecosystem?
11:13 What is a cognitive architecture?
- Published Jun 18, 2024
- Uploaded Jun 11, 2026
- File type
- POD
Full transcript
AI-generated transcript with timestamped sections.
[00:00] It's so early on that, like, [00:02] it's so early on. There's so much to be built. [00:05] Yeah, like, you know, GPT-5 is going to come out and it'll probably make some of the things you did [00:10] not relevant, but you're going to learn so much along the way. And this is, I strongly, strongly believe, like a transformative technology. And so the more that you learn about it, the better. [00:21] *music* [00:38] Hi and welcome to Training Data. [00:40] We have with us today Harrison Chase, founder and CEO of LangChain. [00:44] Harrison is a legend in the agent ecosystem, [00:46] as the product visionary who first connected LLMs with tools and actions, [00:50] and LangChain is the most popular agent building framework in the AI space today. [00:54] We're excited to ask Harrison about the current state of agents, [00:57] the future potential and the path ahead. Harrison, thank you so much for joining us and welcome to the show. [01:04] Of course. Thank you for having me. So maybe just to set the stage, agents are the topic that everybody wants to learn more about. [01:12] And you've been at the epicenter of agent building pretty much since the LLM wave [01:17] first got going. And so maybe first, just to set the table: [01:21] What exactly are agents? [01:24] I think defining agents is actually a little bit tricky and people probably have different [01:29] definitions of them.
[01:30] Which I think is pretty fair because it's still pretty early on in the... [01:34] lifecycle of everything LLMs and agent related. [01:37] The way that I think about agents... [01:40] is that it's when an LLM is kind of like deciding the control flow of an application. [01:46] So what I mean by that is if you have a more traditional kind of like rag chain or retrieval augmented generation chain, [01:52] the steps are generally known ahead of time. First, you're going to maybe generate a search query, then you're going to retrieve some documents, then you're going to generate an answer, and you're going to return that to a user. It's a very fixed sequence of events. [02:03] Um... [02:05] And I think when I think about things that start to get agentic, it's when you put an LLM at the center of it and let it decide what exactly it's going to do. So maybe sometimes it will look up a search query. Other times it might not. It might just respond directly to the user. [02:18] Maybe it will look up a search query, get the results, look up another search query, [02:22] look up two more search queries and then respond. And so you kind of have the LLM deciding the control flow. [02:27] I think there are some other maybe... [02:30] more buzzwordy things that fit into this. So like tool usage is often [02:35] associated with agents. [02:37] And I think that makes sense because when you have an LLM deciding what to do, [02:41] The main way that it decides what to do is through tool usage. [02:44] So I think those kind of go hand in hand. [02:48] There's some aspect of memory that's commonly associated with agents, and I think that also makes sense, because when you have an LLM deciding what to do, it needs to remember what it's done before. [02:58] um [02:59] And so like tool usage and memory are kind of like loosely associated, but to me,
[03:04] when I think of an agent, it's really having an LLM decide the control flow of your application. [03:10] And Harrison, a lot of what I just heard from you is around decision making. [03:14] And I've always thought about agents as sort of action taking. [03:18] Do those two things go hand in hand? Is agentic behavior more about one versus the other? How do you think about that? [03:26] I think they go hand in hand. I think like, [03:27] a lot of what we see agents doing is deciding what actions to take, [03:31] for all intents and purposes. [03:34] And I think the big... [03:36] Uh... [03:37] difficulty with action taking [03:40] is... [03:41] deciding what the right actions to take are. [03:44] So I do think that solving one kind of leads naturally to the other. And after you decide the action as well, there's generally the system around the LLM that then goes and executes that action and kind of like feeds it back into the agent. [03:57] So I do think they go kind of hand in hand. [04:02] So Harrison, it seems like the main distinction then, [04:04] between an agent and something like a chain is that the LLM itself is deciding what step to take next, [04:11] what action to take next, as opposed to these things being hard-coded. Is that like a fair way to distinguish chains [04:16] from agents? [04:18] Yeah, I think that's right. And there's different gradients as well. So as like an extreme example, [04:22] you could have basically a router that decides between which path to go down. And so there's maybe just like a classification step [04:29] in your chain. And so the LLM is still deciding like what to do, but it's a very simplistic way of deciding what to do.
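The distinction drawn above between a fixed chain and a router can be sketched in a few lines of Python. This is only an illustration, not LangChain code: `fake_llm`, `fake_search`, `rag_chain`, and `routed_app` are hypothetical names, and the "model" is a stub that returns canned text so the example runs without any API.

```python
from typing import List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.startswith("ROUTE:"):
        # Pretend the model classifies the question into one of two paths.
        return "search" if "latest" in prompt.lower() else "direct"
    return f"answer({prompt})"


def fake_search(query: str) -> List[str]:
    """Hypothetical retriever that returns one placeholder document."""
    return [f"doc about {query}"]


def rag_chain(question: str) -> str:
    """Fixed sequence: query -> retrieve -> answer. The code decides every step."""
    query = fake_llm(f"Write a search query for: {question}")
    docs = fake_search(query)
    return fake_llm(f"Answer {question} using {docs}")


def routed_app(question: str) -> str:
    """Router: one LLM classification step decides which path runs."""
    route = fake_llm(f"ROUTE: {question}")
    paths = {
        "search": rag_chain,
        "direct": lambda q: fake_llm(q),
    }
    return paths[route](question)
```

In `rag_chain` the order of steps is written into the program; in `routed_app` the stubbed model makes exactly one control-flow decision, which is the "very simplistic" end of the spectrum described in the conversation.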
[04:36] And, you know, at the other extreme, you've got these autonomous agent type things, and there's just this whole spectrum in between. So I'd say that's largely correct, although I'll just note that there's a bunch of nuance and gray area, as there is with most things in the LLM space these days. Got it. It's like a spectrum from... [04:52] control to fully autonomous decision making and logic. [04:56] Well, yeah, those are kind of on the spectrum of agents. [04:58] Interesting. [04:59] What role do you see LangChain playing in the agent ecosystem? [05:04] I think [05:05] Um... [05:06] right now we're really focused on [05:08] making it easy for people to create something in the middle of that spectrum. [05:12] And for a bunch of reasons, we've seen that that's kind of the best spot to be building agents in at the moment. [05:19] So we've seen [05:21] some of these more fully autonomous things get a lot of interest and prototypes out the door, and there's a lot of benefits to the fully autonomous things. They're actually... [05:31] quite simple to build. [05:33] But we see them going off the rails a lot. And we see people wanting more constrained things, [05:38] but a little bit more flexible and powerful than chains. And so a lot of what we're focused on recently [05:44] is being this orchestration layer that enables the creation of these agents, particularly these things in the middle between chains and autonomous agents. [05:53] And I can dive into a lot more about [05:55] what exactly we're doing there, but at a high level, being that piece of orchestration [06:01] framework is kind of where we imagine LangChain sitting. [06:06] Got it. So there's chains, there's autonomous agents, there's a
[06:09] kind of spectrum in between and your sweet spot is somewhere in the middle enabling people to build agents. [06:13] Yeah, and obviously that's changed over time. So it's fun to reflect on the evolution of LangChain. [06:19] So, you know, I think when LangChain first started, [06:22] it was actually a combination of [06:24] chains. And then we had this one class, this agent executor class, which was basically this autonomous agent thing. And we started adding in like a few more controls to that class. [06:34] And, but... [06:36] Eventually, we realized that people wanted way more flexibility and control than we were giving them with that one class. So like recently, we've been really heavily invested in LangGraph, [06:44] which is an extension of LangChain that's really aimed at like customizable agents that sit somewhere in the middle. [06:49] And so kind of like our focus, you know, has evolved over time as the space has as well. [06:55] Fascinating. [06:56] Maybe one more final kind of setting the stage question. [07:00] One of our core beliefs is that agents are the next big wave in AI, [07:06] and that we're moving as an industry from co-pilots to agents. [07:10] I'm curious if you agree with that take and why or why not. [07:14] Yeah, I generally agree with that take. I think... [07:18] The reason why that's so exciting to me [07:21] is that a co-pilot still relies on having this human in the loop. [07:24] And so there's a little bit of almost like an upper bound on the amount of work that you can have done by an external kind of like [07:32] uh... [07:33] by another system. [07:34] And so it's a little bit limiting in that sense. I do think there's some...
[07:39] really interesting thinking to be done around what is the right UX and human-agent interaction patterns. [07:46] But I do think they'll be more along the lines of an agent doing something and maybe checking in with you, as opposed to a co-pilot that's constantly [07:56] kind of like in the loop. I just think it's [07:58] I just think it's more powerful and gives you more leverage the more that they're doing. [08:03] Which is very paradoxical as well, because the more you let it do... [08:08] things by itself, [08:09] there's more risk that it's messing up or going off the rails. And so I think striking this right balance is going to be really, really interesting. [08:16] I remember back in, I think it was March-ish of 2023... [08:21] There were a few of these autonomous agents that [08:24] really captured everyone's imaginations, like Baby AGI, AutoGPT, a few of these. [08:30] And [08:32] I just remember Twitter was very, very excited about it. And it seems like... [08:37] that first iteration of an agent architecture hasn't quite met people's expectations. [08:42] Um... [08:43] Why do you think that is, and where do you think we are in the agent hype cycle now? [08:49] Yeah, I think... [08:52] Maybe thinking about the agent hype cycle first. [08:55] I think [08:56] AutoGPT was definitely the start, and then [09:02] I mean, it's one of the most popular GitHub projects ever, so one of the peaks of the hype cycle. [09:08] Um... [09:09] I think.
[09:10] And I'd say that [09:12] started in the spring of 2023 to summer of 2023-ish, then I personally feel like there was a bit of kind of like a lull slash down... [09:21] trend from the late summer to basically the start of the new year [09:26] in 2024. [09:29] And I think starting in 2024, we've started to see a few more realistic things come online. [09:36] I'd point out some of the work that we've done at LangChain with Elastic, for example. They have kind of like an Elastic assistant, an Elastic agent in production. [09:44] And so we're seeing that. We saw kind of like the Klarna customer support bot [09:49] kind of like come online and get a lot of hype. We've seen Devin, we've seen Sierra, these other companies start to emerge [09:56] Um... [09:57] in the agent space. [09:58] And so I think with that hype cycle in mind, [10:02] talking about why the AutoGPT style architecture didn't... [10:06] really work. [10:08] It was very general and very unconstrained. And I think that made it really exciting and captivated people's kind of like imaginations. [10:16] But I think practically for... [10:18] things that people wanted to automate to provide immediate business value, there's actually a lot... It's a much more specific thing that they want these agents to do. [10:27] And there's really like a lot more rules [10:29] that they want the agents to follow or specific ways they want them to do things. [10:33] And so I think in practice, what we're seeing with these agents is they're much more [10:37] kind of like custom cognitive architectures, is kind of like what we call them, where there's a certain way of doing things that you generally want an agent to do. And there's some flexibility in there for sure. Otherwise, you would just code it.
[10:49] But it's a very like directed way of thinking about things. And that's most of the agents and assistants that we see today. And that's just more engineering work. And that's just more kind of like, [11:00] trying things out and seeing kind of like what works and what doesn't work. And it's harder to do. So it just takes longer to build. And I think that's kind of why, you know, that's why that didn't exist a year ago or something like that. [11:13] Since you mentioned cognitive architectures, [11:15] I love the way that you think about them. Maybe can you just explain, like, what is a cognitive architecture... [11:21] Is there a good mental framework for how we should be thinking about them? [11:25] Yeah, so... [11:26] The way that I think about a cognitive architecture is basically what's the... [11:30] system architecture of your LLM application. [11:34] And so what I mean by that is if you're building an LLM application, there's some steps in there that use LLMs. [11:40] What are you using these LLMs to do? [11:42] Are you using them to just generate the final answer? Are you using them to route between two different things? [11:49] Do you have a pretty complex one with a lot of different branches and maybe some cycles repeating? [11:56] Or do you have kind of like, you know, a pretty... [12:00] a loop, where you basically run this LLM in a loop? These are all different variants of cognitive architectures. [12:07] And cognitive architecture is just a fancy way of saying like, [12:11] from the user input to the user output, what's the flow of data, of information, of LLM calls that happens along the way. [12:19] um
[12:20] And... [12:21] what we've seen, [12:22] more and more [12:23] especially as people are trying to get agents actually into production. [12:27] is that [12:28] the flow is specific to their application and their domain. [12:32] So there's maybe some specific checks they want to do right off the bat. [12:37] There's maybe three specific steps that it could take after that, and then each one maybe has an option to loop back or has two separate sub-steps. [12:44] um [12:45] And so we see these more like [12:47] If you think about it as a graph that you're drawing out, we see more and more basically custom and bespoke graphs as people kind of try to constrain and guide the agent along. [12:57] their application. [12:59] The reason I call it a cognitive architecture is just, you know, I think a lot of the power of LLMs is around reasoning and thinking about what to do. [13:06] And so... [13:08] you know, I would maybe have like a cognitive mental model for how to do a task. And I'm basically just encoding that mental model. [13:14] into some kind of like software system, some architecture that way. [13:19] Do you think that's the direction the world [13:22] Because I kind of heard two things from you there. One was it's very bespoke. [13:27] and second was it's [13:29] fairly brute force like it's fairly hard-coded in a lot of ways [13:33] Do you think that's [13:34] where we're headed? Or do you think that's a stopgap and at some point... [13:38] more elegant architectures or a series of default sort of reference architectures will emerge. [13:44] That is a really, really good question and one I spend a lot of time thinking about. I think so, like at an extreme.
[13:51] you could make an argument [13:53] that if the models get really, really good and reliable at planning, [13:59] then the best thing you could possibly have is just this for loop that calls the LLM, decides what to do, takes the action and loops again. And like all of these constraints on how I want the model to behave, I just put that in my prompt and the model follows that kind of like explicitly. [14:16] Um... [14:18] I [14:20] I do think the models will get better at planning [14:22] and reasoning for sure. [14:25] I don't quite think they'll get to the level [14:28] where that will be the [14:30] best way to do things, for a variety of reasons. One, I think [14:34] efficiency: if you know that you always want to do step A after step B, [14:39] you can just [14:40] put that in order. [14:42] And two, reliability as well. Like these are still non-deterministic things we're talking about, especially in enterprise settings. You probably want a little bit more comfort that if it's always supposed to do step A after step B, it's actually always going to do step A [14:56] after step B. [14:57] I think it will get easier to create these things. Like, I think they'll maybe start to become a little bit... [15:04] less [15:04] and less complex. [15:06] Um, but actually this is maybe a hot take or interesting take that I have. You could say like, so the, the architecture of just running it in a loop [15:14] um [15:15] you could think of as like a really simple [15:18] but general cognitive architecture. [15:21] And then what we see...
[15:23] in production is like, [15:24] custom [15:25] and complicated kind of like cognitive architectures. [15:29] I think there's a separate axis, which is like complicated, but generic, [15:34] so complicated, but generic, [15:37] cognitive architectures. And so this would be something like a really complicated, like planning step and reflection loop or like Tree of Thoughts or something like that. [15:45] And I actually think that quadrant will probably go away over time because I think a lot of that generic planning and generic reflection [15:52] will get [15:53] trained into the models themselves, [15:55] but there will still be a bunch of not generic training or not generic planning, not generic reflection, not generic control loops [16:02] that are never going to be in the models, basically, no matter what. [16:06] And so I think like, [16:07] those two ends of the spectrum I'm pretty bullish on. [16:11] I guess you can almost think about it as like the LLM... [16:14] does the kind of like general, the very general agentic reasoning, [16:19] but then you need domain-specific reasoning, and that's the sort of stuff that you can't really build into one general model. [16:27] 100%. I think a way of thinking about the custom cognitive architectures [16:32] is you're basically taking [16:33] you're taking the planning responsibility [16:36] away from the LLM and putting it onto the human. And some of that planning, you'll move more and more towards the model and more and more towards the prompt, [16:44] but I think there'll always be like, [16:47] I think a lot of a lot of [16:49] tasks are actually quite [16:50] complicated in some of their planning.
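One way to picture a custom cognitive architecture of the kind described above is as a small graph in which most transitions are hard-coded and the model decides only at one node. The following is a minimal sketch under stated assumptions: every name here (`stub_llm_choose`, `GRAPH`, `run_graph`, the refund and FAQ branches) is hypothetical, the model is a stub, and this is not how LangGraph or any specific product implements it.

```python
from typing import Callable, Dict

State = Dict[str, str]


def stub_llm_choose(text: str) -> str:
    """Hypothetical model call: picks one of the allowed branches."""
    return "refund" if "refund" in text.lower() else "faq"


def check_input(state: State) -> str:
    # Hard-coded check that always runs first.
    return "route" if state["input"].strip() else "reject"


def route(state: State) -> str:
    # The only node where the (stubbed) model decides the control flow.
    return stub_llm_choose(state["input"])


def refund(state: State) -> str:
    state["output"] = "refund flow started"
    return "END"


def faq(state: State) -> str:
    state["output"] = "faq answer"
    return "END"


def reject(state: State) -> str:
    state["output"] = "empty request"
    return "END"


# Each node is a function that updates the state and names the next node.
GRAPH: Dict[str, Callable[[State], str]] = {
    "check_input": check_input,
    "route": route,
    "refund": refund,
    "faq": faq,
    "reject": reject,
}


def run_graph(user_input: str, max_steps: int = 10) -> State:
    """Walk the graph from the fixed entry node until a node returns END."""
    state: State = {"input": user_input}
    node = "check_input"
    for _ in range(max_steps):
        if node == "END":
            break
        node = GRAPH[node](state)
    return state
```

Because `check_input` always runs before `route`, the ordering guarantee lives in the code rather than in a prompt, which is the efficiency and reliability argument made in the conversation.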
[16:53] And so I think it will be a while before we get [16:55] things that are [16:56] just able to do that. [16:58] super, super reliably off the shelf. [17:00] It seems like we've simultaneously made a ton of progress on agents in the last six months or so. [17:06] I was reading a paper, the Princeton SWE paper, where their coding agents can now solve 12.5% of GitHub issues versus [17:15] I think 3.8% when it was just RAG. [17:18] So it feels like we've, you know... [17:21] We've made a ton of progress in the last six months but, [17:23] 12.5% is like not good enough. [17:25] to replace even an intern, right? And so [17:29] It feels like we still have a ton of room to go. [17:32] I'm curious where you think we are both for general agents and also for your customers that are building agents like. [17:38] are they kind of getting to, I assume not five nines reliability, but are they getting to kind of like, [17:43] the thresholds they need to kind of deploy these agents out to actual [17:47] from the customer facing deployments. [17:50] Yeah, so the SWE agent is, I would say, a relatively general-ish agent in that it [17:56] is expected to work across a bunch of different GitHub repos. I think if you look at something at like v0 by Vercel, [18:02] that's probably much more reliable than 12.5. [18:06] percent, right? And so I think that speaks to like, yeah, there are definitely custom agents that not five nines of reliability, but that like are being used in production. So like Elastic, I think we've talked publicly about how they've done. [18:22] I think multiple agents at this point. And I think this week is RSA. And I think they're announcing something new at RSA.
[18:28] that's an agent. [18:30] And yeah, those are, I don't have the exact numbers on reliability, but they're reliable enough to be shipped into production. [18:37] um [18:38] General agents are still tough. Yeah, this is where kind of like... [18:42] longer context windows, better planning, better reasoning will help those general agents. [18:48] You shared with me this great Jeff Bezos quote, which is like, focus on what makes your beer better, and I think it's referring to the fact that [18:55] at the turn of the 20th century, breweries were trying to make their own electricity, generate their own electricity. [19:02] Um, [19:03] I think it's a similar question a lot of companies are thinking through today. Do you think that... [19:08] having control over your cognitive architecture [19:11] really makes your beer taste better, so to speak, metaphorically, or... [19:15] Like, [19:16] or do you cede control to the model and just build kind of UI and product? [19:21] I think it maybe depends on the type of cognitive architecture that you're building, going back to some of the discussions earlier. [19:27] If you're building like a generic cognitive architecture... [19:30] I don't think that makes your beer taste better. I think the model providers will work on this general planning. I think they'll work on these general cognitive architectures that you can try off the bat. [19:41] On the other hand, if your cognitive architectures [19:44] are basically you [19:45] codifying a lot of the [19:48] way that your support team thinks about something or internal business processes or the best way [19:53] you know, to kind of like develop code or develop this particular type of code
[19:58] or this particular type of application. [20:00] Yeah, I think that absolutely makes your beer taste better, especially if we're going towards a place where these applications are doing work. [20:08] Then like, [20:09] the logic, the bespoke kind of like business logic or mental models for, I'm anthropomorphizing these LLMs a lot right now, but like the models for these things to, [20:19] to do the best work possible. [20:22] 100%. Like, I think that's the key thing that you're selling in some capacity. I think UX, [20:27] and UI and distribution and everything absolutely still plays a part. But like, yeah, I draw this distinction between general. [20:33] versus custom. [20:36] Harrison, before we get into some of the details on how people are building these things, can we pop up a level real quick? So our founder, Don Valentine, was famous for asking the question, so what? [20:47] And so my question to you is, so what? Let's imagine that autonomous agents are working flawlessly [20:53] What does that mean for the world? Like, how is life different if and when that occurs? [20:59] I think at a high level it means that [21:01] as humans were focusing... [21:03] on just a different set of things. So I think there's a lot of like [21:08] rote, repeated kind of like work that goes on in a lot of industries at the moment. [21:14] And so I think the idea of agents is a lot of that will be kind of like automated away. [21:19] leaving us to think maybe higher level about like what these agents should be doing and maybe leveraging their outputs to do more creative or building upon those outputs to do more kind of like higher leverage things.
[21:33] basically. And so [21:35] I think, you know, you could imagine [21:38] bootstrapping an entire company where you're outsourcing a lot of the functions that you would normally have to hire for. [21:45] And so... [21:46] You could... [21:47] play the role of a CEO with an agent for marketing, [21:52] an agent for sales, something like that, [21:54] and allow you to basically... [21:57] outsource a lot of this [21:59] work [22:00] to agents, [22:02] leaving you to do a lot of the interesting strategic thinking, product thinking. And maybe this depends a little bit on what your interests are. But I think at a high level, it will free us up to do what we want to do and what we're good at, [22:14] and automate a lot of the things that we might not necessarily want to do. [22:19] And are you seeing any interesting examples of this today, sort of live and in production? [22:25] I mean, I think the biggest... [22:27] There's two kind of like categories or areas of agents that are starting to get more traction. One's customer support, one's coding. [22:34] So I think customer support is a pretty [22:37] good example of this. Like I think [22:39] you know, [22:40] often [22:41] times... [22:42] people need customer support. We need customer support at LangChain. [22:46] And so if we could hire agents to do that, [22:49] that would be really powerful. [22:51] Um, coding is interesting because I think there's some aspects of coding [22:55] that, I mean, yeah, this is maybe a more philosophical debate, but I think there's some aspects of coding that are really creative and do require like really, I mean, lots of product thinking, lots of positioning and things like that. There's also aspects of coding
[23:10] that [23:12] limit some of the, or not limit, but get in the way of a lot of the creativity that people might have. [23:17] So if my mom has an idea for a website, [23:19] she doesn't know how to code that up, right? But if there was an agent that could do that, she could focus on the idea for the website and basically the scoping of the website, [23:28] but automate that. [23:29] And so I'd say customer support, absolutely. That's having an impact today. [23:33] Coding, there is a lot of interest there. [23:36] I don't think it's as mature as customer support, but in terms of areas where there are a lot of people doing interesting things, that would be a second one to call out. [23:45] Your comment on coding is interesting because I think this is one of the things that has us very optimistic about AI. It's this idea of sort of [23:52] closing the gap from idea to execution or closing the gap from, you know, dream to reality, [23:57] where you can come up with a very creative, compelling idea, [24:00] but you may not have the tools at your disposal to be able to put it into reality. And AI seems like it's well-suited for that. I think Dylan at Figma talks about this a lot too. [24:10] Yeah, I think it goes back to this idea of automating away the things that [24:15] get in the way of... I like the phrasing of idea to reality. It automates away kind of like the [24:21] the things that [24:23] you don't necessarily... [24:24] know how to do or want to think about but are needed to create [24:28] whatever you want to create. [24:30] I think it also, one of the things that I spend a lot of time thinking about is like, what does it mean to be [24:34] a builder in the age of kind of like generative AI and in the age of agents. [24:40] So what it means to be a builder of software today
[24:44] means you either have to be an engineer or hire engineers or something like that, right? [24:49] But I think [24:52] what it means to be a builder in the age of agents and generative AI [24:56] just allows people [24:57] to [24:58] build [24:59] a [25:00] way larger set of things than they could... [25:04] build today. [25:05] Because they have at their fingertips all this other knowledge and all this other [25:10] kind of like [25:11] all these other builders they can hire and use for... [25:15] very, very... [25:16] cheap. I mean, I think like, you know, some of the [25:19] language around like commoditization of kind of like [25:23] intelligence or something like that, as these LLMs are providing intelligence for free, [25:27] I think does speak to enabling a lot of these [25:31] new builders to emerge. [25:34] You mentioned reflection and chain of thought and other techniques. Like, [25:38] maybe can you just... [25:39] say a word on like what we've learned so far about what some of these, [25:43] I guess, cognitive architectures are capable of doing [25:48] for agentic performance, and maybe just, [25:50] I'm curious what you think are the most promising [25:53] cognitive architectures. [25:56] Yeah, I think there's... [26:00] Maybe it's worth talking a little bit about why kind of like the AutoGPT things didn't work. [26:06] Because I think a lot of the cognitive architectures kind of like [26:09] emerged to [26:11] counteract some of that. [26:12] Um... [26:14] I guess way back when there was basically the problem that LLMs
[26:18] couldn't even reason well enough about a first step to do and like what they should do as the first step. [26:24] And so I think prompting techniques [26:26] like chain of thought [26:28] turned out to be really helpful there. They basically gave the LLM more space to think about and think step by step about like what they should do for a specific kind of like single step. [26:39] Then that actually started to get trained into the models more and more and they kind of did that by default. [26:45] And that kind of like [26:46] is basically everyone wanted the models to do that anyways. And so, yeah, you should train that into the models. [26:51] Um [26:52] I think then, [26:53] there was a great paper by Shunyu Yao [26:55] called ReAct, which basically was the first [27:01] cognitive architecture for agents or something like that. [27:04] And the thing that it did there... [27:07] was one, it asked the LLM to predict what to do. That's the action. But then it added in this reasoning component. And so it's kind of similar to chain of thought [27:16] in that it basically added in this reasoning component, [27:18] he put it in a loop, he asked it to do this reasoning thing before each step, and you kind of run it there. [27:23] um [27:24] And so that was kind of like, [27:26] and actually that's that like explicit reasoning step [27:29] has actually... [27:31] become less and less necessary as the models have that trained into them. Like just like they have kind of like the chain of thought trained into them, that explicit reasoning step [27:39] has become less necessary. So if you see people doing kind of like ReAct-style agents today, [27:44] they're oftentimes just using function calling without kind of like the explicit like [27:48] thought process that was actually in the original ReAct paper.
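The loop described above, where the model either requests a tool or gives a final answer and each tool result is fed back in, can be sketched as follows. This is a hedged illustration only: `stub_model`, `TOOLS`, and `agent_loop` are hypothetical names, the model is a deterministic stub rather than a real function-calling API, and the explicit "thought" text from the original ReAct paper is deliberately left out to match the function-calling style mentioned in the conversation.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def stub_model(history: List[Message]) -> Dict[str, str]:
    """Hypothetical model: requests a tool once, then gives a final answer."""
    tool_results = [content for role, content in history if role == "tool"]
    if not tool_results:
        return {"type": "tool_call", "name": "add", "args": "2,3"}
    return {"type": "final", "content": f"The sum is {tool_results[-1]}"}


def add(args: str) -> str:
    """Toy tool: adds two comma-separated integers."""
    a, b = (int(x) for x in args.split(","))
    return str(a + b)


TOOLS: Dict[str, Callable[[str], str]] = {"add": add}


def agent_loop(question: str, max_steps: int = 5) -> str:
    """Run the model in a loop; the model decides when to call a tool or stop."""
    history: List[Message] = [("user", question)]
    for _ in range(max_steps):
        step = stub_model(history)
        if step["type"] == "final":
            return step["content"]
        # The surrounding system, not the model, executes the chosen action
        # and feeds the observation back into the next model call.
        result = TOOLS[step["name"]](step["args"])
        history.append(("tool", result))
    return "stopped: step limit reached"
```

The `max_steps` cap is an added safeguard in this sketch, not something stated in the conversation; it keeps a model that never produces a final answer from looping forever.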
[27:52] But it's still this loop that has kind of become synonymous with the ReAct paper. [27:56] Um [27:58] So that's a lot of the... [27:59] That's a lot of the... [28:01] difficulties initially with agents. [28:03] And I wouldn't entirely describe those as cognitive architectures. I'd describe those as prompting techniques. [28:07] But okay, so now we've got this working. Now what are some of the issues? [28:11] The two main issues are basically planning and then kind of like [28:14] realizing that you're done. And so by planning, I mean like [28:18] when I think about what to do, [28:20] subconsciously or consciously, I like put together a plan of the order that I'm going to do the steps in. And then I kind of like go and do each step. And basically models, uh, [28:30] struggle with that. They struggle with long-term planning. [28:33] They struggle with coming up with a good long-term plan. And then if you're running it in this loop, [28:39] at each step, you're kind of... [28:41] doing a part of the plan and maybe it finishes or maybe it doesn't finish. And so, [28:46] you know, if you just run it in this loop, you're implicitly asking the model to first come up with a plan, then kind of like track its progress on the plan and continue along that. [28:56] So I think some of the planning [28:58] cognitive architectures that we've seen have been: okay, [29:01] first, let's add an explicit step where we ask the LLM to generate a plan. [29:05] Um, [29:06] then, you know, let's go step by step in that plan and we'll make sure that we do each step. And that's just a way of like enforcing that the model [29:13] generates a long-term plan and like actually does each step before going on. And it doesn't just like, you know, generate a five-step plan, do the first step and then say, okay, I'm done. I finished or something like that.
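The explicit plan-then-execute structure described above can be sketched as follows. This is an illustrative outline under stated assumptions: `fake_planner` and `fake_executor` are hypothetical stand-ins for LLM calls, and `plan_and_execute` is not a function from any particular library.

```python
def fake_planner(task: str) -> list[str]:
    """Hypothetical stand-in for an LLM call that returns an ordered plan."""
    return [f"research {task}", f"draft {task}", f"review {task}"]


def fake_executor(step: str, previous_results: list[str]) -> str:
    """Hypothetical stand-in for an LLM or tool call that performs one step."""
    return f"done: {step} (after {len(previous_results)} earlier steps)"


def plan_and_execute(planner, executor, task: str) -> tuple[list[str], list[str]]:
    """Generate the whole plan first, then run every step in order.

    The surrounding code, not the model, enforces that each planned step is
    executed, so the run cannot stop early after the first step.
    """
    plan = planner(task)
    results: list[str] = []
    for step in plan:
        results.append(executor(step, results))
    return plan, results


plan, results = plan_and_execute(fake_planner, fake_executor, "a blog post")
```

The design choice is that progress tracking lives in ordinary control flow (the `for` loop) rather than being left implicit in the model's own reasoning.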
[29:23] Um... [29:24] And then, [29:24] I think... [29:25] a separate but kind of related thing is this idea of reflection, which is basically like, [29:31] has the model actually done its job well, right? So like I could generate a plan where I'm going to go get this answer. I could go get an answer from the internet. [29:40] Maybe it's just like completely the wrong answer or I got like bad search results or something like that. [29:44] I shouldn't just return that answer, right? I should kind of like think about whether I got the right answer or, [29:50] um, [29:51] whether I need to do something again. And again, like if you're just running it in a loop, you're kind of asking the model to do this implicitly. [29:58] So there have been some cognitive architectures that have emerged to overcome that, that basically add that in as an explicit step, [30:05] where they do an action or a series of actions and then ask the model to explicitly think about whether it's done it correctly or not. [30:12] Um... [30:13] And so planning and reasoning are probably like two of the more [30:16] popular generic kind of like cognitive architectures. There's a lot of like custom cognitive architectures, but that's all super tied to like business logic and things like that. [30:26] But planning and reasoning are generic ones. I'd expect these to become [30:30] more and more trained into the models by default. [30:34] Although I do think there's a very interesting question of how good will they ever get in the models. But that's probably a separate longer term conversation. [30:42] Harrison, one of the things that you talked about at AI Ascent was UX, [30:46] which we would normally think about as kind of being on the opposite end of the spectrum from architecture. You know, the architecture is behind the scenes. The UX is the thing out in front.
[30:55] Um... [30:55] But it seems like we're in this interesting world where the UX can actually influence the effectiveness of the architecture, [31:01] by allowing you, like, for example, with Devin, to rewind to the point in the planning process where things started to go off track. [31:07] Can you just say a couple words about UX and the importance of it in agents or LLMs more generally and maybe some interesting things that you've seen there? [31:16] Yeah, I'm super fascinated by... [31:19] UX... [31:21] And I think there's a lot of really interesting work to be done here. I think the reason it's so important... [31:28] is because these LLMs still aren't... [31:30] perfect and still aren't kind of like reliable and have a tendency to mess up. And I think that's why chat is such a powerful UX for some of the initial kind of like interactions and applications. You can easily see what it's doing. It streams back its response. You can easily correct it by responding to it. You can easily ask follow-up questions. [31:47] And so I think chat has clearly emerged as the dominant UX at the moment. [31:52] Um [31:53] I do think there are downsides to chat. [31:56] Um, you know, it's generally like one [31:59] AI message, one human message. The human is very much in the loop. It's very much a co-pilot-esque type of thing. [32:06] And I think the more and more [32:07] that you can remove the human out of the loop, [32:10] the more it can do for you and it can kind of like work for you. And I just think that's incredibly... [32:16] powerful and enabling. [32:17] Um... [32:19] However... [32:20] Again, going back, LLMs are not perfect and they mess up. So how do you kind of like balance these two things?
[32:26] I think some of the interesting ideas that we've seen, talking about Devin, [32:31] are this idea of basically having a [32:33] like really transparent list of everything the agent has done, right? Like you should be able to know what the agent has done. That seems like step one. Step two is probably like being able to [32:42] modify [32:44] what it's doing or what it has done. So if you see that it, [32:47] you know, messed up step three, you can maybe rewind there, [32:51] give it some new instructions or even just like edit its kind of like, uh, [32:55] decision manually [32:57] and go from there. [33:00] Um... [33:01] I think other like [33:02] interesting UX patterns besides this rewind and edit, [33:06] um, [33:07] one is like the idea of kind of like an... [33:11] inbox where the agent can reach out to the human as needed. So you've maybe got like, you know, 10 agents running in parallel in the background. And every now and again, it maybe needs to ask the human for clarification. [33:23] And so you've got like an email inbox where the agent is sending you like, help, help me. I'm at this point. I need help or something like that. And you kind of go and help it at that point. [33:32] A similar one is like reviewing its work, right? And so I think this is really powerful. [33:37] We've seen a lot of like, [33:38] um, [33:40] agents for writing different types of things, doing research, like research style agents. There's a great project, GPT Researcher, [33:47] which has some really interesting architectures around agents. [33:51] And I think that's a great place for this type of like review, right? Like you can have an agent write a first draft.
[33:57] And then I can review it and I can leave comments basically. [34:00] um [34:01] And there's a few different ways that can actually happen. So, you know, the most... [34:08] maybe like the least involved way is I just leave like [34:11] a bunch of comments in one go, send those all to the agent, and then it goes and fixes all of them. Another UX that's really, really interesting is this collaborative, at-the-same-time one. So like Google Docs, [34:22] but a human and an agent working at the same time. Like I leave a comment, [34:27] the agent fixes it while I'm making another comment or something like that. I think that's a separate UX [34:32] that is... [34:34] pretty complicated to think about setting up and getting working. And I, yeah, I think that's interesting. [34:41] Um, [34:42] There's one other kind of like UX [34:45] thing [34:46] that I think is interesting to think about, which is basically just like... [34:49] how do these agents [34:52] learn from these interactions, right? Like we're talking about a human kind of like correcting the agent a bunch or giving feedback. [34:59] It would be so frustrating if I had to give the same piece of feedback a hundred different times, right? That would suck. [35:05] What's the architecture of the system that enables it so that it can start to learn from that? [35:11] That, I think, is really interesting. And, you know, I think all of these are, [35:15] um... [35:17] all of these are still to be figured out. We're super early on in the game for figuring out a lot of these things, but this is a lot of what we spend a lot of time thinking about.
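The rewind-and-edit pattern mentioned in this answer (a transparent list of steps, plus the ability to go back to a bad step and replace it) can be illustrated with a small data structure. This is a sketch under stated assumptions: `Step` and `AgentRun` are hypothetical names, not the design of Devin or of any LangChain product.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str
    result: str


@dataclass
class AgentRun:
    """Transparent, editable log of everything the agent has done."""

    steps: list[Step] = field(default_factory=list)

    def record(self, action: str, result: str) -> None:
        self.steps.append(Step(action, result))

    def rewind(self, index: int) -> None:
        """Discard the step at `index` and everything after it."""
        self.steps = self.steps[:index]


run = AgentRun()
run.record("search docs", "found API reference")
run.record("write code", "draft saved")
run.record("run tests", "ran the wrong test suite")  # the human spots this mistake

run.rewind(2)  # go back to just before the third step (index 2)
run.record("run tests", "ran the unit tests, all passed")  # human-edited retry

actions = [step.action for step in run.steps]
final_result = run.steps[-1].result
```

A real system would also need to restore the agent's internal state at the rewind point, which is where a persistence layer comes in; this sketch only models the visible step list.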
[35:27] Hmm. [35:29] Well, actually, that reminds me. [35:32] I don't know if you know this or not, but you're sort of legendary for the degree to which you are present in the developer community and paying very close attention to what's happening in the developer community and the problems that people have in the developer community. [35:45] There are the problems that LangChain sort of directly addresses and you're building a business to solve. [35:51] And then I imagine you encounter a bunch of other problems that are just sort of out of scope. And so I'm curious, within the world of... [35:57] problems developers who are trying to build with LLMs or trying to build in AI are encountering today, [36:03] what are some of the interesting problems that you guys are not directly solving that [36:06] maybe you would solve if you had another business? [36:09] Yeah, I mean, I think two of the obvious areas are like [36:14] at the model layer and at kind of like the database layer. So like we're not building a vector database. I think it's really interesting to think about what the right storage is, but, you know, we're not doing that. [36:25] We're not building a foundation model and we're also not doing fine-tuning of models. Like we want to help with the data curation bit. Absolutely. But we're not kind of like building the infrastructure for fine-tuning. There's Fireworks and other companies like that. I think those are really interesting. [36:41] Um, [36:42] I think [36:43] those are probably at like... [36:46] the immediate... [36:47] infra layer in terms of what people are [36:50] running into at this moment. [36:54] I do think there's a second... [36:56] question there or a second thought process there, which is like,
[37:00] if agents do become [37:02] kind of like the future... [37:04] Like, what are... [37:06] What are... [37:08] other infra problems that are going to emerge because of that? [37:11] And so like, you know, I think it's way too early for us to say like [37:16] what [37:17] of these we will or won't do because, [37:20] to be quite frank, we're not at the place where agents are reliable enough to have this whole like economy of agents emerge. But I think like, [37:26] you know, identity verification for agents, [37:29] permissioning for agents, payments for agents. There's a really cool startup for payments for agents. Actually, this is the opposite. It was agents could pay humans to do things, right? [37:39] And so I think that's really interesting to think about. Like if agents do become prevalent, like what is the tooling and infra [37:45] that is going to be needed for that, [37:47] which I think is a little bit separate than like, what are the things that are needed in the developer community for building LLM applications, because I think LLM applications [37:55] are here. [37:56] Agents are starting to get here, but not fully here. And so I think it's just different levels of maturity for these types of companies. [38:04] Harrison, you mentioned fine-tuning and the fact that you guys aren't going to go there. [38:08] It seems like the two, kind of prompting and like cognitive architectures, and fine-tuning are almost substitutes for each other. [38:15] Uh, [38:16] How do you think about the... I mean... [38:19] the current state of like how people should be using prompting versus fine-tuning, and how do you think that plays out? [38:25] Yeah, I don't think [38:27] that fine-tuning and cognitive architectures are substitutes
[38:31] for each other. [38:32] Um, [38:33] And the reason I don't think they are, [38:35] and I actually think they're kind of complementary in a bunch of senses, [38:39] is that [38:40] when you have a more custom cognitive architecture, the scope of what you're asking each agent or each node or each piece of the system to do becomes [38:48] much more limited, and that actually becomes really, really interesting for fine-tuning. [38:53] Maybe actually on that point, can you talk a little bit about LangSmith and LangGraph? Like Pat had just asked you, what problems are you not solving? [39:00] I'm curious, what problems are you solving? And as it relates to kind of all the problems with agents that we were talking about earlier, like... [39:07] the things that you were doing to, I guess, make managing state [39:12] more manageable, to make, you know, the agents more kind of controllable, so to speak, like... [39:20] How do your products help people with that? [39:23] Yeah, so maybe even backing up a little bit and talking about LangChain when it first came out. [39:27] I think the... [39:28] the LangChain open source project [39:31] really solved and tackled a few problems there. I think one of the ones is basically standardizing the interfaces for all these different components. So we have tons of integrations with [39:42] different models, different vector stores, different tools, different databases, things like that. [39:47] And so that's always been a big value prop of LangChain and why people use LangChain. [39:53] Um, [39:54] in LangChain, [39:55] there also is a bunch of higher level interfaces for easily getting started off the shelf with like RAG or SQL Q&A or things like that.
[40:04] And there's also a lower level runtime for dynamically constructing chains. And by chains, I kind of mean... [40:11] we can call them DAGs as well, like directed flows. [40:16] And I think that distinction is important because when we talk about LangGraph and why LangGraph exists, it's to solve [40:23] a slightly different orchestration problem, which is you want these customizable and controllable things that have loops. [40:29] Both are still in the orchestration space, but I'd draw this distinction between a chain [40:35] and these cyclical loops. [40:37] I think with LangGraph, and when you start having cycles, [40:40] there's a lot of other [40:42] problems that come into play, one of the main ones being this persistence layer so that you can resume, so that you can kind of like [40:50] have [40:52] them running in the background in kind of like an async manner. And so we're starting to think more and more around deployment of these long-running, cyclical, human-in-the-loop type applications. And so we'll start to tackle that more and more. [41:06] And then the piece that kind of spans across all of this is LangSmith, [41:10] which we've been working on basically since the start of the company. [41:13] And that's kind of like observability [41:16] and testing for LLM applications. [41:19] And so basically from the start, we noticed that [41:22] you're putting an LLM at the center of your system. LLMs are non-deterministic. [41:27] You've got to have good observability and testing for these types of things in order to have confidence to put it in production. [41:33] Um, so we started building LangSmith. It works with and without LangChain.
[41:38] Um... [41:39] There's some other things in there like a prompt hub so that you can manage prompts, [41:44] a human annotation queue to allow for this human review, which I actually think is... [41:48] crucial. Like, I think in all this it's important to ask, like, [41:52] so what's actually new here? And I think like the main thing that's new here is these LLMs. And I think the main new thing about LLMs is they're non-deterministic. So observability matters a lot more. [42:02] And then also testing is a lot harder, and specifically you probably want a human to review things more often [42:08] than you'd want them to review like a software test [42:10] or something like that. [42:12] And so a lot of the tooling we're adding in LangSmith kind of helps with that. [42:17] Actually, on that, Harrison, do you have a heuristic for where... [42:20] existing observability, existing testing, [42:23] you know, existing fill-in-the-blank will also work for LLMs, versus where LLMs are sufficiently different that you need a new product or you need a new architecture, you need a new approach? [42:35] Yeah, I think I've thought about this a bunch on the... [42:40] testing side. [42:41] From the observability side, I feel like it's almost... [42:44] Like, [42:45] I feel like it's almost more obvious that there's something new that's needed here. And I think that's... Maybe that's just... [42:51] because with these multi-step [42:53] applications, like, [42:55] you just [42:57] need a level of observability to get these insights. And I think, like, Datadog is great at kind of like monitoring, [43:07] but for specific traces,
[43:09] Um... [43:10] I don't think you get the same level of insights that you can easily get with something like LangSmith, for example. [43:15] And I think a lot of people [43:16] spend time looking at specific traces because they're trying to debug things that went wrong on specific traces, because there's all this non-determinism that happens when you use an LLM. [43:25] And so observability [43:27] has always kind of [43:29] felt like, [43:30] um, [43:31] there's something new to kind of like be built there. Testing is really interesting. [43:36] Um, [43:37] And I've thought about this a bunch. I think there's two maybe like new unique things about testing. [43:42] Um... [43:44] One is basically this idea of like pairwise comparisons. [43:48] So when I run software tests, I don't generally like [43:51] compare the results. Like, it's either pass or fail for the most part. [43:56] And if I am comparing them, maybe I'm like comparing like [44:01] the latency spikes or something, but it's not like necessarily pairwise of two individual unit tests. [44:07] But if we look at like some of the evals for LLMs, [44:12] the main... [44:13] the main... [44:14] eval that's trusted by people is this LMSYS kind of like arena, Chatbot Arena style thing, where you literally judge two things side by side. And so I think this pairwise thing [44:23] is pretty important and pretty distinctive from kind of like traditional software testing. [44:30] I think another component is basically... [44:32] depending on how you set up evals, [44:34] you might not have kind of like 100% pass rate [44:38] at any given point in time.
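The pairwise style of evaluation described here, where two outputs are judged side by side rather than each being marked pass or fail, can be sketched as a win rate. The names `length_judge` and `pairwise_win_rate` are hypothetical, and the judge below is a deliberately trivial placeholder (it prefers the shorter answer); in practice the judge would be a human reviewer or an LLM, which is an assumption this sketch does not implement.

```python
from collections import Counter


def length_judge(answer_a: str, answer_b: str) -> str:
    """Placeholder judge: prefers the shorter answer, ties on equal length."""
    if len(answer_a) < len(answer_b):
        return "a"
    if len(answer_b) < len(answer_a):
        return "b"
    return "tie"


def pairwise_win_rate(outputs_a: list[str], outputs_b: list[str], judge) -> float:
    """Fraction of decided (non-tie) comparisons won by system A."""
    votes = Counter(judge(a, b) for a, b in zip(outputs_a, outputs_b))
    decided = votes["a"] + votes["b"]
    return votes["a"] / decided if decided else 0.5


system_a = ["Paris", "4", "blue sky", "yes"]
system_b = ["The capital is Paris", "4", "sky", "yes indeed"]

win_rate_a = pairwise_win_rate(system_a, system_b, length_judge)
```

Unlike a unit test suite, the result is a rate rather than a pass/fail verdict, so it is something to record per version and compare across runs.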
[44:40] And so it actually becomes important to track that over time and see that you're improving or at least not regressing. [44:46] And I think that's different than software testing because you generally have [44:50] everything kind of like passing. [44:53] um, [44:54] And then... [44:54] the third bit is just a human-in-the-loop component. [44:58] So I think [44:59] you still want humans to be looking at the results of... [45:05] like, [45:07] "want" may be the wrong word because there's a lot of [45:09] downsides to it. Like it takes a lot of human time to look at these things, but like those are generally more reliable [45:16] than having some automated system. [45:18] If you compare that to software testing, [45:21] software can test whether two equals two just as well as I can tell that two equals two by looking at it. [45:25] And so figuring out like [45:27] how to put the humans in the loop for this testing process is also really interesting and unique and new, I think. [45:35] I have a couple of very general questions for you. [45:38] Cool. I love general questions. [45:40] Thank you. [45:42] Um, [45:43] Who do you admire most in the world of AI? [45:48] Um... [45:49] That's a good question. I mean, I think what OpenAI has done over the past year and a half is incredibly impressive. So I think... [45:58] Sam, but also [46:00] everyone there. I think across the board, [46:05] I have a lot of admiration for the way they do things. I think [46:09] Logan, when he was there, did a fantastic job at kind of like bringing some of these concepts to folks.
[46:15] Sam obviously deserves a ton of credit for a lot of the things [46:19] that have happened there. [46:21] Lesser known, but like, [46:23] David Dohan is a researcher that I think is absolutely incredible. He did some early model cascades papers and I chatted with him super early on in LangChain and he's been like, [46:37] he's been incredibly just... [46:40] influential in the way that I think about things. And so I have a lot of admiration for the way that he does things. [46:44] Separately from that, [46:46] you know, like I'm touching all different possible answers for this, but I think like [46:51] Zuckerberg and Facebook. Like, I think they're crushing it with Llama and a lot of the open source. [46:58] And I also think like as a CEO and as a leader, the way that [47:03] he and the company have embraced that has been incredibly impressive to watch. So I have a lot of admiration for that. [47:08] Um... [47:09] Speaking of which, is there a CEO or a leader who you try to model yourself after or who you've learned a lot about your own leadership style from? [47:19] It's a good question. I think I definitely think of myself as more of kind of like a product-centric kind of like CEO. [47:28] And so I think like [47:31] Zuckerberg has been interesting to watch there. Brian Chesky, I saw him talk or I listened to him talk at the Sequoia Base Camp last year and really admired the way that he kind of like thought about product and thought about kind of like company building.
[47:45] Um... [47:46] And so... [47:49] Brian's usually my go-to answer for that. [47:51] But I can't say I've gotten incredibly into the depths of everything that he's done. [47:58] If you have one piece of advice for current or aspiring founders trying to build in AI, [48:04] what would your one piece of advice for them be? [48:07] Just build and just try building stuff. It's... [48:13] It's so early on that, like, it's so early on. There's so much to be built. [48:18] Yeah, like, you know, GPT-5 is going to come out and it will probably make some of the things you did [48:23] not relevant, but you're going to learn so much along the way. [48:27] And this is, [48:29] I strongly, strongly believe, like a transformative technology, and so the more that you learn about it the better. [48:35] One quick anecdote on that, just because I got a kick out of that answer. [48:39] I remember at our first AI Ascent in early 2023, [48:43] when we were just starting to get to know you better, [48:46] I remember you were sitting there pushing code the entire day. Like people were up on stage speaking and you were listening, but you were sitting there pushing code the entire day. [48:56] So when the advice is just build, you're clearly somebody who takes your own advice. [49:01] I think, well, that was the day OpenAI released like plugins or something. And so there was a lot of scrambling to be done. And I don't think I did that at this year's Sequoia Ascent. So I'm sorry to disappoint and regress in that capacity.
[49:16] Yeah. [49:17] Thank you for joining us. We really appreciate it.