Founder Eric Steinberger on Magic’s Counterintuitive Approach to Pursuing AGI

There’s a new archetype in Silicon Valley: the AI researcher turned founder. Instead of tinkering in a garage, they write papers that earn them the right to collaborate with cutting-edge labs until they break out and start their own. This is the story of wunderkind Eric Steinberger, the founder and CEO of Magic.dev. Eric came to programming through his obsession with AI and caught the attention of DeepMind researchers as a high school student. In 2022 he realized that AGI was closer than he had previously thought and started Magic to automate the software engineering necessary to get there. Among his counterintuitive ideas are the need to train proprietary large models, that value will not accrue in the application layer, and that the best agents will manage themselves. Eric also talks about Magic’s recent 100M token context window model and the HashHop eval they’re open sourcing.

Hosted by: Sonya Huang, Sequoia Capital

Mentioned in this episode:
David Silver: DeepMind researcher who led the AlphaGo team
Johannes Heinrich: a PhD student of Silver’s and DeepMind researcher who mentored Eric as a high schooler
Reinforcement Learning from Self-Play in Imperfect-Information Games: Johannes’s dissertation that inspired Eric
Noam Brown: DeepMind, Meta and now OpenAI reinforcement learning researcher who eventually collaborated with Eric and brought him to FAIR
ClimateScience: NGO that Eric co-founded in 2019 while a university student
Noam Shazeer: One of the original Transformer researchers at Google and founder of Character.AI

Published: Sep 10, 2024
Uploaded: Jun 11, 2026

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:31

[00:00] The thing that remains to be solved is general-domain, long-horizon reliability, and I think you need inference-time compute, test-time compute, for that. [00:09] When you try to prove [00:11] a new theorem [00:13] in math, or when you're writing a large software program, or when you're writing an essay of reasonable complexity, you usually wouldn't write it token by token. [00:23] You'd want to think quite hard about some of those tokens. And finding ways to spend not [00:30] 1x or 2x or 10x, but a million x the resources on that token in a productive way, I think, is really important. That is probably the last big problem. [00:58] Hi and welcome to Training Data. [01:02] I'm delighted to share today's episode with Eric Steinberger, founder and CEO of Magic. [01:07] Eric has an epic backstory as a researcher, having caught the attention of Noam Brown [01:12] and become one of his research collaborators while still a student in high school. [01:16] Eric is known for his exquisite research taste, as well as his big ambition to build an AI software engineer. [01:22] We're excited to ask Eric about what it takes to build a full-stack company in AI, [01:26] his ambitions for Magic, [01:28] and what separates a good AI researcher from a legendary one.

1:33-3:25

[01:33] Eric, welcome to the show. Thank you so much for joining us. [01:37] Thank you for having me, Sonya. [01:38] Okay, so let's start with who's Eric. You're a Vienna-born wunderkind whose early passion for math turned into, [01:45] I think, what you described as a full-fledged obsession with AI by age 14. Take us back to age 14, Eric. [01:51] What were you up to? How did you become so obsessed with AI? [01:54] Thank you, Sonya. I think I just had my midlife crisis when I was 14. And [02:01] I was just looking for something meaningful to do. I spent about a year [02:06] looking at physics, math, bio, medicine, just anything really that seemed valuable to the world. And, um, [02:12] at some point I bumped into just simply the idea of AI. It hadn't sort of occurred to me until then. And if you could just build [02:20] a system, a computer system that could do all this other stuff for me, like, great, I don't have to decide. So it felt like my decision paralysis was sort of resolved then. [02:30] It was this weird moment where I could just see the next 30 years of my life unfold in front of me. And I was like, okay, this is clearly what's going to happen. Like, I have to do this. Yeah. [02:37] It was quite nice. I like predictability, so it was great to know what the world will look like. [02:45] And you started out loving math. Like, why AI then? [02:48] I think I'm naturally attracted to math. It's just what my brain sort of gravitates to. [02:54] AI just seems useful. [02:56] The thing that's most important to me is just what is useful for humanity and the world. And math is nice, but not useful at some point. Like, you know, 17-dimensional spheres are probably not going to be the best career choice if you want to be useful. So it seemed like something that I could get good at, but also just the most important thing ever. And so it was a very clear choice. It was clear 10 years ago. It just wasn't close.

3:26-5:10

[03:26] and clear. [03:28] Can you tell the story of how you got to FAIR? I think it is such an epic story. Sure. I mean, so when I started at 14, I didn't really know how to program. I didn't get into programming out of curiosity about computers. I just wanted to solve AI, basically. So after a couple of years of just warming up on my own, I reached out to one of David Silver's PhD students, who is the AlphaGo [03:56] DeepMind co-founder, [03:57] and this PhD student, I guess at that point he was a graduate and sort of worked at DeepMind. I asked him if he could, like, spend a year with me, just every two weeks bashing my work, [04:08] trying to do some sort of super sped-up, mini kind of PhD-like experience where I could just learn how to do research. And I sent him this, like, giant email. You could print it out. I don't know how many pages it would be, but it'd be a lot of pages, [04:23] where I was basically just saying, like, I want to build this algorithm you made in your PhD. I want to beat this algorithm you made in your PhD. Here's a list of 10 ideas. [04:30] I don't know if they're going to work, and I think I need your help to figure that out. [04:34] And then over a year, we eventually got there. And he... his name is Johannes. Johannes was kind enough to just bash me every two weeks, roughly. And yeah, it was brutal, dude. Because he held me to the standard, you know, and I was like, don't be nice just because I'm in high school. [04:54] Yeah, I was in high school. And then when we were done, I had just graduated high school when I finished the project that I was trying to get to with this. And then Noam Brown, who is obviously one of the best RL researchers in the world,

5:10-6:44

[05:10] reached out, because he had worked on something similar, it turns out. [05:14] He had, like, some ideas that were very similar and some ideas that were a little different. And so we just both published this and, um, [05:21] he reached out and then I got to work with Noam Brown for two years, which was great. And then that continued. And so I got back for another year. You were a high schooler. He was Noam Brown. [05:29] Well, I mean, he published a paper called Deep Counterfactual Regret Minimization, and I published a paper called Single Deep Counterfactual Regret Minimization, and mine beat his by a little bit. So you one-upped Noam Brown as a high schooler? [05:40] I think I had just graduated. And also, it took him like three months to write this paper and it took me a couple of years. But, yeah, I mean, slightly. I'm sure he would have come up with this the next day. But, like, the sort of gap between the two things... But yeah. [05:57] It was just... Yeah, obsession is the right word. I do things, like, 100%. And... [06:04] Yeah, so that was a lot of fun. [06:08] I kept working in RL with Noam Brown for a while and, uh, yeah, so that's how I got to FAIR. Noam Brown worked at FAIR at the time and, uh, [06:16] he reached out. I was actually at university then. [06:19] And, um... [06:21] I basically just worked part-time as a researcher at FAIR while studying. Anyway, so that was it. That's awesome. It was a lot of fun. Noam is great. Like, the brainstorm ping-pong sessions with Noam Brown, dude, [06:35] there's, like, nothing like this. You're just like, there's this problem, and, you know, maybe you would start a six-month research project, but no, Noam and I would get on a call and it was like,

6:44-8:30

[06:44] we just discuss it and it's done. I love that. I love that. What makes him so great as a researcher? [06:50] I think it's a number of things. As a researcher generally, more from a meta level, he is fantastic at picking the right problems and then spending a long time just grinding to make it better and better and better and better. He's very good at the whole compounding thing [07:06] in research. Also making bets that aren't obviously the right bets when he makes them. He makes them earlier, I suppose, and makes them slightly differently. So he's generally very good at picking problems and then attacking them consistently. [07:20] He's also just very smart. I guess that helps. He works really hard. He used to do 100-hour weeks during his PhD. I don't know if he still does them, but he used to work really, really hard during his PhD. [07:32] I imagine he still does. Okay, so Noam arranged for you to become a researcher at FAIR while you were still a university student. [07:40] Yeah, that was fun. I was in my first semester, I think, or something. So you were juggling that. You were juggling being a collaborator to Noam at FAIR. [07:48] And then you became obsessed with yet another problem, climate change, and actually started an NGO that is incredibly popular, ClimateScience. [07:56] So you just didn't have enough on your plate. That was actually too crazy. That's when I dropped out. I was like, this is crazy. This is too much. I can do two things. I cannot do three, was my conclusion after, like, I did three months of that. [08:09] That was terrible. That was fucking awful, doing all three things. Because you just can't do well at three things. I mean, Elon can, but maybe I'll learn it in 10 years. But I couldn't at the time. So I dropped out at the time. But yeah, I started an NGO. I generally just think charity stuff is awesome and hugely

8:30-10:06

[08:30] underappreciated. It's sort of like super high status to start a startup, [08:36] but I think it should be equally cool to start a charity. You're helping the world in other ways. And so, yeah, I mean, we started it as a... it's a nonprofit, but we started it like a startup. People were working insanely hard. [08:48] We had clear objectives. It was a software product, effectively. [08:52] It was much more similar to a startup, with the exception that there was no money in, no money out. [08:57] Um, it's just very weird, but, yeah, it's mostly volunteer driven, or I guess is; I just no longer run it. But yeah, it was an interesting experience. You'd think I would, like, transfer a lot from running a quote-unquote company to running a quote-unquote company now, but they were so different that there was, like, no transfer at all between ClimateScience and Magic. Like, a thousand volunteers, 20 hardcore engineers; no money at all, [09:27] but, you know, giant. It's, like, completely different in every imaginable way. But yeah, it was a lot of fun. [09:35] So Eric, ClimateScience became an incredibly successful nonprofit. It wasn't just any nonprofit. What made you decide to kind of hand over the reins and hand over the torch on that and go start a company in AI? [09:49] I just thought AGI was further away when we started it. I would never have started anything else if I thought AGI was so close. And once I realized it is, there was just, like, no other. I mean, my initial thing was always AI. That's what I did as a kid.

10:06-11:37

[10:06] I care about various issues in the world, but... [10:11] none of them are my unique calling in any way. [10:15] You know, I'll hopefully be in a position to donate a bunch of money and whatever, but, uh, [10:19] the thing I care about fundamentally is AGI, and it was like, "Oh, damn it. This is not 20 years away." So I have been running around with this AGI to-do list, which is somewhat of a meme internally, because then, sort of, we just go through it and we're trying to fix all these problems. I think I've seen it. You have seen it. Yes, we showed it to you. [10:38] I've been running around with a version of this. It's actually, like, in 2017 or so, I was still in high school. [10:44] I don't know why, but some conference invited me to present my AGI to-do list. [10:49] It was wrong at the time. I was also sure it was wrong, but... [10:52] Um, [10:53] at some point, there was one thing I just couldn't at all figure out. And I don't like blue-sky research in the sense of just staring at a wall and trying to figure out what the right question is. [11:03] I really like to have the question and then look for the right answer [11:06] when starting an intense project. [11:09] Because you need to know which direction you run in to really plan for it. And many things seemed clear, but it seemed completely unclear how to make these models reason in the general domain. [11:19] And that became... [11:23] more clear with language models, especially code models. [11:28] When I saw some of the early results in this space, I was like, okay, I know all this stuff from the RL world. [11:35] I have a bunch of other thoughts.

11:38-13:15

[11:38] This seems great. Like, we should just take LLMs and make them do the RL stuff. It's a very simple kind of... [11:46] uh... [11:48] but I think that's sort of where... [11:50] I mean, it makes a lot of sense. 'Cause RL has been doing this for 10 years. It works [11:55] in restricted domains. If you can make something work in 20 restricted domains, [11:59] and you have something else that works in a general domain, if you can combine them, maybe you get both the X and the Y axis, and then [12:05] you have your beautiful top-right corner of the matrix. And [12:10] yeah, so it seemed pursuable. [12:12] And if something... [12:13] When something as important as AGI becomes an actually executable to-do list... obviously there are still things to figure out, like details of the algorithms, how do you make it efficient, etc., etc. It's not like we do everything at all. [12:25] Many, many things to figure out, but the direction was clear. [12:29] Yeah, so it seemed like the right moment. [12:32] Okay, we're going to circle back to your AGI to-do list later because I'm curious about it. Sure, yeah. I want to brag about you for a minute because... It might be wrong still. I don't know. Until we have AGI, it is a hypothetical AGI to-do list, but we're trying. I think the research field is tracking pretty closely to your to-do list. [12:49] I want to brag about you for a minute. I think you've been incredibly humble about your background, but... [12:55] as a high school student, you did catch Noam Brown's eye, and as, [13:00] you know, one of his colleagues at FAIR, you became one of his... [13:03] top collaborators, not even just one of many, because there are such talented people that work there, but you're one of his top collaborators. And, you know, when I speak to folks that know you, they just say extraordinary things about your capabilities as a researcher,

13:15-14:50

[13:15] your creativity, your work ethic. As far as I can tell, you work nonstop. I think you texted me at 2 a.m. in preparation for this podcast. So I think it's safe to say that... I hope I didn't wake you up. No, no. Thank you, silent mode. Anyways, I think it's safe to say that you are one of the brightest minds of the current research generation already and will certainly be one of the legends that people talk about for the next decade. And so with that in mind, I'd love to... [13:39] ask you some questions of advice for aspiring researchers. And so maybe first off, [13:44] you did it all from a very untraditional background. [13:48] How did you do it? And what advice would you give to others in your shoes? [13:53] I can only really speak for the sort of profile of goals and person I am. [13:58] I think I was lucky in the sense that I knew very, very early, at 14, as we said, exactly what I wanted to do with my life. I had no doubt at all. [14:06] Um... [14:08] And uncertainty can be paralyzing to a lot of people. I also had a very clear sense that I did not at all have a plan B. [14:16] Like, there was no other path in life that I would have been even remotely above the neutral line on. It had to be: build AGI. Everything else is completely irrelevant. So, you know, I understand. For many people, you know, a well-paying job at Google is a [14:32] great achievement. I mean, if it's on AGI, it's fine. But you get what I mean. Like, I just knew that there was nothing else I could do and be fulfilled in life. I look back when I'm [redacted address], like burning the boats very, very early gives you the opportunity to, uh,

14:50-16:25

[14:50] just be... [14:51] You get to do things that you'd otherwise do 10 years later, which, again, even... like, I sucked at the beginning. It took me two months to understand the first paper I tried to understand. [15:00] I was terrible at programming for a long time, but when you're a teenager... [15:06] you're, like, a decent researcher. You don't have to be great. That gets you things like a great mentor who then bashes you for a year, [15:16] which was very, very helpful. And then you get better and you're still young, so your brain shapes more easily, maybe. I don't know. So I feel like I benefited a lot from being early. [15:26] But within that, [15:28] I'd say just, like, go for the end goal immediately. Doing anything, sort of, like, "Oh, I'm going to do a PhD because I need a PhD to get into...", that's all bullshit. [15:35] Like, you don't. It's just completely bullshit. The other thing is, like, writing five-page emails to people actually works. Writing, like... [15:46] I get a lot of these two-paragraph, meh [15:49] things now, and I'm grateful I get emails, but I understand now why people think this stuff doesn't work. It certainly does when you're like, here is how I'm going to beat your algorithm, please help me. Five pages. At least in my experience, every single time, anyone I wanted help from in this way was very helpful. So I suppose [16:08] be proactive in seeking, like, the best people in the world to, in a time-efficient manner, just distill their brain into yours, and show them that you can make use of that. If you, [16:19] if you tell someone who's very good, effectively, hey, I'm going to make good use of this. If you want to coach someone, I would love to be that person.

16:25-18:03

[16:25] They'll usually do it. They won't do it for 10 people, but if they do it for one or two, that's enough. You just have [16:30] to win that seat, I guess. So that's been really helpful in my experience. Also, just not shying away from learning new things. Like, again, I didn't get into programming because I'm curious about computers. I'm not very curious about computers. I just like AI, and computers are the thing that's necessary. So it's fun. I enjoy programming now. It's great. But I wouldn't have gotten into it, I think, if it wasn't for AI. But still, like, you get into it. So, like, don't be shy. We interview a lot of people who don't know, you know, how you'd implement [17:00] an LLM, and it's kind of crazy to me [17:02] if you're a researcher and you couldn't implement sharding or whatever. It's just insane. [17:07] So really understanding the whole stack, going down to... but sort of not bottom up, really top down. Like, here's the thing I care about. This is the problem I want to solve. Okay, like, what do I need? What do I need? What do I need? And then, like, all the way down. [17:20] Um, and there are much more competent people at kernel programming and hardware design or whatever than I could ever, ever dream to be. But I understand enough of it to do better work at the top of the stack than I could if I didn't. Um, [17:34] So, um... [17:36] I think fundamentally you need to understand the domain you work in. It's also really good to just read everything. [17:41] I used to read... [17:44] I don't know, I don't have a precise number, but just every paper I could, every paper I would see, basically. And eventually you get so fast at it that, like, that's feasible. And you, like, build a database in your head of, like, oh, this is similar to this thing. This was sort of my eye-opening moment, where Bill Gates has this interview, like, oh, yeah, if you learn enough things, they're all, like, similar to each other. So it's not linear. It gets easier.

18:03-19:38

[18:03] And at that point, I was like, I should read every paper. And so thanks for the advice, Bill. Obviously, this was through a video. I never met him. [18:11] So I just started reading every paper. And that's really, really helpful because a lot of the best ideas that we had that work really well now at Magic... [18:19] were enabled by random things that are like, oh, it would never work without this random thing that I would have to have come up with in tandem. But because I have this database in my head, I can go like, oh, yeah, like this. And so often, like, one good idea is enabled by three other ideas that others have come up with. [18:35] And so it's always just like this composition of stuff. So having a large database is really helpful. Yeah, and then just never stop. Like never, never stop. It takes, like, ages to do good stuff, to do good work. And at any point... there was actually one moment with Johannes, the DeepMind research scientist who mentored me for a year in high school, [18:55] um, [18:55] where we had a version of the algorithm that wasn't very good. It was all right. And we were thinking, "I guess if we publish this..." We were both not really happy about it. And he was close to giving up on me. [19:08] It was like, "Oh, you know, maybe this is just not gonna work. I wouldn't wanna publish this." And so I was like, "Dude, fuck you. I'm just gonna get this done." And then we got it done like a month or two later. [19:18] And so I think, like, I remember going on a walk after this and just being like, [19:25] can I do this? I don't know if I can do this. But there is no other option. So I just better get it done. And then I went back home and I started programming again. I was still sad that day, but the next day was fine again. It just kept going.

19:38-21:14

[19:38] So I think you have to... I think that's a pretty formative experience, because I actually wasn't sure if I could do it. [19:43] And then we just did it super soon after. So I really haven't felt that [19:49] insane level of, like, doubt and pressure since then, which has sort of enabled... I think it's actually beneficial. You have to be realistic, but you don't want to... if you stop, you, yeah. I mean, so anyway, I think those would be the main [20:02] things. Also, be, like, really fucking honest with yourself about what you suck at, because otherwise you're never going to get good at it. Like, you need to search for the bad things. [20:13] And instead of, like, trying... actually, I think, as a researcher, betting on your strengths is good only to the extent that you don't have necessary conditions that are completely missing. Like, you can't bet on your strengths if they're not enabled. This is, again, like, back to the engineering thing, for example. [20:31] Totally. [20:31] So, yeah, I don't know. I'm rambling, but stuff like that. That's great. That is such a fascinating glimpse into the inner mind of what it takes to be a great researcher and behind all the glamour [20:41] of training large models. And so thank you for providing that peek. And I'm really glad that you mentioned kind of [20:47] reading every paper voraciously and having this database in your head. Because one thing I've heard from your collaborators is that your superpower is understanding and absorbing new research. And so I'm curious, do you agree? Like, do you think that is your superpower as a researcher? Or what [21:02] kind of [21:03] traits do you think have made you such an exceptional researcher? [21:06] So I think initially, in the RL work I did, it was synthesis, where I would read every paper and I would go, like, this thing plus this thing plus that thing with this modification. Um,

21:14-22:47

[21:14] I think that's what they would mean. [21:16] That, yes, was definitely very helpful. I think it's a good way to do research. Generally, there's enough work for synthesis to be a successful strategy. [21:24] I guess to an extent it's still that. I tried very hard after it... this is actually interesting that you bring this up. I realized this, and I tried very hard to get better at leaps, [21:32] um, like coming up with totally alien crap where there's no reference for it at all. Uh, and, um, yeah. [21:39] Because ultimately, like, so if you take, like, the Transformer, for example, right? Like, attention existed. [21:45] Um, [21:45] the idea of stacking a bunch of LSTM blocks existed, and you just have to remove the idea of recurrence, really, and, like, a couple other things that weren't necessary, right? Residual streams, like the residual update in transformers, existed from ResNet. So it's, like, it's synthesis. [22:02] But there is an amount of leap in there to make it all work. Like, it's a little more complex than just taking components and putting them together. You need to come up with new things, too. [22:12] Like, you know, the normalization by the square root of the head dimension, correct? But anyway, everyone now knows this. But roughly, like you should do a summer session. So there are some new ideas in there that, like, really help make it work, [22:26] but it's still a large amount of synthesis. So I suppose most good ideas are synthesis, but there are always some... In the best ideas, there are some leaps, and I'm trying to get better at those, but still it's mostly, I guess, like... [22:41] by things and... [22:42] throw away the stuff that doesn't work in the app, make the things work.

22:47-24:30

[22:47] I think some stuff needs leaps. But yeah, I guess, no, that's the recipe. Take LLMs, make them super efficient, long context, giant, [22:56] throw RL on it, make it all work together. It's still mostly synthesis, I guess. You're right. Who do you admire most in the research world and, like, [23:02] what do you think those folks' superpowers are? [23:05] Shazeer. Noam Shazeer. Me too. Yes. [23:10] What is his superpower? Shazeer. [23:12] Uh... [23:14] I guess to an extent, synthesis. He is... [23:20] I mean, he's just the best at synthesis. He is also great at everything in the stack. He can... [23:27] He has no weakness, really. He can implement the whole thing [23:31] on his own if he had to run it. He sees the future, I think, in a way that's very unconstrained. And, you know, I think everyone's sort of crediting, you know, a number of the labs for scaling laws. This guy made a presentation where he was zipping through essays or completions or whatever written by, like, models of various scale. I was like, this is a 100 million parameter model. This is a 300 million parameter model. This is a billion parameter model. This is a 5 billion... [24:01] bigger. He's sort of presenting it in this hilarious way. And everyone else is, like, super scientific about it. [24:08] I think Noam is generally just, if I have to put it, he's very, very intuitive. [24:12] I think [24:14] a lot of labs and researchers are sort of, and I think this is not a bad thing, it's very good, very evals driven, very mechanical, right? Like, sort of very empirical in a way. Like, Noam sort of just knows. He's like, ah, this would work. And then it works. Yeah.

24:31-26:08

[24:31] So I think that's a superpower. That is just extremely great synthesis. Yes, he has a larger database because he's been around for so long. He just, he literally knows everything. I mean, he invented half of the stuff that everyone's doing now. [24:44] I... either this... there's no other universe, I'd say. [24:51] There are a number of other people, I guess... [24:54] Just that you shouldn't fail. Out of all the people who are sort of the OGs of deep learning, I think Geoff Hinton deserves by far the most credit, just because he, like, [25:04] went through all the bashing [25:05] when it was like, oh, this will never work. And they're, like, training tiny, tiny, tiny, tiny, tiny things. They're like, this will never work. And he just stuck with it. I think that level of grit and belief in something that is now obviously working, [25:18] um, [25:19] deserves a huge amount of credit. [25:22] Whether capsule nets work or not, [25:24] whatever. It's incredible to come to something like the conclusions that the world is at now. And if you look at some of the older papers, a lot of the ideas that are important now were in there already. [25:37] So that's important. [25:39] Um, and I think he just deserves a ton of credit. Um, [25:43] Noam Brown... [25:45] the army of Noams. Noam Brown, uh... [25:48] I should name my kid Noam as well. It's a very good strategy, yeah. [25:54] It's a great strategy, actually. I think 100% of Noams that are, like, somewhat [25:58] popular and well-known in the research community are great. Yeah, no, he's also amazing. I mean, a number of labs were working on what he was working on during his PhD, and he basically soloed the thing

26:08-27:39

[26:08] and was, like, way better and way faster than labs that put 10 people, including some really famous names, on it. And if you just look at the paper track record, like, [26:18] here's the rest of the field, and then Noam 100x's efficiency, and then here's the rest of the field, and Noam does it again, and [26:28] it's just consistent. [26:29] I think the consistency with which he has just bashed out these 100x multipliers in RL data efficiency and compute efficiency is crazy. [26:40] Yeah. So, yeah, the Noam army is pretty good. I want to go back to this concept of, you know, leaps are still needed in research and that you still have this AGI to-do list. [26:50] What do you think are the most interesting unsolved problems in AI right now? Well, so a lot of... [26:56] it is solved now, I think. And the thing that remains to be solved is general- [27:02] domain, long-horizon reliability, and I think you need inference-time compute, test-time compute, for that. [27:08] So you'd want, um, [27:11] when you try to prove [27:14] a new theorem, [27:15] or when you're writing a large software program, or when you're writing an essay of reasonable complexity, you usually wouldn't write it token by token. [27:25] You'd want to think quite hard about some of those tokens. And... [27:30] finding ways to spend not 1x [27:34] or 2x or 10x, but a million x [27:37] the resources on that token

27:39-29:25

[27:39] in a productive way, [27:41] I think it's really important. [27:43] That is probably the last big problem. Fascinating. The last one. Okay. I hope so. I think it's reasonable to think that is the last big unsolved problem. I mean, look, over the last few years, all of this other stuff got solved. Like, oh, can we do multimodal things? Can we do long context? Can we do it? All this is done. [27:58] Reasonably smart models, they're quite efficient now in terms of cost that [28:04] I mean, you'd have to be a reality denier to not see what's coming. I mean, this is just a... This is like a realization to a lot of people in the LLM space. But RL has been doing this for ages. So... [28:18] It's just so clear that you need to do that. Maybe you don't need to. Maybe you can get away without doing it, which would be insane. But if you don't need to, it will still help you a lot. It's just like, do I want to spend a billion dollars on my pre-training run and then a little bit more money on inference? Or do I need to spend $10 billion on my pre-training run? I'd rather... you know, $10 billion would be great, but I'm going to prefer spending one. [28:46] research problem? Like, are there still, like, fundamental unsolved science problems? Or is that, like, a... [28:52] you know, we have the recipe, we just need to do it and have the compute and the data? I think there is no public successful recipe right now. [28:59] There are good ideas. Like, okay, even if you take best-of-N, [29:03] make N large enough. It's not terrible. The... [29:08] So there are ideas. I don't know that the final idea exists. I think there's just a lot of room up from what is currently known, but there are ideas. See, I think it's very unlikely that even if you stop progress in research, we would not at some point hit something that everyone would agree is AGI.

29:26-31:03

[29:26] It's just that I think we can do better. And maybe it couldn't solve Riemann, right? Maybe it couldn't do all these like super hard things. [29:32] But it'd be pretty good. [29:33] And now I'm just curious, okay, what's the actual, if we did all the things, [29:39] How good will it get? So I think there is research left to be done. And there are a lot of ideas floating in the world now. Everyone's sort of, [29:49] working on this, but... [29:52] I don't know that the current set of ideas is even final. It'll keep moving, I think. [29:58] Let's transition to talking about Magic. Maybe just what is Magic? You've been very mysterious to date. So maybe just share a little bit about what you're building. [30:07] Yeah, I mean, we're trying to automate software engineering. [30:10] The... [30:11] It took us a while to figure out how to train super giant models. [30:16] That's a pretty interesting engineering challenge. I mean, fundamentally, we're trying to automate software engineering from the product side. And a subset of that is a model that can build AGI, because if it's like... [30:27] If it's a great software engineer, then it should be able to do everyone's job at Magic. Like if we can do everyone else's job, that would be a subset. [30:34] So the idea is that you could use this to recursively improve alignment as well as the models themselves in a way that isn't bottlenecked by... [30:45] by human resources. And there aren't that many Noam Shazeers in the world. [30:50] If I had a Noam Shazeer on my computer, I could spin up a million of them and maybe alignment would just be solved. [30:56] I'm saying, it's like simplifying a ton and very idealistic in the statement. I'm happy to turn this whole thing into a...

31:03-32:35

[31:03] Scalable oversight podcast, if you'd like. [31:06] The, um... [31:08] The core idea is like, okay, like if I could just clone what we are doing into a computer and then press yes on the money button to run a cluster to do the work we would be doing next week, that would be phenomenal. So I think we're pursuing these two things in... [31:27] tandem where we want to ship something that's a good AI software engineer for people to use. It's, I think, going to be one of the first domains to see higher levels of automation and [31:36] I don't like talking around. I don't think the whole assistant pitch is going to last very long once these models are good enough to automate. [31:41] There's just no way the economy is not going to do that. [31:43] And I think everyone knows this and they're just like, they just don't like talking about it. It's totally fine. The world, we used to all be farmers. We're not farmers. We're fine. Everyone prefers this. [31:52] We'll figure our way out. [31:54] and the economy, if it produces the same or more stuff with less inputs, like we should be able to figure that out. That's not a hard problem. And like, [32:01] from first economic principles, you just have to figure out distribution. Anyway, um, [32:05] But that's what we're trying to do. We're trying to automate software engineering and as a part of that, automate ourselves in doing the work we want to do. And so the reason you go after software engineering then is that is the kind of lever that allows you to automate everything else. It's like the MVP of AGI, right? Like the minimum viable AGI. Yeah. Because then it creates everything else. Like, yeah, we wouldn't train something like Sora. Sora is great. You know, fantastic. Generate videos. Awesome. [32:31] It's just not interesting from an AGI perspective.

32:35-34:24

[32:35] perspective if you believe that models can code themselves. [32:38] Totally. And so out of all the companies that are trying to build an AI software engineer, you were [32:43] Probably the only one. [32:44] that is really taking a vertically integrated approach and training your own models. And that is either... [32:51] insanely brave or insanely crazy and probably a combination of both. I'm curious, like, why – I know you love training models, and so I know that's part of it. But, like, why do you think you need to own the model to get this right? And, like, how do you motivate yourself in kind of the David versus Goliath of, like, knowing that OpenAI exists and has great people and cares about coding and – [33:13] is great at building models, obviously. How do you think about that entire dynamic? [33:18] I think you need, well, to build the best model, you need to build the model. And we want to solve these fundamental problems. [33:25] You can't rely on an API. Like if the API guy solved it, then what the hell are you doing? We might as well start the company three years later. It goes to the point where we started, right? We started working on this stuff two years ago. [33:36] So we have. [33:38] It took us some time to learn how to train these large models. I think it took OpenAI two years to get from GPT-3 to GPT-4. [33:45] as well and [33:47] I thought we could be much faster and this is going to be great. It's a pain. So it's definitely an engineering challenge, but it's necessary. It's not like we're doing it just because it's fun or because I like training models. [34:01] It's a massive financial investment that... [34:04] people trust us with. And it's not like it's one of those one-to-one ROI investments. It's like, if it works, it's fantastic. And if it doesn't work, the GPUs ran and the money is gone. So you're getting a lot of people's trust doing that. It's certainly not something you should do just because it's fun and you enjoy it.

34:24-35:53

[34:24] um [34:25] fundamentally, I think the value will accrue both at the AGI and at the hardware level, and never at the application level. There's no incentive at all. [34:34] to offer an API, if the API creates a $100 billion company, you will just build that company internally. And if OpenAI doesn't, someone else will. [34:44] It's just incredibly unimaginable to me. [34:48] that that would be how you would build these companies in the first place. So from a business perspective, I don't think that's... [34:53] necessarily the right way. Maybe there's some partnership potentials. You could like, oh, we'll get like special access or whatever. And then we're like, I don't know. But why is it different from like cloud computing, right? Like there's been maybe $10 billion, $100 billion. I mean, it's much, much, much harder to build Netflix and Airbnb and Uber than it is to build a chat interface. [35:12] Like... [35:13] So fundamentally, Magic is an application you press download on that we have a couple of guys working on and it's just there. Like it's not... [35:20] you know, [35:21] You can build this with YC pre-seed money. [35:25] I guess I could just make the API twice as expensive for the next model and then launch my own product and then undercut everything. It's really fucked to not own the model. [35:32] in this domain. And in any domain that's going to generate a ton of revenue for a single company. [35:39] In the case where it's distributed, maybe it's fine, but I don't think this will be. So it's necessary both for the market, which is good for us because the market is incentivized to fund – [35:47] folks like us, um, uh, which, which it isn't in other domains, like have fun writing like an email assistant.

35:55-37:28

[35:55] You're not going to get that funded anymore. Uh, so, so, so that's, that's helpful. But, um, [36:00] fundamentally the reason we train our own models is because it's necessary for our mission [36:03] And I just wouldn't be interested in building, like, a nice little SaaS wrapper. It's just not, like... [36:09] That's going to happen anyway. And I think, though, about competing against the 800-pound gorillas. You've raised a lot of money, but some people have raised... [36:18] Boatloads of money. Yeah, they do. They still have more money. Oh, and so as well, some people have 100 million plus in revenue a year that they can spend [36:26] It goes beyond even the ones who could raise, yeah. [36:28] Absolutely. And so how do you motivate yourself to compete in that? [36:33] you know, reality. [36:34] The question is how much does it cost to build AGI, and not how much money can you raise. Because if you can build AGI for however much you can raise, and you're... [36:44] Having more might help you, but it won't get you there substantially sooner. [36:47] If you have all the right ideas and you can build it with a certain amount of hardware, by definition, okay, if someone had 100 times more hardware, would it be computing that much faster or whatever? [36:58] It doesn't seem like a material advantage if your estimate for how much compute you need to build AGI is not... [37:04] as high as the revenue these companies could generate, or the funding they raise is in fact much lower. So I think that is the case. So [37:12] It's not by any means accessible. It's very damn hard to get that much money. But it's not 100 billion. It's it's it. And if I'm wrong, I'm wrong. And it'll be 100 billion. And we will not have 100 billion. And that's it. But [37:24] If we can get to that point where we have AGI and a couple others have AGI and then like

37:28-39:01

[37:28] The benefit of additional compute is there, and you show an ROI, it's like a reasonably even playing field in terms of... [37:37] Additional revenue, you're going to bring AGI to the market, you're going to raise more on it. So the starting conditions of like half this hardware is like you need sufficient hardware, but you don't need more than sufficient. [37:49] So that's a bet. That's not a... [37:51] You don't know, but... [37:53] I think it's a bet with a-- [37:55] high enough probability of being right that it is reasonable to... [38:00] compete in this space. And I think it is actually it is reasonable to [38:03] think that... [38:06] like the ROI of having quote unquote sufficient funding might be better than the ROI of having like infinite funding early on. [38:14] And is there like an ideal... For investors, that is, not for me, for investors. Is there an ideal like team size for researchers? Is there a certain point at which you reach kind of like diminishing marginal returns of adding on the extra researcher? [38:27] So one of my biggest weaknesses, especially early on at Magic, was just scaling the team effectively. Like we were very single threaded. [38:35] on a very small number of people doing basically all the work in [38:40] I think we're getting better at that now. It's also you just need a certain level of maturity of your code base and of your research ideas and everything to properly segment them. [38:50] Um, [38:52] So early on, I would have said five for that time. Now I would say closer to 20. And I'm not including folks working on other stuff. I'm including folks working on the models and everything.

39:01-40:33

[39:01] I'd say closer to 20. I could imagine that in a few months, I'll say a [39:06] slightly larger number, especially when you get into large scale deployment, you really want to have very, very good [39:12] processes around just having high reliability, availability of services that are detached from each other, et cetera, et cetera. So then you can segment even more, which is obviously stuff we're working on now. [39:24] Um, [39:25] but, um, [39:28] It sort of grows over time. I don't see it ever exceeding like the tens of people. And right now it's in the low tens, very low tens. [39:37] But I don't know. Maybe it's a skill to be able to utilize. If you're able to utilize 200 people, you're just a better CEO than I am. [39:45] No, seriously. If you can... [39:47] It's a good skill. And I think part of why I say a smaller number for us is that there is a ton of stuff we just don't do. Like if we built a video model, that would just be a separate team. They built a video model and that's more scaling. So to an extent, we're more focused and that's why we're smaller. [40:04] But also, if we could double the team and be twice as fast, I would do it any day. [40:09] Back in, or was it late 2022 when I first met you? At the time, like it was marketing assistants and email assistants were all the rage. [40:18] You were the first pitch that I heard that was... [40:20] AI that feels like a colleague. And I just remember that really sticking in my brain. So in some sense, you've been thinking about, it's kind of like, [40:28] agents, to use a buzzword, longer than anyone else.

40:33-42:04

[40:33] Maybe share your vision for that and what you think it takes to build a great agent. [40:37] So fundamentally, there are two tiers here, I guess three. One is useless. The next is the system that you have to micromanage. And then the next is the thing that manages you, basically, where it's sort of more like a colleague. I think the layer where it's exactly even doesn't really exist because it's sort of this little thin point. [40:55] Once the model is more competent than you are, [40:58] you are there to give it [41:00] guidance on what you want to be accomplished and answer clarification questions. [41:07] But you'll never have to tell it. [41:09] Like... [41:10] Here's a bug. [41:11] I'm not saying that this is v1 of everything. I'm not saying this is v1 of our product. [41:16] But... [41:16] fundamentally that has to be the goal. [41:20] The way I feel when I talk to my best engineer, [41:24] That's how I want to feel when I talk to Magic, where we have a discussion. [41:30] He's almost always right. [41:31] And then he just writes the code. [41:33] and then someone else reviews it, and then it works. That experience, where my job is exclusively saying, "Here's kind of what I want." [41:42] And then they help clarify even right like I just want to hear specifically that that you it should feel like that. [41:50] And everything else doesn't matter to the user. [41:53] Like what tools the agent uses, how it works, does it run locally or in the cloud, does it need a VM, does it have a browser, I don't care. [42:00] It doesn't fucking matter. Our problem, not your problem. You care about your problems getting solved.

42:04-43:39

[42:04] So fundamentally, that's what I think matters to [42:09] customers and everything else is dependent on the exact product shape, exact domain, et cetera. And like I'm stubborn as fuck. I just don't want to launch anything that isn't that. We will probably have to. But the I just really want to get. [42:24] that thing like I want to talk to my computer [42:27] go and have lunch and come back and it built AGI. Like that's the, that's, that's the end goal. Right. And, uh, [42:35] There'll be checkpoints, but I don't think anything else matters. How you accomplish that is up to each individual company. [42:44] Yeah. How far away do you think we are from that? Or I guess maybe break it down into a little bit more. I mean, we met in 2022. You learned how to extrapolate Eric's timelines. [42:54] So maybe, yeah, one and a half or double everything I say, but I think very soon, like. [43:00] Very small number of years. [43:02] This. [43:03] I don't want to give it a number now, but a very small number. [43:06] Less than 10. [43:08] Oh, definitely less than 10. I mean, way less. Wow, okay. Because I'm seeing some of the, like, the SWE-agent stuff that just came out, they're at, like, 14% on SWE-bench, which feels like... I mean, 14%. I just don't care about 14%. Like, I mean, I don't know if 80 or 90 is good enough. [43:25] Yep. Like, [43:26] I think you need 99, even 96, I don't trust my computer. I don't want to review the code. If I have to, the tier of product where I have to review the code is fundamentally different from the tier of product where I don't have to review and understand the code.

43:39-45:13

[43:39] And like, you're not talking about 95 when you don't want to review. You're talking about 99 point something. You're talking about whatever my developers accomplish. Same as with self-driving cars. So the difference with self-driving cars is like, [43:52] You die if the thing crashes. And here you just have to review code. So it's launchable before, but fundamentally you need way, way, way more. And usually the last few, the nines are hard to get. [44:02] um, [44:03] So yeah, but no, I think you can, [44:06] I don't know. I don't... [44:08] People have, I mean, models have surpassed all these benchmarks. I mean, just recently the math benchmark, right? Like way faster than even like the prediction markets assumed and like [44:16] I don't see that stopping. [44:18] There's just too much. If everyone was stuck, and I realize there's some perception in the public that, oh, GPT-4 is like, oh, it's not getting much better. [44:26] Go. [44:27] No. Okay, we're going to close out with a lightning round. [44:30] Um... [44:32] One word answers. [44:33] One, what's your favorite AI app, not Magic? Probably all the invisible ones still, like my spam filter and all this stuff. [44:40] Love that. The things that keep life working, I think, are still at the moment more useful than the sort of AGI-like apps. Yeah. Because if you took them away, life would just be awful. [44:50] Um, [44:51] like recommendation algorithms for whatever. [44:55] I think that's really useful. [44:56] Um, [44:58] Other than that, [45:00] Yeah, I think whichever... [45:02] are you saying other than let's say other than the programming world other than Magic I'd say whichever model is currently best would

45:14-46:48

[45:14] It's a very boring answer, but I actually picked the spam filters, etc. at the recommendation... [45:18] Services first. [45:19] on a [45:20] What paper has been most influential to you? [45:22] I don't think this paper is relevant at all in the world anymore, but it was the first paper I ever tried to deeply understand. [45:29] months on it and re-implemented it and everything. [45:32] And so it was most influential to me as a person, not so much to my [45:36] current work. [45:39] And the paper's called DeepStack. It's one of those neural networks plus imperfect information game solving papers. It's reasonably complex for time... [45:50] Yeah, so it's a pair of folks who are interested. It's like nowhere near Arizona now, but... [45:54] And it's just an irrelevant type of algorithm, but at the current time, sorry, right now, back then it was useful. So that was very interesting for me because it was just my first touch point with research, really. I had no idea how to do research at all in them. [46:09] I sort of just was like, I'm going to dig into this. The way people like hyperlink spam on Wikipedia where you like rabbit hole, I did that with this paper. [46:15] I love that. Okay, that's going to be my weekend reading. Last question. What are you most excited about in AI in the next one, five, and ten years? [46:24] Um, [46:25] just what it's gonna [46:27] How society is going to integrate with it. [46:30] - Hmm. [46:31] I think that's, we're getting to the point now where it's really going to impact over the next one to five years, it's really going to impact how society does stuff and beyond just, you know, another tab in your browser that, you know, [46:43] speeds you up by some percentage on some paths. I think it'll get much more significant in that time frame.

46:48-48:24

[46:48] and um... [46:49] Ultimately, you should like the only I am not one of the intrinsic curiosity type of people. I know most researchers are. I really am not. I just care about the outcome. [46:57] And that is the outcome. So I'm most excited for the outcome. [47:01] Eric, thank you for joining us again. Last time we recorded the podcast, we weren't actually able to talk about the thing that got us so excited about Magic, which was you had shared with us. [47:14] your long context eval. And our own kind of AI researchers had gotten really excited by what you'd accomplished on that. And that was actually what led to us investing in Magic in the first place. So you just made some exciting new announcements around the eval. I was hoping you could share it with our audience. [47:30] Yeah, for sure. Thank you so much. Yeah, I mean, we've been running around with this HashHop eval for a while, basically just being frustrated by the needle in a haystack eval. And, you know, everyone keeps complaining about it. And now that we've decided to announce our, sort of where we're currently at in terms of our context work. [47:49] Instead of just like blah blah talking about like oh we have so many tokens, [47:54] Of context, it felt reasonable to share the eval as well. I mean, we've used it in our fundraising, obviously, and thanks for backing us. [48:03] and generally just used it to guide our architecture development and our research. So yeah, felt right to open source it and let others compare their architectures and their results with ours. [48:17] And then, you know, it's exciting to share. And thank you for having me back on to talk about it. Yep.

48:24-49:57

[48:24] Thank you. Can you say a word on what's broken about Needle in a Haystack and what your eval does differently? [48:29] Yeah, for sure. With Needle in a Haystack, basically what you're testing is like, find this weird thing, the needle, in this giant pool of not weird stuff. [48:38] the haystack. And so really all you need to be able to do to do this is to sort of take like a little backpack and walk from the start of the context to the end of the context and like find the weird thing, put it in your backpack and return it. [48:51] Um, [48:52] you have the implicit prior that this thing is weird, so you're more likely to remember it, which means that you actually don't need to remember the whole context window. You don't need to know all of it. That allows some models to look like they're doing long context really well, when really it's not working as well. We decided to just go the complete opposite, super hardcore mode, and just replace everything with random noise. [49:21] There is no semantic information at all, because it's just randomly generated letters, basically, just hashes. [49:30] And if you did something like needle in a haystack in a pool of hashes, you really have to know the whole thing. But then what we do is we also do a hop. So it's not just you find this one thing, but you find this one thing and then you find another thing. And obviously, you know, you can keep that going. But those two dimensions, I think, really are the important quantitative components of context. There are other things you can measure much better in more domain specific evals.
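
[Editor's sketch] The two ingredients described above, a context made only of random hashes and a question that requires hopping from one hash to the next, can be illustrated with a small generator. This is a hedged toy reconstruction from the spoken description, not the open-sourced HashHop code: the function names, the `a = b` line format, and the 8-letter hash length are all assumptions.

```python
import random
import string
from typing import Tuple


def random_hash(rng: random.Random, length: int = 8) -> str:
    """Random letters with no semantic content (hypothetical hash format)."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def make_example(num_chains: int, hops: int, seed: int = 0) -> Tuple[str, str, str]:
    """Build a shuffled context of hash pairs plus one multi-hop query.

    Returns (context, start_hash, expected_final_hash).
    """
    rng = random.Random(seed)
    seen = set()

    def fresh() -> str:
        # Guarantee unique hashes so every chain is unambiguous.
        while True:
            h = random_hash(rng)
            if h not in seen:
                seen.add(h)
                return h

    chains = [[fresh() for _ in range(hops + 1)] for _ in range(num_chains)]
    pairs = [(a, b) for chain in chains for a, b in zip(chain, chain[1:])]
    rng.shuffle(pairs)  # no positional cue: any line might be needed
    context = "\n".join(f"{a} = {b}" for a, b in pairs)
    chain = rng.choice(chains)
    return context, chain[0], chain[-1]


def reference_answer(context: str, start: str, hops: int) -> str:
    """Follow the chain by exact lookup; a model must do this from memory."""
    mapping = dict(line.split(" = ") for line in context.splitlines())
    current = start
    for _ in range(hops):
        current = mapping[current]
    return current


if __name__ == "__main__":
    context, start, expected = make_example(num_chains=1000, hops=2, seed=42)
    print(f"{len(context.splitlines())} lines; query: {start} -> ?")
    print("correct:", reference_answer(context, start, hops=2) == expected)
```

Because every line is equally unremarkable, nothing in the context signals in advance which pairs the query will need, so the two knobs (`num_chains` for context size, `hops` for chain length) roughly correspond to the two dimensions mentioned in the interview.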

50:00-51:12

[50:00] at that too. [50:01] internally. [50:02] But I think from a general purpose context evaluation, [50:06] perspective and the reason we chose to open source this eval and only this eval is just that, you know, I think this quantifies exactly what you want to measure when you think about long context and everything else is sort of domain specific. [50:17] But yeah, you've [50:18] You want to be forced to remember the whole context window when you're talking about the context window. Otherwise, is it really that big? [50:26] Totally. I remember our own researchers were just blown away by the purity of the eval and... [50:33] how well done it was. And so thank you for what you're doing. And thank you for open sourcing it, especially in an age where... [50:39] Long context is becoming more and more important. [50:41] Thank you so much for having me, Matt. Cheers. Of course. Thanks, Eric. [51:11] Woo!
