Nicholas

OpenAI Sora 2 Team: How Generative Video Will Unlock Creativity and World Models

The OpenAI Sora 2 team (Bill Peebles, Thomas Dimson, Rohan Sahai) discuss how they compressed filmmaking from months to days, enabling anyone to create compelling video. Bill, who invented the diffusion transformer that powers Sora and most video generation models, explains how space-time tokens enable object permanence and physics understanding in AI-generated video, and why Sora 2 represents a leap for video. Thomas and Rohan share how they're intentionally designing the Sora product against mindless scrolling, optimizing for creative inspiration, and building the infrastructure for IP holders to participate in a new creator economy. The conversation goes beyond video generation into the team’s vision for world simulators that could one day run scientific experiments, their perspective on co-evolving society alongside technology, and how digital simulations in alternate realities may become the future of knowledge work. Hosted by: Konstantine Buhler and Sonya Huang, Sequoia Capital

Published Nov 6, 2025
Uploaded Jun 11, 2026

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:35

[00:00] For OpenAI across the board, [00:01] it's really important that we kind of like iteratively deploy technology in a way where we're not just like dropping bombshells on the world when there's like some big research breakthrough. [00:09] We want to co-evolve society [00:10] with the technology. And so that's why we really thought it was important to like do this now and like do it in a way where, you know, we've hit this, again, this kind of like GPT-3.5 moment for video. [00:19] Let's make sure the world is kind of aware of what's possible now, [00:21] and also, you know, start to get society comfortable in like figuring out the rules of the road for this kind of like longer-term vision where there are just copies of yourself running around in Sora and the ether, like doing tasks and like reporting back in the physical world. Because that is where we are headed long term. [00:38] Bye. [00:54] Today on Training Data, we sit down with the team behind OpenAI's Sora: Bill Peebles, Thomas Dimson, and Rohan Sahai. You'll hear about space-time tokens, building internal world simulators, and how optimizing for creation instead of consumption is just better for social platforms. [01:10] This conversation goes way beyond video generation and into questions about how society will co-evolve with powerful simulation technologies. We promise that this was an actual real-world conversation and not a video generation, but we don't know how to prove that to you. [01:24] Let's jump in. [01:27] Hey, guys. Thank you for being here at Sequoia. Congratulations on Sora. Thank you. Maybe you could tell us a little bit about yourselves and how you got to OpenAI and Sora.

1:36-3:21

[01:36] Yeah, I'm Bill. I'm the head of the Sora team at OpenAI. I had a pretty traditional path, came through undergrad doing research on video generation, then continued that work at Berkeley, [01:45] and then started at OpenAI working on Sora from the first day I joined. [01:51] And I'm Thomas. I work as an engineering lead on Sora. [01:55] Um, [01:56] I have a bit of a [01:57] longer story, but I... [01:59] I worked at Instagram for about seven years, doing some of the early kind of machine learning systems and recommender systems there, [02:06] but it was a very tiny company. It was about 40 people. [02:08] Then I quit, did my own startup for a while, which was Minecraft in the browser, which we've talked about a couple of times. And I think that OpenAI noticed that we had a very cracked product team there, and so they acquired our company. And... [02:20] I've been bouncing around different products inside of OpenAI and on the research side as well on post-training, [02:25] but super happy we landed kind of together on Sora to... [02:29] bring this thing to life. It was a really cool product in between too. [02:32] Like the Global Illumination product. Oh, yeah. I still believe in it. Yeah, me too. [02:36] Awesome. I'm Rohan. I've been at [02:38] OpenAI for about two and a half years. Started as an IC on ChatGPT. [02:41] Um, [02:42] but then as soon as I saw the video gen research, I got quickly Sora-pilled and made my way over there. And so currently we have the Sora product team. Before that, just... [02:52] startups, big companies within kind of the Valley, a bunch of random stuff, yeah. [02:56] Cool. Well, Bill, you are the inventor of the diffusion transformer. [03:01] Can you tell us what that is? Yeah. So most people are pretty familiar with autoregressive transformers, which is the core tech that powers a lot of language models that are out there. 
So there you generate tokens one at a time, and you condition on all the previous ones to generate the future. Diffusion transformers are a little bit different. So instead of using autoregressive modeling as kind of the core objective,

3:21-4:57

[03:21] you're using this technique called diffusion, which at a very high level basically involves taking some signal, for example, video, adding a ton of noise to it, and then training neural networks to predict the noise that you applied. Mm-hmm. [03:33] And this is kind of a different kind of iterative generative modeling. So instead of generating token by token, as you do in autoregressive models, [03:40] diffusion models [03:42] generate by gradually removing noise one step at a time. And in Sora 1, we really kind of popularized this technique for video generation models. So if you look at all the other competitor models that are out there, both in the States and in China, [03:54] most of them are based on DiTs, diffusion transformers. And a big part of that is because... [03:59] DiTs are a really powerful inductive bias for video. So because you're generating the whole video simultaneously, [04:04] you really solve issues where quality can like degrade or change over time, which was kind of like a big problem for prior video generation systems, [04:11] which DiTs ended up fixing. So that's kind of why you're seeing them proliferate within video generation stacks. When I try to visualize it, I mean, for each diffusion, you have a matrix of pixels, and then you do the entire video at the same time, which you can basically see as different frames, I imagine. Can you visualize that as, you know, a matrix of matrices, or [04:30] that basically transforms over time? Yeah, it's a good question. So we really kind of consider things at the granularity of like space-time tokens, which is sort of like an insane phrase. But, you know, whereas, for example, characters are a very fundamental building block for language, for vision, it's really this notion of a space-time patch, right? You can just imagine this little cuboid [04:49] that composes both X and Y, like spatial dimensions, as well as a temporal locale. 
And that really is kind of like the minimal building block.
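The two ideas in this answer, cutting a video into space-time patches and training a network to predict the noise that was added, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Sora's implementation: the function names `patchify` and `add_noise` are hypothetical, the toy video is raw grayscale values rather than learned latents, and the noising rule is the standard DDPM-style forward process, which the speakers do not specify.

```python
import math
import random

def patchify(video, pt, ph, pw):
    """Split a video (nested lists indexed [t][y][x]) into space-time patches.

    Each returned token is a flat list of pt*ph*pw values: a small cuboid
    spanning pt frames, ph rows, and pw columns.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    tokens = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                tokens.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return tokens

def add_noise(tokens, alpha_bar, rng):
    """Assumed DDPM-style forward step: noisy = sqrt(a) * x + sqrt(1 - a) * eps.

    Returns the noisy tokens and eps; eps is the regression target that a
    noise-prediction network would be trained to output.
    """
    noise = [[rng.gauss(0.0, 1.0) for _ in tok] for tok in tokens]
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = [[s * v + n * e for v, e in zip(tok, eps)]
             for tok, eps in zip(tokens, noise)]
    return noisy, noise

# Toy "video": 4 frames of 4x4 grayscale values.
rng = random.Random(0)
video = [[[float(t * 16 + y * 4 + x) for x in range(4)]
          for y in range(4)] for t in range(4)]
tokens = patchify(video, pt=2, ph=2, pw=2)  # 2 * 2 * 2 = 8 space-time tokens
noisy, target = add_noise(tokens, alpha_bar=0.5, rng=rng)
```

A training step would feed `noisy` (plus the noise level) to the model and penalize the difference between its output and `target`. Note that all eight tokens are noised together, which mirrors the point about generating the whole video simultaneously rather than frame by frame.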

4:58-6:38

[04:58] that you can build visual generative models out of. And so diffusion transformers sort of consider these... [05:04] almost you can think of it like voxel by voxel. And, you know, in... [05:09] the traditional versions of these diffusion transformer models, you have [05:13] all of these little space-time patches talking with all of the other ones. And that's how you actually are able to get properties like object permanence to fall out because, uh, [05:22] basically, you have full global context of everything going on in the video at every position in space-time, which is like a very powerful [05:28] property for a neural network to have. Mm-hmm. [05:31] Yeah. And is that the equivalent of the attention mechanism, the object's movement throughout the video? Yeah, that's right. So in our like Sora 1 blog post on [05:40] video generation models as world simulators, we kind of laid out some visuals, which sort of go into exactly your point here, which is really attention is like a very powerful mechanism, right? For [05:49] sharing communication, like sharing information across space-time. And if you represent data in this way, right, where you patchify it into a bunch of these space-time tokens, as long as you're, you know, properly using the attention mechanism, that allows you to transfer information throughout the entire video all at once. What are the biggest differences between Sora 1 and 2? And I remember with the original Sora 1, you're already seeing kind of emergent properties where the more you scale, the more it's able to do things like understand physics. Is Sora 2 purely a function of scale, [06:19] or what are the biggest differences? Yeah, that's a great question. You know, we've spent a long time really just doing, like, core generative modeling research since the Sora 1 launch to really figure out how we get the next step function improvement in video generation capabilities. We really kind of operated from first principles, right? 
So we really want these models to be extremely good at physics,

6:38-8:09

[06:38] we want them to kind of feel intelligent in a way that I'd say like most prior video generation models don't. So by that, I really mean, you know, if you look at kind of any of the previous set of models that were out there, you'll notice a lot of this kind of like effects that happen. Like if you try to do any sort of complicated [06:53] sequence of like physical interactions, right, for example, like spiking, gymnastics. Classic. Riding a dragon, like you do. Riding a dragon, that was fun. That happened for real, actually, Konstantine. [07:04] Um, [07:05] uh, [07:06] you know, there are like very clear problems with [07:10] the past generation of models that we really set out to solve with Sora 2. And I think one thing that's really cool about this model compared to prior ones is that when the model makes a mistake, it actually fails in a very unique way that we haven't seen before. And so concretely, [07:23] for example, if, [07:25] let's say, like the text input to Sora is a basketball star wants to like shoot a hoop, right? Shoot a free throw. [07:31] If he misses in the model... [07:33] Sora will not just like magically guide the basketball to go into the hoop, right, to be over-optimistic about respecting what the user asked for. It will actually defer to the laws of physics most of the time. And the basketball will actually like rebound off the backboard. And so this is a very interesting distinction, right? Between like model failure and like agent failure. Agent as in the agent that Sora is like implicitly simulating as it's generating video. [07:54] And we haven't really seen this very unique kind of like semantic failure case in like prior video models. This is really new with Sora 2. [08:01] It's kind of a result of just the investment we put in really [08:04] doing like the core generative modeling research to like [08:07] get this massive improvement in capability.
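The earlier description of every space-time patch "talking with all of the other ones" corresponds to full (non-causal) self-attention. The sketch below is a deliberately stripped-down version, assuming a single head and identity projections with no learned weights; the name `full_attention` is hypothetical. It only shows the property Bill describes: each output token is a weighted mix of every token in the video, so information at one space-time position is available at all others in a single step.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def full_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only).

    Every token attends to every token, with no causal mask, so each output
    row mixes information from all space-time positions at once.
    """
    d = len(tokens[0])
    mixed, weights = [], []
    for q in tokens:
        # Scaled dot-product score between this token and every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        mixed.append([sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)])
    return mixed, weights

# Three toy space-time tokens with two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed, weights = full_attention(tokens)
```

Because softmax weights are strictly positive, every row of `weights` gives non-zero weight to every token, which is the "full global context" property. A real diffusion transformer would add learned query, key, and value projections, multiple heads, and positional information.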

8:09-10:00

[08:09] Okay, so not purely a function of scale. You're actually... [08:12] you know, there's some concept of agents implicit in this, there's [08:16] things you're doing beyond just scaling up the model. Well, the notion of agents, I'd say, is actually mostly implicit from scale. Like, you know... [08:24] in the same way where we kind of showed that object permanence, right, begins to emerge in Sora 1 pre-training once you hit some, like, critical [08:31] FLOPs threshold, [08:32] we see similar kinds of things happen as we like push the next frontier, right? So you begin to see these agents act more intelligently. You begin to see the laws of physics be respected in a way that they aren't at like lower compute scales. [08:43] How does the concept of a space-time latent patch [08:46] relate to a space-time token, [08:48] and to object permanence and how things move through the physical world? Yeah, that's a great question. So I'd say space-time patch and space-time token are more or less synonymous with one another. [08:59] I'll use them interchangeably. [09:01] You know, what's really beautiful, right, is, [09:03] uh, [09:04] when people started scaling up language models from like GPT-1 to GPT-2 to GPT-3, [09:09] we really began to see the emergence of like world models internally in these systems, and [09:15] what's kind of beautiful about this, right, is there's incredibly simple tokenizers that actually go into, like, creating the data that we train these systems on. But despite this very simple representation, right, you know, [09:25] like BPE, characters, what have you, when you put enough compute and data into these systems, [09:31] in order to actually solve this task of predicting the next token, you need to develop an internal representation of how the world functions, right? You need to like simulate things. And like, you know, the models will make lots of mistakes right now at like low compute scales. 
But as you continue pushing it from three to four to five, you just see these internal world models get more and more robust. And it's really analogous for video, right? And in many ways, more explicit. So I think it's easier to picture what like a world model or a world simulator looks like with video data, right? Because it is literally representing like the raw observational bits of like,

10:00-11:31

[10:00] all of reality. [10:02] But what's really remarkable is because these space-time patches are just this like very simple [10:07] and like highly reusable representation that can apply to like any type of data, right? Whether it's just like [10:12] video footage of like this set, whether it's like anime, cartoons, like whatever it is, you're just able to build [10:19] like one neural network that can operate on this vast, extremely diverse set of data and really build these like incredibly powerful representations that model like very generalizable properties of the world. Right. It's useful to have a world simulator to predict like how a cartoon will unfold. And likewise, it's useful for predicting how this conversation might unfold. [10:36] And so that really puts a lot of optimization pressure on Sora to like grok these like core fundamental concepts in a very like data-efficient way. [10:43] Hmm. [10:43] Did you have to put effort into selecting the data such that it reflected the physical world? For example, I'd imagine if you have data from the physical world, it all abides by the laws of physics. But you mentioned anime, [10:56] that might not always abide by the laws of physics. Did you have to be selective, or did it naturally find patterns that separated that out? That's a really great question. We did spend a lot of time really thinking about... [11:08] what does the optimal data mix for a world simulator look like? And to your point, I think... [11:14] in some cases we'll make decisions that, you know, maybe are for like making the model really fun. Like, for example, people love generating anime, but, you know, that does not necessarily like perfectly represent, uh, like the laws of physics that are like directly useful for like real-world applications. So like to put it another way, right, I think

11:31-13:06

[11:31] in anime, [11:32] there are certain primitives that are simplified that are actually probably useful for understanding the real world. You know, people still locomote through scenes, for example, but like, if there's like some crazy dragon that's like flying around, that's probably like not so useful for like grokking aerodynamics or something. [11:44] Dragon Ball Z is more or less how I learned athletics. You know? There you go. Motion and Super Saiyan. I think it is an interesting question, like, [11:52] that I do not know the answer to, whether somehow like pre-training [11:57] on simplified representations of like [12:00] the visual world, whether that's like sketches or like some other modality, [12:03] like, [12:04] you know, makes you more efficient at like grokking these concepts. I think it's actually a very interesting scientific question that we need to understand better. [12:10] Do you think we're close to exhausting the number of pre-training tokens there are out there? Or do you think video data is just so massive and it's actually one of the more untapped vats of data? Yeah. The way I kind of think about this is the intelligence per bit of video is much lower than something like text data, [12:24] but if you integrate over [12:26] all of the data that really exists out there, the total is much higher. So to directly answer your question, you know, I think [12:32] it's hard to imagine ever fully running out of video data. There's just like so many [12:37] ways that it exists in the world, um, [12:40] that, like, you know, you will be in a regime where you can continue to just, like, add more and more data to these pre-training runs and continue to see gains for, like, a very long time, I suspect. [12:47] Yeah. [12:48] You think we'll ever discover new physics? There's the LLM world of Einstein thinking at the whiteboard. It's equivalent to these LLMs thinking. There's also just the, if you develop a perfect simulator and you just... 
[12:58] simulate physics better and better, you might learn things about the world that we haven't learned yet. I totally think that this is bound to happen one day. And I think we probably need...

13:07-14:39

[13:07] even like we probably need one more step function change, I'd say, in like model quality to like really get to a point where, [13:12] for example, you can think about doing scientific experiments in the models. But you could imagine, right, one day you have a world simulator that has generalized so well to the laws of physics [13:21] that like you don't even need like a wet lab in the real world anymore, right? You can just like run biological experiments within Sora itself. [13:27] And again, this needs a lot of work to really get to the point where you have a system that's robust enough to do this reliably. [13:33] Um, but you know, internally, like, again, we've viewed Sora 1 as kind of being like the GPT-1 moment for video. It was like really the first time [13:39] things started working for that modality. Sora 2, we really view as, like, GPT-3.5 in terms of, like, it really being able to, like, kickstart, you know, the world's creative juices and, like, really [13:49] like break through this kind of usability barrier, where we're seeing like mass adoption of these models. And we're going to need a GPT-4 breakthrough [13:56] to really get this to the point where this is useful for like sciences, as we're seeing now with GPT-5, right? Like I feel like every day on Twitter I see another like convex optimization lower bound get like improved by GPT-5 Pro, and I think eventually we're going to see the same thing happening for the sciences with Sora. [14:10] Do you think you need physical world embodiment to get there, or do you think a lot of it can be done effectively in sim? [14:17] I am like always amazed, [14:19] every time we push another 10x compute into these models, what just magically falls out of it, with very limited changes in kind of like [14:28] what we're training on and like the fundamental like approach to what we're doing. [14:33] I suspect some amount of physical agency will certainly help. I have a hard time believing it will...

14:39-16:16

[14:39] make you worse at like, you know, modeling like collisions or like something else. Um, [14:43] video only is like quite remarkable though, and I wouldn't be surprised if it's actually kind of like AGI-complete [14:48] for like building like a general purpose world simulator. [14:50] So for this concept of a general purpose world simulator, a world model where you can do [14:56] science experiments in that world, do you think that [14:59] video is the sole, or some combination of video and text are the combined, [15:05] data inputs, [15:07] and you train it on... [15:08] this type of... [15:10] this type of model, or is it gonna be, does it have to be based on [15:15] more structured laws of physics that are understood and laws of biology that are understood? I think it probably depends a lot on the specific use case you're kind of envisioning for the world simulator. [15:27] For example, if you just really want to build like an accurate model of how like a basketball game is played, I actually think like only video data, and like maybe audio as well, is like kind of sufficient to build that system. Not of me playing basketball. That would be an inaccurate, very bad player of basketball. You know, yeah. [15:44] You actually like Sora's current understanding of how people play basketball, Konstantine, maybe at your level. Wow. Okay. That makes sense. It's possible. [15:53] I think he just dissed you. It's accurate. But it's better than mine, Konstantine. That was like a Sora 1 situation. You're at Sora 2. We'll toss some hoops. Is that what they'll say? You know, I'm down. I'm down. Yeah. Shoot some hoops. Thanks. Thomas's first statement in the podcast. I'm also at your level. I'm sorry. You know, I think it is an interesting question. Like, what are all of the modalities

16:17-17:48

[16:17] that [16:18] should be present in this kind of general purpose system? Certainly, if you add more modalities, I have a hard time believing it will decrease the intelligence. I also think there's an argument to be made that, [16:27] um, [16:28] just [16:29] adding more and more does not provide significant marginal value compared to full mastery of video and audio, for example. I think it's an interesting open question. I'm not actually sure right now, and it's something we need to understand more. [16:41] So cool. Sonya a minute ago mentioned Einstein at a whiteboard. [16:45] And obviously that makes me think of you, Thomas, and your hair. Me too. [16:51] It had to come. [16:55] Gives the feeling of space-time tokens. It's definitely yours. At some point, [17:03] Bill, you're the creator of this revolutionary technology that has changed the way that [17:08] AI video is created. At some point, you from Sora 1 to Sora 2 said, hey, all together, you said, there needs to be an application around this. There's some benefit to an application. You brought together some of the best product people in the world. How did that crew come together at OpenAI? [17:23] Yeah, it's a, I mean, the story is never as linear as you might think it is. So I think that, I mean, we've had a product team on Sora since the get-go. Rohan was like [17:33] spearheading that effort in the Sora 1 days. But I think Bill's right when he says it was really like a GPT-1 kind of moment. We're seeing pockets of very interesting things there, but the models were not like... [17:43] models without sound, videos without sound, it's like a very different kind of [17:46] environment.

17:48-19:18

[17:48] We were working on that surface, mostly targeted on kind of like a prosumer demographic. And separately, I mean, Rohan can probably go into more details of all that. [17:57] Um, [17:58] separately, we're also just kind of exploring different social applications of AI [18:02] inside of OpenAI and what that could look like. We had a lot of prototypes, most of which were quite bad. And where we started to see some of the magic was actually with ImageGen, [18:13] before it had been released. We were playing with it internally in a social context. And the social context was really interesting to see, that [18:21] what people were doing is you'd sort of like [18:22] take an image and then you'd have like a chain of remixes of that image where like, I don't know, there was a... [18:28] It's a duck, and now the duck's on somebody's head, and now everything's upside down, and they're smoking a cigarette. Just a lot of weird things. It's a crazy party. Yeah. And we were seeing this, and we were like, oh, this is kind of like a very interesting thing that's like, [18:41] nobody can really do that with social media because it's so hard to create something or riff on something. It's such a high-barrier-to-entry action. Maybe you have to go get a camera set up and... [18:53] it's not just like thinking of the idea. There's actually a lot of things involved. And so... [18:57] we were like, okay, this is a very magical behavior. How can we kind of productize that [19:01] behavior? [19:02] And we're mostly thinking about it away from Sora. Some of the Sora [19:06] research was still ongoing and there were signs of life, but it wasn't quite [19:09] there yet in productized form. Bill probably had it in his head somewhere. He's like, I can see the future, but that's fine. I'm a little bit more... Can't quite see the future yet.

19:19-20:49

[19:19] So... [19:20] uh, [19:20] so we were just exploring that. I think we tried a few things, and then at some point the research was really... [19:26] just showing very clear value, [19:28] of even iterative deployment style value, of like, oh, this is something that people will really want. And so we went into this project, [19:35] so like, [19:36] two or three months ago. It wasn't very long. It was like July 4th. Wow. Yeah. Wow. That's when you disappeared, Thomas. That's when I disappeared, yeah. Exactly. Yeah. So, and we just kind of locked in like, okay, we're finally doing it. You know, that's always a moment. [19:50] Um, and, um, [19:52] we started without any magical features, just like, [19:56] okay, let's just try to get a native video environment where you can hear the audio, full screen. And we did some quick generations. Things were showing very, they're very cool, very fun. [20:07] Very interesting. [20:08] Um... [20:09] And... [20:10] because of that ImageGen experience, we sort of had thought, I'm like, okay, what's the magical thing here? The magical thing here is that like the barrier to entry is very, very low for creation. Coming from Instagram, that's like, [20:20] it's impossible to get people to create on Instagram, and that's the most valuable thing that people do. [20:24] Um, [20:25] so what does that unlock? [20:27] And it's like, okay, well, that remix thing from ImageGen, that kind of could still apply here. [20:32] And so we brainstormed all these things about [20:34] how could remixes work and what does a remix mean here? [20:37] Um, [20:38] one of those was this cameo thing, which I think also Bill... [20:41] It was in the ether. It was in the ether, for sure. But we just were hacking together things on the product. We were just, let's see if this works.

20:49-22:20

[20:49] I... [20:51] I didn't think it would work at all, but it was on the list. And there were a few other things on the list. Some of them were pretty crazy. It was like... Why didn't you think it would work? [20:59] I am bad at predicting technology. [21:05] It wasn't super clear to me that you could like... [21:07] take a likeness of a person and have that kind of imagined into a video form, [21:12] um, [21:13] and whether it would work or not. [21:15] And so we had early prototypes of different things, of like people reacting in the video corner or stuff like that. But when we saw cameos just start to... [21:23] work. [21:24] And even playing internally, like... [21:25] Rohan, do you remember that day where we were like, [21:27] yeah, [21:28] feed is entirely cameos. Yeah, it's entirely. It just went from, you know, we didn't have that feature. Once we had that feature, product-market fit on the team. Everything we were generating was all of each other. [21:38] You must have seen the meme potential. I mean, yeah, that's... I think... [21:43] at first [21:44] we were just like, this is hilarious, this is amazing. And then a week later, we were like, this is still all we do. [21:51] There's something here. Yeah. I mean, at first, we were actually a little bit like, [21:54] is this good? Like, hey, the cameos, it's just all cameos now. Does anyone else care about this? People care about other people doing stuff. And, um, [22:03] we kind of got to the point where we're like, no, no, this is actually good. It feels like I'm coming back to see. [22:08] And it really humanized it a lot, where like a lot of AI [22:12] video is just... [22:14] kind of static scenes that are quite beautiful, quite interesting, might have extremely complicated [22:19] things going on,

22:20-23:55

[22:20] but they lose that human touch. [22:22] And it really felt like it was coming back into it. [22:25] Another learning from ImageGen too, like ImageGen took off and had viral moments because I think you could put yourselves in these scenes in accessible ways that weren't possible before. Obviously this massive like "put me in a Ghibli scene," [22:39] people taking selfies with their idols and stuff like that. And so... [22:43] once you actually kind of thought about it, it's like, yeah, cameo feature makes a lot of sense. You put yourself in all these scenes. That's way more exciting. You and your friends. It's novel. It's like not something you could do before. Yeah. And then that combined with remixes became kind of remixed to begin with, but then [22:58] you start to think about, okay, well, now I can riff on [23:00] Rohan doing something or whatever it is. Like with Bill... [23:03] had you wrapped in an action figure package. It's been remixed like an insane number of times. Thousands of times. So like just very, very crazy things that kind of go on and very emergent. A lot of stuff that I would have never [23:16] thought of. [23:16] Actually, how many generations of you guys have been publicly posted at this point? I have no idea. I know I'm 11,000 or so. I was a little less than that. [23:25] Wow. Yeah. That's crazy. What surprises you about the types of users that are really sticking with Sora? Who is it really a hit with? [23:32] If you just go to the latest feed, which is just like [23:35] the fire hose, astronaut mode of everything. Yeah, it's space-time Thomas mode. Um, it's wild out there, but [23:43] that gives you a pretty good snapshot into just everything happening. I mean, I think we have [23:48] like almost 7 million generations happening a day. So you can imagine there's just a ton of information there. It's one of my favorite ways to just get product feedback.

23:56-25:33

[23:56] It is so diverse, the type of stuff people are doing, the type of people. There'll be like a complete [24:01] variety of age, some people just envisioning themselves in scenes that seem like motivation-oriented, people just memeing with their friends, people cameoing some of like the public figures on the platform that have done cameos. So I think the [24:15] diversity has surprised me. I was kind of expecting this sort of like, you know, the Twitter AI crowd to like heavily dominate the feed. They definitely dominate like [24:25] the press cycles, at least the ones that we're most exposed to. But in terms of people actually using this, it's quite a wide variety. And last thing I'll say is, it's [24:34] a bigger departure from like [24:36] the sort of niche AI film crowd that existed before, which is great early adopters, but now you kind of get these... I thought it would start there. [24:44] But it felt like it started with [24:45] just a way wider range of people. I think getting to the top of the App Store helps with that, and you just get people who are like browsing and see this thing. My mother keeps cameoing Thomas. Is that right? It's so weird. We have a lot of strange cameos. She said 11,000. She's done 10,000 of them. [25:08] Thomas, you wrote the original algorithm, if I'm right, for the Instagram ranking, ranking algo. [25:15] lot in the Sora 2 blog post about how you guys are clearly being very intentional about how you want to do ranking in the algo. Can you talk a little bit about lessons learned from Instagram and how you're approaching it over at Sora? Yeah. I mean, there's a lot to cover in that. I think that the first thing to think about

25:33-27:04

[25:33] when we think about these platforms or think about Sora specifically, is the thing I was mentioning before about creation. [25:38] So, [25:40] Sora enables basically everybody to be a creator on this platform. [25:44] And that is a very, very different environment than something like Instagram, where you have this like extreme power law of the people that are creating. [25:51] And the power law just naturally gets more... [25:54] uh, [25:55] narrow... what's the right word there? But more head heavy, yes. [26:00] So sometimes I feel like I have to defend myself on the Instagram algorithm side. We actually did it for, I mean, we did it for a reason. It was to solve a problem. It wasn't just kind of like a random decision to, you know, [26:11] optimize for ads or something like that. And the reason we did that was that we noticed that like [26:16] what was happening on Instagram over time was, [26:19] because it was chronologically ordered, [26:22] every single person that posted [26:24] was guaranteed to have the top slot of all their followers. And so if you think about that for a second, the incentive for somebody in that environment is actually to create constantly, [26:33] because they are guaranteed distribution when they create. And over time, because of this power law becoming heavier and heavier, or more head heavy, [26:43] um [26:44] Those type of people, which are great, they provide a lot of value to the ecosystem. [26:48] But they start to crowd out [26:50] people you really care about. [26:51] And so maybe you follow National Geographic or something, not to dunk on National Geographic. I love them. But, you know, if they're posting 20 times a day, your friend's not. They don't have the same like optimization objective. They're probably just.

27:04-28:35

[27:04] a picture of their coffee or something. And so you'd have 20 Nat Geo posts and then one picture that you actually really cared about that you never really scrolled to. [27:12] And there's not too many solutions to that problem if you have a guaranteed ordering. One of them is that you have to unfollow all these [27:20] accounts that you maybe care about, but care about not as much as the person that posts once a day. And the other is that you have to permute the... [27:29] Uh, [27:29] permute the feed. And so we went with that path. We tried it. We tested it out internally. It was very kind of controversial to do. [27:37] And, [27:38] But I think that you can actually kind of like math this out. It's like a proof that basically... [27:42] Over time, you're going to have to take control over distribution on the platform in order to prevent these kind of issues and show people what they actually care about. [27:49] So that's why we did it. And it actually showed a lot of value. I remember the early tests, I won't get into the numbers on them, but they were pretty unambiguous, actually, that this was showing more people that you cared about. It was improving your experience with the platform, and it actually moved creation, which is unusual. [28:03] It made people create more because they were seeing more content that was accessible to them. [28:08] Um, [28:09] But I also think that these things can go astray over time and... [28:13] I won't say like the Instagram algorithm is unequivocally bad or unequivocally good. [28:16] But when we started to open up to more unconnected content, [28:20] And [28:22] ad pressure was very strong. [28:24] There's also a natural company incentive [28:26] to optimize for just blind consumption, [28:28] because it's how you make money. [28:30] So maybe cheaper content or maybe just like get people to scroll more and more and more and more.
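
[Editor's illustration] The crowding-out effect described above can be made concrete with a toy model. The sketch below is not Instagram's or Sora's ranking code; the `Post` type, the `affinity` scores, and the posting times are all hypothetical, chosen only to show why a strictly chronological feed buries a once-a-day friend under a 20-posts-a-day account, and how any ordering that weighs how much the viewer cares about the author changes that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    minute: int  # minutes since midnight when the post was published


def chronological(posts):
    """Newest first: every new post takes the top slot."""
    return sorted(posts, key=lambda p: p.minute, reverse=True)


def ranked(posts, affinity):
    """Toy ranking (an assumption, not a real system): order by how much
    the viewer cares about the author, breaking ties by recency."""
    return sorted(posts, key=lambda p: (affinity[p.author], p.minute), reverse=True)


def position(feed, author):
    """1-based slot of the first post by `author` in the feed."""
    return next(i for i, p in enumerate(feed, start=1) if p.author == author)


# A publisher posts 20 times across the day, every 30 minutes from 08:00.
posts = [Post("publisher", 8 * 60 + 30 * i) for i in range(20)]
# A friend posts once, at 09:05.
posts.append(Post("friend", 9 * 60 + 5))

# Hypothetical viewer-specific scores: the viewer cares more about the friend.
affinity = {"friend": 1.0, "publisher": 0.3}

print("chronological slot:", position(chronological(posts), "friend"))
print("ranked slot:", position(ranked(posts, affinity), "friend"))
```

In this setup the friend's single post sits behind every publisher post made later in the day under chronological ordering, while the affinity-based ordering puts it first. Real recommender systems use far richer signals; the point is only that a guaranteed ordering rewards whoever posts most often.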

28:35-30:16

[28:35] And that also can encourage people to create less. [28:38] because it's just like a more mindless scrolling mode. [28:40] You guys are very concretely committed to doing things to prevent that kind of behavior. We have a lot of mitigations there in place. Yeah. [28:50] I think... [28:52] What it really comes down to me is just like, what are we trying to do as a platform? [28:55] And I think the magic of this technology is that everybody is a creator. [28:59] And so we want this feed to be optimized for you to create. [29:03] to inspire you to create. And that can be like... [29:06] Sometimes when you think of inspiration, you think of like, oh, it's this beautiful, crazy scene that's so elegant. When I think about that, I think about like a meme culture or something really funny or like, oh, that's cool. I've got a riff on that. And I think that's a very different brain mode. [29:20] when you're browsing the feed. [29:22] And of course we have lots of other [29:24] things in place. So I think it starts with incentives. Our incentive right here is to encourage more creation in the ecosystem. But there are certainly... [29:32] use cases we want to prevent. We're not going to get them right [29:35] all the time. It's very... [29:37] It's a very living system. It's also very hard to write a recommender system when you have no data and you don't know what to recommend. You don't know how the platform is going to evolve. [29:45] Um, [29:45] But that's like basically how I kind of think about the incentives of feed. [29:49] And then, Rohan, we have a lot of mitigations in place that I think you've been – [29:52] kind of like thinking about and maybe even more deeply than I have. [29:56] about like, [29:57] preventing maybe the extreme cases. And so, [30:01] I don't know if you want to talk a little bit. Yeah, happy to. But one thing before you I mean, just one thing to add is that the stated intent of like optimizing for creation is working really well. 
Yeah, it's almost 100% of people who like get past the invite code and all that on the app end up creating on day one.

30:16-31:46

[30:16] When they come back, it's like 70% of the time they come back they're creating, and 30% of people are actually even posting to the feed. So not just generating for themselves, they're actually posting into the ecosystem, which is an incredible testament to the model, how fun it is, and to how what we're optimizing for is actually working pretty well right now. But yeah, beyond that, I mean, one of the top-of-mind things is, I think we don't want this just to be [30:38] like a mindless scroll, and beyond just optimizing for creation in the ranking algorithm, there are things we can do, [30:44] like trying to just get you out of this sort of [30:47] flow state, um, [30:49] of just like consumption and push you into like creative mode. I think there's a great article on this called like the curvilinear nature of casinos where they design it so you never have to make any decisions. It's just like you walk in a circle, there's no windows, all that kind of stuff. Um, [31:03] we can be very intentional about not doing that. And like, you know, whether it's an in-feed unit that's like, [31:09] "Hey, you just kind of viewed a couple of videos in this domain. Why don't you try creating something?" Or other ways to just kind of like push you out of that. We actually have things like that in the product. [31:19] Yeah, those are some of the things that come to mind. I really commend you guys for what you've done to make sure that there's a version of the world where video model as world simulator could have just ended up with us each retreating into our own computer screens and just becoming addicted and just retreating into ourselves. And I think the amount to which you're prioritizing the human element and the social element, I think that the care you've put into that really shows. [31:44] I don't think we would have launched like...

31:47-33:17

[31:47] a feed of just like AI content that wasn't, that didn't have a human feel like just being [31:52] I don't think that excited us. And as soon as we, we like had the product, we had cameo and we had that feeling internally. Um, we were like, okay, this is actually a little different than, yeah. I don't think it was totally obvious. Again, it was like a pretty crazy sprint. [32:04] to go through this. It wasn't like, [32:07] Super obvious to us what would emerge, but [32:11] I think that the idea, it makes sense in retrospect, right? [32:14] but it was a completely not obvious product decision. The cameos would be the thing. Yeah. Um, [32:18] where it's like, of course, you just want to see your friends doing cool things. So that makes sense. But I was never actually that afraid of competitive pressure. [32:26] in that, that, [32:28] crazy product phase because I was like, we sort of had all these non-trivial decisions that are obvious in retrospect, but were not obvious at the time that we were sort of building on top of each other. It's like, okay, cameos. Well, there's also a version of cameo where you have a crazy – [32:40] flow that's just for you and it's a one player mode cameo and you like go through this onboarding flow and do your stuff. [32:46] But we were already seeing these interesting dynamics where it's like, oh, I could tag Rohan into my video. That's crazy. Like, and then we can have like an argument or like I'm going to have an anime fight. Doesn't matter. And I was like, okay, so that's, that's actually the human element. That's the, that's the magic of this is actually. [33:01] strangely more social than a lot of social networks, even though it's all AI generated content. [33:06] Very unintuitive. [33:07] Totally. [33:08] Is it a separate, is it fine tuned version of Sora 2 or is it like, is it a separate model from what's available over the API or is it the same? [33:15] Between the app and...

33:17-34:47

[33:17] Products. So we're currently exposing like the models in the same state across API and the app. Okay. Really interesting. What are you seeing people do on the API side? And is it different from the types of things people are doing on the consumer app? [33:30] The motivation behind even launching an API is just like support of these long tail use cases. Like we have this vision of enabling, you know, [33:37] a ChatGPT-scale consumer audience with this tech, but there's tons of very niche things out there. [33:44] You can imagine people who are much, you know, with Sora 1, we went out and talked to a lot of these studios. What we heard from them is like they want to integrate this in this specific part of their [33:52] stack in this specific way. And we'd love to support all these long tail use cases, but we don't want to build a thousand different kind of interfaces for this stuff. So that's the kind of stuff we're excited to see with the API so far. It's been, you know, it's been kind of those kind of like, [34:07] a little bit more of a niche company not trying to build like a first-party social app but maybe [34:11] um, [34:12] you know, has some either filmmaking kind of audience or kind of people they're supporting, or even just like, we've definitely, we've seen some like people trying to, [34:21] I think there was some company making... [34:24] um, [34:25] They were doing something with CAD where they were using Sora. Oh, Mattel. Yeah, yeah, yeah. Oh, that's cool. So there's cool use cases out there. I think we're still getting a sense of what they are. Yeah, I think there's a lot that can be done with these things. I think about gaming all the time just based on my background. AI and gaming is always a very controversial subject, but... [34:43] It's very clear that there's a place and there's a role. Maybe it doesn't...

34:47-36:24

[34:47] have to interrupt the creative process, can enhance it. And I'm pretty excited to see some of those use cases emerge. [34:53] Do you think the video models are good enough now for people to be able to build [34:57] video games on top of the API, or do you think we're still another rev or two away? [35:01] I have my own take on this. I was going to say, never bet against the ways people can be creative with technology to build. Like, [35:07] Someone will be able to build a game, and maybe has built a game already. Will it look and feel like a... [35:13] Obviously, there's latency with this model, so you'd have to do all sorts of crazy stuff to get around that. I think that your mind immediately goes to the obvious sort of things that you would do in gaming, and we've seen some of that sort of stuff happen, [35:24] certainly in research blogs and that kind of thing. My mind often goes to like, [35:28] Okay, this is like a creative tool that's a little bit different, [35:31] and the types of games that really excite me there... [35:34] I'll just go off on one, which is this, like, there's a game called Infinite Craft, [35:38] which is the world's simplest game. It's a web game [35:40] where you just take elements. It's like fire, water, earth. You have like four elements to start, [35:45] and you just drag them, and it combines into something new. [35:49] And the thing it combines into is like a, it's LLM based. So it's like, [35:53] Fire and Earth might be a volcano. And then volcano plus water might be... [36:00] an underwater volcano or Godzilla or something like that. You always end up in Godzilla for some reason. But that's a game that like, it's like, oh, it kind of makes sense where it's like, yeah, you don't really need a crafting tree. [36:12] The LLM can derive this crafting tree, and it's a process of discovery. [36:15] Um, [36:16] And so I think there's a lot of untapped stuff in that space where, again, I like the idea of a process of discovery. In fact,

36:24-37:59

[36:24] My philosophical view on LLMs and video models to some extent is that it is a process of discovery. These are all in the weights. You're just unlocking it with like a secret code, which is your prompt. And I love that. That is very magical. That was always in... [36:39] Gaming, that was the thing that excited me the most, was discovering something new. [36:43] Especially if it was a true discovery. It wasn't put there by somebody else. [36:47] Maybe they just enabled the mechanics around it. [36:49] I think there's a huge opportunity in that space of, uh, [36:53] of gaming, [36:54] when you think about games as just a different thing and like embrace this technology in a very different way. [36:58] It reminds me of how some of the earliest use cases for GPT-3 were kind of these text games. So it's different from how you think of a playable video game, but actually a lot of these mechanics are very game-like. Exactly, yeah. I think there's still constraints, and I think that's going to be the mechanism design. That's still very... [37:17] human. Like a lot of the early games with GPT-3, they're kind of like, yeah, it was fun for a minute. And then it kind of went off the rails. You're like, I don't really know what I'm doing anymore. But again, like this is sort of in some ways, Sora feels like a little bit of that where it's got a little bit of gaming [37:31] DNA inside of it where it feels very [37:34] fun and different and exploratory. So I like things like that, and I think there's going to be more use cases that we can't even think of. It's too creative. What are you guys seeing on the creative filmmaking side? Like, is that an important target market? Do you want to empower the long tail or do you want to empower the head, so to speak, of the creative market? [37:50] It's a really good question. [37:52] We've [37:53] benefited a lot from creatives who are really willing to like go all in on you know even like

37:59-39:31

[37:59] the early technology, like DALL-E 1, DALL-E 2, and really like, [38:03] help steer us along the path and [38:06] Like, I think it's important that we continue to, you know... [38:08] build things for [38:10] for those folks. And we are working on some things that are more targeted towards creative power users long term. At the same time, I do think AI is a very democratizing tool, right, at its best. And so [38:20] what's kind of beautiful about [38:22] the Sora platform in general, right, is whenever someone kind of strikes gold, right, you see one of these like beautiful anime prompts that like goes to like [38:29] the very top of the feed for everyone, like anybody can go and remix that, right? Everyone has the power to like build on top of that and like learn from [38:38] all of these people who come in with this incredible knowledge about [38:41] how to like really get the most out of these tools. And so I am really excited just to see the net creativity of humanity just increase as a result of this. But I think a big part of that is [38:51] continuing to empower people who are always at the frontier, which are these like [38:55] more pro oriented, like creator type folks. And so we want to keep investing in them as well. We've nerded out for a while, like almost a couple of years now about that vision of feature film length content. [39:08] Like, yes, you have these amazing cameos and shorter content, but at some point the individual creator has been something that you've been excited about for a very long time. Yeah. When do we get there? [39:19] Is there a point where we have a feature film [39:21] that is created on Sora 2? Yeah. And how do we consume it? Is it in the Sora app? Is it posted somewhere else online? Do you go to a movie theater and watch it?

39:31-41:02

[39:31] Yeah, it's a great question. I mean, I think this will happen in stages to some extent. So, like, if you guys watch the launch video, I mean... [39:38] That was made by Daniel Fraden, who's on the Sora team. And... [39:41] he already with these tools, right, is able to pump out these like incredibly compelling short stories, right? [39:47] within like days at most. I mean, he literally made that like all by himself in almost no time. And he's been like continuing to like put new ones out there on like the OpenAI Twitter since. Clearly, this is like massively compressing the latency that's associated with like filmmaking. I think to get to the point where like really... [40:06] Anybody can do this, right? Like any kid in their home can just like fire up the app or sora.com or something and go and make this. It's really like an economics problem of like the video models. Video is the most compute-intensive modality to work with. It's extremely expensive. [40:21] And we're making good progress on the research team, really continuing to [40:25] figure out ways to make this affordable for everyone long term. Like right now, for example, the Sora app is like totally free. [40:30] In the future, there will probably be ways where people can pay money to get more access to the models, just because that's the only way we can really scale this further. But, you know, I think we are not far off from this world where... [40:43] anybody can really like have the tools to make amazing content. You know, I think there's gonna be like a lot of bad movies that get created by this. But like, likewise, you know, there's probably the next great film director who is just kind of like, [40:53] sitting in their parents' house, like still in high school or something, and just like has not had the investment [40:59] or the tools to be able to like really see their vision come to life. And we're going to find like,

41:02-42:42

[41:02] absolutely like amazing things from like giving this technology to like the whole world. [41:05] I'm looking forward to the feature film length. Konstantine's Greek Odyssey. Yeah, me too. Coming to the theaters near you. It'd be a banger. We're all in it together, actually. Different characters. I play the Cyclops. [41:17] It's a good one. I think, just to touch on that one more thing, that... [41:21] Something I've learned from recommender systems over and over again is that... [41:24] Oftentimes, so the tools, getting people more creative is going to be a huge unlock for people, [41:30] just... [41:31] making people more creative in general because you don't need this access to this filmmaking equipment, all that sort of stuff. But we do consistently see that [41:39] content is like also a social phenomenon in a way. And like, [41:44] movies and all that, everything you see out there is kind of a bit of a social phenomenon, in addition to the actual content itself. [41:52] And so I think we're going to enter a very interesting world where [41:55] There's so many people creating and so much content out there [41:58] that [41:59] even the idea that people are paying attention to and watching it is going to become more and more important. And I think that's actually going to make... [42:06] the quality of content just kind of elevate because there's this, anybody can create, [42:10] And actually, it's going to be the consumption that's going to be quite limited, which is very different than the world we live in today. [42:16] You guys are very thoughtful and intentional about how you treated IP holders. Can you say a word on that? [42:21] You know, [42:22] We've been... [42:23] in close partnership with like a bunch of folks across the industry and like really trying to [42:28] like both show them kind of this like new technology, right? That is actually like a huge value proposition, um, for rights holders across the board. Right. 
And like, we're hearing so much excitement from the folks we're talking with, like they really see this as being like, you know, a new frontier, um,

42:42-44:22

[42:42] for... [42:42] Again, like, you know, [42:44] every kid in the world having the ability to go and use some of this beloved IP and really bring it into their lives in a way that feels much more personal and custom than what's been possible before. At the same time, you know, we really want to make sure that we're doing this in the right way. So we've been really trying to take feedback and really steer our roadmap in a way where we know that, you know, both users are going to have an awesome experience getting to use this IP, but also the rights holders are going to get, you know, properly monetized and rewarded [43:12] in a way that everyone wins basically. So we're right now actively working on trying to scope out the exact details about how we're going to [43:20] you know, for example, make it so if you want to cameo your favorite character from some like beloved film or something, you can do that in a way where you have access to it. But like monetization will flow back to the rights holder. Right. So really trying to figure out this kind of like new economy for creators. We kind of have to create this from scratch right now. There's a lot of deep questions about how to do this the right way. [43:41] And, you know, as with everything with this app, [43:44] we come into it with an open mind and we hear feedback and we iterate quickly. You know, we're not sure where this is going to totally converge. [43:50] But we're working closely with people to figure it out. [43:52] Really cool. [43:53] What's ahead? [43:54] Pets. Yeah. I think, I mean, one... Sorry, what? Pet cameos. Cameo your pets. Is that one of the most demanded features? Great breaker. Bill, me it is. Bill's demanding. I will remind us we were just talking about curing diseases and world models, and now we're to the future. Pets, yeah. This is something... No, it's actually... So that's definitely true. We've committed to that. It's coming. But we have... 
I promise. The...

44:22-45:53

[44:22] We actually had Bill's dog when we were playing around with this, Rocket. The goodest boy. Yeah. And actually it was very, very cool to actually feature a pet. You can imagine where that goes. It doesn't have to necessarily be a pet. It could be anything, a clock or whatever you have. Clock. Well, yeah. [44:42] You have a special clock. Actually, it's really compelling. I didn't think it could be so compelling until Thomas showed me this clock. It was like a sentient clock. It's based on a real clock. [44:52] father. [44:53] My father was a technology person for a while. This company, Veritas, gave him a clock for his, like, whatever anniversary. Anyway, so I have it on my... [45:02] my, uh, [45:04] table somewhere. [45:06] there's this old Simpsons episode where they talk about a walking clock. And for some reason that's just been an earworm in my head for the last 30 years. And so I always, it's like, you know, they're telling some joke and it's like, [45:16] is there a walking clock? Is there a walking clock? It's like, walking clock? And then it's like, no, man, it's my dog. And so it connected in my brain where I was like, okay, Rocket, walking clock. And then so I tried it. Thomas is the dyslore. This is what AI enables. Yeah, so it connected in my brain, and we've been playing around with this just to see if we can get it to work and whether there's something special there, [45:38] which is part of the fun of being on the Sora team, is you get to play with this emergent crazy technology and maybe it does something... [45:45] You wouldn't even have expected. So I recorded a two-second video of my clock, and then I gave it some cameo instructions.

45:53-47:45

[45:53] And I said, you're just a walking clock. You're a walking clock. You talk like you talk. You're a character. And then I generated my first video, and it was insane. It was crazy. It was a walking clock. And then I had one where it was talking to Bill. [46:06] And Bill was like, I didn't think it would ever land, the pet cameo feature. And then walking clock's like, here I am. I just landed. So it's coming. It's all internal memes. Talk about emergent IP. Who needs Pokemon when you can have a walking clock? What's the greatest IP? One thing to add in terms of the feature. I think on the feature film question, something I think about all the time is like, what? [46:30] what will that actually look like? I think my, I mean, caveat, Bill's the only one who's good at predicting the future here. But my sense is that the... [46:40] As we get to longer... [46:41] forms, [46:42] our equivalent of a feature film will look and feel very, very different from what a feature film is today. You know, I don't know exactly what that looks like, but I think on the subject of [46:52] creators and what's coming in the world, I think [46:55] a new medium and a new class of creators. New class could include a lot of existing creators and, and support existing sort of mediums and stuff like that. But I think, [47:04] We're just in the early innings of... [47:06] of what I imagine will be the next film industry, rather than thinking about this being a feature film. But I think there'll be something new. There's some anecdote. I hope this is true because I say it all the time. But apparently when the recording camera... [47:18] like, you know, hit the world, the first thing people did was record plays. [47:23] This is like the least interesting thing you could do with a recording camera. It's like, what's the big idea? Oh, people don't have to travel around acting. We can just film them and distribute it. 
And then someone was like, wait a minute, we can make a film and film in all these different areas. And I feel like we haven't, we're in like the first inning of so many different sort of things that people will do with this technology, especially as the constraints change with latency and length and all that kind of stuff.

47:45-49:17

[47:45] So cool and fun film history nerd fact is one of the original videos, and we should check this as well, but I think the original video... [47:56] was made just down the peninsula [47:58] to settle a bet [47:59] on whether a horse... [48:01] when it galloped, all four legs [48:03] left the ground. And I could see a world where you have new... that is an example of new scientific discovery. [48:10] People didn't actually have an answer to that. Now that you have a new simulation format, [48:15] What are we going to be able to discover in that? [48:18] It will be crazy. I think one... [48:20] One broader point here is, you know, this app right now feels very familiar in a lot of ways, right? It's like a social media network at its core. [48:28] But fundamentally, like the way... [48:30] that we really view it internally, right? [48:33] With Cameo, we've kind of introduced the lowest bandwidth way [48:37] to give information to Sora about yourself, right? Aspects about your appearance, about your voice, et cetera. [48:43] You can imagine over time, [48:45] that like that bandwidth will greatly increase, right? So the model deeply understands [48:50] your relationships with other people, [48:52] It understands more than just how you look on any given day. It's seeing your full, like how you've grown up, all of these details about yourself. [49:00] And... [49:01] will really be able to almost function as like a digital clone, right? So there's really a world where the Sora app almost becomes this like mini alternate reality that's running on your phone. [49:08] You have... [49:09] versions of yourself that can go off and interact with other people's digital clones, [49:13] You can do knowledge work. It's not just for entertainment, right? And it really evolves

49:17-50:47

[49:17] more into a platform, which is really aligned with kind of where these like world simulation capabilities are headed long term. [49:22] And I think when that happens, the kinds of emergent things we will see are crazy. And, you know, for OpenAI across the board, [49:29] It's really important that we kind of like iteratively deploy technology in a way where we're not just like dropping bombshells on the world when there's like some big research breakthrough. We want to [49:36] co-evolve society [49:38] with the technology. And so that's why we really thought it was important to like do this now and like do it in a way where, you know, we've hit this again, this kind of GPT-3.5 moment for video. [49:46] Let's make sure the world is kind of aware of what's possible now. [49:49] And also, you know, [49:51] start to get society comfortable in figuring out the rules of the road [49:54] for this kind of like longer term vision for where again, [49:58] There are just copies of yourself running around in Sora and the ether, [50:02] like just doing tasks and like reporting back in the physical world because that is where we are headed long term. [50:06] So cool. So you're building the multiverse. [50:09] Actually, kind of, yeah. Okay. Well, can mini-me go and find my soulmate somewhere in there? I mean, anything is possible in the multiverse. [50:17] That's a call to action, everyone. It is kind of crazy, though, because now I'm going to sound totally cuckoo, but if we're in a computed... [50:25] you know, [50:26] environment, you're building the perfect simulator. [50:30] That kind of is the way you ultimately understand and break out of the computer environment, right? Like, are we getting closer to the heart of the matrix? We're in the matrix. We're in the matrix. Some very deep existential questions. Yeah, yeah. Uh-oh. What's your guys' P of we're simulated? [50:43] Like, this is all... Rising. [50:46] Yeah, me too.

50:47-52:20

[50:47] What's your P? I'm low. Oh, man. Yeah, it's okay. You're really okay. I respect that. I'm just a believer. I'm just like, you know what? [50:54] Sometimes it's got to be real. Yeah. I feel like I'm not like... [50:58] Solid 60%. I don't know. Like more likely than not at this point. I'm there too. [51:03] Well, yeah. [51:05] Zero. Zero? Should we make a Kalshi on it? A trivially small. How do you settle that? What's the oracle? [51:14] Sora 10 will answer. Sora 10, yeah. [51:18] What do you think are the theoretical limits [51:20] to Sora? [51:21] Yeah, it's actually a great question. [51:24] I thought a little bit about this. Like, I think there's, like... [51:27] a question, can you eventually simulate a GPU cluster in Sora or something? And I assume there are some [51:34] very [51:35] well-defined limits on like the amount of computation you can run [51:39] within [51:41] one of these systems, like given the amount of compute you're actually running it on. Um, [51:45] I've not thought deeply enough about this, but... [51:47] I think there are some like [51:49] There's some existential questions there that need to get resolved. Yeah. Yeah. See, that's why his P(sim) is so high. [51:56] Fascinating. [51:57] Well... [51:57] Got a few lightning round questions for the team that we just kind of generated on the fly here, [52:02] and take your time. [52:04] Jump in whenever you have an answer. Your favorite cameo on Sora to date, [52:09] and what happened. [52:11] That is so tough. [52:13] I have a hot one. Go, go, go. [52:17] Shocker. Okay, so there was this TikTok trend.

52:21-53:51

[52:21] of, [52:22] and I got obsessed with them, I don't know why, these Chinese factory tours where they're like, hello, this is the chili factory. They get like one like, and it's me. And it's like they're showing their chili factory. And they're like, it's the chili factory. This is amazing. Or like there's an industrial chemical one. [52:41] Oh. [52:42] I've lost the name, but there's an industrial chemical factory. [52:47] And the first day I had my cameo options open just because I was like, I just want to see what happens. [52:55] The first day, late at night, [52:57] I opened my cameos and I was starting to get tagged in [53:02] factory tour [53:04] cameos that were all in Chinese. And I was like, [53:08] I'm in the chili factory, and I was so excited. I got zero likes. I liked it. It was just me. But I was like, I'm the chili factory guy now. I'm like doing the ribbon cutting at the chili factory. Amazing. That's too deep of a cut, though. Congratulations. Fun fact, I actually have done Chinese factory tours in real life, and they are truly epic. [53:30] There's this one I just saw of Mark Cuban in jorts dancing around. That was pretty good. That got me. That was good. [53:37] But I mean, more back to, like, just scrolling the latest feed and just seeing the wholesome content of people doing things with their friends is actually, I think, what brings me the most joy. They're not like super liked, but it's like people just getting a lot of,

53:51-55:22

[53:51] you know, value obviously from just like making videos with their friends. So. Sam has so many bangers. I like the one of [53:58] him doing like this K-pop dance routine about like GPUs or something. It's very good. Actually, I would put it on my Spotify if like we had the full song. Wow. It's very good. It's like generated by Sora. It's like very compelling. Yeah. [54:11] All right, well, that leads to the next one because you mentioned Spotify. [54:14] What does an AI, a fully generated AI, win first: [54:19] Oscar, [54:21] Grammy, [54:23] Emmy? [54:23] I think the logical answer is a short winning an Oscar. Yeah. I think that's probably right. What would we win it for? [54:31] Like for like a jorts... Yeah, the jorts. It'll be the jorts. The jorts trilogy. Yeah, yeah. We need new content. Yeah. I do think if people stitch things together in an interesting way, yeah, yeah, I think you can actually start to make some very compelling storytelling in that. And, um, [54:47] I don't think it's like, it doesn't really feel like AI anymore, the content I'm seeing. Like, that was actually something I noticed with Sora as well. Just like throwing it, I wasn't even noticing it was AI. [54:56] Um, it was just kind of interesting content. That's a more interesting question: what will we know? Oh, yeah. [55:02] Maybe it's already happened. Maybe it's already happened. I feel like for Oscars, one of the cool things that'll be unlocked is [55:11] this long tail [55:13] of epic stories in history, [55:15] stories of heroism and struggle and all of these things that have been locked up because of the cost of creating.

55:23-57:01

[55:23] As a history [55:25] enthusiast, I cannot wait for AI to unlock all of those stories. Have you seen the Bible video app? No, I haven't. Oh, it's really good. I'll show it to you after. Like perfect example. Yeah. [55:36] Or there's this movie, The Last Duel, a few years ago about this really [55:41] terrible crime that was committed in medieval France that was historically relevant and, [55:47] you know, basically says a lot about humanity. And it just got picked up because eventually Hollywood picked up this important story about humanity. But how many more are there in human history? That's going to be really cool. [55:57] Um, [55:58] favorite character from any film or TV show? [56:02] I have a really random one. Go for it. You guys seen Madagascar? Oh, yeah. King Julien. Oh. He's played by Sacha Baron Cohen. He's a lemur. He's a lemur. Yeah, absolutely. It's just like... It's a banger. His humor meets [56:17] kid-friendly storytelling. It's just perfect. I play a lot of video games, so I mean, your classic answer is gonna be like Mario or something like that. Although [56:25] I'll do the deeper cut of, we were always joking, PaRappa the Rapper. Yeah, PaRappa the Rapper, the old PlayStation game, one of the original rhythm games. [56:33] And it's got a great artistic style, and it's got a great IP of just this little dog. What is he? A dog. He's a dog, yeah. [56:39] Yeah. [56:40] That's a good pick. When I was [56:42] a kid, I played the, like, Pokémon trading card game competitively for a while. So I was, like, really in, like, the Pokémon rabbit hole. So, like... [56:51] I don't know. Starlight. Pikachu. Pikachu. Or Starlight. Starlight. Pikachu. Mudkips. Super non-consensus. Like a fringe. Deep cut.

57:03-58:33

[57:03] Okay, first world model scientific discovery, [57:06] most specific possible. Obviously, you're not going to [57:10] say the discovery. [57:11] I suspect it will be something related to classical physics. Like a better theory of turbulence or something. [57:18] That would be my guess. [57:19] I was guessing there was going to be something like that. I was like, yeah, Navier-Stokes. I don't know. Yeah, some fluid dynamics thing that's maybe hard to understand now. There's a lot of like unsolved kind of problems there. I think sometimes they call it like continuum mechanics, where it's like in between [57:32] and we don't have good models of them. [57:35] Something that lends itself to simulation... [57:37] Just like the amount of iterations you can do of a simulation unlocking something, which I don't, yeah, [57:43] something in that realm. [57:44] The last thing we'll be able to accurately simulate. [57:49] I do think there's like a set of physical phenomena for which [57:52] video data is like a poor choice of representation, right? So like, for example, [57:58] is it really [57:59] efficient [58:00] to learn about, you know, [58:03] like high-speed particle collisions or something from like video footage? Maybe. Um, [58:08] I really think video is at its best when, you know, the phenomenon that you're trying to learn about is just natively represented, uh, [58:17] in the physical world. And so when you need to do like, you know, like quantum mechanics or [58:21] some other discipline where [58:24] it's more theoretical, we don't have video footage beyond... You can't see it. Yeah, things that we've manually rendered for educational purposes. [58:32] It feels

58:34-59:54

[58:34] like a weaker medium for understanding those things. So I suspect those would come last. I guess it's the things we don't have sensors for. Right. [58:39] Right. [58:40] Maybe the last things we care to simulate is another way of thinking about the answer. I don't know. I mean, people aren't doing much with smell right now. True. [58:50] Greenfield. I've been meaning to tell you about that. It's kind of awkward. [58:54] We're still trying to figure out how to simulate Thomas with bad hair. Oh, yeah. It remains an unsolved problem. Not even Sora can do it. Thomas' hair flow, just general. Guzzling, ketchup, yes. There was a good round of people being bald. We were all doing bald. Oh, yeah. And this is crazy. Bald gens were good. Actually, kind of cool. That's a use case that... [59:12] I don't really talk about it very much, but it's like... Visualization? Yeah, everybody wants to be bald. No, it's just like you just see yourself in some different context. [59:20] I think that can be quite powerful, even like therapeutic in some ways, where you just like see yourself in some context that you either want or don't want [59:26] yourself to kind of be in and just see yourself. It's a real use case. Yeah. Yeah. [59:31] Guys, thank you so much for coming. From space-time tokens [59:35] to object permanence, [59:37] world models that will enable scientific discovery, [59:41] the democratization of creation, [59:44] all the way to walking clocks, you guys have covered it all. Thank you so much. [59:50] The future is being created by you. [59:52] Thanks, Konstantine. Thanks, Sonya. Thank you.

