Q1 2024 State of Venture Update
From Devin to Robotic Foundation Models: AI's Transformative Potential
I apologize for the delay in this update. To be honest, I was waiting for some data to be released and was curious to see if GPT-4.5/5 would make an appearance in late April. Unfortunately, there doesn't seem to be a significant change happening in that regard. However, I am glad I waited, as a flurry of model releases in April made things interesting.
In notable news, both InflectionAI and StabilityAI appear to have stumbled, with the former effectively acqui-hired by Microsoft. While both products may continue to exist in some form, I believe these situations highlight an obvious reality: there is a lack of durability in the value of Large Language Models (LLMs); they are quickly depreciating assets. In this sense, pure LLM companies seem to be following a path more akin to memory manufacturers: high R&D costs for sustaining investments, low switching costs, and long-term low margins. This is a shift from my previous mental model, where I thought they would more closely resemble real estate investments in growing areas, characterized by high capital costs but long-term steady cash flows and persisting terminal values. (This doesn't mean foundation model companies will be valueless; rather, I think much of their durable value will come from their services, software, infrastructure, and tooling.)
Personally, it's been a delightful quarter in San Francisco. There is an abundance of optimism in the community centered around AI and all the new possibilities emerging. It reminds me greatly of being here during the 2012-2014 period. Far fewer tourists are coming simply because "tech is hot." Instead, young founders and engineers are flocking here because they are passionate about technology and the new opportunities that LLMs are creating. It's a welcome change after the COVID lull to witness the rebirth of tech in the Bay Area. I feel blessed to be a part of it and to experience this pivotal time in human history—in many ways, it must be similar to what our predecessors, like the Traitorous 8, felt during their platform transition.
VC Market Update:
Happy New Year! It's 2021! Well, at least for some of the market. The growth market has been a tale of two markets: the haves and the have-nots. If you're a fast-growing company in AI or showing signs of category leadership and capital efficiency in another area (such as SaaS or defense), it's like 2021 again. The Information had a good article outlining this, noting that the top 8 LLM-oriented companies have raised at an average multiple of 83 times their Annual Recurring Revenue (ARR).
This intersects with a lot of what we're seeing -- and in fact, some of what we see in AI is actually crazier than that.
In regular Software-land, we are also seeing some fairly lofty valuations, with growth rounds for efficient and fast-growing companies pricing at 40-80 times their revenue in some cases (I won't name them)! These rounds are drawing multiple term sheets, often 6-10 of them. I believe this is a result of the lull we've seen over the last two years and the overabundance of venture capitalists. If you hire investors, they, well, invest. When growth has been scarce for two years, there are a lot of people sitting on their hands who will move, and move quickly. I said 2024 was likely to be a great year to fundraise, and I think that holds true.
However, make no mistake, this is not the case for everyone. When I pressure-tested these thoughts with some trusted entrepreneurs, not all of them saw what I was seeing. If you aren't AI-adjacent and don't have particularly fast or efficient growth, it can be rough out there. In normal cases, your valuation multiple has adjusted down to reality, and it can be challenging to raise funds. But at least, in contrast to 2022/2023, there are some significant pockets of life.
What's amusing, though, about the 2021-ification of part of the growth market is that it's significantly more absurd to be doing this today than it was in 2021. At least in 2021, the public market multiples were high; today, they have returned to the old normal. As I mentioned before, when there is an excess of capital, the remedy is that capital must be destroyed!
Macro
To start my Macro update, I will point out that my previous inflation prediction proved accurate more quickly than I anticipated. The market is now pricing in just 1-2 rate cuts this year, with a 20% chance of no cuts at all. This shift in expectations is due to weak GDP data and persistently high inflation. GDP growth for Q1 came in at a disappointing 1.6%, significantly below the expected 2.5%. Meanwhile, inflation remains elevated, with CPI at 3.5% and Core CPI at 3.8%. The Fed's preferred inflation measure, PCE, rose by 3.7%, although I don't put as much stock in this metric.
The key takeaway is that inflation has proven to be quite sticky, and with growth slowing down, we may in the future face the risk of stagflation. Stagflation is a challenging economic condition characterized by low growth and high inflation, which limits the tools available to policy-makers. In this scenario, the only way to combat inflation would be to deliberately trigger a deep recession, a course of action most would prefer to avoid. We're not there yet, but it will be a risk in the future.
Why has inflation proven to be more persistent than most analysts anticipated? As I mentioned in my previous update, several factors contribute to our current inflationary environment: monetary, fiscal, global and even structural (deglobalization). These factors combine to create a "perfect storm" for inflation, making it more challenging to control and likely to persist for longer than initially expected.
Despite stagflation indicators, the markets have shown surprising resilience. The 10-year bond yield has risen significantly, from around 3.9% at the beginning of the year to 4.7%. In spite of that sell-off (bond prices move inversely to yields), the NASDAQ remains near all-time highs. This is an unusual situation, as growth stocks are generally inversely correlated with interest rates. When rates rise, growth stocks tend to underperform, as their future cash flows are discounted at a higher rate.
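To make that last mechanism concrete, here is a tiny illustrative calculation (the $100 cash flow and 10-year horizon are hypothetical numbers, not a claim about any particular stock): moving the discount rate from roughly where the 10-year started the year to where it sits now shaves about 7% off the present value of a far-out cash flow.

```python
# Illustrative only: how a higher discount rate compresses the present value
# of far-out cash flows -- the dynamic that normally hurts growth stocks.
# The $100 cash flow and 10-year horizon are hypothetical.

def present_value(cash_flow: float, rate: float, years: int) -> float:
    """Discount a single future cash flow back to today."""
    return cash_flow / (1 + rate) ** years

cf, years = 100.0, 10
for rate in (0.039, 0.047):  # roughly the 10-year yield at the start of the year vs. now
    print(f"PV of ${cf:.0f} in {years}y at {rate:.1%}: ${present_value(cf, rate, years):.2f}")

# Prints ~$68.21 at 3.9% vs. ~$63.18 at 4.7% -- a ~7% haircut from the rate move alone.
```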
I believe the strength of growth stocks can be attributed to an increase in liquidity, driven by the draining of the Fed's Reverse Repo Facility (RRP) and the Treasury General Account (TGA). As these facilities are drawn down, the excess liquidity is finding its way into the stock market, propping up valuations. I expect this liquidity to persist for most of the year, supporting growth stocks even in the face of rising interest rates.
Bio
The most significant development in the Biotech sector this quarter was the highest level of public market equity issuance since 2021. This indicates that public markets are becoming increasingly receptive to Biotech companies, marking a substantial improvement from the past two years. Although we haven't observed significant changes in the private market yet, this increased public market activity is a promising sign for the future of the Biotech industry.
AI - Robot Foundation Models
This quarter, several AI developments stood out as particularly significant. While there is a plethora of activity in the LLM space, I want to first focus on the innovations happening in the robotics market.
While current language model innovations will undoubtedly be disruptive to the knowledge economy, I believe the greater disruption will occur when robotics revolutionizes labor and the physical world. This quarter, the robotics market made several advances through the funding of Robotic Foundation Models (RFMs) — pretrained, generalized models that can embody physical intelligence and be agnostic to form factors. Although we are still far from seeing these models deployed in the field (perhaps 5 years away from a GPT-like moment), the progress is nonetheless impressive.
To give you a sense of what I mean, take a look at the video from my friend Ilija in the following tweet:
He trained a transformer (mostly) in simulation and deployed it to the real world zero-shot. Pretty cool!
There is a lot of work in this direction, which I'll summarize below:
Physical Intelligence raised $70m - Scarce info, but developing foundation models for physically-actuated devices. A video of Chelsea’s work on ALOHA is here.
Skild AI is rumored to be raising $300m - Focused on developing RFMs for quadruped and bipedal robots. The founders, who come out of CMU, seem to be very strong. Deepak posted a public video that may show a bit of what they're working on.
Figure raised $675m - They make humanoid hardware but are rumored to be working on a model too.
Covariant launched RFM-1 - Covariant released a multi-modal model which can be used to teach and task the model to output action instructions for arbitrary bin-picking.
Tesla Optimus to Launch by Next Year - Optimus is also both hardware and software. I'm not positive they are training a foundation model, but rumors suggest they are. Video here.
NVIDIA Announces Project GR00T - NVIDIA jumped into the ring with GR00T, where they will train foundation models for humanoid robots. It seems they will train it heavily in simulation, which could benefit from their Omniverse platform. They also announced an SoC for robotics, Jetson Thor, suggesting they are investing heavily in the physical world.
There's slight variation in approach (humanoid-focused, manipulation-focused), but the ambition is real: models to automate physical work.
These models can potentially have a greater impact beyond just labor. There's an ongoing debate in the AI community about whether LLMs could truly develop higher-order reasoning or AGI (Artificial General Intelligence) based on language alone. The argument is that much of the knowledge embodied in our minds is physical, not just conceptual. As children, the first things we learn are rules about the world around us, not higher-order concepts or tasks. A thought-provoking podcast conversation between the renowned researcher Jitendra Malik and Pieter Abbeel delves into this debate.
In my view, RFMs are going to be big; the question is one of timing—will it take 5 years or 10 years? My bear case is that these RFMs will, over time, "only" eliminate the need for most physical labor in society. The bull case, on the other hand, posits that RFMs are the path to achieving AGI and higher-order reasoning.
AI - The real meaning of Devin
For those who missed it, there was a particularly interesting launch this quarter of a company named Cognition. They released an autonomous coding agent that they described as the world's first AI software engineer, called Devin. If you haven't seen it, I think it's worth watching the quick launch video they provided.
The company had one of the most impressive launches I've ever seen for a waitlisted product, making it a household name in the developer community overnight. These types of extreme hype-cycle launches always make me skeptical, and I started off feeling that way about Cognition as well.
However, after trying Devin, I have to say it is incredibly impressive and an important proof of concept for the future of AI-driven application software. I know that's a pretty bold statement to make, especially for a product that most people can't try yet. But I thought I'd share my surface-level observations on what I think Devin does well in advancing the field of AI-enabled application software and how it has illuminated the path forward for software design.
First, I think calling Devin an "AI Software Engineer" is actually a misnomer. I definitely believe this is the ultimate vision, and that it'll get there eventually, but in many ways, the core of Devin's success lies in its design as a system that harnesses man-machine symbiosis. In our tests and by design, Devin is incredibly powerful but does have limitations. We only tried it on net-new software/scripts, whereas most software engineering involves maintenance rather than creating entirely new code. Requirements are input to the system via prompt, whereas in reality, they often encompass product requirements documents (PRDs), other spec documents, and Figma mocks. Lastly, Devin often doesn't get projects perfect; in most tests, Devin wrote a significant amount of code and got things mostly right, but rarely exactly right.
So, what is awesome here?
First, even if Devin isn't perfect in isolation, you have to recognize that the capability slope is amazing. In well under a year, the team has created enough tooling to beat other models by at least 3x on the SWE-bench benchmark. That's pretty incredible. Imagine how good this will get with a few years of continued development.
Next, I believe the design of the system to enable man-machine symbiosis is part of its genius. Devin exposes a very intuitive interface that allows the human engineer to easily see what Devin is doing and correct it in real time. On the left side is a chat interface where you communicate with Devin, and on the right is your control panel/canvas, which lets you see what Devin is doing and provide feedback to steer him back on track when he does something wrong or gets stuck. The UI serves as your command center. Devin is your pair programmer rather than a standalone engineer. The human acts as the grader, while Devin acts as the writer. This paradigm is likely to be adopted by most high-performance AI apps in the near term, where the models we use are mostly right but not exactly right.
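To make the grader-writer pattern concrete, here is a minimal sketch of the loop I have in mind. To be clear, this is my own illustration, not Cognition's actual API; every name below (`Feedback`, `run_session`, the callables) is a hypothetical placeholder.

```python
# Minimal sketch of the "human as grader, AI as writer" loop.
# All names are hypothetical placeholders -- this is not Cognition's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Feedback:
    done: bool = False              # grader says the task is complete
    correction: str | None = None   # grader's steering note, if any

def run_session(
    propose_step: Callable[[str], str],     # writer: drafts the next edit/command
    execute: Callable[[str], str],          # sandboxed workspace runs it
    grade: Callable[[str, str], Feedback],  # human reviews the action and its result
    task: str,
    max_steps: int = 50,
) -> None:
    context = task
    for _ in range(max_steps):
        action = propose_step(context)
        result = execute(action)
        feedback = grade(action, result)    # the "command center" moment
        if feedback.done:
            break
        if feedback.correction:             # steer the writer instead of restarting
            context += f"\nHuman correction: {feedback.correction}"
        context += f"\nResult: {result}"
```

The point is simply that the human never writes the code directly; they review, approve, and redirect each step.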
Lastly, I believe Devin demonstrates what the future of interesting "thick-layer" LLM-enabled applications will look like. To make the system work, Cognition had to build a lot of supportive multi-agent tooling and give Devin enough extensions to plan and reason properly and to interact with the outside world (e.g., posting on developer forums, searching codebases for context, etc.). This contradicts my previous update, where I thought integrating LLMs into applications would be straightforward, requiring only smart prompting. In reality, harnessing the true power of LLMs might require building fairly complex tooling around the model. This insight is quite profound, as it opens the door to much more potential for a) disruption of incumbent systems that fail to get this tooling right, and b) an archetype of AI company where there is a great deal of complexity/defensibility that "looks" a lot like a typical SaaS investment.
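As a rough sketch of what I mean by "thick-layer" tooling, consider the shape below: the model call itself is one line, and nearly all of the engineering lives in the tools and scaffolding around it. Again, these tool names and the `call_llm` placeholder are my own assumptions for illustration, not anyone's real stack.

```python
# Hypothetical sketch of a "thick-layer" agent: the LLM call is one line,
# and most of the system is the tooling wrapped around it.
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model API the application uses."""
    raise NotImplementedError

# Extensions that let the agent act on the outside world (all stubs here).
TOOLS: dict[str, Callable[[str], str]] = {
    "run_shell": lambda cmd: "...",          # sandboxed shell execution
    "search_codebase": lambda query: "...",  # retrieve relevant files for context
    "browse_web": lambda url: "...",         # e.g., read a developer forum thread
}

def agent_step(goal: str, history: list[str]) -> str:
    # Ask the model to pick a tool and an argument; the parsing, sandboxing,
    # retries, and memory management around this call are where the real work is.
    decision = call_llm(f"Goal: {goal}\nHistory: {history}\nReply as 'tool: argument'.")
    tool_name, _, arg = decision.partition(":")
    observation = TOOLS.get(tool_name.strip(), lambda _: "unknown tool")(arg.strip())
    history.append(f"{decision} -> {observation}")
    return observation
```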
In one fell swoop, Devin has rejuvenated the multi-agent space, which until now was quite unpopular. I believe there's a real opportunity for people to create these types of helper-AI systems in every domain. Wherever there is knowledge work, I think there will be grader-writer systems that enable man-machine symbiosis to give humans superpowers. In most domains, it will be a race to see if incumbents can build these systems before new entrants bring them to market.
AI - Model convergence
This quarter was a very important quarter for language models. I'd argue it was the most significant quarter since the ChatGPT release. We saw notable releases including Databricks' DBRX, MistralAI's Mixtral 8x22B, Cohere's Command R/R+, Meta's Llama 3, and interesting models from RekaAI.
Most notably, I believe the perceived infallibility of OpenAI was challenged, as for part of the quarter, Anthropic's Claude 3 Opus outperformed GPT-4-turbo, perhaps marking the first time OpenAI was not the clear leader (this has since been resolved, as it seems GPT-4's performance had temporarily degraded mid-quarter). I think we'll see a new release from OpenAI soon (possibly GPT-4.5) that provides a significant performance boost, but it's certainly true that their competitive edge has eroded.
While it's arguable which model is best right now, it's not arguable that everyone is getting *really* close to one another, including Llama 3-70B, which is actually open-source. This widespread convergence of models has happened much, much faster than I anticipated, and I think it really brings into question the durability of value captured by the LLM layer itself, especially since switching costs are very low.
There are also questions about the nature of competition and who will ultimately lead the LLM race. The traditional view is that we are in a world where all that matters is data and compute; those with the most capital (the hyperscalers, or those who can raise seemingly infinite amounts) are likely to meet at the top. Reka, model #13, may be an interesting counterexample: they have publicly raised only $50m but are competing with much better-resourced competitors.
I have a sneaking suspicion that, over the coming decade, more novel mid-quality data and ever-greater training compute will not be what drives the major gains in creating the best LLMs. Some scale will be table stakes, especially with current architectures; but beyond the fact that quality improves only logarithmically with scale, we are already discovering more compute-efficient ways to train models, and that trend will likely continue. This makes intuitive sense to me; if we consider how humans learn, we don't need to read the entire internet to acquire knowledge. Similarly, on the data side, we probably already have all the data necessary to learn most human knowledge; the frontier likely lies in either making better use of existing data or creating synthetic data from high-quality sources to enhance learning. I bet the future step-function advances will be algorithmic -- novel architectures, tooling, and compute optimization will reign supreme and eventually increase competition further.
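For context on the scaling point, the commonly cited empirical scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022) fit loss as a power law in parameters and data with small exponents, which is why each constant gain in quality demands a multiplicative increase in compute:

```latex
% Chinchilla-style scaling law: loss as a function of parameter count N and
% training tokens D. E is the irreducible loss; \alpha and \beta are small
% fitted exponents (roughly 0.3), so doubling scale buys only a modest gain.
L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
```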
Now, it's certainly possible that proprietary data sources will help drive some differentiation. I can see the argument for a network effect driving better fine-tuning via Direct Preference Optimization (DPO) as a potential layer of proprietary performance enhancement, or for high-quality proprietary data sources in general being used to build custom models.
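For reference, the DPO objective from Rafailov et al. (2023) trains the policy directly on pairs of preferred (y_w) and dispreferred (y_l) responses, which is exactly why the scarce, defensible asset in that argument is the preference dataset itself rather than the model:

```latex
% Direct Preference Optimization loss (Rafailov et al., 2023).
% \pi_\theta is the model being tuned, \pi_{ref} a frozen reference model,
% and (x, y_w, y_l) are the prompt, preferred, and dispreferred responses.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```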
But on balance, given the speed of model capability convergence and the ease of switching, it appears that the LLM layer will eventually be a commodity product. True convergence probably won't arrive in the short term, while there is still lots of iteration on architecture and tooling to drive outperformance; but as all of that stabilizes, LLM capability alone won't drive much profit. That is, unless a current player (OpenAI?) makes significant advances at the tooling or architecture layer that are not broadly disseminated.
Despite these competitive considerations, the LLM funding market continues to thrive, with Open Source Mistral apparently raising funds at a $5 billion valuation even after the release of Llama 3! I don't quite get the mania -- especially since Meta has made it clear they're willing to play in Open Source for a long time.
Conclusion
I conclude this update feeling both optimistic and humbled. The technological frontier is evolving at a rapid pace, and many of the beliefs I held in the previous update are no longer true. This is partially due to unpredictable changes and partially a result of further reflection on my part. In either case, I'm going to strive to be honest and unattached to prior views when updating my perspective. To be a great innovator, I don't believe you always need to be right, but you must recognize when you're wrong and adapt your viewpoint more quickly than your competitors.
PS - For those waiting to read my thoughts on natural growth rates, I've decided to upgrade (or downgrade?) that topic to a standalone article.
Interesting things to read/watch
Scalable AI Architectures by Kevin Niechen @ 8vc
Whitney Baker on the Death of American Exceptionalism
Jeff Dean (Google): Exciting Trends in Machine Learning
Jitendra Malik: Building AI from the ground-up, sensorimotor before language