Epoch AI released a new economic model this week, which formalises the connection between R&D spend on AI, automation, and economic growth. The aim is to create expressions that link a) the amount of R&D spend to b) a view of model capabilities, c) an implied level of automation, and d) the effect of this on economic growth, and finally map this to e) how much additional output flows back into R&D spending for the next round of AI improvement. The most striking thing about the model is that if you accept their default parameters, it shows GWP growth exceeding 20% a year in both their conservative and aggressive scenarios.
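To make the chain from a) through e) concrete, here is a minimal toy loop in Python. It is my own sketch, not Epoch's actual equations: the functional forms, parameters, and names (`capability_from_compute`, `automation_fraction`, `growth_from_automation`) are all illustrative stand-ins, chosen only to show how output feeds back into the next round of R&D.

```python
# Toy sketch of the feedback loop: R&D spend -> capabilities -> automation
# -> growth -> extra output -> next year's R&D spend.
# All functional forms and numbers below are illustrative assumptions,
# not the forms used in the Epoch paper.
import math

def capability_from_compute(cumulative_rnd):
    # b) capabilities assumed to grow with the log of cumulative R&D spend
    return math.log10(cumulative_rnd)

def automation_fraction(capability, asymptote=1.0):
    # c) the step I take issue with below: a logistic share of tasks
    #    automated, asymptoting at `asymptote` (effectively all tasks
    #    in the economy by default).
    return asymptote / (1.0 + math.exp(-(capability - 3.0)))

def growth_from_automation(frac):
    # d) baseline growth plus a boost from automated tasks (toy numbers)
    return 0.03 + 0.30 * frac

output = 100.0            # gross world product, arbitrary units
cumulative_rnd = 10.0     # cumulative AI R&D spend, arbitrary units
for year in range(2025, 2031):
    cap = capability_from_compute(cumulative_rnd)
    frac = automation_fraction(cap)
    growth = growth_from_automation(frac)
    output *= 1.0 + growth
    cumulative_rnd += 0.01 * output   # e) a slice of output is reinvested
    print(year, f"automated={frac:.0%}", f"growth={growth:.1%}")
```

The wiring is the interesting part: each year a slice of output is reinvested in R&D, and that reinvestment loop is what compounds into the accelerating growth in their scenarios.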
Their approach, while interesting, has a major flaw. It comes in the step which maps AI capabilities onto an implied share of tasks automated: the authors assume that the fraction of tasks an AI system can complete in the economy grows as a parametric function of compute, and asymptotes at all tasks in the economy. (See below!)
This is an unreasonable assumption: AI systems can only complete cognitive tasks, not physical tasks. It doesn't matter how much compute is spent training the model; it won't be able to flip a burger! Any model which predicts that AIs could do nearly 100% of tasks would need to account for progress in robotics.
Earlier work by Epoch estimates that 34% of tasks could be done remotely during the pandemic, and are therefore exposed to AI automation. This seems like a more reasonable approximation, though it could still be an overestimate, for reasons my coauthor and I wrote about here. When I restricted the model so that only 34% of tasks can be automated, growth still spiked to 24% and 17% in 2028 but stabilised thereafter. (See below!) As I understand it, the spike can be explained by the fact that the model assumes adoption is almost seamless, which is too aggressive in my view.
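As a rough illustration of the assumption being changed here, the sketch below compares a hypothetical logistic automation fraction in log-compute with the asymptote at 100% of tasks (roughly the paper's default behaviour) against the same curve capped at the 34% of remotable tasks. The functional form, midpoint, and slope are my own stand-ins, not the paper's fitted parameters.

```python
# Hypothetical automation-fraction curves: same parametric shape, different
# asymptotes. Only the ceiling changes; the path toward it does not.
import math

def automation_fraction(log_compute, asymptote):
    # Logistic in log-compute; the midpoint (27) and slope (1.5) are
    # arbitrary stand-ins, not values from the Epoch paper.
    return asymptote / (1.0 + math.exp(-(log_compute - 27.0) / 1.5))

for log_c in range(24, 33, 2):   # log10 of training compute (FLOP), illustrative
    unconstrained = automation_fraction(log_c, asymptote=1.00)
    remote_only   = automation_fraction(log_c, asymptote=0.34)
    print(f"1e{log_c} FLOP: all tasks={unconstrained:.0%}, remotable only={remote_only:.0%}")
```

Under the capped curve, no amount of additional compute pushes automation past the 34% of remotable tasks; getting beyond that would require modelling robotics separately.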
Despite this gap, the model is still an admirable attempt to link all these variables together, and I recommend further exploration. The full paper is here. In general, there are very few attempts to link growth with AI capabilities that are serious about the kinds of progress we can expect in the near future.
Elsewhere, METR released a report that attempts to measure how quickly AI systems are gaining the ability to complete long-horizon tasks. This quantifies Richard Ngo’s framework of thinking about AGI in terms of 1-second AGI, 1-minute AGI, and so on until 1-month AGI. They find:
If we extrapolate this trend out naively, we reach day-long AGIs in 2028 and week-long AGIs in 2030. (Remember the plot is log-linear, so the underlying growth is exponential.)
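The extrapolation itself is just compound doubling. The sketch below works through the arithmetic with illustrative inputs: a 50%-success horizon of roughly one hour in early 2025 and a doubling time of around seven months, which are the figures the METR report is usually summarised with, but treat them here as my assumptions rather than the paper's exact estimates.

```python
# Naive log-linear extrapolation: the task horizon doubles every
# `doubling_months`. Starting horizon and doubling time are assumptions.
import math

start_year      = 2025.2   # early 2025
start_horizon_h = 1.0      # ~1 hour at 50% success
doubling_months = 7.0

def arrival_year(target_hours):
    doublings = math.log2(target_hours / start_horizon_h)
    return start_year + doublings * doubling_months / 12.0

# Treating "day" and "week" as clock time here (another assumption).
print("day-long  (24 h): ", round(arrival_year(24)))    # ~2028
print("week-long (168 h):", round(arrival_year(168)))   # ~2030
```

The exact dates move around depending on whether you count clock time or human working time, but the shape of the calculation is the same.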

I think there are three main reasons someone could doubt this:
1. Naive extrapolation isn't always trustworthy.
2. Perhaps the tasks they sampled aren't representative of other domains.
3. A 50% success rate for tasks does not imply the models will be economically useful.
I’m unconcerned about each of these.
To the first rebuttal: the kind of training we use to make models work for longer—reinforcement learning—has scaled well in other domains when paired with simple ideas. The recent DeepSeek R1 paper showed that an earlier version of their model, 'R1 Zero', was trained with end-to-end reinforcement learning: the researchers only provided it with objectives, and did not control how it went about answering them. Within the training process, it developed the capacity to correct its own mistakes. There are certainly challenges to scaling up reinforcement learning, but they look more like finicky engineering problems than ones requiring extreme breakthroughs.
To the second: the benchmark only uses a suite of software engineering tasks, so perhaps it is difficult to generalise across domains, but within this category they used over 100 tasks of different lengths. More on applicability from the paper's authors here.
Finally, the fact that the headline trend only tracks a 50% success rate doesn't seem like a sticking point to me. The chart below shows that models are becoming increasingly reliable at relatively 'shorter' tasks as their ability to act over longer periods grows.
I think it makes sense to operate on the default assumption of “week-long AGI” by 2028, “month-long AGI” by 2030. Even if you doubt these specifics, METR note:
The steepness of the trend means that our forecasts about when different capabilities will arrive are relatively robust even to large errors in measurement or in the comparisons between models and humans. For example, if the absolute measurements are off by a factor of 10x, that only changes the arrival time by around 2 years.
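That robustness claim is easy to sanity-check: a 10x error in the absolute measurements corresponds to log2(10), or roughly 3.3, doublings. Assuming (my assumption, for illustration) a doubling time of around seven months, the shift in arrival time is:

```python
# How far does a 10x measurement error move the arrival date,
# assuming an illustrative 7-month doubling time?
import math

doubling_months = 7.0
shift_months = math.log2(10) * doubling_months
print(f"{shift_months:.0f} months ≈ {shift_months / 12:.1f} years")  # ~23 months ≈ 1.9 years
```

Which lines up with the "around 2 years" figure in the quote.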
So, in essence, if one’s baseline assumption is not “week-long AGI” for 2030, I just don’t know where that level of confidence would come from.
This graph causes me to reflect on adoption rates specifically. The first-order effect is how much easier it will be to adopt AI that can work on long tasks; right now, we are bogged down by needing to give the model 'step-by-step supervision'. The second-order effect is how much more intelligence we'll want to consume once models have become even more useful. I probably ask Deep Research to write ~5 reports a day at the moment, which might equate to having a couple of conscientious interns. When these agents can complete week-long tasks, I imagine wanting to employ a whole army for all the side quests I never have time for. I expect my job will look primarily like that of an editor and a manager, rather than a doer.
I expect adoption to go faster than previous general-purpose technology adoption cycles, which have been limited to 0.5-1% TFP growth per year. It is not yet clear to me how this affects employment as a whole.
Both frameworks give pause for thought.