I got a bit tired of people trying to shut down any discussion about AI with “we really need to think about the environmental impact”. So to win an argument on the internet, I did a bit of research and some basic multiplication, and came to an interesting conclusion:
For a given paragraph, ChatGPT uses less energy to generate it than your laptop does while you type it.
To explain the reasoning briefly, we published a short post on our start-up (plinth) blog. The logic is quite simple: back-calculate the energy usage of a marginal paragraph from the API's price per token.
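That back-calculation can be sketched in a few lines of Python. All the inputs here (API price, the share of that price spent on electricity, the electricity price, paragraph length, typing speed) are illustrative assumptions for the sketch, not necessarily the post's exact figures:

```python
# Back-of-envelope sketch: energy implied by the API price vs energy to type.
PRICE_PER_1K_TOKENS = 0.002    # $ per 1K output tokens (GPT-3.5-era API pricing, assumed)
ELECTRICITY_SHARE = 0.40       # fraction of the API price assumed to pay for electricity
ELECTRICITY_PRICE = 0.10       # $ per kWh (assumed industrial rate)
TOKENS_PER_PARAGRAPH = 100     # roughly a 75-word paragraph
LAPTOP_WATTS = 50              # laptop power draw while typing
TYPING_SECONDS = 75 / 40 * 60  # 75 words at 40 words per minute

# $ spent on the paragraph -> $ spent on electricity -> kWh -> Wh
ai_wh = (TOKENS_PER_PARAGRAPH / 1000 * PRICE_PER_1K_TOKENS
         * ELECTRICITY_SHARE / ELECTRICITY_PRICE * 1000)

# Energy the laptop draws while the paragraph is being typed
laptop_wh = LAPTOP_WATTS * TYPING_SECONDS / 3600

print(f"AI: {ai_wh:.2f} Wh, laptop: {laptop_wh:.2f} Wh")
# prints "AI: 0.80 Wh, laptop: 1.56 Wh"
```

Under these assumptions the model comes out at roughly half the energy of the person typing, which is the shape of the result in the post.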
But to really hammer this point home, we also made a quick game where you can try and be more energy efficient than the AI (by typing fast):
The blog post was quite short, and brushed over a few things that deserve a closer look, so here, essentially, are the footnotes:
Footnotes:
1. Fixed cost of training the models:
The calculation was done entirely on the basis of a “marginal paragraph”, ignoring the energy used to train the model in the first place. This makes it much easier to get a straight comparison. Once you include the fixed cost, the average energy per paragraph falls the more you use the model, so if the model already exists, it’s almost a waste not to use it. But the more interesting question is whether you should have trained the model in the first place (from an energy-usage perspective).
The best (most commonly cited) estimate of training energy I could find was for GPT-3: 936 MWh to train the model in 2020. That’s quite an outdated model, and a newer frontier model would take more energy to train. However, given that the cost of training a model of a given quality keeps falling by ~70% per year, we’ll use the ~1 GWh figure as a rough benchmark for ChatGPT-quality models.
As this Hacker News comment points out, that’s not that much. It’s less than a single long-haul flight. But to bring it back to our toy example, how does that compare to the energy usage of laptops?
Apparently, the average laptop releases ~300 kg of carbon in its manufacture. At 500 g of CO2 per kWh of electricity generated (roughly the grid intensity in China, where most laptops are made), that’s equivalent to 600 kWh per laptop. Or put another way, you can make ~1,600 laptops for the same energy as training an LLM.
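As a quick sanity check on that arithmetic (using the ~1 GWh training benchmark and the 300 kg / 500 g figures above):

```python
TRAINING_KWH = 1_000_000   # ~1 GWh benchmark for training a ChatGPT-quality model
LAPTOP_CO2_KG = 300        # embodied carbon of an average laptop
GRID_KG_PER_KWH = 0.5      # 500 g of CO2 per kWh of electricity (China-like grid)

# Convert the laptop's embodied carbon into an energy equivalent
kwh_per_laptop = LAPTOP_CO2_KG / GRID_KG_PER_KWH         # 600 kWh

# How many laptops' worth of energy one training run represents
laptops_per_training_run = TRAINING_KWH / kwh_per_laptop

print(round(laptops_per_training_run))  # prints 1667
```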
Given that ChatGPT has 100 million+ users, and a fair number of businesses are using it to replace the need to hire for certain admin roles… it’s probably fair to say it has reduced the need to buy laptops by significantly more than 1,600. You can understand why no one is particularly keen to publicise that argument, though.
2. Laptop power usage:
In the original post/game, I used 50 W as the power draw of a laptop. A couple of people have queried this figure as unrepresentative. It turns out they all use Macs (I’m still on a filthy Windows machine with a massive power adapter). I struggled to find energy-usage statistics by laptop model, but just from comparing power draw and charger sizes around the office, Apple machines do seem dramatically more energy efficient. If anyone has good statistics on this, I’d love to see them.
3. OpenAI’s margin:
I assumed that OpenAI makes 0% operating profit on the GPT-3.5 models through their API. This is probably unrealistic (though I could see an argument for releasing them at a loss and betting that efficiency gains from economies of scale will get them to breakeven or profit).
However, because of the way the calculation works, assuming a >0% margin actually improves the AI’s relative energy usage: we assumed 40% of the API price goes on electricity, so if OpenAI is also taking a margin, less electricity is being bought for the same fixed price.
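That sensitivity is easy to see in numbers. Here the $0.0002-per-paragraph price (a ~100-token paragraph at GPT-3.5-era pricing) and the 40% electricity share are the same illustrative assumptions as in the main calculation:

```python
def implied_wh_per_paragraph(price_usd, margin, electricity_share=0.40,
                             electricity_price_usd_per_kwh=0.10):
    """Energy implied by an API price after stripping out OpenAI's margin."""
    cost = price_usd * (1 - margin)  # what it actually costs OpenAI to serve
    return cost * electricity_share / electricity_price_usd_per_kwh * 1000  # Wh

price = 0.0002  # $ per ~100-token paragraph (assumed)
for margin in (0.0, 0.25, 0.5):
    wh = implied_wh_per_paragraph(price, margin)
    print(f"margin {margin:.0%}: {wh:.2f} Wh per paragraph")
```

The higher the margin you assume, the less electricity the same API price can be paying for, so the AI looks better, not worse.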
4. Other models / complex prompts:
The calculation was done on the basis of the newer GPT-3.5 models. GPT-4 is ~20x more expensive through the API, so by the same logic GPT-4 does use ~10x more energy than the average person typing on an average laptop. Hopefully that falls soon.
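Roughly, in numbers (the 0.8 Wh and 1.56 Wh figures are illustrative per-paragraph values from the marginal-paragraph back-calculation, not measured data):

```python
gpt35_wh = 0.8        # implied energy per ~100-token paragraph for GPT-3.5 (assumed)
laptop_wh = 1.56      # ~75 words typed at 40 wpm on a 50 W laptop
price_multiple = 20   # GPT-4 vs GPT-3.5 API price ratio

# Same electricity-share logic: 20x the price implies ~20x the energy
gpt4_wh = gpt35_wh * price_multiple

print(f"GPT-4 vs laptop: {gpt4_wh / laptop_wh:.1f}x")  # roughly 10x
```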
I also used the price per output token, implicitly assuming there’s no significant prompting of the model. Based on the work we’ve been doing ourselves to put AI into production, that seems pretty unrealistic: by the time you’ve added RAG, or even just some basic prompt engineering, you often end up with massively long prompts, and sometimes chains of multiple prompts.
This definitely increases the cost and energy usage, but, comparing back to a person on a laptop, it’s probably analogous to the time that person would spend researching on the internet or thinking about what to write.
5. Energy usage vs carbon emissions:
I suppose there is an argument that comparing energy usage of LLMs to laptops isn’t as useful as comparing the carbon emissions directly — that is, if you think electricity used by OpenAI’s data centres is significantly more or less carbon intensive than the energy to your house/office.
Going back to our carbon intensity map from earlier, that’s probably a fair comment if you live in Iceland, Norway or France, given that their electricity is 5-10x cleaner than the USA’s. It’s much less true elsewhere.
However, though it’s hard to know how much of this sustainability page is baseless greenwashing, data-centre companies do seem more focused than average on reducing the carbon intensity of the energy they use. I’m not sure how much that changes the calculation, but it seems to point in favour of the LLMs.
6. Jevons Paradox
Maybe making it so quick and easy to write text will encourage people to do it more? So the overall impact will be an increase in energy usage? It’s not a new thought — William Jevons first mentioned it in 1865. And it’s probably true? I’ve asked ChatGPT to rewrite grant applications in the style of a sarcastic pirate. I doubt I’d do that on my own time.
But, if we end up using a bit more energy to get a lot more done, isn’t that a good thing? We’d all get to spend more time and effort working on other things. Maybe it could remove the hassle of filling out 100s of pages of pointless planning documents for building new solar farms? Or maybe we could spend less time doing admin, and more time enjoying life? Either way, seems pretty great.
7. Other future developments
Maybe things will improve even faster than we expect? The new hardware from Groq seems to be stupidly fast for LLM inference. I’m guessing we’ll see more developments like this that really drive these costs down.