Smaller is smarter
Concerns about the environmental impact of Large Language Models (LLMs) are growing. Detailed information about their actual costs is hard to come by, so let’s gather a few facts to get a sense of the scale.
Since comprehensive data on ChatGPT-4 is not readily available, let’s take Llama 3.1 405B as an example. This open-source model from Meta is arguably the most “transparent” LLM to date, and various benchmarks place it in the same class as ChatGPT-4, which makes it a reasonable proxy for LLMs of this size.
Inference
The hardware requirements to run the 32-bit version of this model range from 1,620 to 1,944 GB of GPU memory, depending on the source (Substratus, Hugging Face). For a conservative estimate, let’s use the lower figure of 1,620 GB. To put this into perspective — acknowledging that this is a simplified analogy — 1,620 GB of GPU memory is roughly equivalent to the combined memory of 100 standard MacBook Pros (16 GB each). So, when you ask one of these LLMs for a tiramisu recipe in Shakespearean style, it takes the power of 100 MacBook Pros to give you an answer.
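As a quick sanity check, here is the back-of-envelope arithmetic behind that figure (a sketch in Python; it assumes 4 bytes per parameter for FP32 weights and ignores activations, the KV cache, and other runtime overhead):

```python
# Back-of-envelope memory estimate for a 405B-parameter model in FP32.
# Weights only; real deployments also need memory for activations,
# the KV cache, and framework overhead.
params = 405e9            # Llama 3.1 405B parameters
bytes_per_param = 4       # 32-bit (FP32) weights

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:,.0f} GB")          # ~1,620 GB

macbook_ram_gb = 16       # a standard MacBook Pro configuration
print(f"Roughly {weights_gb / macbook_ram_gb:.0f} MacBook Pros")  # ~101
```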
Training
The figures above translate inference into something tangible, but they don’t include training costs: training Llama 3.1 405B is estimated to have involved around 16,000 GPUs running for roughly 80 days, at an approximate cost of $60 million USD (excluding hardware), a significant investment from Meta. In terms of electricity consumption, training required about 11 GWh.
The annual electricity consumption per person in a country like France is approximately 2,300 kWh, so 11 GWh corresponds to the yearly electricity usage of about 4,782 people. This consumption resulted in the release of approximately 5,000 tons of CO₂-equivalent greenhouse gases (based on the European average), although this figure can easily double depending on the country where the model is trained.
For comparison, burning 1 liter of diesel produces 2.54 kg of CO₂. Therefore, training Llama 3.1 405B — in a country like France — is roughly equivalent to the emissions from burning around 2 million liters of diesel. This translates to approximately 28 million kilometers of car travel. I think that provides enough perspective… and I haven’t even mentioned the water required to cool the GPUs!
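For readers who want to check the arithmetic, here is a small sketch reproducing these orders of magnitude (it simply reuses the figures quoted above, plus an assumed average of roughly 180 g CO₂ per kilometre for a passenger car, which is my assumption rather than a figure from the original sources):

```python
# Rough reproduction of the orders of magnitude quoted above.
training_energy_kwh = 11e6         # ~11 GWh for training
per_capita_kwh = 2_300             # annual electricity use per person in France
print(training_energy_kwh / per_capita_kwh)        # ~4,783 person-years of electricity

emissions_kg = 5_000_000           # ~5,000 t CO2-eq (European grid average)
diesel_kg_co2_per_litre = 2.54
print(emissions_kg / diesel_kg_co2_per_litre)      # ~2 million litres of diesel

car_g_co2_per_km = 180             # assumed average for a passenger car
print(emissions_kg * 1_000 / car_g_co2_per_km)     # ~28 million km of car travel
```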
Sustainability
Clearly, AI is still in its infancy, and we can expect more efficient and sustainable solutions to emerge over time. However, in this intense race, OpenAI’s financial landscape highlights a significant gap between its revenues and its operational expenses, particularly for inference. In 2024, the company is projected to spend approximately $4 billion on processing power provided by Microsoft for inference workloads, while its annual revenue is estimated at between $3.5 billion and $4.5 billion. In other words, inference costs alone nearly match, or even exceed, OpenAI’s total revenue (deeplearning.ai).
All of this is happening in a context where experts are announcing a performance plateau for AI models (the scaling paradigm). Increasing model size and GPU counts is yielding sharply diminishing returns compared with previous leaps, such as the jump from GPT-3 to GPT-4. “The pursuit of AGI has always been unrealistic, and the ‘bigger is better’ approach to AI was bound to hit a limit eventually — and I think this is what we’re seeing here,” said Sasha Luccioni, researcher and AI lead at startup Hugging Face.
And now?
But don’t get me wrong — I’m not putting AI on trial; I love it! This research phase is a perfectly normal stage in AI’s development. However, I believe we need to exercise common sense in how we use it: we can’t fire a bazooka at a mosquito every time. AI must be made sustainable, not only to protect our environment but also to bridge social divides. Leaving the Global South behind in the AI race because of high costs and resource demands would be a significant failure of this new intelligence revolution.
So, do you really need the full power of ChatGPT to handle the simplest tasks in your RAG pipeline? Are you looking to control your operational costs? Do you want complete end-to-end control over your pipeline? Are you concerned about your private data circulating on the web? Or perhaps you’re simply mindful of AI’s impact and committed to its conscious use?
SLMs can be a smarter choice!
Small language models (SLMs) offer an excellent alternative worth exploring. They can run on your local infrastructure and, combined with human intelligence, deliver substantial value. There is no universally agreed definition of an SLM (in 2019, for instance, GPT-2 with its 1.5 billion parameters was considered an LLM, which is no longer the case), but I am referring to models such as Mistral 7B, Llama-3.2 3B, or Phi-3.5, to name a few. These models can run on a “good” computer, with a much smaller carbon footprint, and they keep your data confidential when installed on-premises. Although they are less versatile, when used wisely for specific tasks they can still provide significant value, while being more environmentally virtuous.
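To make this concrete, here is a minimal sketch of what running an SLM locally can look like, using the Hugging Face transformers library (the model ID and prompt are just examples; choose whichever SLM fits your hardware and license constraints):

```python
# Minimal local-inference sketch with Hugging Face transformers.
# Requires: pip install transformers accelerate (plus a PyTorch install).
# The model ID below is one example of an SLM; some models require
# accepting a license on the Hugging Face Hub before downloading.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",  # example SLM, adjust to your hardware
    device_map="auto",     # place weights on a GPU if available, otherwise CPU
    torch_dtype="auto",    # load in reduced precision when supported to save memory
)

prompt = "Summarize the following support ticket in one sentence:\n..."
output = generator(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```

On a machine with a single recent GPU (or even CPU-only for the smallest models), a setup like this is enough for focused tasks such as summarization, classification, or extraction inside a RAG pipeline, with no data leaving your infrastructure.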