China’s new AI star DeepSeek grew on more than a shoestring budget
In 2022, a team of AI engineers in southern China working for the parent company of DeepSeek proudly unveiled a new AI supercomputer and said they were making it available free of charge to researchers across the nation.
The sleek banks of gray machines whirring beneath fluorescent lights had been constructed from 10,000 A100 graphics processing units, or GPUs, purchased from the U.S. company Nvidia, the world’s leading maker of AI chips.
The 2021 bulk buy was timely. By autumn 2022, Washington had banned Nvidia from selling any more A100s to China, in an effort to slow the country’s AI advances. Meanwhile, Nvidia chips were becoming scarce worldwide since the launch of ChatGPT in late 2022 sparked a global hoarding frenzy.
DeepSeek’s new chatbot caused a panic in Silicon Valley and on Wall Street this week, erasing $1 trillion from the stock market. That impact stemmed in large part from the company’s claim that it had trained one of its recent models on a minuscule $5.6 million in computing costs and with only 2,000 or so of Nvidia’s less-advanced H800 chips.
Nvidia saw its soaring value crater by $589 billion Monday as DeepSeek rocketed to the top of download charts, prompting President Donald Trump to call for U.S. industry to be “laser focused” on competing.
Nvidia CEO Jensen Huang met with Trump on Friday, as U.S. officials probe whether DeepSeek may have bought advanced Nvidia chips through third parties to circumvent U.S. export controls.
The earlier supercomputing project by DeepSeek’s parent company, High-Flyer, helped forge connections to AI researchers across China and suggests that DeepSeek had a major boost before it seemed to hurtle out of nowhere in recent days with technology comparable to that of the leading U.S. AI companies.
Much about how the company achieved this feat is unclear. But a closer look at DeepSeek reveals that its parent company deployed a large and sophisticated chip set in its supercomputer, leading experts to assess the total cost of the project as much higher than the relatively paltry sum that U.S. markets reacted to this week.
While High-Flyer has been around since 2015, DeepSeek is officially only two years old.
ChatGPT’s maker, OpenAI, has said it is investigating whether DeepSeek may have “inappropriately” distilled its models.
DeepSeek and High-Flyer did not respond to requests for comment.
DeepSeek’s claim that it spent only $5.6 million on training one of its advanced models put forward a tiny sum compared with the costs that U.S. companies have been underwriting.
OpenAI chief executive Sam Altman has said GPT-4 cost more than $100 million to train.
The research firm SemiAnalysis estimated Friday that DeepSeek has spent more than half a billion dollars on GPUs.
As DeepSeek’s star has risen, its founder, Liang Wenfeng, has recently received shows of governmental favor in China.
News from The Washington Post