Chinese AI startup DeepSeek has disrupted global technology markets with a cost-effective, high-performance AI model, and its low-cost approach has weighed heavily on major tech stocks.
On January 27, 2025, DeepSeek's AI Assistant overtook ChatGPT as the top free app on the U.S. iOS App Store. The underlying model, DeepSeek-V3, was reportedly trained on approximately 2,000 Nvidia H800 GPUs over 55 days at a cost of around $5.58 million, roughly one-tenth of Meta's recent AI development expenses.
This development prompted a global selloff in technology stocks. Nvidia's shares fell about 17%, erasing nearly $600 billion in market value, while other tech firms including Microsoft and Alphabet also declined. By January 28, 2025, approximately $1 trillion had been wiped off American stocks.
Industry leaders have had mixed reactions. Microsoft CEO Satya Nadella and OpenAI CEO Sam Altman praised DeepSeek's achievements, while others, including Elon Musk, expressed skepticism about the model's performance and the sustainability of its success.
DeepSeek's emergence has highlighted the potential limits of U.S. export controls intended to slow China's AI development and raised questions about the future competitiveness of American AI models.
DeepSeek reportedly leveraged synthetic data generated by existing AI models, including OpenAI's o1, to enhance its training process. In this approach, a teacher model produces "chain-of-thought" reasoning traces, which are then used as supervised training data for DeepSeek's models. This method reduced the reliance on large volumes of human-created data and highlighted the effectiveness of synthetic data in advancing AI capabilities.
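The pattern is straightforward in outline. Below is a minimal sketch of generating chain-of-thought training data from a teacher model using the official OpenAI Python client; the prompt wording, teacher model choice, and JSONL output format are illustrative assumptions, not DeepSeek's actual pipeline.

```python
# Sketch: generating synthetic chain-of-thought (CoT) training data from a
# teacher model. Prompt wording, model name, and output format are
# illustrative assumptions, not DeepSeek's actual pipeline.
import json
from openai import OpenAI  # assumes the official openai client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROBLEMS = [
    "If 3x + 7 = 22, what is x?",
    "Write a function that reverses a linked list.",
]

def generate_cot_example(problem: str) -> dict:
    """Ask the teacher model for step-by-step reasoning plus a final answer."""
    response = client.chat.completions.create(
        model="o1",  # hypothetical choice of teacher model
        messages=[{
            "role": "user",
            "content": "Solve the following problem. Show your reasoning "
                       f"step by step, then state the final answer.\n\n{problem}",
        }],
    )
    return {"prompt": problem, "completion": response.choices[0].message.content}

# Each record becomes one supervised fine-tuning example for the student model.
with open("synthetic_cot.jsonl", "w") as f:
    for problem in PROBLEMS:
        f.write(json.dumps(generate_cot_example(problem)) + "\n")
```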
The training dataset for DeepSeek-V3 comprised 14.8 trillion tokens from a multilingual corpus, primarily English and Chinese, with a higher proportion of mathematical and programming content than in DeepSeek's earlier datasets. This emphasis strengthened the model's reasoning and coding performance.
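In practice, such a corpus is typically assembled by sampling sources in proportion to mixture weights. The sketch below illustrates that mechanism only; the source names and weights are invented for the example, since DeepSeek has not published its mixture in this form.

```python
# Sketch: sampling a pre-training mixture that upweights math and code.
# Source names and weights are illustrative assumptions, not DeepSeek's
# published mixture for the 14.8T-token corpus.
import random

MIXTURE_WEIGHTS = {
    "english_web": 0.40,
    "chinese_web": 0.25,
    "code": 0.20,      # larger share than typical web-only corpora
    "math": 0.10,
    "other_multilingual": 0.05,
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source in proportion to its mixture weight."""
    sources, weights = zip(*MIXTURE_WEIGHTS.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in MIXTURE_WEIGHTS}
for _ in range(100_000):
    counts[sample_source(rng)] += 1
print(counts)  # empirical draw frequencies approximate the target weights
```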
By employing techniques such as dataset distillation, DeepSeek refined complex raw data into more concise and useful forms for training. This process involved denoising and dimensionality reduction, producing a distilled dataset on which models matched or even exceeded the performance of models trained on the original data.
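A toy illustration of the idea: embed examples, cluster them, and keep one representative per cluster, discarding noise and near-duplicates. TF-IDF vectors and k-means stand in for learned embeddings here; this is a sketch of the general concept, not DeepSeek's actual pipeline.

```python
# Sketch: a toy form of dataset distillation via clustering. Real dataset
# distillation is far more involved; this only shows the idea of
# compressing many noisy examples into a few representative ones.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

raw_corpus = [
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",        # near-duplicate
    "The integral of 2x dx is x**2 + C.",
    "buy cheap watches now!!!",           # noise we would like to drop
    "Solve 3x + 7 = 22 -> x = 5.",
]

# Embed the texts; TF-IDF doubles as a crude dimensionality reduction.
vectors = TfidfVectorizer().fit_transform(raw_corpus)

# Cluster, then keep the single example nearest each centroid.
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
distilled = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(
        vectors[members].toarray() - km.cluster_centers_[c], axis=1
    )
    distilled.append(raw_corpus[members[np.argmin(dists)]])

print(distilled)  # compact subset standing in for the full corpus
```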
These strategies allowed DeepSeek to train high-performance models efficiently, reducing computational costs and resource requirements. The use of synthetic data and dataset distillation not only minimized the need for extensive human-labeled data but also demonstrated the potential of alternative data sources in developing advanced AI systems.