
It has been about a week since DeepSeek released R1, a reasoning model competitive with OpenAI's o1. Much has been made of the fact that it is open source and can be copied and built upon by developers. Now Hugging Face, the platform for hosting and collaborating on AI models, has announced the Open-R1 project to recreate the components DeepSeek did not open source.
According to Hugging Face, while DeepSeek published R1's weights, it did not release the datasets or the code used to train the model, and Open-R1 aims to fill in those gaps. The work matters because R1 is efficient enough to serve as a base model to innovate from, and as an affordable option for researchers, scientists, and businesses pursuing new breakthroughs.
Hugging Face shared a brief three-step action plan for doing so (illustrative sketches of Steps 1 and 2 follow the list):
- Step 1: Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1.
- Step 2: Replicate the pure RL (Reinforcement Learning) pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
- Step 3: Show we can go from base model → SFT → RL via multi-stage training.
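To give a sense of what Step 1 involves in practice, here is a minimal sketch of the fine-tuning half of the distillation recipe, written with Hugging Face's own TRL library. The dataset name is a placeholder for a corpus of reasoning traces generated by DeepSeek-R1, and the student model is an arbitrary small open model, not necessarily what Open-R1 will actually use.

```python
# Minimal sketch of Step 1: supervised fine-tuning (SFT) of a small
# "student" model on reasoning traces distilled from DeepSeek-R1.
# The dataset name below is a placeholder, not an actual Open-R1 release;
# it is assumed to contain a "text" column of prompt + reasoning-trace pairs.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("example-org/r1-reasoning-traces", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # arbitrary small open base model
    args=SFTConfig(output_dir="r1-distill-student"),
    train_dataset=dataset,
)
trainer.train()
```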
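Step 2 would look closer to the following. DeepSeek trained R1-Zero with GRPO (Group Relative Policy Optimization), which TRL also implements; the toy length-based reward below merely stands in for the verifiable math and code rewards a real pipeline would use.

```python
# Minimal sketch of Step 2: pure-RL training with GRPO, the algorithm
# DeepSeek used for R1-Zero. The reward function here is a toy stand-in;
# a real pipeline would score answers against verified math/code solutions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # any prompt dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # arbitrary small open model
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="r1-zero-style-rl"),
    train_dataset=dataset,
)
trainer.train()
```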
What"s also interesting is that this work allows everyone to fine-tune existing and new LLMs into reasoning models, vastly improving their outputs. It said that this work will be useful because the process can help other players in AI avoid wasting time and compute on unproductive paths.
Hugging Face also said the synthetic datasets it plans to build will not focus solely on mathematics; it will explore other domains, such as science, so the datasets can benefit a wider range of fields.