The world of artificial intelligence (AI) was taken by storm a few days ago with the launch of DeepSeek-R1, an open reasoning model that matches the performance of the leading foundation models while claiming to have been built on a remarkably low training budget and with novel post-training techniques. The launch of DeepSeek-R1 not only challenged the conventional wisdom surrounding the scaling laws of foundation models, which traditionally favor massive training budgets, but it did so in the most active research area in the field: reasoning.
The open-weights nature (as opposed to open-source) of the release made the model easily accessible to the AI community, leading to a wave of clones within hours. DeepSeek-R1 also left its mark on the ongoing AI race between China and the United States, reinforcing what has become increasingly evident: Chinese models are of high quality and fully capable of driving innovation with original ideas.
Unlike most advances in generative AI, which seem to widen the gap between Web2 and Web3 in the realm of foundation models, the release of DeepSeek-R1 carries real implications and presents intriguing opportunities for Web3-AI. To assess them, we must first take a closer look at DeepSeek-R1's key innovations and differentiators.
Inside DeepSeek-R1
DeepSeek-R1 was the result of introducing incremental innovations into a well-established pretraining framework for foundation models. In broad terms, DeepSeek-R1 follows the same training methodology as most high-profile foundation models. This approach consists of three key steps (a minimal conceptual sketch follows the list):
- Pretraining: The model is initially trained to predict the next word using massive amounts of unlabeled data.
- Supervised fine-tuning (SFT): This step optimizes the model in two critical areas: following instructions and answering questions.
- Alignment with human preferences: A final fine-tuning phase is performed to align the model's responses with human preferences.
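To make that workflow concrete, here is a purely conceptual sketch of the three-stage pipeline. The `TinyModel` class, the placeholder losses and all function names are hypothetical illustrations of the shape of the process, not DeepSeek's training code.

```python
# Conceptual sketch only: TinyModel and the fake losses stand in for a real
# LLM and its training objectives; they are not DeepSeek's actual code.

class TinyModel:
    """Toy stand-in for a large language model."""
    def __init__(self):
        self.steps = 0

    def update(self, loss):
        # A real trainer would compute gradients and apply an optimizer step.
        self.steps += 1


def pretrain(model, unlabeled_corpus):
    """Stage 1: next-token prediction over massive amounts of unlabeled text."""
    for _text in unlabeled_corpus:
        model.update(loss=1.0)
    return model


def supervised_finetune(model, instruction_pairs):
    """Stage 2: SFT on (instruction, answer) pairs."""
    for _prompt, _answer in instruction_pairs:
        model.update(loss=0.5)
    return model


def align_with_preferences(model, preference_triples):
    """Stage 3: align responses with human preferences (RLHF/DPO-style)."""
    for _prompt, _chosen, _rejected in preference_triples:
        model.update(loss=0.1)
    return model


model = pretrain(TinyModel(), ["raw web text ..."])
model = supervised_finetune(model, [("What is 2 + 2?", "4")])
model = align_with_preferences(model, [("Explain gravity", "clear answer", "confusing answer")])
print(model.steps)  # three stages, one toy step each -> 3
```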
Most leading foundation models, including those developed by OpenAI, Google and Anthropic, adhere to this same general process. At a high level, DeepSeek-R1's training procedure does not look significantly different. However, instead of pretraining a base model from scratch, R1 leveraged the base model of its predecessor, DeepSeek-V3-Base, which has an impressive 671 billion parameters.
In essence, DeepSeek-R1 is the result of applying SFT to DeepSeek-V3-Base with a large-scale reasoning dataset. The true innovation lies in the construction of these reasoning datasets, which are notoriously difficult to build.
First step: DeepSeek-R1-Zero
One of the most important aspects of DeepSeek-R1 is that the process did not produce just one model but two. Perhaps its most significant innovation was the creation of an intermediate model called R1-Zero, which specializes in reasoning tasks. This model was trained almost entirely using reinforcement learning, with minimal reliance on labeled data.
Reinforcement learning is a technique in which a model is rewarded for generating correct answers, enabling it to generalize knowledge over time.
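Below is a minimal sketch of the kind of rule-based reward such a setup relies on: the model is rewarded only when its final answer can be verified automatically, so no human labels the individual responses. The `<answer>` tag convention and the scoring values are illustrative assumptions, not DeepSeek's exact reward design.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the model's final answer out of a tagged completion."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else None

def reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 for a verifiably correct answer, 0.0 otherwise."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0            # malformed output: no reward
    return 1.0 if answer == ground_truth else 0.0

# Example: a math problem whose correctness can be checked automatically.
sample = "<think>7 * 6 = 42</think><answer>42</answer>"
print(reward(sample, "42"))   # -> 1.0
```

Because correctness can be checked programmatically in domains such as math and code, this style of reward scales without human annotators.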
R1-Zero is quite impressive, as it was able to match OpenAI's o1 on reasoning tasks. However, the model struggled with more general tasks, such as question answering and readability. That said, the purpose of R1-Zero was never to create a generalist model but to demonstrate that it is possible to achieve state-of-the-art reasoning capabilities using reinforcement learning alone, even if the model does not perform well in other areas.
Second step: DeepSeek-R1
DeepSeek-R1 was designed to be a general-purpose model that excels at reasoning, which means it needed to outperform R1-Zero. To achieve this, DeepSeek once again started with its V3 model, but this time it fine-tuned it on a small reasoning dataset.
As mentioned above, reasoning datasets are difficult to produce. This is where R1-Zero played a crucial role. The intermediate model was used to generate a synthetic reasoning dataset, which was then used to fine-tune DeepSeek V3. This process resulted in another intermediate reasoning model, which was subsequently put through an extensive reinforcement learning phase using a dataset of 600,000 samples, also generated by R1-Zero. The final result of this process was DeepSeek-R1.
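As a hedged sketch of how such a synthetic reasoning dataset could be assembled: sample several candidate traces per problem from the intermediate model and keep only those whose final answer can be verified. The `sample_trace` function below is a toy stand-in for a call to R1-Zero, not DeepSeek's actual tooling.

```python
import random

def sample_trace(prompt: str) -> dict:
    """Toy stand-in for sampling one reasoning trace from a reasoning model."""
    answer = random.choice(["42", "41"])   # pretend the model is sometimes right
    return {"reasoning": f"Working through {prompt!r} step by step...",
            "answer": answer}

def build_reasoning_dataset(problems, samples_per_problem=4):
    """Keep only traces whose final answer matches the known ground truth."""
    dataset = []
    for prompt, ground_truth in problems:
        for _ in range(samples_per_problem):
            trace = sample_trace(prompt)
            if trace["answer"] == ground_truth:     # automatic verification
                dataset.append({"prompt": prompt, **trace})
                break                               # one verified trace per problem
    return dataset

print(build_reasoning_dataset([("What is 6 * 7?", "42")]))
```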
Although I have omitted several technical details of the process above, here are the two main takeaways:
- R1-Zero showed that it is possible to develop sophisticated reasoning capabilities using basic reinforcement learning. Although R1-Zero was not a strong generalist model, it successfully generated the reasoning data necessary for R1.
- R1 expanded the traditional pretraining pipeline used by most foundation models by incorporating R1-Zero into the process. In addition, it leveraged a significant amount of synthetic reasoning data generated by R1-Zero.
As a result, DeepSeek-R1 emerged as a model that matched the reasoning capabilities of OpenAI's o1 while being built with a simpler and far cheaper pretraining process.
Everyone agrees that R1 marks an important milestone in the history of generative AI, one that is likely to reshape the way foundation models are developed. When it comes to Web3, it will be interesting to explore how R1 influences the evolution of Web3-AI.
DeepSeek-R1 and Web3-AI
Until now, Web3 has struggled to establish compelling use cases that clearly add value to the creation and utilization of foundation models. To some extent, the traditional workflow for pretraining foundation models seems to be the antithesis of Web3 architectures. However, despite being in its early stages, the release of DeepSeek-R1 has highlighted several opportunities that could naturally align with Web3-AI architectures.
1) Reinforcement learning fine-tuning networks
R1-Zero demonstrated that it is possible to develop reasoning models using pure reinforcement learning. From a computational standpoint, reinforcement learning is highly parallelizable, which makes it well suited for decentralized networks. Imagine a Web3 network in which nodes are compensated for fine-tuning a model on reinforcement learning tasks, each applying different strategies. This approach is far more feasible than other pretraining paradigms, which require complex GPU topologies and centralized infrastructure.
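As a purely speculative sketch of what that could look like, the snippet below distributes RL rollout jobs across independent nodes, each sampling with its own exploration temperature, and credits each node per scored rollout. The node names, scheduling scheme and payout amount are all hypothetical.

```python
import random

# Speculative sketch: RL rollouts are embarrassingly parallel, so independent
# nodes can each run episodes with their own strategy and be paid per rollout.
NODES = ["node-a", "node-b", "node-c"]
REWARD_PER_ROLLOUT = 0.01  # hypothetical payout per scored rollout

def run_rollout(node_id: str, prompt: str, temperature: float) -> dict:
    """Toy stand-in for a node sampling a completion and scoring it."""
    score = random.random() * temperature
    return {"node": node_id, "prompt": prompt, "score": score}

def distribute_rl_round(prompts):
    rollouts, payouts = [], {}
    for i, prompt in enumerate(prompts):
        node = NODES[i % len(NODES)]                          # naive round-robin scheduling
        result = run_rollout(node, prompt, temperature=0.7 + 0.1 * (i % 3))
        rollouts.append(result)
        payouts[node] = payouts.get(node, 0.0) + REWARD_PER_ROLLOUT
    return rollouts, payouts

rollouts, payouts = distribute_rl_round(["prove 2+2=4", "solve x^2=9", "sort [3,1,2]"])
print(payouts)
```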
2) Synthetic reasoning dataset generation
Another key contribution of DeepSeek-R1 was demonstrating the importance of synthetically generated reasoning datasets for cognitive tasks. This process is also well suited to a decentralized network, where nodes execute data generation jobs and are compensated as those datasets are used to pretrain or fine-tune foundation models. Because this data is generated synthetically, the entire network can be fully automated without human intervention, making it an ideal fit for Web3 architectures.
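A similarly speculative sketch of the compensation loop: nodes submit synthetic reasoning samples, the network verifies them automatically against known answers, and only verified contributions are credited, with no human in the loop. The record format, payout amount and verification rule are illustrative assumptions.

```python
import hashlib

PAYOUT_PER_SAMPLE = 0.001      # hypothetical credit per verified sample
ledger: dict[str, float] = {}  # node id -> accumulated compensation

def verify(sample: dict) -> bool:
    """Automatic check: the submitted answer must match the task's ground truth."""
    return sample["answer"] == sample["ground_truth"]

def submit(node_id: str, sample: dict) -> str | None:
    """Accept a sample, credit the node, and return a content hash as a receipt."""
    if not verify(sample):
        return None                                   # rejected, no compensation
    ledger[node_id] = ledger.get(node_id, 0.0) + PAYOUT_PER_SAMPLE
    return hashlib.sha256(str(sample).encode()).hexdigest()

receipt = submit("node-a", {"prompt": "6*7?", "reasoning": "6*7=42",
                            "answer": "42", "ground_truth": "42"})
print(receipt is not None, ledger)
```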
3) Decentralized inference for small distilled reasoning models
DeepSeek-R1 is a massive model with 671 billion parameters. However, almost immediately after its release, a wave of distilled reasoning models emerged, ranging from 1.5 to 70 billion parameters. These smaller models are significantly more practical for inference in decentralized networks. For example, a 1.5B-2B distilled model could be embedded in a DeFi protocol or deployed within the nodes of a DePIN network. More simply, we are likely to see the rise of cost-effective reasoning inference endpoints powered by decentralized compute networks. Reasoning is a domain in which the performance gap between small and large models is narrowing, creating a unique opportunity for Web3 to efficiently leverage these distilled models in decentralized inference settings.
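As a rough illustration of how lightweight this can be, here is a minimal sketch that loads the publicly released 1.5B distillation with Hugging Face Transformers and runs a single reasoning query; quantization, batching and the serving stack are omitted and would depend on the node's hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Publicly released 1.5B distilled reasoning model; swap in a larger
# distillation or a quantized variant depending on the node's resources.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```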
4) Reasoning data provenance
One of the defining characteristics of reasoning models is their ability to generate reasoning traces for a given task. DeepSeek-R1 makes these traces available as part of its inference output, reinforcing the importance of provenance and traceability for reasoning tasks. The internet today operates mainly on outputs, with little visibility into the intermediate steps that lead to those results. Web3 presents an opportunity to track and verify each reasoning step, potentially creating a "new internet of reasoning" where transparency and verifiability become the norm.
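As a thought experiment, the sketch below separates the reasoning trace from the final answer and commits a hash of the trace, the kind of record that could later be anchored on-chain or handed to a verifier. The `<think>` tag convention follows R1-style outputs; the record format itself is purely illustrative.

```python
import hashlib
import json
import re

def commit_reasoning(output: str) -> dict:
    """Split an R1-style output into answer and trace, and hash the trace."""
    match = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    trace = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    return {
        "answer": answer,
        "trace_sha256": hashlib.sha256(trace.encode()).hexdigest(),
        "trace_length": len(trace),
    }

output = "<think>24 * 17 = 24 * 10 + 24 * 7 = 240 + 168 = 408</think>The answer is 408."
print(json.dumps(commit_reasoning(output), indent=2))
```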
Web3-AI has an opportunity in the post-R1 reasoning era
The launch of DeepSeek-R1 has marked a turning point in the evolution of generative AI. By combining clever innovations with established pretraining paradigms, it has challenged traditional AI workflows and opened a new era of reasoning-focused AI. Unlike many previous foundation models, DeepSeek-R1 introduces elements that bring generative AI closer to Web3.
Key aspects of R1, such as synthetic reasoning datasets, more parallelizable training and the growing need for traceability, align naturally with Web3 principles. While Web3-AI has struggled to gain meaningful traction, this new reasoning-centric era may present the best opportunity yet for Web3 to play a more significant role in the future of AI.