- Alibaba's ZeroSearch can generate training material for its AI
- Cost savings of up to 88% are possible
- Technology requires additional GPUs
Alibaba's Tongyi Lab has found a way to train AI search models without using real search engines, which it says can cut search training costs by up to 88% compared with commercial APIs such as Google's.
In a paper titled “ZeroSearch: Incentivize the Search Capability of LLMs without Searching,” Alibaba explains how the approach uses simulated, AI-generated documents to imitate the output of a real search engine.
Interestingly, Alibaba's researchers also point out that using simulated documents can actually improve the quality of training, because “the quality of documents returned by search engines is often unpredictable,” which risks introducing noise into the training process.
Alibaba will train search models on AI-generated documents
“The main difference between a real search engine and a simulation LLM lies in the textual style of the returned content,” the researchers wrote. ZeroSearch can also gradually degrade the quality of the documents to simulate increasingly challenging retrieval scenarios.
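The gradual degradation described above can be pictured as a schedule that raises the chance of serving a low-quality document as training progresses. The following is only a minimal sketch under stated assumptions: the linear ramp, the function names, and the 0.0 to 0.5 probability range are all hypothetical, chosen for illustration, and are not taken from Alibaba's paper.

```python
import random


def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5) -> float:
    """Chance of serving a degraded document at a given training step.

    Assumption: a simple linear ramp from p_start to p_end. The real
    schedule may differ; this only illustrates the idea.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start + frac * (p_end - p_start)


def pick_document_kind(step: int, total_steps: int, rng: random.Random) -> str:
    """Decide whether the simulated search returns a useful or a noisy document."""
    if rng.random() < noise_probability(step, total_steps):
        return "noisy"
    return "useful"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    for step in (0, 250, 500, 750, 1000):
        p = noise_probability(step, 1000)
        kind = pick_document_kind(step, 1000, rng)
        print(f"step {step:4d}: noise probability {p:.2f}, served {kind}")
```

Early in training almost every simulated result is useful; later, the model increasingly has to cope with unhelpful results, which is the harder retrieval setting the article refers to.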
Of course, the key benefit of this technology is the significant cost saving. Training with the 14B ZeroSearch model costs around $70.80 per 64,000 queries, compared with around $586.70 through the Google API. Costs are lower still for the 7B and 3B models, at $35.40 and $17.70 per 64,000 queries, and all three ZeroSearch models take the same amount of time as the Google API method.
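As a quick check on the arithmetic, the quoted figures can be turned into per-query costs and percentage savings. This sketch uses only the dollar amounts stated above; the dictionary labels and the helper name `savings_vs_baseline` are illustrative choices, not anything defined by Alibaba.

```python
# Costs in USD per 64,000 queries, as quoted in the article.
QUERIES = 64_000
costs = {
    "Google API": 586.70,
    "ZeroSearch 14B": 70.80,
    "ZeroSearch 7B": 35.40,
    "ZeroSearch 3B": 17.70,
}


def savings_vs_baseline(cost: float, baseline: float) -> float:
    """Fraction of the baseline cost that is saved."""
    return 1.0 - cost / baseline


baseline = costs["Google API"]

if __name__ == "__main__":
    for name, cost in costs.items():
        per_query = cost / QUERIES
        saving = savings_vs_baseline(cost, baseline)
        print(f"{name:15s} ${cost:7.2f} total  ${per_query:.5f}/query  saving {saving:.1%}")
```

On these numbers the 14B model works out to a saving of roughly 88% against the Google API figure, which matches the headline claim; by the same arithmetic the smaller 7B and 3B models would save more.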
However, Alibaba acknowledged that its ZeroSearch method requires one, two or four A100 GPUs, compared with no GPU requirement for the Google API method, which could have a negative impact on sustainability in terms of energy consumption and emissions.
“Our approach has certain limitations. Deploying the simulated search LLM requires access to GPU servers. While more cost-effective than commercial API usage, this introduces additional infrastructure costs,” the researchers concluded.
Even so, challenging the dependence on expensive and closed platforms such as Google search APIs and reducing costs could help further democratize AI development.