
Foxconn unveils its own large language model distilled from Meta's Llama 3.1


Foxconn, the company best known for assembling iPhones and other Apple products, has announced its first large language model (LLM), FoxBrain, which it intends to use to improve manufacturing and supply chain management.

In a statement, the Taiwanese company said that FoxBrain was trained using just 120 of Nvidia's H100 GPUs. The LLM is based on Meta's Llama 3.1 architecture with 70B parameters and was built using distillation, a technique in which a smaller "student" model is trained on the responses of a larger "teacher" model. Foxconn also acknowledged that its LLM isn't as good as China's DeepSeek distillation model, but said that its overall performance is very close to world-class standards.
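For readers curious how response-based distillation works in practice, below is a minimal, illustrative sketch in PyTorch. The tiny stand-in models, batch, and hyperparameters are placeholders for the sake of a runnable example, not details of Foxconn's actual training pipeline.

```python
# Illustrative sketch of response-based knowledge distillation (not Foxconn's code).
# A frozen "teacher" model produces soft output distributions, and a smaller
# "student" model is trained to match them via a KL-divergence loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 1000)   # stand-in for a large pre-trained model
student = nn.Linear(128, 1000)   # stand-in for the smaller model being trained
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0                # softens the teacher's distribution

inputs = torch.randn(32, 128)    # dummy batch; real training uses tokenized text

with torch.no_grad():
    teacher_logits = teacher(inputs)
student_logits = student(inputs)

# The student learns to reproduce the teacher's softened output distribution.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * (temperature ** 2)

loss.backward()
optimizer.step()
```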

Dr. Yung-Hui Li, Director of the Artificial Intelligence Research Center at Hon Hai Research Institute, said:

"In recent months, the deepening of reasoning capabilities and the efficient use of GPUs have gradually become the mainstream development in the field of AI. Our FoxBrain model adopted a very efficient training strategy, focusing on optimizing the training process rather than blindly accumulating computing power.

Through carefully designed training methods and resource optimization, we have successfully built a local AI model with powerful reasoning capabilities."

Foxconn not only assembles Apple products but also produces Nvidia's artificial intelligence servers. Along with the 120 H100 GPUs, FoxBrain was scaled with Nvidia's Quantum-2 InfiniBand networking, and training was completed in about four weeks, at a total computational cost of 2,688 GPU days. Foxconn generated 98B tokens of high-quality pre-training data in Traditional Chinese, and the model supports a context window of 128K tokens.
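As a back-of-the-envelope check of those figures (assuming all 120 GPUs ran concurrently for the whole job), the stated compute budget works out to a little over three weeks of wall-clock time, broadly in line with the roughly four-week training window:

```python
# Rough sanity check of the stated training budget (assumes all GPUs run concurrently).
gpu_count = 120
gpu_days = 2688
wall_clock_days = gpu_days / gpu_count   # ~22.4 days, i.e. just over three weeks
print(f"{wall_clock_days:.1f} days of wall-clock training time")
```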

FoxBrain benchmarks: TMMLU+ benchmark results of FoxBrain, Meta-Llama-3.1-70B, and Taiwan-Llama-70B

Foxconn and Nvidia's partnership isn't new, and both companies are also working on other projects, including building the world's largest facility for manufacturing Blackwell GPUs.

Nvidia also provided Foxconn with its Taipei-1 Supercomputer to complete the pre-training of the model. Foxconn said that FoxBrain will become an "important engine" to upgrade its three major platforms: Smart Manufacturing, Smart EV, and Smart City.
