Stability AI and its CarperAI lab have introduced two new large language models, FreeWilly1 and FreeWilly2. Built on the foundation of Meta's Llama models, they aim to bridge the gap between small and large AI models, showing impressive performance on reasoning and natural language understanding tasks.
Both models build on Meta's Llama family: FreeWilly1 is derived from the original LLaMA 65B, while FreeWilly2 uses the more powerful Llama 2 70B. To boost their performance, Stability AI fine-tuned both models on synthetically generated instruction datasets using Supervised Fine-Tuning (SFT).
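Stability AI has not released its training code, but the general shape of SFT on an instruction dataset is straightforward. The sketch below is a minimal, hypothetical illustration using Hugging Face's transformers and datasets libraries; the tiny base model, toy dataset, and hyperparameters are stand-ins for this example, not the values used for the FreeWilly models.

```python
# Minimal, hypothetical SFT sketch -- NOT Stability AI's actual pipeline.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Tiny stand-in model so the example runs anywhere; the real FreeWilly
# models started from LLaMA 65B and Llama 2 70B, which need far more hardware.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy stand-in for a synthetically generated instruction dataset.
records = [
    {"text": "### Instruction:\nExplain step by step why 17 is prime.\n"
             "### Response:\n17 is divisible only by 1 and 17, so it is prime."},
]
ds = Dataset.from_list(records)
ds = ds.map(lambda r: tokenizer(r["text"], truncation=True, max_length=512),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", max_steps=5,
                           per_device_train_batch_size=1),
    train_dataset=ds,
    # Causal-LM collator: labels are the input tokens shifted by one.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```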
A key element in developing the FreeWilly models was the Orca method, outlined in Microsoft's paper "Orca: Progressive Learning from Complex Explanation Traces of GPT-4." Unlike traditional imitation approaches, which merely mimic the output style of large models, the Orca method trains smaller models on the step-by-step reasoning behind a large model's answers, with the goal of matching the larger model's performance.
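In practice, the difference shows up in the training data: instead of recording only the teacher's final answer, an Orca-style record pairs the instruction with the full explanation trace the teacher was prompted to produce. The snippet below is a rough, hypothetical illustration of what such a record might look like; the system prompt, helper function, and example are invented for this sketch, not taken from the Orca paper or from Stability AI.

```python
# Hypothetical Orca-style data construction -- illustrative only.

# System prompt nudging the teacher model to explain its reasoning,
# in the spirit of the prompts described in Microsoft's Orca paper.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Think step by step and justify each step "
    "before giving your final answer."
)

def make_orca_example(instruction: str, teacher_trace: str) -> dict:
    """Package an instruction with the teacher's full explanation trace
    into one SFT record, so the student learns the reasoning process,
    not just the final answer."""
    return {
        "system": SYSTEM_PROMPT,
        "instruction": instruction,
        "response": teacher_trace,  # keeps the intermediate steps
    }

example = make_orca_example(
    "Is 91 a prime number?",
    "Step 1: Check small divisors. 91 / 7 = 13, so 7 divides 91.\n"
    "Step 2: Since 91 = 7 x 13, it has divisors other than 1 and itself.\n"
    "Final answer: No, 91 is not prime.",
)
```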
To assess performance, Stability AI used EleutherAI's lm-eval-harness, extended with AGIEval, a human-centric benchmark for evaluating foundation models. The results show FreeWilly2 clearly ahead of FreeWilly1, and on average roughly four points above Llama 2 70B across the benchmarks tested. While FreeWilly2 leads in most areas, Llama 2 retains an edge on MMLU, the key general language understanding benchmark.
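For readers who want to reproduce this kind of evaluation, a run with lm-eval-harness can look roughly like the sketch below. The Python API follows recent (v0.4+) releases of the harness and has changed across versions; the model identifier and task list are illustrative assumptions, not Stability AI's exact setup.

```python
# Hypothetical evaluation sketch with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). API details vary between harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face causal-LM backend
    model_args="pretrained=stabilityai/FreeWilly2",  # assumed Hub id for illustration
    tasks=["arc_challenge", "hellaswag"],            # example reasoning benchmarks
    batch_size=4,
)

# Each task reports its headline metric (e.g. normalized accuracy).
for task, metrics in results["results"].items():
    print(task, metrics)
```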
Stability AI places strong emphasis on responsible release practices for the FreeWilly models. An internal red team has rigorously tested them for potential harms, and the company is actively seeking external feedback to further strengthen safety measures and support ethical use of AI.