Stability AI has released StableVicuna, the first large-scale open-source chatbot trained with human feedback. It builds on the Vicuna chatbot, which was introduced in early April and is a 13-billion-parameter LLaMA model modified with the Alpaca formula. What sets this Vicuna variant apart is that it was improved using a process called “Reinforcement Learning from Human Feedback” (RLHF).
Beyond plain text generation, StableVicuna can write code and perform simple math, according to Stability AI. Although StableVicuna performs similarly to previously released open-source chatbots, Stability AI plans to develop it further and launch it on Discord soon. A demo is currently available on Hugging Face, and a dedicated chat interface is also planned.
At Hugging Face, developers can download the model’s weights as a delta to the original LLaMA model. Users who want to run StableVicuna themselves will therefore need access to the original LLaMA weights, which can be obtained from Meta, the company behind LLaMA. Commercial use is not permitted.
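To illustrate what distributing weights "as a delta" means, here is a minimal sketch in Python. It assumes the delta is simply the element-wise difference between the fine-tuned and the original weights, so adding it back to the base recovers the fine-tuned model. The function name `apply_delta`, the tensor name `layer.weight`, and the toy values are hypothetical. The actual release may use its own tooling and file format, and real checkpoints may need extra handling that this sketch ignores.

```python
from typing import Dict, List

# Hypothetical toy representation: tensor name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Recover fine-tuned weights by adding a delta to the base weights.

    Assumption: delta = fine_tuned - base, stored tensor by tensor.
    """
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same tensor names")

    merged: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"size mismatch for tensor {name!r}")
        # Element-wise addition restores the fine-tuned values.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Toy example with made-up numbers (not real model weights).
base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.25]}
merged = apply_delta(base, delta)
```

The delta on its own is of little use without the base weights, which is why access to the original LLaMA model is still required.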
The issue with open-source chatbots fine-tuned on chatbot-generated data is the risk of an echo chamber, in which AI models reinforce their existing flaws and biases with every new training cycle. Furthermore, fine-tuning data can reinforce hallucinations if it contains information not actually present in the original model.
The success of ChatGPT, OpenAI’s chatbot built on the GPT architecture, was largely due to reinforcement learning from human feedback (RLHF). RLHF was also used in the development of StableVicuna. Thousands of people each contributed a small amount of feedback on the usefulness of tens of thousands of chat outputs, and the chatbot was tuned on that feedback to give more appropriate responses.
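One common way to turn such human judgments into a training signal is a pairwise preference loss for a reward model: given two responses to the same prompt, the model is penalized when it scores the response people preferred lower than the one they rejected. The sketch below shows that loss in Python. It is a generic illustration, not a description of how StableVicuna or ChatGPT were actually trained, and the function name and reward values are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the human-preferred response gets the higher
    reward and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward scores for two responses to the same prompt.
agrees_with_humans = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_humans = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
```

In a typical RLHF setup, a reward model trained this way then scores new chatbot outputs, and a reinforcement learning step nudges the chatbot toward responses that score higher.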
RLHF also helps keep the chatbot’s output within social norms. Without RLHF, GPT-4 would be far more difficult to use and might produce extreme material that encourages crime or describes the systematic destruction of mankind.
The release of StableVicuna marks a significant step forward in developing open-source chatbots that incorporate human feedback. By continuing to refine and develop these models, researchers and developers can help ensure that the technology remains ethical and responsible.