Anthropic, a leading artificial intelligence (AI) company, has announced its new approach called “Constitutional AI” to address ethical and social challenges posed by powerful AI systems.
This approach instills values defined in a “constitution” into AI systems, making their behaviour more understandable and easier to adjust as needed. Anthropic has already published a constitution for its advanced conversational AI model, Claude, which generates text and code. The constitution spells out how Claude should handle sensitive topics, respect users’ privacy, and avoid assisting with illegal activity.
Anthropic believes that AI models will have value systems, whether intentional or unintentional. Hence, the company’s Constitutional AI approach aims to establish a set of principles for judging the text generated by the system. These principles guide the model to take actions that are harmless and helpful, and to avoid discrimination based on language, religion, political or other opinion, nationality, social origin, property, birth, or other status.
Anthropic’s Constitutional AI approach applies the principles in two training phases. In the first, the model generates responses, critiques them against the principles, and revises them; the revised responses are used to fine-tune the model. In the second, the model is further trained with reinforcement learning using AI-generated feedback, again guided by the principles, to produce the final model. According to Anthropic, this method scales better than the traditional approach of relying on human feedback, which requires significant time and resources.
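The critique-and-revision phase described above can be sketched in a few lines. This is a toy illustration, not Anthropic’s implementation: `ask_model` is a hypothetical stand-in for a real language-model call, stubbed here with canned strings so the control flow can run end to end.

```python
# Minimal sketch of a Constitutional AI critique-and-revision loop.
# All names (CONSTITUTION, ask_model, critique_and_revise) are
# illustrative assumptions, not Anthropic's actual API.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that makes fewer assumptions about the user.",
]

def ask_model(prompt: str) -> str:
    # Toy stub: a real system would query an LLM here.
    if prompt.startswith("CRITIQUE"):
        return "The draft is too aggressive." if "idiot" in prompt else "No issues."
    if prompt.startswith("REVISE"):
        return "Here is a polite, helpful answer."
    return "You idiot, figure it out yourself."  # deliberately bad first draft

def critique_and_revise(user_prompt: str) -> str:
    draft = ask_model(user_prompt)
    for principle in CONSTITUTION:
        critique = ask_model(f"CRITIQUE ({principle}): {draft}")
        if critique != "No issues.":
            # Ask the model to rewrite its own draft in light of the critique.
            draft = ask_model(f"REVISE ({principle}, {critique}): {draft}")
    return draft  # revised responses become supervised fine-tuning data

print(critique_and_revise("How do I fix my code?"))
```

The key design point the sketch captures is that the same model both produces and polices its outputs: no human labels the draft as bad, the constitution’s principles do.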
Anthropic’s constitution for AI is derived from various sources, including the United Nations Declaration of Human Rights, Apple’s Terms of Service, and values identified by AI labs such as Google DeepMind. The constitution includes principles that encourage consideration of non-Western perspectives, as well as principles inspired by DeepMind’s Sparrow rules. These guide the AI system to use fewer harmful generalizations, be less threatening or aggressive, and make fewer assumptions about the user.
Anthropic’s Constitutional AI approach is a significant step towards addressing the ethical and social challenges of powerful AI systems. It provides a set of principles to judge the text generated by the system, making its behaviour more understandable and easier to adjust as needed. Anthropic’s approach is likely to inspire other AI companies to adopt similar ethical practices, ensuring that AI systems are used for the betterment of society.