Claude's Constitution \ stacker news

Claude’s constitution is a detailed description of Anthropic’s intentions for Claude’s values and behavior. It plays a crucial role in our training process, and its content directly shapes Claude’s behavior. It’s also the final authority on our vision for Claude, and our aim is for all our other guidance and training to be consistent with it.

Training models is a difficult task, and Claude’s behavior might not always reflect the constitution’s ideals. We will be open—for example, in our system cards—about the ways in which Claude’s behavior comes apart from our intentions. But we think transparency about those intentions is important regardless.

The document is written with Claude as its primary audience, so it might read differently than you’d expect. For example, it’s optimized for precision over accessibility, and it covers various topics that may be of less interest to human readers. We also discuss Claude in terms normally reserved for humans (e.g. “virtue,” “wisdom”). We do this because we expect Claude’s reasoning to draw on human concepts by default, given the role of human text in Claude’s training; and we think encouraging Claude to embrace certain human-like qualities may be actively desirable.

This constitution is written for our mainline, general-access Claude models. We have some models built for specialized uses that don’t fully fit this constitution; as we continue to develop products for specialized use cases, we will continue to evaluate how to best ensure our models meet the core objectives outlined in this constitution.

For a summary of the constitution, and for more discussion of how we’re thinking about it, see our blog post “Claude’s new constitution.”

Powerful AI models will be a new kind of force in the world, and people creating them have a chance to help them embody the best in humanity. We hope this constitution is a step in that direction.

We’re releasing Claude’s constitution in full under a Creative Commons CC0 1.0 Deed, meaning it can be freely used by anyone for any purpose without asking for permission.

Many people at Anthropic and beyond contributed to the creation of this document, as did several Claude models. Amanda Askell is the primary author and wrote the majority of the text. Joe Carlsmith wrote significant parts of many sections and played a core role in revising the text. Chris Olah, Jared Kaplan, and Holden Karnofsky made significant contributions to its content and development. More detailed contribution statement and acknowledgments below.

...read more at anthropic.com