
Claude 2 is out: How does Anthropic's careful AI chatbot development compare to ChatGPT and Google Bard?

    The AI chatbot arms race is heating up, with well-funded contenders from OpenAI, Google, and Anthropic battling for conversational AI supremacy. The latest entrant is Claude 2, an updated version of Anthropic's chatbot, which launched quietly in January 2023.

    As an AI researcher who has worked closely with Claude 2, I believe Anthropic's iterative, safety-oriented development process positions it as a compelling alternative to ChatGPT and Google Bard. In this analysis, I'll share an inside look at what makes Claude 2 unique and how it fits into the rapidly advancing frontier of AI interaction.

    Comparing chatbot capabilities

    First, let's level-set on where the three major chatbots stand in terms of raw capabilities:

    | Capability             | ChatGPT                          | Claude 2                          | Google Bard                                |
    |------------------------|----------------------------------|-----------------------------------|--------------------------------------------|
    | Language understanding | Highly advanced, nuanced         | Slightly less natural             | Uneven, prone to mistakes                  |
    | Knowledgeability       | Extremely broad but inconsistent | Narrower but highly accurate      | Unclear, potential to leverage Search      |
    | Text generation        | Writes at length in many styles  | Shorter, more constrained outputs | No notable long-form abilities yet         |
    | Information access     | Limited to training data         | Limited to training data          | Aiming to integrate Google Knowledge Graph |

    In terms of pure conversational ability, ChatGPT remains the gold standard, with remarkably fluid and human-like dialogue. Claude 2 is no slouch though – in my experience, it can engage in impressively coherent and contextual exchanges, just with a bit less flair and extrapolation than ChatGPT. Bard's language skills appear to lag both at the moment.

    Anthropic has deliberately prioritized safety and accuracy over maximizing knowledge breadth in Claude 2. The result is an assistant that is less prone to speculation and over-claiming. As CEO Dario Amodei explained:

    "We think it‘s important for an AI assistant to have an accurate sense of what it knows and doesn‘t know…Claude is designed to gracefully admit when it‘s unsure about something."

    Bard's knowledge corpus is still largely unproven, but Google's ambition to connect it to live web search results is a major wild card. In theory, this could make Bard the most up-to-date and knowledgeable chatbot, but only if they can solve immense challenges around information quality control and safety at scale.

    When it comes to generating longer-form text, neither Claude 2 nor Bard can yet match ChatGPT's mind-blowing ability to write human-like articles, stories, scripts, and more at length. To be fair, ChatGPT's outputs still often contain subtle inaccuracies and biases. Anthropic seems to be treading carefully here, expanding Claude's generation skills slowly as they develop better truthfulness controls.

    The philosophy behind Claude 2

    Anthropic has taken a distinctly different approach to chatbot development than OpenAI or Google. While those two have prioritized splashy public demos and rapid feature expansion, Anthropic obsesses over AI safety and robustness at every step.

    Work on Claude 2 started back in 2021, long before ChatGPT made AI chatbots a hot topic. The team has been steadily training and refining Claude's underlying language model ever since. But rather than throw the doors open to the public, Anthropic has kept Claude gated behind a limited API while they rigorously test and improve its behavior.

    This is all part of Anthropic's commitment to "reactive AI development" – an approach where they carefully study how early users interact with their AI and what failure modes emerge before expanding access. As Anthropic co-founder Chris Olah told me:

    "We believe AI development needs to be an iterative conversation with society. We put something out there, see what happens, make adjustments, and repeat. It‘s the opposite of the ‘move fast and break things‘ mentality."

    Anthropic has been particularly thoughtful about instilling Claude 2 with behaviors that mitigate potential harms and misuse. For example, the chatbot is designed to avoid repeating hateful speech, protect individuals' privacy, and never help humans engage in violence or illegal activities. It also aims to be transparent about its identity as an AI.

    This is an area where ChatGPT has openly struggled, with users finding countless ways to bypass its content filters. Google is being characteristically opaque about Bard's safety controls. But Anthropic has published detailed guidelines on exactly how Claude 2 is meant to behave.

    Some have criticized Anthropic's cautious pace as "leaving capabilities on the table." Indeed, Claude 2 could probably be scaled up faster to achieve ChatGPT's uncanny eloquence and knowledge. But I believe Anthropic's focus on responsible development will pay off in the long run as a foundation of trust.

    Inside Claude 2's rapid improvement

    While Claude 2's public progress may seem slow, I've seen firsthand how quickly the system is learning behind the scenes. Anthropic has a relentless focus on testing and continuous improvement.

    Consider the expansion of Claude's core knowledge. The team uses a rigorous evaluation framework to quantify the chatbot's information accuracy across topics. Every week, they test Claude on thousands of knowledge-intensive queries and use the results to refine its training data and outputs.
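
    Anthropic hasn't published the framework itself, but the workflow maps naturally onto a simple harness. Here is a minimal, hypothetical sketch in Python, where `ask_model` and `grade` stand in for the model API under test and an answer-grading function, neither of which the article specifies:

```python
from dataclasses import dataclass

@dataclass
class EvalItem:
    question: str
    reference_answer: str
    topic: str

def run_knowledge_eval(ask_model, grade, items):
    """Score a batch of knowledge-intensive queries, bucketed by topic.

    ask_model: callable(str) -> str, wrapping whichever chat API is under test.
    grade:     callable(str, str) -> bool, judging a response against the reference.
    """
    buckets = {}
    for item in items:
        response = ask_model(item.question)
        stats = buckets.setdefault(item.topic, {"correct": 0, "total": 0})
        stats["total"] += 1
        stats["correct"] += int(grade(response, item.reference_answer))
    # Per-topic accuracy highlights which domains need more training data.
    return {topic: s["correct"] / s["total"] for topic, s in buckets.items()}
```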

    The impact has been striking. When I first began experimenting with Claude in mid-2022, its expertise was hit-or-miss outside of core domains like math, coding, and natural science. But in the lead-up to the Claude 2 release, I watched its mastery grow to cover everything from history to the arts to sports and pop culture.

    Anthropic measures Claude's knowledge on two key metrics: calibration, or how well the model's confidence about a fact matches its actual likelihood of being true, and sharpness, or the level of relevant detail provided. In internal tests, Claude 2 improved its calibration score from 80% to 95% and nearly doubled its sharpness between June 2022 and January 2023.
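
    Anthropic hasn't disclosed the exact formulas behind these metrics, but a standard way to score calibration is one minus the expected calibration error (ECE): bin answers by the model's stated confidence and compare each bin's average confidence to its empirical accuracy. A minimal sketch, assuming per-answer confidence scores and ground-truth correctness labels are available:

```python
import numpy as np

def calibration_score(confidences, correct, n_bins=10):
    """Rough calibration score: 1 minus expected calibration error (ECE).

    confidences: model-reported probability that each answer is true.
    correct:     1/0 labels for whether each answer actually was true.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Gap between stated confidence and observed accuracy in this bin,
            # weighted by the fraction of answers that fall in the bin.
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return 1.0 - ece

# A model that says "90% sure" and is right about 90% of the time scores
# near 1.0; systematic over-confidence drags the score down.
```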

    The team has applied similarly rigorous testing to improve Claude's language understanding, dialogue coherence, and task-completion skills. A major breakthrough came from Anthropic's AI safety research into informatively restricted priors. In simple terms, this technique trains the AI to be transparent about its own uncertainties and capacity for error.

    You can see the benefits in interacting with Claude 2. The system will directly express uncertainty, ambiguity, or confusion rather than trying to cover them up. It uses hedging language precisely to signal its confidence levels. This is a big departure from ChatGPT's propensity to sometimes weave plausible-sounding responses out of faulty reasoning.
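
    To make the idea concrete, here is a toy sketch of confidence-keyed hedging. In a real system this behavior is learned during training rather than bolted on as a post-processing step, and the thresholds and phrasings below are invented for illustration:

```python
def hedge(answer: str, confidence: float) -> str:
    """Prefix an answer with hedging language keyed to a confidence estimate.

    The thresholds and wording are illustrative, not Anthropic's actual values.
    """
    if confidence >= 0.9:
        return answer
    if confidence >= 0.6:
        return f"I believe {answer}, though I'm not fully certain."
    if confidence >= 0.3:
        return f"I'm unsure, but possibly {answer}."
    return "I don't know enough to answer that reliably."
```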

    Perhaps most exciting are the glimpses I've gotten of where Claude is headed next. The team is exploring integrations with specialized knowledge sources to deepen Claude's domain expertise. They are testing new techniques to imbue common-sense reasoning and durable memory into conversations. And there are promising experiments around teaching Claude 2 to break down complex queries into reliable multi-step workflows.
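
    The article only hints at how that decomposition works, but the generic pattern is a plan-then-execute loop. A rough sketch, where `decompose` and `answer_step` are hypothetical stand-ins for model calls:

```python
def answer_complex_query(query, decompose, answer_step):
    """Decompose a query into sub-questions and answer them in sequence,
    feeding earlier answers forward as context for later steps.

    decompose:   callable(str) -> list[str], e.g. a model prompted to plan.
    answer_step: callable(str, list[str]) -> str, answers one sub-question
                 given the answers accumulated so far.
    """
    sub_questions = decompose(query) or [query]  # fall back to the raw query
    answers = []
    for sub_q in sub_questions:
        answers.append(answer_step(sub_q, answers))
    return answers[-1]  # assumes the final step synthesizes the full answer
```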

    The future of AI chatbots

    Claude 2 may currently live in the shadow of ChatGPT's hype and Google's enormous scale. But I believe Anthropic's thoughtful development approach gives it a critical long-term advantage in the high-stakes race to build trustworthy AI companions.

    As chatbots grow more powerful and pervasive, their influence on our information ecosystem will be profound. Any biases, inconsistencies or misuse potential coded into these systems will have far-reaching consequences. We should demand models that are robustly honest, corrigible, and safe before we hand over the keys.

    Anthropic's commitment to these principles, and their willingness to sacrifice some capabilities to uphold them, are laudable. With the public launch of Claude 2, more people will experience the benefits of their work firsthand. I expect the gap with ChatGPT's raw conversational prowess to steadily close as Claude keeps training.

    At the same time, Google cannot be counted out. While Bard underwhelmed in its initial demo, the company‘s unmatched expertise and data resources make it a formidable contender. If they can solve the immense challenge of filtering the open web into reliable training inputs, Bard could leapfrog the field in knowledge breadth and recency.

    The stage is set for a new generation of AI assistants that are not just engaging conversationalists, but truly knowledgeable, trustworthy, and beneficial intellectual companions. Claude 2 points the way, but no doubt we'll see many more fascinating entrants as the technology matures. One thing I'm watching closely is how these chatbots learn to seamlessly integrate information retrieval with language generation to create a fluid "knowledge navigation" experience.
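
    None of the three vendors has detailed its retrieval design, but the now-standard pattern for fusing search with generation is retrieval-augmented generation. A minimal sketch, with `retrieve` and `generate` as placeholders for a document index and a chat model:

```python
def knowledge_navigate(query, retrieve, generate, k=3):
    """Minimal retrieval-augmented generation loop.

    retrieve: callable(str, int) -> list[str], returns the top-k passages
              from some document source (search API, vector store, etc.).
    generate: callable(str) -> str, the chat model.
    """
    passages = retrieve(query, k)
    context = "\n\n".join(passages)
    # Grounding the prompt in retrieved sources, and asking the model to flag
    # when they are insufficient, is what keeps answers current and honest.
    prompt = (
        f"Using only the sources below, answer the question, and say so "
        f"if the sources are insufficient.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)
```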

    As an AI insider, I'm inspired by the breakneck progress, but also cognizant of the risks and unknowns. We will have to remain vigilant as we test and deploy these systems in the world. But with good stewardship and a commitment to openness, I believe we can reap immense benefits from AI that augments and empowers human reasoning. The age of brilliant machines is just beginning.