Claude 3.5 Sonnet from Anthropic outperforms GPT-4o in most benchmarks

Anthropic has introduced Claude 3.5 Sonnet, a mid-tier model that the company says outperforms competing models, including GPT-4o, and also surpasses Anthropic’s current top-tier model, Claude 3 Opus, across a wide range of evaluations.

Claude 3.5 Sonnet is now freely accessible on Claude.ai and the Claude iOS app, with higher rate limits for subscribers of Claude Pro and Team plans. It’s also available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI, priced at $3 per million input tokens and $15 per million output tokens, with a 200K token context window.
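To make the pricing concrete, here is a minimal Python sketch that turns the quoted per-million-token rates into a cost estimate for a single request. The function name `estimate_cost_usd` and the constants are illustrative and not part of any Anthropic SDK; the sketch assumes the $3 and $15 rates quoted above and ignores any discounts or provider-specific pricing.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("3")
OUTPUT_PRICE_PER_MTOK = Decimal("15")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_MTOK * input_tokens / TOKENS_PER_MTOK
    output_cost = OUTPUT_PRICE_PER_MTOK * output_tokens / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A prompt that fills the 200K context window, plus a 1,000-token reply.
    print(estimate_cost_usd(200_000, 1_000))
```

Output tokens cost five times as much as input tokens at these rates, so long responses dominate the bill much faster than long prompts do.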

Anthropic asserts that Claude 3.5 Sonnet “sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).” According to the company, the model is better at grasping nuance, humor, and complex instructions, and it writes high-quality content in a natural tone.

Operating at twice the speed of Claude 3 Opus, Claude 3.5 Sonnet is well-suited for complex tasks such as context-sensitive customer support and multi-step workflow orchestration. In an internal agentic coding evaluation, it solved 64% of problems, significantly outperforming Claude 3 Opus, which solved 38%.

Additionally, the model demonstrates improved vision capabilities, outperforming Claude 3 Opus on standard vision benchmarks. This improvement is particularly evident in tasks requiring visual reasoning, such as interpreting charts and graphs. Claude 3.5 Sonnet can accurately transcribe text from imperfect images, a valuable feature for industries like retail, logistics, and financial services.
