Snowflake Agentic AI Beats Claude Code on Its Own Benchmark: What That Means

Snowflake’s AI Pulse showcase today gives enterprise data teams their clearest look yet at CoWork and CoCo, the company’s pair of agentic AI products that debuted at Summit 26 in San Francisco two weeks ago. The event’s timing is not accidental: Databricks concluded its own Data + AI Summit in San Francisco this week with a direct counterproposal to Snowflake’s approach, and the two platforms are now competing for the same enterprise architecture decision with sharply different answers. The central question — whether enterprise AI agents work better embedded inside your data governance perimeter or operating freely across it — will likely define how data teams make platform commitments for the next several years.

CoWork (formerly Snowflake Intelligence) is a personal AI work agent for knowledge workers. It queries governed Snowflake data in natural language, produces multi-step cited research reports through a Deep Research capability, publishes reusable team dashboards as Artifacts, and takes action across Gmail, Slack, Jira, and Salesforce through Model Context Protocol connectors. CoWork runs on Anthropic’s Claude as its primary reasoning backbone.

Snowflake CoCo (formerly Cortex Code) targets data engineers, analytics engineers, and ML builders. It operates through a four-step agentic loop: it parses a natural language request, checks the catalog and role-based access controls, selects the right native tool — Cortex Analyst for SQL generation, Cortex Search for semantic object lookup, or lineage metadata — then executes and summarizes. The key technical distinction from general-purpose coding agents is what CoCo uses instead of bash-based fallbacks: native Snowflake tooling that already knows your schemas, your column-masking policies, and your row-level security. An agent evaluating a revenue figure without knowing whether your organization defines revenue as gross or net can produce a technically correct answer that is operationally wrong; CoCo’s native harness is designed to prevent exactly that class of error.
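The four-step loop described above can be sketched in a few lines of Python. This is an illustration only, not Snowflake's implementation: `Catalog`, `select_tool`, and `run_agent` are hypothetical names, the keyword-based tool selection is a toy stand-in for whatever routing CoCo actually uses, and the execution step is stubbed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# All names below are hypothetical; none of this is a Snowflake API.
@dataclass
class Catalog:
    # Maps a role name to the set of tables that role may read.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def allowed(self, role: str, table: str) -> bool:
        return table in self.grants.get(role, set())


def select_tool(request: str) -> str:
    """Step 3: pick a tool from keywords in the request (toy heuristic)."""
    text = request.lower()
    if "lineage" in text or "upstream" in text:
        return "lineage"
    if "find" in text or "search" in text:
        return "search"
    return "sql"


def run_agent(request: str, role: str, table: str, catalog: Catalog) -> str:
    # Step 1: parse the request (here, just normalize whitespace).
    parsed = " ".join(request.split())
    # Step 2: check the catalog and role-based access before doing anything.
    if not catalog.allowed(role, table):
        return f"denied: role {role} cannot read {table}"
    # Step 3: select the tool.
    tool = select_tool(parsed)
    # Step 4: execute (stubbed) and summarize.
    return f"ran {tool} tool on {table} for: {parsed}"
```

The point of the ordering is that the access check in step 2 runs before any tool is chosen or executed, so a request from a role without grants never reaches the data.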

What Makes CoCo Different From Claude Code and Codex

On a data-engineering benchmark called ADE-Bench — created by Benn Stancil of Mode Analytics in collaboration with dbt Labs and published in January 2026 — Snowflake reports CoCo achieved a 72.1% pass rate, compared to 65.1% for both Anthropic’s Claude Code and OpenAI’s Codex. Snowflake also reports CoCo completed tasks using 51% fewer tokens and 8% faster than Claude Code on the same runs.

ADE-Bench is a harder, more specific test than general coding benchmarks: tasks run inside Docker containers against messy, real-world dbt projects with deliberately vague prompts, and success is measured by whether the dbt tests pass — not by syntactic correctness. That specificity matters for data engineering work, where business logic embedded in schema design determines whether a technically valid query is operationally correct.

Two caveats matter for enterprise procurement. First, no independent third-party replication of Snowflake’s ADE-Bench run had been published as of this writing. A governance-native agent evaluated on the same harness that provides its claimed advantage is structurally positioned to score higher than externally connected competitors: CoCo reads RBAC, column masking, and row-level security natively per session role, while Claude Code and Codex access the same data through external connectors — a different test condition, not merely a capability gap. Research on AI benchmark reliability documents a systematic tendency for vendor-run scores to exceed independently replicated ones, with enterprise agentic systems showing roughly a 37% gap between lab and production performance. Second, the live ADE-Bench leaderboard maintained by dbt Labs shows Altimate Code, a competing data-engineering harness running on Claude Sonnet 4.6, achieving a 74.4% pass rate on Snowflake — above Snowflake’s own CoCo figure. Both figures should be treated as directional signals pending independent replication under equivalent conditions.

How Does Cortex Sense Work, and Why Is Its Preview Status Important?

At the core of Snowflake’s accuracy narrative is Cortex Sense, a runtime context enrichment layer announced at Summit 26. The mechanism: at query time, it automatically assembles a shared semantic substrate from query history, object metadata, BI dashboard definitions (Power BI, Tableau), and Horizon Context semantic views, then injects business definitions — what “revenue” means in your organization, which customer segments a given query should exclude — into every agent response without manual setup.
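As a rough mental model of that mechanism, the sketch below merges term definitions from several sources and prepends the relevant ones to a question. It is an assumption-laden toy: the function names, the dictionary format, and the rule that later sources override earlier ones are all hypothetical, and nothing here reflects how Cortex Sense is actually built.

```python
from __future__ import annotations


def build_context(sources: list[dict[str, str]]) -> dict[str, str]:
    """Merge term definitions from several sources.

    Assumption for this sketch: later sources override earlier ones.
    """
    merged: dict[str, str] = {}
    for source in sources:
        merged.update(source)
    return merged


def enrich(question: str, definitions: dict[str, str]) -> str:
    """Prepend the definitions of any terms mentioned in the question."""
    lowered = question.lower()
    relevant = {t: d for t, d in definitions.items() if t.lower() in lowered}
    if not relevant:
        return question
    lines = [f"- {term}: {text}" for term, text in sorted(relevant.items())]
    return "Business definitions:\n" + "\n".join(lines) + "\n\nQuestion: " + question
```

For example, merging a query-history source that defines revenue as gross bookings with a semantic-view source that defines it as net of refunds would, under the override assumption, hand the agent the net definition.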

Snowflake’s internal testing found that CoWork and CoCo achieved an 83% accuracy rate on complex enterprise queries with Cortex Sense active, compared to 47% without it, and 23% for frontier coding agents using only Snowflake’s MCP connector. That three-way comparison illustrates the governance-native architecture’s claimed advantage most clearly.
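To make the size of those gaps concrete, the short script below works through the arithmetic on the three reported figures. The numbers are Snowflake's own, as quoted above; the dictionary keys and helper names are made up for this example.

```python
# Snowflake-reported accuracy on complex enterprise queries, in percent.
reported = {
    "with_cortex_sense": 83,
    "without_cortex_sense": 47,
    "frontier_agent_via_mcp": 23,
}


def point_gap(a: str, b: str) -> int:
    """Percentage-point difference between two reported configurations."""
    return reported[a] - reported[b]


def relative_ratio(a: str, b: str) -> float:
    """How many times higher configuration a scores than b, to 2 decimals."""
    return round(reported[a] / reported[b], 2)


print(point_gap("with_cortex_sense", "without_cortex_sense"))       # 36
print(point_gap("with_cortex_sense", "frontier_agent_via_mcp"))     # 60
print(relative_ratio("with_cortex_sense", "without_cortex_sense"))  # 1.77
```

On these figures, the context layer accounts for 36 of the 60 percentage points separating the best and worst configurations; the remaining 24 points separate Snowflake's agents without Cortex Sense from frontier agents on the MCP connector.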

Cortex Sense is in private preview as of June 2026. That matters because generally available CoWork deployments currently run without it: the context layer behind Snowflake’s headline accuracy number is not yet production-ready. The 83% accuracy figure describes a future state, not the current shipping product. David Linthicum, writing for CIO, called Cortex Sense a capability that “could improve consistency, reduce the risk of hallucinations, and make AI outputs more operationally useful,” language that appropriately reflects its preview status.

Dion Hinchcliffe of Futurum Group noted a longer-term concern for CIOs evaluating the platform: embedding business semantics, workflow intelligence, and agent skills into a single vendor’s orchestration layer could make switching as strategically costly as data lock-in once was. “Enterprises are likely entering an era where semantic lock-in may become as strategically important as data lock-in once was,” Hinchcliffe said.

Snowflake vs. Databricks: What Each Platform Is Betting On

Databricks wrapped its Data + AI Summit this week with announcements that frame the competition directly. Genie One, Databricks’ new agentic coworker for business teams, targets the same knowledge-worker audience as CoWork. Genie Ontology, a self-improving semantic context engine, is Databricks’ answer to Cortex Sense. Genie Code expands its data-engineering agent capabilities. Databricks CEO Ali Ghodsi characterized the platform’s differentiator as openness and avoidance of lock-in — an implicit rebuttal to Snowflake’s tighter governance perimeter model.

Mike Leone, principal analyst at Moor Insights and Strategy, identified the strategic tension: Power BI, Tableau, productivity copilots, and now both Snowflake and Databricks are all converging on the same “agent surface for knowledge workers,” making it genuinely difficult for enterprises to distinguish durable platform differentiation. SiliconAngle’s research team noted that Snowflake “is not yet a full System of Intelligence. It has strong pieces of the foundation and credible early moves into context and agent action.”

Linthicum offered the practical qualifier for any enterprise considering a platform decision: “if a company’s governed data and policy logic already live in Snowflake, CoWork may become the natural choice — otherwise, incumbent analytics and productivity platforms still have serious distribution and workflow advantages.”

Batch Inference and Adaptive Compute: What Shipped This Week

Today’s AI Pulse session also covers Snowflake ML’s agentic capabilities for data science teams, a walkthrough of optimizing Snowpark batch inference workflows, and Snowflake Adaptive Compute, which reached general availability on AWS this week.

Snowpark Container Services now supports GPU-backed batch inference as dedicated distributed jobs, designed for multimodal data including images and audio. The architecture consolidates inference pipelines into a single API call with automatic resource deallocation on completion. Adaptive Compute automatically selects and right-sizes compute resources across an account without manual warehouse configuration — a response to the unpredictable demands that combined AI and analytics workloads generate.

What Does the Natoma Acquisition Mean for Enterprise Security?

Snowflake’s pending acquisition of Natoma, signed May 27, addresses the security surface that MCP connectivity opens. Phil Fersht of HFS Research stated the concern directly: “CIOs should be wary of treating MCP as a plug-and-play miracle. Agents can pull context from email, Slack, CRM, and internal systems, but that also means they can expose sensitive information, trigger the wrong action, or bypass established workflow controls if policies are weak.”

Natoma’s platform acts as a centralized MCP gateway that enforces identity, policy, and audit at the individual tool-call level: before an agent takes an action, the platform verifies who requested it, what permissions apply, and whether the action falls within policy. The pending deal brings a verified library of 100+ MCP servers with governance controls for each connector. Financial terms were not disclosed; the acquisition remains subject to customary closing conditions.
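The per-tool-call pattern described above (identify the requester, check policy, record the decision, then act) can be sketched as follows. This is a generic illustration and not Natoma's product or API: the `Gateway` class, its `(user, tool)` policy model, and the audit record fields are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Gateway:
    # Hypothetical policy model: the set of (user, tool) pairs that are permitted.
    policy: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def call(self, user, tool, action, handler):
        """Check policy for one tool call, audit it, then run it if allowed."""
        allowed = (user, tool) in self.policy
        # Every attempt is logged, including denied ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "tool": tool,
            "action": action,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not call {tool}")
        return handler(action)
```

Because the audit entry is written before the policy decision is enforced, denied attempts leave the same trail as permitted ones, which is what lets a team answer who requested an action and what it touched.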

Snowflake’s architecture gives data teams an answer to who authorized a given agent action and what it touched, the accountability question analysts identify as the central unresolved challenge of the agentic enterprise. It does not settle whether poor underlying data will still produce confident but operationally wrong outputs. That depends entirely on the semantic definitions that Cortex Sense and Horizon Context inherit from each customer’s data estate, regardless of the governance perimeter around them.

Frequently Asked Questions

What is the difference between Snowflake CoCo and general AI coding agents like Claude Code?

CoCo runs inside Snowflake’s governance perimeter and reads your role-based access controls, column-masking policies, data lineage, and schema definitions natively before generating any code. General-purpose agents like Claude Code can be pointed at the same data through external connectors, but must be separately wired to those governance controls. The practical difference is that CoCo knows a given revenue figure excludes free-tier activity because that definition exists in the Horizon Context semantic layer; Claude Code does not know that without explicit instruction. On ADE-Bench, Snowflake reports CoCo scored 72.1% versus 65.1% for Claude Code, though no independent third-party replication of those figures has been published, and the live leaderboard shows at least one competing harness scoring higher.

How does Snowflake CoWork compare to Microsoft Copilot and Databricks Genie One?

All three products target knowledge workers who want to query business data and take action in natural language. CoWork’s argument is that it operates inside the same governance perimeter as your Snowflake data, so access controls, data masking, and lineage apply automatically. Microsoft 365 Copilot and Databricks Genie One each offer broader productivity-suite integration and, in Databricks’ case, greater openness to reduce platform lock-in. The governance advantage applies most clearly to organizations whose data estate already lives in Snowflake; organizations running multi-platform data estates will find that advantage narrower than Snowflake’s benchmark comparisons imply.

What is Cortex Sense, and is it available now?

Cortex Sense is a runtime context enrichment layer that automatically assembles business definitions — revenue formulas, fiscal calendars, customer segment rules — from query history, metadata, BI dashboards, and semantic views, then injects that context into every CoWork and CoCo response without manual setup. Snowflake’s internal testing found it raises accuracy on complex enterprise queries from 47% to 83%. As of June 2026, Cortex Sense is in private preview only. Standard CoWork and CoCo deployments currently operate at the lower accuracy baseline without it.

Can data-quality problems undermine CoWork and CoCo even with governance controls in place?

Yes. Governance controls enforce who can access what data; they do not certify that the data is accurate, consistently defined, or up to date. An agent querying a revenue table with an upstream ingestion error will produce a confident but wrong answer regardless of whether RBAC is enforced. As analyst Kihara Kimachia wrote after Summit 26: “Poor data governance will make these AI colleagues confidently wrong at scale.” The Cortex Sense layer reduces semantic errors — wrong metric definitions — but the underlying data quality problem requires the same governance investment it always has.
