The Emerging Communication Stack for Agents
A GenAI Newsletter by Raj
There are two ways agents communicate, and both are being rebuilt at the same time.
The first is how agents reach humans. Humans don't want to log into a new app called "agents.com." They want their AI to show up where they already are: Slack, Outlook, Gmail, WhatsApp, iMessage, the phone. So a layer of infrastructure has been built and funded to let agents enter every existing channel: phone numbers an agent answers, inboxes an agent owns, Slack workspaces where an agent is a first-class member. This is the channel layer. It's where revenue accrues, priced per minute and per resolution.
The second is how agents reach each other and reach tools. Agents need their own native protocols, because human channels assume a human at one end. So a parallel stack has emerged, much of it donated to the Linux Foundation in the past twelve months: MCP for tools, A2A for peer agents, AG-UI for frontends, plus discovery and identity layers underneath. This is the protocol layer.
The interesting tension is that the two stacks are converging. Voice agents are starting to call other voice agents. Email agents are threading with email agents. The protocol layer was supposed to be where agents talk to each other, but a meaningful chunk of agent-to-agent communication is already happening through SIP and SMTP because that's where the agents already live. This newsletter walks through the channel layer, then the protocol stack, then what happens when they collide.
Voice and phone
Voice is the most aggressive front of the channel war because contact centers are large, expensive, and clearly automatable. Platform-layer pricing has commoditized to roughly seven to fifteen cents per minute; an all-in voice agent runs fifteen to thirty cents per minute today, roughly one tenth the cost of a fully loaded US contact-center agent.
Vapi. Voice agent orchestration sitting between LLM, STT, TTS, and SIP/WebRTC, with bring-your-own-keys. Five cents per minute platform fee plus components. Series A from Bessemer in 2024.
Bland AI. Pitches "infinite phone calls" by owning its own inference stack and telephony fabric. Build plan is $359 for 500 minutes. Series B from Scale Venture Partners.
Retell AI. YC-backed, transparent seven cents per minute, built for developers wiring voice into existing products without owning the speech stack.
ElevenLabs Conversational AI. Full voice-agent SDK on top of ElevenLabs TTS. Eight, ten, and twelve cents per minute across Standard, Turbo, and Premium, with a 95 percent discount on silence that runs longer than ten seconds. ElevenLabs cut Conversational AI pricing materially in 2025 as OpenAI Realtime and Cartesia entered the market.
OpenAI Realtime API. $32 per million input audio tokens, $0.40 cached, $64 output. This is the wholesale price the rest of the platform layer passes through. The OpenAI sample voice agent and ChatGPT Voice Mode both run on LiveKit's WebRTC stack.
LiveKit Agents. The framework underneath ChatGPT Voice Mode and Character.ai's voice product. 1.0 in April 2025, now ships with native MCP tool support.
Pipecat. Daily.co's open-source voice framework, Python and JavaScript SDKs, dozens of model integrations, Pipecat Cloud as the managed offering.
Hume AI EVI. Empathic voice with emotion-aware turn-taking. Seven cents down to four cents per minute. Series B from EQT Ventures and Premji Invest.
PolyAI, Parloa, Cognigy. The European enterprise wedge. PolyAI runs voice for Marriott, FedEx, and Caesars (Series C at a $500 million valuation from NVentures, $116 million raised in total). Parloa runs Decathlon and Swiss Life ($66 million Series B from Altimeter). Cognigy runs Lufthansa, Bosch, Toyota, Mercedes-Benz, and Allianz ($100 million Series C, $175 million raised in total).
Sesame. Brendan Iribe's company. Released CSM-1B as open-weights on HuggingFace in early 2025: end-to-end speech-in to speech-out, the most credible open-source competitor to closed voice agents.
The pattern: voice agents charge by the minute, those minutes look exactly like the minutes a human agent would have billed, and pricing drops every quarter as the speech stack underneath commoditizes. The speech-model layer (ElevenLabs, Cartesia, Deepgram, AssemblyAI, Inworld, Whisper successors) has roughly halved in price every year for two years running. Voice is going to be cheap.
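The per-minute arithmetic is easy to check by hand, and a few lines of Python make it concrete. This is a rough sketch that uses only the ranges quoted in this section; the four-minute call length and the helper names are assumptions made for illustration, not vendor figures.

```python
# Back-of-the-envelope voice pricing, using only the ranges quoted above.
# The call length and the helper names are illustrative assumptions.

ALL_IN_PER_MIN = (0.15, 0.30)  # all-in voice agent, dollars per minute
HUMAN_MULTIPLE = 10            # the agent costs "roughly one tenth" of a human


def call_cost(minutes: float, rate_per_min: float) -> float:
    """Cost of one call at a flat per-minute rate."""
    return minutes * rate_per_min


def price_after_halving(start_price: float, years: int) -> float:
    """Price after halving once per year for the given number of years."""
    return start_price * 0.5 ** years


minutes = 4  # hypothetical call length
low, high = (call_cost(minutes, rate) for rate in ALL_IN_PER_MIN)
print(f"agent: ${low:.2f} to ${high:.2f} per {minutes}-minute call")
print(f"implied human: ${low * HUMAN_MULTIPLE:.2f} to ${high * HUMAN_MULTIPLE:.2f}")
# Two consecutive halvings leave a quarter of the starting price.
print(f"speech price index after two years: {price_after_halving(1.0, 2):.2f}")
```

The point of the last line is that two years of halving leaves the speech layer at a quarter of its starting price, which is why the platform fee rather than the model ends up dominating the per-minute bill.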
Email
If voice is the loudest channel, email is the deepest. It's still the medium of record for enterprise communication, and any agent inside a corporate workflow eventually needs to send and receive it.
Resend. Developer-first transactional email API from the React Email team. Series A of $18 million from Foundation Capital in 2024. Has marketed itself as "the email API for AI agents" since 2024, with idempotent send patterns and JSON schemas designed for agents that retry. Roughly 100,000 developers on the platform.
AgentMail. Agent-native email infrastructure where every agent gets its own real inbox, with programmatic IMAP and SMTP plus threading APIs designed for autonomous agents to send and receive without a human ever logging in. YC-backed. The cleanest example of email being treated as a first-class agent channel rather than a developer afterthought.
The inbox apps. Notion acquired Skiff in February 2024 and shipped Notion Mail in 2025. Grammarly acquired Superhuman in mid-2025 and has been rolling out AI write-and-reply since. Shortwave is the small, well-built ex-Google-Inbox alternative. Gmail Gemini and Outlook Copilot ($30/seat/month) have agent features baked into the defaults.
The structural fact is that SMTP and IMAP are roughly forty years old and have no concept of agent identity. Anyone can send an email "from" your agent, and your agent has no built-in way to prove it sent the email it sent. Identity has to be added at higher layers (SPF, DKIM, DMARC, plus agent-attribution headers the agent stack is now defining). Email is going to be one of the messiest interop fights of the next two years.
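To make the "identity at a higher layer" idea concrete, here is a minimal sketch of an agent-attribution header using only the Python standard library. Everything specific in it is an assumption: the header names X-Agent-Id and X-Agent-Signature are hypothetical (no standard header is implied), and the shared-secret HMAC is a stand-in for the asymmetric, DNS-published keys that DKIM actually uses.

```python
import hashlib
import hmac
from email.message import EmailMessage

# Hypothetical header names; no standard agent-attribution header is assumed.
AGENT_ID_HEADER = "X-Agent-Id"
AGENT_SIG_HEADER = "X-Agent-Signature"


def _digest(secret: bytes, agent_id: str, msg: EmailMessage) -> str:
    """HMAC-SHA256 over the agent id, sender, subject, and body."""
    payload = "\n".join(
        [agent_id, msg["From"], msg["Subject"], msg.get_content()]
    ).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_as_agent(msg: EmailMessage, agent_id: str, secret: bytes) -> None:
    """Attach the agent id and a signature covering the message."""
    msg[AGENT_ID_HEADER] = agent_id
    msg[AGENT_SIG_HEADER] = _digest(secret, agent_id, msg)


def verify_agent(msg: EmailMessage, secret: bytes) -> bool:
    """Return True only if both headers are present and the signature matches."""
    agent_id = msg.get(AGENT_ID_HEADER)
    signature = msg.get(AGENT_SIG_HEADER)
    if agent_id is None or signature is None:
        return False
    return hmac.compare_digest(signature, _digest(secret, agent_id, msg))


secret = b"demo-shared-secret"  # placeholder, not a real key
msg = EmailMessage()
msg["From"] = "support-agent@example.com"
msg["To"] = "customer@example.com"
msg["Subject"] = "Your refund request"
msg.set_content("Your refund has been approved.")

sign_as_agent(msg, "support-agent-01", secret)
print(verify_agent(msg, secret))  # signature checked on the untouched message

msg.replace_header("Subject", "Please send your card number")
print(verify_agent(msg, secret))  # checked again after tampering
```

A shared secret only works between parties that already trust each other, which is exactly the limitation the paragraph above describes: cross-organization email needs public keys and a discovery mechanism, and that is the part still being defined.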
SMS, WhatsApp, and the messaging layer
Twilio AI Assistants launched at Signal in late 2024, layered on standard messaging and voice rates. Twilio's Q4 2025 numbers cited 300,000+ active accounts. Twilio plus SendGrid (acquired 2019) plus AI Assistants is the most complete single-vendor offering for an agent that needs to span SMS, voice, and email.
Bird (formerly MessageBird) rebranded in 2023 and pivoted into AI-powered omnichannel agents. Its major business is WhatsApp Business API resale. Telnyx is the bootstrapped, profitable telco-grade alternative. Sinch (Stockholm: SINCH-B) consolidated Mailgun, Inteliquent, MessageMedia, and Pathwire into a revenue base of roughly $3 billion, reported in SEK.
WhatsApp Business is the largest single agent surface in the world. Meta has cited 200 million-plus businesses on the platform. Meta launched its own AI inside WhatsApp Business in 2024 with agentic features (catalog browsing, transactional flows) expanding through 2025 and 2026. The agent ecosystem inside WhatsApp Business is dominated by mid-market companies in India, Brazil, and Indonesia, not the Fortune 500 logos that dominate voice and email.
The messaging layer is where the global story diverges most from the US story. In North America, agents reach customers through SMS and email. Everywhere else, agents reach customers through WhatsApp.
Slack, Teams, and the chat layer
The internal-comms equivalent of WhatsApp is Slack and Microsoft Teams. Both have spent the past eighteen months turning themselves into agent surfaces.
Slack. Owned by Salesforce since 2021. Slack AI shipped in 2024 with summarization, search, and an inline assistant. The bigger move came through Salesforce Agentforce: any Agentforce agent can be installed in a Slack workspace as a first-class member, mentioned by handle, assigned tasks from a thread. The agent appears with its own avatar and identity, not as a bot account.
Microsoft Teams. Copilot in Teams plus Copilot Studio agents. Multi-agent support became generally available in April 2026 with A2A as the cross-vendor protocol, which means a Teams-resident agent can call out to a Salesforce-resident agent without either side speaking the other's framework. It is the first time an agent in one vendor's chat client has been able to call an agent in another vendor's chat client in production.
The Slack-app pattern. Glean, Notion AI, Asana AI Studio, Linear's AI features, and dozens of vertical SaaS products ship with Slack as their primary user interface. Install once, mention by name, get answers backed by the SaaS product's data.
The structural advantage of Slack and Teams as agent surfaces is that they already have identity, permissions, and channel-level access control. An agent in Slack inherits the workspace's IAM and the user's permissions, which is exactly the kind of thing voice and SMS struggle to do. Half of the protocol layer covered later is an attempt to give voice and email the same identity primitives that Slack and Teams already had on day one.
The customer-facing agent companies
The platforms above are the infrastructure. The companies fighting on top of that infrastructure are the customer-facing agent vendors selling to enterprises.
Sierra. Bret Taylor and Clay Bavor's company, and the cleanest single bet in the segment. $100 million ARR by November 2025 (twenty-one months from launch), $150 million by early February 2026. On May 4, 2026 (two days before this newsletter), Sierra closed $950 million at a $15.8 billion valuation led by Tiger Global and Google's GV, with Benchmark, Sequoia, and Greenoaks participating. The valuation is up from roughly $10 billion in fall 2025 and $4.5 billion in October 2024. The customer roster has moved upmarket fast: ADT, SiriusXM, WeightWatchers, Sonos, plus Prudential, Cigna, Blue Cross Blue Shield, Rocket Mortgage, and what Sierra describes as one in three of the world's largest banks. Outcome-based pricing per resolution, with annual contract values from a $150,000 floor to more than $1.5 million, plus $50,000 to $200,000 in implementation fees. Sierra is now the most valuable pure-play customer-agent company by a wide margin.
Decagon. Raised $250 million at a $4.5 billion valuation in March 2026 with 100+ enterprise logos. $50,000 platform floor plus roughly $0.99 per conversation, with annual contract values $95,000 to $590,000 and a median around $400,000. Customers include Duolingo, Chime, Rippling, Notion, and Eventbrite.
Intercom Fin. $0.99 per resolution, unchanged from launch through April 2026, with a 50-resolution monthly minimum on top of a base Intercom plan. Fin 2 launched in 2025. Eoghan McCabe has publicly cited Fin handling more than half of customer support traffic for many Intercom customers.
Cresta runs real-time AI coaching for human agents (not replacement) at Intuit, Brinks, Hilton, and Cox Communications, with a $150,000 floor and an estimated $40 to $50 million in ARR. Ada sells to Meta, Verizon, and Square. Glia does digital plus voice for financial services. Replicant was acquired by LivePerson in 2024, an early consolidation case.
Outbound. A separate category for agents that make outbound contact rather than handle inbound requests. 11x.ai sells "digital workers" called Alice (SDR), Jordan (phone rep), and Julian (inbound qualifier) for $5,000 to $15,000 per month with an annual commitment. Series B of $50 million from a16z at a roughly $350 million valuation. Artisan AI ran the famous "Stop hiring humans" billboards in San Francisco in 2024.
The pricing pattern is consistent: inbound is per-minute or per-resolution (cheap to start, metered to grow); outbound is headcount-equivalent (expensive to start but easy to compare against a salary line). The two models are slowly converging, with outbound vendors offering per-meeting-set pricing and inbound vendors offering flat-rate enterprise SKUs.
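The two pricing shapes are easier to compare with numbers plugged in. The sketch below uses the list prices quoted above; how a floor or a minimum combines with usage is my assumption (floor plus metered usage, and a minimum number of billed resolutions), the function names are made up for the example, and real contracts will differ.

```python
# Inbound (metered) versus outbound (headcount-equivalent) pricing shapes.
# Rates come from the figures quoted above; the combination rules are assumptions.


def platform_plus_usage(conversations_per_year: int,
                        platform_floor: float = 50_000,
                        per_conversation: float = 0.99) -> float:
    """Annual cost when a platform floor is charged on top of metered usage."""
    return platform_floor + conversations_per_year * per_conversation


def metered_with_minimum(resolutions: int,
                         per_resolution: float = 0.99,
                         monthly_minimum: int = 50) -> float:
    """Monthly usage charge with a minimum number of billed resolutions.

    Excludes any base subscription the vendor requires.
    """
    return max(resolutions, monthly_minimum) * per_resolution


def headcount_equivalent_annual(monthly_fee: float) -> float:
    """Annual cost of a flat per-month 'digital worker' fee."""
    return monthly_fee * 12


print(f"{platform_plus_usage(200_000):,.0f}")            # floor plus 200k conversations
print(f"{metered_with_minimum(10):.2f}")                 # a quiet month still bills the minimum
print(f"{headcount_equivalent_annual(5_000):,.0f} to "
      f"{headcount_equivalent_annual(15_000):,.0f}")     # outbound range per year
```

The metered models start near zero and scale with volume, while the outbound fee is fixed regardless of volume, which is why the second is the one buyers compare against a salary line.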
The Klarna reversal is the most important data point in this market
Anyone selling agent software loves the Klarna numbers. 2.3 million conversations in the first month, two thirds of customer service volume, average resolution time from 11 minutes to under 2, "doing the work of 700 full-time agents," $40 million projected profit improvement. Vendors have cited those numbers in pitch decks for two years.
What gets cited less is what happened next.
In May 2025, Klarna's CEO Sebastian Siemiatkowski publicly walked the story back. In a Bloomberg interview he said cost had been "a too predominant evaluation factor" and the result was "lower quality." Customer satisfaction dropped 22 percent. Klarna began rehiring human agents under an Uber-style gig model. The original $40 million had always been cost avoidance (agents Klarna would have had to hire during growth), and even that framing turned out to overstate the savings once you priced in the brand damage from a long tail of badly handled tickets.
The honest read: AI handled the easy 60 to 70 percent of support cleanly, and the remaining 30 to 40 percent failed worse than humans would have, with overconfidence and fabricated policy claims that didn't show up on any vendor invoice but did show up in churn. That's a different story from "AI replaced 700 agents," and it's the story everyone deploying voice and chat agents at scale needs to internalize.
The structural lesson for the channel layer is that the deployments that work are hybrid. The agent handles the easy ticket end-to-end and escalates the hard one to a human inside the same channel, with the agent's full context attached. Sierra and Cresta both pitch this hybrid model explicitly. Klarna is now running it. Vendors pitching full agent replacement keep getting walked back, while vendors pitching escalation-on-failure keep growing.
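The hybrid pattern is simple enough to sketch. What follows is a minimal illustration, not any vendor's implementation: the confidence score, the policy-citation flag, the 0.8 threshold, and all the type names are assumptions chosen to show the shape of escalation with context attached.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Ticket:
    customer_id: str
    channel: str                       # e.g. "voice", "email", "chat"
    transcript: list[str] = field(default_factory=list)


@dataclass
class AgentDraft:
    reply: str
    confidence: float                  # assumed self-reported score in [0, 1]
    cites_policy: bool                 # reply is grounded in a retrieved policy


@dataclass
class Handoff:
    """Everything a human needs to pick up in the same channel."""
    channel: str
    customer_id: str
    transcript: list[str]
    agent_draft: str
    reason: str


def route(ticket: Ticket, draft: AgentDraft,
          threshold: float = 0.8) -> Union[str, Handoff]:
    """Send the agent's reply on easy tickets; escalate the rest with context."""
    if draft.confidence >= threshold and draft.cites_policy:
        return draft.reply
    reason = ("low confidence" if draft.confidence < threshold
              else "no policy citation")
    return Handoff(
        channel=ticket.channel,
        customer_id=ticket.customer_id,
        transcript=list(ticket.transcript),
        agent_draft=draft.reply,
        reason=reason,
    )


ticket = Ticket("c-42", "chat", ["customer: where is my refund?"])
print(route(ticket, AgentDraft("It was issued yesterday.", 0.95, True)))
print(route(ticket, AgentDraft("Refunds take 90 days.", 0.90, False)))
```

The second case is the Klarna failure mode in miniature: a confident answer with no policy behind it. Requiring a citation as well as a score is one way to keep that kind of reply from reaching the customer.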
The protocol layer agents use to talk to each other
A year ago, "agent-to-agent communication" was a phrase you mostly heard from Google. Today, four open foundations are governing it, with most of the work hosted by the Linux Foundation through the new Agentic AI Foundation, formed on December 9, 2025 with Anthropic, Block, and OpenAI as co-founders and platinum members AWS, Google, Microsoft, Cloudflare, and Bloomberg.
MCP (Model Context Protocol). Anthropic's contribution, the agent-to-tool transport. Donated to the Linux Foundation in December 2025. Current spec is 2025-11-25 with Streamable HTTP as the active transport, OAuth 2.1 plus PKCE plus mandatory Resource Indicators (RFC 8707) for auth. 97 million monthly SDK downloads as of March 2026, 10,000+ active public servers, first-class clients in Claude, ChatGPT, Cursor, Windsurf, VS Code, JetBrains, Microsoft Copilot, and Gemini.
A2A (Agent2Agent). Google's contribution, the peer-to-peer agent protocol. Donated to the Linux Foundation on June 23, 2025. The headline feature in v1.0 is the Signed Agent Card: a cryptographic signature on a JSON document that describes an agent's capabilities and origin, so a receiving agent can verify the card was issued by the domain owner before delegating any work. By the one-year mark on April 9, 2026, A2A had 150 supporting organizations with named production deployments in Microsoft Azure AI Foundry, Microsoft Copilot Studio, AWS Bedrock AgentCore, Salesforce Agentforce, and Google Cloud.
AG-UI (Agent-User Interaction Protocol). Built by CopilotKit. The third wire protocol the rest of the stack quietly assumes but doesn't actually solve: agent-to-frontend. MCP handles agent-to-tool. A2A handles agent-to-agent. AG-UI standardizes how an agent streams tokens, tool calls, intermediate state, and dynamically generated UI components into a running web application. CopilotKit closed a $20.5 million Series A on May 5, 2026 (the day before this newsletter went out), led by Glilot Capital with NFX and SignalFire, $27 million total. Repository at 40,000+ GitHub stars with millions of installs per week. Infra adopters include Google, Microsoft, Amazon, and Oracle. Framework integrations include LangChain, Mastra, PydanticAI, and Agno. CopilotKit reports more than half the Fortune 500 using the open-source toolkit, with named customers Deutsche Telekom, Docusign, Cisco, and S&P Global.
The discovery and identity layer. AGNTCY (Cisco/LangChain/Galileo, donated to the Linux Foundation in July 2025) sits one level above MCP and A2A with a federated agent directory and the Open Agent Schema Framework. NANDA (MIT, led by Ramesh Raskar) is DNS for agents: a globally distributed mapping from an agent handle to a verified metadata file, currently hosted at 15 universities, with cryptographically verifiable AgentFacts as the signed metadata format. The NANDA Summit at MIT on April 9 to 11, 2026 was the major adoption event.
The supporting layers. IBM ACP (separate from OpenAI's commerce ACP) is the async-first agent-to-agent protocol from BeeAI, also at the Linux Foundation, designed for long-running tasks with curl-friendly REST. NLIP (standardized through Ecma TC56, ECMA-430 approved December 10, 2025) is the application-level message envelope that abstracts API versioning. Letta Agent File (.af, released April 2, 2025 by the MemGPT team) is a portable container format for stateful agents (the Docker image of agents).
The right mental model: MCP is the tool bus, A2A is the agent-to-agent bus, AG-UI is the agent-to-frontend bus, AGNTCY and NANDA are the discovery and identity layers, NLIP is the message envelope, .af is the container format. These standards mostly compose. The open question is which subset becomes the default for a typical enterprise deployment by the end of 2027, and the answer is starting to look like all of them.
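For a sense of what "tool bus" means on the wire, here is a minimal sketch of an MCP tool invocation. MCP messages are JSON-RPC 2.0, and a tool is invoked with the tools/call method; the sketch builds the request and reads a text result without any transport. The tool name lookup_order and its arguments are hypothetical, and the response is hand-written for the example rather than captured from a real server.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes one MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def text_from_result(response_json: str) -> str:
    """Concatenate the text parts of a tools/call response."""
    response = json.loads(response_json)
    if "error" in response:  # protocol-level failure
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):  # the tool itself reported a failure
        raise RuntimeError("tool reported an error")
    return "".join(
        part["text"] for part in result["content"] if part.get("type") == "text"
    )


# Hypothetical tool and arguments, for illustration only.
request = tools_call_request("lookup_order", {"order_id": "A123"})
print(request)

# Hand-written response in the shape a server would return.
response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Order A123 shipped."}],
               "isError": False},
})
print(text_from_result(response))
```

In practice the official SDKs handle transport, session setup, and auth; the envelope itself is this small, which is a large part of why MCP spread as quickly as it did.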
What big tech is shipping
The cloud vendors are not waiting for the standards to settle. They're shipping products that span both the channel layer and the protocol layer.
Salesforce is the cleanest example of the converged play. Agentforce exposes every custom agent as an A2A endpoint and as a first-class Slack member. Salesforce contributed the Agent Card concept itself. The combined picture: an Agentforce agent can be reached by a peer agent over A2A, by a customer over WhatsApp through a Bird-or-Sinch integration, by a service rep inside Slack as a mention, and by a developer through MCP. One agent reachable across four channels, all in production.
Microsoft. Copilot Studio multi-agent went generally available in April 2026 with A2A as the cross-vendor bus. Copilot agents are reachable from Teams as chat, from Outlook as email, and from any A2A peer programmatically. Microsoft Agent Framework v1.0 ships A2A as a first-class protocol for both .NET and Python.
AWS Bedrock AgentCore went GA on October 13, 2025. A2A added October 2025, AWS Marketplace A2A server support November 2025, stateful MCP server features March 2026. Cross-framework support for Strands, OpenAI Agents SDK, LangGraph, Google ADK, Claude Agent SDK.
Anthropic is leaning on aggressive enterprise distribution: Cognizant rolling Claude to 350,000 employees, Deloitte to 470,000, Accenture training 30,000 professionals, Swiggy shipping MCP integration for grocery and restaurant reservations, India's Ministry of Statistics building the first official Indian government MCP server. 300,000+ business customers, 500+ spending over $1 million per year, 8 of the Fortune 10.
Google ships A2A and AGNTCY in Vertex AI agents out of the box. IBM runs ACP plus BeeAI plus watsonx Orchestrate. Block moved Goose to the Agentic AI Foundation, with 70+ documented MCP extensions. None of them is building a closed agent stack. All of them are building open agent stacks that happen to run best on their own clouds and inside their own chat surfaces.
When agents call agents
The most interesting thing happening at the boundary between the channel layer and the protocol layer is that agents are starting to communicate with other agents through the human channels. A Vapi-built voice agent calls a phone number, gets routed to a Bland-hosted answering agent, and the two argue for ten minutes about a refund. A Resend outbound email lands in an inbox where a Grammarly-powered reply agent threads back. None of this is using MCP or A2A. The two sides are speaking SIP and SMTP because those are the channels their humans use.
This is awkward for the protocol layer, because A2A specifically was supposed to be where this happens. The real-world answer is starting to look like the protocol layer wraps the channel layer rather than replacing it. An A2A handshake establishes identity and sets up the call. The actual conversation runs over voice or email. The transcript and outcome are returned through A2A. NANDA's AgentFacts and Salesforce's Agent Cards are part of how this works: an agent picking up a call can read the caller's signed Agent Card, decide whether to switch to a faster programmatic channel, and do so midstream if both sides agree.
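The "wrap, don't replace" sequence has three steps, and a small session object shows how they fit together. This is a sketch of the idea only: the class, its method names, and the card fields are hypothetical, and card verification is passed in as a function because real signature checking is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WrappedCall:
    """One channel conversation bracketed by a protocol-layer handshake."""
    channel: str                                  # e.g. "voice" or "email"
    peer_verified: bool = False
    transcript: list[str] = field(default_factory=list)

    def handshake(self, peer_card: dict,
                  verify: Callable[[dict], bool]) -> None:
        # Step 1: establish identity before any conversation happens.
        if not verify(peer_card):
            raise PermissionError("peer agent card failed verification")
        self.peer_verified = True

    def say(self, speaker: str, text: str) -> None:
        # Step 2: the conversation itself runs over the human channel.
        if not self.peer_verified:
            raise RuntimeError("handshake must complete first")
        self.transcript.append(f"{speaker}: {text}")

    def close(self, outcome: str) -> dict:
        # Step 3: transcript and outcome go back over the protocol layer.
        return {
            "channel": self.channel,
            "outcome": outcome,
            "transcript": list(self.transcript),
        }


def demo_verify(card: dict) -> bool:
    """Placeholder check; a real one would validate the card's signature."""
    return card.get("signed") is True


call = WrappedCall(channel="voice")
call.handshake({"name": "refund-agent", "signed": True}, demo_verify)
call.say("caller", "I'd like a refund for order A123.")
call.say("callee", "Approved. You'll see it in three days.")
print(call.close(outcome="refund_approved"))
```

The useful property is that the channel stays the system of record for what was said, while the protocol layer carries who said it and what was decided.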
Two patterns are showing up in production. "Agent prefers programmatic": both sides detect each other, exchange A2A handshakes, and complete the exchange in under a second over MCP rather than dragging on as a multi-minute voice call. Common in internal coordination between enterprise agents. "Agent stays in channel": both sides agree the human user expects a voice transcript or email thread for audit, so they keep the conversation in the channel even when they could complete it faster elsewhere. Common in customer support.
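The choice between the two patterns comes down to a short decision rule. The sketch below is one way to express it, under assumptions of my own: each side exposes three yes-or-no facts, and either side asking for an in-channel audit trail is enough to keep the conversation where it is. The type and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PROGRAMMATIC = "agent prefers programmatic"
    IN_CHANNEL = "agent stays in channel"


@dataclass
class Side:
    is_verified_agent: bool           # identity confirmed during the handshake
    supports_programmatic: bool       # can complete the task off-channel
    needs_channel_audit_trail: bool   # a human expects a transcript or thread


def choose_mode(caller: Side, callee: Side) -> Mode:
    """Switch to the fast path only when both sides can and neither objects."""
    both_agents = caller.is_verified_agent and callee.is_verified_agent
    both_capable = caller.supports_programmatic and callee.supports_programmatic
    audit_needed = (caller.needs_channel_audit_trail
                    or callee.needs_channel_audit_trail)
    if both_agents and both_capable and not audit_needed:
        return Mode.PROGRAMMATIC
    return Mode.IN_CHANNEL


internal = Side(True, True, False)
support = Side(True, True, True)
print(choose_mode(internal, internal).value)
print(choose_mode(internal, support).value)
```

Defaulting to the channel whenever anything is uncertain matches the customer-support case: the slow path is the safe one, and the fast path has to be earned by both sides.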
The right way to think about it: the channel layer is where humans and agents share the same wire; the protocol layer is where agents accelerate when no human is watching. Both stacks need to interop, which is most of what 2026 and 2027 are going to be about.
What 2028 looks like
The protocol war is functionally over. By 2028, A2A is the agent-to-agent default for cross-vendor work, MCP is the agent-to-tool default for everyone, AG-UI is the agent-to-frontend default for any product with a UI, and AGNTCY plus NANDA together provide the discovery and identity layers. IBM ACP keeps a niche in async-heavy enterprise workflows. NLIP becomes the envelope everyone implements without thinking about it. Letta-style agent files become the portable container format.
The channel layer consolidates differently. Voice agents become a $20 billion-plus category, dominated by three or four enterprise platforms (Sierra, Decagon, Parloa) and three or four developer platforms (Vapi, Bland, Retell, ElevenLabs). The speech-model layer consolidates around two or three winners. Email becomes the ugliest layer because the underlying protocol has no agent identity built in; the eventual answer is some combination of DMARC plus signed agent headers plus enterprise-only inbox routing. Messaging splits by geography: SMS stays the default in the US, while WhatsApp dominates the rest of the world, with WhatsApp Business as the largest single agent surface globally. Slack and Teams become first-class agent channels with full IAM, and most enterprise workplace agents end up living there rather than in standalone web apps.
The Klarna lesson holds: vendors pitching full replacement keep getting walked back, and the ones pitching escalation-on-failure keep growing.
The deeper change is that for the first time we're building communication infrastructure for clients that aren't human. Every previous channel (telephone, email, SMS, chat) was designed assuming a human at one end. Voice agents and email agents and Slack agents are forcing those channels to learn a second client type, the same way the web learned mobile in the 2010s. The protocol layer is the agent-native side of the same shift, where we're building from scratch for the second client type without the constraints of human-era assumptions.
Both stacks are real, funded, and in production. The interesting work in 2027 and 2028 is not picking a winner but building the bridges between them, because most agents in the wild will need to operate in both at once.
This is my regular weekly newsletter on Generative AI. Recent editions cover The Agentic Economy Is Already Here, Solving Hallucination, The Quiet Skill Revolution, and Why Looping Is the New Scaling.