Swiggy partners with Sarvam AI and Razorpay to launch India's first end-to-end agentic voice commerce stack.

Swiggy Enters Voice Commerce — Sarvam AI & Razorpay Power Orders in 11 Indian Languages Without an App

Dr. Mayank Raj
15 Min Read
QUICK TAKE  (30-second read)

  • What: Swiggy + Sarvam AI + Razorpay launch voice-led, multilingual commerce across food delivery, Instamart, and Dineout
  • How: Sarvam’s AI models understand voice in 11 Indian languages; Razorpay’s agentic payments execute the transaction — no app navigation needed
  • First: Swiggy becomes the first commerce platform on Sarvam’s Indus App; pilot also live on The Derma Co’s website
  • Phone-Only Ordering: Users can order groceries on Instamart via a plain phone call — no internet, no app required
  • Tech Stack: Sarvam agentic AI + Razorpay Agent Studio — full discovery-to-checkout in a single conversation
  • What’s Next: Expansion to other businesses via Razorpay’s Agent Studio; developers can build their own multilingual voice commerce agents

Food delivery giant Swiggy has announced a strategic partnership with sovereign AI startup Sarvam and payments platform Razorpay to launch voice-led, multilingual commerce across its food delivery, Instamart, and Dineout services — enabling users to place orders, discover restaurants, and complete payments entirely through natural conversation in 11 Indian languages, with no app navigation required. In the same announcement, Swiggy became the first commerce platform to go live on Indus, Sarvam’s AI-native chat application.

This is not a chatbot upgrade. It is India’s first end-to-end agentic commerce stack — where an AI agent understands spoken intent in a regional language, finds the relevant products, places the order, and completes the UPI payment, all within a single unbroken conversation. For the hundreds of millions of Indians who use voice and regional languages as their primary digital interface, this removes the last major friction point in e-commerce: the app itself.

STARTUPFEED INSIGHT

  • What the numbers say: India has 850 Mn+ internet users but only ~250 Mn active online shoppers, a gap of roughly 600 Mn. Many of those users find text-based, English-first apps confusing, and voice-first commerce in regional languages is the infrastructure that could close that gap.
  • What this means for you:
      • If you’re a founder: The Swiggy-Sarvam-Razorpay stack is now available to any business via Razorpay Agent Studio, so the barrier to launching voice commerce is suddenly API-level, not infrastructure-level
      • If you’re an investor: Sarvam’s commercial monetisation path just became clear: enterprise API licensing to Swiggy-scale platforms, validated with a live pilot that every FMCG and quick-commerce company will now want to replicate
      • If you’re a competitor: Zomato, Amazon India, and Flipkart now face a Swiggy that can reach users who never opened a food delivery app; this is a user acquisition moat, not just a UX feature
  • Our prediction: By Q2 FY27, Swiggy will report measurable order volume from voice-first users in Hindi-speaking Tier-3 markets — the first concrete data point that conversational commerce drives new customer acquisition, not just existing user retention.

How the Three-Way Partnership Works

Three distinct companies bring complementary capabilities to create an experience no single party could build alone:

Partner | Role in Stack | What They Contribute
Sarvam AI | Language Intelligence | 11 Indian languages (Hindi, Tamil, Telugu, Kannada, Bengali, Marathi + more); voice recognition trained on Indic accents, dialects, cultural context; Indus App platform for conversational commerce; agentic stack for intent understanding
Razorpay | Agentic Payments | Agentic payments infrastructure executes transactions within the conversation; UPI, cards, and wallets; Agent Studio enables developers to embed the same voice-payment stack into their own platforms; completes the checkout without redirecting to a payment page
Swiggy | Commerce Layer | First commerce partner on Indus App; food delivery, Instamart (groceries), and Dineout (table booking) all integrated; provided MCP (Model Context Protocol) integrations that make its catalogue and ordering APIs accessible to AI agents

The result is a single conversational loop: a user says “order chicken biryani from Behrouz” in Hindi, the AI finds the restaurant, confirms the item and address, and Razorpay’s agent executes the payment — all without the user touching a screen. The critical innovation is that Razorpay’s agentic payment layer can complete a financial transaction within the same conversation rather than kicking the user out to a separate payment flow.

End-to-End Voice Order — Step by Step

Step | What Happens | Technology Doing the Work
1 | User speaks in their language | Sarvam’s ASR (Saaras) captures voice; handles noisy environments, accents, Hinglish, and mixed-language inputs
2 | AI understands intent | Sarvam’s NLP and reasoning models (Sarvam-M) parse the request (“biryani, Koramangala, no onions”) into structured commerce intent
3 | Discovery & options | Swiggy’s MCP-enabled APIs return relevant results; AI agent narrates options back to user in their preferred language using Sarvam’s TTS (Bulbul)
4 | Order confirmation | User confirms via voice; AI collects address from saved profile or voice input; order created in Swiggy’s system
5 | Agentic payment | Razorpay Agent generates a payment request; user approves via voice or UPI PIN; transaction completes inside the conversation
6 | Confirmation | AI reads out order confirmation and ETA in the user’s language; full loop closed without a single screen tap
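
For readers who think in code, the six steps above can be sketched as a small Python pipeline. This is an illustrative sketch only, not any partner's implementation: every function name, the OrderIntent shape, and the catalogue entries are hypothetical, and the speech recognition, search, and payment steps are replaced with simple stand-ins so the flow runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class OrderIntent:
    """Structured commerce intent (hypothetical shape, not a published schema)."""
    item: str
    locality: str
    exclusions: list = field(default_factory=list)


# Toy catalogue standing in for a commerce search API; all entries are invented.
CATALOGUE = [
    {"restaurant": "Example Biryani House", "item": "biryani",
     "locality": "koramangala", "price_inr": 299},
    {"restaurant": "Sample Dosa Corner", "item": "dosa",
     "locality": "koramangala", "price_inr": 120},
]


def transcribe(audio: bytes) -> str:
    # Step 1 stand-in: a real system would run speech recognition here.
    return audio.decode("utf-8")


def parse_intent(utterance: str) -> OrderIntent:
    # Step 2 stand-in: split an "item, locality, no X" utterance on commas.
    parts = [p.strip().lower() for p in utterance.split(",")]
    if len(parts) < 2:
        raise ValueError("expected at least an item and a locality")
    exclusions = [p[len("no "):] for p in parts[2:] if p.startswith("no ")]
    return OrderIntent(item=parts[0], locality=parts[1], exclusions=exclusions)


def search(intent: OrderIntent) -> list:
    # Step 3 stand-in: filter the toy catalogue by item and locality.
    return [row for row in CATALOGUE
            if row["item"] == intent.item and row["locality"] == intent.locality]


def run_order(audio: bytes, user_confirms: bool, payment_approved: bool) -> dict:
    intent = parse_intent(transcribe(audio))
    options = search(intent)
    if not options:
        return {"status": "no_results"}
    choice = options[0]
    if not user_confirms:        # Step 4: the user declines the order
        return {"status": "cancelled"}
    if not payment_approved:     # Step 5: payment approval is refused
        return {"status": "payment_declined"}
    # Step 6: the confirmation an agent would read back to the user.
    return {"status": "confirmed", "restaurant": choice["restaurant"],
            "amount_inr": choice["price_inr"], "exclusions": intent.exclusions}


result = run_order("Biryani, Koramangala, no onions".encode("utf-8"), True, True)
print(result["status"], result["restaurant"], result["exclusions"])
```

The point of the sketch is the shape of the loop: each stage hands a structured result to the next, and a refusal at confirmation or payment ends the order without leaving the conversation.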

What the Partners Say

“At Swiggy, our mission is to deliver unparalleled convenience to our consumers. After rolling out MCP integrations across our services, the next step was to make these experiences truly accessible to every Indian. True accessibility means meeting users where they are — in the languages they speak.”

— Madhusudhan Rao, CTO, Swiggy

Rao’s framing is deliberate. Swiggy’s MCP rollout was the infrastructure layer — standardised APIs that let AI agents interact with Swiggy’s catalogue and ordering system. Sarvam’s language intelligence is the user-facing layer. What the quote doesn’t say: voice ordering likely unlocks net-new users who were never on the Swiggy app — Tier-3 city residents, older demographics, and first-time internet shoppers. That new user acquisition angle is the commercial prize here.
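
To make the MCP point concrete: Model Context Protocol messages are JSON-RPC 2.0, and an agent invokes a server-side tool with a "tools/call" request. The Python sketch below builds such a request. The tool name "search_restaurants" and its arguments are assumptions made for illustration; the announcement does not list the tools Swiggy actually exposes.

```python
import json
from itertools import count

# Monotonic request ids so each JSON-RPC call can be matched to its response.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP-style tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # ensure_ascii=False keeps Indic-script arguments readable in the payload.
    return json.dumps(request, ensure_ascii=False)


# Hypothetical tool name and arguments, for illustration only.
payload = build_tool_call(
    "search_restaurants",
    {"query": "chicken biryani", "locality": "Koramangala"},
)
print(payload)
```

Because the request format is standard, any agent that speaks MCP could in principle call the same tools, which is what makes the catalogue reachable from a voice interface instead of only from the app.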

“India is a voice-first nation, and the next billion users of AI will experience it in the language they choose. Our partnership with Swiggy brings that vision to life in one of the most everyday, high-frequency use cases there is: ordering food and groceries. By embedding Sarvam’s full-stack AI into the heart of Swiggy’s commerce experience, we are taking AI from a novelty for the few to a utility for the many.”

— Pratyush Kumar, Co-Founder, Sarvam AI

Kumar’s phrase “AI from a novelty for the few to a utility for the many” signals Sarvam’s commercial strategy clearly: the company is not chasing the AI enthusiast demographic. It is building for the next 400 million internet users who will come online speaking Hindi, Tamil, Bhojpuri — and who need an interface that works the way they think.

“India’s next phase of digital commerce will be shaped by experiences that feel effortless and intuitive. By bringing together conversational AI and seamless payments, we’re moving closer to a future where everyday commerce is faster, more natural, and built around how people actually interact.”

— Khilan Haria, Chief Product Officer, Razorpay

Haria’s contribution is the financial execution layer. Previous voice-commerce attempts in India failed because the AI could understand the order but not complete the payment — forcing a handoff to a separate payment interface that broke the conversational experience. Razorpay’s agentic payments stack solves the last-mile problem: the money actually moves inside the conversation.
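
One way to picture "the money moves inside the conversation" is as a session state machine in which payment approval is just another conversational state, with no redirect state at all. The Python sketch below is a hypothetical model of that idea; the state names and events are invented for illustration and do not describe Razorpay's actual design.

```python
from enum import Enum, auto


class State(Enum):
    COLLECTING_ORDER = auto()
    AWAITING_PAYMENT_APPROVAL = auto()
    PAID = auto()
    CANCELLED = auto()


# Allowed transitions. Note there is no "redirect to payment page" state:
# approval happens in the same session that collected the order.
TRANSITIONS = {
    (State.COLLECTING_ORDER, "confirm_order"): State.AWAITING_PAYMENT_APPROVAL,
    (State.COLLECTING_ORDER, "cancel"): State.CANCELLED,
    (State.AWAITING_PAYMENT_APPROVAL, "approve_payment"): State.PAID,
    (State.AWAITING_PAYMENT_APPROVAL, "cancel"): State.CANCELLED,
}


class Conversation:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state = State.COLLECTING_ORDER
        self.history = [self.state]

    def handle(self, event: str) -> State:
        """Apply an event, rejecting anything the current state does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


convo = Conversation("demo-session")
convo.handle("confirm_order")
convo.handle("approve_payment")
print([s.name for s in convo.history])
```

The whole order, from first utterance to payment, lives under one session id, so there is no handoff at which the user can be lost.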

11 Languages, 3 Platforms, 1 Phone Call

Capability | Detail | Why It Matters
Languages Supported | Hindi, Tamil, Telugu, Kannada, Bengali, Marathi + 5 more (11 total) | Covers 90%+ of India’s internet users by native language; Sarvam’s models trained on Indic accents and cultural context, not translated from English
Swiggy Platforms Covered | Food delivery, Instamart (groceries), Dineout (table reservations) | Full Swiggy ecosystem available via voice, not just food delivery; grocery voice ordering is a new use case globally
Phone-Call Ordering | Users can order Instamart groceries via plain phone call, with no internet or app needed | Breakthrough for rural and semi-urban users; demonstrated at India AI Impact Summit Feb 2026; UPI-enabled feature phone users become Swiggy’s addressable market
Indus App Integration | Swiggy is first commerce platform on Sarvam’s Indus App | Indus is Sarvam’s consumer-facing AI chatbot running Sarvam-105B; Swiggy being first commerce partner gives it exclusive positioning in an AI-native channel
Third-Party Expansion | The Derma Co pilot already live; Razorpay Agent Studio open to all businesses | Any brand on Razorpay can now launch voice commerce without building their own NLP stack, which dramatically lowers the barrier to adoption

Who Should Be Watching?

Player | Why This Partnership Changes Their World
Zomato (Eternal) | Swiggy now has a voice-first ordering channel that Zomato cannot replicate without its own Indic language AI partner; expect Zomato to accelerate its own AI commerce integrations or partner with a competing Indian LLM company
Amazon India / Alexa | Amazon has Alexa, but it is English-first and not deeply integrated with Indian commerce context; the Swiggy-Sarvam 11-language stack goes further than Alexa’s India coverage on voice-commerce depth
Flipkart / Walmart | Flipkart’s Immerse is text-based AI search; no voice-first, Indic-language, end-to-end agentic commerce equivalent exists in their stack yet
PhonePe / Juspay | Razorpay Agent Studio embeds voice payments into commerce; if this gains traction, payment competitors will need similar voice-agentic rails or risk being left out of the conversational commerce value chain
Krutrim AI / Google Bhashini | Sarvam’s first major commerce deployment at Swiggy scale is the proof-of-concept that Indic language AI can power real transactions, which raises competitive pressure on every other Indian AI model company to match commercial deployment velocity

The Bigger Picture — India’s Agentic Commerce Moment

This partnership arrives amid a convergence of four trends that have been building independently for years and are now compressing into a single infrastructure layer:

  • Voice search surge: Vernacular language usage in e-commerce grew 162% in 2024 (Meesho data); voice search on Indian platforms is growing at 40% annually
  • UPI scale: India processes 17 Bn+ UPI transactions monthly — the payments rail is ubiquitous, including on feature phones; the missing link was an intelligent interface to trigger those payments conversationally
  • Sovereign AI maturity: Sarvam’s models — trained from scratch on Indian language data rather than English-translated — achieve quality thresholds that earlier Indic NLP systems could not; Bulbul V3 (35+ voices, 11 languages) and Saaras V3 (ASR for 22 languages) make the voice stack production-ready
  • Agentic AI readiness: Razorpay’s Agent Studio and Swiggy’s MCP integrations represent the commerce infrastructure side maturing to accept AI agent instructions — both were recent developments that made this partnership technically possible

The Swiggy-Sarvam-Razorpay announcement is also not happening in isolation. Earlier this week, Meesho launched Vaani, its own voice shopping assistant. Sarvam co-founder Vivek Raghavan’s cryptic comment — “Abhi picture baaki hai” (“the movie hasn’t ended yet”) — hints at further commercial partnerships in the pipeline. India’s voice-first commerce infrastructure is being assembled in real time, partnership by partnership.

What’s Next

The immediate watch is Swiggy’s order data from voice-first channels in the next 60–90 days. If Hindi-speaking Tier-3 users start placing orders via voice that they would not have placed via the app, the new-user-acquisition thesis is confirmed — and Swiggy will aggressively expand voice channels across more geographies.

For Razorpay, Agent Studio is a platform play: if 500+ merchants integrate voice-payment capabilities via the API, Razorpay becomes the settlement rail for India’s conversational commerce layer, just as it became the default checkout for India’s e-commerce layer in the 2010s.

For Sarvam, Swiggy is the proof-of-work deployment that validates its models at consumer scale — tens of millions of potential users, high transaction frequency, multilingual diversity. With its reported $250 Mn funding round at a $1.5 Bn valuation in advanced talks, this partnership arrives at precisely the right moment to demonstrate that Sarvam’s AI is production-grade commercial infrastructure, not just research output.
