QUICK TAKE:
| Company: | Tech Mahindra — IT services and consulting giant; $6B+ revenue; Mahindra Group; one of India’s Big 5 IT firms |
| Initiative: | Project Indus — Tech Mahindra’s sovereign Hindi-first LLM programme, developed under IndiaAI Mission |
| Model Size: | 8 Billion parameters (scaled from 1.2B in previous version) — built on Meta’s Llama 3.1 8B Instruct base |
| Partner: | NVIDIA — NeMo framework, NIM microservices, NeMo Data Designer, NVIDIA AI Enterprise on AWS cloud |
| Language: | Hindi-first with authentic Indian linguistic and cultural context; NVIDIA NeMo Curator for multilingual data |
| Use Case: | Education — STEM subjects (physics + others); democratising high-quality learning for millions of Hindi-speaking students |
| Agentic AI: | Supports autonomous AI agents that communicate in natural Hindi — not just query-response but interactive learning agents |
| Data Innovation: | 500 million synthetic tokens generated using NVIDIA NeMo Data Designer to address Hindi data scarcity |
| Mandate: | One of 8 entities selected by Government of India under IndiaAI Mission to build homegrown foundational LLMs |
| Announced: | February 20, 2026 — Final day of India AI Impact Summit 2026, Bharat Mandapam, New Delhi |
THE PROBLEM PROJECT INDUS IS SOLVING — THE LANGUAGE BARRIER IN INDIAN EDUCATION
|
THE STORY
On February 20, 2026 — the final day of the India AI Impact Summit — Tech Mahindra announced the next evolution of Project Indus: a Hindi-first, 8-billion-parameter education LLM built in partnership with NVIDIA and developed under the IndiaAI Mission. The announcement marks a significant scale-up from the company’s earlier 1.2-billion-parameter foundational model — a nearly 7x increase in model capacity — and represents India’s first purpose-built Hindi-first LLM specifically designed for education and STEM learning. The model was announced at India’s most important AI gathering — not in Silicon Valley, not in Beijing — but at Bharat Mandapam, New Delhi, a deliberate signal that sovereign AI for education is now a national priority, not a research experiment.
“AI is becoming central to national digital infrastructure and inclusive growth, but global foundational models are often not designed for countries with deep linguistic and cultural diversity like India. A key industry challenge is the lack of domain-trained language models grounded in local languages and learning contexts, particularly in education. Through Project Indus, our collaboration with NVIDIA directly addresses this gap by delivering a Hindi-first, sovereign AI model that enables scalable, relevant, and accessible AI-powered learning and citizen-centric services for India.” — Nikhil Malhotra, Chief Innovation Officer & Global Head of AI and Emerging Technologies, Tech Mahindra
Project Indus 8B — Full Technical Breakdown
| Layer | Component | Detail & Significance |
|---|---|---|
| Base Model | Meta Llama 3.1 8B Instruct | Industry-standard open-weight foundation model by Meta; 8 billion parameters; instruction-following architecture. Tech Mahindra fine-tuned and adapted this for Hindi-first education use cases — building on a proven foundation rather than training from scratch, dramatically reducing compute costs while enabling India-specific customisation. |
| Training Framework | NVIDIA NeMo | NVIDIA’s enterprise-grade LLM training and fine-tuning framework. Provides the pipeline for data curation, model training, alignment, and evaluation. NeMo is what enables Tech Mahindra to train at scale with the efficiency required for a 8B-parameter model within realistic compute budgets. |
| Data Generation | NVIDIA NeMo Data Designer | The most technically notable element. India faces a structural problem: while Hindi has hundreds of millions of speakers, the high-quality, domain-specific digital text needed to train LLMs on STEM subjects in Hindi is sparse. NeMo Data Designer was used to synthetically generate 500 MILLION training tokens in Hindi — solving the data scarcity problem that stops most Indian language AI projects before they start. |
| Deployment | NVIDIA NIM Microservices + AWS Cloud | NVIDIA NIM (NVIDIA Inference Microservices) enables production-ready deployment with optimised inference, low latency, and scalability. AWS cloud provides the infrastructure backbone. Together they ensure Project Indus can serve millions of concurrent student queries without performance degradation. |
| AI Capabilities | Agentic AI — Autonomous Hindi Agents | Beyond simple Q&A, Project Indus supports the creation of autonomous AI agents that converse fluently in natural Hindi. This means the model can power interactive tutors that ask follow-up questions, guide a student through a multi-step physics problem, check understanding, and adapt the explanation based on response — not just answer a single query and stop. |
| Data Curation | NVIDIA NeMo Curator | For multilingual and multimodal data curation. Ensures the training dataset has quality, diversity, and cultural authenticity across Hindi dialects and regional variations — critical for a country where ‘Hindi’ means very different things in Rajasthan, UP, Bihar, and Madhya Pradesh. |
| Compute Infrastructure | NVIDIA AI Enterprise on AWS | Production-grade AI software stack on AWS cloud. John Fanelli (VP, Enterprise Software, NVIDIA): ‘delivers the production-ready performance, reliability and scale required to power Project Indus.’ |
The 500 Million Synthetic Token Problem — Why This Is the Most Important Technical Decision
| GPT-4 was trained on an estimated 45-100 trillion tokens. Llama 3.1 was trained on 15+ trillion tokens. The vast majority of that data is in English, with significant representation of French, German, Spanish, Chinese, and Japanese.
High-quality STEM education content in Hindi — textbooks, solved problems, teacher explanations, concept walkthroughs — barely exists at meaningful digital scale. What exists is often low quality, inconsistent in register, or a translation of English material that loses the natural way Hindi speakers explain concepts. The solution Tech Mahindra deployed is synthetic data generation via NVIDIA NeMo Data Designer: using an existing model to generate 500 million Hindi training tokens covering STEM domains, then filtering and quality-checking that synthetic data to ensure it reflects authentic Hindi language patterns and pedagogically sound explanations. This is not a workaround — it is frontier AI technique. OpenAI, Google, and Meta all use synthetic data generation to fill gaps in their training sets. The fact that Tech Mahindra deployed this for Hindi education at 500M-token scale is a strong signal of genuine technical ambition, not just a rebranding exercise. |
“The global push for sovereign AI is accelerating demand for foundation models tailored to local languages and cultural contexts. By leveraging NVIDIA AI Enterprise, Tech Mahindra delivers the production-ready performance, reliability and scale required to power Project Indus.” — John Fanelli, VP Enterprise Software, NVIDIA
India’s Sovereign LLM Landscape — Where Project Indus 8B Fits
Project Indus wasn’t the only sovereign Indian LLM launched this week. Here’s how it stacks up against the other homegrown models unveiled at the Summit:
| Model / Company | Params | Languages | Type | Primary Use Case |
|---|---|---|---|---|
| Project Indus 8B (Tech Mahindra) | 8B | Hindi-first | Private / Corporate | Education — STEM learning in Hindi; agentic AI tutors; citizen services |
| Sarvam 30B ‘Vikram’ (Sarvam AI) | 30B | 22 Indian | Open-weight sovereign | General + voice-first; real-time conversation; agentic workflows; UIDAI integration |
| Sarvam 105B (Sarvam AI) | 105B | 22 Indian | Open-weight sovereign | Advanced reasoning; mixture-of-experts; complex multi-step tasks; government deployments |
| BharatGen Param 2 (Govt) | 17B | 22 Indian | Government sovereign | Offline deployment; governance; healthcare; education; courts — low-connectivity areas |
| Gnani Vachana TTS (Gnani.ai) | N/A | 12 Indian | Voice / TTS system | Zero-shot voice cloning; 10M calls/day; citizen services; customer support |
| Project Indus 1.2B (earlier) | 1.2B | Hindi | Private | Foundational — predecessor; now superseded by 8B education-focused version |
Project Indus’ Unique Position in the Landscape:
|
The Education Market Opportunity — Why Hindi-First AI Has $100B+ Potential
India’s edtech market is projected to reach $30 billion by 2030 per IBEF estimates. But that number understates the actual opportunity for Hindi-first education AI. Consider: BYJU’s, Unacademy, Vedantu — all of India’s major edtech platforms built primarily in English, for English-comfortable students. The 400+ million Hindi-speaking students in government schools, rural coaching centres, and state board curricula have been structurally excluded from the edtech boom. Project Indus is an explicit bet that this excluded majority is the next edtech frontier.
The 3 Deployment Pathways for Project Indus 8B:
|
STARTUPFEED INSIGHT
| The Bigger Strategic Picture: Tech Mahindra’s Project Indus announcement is strategically timed for the final day of the India AI Impact Summit — when global media attention is highest and the week’s narrative is being set. But the real signal is what it says about the role of India’s legacy IT services giants in the AI era. TCS, Infosys, Wipro, HCL, Tech Mahindra — these are $5-30B revenue companies whose traditional business is delivering software services to Western enterprises. Project Indus is Tech Mahindra making a product bet: not services for someone else, but a sovereign Indian AI model that could generate licensing revenue, government contracts, and platform value domestically. If it works, it rewrites the business model of Indian IT. If it does not, it still earns Tech Mahindra a seat at the IndiaAI Mission table for the next decade of government contracts. | |
| For EdTech Founders: | The arrival of a Hindi-first education LLM from a well-resourced IT giant changes the build-vs-buy calculus. Instead of training your own Hindi STEM model (expensive, slow), EdTech startups can now potentially license Project Indus as an API and focus engineering resources on the product layer — student experience, assessment design, personalisation logic. Watch for Tech Mahindra to announce commercial API access or an EdTech partner programme in H2 2026. |
| For Investors: | The Hindi language AI space is about to see its first Series A wave. Any EdTech startup building specifically for Hindi-medium government school students, competitive exam prep in Hindi (UPSC, state PSC, Railways), or skill development in Hindi now has a credible AI infrastructure layer to build on. Look at companies like Vedantu Hindi, Adda247, Exampur, and Gradeup — the ones that already serve Hindi-medium students at scale but are tech-infrastructure-constrained. |
| For Policy Makers: | Tech Mahindra’s announcement is a proof point that the IndiaAI Mission’s 8-entity sovereign LLM mandate is working. The next step: the government should issue a ‘Hindi AI in Education’ procurement guideline that requires all government-funded EdTech platforms to offer Hindi AI assistance certified under IndiaAI Mission standards. This creates a guaranteed first-customer base for all 8 IndiaAI Mission LLM builders — including Tech Mahindra. |
| Our Prediction: Within 6 months, Tech Mahindra will announce the first enterprise deployment of Project Indus — likely with a state government education board (Uttar Pradesh, Rajasthan, or Madhya Pradesh are highest probability given Hindi-belt demographics and BJP government alignment with AI-in-education narrative). Within 12 months, a major Indian EdTech platform (Physics Wallah is the most natural fit given its Hindi-medium positioning and 100M+ student base) will announce a partnership or licensing deal. By 2027, Project Indus will expand from physics to a full NCERT curriculum across all major subjects, and Tech Mahindra will position it as the infrastructure layer for India’s ₹1 lakh crore National Digital Education Mission. The Hindi-first bet is not just a product — it is Tech Mahindra’s bid for the largest government edtech contract India has ever issued. | |
