DevOps Engineer - Studio Platform
Sarvam AI
About Sarvam
Sarvam is building the bedrock of Sovereign AI for India. The company is developing India’s full-stack sovereign AI platform, building across research, models, infrastructure and applications with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Sarvam partners with India’s leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.
About the Role
We are seeking a driven and highly skilled DevOps Engineer to join the Studio engineering team at Sarvam. Studio is our creative media platform — powering AI dubbing, live translation, voice cloning, and more across 12+ Indian languages for enterprise media companies and millions of creators.
You will own the operational reliability, deployment velocity, and infrastructure automation for a production system that orchestrates ML-heavy workloads at scale. The platform runs a multi-service architecture on Kubernetes with GPU inference servers, async task queues, multi-stage ML pipelines, and real-time delivery — across a multi-cloud setup.
You will be the person who keeps this running, makes it faster to ship, and ensures it scales with growing demand.
What You’ll Do
- Own and operate production Kubernetes clusters — managing multi-role deployments (API servers, task schedulers, per-stage workers, WebSocket servers), horizontal pod autoscalers, CronJobs, and node pool scheduling
- Build and optimize CI/CD pipelines — automated testing, container image builds, registry management, staged rollouts across QA/staging/production, and deployment notifications
- Manage Helm-based deployments — maintaining application charts, shared module dependencies, environment-specific value overlays, and promotion workflows
- Implement and maintain observability across services — metrics collection, dashboards, error tracking, and distributed tracing to catch issues before customers do
- Operate and optimize cloud infrastructure — blob storage with CDN for media delivery, ingress configuration, secrets management, and IAM policies
- Manage multi-cloud coordination — ensuring consistent deployments, artifact management (Docker images + internal Python packages), and credential management across cloud providers
- Ensure robust secrets management — vault-to-cluster sync pipelines, service account credentials, and CI/CD secret hygiene
- Optimize async task queue infrastructure — health monitoring, dead-letter handling, stuck message recovery, and real-time notification delivery
- Monitor and optimize database performance — query health, connection pooling, backup/recovery, and coordinating with the platform team on shared infrastructure
- Build developer productivity tooling — local dev environments, pre-commit hooks, self-service infrastructure, and documentation
- Own incident response — runbooks, alerting rules, post-mortem processes, and reliability improvements