Deploy reliable AI infrastructure for enterprise production
Own the infrastructure that runs AI systems in production. You'll design deployment architectures, set up monitoring and alerting, ensure security compliance, and keep everything running smoothly at enterprise scale. This isn't ticket-driven ops work. You're building the platform that enables rapid, reliable delivery across multiple client environments. Expect to write code, design systems, and debug complex distributed failures.
Morning: Review overnight alerts (none, because you built good alerts). Mid-morning: Design review for a multi-region deployment. Afternoon: Implement automated failover for a critical service. Evening: Write a runbook for a new deployment pattern.
Build infrastructure for systems that millions depend on. Make architectural decisions that matter. Work with modern tools and patterns. No legacy baggage to maintain.
This role requires completing a technical challenge as part of the application process. Challenge: High-Availability Service (medium difficulty).
Rotating on-call (one week per month). Incidents are rare because we invest in reliability. On-call time and incident response are compensated.
Everything is code. Manual deployments are failures. You'll spend significant time building automation that prevents repetitive work.
Required. Medium-difficulty infrastructure challenge (6-8 hours). We evaluate: system reliability, observability setup, deployment automation.
Submit your application and we'll be in touch within one week if there's a potential fit.