Department: AI Delivery Engineering

Reports To: Head of AI Infrastructure or DevOps Manager

Location: On-site

Employment Type: Full-time

About the Role

We are seeking a highly skilled and experienced Senior DevOps Engineer to design, deploy, manage, and optimize Nutanix-based Kubernetes (K8s) infrastructure that powers our enterprise AI delivery platform. In this critical role, you will own the end-to-end lifecycle of Kubernetes clusters on Nutanix, from architecture and provisioning to monitoring, scaling, and security, enabling rapid delivery of AI services, machine learning pipelines, and intelligent applications.

You will work closely with AI/ML engineers, data scientists, and platform teams to ensure our infrastructure is resilient, scalable, secure, and optimized for AI/ML workloads (e.g., training, inference, real-time analytics).

Key Responsibilities

End-to-End Kubernetes Platform Ownership: Design, deploy, manage, and maintain production-grade Kubernetes clusters on Nutanix Karbon (or native K8s on Nutanix AHV), ensuring high availability, performance, and security.
AI/ML Infrastructure Architecture: Architect and implement scalable, cost-efficient infrastructure tailored for AI workloads, including GPU orchestration, distributed training, model serving, and data-intensive pipelines.
Infrastructure as Code (IaC): Automate provisioning and configuration of Nutanix K8s environments using Terraform, Ansible, Helm, and GitOps workflows (e.g., ArgoCD/Flux).
CI/CD for AI Services: Build and maintain secure, efficient CI/CD pipelines for deploying AI microservices, model endpoints, and data processing jobs into K8s environments.
Observability & SRE Practices: Implement comprehensive monitoring, logging, and alerting (using Prometheus, Grafana, ELK, OpenTelemetry, etc.) with SLO/SLI tracking for AI platform reliability.
Security & Compliance: Enforce zero-trust networking, RBAC, pod security standards, image scanning, and secrets management (e.g., HashiCorp Vault) aligned with enterprise security standards.
Performance Optimization: Tune K8s scheduling, storage (Nutanix Files/Objects), networking (CNI), and resource allocation (CPU/GPU/memory) for AI/ML workloads.
Collaboration & Enablement: Partner with AI/ML engineers to onboard models and services onto the platform; document best practices and provide self-service tooling.
Disaster Recovery & Backup: Implement and test backup/recovery strategies for K8s workloads and persistent data using Nutanix-native or third-party tools (e.g., Velero).

Required Qualifications

5+ years of DevOps/SRE experience, including 3+ years focused on Kubernetes in production environments.
Deep hands-on experience with Nutanix (AHV, Prism, Karbon, Files, Objects) and managing K8s on-prem or hybrid.
Proven track record designing and operating AI/ML infrastructure (e.g., Kubeflow, MLflow, Seldon, KServe, Ray).
Expertise in Infrastructure as Code: Terraform, Helm, Ansible, GitOps.
Strong scripting/automation skills (Python, Bash, Go).
Experience with GPU orchestration (NVIDIA device plugins, MIG, CUDA) in K8s.
Solid understanding of networking, storage, and security in K8s (CNI, CSI, RBAC, OPA/Gatekeeper).
Familiarity with CI/CD tools (GitLab CI, Jenkins, GitHub Actions) and artifact management (Harbor, JFrog).
Experience with observability stacks (Prometheus, Grafana, Loki, Tempo, OpenTelemetry).
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Preferred Qualifications

Nutanix certifications (e.g., NCP-MCI, NCP-DS).
CNCF certifications (CKA, CKAD, CKS).
Experience with multi-cluster management (Rancher, Anthos, OpenShift).
Knowledge of MLOps practices and tools (MLflow, TFX, Kubeflow Pipelines).
Experience in regulated industries (finance, healthcare) with compliance needs (SOC2, HIPAA, GDPR).

Why Join Us?

Lead the infrastructure backbone for cutting-edge AI products used by millions.
Work with a world-class team of AI researchers, engineers, and product innovators.
Shape the future of on-prem/cloud-hybrid AI infrastructure at scale.
Competitive compensation, equity, and benefits.

Senior DevOps Engineer, Nutanix Kubernetes & AI Platform
