MLOps Engineer

CNTXT AI

Abu Dhabi Emirate, United Arab Emirates | On-site | Full-Time | Today

Description

Job role:

A dedicated startup is being formed to industrialize and scale a secure, AI-enabled, multi-source decision-support software offering. The platform is a multi-sensor fusion and agentic AI solution that connects to diverse data sources (for example, geospatial layers, imagery, video, and other operational signals). This role will support the delivery of a scalable product and help establish the processes, standards, and collaboration practices required for sustainable growth.

The Cloud Infrastructure Engineer is responsible for designing, deploying, and maintaining secure, scalable, and highly available cloud environments. This role focuses on building robust infrastructure on AWS (or multi-cloud environments, if applicable), automating operational processes, and ensuring the reliability and performance of cloud-based systems. The ideal candidate combines deep technical expertise with strong problem-solving skills and a passion for automation and cloud-native technologies.

Job Responsibilities

  • Design and operate end-to-end ML/LLM delivery pipelines: data to training/fine-tuning to evaluation to packaging to deployment
  • Build CI/CD for models and services, including automated testing, validation gates, and rollback strategies
  • Standardize experiment tracking, model/version lineage, and artifact management (datasets, prompts, checkpoints, embeddings)
  • Implement monitoring and observability: latency, cost, drift, quality signals, and safety/guardrails metrics
  • Optimize inference performance and cost (batching, caching, quantization, hardware choices)
  • Define and enforce environment and dependency management across dev/stage/prod
  • Work with engineering on scalable serving patterns (APIs, streaming, event-driven), and with security on access controls and secrets
  • Support release readiness: runbooks, incident response, SLOs/SLAs, and post-release stability tracking
  • Coordinate with procurement and legal where needed for tooling, cloud services, and vendor onboarding
  • Startup mode: hands-on, flexible, comfortable pivoting, and able to unblock teams quickly
  • Interfaces / stakeholders

Qualifications & Experience

  • Typically 5+ years in MLOps/DevOps/Data Platform roles, including production deployments of ML and/or LLM-powered systems.
  • Experience in fast-paced product environments preferred.

Tools (examples)

  • ML lifecycle: MLflow / Weights & Biases / equivalent
  • Serving: FastAPI, Triton (plus), Ray Serve (plus)
  • Orchestration: Airflow/Dagster (plus)
  • Observability: Prometheus/Grafana, OpenTelemetry, ELK
  • Cloud: AWS/Azure/GCP (or private cloud)

KPIs

  • Deployment frequency and lead time for model releases
  • Production stability: incident rate, MTTR, SLO compliance
  • Model quality health: drift detection coverage, evaluation gate pass rate
  • Inference cost and latency improvements
  • Reproducibility and traceability coverage (lineage completeness)

Competencies

  • Strong MLOps fundamentals: model lifecycle, reproducibility, evaluation, deployment, monitoring
  • Proficiency with containers and orchestration (Docker; Kubernetes is a plus)
  • CI/CD and automation (GitHub Actions/GitLab CI/Jenkins), infrastructure-as-code (Terraform is a plus)
  • Experience with model serving patterns (REST/gRPC) and observability tools
  • Comfort with cloud primitives (compute, storage, networking) and cost management practices
  • Clear communication and documentation; strong ownership and operational discipline
