Senior Software Engineer, Applied AI
NVIDIA
Description
We are looking for a Senior Software Engineer, Applied AI Systems, to build production AI / ML and agentic solutions. We need a hands-on senior engineer who can turn ambiguous technical problems into durable software systems and AI-enabled systems: agents, workflow services, APIs, data pipelines, tool integrations, evaluation and benchmarking harnesses, reference architectures, and operational tooling.
We work at the intersection of applied AI, agentic workflows, software engineering, distributed systems, performance engineering, accelerated computing, and data infrastructure. In this role, you will build AI systems as real software systems: write and review high-quality code, make architecture tradeoffs, benchmark behavior and performance, and own outcomes from prototype through validation, hardening, deployment, and ongoing support. This is an opportunity to shape how production applied AI systems are built, measured, and reused inside NVIDIA!
We partner across global teams and time zones for design reviews, planning, debugging, critical-issue support, and technical decision-making. We need an engineer who turns complex requirements into clear technical plans, keeps the focus on reusable software capability rather than one-off delivery, and drives execution across teams.
What you will be doing:
- Build and own production-grade applied AI systems for NVIDIA’s technical and solution development use cases, including agentic solutions where they materially improve the systems and software.
- Design and build agentic workflows and the software around them: workflow services, APIs, retrieval, MCP/A2A-style tool integrations, agent harnesses, automation, telemetry, operational controls, and human oversight.
- Design reliable services, APIs, workflow state, event-driven execution, and observability using systems such as Kafka, ClickHouse, and OTel-style patterns.
- Translate complex technical and operational requirements into clear system designs, plans, interfaces, measurable outcomes, and pragmatic technical decisions through design reviews, code reviews, and clear communication.
- Develop production software in Python and other relevant languages, with strong testing, observability, CI/CD, documentation, and operational practices.
- Build performance and benchmarking workflows for existing production solutions or products, including validation harnesses, regression tests, tracing, metrics, failure analysis, latency, throughput, reliability, resource usage, and AI/inference behavior where relevant.
- Improve standard solution patterns alongside larger applied AI systems, working with NVIDIA engineering and solution teams to codify repeated patterns, product gaps, and field lessons into APIs, services, reference architectures, playbooks, test harnesses, and shared engineering building blocks.
- Debug and support production solutions across software, infrastructure, AI models, data pipelines, inference services, and GPU-accelerated environments, turning recurring support patterns into product or platform improvements.
What we need to see:
- BS, MS, or PhD in Computer Science, Engineering, AI/ML, or equivalent experience, with 5+ years of professional software engineering experience owning production systems or meaningful platform components.
- Hands-on experience with LLMs, generative AI, RAG, agentic AI, MCP, or other intelligent AI technologies beyond simple prompting or notebooks, including tool use, retrieval, evaluation, guardrails, orchestration, or human-in-the-loop control.
- Strong Python engineering skills and practical experience with at least one additional production programming language such as C++, Go, Rust, or TypeScript.
- Demonstrated ability to design and build distributed systems, backend services, data pipelines, workflow orchestration, APIs, or developer platforms using production technologies such as Kafka, ClickHouse, PostgreSQL, Redis, object storage, Kubernetes, or similar.