The Role

We are looking for a Senior SRE / Platform Engineer (m/f/d) to own and improve the cloud infrastructure behind SimScale's browser-based simulation platform. The role spans AWS and EKS, observability, disaster recovery, security and compliance controls, multi-region architecture, elastic GPU/HPC capacity, and internal developer tooling.

SimScale's engineering teams run workloads directly on AWS; you will build the standards, guardrails, and self-service tooling that let them do so safely, raising reliability and security without slowing engineering velocity. You will join a small, tightly knit infrastructure team supporting 50+ engineers across the company. This is a hands-on senior individual contributor role; people management is not required, but there is a genuine path toward tech-lead ownership as the team grows.

Your Opportunity

Evolve our Kubernetes platform: Evaluate and adopt technologies such as Kubernetes Gateway API and service mesh patterns, and coordinate platform evolution across 10+ engineering teams.
Take observability to the next level: Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics, and help teams define meaningful SLOs.
Shape multi-region architecture and data residency: Support our move from an EU-centered footprint toward a global, multi-cloud architecture that satisfies disaster-recovery and data-residency requirements.
Own cloud cost and efficiency at scale: Keep petabyte-scale infrastructure cost-efficient, secure, and well-instrumented.
Improve tooling: Build self-service AWS account provisioning, guardrails, and AI-assisted automation that help engineering teams manage infrastructure safely and efficiently at scale.

What We Expect from You

5+ years of professional experience in SRE, platform, or infrastructure engineering.
Software development experience: Your background is rooted in software development, and you moved into SRE from there. You write production-quality software in at least one of Python, Go, Rust, or Java.
Strong systems foundation: You understand Linux internals and distributed systems well enough to debug complex production behavior.
Hands-on cloud and infrastructure experience: AWS (or GCP), declarative infrastructure (Terraform), GitOps workflows (Argo CD), and container orchestration (Kubernetes).
Observability and reliability experience: You have worked with OpenTelemetry, Prometheus, distributed tracing, monitoring, and meaningful SLOs/SLIs.
Production debugging depth: You can investigate complex failures, communicate clearly during incidents, and turn findings into durable improvements.
Security and compliance awareness: You understand how infrastructure decisions affect access control, auditability, disaster recovery, logging, and standards such as SOC 2.
Clear communication: You can explain trade-offs to engineering teams and help others adopt better platform practices without unnecessary friction.

Bonus Points

An open source portfolio or contributions.
Prior technical leadership experience, especially in infrastructure, reliability, or platform engineering.

Location: Remote (within CET ±5h)

What You Can Expect from Us

Join a dedicated, supportive team with unlimited growth opportunities and leadership potential
Make an impact quickly by sharing ideas and contributing to creative, goal-oriented projects
Work in a diverse, inclusive environment with colleagues from over 35 countries
Enjoy flexible hours and the freedom to work remotely from anywhere in the world
Access comprehensive health coverage, retirement plans, paid time off, and wellness support
Enjoy fresh office lunches or gift cards as a remote employee
Grow as a professional with online/offline
