About Finom

Finom is a European tech startup headquartered in Amsterdam, and we’re on a journey towards revolutionizing the financial landscape for entrepreneurs worldwide. Our mission is to develop an all-in-one financial B2B solution that integrates banking functions, accounting, financial management, and invoicing into a seamless, mobile-first platform.

We recently closed a €115 million Series C equity round (around $133 million), bringing our total funding to approximately $346 million. This significant investment follows a $105 million growth funding round from General Catalyst, a long-term backer since 2021 known for supporting companies like Airbnb, HubSpot, KAYAK, and Stripe.

Finom's platform goes beyond traditional banking, offering invoicing and a growing suite of features, including AI-enabled accounting, aiming to simplify financial management for entrepreneurs. We're actively expanding our reach across key EU markets like Germany, France, the Netherlands, Italy, and Spain.

At Finom, we’re not just redefining the entrepreneurial experience; we’re empowering our employees to make a real difference. Your work matters, and your impact extends far beyond product metrics. We nurture innovation and an inspiring work environment where bold ideas thrive.

Maintaining our start-up spirit, we prioritize thorough research, swift implementation of solutions, and ensuring that every effort we make benefits our users, employees, partners, and, of course, our business.

You'll join the AI Team — the group driving all AI products and technology at Finom

We build and ship AI across the company: AI financial co-pilot, voice agent, and internal AI-powered processes

Our belief: your AI agent is only as good as your eval loop, so we can only build AI that is as good as the evals we run on it

Your mission: own that eval loop across every AI product we ship — pre-launch quality gates, post-launch monitoring, continuous improvement

You'll work directly with our AI Quality lead, Igor Kolodkin

Close collaboration with AI engineers, Product, and domain experts across the company

Core stack: Databricks, DeepEval, Claude Code

What You Will Be Doing

Own and extend our offline eval suite across products — datasets (capability + regression), judges, metrics
Build and maintain online quality dashboards: resolution rate, CSAT, thumbs up/down, LLM-as-judge signals, error rate, latency
Close the production feedback loop: mine failure patterns from real traffic → turn them into regression cases → propose fixes to Product and domain experts
Harden methodology: judge stability, non-determinism handling
Translate numbers into decisions: weekly syncs, clear trade-offs, no dashboards for their own sake

Must-Haves

Python and SQL — you can build an analysis end-to-end
Solid foundation in statistics — sampling, hypothesis testing, variance, understanding what a noisy metric is
Analytical mindset — you start from the business question, not from the tool
3+ years in analyst / data scientist roles, at least one in a product context

Nice-to-Haves

Experience in quality analytics for ML systems — ranking, recommendations, classification, etc.
Hands-on experience evaluating LLM applications (RAG, agents, tool use, judges)
Experience building LLM agents — side projects, toy builds, personal experiments all count

How We Work — One Thing We Mean Seriously

AI-assisted coding is our default authoring environment, not a bonus
Claude Code is our main tool — you'll reach for it for SQL, Python, analyses, dashboards, and internal scripts
We're looking for analysts who are already curious and fluent with AI

Product Data Scientist — AI Evaluation & Quality
