Product Data Scientist — AI Evaluation & Quality
Finom
Description
About Finom
Finom is a European tech startup headquartered in Amsterdam, and we’re on a journey towards revolutionizing the financial landscape for entrepreneurs worldwide. Our mission is to develop an all-in-one financial B2B solution that integrates banking functions, accounting, financial management, and invoicing into a seamless, mobile-first platform.
We recently closed a €115 million Series C equity round (around $133 million), bringing our total funding to approximately $346 million. This significant investment follows a $105 million growth funding round from General Catalyst, a long-term backer since 2021 known for supporting companies like Airbnb, HubSpot, KAYAK, and Stripe.
Finom's platform goes beyond traditional banking, offering invoicing and a growing suite of features, including AI-enabled accounting, aiming to simplify financial management for entrepreneurs. We're actively expanding our reach across key EU markets like Germany, France, the Netherlands, Italy, and Spain.
At Finom, we’re not just redefining the entrepreneurial experience; we’re empowering our employees to make a real difference. Your work matters, and your impact extends far beyond product metrics. We nurture innovation and an inspiring work environment where bold ideas thrive. Maintaining our start-up spirit, we prioritize thorough research, swift implementation of solutions, and ensuring that every effort we make benefits our users, employees, partners, and, of course, our business.
You'll join the AI Team — the group driving all AI products and technology at Finom
We build and ship AI across the company: an AI financial co-pilot, a voice agent, and internal AI-powered processes
Our belief: an AI agent is only as good as the eval loop behind it, so we can build AI only as good as the evals we run on it
Your mission: own that eval loop across every AI product we ship, from pre-launch quality gates to post-launch monitoring and continuous improvement
You'll work directly with our AI Quality lead, Igor Kolodkin
You'll collaborate closely with AI engineers, Product, and domain experts across the company
Core stack: Databricks, DeepEval, Claude Code
What You Will Be Doing
- Own and extend our offline eval suite across products: datasets (capability and regression), judges, and metrics
- Build and maintain online quality dashboards: resolution rate, CSAT, thumbs up/down, LLM-as-judge signals, error rate, latency
- Close the production feedback loop: mine failure patterns from real traffic → turn them into regression cases → propose fixes to Product and domain experts
- Harden our eval methodology: judge stability and handling of non-determinism
- Translate numbers into decisions: weekly syncs, clear trade-offs, no dashboards for their own sake
Must-Haves
- Python and SQL: you can build an analysis end to end
- Solid foundation in statistics: sampling, hypothesis testing, variance, and recognizing when a metric is noisy
- Analytical mindset — you start from the business question, not from the tool
- 3+ years in analyst or data scientist roles, at least one of them in a product context
Nice-to-Haves
- Experience in quality analytics for ML systems — ranking, recommendations, classification, etc.
- Hands-on experience evaluating LLM applications (RAG, agents, tool use, judges)
- Experience building LLM agents — side projects, toy builds, personal experiments all count
How We Work: One Thing We Take Seriously
- AI-assisted coding is our default authoring environment, not a bonus
- Claude Code is our main tool — you'll reach for it for SQL, Python, analyses, dashboards, and internal scripts
- We're looking for analysts who are already curious about AI and fluent with it