Principal AI Researcher
PHRASE
Description
At Phrase, we help open the door to global business by providing the world's leading Language Intelligence Platform.
The Phrase Platform combines AI, agentic orchestration, and a headless, API-first architecture in one composable system. Beyond translation, it orchestrates and adapts content to culture, audience, channel, brand voice, and intended outcome in any language, for every audience. It applies the context that makes content perform in every market: quality standards, glossaries, prior translations, and cultural nuance. Every team, in every region, can ship content that is on-brand, on-point, and ready for any audience.
Phrase gives enterprises the intelligence to automate workflows, the freedom to connect their own tools and engines, and the control to govern global content at scale.
The AI Research team sits at the core of Phrase's product differentiation. We build, train, and evaluate proprietary translation models; we design the agentic workflows that orchestrate translation quality end-to-end; and we run the research cadence that feeds product development with grounded, evidence-based decisions.
The team's current research programmes span: fine-tuning LLMs for domain-specific machine translation, multi-agent evaluation pipelines, automated quality profiling from style guides, and active learning through feedback loops on real customer data. These are not exploratory proofs of concept; they are live systems serving production traffic.
The next phase requires someone who can take ownership of the most technically demanding of these streams, extend them, and help chart the research direction for the team as a whole.
What you'll be responsible for:
Core Research
- Lead design, training, and evaluation of LLMs for translation and language quality tasks, including work with fine-tuning techniques such as LoRA and DPO, and instruction-tuned models at various scales.
- Design and implement robust evaluation frameworks for translation quality, moving beyond automatic metrics (BLEU, TER, chrF, COMET) to MQM-based error annotation, LLM-as-judge approaches, and hybrid evaluation pipelines.
- Architect and evaluate complex agentic workflows for NLP problems: multi-step reasoning, tool use, structured output generation, and orchestration across multiple model providers.
- Take technical ownership of the team's flagship model development programme, from data curation and training pipeline design through to production integration and ongoing evaluation.
- Design and run experiments using ML pipelines, maintaining reproducibility and clear documentation of results and decisions.
Applied NLP and Product Collaboration
- Work closely with Product, Engineering, and Solutions to translate research findings into concrete, shippable capabilities.
- Provide expert input on model provider strategy: evaluation of frontier models (OpenAI, Gemini, Claude, open-source), benchmarking against internal baselines, and recommendations on production use.
- Contribute to the evolution of quality evaluation systems, including the integration of style guides and a focus on customer-led, outcome-oriented metrics and evaluation methodologies.
- Support decisions on active learning, feedback loop design, and data pipeline strategy for continuous model improvement.
Team and Leadership
- Act as a technical lead for the research team: setting direction on open problems, reviewing PRs and research write-ups, and running or contributing to internal enablement sessions such as reading groups and deep dives.
- Mentor junior and mid-level researchers; model good research hygiene and collaborative working practices.
- Contribute to hiring decisions for the team, including take-home assignment design and technical interviews.
- Represent the AI Research team