Data Quality Intern (d/f/m)

Luminovo

DE · On-site · Internship · 2w ago

Description

Your opportunity

Our software platform is built to solve engineering and procurement issues in the trillion-dollar electronics industry. In other words: Luminovo is redefining the software stack used to bring any type of electronics to life.

  • To understand what we do, check out our website, plus two of our blog posts (here and here).

Your role

  • Our data quality mission is product discovery applied to our part and component data. You take a fuzzy quality problem, figure out what it actually means for customers, measure it honestly, and hand a well-scoped, evidence-backed finding to the team that delivers the larger fix.
  • The hard part isn't running a query (our AI tooling helps with that). It's reframing "x% of parts have no pin count" into "y% of a customer's costings can't complete because of it," then giving other teams a result they can act on without having to re-check it.
  • You'll be a junior version of this discovery loop: sharp, honest, and data-fluent. You'll follow threads the team doesn't have time to chase, turn them into decision-ready findings, and grow into more autonomy across your internship. You work within a clear direction, and you can take a fix all the way into production when it's a data-level change you can script, like manufacturer merges or backfills. You won't need to be a Rust engineer or own large refactors. AI tooling does the heavy lifting on unfamiliar code and scripting. Your judgment and rigor are what matter most.
  • This role is an internship with a duration of three to six months.

Your performance objectives

  • Turn ambiguous data-quality questions into customer-relevant findings by reframing part-level observations into business/customer impact (e.g. tenant-aware "what actually blocks costing"), defining a sensible metric or proxy, and producing a measured, caveated answer to the question set by the product manager.
  • Independently size problems and test hypotheses against our data by writing read-only queries over the data warehouse (ClickHouse) and production Postgres, and producing numbers you can defend (knowing when a result is double-counted, misleading, or too good to be true).
  • Make the effect of fixes and experiments visible by extending our dashboards and building ad-hoc visualizations that show trends, baselines, and whether an intervention actually moved coverage/correctness.
  • Run small experiments to gather evidence by writing scripts (with AI assistance) against external sources such as SiliconExpert and DigiKey, e.g., to check whether a missing-data gap is fetchable, calibrate a finding, or do spot checks on interesting cases.
  • Verify assumptions in the product itself by navigating the epibator (Rust/TS) codebase with AI tooling to confirm how data is actually resolved/used, and occasionally adding light instrumentation we find we need, without owning large refactors.
  • Safely apply the fixes you've scoped by writing AI-assisted scripts that correct production customer data at scale, e.g. automating the research to decide whether two manufacturers are the same record and then executing thousands of merges. Make every change safe by construction: dry-run and validate against samples first, work in reversible/checkpointed batches, and put guardrails in place so we never introduce regressions or corrupt manufacturing/costing data.
  • Leave behind durable, trustworthy knowledge by following the mission's loop (brief, investigate, report, distill), citing evidence, dating facts, and writing findings other teams and stakeholders can act on without re-deriving them.
  • Be your own harshest critic by reconciling and sanity-checking your own results, clearly separating "what's proven" from "what's still a hypothesis," and flagging loudly when a finding overturns a prior assumption (including your own).

What you bring
