
Senior Data Quality Engineer (4-Month Contract), Onsite in UAE - Octopus by RTG

Robusta

Abu Dhabi | On-site | Full-Time | Posted 3 weeks ago

Description

About the Role

We are seeking an experienced Senior Databricks Data Quality Engineer to lead the design, implementation, and automation of enterprise-scale data quality frameworks within a Databricks environment. The successful candidate will play a key role in establishing data quality controls, profiling frameworks, remediation processes, and AI-assisted quality monitoring across a large-scale data platform of 170+ datasets and over 1,300 Critical Data Elements (CDEs).

This role requires strong expertise in Databricks, PySpark, Delta Lake, MLflow, and modern data quality management practices.

Key Responsibilities

Data Platform & Databricks Configuration

  • Configure and manage Databricks workspaces, compute clusters, PySpark notebooks, Delta Lake architecture, and Unity Catalog integrations.
  • Design scalable data quality processing frameworks across 170+ datasets and 1,346 prioritized Critical Data Elements (CDEs).

Data Profiling & Quality Assessment

  • Develop AI-assisted profiling notebooks using PySpark to establish baseline data quality scores.
  • Assess data quality across six key dimensions:
    • Completeness
    • Uniqueness
    • Validity
    • Consistency
    • Accuracy
    • Timeliness
  • Analyze null rates, duplicate records, invalid values, format violations, outliers, and schema drift.

Data Quality Rule Framework

  • Design and build a scalable Data Quality Rule Factory using parameterized PySpark functions.
  • Enable automated deployment of over 6,700 data quality rules without manual rule-by-rule development.
  • Create reusable rule templates across datasets and data quality dimensions.

Pipeline Quality Enforcement

  • Integrate data quality controls within the Bronze, Silver, and Gold Delta Lake layers.
  • Implement quality gates that prevent data progression unless predefined thresholds are met.
  • Develop reusable Databricks Jobs for automated validation and monitoring.

Data Cleansing & AI-Driven Remediation

  • Build automated data cleansing pipelines for:
    • Standardization
    • Deduplication
    • Schema harmonization
  • Deploy MLflow-managed machine learning models for:
    • Anomaly detection
    • Fuzzy duplicate detection
    • Exact duplicate identification
  • Ensure explainability of model outputs and support human-in-the-loop validation processes.

Exception Management

  • Design failed-record handling frameworks and quarantine Delta tables.
  • Capture failure reasons, affected CDEs, rule references, and timestamps.
  • Develop automated reprocessing mechanisms for corrected records.

Data Quality Monitoring & Reporting

  • Build Delta Lake aggregation tables for data quality metrics.
  • Deliver data quality KPIs to Power BI dashboards, including:
    • Dimension-level scores
    • Rule pass/fail rates
    • SLA adherence metrics
  • Configure automated alerting using Databricks SQL Alerts and Azure Monitor.

Predictive Data Quality Analytics

  • Develop predictive models to identify datasets at risk of quality degradation.
  • Support AI-assisted Root Cause Analysis (RCA) using profiling outputs and machine learning techniques.
  • Export and prepare remediation datasets for prioritization and governance reporting.

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • 5+ years of experience in Data Engineering or Data Quality Engineering.
  • 3+ years of hands-on experience with Databricks and PySpark.
  • Strong expertise in Delta Lake architecture and data pipeline development.
  • Experience with Unity Catalog implementation and governance.
  • Hands-on experience with MLflow and machine learning deployment.
  • Strong SQL skills and data modeling expertise.
  • Experience building enterprise-scale data quality frameworks.
  • Experience integrating Databricks with Power BI and Azure services.
  • Strong understanding of data governance, metadata management, and data quality dimensions.

Preferred Qualifications

  • Microsoft Azure certifications.
  • Databricks certifications.
