Data & AI Infrastructure Engineer
Allegiance Real Estate
Description
About Us
Allegiance Real Estate is one of Dubai's fastest-growing real estate firms, driven by innovation, technology, and a commitment to delivering exceptional customer experiences. We are building Aredxb.ai, an AI-first ecosystem powered by Ask Alpha (our multi-agent investment intelligence platform) that transforms how property is discovered, analyzed, and sold across the GCC, with a franchise model expanding toward 50 international markets.
Alpha already runs in production, used daily by our agent network. The next phase of growth depends on one thing above all: the data infrastructure underneath it. That's what this role owns.
Role Overview
Alpha's AI layer — retrieval, conviction scoring, conversational intelligence — is only as good as the data feeding it. Right now, critical market data (live pricing, transaction history, developer intelligence) flows in from multiple external data providers but isn't yet structured, indexed, or production-grade. As we scale to dozens of countries, each with its own local data sources, currencies, and formats, this becomes the single biggest constraint on how good Alpha can get.
You will own this layer end to end — designing the pipelines, the warehouse architecture, and the API integrations that turn raw, messy, multi-source data into clean, reliable infrastructure our AI systems can build on. You'll work directly alongside our AI engineering lead, architecting how data flows from source to model, not just moving it from A to B.
What You'll Actually Be Doing
- Design and build the ingestion pipelines for live market data sources across multiple regions: pricing indices, transaction feeds, and rental yield benchmarks, with each market bringing its own provider, schema, and data format
- Architect our PostgreSQL + pgvector data warehouse to scale cleanly from 1 to 50+ markets, each with different schemas, currencies, and data freshness requirements
- Build the API integration layer connecting Alpha to a growing list of external data providers, internal tools, and third-party services — designing for reliability, not just "it works once"
- Transform raw market data into structured chunks ready for RAG retrieval — partnering directly with our AI engineer to ensure what gets embedded is actually useful, accurate, and current
- Build monitoring and alerting for data pipeline health (freshness, completeness, schema drift) so AI failures caused by bad data get caught before they reach a user
- Design the multi-currency, multi-market data model that supports our franchise expansion. This isn't a UAE-only system; it has to work for Southeast Asia, Europe, and beyond from day one
- Own infrastructure decisions around scaling — when to introduce caching, queuing (Celery/Redis), or a proper ETL framework as data volume grows
- Work as a technical peer to our AI engineer — you own the data and infrastructure layer, they own the AI/retrieval layer, and you build the connective architecture together
What We're Looking For
- Strong experience designing and operating production data pipelines — not just SQL queries, but the architecture of ingestion, transformation, and storage at scale
- Hands-on experience with PostgreSQL and ideally pgvector, or comparable vector-capable data stores
- Real experience integrating multiple third-party APIs reliably — handling rate limits, auth, retries, and data quality validation
- Comfort with Python for pipeline development; familiarity with task queues (Celery, Redis) or workflow orchestration tools is a strong plus
- Understanding of how AI/RAG systems consume data — you don't need to build the models, but you need to understand what "good input" looks like for retrieval and embedding
- Experience designing for multi-region or multi-currency systems is a strong plus, given our international franchise roadmap
- Comfort working in a fast-