From MLS feeds to intelligent property platform.
Monday: 3 core prompts for property enrichment. Tuesday: automation code for ingestion pipelines. Wednesday: team workflows across data, AI, and operations. Thursday: complete production architecture with multi-agent orchestration, ML pipelines, and enterprise-grade scaling patterns.
Key Assumptions
System Requirements
Functional
- Ingest listings from 800+ MLS APIs (RETS, RESO Web API)
- Normalize heterogeneous schemas to unified property model
- AI enrichment: generate descriptions, estimate valuations, extract features
- Real-time search with filters (price, beds, location, school district)
- Multi-tenant isolation with per-client branding and data access
- CRM integration for lead routing and agent assignment
- Analytics dashboards for market trends and listing performance
Non-Functional (SLOs)
💰 Cost Targets: $0.002 per listing ingested, $0.05 per enrichment, $0.0001 per search query
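A minimal sketch of how these per-unit targets could be encoded and checked in the pipeline. Only the dollar figures come from the SLO line above; the `CostTargets` dataclass, the `within_budget` helper, and the 10% tolerance are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CostTargets:
    # Thresholds copied from the cost targets above; everything else is a sketch.
    per_listing_ingestion_usd: float = 0.002
    per_enrichment_usd: float = 0.05
    per_search_query_usd: float = 0.0001

def within_budget(unit_cost_usd: float, target_usd: float, tolerance: float = 0.10) -> bool:
    """Allow up to 10% overshoot before the pipeline raises a cost alert."""
    return unit_cost_usd <= target_usd * (1 + tolerance)

targets = CostTargets()
assert within_budget(0.045, targets.per_enrichment_usd)      # OK
assert not within_budget(0.08, targets.per_enrichment_usd)   # would trigger an alert
```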
Agent Layer
planner
L3: Decompose listing ingestion into subtasks, route to specialized agents
🔧 schema_normalizer, policy_checker, cost_estimator
⚡ Recovery: If schema invalid → queue for manual review; if cost exceeds budget → skip optional enrichments
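A sketch of how the planner's recovery rules could translate into a routing decision. Task names, the required-field check, and the budget comparison are illustrative assumptions, not the production planner.

```python
def plan_listing_tasks(listing: dict, est_enrichment_cost_usd: float, budget_left_usd: float) -> list[str]:
    """Decide which subtasks to run for one raw listing, per the recovery policy above."""
    if not listing.get("mls_id") or not listing.get("address"):
        return ["manual_review"]                      # invalid schema -> human queue
    tasks = ["normalize", "deduplicate", "index"]     # core ingestion always runs
    if est_enrichment_cost_usd <= budget_left_usd:
        tasks += ["enrich_description", "estimate_valuation"]
    # Over budget: optional enrichments are skipped, core ingestion continues.
    return tasks
```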
ingestion_executor
L2: Fetch listings from MLS APIs, normalize schemas, store raw data
🔧 rets_client, reso_client, schema_mapper, deduplicator
⚡ Recovery: If MLS API down → retry with exponential backoff (max 3x); if rate limit hit → queue and throttle; if schema mismatch → log and alert
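A sketch of the retry policy for flaky MLS feeds: exponential backoff with jitter, a maximum of three retries, and longer waits when a feed rate-limits. `fetch` and `RateLimited` are hypothetical stand-ins for the RETS/RESO client and its HTTP 429 handling.

```python
import random
import time

class RateLimited(Exception):
    """Raised by the (hypothetical) MLS client when a feed returns HTTP 429."""

def fetch_with_backoff(fetch, feed_id: str, max_retries: int = 3):
    """Retry a flaky MLS fetch per the recovery policy: exponential backoff, max 3 retries."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(feed_id)
        except RateLimited:
            # Rate limit hit: wait out the window and throttle subsequent calls.
            delay = 30 * (attempt + 1)
        except ConnectionError:
            # Transient outage: exponential backoff with jitter (1s, 2s, 4s ...).
            delay = (2 ** attempt) + random.random()
        if attempt == max_retries:
            break
        time.sleep(delay)
    # All retries exhausted: queue the feed for manual sync and alert the ops team.
    raise RuntimeError(f"feed {feed_id} failed after {max_retries} retries")
```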
enrichment_executor
L3: Generate AI-powered descriptions, valuations, feature extraction
🔧 llm_api (Claude/GPT), valuation_model, feature_extractor, image_analyzer
⚡ Recovery: If LLM timeout → retry with shorter context; if low confidence (<0.7) → flag for human review; if cost spike → switch to cheaper model
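A sketch of the enrichment recovery path. `call_llm(prompt, model=...)` is a hypothetical wrapper that returns `(text, confidence, cost_usd)`; the model names, the $0.05 ceiling, and the 2,000-character truncation are illustrative.

```python
def enrich_listing(listing: dict, call_llm, cost_ceiling_usd: float = 0.05) -> dict:
    """Generate a description and apply the recovery policy above."""
    prompt = f"Write a listing description for: {listing['address']}"
    try:
        text, confidence, cost = call_llm(prompt, model="premium-model")
    except TimeoutError:
        # LLM timeout: retry once with a shorter context before giving up.
        text, confidence, cost = call_llm(prompt[:2000], model="premium-model")

    if cost > cost_ceiling_usd:
        # Cost spike: regenerate with a cheaper model for non-premium listings.
        text, confidence, cost = call_llm(prompt, model="budget-model")

    return {
        "description": text,
        "confidence": confidence,
        "needs_review": confidence < 0.7,   # low confidence -> human review queue
    }
```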
evaluator
L4: Validate enriched data quality, detect hallucinations, ensure compliance
🔧 fact_checker, hallucination_detector, compliance_validator, sentiment_analyzer
⚡ Recovery: If quality < 0.7 → re-enrich with different prompt; if compliance violation → block and alert; if hallucination detected → discard and retry
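A sketch of how the evaluator's checks map to the recovery actions. The hallucination check is a deliberately toy heuristic (flag numbers in the copy that are absent from the source record) and the banned phrases are illustrative examples, not the production fact checker or compliance validator.

```python
import re

def contains_hallucination(description: str, listing: dict) -> bool:
    """Toy check: flag any number in the copy that does not appear in the source record."""
    source_numbers = {str(v) for v in listing.values() if isinstance(v, (int, float))}
    claimed_numbers = set(re.findall(r"\d+", description))
    return bool(claimed_numbers - source_numbers)

def evaluate_enrichment(description: str, listing: dict, quality: float) -> str:
    """Return 'publish', 're_enrich', or 'block' per the evaluator recovery policy."""
    banned_phrases = {"perfect for families", "exclusive community"}  # illustrative only
    if any(p in description.lower() for p in banned_phrases):
        return "block"       # compliance violation -> block publish and alert
    if contains_hallucination(description, listing):
        return "re_enrich"   # hallucination -> discard output and retry
    if quality < 0.7:
        return "re_enrich"   # below quality threshold -> retry with a different prompt
    return "publish"
```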
guardrail
L4: Policy enforcement, PII redaction, fair housing compliance
🔧 pii_detector (AWS Comprehend), policy_engine, fair_housing_checker, dmca_validator
⚡ Recovery: If PII detected → redact and log; if policy violation → block publish and notify admin; if uncertain → escalate to human review
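A minimal sketch of the guardrail pass. The regexes and the term list are placeholders; in the architecture above, PII detection runs through AWS Comprehend and the fair-housing list would be curated with compliance and legal.

```python
import re

PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
FAIR_HOUSING_TERMS = {"no children", "christian community", "able-bodied"}  # examples only

def apply_guardrails(text: str) -> tuple[str, list[str]]:
    """Redact PII and collect fair-housing flags; any flagged text goes to human review."""
    flags = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            text = pattern.sub("[REDACTED]", text)   # PII detected -> redact and log
            flags.append(f"pii:{label}")
    for term in FAIR_HOUSING_TERMS:
        if term in text.lower():
            flags.append(f"fair_housing:{term}")     # block publish, notify admin
    return text, flags
```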
orchestrator
L4: Coordinate multi-agent workflows, handle retries, manage state
🔧 state_machine (LangGraph/Temporal), retry_handler, event_bus
⚡ Recovery: If agent fails → retry with different agent; if workflow timeout → checkpoint and resume; if deadlock → abort and alert
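A tiny checkpoint-and-resume loop standing in for the LangGraph/Temporal state machine, just to show the "checkpoint and resume" behavior. Step names, the callable signature, and the file-based checkpoint are illustrative assumptions, not the production state store.

```python
import json

def run_workflow(listing_id: str, steps: dict, checkpoint_path: str) -> None:
    """Run named steps in order, checkpointing after each so a retry can resume mid-workflow."""
    try:
        with open(checkpoint_path) as f:
            done = set(json.load(f))      # resume: skip steps completed in a prior run
    except FileNotFoundError:
        done = set()

    for name, step in steps.items():
        if name in done:
            continue
        step(listing_id)                  # if this raises, progress up to here is preserved
        done.add(name)
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)    # checkpoint after every completed step
```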
ML Layer
Feature Store
Update cadence: batch (daily) for neighborhood stats; real-time on each listing update
- price_per_sqft
- days_on_market
- neighborhood_avg_price
- school_rating_avg
- crime_index
- walkability_score
- price_change_velocity
- listing_completeness_score
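A sketch of how the per-listing (real-time) features above could be derived; neighborhood-level stats such as neighborhood_avg_price come from the daily batch job instead. Field names assume a hypothetical normalized property model.

```python
from datetime import date

def listing_features(listing: dict, today: date) -> dict:
    """Compute the real-time features for one normalized listing record."""
    required = ("price", "sqft", "beds", "baths", "photos", "description")
    present = sum(1 for field in required if listing.get(field))
    sqft = listing.get("sqft") or 0
    return {
        "price_per_sqft": (listing["price"] / sqft) if sqft else None,
        "days_on_market": (today - listing["listed_date"]).days,
        "listing_completeness_score": present / len(required),  # 0..1 field coverage
    }
```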
Model Registry
Strategy: Semantic versioning with A/B testing
- description_generator
- valuation_estimator
- feature_extractor
- recommendation_ranker
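A sketch of the "semantic versioning with A/B testing" strategy: a registry maps each model to versions with traffic shares, and a deterministic hash keeps every listing on the same version during a canary. The registry contents and the 90/10 split are illustrative assumptions.

```python
import hashlib

# Hypothetical registry contents: model name -> {semver: traffic share}.
REGISTRY = {
    "valuation_estimator": {"2.3.0": 0.9, "2.4.0-rc1": 0.1},  # 10% canary
}

def pick_version(model: str, listing_id: str) -> str:
    """Deterministic A/B split: the same listing always routes to the same model version."""
    bucket = int(hashlib.sha256(listing_id.encode()).hexdigest(), 16) % 100 / 100
    cumulative = 0.0
    for version, share in sorted(REGISTRY[model].items()):
        cumulative += share
        if bucket < cumulative:
            return version
    return max(REGISTRY[model])  # fallback if shares don't sum to 1.0
```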
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 competitors
Infrastructure
Risks & Mitigations
⚠️ MLS API rate limits (varies by feed, 100-10K req/hour)
Risk level: High. ✓ Mitigation: Implement adaptive rate limiting per MLS. Queue requests. Negotiate higher limits with top MLSs. Cache aggressively (30 min TTL).
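A minimal per-feed token bucket as one way to implement the adaptive rate limiting in this mitigation; the capacity mirrors each MLS's hourly quota, and the queueing behavior is left to the caller.

```python
import time

class TokenBucket:
    """Per-feed token bucket sized to that MLS's hourly quota (100-10K req/hour)."""

    def __init__(self, requests_per_hour: int):
        self.rate = requests_per_hour / 3600.0    # tokens refilled per second
        self.capacity = requests_per_hour
        self.tokens = float(requests_per_hour)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue the request rather than drop it
```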
⚠️ LLM cost explosion ($0.05/listing × 1M listings = $50K/day)
Risk level: Medium. ✓ Mitigation: Set cost guardrails ($10K/day max). Use cheaper models for non-premium listings. Batch processing during off-peak hours. Cache descriptions for similar properties.
⚠️ Data quality issues (incomplete/incorrect MLS data)
Risk level: High. ✓ Mitigation: Validation layer checks 50+ fields. Flag incomplete listings for manual review. Auto-enrich missing data from public records. Display confidence scores.
⚠️ Fair housing compliance violations (discriminatory language)
Risk level: Low. ✓ Mitigation: Guardrail agent blocks protected class mentions (race, religion, family status). Human review for flagged content. Regular compliance audits. Legal review of prompts.
⚠️ Multi-tenant data leakage (tenant A sees tenant B's data)
Risk level: Low. ✓ Mitigation: Row-level security in PostgreSQL. Tenant ID in every query. API gateway enforces tenant isolation. Regular security audits. Penetration testing.
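A sketch of the row-level-security pattern from this mitigation, assuming a `listings` table with a `tenant_id` uuid column; `conn` is an open psycopg2-style connection, and the table and column names are assumptions.

```python
RLS_SETUP_SQL = """
ALTER TABLE listings ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON listings
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""

def listings_for_tenant(conn, tenant_id: str):
    """Pin the tenant for this transaction so the RLS policy filters every row read."""
    with conn.cursor() as cur:
        # set_config(..., true) scopes the setting to the current transaction only.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        cur.execute("SELECT id, address, price FROM listings")
        return cur.fetchall()
```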
⚠️ MLS contract violations (scraping, data redistribution)
Risk level: Medium. ✓ Mitigation: Comply with MLS terms (attribution, data use restrictions). Legal review of contracts. Only display data for authorized users. Audit trail for compliance.
⚠️ Search relevance degradation (poor ranking, stale index)
Risk level: Medium. ✓ Mitigation: Monitor search quality metrics (NDCG, CTR). A/B test ranking algorithms. Real-time index updates (<5 min). User feedback loop (thumbs up/down).
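The search-quality monitoring above tracks NDCG; a minimal NDCG@k (linear-gain variant) that a dashboard could average over query samples and alert on when it drops:

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k for one query; `relevances` are judged scores in the order results were shown."""
    def dcg(scores):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(scores[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```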
Evolution Roadmap
Progressive transformation from MVP to scale
0-3 months (MVP)
3-6 months (Scale)
6-12 months (Enterprise)
Complete Systems Architecture
9-layer architecture from client to data
- Presentation (4 components)
- API Gateway (4 components)
- Agent Layer (6 components)
- ML Layer (5 components)
- Integration (4 components)
- Data (5 components)
- External (5 components)
- Observability (5 components)
- Security (5 components)
Sequence Diagram - Listing Ingestion Flow
Automated data flow every hour
Data Flow - MLS to Search Index
Complete pipeline from ingestion to user search
Key Integrations
MLS APIs (RETS/RESO)
CRM Integration (Salesforce/HubSpot)
Geocoding & POI (Google Maps)
School Data (GreatSchools API)
Image Storage (S3 + CloudFront)
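For the Geocoding & POI integration, a minimal call to the Google Geocoding API to resolve a listing address to coordinates; caching, quota handling, and POI lookups are omitted, and the helper name is ours.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_listing(address: str, api_key: str):
    """Resolve a listing address to (lat, lng), or None if geocoding fails."""
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": api_key}, timeout=10)
    data = resp.json()
    if data.get("status") != "OK" or not data.get("results"):
        return None  # e.g. ZERO_RESULTS or OVER_QUERY_LIMIT: queue for retry
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```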
Security & Compliance
Authentication
Authorization (RBAC)
Secrets Management
Audit Trail
Data Privacy
Failure Modes & Fallbacks
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| MLS API down (5% of feeds) | Retry with exponential backoff (max 3x) → Queue for manual sync → Alert ops team | Delayed updates for affected feeds (5-30 min) | 99.5% API availability |
| LLM API timeout (Claude/GPT) | Retry with shorter context → Switch to backup LLM (GPT ↔ Claude) → Fall back to template-based description | Lower quality descriptions (template) for <1% of listings | 99.9% enrichment success |
| Elasticsearch cluster unhealthy | Read from PostgreSQL (slower) → Rebuild index from DB → Scale cluster | Search latency 2-5x slower (200ms → 500ms) | 99.9% search availability |
| Database connection pool exhausted | Queue writes → Scale read replicas → Shed non-critical traffic (analytics) | Delayed writes (1-5 min) but reads unaffected | 99.95% write availability |
| Enrichment quality below threshold (<0.7) | Retry with different prompt → Route to human review queue → Block publish until approved | Delayed listing publish (30 min - 2 hours) | 100% quality compliance |
| PII detected in listing description | Block publish → Redact PII → Alert compliance team → Re-submit for approval | Listing not published until sanitized | 100% PII compliance |
| Multi-region replication lag (>5 min) | Route reads to local region (eventual consistency) → Alert ops → Investigate network | Stale data in secondary regions | 99% cross-region consistency within 5 min |
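A sketch of the "Elasticsearch cluster unhealthy → read from PostgreSQL" row above. `es_client` and `pg_conn` are hypothetical handles (an Elasticsearch 8.x-style client and a psycopg2-style connection), and the query translation is deliberately simplified.

```python
def search_listings(query: dict, es_client, pg_conn) -> list[dict]:
    """Serve search from Elasticsearch; fall back to a slower PostgreSQL query if it fails."""
    try:
        resp = es_client.search(index="listings", query={"match": {"city": query["city"]}})
        return [hit["_source"] for hit in resp["hits"]["hits"]]
    except Exception:
        # Degraded mode: expect roughly 2-5x higher latency, but search stays available.
        with pg_conn.cursor() as cur:
            cur.execute(
                "SELECT id, address, price FROM listings WHERE city = %s LIMIT 50",
                (query["city"],),
            )
            cols = [c.name for c in cur.description]
            return [dict(zip(cols, row)) for row in cur.fetchall()]
```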
RAG vs Fine-Tuning
Hallucination Detection
Evaluation Framework
Dataset Curation
Agentic RAG
Multi-Model Ensemble
Tech Stack Summary
© 2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.