14. AI/ML & Analytics

14.1 AI Platform Stack

| Component | Technology | Purpose |
|---|---|---|
| ML Framework | PyTorch / scikit-learn / XGBoost | Model training (valuation, fraud, demand) |
| ML Ops | MLflow (self-hosted) | Experiment tracking, model registry |
| OCR | Tesseract 5 + PaddleOCR | Arabic/Kurdish deed & document digitization |
| Geospatial ML | GeoPandas + Shapely + rasterio | Spatial analytics, parcel operations |
| Predictive Analytics | Prophet / XGBoost | Service demand & market forecasting |
| Dashboard | Metabase (OSS) + Grafana | Visual analytics, KPI dashboards |
| Data Pipeline | Apache Airflow 2.x | ETL orchestration |
| Feature Store | Feast (OSS) | Feature engineering for ML models |

14.2 AI Use Cases — Property Domain

| Use Case | Model / Approach | Input | Output |
|---|---|---|---|
| Legacy Deed OCR | PaddleOCR + post-processing | Scanned Arabic/Kurdish paper deeds | Structured data: owner, parcel, area, date |
| Automated Property Valuation | XGBoost regression + spatial features | Location, area, type, age, comparables | Estimated market value + confidence |
| Fraud Detection | Isolation Forest / DBSCAN | Transaction patterns, ownership changes, price anomalies | Risk score per transaction |
| Demand Forecasting | Prophet time series | Historical application volumes per office | Weekly predicted volumes for staff scheduling |
| Duplicate Parcel Detection | PostGIS overlap + fuzzy matching | Parcel geometries, registration data | Candidate duplicates for manual review |
| Zoning Compliance Check | Rule engine + spatial queries | Permit application + zoning geometry | Auto-approve / flag / reject |
| Sentiment Analysis | CAMeL Tools / ArabicBERT (fine-tuned) | Citizen feedback text | Positive / neutral / negative |
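
To make the Duplicate Parcel Detection row more concrete, the following is a minimal sketch of the "overlap + fuzzy matching" idea using only the Python standard library. It is not the production approach: axis-aligned bounding boxes stand in for the PostGIS geometry overlap, `difflib.SequenceMatcher` stands in for whatever fuzzy matcher is actually used, and the record fields (`id`, `owner`, `bbox`) and both thresholds are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical parcel records; bbox is (min_x, min_y, max_x, max_y) in metres.
PARCELS = [
    {"id": "P-001", "owner": "Ahmed Karim Hassan", "bbox": (0, 0, 20, 30)},
    {"id": "P-002", "owner": "Ahmad Kareem Hassan", "bbox": (1, 0, 21, 30)},
    {"id": "P-003", "owner": "Layla Omar Saeed", "bbox": (100, 100, 120, 130)},
]


def bbox_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (0.0 if disjoint)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(parcels, min_iou=0.5, min_name=0.8):
    """Return pairs that overlap strongly AND have similar owner names.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    candidates = []
    for p, q in combinations(parcels, 2):
        iou = bbox_iou(p["bbox"], q["bbox"])
        sim = name_similarity(p["owner"], q["owner"])
        if iou >= min_iou and sim >= min_name:
            candidates.append((p["id"], q["id"], round(iou, 3), round(sim, 3)))
    return candidates


for pair in candidate_duplicates(PARCELS):
    print(pair)
```

The pairwise loop is quadratic, so a real implementation would presumably let a spatial index narrow the candidate pairs first and apply the name comparison only to those. The output is a list of candidates for manual review, matching the table row, not an automatic merge decision.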

14.3 Property Valuation Model Detail

```mermaid
graph LR
  subgraph INPUT["📥 Features"]
    F1["Location (lat/lon)"]
    F2["Area (sqm)"]
    F3["Property type"]
    F4["Age / condition"]
    F5["Floor count"]
    F6["Distance to road"]
    F7["Zoning class"]
    F8["Recent comparables"]
    F9["Neighborhood index"]
  end

  subgraph MODEL["🧠 ML Pipeline"]
    PREP["Feature Engineering<br/>(Feast)"] --> TRAIN["XGBoost<br/>Regression"]
    TRAIN --> REG["MLflow<br/>Model Registry"]
  end

  subgraph OUTPUT["📤 Result"]
    VAL["Estimated Value (IQD)"]
    CONF["Confidence Score"]
    COMP["Top 5 Comparables"]
  end

  INPUT --> MODEL --> OUTPUT
```
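
The diagram lists "Top 5 Comparables" as an output but does not say how they are selected. One plausible reading, sketched below as an assumption, is to take the five nearest sales of the same property type, ranked by great-circle (haversine) distance from the subject property. The field names (`lat`, `lon`, `type`, `id`) and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def top_comparables(subject, sales, k=5):
    """Return the k nearest sales with the same property type as the subject."""
    candidates = [s for s in sales if s["type"] == subject["type"]]
    candidates.sort(
        key=lambda s: haversine_km(subject["lat"], subject["lon"], s["lat"], s["lon"])
    )
    return candidates[:k]


# Hypothetical data: seven residential sales progressively farther north,
# plus one commercial sale at the subject's own location.
subject = {"lat": 36.190, "lon": 44.010, "type": "residential"}
sales = [
    {"id": f"S-{i}", "lat": 36.190 + 0.002 * i, "lon": 44.010, "type": "residential"}
    for i in range(1, 8)
]
sales.append({"id": "S-shop", "lat": 36.190, "lon": 44.010, "type": "commercial"})

comparables = top_comparables(subject, sales)
print([c["id"] for c in comparables])  # ['S-1', 'S-2', 'S-3', 'S-4', 'S-5']
```

Since the input feature is "Recent comparables", a fuller version would presumably also filter by sale date. That filter is omitted here because the source does not define the recency window.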

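Training data: Historical sales from ownership_history, joined with property attributes.

Retraining: Monthly, via an Airflow DAG. A retrained model is promoted only if its MAPE (mean absolute percentage error) is below 15%.

Serving: FastAPI endpoint; predictions are cached in Redis for 24 hours per property.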
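
The promotion rule can be illustrated with a short sketch. MAPE is the mean of |actual − predicted| / |actual|, expressed as a percentage. The function names, the sample prices, and the idea of evaluating on a list of held-back sales are assumptions for illustration; only the 15% threshold comes from the text above.

```python
MAPE_THRESHOLD = 15.0  # percent, from the promotion rule in this section


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero (sale prices are positive).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def should_promote(actual, predicted, threshold=MAPE_THRESHOLD):
    """True if the candidate model's MAPE is strictly below the threshold."""
    return mape(actual, predicted) < threshold


# Hypothetical evaluation prices (million IQD).
actual = [100.0, 200.0, 400.0]
good_model = [110.0, 180.0, 400.0]  # errors: 10%, 10%, 0%
weak_model = [130.0, 150.0, 400.0]  # errors: 30%, 25%, 0%

print(round(mape(actual, good_model), 2), should_promote(actual, good_model))
print(round(mape(actual, weak_model), 2), should_promote(actual, weak_model))
```

Here the first candidate scores about 6.67% and would be promoted, while the second scores about 18.33% and would not. In the pipeline described above, a check like this would presumably run as the final task of the monthly Airflow DAG, before the model is registered in MLflow.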

14.4 OCR Pipeline for Legacy Deeds

```mermaid
graph LR
  SCAN["📄 Scanned Deed<br/>(TIFF/PDF)"] --> PREPROC["Pre-process<br/>Deskew, denoise,<br/>binarize"]
  PREPROC --> OCR["PaddleOCR<br/>Arabic/Kurdish"]
  OCR --> EXTRACT["Entity Extraction<br/>Owner name, parcel#,<br/>area, date, notary"]
  EXTRACT --> VALIDATE["Cross-check<br/>vs existing DB"]
  VALIDATE --> QUEUE["Human Review Queue<br/>(Filament admin)"]
  QUEUE --> DB["Insert to<br/>PostgreSQL"]
```

Volume target: Process a backlog of ~500K legacy paper deeds over 18 months.

Accuracy target: More than 85% of fields extracted correctly on the first pass; the remainder are flagged for human review.
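
As an illustration of the post-processing and flagging steps, the sketch below normalizes Arabic-Indic digits to ASCII and splits extracted fields into accepted and flagged sets by OCR confidence. Everything here is an assumption for illustration: the field names, the `(text, confidence)` input shape, and the 0.90 cutoff. The cutoff is a per-field confidence threshold and is not the same thing as the 85% accuracy target.

```python
# Map Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic
# (U+06F0..U+06F9) digits to their ASCII equivalents.
_DIGIT_MAP = {
    base + i: str(i)
    for base in (0x0660, 0x06F0)
    for i in range(10)
}

REVIEW_THRESHOLD = 0.90  # hypothetical per-field confidence cutoff


def normalize_digits(text):
    """Replace Arabic-Indic digit characters with ASCII digits."""
    return text.translate(_DIGIT_MAP)


def route_fields(fields, threshold=REVIEW_THRESHOLD):
    """Split OCR fields into (accepted, flagged) by confidence.

    fields maps a field name to a (raw_text, confidence) pair.
    Empty values are always flagged, whatever their confidence.
    """
    accepted, flagged = {}, {}
    for name, (raw_text, confidence) in fields.items():
        value = normalize_digits(raw_text).strip()
        if value and confidence >= threshold:
            accepted[name] = value
        else:
            flagged[name] = value
    return accepted, flagged


# Hypothetical OCR output for one deed.
ocr_fields = {
    "parcel_no": ("\u0661\u0662\u0663/\u0664\u0665", 0.97),  # Arabic-Indic 123/45
    "area_sqm": ("\u06f2\u06f5\u06f0 ", 0.95),               # Extended form of 250
    "deed_date": ("1987-03-14", 0.62),                       # low confidence
    "notary": ("", 0.99),                                    # nothing extracted
}

accepted, flagged = route_fields(ocr_fields)
print("accepted:", accepted)
print("flagged:", flagged)
```

In the pipeline shown in the diagram, flagged fields would presumably be highlighted for the reviewer in the human review queue and would not bypass it.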

14.5 Analytics Dashboards (Metabase)

| Dashboard | Audience | Key Metrics |
|---|---|---|
| Property Market Overview | Director-level | Total registered properties, transfers/month, avg price/sqm by district |
| Permit Pipeline | Building permit team | Applications by stage, avg processing time, inspection backlog |
| Revenue Dashboard | Finance | Fees collected, tax assessments issued, outstanding amounts |
| Office Performance | Operations | Applications per office, avg wait time, satisfaction rating |
| GIS Heatmap | Analysts | Transaction density, price trends by area, zoning violations |
| Fraud Alerts | Audit team | Flagged transactions, risk scores, investigation status |
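
As a rough illustration of one metric above, "avg price/sqm by district" could be computed as follows. In Metabase this would more likely be a SQL question against the database; the Python sketch only makes the aggregation explicit. The record fields (`district`, `price_iqd`, `area_sqm`) and the sample values are hypothetical.

```python
from collections import defaultdict


def avg_price_per_sqm(transfers):
    """Area-weighted average price per square metre, keyed by district.

    Computed as total price / total area per district, so large parcels
    weigh more than small ones. Districts with zero total area are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # district -> [price sum, area sum]
    for t in transfers:
        totals[t["district"]][0] += t["price_iqd"]
        totals[t["district"]][1] += t["area_sqm"]
    return {
        district: price / area
        for district, (price, area) in totals.items()
        if area > 0
    }


# Hypothetical transfer records.
transfers = [
    {"district": "A", "price_iqd": 200_000_000, "area_sqm": 200},
    {"district": "A", "price_iqd": 100_000_000, "area_sqm": 50},
    {"district": "B", "price_iqd": 90_000_000, "area_sqm": 100},
]

for district, value in sorted(avg_price_per_sqm(transfers).items()):
    print(district, f"{value:,.0f} IQD/sqm")
```

The area-weighted form is one of two reasonable definitions. The alternative, averaging each transfer's own price/sqm, gives every transaction equal weight and would yield a different figure for district A (1,500,000 instead of 1,200,000 IQD/sqm), so the dashboard should state which definition it uses.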