14. AI/ML & Analytics
| Component | Technology | Purpose |
|---|---|---|
| ML Framework | PyTorch / scikit-learn / XGBoost | Model training (valuation, fraud, demand) |
| ML Ops | MLflow (self-hosted) | Experiment tracking, model registry |
| OCR | Tesseract 5 + PaddleOCR | Arabic/Kurdish deed & document digitization |
| Geospatial ML | GeoPandas + Shapely + rasterio | Spatial analytics, parcel operations |
| Predictive Analytics | Prophet / XGBoost | Service demand & market forecasting |
| Dashboard | Metabase (OSS) + Grafana | Visual analytics, KPI dashboards |
| Data Pipeline | Apache Airflow 2.x | ETL orchestration |
| Feature Store | Feast (OSS) | Feature engineering for ML models |
14.2 AI Use Cases — Property Domain
| Use Case | Model / Approach | Input | Output |
|---|---|---|---|
| Legacy Deed OCR | PaddleOCR + post-processing | Scanned Arabic/Kurdish paper deeds | Structured data: owner, parcel, area, date |
| Automated Property Valuation | XGBoost regression + spatial features | Location, area, type, age, comparables | Estimated market value + confidence |
| Fraud Detection | Isolation Forest / DBSCAN | Transaction patterns, ownership changes, price anomalies | Risk score per transaction |
| Demand Forecasting | Prophet time series | Historical application volumes per office | Weekly predicted volumes for staff scheduling |
| Duplicate Parcel Detection | PostGIS overlap + fuzzy matching | Parcel geometries, registration data | Candidate duplicates for manual review |
| Zoning Compliance Check | Rule engine + spatial queries | Permit application + zoning geometry | Auto-approve / flag / reject |
| Sentiment Analysis | CAMeL Tools / fine-tuned ArabicBERT | Citizen feedback text | Positive / neutral / negative |
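The Duplicate Parcel Detection row pairs a geometric overlap test with fuzzy matching on registration data. The sketch below illustrates that combination in plain Python rather than PostGIS: axis-aligned bounding boxes stand in for parcel geometries, and `difflib.SequenceMatcher` stands in for the fuzzy matcher. The `Parcel` fields, the sample records, and both thresholds are illustrative assumptions, not values from this design.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Parcel:
    """Hypothetical parcel record; the bounding box stands in for a PostGIS geometry."""
    parcel_id: str
    owner_name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)


def overlap_ratio(a: Parcel, b: Parcel) -> float:
    """Intersection area divided by the smaller parcel's area (0.0 when disjoint)."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min(a.area, b.area)
    return (width * height) / smaller if smaller > 0 else 0.0


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after collapsing whitespace."""
    return SequenceMatcher(None, " ".join(a.split()), " ".join(b.split())).ratio()


def candidate_duplicates(parcels, min_overlap=0.8, min_name_similarity=0.85):
    """Return (id_a, id_b, overlap, similarity) for pairs that pass both thresholds."""
    candidates = []
    for a, b in combinations(parcels, 2):
        overlap = overlap_ratio(a, b)
        if overlap < min_overlap:
            continue  # skip the string comparison when the geometries barely touch
        similarity = name_similarity(a.owner_name, b.owner_name)
        if similarity >= min_name_similarity:
            candidates.append((a.parcel_id, b.parcel_id, round(overlap, 3), round(similarity, 3)))
    return candidates


# Made-up records for illustration only.
parcels = [
    Parcel("P-001", "Ahmed Hassan Ali", 0.0, 0.0, 10.0, 20.0),
    Parcel("P-002", "Ahmad Hassan Ali", 0.5, 0.0, 10.0, 20.0),  # likely re-registration
    Parcel("P-003", "Sara Karim", 50.0, 50.0, 60.0, 70.0),      # unrelated parcel
]
print(candidate_duplicates(parcels))
```

The function only produces candidates; as the table states, the final decision stays with manual review. A production version would presumably push the overlap test into PostGIS so that a spatial index prunes the pairwise comparison.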
14.3 Property Valuation Model Detail
graph LR
subgraph INPUT["📥 Features"]
F1["Location (lat/lon)"]
F2["Area (sqm)"]
F3["Property type"]
F4["Age / condition"]
F5["Floor count"]
F6["Distance to road"]
F7["Zoning class"]
F8["Recent comparables"]
F9["Neighborhood index"]
end
subgraph MODEL["🧠 ML Pipeline"]
PREP["Feature Engineering<br/>(Feast)"] --> TRAIN["XGBoost<br/>Regression"]
TRAIN --> REG["MLflow<br/>Model Registry"]
end
subgraph OUTPUT["📤 Result"]
VAL["Estimated Value (IQD)"]
CONF["Confidence Score"]
COMP["Top 5 Comparables"]
end
INPUT --> MODEL --> OUTPUT
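Training data: Historical sales from ownership_history, joined with property attributes.
Retraining: Monthly, via an Airflow DAG. A retrained model is promoted only if its MAPE (mean absolute percentage error) is below 15%.
Serving: FastAPI endpoint; predictions are cached in Redis for 24 hours per property.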
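The promotion rule (MAPE below 15%) can be expressed as a small gate function. The following is a minimal sketch in plain Python; the function names and the sample prices are hypothetical, and the evaluation set is assumed to be sales that the model did not see during training.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over paired sale prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


MAPE_THRESHOLD_PCT = 15.0  # promotion threshold stated in this section


def should_promote(actual, predicted, threshold_pct=MAPE_THRESHOLD_PCT):
    """Return (decision, score); promote only when MAPE is strictly below the threshold."""
    score = mape(actual, predicted)
    return score < threshold_pct, score


# Hypothetical sale prices in IQD, for illustration only.
actual_prices = [100_000_000, 200_000_000, 150_000_000, 80_000_000]
predicted_prices = [110_000_000, 180_000_000, 150_000_000, 88_000_000]

promote, score = should_promote(actual_prices, predicted_prices)
print(f"MAPE = {score:.2f}% -> promote: {promote}")
```

In the monthly Airflow DAG, a check like this would presumably run as its own task between evaluation and the MLflow registry update, so that a failed gate leaves the currently served model in place.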
14.4 OCR Pipeline for Legacy Deeds
graph LR
SCAN["📄 Scanned Deed<br/>(TIFF/PDF)"] --> PREPROC["Pre-process<br/>Deskew, denoise,<br/>binarize"]
PREPROC --> OCR["PaddleOCR<br/>Arabic/Kurdish"]
OCR --> EXTRACT["Entity Extraction<br/>Owner name, parcel#,<br/>area, date, notary"]
EXTRACT --> VALIDATE["Cross-check<br/>vs existing DB"]
VALIDATE --> QUEUE["Human Review Queue<br/>(Filament admin)"]
QUEUE --> DB["Insert to<br/>PostgreSQL"]
Volume target: Process the backlog of ~500K legacy paper deeds over 18 months (roughly 28,000 deeds per month). Accuracy target: >85% of fields extracted correctly on the first pass; the remainder are flagged for human review.
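One plausible piece of the post-processing between OCR and the cross-check is digit normalisation plus a completeness check that decides which records get flagged. The sketch below assumes hypothetical field names (`owner_name`, `parcel_number`, `area_sqm`, `date`) and a plain dictionary as the OCR output format; neither is specified in this section.

```python
# Map Arabic-Indic (U+0660..U+0669) and Eastern Arabic-Indic (U+06F0..U+06F9) digits to ASCII.
DIGIT_MAP = {0x0660 + i: str(i) for i in range(10)}
DIGIT_MAP.update({0x06F0 + i: str(i) for i in range(10)})

# Hypothetical field names; the real schema is not given in this section.
REQUIRED_FIELDS = ("owner_name", "parcel_number", "area_sqm", "date")


def normalise_digits(text):
    """Replace Arabic-script digits with their ASCII equivalents."""
    return text.translate(DIGIT_MAP)


def postprocess(raw_fields):
    """Clean OCR field strings and list the problems that call for human review."""
    cleaned = {
        name: normalise_digits(" ".join(str(value).split()))  # collapse stray whitespace
        for name, value in raw_fields.items()
    }
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not cleaned.get(name)]

    area_text = cleaned.get("area_sqm", "")
    if area_text:
        try:
            if float(area_text) <= 0:
                problems.append("non-positive area_sqm")
        except ValueError:
            problems.append("unparseable area_sqm")

    return {"fields": cleaned, "problems": problems, "flag_for_review": bool(problems)}


# Made-up OCR outputs for illustration only.
clean_deed = postprocess({
    "owner_name": "  أحمد   حسن ",
    "parcel_number": "١٢٣/٤٥",
    "area_sqm": "٢٥٠",
    "date": "١٩٨٧/٠٣/١٥",
})
damaged_deed = postprocess({
    "owner_name": "سارا کریم",
    "parcel_number": "",
    "area_sqm": "٢٥٠ م",  # unit suffix left in by OCR, so the number cannot be parsed
})

print(clean_deed["fields"]["parcel_number"], clean_deed["flag_for_review"])
print(damaged_deed["problems"])
```

Date parsing is deliberately left out, since the calendar and formats used on the legacy deeds are not stated here. The list of problems could be attached to the record in the review queue so that reviewers see why a deed was flagged.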
| Dashboard | Audience | Key Metrics |
|---|---|---|
| Property Market Overview | Director-level | Total registered properties, transfers/month, avg price/sqm by district |
| Permit Pipeline | Building permit team | Applications by stage, avg processing time, inspection backlog |
| Revenue Dashboard | Finance | Fees collected, tax assessments issued, outstanding amounts |
| Office Performance | Operations | Applications per office, avg wait time, satisfaction rating |
| GIS Heatmap | Analysts | Transaction density, price trends by area, zoning violations |
| Fraud Alerts | Audit team | Flagged transactions, risk scores, investigation status |