▶ Production Architecture
From CSV to Cloud-Scale Intelligence
How the agentic analytics system connects to Opendoor's real upstream data sources — MLS feeds, internal offer DB, AVM service, and Snowflake — in an AWS production environment.
8
Upstream systems
3
AWS ingestion layers
Real-time
Offer event streaming
Daily
Briefing cron trigger
System Diagram
Full Production Architecture
Five layers: upstream sources → AWS ingestion → Snowflake warehouse → agent tool layer → output delivery
Upstream Sources
AWS Ingestion
Snowflake Warehouse
Agent Tool Layer
Output
📋 MLS / RETS Feed
List price, sale price, DOM, inventory per market & ZIP
CoreLogic · ATTOM · Redfin
💰 Internal Offer DB
Every offer made, amount, accepted/rejected, seller response time
PostgreSQL / RDS
🏠 AVM Service
Opendoor's automated valuation model — market_estimated_value per property
Internal microservice
👥 CRM / Seller Funnel
Lead count, lead-to-offer rate, offer-to-close rate per market & segment
Salesforce / internal
🔨 Acquisition Pipeline
Homes under contract, purchase price, close date, days held
Internal ops DB
🛠 Renovation Tracker
Reno scope, cost actuals, completion date, contractor status
Internal ops DB
📈 Resale / Transaction DB
Final resale price, days to sell, actual realized margin per home
Internal transaction DB
🌎 Macro Data Feed
30yr mortgage rate, affordability index, builder cancellation rate
Freddie Mac · MBA · Census
events / batch
⚡ Amazon Kinesis
Real-time streaming for offer events, acceptance decisions, AVM updates
real-time
📦 Amazon S3
Raw data lake — MLS batch drops, CRM exports, reno cost files
daily batch
🔄 AWS Glue / dbt
Transform raw events into clean, typed, de-duped tables in Snowflake
scheduled
🔒 AWS Secrets Manager
Stores ANTHROPIC_API_KEY, Snowflake credentials, MLS API tokens
security
⏰ EventBridge Scheduler
Triggers monitor-agent daily at 6am. Triggers weekly feedback recalibration.
cron
Snowflake connector
❄ Snowflake Data Warehouse
raw.mls_listings
Full MLS history — list price, sale price, DOM, ZIP, segment, date
raw.offer_events
Offer price, market_estimated_value, seller response, acceptance boolean
raw.funnel_activity
lead_count, offer_count, accepted_offer_count, close_count per week/market
raw.inventory_status
homes_owned, days_held, reno_status, estimated_hold_cost per deal
mart.market_metrics
Clean weekly snapshot — replaces housing_market.csv exactly
mart.deal_pl
Per-deal P&L: ARV, acq cost, reno cost, hold cost, net margin
mart.feedback_log
Decision Packets + action taken + observed outcome — replaces feedback_log.json
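The per-deal P&L in mart.deal_pl can be illustrated with a small worked example. This is only a sketch: it assumes net margin is ARV minus acquisition, renovation, and hold costs, and the `DealPL` class and its field names are hypothetical, not the actual table schema. Selling costs and fees, which a real P&L would likely include, are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DealPL:
    """Hypothetical row shape for mart.deal_pl (column names are assumed)."""
    arv: float        # after-repair value
    acq_cost: float   # acquisition price paid
    reno_cost: float  # renovation cost
    hold_cost: float  # carrying cost accrued while the home is held

    @property
    def net_margin(self) -> float:
        # Assumed definition: ARV minus the three cost lines.
        return self.arv - self.acq_cost - self.reno_cost - self.hold_cost

    @property
    def net_margin_pct(self) -> float:
        # Net margin expressed as a fraction of ARV.
        return self.net_margin / self.arv


deal = DealPL(arv=400_000, acq_cost=340_000, reno_cost=25_000, hold_cost=9_000)
print(deal.net_margin)      # 26000
print(deal.net_margin_pct)  # 0.065
```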
SQL → dict
🧠 Agent Tool Layer · Lambda
data_loader.py
get_market_summary() · get_market_trend() · detect_anomalies()
analyzer.py
score_market_risk() · rank_all_markets()
deal_scout.py
get_top_deals() · estimate_renovation()
capital_light.py
detect_inventory_surges() · get_contribution_margin_forecast() · rank_top_100_deals()
pricing_engine.py
analyze_pricing_accuracy() · analyze_funnel_drop() · generate_pricing_actions() · estimate_business_impact()
feedback_tracker.py
log_recommendation() · record_outcome() · recalibrate_confidence()
● Claude Opus 4.6
Orchestrates all tools · Native tool use · No wrappers
briefing / alert
📤 Output Delivery
💬 Slack / Email
Daily briefing + CRITICAL/HIGH alerts pushed to acquisitions channel
💻 Web Dashboard
Decision Packets · Risk rankings · Deal pipeline — live UI
📋 Deal Pipeline JSON
Top 100 deals pushed to acquisition tool for offer generation
📝 Briefing Archive
briefings/ → S3 bucket, version-controlled, Slack-ready markdown
🔄 Feedback Write-back
Action taken + outcome → mart.feedback_log → recalibrate weights
Upstream Systems
All 8 Data Sources
What each system provides, what Opendoor field it maps to, and how often it refreshes.
📋 MLS / RETS Feed
External
list_price, sale_price — what sellers ask vs what homes close at
days_on_market — time from list to sale or delisted
inventory_count, homes_sold — supply/demand balance per market
list_to_sale_ratio (LSR) — acceptance behavior proxy
Refresh: Daily batch → S3 → Glue → mart.market_metrics
💰 Internal Offer DB
Internal
opendoor_offer_price — exact dollar amount offered per property
seller_accepted_offer — boolean outcome per offer
days_to_accept — seller response time (pricing signal)
accepted_offer_count — weekly acceptance volume per market/segment
Refresh: Real-time events → Kinesis → raw.offer_events
🏠 AVM Service
Internal
market_estimated_value — Opendoor's AVM output per property
price_per_sqft — used for ARV and deal scoring
pricing accuracy — offer vs AVM delta triggers misalignment detection
Refresh: Per-property API call at offer time → raw.offer_events
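The offer-vs-AVM delta check described above might look like the following. It is a minimal sketch: the function names and the 5% threshold are assumptions chosen for illustration, not values taken from the production pricing engine.

```python
def pricing_delta_pct(offer_price: float, market_estimated_value: float) -> float:
    """Signed gap between the offer and the AVM value, as a fraction of the AVM value."""
    if market_estimated_value <= 0:
        raise ValueError("market_estimated_value must be positive")
    return (offer_price - market_estimated_value) / market_estimated_value


def is_misaligned(offer_price: float, market_estimated_value: float,
                  threshold: float = 0.05) -> bool:
    """Flag an offer whose absolute gap from the AVM exceeds the assumed threshold."""
    return abs(pricing_delta_pct(offer_price, market_estimated_value)) > threshold


# An offer 8% below the AVM is flagged; one 2% below is not.
print(is_misaligned(368_000, 400_000))  # True
print(is_misaligned(392_000, 400_000))  # False
```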
👥 CRM / Seller Funnel
Internal
lead_count — seller inquiries per week per market
lead_to_offer_rate — funnel efficiency at top of funnel
offer_to_close_rate — downstream conversion health
Refresh: Daily batch → S3 → raw.funnel_activity
🔨 Acquisition Pipeline
Internal
acquisition_price — what Opendoor paid per home
days_held — time owned, drives hold cost accrual
inventory_count — homes owned per market right now
inventory_days_on_market — Z-score surge detection input
Refresh: Daily batch → S3 → raw.inventory_status
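Z-score surge detection on inventory_days_on_market could be sketched as below. The trailing-window approach, the helper names, and the threshold of 2.0 are all assumptions; the source only states that a Z-score is used.

```python
from statistics import mean, pstdev


def inventory_dom_zscore(history: list, current: float) -> float:
    """Z-score of the current value against a trailing window of past values."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: treated as "no signal" in this sketch.
        return 0.0
    return (current - mean(history)) / sigma


def is_surge(history: list, current: float, z_threshold: float = 2.0) -> bool:
    """True when the current value sits at least z_threshold deviations above the mean."""
    return inventory_dom_zscore(history, current) >= z_threshold


weekly_dom = [40, 42, 38, 41, 39]  # mean 40, population std dev sqrt(2)
print(is_surge(weekly_dom, 46))    # True  (z is about 4.24)
print(is_surge(weekly_dom, 41))    # False (z is about 0.71)
```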
🛠 Renovation Tracker
Internal
reno_cost_actual — realized cost per property (vs estimate)
reno_condition — cosmetic / moderate / full_gut tier
reno_completion_date — feeds hold cost calculation
Refresh: Daily batch → S3 → mart.deal_pl
📈 Resale / Transaction DB
Internal
resale_price — actual sale price when Opendoor exits the home
resale_margin_actual — closes the feedback loop vs predicted
days_to_sell — post-reno sale velocity, key capital-light metric
Refresh: On transaction close → Kinesis → mart.feedback_log
🌎 Macro Data Feed
External
30yr mortgage rate — rate lock-out seller signal (Ken Zener framework)
builder cancellation rate — new construction competition (Ivy Zelman framework)
affordability index — demand ceiling context per market
Refresh: Weekly batch → Freddie Mac / MBA / Census APIs → S3
Production Data Flow
How Data Moves End-to-End
From upstream event to Decision Packet delivered to the acquisitions team.
▶ Real-Time Path — Offer Events
1
Seller submits a home to Opendoor
AVM service evaluates property → returns market_estimated_value. Pricing microservice sets offer band. Offer event fires.
AVM Service · Pricing Microservice
2
Kinesis captures the event
offer_price, market_estimated_value, seller_accepted, days_to_accept streamed in real-time → raw.offer_events in Snowflake.
Amazon Kinesis · raw.offer_events
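One way the offer event could be serialized for Kinesis is sketched here. The `OfferEvent` class, the `property_id` field, and the choice of partition key are hypothetical; only the four streamed fields come from the step above. The returned dict matches the `Data` and `PartitionKey` arguments of boto3's `put_record`, but no AWS call is made in this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class OfferEvent:
    property_id: str                 # hypothetical identifier, not from the source
    offer_price: float
    market_estimated_value: float
    seller_accepted: Optional[bool]  # None until the seller responds
    days_to_accept: Optional[int]    # None until the seller responds


def to_kinesis_record(event: OfferEvent) -> dict:
    """Build the Data and PartitionKey arguments for a Kinesis put_record call."""
    return {
        "Data": json.dumps(asdict(event)).encode("utf-8"),
        # Assumed choice: partition by property so events spread across shards.
        "PartitionKey": event.property_id,
    }


record = to_kinesis_record(
    OfferEvent("prop-123", 368_000, 400_000, seller_accepted=None, days_to_accept=None)
)
# In production this might be sent with something like:
#   boto3.client("kinesis").put_record(StreamName="<stream name>", **record)
print(record["PartitionKey"])  # prop-123
```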
3
dbt materializes mart views
Aggregates to weekly market/segment snapshots. Computes LSR, acceptance rate WoW delta, funnel conversion rates. mart.market_metrics refreshed.
dbt / Glue · mart.market_metrics
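In production this aggregation would live in dbt SQL, but the logic of the weekly acceptance rate and its week-over-week delta can be shown in a few lines of Python. The event keys (`week`, `market`, `seller_accepted`) and both function names are assumptions made for this sketch.

```python
from collections import defaultdict


def weekly_acceptance_rates(events):
    """Aggregate offer events to {(week, market): acceptance_rate}."""
    offers = defaultdict(int)
    accepted = defaultdict(int)
    for e in events:
        key = (e["week"], e["market"])
        offers[key] += 1
        if e["seller_accepted"]:
            accepted[key] += 1
    return {key: accepted[key] / offers[key] for key in offers}


def wow_delta(rates, market, week, prev_week):
    """Week-over-week change in acceptance rate, as an absolute difference."""
    return rates[(week, market)] - rates[(prev_week, market)]


events = (
    [{"week": "W01", "market": "phoenix", "seller_accepted": a}
     for a in (True, True, False, False)]
    + [{"week": "W02", "market": "phoenix", "seller_accepted": a}
       for a in (True, False, False, False)]
)
rates = weekly_acceptance_rates(events)
print(rates[("W01", "phoenix")])                 # 0.5
print(wow_delta(rates, "phoenix", "W02", "W01"))  # -0.25
```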
4
Tool layer queries Snowflake
analyze_pricing_accuracy() runs a SQL query against mart.market_metrics instead of reading CSV. Returns same structured dict. Agent behavior unchanged.
pricing_engine.py · Claude Opus 4.6
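A rough sketch of a tool that reads mart.market_metrics through a DB-API connection and returns a dict follows. The `conn` parameter, the column names, and the returned dict shape are assumptions, since the real signature of analyze_pricing_accuracy() is not shown. An in-memory SQLite database with an attached `mart` schema stands in for Snowflake so the example runs locally.

```python
import sqlite3


def analyze_pricing_accuracy(conn) -> dict:
    """Query the mart snapshot through a DB-API connection and return a dict."""
    cur = conn.cursor()
    cur.execute(
        "SELECT market, avg_offer_to_avm_ratio, acceptance_rate "
        "FROM mart.market_metrics ORDER BY market"
    )
    # Lower-case the names: warehouses differ in how they report column case.
    columns = [d[0].lower() for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    return {"markets": rows, "market_count": len(rows)}


# Stand-in warehouse: in-memory SQLite with a schema attached as "mart".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS mart")
conn.execute(
    "CREATE TABLE mart.market_metrics "
    "(market TEXT, avg_offer_to_avm_ratio REAL, acceptance_rate REAL)"
)
conn.executemany(
    "INSERT INTO mart.market_metrics VALUES (?, ?, ?)",
    [("phoenix", 0.92, 0.31), ("atlanta", 0.97, 0.44)],
)

result = analyze_pricing_accuracy(conn)
print(result["market_count"])          # 2
print(result["markets"][0]["market"])  # atlanta
```

Because the function only relies on `cursor()`, `execute()`, `description`, and `fetchall()`, any DB-API 2.0 connection should be usable in place of the SQLite stand-in.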
5
Decision Packet delivered
If severity HIGH/CRITICAL → Slack alert fires to #acquisitions within minutes of issue detection. Pricing team sees action in under 1 hour.
Slack Webhook · mart.feedback_log
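The severity gate for alerts could be sketched as follows. The packet keys (`severity`, `market`, `summary`) and the function name are hypothetical. The JSON body uses the `text` field that Slack incoming webhooks accept; actually posting it to the webhook URL is left out.

```python
import json

ALERT_SEVERITIES = {"HIGH", "CRITICAL"}


def build_slack_alert(packet: dict):
    """Return a Slack webhook JSON body for alert-worthy packets, else None."""
    if packet["severity"] not in ALERT_SEVERITIES:
        return None
    text = f"[{packet['severity']}] {packet['market']}: {packet['summary']}"
    return json.dumps({"text": text})


low = {"severity": "LOW", "market": "atlanta", "summary": "no action needed"}
critical = {"severity": "CRITICAL", "market": "phoenix",
            "summary": "offers running well below AVM"}

print(build_slack_alert(low))       # None
print(build_slack_alert(critical))  # {"text": "[CRITICAL] phoenix: offers running well below AVM"}
```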
▶ Scheduled Path — Daily Briefing
1
EventBridge fires at 6:00 AM
Triggers monitor-agent Lambda (or ECS task). No human input needed — proactive by design.
EventBridge Scheduler · Lambda / ECS Fargate
2
Agent pulls fresh Snowflake snapshots
All mart views reflect last 24 hours of MLS + offer + funnel + inventory data. Tools query via Snowflake Python connector — same interface as CSV.
Snowflake connector
3
Proactive loop runs all tools
Same 4-step logic: analyze_pricing_accuracy → detect_inventory_surges → get_contribution_margin_forecast → rank_all_markets. Claude chains the tools autonomously.
monitor.py · Native tool use
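With native tool use, the model returns `tool_use` content blocks and the caller replies with `tool_result` blocks keyed by `tool_use_id`. The dispatcher below is a sketch of that hand-off under stated assumptions: the blocks are plain dicts rather than SDK objects, the registry and `run_tool_calls` are hypothetical names, and the tool body is a stub rather than the real pricing_engine.py logic.

```python
import json


def analyze_pricing_accuracy(market: str) -> dict:
    # Stub standing in for the real tool; the argument is an assumption.
    return {"market": market, "status": "ok"}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"analyze_pricing_accuracy": analyze_pricing_accuracy}


def run_tool_calls(content_blocks: list) -> list:
    """Execute each tool_use block and return the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text blocks and anything else
        tool = TOOL_REGISTRY.get(block["name"])
        if tool is None:
            output, is_error = {"error": f"unknown tool {block['name']}"}, True
        else:
            output, is_error = tool(**block["input"]), False
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(output),
            "is_error": is_error,
        })
    return results


blocks = [
    {"type": "text", "text": "Checking pricing first."},
    {"type": "tool_use", "id": "t1", "name": "analyze_pricing_accuracy",
     "input": {"market": "phoenix"}},
]
tool_results = run_tool_calls(blocks)
print(tool_results[0]["tool_use_id"])  # t1
```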
4
Feedback loop recalibrates weekly
Outcomes from mart.feedback_log feed recalibrate_confidence(). Weights updated. Next day's briefing uses improved confidence scores.
feedback_tracker.py
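The recalibration method is not specified, so the following is one plausible sketch rather than the actual feedback_tracker.py logic. It assumes each feedback row carries a `recommendation_type` and an `outcome_correct` flag (both hypothetical names) and sets each weight to a Laplace-smoothed hit rate.

```python
from collections import defaultdict


def recalibrate_confidence(feedback_rows):
    """Return {recommendation_type: weight} from logged outcomes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in feedback_rows:
        if row["outcome_correct"] is None:
            continue  # outcome not yet observed, so it cannot inform the weight
        totals[row["recommendation_type"]] += 1
        hits[row["recommendation_type"]] += int(row["outcome_correct"])
    # Laplace smoothing keeps weights away from 0 and 1 on small samples.
    return {k: (hits[k] + 1) / (totals[k] + 2) for k in totals}


feedback = [
    {"recommendation_type": "raise_offer", "outcome_correct": True},
    {"recommendation_type": "raise_offer", "outcome_correct": True},
    {"recommendation_type": "raise_offer", "outcome_correct": False},
    {"recommendation_type": "hold", "outcome_correct": None},
]
print(recalibrate_confidence(feedback))  # {'raise_offer': 0.6}
```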
5
Briefing saved + delivered
Markdown saved to S3 (briefings/ bucket). Slack message posted to #acquisitions-daily. Web dashboard updated. Deal pipeline JSON pushed to acquisition tool.
S3 · Slack · Web UI
Field Mapping
Demo → Production
Every synthetic field maps to a real Snowflake table. The tool interfaces don't change — only the data source behind them.
Synthetic field | Demo source | Production source | Snowflake table | Refresh
median_sale_price, days_on_market, homes_sold | housing_market.csv | MLS / RETS feed | mart.market_metrics | Daily
list_to_sale_ratio, price_per_sqft | housing_market.csv | MLS feed + computed | mart.market_metrics | Daily
opendoor_offer_price, seller_accepted_offer, days_to_accept | synthetic_data.py | Internal Offer DB | raw.offer_events | Real-time
market_estimated_value | synthetic_data.py | AVM microservice | raw.offer_events | Per offer
lead_count, offer_count, accepted_offer_count, close_count | synthetic_data.py | CRM / Salesforce | raw.funnel_activity | Daily
inventory_days_on_market, inventory_count | synthetic_data.py | Acquisition pipeline | raw.inventory_status | Daily
acquisition_margin_estimate, reno_cost, hold_cost | deal_scout.py tiers | Acq pipeline + Reno tracker | mart.deal_pl | Daily
resale_margin_actual, days_to_sell | synthetic_data.py | Resale / Transaction DB | mart.deal_pl | On close
prediction_accuracy, confidence_weights | briefings/feedback_log.json | Feedback write-back | mart.feedback_log | Weekly
Key production principle
Every tool returns a structured dict. The data source behind the tool is an implementation detail — switching from CSV to a Snowflake query is a one-line change per tool. Claude's reasoning layer, the agentic loop, the Decision Packet format, and the feedback mechanism are all unchanged in production.
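The principle can be sketched by passing the row source into the tool as a callable. Everything named here (`RowLoader`, `csv_loader`, `warehouse_loader`, and this simplified `get_market_summary` signature) is hypothetical and only illustrates the idea that the tool body stays the same while the loader changes.

```python
import csv
import io
from typing import Callable, Iterable

RowLoader = Callable[[], Iterable[dict]]


def get_market_summary(load_rows: RowLoader, market: str) -> dict:
    """Same tool body regardless of where the rows come from."""
    rows = [r for r in load_rows() if r["market"] == market]
    prices = [float(r["median_sale_price"]) for r in rows]
    return {
        "market": market,
        "weeks": len(rows),
        "avg_median_sale_price": sum(prices) / len(prices) if prices else None,
    }


def csv_loader(text: str) -> RowLoader:
    # Demo path: parse CSV text (a stand-in for reading housing_market.csv).
    return lambda: csv.DictReader(io.StringIO(text))


def warehouse_loader(fetch_all: Callable[[], list]) -> RowLoader:
    # Production path: fetch_all is a stand-in for a warehouse query
    # against mart.market_metrics that returns a list of row dicts.
    return lambda: fetch_all()


csv_text = (
    "market,median_sale_price\n"
    "phoenix,400000\nphoenix,410000\natlanta,350000\n"
)
warehouse_rows = [
    {"market": "phoenix", "median_sale_price": 400000},
    {"market": "phoenix", "median_sale_price": 410000},
    {"market": "atlanta", "median_sale_price": 350000},
]

from_csv = get_market_summary(csv_loader(csv_text), "phoenix")
from_warehouse = get_market_summary(warehouse_loader(lambda: warehouse_rows), "phoenix")
print(from_csv == from_warehouse)  # True
```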
Opendoor · Agentic Analytics Engineer · Production Architecture