Data Product Case Study
California crash records became a local market intelligence system.
Public records can describe what happened and still be difficult to use. Meru built California Crash Intelligence to reconcile changing source files, resolve geography, and turn collision records into a reviewable map for legal, growth, and operating teams.
Business problem
Public crash data was fragmented across records, years, revisions, and geography.
System built
A cloud data pipeline, prepared analytics warehouse, API, and interactive map.
Decision value
A clearer view of where reported activity is changing and where review should begin.
Live evidence
Explore reported crashes in the Los Angeles area.
This embedded view is filtered to January through May 2026 and ZCTA 90504. Open the full product to change the date, geography, and available collision filters.
The purpose
Move from statewide records to a local decision.
A statewide count rarely answers the question a buyer is actually asking. A law firm, market leader, or operating team needs to know where activity is concentrated, whether the pattern changed, how recent the records are, and what underlying evidence deserves review.
California Crash Intelligence was designed around that decision. It combines geospatial filtering, collision context, vehicle and injury facts, recent-versus-baseline comparisons, and explicit source limitations in one inspectable product.
Data process
How the public data became usable.
Cleaning was treated as a versioned system, not a one-time spreadsheet task. Every stage keeps a path back to the source.
- 01 Preserve the public record
California CCRS crash, party, and injury records are ingested from data.ca.gov. Normalized source columns and source metadata remain available for audit and backfill.
- 02 Reconcile revisions
Stable collision IDs, normalized row keys, hashes, and source watermarks identify changed records. A rolling overlap catches late edits instead of treating each download as final.
- 03 Create typed facts
Dates, coordinates, severity, parties, injuries, vehicles, fault indicators, and location fields are cast into separate collision, party, and injury facts with explicit null handling.
- 04 Standardize inconsistent labels
Vehicle makes, severity codes, city labels, road names, and commercial-vehicle rules are mapped through versioned reference tables rather than overwritten in place.
- 05 Resolve geography
Source coordinates are matched to California ZCTA boundaries. Normalized road pairs can be checked against OpenStreetMap references when location context needs careful reconstruction.
- 06 Serve prepared signals
Partitioned BigQuery marts prepare map points, ZIP-by-day summaries, vehicle signals, collision facets, and hotspot shifts so the interface reads narrow tables instead of rebuilding joins on every request.
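To make the revision step concrete, here is a minimal sketch of hash-based reconciliation with a rolling overlap window. It is an illustration under stated assumptions, not Meru's production code: the field names in HASHED_FIELDS, the 30-day overlap, and the helper names row_hash, sync_window_start, and classify are all hypothetical, since the case study does not publish the actual CCRS column names or window length.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical field names; the real CCRS columns are not listed in the case study.
HASHED_FIELDS = ("collision_id", "crash_date", "severity", "latitude", "longitude")


def row_hash(record: dict) -> str:
    """Hash a normalized view of a record so cosmetic differences are not counted as edits."""
    parts = []
    for field in HASHED_FIELDS:
        value = record.get(field)
        # Missing values become an empty string; text is trimmed and upper-cased.
        parts.append("" if value is None else str(value).strip().upper())
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def sync_window_start(watermark: date, overlap_days: int = 30) -> date:
    """Re-read a window before the last watermark so late edits are picked up.

    The 30-day default is an assumption for illustration only.
    """
    return watermark - timedelta(days=overlap_days)


def classify(incoming: list, stored_hashes: dict) -> dict:
    """Split incoming rows into new, changed, and unchanged collision IDs."""
    result = {"new": [], "changed": [], "unchanged": []}
    for record in incoming:
        collision_id = record["collision_id"]
        digest = row_hash(record)
        if collision_id not in stored_hashes:
            result["new"].append(collision_id)
        elif stored_hashes[collision_id] != digest:
            result["changed"].append(collision_id)
        else:
            result["unchanged"].append(collision_id)
    return result


# Usage: one stored record is revised upstream and one new record arrives.
original = {"collision_id": "A1", "crash_date": "2026-03-02", "severity": "minor",
            "latitude": 33.87, "longitude": -118.35}
stored = {"A1": row_hash(original)}
revised = dict(original, severity="severe")
brand_new = {"collision_id": "B2", "crash_date": "2026-03-05", "severity": "minor",
             "latitude": 33.88, "longitude": -118.33}
print(classify([revised, brand_new], stored))
# prints {'new': ['B2'], 'changed': ['A1'], 'unchanged': []}
```

Only the IDs in the "new" and "changed" lists would need downstream facts rebuilt, which is one plausible reading of affected-collision reconciliation. A real pipeline would also need a separator or encoding that cannot collide with field values.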
Sales value
The product makes local demand easier to investigate.
The map does not manufacture intent. It gives teams a more disciplined place to start territory, content, outreach, and market conversations.
Plaintiff law firms
Where is case-relevant activity emerging?
Review recent crash patterns by geography, severity, vehicle context, and reporting narrative to inform market planning and investigative follow-up.
Media and growth teams
Where should local campaigns become more specific?
Compare ZIP-level activity and changes over time before allocating content, creative, outreach, or paid-media attention.
Executive teams
What changed in the market, and is it material?
Replace anecdotal market stories with a repeatable view of recent activity, historical baselines, reporting lag, and data completeness.
Analysts and operators
Can the source be trusted enough to act on?
Trace a signal back to normalized facts, source fields, geography methods, and documented limitations before it enters a decision.
Production architecture
Built to keep analysis out of the request path.
Source
California CCRS via data.ca.gov, DMV fleet snapshots, ZCTA boundaries, and OpenStreetMap road references
Ingestion
Scheduled delta sync with overlap windows, source watermarks, row hashes, and affected-collision reconciliation
Warehouse
Append-only raw records, typed BigQuery facts, versioned references, and partitioned serving marts
Application
FastAPI on Cloud Run with a React interface on Cloudflare
Review model
Signals support market and investigative review. They are not legal conclusions, safety ratings, or case-value predictions.
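The idea of keeping analysis out of the request path can be sketched in a few lines: aggregate once into a narrow ZIP-by-day table, then let each request read only that table. The example below is a simplified in-memory stand-in for a prepared mart, not the BigQuery implementation. The record keys (zcta, crash_date), the window lengths, and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def build_zip_by_day(collisions: list) -> Counter:
    """Prepare step: count collisions per (zcta, crash_date) once, ahead of any request."""
    return Counter((c["zcta"], c["crash_date"]) for c in collisions)


def daily_rate(summary: Counter, zcta: str, start: date, end: date) -> float:
    """Average collisions per day for one ZCTA over the inclusive range start..end."""
    days = (end - start).days + 1
    total = sum(summary.get((zcta, start + timedelta(days=i)), 0) for i in range(days))
    return total / days


def recent_vs_baseline(summary: Counter, zcta: str, as_of: date,
                       recent_days: int = 30, baseline_days: int = 180) -> dict:
    """Compare a recent window with the baseline window that immediately precedes it.

    Both window lengths are hypothetical defaults, not values from the case study.
    """
    recent_start = as_of - timedelta(days=recent_days - 1)
    baseline_end = recent_start - timedelta(days=1)
    baseline_start = baseline_end - timedelta(days=baseline_days - 1)
    recent = daily_rate(summary, zcta, recent_start, as_of)
    baseline = daily_rate(summary, zcta, baseline_start, baseline_end)
    # No ratio is reported when the baseline is empty, rather than dividing by zero.
    ratio = recent / baseline if baseline > 0 else None
    return {"recent_per_day": recent, "baseline_per_day": baseline, "ratio": ratio}


# Usage with tiny windows so the arithmetic is easy to follow.
sample = (
    [{"zcta": "90504", "crash_date": date(2026, 5, 30)}] * 2
    + [{"zcta": "90504", "crash_date": date(2026, 5, 31)}] * 2
    + [{"zcta": "90504", "crash_date": date(2026, 5, 26)},
       {"zcta": "90504", "crash_date": date(2026, 5, 28)}]
)
summary = build_zip_by_day(sample)
print(recent_vs_baseline(summary, "90504", date(2026, 5, 31), recent_days=2, baseline_days=4))
# prints {'recent_per_day': 2.0, 'baseline_per_day': 0.5, 'ratio': 4.0}
```

Because reported records can arrive late, a ratio computed over the most recent days may understate activity. A production version would likely pair it with a freshness or completeness indicator before treating it as a shift.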
What this proves
Public data can become a proprietary operating advantage.
The defensible work is not the map alone. It is the process that keeps changing records traceable, makes geography queryable, and turns expensive joins into prepared answers. The same pattern applies to fragmented public, vendor, CRM, call, document, and operational datasets.
Important limitations
Records reflect reported collisions and can be revised or delayed. Missing coordinates and source-label variation affect some geographic views. Registration data describes vehicle stock, not miles traveled or risk. Review signals are not findings of fault, legal conclusions, safety ratings, or predictions of case value.
Questions about the data product.
What is California Crash Intelligence?
California Crash Intelligence is a location-level data product that turns public crash, party, injury, vehicle, and geography records into maps and market signals for professional review.
Where does the crash data come from?
The core records come from California's CCRS resources published through data.ca.gov. The system also uses California ZCTA boundaries, OpenStreetMap road references, and annual California DMV fleet snapshots for limited contextual comparisons.
How is the crash data cleaned?
The pipeline preserves raw source fields, reconciles revisions by collision ID and row hash, casts typed facts, standardizes labels through versioned rules, resolves geography, and validates freshness before preparing query-specific BigQuery marts.
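As a rough illustration of typed casting and versioned label rules, the sketch below converts one raw row into a typed fact, keeps the raw label next to the standardized one, and records which reference version was applied. Everything named here is hypothetical: the SEVERITY_REFERENCE entries, the CollisionFact fields, and the approximate California bounding box are assumptions for the example, not the published schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical versioned reference table: (version, raw label) -> standard label.
SEVERITY_REFERENCE = {
    ("v1", "FATAL"): "fatal",
    ("v1", "SEVERE INJURY"): "severe_injury",
    ("v1", "PDO"): "property_damage_only",
}


@dataclass(frozen=True)
class CollisionFact:
    collision_id: str
    crash_date: Optional[date]
    latitude: Optional[float]
    longitude: Optional[float]
    severity: Optional[str]       # standardized label, or None if unmapped
    severity_raw: Optional[str]   # original label kept for audit
    reference_version: str


def to_date(value) -> Optional[date]:
    """Parse an ISO date string; anything missing or malformed becomes None."""
    try:
        return date.fromisoformat(value.strip())
    except (AttributeError, ValueError):
        return None


def to_coord(value, low: float, high: float) -> Optional[float]:
    """Parse a coordinate; missing, non-numeric, or out-of-range values become None."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if low <= number <= high else None


def cast_collision(raw: dict, version: str = "v1") -> CollisionFact:
    """Cast one raw row (string values assumed) into a typed fact without altering the raw label."""
    label = (raw.get("severity") or "").strip().upper()
    return CollisionFact(
        collision_id=str(raw["collision_id"]),
        crash_date=to_date(raw.get("crash_date")),
        # Rough bounding box around California, used only as a sanity check.
        latitude=to_coord(raw.get("latitude"), 32.0, 42.5),
        longitude=to_coord(raw.get("longitude"), -125.0, -114.0),
        severity=SEVERITY_REFERENCE.get((version, label)),
        severity_raw=raw.get("severity"),
        reference_version=version,
    )


fact = cast_collision({"collision_id": 1, "crash_date": "2026-03-02",
                       "latitude": "33.87", "longitude": "0", "severity": " pdo "})
print(fact)
```

In this example the longitude of 0 falls outside the bounding box and is stored as None instead of a misleading point, while the raw severity label survives unchanged beside its mapped value. Remapping under a later reference version would then be a re-run, not an overwrite.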
How can crash intelligence support sales and marketing?
It can help professional services teams identify changing local demand, prioritize geographic research, plan market-specific content, and give executives a repeatable evidence layer for territory and campaign decisions.
Does the platform predict legal outcomes or crash risk?
No. The platform organizes reported public records for review. It does not determine liability, predict case value, measure vehicle safety, or substitute for legal, statistical, or investigative judgment.
Source note
Core collision, party, and injury records originate from California CCRS resources published through data.ca.gov. Geography and road context use public reference datasets. Meru preserves source metadata and documents transformations so prepared signals can be reviewed against the underlying record.
Suggested citation
Okabe, Jose. "California Crash Intelligence: From Public Records to Local Market Signals." Meru AI, updated June 2026. https://meruai.co/case-study/california-crash-intelligence
Author and review note
Written and reviewed by Jose Okabe, AI implementation strategist and enterprise systems architect. Jose designed and built the data pipeline, cloud warehouse, API, analytical methods, and interactive product described in this case study.
Last updated: June 2026
Your scattered data may already contain the market signal.
Meru designs the pipeline, decision model, and working product that make difficult data useful to operators.