Data Product Case Study

California crash records became a local market intelligence system.

Public records can describe what happened and still be difficult to use. Meru built California Crash Intelligence to reconcile changing source files, resolve geography, and turn collision records into a reviewable map for legal, growth, and operating teams.

Built by Jose Okabe · California · Updated June 2026

Business problem

Public crash data was fragmented across records, years, revisions, and geography.

System built

A cloud data pipeline, prepared analytics warehouse, API, and interactive map.

Decision value

A clearer view of where reported activity is changing and where review should begin.

Live evidence

Explore reported crashes in Los Angeles County.

This embedded view is filtered to January through May 2026 and ZCTA 90504. Open the full product to change the date, geography, and available collision filters.

Source: California CCRS records prepared by California Crash Intelligence.

The purpose

Move from statewide records to a local decision.

A statewide count rarely answers the question a buyer actually has. A law firm, market leader, or operating team needs to know where activity is concentrated, whether the pattern changed, how recent the records are, and what underlying evidence deserves review.

California Crash Intelligence was designed around that decision. It combines geospatial filtering, collision context, vehicle and injury facts, recent-versus-baseline comparisons, and explicit source limitations in one inspectable product.
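To make the recent-versus-baseline idea concrete, here is a minimal Python sketch that collapses (ZCTA, crash date) records into a ZIP-by-day count table and compares a recent daily rate with a prior baseline rate. The function names, the 30-day and 365-day windows, and the ratio-of-rates definition are illustrative assumptions; the case study does not specify how its comparisons are computed.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Optional


def zip_by_day(records: list[tuple[str, date]]) -> Counter:
    """Collapse (ZCTA, crash date) pairs into a ZIP-by-day count table."""
    return Counter(records)


def recent_vs_baseline(
    summary: Counter,
    zcta: str,
    as_of: date,
    recent_days: int = 30,
    baseline_days: int = 365,
) -> Optional[float]:
    """Ratio of the recent daily crash rate to the prior baseline daily rate.

    Assumed windows (illustrative, not the product's method): the recent window
    is the `recent_days` days ending on `as_of`; the baseline is the
    `baseline_days` days immediately before it. Returns None when the baseline
    is empty, since the ratio would be undefined.
    """
    recent_start = as_of - timedelta(days=recent_days - 1)
    baseline_start = recent_start - timedelta(days=baseline_days)
    recent = baseline = 0
    for (area, day), count in summary.items():
        if area != zcta:
            continue
        if recent_start <= day <= as_of:
            recent += count
        elif baseline_start <= day < recent_start:
            baseline += count
    if baseline == 0:
        return None
    return (recent / recent_days) / (baseline / baseline_days)
```

With 12 reported crashes in the last 30 days and 73 in the prior 365, the ratio is (12/30) / (73/365) = 0.4 / 0.2 = 2.0, meaning the recent window is running at twice its baseline rate. Because records can be delayed or revised, the newest days may be undercounted, so a real comparison would need to account for reporting lag before reading a low ratio as a decline.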

Data process

How the public data became usable.

Cleaning was treated as a versioned system, not a one-time spreadsheet task. Every stage keeps a path back to the source.

  1. Preserve the public record

    California CCRS crash, party, and injury records are ingested from data.ca.gov. Normalized source columns and source metadata remain available for audit and backfill.

  2. Reconcile revisions

    Stable collision IDs, normalized row keys, hashes, and source watermarks identify changed records. A rolling overlap catches late edits instead of treating each download as final.

  3. Create typed facts

    Dates, coordinates, severity, parties, injuries, vehicles, fault indicators, and location fields are cast into separate collision, party, and injury facts with explicit null handling.

  4. Standardize inconsistent labels

    Vehicle makes, severity codes, city labels, road names, and commercial-vehicle rules are mapped through versioned reference tables rather than overwritten in place.

  5. Resolve geography

    Source coordinates are matched to California ZCTA boundaries. Normalized road pairs can be checked against OpenStreetMap references when location context needs careful reconstruction.

  6. Serve prepared signals

    Partitioned BigQuery marts prepare map points, ZIP-by-day summaries, vehicle signals, collision facets, and hotspot shifts so the interface reads narrow tables instead of rebuilding joins on every request.
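The geography step can be illustrated with a point-in-polygon test. The sketch below uses even-odd ray casting in plain Python to assign a (longitude, latitude) point to the first ZCTA ring that contains it, and leaves missing coordinates as None. The case study does not say which spatial method it uses; real ZCTA boundaries are multipolygons that can contain holes, so production work would normally rely on a spatial library or warehouse geography functions instead. The square "ZCTAs" in the usage lines are hypothetical.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, ring: list[Point]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray cast east of the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zcta(point: Optional[Point], zctas: dict[str, list[Point]]) -> Optional[str]:
    """Return the first ZCTA whose ring contains the point, or None.

    Missing coordinates stay None instead of being guessed.
    """
    if point is None:
        return None
    for zcta, ring in zctas.items():
        if point_in_polygon(point, ring):
            return zcta
    return None


# Usage with two hypothetical, adjacent unit squares standing in for ZCTA boundaries.
zctas = {
    "zcta_a": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "zcta_b": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(assign_zcta((0.5, 0.5), zctas))  # zcta_a
print(assign_zcta((1.5, 0.5), zctas))  # zcta_b
print(assign_zcta(None, zctas))  # None
```

A point that falls exactly on a shared boundary can match either neighbor, so this sketch simply returns the first match in dictionary order; a production rule would need an explicit tie-breaking policy.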

Sales value

The product makes local demand easier to investigate.

The map does not manufacture intent. It gives teams a more disciplined place to start territory, content, outreach, and market conversations.

Plaintiff law firms

Where is case-relevant activity emerging?

Review recent crash patterns by geography, severity, vehicle context, and reporting narrative to inform market planning and investigative follow-up.

Media and growth teams

Where should local campaigns become more specific?

Compare ZIP-level activity and changes over time before allocating content, creative, outreach, or paid-media attention.

Executive teams

What changed in the market, and is it material?

Replace anecdotal market stories with a repeatable view of recent activity, historical baselines, reporting lag, and data completeness.

Analysts and operators

Can the source be trusted enough to act on?

Trace a signal back to normalized facts, source fields, geography methods, and documented limitations before it enters a decision.

Production architecture

Built to keep analysis out of the request path.

Source

California CCRS via data.ca.gov, DMV fleet snapshots, ZCTA boundaries, and OpenStreetMap road references

Ingestion

Scheduled delta sync with overlap windows, source watermarks, row hashes, and affected-collision reconciliation

Warehouse

Append-only raw records, typed BigQuery facts, versioned references, and partitioned serving marts

Application

FastAPI on Cloud Run with a React interface on Cloudflare

Review model

Signals support market and investigative review. They are not legal conclusions, safety ratings, or case-value predictions.
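The ingestion layer's delta sync can be sketched in a few functions: hash each row's fields, start the next read before the last watermark so late edits inside the overlap window are picked up, and collect the collision IDs whose rows are new or changed. SHA-256 over sorted JSON, the seven-day overlap, and the names `collision_id`, `key_fields`, and `known_hashes` are assumptions for illustration, not details of the production pipeline.

```python
import hashlib
import json
from datetime import datetime, timedelta


def row_hash(row: dict) -> str:
    """SHA-256 over a row's fields, serialized with sorted keys so column order never matters."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sync_window_start(watermark: datetime, overlap: timedelta = timedelta(days=7)) -> datetime:
    """Start the next read before the last watermark so late edits in the overlap are re-checked.

    The seven-day default is an assumed value, not one stated in the case study.
    """
    return watermark - overlap


def affected_collisions(
    incoming: list[dict],
    known_hashes: dict[str, str],
    key_fields: tuple[str, ...] = ("collision_id",),
) -> set[str]:
    """Return collision IDs with at least one new or changed row.

    `known_hashes` (hypothetical) maps a normalized row key, built from
    `key_fields`, to the hash stored at the previous sync.
    """
    affected: set[str] = set()
    for row in incoming:
        row_key = "|".join(str(row[field]) for field in key_fields)
        if known_hashes.get(row_key) != row_hash(row):
            affected.add(str(row["collision_id"]))
    return affected
```

Unchanged rows in the re-read window hash identically and drop out, so only the affected collisions need to be re-derived downstream. That is what keeps a rolling overlap affordable even though each sync deliberately reads some records twice.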

What this proves

Public data can become a proprietary operating advantage.

The defensible work is not the map alone. It is the process that keeps changing records traceable, makes geography queryable, and turns expensive joins into prepared answers. The same pattern applies to fragmented public, vendor, CRM, call, document, and operational datasets.

Important limitations

Records reflect reported collisions and can be revised or delayed. Missing coordinates and source-label variation affect some geographic views. Registration data describes vehicle stock, not miles traveled or risk. Review signals are not findings of fault, legal conclusions, safety ratings, or predictions of case value.

Questions about the data product.

What is California Crash Intelligence?

California Crash Intelligence is a location-level data product that turns public crash, party, injury, vehicle, and geography records into maps and market signals for professional review.

Where does the crash data come from?

The core records come from California's CCRS resources published through data.ca.gov. The system also uses California ZCTA boundaries, OpenStreetMap road references, and annual California DMV fleet snapshots for limited contextual comparisons.

How is the crash data cleaned?

The pipeline preserves raw source fields, reconciles revisions by collision ID and row hash, casts typed facts, standardizes labels through versioned rules, resolves geography, and validates freshness before preparing query-specific BigQuery marts.
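The casting and label-standardization steps can be sketched as follows. Placeholder strings become explicit nulls, dates and coordinates are parsed into typed values, and vehicle makes are mapped through a reference table that keeps every version rather than overwriting earlier ones. The field names, the ISO date format, the placeholder tokens, and the `MAKE_REFERENCE` table are all hypothetical; the case study does not publish its reference tables or source formats, and this single-row cast does not show the split into separate collision, party, and injury facts.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical versioned reference table. New versions are added alongside
# old ones instead of editing a mapping in place.
MAKE_REFERENCE: dict[str, dict[str, str]] = {
    "v1": {"TOYT": "TOYOTA", "CHEV": "CHEVROLET"},
    "v2": {"TOYT": "TOYOTA", "CHEV": "CHEVROLET", "CHEVY": "CHEVROLET"},
}

# Hypothetical placeholder values treated as missing.
NULL_TOKENS = {"", "NA", "N/A", "NULL", "-"}


def clean(raw: Optional[str]) -> Optional[str]:
    """Trim and uppercase a raw field; return an explicit None for placeholders."""
    if raw is None:
        return None
    value = raw.strip().upper()
    return None if value in NULL_TOKENS else value


def to_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date (assumed format); unparseable values become None."""
    value = clean(raw)
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None


def to_float(raw: Optional[str]) -> Optional[float]:
    """Parse a numeric field such as a coordinate; unparseable values become None."""
    value = clean(raw)
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def cast_row(raw: dict, ref_version: str = "v2") -> dict:
    """Cast one raw vehicle row into typed fields, recording the rule version used."""
    make = clean(raw.get("vehicle_make"))
    if make is not None:
        # Unmapped labels pass through unchanged rather than being guessed.
        make = MAKE_REFERENCE[ref_version].get(make, make)
    return {
        "collision_id": clean(raw.get("collision_id")),
        "crash_date": to_date(raw.get("crash_date")),
        "latitude": to_float(raw.get("latitude")),
        "longitude": to_float(raw.get("longitude")),
        "vehicle_make": make,
        "make_ref_version": ref_version,
    }
```

Storing `make_ref_version` next to the standardized value lets a reviewer trace a label back to the exact rule set that produced it, and re-running the cast under a newer version never erases what an older version said.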

How can crash intelligence support sales and marketing?

It can help professional services teams identify changing local demand, prioritize geographic research, plan market-specific content, and give executives a repeatable evidence layer for territory and campaign decisions.

Does the platform predict legal outcomes or crash risk?

No. The platform organizes reported public records for review. It does not determine liability, predict case value, measure vehicle safety, or substitute for legal, statistical, or investigative judgment.

Source note

Core collision, party, and injury records originate from California CCRS resources published through data.ca.gov. Geography and road context use public reference datasets. Meru preserves source metadata and documents transformations so prepared signals can be reviewed against the underlying record.

Suggested citation

Okabe, Jose. "California Crash Intelligence: From Public Records to Local Market Signals." Meru AI, updated June 2026. https://meruai.co/case-study/california-crash-intelligence

Author and review note

Written and reviewed by Jose Okabe, AI implementation strategist and enterprise systems architect. Jose designed and built the data pipeline, cloud warehouse, API, analytical methods, and interactive product described in this case study.

Last updated: June 2026

Your scattered data may already contain the market signal.

Meru designs the pipeline, decision model, and working product that make difficult data useful to operators.
