4. Data Modeling & AI Readiness
Data modeling is a continuous program of turning raw, siloed data into a shared, AI-ready asset that humans, BI tools, and LLM agents can all reason over reliably.
Process Discovery & Conceptual Modeling
Start with the business, not the schema:
Stakeholder workshops — map existing company processes, vocabulary, and pain points across Sales, Operations, Finance, Product
6 Core Questions (Who, Why, What, When, Where, How) — scope the model and surface conflicting definitions before any table is built
Conceptual model — entities, relationships, and a business glossary agreed across silos, with AI agents treated as first-class data consumers
Logical & Physical Modeling
Pick the right pattern per workload — no single approach fits everything:
3NF / Normalized — operational and transactional systems where integrity and update consistency matter most
Star & Snowflake Schemas — BI dashboards and analytics with predictable joins and fast aggregations
Data Vault 2.0 — auditable, history-preserving enterprise warehouse with parallel loads and graceful schema evolution
Wide tables / One Big Table — denormalized feature tables for ML training and low-latency serving
Physical implementation on your warehouse or lakehouse (Snowflake, Databricks, BigQuery, or Apache Iceberg tables), with naming conventions, surrogate keys, partitioning, and data quality tests built in.
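As a rough illustration of the star-schema pattern with surrogate keys and built-in data quality tests, the sketch below uses Python's standard sqlite3 module as a stand-in for a real warehouse. The table names (dim_customer, fct_order), columns, and test rules are hypothetical, and partitioning is left out because SQLite has no equivalent.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk  INTEGER PRIMARY KEY,   -- surrogate key generated at load time
    customer_id  TEXT NOT NULL UNIQUE,  -- natural key from the source system
    region       TEXT NOT NULL
);
CREATE TABLE fct_order (
    order_id     TEXT PRIMARY KEY,
    customer_sk  INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
    order_date   TEXT NOT NULL,
    amount       REAL NOT NULL
);
""")

# Load the dimension first so facts can look up surrogate keys.
conn.executemany(
    "INSERT INTO dim_customer (customer_id, region) VALUES (?, ?)",
    [("C-100", "EMEA"), ("C-200", "APAC")],
)

# Facts arrive with natural keys; the load resolves them to surrogate keys.
conn.executemany(
    """
    INSERT INTO fct_order (order_id, customer_sk, order_date, amount)
    SELECT ?, customer_sk, ?, ? FROM dim_customer WHERE customer_id = ?
    """,
    [
        ("O-1", "2024-01-05", 120.0, "C-100"),
        ("O-2", "2024-01-06", 80.0, "C-100"),
        ("O-3", "2024-01-06", 50.0, "C-200"),
    ],
)

# Each data quality test is a query that counts rows violating a rule.
quality_tests = {
    "fct_order.customer_sk resolves to a dimension row": """
        SELECT COUNT(*) FROM fct_order f
        LEFT JOIN dim_customer d ON f.customer_sk = d.customer_sk
        WHERE d.customer_sk IS NULL
    """,
    "fct_order.amount is non-negative":
        "SELECT COUNT(*) FROM fct_order WHERE amount < 0",
}
test_results = {
    name: conn.execute(sql).fetchone()[0] for name, sql in quality_tests.items()
}

# The predictable fact-to-dimension join that BI tools rely on.
revenue_by_region = conn.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fct_order f
    JOIN dim_customer d ON f.customer_sk = d.customer_sk
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()

print(test_results)
print(revenue_by_region)
```

In practice the same rules would more likely live in the platform's own testing tool; the point here is only that tests ship together with the physical model.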
Semantic Layer & Knowledge Graphs (AI Readiness)
Make the platform meaningful enough for agents and LLMs:
Semantic layer — single definitions for metrics and dimensions (dbt Semantic Layer, Cube) consumed by BI, APIs, and AI agents alike
Ontologies & controlled vocabularies — canonical entities and reason codes so the same concept means the same thing everywhere
Knowledge graphs — RDF triple stores or property graphs (such as Neo4j) that capture relationships, temporality, and lineage that flat tables lose
Entity resolution — deterministic, probabilistic, and ML-based matching with golden records and shared canonical IDs
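The entity resolution item can be sketched in a few lines. This is a minimal illustration, not a production matcher: it assumes a deterministic rule on normalized email, uses a plain string-similarity threshold from Python's difflib as a stand-in for probabilistic or ML scoring, and treats the first record seen for each entity as the golden record. The Record fields, the 0.85 threshold, and the "E1"-style IDs are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    source: str
    name: str
    email: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def same_entity(a: Record, b: Record, threshold: float = 0.85) -> bool:
    # Deterministic rule: identical, non-empty normalized email.
    email_a, email_b = normalize(a.email), normalize(b.email)
    if email_a and email_a == email_b:
        return True
    # Fuzzy fallback: name similarity above an assumed threshold.
    score = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    return score >= threshold


def assign_canonical_ids(records: list) -> list:
    """Return one shared canonical ID per input record, in input order."""
    golden = []  # first record seen for each entity acts as its golden record
    ids = []
    for rec in records:
        for idx, gold in enumerate(golden):
            if same_entity(rec, gold):
                ids.append(f"E{idx + 1}")
                break
        else:
            golden.append(rec)
            ids.append(f"E{len(golden)}")
    return ids


records = [
    Record("crm", "Acme Corporation", "billing@acme.example"),
    Record("erp", "ACME Corp.", "Billing@Acme.example"),  # matches on email
    Record("support", "Acme Corporatoin", ""),            # matches on fuzzy name
    Record("crm", "Globex Ltd", "info@globex.example"),   # new entity
]
canonical_ids = assign_canonical_ids(records)
print(canonical_ids)
```

Comparing every record against every golden record is quadratic; real pipelines usually add blocking keys and a survivorship policy for choosing golden-record attributes.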
Key Capabilities
Conceptual Discovery | Process mapping, glossaries, and definitions agreed across silos
Mixed Model Choice | Right pattern (3NF, star, Data Vault, graph) per workload, not dogma
AI Enablement | Semantic layers and knowledge graphs that LLMs and agents can trust
Approach (adjustable to your context)
Discover business processes and stakeholders, and document answers to the 6 Core Questions
Conceptual model — entities, relationships, glossary, AI use cases
Logical model — choose 3NF, star schema, Data Vault, or a hybrid per domain
Physical model — implement on your warehouse/lakehouse with tests and lineage
Semantic & graph layer — metrics, ontologies, and knowledge graphs for AI
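To make the final step concrete, here is a toy sketch of the idea behind a semantic layer: each metric is defined once, and every consumer (BI tool, API, or agent) gets its SQL from that single definition. This is not the dbt Semantic Layer or Cube API; the Metric class, the net_revenue metric, the fct_order table, and the allowed dimensions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str   # SQL aggregate expression
    table: str
    description: str  # plain-language definition agents can read


# Single source of truth for metric definitions (illustrative only).
METRICS = {
    "net_revenue": Metric(
        name="net_revenue",
        expression="SUM(amount - discount)",
        table="fct_order",
        description="Order amount minus discounts, before tax.",
    ),
}
ALLOWED_DIMENSIONS = {"region", "order_date"}


def compile_query(metric_name: str, dimensions: list) -> str:
    """Build SQL for a governed metric, rejecting unknown dimensions."""
    metric = METRICS[metric_name]
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if unknown:
        raise ValueError(f"Unknown dimensions: {unknown}")
    select_cols = ", ".join([*dimensions, f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_cols} FROM {metric.table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


print(compile_query("net_revenue", ["region"]))
```

Because an agent can only request named metrics and allowed dimensions, it cannot silently invent its own definition of revenue.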