7.3. GDPR Compliance with dbt
Overview
This guide shows a portable approach to marking dbt models and columns as GDPR-relevant and applying tags at the warehouse level automatically. The pattern uses dbt meta
on models/columns plus a simple post-hook macro that runs ALTER TABLE/ALTER COLUMN
statements. Adjust SQL syntax to match your warehouse (Databricks Unity Catalog, Snowflake, BigQuery, etc.).
What you implement
A model-level opt-in flag to mark a table as GDPR-relevant.
Optional column-level tags for sensitive fields (for example, user identifiers).
A post-hook macro that sets/unsets warehouse tags based on
meta
.
Setup
Add macros to your project (example files in this repository):
Macro: set_gdpr_compliance_tag-- Generic GDPR tagging macro for dbt -- Notes: -- - Example SQL below uses Databricks/Unity Catalog syntax for tags. -- - If your warehouse uses a different syntax (e.g., Snowflake), adjust the ALTER statements accordingly. {% macro set_gdpr_compliance_tag() %} {% if execute %} {# Table-level meta flag to opt-in the model for GDPR tagging #} {% set is_gdpr_model = model.meta.get('is_gdpr_model') %} {% if is_gdpr_model %} {# Table-level tag #} {% set table_tag_name = 'gdpr_deletion' %} {% set table_tag_value = 'enabled' %} {% set table_level_tag_sql %} ALTER TABLE {{ this }} SET TAGS ('{{ table_tag_name }}' = '{{ table_tag_value }}'); {% endset %} {% do run_query(table_level_tag_sql) %} {# Column-level tags: set gdpr_column_type when provided #} {% set column_tag_name = 'gdpr_column_type' %} {% for column in model.columns %} {% set col_meta = model.columns.get(column).meta or {} %} {# Support either a boolean flag or explicit type string #} {% set explicit_type = col_meta.get('gdpr_column_type') %} {% set is_flagged = col_meta.get('is_gdpr_column') %} {% if explicit_type is string and explicit_type|length > 0 %} {% set column_level_tag_sql %} ALTER TABLE {{ this }} ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = '{{ explicit_type }}'); {% endset %} {% do run_query(column_level_tag_sql) %} {% elif is_flagged %} {% set column_level_tag_sql_default %} ALTER TABLE {{ this }} ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = 'user_identifier'); {% endset %} {% do run_query(column_level_tag_sql_default) %} {% else %} {% set column_level_untag_sql %} ALTER TABLE {{ this }} ALTER COLUMN {{ column }} UNSET TAGS ('{{ column_tag_name }}'); {% endset %} {% do run_query(column_level_untag_sql) %} {% endif %} {% endfor %} {% endif %} {% endif %} {% endmacro %}
In your dbt model YAML, annotate tables/columns via
meta
:Example model YAML with GDPR metaversion: 2 models: - name: user_events_monthly description: Monthly user engagement events used for activity analytics. meta: is_gdpr_model: true retention: 400d columns: - name: user_id description: Anonymized user identifier meta: gdpr_column_type: user_identifier - name: session_id description: Analytics session identifier meta: is_gdpr_column: true - name: event_date description: Logical partition/event date - name: event_type description: Type of event - name: advertising_campaigns description: Campaigns table retained permanently for financial audit. meta:
Configure a post-hook on the models where you want the macro to run. You can set this at the folder level (
models:
indbt_project.yml
) or per-model usingconfig(post_hook=...)
.
Usage
Model-level: set
meta: { is_gdpr_model: true }
. The macro will set a table taggdpr_deletion=enabled
.Column-level: either set
meta: { is_gdpr_column: true }
to mark a field as sensitive with default typeuser_identifier
, or provide an explicit type viameta: { gdpr_column_type: <type> }
.The macro also unsets
gdpr_column_type
on columns that aren’t flagged, keeping tags consistent when schemas evolve.
Warehouse syntax notes
Databricks/Unity Catalog: uses
ALTER TABLE <table> SET TAGS (...)
andALTER COLUMN ... SET TAGS (...)
.Snowflake: use
ALTER TABLE <table> SET TAG <name> = '<value>'
; adjust the macro accordingly.BigQuery: consider labels or policy tags via separate APIs; the post-hook may call an external procedure instead of SQL.
Example: end-to-end
Add meta in YAML for your model and columns.
Ensure the macro is available and included in the models’ post-hooks.
Run your model; dbt will execute the post-hook and apply tags to the relation and columns.
Reference implementation
-- Generic GDPR tagging macro for dbt
-- Notes:
-- - Example SQL below uses Databricks/Unity Catalog syntax for tags.
-- - If your warehouse uses a different syntax (e.g., Snowflake), adjust the ALTER statements accordingly.
{% macro set_gdpr_compliance_tag() %}
{% if execute %}
{# Table-level meta flag to opt-in the model for GDPR tagging #}
{% set is_gdpr_model = model.meta.get('is_gdpr_model') %}
{% if is_gdpr_model %}
{# Table-level tag #}
{% set table_tag_name = 'gdpr_deletion' %}
{% set table_tag_value = 'enabled' %}
{% set table_level_tag_sql %}
ALTER TABLE {{ this }} SET TAGS ('{{ table_tag_name }}' = '{{ table_tag_value }}');
{% endset %}
{% do run_query(table_level_tag_sql) %}
{# Column-level tags: set gdpr_column_type when provided #}
{% set column_tag_name = 'gdpr_column_type' %}
{% for column in model.columns %}
{% set col_meta = model.columns.get(column).meta or {} %}
{# Support either a boolean flag or explicit type string #}
{% set explicit_type = col_meta.get('gdpr_column_type') %}
{% set is_flagged = col_meta.get('is_gdpr_column') %}
{% if explicit_type is string and explicit_type|length > 0 %}
{% set column_level_tag_sql %}
ALTER TABLE {{ this }}
ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = '{{ explicit_type }}');
{% endset %}
{% do run_query(column_level_tag_sql) %}
{% elif is_flagged %}
{% set column_level_tag_sql_default %}
ALTER TABLE {{ this }}
ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = 'user_identifier');
{% endset %}
{% do run_query(column_level_tag_sql_default) %}
{% else %}
{% set column_level_untag_sql %}
ALTER TABLE {{ this }}
ALTER COLUMN {{ column }} UNSET TAGS ('{{ column_tag_name }}');
{% endset %}
{% do run_query(column_level_untag_sql) %}
{% endif %}
{% endfor %}
{% endif %}
{% endif %}
{% endmacro %}
version: 2
models:
- name: user_events_monthly
description: Monthly user engagement events used for activity analytics.
meta:
is_gdpr_model: true
retention: 400d
columns:
- name: user_id
description: Anonymized user identifier
meta:
gdpr_column_type: user_identifier
- name: session_id
description: Analytics session identifier
meta:
is_gdpr_column: true
- name: event_date
description: Logical partition/event date
- name: event_type
description: Type of event
- name: advertising_campaigns
description: Campaigns table retained permanently for financial audit.
meta:
Operational tips
Run this macro via post-hooks to keep tags aligned with each run, especially after schema changes.
Use explicit types like
user_identifier
,pii_email
, orbusiness_identifier
to categorize sensitivity.Keep the tagging macro idempotent and safe for reruns; the provided implementation sets or unsets tags based on current
meta
.