7.3. GDPR Compliance with dbt

Overview

This guide shows a portable approach to marking dbt models and columns as GDPR-relevant and applying tags at the warehouse level automatically. The pattern uses dbt meta on models/columns plus a simple post-hook macro that runs ALTER TABLE/ALTER COLUMN statements. Adjust SQL syntax to match your warehouse (Databricks Unity Catalog, Snowflake, BigQuery, etc.).

What you implement

  • A model-level opt-in flag to mark a table as GDPR-relevant.

  • Optional column-level tags for sensitive fields (for example, user identifiers).

  • A post-hook macro that sets/unsets warehouse tags based on meta.

Setup

  1. Add macros to your project (example files in this repository):

    Macro: set_gdpr_compliance_tag
    -- Generic GDPR tagging macro for dbt
    -- Notes:
    -- - Example SQL below uses Databricks/Unity Catalog syntax for tags.
    -- - If your warehouse uses a different syntax (e.g., Snowflake), adjust the ALTER statements accordingly.
    
    {% macro set_gdpr_compliance_tag() %}
        {% if execute %}
            {# Table-level meta flag to opt-in the model for GDPR tagging #}
            {% set is_gdpr_model = model.meta.get('is_gdpr_model') %}
    
            {% if is_gdpr_model %}
                {# Table-level tag #}
                {% set table_tag_name = 'gdpr_deletion' %}
                {% set table_tag_value = 'enabled' %}
    
                {% set table_level_tag_sql %}
                    ALTER TABLE {{ this }} SET TAGS ('{{ table_tag_name }}' = '{{ table_tag_value }}');
                {% endset %}
                {% do run_query(table_level_tag_sql) %}
    
                {# Column-level tags: set gdpr_column_type when provided #}
                {% set column_tag_name = 'gdpr_column_type' %}
                {% for column in model.columns %}
                    {% set col_meta = model.columns.get(column).meta or {} %}
    
                    {# Support either a boolean flag or explicit type string #}
                    {% set explicit_type = col_meta.get('gdpr_column_type') %}
                    {% set is_flagged = col_meta.get('is_gdpr_column') %}
    
                    {% if explicit_type is string and explicit_type|length > 0 %}
                        {% set column_level_tag_sql %}
                            ALTER TABLE {{ this }}
                            ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = '{{ explicit_type }}');
                        {% endset %}
                        {% do run_query(column_level_tag_sql) %}
                    {% elif is_flagged %}
                        {% set column_level_tag_sql_default %}
                            ALTER TABLE {{ this }}
                            ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = 'user_identifier');
                        {% endset %}
                        {% do run_query(column_level_tag_sql_default) %}
                    {% else %}
                        {% set column_level_untag_sql %}
                            ALTER TABLE {{ this }}
                            ALTER COLUMN {{ column }} UNSET TAGS ('{{ column_tag_name }}');
                        {% endset %}
                        {% do run_query(column_level_untag_sql) %}
                    {% endif %}
                {% endfor %}
            {% endif %}
        {% endif %}
    {% endmacro %}
    
  2. In your dbt model YAML, annotate tables/columns via meta:

    Example model YAML with GDPR meta
    version: 2
    
    models:
      - name: user_events_monthly
        description: Monthly user engagement events used for activity analytics.
        meta:
          is_gdpr_model: true
          retention: 400d
        columns:
          - name: user_id
            description: Anonymized user identifier
            meta:
              gdpr_column_type: user_identifier
          - name: session_id
            description: Analytics session identifier
            meta:
              is_gdpr_column: true
          - name: event_date
            description: Logical partition/event date
          - name: event_type
            description: Type of event
    
      - name: advertising_campaigns
        description: Campaigns table retained permanently for financial audit.
        meta:
    
  3. Configure a post-hook on the models where you want the macro to run. You can set this at the folder level (models: in dbt_project.yml) or per-model using config(post_hook=...).

Usage

  • Model-level: set meta: { is_gdpr_model: true }. The macro will set a table tag gdpr_deletion=enabled.

  • Column-level: either set meta: { is_gdpr_column: true } to mark a field as sensitive with default type user_identifier, or provide an explicit type via meta: { gdpr_column_type: <type> }.

  • The macro also unsets gdpr_column_type on columns that aren’t flagged, keeping tags consistent when schemas evolve.

Warehouse syntax notes

  • Databricks/Unity Catalog: uses ALTER TABLE <table> SET TAGS (...) and ALTER COLUMN ... SET TAGS (...).

  • Snowflake: use ALTER TABLE <table> SET TAG <name> = '<value>'; adjust the macro accordingly.

  • BigQuery: consider labels or policy tags via separate APIs; the post-hook may call an external procedure instead of SQL.

Example: end-to-end

  1. Add meta in YAML for your model and columns.

  2. Ensure the macro is available and included in the models’ post-hooks.

  3. Run your model; dbt will execute the post-hook and apply tags to the relation and columns.

Reference implementation

set_gdpr_compliance_tag macro (full)
-- Generic GDPR tagging macro for dbt
-- Notes:
-- - Example SQL below uses Databricks/Unity Catalog syntax for tags.
-- - If your warehouse uses a different syntax (e.g., Snowflake), adjust the ALTER statements accordingly.

{% macro set_gdpr_compliance_tag() %}
    {% if execute %}
        {# Table-level meta flag to opt-in the model for GDPR tagging #}
        {% set is_gdpr_model = model.meta.get('is_gdpr_model') %}

        {% if is_gdpr_model %}
            {# Table-level tag #}
            {% set table_tag_name = 'gdpr_deletion' %}
            {% set table_tag_value = 'enabled' %}

            {% set table_level_tag_sql %}
                ALTER TABLE {{ this }} SET TAGS ('{{ table_tag_name }}' = '{{ table_tag_value }}');
            {% endset %}
            {% do run_query(table_level_tag_sql) %}

            {# Column-level tags: set gdpr_column_type when provided #}
            {% set column_tag_name = 'gdpr_column_type' %}
            {% for column in model.columns %}
                {% set col_meta = model.columns.get(column).meta or {} %}

                {# Support either a boolean flag or explicit type string #}
                {% set explicit_type = col_meta.get('gdpr_column_type') %}
                {% set is_flagged = col_meta.get('is_gdpr_column') %}

                {% if explicit_type is string and explicit_type|length > 0 %}
                    {% set column_level_tag_sql %}
                        ALTER TABLE {{ this }}
                        ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = '{{ explicit_type }}');
                    {% endset %}
                    {% do run_query(column_level_tag_sql) %}
                {% elif is_flagged %}
                    {% set column_level_tag_sql_default %}
                        ALTER TABLE {{ this }}
                        ALTER COLUMN {{ column }} SET TAGS ('{{ column_tag_name }}' = 'user_identifier');
                    {% endset %}
                    {% do run_query(column_level_tag_sql_default) %}
                {% else %}
                    {% set column_level_untag_sql %}
                        ALTER TABLE {{ this }}
                        ALTER COLUMN {{ column }} UNSET TAGS ('{{ column_tag_name }}');
                    {% endset %}
                    {% do run_query(column_level_untag_sql) %}
                {% endif %}
            {% endfor %}
        {% endif %}
    {% endif %}
{% endmacro %}
YAML meta example for GDPR
version: 2

models:
  - name: user_events_monthly
    description: Monthly user engagement events used for activity analytics.
    meta:
      is_gdpr_model: true
      retention: 400d
    columns:
      - name: user_id
        description: Anonymized user identifier
        meta:
          gdpr_column_type: user_identifier
      - name: session_id
        description: Analytics session identifier
        meta:
          is_gdpr_column: true
      - name: event_date
        description: Logical partition/event date
      - name: event_type
        description: Type of event

  - name: advertising_campaigns
    description: Campaigns table retained permanently for financial audit.
    meta:

Operational tips

  • Run this macro via post-hooks to keep tags aligned with each run, especially after schema changes.

  • Use explicit types like user_identifier, pii_email, or business_identifier to categorize sensitivity.

  • Keep the tagging macro idempotent and safe for reruns; the provided implementation sets or unsets tags based on current meta.