TrueLabel Data Platform · dm.* adversarial audit

Audit of all dm.* marts

First version of the documentation and adversarial review for every dbt DM model: code, materialization contract, live ClickHouse sanity checks, and independent Claude Code subreviews. This audit does not modify the marts or the Google Sheet.

checked 2026-06-29T14:31:43Z · repo master @ 741405a · 36 dbt SQL models · 3 Claude Code subreviews · live CH dm schema
P0: 9 · likely wrong numbers / silent data loss
P1: 16 · must be fixed or documented before certification
P2: 10 · DQ/perf/contract risks
P3: 1 · dead/scratch/cleanup

Executive verdict

Key point: the dm layer cannot currently be treated as a single certified contract layer without per-mart caveats. The main risk classes are versionless ReplacingMergeTree + append, views that require FINAL, wrong or ambiguous grain in unique_key, watermarks blind to late arrivals, dictionary drift for brand_name, mutable dimensions inside keys, and several confirmed live wrong-number signals.

  • P0 confirmed live: global_product_report__dm loses registration flags vs users; withdrawal_c underflows to UInt64 max in product reports; sms_log_hourly_stats__dm has duplicate aggregate states; lifecycle_report__dm is event-grain but declares user key and is capped at first 10 deposits.
  • Documentation gap: several exposed models have no catalog doc page: conversions/dev/tmp/posthog/bonuses and parts of retention/LTV internals.
  • Plan implication: triage is needed before any fix: which marts are the certified source for BI/C-level, which are legacy/dev, which should be view-only wrappers, and which marts require contracts/tests.

Top live evidence

Check: global_product_report__dm registration flags
Result: {'users': '3207772', 'reg_rows': '2635237', 'missing_registration_flags': '572535'}
Meaning: Registration rows collide/drop; not all users have registration=1 in the mart.

Check: global_product_report__dm signup-day collision
Result: {'rows_reg_day': '569001', 'reg_flag_rows': '29', 'activity_on_reg_day_no_reg_flag': '531701', 'reg_and_activity_same_row': '2', 'pure_reg_rows': '27'}
Meaning: Signup-day activity rows often have no registration flag.

Check: Product UInt64 underflow
Result: {'huge_deposit_c': '0', 'huge_withdrawal_c': '142148', 'max_deposit_c': '540', 'max_withdrawal_c': '18446744073709551615'} / vert {'huge_deposit_c': '0', 'huge_withdrawal_c': '141612', 'max_deposit_c': '540', 'max_withdrawal_c': '18446744073709551615'}
Meaning: Withdrawal count can wrap to UInt64 max.

Check: SMS aggregate states
Result: {'max_hour': '2026-06-29 14:00:00', 'raw_rows': '462452', 'uniq_keys': '341604', 'duplicate_state_rows': '120848'}
Meaning: AggregatingMergeTree + >= max(report_hour) likely double-counts boundary hours.

Check: No-FINAL physical duplicates
Result: {'dmt_no_final_dups': {'rows': '27184377', 'uniq_key': '14685157', 'dup_rows': '12499220'}, 'payments_no_final_dups': {'rows': '30990914', 'uniq_key': '16708069', 'dup_rows': '14282845'}, 'traffic_no_final_dups': {'rows': '18315261', 'uniq_key': '11113781', 'dup_rows': '7201480'}, 'flat_no_final_dups': {'rows': '3411663', 'uniq_key': '3087651', 'dup_rows': '324012'}, 'posthog_deposits_grain': {'rows': '222383', 'uniq_deposit': '208432', 'uniq_team_deposit': '217810', 'dup_team_deposit': '4573'}, 'posthog_users_grain': {'rows': '13631', 'uniq_person': '13317', 'dup_person': '314'}}
Meaning: Consumers querying base ReplacingMergeTree tables without FINAL can see inflated rows.
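
Each duplicate figure quoted in the live evidence should equal physical rows minus unique keys. The sketch below re-derives them from the numbers above. The dictionary name, labels and helper function are illustrative and are not part of the audit tooling.

```python
# (rows, unique keys, reported duplicates), copied from the live-evidence rows.
# Labels are illustrative mappings to the marts named in this audit.
PROBES = {
    "global_deposits_report__dmt": (27184377, 14685157, 12499220),
    "global_payments_report__dm": (30990914, 16708069, 14282845),
    "global_traffic_quality__dm": (18315261, 11113781, 7201480),
    "traffic_quality_flat__dm": (3411663, 3087651, 324012),
    "posthog_deposits__dm (team, deposit)": (222383, 217810, 4573),
    "posthog_deposit_users__dm": (13631, 13317, 314),
    "sms_log_hourly_stats__dm states": (462452, 341604, 120848),
}


def check_probe(rows, uniq, reported_dup):
    """Confirm rows - uniq == reported_dup and return rows per unique key."""
    if rows - uniq != reported_dup:
        raise ValueError("duplicate count does not match rows - uniq")
    return rows / uniq


inflation = {name: check_probe(*figures) for name, figures in PROBES.items()}

# Users that have no registration=1 row in global_product_report__dm.
missing_registration_flags = 3207772 - 2635237

for name, factor in inflation.items():
    print(f"{name}: {factor:.2f} physical rows per unique key")
```

A factor well above 1 is the inflation a consumer would see when counting rows on the base table instead of the deduplicated key.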

Per-mart documentation + adversarial review

P0

bonuses__dm

Bonus issuance/deposit-trigger aggregate by date_type/status/currency

grain: dt × brand × cluster × date_type × type_bonus × status × currency
materialization: 'distributed_table'
doc page: NO
live: engine=Distributed, rows_local=16463, basic checks ok/no generic check
  • Hardcoded May-2026 windows in users/bonus/attrs/log/deposit filters; report is not a rolling mart.
  • INNER JOIN to May-registered non-test users and INNER JOIN to bonus-linked deposits: issued-bonus counts exclude older users and bonuses without activating deposit.
  • FX dictionary default 0.0 and nullable dt through assumeNotNull can silently zero EUR sums / produce odd partitions.

Evidence: dbt_project/models/dm/bonuses__dm.sql:25,105,122,133,149,193-196 · refs: global_payments__ads, global_plbonus_bonus_attributes__ads, global_plbonus_bonus_issued_to_user_logs__ads, global_plbonus_bonus_issued_to_users__ads, global_plbonus_bonuses__ads, global_users__ads

P2

conversions_mvp__dm

Legacy/main-cluster conversion funnel MVP

grain: event row: Registration + successful deposit 1..7
materialization: 'distributed_table'
doc page: NO
live: engine=Distributed, rows_local=view/unknown, basic checks ok/no generic check
  • Main cluster only; not global.
  • unique_key=user_id advertises wrong grain; output is multi-row per user.
  • Hardcoded cap at 7 successful deposits and no doc page.

Evidence: dbt_project/models/dm/conversions_mvp__dm.sql · refs: main_payment__ads, main_users__ads

P1

global_deposits_report__dm

Full deposit-order report enriched with user and deposit sequence

grain: sur_order_id
materialization: distributed_table
doc page: yes
live: engine=Distributed, rows_local=7345385, max_updated_at=2026-06-29 13:05:13, max_created_at=2026-06-29 13:05:12, max_registration_date=2026-06-29 12:59:18; DQ null_country_name:2, empty_group_name:14
  • GLOBAL INNER JOIN users can drop deposits with missing/deleted user rows.
  • Full rebuild over global_payments FINAL; heavy and overlaps with incremental dmt twin.
  • Live exact FINAL key probe OK, but use as all-deposit source must specify user join scope.

Evidence: dbt_project/models/dm/global_deposits_report__dm.sql · FINAL exact key: rows=14686465, uniq=14686465, dup=0 · refs: global_payments__ads, global_users__ads

P0

global_deposits_report__dmt

Incremental twin of global_deposits_report

grain: sur_order_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=13736219, max_updated_at=2026-06-29 13:05:13, max_created_at=2026-06-29 13:05:12, max_registration_date=2026-06-29 12:59:18; DQ empty_brand_name:97, null_country_name:3, empty_group_name:18
  • Reads global_payments__ads/global_users__ads without FINAL; live no-FINAL probe: 27,184,377 rows vs 14,685,157 unique sur_order_id.
  • Delta hardcodes clusters main/premium/luxury; new cluster will be missed.
  • order_id watermark can miss late/out-of-order inserts.

Evidence: dbt_project/models/dm/global_deposits_report__dmt.sql:101,130 · FINAL exact key: rows=14685157, uniq=14685157, dup=0 · refs: global_payments__ads, global_users__ads

P0

global_payments_report__dm

Deposit+withdrawal payment report

grain: sur_order_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=15779365, max_updated_at=2026-06-29 13:05:13, max__loaded_at=2026-06-29 13:07:14, max_created_at=2026-06-29 13:05:12, max_registration_date=2026-06-29 12:59:18; DQ empty_brand_name:110, null_country_name:3, empty_group_name:18
  • Provider dictionary join can fan out if provider+aggregator mapping is non-unique.
  • Live no-FINAL table has 30,990,914 physical rows vs 16,708,069 unique sur_order_id; consumers must use FINAL/dmf.
  • Single _loaded_at watermark across users/payments/withdrawals can under/over-select if ingestion clocks differ.

Evidence: dbt_project/models/dm/payments_report/global_payments_report__dm.sql:338-339 · FINAL exact key: rows=16708069, uniq=16708069, dup=0 · refs: global_payments__ads, global_users__ads, global_withdrawals__ads, payment_provider_info_dict

P2

global_payments_report__dmf

BI view over global_payments_report

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_updated_at=2026-06-29 13:05:13, max_created_at=2026-06-29 13:05:12, max_registration_date=2026-06-29 12:59:18; DQ empty_brand_name:106, null_country_name:3, empty_group_name:18
  • Correctness depends on FINAL on every query; expensive for BI.
  • Re-derives brand_name by dictionary even though parent stores brand_name.

Evidence: dbt_project/models/dm/payments_report/global_payments_report__dmf.sql:67 · refs: global_payments_report__dm

P0

global_product_report__dm

Daily per-user product financials + registration rows

grain: sur_user_id × brand_id × date
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=8027320, max_date=2026-06-29, max_registration_date=2026-06-29 13:03:19, max_ftd_date=2026-06-29 13:01:47; DQ empty_brand_name:8,601, null_country_name:125,994, empty_group_name:272,890, empty_affiliate_id:20,142
  • Registration rows collide with transaction rows on signup day: live only 2,635,237 registration flags vs 3,207,772 users; 572,535 missing registration flags.
  • UInt64 underflow in withdrawal_c: live 142,148 rows >1B and max=18446744073709551615.
  • Groups by user_id, not sur_user_id; cluster identity risk.

Evidence: dbt_project/models/dm/product_report/global_product_report__dm.sql:107-206,254-270,341 · refs: global_deposits_report__dm, global_payments_users_delta__ads, global_transactions__ads, global_transactions_users_delta__ads, global_users__ads, global_users_delta__ads
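
The live maximum 18446744073709551615 equals 2^64 − 1, which is what a value of −1 becomes when it is stored as an unsigned 64-bit integer. The sketch below emulates that wraparound in Python. The mechanism (a net count going negative before it lands in a UInt64 column) is an assumption consistent with the symptom, not a reading of the model SQL, and both function names are hypothetical.

```python
UINT64_MAX = 2**64 - 1


def to_uint64(value: int) -> int:
    """Emulate storing a signed integer as unsigned 64-bit (wraps modulo 2**64)."""
    return value % 2**64


def clamped_count(value: int) -> int:
    """One possible guard: clamp a negative net count to zero before storing it."""
    return max(value, 0)


# Hypothetical day: one reversal and no matching withdrawal gives a net count of -1.
net_withdrawals = 0 - 1
wrapped = to_uint64(net_withdrawals)
guarded = clamped_count(net_withdrawals)

print(wrapped == UINT64_MAX, guarded)
```

Whether clamping or a signed column is the right fix depends on what withdrawal_c is meant to represent, so the guard here is only one option.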

P1

global_product_report__dmf

View over global_product_report

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_date=2026-06-29, max_registration_date=2026-06-29 13:03:19, max_ftd_date=2026-06-29 13:01:47; DQ empty_brand_name:64, null_country_name:122,380, empty_group_name:269,170, empty_affiliate_id:16,626
  • Inherits parent registration collision and UInt64 underflow.
  • Distributed FINAL on wide table for every BI query.
  • Brand dictionary differs from parent family.

Evidence: dbt_project/models/dm/product_report/global_product_report__dmf.sql · refs: global_product_report__dm

P1

global_report_product_vert__dm

Vertical product report by product_type/is_bonus

grain: sur_user_id × brand/date × is_bonus × product_type
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=6573661, max_date=2026-06-29, max_registration_date=2026-06-29, max_ftd_date=2026-06-29; DQ empty_brand_name:21,822, empty_country_name:99,602, empty_group_name:276,954, empty_affiliate_id:3,244
  • UInt64 withdrawal underflow confirmed in live: 141,612 rows >1B; max=UInt64 max.
  • deposit_s rollback sign expression needs data-sign verification.
  • Incremental path uses 7-day created_at horizon; late older transactions can be missed.

Evidence: dbt_project/models/dm/product_report/global_report_product_vert__dm.sql:71-76,109 · refs: global_link_games_sport__ads, global_payments__ads, global_transactions__ads, global_users__ads

P1

global_report_product_vert__dmf

View over global_report_product_vert

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_date=2026-06-29, max_registration_date=2026-06-29, max_ftd_date=2026-06-29; DQ empty_brand_name:142, empty_country_name:98,612, empty_group_name:275,380, empty_affiliate_id:3,163
  • Inherits parent count underflow.
  • Heavy FINAL on wide table.
  • Brand dictionary mismatch vs stored names.

Evidence: dbt_project/models/dm/product_report/global_report_product_vert__dmf.sql · refs: global_report_product_vert__dm

P1

global_traffic_quality__dm

Successful-deposit traffic quality facts with first/second deposit and retention flags

grain: successful deposit / sur_order_id plus mutable user dims
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=9184425, max_updated_at=2026-06-29 13:19:48, max_created_at=2026-06-29 13:05:10; DQ empty_brand_name:2,977
  • 6-hour delta from *_users_delta__ads can permanently miss users if DAG gap >6h.
  • user_dep_stats misses FINAL on payments; duplicate pre-merge rows can break second-deposit time.
  • Mutable dims in unique key cause no-FINAL live duplicates: 18,315,261 rows vs 11,113,781 unique key.

Evidence: dbt_project/models/dm/global_traffic_quality__dm.sql:19-27,56 · refs: global_payments__ads, global_payments_users_delta__ads, global_transactions__ads, global_transactions_users_delta__ads, global_users__ads
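
The 6-hour delta risk can be illustrated with a toy schedule: a run that only looks back 6 hours leaves a hole whenever two consecutive runs are more than 6 hours apart. Hours are plain integers and the function name is hypothetical; the real model's window boundaries may differ.

```python
def uncovered_hours(run_hours, lookback_hours, horizon_hours):
    """Return event hours in [0, horizon) that no run's lookback window covers.

    A run at hour r is assumed to pick up events with r - lookback <= hour < r.
    """
    covered = set()
    for run in run_hours:
        covered.update(range(max(run - lookback_hours, 0), run))
    return [hour for hour in range(horizon_hours) if hour not in covered]


# Runs every 6 hours: every event hour is picked up by some run.
regular = uncovered_hours([6, 12, 18, 24], lookback_hours=6, horizon_hours=24)

# The run at hour 18 is skipped: hours 12..17 are never re-read by a later run.
with_gap = uncovered_hours([6, 12, 24], lookback_hours=6, horizon_hours=24)

print(regular, with_gap)
```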

P2

global_traffic_quality__dmf

View over global_traffic_quality

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_updated_at=2026-06-29 13:19:48, max_created_at=2026-06-29 13:05:10; DQ empty_brand_name:10
  • FINAL needed for correctness but costs per query.
  • Brand dict mismatch with parent.

Evidence: dbt_project/models/dm/global_traffic_quality__dmf.sql · refs: global_traffic_quality__dm

P0

lifecycle_report__dm

Lifecycle funnel: registration + attempts/successes first 10

grain: event-level per user/event_name
materialization: distributed_table
doc page: yes
live: engine=Distributed, rows_local=3880244, max_created_at=2026-06-29 13:04:29, max_registration_date=2026-06-29 13:03:19; DQ empty_brand_name:80, null_country_name:88,381, empty_affiliate_id:18, empty_sur_order_id:3,207,772
  • Keeps only first 10 successful deposits; unsafe as all-payments mart.
  • unique_key=sur_user_id is false grain; live exact probe: 7,761,191 rows vs 3,207,782 unique users.
  • Registration denominator includes removed/test/admin users unless BI filters them externally; attempt and success stages can collide.

Evidence: dbt_project/models/dm/lifecycle_report__dm.sql:5,34,49,57 · FINAL exact key: rows=7761191, uniq=3207782, dup=4553409 · refs: global_payments__ads, global_users__ads

P0

ltv_cohort_weekly__dm

Weekly LTV cohort aggregate by cohort/dims/week

grain: cohort_week × week_dt × dims × week_n_label
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=2675987, max_week_dt=2026-06-29, max_cohort_weekly=2026-06-29; DQ empty_brand_name:394, empty_group_name:1,222
  • cumulative_ggr window runs only over incremental 30-day horizon; cumulative can restart/partial after incremental runs.
  • Mutable status flags are in unique/order key: status changes create unmergeable old+new rows.
  • Incremental horizon contradicts comment that status changes apply retroactively.

Evidence: dbt_project/models/dm/ltv/ltv_cohort_weekly__dm.sql:21-23,70-71,129-146 · refs: ltv_weekly_user_brand__dmf
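
The cumulative_ggr concern can be shown with a running sum: a window that only sees the rows inside the incremental horizon restarts from zero at the horizon boundary. The weekly values are invented and the 2-week horizon is a stand-in for the model's 30-day horizon.

```python
from itertools import accumulate

# Hypothetical GGR for weeks 1..5 of a single cohort.
weekly_ggr = [10, 20, 30, 40, 50]

# Running sum over the full history (what a full rebuild would produce).
full_history = list(accumulate(weekly_ggr))

# Running sum computed only over the rows inside the incremental horizon.
horizon_weeks = 2
incremental = list(accumulate(weekly_ggr[-horizon_weeks:]))

print("week 5 cumulative, full rebuild:", full_history[-1])
print("week 5 cumulative, horizon only:", incremental[-1])
```

The two week-5 values differ by exactly the GGR that fell outside the horizon, which is why the incremental figure looks like a restart.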

P1

ltv_cohort_weekly__dmf

View over LTV cohort weekly aggregate

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_week_dt=2026-06-29, max_cohort_weekly=2026-06-29; DQ empty_brand_name:394, empty_group_name:1,222
  • FINAL cannot collapse rows when status flags changed because key differs.
  • Inherits partial cumulative risk from parent.

Evidence: dbt_project/models/dm/ltv/ltv_cohort_weekly__dmf.sql · refs: ltv_cohort_weekly__dm
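
A small simulation of why FINAL cannot help once a mutable flag is part of the key: deduplication keeps one row per key, and a changed flag produces a different key. This is a simplified model of last-row-wins deduplication, and the column names (including is_vip) are hypothetical.

```python
def final_dedup(rows, key_cols):
    """Keep the last-inserted row per key (simplified last-row-wins deduplication)."""
    latest = {}
    for row in rows:
        latest[tuple(row[col] for col in key_cols)] = row
    return list(latest.values())


# The same cohort/week written twice: before and after a status flag changed.
rows = [
    {"cohort_week": "2026-06-01", "week_n": 1, "is_vip": 0, "ggr": 100},
    {"cohort_week": "2026-06-01", "week_n": 1, "is_vip": 1, "ggr": 100},
]

with_flag_in_key = final_dedup(rows, ["cohort_week", "week_n", "is_vip"])
without_flag_in_key = final_dedup(rows, ["cohort_week", "week_n"])

print(sum(r["ggr"] for r in with_flag_in_key))     # both rows survive
print(sum(r["ggr"] for r in without_flag_in_key))  # only the latest row survives
```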

P1

ltv_ftd_dict__dm

FTD dictionary for LTV cohorts and current user dims

grain: sur_user_id × brand_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=160753, max_ftd_date=2026-06-29, max_cohort_weekly=2026-06-29
  • FTD is floored by ftd_start default 2025-01-01; older users get false first-2025 cohort.
  • ReplicatedMergeTree does not enforce uniqueness; depends on SQL grouping.
  • cluster filter uses two cluster-derivation styles; fragile.

Evidence: dbt_project/models/dm/ltv/ltv_ftd_dict__dm.sql:50-52 · refs: global_payments__ads, global_users__ads
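
The ftd_start floor can be sketched as a minimum taken over a filtered list: a user whose real first deposit predates the floor is assigned the first deposit after it. The deposit dates are invented and the helper name is hypothetical; only the 2025-01-01 default comes from the finding above.

```python
from datetime import date


def first_deposit_date(deposit_dates, ftd_start=None):
    """Earliest deposit date, optionally ignoring deposits before ftd_start."""
    eligible = [d for d in deposit_dates if ftd_start is None or d >= ftd_start]
    return min(eligible) if eligible else None


# Hypothetical user who first deposited in 2023 and again in 2025.
deposits = [date(2023, 3, 10), date(2025, 2, 14)]

true_ftd = first_deposit_date(deposits)
floored_ftd = first_deposit_date(deposits, ftd_start=date(2025, 1, 1))

print(true_ftd, floored_ftd)
```

With the floor applied this user lands in a February 2025 cohort even though the real first deposit was almost two years earlier.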

P0

ltv_user_lifetime__dm

Lifetime player flags and activity counts for LTV

grain: sur_user_id × brand_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=481465, basic checks ok/no generic check
  • Append + ReplacingMergeTree without version while recomputing all users; downstream reads can pick stale lifetime rows.
  • Lifetime counts are truncated to ftd_start.
  • Player type thresholds/sign conventions undocumented.

Evidence: dbt_project/models/dm/ltv/ltv_user_lifetime__dm.sql · refs: global_link_games_sport__ads, global_payments__ads, global_transactions__ads, ltv_ftd_dict__dm

P1

ltv_weekly_user_brand__dm

Dense weekly per-user LTV facts

grain: sur_user_id × brand_id × week_dt
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=6664393, max_week_dt=2026-06-29, max_ftd_date=2026-06-29, max_cohort_weekly=2026-06-29; DQ empty_brand_name:2,636
  • Joins ltv_user_lifetime__dm with LIMIT 1 BY and no FINAL/order, so stale lifetime attributes can attach.
  • Unknown brand_id returns blank brand_name; live has blank brand names.
  • GGR sign convention needs validation against a known week.

Evidence: dbt_project/models/dm/ltv/ltv_weekly_user_brand__dm.sql:206-210 · refs: global_link_games_sport__ads, global_payments__ads, global_transactions__ads, global_users__ads, ltv_ftd_dict__dm, ltv_user_lifetime__dm

P2

ltv_weekly_user_brand__dmf

View over weekly LTV user-brand with live status

grain: same as parent, FINAL + ftd_dict join
materialization: view
doc page: NO
live: engine=View, rows_local=view/unknown, max_week_dt=2026-06-29, max_ftd_date=2026-06-29, max_cohort_weekly=2026-06-29; DQ empty_brand_name:2,119, empty_group_name:1,666
  • Plain LEFT JOIN relies on matching sharding between weekly and ftd_dict.
  • Status retroactivity works here but not after materialized cohort aggregate.

Evidence: dbt_project/models/dm/ltv/ltv_weekly_user_brand__dmf.sql · refs: ltv_ftd_dict__dm, ltv_weekly_user_brand__dm

P0

posthog_deposits__dm

PostHog deposit-success events enriched with person properties

grain: declared deposit_id; actual deposit_id × person/team risk
materialization: 'distributed_incremental'
doc page: NO
live: engine=Distributed, rows_local=110053, max__loaded_at=2026-06-29 11:28:10
  • GROUP BY includes person_id but order_by excludes person_id; identity stitching can arbitrarily collapse attribution.
  • sharding_key=toInt64OrZero(distinct_id) likely sends UUID distinct_id to shard 0.
  • Live no-FINAL: 222,383 rows; 217,810 unique(team_id,deposit_id); 208,432 unique deposit_id.

Evidence: dbt_project/models/dm/posthog/posthog_deposits__dm.sql · refs: posthog_events__ads, posthog_persons__ads
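
The sharding concern follows from how an "or zero" integer parse treats non-numeric strings. The sketch below is an approximate Python emulation of that behaviour, not ClickHouse code; it assumes equally weighted shards chosen by remainder, and the UUID values are made up.

```python
import re
from collections import Counter

_INT_RE = re.compile(r"-?[0-9]+")


def to_int64_or_zero(text: str) -> int:
    """Approximate emulation: parse a signed 64-bit integer, return 0 on failure."""
    if _INT_RE.fullmatch(text) is None:
        return 0
    value = int(text)
    return value if -2**63 <= value < 2**63 else 0


def shard_for(distinct_id: str, n_shards: int) -> int:
    """Pick a shard by remainder, assuming equally weighted shards."""
    return to_int64_or_zero(distinct_id) % n_shards


# Made-up identifiers: UUID-style ids all parse to 0, numeric ids spread out.
uuid_ids = [
    "018f3c2e-7b1a-7c3d-9e4f-2a6b8c0d1e2f",
    "7d9a1b46-03c2-4f7e-8a55-1c2d3e4f5a6b",
    "c0ffee00-1234-4abc-9def-0123456789ab",
]
numeric_ids = ["12345", "12346", "12347"]

uuid_shards = Counter(shard_for(i, 4) for i in uuid_ids)
numeric_shards = Counter(shard_for(i, 4) for i in numeric_ids)

print(dict(uuid_shards), dict(numeric_shards))
```

If most distinct_id values are UUID-shaped, a hash of the string would spread rows across shards, though any change of sharding key needs checking against the joins that rely on it.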

P1

posthog_deposit_users__dm

Per PostHog person deposit profile

grain: team_id × person_id
materialization: 'distributed_incremental'
doc page: NO
live: engine=Distributed, rows_local=6924, max__loaded_at=2026-06-29 11:28:10
  • Reads posthog_deposits__dm without FINAL; live has 314 duplicate team/person physical rows.
  • is_ftd is constant 1 for every person row.
  • NULL person_created_at can create bad partitions/days_to_first_deposit.

Evidence: dbt_project/models/dm/posthog/posthog_deposit_users__dm.sql · refs: posthog_deposits__dm, posthog_persons__ads

P1

product_health__dm

Hourly per-user product health from transactions + registration

grain: ts_hour × sur_user_id × is_bonus
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=20184079, max_ts_hour=2026-06-29 13:00:00, max_registration_date=2026-06-29 13:03:19; DQ empty_brand_name:539,379, null_country_name:140,134, empty_group_name:525,658, empty_affiliate_id:47
  • Versionless ReplacingMergeTree + append; stale-row risk on recompute.
  • FULL JOIN merges keys via greatest/defaults; fragile invariant.
  • Brand name has multiple provenances (stored user brand vs dmf dictionary).

Evidence: dbt_project/models/dm/product_health/product_health__dm.sql:91-93,119 · refs: global_transactions__ads, global_transactions_users_delta__ads, global_users__ads, global_users_delta__ads

P3

product_health__dm_tmp

Scratch/legacy product_health rebuild variant

grain: ts_hour-ish, non-unique
materialization: 'distributed_table'
doc page: NO
live: engine=not found, rows_local=view/unknown, basic checks ok/no generic check
  • Hardcoded ads.* references break dbt lineage.
  • Looks unscheduled/not present in live system; confirm and delete if dead.
  • Non-unique order_by and duplicated FTD logic.

Evidence: dbt_project/models/dm/product_health/product_health__dm_tmp.sql

P2

product_health__dmf

BI view over product_health

grain: same as parent, FINAL view + filters
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_ts_hour=2026-06-29 13:00:00, max_registration_date=2026-06-29 13:03:19
  • FINAL per query on large table.
  • Magic filter brand_id NOT IN (2) undocumented.
  • Live generic probe hit one error; needs direct view check.

Evidence: dbt_project/models/dm/product_health/product_health__dmf.sql:55-56 · refs: product_health__dm

P1

psp_monitoring_alerts__dm

Current PSP monitoring alert snapshot

grain: alert row by provider/country/brand rollup
materialization: ?
doc page: yes
live: engine=Distributed, rows_local=7, basic checks ok/no generic check
  • distributed_table overwrites each run; no alert history.
  • If now()-2h hour is not loaded yet, table silently empty; division by zero can produce inf.
  • Repeated full scans and inconsistent thresholds.

Evidence: dbt_project/models/dm/psp_monitoring_alerts__dm.sql

P2

reg_to_dep_dev__dm

Experimental registration-to-deposit user aggregate

grain: user/brand, main cluster dev
materialization: 'distributed_table'
doc page: NO
live: engine=Distributed, rows_local=983152, max_registration_date=2025-12-01 13:08:31; DQ empty_country_name:217,091, empty_group_name:141,997
  • Reads main_payment/main_users without FINAL; aggregate inflation risk.
  • _dev model but exposed in dm schema; no docs.
  • unique_key=user_id but join can fan out by brand/cluster.

Evidence: dbt_project/models/dm/reg_to_dep_dev__dm.sql · refs: main_payment__ads, main_users__ads

P1

retention_ftd_user_brand__dm

First real-money REFILL per user/brand for retention

grain: sur_user_id × brand_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=268605, max_updated_at=2026-06-29 13:19:51, max_ftd_date=2026-06-29
  • Delta compares event created_at to build updated_at; late-arriving older REFILL missed.
  • Versionless ReplacingMergeTree despite append.
  • updated_at=now() is build time and downstream watermark.

Evidence: dbt_project/models/dm/retention/retention_ftd_user_brand__dm.sql:23,46 · FINAL exact key: rows=507141, uniq=507141, dup=0 · refs: global_transactions__ads
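
The late-arrival trap can be reproduced with two events and one watermark. When the watermark is the previous build time but the filter is on event time, an event created before the build and loaded after it is never selected. The field names, timestamps and the load-time alternative are illustrative assumptions, not the model's actual columns.

```python
from datetime import datetime


def select_delta(events, watermark, time_field):
    """Ids of events whose time_field is strictly after the watermark."""
    return [e["id"] for e in events if e[time_field] > watermark]


# Previous build finished at 13:00 and stamped updated_at = now().
previous_build = datetime(2026, 6, 29, 13, 0)

events = [
    # Created and loaded after the previous build.
    {"id": "on_time", "created_at": datetime(2026, 6, 29, 13, 5),
     "loaded_at": datetime(2026, 6, 29, 13, 6)},
    # Created before the previous build but loaded only after it.
    {"id": "late", "created_at": datetime(2026, 6, 29, 12, 50),
     "loaded_at": datetime(2026, 6, 29, 13, 10)},
]

by_event_time = select_delta(events, previous_build, "created_at")
by_load_time = select_delta(events, previous_build, "loaded_at")

print(by_event_time, by_load_time)
```

Filtering on a load timestamp, or re-reading a lookback window, are two common ways to pick up the late row; which one fits depends on what the source tables expose.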

P1

retention_monthly_activity__dm

Monthly active REFILL/BET activity per user/brand/bonus

grain: sur_user_id × brand_id × active_month × is_bonus
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=1622399, max_updated_at=2026-06-29 13:19:50, max_active_month=2026-06-01
  • Same created_at > max(updated_at) late-arrival trap.
  • Versionless ReplacingMergeTree.
  • Activity excludes WIN-only / bonus-only definitions; must be documented.

Evidence: dbt_project/models/dm/retention/retention_monthly_activity__dm.sql:24,34 · FINAL exact key: rows=3228480, uniq=3228480, dup=0 · refs: global_transactions__ads

P1

monthly_retention__dm

Monthly retention facts per report month/user/brand/bonus

grain: report_month × sur_user_id × brand_id × is_bonus
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=15775818, max_updated_at=2026-06-29 13:20:38, max_report_month=2026-06-01, max_ftd_date=2026-06-29; DQ empty_brand_name:3,536,748, null_country_name:1,130,295, empty_group_name:3,543,066, empty_affiliate_id:157,356
  • Engine lacks version even though updated_at exists; append dedup arbitrary before FINAL.
  • Large 3× bonus fan-out spine per FTD user and unbounded lag windows.
  • Docs/tests missing for monthly model family.

Evidence: dbt_project/models/dm/retention/monthly_retention__dm.sql · FINAL exact key: rows=27568605, uniq=27568605, dup=0 · refs: global_transactions__ads, global_users__ads, retention_ftd_user_brand__dm, retention_monthly_activity__dm

P2

monthly_retention__dmf

View over monthly_retention

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_updated_at=2026-06-29 13:20:38, max_report_month=2026-06-01, max_ftd_date=2026-06-29; DQ empty_brand_name:2,283, null_country_name:1,130,274, empty_group_name:3,543,045, empty_affiliate_id:157,335
  • FINAL + SELECT * REPLACE on every query.
  • No default filter for is_test/is_removed; BI must filter explicitly.

Evidence: dbt_project/models/dm/retention/monthly_retention__dmf.sql · refs: monthly_retention__dm

P2

cohort_retention__dm

Day/week/month cohort-retention booleans

grain: sur_user_id × brand_id × is_bonus × cohort_basis
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=1261314, max_updated_at=2026-06-29 13:19:55, max_ftd_date=2026-06-29 13:11:05; DQ empty_brand_name:1,110,672, null_country_name:1,106,364, empty_group_name:1,106,364, empty_affiliate_id:1,106,364
  • Single partition tuple(): no partition pruning / large merges.
  • 360/366-day cap may clip late-month month-12 interpretation.
  • ANY join before rank is misleading; holds only if users source is already 1-row FINAL.

Evidence: dbt_project/models/dm/retention/cohort_retention__dm.sql · refs: global_transactions__ads, global_users__ads, retention_ftd_user_brand__dm

P2

cohort_retention__dmf

View over cohort_retention

grain: same as parent, FINAL view
materialization: 'simple_clickhouse_view'
doc page: NO
live: engine=View, rows_local=view/unknown, max_updated_at=2026-06-29 13:19:55, max_ftd_date=2026-06-29 13:11:05; DQ empty_brand_name:4,524, null_country_name:928,806, empty_group_name:928,806, empty_affiliate_id:928,806
  • FINAL on ~70-column single-partition table can be expensive.
  • Dimension blanks/nulls visible in live generic DQ checks; verify BI filters.

Evidence: dbt_project/models/dm/retention/cohort_retention__dmf.sql · refs: cohort_retention__dm

P0

sms_log_hourly_stats__dm

SMS hourly AggregatingMergeTree stats

grain: hour × cluster × brand × status × service
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=235537, max__loaded_at=2026-06-29 14:23:22; DQ empty_brand_name:13,744
  • Incremental boundary uses >= max(report_hour); AggregatingMergeTree states are summed, so last hour is re-counted on every run.
  • Live raw states: 462,452 rows vs 341,604 unique keys; 120,848 duplicate state rows.
  • Late events in already-closed hours are missed; _loaded_at is a non-aggregate column, so the value kept after a merge is arbitrary.

Evidence: dbt_project/models/dm/sms_log_hourly_stats__dm.sql:27-31 · refs: global_plnotification_sms_log__ads
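
The boundary-hour problem can be simulated with a list of appended count states that are summed at read time. With an inclusive watermark (>=) the last loaded hour is appended again on the next run and counted twice; with an exclusive one (>) events that arrive later in that hour are lost. This is a toy model with integer hours and plain counts instead of aggregate states, and the function names are hypothetical.

```python
from collections import Counter


def run_incremental(table_states, source_hours, inclusive=True):
    """Append one (hour, count) state per source hour selected by the watermark."""
    watermark = max((hour for hour, _ in table_states), default=None)
    for hour, count in sorted(Counter(source_hours).items()):
        if watermark is None:
            selected = True
        elif inclusive:
            selected = hour >= watermark
        else:
            selected = hour > watermark
        if selected:
            table_states.append((hour, count))


def merged_counts(table_states):
    """Sum the states per hour, as a count-merging view would."""
    totals = Counter()
    for hour, count in table_states:
        totals[hour] += count
    return dict(totals)


first_load = [10, 10, 11]            # hour 11 is still open at the first run
second_load = [10, 10, 11, 11, 12]   # one more event arrived in hour 11

inclusive_states, exclusive_states = [], []
for load in (first_load, second_load):
    run_incremental(inclusive_states, load, inclusive=True)
    run_incremental(exclusive_states, load, inclusive=False)

print(merged_counts(inclusive_states))  # hour 11 is over-counted
print(merged_counts(exclusive_states))  # hour 11 is under-counted
```

The true count for hour 11 is 2, so neither watermark style is correct on its own; the boundary hour has to be replaced rather than appended, or only closed hours loaded.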

P1

sms_log_hourly_stats__dm_v

View merging SMS count states

grain: same SMS key, countMerge view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, DQ empty_brand_name:11,202
  • View aggregation pattern is OK but inherits upstream double-count/lost-late-events.
  • Blank brand_name visible in live DQ checks.

Evidence: dbt_project/models/dm/sms_log_hourly_stats__dm_v.sql · refs: sms_log_hourly_stats__dm

P1

traffic_quality_flat__dm

User-level traffic-quality flat mart

grain: sur_user_id
materialization: distributed_incremental
doc page: yes
live: engine=Distributed, rows_local=1696689, max_updated_at=2026-06-29 13:32:45; DQ empty_brand_name:4,420, null_country_name:88,361, empty_group_name:142,006
  • Delta compares source event times to run-time updated_at=now(); late-arriving data missed.
  • chargeback_count hardcoded 0.
  • D30 retention definition differs from global_traffic_quality.

Evidence: dbt_project/models/dm/traffic_quality_flat__dm.sql:168 · FINAL exact key: rows=3087651, uniq=3087651, dup=0 · refs: global_payments__ads, global_transactions__ads, global_users__ads

P2

traffic_quality_flat__dmf

View over traffic_quality_flat

grain: same as parent, FINAL view
materialization: view
doc page: yes
live: engine=View, rows_local=view/unknown, max_updated_at=2026-06-29 13:32:45; DQ empty_brand_name:38, null_country_name:88,361, empty_group_name:142,005
  • FINAL per query.
  • Brand dictionary mismatch vs parent and blank dimension values in live checks.

Evidence: dbt_project/models/dm/traffic_quality_flat__dmf.sql · refs: traffic_quality_flat__dm

Live public dm tables not backed by current dbt model list

table · engine · metadata modified
cohort_retention_v2__dmf · View · 2026-04-14 08:26:17
global_deposits_report_enriched__dm · Distributed · 2026-02-04 08:23:18
product_health_grouped__dm · Distributed · 2026-01-13 10:49:23
report_product_transactions_agg · Distributed · 2026-02-17 08:59:31
test_sport_fix · Distributed · 2026-02-13 13:32:22
test_with_gjoin · Distributed · 2026-01-14 12:10:21
test_with_join · Distributed · 2026-01-14 12:10:08

These should be classified as legacy/manual/test before any stakeholder uses “all dm.*” broadly.

Sources