Arif Ahmed · arifwork
P2 · Data foundation

RevOps Janitor

Continuous CRM hygiene, dedupe, and a per-record data-health score — the foundation every other GTM automation depends on.

Avg health score (projected): 86.2 → 93.6
Duplicates flagged for merge: 93
Deliverability-risk records suppressed: 53

RevOps Janitor — data-health pipeline

Dirty CRM in, golden records out: all 1,041 contacts pass through four cleaning stages.

1. Dedupe → 2. Enrich → 3. Validate → 4. Re-score


93 duplicates · 166 missing titles · no LinkedIn · 60 unreachable · 53 deliverability risk · 278 stale
Avg health: 86.2 (today)
Score · Contact · Title · Issues
100 · Alex Chen · Head of Product · clean
100 · Sam Patel · VP Growth · clean
72 · Jordan Kim · (none) · Missing title, No LinkedIn
85 · Taylor Diaz · Director of Product · Free-mail on B2B
77 · Morgan Silva · Product Analyst Intern · Junior persona, No LinkedIn
100 · Priya Nair · Head of Data · clean
60 · Wei Zhang · VP Engineering · Bad email syntax, Stale (>1yr)
100 · Diego Novak · Head of Product · clean
80 · Noor Haddad · (none) · Missing title
47 · Liam Garcia · Marketing Assistant · Bad email syntax, Junior persona, No LinkedIn
100 · Emma Rossi · VP Product · clean
82 · Yuki Tanaka · Director of Growth · No LinkedIn, Stale (>1yr)
75 · Alex Chen · Head of Product · Duplicate
100 · Sara Lopez · CPO · clean
75 · Omar Costa · Head of Growth · Duplicate
37 · Nina Schmidt · (none) · Missing email, Missing title, No LinkedIn
100 · Raj Singh · Product Analytics Lead · clean
100 · Zoe Muller · Head of Product · clean
62 · Ivan Novak · Student · Free-mail on B2B, Junior persona, No LinkedIn
100 · Mei Lin · VP Product · clean

Problem

A CRM accumulates rot: duplicate contacts, missing titles (so persona scoring is blind), free-mail and syntax-broken emails (which burn sender reputation), and stale records. Reps waste time, sequences double-touch prospects, and deliverability quietly degrades — and nobody owns the number.

Who it's for

RevOps / GTM Eng owning CRM integrity at a 50–600-person SaaS company. Stakeholders: SDR leaders (deliverability), AEs (clean handoffs), and the CRO (forecast trust).

How it works

  1. A weekly cron job scores every contact, writing an issues[] array and a 0–100 health score to a shared table.
  2. Safe, idempotent auto-fixes are applied (normalize email casing, collapse typos).
  3. A reviewable merge + enrichment plan is emitted (deletes and enrichment are never silent).
  4. A Clay waterfall backfills missing titles and LinkedIn URLs; the HubSpot merge API handles dupes.
  5. An LLM writes the weekly exec data-quality summary, which is posted to Slack.
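
Steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the penalty weights are inferred from the sample records on this page (for example, "Missing title" alone scores 80, and "Missing title" plus "No LinkedIn" scores 72), and the field names, free-mail domain list, and email regex are hypothetical stand-ins.

```python
import re

# ASSUMPTION: penalties inferred from the sample records on this page,
# not taken from the project's source.
PENALTIES = {
    "Missing email": 35,
    "Bad email syntax": 30,
    "Duplicate": 25,
    "Missing title": 20,
    "Free-mail on B2B": 15,
    "Junior persona": 15,
    "Stale (>1yr)": 10,
    "No LinkedIn": 8,
}

# ASSUMPTION: illustrative free-mail list and a deliberately loose syntax check.
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(email):
    """Idempotent auto-fix: trim whitespace and lowercase the address."""
    return email.strip().lower()


def find_issues(contact):
    """Return the issues[] array for one contact (hypothetical field names)."""
    issues = []
    email = normalize_email(contact.get("email") or "")
    if not email:
        issues.append("Missing email")
    elif not EMAIL_RE.match(email):
        issues.append("Bad email syntax")
    elif email.rsplit("@", 1)[1] in FREE_MAIL:
        issues.append("Free-mail on B2B")
    if not (contact.get("title") or "").strip():
        issues.append("Missing title")
    if not contact.get("linkedin_url"):
        issues.append("No LinkedIn")
    return issues


def health_score(issues):
    """Start at 100 and subtract one penalty per issue, floored at 0."""
    return max(0, 100 - sum(PENALTIES[i] for i in issues))


if __name__ == "__main__":
    contact = {"email": " Jordan@Example.com ", "title": "", "linkedin_url": None}
    issues = find_issues(contact)
    print(issues, health_score(issues))  # ['Missing title', 'No LinkedIn'] 72
```

Because `normalize_email` gives the same result when applied twice, rerunning the weekly job should not change a record that is already clean.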

Outcome

On the bundled 1,041-contact dataset: 93 duplicates, 166 missing titles, 278 stale records, and 53 deliverability-risk records identified.
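
One way the duplicate groups behind a reviewable merge plan might be built is sketched below. The page does not state the matching rule or the survivorship rule, so both are assumptions here: records match on normalized email, and the most recently updated record is kept. The `id` and `updated_at` fields are hypothetical.

```python
from collections import defaultdict


def merge_plan(contacts):
    """Group contacts by normalized email and propose merges for review.

    ASSUMPTION: matching on lowercased, trimmed email and keeping the most
    recently updated record are illustrative choices, not the project's rules.
    """
    groups = defaultdict(list)
    for contact in contacts:
        key = (contact.get("email") or "").strip().lower()
        if key:  # contacts without an email cannot be matched this way
            groups[key].append(contact)

    plan = []
    for key, members in groups.items():
        if len(members) < 2:
            continue
        # ISO-8601 date strings sort chronologically as plain strings.
        ordered = sorted(members, key=lambda c: c["updated_at"], reverse=True)
        plan.append({
            "match_key": key,
            "keep_id": ordered[0]["id"],
            "merge_ids": [c["id"] for c in ordered[1:]],
            "status": "needs_review",  # nothing is merged silently
        })
    return plan


if __name__ == "__main__":
    contacts = [
        {"id": 1, "email": "alex@acme.io", "updated_at": "2024-05-01"},
        {"id": 2, "email": "Alex@Acme.io ", "updated_at": "2023-01-15"},
        {"id": 3, "email": "sam@acme.io", "updated_at": "2024-02-02"},
    ]
    for entry in merge_plan(contacts):
        print(entry)
```

The function only emits a plan. Executing the merges stays a separate, reviewed step.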

Executing the plan lifts average data health from 86.2 to a projected 93.6, with 344 enrichments queued.

Suppressing the 53 deliverability-risk records protects the sender reputation that P1's entire outbound motion depends on. This is the unglamorous work that makes the rest of the portfolio trustworthy.

How it scales with paid data

  • Point the warehouse credential at a HubSpot/Salesforce mirror; execute merges via the Merge API.
  • Add email verification (NeverBounce/ZeroBounce) for a verified email status.
  • Feed health scores into routing (P3) and outbound suppression (P1).
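
As a rough sketch of the last bullet, a downstream consumer could read each contact's issues and score and decide what to do with it. The set of issues treated as deliverability risk follows the Problem section (free-mail and syntax-broken emails). The `min_score` threshold of 70 and the three outcome labels are hypothetical.

```python
# ASSUMPTION: which issues count as deliverability risk is taken from the
# Problem section; the project may define the set differently.
DELIVERABILITY_ISSUES = {"Bad email syntax", "Free-mail on B2B"}


def outbound_decision(record, min_score=70):
    """Return 'suppress', 'hold_for_enrichment', or 'route' for one record.

    `record` is a dict with an `issues` list and a 0-100 `score`.
    `min_score` is a hypothetical routing threshold.
    """
    if DELIVERABILITY_ISSUES & set(record["issues"]):
        return "suppress"  # keep risky addresses out of outbound sequences
    if record["score"] < min_score:
        return "hold_for_enrichment"  # too incomplete to route yet
    return "route"


if __name__ == "__main__":
    samples = [
        {"name": "Taylor Diaz", "score": 85, "issues": ["Free-mail on B2B"]},
        {"name": "Nina Schmidt", "score": 37,
         "issues": ["Missing email", "Missing title", "No LinkedIn"]},
        {"name": "Priya Nair", "score": 100, "issues": []},
    ]
    for s in samples:
        print(s["name"], outbound_decision(s))
```

Note that suppression is checked before the score, so a record with a high score and a risky email is still kept out of outbound.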

Stack

n8n · Postgres / Supabase · HubSpot · Clay · LLM · Slack

Skills shown

SQL · Dedupe logic · Data-quality scoring · AI normalization · Scheduling