What Is a Data Enrichment Waterfall?
A data enrichment waterfall is a multi-provider architecture that queries multiple data sources in a specific sequence to maximize the coverage and accuracy of contact and company information. Instead of relying on a single data provider, which typically covers only 40-60% of your target market, a waterfall chains together three to six providers in priority order, passing records through each one until the required data points are found and verified.
The concept is simple but the execution matters enormously. When you search for the VP of Marketing at a mid-market SaaS company, Apollo might have a verified email for that person 55% of the time. Of the 45% Apollo misses, People Data Labs might find 30%, and Lusha or Clearbit might catch another 15% of what still remains. With those illustrative rates, coverage climbs from 55% to roughly 73% after two fallbacks; with higher fallback hit rates or additional layers, a well-designed waterfall can reach 90%+ coverage. The cost per enriched record also stays well below that of a single premium provider because you only call expensive providers when cheaper ones fail.
Why Single-Provider Enrichment Fails
Every data provider has blind spots. Apollo is strongest for US-based technology companies with 50-5,000 employees, but its coverage drops significantly for EMEA markets, companies outside tech, and enterprises above 10,000 employees. ZoomInfo has the deepest coverage for large enterprises and traditional industries but charges significantly more per record and has less coverage for startups and small businesses. People Data Labs has the broadest raw coverage at over 1.5 billion person records, but its email verification rates are lower than purpose-built providers.
The numbers tell the story clearly. We analyzed enrichment results across 50,000 records for a B2B SaaS client targeting mid-market financial services companies. Using Apollo alone, we found verified work emails for 48% of target contacts. Using ZoomInfo alone, we hit 52%. Using PDL alone, we reached 44%. But when we built a waterfall that queried all three in sequence, with Clearbit as a fourth fallback, we achieved 91% verified email coverage. That is not a marginal improvement. It is the difference between a campaign that reaches half your market and one that reaches nearly all of it.
Beyond coverage, single-provider strategies create data quality risks. Every provider has records that are stale, incorrect, or partially complete. A waterfall architecture allows you to cross-reference data points across providers, flagging discrepancies and using consensus to improve accuracy. If two out of three providers agree on a person's current title and company, you can have high confidence in that data even if the third provider has outdated information.
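As a rough illustration of the consensus idea, the sketch below accepts a field value only when at least two providers agree on it. The function name `consensus_value`, the `min_votes` threshold, and the case-insensitive comparison are assumptions made for this example, not part of any provider's API.

```python
from collections import Counter
from typing import Optional


def consensus_value(values: list[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the value that at least `min_votes` providers agree on, else None."""
    # Drop empty answers and ignore surrounding whitespace.
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    # Count answers case-insensitively so "VP" and "vp" are the same vote.
    counts = Counter(v.lower() for v in cleaned)
    winner, votes = counts.most_common(1)[0]
    if votes < min_votes:
        return None
    # Return the first original spelling of the winning value.
    return next(v for v in cleaned if v.lower() == winner)


# Two of three hypothetical provider answers agree on the title.
titles = ["VP of Marketing", "vp of marketing", "Director of Marketing"]
agreed_title = consensus_value(titles)
```

When no value reaches the threshold the function returns None, which a pipeline could treat as a flag for manual review rather than picking a provider arbitrarily.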
Waterfall Architecture: How It Works
Provider Ordering
The sequence in which you query providers matters for both cost and quality. The general principle is to start with the cheapest provider that has acceptable coverage for your target segment, then fall through to progressively more expensive or specialized providers for records that were not enriched. A typical ordering for US B2B contacts looks like this: Apollo (low cost, good tech coverage) as the first pass, then People Data Labs (broad coverage, moderate cost), then Clearbit or ZoomInfo (deeper firmographic data, higher cost), and finally Lusha or RocketReach as a last-resort fallback for hard-to-find contacts.
The optimal ordering depends on your target market. If you are targeting enterprise accounts in traditional industries like manufacturing, insurance, or banking, you might lead with ZoomInfo because its enterprise coverage is superior. If your ICP is startups and mid-market tech companies, Apollo should be your first provider because its coverage in that segment is strong and its cost per record is among the lowest in the market.
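One way to make the ordering explicit is to keep it as configuration keyed by segment. The sketch below is only an illustration: the segment keys and provider identifiers are hypothetical, and beyond "lead with ZoomInfo for enterprise" and "lead with Apollo for startups and mid-market tech", the rest of each ordering is an assumption.

```python
# Default ordering for US B2B contacts, cheapest acceptable provider first.
DEFAULT_ORDER = ["apollo", "pdl", "clearbit", "lusha"]

# Hypothetical segment keys; adjust to your own ICP definitions.
SEGMENT_ORDER = {
    "startup_midmarket_tech": ["apollo", "pdl", "clearbit", "lusha"],
    # Lead with ZoomInfo for enterprise accounts in traditional industries.
    # Everything after the first entry is an assumed ordering.
    "enterprise_traditional": ["zoominfo", "apollo", "pdl", "lusha"],
}


def provider_order(segment: str) -> list[str]:
    """Return a copy of the provider sequence for a segment, or the default."""
    return list(SEGMENT_ORDER.get(segment, DEFAULT_ORDER))
```

Keeping the sequence in data rather than in code makes it cheap to swap the order when coverage shifts.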
Fallthrough Logic
Fallthrough logic defines when a record moves from one provider to the next. The simplest approach is binary: if the current provider returns a verified email, stop. If not, move to the next provider. But more sophisticated waterfalls use multi-field fallthrough. You might require a verified email AND a current title AND a phone number for the record to be considered complete. If any required field is missing, the record falls through to the next provider, which attempts to fill the gaps.
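The multi-field version can be sketched in a few lines. This is a minimal illustration, not a production implementation: the providers are stub functions standing in for real API calls, the field names are hypothetical, and the sketch assumes each provider only returns an email it considers verified.

```python
from typing import Callable, Optional

Record = dict[str, Optional[str]]
Provider = Callable[[Record], Record]

# A record is complete only when every one of these fields is filled.
REQUIRED_FIELDS = ("email", "title", "phone")


def missing_fields(record: Record, required=REQUIRED_FIELDS) -> list[str]:
    """List the required fields that are still empty."""
    return [name for name in required if not record.get(name)]


def run_waterfall(record: Record, providers: list[tuple[str, Provider]],
                  required=REQUIRED_FIELDS) -> tuple[Record, list[str]]:
    """Call providers in order until every required field is filled."""
    result = dict(record)
    called = []
    for name, lookup in providers:
        gaps = missing_fields(result, required)
        if not gaps:
            break  # record is complete, stop paying for lookups
        called.append(name)
        found = lookup(result)
        # Fill only the gaps; earlier providers keep priority.
        for field in gaps:
            if found.get(field):
                result[field] = found[field]
    return result, called


# Stub providers (hypothetical data) standing in for real API calls.
def provider_a(record: Record) -> Record:
    return {"email": "jane@example.com", "title": "VP Marketing"}


def provider_b(record: Record) -> Record:
    return {"phone": "+14155550134"}


def provider_c(record: Record) -> Record:
    return {"email": "other@example.com"}


enriched, called = run_waterfall(
    {"domain": "example.com"},
    [("a", provider_a), ("b", provider_b), ("c", provider_c)],
)
```

In this run the third provider is never called, because the first two already filled every required field.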
In Clay, fallthrough logic is implemented using conditional columns and enrichment steps. You create an enrichment step for your first provider, then a second enrichment step that only executes when a condition is met, such as 'email is empty' or 'email confidence is below 90%'. This conditional execution is critical for cost control. If Apollo returns a valid email for 55% of your records, you only pay for the remaining 45% at the next provider. Without conditional logic, you would pay every provider for every record, which destroys your cost advantage.
Cost Optimization
Cost optimization in a waterfall is about minimizing the total number of API calls while maximizing coverage. The math works strongly in your favor. Suppose you have 10,000 records. Apollo charges $0.03 per enrichment and covers 55% of records. You pay $300 for 5,500 enriched records. The remaining 4,500 go to PDL at $0.05 per record ($225), which covers 30% of those, yielding 1,350 more records. The remaining 3,150 go to Clearbit at $0.10 per record ($315), covering 40% of those for 1,260 more. Your total: 8,110 enriched records for $840, or $0.104 per enriched record. Compare this to using ZoomInfo alone at $0.15 per record for all 10,000 ($1,500) but only getting 5,200 back. The waterfall gives you 56% more records for 44% less money.
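The arithmetic above can be reproduced with a short script, which is also handy for testing other orderings. The `Layer` class and `waterfall_cost` function are names invented for this sketch; the prices and hit rates are the illustrative figures from the paragraph above, not quoted provider pricing.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    cost_per_call: float  # price paid for every record sent to this layer
    hit_rate: float       # share of records reaching this layer that it enriches


def waterfall_cost(total_records: int, layers: list[Layer]) -> tuple[int, float]:
    """Return (enriched_records, total_cost) for a sequential waterfall."""
    remaining = total_records
    enriched = 0
    cost = 0.0
    for layer in layers:
        # You pay for every record you send, whether or not it is found.
        cost += remaining * layer.cost_per_call
        hits = round(remaining * layer.hit_rate)
        enriched += hits
        remaining -= hits
    return enriched, cost


layers = [
    Layer("apollo", 0.03, 0.55),
    Layer("pdl", 0.05, 0.30),
    Layer("clearbit", 0.10, 0.40),
]
enriched, cost = waterfall_cost(10_000, layers)
cost_per_enriched = cost / enriched
print(f"{enriched} records for ${cost:,.0f} (${cost_per_enriched:.3f} each)")
```

Reordering the `layers` list and rerunning shows quickly how much a more expensive first pass would add to the total.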
Step-by-Step: Building a Waterfall in Clay
Step 1: Prepare Your Input Data
Start with a clean input table in Clay. At minimum, you need company name, company domain, and the target persona's title or role. If you have LinkedIn URLs, include those as they dramatically improve match rates across all providers. Import your data from a CSV, Google Sheet, or CRM integration. Before running any enrichment, deduplicate your list by domain + title combination to avoid paying for the same lookup twice.
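If you clean the list before importing it into Clay, the domain + title deduplication can be sketched as below. The column names `domain` and `title` and the normalization rules (lowercasing, stripping a leading `www.`, collapsing whitespace) are assumptions about your input file.

```python
def dedupe_by_domain_and_title(rows: list[dict]) -> list[dict]:
    """Keep the first row for each (domain, title) combination."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize so "www.Acme.com" and "acme.com" count as the same domain.
        domain = (row.get("domain") or "").strip().lower().removeprefix("www.")
        # Collapse repeated whitespace and ignore case in titles.
        title = " ".join((row.get("title") or "").lower().split())
        key = (domain, title)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


rows = [
    {"domain": "www.Acme.com", "title": "VP  Marketing"},
    {"domain": "acme.com", "title": "vp marketing"},
    {"domain": "acme.com", "title": "CFO"},
]
unique_rows = dedupe_by_domain_and_title(rows)
```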
Step 2: Configure Provider 1, Apollo
Add an Apollo People Enrichment column to your Clay table. Map the input fields: company domain, person title (use a title keyword like 'VP Marketing' rather than an exact match), and any other available identifiers. Set the enrichment to return the top match. Apollo will return the person's full name, current title, verified email, LinkedIn URL, and basic company data. Create a formula column that checks whether the Apollo email field is non-empty and the verification status is 'valid'. This becomes your fallthrough trigger.
Step 3: Configure Provider 2, People Data Labs
Add a People Data Labs enrichment column with a condition: only run when the Apollo verification column is false or empty. Map the same input fields. PDL returns similar data points but draws from a different underlying dataset, so it catches records Apollo misses. Create another formula column that consolidates the best available email: use Apollo's email if it exists and is verified, otherwise use PDL's email if available.
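Clay formula columns have their own syntax, so the following is only a language-neutral sketch, written in Python, of the logic the two formula columns express. The argument names are hypothetical; the 'valid' status string follows the Apollo step above.

```python
from typing import Optional


def apollo_hit(apollo_email: Optional[str], apollo_status: Optional[str]) -> bool:
    """Fallthrough trigger: True means no further provider is needed."""
    return bool(apollo_email) and apollo_status == "valid"


def best_email(apollo_email: Optional[str], apollo_status: Optional[str],
               pdl_email: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Return (email, source), preferring a verified Apollo email over PDL."""
    if apollo_hit(apollo_email, apollo_status):
        return apollo_email, "apollo"
    if pdl_email:
        return pdl_email, "pdl"
    # Neither provider produced a usable email: fall through to provider 3.
    return None, None
```

Returning the source alongside the email makes it easy to record which provider supplied each value.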
Step 4: Configure Provider 3, Clearbit or ZoomInfo
Add your third provider with a condition that fires only when both previous providers failed to return a verified email. This layer catches the hardest-to-find contacts and is typically your most expensive provider per record. Because you have already filtered out 70-80% of your list through cheaper providers, the cost impact is manageable.
Step 5: Email Verification
Never skip this step. Even 'verified' emails from data providers should be run through an independent verification service. Use a tool like ZeroBounce, NeverBounce, or MillionVerifier as a final column in your Clay table. This catches emails that were valid when the provider last checked but have since become invalid due to job changes, company email system changes, or other factors. Expect to lose 5-10% of your records at this stage. That is normal and far better than sending to bad addresses and damaging your domain reputation.
Step 6: Normalization and Output
Create formula columns that normalize the output into a clean, consistent format. Standardize title formats (VP vs Vice President), clean company names (remove Inc., LLC suffixes for matching purposes), format phone numbers consistently, and merge the best data from each provider into a single set of canonical fields. Export the final enriched data to your CRM via Clay's native HubSpot or Salesforce integration, or push it to your outbound tool through a webhook.
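The normalization rules can be prototyped outside Clay before you commit them to formula columns. The sketch below is deliberately minimal and rests on assumptions: only the 'Vice President' to 'VP' mapping and the Inc./LLC suffixes mentioned above are handled, and phone formatting assumes US numbers.

```python
import re
from typing import Optional

# Trailing legal suffixes to strip for matching purposes (extend as needed).
COMPANY_SUFFIX = re.compile(r"[,\s]+(inc|llc)\.?$", re.IGNORECASE)


def normalize_title(title: str) -> str:
    """Collapse whitespace and standardize 'Vice President' to 'VP'."""
    collapsed = " ".join(title.split())
    return re.sub(r"\bvice president\b", "VP", collapsed, flags=re.IGNORECASE)


def normalize_company(name: str) -> str:
    """Remove a trailing Inc. or LLC suffix, with or without a comma."""
    return COMPANY_SUFFIX.sub("", name.strip())


def normalize_us_phone(phone: str) -> Optional[str]:
    """Format a US number as +1XXXXXXXXXX; return None if it is not 10 digits."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"+1{digits}"
```

For international lists, a dedicated phone parsing library is likely a better choice than a hand-written rule.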
Provider Comparison: Who to Use and When
Apollo
Strengths: Largest self-serve contact database with over 270 million contacts. Email verification is built-in. Strong coverage for US tech companies. Very affordable at roughly $0.02-0.05 per enrichment depending on plan. Also offers a solid email sending platform.
Weaknesses: Coverage drops outside the US and outside tech verticals. Data can be 3-6 months stale for fast-moving roles. Phone number coverage is limited.
Best for: First-pass enrichment for tech and SaaS ICP, especially when cost efficiency matters.
ZoomInfo
Strengths: Deepest coverage for enterprise accounts (10,000+ employees). Strong in traditional verticals like manufacturing, healthcare, and financial services. Excellent firmographic data including revenue estimates, org charts, and department headcounts. Good phone number coverage.
Weaknesses: Expensive, with annual contracts typically starting at $25,000+. API access requires enterprise tier. Data freshness varies by segment.
Best for: Enterprise-focused teams where deep firmographic intelligence justifies the cost.
People Data Labs
Strengths: Broadest raw coverage at 1.5+ billion person records. Pure API-first product, making it ideal for programmatic enrichment. Affordable per-record pricing. Strong for bulk enrichment jobs.
Weaknesses: Email verification is not as robust as Apollo's. Requires more post-processing to ensure data quality. UI tools are limited.
Best for: High-volume enrichment as a second or third waterfall layer, especially when you need maximum coverage.
Clearbit (HubSpot)
Strengths: Excellent technographic data showing what tools a company uses. Real-time enrichment via API is very fast. Strong company data including employee count, industry classification, and funding information. Now integrated natively into HubSpot.
Weaknesses: Contact coverage is more limited than Apollo or PDL. Person-level enrichment requires company domain as input. Pricing is bundled with HubSpot for some features.
Best for: Technographic enrichment and real-time inbound lead enrichment within HubSpot.
Lusha
Strengths: Best-in-class direct dial phone numbers. Good email coverage for European contacts. Chrome extension makes manual research fast. GDPR-compliant data sourcing.
Weaknesses: Smaller overall database than Apollo or PDL. More expensive per record. API functionality is less mature.
Best for: Finding direct dial phone numbers when phone outreach is part of your strategy, and for European contact enrichment.
Real-World Cost Savings: A Case Study
A Series B fintech company came to us spending $4,200 per month on ZoomInfo for their outbound enrichment needs. Their SDR team was processing approximately 8,000 records per month, achieving 54% email coverage. They were leaving 3,680 potential contacts on the table every month.
We built a three-layer waterfall using Apollo (first pass), PDL (second pass), and Lusha (third pass for priority accounts requiring phone numbers). The results after 90 days: email coverage increased from 54% to 89%. Total enrichment cost dropped from $4,200/month to $1,800/month. Cost per enriched contact dropped from $0.97 to $0.25. The SDR team booked 34% more meetings in the first full quarter, purely from reaching contacts they could not previously find. Annualized, the waterfall saved $28,800 in direct tool costs while generating an estimated $340,000 in additional pipeline from the improved coverage.
Batch vs Real-Time Enrichment
Batch enrichment processes large lists at scheduled intervals, typically weekly or bi-weekly. This is ideal for outbound prospecting where you build account lists, enrich them in bulk, and feed them into sequencing tools. Batch workflows are easier to build, easier to monitor, and more cost-efficient because you can optimize provider ordering across the entire dataset.
Real-time enrichment triggers instantly when a specific event occurs, such as a form submission, a website visit from a known IP, or a webhook from a partner integration. Real-time enrichment is essential for inbound lead processing, where speed matters. A lead that is enriched, scored, and routed to an AE within 60 seconds converts at 2-3x the rate of a lead that waits hours or days for manual processing.
Most production GTM systems use both: batch for outbound list building and real-time for inbound processing. The waterfall architecture works identically in both modes. The only difference is the trigger: a scheduled import for batch, a webhook event for real-time.
CRM Integration Best Practices
Your enriched data is only valuable if it flows cleanly into your CRM and outbound tools. Key integration principles: always deduplicate before writing to CRM by matching on email address or LinkedIn URL, never on name alone. Map your enriched fields to specific CRM properties and create those properties before running your first sync. Use lifecycle stage automation to ensure enriched contacts enter the correct pipeline stage. Set up duplicate detection rules in your CRM to prevent the enrichment pipeline from creating duplicate records.
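The matching rule (email or LinkedIn URL, never name alone) can be sketched as a pre-sync step. The field names `email` and `linkedin_url` and the URL cleanup are assumptions for this example; a real sync would rely on the CRM's own matching and duplicate rules as well.

```python
def match_keys(contact: dict) -> set[tuple[str, str]]:
    """Build the identifiers a contact can be matched on."""
    keys = set()
    email = (contact.get("email") or "").strip().lower()
    if email:
        keys.add(("email", email))
    url = (contact.get("linkedin_url") or "").strip().lower()
    if url:
        # Ignore tracking parameters and trailing slashes.
        keys.add(("linkedin", url.split("?")[0].rstrip("/")))
    return keys


def split_for_sync(enriched: list[dict], existing: list[dict]):
    """Split enriched contacts into creates, updates, and unmatched."""
    known = set()
    for contact in existing:
        known |= match_keys(contact)
    creates, updates, unmatched = [], [], []
    for contact in enriched:
        keys = match_keys(contact)
        if not keys:
            # No email or LinkedIn URL: hold back rather than match on name.
            unmatched.append(contact)
        elif keys & known:
            # Also catches repeats within the batch, so they update instead of duplicating.
            updates.append(contact)
        else:
            creates.append(contact)
            known |= keys
    return creates, updates, unmatched
```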
In HubSpot, use the native Clay integration or a custom webhook to create or update contacts. Set the contact owner based on territory rules during the enrichment process, not after the record arrives in HubSpot. In Salesforce, use a middleware layer like Census or a custom API integration to handle the lead creation, matching, and field mapping. Always log the enrichment source and timestamp on each record so you can track data freshness and provider performance over time.
A well-built waterfall is not a one-time project. It requires ongoing monitoring, provider evaluation, and optimization. Track your coverage rate, cost per enriched record, and email bounce rate by provider on a monthly basis. Swap provider ordering when you notice coverage shifts. Add new providers when they offer advantages for your specific ICP. The waterfall is a living system, and the teams that treat it that way consistently outperform those that build it once and forget about it.
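A monthly report on those three metrics can be computed from a simple log of provider calls. The sketch below assumes a hypothetical log format in which each row records the provider, whether the call enriched the record, what it cost, and, once an email has been sent, whether it bounced.

```python
from collections import defaultdict


def provider_metrics(rows: list[dict]) -> dict[str, dict]:
    """Compute coverage rate, cost per enriched record, and bounce rate per provider."""
    stats = defaultdict(lambda: {"calls": 0, "enriched": 0, "cost": 0.0,
                                 "sent": 0, "bounced": 0})
    for row in rows:
        s = stats[row["provider"]]
        s["calls"] += 1
        s["cost"] += row["cost"]
        if row["enriched"]:
            s["enriched"] += 1
        # "bounced" is None until an email has actually been sent.
        if row.get("bounced") is not None:
            s["sent"] += 1
            s["bounced"] += int(row["bounced"])
    report = {}
    for provider, s in stats.items():
        report[provider] = {
            "coverage_rate": s["enriched"] / s["calls"],
            "cost_per_enriched": s["cost"] / s["enriched"] if s["enriched"] else None,
            "bounce_rate": s["bounced"] / s["sent"] if s["sent"] else None,
        }
    return report


log = [
    {"provider": "apollo", "enriched": True, "cost": 0.03, "bounced": False},
    {"provider": "apollo", "enriched": False, "cost": 0.03, "bounced": None},
    {"provider": "pdl", "enriched": True, "cost": 0.05, "bounced": True},
]
report = provider_metrics(log)
```

Comparing these figures month over month is one way to notice the coverage shifts that should prompt a change in provider ordering.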