How to Vet Financial Data Sources

Equicurious Team · Intermediate · Published 2025-08-03 · Updated 2026-03-21

Most investors never question where their numbers come from—and it costs them. You pull a P/E ratio from a free finance site, plug it into a spreadsheet, and make a buy decision based on data that may not match what the company actually reported. A 2023 study published in the Journal of Accounting and Economics found that over 90% of variables in major databases like Compustat and FactSet contain discrepancies from as-filed SEC data—discrepancies large enough to alter research conclusions in nearly a third of tested anomalies (Du, Huddart, and Jiang, 2023). The counter-move isn't paranoia. It's a repeatable process for checking your data before it checks your portfolio.

TL;DR: Financial data passes through multiple hands before reaching your screen. Each intermediary can introduce errors. A simple cross-referencing habit—primary source first, two independent confirmations, staleness checks—catches most problems before they affect your decisions.

Primary vs. Secondary Sources (Know the Difference)

A primary source is the original filing or document produced by the reporting entity—a 10-K filed directly with the SEC via EDGAR, an earnings release issued by the company, a central bank publication. There is no intermediary standardization or interpretation layer between you and the numbers.

A secondary source is a data provider that aggregates, standardizes, or interprets those primary filings. Morningstar, Yahoo Finance, Bloomberg, and Google Finance all fall into this category. They take each company's unique financial statements and force them into predefined templates. That mapping process is where errors creep in.

The point is: secondary sources trade accuracy for convenience. They let you compare Company A to Company B in a standardized format, but the standardization itself introduces risk. When a company reports a non-standard line item (say, a one-time litigation settlement netted against revenue), the provider must decide where it goes. Different providers make different choices.

Primary source → Provider standardization → Template mapping → Your screen

Each arrow is a potential error point. The Du et al. study found these weren't edge cases—they were the norm. Over 90% of variables showed discrepancies. In 6 of 21 accounting anomalies tested, the discrepancies were large enough to flip the research conclusion entirely.

Why Free Data Aggregators Are Especially Risky

On July 3, 2017, Yahoo Finance and Google Finance displayed wildly incorrect prices for major tech stocks including Apple, Amazon, and Microsoft. The cause: a Nasdaq test data feed (the UTP feed) pushed erroneous prices, and free aggregators relayed the data without independent validation. No human checked. No cross-reference fired. The numbers simply appeared on millions of screens.

This wasn't a one-off. Free aggregators operate on thin margins with minimal editorial oversight; they prioritize speed and breadth over accuracy. The takeaway: free data sites should never be your sole source for investment decisions. They're useful for quick scans and watchlist monitoring (that's fine), but any number you plan to act on needs verification against a primary filing.

Why this matters: if you're calculating a valuation multiple—say a trailing P/E ratio—and the earnings figure on your free site includes or excludes a one-time charge differently than the company reported it, your multiple is wrong. The S&P 500 historical average trailing P/E sits around 15–17x. A data error that shifts reported earnings by even 5–10% can move a stock from "fairly valued" to "buy" territory (or vice versa) in your analysis.
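The sensitivity described above is easy to see in a few lines. This is a minimal sketch with purely illustrative numbers: the same share price, but an EPS figure that an aggregator has inflated by 8% through a different one-time-charge treatment.

```python
# Hypothetical figures showing how a small earnings-data error moves a
# trailing P/E across a decision boundary. All numbers are illustrative.

def trailing_pe(price: float, eps_ttm: float) -> float:
    """Trailing P/E = share price / trailing-twelve-month EPS."""
    return price / eps_ttm

price = 45.00
eps_reported = 3.00            # EPS as filed in the 10-K
eps_aggregator = 3.00 * 1.08   # aggregator includes a one-time gain (+8%)

pe_true = trailing_pe(price, eps_reported)     # 15.0x: roughly fairly valued
pe_shown = trailing_pe(price, eps_aggregator)  # ~13.9x: looks "cheap"

print(f"P/E from filing:     {pe_true:.1f}x")
print(f"P/E from aggregator: {pe_shown:.1f}x")
```

An 8% earnings distortion moves the multiple from the middle of the 15–17x historical band to below it, which is exactly the kind of shift that flips "hold" to "buy" in a screen.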

A Worked Example: Cross-Referencing ROIC Data

Suppose you're evaluating a mid-cap industrial company. You pull its return on invested capital (ROIC) from three sources:

Source                      Reported ROIC   Type
Yahoo Finance               14.2%           Secondary (free aggregator)
Morningstar                 12.8%           Secondary (paid/analyst-curated)
Company 10-K (SEC EDGAR)    11.9%           Primary
The spread here is 2.3 percentage points between the highest and lowest figures. The median ROIC for S&P 500 companies runs approximately 10–12%, so the difference between 11.9% and 14.2% is the difference between "roughly average" and "meaningfully above average."
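A spread check like this can be mechanized. Below is a minimal sketch, using the figures from the table above, that flags a metric whenever the gap between sources exceeds a tolerance you choose (the 1.0pp threshold here is an assumption, not a standard).

```python
# Cross-source sanity check: flag a metric when the spread across sources
# exceeds a tolerance. Figures are the ROIC values from the table above.

def spread_pp(values: dict) -> float:
    """Spread in percentage points between the highest and lowest figures."""
    return max(values.values()) - min(values.values())

roic_sources = {
    "Yahoo Finance": 14.2,
    "Morningstar": 12.8,
    "10-K (EDGAR)": 11.9,
}

TOLERANCE_PP = 1.0  # assumed tolerance: anything wider demands the primary filing

if spread_pp(roic_sources) > TOLERANCE_PP:
    print(f"Spread of {spread_pp(roic_sources):.1f}pp exceeds tolerance; "
          f"resolve against the 10-K.")
```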

Phase 1: The Setup. You see 14.2% on Yahoo Finance and think the company earns well above its cost of capital. You're inclined to buy.

Phase 2: The Trigger. You check Morningstar—12.8%. Lower, but still decent. Then you pull the 10-K from EDGAR and calculate ROIC yourself using net operating profit after tax divided by invested capital. You get 11.9%.

Phase 3: The Outcome. The company's ROIC is average, not exceptional. The free aggregator inflated the figure by treating a one-time asset sale as part of operating income (a common standardization choice). The paid source got closer but still deviated.

The practical point: The 10-K is the ground truth. Everything else is an interpretation of it. If you had relied on Yahoo Finance alone, you'd have overestimated this company's economic moat.

Mechanical alternative: For any metric you're using in a buy/sell decision, pull the 10-K from EDGAR (free, available at sec.gov/search-filings, covering over 21 million filings from more than 150,000 entities since 1993) and calculate the number yourself. It takes 15 minutes. That's cheap insurance.
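The self-calculation step can be sketched in a few lines. Note that invested-capital definitions vary between providers (which is one reason the three sources above disagree); this sketch uses one common convention, debt plus equity minus cash, with line items that are entirely hypothetical.

```python
# ROIC = NOPAT / invested capital, from 10-K line items.
# Invested capital here = total debt + total equity - cash (one common
# convention; providers differ, which is itself a source of discrepancies).

def roic(operating_income: float, tax_rate: float,
         total_debt: float, total_equity: float, cash: float) -> float:
    nopat = operating_income * (1 - tax_rate)      # net operating profit after tax
    invested_capital = total_debt + total_equity - cash
    return nopat / invested_capital

# Illustrative figures (not from any real filing), in $ millions:
r = roic(operating_income=450.0, tax_rate=0.21,
         total_debt=1_200.0, total_equity=2_100.0, cash=312.5)
print(f"ROIC: {r:.1%}")  # ≈ 11.9%, matching the primary-source figure above
```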

The Restatement Problem (Your Data Might Change After You Use It)

Financial statements aren't always final. A "Big R" restatement means a company's previously issued financials cannot be relied upon—they must be re-filed with the SEC. Restatements have occurred at a rate of approximately 3% of public companies per year from 2005 to 2024. In 2024, Big R restatements hit a 9-year high, up 7% year-over-year (Audit Analytics).

A significant driver: in May 2024, the SEC barred audit firm BF Borgers and its owner from practicing before the Commission. Dozens of former clients required re-audits by new firms. The most common error types were expense recognition, revenue recognition, and debt/equity misclassification.

A "little r" restatement (a revision correcting an immaterial error) is far more common and doesn't require re-filing, but it still means the numbers you downloaded last quarter may no longer be accurate.

The point is: data has a shelf life. Financial data older than 45 days after quarter-end (the outer SEC 10-Q filing deadline—non-accelerated filers get 45 days; accelerated and large accelerated filers get 40) should be treated as potentially outdated. And if a company has had any Big R restatement in the prior 3 years, that's a red flag requiring deeper diligence—not necessarily disqualifying, but demanding more scrutiny.
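The staleness rule is simple enough to automate. A minimal sketch, assuming calendar quarters and the 45-day outer deadline discussed above: data is flagged once it predates the most recent quarter-end and the filing window for that quarter has passed.

```python
# Staleness check: flag data that predates the last quarter-end once more
# than 45 days (the outer 10-Q deadline) have elapsed since that quarter-end.

from datetime import date, timedelta

def quarter_end(d: date) -> date:
    """Most recent calendar quarter-end on or before d."""
    q_month = ((d.month - 1) // 3) * 3  # 0, 3, 6, or 9
    if q_month == 0:
        return date(d.year - 1, 12, 31)
    # Last day of q_month = first day of the next month, minus one day.
    return date(d.year, q_month + 1, 1) - timedelta(days=1)

def is_stale(data_as_of: date, today: date, limit_days: int = 45) -> bool:
    qe = quarter_end(today)
    return data_as_of < qe and (today - qe).days > limit_days

# Data pulled March 1, checked June 20: Q1 ended March 31, 81 days ago.
print(is_stale(date(2025, 3, 1), date(2025, 6, 20)))  # True: refresh it
```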

Restatement → Stale data in your model → Incorrect valuation → Bad decision

The SEC enforces this aggressively. In fiscal year 2024, the Commission issued $8.2 billion in total financial remedies (a record), including $6.1 billion in disgorgement and $2.1 billion in civil penalties. They issued 124 officer and director bars—the second-highest in a decade.

Survivorship Bias and Look-Ahead Bias (Hidden Data Distortions)

Two biases silently corrupt financial databases without leaving obvious traces.

Survivorship bias distorts historical fund and index data by excluding entities that ceased to exist—funds that merged, liquidated, or delisted. The impact: average mutual fund returns inflated by 1–2 percentage points per year. If you're comparing a fund's track record against a historical average, and that average only includes survivors, you're grading on a curve that doesn't exist.

Look-ahead bias means using data that wouldn't have been available at the point in time you're studying. The classic example: backtesting a strategy using restated earnings rather than originally reported figures. Databases often silently overwrite historical values with corrected ones (that's the whole point of restatements), so your "1998 earnings" figure may actually be a 2001 correction.

The test: Can you confirm that the historical data in your model reflects what was actually known at each point in time? If you can't, your backtest results are unreliable.
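The point-in-time test can be made concrete. This sketch assumes a simple (hypothetical) data shape: for each metric you keep every reported value with the date it became public, and a backtest looks up only what was known on its as-of date—mirroring the "1998 earnings silently overwritten in 2001" example above.

```python
# Point-in-time lookup: for a backtest date, return only the value that was
# publicly known then, never a later restated figure.

def as_known_on(history: list, backtest_date: str) -> float:
    """history: (date_reported, value) pairs sorted ascending by date (ISO strings).
    Returns the latest value reported on or before backtest_date."""
    known = [v for reported, v in history if reported <= backtest_date]
    if not known:
        raise ValueError("no figure was available on that date")
    return known[-1]

# 1998 EPS as originally reported, then restated in 2001 (illustrative):
eps_history = [("1999-02-15", 2.10), ("2001-06-30", 1.65)]

print(as_known_on(eps_history, "2000-01-01"))  # 2.1: what a 2000 investor saw
print(as_known_on(eps_history, "2002-01-01"))  # 1.65: the restated figure
```

A backtest that pulls the 1.65 figure for a simulated 2000 decision is using information that did not exist yet—that is look-ahead bias in one line.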

Professional Standards That Signal Trustworthy Data

Two frameworks help you evaluate whether a data source meets professional-grade standards.

CFA Institute Standard V(A)—Diligence and Reasonable Basis requires analysts to make reasonable inquiries into the sources and accuracy of all data used in investment analysis. It mandates measurable criteria for assessing outside data providers. If your data provider can't articulate their methodology and error-correction process, that's a problem.

GIPS (Global Investment Performance Standards) are adopted by over 1,600 organizations across 50 markets worldwide. All top 25 global asset managers claim GIPS compliance. GIPS requires third-party verification of performance data on a firm-wide basis. When evaluating a fund manager's track record, GIPS compliance is a baseline credibility signal (not a guarantee, but its absence is a warning).

Morningstar, as a secondary source, employs more than 100 analysts covering over 3,700 unique funds globally. Their star ratings use risk-adjusted returns over 3-, 5-, and 10-year periods. Their forward-looking Medalist ratings evaluate funds over a full market cycle of at least 5 years. This level of methodology transparency is what you should expect from any source you rely on.

Detection Signals (You're Probably Using Bad Data If...)

You're likely relying on unvetted data if:

  • You've never opened a 10-K on EDGAR for a stock you own
  • You use a single free website as your only data source for buy/sell decisions
  • You've compared fund returns without checking whether the database adjusts for survivorship bias
  • You can't name the audit firm for your largest holding (or whether it received a clean opinion)
  • You've backtested a strategy without confirming whether the data uses originally reported or restated figures

An unqualified (clean) audit opinion from a PCAOB-registered firm is the minimum bar. A going-concern qualification signals material doubt about a company's ability to continue operating—and demands immediate further investigation, not just a footnote glance.

The Materiality Question (How Wrong Is Too Wrong?)

Not every data discrepancy matters. The SEC uses a quantitative benchmark of roughly 5% of pre-tax income (per Staff Accounting Bulletin No. 99), combined with qualitative factors, to assess whether a misstatement would influence a reasonable investor's judgment.

Apply this to your own process. If two sources disagree on a company's operating margin by 0.3 percentage points, that's noise. If they disagree by 2 full points on a company with thin margins, that's material—and you need the primary source to resolve it.
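Applied to your own process, the quantitative half of the SAB No. 99 screen is a one-liner. This sketch handles only the numeric gate; the qualitative factors the SEC also weighs can't be reduced to a threshold.

```python
# SAB No. 99 quantitative screen: a discrepancy is worth resolving against
# the primary filing when it exceeds roughly 5% of pre-tax income.
# Qualitative factors still apply; this is only the numeric gate.

def is_material(discrepancy: float, pretax_income: float,
                threshold: float = 0.05) -> bool:
    return abs(discrepancy) > threshold * abs(pretax_income)

# Two sources disagree on operating income by $4M; pre-tax income is $60M:
print(is_material(4.0, 60.0))  # True: 4/60 ≈ 6.7%, above the 5% benchmark
print(is_material(2.0, 60.0))  # False: ≈ 3.3%, likely noise
```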

Vetting Checklist (Tiered by ROI)

Essential (high ROI)—prevents 80% of data errors:

  • Verify against the primary source. Pull the relevant 10-K or 10-Q from EDGAR before acting on any secondary-source number
  • Cross-reference at least 2 independent sources against the primary filing. If all three agree within a reasonable tolerance, proceed. If they diverge, the primary filing wins
  • Check the audit opinion. Only rely on financials accompanied by an unqualified opinion from a PCAOB-registered firm
  • Confirm data freshness. Treat financial data older than 45 days post-quarter-end as potentially stale

High-impact (workflow integration):

  • Screen for restatements. Any Big R restatement in the prior 3 years triggers enhanced diligence
  • Adjust for survivorship bias. When evaluating historical fund performance, subtract 1–2 percentage points from reported averages
  • Verify GIPS compliance for any fund manager whose track record influences your decision
  • Use XBRL data (available on EDGAR since 2009) for machine-readable cross-company comparisons rather than manually transcribing from PDFs

Optional (good for active stock-pickers):

  • Build a personal data reconciliation spreadsheet comparing 3+ sources for your key metrics
  • Monitor PCAOB restatement data quarterly to flag audit-quality trends in sectors you're invested in
  • Subscribe to SEC EDGAR alerts for companies in your portfolio (EDGAR's APIs update in near-real time as filings are disseminated)

Your Next Step (Do This Today)

Pick your largest holding. Go to sec.gov/search-filings, pull its most recent 10-K, and look up three numbers: revenue, net income, and total debt. Compare those figures against whatever source you normally use. Note any discrepancies and which direction they go.

If the numbers match within 1–2%, your source is reasonably reliable for that company. If they don't, you've just found a data quality problem worth fixing—before it compounds into a portfolio quality problem.
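This reconciliation is worth keeping as a reusable snippet. A minimal sketch with illustrative figures only: compare the three 10-K numbers against your usual source and flag anything off by more than the 2% tolerance suggested above.

```python
# "Do this today" reconciliation: compare 10-K figures against your usual
# source and flag deviations beyond a tolerance. All figures illustrative.

def reconcile(filed: dict, source: dict, tol: float = 0.02) -> dict:
    """Returns relative deviation per metric, for metrics exceeding tol."""
    flags = {}
    for metric, true_val in filed.items():
        dev = (source[metric] - true_val) / true_val
        if abs(dev) > tol:
            flags[metric] = dev
    return flags

# $ millions; ten_k from EDGAR, my_src from whatever site you normally use:
ten_k  = {"revenue": 4_820.0, "net_income": 512.0, "total_debt": 1_950.0}
my_src = {"revenue": 4_815.0, "net_income": 548.0, "total_debt": 1_950.0}

print(reconcile(ten_k, my_src))  # flags net_income: ~7% high, worth fixing
```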

For a related framework on evaluating the people and publications interpreting this data, see Checklist for Evaluating Investment Newsletters. For building your own calculations from primary data, see Building a Simple Valuation Model in a Spreadsheet.
