How to Vet Financial Data Sources

Equicurious Team · Intermediate · Published 2025-08-03 · Updated 2026-03-21

Most investors never question where their numbers come from—and it costs them. You pull a P/E ratio from a free finance site, plug it into a spreadsheet, and make a buy decision based on data that may not match what the company actually reported. A 2023 study published in the Journal of Accounting and Economics found that over 90% of variables in major databases like Compustat and FactSet contain discrepancies from as-filed SEC data—discrepancies large enough to alter research conclusions in nearly a third of tested anomalies (Du, Huddart, and Jiang, 2023). The counter-move isn't paranoia. It's a repeatable process for checking your data before it checks your portfolio.

TL;DR: Financial data passes through multiple hands before reaching your screen. Each intermediary can introduce errors. A simple cross-referencing habit—primary source first, two independent confirmations, staleness checks—catches most problems before they affect your decisions.

Primary vs. Secondary Sources (Know the Difference)

A primary source is the original filing or document produced by the reporting entity—a 10-K filed directly with the SEC via EDGAR, an earnings release issued by the company, a central bank publication. There is no intermediary standardization or interpretation layer between you and the numbers.

A secondary source is a data provider that aggregates, standardizes, or interprets those primary filings. Morningstar, Yahoo Finance, Bloomberg, and Google Finance all fall into this category. They take each company's unique financial statements and force them into predefined templates. That mapping process is where errors creep in.

The point is: secondary sources trade accuracy for convenience. They let you compare Company A to Company B in a standardized format, but the standardization itself introduces risk. When a company reports a non-standard line item (say, a one-time litigation settlement netted against revenue), the provider must decide where it goes. Different providers make different choices.

Primary source → Provider standardization → Template mapping → Your screen

Each arrow is a potential error point. The Du et al. study found these weren't edge cases—they were the norm. Over 90% of variables showed discrepancies. In 6 of 21 accounting anomalies tested, the discrepancies were large enough to flip the research conclusion entirely.

Why Free Data Aggregators Are Especially Risky

On July 3, 2017, Yahoo Finance and Google Finance displayed wildly incorrect prices for major tech stocks including Apple, Amazon, and Microsoft. The cause: a Nasdaq test data feed (the UTP feed) pushed erroneous prices, and free aggregators relayed the data without independent validation. No human checked. No cross-reference fired. The numbers simply appeared on millions of screens.

This wasn't a one-off. Free aggregators operate on thin margins with minimal editorial oversight; they prioritize speed and breadth over accuracy. The takeaway: free data sites should never be your sole source for investment decisions. They're useful for quick scans and watchlist monitoring (that's fine), but any number you plan to act on needs verification against a primary filing.

Why this matters: if you're calculating a valuation multiple—say a trailing P/E ratio—and the earnings figure on your free site includes or excludes a one-time charge differently than the company reported it, your multiple is wrong. The S&P 500 historical average trailing P/E sits around 15–17x. A data error that shifts reported earnings by even 5–10% can move a stock from "fairly valued" to "buy" territory (or vice versa) in your analysis.
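The sensitivity described above is easy to see in a few lines. This is a minimal sketch with purely illustrative numbers: the same share price, but an EPS figure that an aggregator has inflated by 8% through a different one-time-charge treatment.

```python
# Hypothetical figures showing how a small earnings-data error moves a
# trailing P/E across a decision boundary. All numbers are illustrative.

def trailing_pe(price: float, eps_ttm: float) -> float:
    """Trailing P/E = share price / trailing-twelve-month EPS."""
    return price / eps_ttm

price = 45.00
eps_reported = 3.00            # EPS as filed in the 10-K
eps_aggregator = 3.00 * 1.08   # aggregator includes a one-time gain (+8%)

pe_true = trailing_pe(price, eps_reported)     # 15.0x: roughly fairly valued
pe_shown = trailing_pe(price, eps_aggregator)  # ~13.9x: looks "cheap"

print(f"P/E from filing:     {pe_true:.1f}x")
print(f"P/E from aggregator: {pe_shown:.1f}x")
```

An 8% earnings distortion moves the multiple from the middle of the 15–17x historical band to below it, which is exactly the kind of shift that flips "hold" to "buy" in a screen.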

A Worked Example: Cross-Referencing ROIC Data

Suppose you're evaluating a mid-cap industrial company. You pull its return on invested capital (ROIC) from three sources:

Source                      Reported ROIC   Type
Yahoo Finance               14.2%           Secondary (free aggregator)
Morningstar                 12.8%           Secondary (paid/analyst-curated)
Company 10-K (SEC EDGAR)    11.9%           Primary
The spread here is 2.3 percentage points between the highest and lowest figures. The median ROIC for S&P 500 companies runs approximately 10–12%, so the difference between 11.9% and 14.2% is the difference between "roughly average" and "meaningfully above average."
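A spread check like this can be mechanized. Below is a minimal sketch, using the figures from the table above, that flags a metric whenever the gap between sources exceeds a tolerance you choose (the 1.0pp threshold here is an assumption, not a standard).

```python
# Cross-source sanity check: flag a metric when the spread across sources
# exceeds a tolerance. Figures are the ROIC values from the table above.

def spread_pp(values: dict) -> float:
    """Spread in percentage points between the highest and lowest figures."""
    return max(values.values()) - min(values.values())

roic_sources = {
    "Yahoo Finance": 14.2,
    "Morningstar": 12.8,
    "10-K (EDGAR)": 11.9,
}

TOLERANCE_PP = 1.0  # assumed tolerance: anything wider demands the primary filing

if spread_pp(roic_sources) > TOLERANCE_PP:
    print(f"Spread of {spread_pp(roic_sources):.1f}pp exceeds tolerance; "
          f"resolve against the 10-K.")
```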

Phase 1: The Setup. You see 14.2% on Yahoo Finance and think the company earns well above its cost of capital. You're inclined to buy.

Phase 2: The Trigger. You check Morningstar—12.8%. Lower, but still decent. Then you pull the 10-K from EDGAR and calculate ROIC yourself using net operating profit after tax divided by invested capital. You get 11.9%.

Phase 3: The Outcome. The company's ROIC is average, not exceptional. The free aggregator inflated the figure by treating a one-time asset sale as part of operating income (a common standardization choice). The paid source got closer but still deviated.

The practical point: The 10-K is the ground truth. Everything else is an interpretation of it. If you had relied on Yahoo Finance alone, you'd have overestimated this company's economic moat.

Mechanical alternative: For any metric you're using in a buy/sell decision, pull the 10-K from EDGAR (free, available at sec.gov/search-filings, covering over 21 million filings from more than 150,000 entities since 1993) and calculate the number yourself. It takes 15 minutes. That's cheap insurance.
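The self-calculation step can be sketched in a few lines. Note that invested-capital definitions vary between providers (which is one reason the three sources above disagree); this sketch uses one common convention, debt plus equity minus cash, with line items that are entirely hypothetical.

```python
# ROIC = NOPAT / invested capital, from 10-K line items.
# Invested capital here = total debt + total equity - cash (one common
# convention; providers differ, which is itself a source of discrepancies).

def roic(operating_income: float, tax_rate: float,
         total_debt: float, total_equity: float, cash: float) -> float:
    nopat = operating_income * (1 - tax_rate)      # net operating profit after tax
    invested_capital = total_debt + total_equity - cash
    return nopat / invested_capital

# Illustrative figures (not from any real filing), in $ millions:
r = roic(operating_income=450.0, tax_rate=0.21,
         total_debt=1_200.0, total_equity=2_100.0, cash=312.5)
print(f"ROIC: {r:.1%}")  # ≈ 11.9%, matching the primary-source figure above
```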

The Restatement Problem (Your Data Might Change After You Use It)

Financial statements aren't always final. A "Big R" restatement means a company's previously issued financials cannot be relied upon—they must be re-filed with the SEC. Restatements have occurred at a rate of approximately 3% of public companies per year from 2005 to 2024. In 2024, Big R restatements hit a 9-year high, up 7% year-over-year (Audit Analytics).

A significant driver: in May 2024, the SEC barred audit firm BF Borgers and its owner from practicing before the Commission. Dozens of former clients required re-audits by new firms. The most common error types were expense recognition, revenue recognition, and debt/equity misclassification.

A "little r" restatement (a revision correcting an immaterial error) is far more common and doesn't require re-filing, but it still means the numbers you downloaded last quarter may no longer be accurate.

The point is: data has a shelf life. Financial data older than 45 days after quarter-end (the outer SEC 10-Q filing deadline—non-accelerated filers get 45 days; accelerated and large accelerated filers get 40) should be treated as potentially outdated. And if a company has had any Big R restatement in the prior 3 years, that's a red flag requiring deeper diligence—not necessarily disqualifying, but demanding more scrutiny.
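The staleness rule is simple enough to automate. A minimal sketch, assuming calendar quarters and the 45-day outer deadline discussed above: data is flagged once it predates the most recent quarter-end and the filing window for that quarter has passed.

```python
# Staleness check: flag data that predates the last quarter-end once more
# than 45 days (the outer 10-Q deadline) have elapsed since that quarter-end.

from datetime import date, timedelta

def quarter_end(d: date) -> date:
    """Most recent calendar quarter-end on or before d."""
    q_month = ((d.month - 1) // 3) * 3  # 0, 3, 6, or 9
    if q_month == 0:
        return date(d.year - 1, 12, 31)
    # Last day of q_month = first day of the next month, minus one day.
    return date(d.year, q_month + 1, 1) - timedelta(days=1)

def is_stale(data_as_of: date, today: date, limit_days: int = 45) -> bool:
    qe = quarter_end(today)
    return data_as_of < qe and (today - qe).days > limit_days

# Data pulled March 1, checked June 20: Q1 ended March 31, 81 days ago.
print(is_stale(date(2025, 3, 1), date(2025, 6, 20)))  # True: refresh it
```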

Restatement → Stale data in your model → Incorrect valuation → Bad decision

The SEC enforces this aggressively. In fiscal year 2024, the Commission issued $8.2 billion in total financial remedies (a record), including $6.1 billion in disgorgement and $2.1 billion in civil penalties. They issued 124 officer and director bars—the second-highest in a decade.

Survivorship Bias and Look-Ahead Bias (Hidden Data Distortions)

Two biases silently corrupt financial databases without leaving obvious traces.

Survivorship bias distorts historical fund and index data by excluding entities that ceased to exist—funds that merged, liquidated, or delisted. The impact: average mutual fund returns inflated by 1–2 percentage points per year. If you're comparing a fund's track record against a historical average, and that average only includes survivors, you're grading on a curve that doesn't exist.

Look-ahead bias means using data that wouldn't have been available at the point in time you're studying. The classic example: backtesting a strategy using restated earnings rather than originally reported figures. Databases often silently overwrite historical values with corrected ones (that's the whole point of restatements), so your "1998 earnings" figure may actually be a 2001 correction.

The test: Can you confirm that the historical data in your model reflects what was actually known at each point in time? If you can't, your backtest results are unreliable.
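The point-in-time test can be made concrete. This sketch assumes a simple (hypothetical) data shape: for each metric you keep every reported value with the date it became public, and a backtest looks up only what was known on its as-of date—mirroring the "1998 earnings silently overwritten in 2001" example above.

```python
# Point-in-time lookup: for a backtest date, return only the value that was
# publicly known then, never a later restated figure.

def as_known_on(history: list, backtest_date: str) -> float:
    """history: (date_reported, value) pairs sorted ascending by date (ISO strings).
    Returns the latest value reported on or before backtest_date."""
    known = [v for reported, v in history if reported <= backtest_date]
    if not known:
        raise ValueError("no figure was available on that date")
    return known[-1]

# 1998 EPS as originally reported, then restated in 2001 (illustrative):
eps_history = [("1999-02-15", 2.10), ("2001-06-30", 1.65)]

print(as_known_on(eps_history, "2000-01-01"))  # 2.1: what a 2000 investor saw
print(as_known_on(eps_history, "2002-01-01"))  # 1.65: the restated figure
```

A backtest that pulls the 1.65 figure for a simulated 2000 decision is using information that did not exist yet—that is look-ahead bias in one line.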

Professional Standards That Signal Trustworthy Data

Two frameworks help you evaluate whether a data source meets professional-grade standards.

CFA Institute Standard V(A)—Diligence and Reasonable Basis requires analysts to make reasonable inquiries into the sources and accuracy of all data used in investment analysis. It mandates measurable criteria for assessing outside data providers. If your data provider can't articulate their methodology and error-correction process, that's a problem.

GIPS (Global Investment Performance Standards) are adopted by over 1,600 organizations across 50 markets worldwide. All top 25 global asset managers claim GIPS compliance. GIPS requires third-party verification of performance data on a firm-wide basis. When evaluating a fund manager's track record, GIPS compliance is a baseline credibility signal (not a guarantee, but its absence is a warning).

Morningstar, as a secondary source, employs more than 100 analysts covering over 3,700 unique funds globally. Their star ratings use risk-adjusted returns over 3-, 5-, and 10-year periods. Their forward-looking Medalist ratings evaluate funds over a full market cycle of at least 5 years. This level of methodology transparency is what you should expect from any source you rely on.

Detection Signals (You're Probably Using Bad Data If...)

You're likely relying on unvetted data if:

  • You've never opened a 10-K on EDGAR for a stock you own
  • You use a single free website as your only data source for buy/sell decisions
  • You've compared fund returns without checking whether the database adjusts for survivorship bias
  • You can't name the audit firm for your largest holding (or whether it received a clean opinion)
  • You've backtested a strategy without confirming whether the data uses originally reported or restated figures

An unqualified (clean) audit opinion from a PCAOB-registered firm is the minimum bar. A going-concern qualification signals material doubt about a company's ability to continue operating—and demands immediate further investigation, not just a footnote glance.

The Materiality Question (How Wrong Is Too Wrong?)

Not every data discrepancy matters. The SEC uses a quantitative benchmark of roughly 5% of pre-tax income (per Staff Accounting Bulletin No. 99), combined with qualitative factors, to assess whether a misstatement would influence a reasonable investor's judgment.

Apply this to your own process. If two sources disagree on a company's operating margin by 0.3 percentage points, that's noise. If they disagree by 2 full points on a company with thin margins, that's material—and you need the primary source to resolve it.
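Applied to your own process, the quantitative half of the SAB No. 99 screen is a one-liner. This sketch handles only the numeric gate; the qualitative factors the SEC also weighs can't be reduced to a threshold.

```python
# SAB No. 99 quantitative screen: a discrepancy is worth resolving against
# the primary filing when it exceeds roughly 5% of pre-tax income.
# Qualitative factors still apply; this is only the numeric gate.

def is_material(discrepancy: float, pretax_income: float,
                threshold: float = 0.05) -> bool:
    return abs(discrepancy) > threshold * abs(pretax_income)

# Two sources disagree on operating income by $4M; pre-tax income is $60M:
print(is_material(4.0, 60.0))  # True: 4/60 ≈ 6.7%, above the 5% benchmark
print(is_material(2.0, 60.0))  # False: ≈ 3.3%, likely noise
```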

Vetting Checklist (Tiered by ROI)

Essential (high ROI)—prevents 80% of data errors:

  • Verify against the primary source. Pull the relevant 10-K or 10-Q from EDGAR before acting on any secondary-source number
  • Cross-reference at least 2 independent sources against the primary filing. If all three agree within a reasonable tolerance, proceed. If they diverge, the primary filing wins
  • Check the audit opinion. Only rely on financials accompanied by an unqualified opinion from a PCAOB-registered firm
  • Confirm data freshness. Treat financial data older than 45 days post-quarter-end as potentially stale

High-impact (workflow integration):

  • Screen for restatements. Any Big R restatement in the prior 3 years triggers enhanced diligence
  • Adjust for survivorship bias. When evaluating historical fund performance, subtract 1–2 percentage points from reported averages
  • Verify GIPS compliance for any fund manager whose track record influences your decision
  • Use XBRL data (available on EDGAR since 2009) for machine-readable cross-company comparisons rather than manually transcribing from PDFs

Optional (good for active stock-pickers):

  • Build a personal data reconciliation spreadsheet comparing 3+ sources for your key metrics
  • Monitor PCAOB restatement data quarterly to flag audit-quality trends in sectors you're invested in
  • Subscribe to SEC EDGAR alerts for companies in your portfolio (EDGAR's APIs update in near-real time as filings are disseminated)

Your Next Step (Do This Today)

Pick your largest holding. Go to sec.gov/search-filings, pull its most recent 10-K, and look up three numbers: revenue, net income, and total debt. Compare those figures against whatever source you normally use. Note any discrepancies and which direction they go.

If the numbers match within 1–2%, your source is reasonably reliable for that company. If they don't, you've just found a data quality problem worth fixing—before it compounds into a portfolio quality problem.
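This reconciliation is worth keeping as a reusable snippet. A minimal sketch with illustrative figures only: compare the three 10-K numbers against your usual source and flag anything off by more than the 2% tolerance suggested above.

```python
# "Do this today" reconciliation: compare 10-K figures against your usual
# source and flag deviations beyond a tolerance. All figures illustrative.

def reconcile(filed: dict, source: dict, tol: float = 0.02) -> dict:
    """Returns relative deviation per metric, for metrics exceeding tol."""
    flags = {}
    for metric, true_val in filed.items():
        dev = (source[metric] - true_val) / true_val
        if abs(dev) > tol:
            flags[metric] = dev
    return flags

# $ millions; ten_k from EDGAR, my_src from whatever site you normally use:
ten_k  = {"revenue": 4_820.0, "net_income": 512.0, "total_debt": 1_950.0}
my_src = {"revenue": 4_815.0, "net_income": 548.0, "total_debt": 1_950.0}

print(reconcile(ten_k, my_src))  # flags net_income: ~7% high, worth fixing
```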

For a related framework on evaluating the people and publications interpreting this data, see Checklist for Evaluating Investment Newsletters. For building your own calculations from primary data, see Building a Simple Valuation Model in a Spreadsheet.
