Backtesting Pricing Models Against Market Data
Backtesting replays historical market conditions through a pricing model to measure accuracy against realized outcomes. This validation technique identifies model deficiencies, quantifies hedge effectiveness, and provides evidence for regulatory review. Systematic backtesting requires clean data, appropriate granularity, well-defined KPIs, and clear remediation protocols.
Step 1: Data Sourcing and Cleaning
Data Requirements
Market data needed:
- Underlying prices (spot, futures)
- Option prices (or implied volatilities)
- Interest rates (term structure)
- Dividend information (actual payments, ex-dates)
- Corporate actions (splits, mergers)
Sourcing considerations:
| Source Type | Pros | Cons |
|---|---|---|
| Exchange tick data | Most accurate | Expensive, large storage |
| End-of-day vendors | Cost effective | Misses intraday dynamics |
| Internal trade data | Real execution prices | Limited universe |
| Synthetic reconstruction | Flexible | Potential errors |
Data Cleaning Protocol
Quality checks:
- Remove obviously erroneous quotes (negative prices, zero volume)
- Adjust for corporate actions (splits, dividends)
- Handle missing data (interpolation or exclusion)
- Align timestamps across data sources
- Verify currency and quote conventions
Survivorship bias prevention: Include delisted securities, expired options, and terminated contracts in the dataset. Using only currently listed instruments overstates historical model accuracy.
Lookahead bias prevention: Ensure each backtest timestep uses only information available at that point. Calibration on day T must use only data from day T or earlier.
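As a concrete illustration, the cleaning checks and point-in-time filtering above can be sketched in plain Python. The quote fields and function names here are illustrative, not a production schema:

```python
from datetime import date

def clean_quotes(quotes):
    """Drop obviously erroneous quotes: non-positive prices or zero volume."""
    return [q for q in quotes if q["price"] > 0 and q["volume"] > 0]

def as_of_view(quotes, as_of):
    """Point-in-time filter: keep only quotes observable on or before `as_of`,
    so that calibration on day T never sees data from after day T."""
    return [q for q in quotes if q["date"] <= as_of]

quotes = [
    {"date": date(2023, 1, 2), "price": 101.5, "volume": 1200},
    {"date": date(2023, 1, 2), "price": -3.0, "volume": 500},   # erroneous: negative price
    {"date": date(2023, 1, 3), "price": 102.0, "volume": 0},    # erroneous: zero volume
    {"date": date(2023, 1, 4), "price": 103.0, "volume": 800},
]

cleaned = clean_quotes(quotes)
visible = as_of_view(cleaned, date(2023, 1, 2))  # what a day-T calibration may see
```

In practice the same filters would run against a vendor feed or tick database, but the ordering matters regardless of scale: clean first, then slice point-in-time, so that erroneous quotes never enter any calibration window.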
Step 2: Replay Workflow and Granularity
Workflow Steps
- Initialize: Load model with parameters as of start date
- For each timestep:
  - Feed market data snapshot
  - Compute model prices and Greeks
  - Compare to actual market prices
  - Record hedging P/L if applicable
  - Recalibrate if using rolling calibration
- Aggregate: Compile error statistics and KPIs
- Report: Generate validation summary
Granularity Selection
| Granularity | Use Case | Data Volume | Accuracy |
|---|---|---|---|
| Tick-by-tick | High-frequency trading | Very high | Highest |
| Minute bars | Intraday trading models | High | High |
| Hourly | Short-dated options | Moderate | Moderate |
| Daily close | Position management | Low | Sufficient for most |
| Weekly | Long-dated products | Very low | Coarse |
Recommendation: Daily granularity suffices for most pricing model validation. Use higher frequency only for models with intraday hedging requirements.
Replay Window Selection
| Product Type | Minimum Window | Recommended |
|---|---|---|
| Vanilla options | 1 year | 3 years |
| Exotic options | 2 years | 5 years |
| Rate products | 3 years | 5-10 years |
| Credit products | 5 years | Full credit cycle |
Include at least one stress period (e.g., the 2008-2009 financial crisis or March 2020) in the replay window.
Step 3: KPI Selection and Thresholds
Core KPIs
Pricing accuracy:
| KPI | Definition | Threshold |
|---|---|---|
| Mean Error | Average of (Model - Market) | < 0.2 vols in absolute value |
| RMSE | Root mean square error | < 0.5 vols |
| Max Error | Largest single deviation | < 2.0 vols |
| Hit Rate | % within tolerance | > 95% |
Hedging performance:
| KPI | Definition | Threshold |
|---|---|---|
| P/L Error | Hedged P/L vs. expected | < 5% of premium |
| Hedge Slippage | Transaction cost impact | < 10 bps daily |
| Gamma P/L | Unexplained gamma P/L | < 20% of actual |
| Vega P/L | Unexplained vega P/L | < 15% of actual |
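The first two hedging KPIs reduce to simple ratios; a sketch under assumed inputs (the figures below are invented for illustration):

```python
def hedge_kpis(hedged_pnl, expected_pnl, premium, daily_costs_bps):
    """P/L error as a percentage of option premium, and average daily
    transaction-cost slippage in basis points."""
    pl_error_pct = abs(hedged_pnl - expected_pnl) / premium * 100.0
    avg_slippage_bps = sum(daily_costs_bps) / len(daily_costs_bps)
    return pl_error_pct, avg_slippage_bps

# A delta-hedged book expected to be flat that lost 0.15 on a 4.00 premium,
# with four days of observed transaction costs:
pl_error, slippage = hedge_kpis(
    hedged_pnl=-0.15, expected_pnl=0.0, premium=4.0,
    daily_costs_bps=[6, 8, 9, 5],
)
# pl_error = 3.75% (< 5% threshold), slippage = 7 bps (< 10 bps threshold)
```

The gamma and vega P/L KPIs require a full P/L explain decomposition (Greeks times realized factor moves), which is model-specific and omitted here.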
Pitfall Avoidance
Common errors:
| Pitfall | Description | Prevention |
|---|---|---|
| Survivorship bias | Only testing on survivors | Include delisted instruments |
| Lookahead bias | Using future information | Strict point-in-time data |
| Selection bias | Cherry-picking test periods | Use full available history |
| Overfitting to backtest | Tuning to historical data | Hold-out validation set |
Data snooping: If model parameters were adjusted after seeing backtest results, the validation is compromised. Document all parameter changes with timestamps.
Step 4: Interpretation and Remediation
Interpreting Results
Pass criteria: All core KPIs within thresholds across the full replay window and during stress periods.
Conditional pass: Minor threshold breaches (<10% above limit) in non-stress periods only. Document limitations and monitor closely.
Fail criteria:
- Any KPI > 2× threshold
- Systematic bias (consistent over/under-pricing)
- Failure during stress periods
Attribution Analysis
When KPIs fail, decompose errors:
| Error Source | Diagnostic | Remediation |
|---|---|---|
| Volatility calibration | Check smile fit | Adjust calibration weights |
| Delta hedge error | Compare actual vs. model delta | Review delta calculation |
| Gamma/convexity | Large moves show excess error | Increase hedge frequency |
| Rate sensitivity | Error correlates with rates | Check rate curve inputs |
| Dividend handling | Error around ex-dates | Verify dividend data |
Remediation Actions
For pricing errors:
- Identify root cause (data, calibration, or model)
- Adjust calibration if data issue
- Document model limitation if structural
- Request model enhancement if material
For hedge errors:
- Analyze transaction cost assumptions
- Review hedge frequency requirements
- Assess liquidity impact
- Consider adding hedge instruments
Escalation Path
Threshold breach severity:
| Level | Criteria | Action |
|---|---|---|
| Watch | Any KPI 75-100% of threshold | Increased monitoring |
| Amber | Any KPI 100-150% of threshold | Desk head notification |
| Red | Any KPI > 150% of threshold | Risk committee notification |
| Critical | Multiple Red or stress failure | Model suspension review |
Escalation timeline:
- Watch: Weekly review, document trend
- Amber: Remediation plan within 5 business days
- Red: Immediate notification; remediation within 15 days
- Critical: Same-day senior management briefing
Documentation requirements:
- Backtest run date and parameters
- Data sources and cleaning steps
- KPI results with threshold comparison
- Root cause analysis for breaches
- Remediation plan and timeline
- Sign-off from validation team
Backtest Report Template
Header:
- Model name and version
- Backtest period
- Run date
- Data sources
Summary statistics:
| KPI | Value | Threshold | Status |
|---|---|---|---|
| RMSE | 0.38 vols | < 0.5 vols | Pass |
| Max Error | 1.6 vols | < 2.0 vols | Pass |
| P/L Error | 3.2% | < 5% | Pass |
| Hedge Slippage | 7 bps | < 10 bps | Pass |
Period breakdown:
- Normal periods: [statistics]
- Stress periods: [statistics]
- By product type: [statistics]
Exceptions:
- [List any threshold breaches with explanation]
Conclusion: Model meets/does not meet validation criteria. [Recommendations]
Next Steps
For calibration procedures that generate parameters for backtesting, see Model Calibration and Validation.
For stress testing beyond historical replay, review Stress Testing Models for Extreme Moves.