Backtesting Pricing Models Against Market Data

Equicurious Team · Advanced · 2026-01-25 · Updated: 2026-03-21

Every pricing model is wrong. The question is whether yours is wrong in ways that cost you money. Backtesting—replaying historical market conditions through your model and measuring what it predicted versus what actually happened—is the only systematic way to answer that question. Yet most backtesting efforts fail not because the math is hard, but because the people running them introduce biases that make bad models look good. JPMorgan's London Whale disaster in 2012 traced directly to a VaR model built in Excel with manual copy-paste errors and minimal backtesting rigor—a $6.2 billion lesson in what happens when model validation is treated as a checkbox exercise. The rule that survives: backtesting isn't about proving your model works. It's about finding exactly where and how it breaks.

Why Models Fail (And Why Backtesting Catches It)

Before you build a backtesting framework, you need to understand what you're actually testing for. Pricing models fail in three distinct ways, and each requires different detection methods.

The first failure mode is structural error. Your model makes assumptions that don't hold in real markets. Black-Scholes assumes constant volatility, no transaction costs, continuous trading, and log-normal returns. In practice, volatility clusters and jumps, transaction costs eat into hedging P&L, markets gap overnight, and fat tails show up 3-10x more frequently than the normal distribution predicts. The model systematically underprices deep out-of-the-money options (where tail risk lives) and overprices deep in-the-money options.
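To make the structural-error point concrete, here is a minimal Black-Scholes call pricer using only the standard library (the `norm_cdf` helper via the error function is a common trick; the function names are illustrative, not from any particular codebase). The deep out-of-the-money region is exactly where the constant-volatility, log-normal assumptions bite hardest.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, t, rate, vol):
    """Black-Scholes European call price (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)

# A deep OTM call priced under constant vol: with fat-tailed returns,
# the market price for this strike will typically sit above this number.
deep_otm = bs_call(spot=100, strike=130, t=0.25, rate=0.05, vol=0.20)
```

Backtesting this pricer against market quotes is precisely what reveals the systematic underpricing in the wings: the model error will not be random noise but a persistent, moneyness-dependent bias.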

The second failure mode is calibration drift. Your model structure might be adequate, but the parameters go stale. A volatility surface calibrated on Monday's data may misprice meaningfully by Friday—especially around earnings, central bank meetings, or geopolitical shocks. LTCM's models assumed correlation patterns from historical data would persist. When Russia defaulted in August 1998, correlations across supposedly independent trades spiked to near 1.0 simultaneously, and the fund lost $4.6 billion in four months.

The third failure mode is implementation error. The math is right on paper, but the code is wrong. The London Whale's VaR model divided by the sum of two values instead of their average, a seemingly trivial spreadsheet mistake that cut reported risk in half. No amount of theoretical validation catches bugs—only backtesting against real outcomes does.

The point is: you're not testing whether Black-Scholes or Heston or SABR is "correct." You're testing whether your specific implementation, with your specific calibration, on your specific instruments, produces prices close enough to reality that your hedges work and your P&L attribution makes sense.

Data Quality Is the Foundation (Get This Wrong and Nothing Else Matters)

The single most common reason backtests produce misleading results has nothing to do with the model. It's the data. Garbage in, garbage out applies with particular force here because pricing models amplify data errors through nonlinear transformations.

What you need:

  • Underlying prices (spot and futures, not just closing prints—you need the quote at the time your model would have priced)
  • Option prices or implied volatilities (with bid-ask spreads, not just mids)
  • The full interest rate term structure (not a single "risk-free rate")
  • Actual dividend payments and ex-dates (not estimated, not smoothed)
  • Corporate actions (splits, mergers, spin-offs) applied correctly

Three biases that will silently destroy your results:

Survivorship bias is the most insidious. If your dataset only contains currently listed instruments, you're backtesting on winners. Every delisted stock, every expired worthless option, every terminated swap that blew up—those are exactly the scenarios where model risk materializes. Include them or your backtest is fiction.

Lookahead bias means using information that wasn't available at the time. If you calibrate your vol surface on day T using any data from day T+1 or later (even implicitly, through a smoothing window that extends forward), your backtest will look better than reality. This is the "time-travel" problem, and it's surprisingly easy to introduce accidentally.
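One cheap defense against lookahead bias is a hard point-in-time filter at the data boundary. The sketch below (hypothetical helper, not a library API) filters out any observation dated after the as-of date and counts the violations so a pipeline can alert on leakage rather than silently absorbing it.

```python
from datetime import date

def point_in_time_slice(observations, as_of):
    """Return only observations dated on or before `as_of`.

    `observations` is a list of (date, value) pairs. A stricter variant
    would raise on any future-dated row; here we filter but report the
    count of leaked rows so the backtest can flag them.
    """
    past = [(d, v) for d, v in observations if d <= as_of]
    leaked = len(observations) - len(past)
    return past, leaked

# VIX-style closes around March 2020 (illustrative values)
obs = [(date(2020, 3, 9), 54.5), (date(2020, 3, 16), 82.7), (date(2020, 3, 23), 61.6)]
window, leaked = point_in_time_slice(obs, as_of=date(2020, 3, 16))
```

Routing every calibration input through a guard like this makes the "strict T-only data policy" from the table below enforceable in code rather than by convention.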

Selection bias means cherry-picking test periods. If your backtest window is 2012-2019 (a historically calm period), your model will look great. Extend it through March 2020 (when the VIX hit 82.69) and you'll get a very different picture.

Why this matters: the Fed and OCC's SR 11-7 guidance on model risk management explicitly requires that backtesting include stress periods. If your validation window doesn't include at least one genuine market dislocation, regulators will reject it—and they should.

| Bias Type | How It Sneaks In | Prevention |
| --- | --- | --- |
| Survivorship | Default vendor datasets exclude delistings | Request point-in-time datasets explicitly |
| Lookahead | Rolling calibration windows that overlap forward | Strict T-only data policy per timestep |
| Selection | "Let's start from when we have clean data" | Mandate inclusion of 2008, 2020, or equivalent |

The Replay Workflow (Step by Step)

Here's the actual process for running a pricing model backtest. This isn't theoretical—it's the workflow that passes regulatory scrutiny.

Step 1: Fix your calibration protocol. Before you replay anything, document exactly how your model gets calibrated. Which instruments do you fit to? What objective function do you minimize? What constraints do you impose? This protocol must be identical in the backtest to what you'd run in production. If you're hand-tuning parameters in production (and people do), your backtest needs to replicate that human-in-the-loop process—or you need to acknowledge the gap.

Step 2: Initialize at the start date. Load your model with parameters calibrated using only data available as of day one. No peeking.

Step 3: For each timestep, execute the full cycle:

  • Feed the market data snapshot (prices, rates, dividends as of that moment)
  • Calibrate your model (using only backward-looking data)
  • Compute model prices and Greeks for your target instruments
  • Compare model prices to actual market prices
  • If you're testing hedge performance, execute the hedge and record P&L
  • Store everything

Step 4: Aggregate and analyze. Compile error statistics across the full window. Break them down by time period, by moneyness, by maturity, by market regime.
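Steps 2-4 can be sketched as a single replay loop. The `calibrate` and `price` callables stand in for whatever model you are validating (they are placeholders, as is the snapshot schema); the key structural property is that calibration only ever sees the backward-looking history.

```python
def run_backtest(snapshots, calibrate, price, instruments, tol=0.005):
    """Replay loop: per timestep, calibrate on past data only, price the
    target instruments, and record model-vs-market errors.

    `snapshots` is an ordered list of dicts with keys 'date', 'market'
    (calibration inputs as of that moment) and 'quotes'
    ({instrument: market price}).
    """
    results = []
    history = []
    for snap in snapshots:
        history.append(snap["market"])      # backward-looking data only
        params = calibrate(history)         # calibrate on T and earlier
        for inst in instruments:
            model_px = price(params, inst)  # model price for this timestep
            mkt_px = snap["quotes"][inst]
            results.append({
                "date": snap["date"],
                "instrument": inst,
                "error": model_px - mkt_px,
                "within_tol": abs(model_px - mkt_px) <= tol,
            })
    return results
```

Hedge execution and P&L capture (the fifth bullet above) would slot in next to the pricing call; everything in `results` feeds the aggregation in Step 4.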

Granularity matters. Daily close data suffices for most validation work. Use intraday data only if your model is used for intraday hedging decisions (and be prepared for the storage and computation cost—tick data for a liquid options market runs to terabytes per year). For long-dated products like rate swaps, weekly snapshots may be sufficient, but you need a longer window to compensate.

| Product Type | Minimum Backtest Window | Must-Include Periods |
| --- | --- | --- |
| Vanilla equity options | 3 years | COVID crash (Mar 2020) |
| Exotic / path-dependent | 5 years | GFC (2008-09) + COVID |
| Interest rate derivatives | 5-10 years | Taper tantrum (2013), 2022 rate hiking cycle |
| Credit derivatives | Full credit cycle (7-10 years) | GFC + COVID + 2022 spread widening |

KPIs That Actually Tell You Something (And Thresholds That Mean It)

Not all error metrics are created equal. A model can have low average error while systematically mispricing tail scenarios—which is exactly where the money gets lost.

Pricing accuracy metrics:

Mean error tells you about bias. If your model consistently overprices (or underprices), mean error catches it. Threshold: less than 0.2 implied vol points. Anything larger suggests a systematic calibration problem.

RMSE (root mean square error) penalizes large deviations more than small ones, which is what you want—a model that's usually close but occasionally wildly wrong is more dangerous than one that's consistently slightly off. Threshold: less than 0.5 vol points.

Max error is your stress test in miniature. What's the single worst miss? Threshold: less than 2.0 vol points. If your worst-case exceeds this, you need to understand exactly when it happened and why.

Hit rate measures what percentage of prices fall within your tolerance band. Target: above 95%. The remaining 5% should cluster in extreme scenarios, not random dates.
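The four pricing-accuracy KPIs above are simple to compute once you have per-quote errors; a minimal sketch (errors here are in implied-vol points, and the threshold constants mirror the ones stated in the text):

```python
from math import sqrt

def pricing_kpis(errors, tol=0.5):
    """Compute mean error, RMSE, max error, and hit rate from a list of
    per-quote errors in implied-vol points.

    Text thresholds for reference: |mean| < 0.2, RMSE < 0.5,
    max < 2.0 vol points, hit rate > 95%.
    """
    n = len(errors)
    mean_err = sum(errors) / n                      # bias
    rmse = sqrt(sum(e * e for e in errors) / n)     # penalizes outliers
    max_err = max(abs(e) for e in errors)           # worst single miss
    hit_rate = sum(1 for e in errors if abs(e) <= tol) / n
    return {"mean": mean_err, "rmse": rmse, "max": max_err, "hit_rate": hit_rate}
```

Run this per moneyness bucket and per regime, not just globally: a passing aggregate can hide a failing wing.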

Hedging performance metrics (the real test):

The point is: pricing accuracy is necessary but not sufficient. A model can match market prices perfectly and still produce terrible hedges if the Greeks are wrong. Hedging P&L attribution is the acid test.

P&L explanation ratio measures how much of your daily P&L your model explains through its Greeks. If your delta, gamma, vega, and theta explain less than 85% of realized P&L, something is missing from your model. That unexplained residual is where blowups hide.

Unexplained gamma P&L should be less than 20% of actual gamma P&L. Large unexplained gamma means your model's second-order sensitivities are off—which matters most during big moves (exactly when you need accuracy).

Unexplained vega P&L should be less than 15% of actual vega P&L. This is your volatility model's report card.
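A single-step, single-instrument version of Greeks-based P&L attribution can be sketched as follows (a first- and second-order Taylor expansion in spot plus first-order terms in vol and time; real desks attribute across many more risk factors, so treat this as the minimal skeleton):

```python
def pnl_explanation(actual_pnl, delta, gamma, vega, theta, ds, dvol, dt):
    """Greeks-based P&L attribution over one timestep.

    explained = delta*ds + 0.5*gamma*ds^2 + vega*dvol + theta*dt
    The residual (actual - explained) is the unexplained P&L the text
    warns about; the ratio explained/actual is the explanation ratio.
    """
    explained = delta * ds + 0.5 * gamma * ds**2 + vega * dvol + theta * dt
    residual = actual_pnl - explained
    ratio = explained / actual_pnl if actual_pnl != 0 else float("nan")
    return explained, residual, ratio
```

Tracking the residual separately for the gamma and vega terms gives the two unexplained-P&L KPIs above; a persistently large residual is the signal that a risk factor is missing from the model.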

When Your Backtest Fails (The Diagnostic Playbook)

A failed backtest isn't a disaster—it's information. The question is whether the failure reveals something fixable (data issue, calibration drift) or something structural (the model can't capture the dynamics you need).

Decompose the error systematically:

If errors spike around specific dates, check for corporate actions, dividend ex-dates, or data feed issues first. These are the most common (and most fixable) root causes. A single missing stock split can create errors that propagate through months of backtest data.

If errors correlate with moneyness, your volatility surface calibration is likely the problem. Models that fit at-the-money well but miss wings are common—and dangerous for portfolios with significant tail exposure. Adjusting calibration weights (putting more emphasis on the strikes that matter for your portfolio) often helps.

If errors correlate with time-to-expiry, check your term structure of volatility and interest rate curve inputs. Short-dated options are particularly sensitive to rate and dividend assumptions.

If errors spike during stress periods, that's likely structural. Your model may not capture the dynamics that emerge in crises—correlation breakdown, liquidity withdrawal, gap risk. This is the hardest failure to fix because it often requires a different model, not just better calibration.
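The moneyness decomposition in particular is easy to automate. A sketch using only the standard library (the bucket boundaries at 0.9 and 1.1 are illustrative; choose cutoffs that match your portfolio's strike exposure):

```python
from collections import defaultdict

def decompose_errors(records):
    """Average per-quote errors within moneyness buckets so wing-specific
    failures are not averaged away by at-the-money accuracy.

    Each record: {'moneyness': strike / spot, 'error': vol-point error}.
    """
    buckets = defaultdict(list)
    for r in records:
        m = r["moneyness"]
        key = "otm_put" if m < 0.9 else "atm" if m <= 1.1 else "otm_call"
        buckets[key].append(r["error"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

The same pattern extends to maturity buckets and regime labels (calm vs. stress dates); the diagnostic value comes from comparing the per-bucket averages, not the global one.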

The lesson worth internalizing: Archegos collapsed in March 2021 partly because the risk models used by prime brokers like Credit Suisse didn't adequately account for concentration risk and the nonlinear dynamics of forced liquidation. The models worked fine in normal conditions. They failed catastrophically when they were needed most. Your backtest must include stress periods precisely because that's when model risk becomes real risk.

SR 11-7 and the Regulatory Dimension (What Examiners Actually Look For)

If you're at a regulated institution, your backtesting framework isn't just good practice—it's a regulatory requirement. The Fed/OCC's SR 11-7 guidance (issued 2011, still the governing framework and adopted by the FDIC in 2017) defines model risk as "the potential for adverse consequences from decisions based on incorrect or misused model outputs."

What SR 11-7 requires for backtesting:

The guidance mandates "effective challenge"—critical analysis by objective, informed parties who can identify model limitations. In practice, this means your backtest can't be run by the same team that built the model. Independent validation is non-negotiable.

Documentation requirements include the backtest methodology, data sources, cleaning procedures, all parameter choices and their justifications, results with threshold comparisons, root cause analysis for any breaches, and remediation plans with timelines. If it's not documented, it didn't happen (as far as examiners are concerned).

The escalation framework that regulators expect:

| Severity | Trigger | Required Action | Timeline |
| --- | --- | --- | --- |
| Watch | Any KPI at 75-100% of threshold | Increased monitoring frequency | Weekly review |
| Amber | Any KPI at 100-150% of threshold | Desk head notification + remediation plan | 5 business days |
| Red | Any KPI above 150% of threshold | Risk committee notification | Immediate |
| Critical | Multiple Red flags or stress-period failure | Model suspension review | Same day |
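An escalation framework like this should be mechanical, not judgment-driven, which makes it a natural candidate for code. A minimal classifier for a single KPI reading (the Critical tier needs portfolio-level context, multiple Reds or a stress-period failure, so it is deliberately out of scope here):

```python
def classify_severity(kpi_value, threshold):
    """Map one KPI reading to the escalation tier from the table above,
    using the 75% / 100% / 150% bands."""
    ratio = kpi_value / threshold
    if ratio > 1.5:
        return "red"
    if ratio > 1.0:
        return "amber"
    if ratio >= 0.75:
        return "watch"
    return "green"
```

Wiring this into the nightly backtest run, with the notification rules from the table attached to each tier, turns "regulators expect escalation" into something that actually fires.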

Why this matters: model risk isn't abstract. It's the risk that your pricing model tells you a position is hedged when it isn't, that your VaR model says you're within limits when you're not, that your P&L attribution says everything is explained when a hidden exposure is building. Every major trading loss in the past 25 years involved a model that passed inadequate validation.

The Deep Calibration Frontier (Where the Field Is Heading)

Traditional calibration means fitting your model to market prices by minimizing some objective function—a slow, often unstable numerical optimization that runs per instrument, per day. Recent developments (2024-2025) are changing this.

Deep learning calibration uses neural networks to learn the mapping from model parameters to prices, then inverts that mapping to calibrate in milliseconds instead of minutes. Physics-informed neural networks (PINNs) have been tested on FX options across hundreds of business days with improved accuracy and numerical stability. Rough volatility models capture the power-law behavior of implied volatility smiles that simpler models miss (though their non-Markovian structure raises computational challenges).

The practical implication for backtesting: as calibration becomes faster and more sophisticated, the standard for backtesting rises too. If your competitor can recalibrate a stochastic local volatility model intraday while you're running daily Black-Scholes, your model risk is higher even if your backtest passes its own thresholds.

The test: are your backtesting KPIs and thresholds set relative to the state of the art, or relative to what was acceptable five years ago?

Backtesting Checklist (Tiered by Impact)

Essential (prevents the catastrophic failures)

These items catch the errors that lead to headline losses:

  • Include delisted instruments, expired options, and terminated contracts in your dataset (no survivorship bias)
  • Enforce strict point-in-time data—calibration on day T uses only data from day T and earlier
  • Include at least one major stress period (2008, 2020, or equivalent) in every backtest window
  • Track unexplained P&L alongside pricing errors—hedging accuracy matters more than price matching
  • Document everything: methodology, data sources, parameter choices, results, and remediation plans

High-Impact (systematic rigor)

For teams that want backtesting to drive genuine model improvement:

  • Separate your backtest team from your model development team (SR 11-7's "effective challenge" principle)
  • Decompose errors by moneyness, maturity, and market regime—aggregate statistics hide regime-specific failures
  • Run rolling out-of-sample validation (calibrate on window 1, test on window 2) to detect overfitting
  • Build an automated escalation framework with predefined thresholds and notification rules
  • Compare your model against at least one benchmark model (even a simple one) to contextualize errors
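The rolling out-of-sample split from the list above can be generated with a few lines (hypothetical helper; the slide-by-test-length convention is one common choice, and expanding or anchored windows are equally valid):

```python
def rolling_windows(dates, calib_len, test_len):
    """Yield (calibration, test) date windows for rolling out-of-sample
    validation: fit on the first window, test on the next, then slide
    forward by the test length so test windows never overlap.
    """
    windows = []
    start = 0
    while start + calib_len + test_len <= len(dates):
        calib = dates[start : start + calib_len]
        test = dates[start + calib_len : start + calib_len + test_len]
        windows.append((calib, test))
        start += test_len
    return windows
```

A model whose in-window fit is excellent but whose out-of-window errors balloon is overfit to the calibration data; the gap between the two error distributions is the statistic to monitor.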

Advanced (competitive edge)

For quantitative teams pushing the frontier:

  • Implement hedge backtesting with full transaction cost modeling and realistic execution assumptions
  • Evaluate deep-learning calibration methods for speed and stability improvements
  • Backtest across correlated products simultaneously to catch portfolio-level model risk
  • Stress-test your backtest itself: how sensitive are your KPI results to data source, granularity, and window choices?

Next Step (Put This into Practice)

Pull up the last pricing model validation you ran (or the one you've been meaning to run). Check one thing: does your backtest window include March 2020?

How to check:

  1. Open your backtest configuration and find the replay start and end dates
  2. Verify that the window spans at least through late March 2020 (the VIX peaked at 82.69 on March 16; equities bottomed March 23)
  3. Look at your KPI results specifically during March 9-23, 2020

What you'll find:

  • If your model passed through March 2020 with KPIs in threshold: Your model has survived a genuine stress test. Document this prominently—it's your strongest validation evidence.
  • If your KPIs breached thresholds during March 2020 but recovered: That's expected for many models. Document the breach, explain why, and assess whether the magnitude is acceptable for your use case.
  • If your backtest window doesn't include March 2020 at all: You have a gap. Extend the window before your next validation cycle. Until you do, your model carries unquantified stress-period risk.

Action: If your backtest doesn't include a stress period, add one this week. If it does but you haven't decomposed the stress-period errors by moneyness and maturity, do that next. The five minutes it takes to check will tell you more about your model's reliability than months of calm-market backtesting ever could.
