As you know, curve fitting is a danger that all system developers must constantly be aware of.
Curve fitting destroys your trading system and can give you false hope that your system will work on the live market, which results in you losing your hard-earned money.
Curve fitting often hits novice system developers without them even knowing about it. This can be particularly confusing and disheartening when you have spent so much time on a strategy only to have it fail on out-of-sample or new live data. But curve fitting can also creep into the development process of professional developers. So it's your job to be aware of curve fitting and know how to avoid it. You must be vigilant about it.
This article will focus on my best tips to avoid overfitting while building your trading system. You can perform steps after your strategy is complete to help determine if it is overfit. However, we'll cover those steps later.
Curve Fitting vs. Overfitting
First, let's clear up some terms.
When people in the trading world talk about curve fitting, they are referring to the way a trading system's performance degrades when it moves to the out-of-sample data segment. Often the system fails outright on the out-of-sample data.
Strictly speaking, curve fitting is not the correct term. A better word is overfitting. I may mix curve fitting with overfitting in my work, which is a bit sloppy. However, because so many people associate curve fitting with overfitting, I may continue to use these terms interchangeably.
Curve fitting, from a mathematical standpoint, is attempting to fit a line or curve through a series of data points. That's not what we are doing with a trading system. When it comes to trading, we are attempting to optimize a trading system's performance. Often the target is a key performance metric, such as net profit vs. drawdown. We do this by introducing filters and optimizing various parameters.
As traders, we depend upon finding market edges we can exploit. All successful trading relies on recurring patterns found in the historical market data and using those patterns to make a profit. This is true for discretionary traders as well as system traders.
However, much of the historical market movement is noise or random action. So, the problem arises when we over-optimize our system to the historical data, and our system begins to key into patterns within the noise instead of exploiting a true market edge. Finding a pattern in the noise can produce great-looking equity curves on historical data, but the equity curve can break down quickly on out-of-sample data.
Put another way, we want a trading system that exploits a true market edge and avoids keying off patterns in the market noise. Separating a true market edge from the noise is what this game is all about.
Overfitting occurs when a model is excessively complex and has too many parameters. If you find yourself adding filter upon filter, you may be overfitting. By having too many filters or parameters, you'll likely create an overfitted model that does not generalize to new data. Put another way, you're likely building a trading system that can only trade on the in-sample!
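To make the danger concrete, here is a minimal sketch in Python. It is not a real strategy: the "market" is pure random noise, and the rule, the parameter names (lookback, threshold), and the grid are all made up for illustration. The script optimizes the rule on the first half of the data and then applies the winning parameters to the second half.

```python
import random


def random_returns(n, seed):
    """Pure noise: there is no edge to find in this series, by construction."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def rule_profit(returns, lookback, threshold):
    """Hypothetical rule: hold for one bar whenever the sum of the
    previous `lookback` returns is above `threshold`."""
    profit = 0.0
    for i in range(lookback, len(returns)):
        if sum(returns[i - lookback:i]) > threshold:
            profit += returns[i]
    return profit


def best_in_sample(in_sample, lookbacks, thresholds):
    """Try every parameter combination and keep the most profitable one."""
    results = {
        (lb, th): rule_profit(in_sample, lb, th)
        for lb in lookbacks
        for th in thresholds
    }
    best = max(results, key=results.get)
    return best, results


returns = random_returns(2000, seed=1)
in_sample, out_sample = returns[:1000], returns[1000:]

best, results = best_in_sample(
    in_sample,
    lookbacks=range(1, 21),
    thresholds=[x / 2 for x in range(-6, 7)],
)

print("combinations tested:", len(results))
print("best parameters (lookback, threshold):", best)
print("in-sample profit:", round(results[best], 2))
print("out-of-sample profit:", round(rule_profit(out_sample, *best), 2))
```

The in-sample number is the best of 260 attempts on noise, so it says little about what the same parameters will do on the second half. Change the seed and rerun it a few times to see how unreliable the "best" parameters are.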
Your job as a strategy trader is not to make a perfect-looking equity curve. That's easy to do. Your job is to construct a strategy that can generalize to new data. Build a strategy that is likely to work on the out-of-sample. This means following some critical steps to avoid overfitting. Some of these steps will feel unnatural or counterintuitive. Such is life. Often the reality of something is counterintuitive. That's why so many people fail at trading.
How to Prevent Overfitting When Building Your Strategy
Let's cover what you should do during your strategy development to avoid overfitting.
Have Enough In-Sample Data
Have enough in-sample data to generate hundreds of trades, ideally 400 or more. Your historical data should also span different market regimes (bull/bear), so your trading system is exposed to each of them. Don't let your in-sample data segment span only a bull market. For example, ES data from 2009 to 2019 covers a clear bull market.
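A rough way to sanity-check an in-sample segment is sketched below. The 400-trade minimum comes from the text above. The 20% drawdown and 20% run-up thresholds are my assumption for what counts as a bear or bull regime, and the function names are hypothetical.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


def max_runup(prices):
    """Largest trough-to-peak advance, as a fraction of the trough."""
    trough = prices[0]
    best = 0.0
    for price in prices:
        trough = min(trough, price)
        best = max(best, (price - trough) / trough)
    return best


def check_in_sample(prices, trade_count, min_trades=400, regime_move=0.20):
    """Return a dict of simple pass/fail checks for an in-sample segment.

    The 20% regime threshold is an assumed rule of thumb, not a standard
    taken from the article.
    """
    return {
        "enough_trades": trade_count >= min_trades,
        "saw_bear_market": max_drawdown(prices) >= regime_move,
        "saw_bull_market": max_runup(prices) >= regime_move,
    }


# Toy closing prices: one rally, one sell-off, one recovery.
mixed = [100, 130, 95, 120]
only_up = [100, 110, 120, 130]

print(check_in_sample(mixed, trade_count=450))
print(check_in_sample(only_up, trade_count=450))
```

The second call flags the problem described above: plenty of trades, but the segment never saw a bear market.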
Limit Degrees of Freedom
Degrees of freedom is a fancy term for the number of values in a model that are free to vary. How much freedom is too much depends on the number of parameters in the model relative to the number of observations. We're not going to get into the details of degrees of freedom, and we'll keep this very practical. We generally want to keep the degrees of freedom as low as possible.
As strategy traders, we can count the degrees of freedom by counting the optimized parameters used by our strategy. The more degrees of freedom a strategy has, the more likely it is to overfit. Adding more filters and parameters to optimize allows your strategy to train on the noise in the data. Patterns found in noise will not predictably repeat.
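The counting can be sketched in a few lines of Python. The two optimization grids below are hypothetical (the parameter names and ranges are invented for illustration). The point is how quickly the number of tested combinations grows as parameters are added, and how few trades are left to support each parameter.

```python
from math import prod

# Hypothetical optimization grids: parameter name -> candidate values.
simple_grid = {
    "lookback": range(10, 60, 10),         # 5 candidate values
}
complex_grid = {
    "lookback": range(10, 60, 10),         # 5 candidate values
    "rsi_threshold": range(20, 45, 5),     # 5 candidate values
    "atr_multiple": [1.0, 1.5, 2.0, 2.5],  # 4 candidate values
    "exit_bars": range(1, 11),             # 10 candidate values
}


def degrees_of_freedom(grid):
    """Count parameters that are actually optimized (more than one candidate)."""
    return sum(1 for values in grid.values() if len(values) > 1)


def combinations_tested(grid):
    """Size of the full grid: every extra parameter multiplies the search."""
    return prod(len(values) for values in grid.values())


def trades_per_parameter(trade_count, grid):
    """In-sample trades available per optimized parameter."""
    return trade_count / degrees_of_freedom(grid)


for name, grid in [("simple", simple_grid), ("complex", complex_grid)]:
    print(
        name,
        "| optimized parameters:", degrees_of_freedom(grid),
        "| combinations:", combinations_tested(grid),
        "| trades per parameter (400 trades):", trades_per_parameter(400, grid),
    )
```

With more combinations to choose from, it becomes more likely that the best one is best by luck rather than because of a real edge.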
Don't Attempt to Build The Perfect-Looking Equity Curve
Don't build the perfect-looking equity curve; that's what average traders do. Average traders believe that making a great-looking equity curve is what good strategy builders do.
You're mistaken if you're under the impression that adding several filters or optimizing many parameters will improve your strategy. Many new traders desire to build a very complex trading system, but complexity is not what we're aiming for.
Instead, we're aiming for simplicity. A simpler system is often better able to generalize to new data. It can pick out a market edge and avoid the noise.
Don't Spend Too Much Time On One System
This technique is called early stopping, another fancy term borrowed from machine learning. The problem is that a model can train for too long and start to overfit. What does this mean for you and me as system traders? Don't spend too much time tweaking a strategy.
A good strategy will work right away. You're probably overfitting if you spend many hours daily on a single strategy. If it's not working right away, move on.
As Kevin Davey put it, don't torture your data! I agree.
Summary:
When building trading systems, we want to develop simple strategies that generalize better to out-of-sample data. We want to build systems that generate signals from market edges, not market noise.
Your historical price data:
- Have enough in-sample data to generate hundreds of trades (400+) for your key idea
- Your historical data should span different market regimes (bull/bear)
- I like to use 2007-2020 as my in-sample
When building a trading system, keep these points in mind:
- Try to keep your system as simple as possible
- Don't attempt to build the perfect-looking equity curve
- Limit yourself to 1-3 filters (rules)
- Key Idea + Rule 1 + Rule 2 + Rule 3
- Limit yourself to 1-3 parameters for optimization
By following these steps, you will significantly reduce the chances of overfitting your strategy, which means it is more likely to work going forward!
You have neatly covered some of the most important considerations for how traders should avoid overfitting their systems – nicely done Jeff!
I would also add that the way we do our optimization and which parameter values we pick also has a huge impact on whether our system ends up overfit… too many traders just pick the ‘best’ values and assume it will continue to work the same in the future.
Hello Adrian. This is a good point that I did not bring up. I like to pick a value within a “stable range.” Results for the values around the one I pick should not be radically different. Often this is not the “best” value.
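One way to put the "stable range" idea into code is sketched below. This is only an illustration under my own assumptions: the optimization results are made-up numbers, and scoring each value by the average profit of its immediate neighbors is just one possible definition of "stable."

```python
# Hypothetical optimization results: lookback length -> in-sample net profit.
net_profit_by_length = {
    5: 10, 6: 12, 7: 95, 8: 11,            # 7 is an isolated spike
    9: 40, 10: 55, 11: 60, 12: 58, 13: 50,  # a broad plateau
    14: 30, 15: 20,
}


def neighborhood_score(results, value, width=1):
    """Average profit over `value` and its neighbors within `width` steps."""
    nearby = [profit for v, profit in results.items() if abs(v - value) <= width]
    return sum(nearby) / len(nearby)


def pick_stable_value(results, width=1):
    """Pick the value whose neighborhood performs best on average,
    instead of the single best-performing value."""
    return max(results, key=lambda v: neighborhood_score(results, v, width))


peak_value = max(net_profit_by_length, key=net_profit_by_length.get)
stable_value = pick_stable_value(net_profit_by_length)

print("highest single result at length:", peak_value)
print("most stable neighborhood at length:", stable_value)
```

In this toy data the single best result sits at a length whose neighbors perform poorly, while the stable pick sits in the middle of a plateau where nearby values give similar results.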