When you review the performance of a trading model, how do you know it’s worth trading? How do you know it’s the right system for you? How confident are you that it will continue to profit in the future? When it comes to evaluating a trading model, there are many factors to take into account. Some are obvious, such as net profit and risk per trade. Others may be less familiar, such as the Sharpe ratio or profit factor. This is the first of several articles in which I will highlight a method or idea that can help you gauge the quality of a given trading model. In this article I would like to highlight a statistics-based metric that can help indicate the likelihood that a given system will continue to generate profits in the future.

Many people simply look at the net profit of a trading model, assuming the system with more profit must be the better system. This is often far from true. More profit may also mean more risk, deeper drawdowns, or other compromises made to achieve those higher results. When testing trading models during development, or reviewing a commercially available system before making a purchase, it is advisable to have a few metrics on hand that will allow you to make a wise choice. No single score will give you a definitive answer. Furthermore, everyone has unique risk tolerances and expectations about what is considered tradable. Yet we can make smarter choices than simply looking at net profit. Here is one method you should be aware of.

### Confidence Interval

It’s easy to find a trading model with a positive average profit of $100 per trade and conclude it should be profitable in the future. But is there a metric we can use to help predict what might actually happen? A more sophisticated approach would be Monte Carlo analysis, but not everyone has access to Monte Carlo tools. We all have access to a simple calculator, however. By borrowing a topic from statistics called the confidence interval (CI), we can get a hint at what’s possible and perhaps find weaknesses in our seemingly profitable trading model.

The average net profit of a trading model is simply the average of the P&L of each historical trade over a given time period. Let’s imagine a trading model that has produced 60 trades. Some of the trades are winners and some are losers. We add up the P&L of each individual trade, divide by the number of trades – 60 in this case – and get $100. Clearly this is well above zero, so in the long run this system appears profitable.

However, we also know that individual trades can differ greatly from the average profit per trade. Some winning trades are much larger than others, and the losing trades cover a range of sizes as well. If we graph each trade’s P&L and draw a line representing the average, we would see the individual trades scattered around the mean value of $100. In other words, the P&L of any single trade varies around this mean. We can measure this variation and use it to estimate the likelihood that the system will remain profitable.

Statistically speaking, a trading model that exhibits a large standard deviation of profit per trade has an increased chance of failing in the future. This is true even if the mean is currently profitable. But what makes a standard deviation too large? This is explained below, where we use our confidence interval to estimate a likely range of average P&L values going forward.

What we wish to do with our confidence interval is estimate, with 95% confidence, whether our system could produce a negative average P&L in the future. In other words, is it likely that our seemingly profitable trading model is based upon chance? We can estimate this with the CI formula.

CI = t * SD / sqrt( N )

CI = Confidence Interval

t = T-score (we estimate this value to be 2; the reasoning behind this is beyond the scope of this article)

SD = P&L Standard Deviation for all trades

N = number of trades in our sample

With our imaginary trading model we have a $100 average net profit and 60 trades in our sample. Please note that in order for this method to work, you must have a minimum of 60 trades in your sample. Let’s also state the standard deviation for all trades is $400. With this information we can compute our 95% confidence interval.

CI = 2 * $400 / sqrt( 60 )

CI = $800 / 7.746

CI = $103.28

For simplicity let’s round the confidence interval to the nearest dollar which is $103. What do we do with this value? We create a range or band around our average net profit value of $100 by both adding and subtracting the confidence interval value.

upper band = Average Net Profit + CI = $203

lower band = Average Net Profit - CI = -$3

We have now created a range of -$3 to $203 for our average net profit. What does this mean? Based on our calculation, we have estimated with 95% confidence that our trading model’s average net profit could be as low as -$3 or as high as $203. The important number is the lower band because it represents a worst-case scenario. In our example the value is negative, which indicates a losing model, or at least a potential losing scenario.
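The worked example above can be sketched in a few lines of Python. The numbers mirror the article’s example (average trade of $100, standard deviation of $400, 60 trades, t-score of 2); the function name is mine, not part of the original method.

```python
import math

def confidence_band(avg_trade, std_dev, num_trades, t_score=2.0):
    """Return the (lower, upper) band around the average trade P&L
    using CI = t * SD / sqrt(N)."""
    ci = t_score * std_dev / math.sqrt(num_trades)
    return avg_trade - ci, avg_trade + ci

lower, upper = confidence_band(avg_trade=100.0, std_dev=400.0, num_trades=60)
print(round(lower), round(upper))  # prints: -3 203
```

A negative lower band, as here, is the warning sign discussed above: the model’s profitability may be due to chance.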

In short, our hypothetical trading model’s average net profit could be based upon chance, and in the future the model could produce a negative P&L. Suddenly, what seemed like a solid system appears much shakier. Does this mean our trading model should be abandoned? Not necessarily.

In the confidence interval there are two critical factors at play: the number of trades (N) and the standard deviation of the trade P&L. The standard deviation can be modified by altering the trading model’s logic. Changing stops, targets, and other trading rules will change the standard deviation. The goal would be to tighten the variation between trades, reducing the standard deviation and, in turn, producing a smaller confidence interval. However, if you don’t want to modify the system, or are unable to, there is another way.

Our example system was based on 60 trades, which is really not a lot. Let’s say we find more data to test our system and get up to 100 trades. Let’s also pretend all the other performance figures stay the same. If we recalculate, the confidence interval is now $80, giving a range of $20 to $180. In this case the system produces a positive value for the lower band. So before we pass judgment on a system that appears borderline, perhaps we should collect more trades first.
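A quick sketch of the sample-size effect: the loop below holds the average trade, standard deviation, and t-score at the article’s example values and varies only the trade count.

```python
import math

avg_trade, std_dev, t_score = 100.0, 400.0, 2.0

# Same statistics, two different sample sizes.
for n in (60, 100):
    ci = t_score * std_dev / math.sqrt(n)
    print(n, round(avg_trade - ci), round(avg_trade + ci))
# prints: 60 -3 203
#         100 20 180
```

Nothing about the system changed; only the amount of evidence did, and that alone moved the lower band from negative to positive.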

I should also point out that the imaginary trading model we are looking at has a $30 deduction for each trade to account for slippage and commissions, so this negative effect is already factored into our confidence interval calculation. If we had not taken slippage and commissions into account during back-testing, we would have to deduct them from our final range, which would give us -$10 and $150. The impact of commissions and slippage would put us right back into negative territory again. But we have them accounted for in our back-tested results.

As you can see, having enough data points (trades) can have a significant impact on the confidence interval calculation. For system trading there are many reasons to have a large number of trades. Of course, continuing to add more and more trades is not going to turn a losing system into a good one. The point is that sometimes you need more data before making an informed decision. If you have what you believe is a good system, yet you only have a few data points, the confidence interval calculation may be warning you to get more trades into the test sample.

### What Confidence Interval Does Not Tell Us and How To Use It

Our confidence interval calculation makes an assumption about the data: that the data points (trades) follow a normal distribution. This is not necessarily correct. In many cases a trade distribution looks somewhat normal but with fat tails. Thus, we must take our confidence interval results with a grain of salt.

A bigger issue, I think, is that the confidence interval calculation does not indicate whether the system has been curve-fit. If we have a killer system with 1,000 trades and a confidence interval range of $100 to $200, that’s great. However, it’s pointless if the system is curve-fit to the historical data, and there is no way our confidence interval calculation can tell us. So, what do we do?

Ideally, you would validate your new trading model on out-of-sample data to help detect whether it is over-fit to the historical data. You would then also perform the confidence interval calculation on the out-of-sample results. Doing this reduces the chance of trading an over-fit model and gives the confidence interval test much more meaning.

But even if we have a solid system that is not curve-fit to the historical data, our confidence interval calculations are no guarantee of future success. The markets are dynamic and ever-changing, and it’s possible the distribution of trades will shift, altering our average trade and standard deviation. In the end, even if our system looks great on paper, validates on out-of-sample data, and has a fantastic confidence interval, the model could fail as soon as we trade it live. If this is the case, what is the point of all this testing, and is it worth doing? The short answer is yes.

In trading there is never a guarantee of future results. The point of testing a system is not to prove how much money it will make in the future. The goal is to find reasons not to trade it. The purpose of any validation test is to find weaknesses so we can address them now, before we have money on the line. Our job as professional system traders is to manage risk, which means eliminating risky actions. There is no certainty. This is one of the reasons this field is so psychologically difficult.

As a final point, the confidence interval gives us another tool for finding weaknesses and, ultimately, more confidence that a particular trading model is likely to bring us success in the future.

Good article but your CI calculation assumes a normal distribution of returns which is not always the case.

Bob

This is true. Nor are future samples guaranteed to be identical in distribution.

Also, try to give credit where it is due. You know what I mean… Most of the ideas you express here you have read somewhere. It’s good to give credit to those who deserve it, those who have done the hard work long before you.

Bob

Of course. I did not create CI, nor was I the first to apply it to trading. In fact, using CI is a very common technique that I’ve read about in several books and websites. What I’m showing here is what CI is and how I personally utilize it.

Killer article Jeff! I really love the way you think and view systems. Many, Many Thanks!

You’re welcome Mike. Glad you found it helpful.

Hi Jeff…

Very nice article…

Quick Question:

In the formula for CI there is “T-score” value in it.

How does one compute it? Or why did you use 2 in your example?

Could you point me in the right direction regarding T-score computation?

Thanks.

Formula: “t = sqrt(number of trades) * avg trade / std deviation of avg trade”

For the T-score we like to see a value above 1.6. For the example above I rounded it up to 2.0. The higher the value, the more statistically significant the results. Using 2.0 raises the bar to a very high level.
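As a quick sketch of that formula in Python, plugging in the numbers from the article’s example (the function name is mine):

```python
import math

def t_score(avg_trade, std_dev, num_trades):
    # t = sqrt(number of trades) * avg trade / standard deviation of trade P&L
    return math.sqrt(num_trades) * avg_trade / std_dev

t = t_score(avg_trade=100.0, std_dev=400.0, num_trades=60)
print(round(t, 2))  # prints: 1.94, which I round up to the 2.0 used above
```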

This is a worthwhile approach, for both the testing of a strategy and also for after it has been trading for quite a while. Comparing it to other systems over the same period of time may also be quite revealing.

I do not think that the tests will give all of the answers until there is quite a history.

Thanks for this.

Nice article Jeff. Lots of people want to develop a metric that “predicts” future performance. I’ve looked at tons of metrics, and nothing (yet) stands out as being predictive. Plus, I have a tough “gold standard” test – can the potential predictor metric identify an overoptimized, curve fit model, where all the performance metrics look super? We know that kind of strategy will perform poorly in the future. But no performance metric will say that.