Backtesting – Helping you Master EasyLanguage

9 Mistakes Quants Make that Cause Backtests to Lie

EasyLanguage Mastery Contributor — Mon, 02 Nov 2020 11:00:32 +0000

“I’ve never seen a bad backtest” -- Dimitris Melas, head of research at MSCI.

A backtest is a simulation of a trading strategy used to evaluate how effective the strategy might have been if it had been traded historically. Backtesting is used by hedge funds and other researchers to test strategies before real capital is applied. Backtests are valuable because they enable quants to quickly test and reject trading strategy ideas.

All too often strategies look great in simulation but fail to live up to their promise in live trading. There are a number of reasons for these failures, some of which are beyond the control of a quant developer. But other failures are caused by common, insidious mistakes.

An overly optimistic backtest can cause a lot of pain. I’d like to help you avoid that pain by sharing 9 of the most common pitfalls in trading strategy development and testing that can result in overly optimistic backtests:

1. In-sample backtesting

Many strategies require refinement, or model training of some sort. As one example, a regression-based model that seeks to predict future prices might use recent data to build the model. It is perfectly fine to build a model in that manner, but it is not OK to test the model over that same time period. Such models are doomed to succeed.

Don’t trust them.

Solution: Best practices are to build procedures to prevent testing over the same data you train over. As a simple example you might use data from 2007 to train your model, but test over 2008-forward.

By the way, even though it could be called “out-of-sample” testing it is not a good practice to train over later data, say 2014, then test over earlier data, say 2008-2013. This may permit various forms of lookahead bias.
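As a minimal sketch of the procedure described above, the split can be enforced in code so that no test row ever precedes a training row. The function name, the row layout, and the sample prices are illustrative assumptions, not part of any particular backtesting package.

```python
from datetime import date

def chronological_split(rows, split_date):
    """Split (date, value) rows into training rows strictly before
    split_date and test rows on or after it."""
    train = [r for r in rows if r[0] < split_date]
    test = [r for r in rows if r[0] >= split_date]
    # Guard: every training date must come before every test date.
    if train and test:
        assert max(r[0] for r in train) < min(r[0] for r in test)
    return train, test

# Hypothetical prices: train on 2007, test on 2008 forward.
rows = [
    (date(2007, 6, 1), 100.0),
    (date(2007, 12, 3), 104.0),
    (date(2008, 1, 2), 101.0),
    (date(2008, 7, 1), 97.0),
]
train, test = chronological_split(rows, date(2008, 1, 1))
```

Splitting on a single date, rather than sampling rows at random, keeps the training period entirely earlier than the test period.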

2. Using survivor-biased data

Suppose I told you I have created a fantastic new blood pressure medicine, and that I had tested it using the following protocol:

a. Randomly select 500 subjects
b. Administer my drug to them every day for 5 years
c. Measure their blood pressure each day

At the beginning of the study the average blood pressure of the participants was 160/110, at the end of the study the average BP was 120/80 (significantly lower and better).

Those look like great results, no? What if I told you that 58 of the subjects died during the study? Maybe it was the ones with the high blood pressure that died! This is clearly not an accurate study because it focused on the statistics of survivors at the end of the study.

This same sort of bias is present in backtests that use later lists of stocks (perhaps members of the S&P 500) as the basis for historical evaluations over earlier periods. A common example is to use the current S&P 500 as the universe of stocks for testing a strategy.

Why is this bad? See the two figures below for illustrative examples.

Figure: The green lines show historical performance of stocks that were members of the S&P 500 in 2012. Note that all of these stocks came out of the 2008/2009 downturn very nicely.

Figure: What really happened: If, instead, we use the members of the S&P 500 starting in 2008, we find that more than 10% of the listed companies failed.

In our work at Lucena Research, we see an annual 3% to 5% performance “improvement” with strategies using survivor-biased data.

Solution: Find datasets that include historical members of indices, then use those lists to sample from for your strategies.
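One way to picture a point-in-time universe is as a lookup over membership intervals. The sketch below assumes a hypothetical table of (ticker, date added, date removed) records; the tickers and dates are made up for illustration and do not describe the real index.

```python
from datetime import date

# Hypothetical membership records: (ticker, date added, date removed or None).
MEMBERSHIP = [
    ("AAA", date(2000, 1, 3), None),
    ("BBB", date(2003, 5, 1), date(2009, 3, 2)),  # left the index in 2009
    ("CCC", date(2010, 6, 1), None),              # joined after 2008
]

def members_on(day, membership=MEMBERSHIP):
    """Return the tickers that were index members on `day`."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= day and (removed is None or day < removed)
    )

universe_2008 = members_on(date(2008, 1, 2))  # includes BBB, which later dropped out
universe_2012 = members_on(date(2012, 1, 3))  # the survivors plus later additions
```

A backtest that starts in 2008 would sample from universe_2008, so the company that later left the index stays in the test.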

3. Observing the close & other forms of lookahead bias

In this failure mode, the quant assumes he can observe market closing prices in order to compute an indicator, and then also trade at the close. As an example, one might use closing price/volume to calculate a technical factor used in the strategy, then trade based on that information.

This is a specific example of lookahead bias in which the strategy is allowed to peek a little bit into the future. In my work I have seen time and again that even a slight lookahead bias can provide fantastic (and false) returns.

Other examples of lookahead bias have to do with incorrect registration of data such as earnings reports or news. One example is assuming that one can trade on the same day earnings are announced, even though earnings are usually announced after the close.

Solution: Don’t trade until the open of the next day after information becomes available.
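A minimal sketch of that rule, assuming daily bars and a signal computed from data through each day's close. The bar layout and the function name are illustrative assumptions.

```python
def next_open_fills(bars, signals):
    """bars: list of dicts with 'open' and 'close', one per trading day.
    signals: list of booleans; signals[i] uses data through the close of day i.
    Returns (signal_day, fill_day, fill_price), filling at the next day's open."""
    fills = []
    for i, fire in enumerate(signals):
        # A signal on the last available bar cannot be filled yet.
        if fire and i + 1 < len(bars):
            fills.append((i, i + 1, bars[i + 1]["open"]))
    return fills

bars = [
    {"open": 10.0, "close": 10.5},
    {"open": 10.7, "close": 10.6},
    {"open": 10.4, "close": 10.9},
]
signals = [True, False, True]
fills = next_open_fills(bars, signals)
```

The fill price is never taken from the same bar that produced the signal, which is the peek the text warns about.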

4. Ignoring market impact

The very act of trading affects price. Historical pricing data does not include your trades and is therefore not an accurate representation of the price you would get if you were trading.

Consider the chart below that describes the performance of a real strategy I helped develop. Consider the region A, the first part of the upwardly sloping orange line. This region was the performance of our backtest. The strategy had a Sharpe Ratio over 7.0! Based on the information we had up until that time (the end of A), it looked great so we started trading it.

When we began live trading we saw the real performance illustrated with the green “live” line in region B: essentially flat. The strategy was not working, so we halted trading it after a few weeks. After we stopped trading it, the strategy started performing well again in paper trading (Region C, Arg!).

How can this be? We thought perhaps that the error was in our predictive model, so we backtested again over the “live” area and the backtest showed that same flat area. The only difference between the nice 7.0 Sharpe Ratio sections and the flat section was that we were engaged in the market in the flat region.

What was going on? The answer, very simply, is that by participating in the market we were changing the prices to our disadvantage. We were not modeling market impact in our market simulation. Once we added that feature more accurately, our backtest appropriately showed a flat, no-return result for region A. If we had had that in the first place we probably would never have traded the strategy.

Solution: Be sure to anticipate that price will move against you at every trade. For trades that are a small part of overall volume, a rule of thumb is about 5 bps for S&P 500 stocks and up to 50 bps for more thinly traded stocks. It depends of course on how much of the market your strategy is seeking to trade.
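The rule of thumb can be sketched as a fixed adverse price adjustment on every fill. This is a deliberately crude model under the stated basis-point assumption, not the market simulator described above; the function names are hypothetical.

```python
def fill_price(quoted_price, side, impact_bps):
    """Adjust a historical price by an assumed adverse move.
    side is +1 for a buy and -1 for a sell; impact_bps is in basis points."""
    return quoted_price * (1 + side * impact_bps / 10_000)

def round_trip_return(entry, exit_, impact_bps):
    """Return of a long trade after paying impact on both entry and exit."""
    buy = fill_price(entry, +1, impact_bps)
    sell = fill_price(exit_, -1, impact_bps)
    return sell / buy - 1

# A 20 bps gross edge looks fine with no impact but turns negative at 50 bps per side.
gross = round_trip_return(100.0, 100.2, 0)
thin_stock = round_trip_return(100.0, 100.2, 50)
```

A constant cost per trade ignores order size, so it is best read as a floor on the friction a strategy has to overcome.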

5. Buy $10M of a $1M company

Naïve backtesters will allow a strategy to buy or sell as much of an asset as it likes. This may provide a misleadingly optimistic backtest because large allocations to small companies are allowed.

There often is real alpha in thinly traded stocks, and data mining approaches are likely to find it. Consider for a moment why it seems there is alpha there. The reason is that the big hedge funds aren’t playing there because they can’t execute their strategy with illiquid assets. There are perhaps scraps of alpha to be collected by the little guy, but check to be sure you’re not assuming you can buy $10M of a $1M company.

Solution: Have your backtester limit the strategy’s trading to a percentage of the daily dollar volume of the equity. Another alternative is to filter potential assets to a minimum daily dollar volume.
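A minimal sketch of the volume cap. The 1% participation limit is an illustrative assumption, not a recommendation, and the function name is hypothetical.

```python
def capped_order_dollars(desired_dollars, avg_daily_dollar_volume,
                         max_participation=0.01):
    """Limit an order to a fraction of average daily dollar volume.
    max_participation=0.01 (1%) is an assumed value for illustration."""
    cap = max_participation * avg_daily_dollar_volume
    return min(desired_dollars, cap)

# Trying to put $10M into a stock that trades $2M a day gets cut to the cap.
order = capped_order_dollars(10_000_000, 2_000_000)
```

The alternative in the text, filtering by minimum daily dollar volume, would simply drop such a stock from the universe instead of shrinking the order.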

6. Overfit the model

An overfit model is one that models in-sample data very well. It predicts the data so well that it is likely modeling noise rather than the underlying principle or relationship in the data that you are hoping it will discover.

Here’s a more formal definition of overfitting: As the degrees of freedom of the model increase, overfitting occurs when in-sample prediction error decreases and out-of-sample prediction error increases.

What do we mean by “degrees of freedom?” Degrees of freedom can take many forms, depending on the type of model being created: Number of factors used, number of parameters in a parameterized model and so on.

Solution: Don’t repeatedly “tweak” and “refine” your model using in-sample data. And always compare in-sample error versus out-of-sample error.

7. Trust complex models

Complex models are often overfit models. Simple approaches that arise from a basic idea that makes intuitive sense lead to the best models. A strategy built from a handful of factors combined with simple rules is more likely to be robust and less sensitive to overfitting than a complex model with lots of factors.

Solution: Limit the number of factors considered by a model, and use simple logic in combining them.

8. Trusting stateful strategy luck

A stateful strategy is one whose holdings over time depend on which day in history it was started. As an example, if the strategy rapidly accrues assets, it may be quickly fully invested and therefore miss later buying opportunities. If the strategy had started one day later, its holdings might be completely different.

Sometimes the success of such strategies varies widely if they are started on a different day. I’ve seen, for instance, a difference of 50% in return for the same strategy started on two days in the same week.

Solution: If your strategy is stateful, be sure to test it starting on many different days. Evaluate the variance of the results across those days. If it is large you should be concerned.
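The check can be sketched as a loop over start days. Here run_backtest is a hypothetical callable that maps a start day to a total return, and the toy results exist only to make the sketch runnable.

```python
import statistics

def start_day_sensitivity(run_backtest, start_days):
    """Run the same strategy from several start days and summarize the spread.
    run_backtest is a hypothetical callable: start day -> total return."""
    returns = [run_backtest(day) for day in start_days]
    return {
        "min": min(returns),
        "max": max(returns),
        "range": max(returns) - min(returns),
        "stdev": statistics.pstdev(returns),
    }

# Toy stand-in for a real backtest: returns for five consecutive start days.
toy_results = {0: 0.62, 1: 0.12, 2: 0.35, 3: 0.30, 4: 0.41}
summary = start_day_sensitivity(toy_results.get, range(5))
```

A wide range relative to the typical return suggests the headline result owes a lot to the starting day.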

9. Data mining fallacy

Even if you avoid all of the pitfalls listed above, if you generate and test enough strategies you’ll eventually find one that works very well in a backtest. However, the quality of the strategy cannot be distinguished from a lucky random stock picker.

How can this pitfall be avoided? It can’t be avoided. However, you can and should forward test before committing significant capital.

Solution: Forward test (paper trade) a strategy before committing capital.

Summary

It is best to view backtesting as a method for rejecting strategies rather than as a method for validating strategies. One thing is for sure: If it doesn’t work in a backtest, it won’t work in real life. The converse is not true: Just because it works in a backtest does not mean you can expect it to work in live trading.

However, if you avoid the pitfalls listed above, your backtests stand a better chance of more accurately representing real life performance.

-- Tucker Balch from Augmented Trader

Measuring Success: Key Performance Metrics

Jeff Swanson — Mon, 21 Sep 2020 10:00:16 +0000

When you see the performance of a trading system, how do you know it's okay? How do you know it's the right system for you?

Many people look at the net profit, assuming the system with more profit must be the better system. This is often far from a good idea. When comparing trading systems during the development process or when comparing systems before making a purchase, it is nice to have a few metrics on hand that will allow you to compare the system either to a hypothetical benchmark or against another system.

There is no one single score you can use that will work for everyone since we all have unique risk tolerances and definitions on what we consider tradable. Likewise, not all scoring systems are equal or perform under all circumstances. However, in this article, I will talk about my favorite methods used to score and rank trading systems. These are the key system performance metrics that I use during the system development process.

Number of Trades

Any trading system should have a "significant" number of trades. What is significant? Well, that varies. For a swing system that takes no more than ten trades a year, having 100 trades is good. This represents about ten years of historical testing. As a given trading system starts to produce more trades per year, I would expect to see more trades utilized during backtesting.

Often you will read that 30 trades is a minimum to be statistically significant. Is that true? I'm not so sure as it depends upon when those trades took place. Market conditions change, and if those thirty trades do not take place in various market conditions, it may not be an excellent representation.

In the end, the more trades, the better. Intraday systems should have hundreds of trades. Longer-term swing systems should have over a hundred trades.

Profit Factor

While net profit can be a factor in your decision about a particular trading system, profit factor is often even more critical, in my opinion. The profit factor measures the efficiency of your trading system. It is calculated by dividing the gross profit of the winning trades by the gross loss of the losing trades. A profit factor of 1.5 indicates that three dollars are gained for every two dollars lost ($3 won / $2 lost = 1.5). A number above 1.0 means you are making money. I like to see a profit factor of 1.5 or higher.
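As a small sketch, the calculation over a list of per-trade profit and loss values looks like this. The function name and the handling of a system with no losing trades are assumptions.

```python
def profit_factor(trade_pnls):
    """Gross profit of winning trades divided by gross loss of losing trades."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, reported here as infinity.
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss

pf = profit_factor([300.0, -200.0])  # $3 won for every $2 lost -> 1.5
```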

Average Profit Per Trade

Like the profit factor, the average profit per trade tells me if a system is making enough money on each trade. When designing a trading system, I like to see, at an absolute minimum, an average profit per trade above $50 before commissions and slippage are deducted. If the average net profit is above $75 with commissions and slippage deducted, I feel even better. The higher the average profit per trade, the better.

Percent Winning Trades

I don't follow this too much. I make a note of it, but it's not all that important to me. The percent winning trades is simply the number of trades that generated a positive net profit divided by all trades taken. This factor can be important if you don't like to have a large string of losers. For example, longer-term trend-following systems can often be very profitable, but only have a win rate of 40% or less. Can you handle many losing trades? Maybe you are only comfortable with systems that tend to produce more winning trades than losing trades. If so, then a system with a win rate of 60% or higher would be better for you. Percent of winning trades is a psychological tolerance indicator that will vary between people.

Compounded Annual Growth Rate (CAGR)

This describes the growth as if it were a steady, fixed rate of return. We all know such a smooth ride does not happen when trading a system, yet this is a way to smooth your return over the trading period. Let's say your trading system produces a 5% CAGR over ten years, and over that same period you have a bank CD that also yields 5%. Does this make the CD a better investment? Maybe. One thing to keep in mind is this: the CAGR calculation does not consider the time your money is at risk. For example, while the trading system may be returning 5% CAGR over ten years, your money is only active in the market for a fraction of the time. It's sitting idle in your brokerage or futures account most of the time, waiting for the next trading signal. Remember, the 5% return in the CD is realized only if your money is locked away 100% of the time. With our example trading system, our cash is also freed up to be put to use in other instruments.

Risk-Adjusted Return (RAR)

This calculation takes into account the time your money is at risk in the market. This is done by taking the CAGR and dividing it by exposure. Exposure is the percentage of time (over the test period) that your money was actively in the market. I like to see a value of 50% or better.
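A short sketch of both calculations, using the standard compound-growth formula for CAGR and the division by exposure described above. The account values and the 10% exposure are made-up numbers for illustration.

```python
def cagr(start_equity, end_equity, years):
    """Compounded annual growth rate as a fraction (0.05 means 5%)."""
    return (end_equity / start_equity) ** (1 / years) - 1

def risk_adjusted_return(cagr_value, exposure):
    """CAGR divided by exposure, where exposure is the fraction of the
    test period the money was in the market (0 < exposure <= 1)."""
    return cagr_value / exposure

# Hypothetical system: a 5% CAGR over ten years while in the market 10% of the time.
growth = cagr(10_000, 10_000 * 1.05 ** 10, 10)
rar = risk_adjusted_return(growth, 0.10)
```

With these numbers the system and the CD share the same CAGR, but the system's RAR comes out near 50% because its capital was exposed only a tenth of the time.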

Maximum Intraday Drawdown and The Equity Curve

How significant are those drawdowns? Can I mentally handle such a drawdown? Along these lines, I also look at the shape of the equity curve. I think looking at the equity curve can give me an excellent feel for a trading system. When I look at an equity curve, I ask, does it climb with shallow pullbacks, or does it have steep pullbacks? Are there long extended periods with no new equity highs? Ideally, the equity curve should rise as time goes by, creating new equity highs with shallow pullbacks. I always try to imagine what it would be like to be trading that system. Could I handle it?

Net Profit / Drawdown

I like this metric. I'll often use it as a fitness function as well. I like this because it incorporates both profit and drawdown. We all want to make more money with the smallest amount of drawdown, and this metric helps clarify that. The higher the number, the better.
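A minimal sketch of the ratio, computed from an equity curve sampled at the close of each trade. That sampling is an assumption: an intraday drawdown figure would need intraday equity values. The function names are hypothetical.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-valley decline in dollars along the curve."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst

def profit_to_drawdown(equity_curve):
    """Net profit divided by maximum drawdown; higher is better."""
    net_profit = equity_curve[-1] - equity_curve[0]
    drawdown = max_drawdown(equity_curve)
    return float("inf") if drawdown == 0 else net_profit / drawdown

curve = [10_000, 11_000, 10_500, 12_000, 11_200, 13_000]
ratio = profit_to_drawdown(curve)  # 3,000 profit / 800 drawdown
```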

t-Test

This is one you don't see much of. The t-Test is a statistical test used to gauge how likely it is that your trading system's results occurred by chance alone. You would like to see a value greater than 1.6, which indicates the trading results are less likely to be based on luck. Any value below that suggests the trading results might be based upon chance. The t-Test value should be calculated with no fewer than 30 trades. Below is the t-Test calculation.

t = square root ( number of trades ) * ( average profit per trade / standard deviation of trades )
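The formula translates directly into a few lines of code. This sketch uses the sample standard deviation, which is an assumption since the formula does not say which form to use, and the five-trade list is only there to show the arithmetic.

```python
import math
import statistics

def t_score(trade_pnls):
    """sqrt(number of trades) * (average profit per trade / standard deviation)."""
    n = len(trade_pnls)
    if n < 2:
        raise ValueError("need at least two trades to compute a standard deviation")
    average = statistics.mean(trade_pnls)
    deviation = statistics.stdev(trade_pnls)  # sample standard deviation (assumed)
    return math.sqrt(n) * average / deviation

# Far fewer than the 30 trades the text calls for; illustration only.
t = t_score([100.0, -50.0, 200.0, -100.0, 150.0])
```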

Expectancy

Expectancy is a concept that was described in Van Tharp's book "Trade Your Way To Financial Freedom." Expectancy tells you, on average, how much you expect to make per dollar at risk. Expectancy might also be a value that you optimize when testing different strategy input combinations. While computing the actual expectancy of a trading system is beyond this article, it can be estimated with the following simple formula.

Expectancy = Average Net Profit Per Trade / | Average losing trade in dollars |

For those not too familiar with mathematics, the vertical lines around the “Average losing trade in dollars” indicate that the absolute value should be used. This means if the number is a negative value, we drop the negative sign, thus making the value positive.

Expectancy Score

This value is an annualized expectancy value that produces an objective number used to compare various trading systems. In essence, the Expectancy Score factors “opportunity” into the value by taking into account how frequently the given trading system produces trades. Thus, this score allows you to compare very different trading systems. The higher the expectancy score, the more profitable the system.

Expectancy Score = Expectancy * Number of Trades * 365 / Number of strategy trading days
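Both formulas can be sketched over a list of per-trade results. The function names are hypothetical, and strategy_days stands for the "number of strategy trading days" in the formula above.

```python
def expectancy(trade_pnls):
    """Average net profit per trade divided by |average losing trade|."""
    losses = [p for p in trade_pnls if p < 0]
    if not losses:
        raise ValueError("no losing trades, so the estimate is undefined")
    average_trade = sum(trade_pnls) / len(trade_pnls)
    average_loss = sum(losses) / len(losses)
    return average_trade / abs(average_loss)

def expectancy_score(trade_pnls, strategy_days):
    """Expectancy * number of trades * 365 / number of strategy trading days."""
    return expectancy(trade_pnls) * len(trade_pnls) * 365 / strategy_days

pnls = [100.0, -50.0, 200.0, -100.0, 150.0]
e = expectancy(pnls)                 # average trade 60 / average loss 75
score = expectancy_score(pnls, 730)  # the same five trades spread over 730 days
```

Because the score scales with trade frequency, a system with lower expectancy but many more trades per year can outrank one with higher expectancy.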

Conclusion

With the above values, we can get a decent picture of how the system will perform. Of course, there are other values you could evaluate, and there is even more you can do, such as passing the historical trades through a Monte Carlo simulator. But the values discussed in this article are the essential values I utilize when designing a system or when evaluating a third party trading system.

In-Sample and Out-Of-Sample Testing

EasyLanguage Mastery Contributor — Mon, 03 Feb 2020 11:00:11 +0000

I am frequently asked if I do out-of-sample testing. The short answer is not always, and when I do it is not how most people do the test. There are lots of considerations and pitfalls to avoid when doing out-of-sample testing. Out-of-sample testing is not the panacea it is made out to be. There are lots of grey areas which I will discuss below.

Definition

To do in-sample (IS) and out-of-sample (OOS) testing, one first divides their historical data into two parts. The most common methods for dividing the data are 50% IS/50% OOS and 67% IS/33% OOS. I will be using 15 years of data. Here are some ways that one can divide the data:

There are different reasons one may want the IS data to be the oldest data or the newest data which I will discuss below. The IS data is used to create and optimize your strategy. After refining your strategy, you choose one variation to test on the OOS sample data. From the OOS result, one must decide if the result is good enough to say that the strategy continued to work and was not overfit to the IS data. If it passes your criteria, one can then start trading the strategy. If it does not, well it is game over on that strategy.

When I don’t use OOS Testing

The most common reason not to use OOS testing is lack of data. I trade a VXX/XIV strategy that I can only test back to 2011. There are just not enough trades to justify breaking the data into two. How do I know that strategy is not overfit? I don’t. I often will use parameter sensitivity testing as described in this post. Here my real trading becomes the OOS test, which is the best OOS test one can have.

Out-of-sample Issues

The past is likely very different from today. Even though I have data back to the 1990’s, I don’t like going that far back for my data because those markets differ greatly from what we have today. That was the time before decimalization, high-frequency trading, government intervention, lots of ETFs, and many more changes. I will go with 15 years of data.

What is the size and period of OOS?

The first big decision to make is what period to use as the IS period and what period to use for the OOS, along with how big each will be. I use the most recent data as my IS period and the older data for the OOS period. The reason for this is I want to optimize my strategy on the most recent market because I believe that is more likely to be closer to the future market than the farther out period. I want to capture a full market cycle of bull and bear when doing this. As to size, I typically go with the 67/33 method. Next, I will show how the decision of which period to use for OOS can lead to very different results.
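A minimal sketch of the 67/33 split with the most recent data as in-sample. The list of years is hypothetical and the function name is an assumption; a real split would operate on bars or dates rather than whole years.

```python
def split_is_oos(periods, is_fraction=0.67, is_most_recent=True):
    """Split an ordered list of periods into (in_sample, out_of_sample)."""
    n_is = round(len(periods) * is_fraction)
    cut = len(periods) - n_is
    if is_most_recent:
        # Newest periods are in-sample; the older ones are held out.
        return periods[cut:], periods[:cut]
    return periods[:n_is], periods[n_is:]

years = list(range(2005, 2020))  # 15 hypothetical years of data
in_sample, out_of_sample = split_is_oos(years)
```

Flipping is_most_recent to False gives the more common arrangement, with the oldest data used for development.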

Only taking one bite of the apple

One of the core tenets of OOS testing is taking only one bite of the apple, meaning you only test once on the OOS data to see if your strategy holds up. This is hard to do because it is against human nature. You spend weeks or months developing a strategy, and then a quick test on the OOS data fails. No one wants to throw away all that work. But here is the more subtle thing. What is a second bite? Say I tested a mean reversion strategy and it failed in OOS. Does that mean I can test no more mean reversion strategies? Of course not, that would be crazy. How much does one need to change a current strategy to make it different enough to use that OOS data? Is changing parameter values to something one did not test enough? Not in my book.

What about adding a new rule, say a moving average cross? Still probably not enough. What if I remove a rule and add a profit target? Maybe. There is no clear threshold for how much a strategy must change before it is OK to use the data again. It is up to each individual to decide, and that means we are likely to fool ourselves and say that a small change is enough. Also, we have knowledge about our test period. I know from experience that mean reversion trading did well from 2003 to 2009. If I use 2002 to 2006 as my OOS data for a mean reversion test, is that cheating?

Defining success

The above issues are not even that big to me. These next two are huge ones for me. When we run our strategy on the OOS data, how do we know that it passed the OOS test? First, the strategy will probably not perform as well because it was developed on the current market conditions and now it is being tested on different ones. Just because it made money doesn’t mean it passed the OOS test. What if the CAGR only dropped a little? That sounds like a pass.

But what if that is the worst CAGR if you ran all your original 1,000 variations on the OOS data? Now it doesn’t sound good. How about a goal of beating a simple strategy? Buy the SPY when it crosses above its 200-day moving average and sell when it closes below it. Can we beat that CAGR by more than 100%? I use this idea and another which I will explain in the next post.

Picking only one variation

Now this is where I have my biggest issue with how OOS testing is normally done. For example, we have a strategy with 1,000 variations. We go through some methods to pick one to test on the OOS data. See this post on how I would pick The One. Now we run The One on the OOS data and make our decision. Say our strategy did “poorly.” That means we should stop and consider this a failure. Here is where I have my problem.

What if I got unlucky and picked a variation that did poorly during this timeframe, but many of my other choices did well? My strategy concept held up, just not the variation I picked. Calling that a failure is wrong, because what we want to know is whether our strategy concept holds up in OOS. Imagine the reverse scenario. Your strategy concept sucks and it behaves as well as choosing random entry and exit times. In this case, you could get “lucky” and pick a variation that does well in OOS data. Now you are off trading a strategy that is as good as random.

Final Thoughts

In the next post, I will show results of doing IS and OOS testing on a mean reversion strategy and how picking only one variation can be dangerous. Then I will show how using a set of variations is a much better way to help determine if your strategy did well in OOS testing. The post will also show how one can come to different conclusions depending on how the IS and OOS ranges are chosen.

— by Cesar Alvarez from blog Alvarez Quant Trading.

Are your backtest results fooling you?

EasyLanguage Mastery Contributor — Mon, 30 Dec 2019 11:00:13 +0000

Have you ever started trading a strategy that performs well in the backtests but delivers a very different result when you begin trading it with real money?

Could your backtest reports be fooling you by indicating a strategy is great but really only showing you part of the overall picture?

How do you give yourself a better chance of developing trading systems that are robust and perform well going forward?

Kevin Davey (not the guy pictured above!), World Cup Trading champion from kjtradingsystems.com, has been creating trading strategies for over 25 years. In Episode 5 of the BetterSystemTrader podcast he says:

It’s amazing how easy it is to create systems that you think are good that just fall apart. –...

To reduce the chance this could occur he completes Monte Carlo analysis on all his systems to ensure they are robust and meet his risk requirements BEFORE he puts his money on the line.

What is Monte Carlo analysis and how can it be used to improve your own trading results? Read on, we’re going to show you.

What is Monte Carlo analysis?

Monte Carlo analysis is a process that allows you to get a more accurate picture of the performance of a trading strategy beyond what a standard backtest report can provide.

A backtest report shows the results of a series of trades in a specific order, but the problem is that’s just history. You don’t know what’s going to happen going forward. What if a lot of losing trades all show up in a row? What type of drawdown will you experience? What’s the chance that you could get a drawdown larger than anticipated, or a string of losing trades longer than expected?

Monte Carlo analysis basically lets you scramble the order of the trades in a backtest to provide a better understanding of possible future performance, based on the assumption that future trades will have similar characteristics to historical trades but in an unknown order.

The results allow you to determine the probabilities of drawdown and profit levels and the chance your trading account could be completely wiped out.
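The scrambling idea can be sketched in a few lines. This is a simple shuffle without replacement under assumed names and defaults; it is not a description of how the Excel tool discussed below works internally.

```python
import random
import statistics

def max_drawdown_pct(pnls, start_equity):
    """Largest peak-to-valley decline, as a fraction of the peak."""
    equity = peak = start_equity
    worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def hits_stop(pnls, start_equity, stop_level):
    """True if equity ever falls below stop_level along this trade order."""
    equity = start_equity
    for pnl in pnls:
        equity += pnl
        if equity < stop_level:
            return True
    return False

def monte_carlo(pnls, start_equity, stop_level, runs=2500, seed=0):
    """Shuffle the trade order `runs` times and summarize the outcomes."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    order = list(pnls)
    drawdowns, ruined = [], 0
    for _ in range(runs):
        rng.shuffle(order)
        drawdowns.append(max_drawdown_pct(order, start_equity))
        ruined += hits_stop(order, start_equity, stop_level)
    return {
        "median_drawdown": statistics.median(drawdowns),
        "worst_drawdown": max(drawdowns),
        "risk_of_ruin": ruined / runs,
    }

# Hypothetical five-trade list; a real study would paste in the backtest's trades.
result = monte_carlo([100, -300, 200, -150, 400], 10_000, 9_000, runs=200)
```

Shuffling keeps the total profit of each run identical, so the runs differ only in the path, which is exactly the part a single backtest cannot show.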

Your worst drawdown is always in front of you. -- Kevin Davey

Is it really that important?

Yes, even the seasoned pros like Kevin use it and this is why:

I’ve actually found cases where the walk forward equity curve looked great – probably a lot of people just made the decision, “Hey, I’m going to trade it.” But when I ran the Monte Carlo simulation I found out that there was really a lot more risk in the system and it was a lot riskier than I had anticipated. So basically the amount of return that I was getting compared to the amount of risk I could have, that didn’t necessarily show up in that historical equity curve, was just too much for the profit I was getting and so I basically said, “Well I can’t trade this particular system.”

Using the Monte Carlo analysis tool

Kevin has kindly offered a free copy of the Monte Carlo analysis tool he’s developed in Excel, for all Better System Trader podcast listeners. There is a link to download the tool at the end of this article but let’s first see how it works and how to apply the results to our own trading.

When you open the simulator, there are a few values you need to enter based on your own personal trading parameters. (If it prompts you to enable the macros you will need to say yes otherwise the simulator won’t work).

To set up the simulator, enter your trading details in the light blue sections, starting in the top left with the base starting equity, the level at which you would stop trading the system if the account equity fell below it, and the average number of trades per year:

To enter your trades into the simulator press the ‘Clear’ button and paste the list of trade profit and loss in $ from your backtest report.

For this example we’ll use a list of 1,805 trades over 10.5 years. Based on a $10,000 starting balance the CAR is 31% and Maximum Drawdown is 11%, which results in quite a smooth equity curve:

The results may seem impressive but let’s run it through the Monte Carlo simulator. By adding the trades into the simulator and pressing the Calculate button, the simulator runs through the list of trades 2,500 times, randomizing the sequence of trades each time. We’ve set a starting equity of $10,000 to match the backtest and the stop trading level has been set to $8,000.

The results from the simulator are very interesting.

Analyzing the results

We’ve run the trade list through the Monte Carlo simulator and now it’s time to compare the results with the backtest:

The first thing to notice is that the Median Drawdown for the Monte Carlo simulations is 24.6%, whereas the backtest reported a Maximum Drawdown of 11%. How can this be?

By switching the order of the trades we’ve identified that the strategy actually contains more risk than the backtest report shows. The favorable sequence of trades in the backtest is understating actual risk!

Also, if the backtest report only produces a drawdown of 11% but the Monte Carlo Median Drawdown is 24.6%, there are likely sequences of trades that have produced 50% drawdowns or larger, much higher than the drawdown limit of 20%.

Note that trading this strategy with a $10,000 starting balance has a 33% chance that it will meet or exceed the 20% drawdown limit. This risk of ruin is much too high.

Applying the results

The Monte Carlo results have shown that starting with a $10,000 account and a 20% drawdown limit we have a 33% chance of ruin and the Median Drawdown of 24.6% is higher than our drawdown limit. What can we do about this?

Without adjusting the strategy rules or risk per trade it seems the best approach is to start with a higher account balance. By checking the yellow results table in the Monte Carlo simulator we can see that we should probably trade this strategy with $25,000 or higher:

Conclusion

We can now see the importance of Monte Carlo analysis in the system development process. This basic example has shown us how backtest results, which only show the performance of one order of trades, may not be showing the full picture.

By running the trade list through the Monte Carlo simulator we’ve determined:

The Maximum Drawdown value in the backtest report (-11%) was based on a favorable run of trades and was understating the actual risk of drawdowns, with the Monte Carlo simulations showing a Median Drawdown of -24.6%.
The Risk of Ruin when trading a $10,000 account size was 33%, much too risky to trade, therefore a larger account size or smaller trade risk would be required to reduce the possibility of ruin.

Download it

To get your free copy of the Excel document please visit this link and scroll to the bottom of the page. The Excel document is provided by Kevin Davey and Better System Trader. Be sure to listen to Kevin Davey’s interview on Better System Trader.

— by Andrew Swanscott from Better System Trader

Finding Out What Works, And What Doesn’t Work – Part II

Kevin Davey — Mon, 22 Oct 2018 10:00:25 +0000

In Part 1 of this series, found here, I ran a simple study of a trading system, in order to look at some of the common components (such as end of day closes, short versus long bar periods) that many system developers add to their trading system. Could some of these strategy building blocks be hurting their development efforts? To answer the question, I analyzed one trading system over 7 different markets for a 5 year period. Even though I was able to extract some conclusions from the study, I also said the following:

“Of course, I made these conclusions based on one study. What if the strategy was different? What if the timeframes or markets were different? What if different years were used for the test period? Will the conclusions reached here still hold? I’ll examine those questions in Part 2.”

To summarize the earlier work, the conclusions from Part 1 were:

Striving for profit in multiple markets, as a confirmation of a strategy, is indeed possible.
Swing trading is likely more profitable than intraday trading.
Longer timeframes are generally superior to short time periods, but this is not a major effect.

In this second part of the study, I’ll now look at a different group of markets, a completely different strategy and some different timeframes. I’ll also try to answer the following questions: Will the conclusions still hold up? What type of insights, if any, do conducting studies like these actually provide? Finally, can studies like these improve your own strategy development?

Ground Rules

Previously in Part 1, I tested 7 different markets across a range of commodities.

(Group 1):

Wheat (W)
10 Year Treasury Notes (TY)
Lean Hogs (LH)
Australian Dollar (AD)
Heating Oil (HO)
Cotton (CT)
the e-mini Nasdaq (NQ)

I picked these at random, one from each major market group. The test period was from 1/1/2007 to 12/31/2011, a 5-year period that includes some quiet, and some very chaotic, markets.

For Part 2, I will test another basket of commodities.

(Group 2):

Live Cattle (LC)
Canadian Dollar (CD)
Copper (HG)
Natural Gas (NG)
Sugar (SB)

I will also test the period 1/1/2012 to present, a smaller test period outside of the range previously tested.

Finally, I will assume $5 round turn commissions and $30 round turn slippage.

The Systems

I will use two extremely simple base strategies for the tests I am running: a simple breakout based on closing prices (Strategies A and B), or a reverse breakout (countertrend – Strategies C and D). In all cases, the breakout length X will be varied from 5 bars to 100 bars.

Here is the code for all four versions of the strategy:

Results – Repeat of Part 1 Test, Different Instruments

First, I will conduct a test of Strategies A and B for the same period, with the second group of commodities. To keep things manageable, only the group summary results are shown below:

Strategy A- Breakout, Swing Trading

$ Thousands ($K) Maximum Net Profit, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	$32	$42	$37	$38	$38	$39
Median Group 2	$24	$25	$27	$29	$37	$38

$ Thousands ($K) Maximum Drawdown for Max Net Profit case, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	$16	$19	$12	$16	$19	$14
Median Group 2	$27	$45	$33	$10	$15	$6

% of Iterations Profitable (100%=goal), 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	48.5	81	62.5	77	74.5	88
Median Group 2	24.5	18	25	30	69	60

Strategy B- Breakout, Exit End of the Day

$ Thousands ($K) Maximum Net Profit, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	$19	$14	$13	$8	$5	$4
Median Group 2	$27	$19	$21	$18	$12	$2

$ Thousands ($K) Maximum Drawdown for Max Net Profit case, 2007-2011

no data shown for cases where maximum net profit <$0

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	$27	$19	$21	$18	$12	$2
Median Group 2	$19	$26	$10	$8		$8

% of Iterations Profitable (100%=goal), 2007-2011

no data shown for cases where maximum net profit <$0

	5 min	15 min	30 min	60 min	120 min	daily
Median Group 1	58	90	97	62.5	100	98
Median Group 2		49	77	80

The results of Group 2 are generally in line with Group 1 results, which show that swing trading (strategy A) is better than intraday trading (Strategy B), and that longer timeframes are slightly better than short timeframe bars.

Results–Different Timeframes, Different Time Period, Different Strategy

Of course, concluding that long timeframe bars and swing trading are superior based on only a few tests is dangerous. So, let’s look at Strategies C & D (the opposite of Strategies A & B), and see if the results show the same tendencies for different timeframes and a different test period, on a random subset of Group 1 and Group 2 instruments.

Strategy C- Reverse Breakout, Swing Trading

$ Thousands ($K) Maximum Net Profit, 2012 - present

	10 min	45 min	90 min
W	$17	$19	$17
AD	$36	$27	$23
HG	$73	$67	$68
SB	$6	$1	$3
Median-Subset Group	$27	$23	$20

% of Iterations Profitable (100%=goal), 2012 - present

	10 min	45 min	90 min
W	50%	70%	46%
AD	88%	61%	36%
HG	83%	100%	80%
SB	0%	10%	22%
Median-Subset Group	67%	66%	41%

Strategy D- Reverse Breakout, Exit End of the day

$ Thousands ($K) Maximum Net Profit, 2012 - present

	10 min	45 min	90 min
W	$26	$11	$10
AD	$1	$1	$2
HG	$10	$9	$4
SB	$19	$12	$8
Median-Subset Group	$15	$10	$6

% of Iterations Profitable (100%=goal), 2012 - present

	10 min	45 min	90 min
W	0	0	0
AD	0	0	0
HG	0	0	0
SB	0	0	0
Median-Subset Group	0	0	0
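
The "Median-Subset Group" rows can be checked with a few lines of Python. This sketch uses the Strategy C values from the tables above; the dictionary and function names are only illustrative.

```python
from statistics import median

# Strategy C values copied from the tables above (10, 45, 90 minute bars).
net_profit_k = {"W": [17, 19, 17], "AD": [36, 27, 23],
                "HG": [73, 67, 68], "SB": [6, 1, 3]}
pct_profitable = {"W": [50, 70, 46], "AD": [88, 61, 36],
                  "HG": [83, 100, 80], "SB": [0, 10, 22]}

def column_medians(table):
    """Median of each bar-size column across all markets in the table."""
    return [median(col) for col in zip(*table.values())]

print(column_medians(net_profit_k))    # [26.5, 23.0, 20.0]
print(column_medians(pct_profitable))  # [66.5, 65.5, 41.0]
```

With four markets the median is the average of the two middle values, so the table entries of $27, 67% and 66% appear to be these figures rounded up.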

For timeframes, the results show a tendency for the smaller timeframes to be better than the longer timeframes, exactly the opposite of what was found earlier. But this effect is small, which could mean it is not actually true, and could be just an artifact of the limited testing. This is a good example of why it is dangerous to draw conclusions from only a few tests.

For the swing vs. intraday testing, the results are pretty clear – intraday trading is not nearly as good as swing trading. This agrees with the earlier conclusion.

Conclusion

Through both Part 1 and Part 2 of this study, a lot of tests have been run over various bar timeframes, instruments, strategies and time periods. Yet, it would be dangerous to make any sweeping conclusions based on these tests. Although these results show that intraday trading is less profitable than swing trading, there are surely cases where intraday trading is preferable over swing trading.

So, what can this study actually teach us? I think the following lessons come out of this study:

Intraday trading is less profitable than swing trading. This does not mean this always holds, but if you do most of your testing intraday and you are struggling to develop a strategy, perhaps opening up the strategy to allow swing trading is a good idea. It could be that you have a good swing strategy, but the end of day exit alone makes the strategy look bad.
Longer timeframes are better (Part 1 conclusion), except when they aren’t (Part 2 conclusion). So, be careful of making conclusions when the effect is small, or results are contradictory. Since many strategy developers tend to find something that does or does not work and then focus only on that (or avoid it), the lesson here is that the developer should not “box” themselves in. Maybe short timeframes rarely work with your strategies, but that does not mean you should never test them. There will be other strategies where shorter time bars are an asset.
Since a developer can never test everything, it may be worthwhile to vary your test approach. By looking at different timeframes, instruments, etc., you may discover new paths to take in your strategy development.

The key to successful strategy testing and development is to keep an open mind, and allow the data to point you in the right direction. At the same time, though, don’t let the data trick you into making false conclusions or sweeping generalizations. Remaining flexible in your development approach can lead to new and profitable strategies – ones that you might now be dismissing based solely on past testing.

If you would like to learn more about building trading systems be sure to get a copy of my latest book, Building Winning Algorithmic Trading Systems.

— Kevin J. Davey of KJ Trading Systems

System Performance and Confidence Interval

Jeff Swanson — Mon, 15 Oct 2018 10:00:25 +0000

When you review the performance of a trading model, how do you know it’s worth trading? How do you know it’s the right system for you? How confident are you that it will continue to profit in the future? When it comes to evaluating your trading model there are many factors to take into account. Some of them are obvious, such as Net Profit and Risk-Per-Trade. Others may be less familiar, such as Sharpe ratio or Profit Factor. This article is the first of several in which I highlight a method or idea that can help you gauge the quality of a given trading model. Here I would like to highlight a statistics-based metric that can help indicate the likelihood that a given system will continue to generate profits in the future.

Many people simply look at the net profit of a trading model assuming a system with more profit must be the better system. This is often far from a good idea. More profit may also mean more risk, deeper drawdown, or other compromises to achieve those higher results. When testing trading models during the development process or reviewing a commercially available system before making a purchase, it is advisable to have a few metrics on hand that will allow you to make a wise choice. There is no one single score that will give you the definitive answer. Furthermore, everyone has unique risk tolerances and expectations on what is considered tradable. Yet, we can make smarter choices than simply looking at net profit. Here is one method you should be aware of.

Confidence Interval

It’s easy to find a trading model that has a positive average profit of $100 and then conclude it could be profitable into the future. But is there a metric we can use to help us predict what might happen into the future? A complicated approach would be to use the Monte Carlo method, but not everyone has access to the tools for that. We all, however, have access to a simple calculator. By visiting a topic in statistics called Confidence Intervals (CI) we can obtain a hint at what’s possible and perhaps find weaknesses in our seemingly profitable trading model.

The average net profit of a trading model is simply the mean of the historical P&L of its trades over a given time period. Let’s imagine a trading model that has produced 60 trades. Some of the trades are winners and some are losers. We add together the P&L of each individual trade, divide by the number of trades – 60 in this case – and get $100. Clearly this is well above zero, so in the long run this system appears profitable.

However, we also know that individual trades can be very different from our average profit per trade. Some trades produce much larger winning trades while others produce smaller winning trades. Still, other trades produce a range of losing trades. If we graph each trade’s P&L and then draw a line representing our average P&L we would see each individual trade falls around our mean value of $100. In other words, the P&L for any trade will vary around this mean value. We can measure this variation and use it to estimate the likelihood the system might remain profitable.

Statistically speaking, a trading model that exhibits a large standard deviation of profit per trade will have an increased chance of failing in the future. This is true even if the mean is currently profitable. But what makes a standard deviation too large? This is explained below when we attempt to use our confidence interval to estimate a likely range of average P&L values into the future.

The green line is the average P&L. This line is plotted for each trade and we can see it’s angled downward showing a falling P&L as time goes by. Each blue plot is a single trade’s net P&L.

What we wish to do with our confidence interval is estimate, with 95% confidence, whether our system could produce a negative average P&L into the future. In other words, is it likely our seemingly profitable trading model is based upon chance? We can estimate this with our CI formula.

CI = t * SD / squareroot( N )
CI = Confidence Interval
t = T-score (we estimate value to be 2 and the reasoning behind this is beyond this article)
SD = P&L Standard Deviation for all trades
N = number of trades in our sample

With our imaginary trading model we have a $100 average net profit and 60 trades in our sample. Please note that in order for this method to work, you must have a minimum of 60 trades in your sample. Let’s also state the standard deviation for all trades is $400. With this information we can compute our 95% confidence interval.

CI = 2 * $400 / squareroot( 60 )
CI = $800 / 7.746
CI = $103.28

For simplicity let’s round the confidence interval to the nearest dollar which is $103. What do we do with this value? We create a range or band around our average net profit value of $100 by both adding and subtracting the confidence interval value.

upper band = Average Net Profit + CI = $203
lower band = Average Net profit - CI = -$3
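
The same arithmetic as a short Python sketch, using the example's figures and the article's rounded t-score of 2. The function and variable names are only illustrative.

```python
import math

def confidence_interval(sd, n, t=2.0):
    """Half-width of the band: t * SD / sqrt(N)."""
    return t * sd / math.sqrt(n)

avg_trade, sd, n = 100.0, 400.0, 60   # figures from the worked example
ci = confidence_interval(sd, n)
lower, upper = avg_trade - ci, avg_trade + ci
print(f"CI = {ci:.2f}, band = {lower:.2f} to {upper:.2f}")
```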

We have now created a range of -$3 to $203 for our average net profit. What does this mean? Based on our calculation we have estimated with 95% confidence that our trading model’s average net profit could be as low as -$3 or as high as $203. The important number is the lower band because this represents a worst case situation. In our example, we have a negative value which indicates a losing model. Or at least, a potential losing scenario.

In short, our hypothetical trading model’s average net profit could be based upon chance and in the future could produce a negative P&L. Suddenly what seems like a solid system appears more shaky. Does this mean our trading model should be abandoned? Not necessarily.

In the case of the confidence interval there are two critical factors at play. Those values are the number of trades (N) and the value of the standard deviation between trades. Modification of the standard deviation can be achieved by altering the trading model logic. Modifying stops, targets, and other trading rules will change the standard deviation value. The goal would be to tighten the variation of each trade to reduce the standard deviation. This in turn, would create a smaller confidence interval. However, if you don’t want to modify the system or if you are unable to modify the system there is another way.

Our example system was based on 60 trades. This is really not a lot of trades. Let’s say we find more data to test our system and we get up to 100 trades. Let’s also pretend all the other performance factors stay the same. If we recalculate our confidence interval value we now get a value of $80. This gives us a range of $20 to $180. In this case, we have a system which produces a positive value for the lower band. So maybe before we make a judgment on a system that appears borderline we should collect more trades first.
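
A small Python sketch shows how the lower band moves with the number of trades. It assumes, as the example does, that the $100 average and $400 standard deviation stay fixed as trades are added. The helper names are hypothetical.

```python
import math

def lower_band(avg_trade, sd, n, t=2.0):
    """Average trade minus the confidence interval half-width."""
    return avg_trade - t * sd / math.sqrt(n)

def min_trades_for_positive_band(avg_trade, sd, t=2.0):
    """Smallest N whose lower band is strictly above zero."""
    return math.floor((t * sd / avg_trade) ** 2) + 1

for n in (60, 100):
    print(n, round(lower_band(100, 400, n), 2))
print(min_trades_for_positive_band(100, 400))
```

Under those assumptions the lower band reaches zero at 64 trades, so 65 is the first sample size that gives a positive lower band.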

I should also point out that the imaginary trading model we are looking at has a $30 deduction for each trade to account for slippage and commissions. So this negative effect is already factored into our confidence interval calculation. If we had not taken slippage and commissions into account during our back-testing, we would have to deduct this from our final range, which would give us -$10 to $150. The impact of commissions and slippage would put us back into negative territory again. But we have them accounted for in our back-tested results.

As you can see having enough data points (trades) can have a significant impact on the confidence interval calculation. For system trading there are many reasons for having a large number of trades. Of course continuing to add more and more trades is not going to turn a losing system into a good system. The point here is sometimes you need to have more data before making an informed decision. If you have what you believe is a good system, yet you only have a few data points, the confidence interval calculation may be warning you to get more trades in the test sample.

What Confidence Interval Does Not Tell Us and How To Use It

Our confidence interval calculation makes some assumptions about the data. That is, the data points (trades) have a normal distribution. This of course is not necessarily correct. In many cases our trade distribution looks somewhat normal with fat tails in our distribution. Thus, we must take our confidence interval results with a grain of salt.

A bigger issue, I think, is that the confidence interval calculation does not indicate whether the system has been curve fitted. If we have a killer system with 1,000 trades and a confidence interval range of $100 – $200, that’s great. However, it’s pointless if the system is curve fitted to the historical data, and there is no way our confidence interval calculation can tell us. So, what do we do?

Ideally you would validate your new trading model on out-of-sample data to help detect if your model is over-fit to the historical data. Furthermore, you would also perform the confidence interval calculation on the out-of-sample data of your trading model. By doing this, you’ll have more meaningful results. This step will help reduce the possibility of trading an over-fit trading model and give more meaning to your confidence interval test.
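
As a rough sketch of that workflow, the band can be computed directly from a list of trade results and compared between the in-sample and out-of-sample portions. The trade list, the 70/30 split and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def ci_band(trades, t=2.0):
    """Return (lower, upper) band around the average trade."""
    avg = mean(trades)
    half_width = t * stdev(trades) / math.sqrt(len(trades))
    return avg - half_width, avg + half_width

def split_bands(trades, in_sample_fraction=0.7):
    """Bands for the in-sample and out-of-sample parts of a chronological list."""
    cut = int(len(trades) * in_sample_fraction)
    return ci_band(trades[:cut]), ci_band(trades[cut:])

# Hypothetical per-trade P&L in chronological order.
trades = [120, -80, 300, -150, 90, 210, -60, 40, -200, 350] * 10
in_sample, out_of_sample = split_bands(trades)
print("in-sample band:", in_sample)
print("out-of-sample band:", out_of_sample)
```

Keep in mind the earlier caveat about sample size: a short out-of-sample segment will produce a wide band.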

But even if we have a solid system that is not curve fitted to the historical data, our confidence interval calculations are no guarantee of success in the future. The markets are dynamic and changing, and it’s possible that the distribution of trades will change, thus altering our average trade and standard deviation. In the end, even if our system looks great on paper, validates on the out-of-sample data, and our confidence interval looks fantastic, our trading model could fail as soon as we trade it live. If this is the case, what is the point of all this testing, and is it worth doing? The short answer is, yes.

In trading there is no guarantee for future results – ever. The point of testing a system is not to prove how much money it will make in the future. The goal is to find reasons why not to trade it. The purpose of any validating test is to find weaknesses so we can address those weaknesses now before we have money on the line. Our job as professional system traders is to manage risk which means eliminating risky actions. There is no certainty. This is one of the reasons this field is so psychologically difficult.

As a final point, the confidence interval gives us another tool to find weaknesses, and ultimately more confidence that a particular trading model is likely to bring us success in the future.

Finding Out What Works, And What Doesn’t Work

Kevin Davey — Mon, 08 Oct 2018 10:00:20 +0000

Many traders who try system trading have previously had difficulty at discretionary or manual trading. Most of these folks eventually recognize the benefit of trading a system with well defined rules – a system that has performed well in the past. It is nice to know a trading approach has historically worked, but as with all things related to trading, past performance is no guarantee of future results.

Unfortunately, many people who try systematic/algorithmic/mechanical/rule-based trading for the first time bring along a lot of the baggage that they have acquired from their previous method. Depending on the pre-conceived notions they bring into mechanical trading, these new systematic traders may run into a lot of frustration and trouble.

Many times, for example, traders will always test with a few core concepts, such as always closing by the end of the day. This is what they were used to as a discretionary or manual trader, and therefore they never even think to test ideas out of their old comfort zone. Perhaps removing these “comfort” rules would dramatically improve performance.

In this article, I will examine three common items that new systematic traders test, and see how these items actually work when they are subjected to rigorous testing.

Ground Rules

In all the testing that follows, I will test in 7 different markets, across a range of commodities:

Wheat (W)
10 Year Treasury Notes (TY)
Lean Hogs (LH)
Australian Dollar (AD)
Heating Oil (HO)
Cotton (CT)
e-mini Nasdaq (NQ)

I picked these at random, one from each major market group. The test period will be from 1/1/2007 to 12/31/2011, a 5-year period that includes some quiet, and some very chaotic markets.

Finally, I will assume $5 round turn commissions, and $30 round turn slippage. The $30 slippage might be excessive, but I’ve always found it is better to be conservative than to underestimate slippage. I’d rather not be disappointed in real time with larger than expected slippage.

The System

I will use an extremely simple strategy for the tests I am running: a simple breakout based on closing prices. The system is always long or short, and will enter at the next bar open, long if the close is the highest close of the past X bars, and short if the close is the lowest close of the last X bars. One version of the system always exits at the end of the day, and the other version is a swing system. X will be varied from 5 bars to 100 bars.

Here is the code for the swing version, Strategy A:

Figure 1

Here is the code for the day trading version, Strategy B:

Figure 2
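
For readers without the figures, the entry rule described above can be sketched in Python. This is not the EasyLanguage from Figures 1 and 2. It assumes the X-bar window includes the current bar and that no signal is given when the window is flat. The end-of-day exit of Strategy B is not modeled.

```python
def breakout_signal(closes, x):
    """Return +1 (long), -1 (short) or 0 (no change) for the latest bar."""
    if len(closes) < x:
        return 0
    window = closes[-x:]          # last x closes, including the current bar
    last = closes[-1]
    if last == max(window) and last != min(window):
        return 1
    if last == min(window) and last != max(window):
        return -1
    return 0

def positions(closes, x):
    """Position held during each bar; a signal on bar i applies from bar i+1."""
    held, current = [], 0
    for i in range(len(closes)):
        held.append(current)
        signal = breakout_signal(closes[:i + 1], x)
        if signal:
            current = signal      # always in the market once the first signal fires
    return held

closes = [10, 11, 12, 11, 10, 9, 10, 11, 12, 13]
print(positions(closes, 3))  # [0, 0, 0, 1, -1, -1, -1, 1, 1, 1]
```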

Day, or Night?

Since most markets these days run nearly 24 hours, we can run into issues when testing historically, as many markets traded “pit” hours years ago. Because of that, and because most of the volume is during these traditional pit hours, I will use the old pit session times, but still use electronic trading data.

Exit 1 Minute Before Close

If you use TradeStation to test, you may be familiar with their keyword “setexitonclose.” This is a neat function for backtesting, but in live trading the order is sent after the market is closed, rendering it ineffective. So, I set up the custom sessions above to exit 1 minute before the old “pit” closing time. Then, I know my strategy will exit properly, and will therefore work in backtest and in real time.

Final Pre-Test Info

Now that we have everything properly set up, it is time to run some tests. For each instrument given above, I will run tests with 5-, 15-, 30-, 60- and 120-minute bars, and with daily bars. That gives us 7 instruments x 6 bar sizes = 42 tests. Then, for each test, I will run the breakout length X from 5-100 in steps of 1, which yields 96 iterations per test. Since I am running two strategies overall, I am running 42 x 96 x 2 = 8,064 unique performance sets.

For each of the 42 tests for each strategy, I will record the best Net Profit (out of the 96 iterations) and its corresponding maximum drawdown (if best Net Profit is greater than zero). For each of the 42 tests, I will also record the percentage of iterations that are profitable.
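
The size of the test grid and the two summary statistics can be sketched in a few lines of Python. The `summarize` helper and its input format (net profit keyed by breakout length) are hypothetical and stand in for whatever the backtesting platform exports.

```python
instruments = ["W", "TY", "LH", "AD", "HO", "CT", "NQ"]
bar_sizes = ["5 min", "15 min", "30 min", "60 min", "120 min", "daily"]
breakout_lengths = range(5, 101)          # 5..100 inclusive, step 1
strategies = ["A", "B"]

tests_per_strategy = len(instruments) * len(bar_sizes)
total_runs = tests_per_strategy * len(breakout_lengths) * len(strategies)
print(tests_per_strategy, len(breakout_lengths), total_runs)  # 42 96 8064

def summarize(net_profit_by_length):
    """Best net profit and percent of iterations with net profit above zero."""
    profits = list(net_profit_by_length.values())
    best = max(profits)
    pct_profitable = 100 * sum(p > 0 for p in profits) / len(profits)
    return best, pct_profitable
```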

All the tests being run are shown graphically in Figure 3.

Figure 3

Questions To Answer

Once all the tests are run, I will see if I can answer the three questions below. These questions are directly related to the desires of many discretionary day traders.

Many traders love strategies that are the same across multiple instruments. An example of this is price action trading – many feel the principles of price action hold across all instruments. So, when these traders test algorithms, one demand is that a strategy is viable ONLY if it is profitable across many instruments. Question: Is multiple market profitability a reasonable requirement?
Most day traders need to be out by the end of the day. Question: Does forcing an exit at the end of each day improve or decrease strategy performance?
Many day traders try to go for the smallest (shortest) bar size possible. The theory is that trades will be more frequent, losses will be smaller, and exits will be more responsive to market conditions. Question: What kind of impact does bar size have on performance?

Before I get to the results, I should point out that this is one study, with two strategies, across only seven markets. So, the conclusions I reach may not hold up over all markets, and may be different for different strategies. I’m guessing, though, that the conclusions probably do hold in general.

Results

Results of all 8,064 performance tests are shown in Table 1 for the swing Strategy A, and Table 2 for the intraday Strategy B. I will refer to these results as I discuss each of the three questions.

Strategy A- Breakout, Swing Trading

$ Thousands ($K) Maximum Net Profit, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W	$32	$50	$37	$37	$35	$35
TY	$17	$6	$5	$20	$22	$19
LH	$31	$27	$28	$30	$20	$8
AD	$4	$34	$37	$38	$41	$42
HO	$82	$66	$62	$106	$133	$119
CT	$105	$84	$85	$55	$87	$62
NQ	$15	$1	$2	$17	$14	$14
Median	$32	$42	$37	$38	$38	$39

$ Thousands ($K) Maximum Drawdown for Max Net Profit case, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W	$12	$11	$19	$17	$24	$16
TY			$12	$7	$7	$8
LH	$10	$10	$7	$6	$12	$9
AD		$19	$25	$19	$19	$14
HO	$58	$39	$59	$42	$31	$40
CT	$20	$24	$28	$14	$19	$13
NQ		$26	$12	$15	$18	$9
Median	$16	$19	$22	$16	$19	$14

% of iterations Profitable (100%=goal) 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W	62	81	88	91	57	72
TY	0	0	9	38	51	91
LH	55	83	40	41	35	35
AD	0	48	66	70	91	85
HO	42	81	59	84	97	100
CT	89	99	72	84	96	96
NQ		1	2	42	44	38
Median	48.5	81	62.5	77	74.5	88

Table 1

Strategy B- Breakout, Exit End Of The day

$ Thousands ($K) Maximum Net Profit, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W	-2	-17	-18	-14	-8	-12
TY	-22	-18	-16	-11	-4	1
LH	-16	-11	-9	-5	-5	6
AD	-36	-11	1	3	-2	26
HO	81	40	67	70	86	105
CT	-70	-43	-35	-31	-25	1
NQ	-64	-50	-32	-24	-17	-4
Median	-19	-14	-12.5	-8	-4.5	3.5

$ Thousands ($K) Maximum Drawdown for Max Net Profit case, 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W
TY
LH						-3
AD				-13		-12
HO	-22	-21	-16	-14	-8	-17
CT
NQ
Median	-22	-21	-16	-13.5	-8	-12

% of iterations Profitable (100%=goal) 2007-2011

	5 min	15 min	30 min	60 min	120 min	daily
W
TY
LH						62
AD				25		98
HO	58	90	97	100	100	100
CT
NQ
Median	58	90	97	62.5	100	98

Table 2

1. Is profitability across multiple instruments a reasonable requirement?

Figures 4 and 5 show the best case Net Profit for both the swing and intraday version of the strategy. With one notable exception (Heating Oil intraday trading), what is good in one market is generally good in another market. Although the maximum profit varies wildly from market to market, all the swing markets are profitable, and all but one of the intraday markets are unprofitable. This should give you some confidence that the strategy is sound (or not). Of course, it doesn’t mean the strategy is tradable in each market, since I am looking only at maximum net profit based on optimization. But, clearly the intraday Strategy B is not viable in most markets.

Figure 4

Figure 5

So, the results show that profitability across multiple markets is possible, even with markedly different instruments, suggesting that it is indeed a valid requirement.

2. Is exiting end of day (intraday trading) a good idea?

Figure 5 shows the answer loud and clear – the answer is no. You’ll get more profit by swing trading. Unfortunately, it might also mean enduring more drawdowns, but think of it like this: the market “pays” people for holding overnight and weekends, and taking on that risk.

3. Are shorter timeframes better?

Since Strategy A is clearly the better strategy, I will use those results to look at the impact of bar length (timeframe). This is shown in Figure 6, where I look at the percent of iterations that were profitable for each combination of instrument and bar size. So, for example, Wheat with 5-minute bars shows that 62% of iterations were profitable (denoted with a red circle in Figure 6). This means that as I varied the breakout length from 5 to 100 in increments of 1 (96 total iterations), 60 of those runs resulted in a profitable end result. Obviously, the best case is 100% – where no matter what value you use for breakout length, the end result is positive.

Figure 6

Results of Figure 6 show that in general profitability increases as the bar length increases, although this effect is not very pronounced, and it does not hold for all instruments. This is confirmed by the results of Figure 4, where the maximum Net Profit generally increases with increasing bar period, but not dramatically.

Why is this so? Profitability could be better as bar period increases, since random market noise plays a smaller role in longer period bars (i.e., the true trend is more easily seen in larger period bars). Of course, drawdown should also be considered when looking at bar size, but in this study it does not seem to change greatly with bar size.

Conclusion

Results with this simple strategy lead to three conclusions:

Striving for profit in multiple markets, as a confirmation of a strategy, is indeed possible.
Swing trading is likely more profitable than intraday trading.
Longer timeframes are generally superior to short time periods, but this is not a major effect.

Follow On

Of course, I made these conclusions based on one study. What if the strategy was different? What if the timeframes or markets were different? What if different years were used for the test period? Will the conclusions reached here still hold? I’ll examine those questions in Part 2.

If you would like to learn more about building trading systems be sure to get a copy of my latest book, Building Winning Algorithmic Trading Systems.

— Kevin J. Davey of KJ Trading Systems

Is The MAC System Overly Optimized?

Jeff Swanson — Mon, 13 Oct 2014 10:00:43 +0000

In a recent article entitled, “Backtesting the MAC-System – How Long is Long Enough?”, the author presented a simple rotational trading system based upon moving averages. The system is one of the most basic rotational systems you could devise. Based upon signals generated from the moving averages, your investment is either within the S&P index or a bond fund. Several readers voiced their concerns over the possibility of the system being over optimized to the historical data. This is a valid point and worth looking into.

When it comes to making a profitable trading system we depend upon finding market edges we can exploit. All successful trading relies on recurring patterns found in the historical market data and exploiting those patterns to make a profit. This is true for discretionary traders as well as system traders. Optimizing our system to take advantage of market edges is not the problem. We want to optimize our system to take advantage of an edge.

Yet, much of the historical market movement is noise or random movement. So, the problem of over optimizing our system to the historical data starts to happen when our system begins to key-into patterns within the noise instead of exploiting a true market edge. Finding a pattern in the noise can produce great looking equity curves on historical data, but on out-of-sample data the equity curve can break down rather quickly.

In summary, we want a trading system that exploits a true market edge and avoids keying off patterns in the market noise. Optimization is not the problem. Overfitting is the problem, and the ability to separate a true market edge from the noise is what this game is all about.

So, how can we test the MAC system to help us determine if it’s overly optimized? When looking at a trading model, it’s important that the parameters demonstrate robustness across a wide range. That is, the system should remain profitable over many different values. We should be able to change some of the key input values and we should have a viable looking trading model. The output (net profit in this case) should not drastically change.

In order to explore the robustness of the basic premise of this trading model I created a stripped-down version in EasyLanguage that buys and sells the S&P cash market only. The complete trading model, as presented by the author of the MAC system, has rules to move cash into a bond fund during “bear” markets, but I’m particularly interested in testing the buy/sell signals generated by the various moving averages, as these seemed to produce the most questions from readers.

Trading Model Rules

Our trading model is very simple: one signal to go long the S&P cash market and one signal to close the position. The buy/sell rules are below.

Buy Signal
The 34-day exponential moving average (EMA) of the S&P 500 becomes greater than 1.001 times the 200-day SMA.

Sell Signal
The 40-day simple moving average (SMA) of the S&P crosses below the 200-day SMA.
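
The two rules above can be sketched in a few lines. This is a minimal Python illustration, not the EasyLanguage code used for the tests in this article. The function names (`sma`, `ema`, `mac_positions`) are made up for the example, the EMA is assumed to use the common 2 / (period + 1) smoothing seeded with the first close, and “becomes greater than” is read as a cross from at-or-below to above.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= period:
            running -= values[i - period]
        out.append(running / period if i >= period - 1 else None)
    return out


def ema(values, period):
    """Exponential moving average (assumed smoothing 2 / (period + 1),
    seeded with the first value)."""
    alpha = 2.0 / (period + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else prev + alpha * (v - prev)
        out.append(prev)
    return out


def mac_positions(closes, buy_len=34, sell_len=40, trend_len=200,
                  buy_factor=1.001):
    """Return one 0/1 flag per bar: 1 while the model is long."""
    buy_avg = ema(closes, buy_len)
    sell_avg = sma(closes, sell_len)
    trend = sma(closes, trend_len)
    positions, is_long = [], False
    for i in range(len(closes)):
        # Both averages must exist on the prior bar to detect a cross.
        ready = i > 0 and None not in (trend[i - 1], sell_avg[i - 1])
        if ready:
            if not is_long:
                # Buy: EMA moves from at-or-below to above factor * trend SMA.
                was_above = buy_avg[i - 1] > buy_factor * trend[i - 1]
                now_above = buy_avg[i] > buy_factor * trend[i]
                is_long = now_above and not was_above
            else:
                # Sell: exit SMA crosses from at-or-above to below trend SMA.
                crossed = (sell_avg[i - 1] >= trend[i - 1]
                           and sell_avg[i] < trend[i])
                is_long = not crossed
        positions.append(1 if is_long else 0)
    return positions
```

Keeping the four inputs as keyword arguments mirrors how the tests later in the article vary one value at a time while holding the others at their defaults.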

Testing Environment

I coded the above rules in EasyLanguage and tested them on the S&P cash market. Before getting into the details of the results, let me say this: all the tests within this article are going to use the following assumptions:

Starting account size of $100,000
Dates tested are from October 1969 through September 2014
Shares were scaled based upon price and volatility
The P&L is not accumulated
There were no deductions for slippage and commissions
There were no stops

Please note we are not adding our profits to our trading account! We are always trading a small percentage of our starting capital. Here is the position sizing formula used:

Shares = $5,000 per trade / (5 * ATR(10))
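
As a rough illustration of this sizing rule, here is a small Python sketch. It reads the formula as $5,000 divided by five times the 10-bar average true range, assumes the ATR is a simple average of true range, and rounds down to whole shares. Those three readings, and the function names, are assumptions made for the example rather than details confirmed by the strategy code.

```python
def true_range(high, low, prev_close):
    """True range of one bar, accounting for gaps from the prior close."""
    return max(high, prev_close) - min(low, prev_close)


def atr(highs, lows, closes, period=10):
    """Simple average of the most recent `period` true ranges."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 bars")
    last = len(closes)
    ranges = [true_range(highs[i], lows[i], closes[i - 1])
              for i in range(last - period, last)]
    return sum(ranges) / period


def position_size(risk_dollars, atr_value, atr_multiple=5.0):
    """Whole shares for a fixed dollar amount scaled by volatility."""
    return int(risk_dollars // (atr_multiple * atr_value))
```

With an ATR of 20 points, for example, `position_size(5000, 20.0)` gives 50 shares. When volatility doubles, the share count halves, which is what “scaled based upon price and volatility” implies.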

Baseline Results

Just using the simple rules above we get the following equity curve and results.

MAC Baseline Model

	MAC Baseline
Net Profit	$279,023
Profit Factor	11.65
Total Trades	27
%Winners	70%
Avg.Trade Net Profit	$10,334.19
Annual Rate of Return	2.71%
Max Drawdown(Intraday)	$55,741
Expectancy	3.16
Expectancy Score	1.73
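
For readers unfamiliar with these report fields, the sketch below shows the usual definitions of a few of them, computed from a list of per-trade profits and losses. It is a Python illustration under the assumption that the report follows these textbook definitions. The function name is hypothetical, and the expectancy figures are left out because their exact formula is not given here.

```python
def summarize_trades(pnls):
    """Summary statistics from a list of closed-trade P&L values."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    net_profit = gross_profit - gross_loss
    winners = sum(1 for p in pnls if p > 0)
    return {
        "net_profit": net_profit,
        # Profit factor: dollars won per dollar lost.
        "profit_factor": (gross_profit / gross_loss
                          if gross_loss else float("inf")),
        "total_trades": len(pnls),
        "pct_winners": 100.0 * winners / len(pnls),
        "avg_trade": net_profit / len(pnls),
    }
```

As a quick consistency check on the table above, $279,023 of net profit over 27 trades works out to about $10,334.19 per trade, which matches the reported average.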

Testing Trend Filter Period

The baseline rules call for a 200 SMA to act as a major trend filter. Is this 200 value just an outlier? Or, is this parameter robust for a wide range of values? To test the robustness of this input, I will use TradeStation’s optimization feature which will allow me to quickly test a range of values. I will test the range 60 – 260 in increments of 10 for the input value used in the SMA calculation. The results of my test are below. The x-axis displays the look-back period in days while the y-axis displays the net profit.

The optimal value (260) is not the default value (200). Many values show profitable results. Values at or above the default all show very similar results.
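
Outside of TradeStation, the same kind of one-input sweep can be sketched generically. The Python below is an illustration only: `backtest` stands for any hypothetical function that takes the strategy inputs as keyword arguments and returns net profit, and the summary fields are my own choice of what to look at, not TradeStation output.

```python
def sweep(backtest, name, start, stop, step, **fixed):
    """Run `backtest` once per integer value of the input called `name`.

    Returns {value: net_profit}. Other inputs are held at `fixed`.
    """
    results = {}
    for value in range(start, stop + 1, step):
        results[value] = backtest(**{name: value}, **fixed)
    return results


def robustness_summary(results, default):
    """Compare the default input value against the rest of the sweep."""
    best = max(results, key=results.get)
    profitable = sum(1 for profit in results.values() if profit > 0)
    best_profit = results[best]
    return {
        "best_value": best,
        "share_profitable": profitable / len(results),
        # Near 1.0 means the default gives up little versus the optimum.
        "default_vs_best": (results[default] / best_profit
                            if best_profit > 0 else None),
    }
```

A call such as `sweep(run_backtest, "trend_len", 60, 260, 10)` would cover the same 21 values tested here, assuming a `run_backtest` function exists. A robust input tends to show a high share of profitable values and a default that sits close to the best value in profit, rather than a single sharp peak.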

Testing Buy Period

The baseline rules call for a 34 EMA to rise above the 200 SMA a given amount to trigger a buy signal. Is the 34 period used for the EMA calculation overly optimized? To test the robustness of this input, I will use TradeStation’s optimization feature which will allow me to quickly test a range of values. I will test the range 5 – 75 in increments of 5 for the input value used in the EMA calculation. The results of my test are below. The x-axis displays the look-back period in days while the y-axis displays the net profit.

The optimal value (20) is not the default value (34). All values show profitable and very similar results.

Testing Buy Factor

The baseline rules call for a 34 EMA to rise above the 200 SMA by a factor over 1.001. Is this factor overly optimized? Again, I will use TradeStation’s optimization feature. I will test the range 0.990 – 1.010 in increments of .001 for the input value used in the factor calculation. The results of my test are below. The x-axis displays the factor value while the y-axis displays the net profit.

The optimal value (.98) is not the default value (1.001). All the values show profitable and very similar results to the default value.

Testing Exit Period

The baseline rules call for a 40 SMA cross below the 200 SMA to trigger a sell signal. Is the 40 period used for the SMA calculation overly optimized? Once again, I will use TradeStation’s optimization feature to quickly test a range of values. I will test the range 5 – 75 in increments of 5 for the input value used in the SMA calculation. The results of my test are below. The x-axis displays the look-back period in days while the y-axis displays the net profit.

The optimal value (55) is not the default value (40). All values show profitable results. Values between 35 and 60 all show similar results.

Conclusion

It appears this trading model is not overly optimized. Through my simple testing I was able to demonstrate that the core of the MAC trading system shows profitability over a wide range of values. In fact, some combinations produce better results. The default values do not appear overly optimized. Note, many different values appear to work just fine.

It’s my opinion the MAC system will likely work into the future. It works because it positions a trader into bull trends as seen from 2009 to today (late 2014) while, more importantly, it gets a trader out of the market during prolonged bear markets such as seen in 2007-2008. The exact parameters don’t matter! You can use a 260 SMA or a 180 SMA for the major trend filter. You can enter the market using a 40 EMA or a 20 EMA. You can use a factor of 1.001 or .98 as the buy threshold. They all produce positive results, which speaks to the robustness of the system.

It’s true the testing period within this article only produced 16 trades (samples) and this is a very low value. Part of this is due to the fact I don’t have all the historical data at my disposal to test. However, based upon the author’s original work of going back to 1955, I would feel confident that we would have a sufficient number of samples.

Simple systems on daily charts applied to a broad market index provide a systematic approach that will often do far better than what most people do, which is to make buying and selling decisions based on emotional reactions.

What About SPY?

Several people have asked about this model on SPY. Strictly speaking, the MAC-System presented in this article is not the complete MAC-System. The EasyLanguage code I wrote for the above tests was designed strictly to test the robustness of the input values, not to test the overall performance of the strategy. The trading model presented is a stripped-down version of the MAC-System. I would refer you to the article “Backtesting the MAC-System – How Long is Long Enough?” for a more complete description of the performance of SPY on the MAC-System.

With the caveat that the strategy code presented in this article is not the complete MAC-System, below are the results when it is applied to SPY. Profits are reinvested. All trading equity is invested for each trade. $16 in commissions and $.04 per share in slippage were deducted per round trip. The TradeStation Performance Report can be downloaded here: MAC_Test_SPY.

Seasonality And The Ivy-10 Portfolio

Jeff Swanson — Mon, 11 Aug 2014 10:00:55 +0000

As the common saying states, “Sell in May and go away.” Now that we are in early August, our seasonality model has recently triggered a sell signal. So through May, June, and most of July we continued to hold our position. If you receive our free weekly newsletter, you were alerted to the seasonality switch the week it changed.

At this time I think it would be a good idea to review where we stand based upon our seasonality study. If you will recall, the seasonality study goes long the SPY in November and sells in May. This is the classic seasonality hold period, which does appear to hold an edge for the S&P market. In order to avoid buying into a falling market or selling into a rising market, I tested several short-term filters to help pinpoint an entry. In the end I decided on a 40-period simple moving average to act as an entry method. For more information please see the original article here.

Where We Stand

Below is the daily chart of the SPY with a 40-period simple moving average applied. At this point we can see price closed below our SMA on August 1, 2014. This signaled our system to close all open positions. This latest trade, which was opened on November 4, 2013, generated a profit of $9,318 on 56 shares.
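
One way to read the seasonality rules is as a small state machine: during November through April, go long once price closes above the 40-period SMA; during May through October, exit once price closes below it. The Python sketch below follows that reading. The exact rules are in the original seasonality article, so the month boundaries, the use of closing prices on both sides, and the function name are assumptions made for illustration.

```python
from datetime import date, timedelta


def seasonal_positions(dates, closes, sma_len=40):
    """Return one 0/1 flag per bar: 1 while the seasonal model is long."""
    positions, is_long = [], False
    for i, (day, close) in enumerate(zip(dates, closes)):
        if i + 1 >= sma_len:
            average = sum(closes[i + 1 - sma_len:i + 1]) / sma_len
            strong_season = day.month >= 11 or day.month <= 4
            if not is_long and strong_season and close > average:
                is_long = True    # in season and price above the SMA
            elif is_long and not strong_season and close < average:
                is_long = False   # out of season and price below the SMA
        positions.append(1 if is_long else 0)
    return positions


# Made-up prices: a steady rise, then a decline starting in late June.
days = [date(2013, 9, 1) + timedelta(days=i) for i in range(400)]
prices = [100 + 0.1 * i if i <= 300 else 130.0 - (i - 300)
          for i in range(400)]
flags = seasonal_positions(days, prices)
```

Under this reading, a position opened in November stays open past May for as long as price holds above the SMA, which is consistent with holding through May, June and most of July before the August 1 close below the average.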

At this time our seasonality indicator is bearish and will be denoted with a red arrow. You will find our seasonality status, along with two other broad market indicators, within our weekly newsletter.

The seasonality filter does seem to show an edge for the broad U.S. markets. This got me thinking about how this seasonality filter would perform on other strategies. For example, will it perform well as a filter for the Ivy-10 Portfolio? Will it reduce unproductive trades during the months of May through October, giving us a better return or a smaller drawdown? To find out, I came up with a simple test.

Seasonality & Ivy-10 Portfolio

The Ivy-10 is a slight variation on the well known Ivy Portfolio. If you are unfamiliar with this topic please take some time to read the following:

Using ETFReplay.com I backtested the Ivy-10 portfolio from 2003 through December 31, 2013. The ETFReplay website does not have the ability to exclude trading during particular months. However, you can download a summary of trading performance for each month as a CSV file, which can be imported into Excel. So I did just that. The Excel document used in this article can be downloaded at the bottom of this article.

Once I had the data within Excel I then named the tab holding all the Ivy-10 Portfolio trades as “Ivy-10 Standard”. This tab contains all the trades for all the months. I then computed the returns for each year and the return for the complete historical backtest. The total return for the backtest is 248.8%.

Next, I duplicated this tab containing all the trades into another tab called “Ivy-10 Seasonality”. Within this tab I wanted to eliminate all trades that took place during our off-season. That is, the months of May, June, July, August, September, and October. I did this by simply deleting the trading results for those months. I then computed the returns for each year and the return for the complete historical backtest. The total return for the backtest is 154.4%.
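
The same spreadsheet steps can be sketched in Python for anyone who prefers code to Excel. This is an illustration under assumptions: the monthly results are taken to be rows of (year, month, percent return), and monthly returns are compounded to get yearly and total figures. The spreadsheet may aggregate differently, and the function names are made up for the example.

```python
OFF_SEASON = frozenset({5, 6, 7, 8, 9, 10})  # May through October


def compound(returns_pct):
    """Compound a sequence of percent returns into one percent return."""
    growth = 1.0
    for pct in returns_pct:
        growth *= 1.0 + pct / 100.0
    return (growth - 1.0) * 100.0


def drop_months(rows, skip_months=OFF_SEASON):
    """Remove rows whose month falls in `skip_months`."""
    return [row for row in rows if row[1] not in skip_months]


def yearly_returns(rows):
    """Compounded percent return per calendar year."""
    by_year = {}
    for year, _month, pct in rows:
        by_year.setdefault(year, []).append(pct)
    return {year: compound(pcts) for year, pcts in sorted(by_year.items())}


def total_return(rows):
    """Compounded percent return over all rows."""
    return compound(pct for _year, _month, pct in rows)
```

Calling `total_return(rows)` and `total_return(drop_months(rows))` on the downloaded monthly data corresponds to the “Ivy-10 Standard” and “Ivy-10 Seasonality” tabs, respectively.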

I summed up the result on a tab called “Totals”. Below is a snapshot of the final results.

You can see that using our seasonality filter does not help us at all. It in fact reduces our total return by 94.4 percentage points, from 248.8% to 154.4%. It’s interesting to note that our filtered Ivy-10 system missed two large years, 2009 and 2003. In both years the market was coming out of a prolonged bear market. This is really where the power of a mechanized trading strategy can come into play. After a strong bear market, many market participants are too frightened to re-enter the market. Remember how difficult it was to start buying in 2009? Yet, this was the best time! As the signals rolled in to go long during 2009, much profit was realized by those brave souls.

Over the past few years the seasonality filter appears to provide some benefit. Looking at the years 2010, 2011, 2012 and 2013, our Ivy-10 Seasonality system performed significantly better. With only a few data points we can’t draw much of a conclusion about whether the seasonality filter will be of much help. Thus, I won’t be filtering my Ivy-10 trades using this filter at this time. As we collect more data over the years I’ll be keeping an eye on this.