Algorithmic Trading – Helping you Master EasyLanguage
https://easylanguagemastery.com

A Timely Function in EasyLanguage
https://easylanguagemastery.com/building-strategies/a-timely-function-in-easylanguage/
Mon, 29 Jan 2024 11:00:00 +0000

Learn how to constrain trading between a Start and End Time – not so “easy-peasy”

Why waste time on this?

If Time > StartTime and Time <= EndTime then…  Right?

This is definitely valid when EndTime > StartTime.  But what happens when EndTime < StartTime, meaning that you start trading yesterday, prior to midnight, and end trading after midnight (today)?  Many readers of my blog know I have addressed this issue before and created some simple equations to help facilitate trading around midnight.  The lines of code I have presented work most of the time.  Remember when the ES used to close at 4:15 and re-open at 4:30 eastern?  As of late June 2021, this gap in time has been removed, and the ES now trades continuously between 4:15 and 4:30.  I discovered a little bug in my code for this small gap when I was optimizing a “get out time.”

I wanted to create a user function that uses the latest session start and end times to build a small database of valid times for the 24-hour markets (close to 24 hours, since most markets close for an hour).  With this small database you can test your time to see if it is a valid time.  The construction of this database requires a little TIME math and the use of arrays and loops.  It is a good tutorial.  However, it is not perfect.  If you optimize time and you want to get out at 4:20 in 2020 on the ES, then you still run into the problem of this time not being valid.  This requires a small workaround.  Going forward with automated trading, this function might be useful.  Most markets trade around the midnight hour; I think meats might be the exception.
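The midnight problem can be seen in a few lines. The sketch below is written in Python rather than EasyLanguage so it can be run anywhere; the function name `in_window` and the HHMM integer convention are assumptions made for illustration. It shows the simple comparison approach, not the array-based function built later in this post.

```python
def in_window(t, start_time, end_time):
    """Return True if HHMM time t falls in (start_time, end_time]."""
    if start_time <= end_time:
        # The window sits inside one calendar day.
        return start_time < t <= end_time
    # The window spans midnight: after the start OR at/before the end.
    return t > start_time or t <= end_time


print(in_window(2300, 1800, 1500))  # True: evening bar inside an overnight window
print(in_window(1600, 1800, 1500))  # False: bar in the gap between end and start
```

A plain comparison like this has no notion of which bar stamps actually exist in the session, which is the gap the database approach tries to close.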

Time Based Math

How many 5-minute bars are between 18:00 (prior day) and 17:00 (today)?  We can do this in our heads 23 hours X (60 minutes / 5 minutes) or 23 X 12 = 276 bars.  But we need to tell the computer how to do this and we also should allow users to use times that include minutes such as 18:55 to 14:25. Here’s the math – btw you may have a simpler approach.

Using startTime of 18:55 and endTime of 14:25.

1. Calculate the difference in hours and minutes from startTime to midnight and then in terms of minutes only.

a. timeDiffInHrsMins = 2360 – 1855 = 505 or 5 hours and 5 minutes.  We use a little shortcut here: 23 hours and 60 minutes is the same as 2400 or midnight.

b. timeDiffInMinutes = intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100).  This looks much more complicated than it really is because we are using two helper functions – intPortion and mod:

I) intPortion – returns the whole number from a fraction.  If we divide 505/100 we get 5.05 and if we truncate the decimal we get 5 hours.

II) mod – returns the modulus or remainder from a division operation.  I use this function a lot.  Mod(505,100) gives 5 minutes.

III) Five hours * 60 minutes + Five minutes = 305 minutes.

2. Calculate the difference in hours and minutes from midnight to endTime and then in terms of minutes only.

a. timeDiffInHrsMins = endTime – 0 = 1425 or 14 hours and 25 minutes.  We don’t need to use our little shortcut here since we are simply subtracting zero.  I left the zero in the calculation to denote midnight.

b. timeDiffInMinutes = timeDiffInMinutes + intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100).  This is the same calculation as before, but we are adding the result to the number of minutes derived from the startTime to midnight.  

I) intPortion – returns the whole number from a fraction.  If we divide 1425/100, we get 14.25 and if we truncate the decimal, we get 14.

II) mod – returns the modulus or remainder from a division operation.  I use this function a lot.  Mod(1425,100) gives 25.

III) 14 * 60 + 25 = 865 minutes.

IV) Now add 305 minutes to 865.  This gives us a total of 1170 minutes between the start and end times.

3. Now divide the timeDiffInMinutes by the barInterval.  This gives 1170 minutes/5 minutes or 234 five-minute bars.
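The three steps above can be checked with a short Python sketch. The function names are hypothetical; `//` and `%` stand in for EasyLanguage's intPortion and mod, and the same-day branch is a plain subtraction that the steps above do not cover.

```python
def hhmm_to_minutes(hhmm):
    """Convert an HHMM integer such as 1425 to minutes (14 * 60 + 25)."""
    return (hhmm // 100) * 60 + hhmm % 100


def minutes_between(start_time, end_time):
    """Minutes from start_time to end_time, wrapping past midnight if needed."""
    if start_time > end_time:
        # 2360 is the "23 hours and 60 minutes" shortcut for midnight.
        to_midnight = hhmm_to_minutes(2360 - start_time)
        from_midnight = hhmm_to_minutes(end_time - 0)
        return to_midnight + from_midnight
    return hhmm_to_minutes(end_time) - hhmm_to_minutes(start_time)


def bars_between(start_time, end_time, bar_interval):
    """Number of whole bars of bar_interval minutes in the time span."""
    return minutes_between(start_time, end_time) // bar_interval


print(minutes_between(1855, 1425))  # 1170, which is 305 + 865
print(bars_between(1800, 1700, 5))  # 276, the figure worked out by hand earlier
```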

Build Database of all potential time stamps between start and end time

We now have all the ingredients to build our simple array-based database.  Don’t let the word array scare you away.  Follow the logic and you will see how easy it is to use them.  First, we will create the database of all the time stamps between the regular session start and end times of the data on the chart.  We will use the same time-based math (and a little more) to create this benchmark database.  Check out the following code.

// You could use static arrays
// reserve enough room for 24 hours of minute bars
// 24 * 60 = 1440
// arrays: theoTimes[1440](0),validTimes[1440](0);
// syntax - arrayName[size](0) - the zero sets all elements to zero
// this seems like overkill because we don't know what
// bar interval or time span the user will be using

// these arrays are dynamic
// we dimension or reserve space for just what we need
arrays: theoTimes[](0),validTimes[](0);

// Create a database of all time stamps that potentially could
// occur

numBarsInCompleteSession = timeDiffInMinutes/barInterval;

// Now set the dimension of the array by using the following
// function and the number of bars we calculated for the entire
// regular session
Array_setmaxindex(theoTimes,numBarsInCompleteSession);
// Load the array from start time to end time
// We know the start time and we know the number of X-min bars
// loop from 1 to numBarsInCompleteSession and
// use timeSum as each and every time stamp
// To get to the end of our journey we must use Time Based Math again.
timeSum = sessStartTime; // regular session start time, e.g. 1800
for arrayIndex = 1 to numBarsInCompleteSession
Begin
    timeSum = timeSum + barInterval;
    if mod(timeSum,100) = 60 Then
        timeSum = timeSum - 60 + 100; // 1860 - becomes 1900
    if timeSum = 2400 Then
        timeSum = 0; // 2400 becomes 0000
    theoTimes[arrayIndex] = timeSum;

    print(d," theo time",arrayIndex," ",theoTimes[arrayIndex]);
end;

Create a dynamic array with all possible time stamps

This is a simple looping mechanism that continually adds the barInterval to timeSum until numBarsInCompleteSession are exhausted.  Please read the comments in the code about the difference between static and dynamic arrays.  Here’s how it works with a session start time of 1800:

theoTimes[01] = 1800 + 5 = 1805
theoTimes[02] = 1805 + 5 = 1810
theoTimes[03] = 1810 + 5 = 1815
theoTimes[04] = 1815 + 5 = 1820
theoTimes[05] = 1820 + 5 = 1825
...
//whoops - need more time based math 1860 is not valid
theoTimes[12] = 1855 + 5 = 1860

Insert bar stamps into our theoTimes array

More time-based math

Our loop hit a snag when we came up with 1860 as a valid time.  We all know that 1860 is really 1900.  We need to intervene when this occurs.  All we need to do is use our modulus function again to extract the minutes from our time.

If mod(timeSum,100) = 60 then timeSum = timeSum - 60 + 100.  Here we remove the sixty minutes from the time and add an hour to it.

1860 - 60 + 100 = 1900 // a valid time stamp

That should fix everything right?  What about this:

theoTimes[69] = 2340 + 5 = 2345
theoTimes[70] = 2345 + 5 = 2350
theoTimes[71] = 2350 + 5 = 2355
theoTimes[72] = 2355 + 5 = 2400 // whoops

2400 is okay in Military Time but not in TradeStation

This is a simple fix.  All we need to do is check to see if timeSum = 2400 and, if so, simply reset it to zero.
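The two rollover rules can be tried outside TradeStation with a small Python sketch. The function name `build_time_stamps` is hypothetical, and the sketch assumes, as the examples here do, that the bar interval divides 60 evenly and that the start time falls on a bar boundary, so the minutes always land on exactly 60.

```python
def build_time_stamps(start_time, num_bars, bar_interval):
    """List the HHMM stamps of num_bars consecutive bars after start_time."""
    stamps = []
    time_sum = start_time
    for _ in range(num_bars):
        time_sum += bar_interval
        if time_sum % 100 == 60:
            time_sum = time_sum - 60 + 100  # 1860 becomes 1900
        if time_sum == 2400:
            time_sum = 0  # 2400 becomes 0000
        stamps.append(time_sum)
    return stamps


stamps = build_time_stamps(1800, 276, 5)
print(stamps[:3])   # [1805, 1810, 1815]
print(stamps[11])   # 1900, the twelfth bar
print(stamps[71])   # 0, midnight
print(stamps[-1])   # 1700, the last bar of the session
```

Python lists are zero-based, so `stamps[11]` corresponds to theoTimes[12] in the EasyLanguage array.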

Build a database on our custom time frame.

Basically, do the same thing, but use the user’s choice of start and end times.

//calculate the number of barInterval bars in the
//user defined session
numBarsInSession = timeDiffInMinutes/barInterval;

Array_setmaxindex(validTimes,numBarsInSession);

startTimeStamp = calcTime(startTime,barInterval);

timeSum = startTime;
for arrayIndex = 1 to numBarsInSession
Begin
    timeSum = timeSum + barInterval;
    if mod(timeSum,100) = 60 Then
        timeSum = timeSum - 60 + 100;
    if timeSum = 2400 Then
        timeSum = 0;
    validTimes[arrayIndex] = timeSum;
    //print(d," valid times ",arrayIndex," ",validTimes[arrayIndex]," ",numBarsInSession);
end;

Create another database using the time frame chosen by the user

Don’t allow weird times!

Good programmers don’t allow extraneous values to bomb their functions.  TRY and CATCH the erroneous input before proceeding.  If we have a database of all possible time stamps, shouldn’t we use it to validate the user entry?  Of course, we should.

//Are the user's startTime and endTime valid
//bar time stamps? Loop through all the times
//and validate the times.

for arrayIndex = 1 to numBarsInCompleteSession
begin
    if startTimeStamp = theoTimes[arrayIndex] then
        validStartTime = True;
    if endTime = theoTimes[arrayIndex] Then
        validEndTime = True;
end;

Validate user's input
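The validation idea can be sketched in Python as a membership test against the theoretical time stamps. All names below are hypothetical; `add_minutes` is a stand-in for what calcTime is used for here, and the session database is built with plain minute arithmetic rather than the loop shown earlier.

```python
def to_minutes(hhmm):
    """Convert an HHMM integer to minutes since midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def add_minutes(hhmm, minutes):
    """Add minutes to an HHMM time, wrapping at midnight."""
    total = (to_minutes(hhmm) + minutes) % 1440
    return (total // 60) * 100 + total % 60


def session_stamps(sess_start, sess_end, bar_interval):
    """Every bar stamp after sess_start, up to and including sess_end."""
    span = (to_minutes(sess_end) - to_minutes(sess_start)) % 1440
    return {add_minutes(sess_start, i * bar_interval)
            for i in range(1, span // bar_interval + 1)}


def inputs_are_valid(start_time, end_time, bar_interval, theo_times):
    """Both the first bar of the user's window and its end must be real stamps."""
    start_stamp = add_minutes(start_time, bar_interval)
    return start_stamp in theo_times and end_time in theo_times


theo = session_stamps(1800, 1700, 5)
print(inputs_are_valid(1800, 1500, 5, theo))  # True
print(inputs_are_valid(1800, 1730, 5, theo))  # False: 1730 falls in the closed hour
print(inputs_are_valid(1802, 1500, 5, theo))  # False: 1807 is not a 5-minute stamp
```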

Once we determine that both time inputs are valid, we can check whether any bar’s time stamp during a back-test is a valid time.

if validStartTime = false or validEndTime = false Then
    error = True;
//Okay to check for bar time stamps against our
//database - only go through the loop until we
//validate the time - break out when time is found
//in database. CanTradeThisTime is the name of the function.
//It returns either True or False

if error = False Then
Begin
    for arrayIndex = 1 to numBarsInSession
    Begin
        if t = validTimes[arrayIndex] Then
        begin
            CanTradeThisTime = True;
            break;
        end;
    end;
end;

This portion of the code is executed on every bar of the back-test

Once and only Once!

The code that creates the theoretical and user-defined time stamp databases is only run on the very first bar of the chart.  The validation of the user’s input is also only done once.  This is accomplished by encasing this code inside a Once – begin – end.
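The build-once, check-every-bar pattern can be illustrated in Python, where a cache plays the role that Once plays in EasyLanguage. This is a sketch under assumptions: the names are hypothetical, the database is built with minute arithmetic, and input validation is left out to keep it short. The counter is there only to show that the database is built a single time.

```python
from functools import lru_cache

build_count = 0  # counts how often the database is actually built


def to_minutes(hhmm):
    """Convert an HHMM integer to minutes since midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def to_hhmm(minutes):
    """Convert minutes back to HHMM, wrapping past midnight."""
    minutes %= 1440
    return (minutes // 60) * 100 + minutes % 60


@lru_cache(maxsize=None)
def valid_times(start_time, end_time, bar_interval):
    """Build the set of tradable bar stamps; cached, so it runs once per input set."""
    global build_count
    build_count += 1
    start = to_minutes(start_time)
    span = (to_minutes(end_time) - start) % 1440
    return frozenset(to_hhmm(start + i * bar_interval)
                     for i in range(1, span // bar_interval + 1))


def can_trade_this_time(t, start_time, end_time, bar_interval):
    """Per-bar check against the cached database."""
    return t in valid_times(start_time, end_time, bar_interval)


for bar_time in (1805, 2355, 0, 1500, 1505):
    print(bar_time, can_trade_this_time(bar_time, 1800, 1500, 5))
print("database builds:", build_count)  # 1
```

Only 1505 prints False in the loop; the other four stamps fall inside the 1800 to 1500 window.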

Now this code will test any time stamp against the current regular session.  If you run a test prior to June 2021, you will get a theoretical database that includes a 4:20, 4:25, and 4:30 on the ES futures.  However, in actuality these bar stamps did not exist in the data.  This might cause a problem when working with a start or end time prior to June 2021, that falls in this range.

Function Name:  CanTradeThisTime

Complete code:

// Function to determine if time is in acceptable
// set of times
inputs: startTime(numericSimple),endTime(numericSimple);

vars: sessStartTime(0),sessEndTime(0),
    startTimeStamp(0),timeSum(0),timeDiffInHrsMins(0),timeDiffInMinutes(0),
    validStartTime(False), validEndTime(False);

vars: error(False),arrayIndex(0),
    numBarsInSession(0),numBarsInCompleteSession(0);

arrays: theoTimes[](0),validTimes[](0);
vars: arrCnt(0),seed(0);

canTradeThisTime = false;

once
Begin

    sessStartTime = sessionStartTime(0,1);
    sessEndTime = sessionEndTime(0,1);

    // regular session spans midnight
    if sessStartTime > sessEndTime Then
    Begin
        timeDiffInHrsMins = 2360 - sessStartTime;
        timeDiffInMinutes = intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);

        timeDiffInHrsMins = sessEndTime - 0;
        timeDiffInMinutes += intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);
    end;

    // regular session starts and ends on the same day
    // borrow an hour from the end time, then subtract the session start
    if sessStartTime <= sessEndTime Then
    Begin
        timeDiffInHrsMins = (intPortion(sessEndTime/100) - 1)*100 + mod(sessEndTime,100) + 60 - sessStartTime;
        timeDiffInMinutes = intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);
    end;

    numBarsInCompleteSession = timeDiffInMinutes/barInterval;

    Array_setmaxindex(theoTimes,numBarsInCompleteSession);

    // theoretical database starts at the regular session start
    timeSum = sessStartTime;
    for arrayIndex = 1 to numBarsInCompleteSession
    Begin
        timeSum = timeSum + barInterval;
        if mod(timeSum,100) = 60 Then
            timeSum = timeSum - 60 + 100;
        if timeSum = 2400 Then
            timeSum = 0;
        theoTimes[arrayIndex] = timeSum;

        print(d," theo time",arrayIndex," ",theoTimes[arrayIndex]);
    end;

    // user defined session spans midnight
    if startTime > endTime Then
    Begin
        timeDiffInHrsMins = 2360 - startTime;
        timeDiffInMinutes = intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);
        timeDiffInHrsMins = endTime - 0;
        timeDiffInMinutes += intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);
    end;

    // user defined session starts and ends on the same day
    if startTime <= endTime Then
    Begin
        timeDiffInHrsMins = (intPortion(endTime/100) - 1)*100 + mod(endTime,100) + 60 - startTime;
        timeDiffInMinutes = intPortion(timeDiffInHrsMins/100) * 60 + mod(timeDiffInHrsMins,100);
    end;

    numBarsInSession = timeDiffInMinutes/barInterval;

    Array_setmaxindex(validTimes,numBarsInSession);

    startTimeStamp = calcTime(startTime,barInterval);

    timeSum = startTime;
    for arrayIndex = 1 to numBarsInSession
    Begin
        timeSum = timeSum + barInterval;
        if mod(timeSum,100) = 60 Then
            timeSum = timeSum - 60 + 100;
        if timeSum = 2400 Then
            timeSum = 0;
        validTimes[arrayIndex] = timeSum;
        print(d," valid times ",arrayIndex," ",validTimes[arrayIndex]," ",numBarsInSession);
    end;

    // validate the user's inputs against the theoretical database
    for arrayIndex = 1 to numBarsInCompleteSession
    begin
        if startTimeStamp = theoTimes[arrayIndex] then
            validStartTime = True;
        if endTime = theoTimes[arrayIndex] Then
            validEndTime = True;
    end;
end;

if validStartTime = False or validEndTime = false Then
    error = True;

// executed on every bar: is this bar's time stamp in the database?
if error = False Then
Begin
    for arrayIndex = 1 to numBarsInSession
    Begin
        if t = validTimes[arrayIndex] Then
        begin
            CanTradeThisTime = True;
            break;
        end;
    end;
end;

Complete CanTradeThisTime function code

Sandbox Strategy function driver

inputs: startTime(1800),endTime(1500);

if canTradeThisTime(startTime,endTime) Then
    if d = 1231206 or d = 1231207 then
        print(d," ",t," can trade this time");

I hope you find this useful.  Remember to purchase my Easing into EasyLanguage books at amazon.com.  The DayTrade edition is still on sale.  Email me with any questions, suggestions, bugs, or anything else.

>>By George Pruitt from blog georgepruitt.com

My Best Tips To Avoid Curve Fitting When Building A Trading System
https://easylanguagemastery.com/building-strategies/my-best-tips-to-avoid-curve-fitting-when-building-a-trading-system/
Mon, 15 Jan 2024 11:00:00 +0000

This article will show you my best tips to crush curve fitting while building your trading system. Doing so will help you create systems that work on the live market.

As you know, curve fitting is a danger that all system developers must constantly be aware of. Curve fitting destroys your trading system and can give you false hope that your system will work on the live market, which results in you losing your hard-earned money.

Curve fitting often hits novice system developers who have yet to learn about it. This can be particularly confusing and disheartening, as you spend so much time on the strategy only to have it fail on out-of-sample or new live data. But curve fitting can also creep into the development process of professional developers. So, it's your job to be aware of curve fitting and how to avoid it. You must be vigilant about it.

Let's dive into how to minimize this common and costly problem.

Curve Fitting vs. Overfitting

First, let's clear up some terms.

When people in the trading world talk about curve-fitting, they refer to a trading system's negative behavior when moving to the out-of-sample data segment. This often results in a failing out-of-sample performance.

The term curve fitting is not the correct term. A better word is overfitting. I may mix curve fitting with overfitting in my work, which is sloppy. However, because so many associate curve-fitting with overfitting, I may continue to use these terms interchangeably.

From a mathematical standpoint, curve-fitting attempts to fit a line or curve through a series of data points. That's not what we are doing with a trading system. Regarding trading, we are attempting to optimize a trading system's performance. Often this is a crucial performance metric, such as net profit vs. drawdown. We do this by trying to introduce filters and optimize various parameters.

As traders, we depend upon finding market edges we can exploit. All successful trading relies on recurring patterns found in the historical market data and using those patterns to make a profit. This is true for discretionary traders as well as system traders.

However, much of the historical market movement is noise or random action. So, the problem arises when we over-optimize our system to the historical data, and our system begins to key into patterns within the noise instead of exploiting a true market edge. Finding a pattern in the noise can produce great-looking equity curves on historical data, but the equity curve can break down quickly on out-of-sample data.

Put another way, we want a trading system that exploits a true market edge and avoids keying off patterns in the market noise. The idea of separating a true market edge from the noise is what this game is all about.

An Over Fit Strategy

This strategy is likely overfit to the historical data. A great example of a "curve fit" trading system.

Overfitting occurs when a model is excessively complex and has too many parameters. If you find yourself adding filter upon filter, you may be overfitting. Too many filters or parameters will likely create an overfitted model that does not generalize to new data. Put another way, you're likely building a trading system that can only trade on the in-sample data!

How to Prevent Overfitting When Building Your Strategy

To avoid overfitting, let's cover what you should do during your strategy development.

1. Don't Attempt to Build The Perfect-Looking Equity Curve

We'll start with a mind-shift change. This one took me a year or two to fully accept, but it's so important.

Novice strategy developers often believe they should build perfect-looking equity curves. I used to think that. I remember spending hours or days on a single strategy, adding more filters to make a great-looking backtest.

Remember, your job as an algorithmic trader is not to build a perfect-looking equity curve. That's easy to do. Your job is to construct a strategy that can generalize to new data. This means following some critical steps to avoid overfitting. The remainder of my recommendations will help you do just that. Some of these steps will feel unnatural or counterintuitive. That's why so many people fail at trading.

2. Have Enough Proper In-Sample Data

Have enough in-sample data to generate hundreds of trades. I suggest 300+ trades. Your historical data should span different market regimes (bull/bear). Don't have your in-sample data segment only span a bull market. Picking the ES dates between 2009 and 2019 shows a clear bull market. You want your in-sample data to span different regimes so your trading system is exposed to these various regimes.

3. Limit Degrees of Freedom

Degrees of freedom is a fancy word for the number of pieces of information in the data. It's determined by the number of parameters in the model relative to the number of observations. We won't get into the details of degrees of freedom and keep this very practical.

We generally want to keep the degrees of freedom to the lowest possible. As algorithmic traders, we can count the degrees of freedom by counting the optimized parameters used by our strategy. The more degrees of freedom in the data set, the more likely it is to overfit. Adding more filters and parameters to optimize allows your strategy to train on the noise of the data. These types of patterns will never predictably repeat.

Ideally, you could build a trading system with no parameters. Those do exist, but you will have a few parameters most of the time. When starting out in algo trading, you should keep the number of parameters below 4. That may be challenging to do. Once you gain some experience, I've seen evidence that up to 6 parameters can be acceptable.

4. Don't Spend Too Much Time On One System

This technique is called Early Stopping. This is another fancy term used in machine learning. The problem is that a trading system can train too long and overfit. What does this mean for you and me as system traders? Don't spend too much time tweaking a strategy. A good strategy will work right away. You may be overfitting if you spend many hours daily on a single strategy. If it's not working right away, move on. Kevin Davey stated, don't torture your data! I agree.

5. Optimization Done Right!

Optimization can be a real source of fun when building a trading system. Seeing how optimizing the parameters gets that perfect-looking equity curve is fun! But as we now know, it's not the right thing to do.

Optimizing can be a strong source of curve-fitting. However, you can significantly reduce the risk by following a few key points.

  • You want to avoid picking the best value when optimizing a parameter. Instead, look for a stable region.
  • Ideally, you want to find a cluster or range of values where your system performs well.
  • Abandon this parameter/filter if there are no stable regions. In this case, the optimization will often look choppy with drastic changes.

Below is a bar graph of a look-back period optimization. The x-axis shows the look-back period, while the y-axis shows the trading system's total P&L. The blue arrow highlights what most people would select: the largest bar!

How not to optimize

This is not how you optimize correctly. Don't pick the best value!

You don't want to pick the best value. In the case above, the best value is a four. Most novice algorithmic traders will pick that value. However, that's a big mistake. You want to pick a value within a stable range. See below.

In the case above, a stable range can be seen between values 10 and 26. I would simply pick the middle value, 18.
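One way to make "pick from a stable range" mechanical is to score each parameter value by its worst neighbor instead of by its own result. The Python sketch below is only an illustration of that idea, not a method from this article: the function name, the window size, and the P&L numbers are all invented, with the numbers shaped to mimic a lone spike at 4 and a plateau from 10 to 26.

```python
def pick_stable_value(results, window=2):
    """Return the parameter value whose neighborhood has the best worst-case result.

    results maps each tested parameter value to its total P&L.
    """
    values = sorted(results)
    best_value, best_score = None, float("-inf")
    for i, value in enumerate(values):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        # Score by the weakest neighbor so an isolated spike scores poorly.
        score = min(results[v] for v in values[lo:hi])
        if score > best_score:
            best_value, best_score = value, score
    return best_value


# Hypothetical optimization results: look-back period -> total P&L.
pnl = {2: 1000, 4: 30000, 6: 2000, 8: 5000, 10: 18000, 12: 19000,
       14: 20000, 16: 20500, 18: 21000, 20: 20500, 22: 20000,
       24: 19000, 26: 18000, 28: 6000, 30: 3000}

print(max(pnl, key=pnl.get))   # 4, the tallest bar
print(pick_stable_value(pnl))  # 18, the middle of the plateau
```

If no value scores well under this measure, that matches the advice above to abandon the parameter.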

Summary

When building trading systems, we want to develop simple strategies that generalize better to out-of-sample data. We want to build systems that generate signals from market edges, not market noise.


When building a trading system, keep these points in mind:

  • Don't attempt to build the perfect-looking equity curve
  • Try to keep your system as simple as possible
  • Limit yourself to 1-3 filters (rules).
  • Key Idea + Rule 1 + Rule 2 + Rule 3
  • Limit yourself to 1-3 parameters for optimization
  • When optimizing, don't pick the best value. Pick values within a stable range.

Your historical price data:

  • Have enough in-sample data to generate hundreds of trades (300+) for your key idea
  • Your historical data should span different market regimes (bull/bear)
  • I like to use from 2007-2020 as my in-sample when building strategies for the stock index markets. It covers both bull and bear regimes.

Following these steps, you will likely significantly reduce the chances of overfitting your strategy, which means it's more likely to work on the live market.
