
Support Board


Date/Time: Tue, 30 Apr 2024 13:58:23 +0000



[User Discussion] - Offering To The Community: Enhanced Intraday Data File Compression with Large Speedups

View Count: 3130

[2016-02-04 19:14:19]
bjohnson777 (Brett Johnson) - Posts: 284
*** This probably won't work anymore with the date change in SC 2150.

In 1bar mode, this program essentially does the same thing as the built-in Intraday Data File Compression, but 10-20x faster.

Where it gets interesting is 4bar mode, where each bar is split into 4 ticks for Open, High, Low, and Close. For some reason, charts using this mode load about 6x faster on my system than with the usual 1bar compression, even though the compressed file is 4x larger.
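A minimal C++ sketch of the 4bar idea follows. It splits one consolidated bar into four single-price ticks in Open, High, Low, Close order. This is not the SCIDRecordWrite4Bars() function from the attached source. The record layout follows the Version 1 SCID format as I understand it, and the volume split is an assumption: each tick gets an equal integer share, and the Close tick also takes the remainder, so the totals and the Ask/Bid split are preserved in aggregate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed Version 1 SCID record layout (check the official doc page).
struct s_IntradayRecord {
    double   DateTime;                 // days since 1899-12-30 (Version 1 era)
    float    Open, High, Low, Close;
    uint32_t NumTrades, TotalVolume, BidVolume, AskVolume;
};

// Hypothetical helper: splits one bar into four single-price ticks (O, H, L, C).
// The equal-share volume rule is an assumption, not the author's method.
std::vector<s_IntradayRecord> SplitInto4Ticks(const s_IntradayRecord& bar) {
    assert(bar.Low <= bar.High);       // sanity check on the input bar
    const float prices[4] = {bar.Open, bar.High, bar.Low, bar.Close};
    std::vector<s_IntradayRecord> out;
    out.reserve(4);
    for (int i = 0; i < 4; ++i) {
        s_IntradayRecord r{};
        r.DateTime = bar.DateTime;     // all four ticks share the bar's start time
        r.Open = r.High = r.Low = r.Close = prices[i];
        // A quarter of each counter; the last tick takes the remainder so the
        // four ticks sum exactly to the original totals.
        auto share = [i](uint32_t total) {
            return total / 4 + (i == 3 ? total % 4 : 0u);
        };
        r.NumTrades   = share(bar.NumTrades);
        r.TotalVolume = share(bar.TotalVolume);
        r.BidVolume   = share(bar.BidVolume);
        r.AskVolume   = share(bar.AskVolume);
        out.push_back(r);
    }
    return out;
}
```

Giving all four ticks the same timestamp keeps them inside the original time block. Whether SC expects distinct sub-second times here is something to check against the attached source.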

Note that this is an external command line program (EXE) and not an SC plugin study (DLL). It's a bit more complicated to run. Details are below. The source code compiles cleanly on my 32bit and 64bit systems (Linux and Windows). If you don't want to compile it yourself, I've attached two EXEs for Windows platforms. They are self-contained and should run cleanly. If you're using an ancient computer that is 32bit only, choose the 32bit file. If you're running a multi-core system bought within the past several years, choose the 64bit file. If there's a problem, the 32bit file should run on both.

This program will keep the up/down volume ratios from tick by tick data similar to the SC built in function.

Support didn't think this was possible, but it works just fine with Ask/Bid Volume (SC_ASKVOL and SC_BIDVOL). I'll be updating the other studies I've posted to make this the default in a few days.
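The sketch below shows why the up/down volume survives consolidation. It combines the ticks from one time block into a single bar. It assumes the Version 1 record layout, and it assumes each tick record carries its trade price in Close. ConsolidateTicks() is a hypothetical name and does not come from the attached source. BidVolume and AskVolume are summed rather than recomputed, so the bar has the same Ask/Bid totals as the raw ticks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed Version 1 SCID record layout (check the official doc page).
struct s_IntradayRecord {
    double   DateTime;
    float    Open, High, Low, Close;
    uint32_t NumTrades, TotalVolume, BidVolume, AskVolume;
};

// Hypothetical helper: consolidates the ticks of one time block into one bar.
// Assumes each tick's trade price is stored in Close.
s_IntradayRecord ConsolidateTicks(const std::vector<s_IntradayRecord>& ticks,
                                  double barStart) {
    assert(!ticks.empty());
    s_IntradayRecord bar{};
    bar.DateTime = barStart;                      // aligned start of the block
    bar.Open  = ticks.front().Close;
    bar.High  = bar.Low = ticks.front().Close;
    bar.Close = ticks.back().Close;
    for (const auto& t : ticks) {
        bar.High = std::max(bar.High, t.Close);
        bar.Low  = std::min(bar.Low, t.Close);
        // Summing keeps the up (Ask) and down (Bid) volume counts intact.
        bar.NumTrades   += t.NumTrades;
        bar.TotalVolume += t.TotalVolume;
        bar.BidVolume   += t.BidVolume;
        bar.AskVolume   += t.AskVolume;
    }
    return bar;
}
```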

I also programmed a 2bar version where up counts are one bar and down counts are the other. For some reason, this causes SC to hang and barf. This is probably an SC bug that needs to be looked at. DO NOT use the 2bar output for now.

-----

Support:

Have a look at SCIDRecordWrite4Bars() just above main(). This is what I'm using to create the 4 OHLC bars. I've already provided the code, so this speedup just needs to be integrated into the SC compress function. It should be easy.

The SCID file format page also needs some updating:
http://www.sierrachart.com/index.php?page=doc/doc_IntradayDataFileFormat.html

The data types should be size-specific and not generic anymore. A long int may be 32bit or 64bit depending on the compiler. An int32_t or uint32_t will be the same size on all compilers. Have a look at the top of my source file for what I'm using.
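Here is the Version 1 SCID header and record written with size-specific `<cstdint>` types, as an illustration. The field names and the 56-byte/40-byte sizes are my reading of the doc page and should be checked against it. The static_asserts make the build fail if a compiler ever lays the structs out differently. A `long` counter could not offer that guarantee.

```cpp
#include <cstdint>

// Assumed Version 1 SCID file header (verify against the doc page).
struct s_IntradayHeader {
    char     FileTypeUniqueHeaderID[4];  // "SCID"
    uint32_t HeaderSize;                 // expected 56
    uint32_t RecordSize;                 // expected 40
    uint16_t Version;                    // 1 for the files this program targets
    uint16_t Unused1;
    uint32_t UTCStartIndex;
    char     Reserve[36];
};

// Assumed Version 1 intraday record.
struct s_IntradayRecord {
    double   DateTime;                   // 8-byte date/time
    float    Open, High, Low, Close;
    uint32_t NumTrades, TotalVolume, BidVolume, AskVolume;
};

// These hold on 32bit and 64bit compilers alike because every field has a
// fixed width and natural alignment leaves no padding.
static_assert(sizeof(double) == 8, "DateTime must be an 8-byte double");
static_assert(sizeof(s_IntradayHeader) == 56, "header must be 56 bytes");
static_assert(sizeof(s_IntradayRecord) == 40, "record must be 40 bytes");
```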

Also s_Record doesn't exist anymore on the doc page.

-----
Pieces from my file notes:


This program is also an example of a working SCID reader. If
Version 2 comes out, this program will likely need updating. Using
this program on files of another version will probably corrupt them.
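A minimal reader along these lines is sketched below. It assumes the Version 1 layout (56-byte header, 40-byte records) as I understand the doc page, and it assumes a little-endian machine (x86 Windows/Linux). ReadScid() and ReadScidFile() are hypothetical names. The header check matters here because it refuses a file of another version instead of misreading it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <istream>
#include <sstream>   // std::istringstream, handy for testing without a file
#include <string>
#include <vector>

// Assumed Version 1 layout; verify against the official doc page.
struct s_IntradayHeader {
    char     FileTypeUniqueHeaderID[4];
    uint32_t HeaderSize, RecordSize;
    uint16_t Version, Unused1;
    uint32_t UTCStartIndex;
    char     Reserve[36];
};
struct s_IntradayRecord {
    double   DateTime;
    float    Open, High, Low, Close;
    uint32_t NumTrades, TotalVolume, BidVolume, AskVolume;
};

// Reads a Version 1 SCID stream; returns false if the header doesn't match
// the assumed layout, so another version is rejected rather than misread.
bool ReadScid(std::istream& in, std::vector<s_IntradayRecord>& records) {
    s_IntradayHeader h{};
    if (!in.read(reinterpret_cast<char*>(&h), sizeof h)) return false;
    if (std::memcmp(h.FileTypeUniqueHeaderID, "SCID", 4) != 0 ||
        h.HeaderSize != sizeof(s_IntradayHeader) ||
        h.RecordSize != sizeof(s_IntradayRecord) || h.Version != 1)
        return false;
    s_IntradayRecord r{};
    while (in.read(reinterpret_cast<char*>(&r), sizeof r)) records.push_back(r);
    return true;  // a trailing partial record, if any, is ignored
}

// Convenience wrapper for a file on disk (opened in binary mode).
bool ReadScidFile(const std::string& path, std::vector<s_IntradayRecord>& records) {
    std::ifstream in(path, std::ios::binary);
    return in && ReadScid(in, records);
}
```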

Building and running: This program uses specific sized data types and should
compile cleanly on 32bit and 64bit systems.

Linux:
g++ -O3 --static -o SC_CompressDataUnitSize SC_CompressDataUnitSize.cpp
Copy SC_CompressDataUnitSize to the SC Data directory.
Example: ./SC_CompressDataUnitSize -c4 -tm -u1 GBPUSD.scid.old GBPUSD.scid

Windows:
Change directory to C:\SierraChart\CPPCompiler\bin
Copy the SC_CompressDataUnitSize.cpp source file here.
Open a DOS window.
g++.exe -O3 --static -o SC_CompressDataUnitSize.exe SC_CompressDataUnitSize.cpp
Copy SC_CompressDataUnitSize.exe to the SC Data directory.
If you haven't already, fully exit SC to avoid causing data corruption.
While in the Data directory, rename whatever files you want to compress with
a ".old" extension. In this example, GBPUSD.scid gets renamed to GBPUSD.scid.old.
If there is a problem, delete the bad SCID file (GBPUSD.scid in this example)
and rename GBPUSD.scid.old back to GBPUSD.scid.
Open a DOS window and run from the Data directory:
SC_CompressDataUnitSize.exe -c4 -tm -u1 GBPUSD.scid.old GBPUSD.scid

After this program finishes, use SC to export the data to CSV if necessary.
This program includes CSV exports (mainly for debugging), but SC will probably
export cleaner and more usable files.

This program aligns the output bars to the beginning of each time block.
This keeps bars aligned in non-market graphing programs and spreadsheets.
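The alignment step can be sketched as flooring each timestamp to the start of its time block. This assumes the Version 1 DateTime encoding (a double counting days, with the fractional part as the time of day). AlignToBlockStart() is a hypothetical name. The sketch rounds to whole milliseconds first, so tiny floating-point errors near a boundary don't push a tick into the previous block.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: start of the time block containing `dt` (in days).
double AlignToBlockStart(double dt, int64_t unitSeconds) {
    assert(dt >= 0.0 && unitSeconds > 0 && unitSeconds <= 86400);
    const int64_t ms = std::llround(dt * 86400000.0);  // days -> milliseconds
    const int64_t blockMs = unitSeconds * 1000;
    const int64_t startMs = ms - ms % blockMs;         // floor to block start
    return static_cast<double>(startMs) / 86400000.0;  // milliseconds -> days
}
```

For example, with 1min bars a tick at 12:34:56 lands on a bar stamped 12:34:00. That is the "beginning of each time block" behavior described above.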

My ChartBook Load Speed Test:
Original 1 tick SCID size: 6 GB (around 4-5 min to load)
Compressed by SC's built-in function = 80 sec (49 MB).
1bar = 80 sec (48 MB). This is essentially the same as SC's built-in compression.
2bar = ? sec (95 MB). Hangs.
4bar = 14 sec (191 MB).
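The 6x and 4x figures quoted in this thread follow from the 1bar and 4bar rows of this test:

```latex
\[
\frac{80\ \text{s}}{14\ \text{s}} \approx 5.7 \approx 6\times\ \text{faster load},
\qquad
\frac{191\ \text{MB}}{48\ \text{MB}} \approx 4.0\times\ \text{larger file}
\]
```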

Usage Screen:

Usage: SC_CompressDataUnitSize.exe -opts InFile.scid OutFile.scid
Program to compress Sierra Chart Version 1 SCID tick by tick files down
in size with different methods while preserving up and down volume counts.
This version is 10-20x faster than the built-in SC function. The "-c4"
option also offers a 6x speedup over the traditional compression.

Options (-opts) start with the dash (-) character followed by a letter and,
for some options, a number (which replaces the #).
-c#: Bar Consolidation Type. 1 is similar to SC's built-in function. 2 produces
an up bar and a down bar for each time unit, but it hangs SC for some reason.
4 gives the 6x speedup. It produces 4 bars for Open, High, Low, and Close ticks.
-t#: Time Prefix. Options are s for seconds, m for minutes, h for hours, and d
for days.
-u#: Time Units. The number of units for -t. "-tm -u1" would give 1min bars.
Note that SCID intraday files can have a maximum bar length of 1 day. Anything
over that will be rounded down to 1 day. Use the daily CSV text format for
anything longer than a day. Watch out for time units that don't divide evenly
into 1 trading day. This program does not convert time zones. Be careful with
larger time units.
-x#: Cut days older than # days back. This is used for trimming down SCID files.
-y#: Do not process (just pass them through) ticks from the last # days.
Days back are CALENDAR days, not trading days. Watch out for weekends and
holidays. Usually give 3-4 extra days to account for that.
-r: Write out CSV file from the SCID input. Watch out for file size.
-R: Same as -r except more human readable for debugging.
-w: Write out CSV file from the SCID output. Watch out for file size.
-W: Same as -w except more human readable for debugging.
-d: Enable debugging mode. More output is given.
-B: Batch mode. Doesn't display the warning. Use with caution.
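The -x# and -y# day counting could work roughly as sketched below. The reference point is an assumption: days are counted back from the calendar day of the newest record in the input. The sketch also assumes DateTime is a double counting days, and it treats 0 as "option not given". ClassifyTick() and TickAction are hypothetical names. The age is measured in calendar days, so weekends and holidays count, which is why the usage text suggests 3-4 extra days.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class TickAction { Drop, Compress, PassThrough };

// Hypothetical classification for -x# (cut) and -y# (pass through).
TickAction ClassifyTick(double dt, double newestDt, int cutDays, int passDays) {
    assert(cutDays >= 0 && passDays >= 0);
    const auto day = static_cast<int64_t>(std::floor(dt));
    const auto newestDay = static_cast<int64_t>(std::floor(newestDt));
    const int64_t age = newestDay - day;  // 0 = the newest calendar day
    if (cutDays > 0 && age >= cutDays) return TickAction::Drop;          // -x#
    if (passDays > 0 && age < passDays) return TickAction::PassThrough;  // -y#
    return TickAction::Compress;
}
```

With -x30 -y7, this keeps the newest 30 calendar days, passes the newest 7 through untouched, and compresses the 23 days in between.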

Time Options Examples: 1sec bars: -ts -u1. 30sec bars: -ts -u30.
1min bars: -ts -u60 or -tm -u1. 10min bars: -tm -u10.
45min bars: -tm -u45. 1hr bars: -tm -u60 or -th -u1.
4hr bars: -th -u4. 6hr bars: -th -u6. 1day bars: -td -u1.
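The -t#/-u# rules can be sketched as a conversion to a bar length in seconds. BarSeconds() is a hypothetical helper. It caps anything over one day at one day, as the usage text describes. It warns when the length doesn't divide evenly, but it checks against a 24-hour calendar day, not the actual trading session, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical helper: prefix ('s','m','h','d') plus unit count -> seconds.
// Returns 0 for an unknown prefix or a non-positive count.
int64_t BarSeconds(char prefix, int64_t units) {
    int64_t scale = 0;
    switch (prefix) {
        case 's': scale = 1;     break;
        case 'm': scale = 60;    break;
        case 'h': scale = 3600;  break;
        case 'd': scale = 86400; break;
        default:  return 0;
    }
    if (units <= 0) return 0;
    const int64_t kDay = 86400;
    if (units > kDay / scale) return kDay;  // over 1 day: round down to 1 day
    const int64_t seconds = scale * units;
    if (kDay % seconds != 0)
        std::cerr << "warning: " << seconds << "s bars do not divide a day evenly\n";
    return seconds;
}
```

With this helper, "-tm -u1" and "-ts -u60" both give 60 seconds, and "-th -u4" gives 14400.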

Convert forex EUR/USD tick by tick data to fast 1min bars discarding anything
older than 30 days and not converting the past 7 days with debug CSV files:
SC_CompressDataUnitSize.exe -c4 -tm -u1 -x30 -y7 -R -W EURUSD.scid.old EURUSD.scid

Version 0.9 2016-02-03 GPL'd and Open Sourced by Brett Johnson

-----
List of my programs available on "Brett Johnson's Standard Tool Kit" DLL page.
Offering To The Community: Brett Johnson's Standard Tool Kit
Date Time Of Last Edit: 2020-09-16 09:38:42
Attachment: SC_CompressDataUnitSize_32bit.exe - Attached On 2016-02-04 18:29:05 UTC - Size: 121.5 KB - 638 views
Attachment: SC_CompressDataUnitSize_64bit.exe - Attached On 2016-02-04 18:29:10 UTC - Size: 189 KB - 512 views
Attachment: SC_CompressDataUnitSize.cpp - Attached On 2016-02-04 19:01:35 UTC - Size: 45.4 KB - 628 views
[2016-02-24 16:39:31]
sigmadict - Posts: 95
Hello,
I am interested in your post, but I don't really understand what this will speed up.
Is it the chart loading the data, and also replay mode?

Thanks
Regards
[2016-02-25 07:19:00]
bjohnson777 (Brett Johnson) - Posts: 284
When the chart is first opened, the 4bar mode loads much faster. Recalcs are also faster. For some reason, 4 separate bars as ticks load much faster than a single consolidated bar of the same data. The developers haven't commented on it.

If you have a lot of older tick data you want to compress down to a smaller size (like to 1min), it's worth looking into.

The developers fixed part of the doc page, but the size-specific data types should be as follows (the doc page is missing the 't', and the underscore is in the wrong place):
//As seen from:
#include <stdint.h>
uint16_t
uint32_t

[2016-02-26 05:12:21]
sigmadict - Posts: 95
Hello,
thanks for your answer.
I am not a programmer, so I don't understand the codes.
I was interested in getting Sierra Chart faster and I was curious reading your post.
Do you only need to install the .exe to make it work, or do you need advanced skills?
Also, is this going to reduce the accuracy of tick-by-tick simulation replays or order entries?
Sorry if I ask simple questions.

Thanks
Regards
[2016-02-26 09:58:46]
bjohnson777 (Brett Johnson) - Posts: 284
The code block is for the developers. They need to fix that page.

The program does the same thing as File >> Data/Trade Service Settings >> Data File Management. There's a long description here:
http://www.sierrachart.com/index.php?page=doc/doc_DataSourceSettings.php#DataFileManagement

Both methods take multiple ticks and combine them into a single unit. The less tick data there is to compute through, the faster it will be.

If you have to have tick by tick data for your replays, this won't work very well and neither method is recommended. Simulated orders during the replay will change their entry/exit points a little. Live orders are unaffected.

For what I'm interested in, I just need the results from the tick by tick data for accurate up/down volume counts. Those will be preserved. In my case, I convert individual ticks into 1 minute bars. It converted a 6 GB intraday data file into 200 MB.

I ran across another post where the developers mention they are working on increasing the data file speed. I'm not sure what they're doing, but they think it will be out in a couple months.
[2019-06-06 22:05:56]
Chad - Posts: 231
Hi Brett,
In order to do a 'batch process' of several .scid files for compression, is there a feature included with the .exe that you made, or should I modify the source script in order to do so?
[2019-06-07 06:00:10]
bjohnson777 (Brett Johnson) - Posts: 284
If you're under Windows, you'll need to write your own .bat file. The exe only operates on one SCID file at a time. I'm running SC under Wine, so I use a Linux shell script to loop through all the SCID files in the Data directory.
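For anyone who would rather stay in C++ than write a .bat file, the loop can be sketched with C++17 std::filesystem. This is a dry run: it only lists the *.scid files and builds one command line per file, following the rename-to-.old workflow from the first post. ListScidFiles() and BuildBatchCommands() are hypothetical names, and the option string is whatever you choose.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Lists *.scid files in the Data directory. Files already renamed to
// *.scid.old have the extension ".old" and are skipped.
std::vector<std::string> ListScidFiles(const fs::path& dataDir) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(dataDir))
        if (entry.is_regular_file() && entry.path().extension() == ".scid")
            names.push_back(entry.path().filename().string());
    return names;
}

// Builds one command line per file, assuming each file will first be renamed
// to <name>.old. Nothing is renamed or executed here.
std::vector<std::string> BuildBatchCommands(std::vector<std::string> names,
                                            const std::string& opts) {
    std::sort(names.begin(), names.end());  // deterministic order
    std::vector<std::string> cmds;
    for (const auto& n : names)
        cmds.push_back("SC_CompressDataUnitSize.exe " + opts + " \"" + n +
                       ".old\" \"" + n + "\"");
    return cmds;
}
```

Renaming each file and running its command (for example with std::system) is left to the caller. Do that only after SC is fully closed and the Data directory is backed up. Older GCC versions may also need -lstdc++fs to link std::filesystem.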

If this is your first time running the program, copy or zip your SC data directory out to make sure you don't run into any problems (like maybe corrupted data).

I've been using -c1 to combine all the ticks into a single bar. A little higher up, I mentioned that the SC devs were working on speeding up loading. It looks like they did that some time ago.
