[User Discussion] - Python for Sierra Chart


[2013-06-13 03:00:17]
Kiwi - Posts: 374
For a while I've been looking at moving my development to Python rather than C++.

Enabling this required a few extra bits and pieces, which I'll post to the board for anyone interested. The original post is here.

The attached file runs with Python3 (might run with 2 as well). It:

1. Reads an SCID file.
2. Converts it to a pandas dataframe (for time series manipulations)
3. Writes it back to an SCID file, leaving the T, BV and AV fields free so that they can carry commands to C++ code, either to indicate conditions on the SC chart or to act on Sierra (place orders, etc.).

An entire read/convert/write loop takes under 1 ms on a 3.2 GHz machine, so there is no appreciable lag between reading a new tick and writing it to the output file.
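(For anyone skimming before opening the attachment, the read step boils down to unpacking fixed-width records: a 0x38-byte header followed by 0x28-byte records unpacked as 'd4f4I'. A minimal sketch along those lines, not the attached script itself; the column names after the OHLC floats are my own guess at the field order.)

import struct
import datetime as dt
import pandas as pd

def read_scid(path):
    """Sketch: read a .scid file into a pandas DataFrame."""
    rows, index = [], []
    with open(path, 'rb') as f:
        f.read(0x38)                        # fixed-size header, discarded
        while True:
            rec = f.read(0x28)              # one 40-byte record
            if len(rec) < 0x28:
                break
            d = struct.unpack('d4f4I', rec)     # SCDateTime + 4 floats + 4 uints
            # SCDateTime counts days from the Excel epoch, 1899-12-30
            index.append(dt.datetime(1899, 12, 30) + dt.timedelta(days=d[0]))
            rows.append(d[1:])
    return pd.DataFrame(rows, index=index,
                        columns=['O', 'H', 'L', 'C', 'T', 'V', 'BV', 'AV'])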

Attachment: SCID_to_DF_RT.py - Attached On 2013-06-13 02:59:38 UTC - Size: 10.03 KB
[2013-06-13 19:24:24]
ganz - Posts: 1048
Hello sir

Thank you for the file. Very interesting.

gd lck
[2013-06-20 23:10:17]
Kiwi - Posts: 374
Updated version with new methods for converting the dataframe to a longer timeframe and for mapping the longer timeframe's development onto the lower timeframe.

#!/usr/bin/python3
from __future__ import print_function
import numpy as np
import pandas as pd
import struct
import sys
from time import sleep, time

o = O = 'O'
h = H = 'H'
l = L = 'L'
c = C = 'C'
v = V = 'V'
x = 'x'
y = 'y'
z = 'z'

time_list = []
overrun_list = []
overruns = 0

lt = 15
mt = 5
st = 1

ohlc = {o: 'first', h: 'max', l: 'min', c: 'last',
        v: 'sum', x: 'sum', y: 'sum', z: 'sum'}
cols = [O, H, L, C, V, x, y, z]
time_list = []



class SierraFile(object):
    """ """
    def __init__(self, filename):
        self.filename = str(filename)
        # self.tzAdjust = timedelta(hours=+10).seconds/d2s
        self.tzAdjust = np.timedelta64(10, 'h') / np.timedelta64(1, 'D')
        self.excelDate = np.datetime64('1899-12-30')
        self.sizeHeader = 0x38
        self.sizeRecord = 0x28
        self.pos = 0
        self.last = 0

    def read_existing_records(self):
        with open(self.filename, 'rb') as fscid:
            fscid.read(self.sizeHeader)  # discard header
            rows = []
            ts = []
            for i in range(1000000):
                data = fscid.read(self.sizeRecord)
                if data not in ('', b''):
                    d = struct.unpack('d4f4I', data)
                    dt = d[0] + self.tzAdjust
                    ts.append(self.excelDate + np.timedelta64(int(dt))
                              + (np.timedelta64(int(round((dt - int(dt))
                                                          * 86400)), 's')))
                    datarow = [d[1], d[2], d[3], d[4], d[5], 0, 0, 0]
                    rows.append(datarow)
                else:
                    break
            self.pos = self.last = fscid.tell()
        return (ts, rows)

    def read_record(self):
        global overruns, overrun_list
        with open(self.filename, 'rb') as fscid:
            fscid.seek(0, 2)  # Go to the end of the file
            self.last = fscid.tell()
            if self.last == self.pos:  # no new data >> nothing to do
                return (-999, 0, 0)
            else:  # data to collect
                if self.pos < self.last - self.sizeRecord:  # > 1 record
                    print('Overrun', self.last - self.pos,
                          (self.last - self.pos) / self.sizeRecord)
                    overruns += 1
                    overrun_list.append(np.datetime64('now'))
                    late_flag = True
                else:
                    late_flag = False
                fscid.seek(self.pos, 0)
                self.pos += self.sizeRecord
                data = fscid.read(self.sizeRecord)
                d = struct.unpack('d4f4I', data)
                dt = d[0] + self.tzAdjust
                new_time = (self.excelDate + np.timedelta64(int(dt))
                            + (np.timedelta64(int(round((dt - int(dt))
                                                        * 86400)), 's')))
                datarow = [d[1], d[2], d[3], d[4], d[5], 0, 0, 0]
                return (new_time, datarow, late_flag)

    def write_existing_records(self, dataframe):
        with open(self.filename, 'wb') as fscid:
            header = b'SCID8\x00\x00\x00(\x00\x00\x00\x01\x00'
            fscid.write(header)
            for i in range(21):
                fscid.write(b'\x00\x00')
            for i in range(dataframe.end):
                da = ((dataframe.df.index.values[i] - self.excelDate)
                      / np.timedelta64(1, 'D') - self.tzAdjust)
                db, dc, dd, de, df, dg, dh, di = dataframe.df.iloc[i]
                di = 0x11100111
                df = int(df)
                dg = int(dg)
                dh = int(dh)
                di = int(di)
                wt = struct.pack('d4f4I', da, db, dc, dd, de, df, dg, dh, di)
                fscid.write(wt)

    def write_record(self, dataframe):
        with open(self.filename, 'ab') as fscid:
            i = dataframe.end - 1
            da = ((dataframe.df.index.values[i] - self.excelDate)
                  / np.timedelta64(1, 'D') - self.tzAdjust)
            db, dc, dd, de, df, dg, dh, di = dataframe.df.iloc[i]
            di = 0x88300388
            df = int(df)
            dg = int(dg)
            dh = int(dh)
            di = int(di)
            record = struct.pack('d4f4I', da, db, dc, dd, de, df, dg, dh, di)
            fscid.write(record)


class SierraFrame(object):
    """
    DataFrame is the basic object for analysis:
    init reads the .scid file into the initial object, 5 sec assumed
    extend_frame adds 5000 rows to the df because appending rows is slow
    add appends new data in the extended frame for real time operation
    build_tf creates a new dataframe that is a multiplier of the input df
    build_htf_array creates an array showing higher timeframe bars as
        they develop for the lower timeframe array
    countfloats is a test method
    """
    def __init__(self, time_index, data):
        self.df = pd.DataFrame(data, index=time_index,
                               columns=[O, H, L, C, V, x, y, z])
        self.end = len(self.df)
        self.pos = 0

    def extend_frame(self):
        '''
        Create a 5000 row array from last time in self.df
        and append it to self.df
        Remove lunch break from array
        '''
        print('Extending DataFrame Now')
        s5 = np.timedelta64(5, 's')
        h1 = np.timedelta64(1, 'h')
        sl = np.datetime64('today') + np.timedelta64(14, 'h')
        el = np.datetime64('today') + np.timedelta64(15, 'h')
        start_time = self.df.index.values[self.end - 1]
        dtgen = ((start_time + i * s5) for i in range(1, 5000))
        dtstrip = ((i + h1 if sl <= i < el else i) for i in dtgen)
        dg = pd.DataFrame(index=dtstrip, columns=self.df.columns)
        # dg.iloc[:] = 0.0
        # dg[[v, x, y, z]] = dg[[v, x, y, z]].astype('int')
        self.df = self.df.append(dg)
        self.df = self.df.astype(np.float64)

    def add(self, new_time, datarow):
        '''
        Add a row to an existing extended df but:
        extend if its within 5 of the end
        fill with last bar if its not the next bar
        convert the four integer columns to float for df speed of access
        '''
        if self.end > len(self.df) - 5:
            self.extend_frame()  # not needed if first fill > day length
        np_time = np.datetime64(new_time)
        if np_time < self.df.index.values[self.end]:
            return  # new data is earlier than current
        while np_time > self.df.index.values[self.end]:
            self.df.iloc[self.end] = self.df.iloc[self.end - 1]
            self.end += 1  # fill with prior row if new is later
        for i in [4, 5, 6, 7]:
            datarow[i] = float(datarow[i])
        self.df.iloc[self.end] = datarow  # fill when times match
        # self.df.iloc[self.end] = self.df.iloc[self.end].astype(np.float64)
        self.end += 1

    def build_tf(self, ht):
        '''
        Create higher timeframe df that is a multiplier of the input, di
        with ht being the high timeframe bar length in minutes
        '''
        return self.df.resample(str(ht) + 'min', how=ohlc)[cols]

    def build_htf_array(self, st, ht):
        '''
        Map higher timeframe development on to input df
        with ht being the high timeframe bar length in minutes
        '''
        di = self.df.resample(str(st) + 'min', how=ohlc)[cols]
        dih = di.iloc[:, 0:5]
        for i in range(len(dih)):
            if i == 0 or i // ht > (i - 1) // ht:
                bO = dih.iloc[i, 0]
                bH = dih.iloc[i, 1]
                bL = dih.iloc[i, 2]
                bC = dih.iloc[i, 3]
            else:
                dih.iloc[i, 0] = bO
                dih.iloc[i, 1] = bH = max(bH, dih.iloc[i, 1])
                dih.iloc[i, 2] = bL = min(bL, dih.iloc[i, 2])
                bC = dih.iloc[i, 3]
        return dih

    def countfloats(self):
        length = len(self.df)
        width = len(self.df.iloc[0])
        floats = 0
        nonfloats = 0
        for i in range(length):
            for j in range(width):
                if isinstance(self.df.iloc[i, j], float):
                    floats += 1
                else:
                    nonfloats += 1
        return (floats, nonfloats)

def build_htf_array(di, ht):
    '''
    Map higher timeframe development on to input df
    with ht being the high timeframe bar length in minutes
    '''
    dih = di.iloc[:, 0:5].copy()
    for i in range(len(dih)):
        if i == 0 or i // ht > (i - 1) // ht:
            bO = dih.iloc[i, 0]
            bH = dih.iloc[i, 1]
            bL = dih.iloc[i, 2]
            bC = dih.iloc[i, 3]
        else:
            dih.iloc[i, 0] = bO
            dih.iloc[i, 1] = bH = max(bH, dih.iloc[i, 1])
            dih.iloc[i, 2] = bL = min(bL, dih.iloc[i, 2])
            bC = dih.iloc[i, 3]
    return dih


def build_tf(di, ht):
    '''
    Create higher timeframe df that is a multiplier of the input, di
    with ht being the high timeframe bar length in minutes
    '''
    return di.resample(str(ht) + 'min', how=ohlc)[cols]



def SierraRun():
    global time_list
    time0 = time()
    # filename = '/home/john/zRamdisk/SierraChart/Data/HSI-201306-HKFE-TD.scid'
    filename = '/home/john/zRamdisk/SierraChart/Data/HSIM13-FUT-HKFE-TD.scid'
    hsi = SierraFile(filename)
    time_index, data = hsi.read_existing_records()
    da = SierraFrame(time_index, data)
    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    da.extend_frame()
    wtst = SierraFile('/home/john/zRamdisk/SierraChart/Data/HSI-INPUT.scid')
    wtst.write_existing_records(da)
    print('df ready', da.end - 1, time() - time0)
    print(da.df[da.end - 1:da.end + 1])
    print()
    df = da.df
    print('\n', np.datetime64('now'), da.end)
    print(df[da.end - 5:da.end + 5])

    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT

    # time_list = []
    # for i in range(4000):
    #     intime = df.index.values[da.end]
    #     time0 = time()
    #     da.add(intime, [1.0, 2.0, 3.0, 4.0, 5, 6, 7, 8])
    #     time_list.append(time() - time0)

    # if time_list:
    #     print('TimeStats', max(time_list),
    #           sum(time_list) / len(time_list))
    # print('\nEnd of NaN version')

    # print('next', hsi.pos, hsi.last)
    # jtst = SierraFile('/home/john/zRamdisk/SierraChart/Data/HSI-INPUT.scid')
    # time_index, data = jtst.read_existing_records()
    # ja = SierraFrame(time_index, data)
    # jf = ja.df
    # print('\n', ja.end)
    # print(df[ja.end-5:ja.end+5])
    # print('next', jtst.pos, jtst.last)
    # return  # ###################
    counter = 0
    # sys.stdout = os.fdopen(sys.stdout.fileno(), "w", newline=None)
    counter_flag = False
    timer_no_data = time()
    timer_no_data_flag = False
    overruns = 0
    overrun_list = []
    while True:
        time0 = time()
        new_time, data, late_flag = hsi.read_record()
        if new_time != -999:
            # time1 = time()
            da.add(new_time, data)
            # print("{:.6f}".format(time() - time1), end=' ')
            sys.stdout.flush()
            wtst.write_record(da)
            if counter > 3:
                time_list.append(time() - time0)
            timer_no_data = time()
            # print(da.df[da.end-1:da.end], da.end)
            print('.', end=' ')
            sys.stdout.flush()
            if timer_no_data_flag:
                print('Data Restored')
                timer_no_data = time()
                timer_no_data_flag = False
            counter += 1
            counter_flag = True
        if time() - timer_no_data >= 120 and not timer_no_data_flag:
            timer_no_data_flag = True
            print('Data lost for two minutes')
        if not late_flag:
            sleep_time = 0.1 - (time() - time0)
            if sleep_time > 0:
                sleep(sleep_time)
        if counter % 12 == 0 and counter_flag:
            counter_flag = False
            print(' Overruns:', overruns, overrun_list, end=' ')
            print('TimeStats', "{:.6f} {:.6f}".format(max(time_list),
                  sum(time_list) / len(time_list)), '\n', end=' ')
            # print(df[da.end-1:da.end])
            sys.stdout.flush()
            # break
        if counter % 60 == 0 and counter != 0:
            import ipdb; ipdb.set_trace()  # XXX BREAKPOINT


def main():
    SierraRun()


if __name__ == '__main__':
    """
    Takes a SierraChart scid file (input argument 1) and converts
    it to a Pandas DataFrame
    Timezone conversion can follow the users local timezone, or a
    specified integer (input l or an integer but if the default
    filename is being used, '' must be specified for the filename)
    """
    print('start')
    sys.stdout.flush()
    main()
    print('fin')
    if time_list != []:
        print('TimeStats', "{:.6f} {:.6f}".format(max(time_list),
              sum(time_list) / len(time_list)), '\n', end=' ')

Date Time Of Last Edit: 2013-06-21 02:52:49
[2014-01-08 14:40:26]
vectorTrader - Posts: 86
Kiwi, you may be just the guy I am looking for. Can you tell me how I can create a separate .bat file or program (possibly using Python) to print my chartbook? I want an automatic way of printing some graphs at the end of the day. Thanks for the help.

[2014-01-08 23:39:03]
Kiwi - Posts: 374
If you're using Linux then yes, I probably can, but I abandoned Windows a while back.
[2014-01-09 04:05:37]
vectorTrader - Posts: 86
I am still interested. While I use Win7 to trade, I would really like to be on Linux anyway if I can. How are you graphing/trading on Linux? I would love to see some screenshots of whatever you are using.

[2014-01-09 05:39:24]
onnb - Posts: 660
This is off the original topic of Python, but jbutta, for what it's worth, the following study will create an image for you on bar close. I used it to save images to a web server and it works quite well. You might need to adapt it to your needs, such as saving the file on session close or at a specific time of day. You would then apply this study to all the charts you want saved. SC saves them for you in the Images directory, the same as if you were saving an image manually.


  if (sc.SetDefaults)
  {
    sc.GraphName = "Save Chart Image to File";
    sc.StudyDescription = "";
    sc.AutoLoop = 1; // true
    sc.GraphRegion = 2;
    sc.HideStudy = 1;
    sc.DrawZeros = 0;
    sc.FreeDLL = 1;
    return;
  }
  
  if (sc.GetBarHasClosedStatus() == BHCS_BAR_HAS_NOT_CLOSED)
  {
    return;
  }

  sc.SaveChartImageToFile = 1;

[2014-01-09 13:44:53]
vectorTrader - Posts: 86
Awesome thanks. I think this is what I needed to get where I want to go.
I appreciate it.

[2014-01-09 13:55:57]
Hendrixon - Posts: 130
Do you mean to develop studies in Python?
What does it give you that C++ doesn't?
[2014-01-09 17:17:04]
vectorTrader - Posts: 86
Nothing, I just want to be on Linux eventually for my trading platform.

As for this program, I have only programmed a little for NT7 and nothing substantial. I hope to be able to write something for it today. Thanks.

[2014-01-09 21:28:05]
vectorTrader - Posts: 86
ONNB,
Thanks. I am new to coding, and to SC coding for that matter. I want to create a function that saves the chart at 16:15 at the end of each day.
I tried to work from what I saw in other code, but it just keeps saving PNGs continuously. I just want it to save once at the end of the day. Can you tell me what I did wrong here? Remember, I'm a newbie.

thanks in advance!

SCDLLName("Autosave")

SCSFExport scsf_autosave(SCStudyGraphRef sc)
{
  SCInputRef Time1 = sc.Input[0];
  
   if (sc.SetDefaults)
{
sc.GraphName = "Save Chart Image to File";
sc.StudyDescription = "";
sc.AutoLoop = 1; // true
sc.GraphRegion = 2;
sc.HideStudy = 1;
sc.DrawZeros = 0;
sc.FreeDLL = 1;
  Time1.Name = "Time to Print";
  Time1.SetTime(HMS_TIME(16,15,0));

return;
}
SCDateTime Print_Time(sc.BaseDateTimeIn[sc.Index].GetDate(),Time1.GetTime() );//Set the print time for each day.
SCDateTime Current_Time(sc.BaseDateTimeIn[sc.Index].GetDate(),sc.BaseDateTimeIn[sc.Index].GetTime());//Get the current time
  sc.Subgraph[0][sc.Index]=0; //Flag signal
  if (Current_Time>=Print_Time)
  {
    sc.Subgraph[0][sc.Index]=1; //attempt to flag Print time so it only prints once per day
  }
  else
   return;
  //I want it to print/save at the change of state
  if (sc.Subgraph[0][sc.Index-1]==0 && sc.Subgraph[0][sc.Index]==1)
    {
      sc.SaveChartImageToFile = 1;
      sc.AddMessageToLog("Printed file",true);
    }
  else  
    return;  
}

[2014-01-09 23:45:45]
onnb - Posts: 660
The best way to do this depends on your specific circumstances, but sticking as closely as possible to your code...

What is happening is that the study function is like OnBarUpdate from NT: it gets called repeatedly for any given bar. So the code here gets called many times within the first bar encountered that starts at 16:15 or later:


if (sc.Subgraph[0][sc.Index-1]==0 && sc.Subgraph[0][sc.Index]==1)
{
  sc.SaveChartImageToFile = 1;
  sc.AddMessageToLog("Printed file",true);
}

The simplest way I can think of to leave the rest of your approach intact and still get this to print just once would be to add a condition that the bar has closed before the study processes. That way the study body only processes once per bar.

You would add this like so:


if (sc.GetBarHasClosedStatus() == BHCS_BAR_HAS_NOT_CLOSED)
{
  return;
}

SCDateTime Print_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), Time1.GetTime()); // Set the print time for each day.
SCDateTime Current_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), sc.BaseDateTimeIn[sc.Index].GetTime()); // Get the current time
sc.Subgraph[0][sc.Index] = 0; // Flag signal
if (Current_Time >= Print_Time)
{
  sc.Subgraph[0][sc.Index] = 1; // attempt to flag Print time so it only prints once per day
}
else
  return;
// I want it to print/save at the change of state
if (sc.Subgraph[0][sc.Index-1] == 0 && sc.Subgraph[0][sc.Index] == 1)
{
  sc.SaveChartImageToFile = 1;
  sc.AddMessageToLog("Printed file", true);
}
else
  return;

hope this helps

[2014-01-10 05:04:57]
Kiwi - Posts: 374
You wouldn't develop studies in Python. Python, R or Julia give you an advantage when you want to analyse data statistically or look for generic tendencies, etc.

If you just want to create studies or executable systems (or test them), then it's easiest to do it in C (there isn't much ++ in what we do for Sierra Chart, fortunately). In my case, if I wanted to analyse that data in a sophisticated way, I'd export the results of the first phase in text form and analyse it with the higher-level language and its libraries.
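(As a minimal illustration of that last step, and nothing more: the file name and column names below are hypothetical, standing in for whatever the C study writes out as text.)

import pandas as pd

# hypothetical text export from the first (C/ACSIL) phase, one row per trade
results = pd.read_csv('phase1_results.txt', parse_dates=['entry_time'])
print(results['pnl'].describe())   # quick distribution check before deeper analysis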
[2014-01-10 06:20:40]
onnb - Posts: 660
So you are analyzing the feed in real time and then writing back the trading actions, which are then executed by an SC study? Did I get that right?
[2014-01-10 11:37:13]
norvik - Posts: 22
Kiwi, I have a question: do you know of a Python-based library like Esper? Esper is a Complex Event Processing framework available in Java and C#.

Thanks.
[2014-01-10 13:15:29]
Hendrixon - Posts: 130
Got you Kiwi, thanks.
Can you give an example of a statistical analysis or generic tendency that needs to be done on the raw data?
[2014-01-10 21:35:06]
vectorTrader - Posts: 86
Thanks to everyone here. I was able to figure it out even though I'm not a coder.

Kiwi, I was actually going to teach myself R so that I could use it in my business (full time) to do just what you are talking about. I am also interested in R to do some statistical modeling based on my trading methodology. Have you had any success with modeling, and has it helped your trading?

[2014-01-12 23:48:18]
Kiwi - Posts: 374
onnb, yes, I'd agree with that view.

norvik, if real-time event processing were a key driver I'd be using C/C++ or Java (or one of the functional languages), not Python.

Hendrixon, for generic tendencies just look at Lawrence or Ernie Chan's stuff. An example of statistical analysis would be to take your test outputs and Monte Carlo them or just use statistical packages to examine their predictive power.

jbutta, R has very powerful libraries, although it's an ugly language and rather slow. Modelling is an interest rather than the core of my trading. So far I find simple traditional approaches still provide the best results for me.
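(To make the "Monte Carlo them" suggestion above concrete, here is a small sketch with made-up numbers: bootstrap-resample a list of per-trade results and look at the spread of outcomes. The trade values are invented for illustration.)

import numpy as np

# hypothetical per-trade results from a backtest (points or dollars)
trades = np.array([12.0, -5.0, 3.5, -8.0, 20.0, -2.5, 7.0, -11.0, 4.0, 9.5])

np.random.seed(0)
n_runs = 10000

# bootstrap: resample the trades with replacement and total each run
totals = np.array([np.random.choice(trades, size=len(trades), replace=True).sum()
                   for _ in range(n_runs)])

print('median outcome:', np.median(totals))
print('5th / 95th percentile:', np.percentile(totals, [5, 95]))
print('fraction of runs below zero:', (totals < 0).mean())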
[2014-01-29 20:36:24]
vladaman - Posts: 1
Here is a simple Sierra Chart SCID file reader written in Java: https://gist.github.com/vladaman/8696352
[2014-03-02 23:11:31]
User15451 - Posts: 27
Kiwi,

I have a couple of offline questions; can you provide your e-mail?

Sincerely

TS
[2014-07-06 04:01:40]
ganz - Posts: 1048
For anyone who's interested:

http://www.youtube.com/watch?v=0unf-C-pBYE
Date Time Of Last Edit: 2014-07-06 05:08:44
[2014-07-06 12:32:55]
ganz - Posts: 1048
Hi all,

I'm not a programmer, so this is my simple solution to get data out of a *.scid file and store it in *.hdf5 in order to pandas it later.


#!/usr/bin/python3
import struct
import datetime as dt
import sys
import pandas as pd
import numpy as np

inputfile = open(sys.argv[1],'rb')
dt.timedelta(microseconds=1)

with inputfile as f:
  f.read(56)
  df_src=[]
  ts_src=[]
  while True:
    tick=f.read(40)
    if not tick:
      break
    src = struct.unpack('d4f4I', tick)  # 'I' is a fixed 4-byte uint, keeping the record at 40 bytes (native 'L' can be 8 bytes on 64-bit builds)
    ts_tmp=dt.datetime(1899, 12, 30) + dt.timedelta(src[0])
    ts_src.append(ts_tmp)
    df_tmp=[src[4],src[7],src[8]]
    df_src.append(df_tmp)
tubus = pd.HDFStore('tubus.h5')
df=pd.DataFrame(df_src, index=ts_src, columns=['Price', 'bidVol', 'askVol'])
df.to_hdf('tubus.h5','df')
print(df.index)
print(df.head())
print(tubus)
tubus.close()

1. How to run it: ~/> python3 this_script.py chart.scid

2. The script parses the *.scid file and creates the df DataFrame (time series): Price, bidVol, askVol

3. The script creates the HDF5 file tubus.h5 and stores df in it

For a 500 MB .scid file it takes 48 s on an i5 / huge RAM / HDD
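(Reading the store back into pandas later is a one-liner; a usage sketch:)

import pandas as pd

df = pd.read_hdf('tubus.h5', 'df')   # same key the script stores under
print(df.head())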

Date Time Of Last Edit: 2014-07-06 12:34:11
[2014-07-08 00:32:15]
Kiwi - Posts: 374
Hi Ganz,

Very nice. One thing has me confused, though: why use HDF5?

I did some research after your Python post over the weekend, and it seemed to me that HDF5 was about very, very big data sets and random access into them, plus the ability to group different data types.

Now, Sierra data is essentially sequential in nature, with two data types (double + int) and fixed data in each column. The type of operations I do on it is also sequential, sometimes with some work to coerce the data into a higher time period before operating.

In that case, would it be better to store it as CSV and/or just .scid files? Possibly the CSV files could be stored with some form of compression. I'm leaping completely outside my experience here in suggesting bz2 sequential compression ... I probably need to try it.

https://docs.python.org/2/library/bz2.html#sequential-de-compression

Or does HDF5 do really nice compressed serialization?
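(A minimal sketch of that bz2 route, assuming df is a DataFrame like the one built above; the file name is arbitrary.)

import bz2
import io
import pandas as pd

# write: bz2-compress the CSV text sequentially
with bz2.BZ2File('ticks.csv.bz2', 'wb') as f:
    f.write(df.to_csv().encode('utf-8'))

# read back
with bz2.BZ2File('ticks.csv.bz2', 'rb') as f:
    df2 = pd.read_csv(io.StringIO(f.read().decode('utf-8')),
                      index_col=0, parse_dates=True)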

My other question relates to Python vs C and how the two should fit into the Sierra Chart world so I'll address it in your other thread. I need to think about it a bit more first though.
Date Time Of Last Edit: 2014-07-08 00:40:04
[2014-07-09 14:57:04]
ganz - Posts: 1048
Kiwi

Hi
"Why use HDF5? ... In that case would it be better to store them as CSV and/or just .scid files? ... other question relates to Python vs C"

The reason was explained there: "long term request: make SC as python/pandas compatible".

The idea is to get an Integrated Trading Environment for stocks, options, futures, bitcoin, ETF/CFD, forex ...

To achieve that, the solution should be cross-platform, flexible, and store data using a well-known data format/scripting language at the production level.

IMHO
[2014-07-10 03:45:46]
Kiwi - Posts: 374
OK.

I had read that and didn't see why HDF5 was chosen. Python is extremely happy reading CSVs (or even SCIDs with a little conversion).
