Login Page - Create Account

Support Board


Date/Time: Mon, 29 Apr 2024 10:26:04 +0000



Post From: Python for Sierra Chart

[2024-02-19 22:08:45]
Deiter - Posts: 3
Many thanks to everyone who has contributed to this thread. I have borrowed from your great work and added some Polars (rather than Pandas) work to further my own work with Sierra Chart data. I present this work here (it's an extract of a Jupyter Notebook that I am running in VS Code), in the hope of repaying some of the goodness y'all have provided.

# Purpose: Getting Sierra Chart .scid file information into a Polars dataframe for analysis and backtesting

# Approach:
# - Use Polars, rather than Pandas, to reap increased speed and large-file handling benefits.
# - Use Numpy to extract binary data from scid file and cast into appropriate data types in a Polars dataframe.
# - Handling of the SCDateTime data is the hardest part, as it cleverly countains both timestamp and trade count information in a single 64 bit integer, so I use some string methods and regex to extract the desired information.

# Outcome Notes:
# - My final dataframe (df_extract) requires further work to get the timestamp info (in column "SCDateTime_ms", signifying that it is an integer of milliseconds), into a Python datatype for time analysis. I also need to account for the cases where the trade counter information maxes out at 999 (see the Reddit discussion referenced below for details).
# - According to the Jupyter Notebook I ran this in, getting to my final dataframe (thus far) took 4.4 seconds for a 41 million row ES futures .scid file.
# - I'm not interesting in comparing different approaches to see speed differences (there are many articles online about the speed of vectorized Numpy operations vs loops and Polars vs Pandas), so I don't plan to run a bunch of speed tests. 41M rows in 4.4s is good enough for me right now.

# References:
# - Sierra Chart Support Board thread Python for Sierra Chart
# - Reddit post on reading scid data https://www.reddit.com/r/algotrading/comments/169ax2x/the_hello_world_guide_to_reading_execution_data/
# - Awesome Sierra Chart documentation Intraday Data File Format
# - The Polars website https://pola.rs/

import sys
from pathlib import Path

import numpy as np
import polars as pl


# The following section is borrowed, with one small change to the data type definition, from the Sierra Chart Support Board referenced above.


def get_scid_np(scidFile, limitsize=sys.maxsize):
# Check Path Exists

f = Path(scidFile)

assert f.exists(), "SCID file not found"

# set the offset from limitsize

stat = f.stat()

offset = 56 if stat.st_size < limitsize else stat.st_size - ((limitsize // 40) * 40)

### Setup the SCID dtype

sciddtype = np.dtype(
[
("SCDateTime", "<u8"),
("Open", "<f4"),
("High", "<f4"),
("Low", "<f4"),
("Close", "<f4"),
("NumTrades", "<u4"),
("TotalVolume", "<u4"),
("BidVolume", "<u4"),
("AskVolume", "<u4"),
]
)

scid_as_np_array = np.fromfile(scidFile, dtype=sciddtype, offset=offset)

return scid_as_np_array

intermediate_np_array = get_scid_np("ESH24_FUT_CME.scid")

df_raw = pl.DataFrame(intermediate_np_array)


# Now convert the SCDateTime 64 bit integer into a string.
df_SCDateTime_as_string = df_raw.with_columns(
pl.col("SCDateTime").cast(pl.Utf8).alias("SCDateTime_string"),
)


# Trickiest part thus far. Use regex to extract the timestamp info and trade counter info and cast them back into integers for further analysis.

df_extract = df_SCDateTime_as_string.with_columns(
pl.col("SCDateTime_string")
.str.extract(r"^.{13}", 0)
.cast(pl.Int64)
.alias("SCDateTime_ms"),
pl.col("SCDateTime_string")
.str.extract(r".{3}$", 0)
.cast(pl.Int32)
.alias("SCDateTime_trade_counter"),
)

Date Time Of Last Edit: 2024-02-19 22:19:11