Date/Time: Tue, 30 May 2023 11:33:33 +0000
Post From: New Teton Order Routing Service for CME, CBOT, NYMEX, COMEX, FairX, upcoming EUREX
Sierra Chart Engineering - Posts: 104368
We received this incident report from Rithmic this morning. We would like to discuss whether issues like this could affect our Teton order routing and, if so, how we would handle them.
Rithmic Incident Report
Incident: Partial loss of access to Chicago Area servers
Date of Incident: Tuesday January 18, 2022 03:53:22 CST
Date of Resolution: Tuesday January 18, 2022 07:12:58 CST
Date of Report: Tuesday January 18, 2022
Scope of Failure: Some Rithmic customers found that logins to the Chicago Area could not proceed.
Root Cause Analysis
Failure of a Network Hardware component led to a transient network outage. A Rithmic Software component implemented an automated recovery process, but some subsystems failed to recover to an open state. This was remedied by Rithmic Operations at 7:12 CST.
Rithmic Operations has an open issue item with the Hardware Vendor. In the meantime, acting on directions of Vendor representatives, features in the Hardware components which contributed to the outage have been disabled to prevent a recurrence.
Rithmic Operations is enhancing the discovery of failed software routing subsystems, and is liaising with Rithmic Development to improve automated recovery.
Failure of a Network Hardware component led to a transient network outage.
We do not know which "component" they are referring to. All of the network-related functionality for the Teton order routing service is provided by another infrastructure provider that we work with. They provide us with a fully redundant network that is used by high-frequency traders.
This other provider is responsible for the development, maintenance, and monitoring of that network.
Therefore, we would not expect a problem like this to occur at all. If a network component fails, there is full redundancy.
However, we did a full review of how we connect into this network, and we found one point with only a single connection: where our server connects into the local area network. We will add a second connection for redundancy, in case our infrastructure services provider has a failure with the network switch that our server plugs into.
This covers that additional possibility, so the connection will be fully redundant.
We wonder why Rithmic does not have redundancy with this particular "component" that failed.
They then mention this:
A Rithmic Software component implemented an automated recovery process, but some subsystems failed to recover to an open state.
The only software on our side involved in order routing is Sierra Chart itself. It is a perfectly reliable, stable single process with flawless connectivity. The connections to the exchange and the connections from users are all automatic and stable. There would be no problem like this on our side. If connectivity is lost, the reconnections are automatic and graceful. There are no subsystems. It is a single, unified, highly reliable process.
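To illustrate what automatic and graceful reconnection generally means, here is a minimal Python sketch of retrying a connection with exponential backoff. This is not Sierra Chart's actual code. The function name `connect_with_retry`, its parameters, and the delay values are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or max_attempts is reached.

    Hypothetical helper: connect is any callable that returns a connection
    object or raises OSError (ConnectionError is a subclass of OSError).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the waiting can be replaced in tests. A real client would also need to restore its session state after reconnecting, which this sketch does not attempt.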
The process of interacting with the external clearing firms does, however, involve a secure FTP program developed by another provider. For this we use Bitvise, which is a very good, stable program and one that we would rate as quality software. Unfortunately, there is not much of that in this world.
We also want to point out that we have a backup server with its own network connectivity. A real-time copy of all the trading-related data is made to that server, so we can fail over to it as needed. The likelihood of needing this is next to nonexistent, because only a catastrophic server failure would require a failover. Such a failure is extremely unlikely and has never occurred in more than 15 years of operating dedicated servers.
These are high-quality servers using Xeon processors, error-checking and correcting (ECC) memory, and solid-state drives in RAID 1 configurations. They also have dual power supplies, with redundant power from the data center.
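As a rough illustration of the failover idea described above, the following Python sketch picks the first reachable server from an ordered list, primary first and backup second. It is a sketch under stated assumptions, not a description of how Teton failover is actually performed. The names `tcp_reachable` and `choose_server` and the use of a plain TCP connect as the health check are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_reachable(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Assumed health check: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def choose_server(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool] = tcp_reachable,
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None
```

Listing the primary first means the backup is used only when the primary fails its check. A real failover decision would normally also confirm that the backup's copy of the trading data is current before switching, which is outside the scope of this sketch.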
Sierra Chart Support - Engineering Level
Date Time Of Last Edit: 2022-01-30 16:23:42