
FPGA-Accelerated HFT Gateway

SystemVerilog / Rust / C++ / Artix-7 / RP2040

System Architecture

This project emulates a full-stack High-Frequency Trading (HFT) environment, splitting responsibilities across three hardware tiers to optimize for latency and determinism. The system consists of a Rust-based Dashboard (Host PC) for visualization, an RP2040 Microcontroller acting as a binary feed handler and traffic generator, and an Artix-7 FPGA implementing the matching engine and pre-trade risk checks.

The RP2040 communicates with the FPGA over a 4 MHz SPI link, transmitting proprietary 64-bit packets that carry a sequence number, a command, and price data. The FPGA acts as an SPI slave, deserializing the stream and processing orders in hardware. We verified the correctness of the RTL with SymbiYosys formal verification and randomized testbenches.
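To make the packet format concrete, here is a minimal host-side sketch of packing and validating a 64-bit word. The field widths and the XOR checksum are assumptions chosen for illustration; the write-up states only that the packet holds a sequence number, a command, price data, and an 8-bit checksum. The names `Packet`, `checksum8`, `encode`, `decode`, and `wire_time_ns` are hypothetical.

```rust
/// Hypothetical field layout for the 64-bit packet (not taken from the project):
/// [63:48] sequence number, [47:40] command, [39:8] price, [7:0] checksum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Packet {
    pub seq: u16,
    pub cmd: u8,
    pub price: u32,
}

/// Assumed checksum: XOR of the seven payload bytes (bits [63:8]).
pub fn checksum8(payload: u64) -> u8 {
    payload.to_be_bytes()[..7].iter().fold(0u8, |acc, b| acc ^ b)
}

/// Pack the fields and append the checksum in the low byte.
pub fn encode(p: Packet) -> u64 {
    let payload =
        ((p.seq as u64) << 48) | ((p.cmd as u64) << 40) | ((p.price as u64) << 8);
    payload | checksum8(payload) as u64
}

/// Unpack a word, returning `None` when the checksum does not match.
pub fn decode(word: u64) -> Option<Packet> {
    let payload = word & !0xFF;
    if checksum8(payload) != (word & 0xFF) as u8 {
        return None;
    }
    Some(Packet {
        seq: (word >> 48) as u16,
        cmd: (word >> 40) as u8,
        price: (word >> 8) as u32,
    })
}

/// Time on the wire, in nanoseconds, for `bits` clocked at `hz`.
pub fn wire_time_ns(bits: u64, hz: u64) -> u64 {
    bits * 1_000_000_000 / hz
}
```

Whatever the real layout is, the link speed sets a floor on latency: `wire_time_ns(64, 4_000_000)` evaluates to 16,000 ns, so shifting one packet across the 4 MHz link takes 16 µs before any processing happens.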

System Architecture Diagram
Fig 1. High-level topology. The RP2040 generates pseudo-random market data ("Tick") and measures the hardware round-trip time ("Trade") with microsecond resolution. The FPGA maintains the order book state in registers. The PC displays results in real time.

Hardware Matching Engine (Verilog)

The core logic is a Systolic Array Order Book implemented in Verilog. Unlike software implementations that require pointer chasing (linked lists) or rebalancing (red-black trees), this design uses a parallel register array. On every insert cycle, each cell in the array compares the incoming price against its own stored value and its upper neighbor's value.

This gives O(1) insertion time from the perspective of the control logic: an insert takes one clock cycle regardless of book depth, at the cost of comparator logic that grows with the number of bins. For a book of depth \(N\), the sorting invariant \(B_0 \ge B_1 \ge \dots \ge B_{N-1}\) is maintained automatically by the hardware structure. The state machine also enforces pre-trade risk limits (Max/Min Price) and verifies checksums before allowing an order to mutate the book state.

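integer i; // Loop index, unrolled at synthesis

always @(posedge clk) begin
    if (rst) begin
        // Synchronous reset: clear every bin
        for (i=0; i<DEPTH; i=i+1) bins[i] <= 0;
    end else if (insert_en) begin
        // Parallel Compare-and-Shift Logic
        // Non-blocking assignments: every comparison reads pre-edge values
        if (new_price > bins[0]) bins[0] <= new_price; // New best bid
        for (i = 1; i < DEPTH; i = i + 1) begin
            if (new_price <= bins[i-1] && new_price > bins[i])
                bins[i] <= new_price; // Insert Here
            else if (new_price > bins[i-1])
                bins[i] <= bins[i-1]; // Shift Down
            // Otherwise the bin holds its value
        end
    end
end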
Fig 2. The Systolic Array update logic. All bins update simultaneously. If a new high bid arrives, it takes the top slot and pushes existing bids down the array on a single clock edge.
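A software model can help when reasoning about the update rule. The sketch below mirrors the Verilog in Rust: every comparison reads the old array and writes into a copy, which imitates non-blocking assignment. The depth of 8 and the names `insert` and `is_sorted_desc` are assumptions made for illustration, not part of the RTL.

```rust
/// Assumed book depth for this model; the RTL parameterizes it as DEPTH.
pub const DEPTH: usize = 8;

/// One insert cycle. All reads come from `bins` (the pre-edge state) and all
/// writes go to `next`, matching the non-blocking semantics of the hardware.
pub fn insert(bins: &[u32; DEPTH], new_price: u32) -> [u32; DEPTH] {
    let mut next = *bins;
    if new_price > bins[0] {
        next[0] = new_price; // new best bid
    }
    for i in 1..DEPTH {
        if new_price <= bins[i - 1] && new_price > bins[i] {
            next[i] = new_price; // insert here
        } else if new_price > bins[i - 1] {
            next[i] = bins[i - 1]; // shift down
        }
        // otherwise the bin holds its value
    }
    next
}

/// Checks the invariant B_0 >= B_1 >= ... >= B_{DEPTH-1}.
pub fn is_sorted_desc(bins: &[u32; DEPTH]) -> bool {
    bins.windows(2).all(|w| w[0] >= w[1])
}
```

Starting from an all-zero book and inserting 100, 105, 102, and 99 in that order yields `[105, 102, 100, 99, 0, 0, 0, 0]`. When the book is full, the lowest bid is shifted out of the last bin and dropped.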

Control Logic & RTL

The control path is governed by a finite state machine that synchronizes with the SPI chip select line. It transitions from Idle to Validation upon packet reception, verifying the 8-bit checksum against the payload. Valid packets trigger the Execute state, which drives the write-enable signals for the systolic array or updates global risk parameters.
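The state sequence can be sketched as a small software model. This is an illustration under assumptions rather than a transcription of the RTL: it places the risk check in the Validation state, models only order inserts (not risk-parameter updates), and uses hypothetical names (`Control`, `Outcome`, `step`, `packet_done`).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Validate,
    Execute,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pending,
    Accepted,
    ChecksumFail,
    RiskReject,
}

pub struct Control {
    pub state: State,
    pub min_price: u32,
    pub max_price: u32,
}

impl Control {
    /// Advance one clock. `packet_done` models chip select signalling that a
    /// full packet has been received; `checksum_ok` is the validator's result.
    pub fn step(&mut self, packet_done: bool, checksum_ok: bool, price: u32) -> Outcome {
        match self.state {
            State::Idle => {
                if packet_done {
                    self.state = State::Validate;
                }
                Outcome::Pending
            }
            State::Validate => {
                if !checksum_ok {
                    self.state = State::Idle;
                    Outcome::ChecksumFail
                } else if price < self.min_price || price > self.max_price {
                    // Assumed placement of the pre-trade risk check.
                    self.state = State::Idle;
                    Outcome::RiskReject
                } else {
                    self.state = State::Execute;
                    Outcome::Pending
                }
            }
            State::Execute => {
                // In hardware, this is where write-enable would be driven.
                self.state = State::Idle;
                Outcome::Accepted
            }
        }
    }
}
```

In this model a rejected packet returns to Idle without ever reaching Execute, so a bad checksum or an out-of-range price cannot touch the book.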

RTL Schematic
Fig 3. Logic representation of the FSM and top-level module. The schematic highlights the data path from the SPI deserializer shift registers into the checksum validator and finally the matching engine. Image generated with TerosHDL.
RTL Schematic
Fig 4. Gate-level representation of the top-level module. The relatively narrow width of the netlist reflects the design's parallelism and low cycle count. The grey rectangle marks the systolic array. It is also a fantastic reminder of how incredible it is that a simple bitstream can encode all of this custom decision logic and run it on premade silicon in deterministic time. Image generated with TerosHDL.

Feed Handler & Telemetry (C++ / Rust)

The RP2040 runs dedicated C++ firmware on Core 1 to drive the SPI bus. It generates a "Random Walk" market simulation and injects faults (checksum errors) to test the FPGA's resilience. Crucially, it measures the Tick-to-Trade latency by capturing the system timestamp t0 just before Chip Select is asserted and t1 after the SPI transaction completes, so the latency is t1 - t0.
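The traffic generator's logic can be modelled off-target. The actual firmware is C++; the sketch below is a Rust model that assumes a xorshift PRNG, a step of -1, 0, or +1 per tick, and a "one in N" fault rate. None of these details are stated in the write-up, and the names `XorShift32`, `Tick`, `next_tick`, and `tick_to_trade_us` are hypothetical.

```rust
/// Small xorshift PRNG (an assumed stand-in for the firmware's generator).
/// The seed must be nonzero.
pub struct XorShift32(pub u32);

impl XorShift32 {
    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

pub struct Tick {
    pub price: u32,
    /// When true, the sender should corrupt the checksum of this packet.
    pub corrupt: bool,
}

/// Advance the random walk by one tick. `fault_one_in == 0` disables faults.
pub fn next_tick(rng: &mut XorShift32, price: u32, fault_one_in: u32) -> Tick {
    // Step is -1, 0, or +1; the price is kept within 1..=u32::MAX.
    let step = (rng.next_u32() % 3) as i64 - 1;
    let price = (price as i64 + step).clamp(1, u32::MAX as i64) as u32;
    let corrupt = fault_one_in != 0 && rng.next_u32() % fault_one_in == 0;
    Tick { price, corrupt }
}

/// Tick-to-Trade latency from timestamps taken before chip select is
/// asserted (t0) and after the transaction completes (t1), in microseconds.
pub fn tick_to_trade_us(t0_us: u64, t1_us: u64) -> u64 {
    t1_us.wrapping_sub(t0_us)
}
```

Because t0 and t1 bracket the whole SPI transaction, the measured figure includes the time spent shifting bits over the link as well as the FPGA's processing time.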

Telemetry is streamed over USB-UART to a Rust application built with the Ratatui library. The TUI visualizes the live market spread, logs packet statuses (Checksum OK/Fail, Risk Reject) so their relative proportions are visible, and plots the FPGA's internal "Top of Book" against the generated market price in real time, which makes any divergence between the two easy to spot.
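The bookkeeping behind those two views is simple enough to sketch without any TUI code. The types below are hypothetical and do not reflect the dashboard's actual data model or the telemetry wire format, neither of which is described here; they show only how status proportions and the book-versus-market gap might be computed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    ChecksumOk,
    ChecksumFail,
    RiskReject,
}

/// Running counts of each packet status.
#[derive(Debug, Default)]
pub struct Tally {
    pub ok: u64,
    pub checksum_fail: u64,
    pub risk_reject: u64,
}

impl Tally {
    pub fn record(&mut self, s: Status) {
        match s {
            Status::ChecksumOk => self.ok += 1,
            Status::ChecksumFail => self.checksum_fail += 1,
            Status::RiskReject => self.risk_reject += 1,
        }
    }

    pub fn total(&self) -> u64 {
        self.ok + self.checksum_fail + self.risk_reject
    }

    /// Share of packets with status `s`, in percent (0.0 when nothing is logged).
    pub fn percent(&self, s: Status) -> f64 {
        let count = match s {
            Status::ChecksumOk => self.ok,
            Status::ChecksumFail => self.checksum_fail,
            Status::RiskReject => self.risk_reject,
        };
        match self.total() {
            0 => 0.0,
            total => count as f64 * 100.0 / total as f64,
        }
    }
}

/// Signed gap between the generated market price and the FPGA's top of book.
pub fn divergence(market: u32, top_of_book: u32) -> i64 {
    market as i64 - top_of_book as i64
}
```

A dashboard could feed `percent` into a gauge or bar widget and plot `divergence` as a time series alongside the two price traces.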