FPGA Programming — TH Deggendorf

Stock Market Data Processing on DE1-SoC FPGA

This page documents an FPGA project that implements financial data pre-processing on the DE1-SoC board using VHDL. The design applies exponential smoothing to a static stock market dataset using custom IEEE-754 based floating-point arithmetic modules. A Mealy-type FSM, parameterizable generics, on-chip memory resources, and a self-checking testbench are used to structure and verify the design.

Basic Idea

The goal of this project is to demonstrate how a lightweight financial signal-processing algorithm can be mapped directly into FPGA hardware. Rather than relying on vendor floating-point IP cores, the arithmetic path is implemented manually in VHDL using IEEE-754 single-precision format principles. A finite state machine coordinates the path from static ROM data through RAM access, smoothing computation, and display output.

The design is parameterizable and verified with a self-checking testbench. Timing analysis confirms that the implementation meets the 50 MHz target clock used by the board. The worst-case timing margin is positive but limited, with worst-case Fmax only slightly above the target.

Design Walkthrough

1. Project overview

This project implements a modular and parameterizable FPGA-based system for pre-processing financial data using exponential smoothing. Static stock market data is stored in on-chip ROM. The system writes the selected data to RAM, reads it back through a controlled memory sequence, applies exponential smoothing using custom IEEE-754 based floating-point arithmetic modules in VHDL, and displays selected results or states on the DE1-SoC seven-segment display.

Technical form

High-level data flow:
  ROM with static dataset
    → RAM write stage
    → RAM read stage
    → Exponential smoothing datapath
    → Result storage / output selection
    → Seven-segment display

The computation runs on fixed ROM data. This makes the behavior repeatable and suitable for verifying the control path, arithmetic datapath, and display interface. It is not a real-time market data system.

2. DE1-SoC FPGA board

The DE1-SoC board contains a Cyclone V SoC device with FPGA fabric and an ARM Cortex-A9 hard processor system. This project targets the FPGA fabric only. The design uses FPGA logic for control and arithmetic, on-chip memory resources for deterministic storage, board switches for mode selection, and the six-digit seven-segment display for visible output feedback. Pin assignments and FPGA-side peripheral mappings are configured through Quartus Prime.

The ARM HPS is not used in this implementation. The smoothing algorithm runs entirely in FPGA logic, not in software.

3. Generic parametrization

The top-level and sub-level entities are parameterized through VHDL generics. This allows the input width, address width, number of stored samples, and processed output width to be changed at compile time without rewriting the main control logic. This matters because the project is meant to behave like reusable hardware rather than a one-off fixed design.

Technical form

Key generics used:
  DATA_IN_LEN       — input data width
  ADDRESS_LEN       — address bus width
  NUM_DATA          — number of stored data points
  DATA_PROCESS_LEN  — processed output width

Example default configuration from the report tables:
  DATA_IN_LEN       => 64
  ADDRESS_LEN       => 10
  NUM_DATA          => 1024
  DATA_PROCESS_LEN  => 32

These values are the defaults listed in the report tables. Any other configuration used for testing in the GitHub implementation is an implementation variant, not a report default.
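The relationship between the generics can be checked with a small software mirror. The following is a minimal Python sketch, not part of the VHDL project: the class name `Generics` and the `validate` method are hypothetical, and the only rule it encodes is that `NUM_DATA` samples must fit in an `ADDRESS_LEN`-bit address space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generics:
    """Software mirror of the four generics, using the report defaults."""
    DATA_IN_LEN: int = 64
    ADDRESS_LEN: int = 10
    NUM_DATA: int = 1024
    DATA_PROCESS_LEN: int = 32

    def validate(self) -> None:
        values = (self.DATA_IN_LEN, self.ADDRESS_LEN,
                  self.NUM_DATA, self.DATA_PROCESS_LEN)
        if min(values) < 1:
            raise ValueError("all generics must be positive")
        # Every stored sample needs its own address on the address bus.
        if self.NUM_DATA > 2 ** self.ADDRESS_LEN:
            raise ValueError("NUM_DATA does not fit in the address space")


defaults = Generics()
defaults.validate()  # 1024 samples fit exactly in 2**10 addresses
```

With the report defaults the check passes with no spare addresses, so raising `NUM_DATA` alone would also require a wider `ADDRESS_LEN`.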

4. Finite State Machine control

The top-level entity uses a Mealy-type finite state machine to coordinate the complete data flow. The FSM reacts to current state and input conditions such as start, reset, switch mode, RAM completion, and smoothing completion. The main states are IDLE, SEL, RAM_PROCESS, EXPONENTIAL_SMOOTHING, DISPLAY_RESULT, and DISPLAY_INT_RESULT.

Technical form

Main FSM sequence:

  IDLE
    → SEL
      → RAM_PROCESS
        → EXPONENTIAL_SMOOTHING
          → DISPLAY_RESULT

Important control signals:
  start / reset
  sel_sw mode input
  RAM write-read completion
  smoothing completion
  display mode selection

The FSM handles control sequencing only. Its state names are control states, not floating-point values. The display entity maps states and selected numeric values to seven-segment patterns.
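The main sequence can be modelled in software to make the ordering explicit. This is a minimal Python sketch under stated assumptions, not the project's VHDL. The input names `start`, `ram_done`, and `smooth_done` are hypothetical stand-ins for the control signals listed above, the transition guards are assumed, and DISPLAY_INT_RESULT is left out because the text does not say when it is entered.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SEL = auto()
    RAM_PROCESS = auto()
    EXPONENTIAL_SMOOTHING = auto()
    DISPLAY_RESULT = auto()


def next_state(state, *, reset=False, start=False,
               ram_done=False, smooth_done=False):
    """Return the next state of the main sequence (guards are assumed)."""
    if reset:
        return State.IDLE
    if state is State.IDLE:
        return State.SEL if start else State.IDLE
    if state is State.SEL:
        return State.RAM_PROCESS  # assumed: the mode is latched in one step
    if state is State.RAM_PROCESS:
        return State.EXPONENTIAL_SMOOTHING if ram_done else state
    if state is State.EXPONENTIAL_SMOOTHING:
        return State.DISPLAY_RESULT if smooth_done else state
    return state  # DISPLAY_RESULT holds until reset


# Walk the main sequence once.
s = State.IDLE
trace = [s]
for inputs in ({"start": True}, {}, {"ram_done": True}, {"smooth_done": True}):
    s = next_state(s, **inputs)
    trace.append(s)
```

After the loop, `trace` holds the five states in the order of the sequence diagram. A state whose completion input is not asserted simply holds, which is the waiting behavior a multi-cycle RAM or arithmetic stage needs.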

5. Floating-point arithmetic in VHDL

Custom floating-point modules were designed because vendor floating-point IP cores were restricted for the project. The design follows IEEE-754 single-precision format principles for the required arithmetic path. The smoothing datapath needs two floating-point multiplications and one floating-point addition for each processed data point.

Technical form

Single-precision format used by the datapath:
  [1 sign bit][8 exponent bits][23 fraction bits]

Smoothing operations:
  term A = α × xₙ
  term B = (1 − α) × Sₙ₋₁
  Sₙ     = term A + term B

The modules are described as IEEE-754 based, or IEEE-754 single-precision format based, rather than fully IEEE-754 compliant. Full compliance would require every special case, such as NaN, infinity, and subnormal handling, to be verified.
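The single-precision field layout can be inspected from software, which is useful when preparing ROM contents or reading simulation waveforms. The following Python sketch uses only the standard `struct` module. The function names are hypothetical, and it illustrates the format only, not the project's VHDL arithmetic.

```python
import struct


def float_to_fields(value: float):
    """Round value to single precision and split it into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 stored bits, hidden leading 1 excluded
    return sign, exponent, fraction


def fields_to_float(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble the 32-bit pattern and interpret it as a float."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


# 0.15625 = 1.25 * 2**-3, so the biased exponent is 127 - 3 = 124
# and the fraction encodes 0.25 as 0x200000.
fields = float_to_fields(0.15625)
```

A multiplier built on this layout multiplies the two significands (each with the hidden 1 restored), adds the exponents and subtracts one bias, and XORs the signs. Rounding and special cases are where implementations differ.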

6. Exponential smoothing algorithm

Exponential smoothing is used as a lightweight financial time-series pre-processing method. It computes a weighted value where the current input and the previous smoothed value are combined through the smoothing factor alpha. In this hardware design, the formula is mapped to a multi-cycle FSM-controlled datapath rather than a one-result-per-clock streaming pipeline.

Technical form

Exponential smoothing formula:
  S₁ = x₁
  Sₙ = α · xₙ + (1 − α) · Sₙ₋₁

where:
  xₙ   = current data value
  Sₙ₋₁ = previous smoothed value
  α    ∈ (0, 1)

α close to 1  → faster response to new data
α close to 0  → smoother and slower response

The datapath does not produce one smoothed value per clock cycle. Each result takes several cycles, consistent with the multi-cycle arithmetic modules described in the report.
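A software reference model of the recurrence is a convenient source of expected values for simulation. The sketch below is a hedged Python model, not the hardware implementation. It rounds every intermediate result to single precision with round-to-nearest, which is an assumption, so the custom VHDL modules may differ in the last bit if they round differently. The helper `to_f32` is hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def exponential_smoothing(samples, alpha):
    """Return S_1..S_n with S_1 = x_1 and S_n = alpha*x_n + (1-alpha)*S_(n-1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    a = to_f32(alpha)
    one_minus_a = to_f32(1.0 - a)
    smoothed = []
    for i, x in enumerate(samples):
        x = to_f32(x)
        if i == 0:
            s = x  # S_1 = x_1
        else:
            term_a = to_f32(a * x)                       # first multiplication
            term_b = to_f32(one_minus_a * smoothed[-1])  # second multiplication
            s = to_f32(term_a + term_b)                  # addition
        smoothed.append(s)
    return smoothed


result = exponential_smoothing([100.0, 110.0, 105.0], 0.5)
```

For this input every intermediate value is exactly representable, so `result` is `[100.0, 105.0, 105.0]`. The model treats `1 − α` as a constant computed once, which leaves two multiplications and one addition per sample.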

7. Memory design with ROM and RAM

The system uses ROM for the static dataset and on-chip RAM for controlled data storage. ROM provides deterministic input values. The RAM controller writes selected data into RAM, later reads it back, and passes it into the smoothing stage. The FSM controls the sequence through write-enable, read-enable, address, data-valid, and completion signals.

Technical form

Memory access sequence:
  1. Select operation mode
  2. Read static value from ROM
  3. Write selected value to RAM
  4. Assert write completion
  5. Read value back from RAM
  6. Pass value to smoothing datapath
  7. Store or display selected result

RAM access latency is not characterized here, so no single-cycle latency is claimed. What the design does provide is deterministic FPGA-side storage using on-chip memory resources.
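The ordering of the access sequence can be written down as an event stream. The following Python sketch is an abstract model only: it says nothing about cycle counts or latency, the event names are hypothetical, and it assumes the whole dataset is written before any read begins.

```python
def memory_sequence(rom):
    """Yield (phase, address, word) events: a write pass, then a read pass."""
    ram = {}
    for addr, word in enumerate(rom):
        ram[addr] = word               # ROM value written to the same RAM address
        yield ("write", addr, word)
    yield ("write_done", None, None)   # stands in for the write-completion flag
    for addr in range(len(rom)):
        yield ("read", addr, ram[addr])  # value handed to the smoothing stage


events = list(memory_sequence([7, 8]))
```

The `write_done` event separates the two passes, which mirrors the FSM waiting for write completion before it starts reading.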

8. Verification with self-checking testbench

A self-checking testbench was used to verify the integrated top-level behavior and important output paths. The testbench drives reset, start, switch modes, and control inputs. It waits for expected state or display behavior and reports mismatches automatically. This reduces dependence on manual waveform inspection for basic correctness checks.

Technical form

Testbench strategy:
  1. Reset the DUT
  2. Apply selected switch mode and start sequence
  3. Wait for expected state or output pattern
  4. Compare actual output against expected output
  5. Report PASS, or fail with an assertion of severity error

Verification is performed at the integrated top level. The report does not establish that every individual module output is checked automatically, which would require separate module-level testbenches.
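The compare-and-report step can be illustrated outside the simulator, for example when checking output words exported from a simulation run against a reference model. This Python sketch is an assumption about how such a check could look. It is not the project's VHDL testbench, and the function name `check_outputs` is hypothetical.

```python
def check_outputs(actual, expected):
    """Compare output words pairwise and return a message for every mismatch."""
    errors = []
    if len(actual) != len(expected):
        errors.append(
            f"length mismatch: got {len(actual)}, expected {len(expected)}"
        )
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            errors.append(f"sample {i}: got 0x{a:08X}, expected 0x{e:08X}")
    return errors


errors = check_outputs([0x42C80000, 0x42D20000], [0x42C80000, 0x42D20001])
print("PASS" if not errors else "\n".join(errors))
```

An empty list corresponds to PASS. Each entry plays the role of one failed assertion in the testbench.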

9. Timing analysis and maximum frequency

After synthesis and place-and-route in Quartus Prime, timing analysis was performed with TimeQuest. The design meets the required 50 MHz target clock. The reported Fmax values depend on the operating corner. Worst-case slow-corner results are slightly above the required frequency, while faster corners reach above 100 MHz. The design is therefore valid for the target board clock, with a worst-case timing margin that is positive but limited.

Technical form

Timing summary:
  Target clock: 50 MHz
  Target period: 20 ns

Report-level result:
  Worst-case Fmax: approximately 55–57 MHz
  Faster operating corners: above 100 MHz

Conclusion:
  The design meets the 50 MHz requirement with a positive but limited worst-case margin.

The report data do not support a 150 MHz figure or a 3× timing margin.
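The size of the margin can be estimated from the figures in the timing summary. The Python sketch below converts frequencies to periods and takes the difference. It is an approximation derived from Fmax only, with hypothetical function names, and the slack reported directly by the timing analyzer remains the authoritative value.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(target_mhz: float, fmax_mhz: float) -> float:
    """Approximate setup slack: target period minus the period at Fmax."""
    return period_ns(target_mhz) - period_ns(fmax_mhz)


for fmax in (55.0, 57.0):
    print(f"Fmax {fmax:.0f} MHz: slack ~{slack_ns(50.0, fmax):.2f} ns, "
          f"ratio {fmax / 50.0:.2f}x")
```

For the reported worst-case range this gives roughly 1.8 ns to 2.5 ns of slack on a 20 ns period, a ratio of about 1.10× to 1.14×.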

10. Seven-segment output display

The output display is handled by a dedicated seven-segment display entity. The display logic maps FSM states and selected numeric values to six seven-segment outputs. For numeric display mode, the input is interpreted through a fixed-point style digit extraction path before being encoded for display. This makes the display useful for debugging selected internal states and showing final or intermediate output values.

Technical form

Display examples:
  state/debug mode → predefined segment patterns
  numeric mode     → selected 32-bit value
                    → fixed-point style digit extraction
                    → six seven-segment outputs

The raw 32-bit floating-point value is not converted directly to BCD. The report describes a display-specific conversion path.
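A fixed-point style digit extraction can be sketched as follows. This Python model is illustrative and makes several assumptions that are not taken from the report: two decimal fraction digits, saturation at the largest displayable value, a dropped sign, and active-low segment codes with bit 0 as segment a and bit 6 as segment g. The names `SEGMENTS`, `to_digits`, and `encode` are hypothetical.

```python
# Active-low codes for digits 0-9 (bit 0 = segment a ... bit 6 = segment g).
SEGMENTS = (0x40, 0x79, 0x24, 0x30, 0x19, 0x12, 0x02, 0x78, 0x00, 0x10)


def to_digits(value: float, frac_digits: int = 2, width: int = 6):
    """Scale to an integer and split it into `width` decimal digits, MSD first."""
    scaled = int(round(abs(value) * 10 ** frac_digits))  # sign is dropped
    scaled = min(scaled, 10 ** width - 1)                # saturate at 999999
    return [(scaled // 10 ** i) % 10 for i in reversed(range(width))]


def encode(digits):
    """Map each decimal digit to its seven-segment code."""
    return [SEGMENTS[d] for d in digits]


digits = to_digits(12.5)   # 12.5 * 100 = 1250 -> [0, 0, 1, 2, 5, 0]
codes = encode(digits)
```

In hardware the same idea becomes a scaling step followed by repeated divide-by-ten or a double-dabble conversion. Which of these the report's display path uses is not stated here.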

Key Accomplishments

  • Custom IEEE-754 based floating-point addition and multiplication modules were implemented in VHDL without direct vendor floating-point IP dependency.
  • A Mealy-type FSM sequences ROM access, RAM write/read behavior, smoothing execution, and seven-segment display output.
  • The design is parameterized through generics for input width, address width, dataset size, and processed output width.
  • Quartus TimeQuest analysis shows that the design meets the 50 MHz target clock. Worst-case Fmax is slightly above the target, while faster operating corners reach above 100 MHz.
  • A self-checking testbench verifies the integrated top-level behavior and important output paths with automatic pass/fail checks.

Possible Extensions

  • Add dynamic data input through HPS, UART, SPI, or external memory to replace the static ROM-based dataset.
  • Add further financial pre-processing modules such as moving average, volatility estimation, or Bollinger band computation.
  • Extend the debug interface with richer state visibility through LEDs, seven-segment modes, or a UART logging path.
  • Add a hardware inference stage after smoothing, once the memory interface and timing margin have been improved.

References and Materials

These sources cover the board, Quartus toolchain, floating-point standard background, and VHDL arithmetic packages relevant to this implementation.

IEEE 754 Floating-Point Standard

Reference standard for floating-point formats and arithmetic behavior. In this project the custom datapath follows IEEE-754 single-precision format principles for the operations required by exponential smoothing.

https://standards.ieee.org/ieee/754/6210/

FPHDL VHDL Fixed and Floating-Point Packages

David Bishop's VHDL fixed-point and floating-point package work was used as a reference point for VHDL arithmetic concepts and package structure.

https://github.com/FPHDL/fphdl
© 2026 Atam Oguz Erkara