ReckOn Online-Learning Spiking RNN Processor Documentation

Copyright (C) 2020-2022 University of Zurich

The documentation for ReckOn is under a Creative Commons Attribution 4.0 International License (see doc/LICENSE file or http://creativecommons.org/licenses/by/4.0/), while the ReckOn HDL source code is under a Solderpad Hardware License v2.1 (see LICENSE file or https://solderpad.org/licenses/SHL-2.1/).


Before reading the documentation, it is strongly advised to read our ISSCC 2022 paper in order to have a clear overview of the ReckOn online-learning spiking RNN processor.

Part of the documentation structure, formatting and contents is adapted from the documentation of the ODIN SNN processor.

Current documentation revision: v1.0. It only contains basic descriptions of the chip communication buses, memory addressing schemes and main configuration registers. Unpublished parts of the chip have been removed from the currently released HDL code, as well as from the documentation. A documentation and HDL update will take place upon publication of the omitted parts.

Contents

  1. Architecture
  2. Interfaces and commands
  3. Global configuration registers
  4. Testbench
  5. Implementation tips
  6. Citing ReckOn
  7. Revision history

1. Architecture

ReckOn is a spiking recurrent neural network (RNN) processor enabling on-chip learning over second-long timescales based on a modified version of the e-prop algorithm (we released a PyTorch implementation of the vanilla e-prop algorithm for leaky integrate-and-fire neurons here). It was prototyped and measured in 28-nm FDSOI CMOS at the Institute of Neuroinformatics, University of Zurich and ETH Zurich, and published at the 2022 IEEE International Solid-State Circuits Conference (ISSCC) with the following three main claims:

  • ReckOn demonstrates end-to-end on-chip learning over second-long timescales while keeping a millisecond temporal resolution,
  • it provides a low-cost solution with a 0.45-mm² core area, 5.3 pJ/SOP at 0.5 V, and a memory overhead of only 0.8% compared to the equivalent inference-only network,
  • it exploits a spike-based representation for task-agnostic learning toward user customization and chip repurposing at the edge.

ReckOn implements a (256)-r256-16 network topology with 256 virtual input neurons, 256 recurrent leaky integrate-and-fire (LIF) neurons with all-to-all connectivity and 16 output leaky integrator (LI) neurons. A future revision of the documentation will extensively cover architectural details of ReckOn, including the embedded FSMs. For the time being, we briefly describe below how the main SRAM resources of ReckOn are accessed. We refer the reader to the paper for more information on the network architecture, as well as for block diagrams of the complete system and of the e-prop-based learning scheme.

  • Neuron SRAM: This 2-kB SRAM contains 128 words of 128 bits. Each word contains the individual state data of two neurons (i.e. current membrane potential and eligibility trace values) and their shared parameters (i.e. leakage decay factor alpha and firing threshold) as follows:

| Word bit range | Description (N represents the 7-bit word address) |
|---|---|
| <127:116> | 12 LSBs of the fractional part of the 16-bit leakage decay factor alpha (see Section 3 for the 4 MSBs). Shared parameter between neurons 2N and 2N+1. |
| <115:100> | 16-bit firing threshold. Shared parameter between neurons 2N and 2N+1. |
| <99:90> | 10-bit output eligibility trace associated to neuron 2N+1. |
| <89:78> | 12-bit recurrent eligibility trace associated to neuron 2N+1. |
| <77:66> | 12-bit input eligibility trace associated to neuron 2N+1. |
| <65:50> | 16-bit membrane potential associated to neuron 2N+1. |
| <49:40> | 10-bit output eligibility trace associated to neuron 2N. |
| <39:28> | 12-bit recurrent eligibility trace associated to neuron 2N. |
| <27:16> | 12-bit input eligibility trace associated to neuron 2N. |
| <15:0> | 16-bit membrane potential associated to neuron 2N. |

  • Input/recurrent weight SRAMs: Each of these two 64-kB SRAMs contains 4096 words of 128 bits for the storage of 8-bit input/recurrent weights. The 8 MSBs of the 12-bit word address contain the pre-synaptic neuron index, while the 4 LSBs of the word address contain the 4 MSBs of the post-synaptic neuron index. The 4 LSBs of the post-synaptic neuron index give the byte address of the target weight in the accessed 128-bit word.

  • Output weight SRAM: This 8-kB SRAM contains 512 words of 128 bits for the storage of 8-bit output weights. Only the first 256 words are used: the MSB of the 9-bit word address is fixed to 0 and the 8 LSBs represent the pre-synaptic neuron index. Each accessed 128-bit word thus contains the output weights of all 16 output neurons.
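
The addressing schemes above can be sketched in a few lines of Python, for instance when preparing SRAM initialization data on a host. This is only an illustrative sketch: the field names and the functions `unpack_neuron_word`, `pack_neuron_word` and `weight_location` are hypothetical helpers that are not part of the ReckOn code base, and all fields are treated as raw unsigned bit patterns.

```python
# Bit ranges (msb, lsb) of one 128-bit neuron SRAM word, as listed above.
# "even" refers to neuron 2N and "odd" to neuron 2N+1 (hypothetical names).
NEURON_WORD_FIELDS = {
    "alpha_lsbs":  (127, 116),  # 12 LSBs of alpha, shared by both neurons
    "threshold":   (115, 100),  # 16-bit firing threshold, shared
    "tr_out_odd":  (99, 90),
    "tr_rec_odd":  (89, 78),
    "tr_inp_odd":  (77, 66),
    "v_mem_odd":   (65, 50),
    "tr_out_even": (49, 40),
    "tr_rec_even": (39, 28),
    "tr_inp_even": (27, 16),
    "v_mem_even":  (15, 0),
}


def unpack_neuron_word(word):
    """Split a 128-bit integer into its raw (unsigned) field values."""
    fields = {}
    for name, (msb, lsb) in NEURON_WORD_FIELDS.items():
        width = msb - lsb + 1
        fields[name] = (word >> lsb) & ((1 << width) - 1)
    return fields


def pack_neuron_word(fields):
    """Assemble a 128-bit integer from raw field values (missing fields are 0)."""
    word = 0
    for name, (msb, lsb) in NEURON_WORD_FIELDS.items():
        width = msb - lsb + 1
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def weight_location(pre, post):
    """Return (12-bit word address, byte index) of an input/recurrent weight."""
    if not (0 <= pre < 256 and 0 <= post < 256):
        raise ValueError("neuron indices must be in [0, 255]")
    word_addr = (pre << 4) | (post >> 4)  # 8 MSBs: pre index, 4 LSBs: post MSBs
    byte_index = post & 0xF               # 4 LSBs of the post index
    return word_addr, byte_index
```

For example, `weight_location(3, 0x25)` returns word address 50 and byte index 5. How byte indices map to bit positions inside the 128-bit word is not specified here and should be checked against the HDL.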

2. Interfaces and commands

The top-level file reckon.v contains three main interfaces: the SPI bus (Section 2.1), the input AER bus (Section 2.2) and an output bus (Section 2.3). Other I/O pins are described as follows:

| Pin | Direction | Description |
|---|---|---|
| CLK_EXT | Input | External clock. |
| CLK_INT_EN | Input | Enable signal for the internal clock generator. |
| RST | Input | Global reset signal. |
| SAMPLE | Input | Signals the start and the end of an input data sample. |
| TIME_TICK | Input | Signals the start of a new timestep (rising edge). |
| TARGET_VALID | Input | Signals timesteps for which a target is expected for e-prop updates. |
| INFER_ACC | Input | Signals timesteps for which counts of the winning output neurons are to be updated (for classification tasks, over the course of timesteps during which INFER_ACC was enabled, the label of the output neuron with the highest output over most timesteps will represent the network inference). |
| SPI_RDY | Output | [For debug/monitoring purposes] Signals when the global FSM enters the CONFIG state, during which the network state is frozen and can be safely read/written through SPI. |
| TIMING_ERROR_RDY | Output | If SPI_TIMING_MODE=1, signals the occurrence of a timing error (TIME_TICK was asserted before the global FSM finished processing the current timestep). If SPI_TIMING_MODE=0, signals when the global FSM finished processing the current timestep and TIME_TICK can be safely asserted. |

2.1. SPI bus

Fig. 1 - 32-bit SPI timing diagram for (a) write and (b) read operations.

ReckOn implements a standard 32-bit SPI slave bus with the following interface:

| Pin | Direction | Width | Description |
|---|---|---|---|
| SCK | Input | 1-bit | SPI clock generated by the SPI master. |
| MOSI | Input | 1-bit | Master output, slave input. |
| MISO | Output | 1-bit | Master input, slave output. As the fabricated chip was pad-limited, other signals can be displayed on the MISO pin (see Section 3). |

When using the SPI bus, the SPI_EN_CONF configuration register should be asserted first (Section 3). To ensure proper operation, the frequency of the SCK SPI clock should be at least 4x lower than the clock frequency of ReckOn. The SPI write and read operations follow the timing diagrams shown in Figs. 1(a) and 1(b), respectively: a 32-bit address field is first transmitted by the SPI master, before data associated with this address is sent by the master (write) or received from ReckOn (read). In the 32-bit address field a, bit a<31> indicates whether a write (0) or read (1) operation is performed. The remaining fields select whether the SPI bus accesses the configuration registers or the on-chip SRAM / register file contents, as follows:

| code<2:0> (a<30:28>) | addr<15:0> (a<15:0>) | Description |
|---|---|---|
| 3'b000 | {addr_conf<15:0>} | Write to configuration register at address addr_conf (not readable). |
| 3'b001 | {n/a,addr_word<6:0>,addr_32b<1:0>} | Read/write to the neuron SRAM (128 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
| 3'b010 | {n/a,addr_oneur<3:0>} | Read/write to the 16-bit membrane potential of output neuron addr_oneur (register-file-based storage). |
| 3'b011 | {n/a,addr_word<11:0>,addr_32b<1:0>} | Read/write to the input weight SRAM (4096 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
| 3'b100 | {n/a,addr_word<11:0>,addr_32b<1:0>} | Read/write to the recurrent weight SRAM (4096 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
| 3'b101 | {n/a,addr_word<8:0>,addr_32b<1:0>} | Read/write to the output weight SRAM (512 128-bit words, of which only the first 256 words are used). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |

In order to accelerate initialization/readback of the ReckOn SRAMs, grouped data read/write operations can be performed over SPI from a single address field. To do so, a<27:16> contains the number of SPI data transactions to be performed from a starting address of a<15:0>, which will then be incremented internally in the SPI module. If a<27:16> is given a value of 1, a standard SPI transaction with a single data field is performed.
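
As an illustration of the address field layout described in this section, the following Python sketch assembles the 32-bit word a from its sub-fields. The names `SPI_CODE`, `sram_addr` and `spi_address_field` are hypothetical host-side helpers, not part of the ReckOn repository. The sketch assumes that the bits marked n/a are driven to 0 and always places a transaction count of at least 1 in a<27:16>; whether this count is consulted for configuration-register writes is not stated above.

```python
# Target codes for a<30:28>, copied from the table above (names are hypothetical).
SPI_CODE = {
    "config":      0b000,
    "neuron_sram": 0b001,
    "out_vmem":    0b010,
    "inp_weights": 0b011,
    "rec_weights": 0b100,
    "out_weights": 0b101,
}


def sram_addr(addr_word, addr_32b):
    """Concatenate an SRAM word address with the 2-bit index of a 32-bit chunk."""
    if not 0 <= addr_32b < 4:
        raise ValueError("addr_32b must be in [0, 3]")
    if addr_word < 0:
        raise ValueError("addr_word must be non-negative")
    return (addr_word << 2) | addr_32b


def spi_address_field(code, addr, read=False, count=1):
    """Build the 32-bit SPI address field a<31:0>.

    a<31>    : 0 for write, 1 for read
    a<30:28> : target code
    a<27:16> : number of data transactions (1 = single access)
    a<15:0>  : starting address (unused n/a bits assumed to be 0)
    """
    if not 0 <= code < 8:
        raise ValueError("code must fit in 3 bits")
    if not 0 <= addr < (1 << 16):
        raise ValueError("addr must fit in 16 bits")
    if not 1 <= count < (1 << 12):
        raise ValueError("count must be in [1, 4095]")
    return (int(read) << 31) | (code << 28) | (count << 16) | addr


# Single read of 32-bit chunk 2 of neuron SRAM word 5.
read_field = spi_address_field(SPI_CODE["neuron_sram"], sram_addr(5, 2), read=True)
# Grouped write of 4 consecutive 32-bit chunks starting at input weight word 0.
burst_field = spi_address_field(SPI_CODE["inp_weights"], sram_addr(0, 0), count=4)
print(hex(read_field), hex(burst_field))
```

Since addr_32b occupies the two LSBs of the address, a grouped transaction of 4 data fields starting at chunk 0 would be expected to cover exactly one 128-bit word.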

2.2. Address-event representation (AER) input bus

Fig. 2 - Input AER four-phase handshake timing diagram.

Address-event representation (AER) buses follow a four-phase-handshake protocol for asynchronous communication between neuromorphic chips. As ReckOn follows a synchronous digital IC design flow, a double-latching barrier is placed on the REQ line of the input AER bus in order to limit metastability issues.

The input AER bus has the following interface:

| Pin | Direction | Width | Description |
|---|---|---|---|
| AERIN_ADDR | Input | 8-bit | AER address field. |
| AERIN_TAR_EN | Input | 1-bit | Indicates whether data in AERIN_ADDR represents the address of a virtual input neuron (0) or target data for e-prop-based learning (1). |
| AERIN_REQ | Input | 1-bit | AER request handshake line. |
| AERIN_ACK | Output | 1-bit | AER acknowledge handshake line. |
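
The four-phase handshake can be illustrated with a small behavioral model in Python. This is a generic sketch of the protocol (request up, acknowledge up, request down, acknowledge down), not a model of ReckOn's actual timing: the `AerReceiver` class and the `send_event` function are hypothetical, and the double-latching barrier on the REQ line is not modeled.

```python
class AerReceiver:
    """Toy receiver: latches the event when REQ rises, releases ACK when REQ falls."""

    def __init__(self):
        self.ack = 0
        self.events = []  # list of (address, tar_en) tuples received so far

    def step(self, req, addr, tar_en):
        """Evaluate one polling step and return the current ACK level."""
        if req and not self.ack:
            self.events.append((addr, tar_en))  # data is valid while REQ is high
            self.ack = 1
        elif not req and self.ack:
            self.ack = 0
        return self.ack


def send_event(receiver, addr, tar_en=0, max_steps=16):
    """Drive one complete four-phase transaction from the sender side."""
    if not 0 <= addr < 256:
        raise ValueError("AERIN_ADDR is 8 bits wide")
    # Phases 1-2: drive address and TAR_EN, raise REQ, wait for ACK to rise.
    for _ in range(max_steps):
        if receiver.step(1, addr, tar_en):
            break
    else:
        raise TimeoutError("ACK never asserted")
    # Phases 3-4: lower REQ, wait for ACK to fall before the next event.
    for _ in range(max_steps):
        if not receiver.step(0, addr, tar_en):
            break
    else:
        raise TimeoutError("ACK never released")


rx = AerReceiver()
send_event(rx, 42)           # spike from virtual input neuron 42
send_event(rx, 3, tar_en=1)  # target data for e-prop-based learning
print(rx.events)
```

The key property shown is that the sender keeps the address stable until the acknowledge arrives, and does not start a new transaction before the acknowledge line has returned to 0.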

2.3. Output bus

The output bus interface has been simplified compared to the one in the fabricated chip in order to remove unpublished blocks. The output bus in the currently-released version of the HDL has a format similar to the AER interface described in Section 2.2 and its sole purpose is to transmit inference results.

| Pin | Direction | Width | Description |
|---|---|---|---|
| OUT_DATA | Output | 8-bit | Data field. |
| OUT_REQ | Output | 1-bit | Request handshake line. |
| OUT_ACK | Input | 1-bit | Acknowledge handshake line. |

For classification setups (see Section 3 for the associated configuration registers), only one transaction takes place at the end of the sample and the 4 LSBs of OUT_DATA contain the index of the output neuron with the highest output averaged over the timesteps during which the INFER_ACC pin was asserted. For regression setups, a series of output transactions takes place at each timestep, where the 16-bit membrane potential values of active output neurons are successively transmitted (the 8 LSBs first, followed by the 8 MSBs).
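
A host receiving these transactions could decode them along the following lines. This is a minimal sketch with hypothetical function names (`decode_label`, `decode_regression`); it returns raw 16-bit patterns in arrival order and makes no assumption about how they map to output neuron indices or about their signedness.

```python
def decode_label(out_data):
    """Classification: the 4 LSBs of OUT_DATA hold the winning output neuron index."""
    return out_data & 0xF


def decode_regression(out_bytes):
    """Regression: pair successive bytes (8 LSBs first, then 8 MSBs) into 16-bit values."""
    if len(out_bytes) % 2 != 0:
        raise ValueError("expected an even number of 8-bit transactions")
    values = []
    for i in range(0, len(out_bytes), 2):
        lsbs, msbs = out_bytes[i] & 0xFF, out_bytes[i + 1] & 0xFF
        values.append((msbs << 8) | lsbs)
    return values


print(decode_label(0xA7))                            # 4 LSBs of 0xA7
print(decode_regression([0x34, 0x12, 0xFF, 0x00]))   # two 16-bit values
```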

3. Global configuration registers

Configuration registers can be written through the SPI bus (no readback operation is available) and are defined as follows:

| Register name | Addr<15:0> | Width | Reset value | Description |
|---|---|---|---|---|
| SPI_EN_CONF | 0 | 1-bit | 1'b1 | Enables access to the network internal state through SPI and ensures the control FSM goes into a safe state to do so, which will be signalled through the SPI_RDY pin. |
| SPI_RO_STAGE_SEL | 1 | 9-bit | / | Selects the stage of the ring-oscillator-based local clock generator (not used in the released HDL code as technology-specific blocks, incl. clock gen and frequency divider, were removed). |
| SPI_GET_CLKINT_OUT | 2 | 1-bit | 1'b0 | Enables a frequency-divided copy of the locally generated clock to be displayed on the SPI MISO pin for monitoring purposes (not used in the released HDL code as technology-specific blocks, incl. clock gen and frequency divider, were removed). |
| SPI_GET_TAR_REQ_OUT | 3 | 1-bit | 1'b1 | Enables the target request signal to be displayed on the MISO pin. |
| SPI_RST_MODE | 8 | 1-bit | 1'b0 | Selects the spike reset mode of LIF neurons (1: reset to zero, 0: reset by subtraction). |
| SPI_DO_EPROP | 9 | 3-bit | 3'b111 | Enables e-prop updates (bit 0: input weight updates, bit 1: recurrent weight updates, bit 2: output weight updates). Input/recurrent/output weights can be independently configured in any plastic/frozen configuration. |
| SPI_LOCAL_TICK | 10 | 1-bit | 1'b0 | Enables local generation of timestep ticks (see SPI_CYCLES_PER_TICK for the timestep duration). If configured to 0, timestep ticks are provided externally through the TIME_TICK pin. |
| SPI_ERROR_HALT | 11 | 1-bit | 1'b1 | Enables halting the network operation if a timing error takes place (i.e. a timestep tick occurred before the global FSM finished processing the current timestep) for debugging purposes. A network reset will be necessary. |
| SPI_FP_LOC_WINP | 12 | 3-bit | 3'd0 | Input weight scaling parameter. The stored 8-bit input weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WINP before being added to the neuron membrane potentials. |
| SPI_FP_LOC_WREC | 13 | 3-bit | 3'd0 | Recurrent weight scaling parameter. The stored 8-bit recurrent weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WREC before being added to the neuron membrane potentials. |
| SPI_FP_LOC_WOUT | 14 | 3-bit | 3'd0 | Output weight scaling parameter. The stored 8-bit output weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WOUT before being added to the neuron membrane potentials. |
| SPI_FP_LOC_TINP | 15 | 3-bit | 3'd0 | Radix point location of input traces (left-shifted by the value of SPI_FP_LOC_TINP). |
| SPI_FP_LOC_TREC | 16 | 3-bit | 3'd0 | Radix point location of recurrent traces (left-shifted by the value of SPI_FP_LOC_TREC). |
| SPI_FP_LOC_TOUT | 17 | 3-bit | 3'd0 | Radix point location of output traces (left-shifted by the value of SPI_FP_LOC_TOUT). |
| SPI_LEARN_SIG_SCALE | 18 | 4-bit | 4'd0 | Learning signal scaling parameter: the learning signals are left-shifted by the value of SPI_LEARN_SIG_SCALE. |
| SPI_REGUL_MODE | 19 | 3-bit | 3'b000 | Selects the weight regularization mode (bit 0: multiplicative regularization, bit 1: additive regularization). If bit 2 is asserted, regularization is enabled during all timesteps, not only when the TARGET_VALID pin is asserted (for use only with additive regularization). |
| SPI_REGUL_W | 20 | 2-bit | 2'b00 | Enables weight regularization (bit 0: input weights, bit 1: recurrent weights). Input/recurrent weights can be independently configured in any regularized/non-regularized configuration. |
| SPI_EN_STOCH_ROUND | 21 | 1-bit | 1'b0 | Enables stochastic rounding in the eligibility traces and neuron membrane potentials. |
| SPI_SRAM_SPEEDMODE | 22 | 8-bit | 8'h00 | Configuration of the SRAM macro speed modes (not used in the released HDL code as technology-specific blocks were removed). |
| SPI_TIMING_MODE | 23 | 1-bit | 1'b0 | Controls the pin function of TIMING_ERROR_RDY (see pin description in Section 2). |
| SPI_REGRESSION | 25 | 1-bit | 1'b0 | Should be programmed to 1 for regression tasks and 0 for classification tasks. |
| SPI_SINGLE_LABEL | 26 | 1-bit | 1'b1 | Should be programmed to 1 for classification tasks in order to provide the classification label only once per sample, instead of at every timestep. |
| SPI_NO_OUT_ACT | 27 | 1-bit | 1'b0 | Disables the hard-sigmoid non-linearity applied to the membrane potential of output neurons. |
| SPI_SEND_PER_TIMESTEP | 30 | 1-bit | 1'b0 | Enables sending the network output (format conditioned by SPI_SEND_LABEL_ONLY) at every timestep instead of once at the end of the sample. Typically for use in regression tasks. |
| SPI_SEND_LABEL_ONLY | 31 | 1-bit | 1'b1 | Configures the network output contents sent over the output bus (1: winning neuron label, 0: membrane potential values of all enabled output neurons). Typically configured to 1 for classification tasks and 0 for regression tasks. |
| SPI_NOISE_EN | 32 | 1-bit | 1'b0 | Enables the addition of random noise to membrane potential updates of LIF neurons (noise magnitude configured with SPI_NOISE_STR). |
| SPI_FORCE_TRACES | 33 | 1-bit | 1'b0 | Forces eligibility trace computation even if e-prop updates are disabled (for monitoring purposes). |
| SPI_CYCLES_PER_TICK | 64 | 32-bit | / | Number of clock cycles per locally generated timestep tick (used only if SPI_LOCAL_TICK is enabled). |
| SPI_ALPHA_CONF | 65-68 | 128-bit | 128'h0 | Each bit of SPI_ALPHA_CONF selects the 4 MSBs of the 16-bit leakage decay factor alpha shared by one pair of LIF neurons. Alpha consists of a single-bit integer part and a 15-bit fractional part (1: the integer part bit is 1 and the three MSBs of the fractional part are 3'b000, 0: the integer part bit is 0 and the three MSBs of the fractional part are 3'b111). The 12 LSBs of alpha for every pair of LIF neurons are defined in the neuron memory (Section 1). |
| SPI_KAPPA | 69 | 8-bit | 8'h7A | Defines the value of the 8-bit leakage factor kappa shared among all output LI neurons, which consists of a single-bit integer part and a 7-bit fractional part. |
| SPI_THR_H_0 | 70 | 16-bit | / | Defines the membrane potential threshold separating the first and second segments of the straight-through-estimator (STE) function. |
| SPI_THR_H_1 | 71 | 16-bit | / | Defines the membrane potential threshold separating the second and third segments of the straight-through-estimator (STE) function. |
| SPI_THR_H_2 | 72 | 16-bit | / | Defines the membrane potential threshold separating the third and fourth segments of the straight-through-estimator (STE) function. |
| SPI_THR_H_3 | 73 | 16-bit | / | Defines the membrane potential threshold separating the fourth and fifth segments of the straight-through-estimator (STE) function. |
| SPI_H_0 | 74 | 5-bit | / | Defines the value of the first segment of the straight-through-estimator (STE) function. |
| SPI_H_1 | 75 | 5-bit | / | Defines the value of the second segment of the straight-through-estimator (STE) function. |
| SPI_H_2 | 76 | 5-bit | / | Defines the value of the third segment of the straight-through-estimator (STE) function. |
| SPI_H_3 | 77 | 5-bit | / | Defines the value of the fourth segment of the straight-through-estimator (STE) function. |
| SPI_H_4 | 78 | 5-bit | / | Defines the value of the fifth segment of the straight-through-estimator (STE) function. |
| SPI_LR_R_WINP | 79 | 5-bit | / | Input weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WINP). |
| SPI_LR_P_WINP | 80 | 5-bit | / | Input weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WINP). |
| SPI_LR_R_WREC | 81 | 5-bit | / | Recurrent weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WREC). |
| SPI_LR_P_WREC | 82 | 5-bit | / | Recurrent weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WREC). |
| SPI_LR_R_WOUT | 83 | 5-bit | / | Output weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WOUT). |
| SPI_LR_P_WOUT | 84 | 5-bit | / | Output weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WOUT). |
| SPI_SEED_INP | 85 | 25-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic input weight updates. |
| SPI_SEED_REC | 86 | 25-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic recurrent weight updates. |
| SPI_SEED_OUT | 87 | 22-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic output weight updates. |
| SPI_SEED_STRND_NEUR | 88 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the LIF neuron membrane potentials. |
| SPI_SEED_STRND_ONEUR | 89 | 15-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the output neuron membrane potentials. |
| SPI_SEED_STRND_TINP | 90 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the input eligibility traces. |
| SPI_SEED_STRND_TREC | 91 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the recurrent eligibility traces. |
| SPI_SEED_STRND_TOUT | 92 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the output eligibility traces. |
| SPI_SEED_NOISE_NEUR | 93 | 17-bit | / | Seed of the unfolded LFSR generating random numbers for the configurable amount of noise added to the LIF neuron membrane potentials. |
| SPI_NUM_INP_NEUR | 94 | 8-bit | 8'hFF | Number of input neurons enabled in the network (should be configured to the target number of neurons -1). |
| SPI_NUM_REC_NEUR | 95 | 8-bit | 8'hFF | Number of recurrent neurons enabled in the network (should be configured to the target number of neurons -1). |
| SPI_NUM_OUT_NEUR | 96 | 4-bit | 4'hF | Number of output neurons enabled in the network (should be configured to the target number of neurons -1). |
| SPI_REGUL_F0 | 98 | 12-bit | / | Value of the post-synaptic recurrent eligibility traces above which regularization on the pre-synaptic weights is turned on. |
| SPI_REGUL_K_INP_R | 99 | 5-bit | / | Input weight additive regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_INP_R). |
| SPI_REGUL_K_INP_P | 100 | 5-bit | / | Input weight additive regularization scaling parameter (applies a left shift by the value of SPI_REGUL_K_INP_P). |
| SPI_REGUL_K_REC_R | 101 | 5-bit | / | Recurrent weight additive regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_REC_R). |
| SPI_REGUL_K_REC_P | 102 | 5-bit | / | Recurrent weight additive regularization scaling parameter (applies a left shift by the value of SPI_REGUL_K_REC_P). |
| SPI_REGUL_K_MUL | 103 | 5-bit | / | Input and recurrent weight multiplicative regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_MUL). |
| SPI_NOISE_STR | 104 | 4-bit | / | Neuron noise scaling parameter. Noise is generated as pseudo-random 16-bit words to be added to the LIF neuron membrane potentials, right-shifted by the value of SPI_NOISE_STR. |
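
To make the fixed-point formats of SPI_ALPHA_CONF, SPI_KAPPA and the weight scaling parameters more concrete, here is a short Python sketch. The helper names are hypothetical, and the sketch reads alpha and kappa as unsigned fixed-point numbers (1 integer bit followed by the fractional bits), which is an interpretation of the descriptions above rather than something checked against the HDL.

```python
def alpha_raw(conf_bit, alpha_lsbs):
    """Assemble the 16-bit alpha from its SPI_ALPHA_CONF bit and its 12 LSBs.

    conf_bit = 1 -> 4 MSBs are 4'b1000 (integer bit 1, fractional MSBs 000)
    conf_bit = 0 -> 4 MSBs are 4'b0111 (integer bit 0, fractional MSBs 111)
    """
    if not 0 <= alpha_lsbs < (1 << 12):
        raise ValueError("alpha_lsbs must fit in 12 bits")
    msbs = 0b1000 if conf_bit else 0b0111
    return (msbs << 12) | alpha_lsbs


def alpha_value(conf_bit, alpha_lsbs):
    """Alpha as a real number: 1 integer bit and 15 fractional bits."""
    return alpha_raw(conf_bit, alpha_lsbs) / 2**15


def kappa_value(spi_kappa):
    """Kappa as a real number: 1 integer bit and 7 fractional bits."""
    return (spi_kappa & 0xFF) / 2**7


def scaled_weight(weight_byte, fp_loc):
    """Sign-extend an 8-bit weight to 16 bits and left-shift it by fp_loc (0 to 7).

    Returns the resulting 16-bit pattern as an unsigned integer.
    """
    if not 0 <= weight_byte < 256 or not 0 <= fp_loc < 8:
        raise ValueError("weight_byte is 8 bits wide, fp_loc is 3 bits wide")
    signed = weight_byte - 256 if weight_byte & 0x80 else weight_byte
    return (signed << fp_loc) & 0xFFFF


print(hex(alpha_raw(0, 0xFFF)), alpha_value(1, 0x000))
print(kappa_value(0x7A))            # reset value of SPI_KAPPA
print(hex(scaled_weight(0xFF, 3)))  # weight -1 with a shift of 3
```

Under this reading, the reset value 8'h7A of SPI_KAPPA corresponds to 122/128 = 0.953125, and alpha ranges from 0.875 up to just below 1.125 depending on the configuration bit and the 12 LSBs.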

4. Testbench

A simple testbench file demonstrating online learning is provided (see testbench.sv file, you may want to update paths listed in lines 33-37). The data folder contains two single 50-sample batches (one for training, one for test) of the delayed-supervision navigation task, whose size and complexity are suitable for RTL simulations (see paper for details). The first line of these dataset files indicates the number of samples, the subsequent lines represent events: the first tuple element contains the input neuron index, the second one the time in milliseconds. Special events are used to represent the end of the sample (the first element has index -1) and the start of the delayed supervision (the first element has index -2, the second represents the target label). Random input, recurrent and output weight initialization files are provided as well.
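
For readers who want to inspect the dataset files outside of the testbench, the event format described above could be parsed roughly as follows. This is a sketch under stated assumptions: the exact text layout of each line (delimiters, brackets) is not specified here, so the hypothetical `parse_events` function simply extracts the first two integers of every line and assumes all values are integers.

```python
import re


def parse_events(lines):
    """Parse a dataset given as an iterable of text lines.

    Returns (declared number of samples, list of samples), where each sample is
    a dict with an "events" list of (input neuron index, time in ms) tuples and
    the target "label" (None if no supervision event was seen).
    """
    it = iter(lines)
    n_samples = int(next(it).strip())  # first line: number of samples
    samples = []
    current = {"events": [], "label": None}
    for line in it:
        numbers = [int(x) for x in re.findall(r"-?\d+", line)]
        if len(numbers) < 2:
            continue  # skip blank or malformed lines
        first, second = numbers[0], numbers[1]
        if first == -1:      # special event: end of the sample
            samples.append(current)
            current = {"events": [], "label": None}
        elif first == -2:    # special event: start of supervision, second = label
            current["label"] = second
        else:                # regular event: (input neuron index, time in ms)
            current["events"].append((first, second))
    return n_samples, samples


# Made-up example data, only to show the call; real files live in the data folder.
demo = ["2", "(3, 10)", "(-2, 1)", "(5, 20)", "(-1, 30)", "(7, 4)", "(-1, 9)"]
count, parsed = parse_events(demo)
print(count, parsed)
```

With a real file, the same function can be called as `parse_events(open(path))`, where `path` points to one of the provided dataset files.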

5. Implementation tips

When implementing ReckOn or adapting it for a particular application, care should be taken with the following points:

  • The provided Verilog HDL code can directly be used for behavioral simulation. For implementation with a specific technology node, the behavioral descriptions of the neuron and weight SRAMs in srnn.v need to be replaced with SRAM macros. Block RAM (BRAM) instances can be used for FPGA implementations.

  • The Verilog modules take N and M as parameters, where N represents the number of neurons in the input and recurrent layers and M is log2(N). These parameters help rescale ReckOn to different network dimensions; however, the datapath is not entirely generic in N and M, and further adaptations are needed if the dimensions are changed.

6. Citing ReckOn

If you use the documentation or source code, please cite the paper associated with ReckOn:

[C. Frenkel and G. Indiveri, "ReckOn: A 28nm sub-mm² task-agnostic spiking recurrent neural network processor enabling on-chip learning over second-long timescales," IEEE International Solid-State Circuits Conference (ISSCC), 2022]

7. Revision history

| Rev | Date | Author | Description |
|---|---|---|---|
| 1.0 | 18 Feb. 2022 | C. Frenkel | Basic doc without unpublished parts |