# Hardware Overview

Figure from the DAC paper:

![Hardware block diagram](img/hardware.svg)



# RTL Description

This section describes the VHDL blocks of the system.

![RTL block diagram](img/rtl.svg)

In the diagram, memory-related blocks are on the left, control-related blocks in the middle, and timing-related blocks on the right.

## IPIC slave
Translates the Xilinx IPIC (IP Interconnect) signals into a custom protocol.

## Clock Gate Crossing (`mem_crossing.vhd`)
Ensures that timing requirements are met for all signals crossing the boundary
between the gated and the non-gated clock domain.

## MemSlave (`mem_slave.vhd`)
Translates single-port accesses into dual-port accesses. It also supports
byte-wise and word-wise writes from the single-port side by translating each
such write into a read followed by a write (read-modify-write).
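A byte-wise write from the single-port side can be modelled as a read-modify-write on the dual-port side: read the old word, replace the selected bytes, write the result back. The Python sketch below shows the merge step only. It assumes 32-bit words and a 4-bit byte-enable mask in which bit *i* selects byte lane *i* (bits 8i+7..8i), similar to AXI `WSTRB`; the function name `merge_bytes` is hypothetical and does not describe the actual interface of `mem_slave.vhd`.

```python
WORD_BYTES = 4  # 32-bit words (WORDSIZE_LOG = 5)


def merge_bytes(old_word: int, new_data: int, byte_enable: int) -> int:
    """Return the word written back after a read-modify-write.

    Hypothetical model: bit i of byte_enable selects byte lane i
    (bits 8*i+7 .. 8*i) of new_data; all other lanes keep old_word.
    """
    mask = 0
    for lane in range(WORD_BYTES):
        if byte_enable & (1 << lane):
            mask |= 0xFF << (8 * lane)
    return (old_word & ~mask & 0xFFFFFFFF) | (new_data & mask)


# Write only the lowest byte lane of a word that currently holds 0x11223344.
print(hex(merge_bytes(0x11223344, 0xAABBCCDD, 0b0001)))
```

A full-word write (all four enables set) degenerates to a plain write, so the extra read is only needed when some lanes must be preserved.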

## MemStall (`memstall.vhd`) / Cpu Timer
Used only if clock gating is not available. Instead of gating the CPU clock
during overhead cycles, the CPU is stalled (delayed) on its next memory
access.

## MemMux (`memmux.vhd`)
Splits one dual-port memory interface into multiple dual-port memory
interfaces, one per memory array.

## MemCtrl (`memctrl.vhd`)
Refresh controller. Generates the refresh reads and writes configured through
the RFINT, RFSKIP and RFDIST settings.

## Synchronizer (`synchronizer.vhd`)
Synchronizes the memory arrays when they incur different amounts of overhead,
and decides whether the simulation clock can be advanced.

## MemArray (`memarray_ts.vhd`)
Stores the data together with its write timestamps. On a read, the data and
its delta t (the time elapsed since the write) are handed to a modifier block,
which applies events.

## Mod (`mod_ev.vhd`)
This modifier block contains the pointer and event storage and applies events to
the data on read.



# Generics

## Global Configuration
| Name | Description | Default | Explanation |
| ---- | ----------- | ------- | ----------- |
| TIMERSIZE | bit width of global timer | 32 | 32-bit global timer |
| SKIPSIZE | bit width of RFSKIP | 3 | 0 <= RFSKIP < 8 |

## Behavior Configuration
| Name | Description | Default | Explanation |
| ---- | ----------- | ------- | ----------- |
| MEMARRAY_MODEL | Memory array model | TS | Use timestamp memarray model |
| MOD_MODEL | Modifier model | EV | Use event modifier model |

## Refresh Configuration
| Name | Description | Default | Explanation |
| ---- | ----------- | ------- | ----------- |
| READWRITE | | True | Enable refresh reads & writes |

## Memory Size Configuration
| Name | Description | Default | Explanation |
| ---- | ----------- | ------- | ----------- |
| ARRAYS_LOG | log2 of number of memory arrays | 6 | 64 arrays |
| ROWS_LOG | log2 of number of rows per array | 12 | 4096 words per array (RFSKIP is used to divide them into smaller virtual arrays) |
| WORDSIZE_LOG | log2 of word size | 5 | 32-bit words |

## Event Configuration
| Name | Description | Default | Explanation |
| ---- | ----------- | ------- | ----------- |
| NEVENTS_LOG | log2 of number of events per memory array | 12 | 4096 events per array |
| TDIFFSIZE | bit width of delta t in event list | 15 | 15 bits for every delta t |
| ACTSIZE | bit width of action in event list | 2 | 4 possible actions |
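Most derived sizes follow directly from the log2 generics. This Python sketch evaluates them for the defaults above. The constant names mirror the generics; the event-row width assumes the Event RAM row layout given under Memory Map (delta t, action, bit index, with WORDSIZE_LOG bits for the bit index).

```python
# Default generics (see the tables above).
ARRAYS_LOG = 6
ROWS_LOG = 12
WORDSIZE_LOG = 5
NEVENTS_LOG = 12
TDIFFSIZE = 15
ACTSIZE = 2

arrays = 1 << ARRAYS_LOG             # 64 memory arrays
rows_per_array = 1 << ROWS_LOG       # 4096 words per array
word_bits = 1 << WORDSIZE_LOG        # 32-bit words
events_per_array = 1 << NEVENTS_LOG  # 4096 events per array

total_bytes = arrays * rows_per_array * word_bits // 8
max_delta_t = (1 << TDIFFSIZE) - 1   # largest delta t an event can hold
num_actions = 1 << ACTSIZE           # 4 possible actions

# Event RAM row: delta t | action | bit index (selects one bit of a word)
event_row_bits = TDIFFSIZE + ACTSIZE + WORDSIZE_LOG

print(f"eDRAM size: {total_bytes} bytes (0x{total_bytes:X})")
print(f"event row width: {event_row_bits} bits, max delta t: {max_delta_t}")
```

With the defaults, the emulated eDRAM is 1 MiB, which matches the 0x00000000 to 0x000FFFFF range of the memory AXI port.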



# Memory Map

## Memory AXI Port
| Base Address | High Address | Description | Example |
| ------------ | ------------- | ----------- | ------- |
| 0x00000000 | 0x000FFFFF | Emulated eDRAM | Address 0x4: second memory word (assuming 32-bit words) |

Note: Both byte-wise and word-wise accesses are supported.
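The emulated eDRAM holds 2^ARRAYS_LOG arrays of 2^ROWS_LOG 32-bit words. The sketch below splits a byte address into array, row and byte lane. Which address bits select the array is not documented here; the sketch *assumes* the array index occupies the most significant bits, so `decode_address` is a hypothetical helper for illustration, not a description of `memmux.vhd`.

```python
from typing import NamedTuple

ARRAYS_LOG = 6      # 64 arrays
ROWS_LOG = 12       # 4096 rows (words) per array
BYTE_LANE_BITS = 2  # 4 bytes per 32-bit word


class Location(NamedTuple):
    array: int
    row: int
    byte_lane: int


def decode_address(addr: int) -> Location:
    """Split an eDRAM byte address (assumed mapping: array index in the MSBs)."""
    top = 1 << (ARRAYS_LOG + ROWS_LOG + BYTE_LANE_BITS)
    if not 0 <= addr < top:
        raise ValueError(f"address 0x{addr:X} outside the emulated eDRAM")
    byte_lane = addr & ((1 << BYTE_LANE_BITS) - 1)
    word = addr >> BYTE_LANE_BITS
    row = word & ((1 << ROWS_LOG) - 1)
    array = word >> ROWS_LOG
    return Location(array, row, byte_lane)


print(decode_address(0x4))         # second memory word
print(decode_address(0x000FFFFF))  # last byte of the address range
```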

## Control AXI Port
| Base Address | High Address | Description | Example |
| ------------ | ------------ | ----------- | ------- |
| 0x00000000 | 0x0000FFFF | Registers | Base + 0x4: second register |
| 0x00010000 | 0x0001FFFF | Pointer RAM | Base + 0x4: second pointer |
| 0x00020000 | 0x0002FFFF | Event RAM | Base + 0x4: second event |

### Registers
| Offset | Name | Description |
| ------ | ---- | ----------- |
| 0x00 | CTRL | Control register |
| 0x04 | RFINT | Refresh interval in cycles |
| 0x08 | RFSKIP | log2 of the number of virtual arrays per array |
| 0x0c | SIMTS | Global timer value |
| 0x10 | READ_COUNTER | Number of read accesses from the CPU |
| 0x14 | WRITE_COUNTER | Number of write accesses from the CPU |
| 0x18 | REFRESH_COUNTER | Number of refreshes in array 0 |
| 0x1c | FLIP_COUNTER | Number of bit flips in array 0 |
| 0x20 | STALL_COUNTER | Deprecated |
| 0x24 | MOD_SEL | Selects the target array(s) of Pointer RAM and Event RAM accesses |

#### CTRL: Control Register
| Bit | Name | Description | Note |
| --- | ---- | ----------- | ---- |
| 0 | RST | 1: Reset eDRAM emulator | All registers are reset, but the integrated RAMs retain their data. |
| 1 | TEN | 1: Enable global timer | Used to start the simulation. |
| 2 | SEN | 1: Enable stall block during overhead cycles | Don't use this if clock gating is enabled. |
| 3 | GEN | 1: Enable clock gating during overhead cycles | Always enable this together with TEN. |
| 4 | RFDIST | 0: Enable bulk refresh. 1: Enable distributed refresh | Usually set to 1. |
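Following the notes above (GEN together with TEN, SEN only without clock gating, RFDIST usually 1), the value that starts a simulation can be composed from the bit positions. This Python sketch only builds the value; `ctrl_value` is a hypothetical helper, and how the value is written to offset 0x00 depends on the host's bus-access mechanism.

```python
# Bit masks of the CTRL register (offset 0x00).
RST = 1 << 0     # reset emulator (integrated RAMs keep their data)
TEN = 1 << 1     # enable global timer
SEN = 1 << 2     # stall block during overhead (only without clock gating)
GEN = 1 << 3     # clock gating during overhead
RFDIST = 1 << 4  # 1: distributed refresh, 0: bulk refresh


def ctrl_value(*, clock_gating: bool = True, distributed: bool = True) -> int:
    """Build a CTRL value that starts the simulation (hypothetical helper)."""
    value = TEN
    # SEN and GEN are alternatives: stall only if clock gating is unavailable.
    value |= GEN if clock_gating else SEN
    if distributed:
        value |= RFDIST
    return value


print(hex(ctrl_value()))  # TEN | GEN | RFDIST
```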

#### RFINT: Refresh Interval Register
If RFDIST = 0 (bulk refresh): the complete memory is refreshed every RFINT
cycles.

If RFDIST = 1 (distributed refresh): one word (per virtual array, see RFSKIP)
is refreshed every RFINT cycles.
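When choosing RFINT for a target retention time, the relevant quantity is how often the same row is refreshed. The Python sketch below computes this period in cycles. The bulk case follows directly from the text; the distributed case assumes one row per virtual array is refreshed every RFINT cycles (RFSKIP = 0 meaning one virtual array, i.e. the whole array), which is consistent with the RFSKIP example below. The function name is hypothetical.

```python
def row_refresh_period(rfint: int, rfdist: bool, rows_log: int = 12, rfskip: int = 0) -> int:
    """Cycles between two refreshes of the same row (sketch under stated assumptions)."""
    if not rfdist:
        # Bulk refresh: the complete memory is refreshed every RFINT cycles.
        return rfint
    # Distributed refresh: one row per virtual array every RFINT cycles, so a
    # virtual array of 2**(rows_log - rfskip) rows needs that many intervals.
    rows_per_virtual_array = 1 << (rows_log - rfskip)
    return rfint * rows_per_virtual_array


# Parameters of the RFSKIP example: rows 0 and 2 are refreshed again after 8 cycles.
print(row_refresh_period(rfint=4, rfdist=True, rows_log=2, rfskip=1))
```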

#### RFSKIP Register
The number of arrays that fit on an FPGA is limited by the growing complexity
of the combinational logic in the Synchronizer block.
To emulate more arrays, each with its own parallel refresh, an array can be
divided into virtual arrays. From the outside, the virtual arrays are refreshed
in parallel; internally, their refresh reads and writes are performed in series,
and the resulting overhead cycles are masked by clock gating.

RFSKIP is the log2 of the number of virtual arrays per array.

##### Example
ROWS_LOG = 2 (4 rows), RFSKIP = 1 (2 virtual arrays: rows 0-1 and rows 2-3),
RFINT = 4, RFDIST = 1 (distributed refresh)

Outside view (clock gated domain):

| Cycle | Action |
| ----- | ------ |
| 0 | Refresh read (rows 0, 2). Memory Unavailable. |
| 1 | Refresh write (rows 0, 2). Memory Unavailable. |
| 2 | Normal operation. |
| 3 | Normal operation. |
| 4 | Refresh read (rows 1, 3). Memory Unavailable. |
| 5 | Refresh write (rows 1, 3). Memory Unavailable. |
| 6 | Normal operation. |
| 7 | Normal operation. |
| 8 | Refresh read (rows 0, 2). Memory Unavailable. |

Inside view (assuming no overhead from the modifier block):

| Outside Cycle | Inside Cycle | Action | Clock gated? |
| ------------- | ------------ | ------ | ------------ |
| 0 | 0 | Read row 0 | non-gated |
| 0 | 1 | Write row 0 | gated |
| 0 | 2 | Read row 2 | gated |
| 1 | 3 | Write row 2 | non-gated |
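Both tables can be reproduced with a small behavioural model. This Python sketch assumes, as the example suggests, that virtual array v covers a contiguous block of 2^(ROWS_LOG - RFSKIP) rows and that refresh step k refreshes row k mod 2^(ROWS_LOG - RFSKIP) in every virtual array; it also assumes RFINT >= 2 and no modifier overhead. All function names are hypothetical.

```python
def refresh_rows(step: int, rows_log: int, rfskip: int) -> list[int]:
    """Rows refreshed (from the outside view) in parallel at refresh step `step`."""
    rows_per_va = 1 << (rows_log - rfskip)
    offset = step % rows_per_va
    return [va * rows_per_va + offset for va in range(1 << rfskip)]


def outside_view(cycles: int, rows_log: int, rfskip: int, rfint: int) -> list[str]:
    """Action per cycle in the clock-gated domain for distributed refresh."""
    actions = []
    for cycle in range(cycles):
        step, phase = divmod(cycle, rfint)
        rows = ", ".join(map(str, refresh_rows(step, rows_log, rfskip)))
        if phase == 0:
            actions.append(f"Refresh read (rows {rows})")
        elif phase == 1:
            actions.append(f"Refresh write (rows {rows})")
        else:
            actions.append("Normal operation")
    return actions


def inside_view(rows: list[int]) -> list[tuple[str, bool]]:
    """Serialized read/write sequence for one refresh step; True means clock gated."""
    ops = [(f"{op} row {row}", True) for row in rows for op in ("Read", "Write")]
    # Only the first read and the last write are visible outside; the cycles
    # in between are overhead and are hidden by clock gating.
    ops[0] = (ops[0][0], False)
    ops[-1] = (ops[-1][0], False)
    return ops


for cycle, action in enumerate(outside_view(9, rows_log=2, rfskip=1, rfint=4)):
    print(cycle, action)
for inside_cycle, (action, gated) in enumerate(inside_view(refresh_rows(0, 2, 1))):
    print(inside_cycle, action, "gated" if gated else "non-gated")
```

With RFSKIP = 0 each step touches a single row, so `inside_view` returns one read and one write and no cycle is gated.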

#### SIMTS: Global timer value
Global timer value, in cycles. The timer is enabled through TEN and is stopped
during overhead cycles, so it can be used instead of an external clock-gated
performance counter.
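Because SIMTS is a TIMERSIZE-bit counter, a measurement that spans a wrap-around should subtract modulo 2^TIMERSIZE. This sketch assumes the default TIMERSIZE = 32 and that the two samples are taken less than one full timer period apart; reading the register itself (offset 0x0c) is left to the host's bus-access code, and `elapsed_cycles` is a hypothetical helper.

```python
TIMERSIZE = 32  # default width of the global timer


def elapsed_cycles(start: int, end: int, timersize: int = TIMERSIZE) -> int:
    """Simulated cycles between two SIMTS samples, tolerating one wrap-around."""
    return (end - start) % (1 << timersize)


# First sample taken just before the timer wrapped, second one just after.
print(elapsed_cycles(0xFFFFFFF0, 0x00000010))
```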

#### MOD_SEL Register
MOD_SEL = 0: Accesses to Pointer RAM & Event RAM write to all arrays in
parallel.

MOD_SEL > 0: Accesses to Pointer RAM & Event RAM write to array (MOD_SEL - 1).
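The off-by-one encoding (0 selects all arrays, n + 1 selects array n) is easy to get wrong when loading per-array event lists. This Python sketch encodes and decodes it for the default ARRAYS_LOG = 6; both helper names are hypothetical.

```python
from typing import Optional

ARRAYS_LOG = 6  # default: 64 arrays


def mod_sel_for(array: Optional[int]) -> int:
    """MOD_SEL value that targets one array, or all arrays if `array` is None."""
    if array is None:
        return 0  # broadcast to every array
    if not 0 <= array < (1 << ARRAYS_LOG):
        raise ValueError(f"array index {array} out of range")
    return array + 1


def target_arrays(mod_sel: int) -> list[int]:
    """Arrays whose Pointer/Event RAM a write reaches for a given MOD_SEL."""
    if mod_sel == 0:
        return list(range(1 << ARRAYS_LOG))
    return [mod_sel - 1]


print(mod_sel_for(0), target_arrays(mod_sel_for(0)))
```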

### Pointer RAM
Write-only access to the pointer RAM of the array selected through MOD_SEL.
Number of rows: 2^ROWS_LOG (4096 for the default ROWS_LOG = 12), i.e. one
pointer per memory row.

Row layout (assuming NEVENTS_LOG = 12):

| 23 ... 12 | 11 ... 0 |
| --------- | -------- |
| Pointer to first event | Number of events |
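A Pointer RAM row can be packed on the host before it is written. This Python sketch follows the layout above and assumes both fields are NEVENTS_LOG bits wide for other values of NEVENTS_LOG as well; the helper names are hypothetical.

```python
NEVENTS_LOG = 12  # default: 4096 events per array


def pack_pointer(first_event: int, num_events: int, nevents_log: int = NEVENTS_LOG) -> int:
    """Pack a Pointer RAM row: first-event index in the upper field, count in the lower."""
    limit = 1 << nevents_log
    if not (0 <= first_event < limit and 0 <= num_events < limit):
        raise ValueError("field does not fit in NEVENTS_LOG bits")
    return (first_event << nevents_log) | num_events


def unpack_pointer(row: int, nevents_log: int = NEVENTS_LOG) -> tuple[int, int]:
    """Inverse of pack_pointer: returns (first_event, num_events)."""
    mask = (1 << nevents_log) - 1
    return (row >> nevents_log) & mask, row & mask


# Memory row whose events start at event index 10, with 3 events.
print(hex(pack_pointer(10, 3)))
```

Note that with these field widths a single row can reference at most 2^NEVENTS_LOG - 1 events.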

### Event RAM
Write-only access to the event RAM of the array selected through MOD_SEL.
Number of rows: 2^NEVENTS_LOG (4096 for the default NEVENTS_LOG = 12)

Row layout (assuming TDIFFSIZE = 15, ACTSIZE = 2, WORDSIZE_LOG = 5):

| 21 ... 7 | 6 ... 5 | 4 ... 0 |
| -------- | ------- | ------- |
| Delta t | Action | Bit index |
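An Event RAM row can be packed the same way. The meaning of the action codes and the reference point of delta t are not specified in this section, so the sketch below treats all three fields as opaque numbers and only checks that each fits its default width. The helper name `pack_event` is hypothetical.

```python
TDIFFSIZE = 15    # bits for delta t
ACTSIZE = 2       # bits for the action code
WORDSIZE_LOG = 5  # bits for the bit index (32-bit words)


def pack_event(delta_t: int, action: int, bit_index: int) -> int:
    """Pack an Event RAM row as delta t | action | bit index (MSB to LSB)."""
    fields = ((delta_t, TDIFFSIZE), (action, ACTSIZE), (bit_index, WORDSIZE_LOG))
    row = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        row = (row << width) | value
    return row


# delta t = 1000, action code 1, bit index 7
print(hex(pack_event(1000, 1, 7)))
```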
