# Preallocating Resources for Distributed-Memory-Based FPGA Debug

Robert Hale & Brad Hutchings



### FPGA Debug - Logic Analyzer

1. External?
2. Internal?
   - Time
   - Resources





#### Xilinx Internal Logic Analyzer (ILA)



FPGA with 94% of LUT resources utilized:

Xilinx ILA:

Where will the embedded logic analyzer fit?



#### Xilinx Shift Register LUT (SRL)
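This slide is a figure only. As a rough behavioral model, not the DIME implementation itself, a LUT in shift-register mode can be treated as a fixed-depth trace buffer: each clock shifts one sample in and, once the buffer is full, the oldest sample falls off the end. The sketch assumes a depth of 32, matching the `SRLC32E` primitive on 7-series devices; the `SrlTraceBuffer` class and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class SrlTraceBuffer:
    """Toy model of one LUT configured as a fixed-depth shift register (SRL).

    Assumption: depth 32, as in an SRLC32E-style primitive. Index 0 holds the
    most recent sample; index depth-1 holds the oldest one still retained.
    """

    def __init__(self, depth: int = 32) -> None:
        self.depth = depth
        # A bounded deque drops the element at the opposite end when full,
        # which mimics the oldest bit shifting out of the SRL.
        self._bits = deque([0] * depth, maxlen=depth)

    def clock(self, sample: int) -> None:
        """Shift in one single-bit sample (one clock with enable asserted)."""
        self._bits.appendleft(sample & 1)

    def read(self, addr: int) -> int:
        """Return the sample captured `addr` cycles ago (0 = most recent)."""
        if not 0 <= addr < self.depth:
            raise IndexError("tap address out of range")
        return self._bits[addr]

    def dump(self) -> list[int]:
        """Return the whole buffer, oldest sample first (as a readback would)."""
        return list(reversed(self._bits))


# Usage: trace the LSB of a free-running counter for 40 cycles.
buf = SrlTraceBuffer()
for cycle in range(40):
    buf.clock(cycle & 1)
# Only cycles 8..39 survive; the first 8 samples were shifted out.
print(buf.dump())
```

The model makes the capacity limit explicit: a single SRL only remembers the last 32 cycles of one signal, which is why buffer length and buffer count compete for the same LUTs.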



#### Distributed Memory (DIME) Debug





#### **4-bit Counter Output**



#### **DIME Debug - Research Questions**

1. Can we enable internal debug when the device is 90%+ utilized?

2. How will DIME trace buffers impact the user circuit (timing)?

3. What is the ideal organization of DIME buffers on the device?



#### **DIME Preallocation - Research Questions**

1. Will this hurt the performance of the user circuit?

2. Will this improve the performance of the combined DIME + user circuit?

#### **Benchmarks**

- LC3
- Sudoku
- RPulseG
- RNG
- uFIFO



#### **Does Preallocation Affect the User Design?**

Implementation: no impact

Timing: minimum clock period changes by at most 0.1 ns

| Benchmark | Utilization | Original min. period (ns) | Preallocated min. period (ns) |
|-----------|-------------|---------------------------|-------------------------------|
| LC3       | 70%         | 4.9                       | 5.0                           |
| LC3       | 80%         | 5.2                       | 5.2                           |
| LC3       | 90%         | 6.2                       | 6.3                           |
| Sudoku    | 75%         | 6.6                       | 6.7                           |
| Sudoku    | 94%         | 7.0                       | 6.9                           |
| RNG       | 70%         | 1.6                       | 1.6                           |
| RNG       | 80%         | 1.6                       | 1.6                           |
| RNG       | 90%         | 1.6                       | 1.7                           |
| uFIFO     | 70%         | 3.5                       | 3.6                           |
| uFIFO     | 80%         | 3.8                       | 3.8                           |
| uFIFO     | 90%         | 3.7                       | 3.6                           |
| RPulseG   | 70%         | 1.6                       | 1.6                           |
| RPulseG   | 80%         | 1.6                       | 1.6                           |
| RPulseG   | 90%         | 1.6                       | 1.6                           |
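The "max 0.1 ns" timing claim can be checked directly against the table. In this small script, `RESULTS` simply transcribes the table rows and `period_deltas` is a hypothetical helper; deltas are rounded to 0.1 ns, the table's own resolution, to hide floating-point noise.

```python
# Rows from the table: (benchmark, utilization %, original ns, preallocated ns)
RESULTS = [
    ("LC3", 70, 4.9, 5.0),
    ("LC3", 80, 5.2, 5.2),
    ("LC3", 90, 6.2, 6.3),
    ("Sudoku", 75, 6.6, 6.7),
    ("Sudoku", 94, 7.0, 6.9),
    ("RNG", 70, 1.6, 1.6),
    ("RNG", 80, 1.6, 1.6),
    ("RNG", 90, 1.6, 1.7),
    ("uFIFO", 70, 3.5, 3.6),
    ("uFIFO", 80, 3.8, 3.8),
    ("uFIFO", 90, 3.7, 3.6),
    ("RPulseG", 70, 1.6, 1.6),
    ("RPulseG", 80, 1.6, 1.6),
    ("RPulseG", 90, 1.6, 1.6),
]


def period_deltas(rows):
    """Return (benchmark, utilization, delta) with delta = prealloc - original (ns)."""
    return [(name, util, round(pre - orig, 1)) for name, util, orig, pre in rows]


deltas = period_deltas(RESULTS)
worst = max(abs(d) for _, _, d in deltas)

for name, util, d in deltas:
    # A positive delta means the preallocated build needs a longer clock period.
    print(f"{name:8s} {util:3d}%  {d:+.1f} ns")
print(f"largest |delta| = {worst:.1f} ns")
```

Of the 14 runs, five get slower by 0.1 ns, two get faster by 0.1 ns, and seven are unchanged.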

#### **Does Preallocation Affect DIME Debug?**



#### **Results**



#### **Can we lengthen DIME trace buffers?**

*Figure (LC3): legend: original design, 8 ns; preallocated design, 7.0 ns.*

#### Conclusion

- DIME debug enables internal debug of designs with 90%+ utilization
- Preallocating FPGA resources for DIME debug:
  - Has almost no impact on the original design
  - Reduces the timing penalty (by up to 2 ns)
  - Increases the trace buffer count (by up to ~3×)
- DIME trace buffers can be lengthened

## Thank you

Research supported by Xilinx Research Labs

Robert Hale robert.hale@byu.edu





#### Contributions

- Pros/cons of preallocating LUTs for DIME trace buffers
- 5 unique (duplication-based) benchmarks
- Extending DIME trace buffers to 256 bits
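A back-of-the-envelope calculation shows why buffer length and buffer count trade off against each other. This sketch assumes each single-bit trace buffer is a cascade of 32-bit shift-register LUTs (`SRLC32E`-style), so a 256-bit buffer costs 8 LUTs, and it ignores control, cascade, and readback overhead. On 7-series devices only SLICEM LUTs can act as shift registers or distributed RAM, so the free pool should count those only. If the buffers are instead built from 64-bit LUT RAM (e.g. `RAM64X1S`), set `bits_per_lut=64`. The function names and the 1,000-LUT pool are hypothetical.

```python
import math

# Assumption: one LUT in 32-bit shift-register mode per 32 trace bits.
BITS_PER_LUT = 32


def luts_per_buffer(depth_bits: int, bits_per_lut: int = BITS_PER_LUT) -> int:
    """LUTs needed for one single-bit trace buffer that is `depth_bits` deep."""
    if depth_bits <= 0 or bits_per_lut <= 0:
        raise ValueError("depth and bits per LUT must be positive")
    return math.ceil(depth_bits / bits_per_lut)


def traceable_signals(free_luts: int, depth_bits: int,
                      bits_per_lut: int = BITS_PER_LUT) -> int:
    """Single-bit signals that fit in a pool of free memory-capable LUTs."""
    return free_luts // luts_per_buffer(depth_bits, bits_per_lut)


pool = 1000  # hypothetical number of free SLICEM LUTs after placement
for depth in (32, 64, 128, 256):
    print(f"{depth:4d}-bit buffers: {luts_per_buffer(depth)} LUT(s) each, "
          f"{traceable_signals(pool, depth)} signals")
```

Under these assumptions, going from 32-bit to 256-bit buffers cuts the number of traceable signals by 8×, which is why preallocating extra memory-capable LUTs matters when buffers are lengthened.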



#### Process

*Figure: DIME debug process diagram (labels: BSCAN).*

#### **Experiments**


#### **DIME Debug - Research Questions**

1. How many user signals can I access?

2. How big are the trace buffers?

3. What is the impact to the user circuit (timing)?

#### Pinterest FPGA:



#### Can we lengthen DIME trace buffers?

*Figure (LC3): legend: original design, 8 ns; preallocated design, 7.0 ns.*

#### Real FPGA:



Where will the embedded logic analyzer fit?

#### **Experiments**

Will preallocating resources improve the distributed-memory debug process?

- Timing?
- Debug bits?