Capella: Customizing Perception for Edge Devices by Efficiently Allocating FPGAs to DNNs
Younmin Bae, Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Hyesoon Kim
HPArch Lab
The GitHub repo is available here.
Implementation of a split, distributed CNN (ResNet18-V1) deployed to 2 PYNQ FPGA boards using TVM/VTA, compared with the original ResNet18-V1 deployed on 1 PYNQ board.
Motivation
Deep neural networks (DNNs) are hard to deploy on edge devices because they often require more resources than a single edge device provides.
The idea of this project is to create an edge-tailored model by splitting a DNN into independent narrow DNNs to run separately on multiple edge devices in parallel.
The outputs from the split networks are then concatenated and fed through the fully connected layers to perform inference. The picture below shows the technique on an example DNN (VGG-S); our implementation targets ResNet18-V1.
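As a minimal sketch of this pattern (with made-up layer sizes, not the project's actual `splitnet.py` code), each narrow branch in MXNet Gluon processes the input independently and the branch outputs are concatenated before the fully connected head:

```python
# Minimal sketch of the split pattern in MXNet Gluon; layer sizes are
# illustrative and do not match the project's ResNet18-V1 split.
import mxnet as mx
from mxnet.gluon import nn

class SplitCNN(nn.HybridBlock):
    def __init__(self, num_branches=2, num_classes=1000, **kwargs):
        super(SplitCNN, self).__init__(**kwargs)
        with self.name_scope():
            self.branches = nn.HybridSequential()
            for _ in range(num_branches):
                # Each branch is a narrow convolutional feature extractor
                # that could run on its own edge device.
                branch = nn.HybridSequential()
                branch.add(nn.Conv2D(32, kernel_size=3, padding=1, activation="relu"),
                           nn.MaxPool2D(pool_size=2),
                           nn.Conv2D(64, kernel_size=3, padding=1, activation="relu"),
                           nn.GlobalAvgPool2D())
                self.branches.add(branch)
            # The fully connected head runs on the concatenated branch outputs.
            self.head = nn.Dense(num_classes)

    def hybrid_forward(self, F, x):
        feats = [branch(x) for branch in self.branches]
        return self.head(F.concat(*feats, dim=1))

net = SplitCNN()
net.initialize()
out = net(mx.nd.random.uniform(shape=(1, 3, 224, 224)))  # shape (1, 1000)
```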
Code Description
- `splitnet.py` contains the split models, built with MXNet Gluon. Only `resnet18_v1_split` is implemented so far; it returns a split version of `mxnet.gluon.model_zoo.vision.resnet18_v1`, initialized with random weights (a hedged usage sketch follows after this list).
- `demo.py` demonstrates how to deploy the split networks to 2 PYNQ FPGA boards with TVM/VTA and how to concatenate the results.
- `autotune.py` uses TVM's autotuning tool to achieve fast performance when running `resnet18_v1_split` on a PYNQ FPGA. Currently broken.
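As a rough usage sketch (the exact signature and return value of `resnet18_v1_split` are defined in `splitnet.py`; a no-argument call is assumed here), the split model can be instantiated next to the stock model it mirrors:

```python
# Hypothetical usage sketch; consult splitnet.py for the actual signature
# and return value of resnet18_v1_split.
from mxnet.gluon.model_zoo.vision import resnet18_v1
from splitnet import resnet18_v1_split

baseline = resnet18_v1(pretrained=False)  # the original, non-split ResNet18-V1
split_model = resnet18_v1_split()         # split version, initialized with random weights
```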
Setup
PYNQ Boards
To deploy the split networks, first acquire 2 PYNQ boards and set them up following the instructions here.
After the PYNQ boards are set up, follow the instructions here to launch TVM-based RPC servers on both boards. You should see the following output when starting the RPC server:
INFO:root:RPCServer: bind to 0.0.0.0:9091
The RPC server should be listening on port `9091`.
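Once both servers are up, you can optionally check connectivity from your local machine with TVM's RPC client (the board IP addresses below are placeholders for your setup):

```python
# Optional connectivity check from the local machine.
# Replace the IP addresses with those of your PYNQ boards.
from tvm import rpc

boards = [("192.168.2.99", 9091), ("192.168.2.100", 9091)]
for host, port in boards:
    remote = rpc.connect(host, port)  # raises an error if the server is unreachable
    print("Connected to RPC server at %s:%d" % (host, port))
```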
Local
The following instructions apply to your local machine. CNN models are developed, compiled, and uploaded to the PYNQ boards from your local machine via RPC.
First, install TVM with LLVM enabled. Follow the instructions here.
Install the necessary python dependencies:
pip3 install --user numpy decorator attrs
Next, you need to add a configuration file for VTA:
cd <tvm root>
cp vta/config/pynq_sample.json vta/config/vta_config.json
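To confirm that TVM picks up the PYNQ configuration, a quick check (assuming a standard VTA installation) is:

```python
# Optional: verify that the VTA configuration is found and targets the PYNQ board.
import vta

env = vta.get_env()
print(env.TARGET)  # should print "pynq" with the sample config
```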
When the TVM compiler compiles the convolutional operators in a neural network, it queries a log file to get the best knob parameters to achieve fast performance. Normally, for a particular network, this log file is generated using TVM's autotuning tool (`autotune.py`). However, since this tool seems to be broken, the log file for `resnet18_v1_split` was created manually.
Move this log file to where the compiler can find it:
cd <project root>
cp vta_v0.05.log ~/.tvm/tophub/vta_v0.05.log
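For reference, this is roughly how a tuning log is consumed at compile time: compilation is wrapped in an `autotvm.apply_history_best` context so that recorded schedules are preferred over the defaults. The sketch below uses a stock Gluon ResNet and a plain CPU target for brevity; `demo.py` handles the actual split model and the VTA-specific steps.

```python
# Sketch of applying a tuning log while compiling a model with TVM/Relay.
# Uses a stock Gluon ResNet and a CPU target for brevity; the VTA flow in
# demo.py involves additional, VTA-specific steps.
import os
import mxnet as mx
from tvm import autotvm, relay

log_file = os.path.expanduser("~/.tvm/tophub/vta_v0.05.log")

# Import a Gluon model into Relay.
model = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=False)
model.initialize()
mod, params = relay.frontend.from_mxnet(model, shape={"data": (1, 3, 224, 224)})

# While this context is active, schedules recorded in the log are used
# in place of the defaults when operators are compiled.
with autotvm.apply_history_best(log_file):
    compiled = relay.build(mod, target="llvm", params=params)
```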
Usage
After setup is complete on both the PYNQ boards and the host, you are ready to deploy the split networks. `demo.py` is a minimal example that shows how to do this.
First, install additional Python dependencies:
pip3 install --user mxnet pillow
Then run the demo:
python3 demo.py [--cpu] [--nonsplit] [--i]
Options:
- `--cpu`: Run the model on the local machine instead of the PYNQ boards.
- `--nonsplit`: Run the non-split version of the model.
- `--i`: Run the interactive version of the demo. This allows you to enter paths to image files to feed to the model.
By default, the demo downloads 50 images of animals from Google Images, feeds them to the model, and reports the mean and standard deviation (in sec) of the inference delays.
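The reported statistics are just per-image latencies aggregated with NumPy, conceptually like the sketch below (not the exact code in `demo.py`; `run_inference` is a stand-in for the deployed-model call):

```python
# Conceptual sketch of the latency reporting; run_inference is a stand-in
# for the actual remote model invocation in demo.py.
import time
import numpy as np

def run_inference(image):
    time.sleep(0.05)  # placeholder for a real forward pass on the boards

images = [np.zeros((1, 3, 224, 224), dtype="float32") for _ in range(50)]

latencies = []
for image in images:
    start = time.time()
    run_inference(image)
    latencies.append(time.time() - start)

print("mean %.3f s, std %.3f s" % (np.mean(latencies), np.std(latencies)))
```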