Espresso

Speeding up and debittering Caffe by adding Halide

Project checkpoint (April 20, 2015)

What we've accomplished so far

The concerns we raised in the project proposal expressed skepticism about whether Halide, a domain-specific language designed for image processing, could evaluate neural networks at all. The first phase of our project was therefore to scope out how feasible this task was.

So far, we have written starter code in Halide/C++ that evaluates a simple feed-forward neural network without any specialized scheduling directives. The network is currently a toy used to explore the language and is not powerful enough to solve any classical machine learning problems. However, based on what we have learned about Halide's capabilities, we expect to be able to implement a convolutional network capable of image recognition with no modifications to the Halide compiler.
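
As a concrete illustration (a minimal sketch rather than our actual starter code), a single fully connected layer with a ReLU can be written as unscheduled Halide Funcs roughly as follows. The sizes and buffer contents are placeholders, and the Buffer/realize calls follow the current Halide API, which varies slightly between releases:

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    const int in_size = 784, out_size = 100;

    // Placeholder parameter buffers; a real network would load trained
    // weights and an actual input vector here.
    Buffer<float> W(in_size, out_size);  // weights
    Buffer<float> b(out_size);           // biases
    Buffer<float> x(in_size);            // input activations

    Var j("j");
    RDom r(0, in_size);

    // One fully connected layer followed by a ReLU, expressed as pure
    // Funcs with no explicit schedule.
    Func layer("layer"), relu("relu");
    layer(j) = b(j) + sum(W(r, j) * x(r));
    relu(j) = max(layer(j), 0.0f);

    // Deeper feed-forward networks chain Funcs the same way and realize
    // only the final stage.
    Buffer<float> out = relu.realize({out_size});
    return 0;
}
```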

Recalibrating our goals and deliverables

We also expect training neural networks to be possible, but the tools provided by Halide may not be enough to do so efficiently. Our main source of friction is that training is an iterative process that modifies large amounts of memory on each step, whereas Halide appears designed to efficiently take an input image and produce a single output image. This limitation can be overcome by writing host-side code that repeatedly copies buffers between iterations, but it is unclear how much slower this approach is than the optimum.
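
To make that workaround concrete, the sketch below shows the kind of host-side loop we have in mind: Halide computes the gradients for one pass, and ordinary C++ applies the weight update and rebinds the buffers for the next iteration. The names (`gradients`, `weights_in`) and the plain SGD update are hypothetical placeholders, not code we have written:

```cpp
#include "Halide.h"
using namespace Halide;

// `gradients` is assumed to be a Func computing dL/dW from the weights
// bound to `weights_in`; building that pipeline is omitted here.
void sgd_loop(Func gradients, ImageParam &weights_in,
              Buffer<float> &weights, int n_iters, float lr) {
    for (int i = 0; i < n_iters; i++) {
        // Bind the current weights and re-realize the pipeline.
        weights_in.set(weights);
        Buffer<float> grad =
            gradients.realize({weights.width(), weights.height()});

        // Host-side update: this per-iteration buffer traffic is the part
        // we suspect will be slower than a fully fused pipeline.
        for (int y = 0; y < weights.height(); y++) {
            for (int x = 0; x < weights.width(); x++) {
                weights(x, y) -= lr * grad(x, y);
            }
        }
    }
}
```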

One concern expressed by the course staff was that implementing Caffe-like behavior would be a "bonus" rather than a strictly necessary part of our project. We nevertheless remain interested in this feature because it would be useful for setting up, debugging, and optimizing real-world neural networks.

We have completed one of our "plan to achieve" goals: implementing evaluation of a hard-coded neural network. Our stated goals remain feasible (or, in the case of modifying Halide, unnecessary). However, we have decided to prioritize some goals over others in the interest of the Parallelism Competition. Those goals are restated here:

Plan to achieve

  1. [x] Explore Halide's capabilities by implementing evaluation of a simple hard-coded neural network. If necessary, extend Halide with the appropriate functionality to enable neural network evaluation.
  2. [ ] Implement evaluation of convolutional neural networks. If necessary, extend Halide with functionality to enable convolutional neural networks.
  3. [ ] Implement evaluation of neural networks constructed from parsing Caffe configuration files.

Hope to achieve

  1. [ ] Optimize the training and evaluation of neural networks to be competitive with Caffe.
  2. [ ] Implement training of a simple hard-coded neural network. If necessary, extend Halide with functionality to enable neural network training.
  3. [ ] Implement training of convolutional neural networks.
  4. [ ] Create a demo of real-time network evaluation.

Stretch goals

  1. [ ] Beat Caffe in a benchmark.
  2. [ ] Improve the debugging experience in Halide.
  3. [ ] Implement additional layer types, such as batch normalization and parametric ReLUs.

Deliverables

  1. Presentation explaining the components of our system, placing emphasis on ease of use.
  2. Graphs comparing the performance of our implementation against Caffe.
  3. A small demo of real-time network evaluation using our project.

Detailed schedule

Week 2-tail (2015 Apr 16 - 2015 Apr 19)

Week 3-head (2015 Apr 20 - 2015 Apr 22)

Week 3-tail (2015 Apr 23 - 2015 Apr 26)

Week 4-head (2015 Apr 27 - 2015 Apr 29)

Week 4-tail (2015 Apr 30 - 2015 May 3)

Week 5-head (2015 May 4 - 2015 May 6)

Week 5-tail (2015 May 7 - 2015 May 10)