Hardware Accelerators for Machine Learning and AI

Machine learning (ML) and AI technologies have revolutionized the ways in which we interact with large-scale, imperfect, real-world data. We can cast these problems as high-dimensional optimizations; we can manage the inherent uncertainties via the mechanics of probability; and we can search for answers to complex questions across a range of vital applications. What we cannot do is solve these problems quickly, efficiently, or in low-power form factors on standard computing architectures. This has motivated significant work in custom accelerator architectures.

We’ve worked extensively in three areas of ML/AI acceleration:
  • Speech recognition in silicon: Our In Silico Vox project was the first to demonstrate how to migrate a complete, speaker-independent, large-vocabulary recognizer from its software-only form into a custom hardware form factor. We designed high-speed FPGA and virtual-silicon prototypes, and we were featured in The Economist as we developed this technology. Voci Technologies, which delivered ultra-fast, high-accuracy, enterprise-scale voice analytics solutions, was acquired by Medallia in 2020.
  • ML inference hardware for probabilistic graphical models (PGMs): In these graphs, labels on nodes encode what we know and “how much” we believe it; edges encode belief relationships among labels; and statistical inference answers questions such as “if we observe some of the labels in the graph, what are the most likely labels for the remainder?” These problems are interesting because they can be very large (e.g., every pixel in an image is a node) and because we need answers very fast (e.g., at video frame rates). We have demonstrated a range of algorithms (belief propagation, graph cuts, MCMC) running real-world tasks on both FPGAs and custom silicon; a small software sketch of the core sampling idea appears after this list.
  • Privacy-preserving learning: As ML technologies such as deep neural nets proliferate, there is growing concern about how to secure the personal data (audio, video, text, etc.) used to train and to execute recognition and classification tasks on these architectures. We’ve recently been focusing on Homomorphic Encryption (HE) technologies, which are promising in terms of security but vastly too slow in software. We have been demonstrating a range of novel custom building blocks for hardware HE; a sketch of one such kernel, the number-theoretic transform, also appears after this list.
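
As a concrete illustration of the kind of inference our PGM accelerators target, the Python sketch below runs Gibbs sampling over a small grid-structured Markov random field: each pixel holds a discrete label, unary costs encode how well each label explains the observed data, and a pairwise smoothness term encodes the belief that neighboring labels should agree. This is a minimal software sketch with made-up parameters (the grid size, label count, and smoothness weight are illustrative only), not a description of any particular hardware design; accelerators gain their speed by resampling many non-adjacent pixels in parallel rather than sweeping the grid serially as shown here.

# Minimal software sketch of Gibbs-sampling inference on a grid-structured
# Markov random field. Each pixel is a node with a discrete label; unary costs
# encode observations, pairwise costs encode the belief that neighbors agree.
import numpy as np

rng = np.random.default_rng(0)

H, W, LABELS = 16, 16, 4           # tiny image, 4 candidate labels per pixel
SMOOTHNESS = 1.5                    # weight of the pairwise agreement term

# Synthetic unary costs: lower cost = label better explains the observed pixel.
unary = rng.random((H, W, LABELS))

labels = rng.integers(0, LABELS, size=(H, W))

def neighbor_labels(y, x):
    """Labels of the 4-connected neighbors of pixel (y, x)."""
    out = []
    if y > 0:     out.append(labels[y - 1, x])
    if y < H - 1: out.append(labels[y + 1, x])
    if x > 0:     out.append(labels[y, x - 1])
    if x < W - 1: out.append(labels[y, x + 1])
    return out

def resample_pixel(y, x):
    """Draw a new label for (y, x) from its conditional distribution."""
    nbrs = neighbor_labels(y, x)
    energy = np.array([
        unary[y, x, l] + SMOOTHNESS * sum(l != n for n in nbrs)
        for l in range(LABELS)
    ])
    prob = np.exp(-(energy - energy.min()))
    prob /= prob.sum()
    labels[y, x] = rng.choice(LABELS, p=prob)

# Sweep the grid repeatedly; a hardware accelerator would instead update many
# non-adjacent pixels at once ("chromatic" parallel Gibbs sampling).
for sweep in range(50):
    for y in range(H):
        for x in range(W):
            resample_pixel(y, x)

print("final labeling:\n", labels)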
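
Similarly, for the homomorphic encryption work, the sketch below shows the number-theoretic transform (NTT), the kernel behind the modular-multiplier and NTT hardware described in the papers listed below. It uses toy parameters (modulus 17, transform length 8) chosen only so the arithmetic is easy to follow; real HE schemes use word-sized prime moduli and transform lengths in the tens of thousands, which is exactly why software is too slow and pipelined hardware pays off.

# Minimal software sketch (toy parameters, not a real HE parameter set) of the
# number-theoretic transform (NTT), the polynomial-multiplication kernel at the
# heart of RNS-based homomorphic encryption schemes.

Q = 17            # toy prime modulus; Q - 1 is divisible by N
N = 8             # transform length (real HE uses N in the tens of thousands)
W = 9             # primitive N-th root of unity mod Q (9**8 % 17 == 1)
W_INV = 2         # modular inverse of W
N_INV = 15        # modular inverse of N

def ntt(a, root):
    """Naive O(N^2) NTT: evaluate polynomial a at successive powers of root."""
    return [sum(a[j] * pow(root, i * j, Q) for j in range(N)) % Q
            for i in range(N)]

def intt(A):
    """Inverse NTT: NTT with the inverse root, then scale by 1/N mod Q."""
    a = ntt(A, W_INV)
    return [(x * N_INV) % Q for x in a]

def poly_mul(a, b):
    """Cyclic convolution of a and b mod (x^N - 1, Q) via pointwise products."""
    A, B = ntt(a, W), ntt(b, W)
    return intt([(x * y) % Q for x, y in zip(A, B)])

a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(poly_mul(a, b))   # coefficients of a*b reduced mod x^8 - 1 and mod 17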
     

Key Papers

Homomorphic Encryption for Privacy-preserving Learning

  1. Sunwoong Kim, Keewoo Lee, Wonhee Cho, Yujin Nam, Jung Hee Cheon, and Rob A. Rutenbar, “Hardware Architecture of a Number Theoretic Transform for a Bootstrappable RNS-based Homomorphic Encryption Scheme,” in Proc. IEEE Int’l Symposium on Field-Programmable Custom Computing Machines (FCCM), July 2020.
  2. Sunwoong Kim, Rob A. Rutenbar, et al., “FPGA-based Accelerators for Fully Pipelined Modular Multipliers for Homomorphic Encryption,” in Proc. 2019 Int’l Conference on ReConfigurable Computing and FPGAs, December 2019, Cancun, Mexico.
  3. Sunwoong Kim, Rob A. Rutenbar, “Area-Efficient Iterative Single-Precision Floating-Point Multiplier Architecture for an FPGA,” in Proc. Great Lakes Symposium on VLSI, May 2019.

 

ML Inference Hardware 

  1. Glenn G. Ko, Yuji Chai, Marco Donato, Paul N. Whatmough, Thierry Tambe, Rob A. Rutenbar, David Brooks, Gu-Yeon Wei, “A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm,” Proc. IEEE 2020 Symposium on VLSI Technology, June 2020.
  2. Glenn G. Ko, Yuji Chai, Rob A. Rutenbar, David Brooks and Gu-Yeon Wei, “Accelerating Bayesian Inference for Structured Graphs Using Parallel Gibbs Architecture,” in Proc. Int’l Conference on Field Programmable Logic & Applications (FPL), September 2019.
  3. Tianqi Gao, R.A. Rutenbar, “A Virtual Image Accelerator for Graph Cuts Inference on FPGA,” Abstract in Proc. 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP Poster), July 2019.
  4. Glenn G. Ko, Yuji Chai, Rob A. Rutenbar, David Brooks and Gu-Yeon Wei, “FlexGibbs: Reconfigurable Parallel Gibbs Sampling Accelerator for Structured Graphs,” Abstract in Proc. IEEE Int’l Symposium on Field-Programmable Custom Computing Machines (FCCM poster), May 2019.
  5. Tianqi Gao, Rob A. Rutenbar, “A Pixel-Parallel Virtual-Image Architecture for High Performance and Power Efficient Graph Cuts Inference,” Abstract in Proc. 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA poster), Feb. 2019.
  6. Sunwoong Kim, Rob A. Rutenbar, “Accelerator Design with Effective Resource Utilization for Binary Convolutional Neural Networks on an FPGA,” Abstract in Proc. IEEE Int’l Symposium on Field-Programmable Custom Computing Machines (FCCM poster), May 2018.
  7. Glenn G. Ko and Rob A. Rutenbar, “Real-Time and Low-Power Streaming Source Separation using Markov Random Field,” IEEE Journal of Emerging and Selected Topics in Circuits and Systems (JETCAS), Special Issue on Frontiers of Hardware and Algorithms for On-chip Learning, Vol. 14, Issue 2, May 2018.
  8. Tianqi Gao, Jungwook Choi, Shang-nien Tsai and Rob A. Rutenbar, “Toward a Pixel-Parallel Architecture for Graph Cuts Inference on FPGA,” in Proc. 2017 International Conference on Field Programmable Logic and Applications (FPL), Sept. 2017.
  9. Glenn Ko and Rob A. Rutenbar, “A Case Study of Machine Learning Hardware: Real-Time Source Separation using Markov Random Fields via Sampling-based Inference,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2017.
  10. J. Choi and R.A. Rutenbar, “Configurable and Scalable Belief Propagation Accelerator for Computer Vision,” Int’l Conference on Field Programmable Logic and Applications (FPL’16), September 2016.
  11. J. Choi, A. D. Patil, R.A. Rutenbar, N. R. Shanbhag, “Analysis Of Error Resiliency Of Belief Propagation In Computer Vision,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2016.
  12. Eric P. Kim, Jungwook Choi, Naresh R. Shanbhag, and Rob A. Rutenbar, “Error Resilient and Energy Efficient MRF Message Passing Based Stereo Matching,” IEEE Transactions on VLSI Systems (TVLSI), Vol. 24, No. 3, March 2016.
  13. Jungwook Choi and Rob A. Rutenbar, “Video-Rate Stereo Matching Using Markov Random Field TRW-S Inference on a Hybrid CPU+FPGA Computing Platform,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 26, no. 2, February 2016.
  14. Skand Hurkat, Jungwook Choi, Eriko Nurvitadhi, Jose F. Martinez, Rob A. Rutenbar, “A Fast Hierarchical Implementation of Sequential Tree-Reweighted Belief Propagation for Probabilistic Inference,” in Proc. 25th International Conference on Field Programmable Logic and Applications (FPL’15), London, England, Sept. 2015.
  15. Eric P. Kim, Jungwook Choi, Naresh R. Shanbhag, and Rob A. Rutenbar, “A Robust Message Passing Based Stereo Matching Kernel via System-Level Error Resiliency,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2014.
  16. Jungwook Choi and Rob A. Rutenbar, “FPGA Acceleration of Markov Random Field TRW-S Inference for Stereo Matching,” in Proc. 2013 Eleventh IEEE/ACM Int’l Conference on Formal Methods and Models for Codesign (MEMOCODE), October 2013.
  17. Jungwook Choi, Eric P. Kim, Rob A. Rutenbar, Naresh Shanbhag, “Error Resilient MRF Message Passing Architecture for Stereo Matching,” in Proc. 2013 IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, October 2013. (Best Student Paper Award.)
  18. Jungwook Choi and Rob A. Rutenbar, “Video-Rate Stereo Matching Using Markov Random Field TRW-S Inference on a Hybrid CPU+FPGA Computing Platform,” in Proc. ACM Int’l Symposium on FPGAs (ISFPGA), February 2013.
  19. Minje Kim, Paris Smaragdis, Glenn G. Ko, and Rob A. Rutenbar, “Stereophonic Spectrogram Segmentation Using Markov Random Fields,” in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Santander, Spain, September 2012.
  20. Jungwook Choi and Rob A. Rutenbar, “Hardware Implementation of MRF MAP Inference on an FPGA Platform,” Proc. 22nd Intl Conference on Field Programmable Logic and Applications (FPL), August 2012.

 

Speech Recognition in Silicon

  1. Patrick Bourke, Kai Yu and Rob A. Rutenbar, “Mobile Speech Hardware: The Case for Custom Silicon,” Chapter 2 in Speech in Mobile and Pervasive Environments, Nitendra Rajput and Amit Anil Nanavati, Eds., Wiley, February 2012, pp. 7-56. ISBN: 0470694351.
  2. Jeffrey R. Johnston, Rob A. Rutenbar, “A High-Rate, Low-Power, ASIC Speech Decoder Using Finite State Transducers,”  in Proc. 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP’2012, Delft, The Netherlands, July 2012.
  3. E.C. Lin and R.A. Rutenbar, “A Multi-FPGA 10x-Real-Time High-Speed Search Engine for a 5000-Word Vocabulary Speech Recognizer,” Proc. 2009 ACM International Symposium on FPGAs (ISFPGA), February 2009. 
  4. P. Bourke and R.A. Rutenbar, “A Low-Power Hardware Search Architecture for Speech Recognition,” Proc. Interspeech 2008, October 2008.
  5. K. Yu, R.A. Rutenbar, “Generating Small, Accurate Acoustic Models with a Modified Bayesian Information Criterion,” Proc. Interspeech 2007, August 2007.
  6. Edward C. Lin, Kai Yu, Rob A. Rutenbar and Tsuhan Chen, “A 1000-Word Vocabulary, Speaker-Independent, Continuous Live-Mode Speech Recognizer Implemented in a Single FPGA,” Proc. ACM International Symposium on FPGAs, Feb. 2007.
  7. Edward C. Lin, Kai Yu, Rob A. Rutenbar and Tsuhan Chen, “Moving Speech Recognition from Software to Silicon: the In Silico Vox Project,” Proc. International Conference on Spoken Language Processing (Interspeech 2006), September 2006.