Toshiba Memory Corporation Develops High-Speed and High-Energy-Efficiency Algorithm and Hardware Architecture for Deep Learning Processor

  • November 6, 2018
  • Toshiba Memory Corporation

TOKYO - Toshiba Memory Corporation, the world leader in memory solutions, today announced the development of a high-speed, highly energy-efficient algorithm and hardware architecture for deep learning processing with less degradation of recognition accuracy. The new deep learning processor, implemented on an FPGA[1], achieves four times the energy efficiency of conventional ones. The advance was announced at the IEEE Asian Solid-State Circuits Conference 2018 (A-SSCC 2018) in Taiwan on November 6.

Deep learning calculations generally require large numbers of multiply-accumulate (MAC) operations, which results in long calculation times and large energy consumption. Techniques that reduce the number of bits used to represent parameters (bit precision) have been proposed to reduce the total amount of calculation, and one proposed algorithm reduces the bit precision to one or two bits, but these techniques degrade recognition accuracy. Toshiba Memory developed a new algorithm that reduces MAC operations by optimizing the bit precision of MAC operations for individual filters[2] in each layer of a neural network. With the new algorithm, MAC operations can be reduced with less degradation of recognition accuracy.
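
The idea of assigning a bit precision per filter can be illustrated with a minimal Python sketch. It is not Toshiba Memory's algorithm, whose optimization criterion is not described here. The sketch assumes uniform symmetric quantization and a simple rule: each filter gets the smallest candidate bit width whose relative quantization error stays under a tolerance. The function names, the candidate widths, and the tolerance are all hypothetical.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization of floats to `bits` bits (sign included, bits >= 2)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    levels = 2 ** (bits - 1) - 1          # e.g. 127 levels per sign for 8 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def relative_error(weights, approx):
    """Root of (squared quantization error / squared weight norm)."""
    num = sum((w - a) ** 2 for w, a in zip(weights, approx))
    den = sum(w ** 2 for w in weights)
    return (num / den) ** 0.5 if den else 0.0


def choose_bits(weights, tolerance=0.05, candidates=(2, 3, 4, 6, 8)):
    """Assumed selection rule: smallest candidate width within the error tolerance."""
    for bits in candidates:
        if relative_error(weights, quantize(weights, bits)) <= tolerance:
            return bits
    return candidates[-1]                 # fall back to the widest candidate


def mac_bit_cost(filters, bits_per_filter):
    """Rough cost proxy: one unit per weight per bit of precision."""
    return sum(len(f) * b for f, b in zip(filters, bits_per_filter))


# Two toy "filters": one is exactly representable at 2 bits, one is not.
filters = [[1.0, -1.0, 1.0, -1.0], [0.9, 0.13, -0.47, 0.31]]
per_filter_bits = [choose_bits(f) for f in filters]
mixed_cost = mac_bit_cost(filters, per_filter_bits)
uniform_cost = mac_bit_cost(filters, [8] * len(filters))
print(per_filter_bits, mixed_cost, uniform_cost)
```

In this toy setting the first filter needs far fewer bits than the second, so the per-filter assignment costs less than giving every filter the same 8-bit precision.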

Furthermore, Toshiba Memory developed a new hardware architecture, called the bit-parallel method, which is suitable for MAC operations with different bit precisions. This method divides each of the various bit precisions into individual bits and can execute the resulting 1-bit operations in numerous MAC units in parallel. It significantly improves the utilization efficiency of the MAC units in the processor compared to conventional MAC architectures that execute in series.
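
The arithmetic behind splitting a multi-bit MAC into 1-bit operations can be sketched in Python as follows. This is only an illustration of the decomposition, not the actual hardware design. It assumes unsigned integer weights, and the function names are hypothetical. Each pass over bit position k corresponds to work that independent 1-bit units could perform in parallel. The partial sums are then combined by shifting.

```python
def dot_reference(activations, weights):
    """Ordinary multiply-accumulate, used here only as a reference result."""
    return sum(a * w for a, w in zip(activations, weights))


def dot_bitwise(activations, weights, bits):
    """Dot product built from 1-bit operations on unsigned `bits`-bit weights.

    For each bit position k, a 1-bit MAC adds the activation when bit k of
    the weight is set. The per-bit partial sums are independent of one
    another, so they could be computed in parallel and combined afterwards.
    """
    total = 0
    for k in range(bits):
        partial = sum(a for a, w in zip(activations, weights) if (w >> k) & 1)
        total += partial << k             # weight the partial sum by 2**k
    return total


activations = [3, 1, 4, 1, 5]
weights = [2, 7, 1, 8, 2]                 # all fit in 4 unsigned bits
assert dot_bitwise(activations, weights, bits=4) == dot_reference(activations, weights)
```

Because a weight with b bits simply occupies b one-bit operations, filters with different precisions can share the same pool of 1-bit units, which is one plausible reading of why utilization improves.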

Toshiba Memory implemented ResNet50[3], a deep neural network, on an FPGA using the variable bit precision and the bit-parallel MAC architecture. For image recognition on the ImageNet[4] image dataset, the results show that both the operation time and the energy consumption for recognizing image data are reduced to 25% of those of the conventional method, with less degradation of recognition accuracy.

Artificial intelligence (AI) is forecast to be implemented in a wide variety of devices. The newly developed high-speed, low-energy-consumption techniques for deep learning processors are expected to be used in edge devices such as smartphones and HMDs[5], as well as in data centers, all of which require low energy consumption. High-performance processors such as GPUs are important devices for high-speed AI operation. Memory and storage are also among the most important devices for AI, which inevitably uses big data. Toshiba Memory Corporation continues to focus on research and development of AI technologies, as well as on innovating memory and storage, to lead data-oriented computing.

[1] FPGA: Field Programmable Gate Array, an integrated circuit designed to be configured by a customer or a designer after manufacturing.
[2] Filter: Generally, one layer of a neural network contains many filters, up to several thousand.
[3] ResNet50: A deep neural network generally used as a benchmark for deep learning in image recognition.
[4] ImageNet: A large image database generally used as a benchmark for image recognition. It contains more than 14,000,000 images.
[5] HMD: Head Mounted Display