Abstract
The use of lower precision has emerged as a popular technique to optimize the compute and storage requirements of complex deep neural networks (DNNs). In the quest for lower precision, recent studies have shown that ternary DNNs (which represent weights and activations by signed ternary values) represent a promising sweet spot, achieving accuracy close to full-precision networks on complex tasks. We propose TiM-DNN, a programmable in-memory accelerator that is specifically designed to execute ternary DNNs. TiM-DNN supports various ternary representations including unweighted {-1, 0, 1}, symmetric weighted {-a, 0, a}, and asymmetric weighted {-a, 0, b} ternary systems. The building blocks of TiM-DNN are TiM tiles: specialized memory arrays that perform massively parallel signed ternary vector-matrix multiplications in a single access. TiM tiles are in turn composed of ternary processing cells (TPCs), bit-cells that function as both ternary storage units and signed ternary multiplication units. We evaluate an implementation of TiM-DNN in 32-nm technology using an architectural simulator calibrated with SPICE simulations and RTL synthesis. We evaluate TiM-DNN across a suite of state-of-the-art DNN benchmarks including both deep convolutional and recurrent neural networks. A 32-tile instance of TiM-DNN achieves a peak performance of 114 TOPs/s, consumes 0.9-W power, and occupies 1.96-mm2 chip area, representing a 300× and 388× improvement in TOPS/W and TOPS/mm2, respectively, compared to an NVIDIA Tesla V100 GPU. In comparison to specialized DNN accelerators, TiM-DNN achieves 55×-240× and 160×-291× improvement in TOPS/W and TOPS/mm2, respectively. Finally, when compared to a well-optimized near-memory accelerator for ternary DNNs, TiM-DNN demonstrates 3.9×-4.7× improvement in system-level energy and 3.2×-4.2× speedup, underscoring the potential of in-memory computing for ternary DNNs.
Author Keywords
- AI hardware
- in-memory computing
- low-precision deep neural networks (DNNs)
- ternary dot-products
- ternary neural networks
IEEE Keywords
- Computational modeling
- Nonvolatile memory
- Encoding
- Tiles
- Very large scale integration
- Task analysis
- Performance evaluation
Introduction
Deep neural networks (DNNs) have drastically advanced the field of machine learning by enabling super-human accuracies for many cognitive tasks involved in image, video, and natural language processing [1]. However, the high computation and storage costs of DNNs severely limit their deployment in energy- and cost-constrained devices [2]. The use of lower precision to represent the weights and activations in DNNs is a promising technique for improving the efficiency of DNN inference (evaluation of pretrained DNN models) [3]–[14]. Reduced bit-precision can lower all facets of energy consumption including computation, memory, and data transfer. Current commercial hardware [15], [16] already supports 8-bit and 4-bit fixed-point formats, while recent research has continued the push toward even lower precision [4]–[12]. Recent studies [4]–[12], [17] suggest that ternary DNNs present a particularly attractive sweet spot in the tradeoff between efficiency and accuracy. To illustrate this, Fig. 1 reports the accuracies of various state-of-the-art binary [4]–[6], ternary [7]–[12], and full-precision (FP32) DNNs for image classification (ImageNet [18]) and language modeling (PTB [19]). We observe that the accuracy degradation of binary DNNs over the FP32 networks can be considerable [5%–13% for image classification, 150–180 PPW (Perplexity Per Word) for language modeling]. In contrast, ternary DNNs achieve accuracy significantly better than binary networks, and result in minimal degradation (0.53% for image classification) compared to FP32 networks. Motivated by these results, we focus on the design of a programmable accelerator for realizing state-of-the-art ternary DNNs.
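To make the ternary representations concrete, here is a minimal NumPy sketch of how full-precision weights can be mapped to the unweighted {-1, 0, 1} and symmetric weighted {-a, 0, a} systems discussed in this article. The thresholding rule and the `threshold_ratio` parameter are illustrative assumptions following common ternary-weight-network practice; they are not the exact quantization procedures used by the networks in Fig. 1.

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map full-precision weights to the unweighted ternary set {-1, 0, +1}.
    Weights with magnitude below a threshold are pruned to zero."""
    delta = threshold_ratio * np.mean(np.abs(weights))  # assumed thresholding rule
    t = np.zeros_like(weights)
    t[weights > delta] = 1.0
    t[weights < -delta] = -1.0
    return t

def ternarize_symmetric(weights, threshold_ratio=0.7):
    """Symmetric weighted ternary {-a, 0, +a}: scale the nonzero levels by a
    single constant a, here taken as the mean magnitude of the retained weights."""
    t = ternarize(weights, threshold_ratio)
    retained = np.abs(weights[t != 0])
    a = retained.mean() if retained.size else 0.0
    return a * t

w = np.random.randn(4, 4).astype(np.float32)
print(ternarize(w))            # entries in {-1, 0, 1}
print(ternarize_symmetric(w))  # entries in {-a, 0, a}
```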
The multiply-and-accumulate (MAC) operation represents 95%–99% of total DNN computations. Consequently, the energy and time spent on DNN computations can be drastically reduced by using ternary processing elements (the energy of a MAC operation has a superlinear relationship with precision). However, when classical accelerator architectures (e.g., TPUs and GPUs) are adopted to realize ternary DNNs, the memory becomes the energy and performance bottleneck due to sequential (row-by-row) reads and leakage in un-accessed rows. In-memory computing [20]–[44] is an emerging computing paradigm that overcomes memory bottlenecks by integrating computations within the memory array itself, enabling greater parallelism and reducing the need to transfer data to/from memory. This work explores in-memory computing in the specific context of ternary DNNs and demonstrates that it leads to significant improvements in performance and energy efficiency.
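As a rough illustration of why MACs dominate DNN computation, the snippet below counts the multiply-accumulates of a single convolutional layer using the standard formula MACs = H_out × W_out × C_out × C_in × K × K. The layer dimensions are made-up examples, not figures from the paper.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count of one convolutional layer:
    each output element requires c_in * k * k MACs."""
    return h_out * w_out * c_out * c_in * k * k

# Hypothetical ResNet-style layer: 56x56 output maps, 64 -> 64 channels, 3x3 kernels.
print(conv_macs(56, 56, 64, 64, 3))  # 115,605,504 MACs for this one layer
```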
Although several efforts have explored in-memory accelerators in recent years, TiM-DNN differs in significant ways and is the first to apply in-memory computing (massively parallel vector-matrix multiplications within the memory array) to ternary DNNs using a new CMOS-based bit-cell. Prior efforts have explored SRAM-based in-memory accelerators for binary networks [26]–[31]. However, the restriction to binary networks is a significant limitation, as binary networks are known to incur a large drop in accuracy, as highlighted in Fig. 1. Many in-memory accelerators use nonvolatile memory (NVM) technologies such as PCM and ReRAM [20]–[25], [45] to realize in-memory dot-product operations. Although NVMs promise higher density and lower leakage than CMOS memories, they are still an emerging technology with open challenges such as large-scale manufacturing yield, limited endurance, high write energy, and errors due to device- and circuit-level nonidealities [46], [47]. Near-memory accelerators for ternary networks [10], [48] have also been proposed, but their performance and energy are limited by sequential (row-by-row) memory access. SRAMs augmented with in-memory binary computation and additional near-memory logic have been proposed to perform higher precision computations in a bit-serial manner [49]. However, such an approach suffers from similar bottlenecks, limiting efficiency.
We propose TiM-DNN, a programmable in-memory accelerator that can realize massively parallel signed ternary vector-matrix multiplications per array access. TiM-DNN supports various ternary representations including unweighted {−1, 0, 1}, symmetric weighted {−a, 0, a}, and asymmetric weighted {−a, 0, b} systems, enabling it to execute a broad range of state-of-the-art ternary DNNs. This is motivated by recent efforts [7]–[12] that show weighted ternary systems can achieve improved accuracies. The building block of TiM-DNN is a new memory cell called the ternary processing cell (TPC), which functions as both a ternary storage unit and a scalar ternary multiplication unit. Using TPCs, we design TiM tiles, which are specialized memory arrays that execute signed ternary dot-product operations. TiM-DNN comprises a plurality of TiM tiles arranged into banks, wherein all tiles compute signed vector-matrix multiplications in parallel. We develop an architectural simulator for TiM-DNN, with array-level timing and energy models obtained from circuit-level simulations in 32-nm CMOS technology. We evaluate TiM-DNN using a suite of 5 popular DNNs designed for image classification and language modeling tasks. A 32-tile instance of TiM-DNN achieves a peak performance of 114 TOPs/s, consumes 0.9-W power, and occupies 1.96-mm2 chip area, representing a 300× improvement in TOPS/W compared to a state-of-the-art NVIDIA Tesla V100 GPU [15]. In comparison to recent low-precision accelerators [48], [49], TiM-DNN achieves 55.2×-240× improvement in TOPS/W. Finally, TiM-DNN obtains 3.9×-4.7× improvement in system energy and 3.2×-4.2× improvement in performance over a highly optimized near-memory accelerator for ternary DNNs.
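The sketch below is a purely functional NumPy model of the signed ternary vector-matrix multiplication that a TiM tile realizes in one array access. It abstracts away all circuit details (TPC encoding, bit-lines, peripherals) and simply shows how the unweighted, symmetric weighted, and asymmetric weighted systems reduce to two scaled partial sums. The function name and the wp/wn parameterization are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def ternary_vecmat(x, W, wp=1.0, wn=1.0):
    """Functional model of one TiM-tile access: y = x . W for signed ternary operands.

    x  : activation vector with entries in {-1, 0, +1}
    W  : weight matrix with entries in {-1, 0, +1} (level indicators)
    wp, wn : magnitudes of the positive and negative weight levels, so stored
             weights represent {-wn, 0, +wp}.
             wp == wn == 1 -> unweighted {-1, 0, 1}
             wp == wn      -> symmetric weighted {-a, 0, a}
             wp != wn      -> asymmetric weighted {-a, 0, b}
    """
    assert set(np.unique(x)).issubset({-1, 0, 1})
    assert set(np.unique(W)).issubset({-1, 0, 1})
    # Split the ternary matrix into positive and negative parts and apply the
    # level magnitudes as two scaled partial sums.
    W_pos = (W == 1).astype(np.int64)
    W_neg = (W == -1).astype(np.int64)
    return wp * (x @ W_pos) - wn * (x @ W_neg)

x = np.array([1, -1, 0, 1])
W = np.array([[ 1, 0, -1],
              [ 0, 1,  1],
              [-1, 1,  0],
              [ 1, 0,  1]])
print(ternary_vecmat(x, W))                  # unweighted {-1, 0, 1}
print(ternary_vecmat(x, W, wp=0.5, wn=0.5))  # symmetric weighted {-a, 0, a}
print(ternary_vecmat(x, W, wp=0.5, wn=0.3))  # asymmetric weighted {-a, 0, b}
```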
Conclusion
Ternary DNNs are extremely promising due to their ability to achieve an accuracy similar to full-precision networks on complex machine learning tasks while enabling DNN inference at low energy. In this work, we presented TiM-DNN, an in-memory accelerator for executing state-of-the-art ternary DNNs. TiM-DNN is a programmable accelerator designed using TiM tiles, i.e., specialized memory arrays for realizing massively parallel signed vector-matrix multiplications with ternary values. TiM tiles are in turn composed using a new TPC that functions as both a ternary storage unit and a scalar multiplication unit. We evaluate an instance of TiM-DNN with 32 TiM tiles and demonstrate that it achieves significant energy and performance improvements over GPUs, current DNN accelerators, as well as a well-optimized near-memory ternary accelerator baseline.
About KSRA
The Kavian Scientific Research Association (KSRA) is a non-profit research organization founded in December 2013 to provide research and educational services. Its members initially formed a virtual group on the Viber social network, and the core of the Kavian Scientific Association was established with these members as founders. These individuals, led by Professor Siavosh Kaviani, decided to launch a scientific and research association with an emphasis on education.
The KSRA research association, as a non-profit research organization, is committed to providing research services in the field of knowledge. The main beneficiaries of this association are public and private knowledge-based companies, students, researchers, professors, universities, and industrial and semi-industrial centers around the world.
Our main services are based on education for all people across the world. We want to integrate research and education, because we believe education is a fundamental human right, so our services are concentrated on inclusive education.
The KSRA team partners with under-served communities around the world to improve access to and the quality of knowledge-based education, amplify and augment learning programs where they exist, and create new opportunities for e-learning where traditional education systems are lacking or non-existent.
Full Paper PDF file:
TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks
Bibliography
Author
Year
2020
Title
TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks
Published in
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 7, pp. 1567-1577, July 2020
DOI
10.1109/TVLSI.2020.2993045.
Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.
Professor Siavosh Kaviani was born in 1961 in Tehran. He has held a professorship and holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology, as well as an honorary Ph.D. from the University of Chelsea.
Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.