Hardware Architecture

New submissions

Submissions received from Fri 10 May 24 to Mon 13 May 24, announced Tue, 14 May 24

New submissions
Cross-lists
Replacements

[ total of 9 entries: 1-9 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Tue, 14 May 24

[1] arXiv:2405.06840 [pdf, other]: Title: MEIC: Re-thinking RTL Debug Automation using LLMs

Authors: Ke Xu, Jialin Sun, Yuchen Hu, Xinwei Fang, Weiwei Shan, Xi Wang, Zhe Jiang

Subjects: Hardware Architecture (cs.AR); Software Engineering (cs.SE)

The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that heavily rely on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome the limitation of LLMs in RTL code debugging, which is suitable for identifying and correcting both syntax and function errors, while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves fix rate of 93% for syntax errors and 78% for function errors, with up to 48x speedup in debugging processes when compared with experienced engineers. The Repo. of dataset and code: https://anonymous.4open.science/r/Verilog-Auto-Debug-6E7F/.
[2] arXiv:2405.07259 [pdf, other]: Title: CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool

Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

Comments: Available at this https URL Published in ISPASS 2024

Subjects: Hardware Architecture (cs.AR)

Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks (DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to perform low-energy, high-density computations. These benefits have inspired research across the CiM stack, but CiM research often focuses on only one level of the stack (i.e., devices, circuits, architecture, workload, or mapping) or only one design point (e.g., one fabricated chip). There is a need for a full-stack modeling tool to evaluate design decisions in the context of full systems (e.g., see how a circuit impacts system energy) and to perform rapid early-stage exploration of the CiM co-design space.
To address this need, we propose CiMLoop: an open-source tool to model diverse CiM systems and explore decisions across the CiM stack. CiMLoop introduces (1) a flexible specification that lets users describe, model, and map workloads to both circuits and architecture, (2) an accurate energy model that captures the interaction between DNN operand values, hardware data representations, and analog/digital values propagated by circuits, and (3) a fast statistical model that can explore the design space orders-of-magnitude more quickly than other high-accuracy models.
Using CiMLoop, researchers can evaluate design choices at different levels of the CiM stack, co-design across all levels, fairly compare different implementations, and rapidly explore the design space.
[3] arXiv:2405.07518 [pdf, other]: Title: SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them.
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.

Cross-lists for Tue, 14 May 24

[4] arXiv:2405.06676 (cross-list from cs.CL) [pdf, other]: Title: EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD

Authors: Bing-Yue Wu, Utsav Sharma, Sai Rahul Dhanvi Kankipati, Ajay Yadav, Bintu Kappil George, Sai Ritish Guntupalli, Austin Rovinski, Vidya A. Chhabria

Comments: Under review at Workshop on LLM-Aided Design (LAD'24)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Large language models (LLMs) serve as powerful tools for design, providing capabilities for both task automation and design assistance. Recent advancements have shown tremendous potential for facilitating LLM integration into the chip design process; however, many of these works rely on data that are not publicly available and/or not permissively licensed for use in LLM training and distribution. In this paper, we present a solution aimed at bridging this gap by introducing an open-source dataset tailored for OpenROAD, a widely adopted open-source EDA toolchain. The dataset features over 1000 data points and is structured in two formats: (i) a pairwise set comprised of question prompts with prose answers, and (ii) a pairwise set comprised of code prompts and their corresponding OpenROAD scripts. By providing this dataset, we aim to facilitate LLM-focused research within the EDA domain. The dataset is available at https://github.com/OpenROAD-Assistant/EDA-Corpus.
[5] arXiv:2405.07061 (cross-list from cs.LG) [pdf, other]: Title: LLMs and the Future of Chip Design: Unveiling Security Risks and Building Trust

Authors: Zeng Wang, Lilas Alrahis, Likhitha Mankali, Johann Knechtel, Ozgur Sinanoglu

Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR)

Chip design is about to be revolutionized by the integration of large language, multimodal, and circuit models (collectively LxMs). While exploring this exciting frontier with tremendous potential, the community must also carefully consider the related security risks and the need for building trust into using LxMs for chip design. First, we review the recent surge of using LxMs for chip design in general. We cover state-of-the-art works for the automation of hardware description language code generation and for scripting and guidance of essential but cumbersome tasks for electronic design automation tools, e.g., design-space exploration, tuning, or designer training. Second, we raise and provide initial answers to novel research questions on critical issues for security and trustworthiness of LxM-powered chip design from both the attack and defense perspectives.
[6] arXiv:2405.07266 (cross-list from cs.ET) [pdf, other]: Title: Architecture-Level Modeling of Photonic Deep Neural Network Accelerators

Authors: Tanner Andrulis, Gohar Irfan Chaudhry, Vinith M. Suriyakumar, Joel S. Emer, Vivienne Sze

Comments: Published at ISPASS 2024

Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR)

Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations.
To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of converting data between domains. Designers must also consider system-level energy costs such as data fetch from DRAM. Converting data and accessing DRAM can consume significant energy, so to evaluate and explore the photonic system space, there is a need for a tool that can model these full-system considerations.
In this work, we show that similarities between Compute-in-Memory (CiM) and photonics let us use CiM system modeling tools to accurately model photonics systems. Bringing modeling tools to photonics enables evaluation of photonic research in a full-system context, rapid design space exploration, co-design, and comparison between systems.
Using our open-source model, we show that cross-domain conversion and DRAM can consume a significant portion of photonic system energy. We then demonstrate optimizations that reduce conversions and DRAM accesses to improve photonic system energy efficiency by up to 3x.

Replacements for Tue, 14 May 24

[7] arXiv:2208.00763 (replaced) [pdf, other]: Title: HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation

Authors: Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen

Comments: arXiv admin note: text overlap with arXiv:2112.13972

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Performance (cs.PF)
[8] arXiv:2308.09445 (replaced) [pdf, other]: Title: parti-gem5: gem5's Timing Mode Parallelised

Authors: José Cubero-Cascante, Niko Zurstraßen, Jörn Nöller, Rainer Leupers, Jan Moritz Joseph

Comments: Pre-print of work presented at SAMOS Conference XXIII

Subjects: Hardware Architecture (cs.AR)
[9] arXiv:2311.01598 (replaced) [pdf, other]: Title: CiFlow: Dataflow Analysis and Optimization of Key Switching for Homomorphic Encryption

Authors: Negar Neda, Austin Ebel, Benedict Reynwar, Brandon Reagen

Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR); Performance (cs.PF)

New submissions
Cross-lists
Replacements

[ total of 9 entries: 1-9 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2405, contact, help (Access key information)

> cs > cs.AR

Hardware Architecture

New submissions

New submissions for Tue, 14 May 24

Cross-lists for Tue, 14 May 24

Replacements for Tue, 14 May 24