Time | What |
---|---|
09:00 - 10:30 | Workshops and tutorials - first block |
10:30 - 11:00 | Coffee break |
11:00 - 12:30 | Workshops and tutorials - second block |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Workshops and tutorials - third block |
15:00 - 15:30 | Coffee break |
15:30 - 17:30 | Workshops and tutorials - last block |
Time | What |
---|---|
09:00 - 10:30 | Workshops and tutorials - first block |
10:30 - 11:00 | Coffee break |
11:00 - 12:30 | Workshops and tutorials - second block |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Workshops and tutorials - third block |
15:00 - 15:30 | Coffee break |
15:30 - 17:30 | Workshops and tutorials - last block |
Time | What |
---|---|
8:00 | Opening |
8:30 | Keynote: Concurrent Data Sketches, Idit Keidar, Technion - Israel Institute of Technology |
9:30 | Break |
10:00 | Session: Compilers, Chair: Albert Cohen |
12:00 | Lunch |
13:00 | Session: Memory system, Chair: Bernhard Egger |
14:30 | Break |
15:00 | Session: Memory system (cont), Chair: Tien-Pao Shih |
16:30 | Break |
17:00 | Reception and poster session |
19:00 | Business meeting |
Time | What |
---|---|
8:30 | Keynote: Energy-Efficient GPU Architectures for Real-Time Rendering, Antonio González, UPC Barcelona |
9:30 | Break |
10:00 | Session: GPUs, Chair: Gregory Byrd |
12:00 | Lunch |
13:00 | Session: Algorithms, Chair: Michael Spear |
15:00 | Break |
15:30 | Session: Architecture, Chair: Tamara Lehman |
17:30 | Break |
18:30 | Conference dinner at the Motto am Fluss |
Time | What | Where |
---|---|---|
8:30 | Keynote: Optimizing Compilers in an Age of Ubiquitous AI, Albert Cohen, Google DeepMind | |
9:30 | Break | |
10:00 | SRC poster winners presentations | |
11:00 | Session: Optimization, Chair: Riyadh Baghdadi | |
12:30 | Closing | |
Data sketching algorithms have become an indispensable tool for high-speed computations over massive datasets. They maintain a succinct summary of a data stream’s state and answer queries on it using limited memory, at the cost of giving approximations rather than exact answers. For example, a Θ sketch estimates the number of unique items in a data stream, the CountMin sketch approximates the frequencies at which distinct stream elements occur, and a Quantiles sketch estimates the data distribution of a large input stream.
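To make the idea of a sketch concrete, here is a minimal sequential CountMin sketch in Python. It is an illustrative sketch only, not the implementation discussed in the talk and not the Apache DataSketches API; the class name, the default `width` and `depth`, and the use of salted BLAKE2b hashes are all assumptions chosen for brevity.

```python
import hashlib


class CountMinSketch:
    """Illustrative sequential CountMin sketch (hypothetical helper class).

    Estimates are never below the true frequency; hash collisions can
    only inflate them.
    """

    def __init__(self, width=272, depth=5):
        # width and depth are arbitrary illustrative defaults.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row, so each row uses a different hash function.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def query(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )


stream = ["a", "b", "a", "c", "a", "b"]
sketch = CountMinSketch()
for element in stream:
    sketch.update(element)

estimate_a = sketch.query("a")  # at least 3, the true frequency of "a"
```

The memory footprint is fixed at `width * depth` counters regardless of the stream length, which is the trade-off the abstract describes: bounded memory in exchange for approximate answers.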
This talk will discuss efficient concurrent (multi-threaded) implementations of such objects. We will first present an efficient generic approach to parallelizing data sketches and allowing them to be queried in real time, while bounding the error that such parallelism introduces. When instantiated with the KMV Θ sketch, this solution achieves high scalability with a small error. Its implementation is now publicly available as part of the popular Apache Data Sketches library.
Second, we will discuss the correctness semantics of such objects. We will define Intermediate Value Linearizability (IVL), a correctness criterion that relaxes linearizability to allow more parallelism, and yet preserves the error bounds of sequential (probabilistic) sketches. To illustrate the power of this result, we will show a straightforward and efficient concurrent implementation of a CountMin sketch, which is IVL (albeit not linearizable).
Finally, we will consider the Quantiles sketch, which does not scale well using the generic concurrent sketches approach. We instead present Quancurrent, a highly scalable quantiles sketch.
Based on joint works with Edward Bortnikov, Shaked Elias-Zada, Eshcar Hillel, Lee Rhodes, Arik Rinberg, Hadar Serviansky, and Alexander Spiegelman.
Idit Keidar is a Chaired Professor and the current Dean of the Viterbi Faculty of Electrical and Computer Engineering at the Technion – Israel Institute of Technology. She received her BSc (summa cum laude), MSc (summa cum laude), and PhD from the Hebrew University of Jerusalem in 1992, 1994, and 1998, respectively. Subsequently, she was a Rothschild Postdoctoral Fellow at MIT’s Laboratory for Computer Science. She was a Visiting Professor at Cornell and has consulted for several companies. Prof. Keidar has also served as the program chair for leading conferences (PODC, DISC, PPoPP, and SYSTOR). In her free time, she enjoys writing prose.
Mobile devices such as smartphones and tablets have become the most commonly used computing devices, and projections forecast significant future growth in both the number of shipped units and their capabilities. These devices have become quite powerful, and most applications that run on them make intensive use of graphics animation to provide a rich user experience. Energy consumption and the related heat dissipation issues are the main constraints on the capabilities of such systems. These systems are equipped with a very small battery that is expected to last for hours if not days, and since they are normally handheld, their external temperature cannot significantly exceed typical human levels. To provide richer user experiences on these devices, dramatic improvements in energy efficiency are required. This talk focuses on one of the main components of these systems, the GPU, and describes some novel microarchitectures that we have recently developed to increase its performance and energy efficiency.
Antonio González is a Full Professor at the Computer Architecture Department of the Universitat Politècnica de Catalunya, Barcelona (Spain), and the director of the Architecture and Compilers research group. He was the founding director of the Intel Barcelona Research Center from 2002 to 2014.
His research has focused on computer architecture and compilers, with special emphasis on cognitive computing systems and graphics processors in recent years. Antonio holds 53 patents, has published over 400 research papers and has given over 130 invited talks. He has a long track record of innovations through technology transfer of his research results to commercial products, especially microprocessors and computing systems in general.
Antonio has served as associate editor for five IEEE and ACM journals, program chair for ISCA, MICRO, HPCA, ICS and ISPASS, general chair for MICRO and HPCA, and member of the program committee for more than 140 symposia.
He is a recipient of multiple awards including the Rosina Ribalta award as the advisor of the best PhD project in Information Technology and Communications, the Duran Farell award for research in technology, the Aritmel National Award of Informatics to the Computer Engineer of the Year, the King Jaime I award in the area of New Technologies, and the ICREA Academia Award. He has been inducted into the “IEEE/ACM MICRO Hall of Fame”, the “IEEE HPCA Hall of Fame” and the “ACM/IEEE ISCA Hall of Fame”. Antonio is a Fellow of IEEE and ACM.
Compilers and AI have been working hand in hand for some time… almost “forever” actually. Yet, despite decades of mutually profitable inspiration and progress, the art of compiler construction has not seen changes comparable to the accelerating history of ML over the last 10 years. And yet again, we may actually be standing at the doorstep of such a radical shift in the design of compilers. I am not the only one to notice this of course: ML-enhanced compilers are becoming the norm rather than the exception, while high-performance ML is made possible by advances in domain-specific compilers. I would like to draw the PACT community’s attention to challenges that may require more radical changes to compiler construction, pertaining to correctness, performance and agility. For example, today’s highest performance libraries and heroic accelerator programming are only made possible at the expense of a dramatic loss of programmability. Are we ever going to find a way out of this portability/performance dilemma? What about the agility of compiler engineers? Can we build a software infrastructure scalable enough to compile billions of lines of code while leveraging advanced ML-based heuristics? Can we do so while enabling massive code reuse across domains, languages and hardware? We will shed some light on these questions, based on recent successes and half-successes in academia and industry. We will also extend an invitation to tackle these challenges in future research and software development.
Albert Cohen is a research scientist at Google. An alumnus of École Normale Supérieure de Lyon and the University of Versailles, he has been a research scientist at Inria, a visiting scholar at the University of Illinois, an invited professor at Philips Research, and a visiting scientist at Facebook Artificial Intelligence Research. Albert works on parallelizing, optimizing and machine learning compilers, and on dataflow and synchronous programming languages, with applications to high-performance computing, artificial intelligence and reactive control.
This is the preliminary list of papers and posters accepted at PACT 2023.
A Silicon Photonic Multi-DNN Accelerator, Yuan Li, George Washington University, Ahmed Louri, George Washington University, Avinash Karanth, Ohio University
Accelerating Decision-Tree-based Inference through Adaptive Parallelization, Jan van Lunteren, IBM Research
Architecture-Aware Currying, Mahmut Taylan Kandemir, The Pennsylvania State University, Gulsum Gudukbay Akbulut, The Pennsylvania State University, Wonil Choi, Hanyang University, Mustafa Karakoy, TUBITAK-BILGEM
Automatic Algorithm-Based Fault Tolerance (AABFT) of Stencil Computations, Louis Narmour, University of Rennes 1, CNRS, IRISA, Colorado State University, Steven Derrien, University of Rennes 1, CNRS, IRISA, Sanjay Rajopadhye, Colorado State University
Barad-dur: Near-Storage Accelerator for Training Large Graph Neural Networks, Jiyoung An, University of California Irvine, Sang-Woo Jun, University of California Irvine
Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs, Diya Joseph, Polytechnic University of Catalonia, Juan L. Aragón, University of Murcia, Joan-Manuel Parcerisa, Polytechnic University of Catalonia, Antonio Gonzalez, Polytechnic University of Catalonia
CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions, Sawan Singh, University of Murcia, Josue Feliu, University of Murcia, Manuel E. Acacio, University of Murcia, Alexandra Jimborean, University of Murcia, Alberto Ros, University of Murcia
Drishyam: An Image is Worth a Data Prefetcher, Shubdeep Mohapatra, Student at BITS Pilani, Biswabandan Panda, IIT Bombay
G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs, Yue Jin, Ant Group, Heng Zhang, Institute of Software, Chinese Academy of Sciences, Chengying Huan, Institute of Software, Chinese Academy of Sciences, Yongchao Liu, Ant Group, Shuaiwen Leon Song, Microsoft/University of Sydney, Rui Zhao, Ant Group, Yao Zhang, Microsoft, Charles He, Dipeak Ltd, Wenguang Chen, Ant Group
Automatic Code Generation for High-Performance Graph Algorithms, Zhen Peng, Pacific Northwest National Laboratory, Rizwan A. Ashraf, Pacific Northwest National Laboratory, Luanzheng Guo, Pacific Northwest National Laboratory, Ruiqin Tian, Horizon Robotics, Gokcen Kestor, Pacific Northwest National Laboratory
HugeGPT: Storing Guest Page Tables on Host Huge Pages to Accelerate Address Translation, Weiwei Jia, The University of Rhode Island, Jiyuan Zhang, University of Illinois Urbana-Champaign, Jianchen Shan, Hofstra University, Yiming Du, The University of Rhode Island, Xiaoning Ding, New Jersey Institute of Technology, Tianyin Xu, University of Illinois at Urbana-Champaign
INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core, Jae Seok Kwak, Yonsei University, Myung Kuk Yoon, Ewha Womans University, Ipoom Jeong, University of Illinois Urbana-Champaign, Seunghyun Jin, Yonsei University, Won Woo Ro, Yonsei University
MBAPIS: Multi-Level Behavior Analysis Guided Program Interval Selection for Microarchitecture Studies, Hongwei Cui, School of Computer Science, Peking University, Yujie Cui, School of Computer Science, Peking University, Honglan Zhan, School of Computer Science, Peking University, Shuhao Liang, School of Computer Science, Peking University, Xianhua Liu, School of Computer Science, Peking University, Chun Yang, School of Computer Science, Peking University, Xu Cheng, School of Computer Science, Peking University
GraphMini: Accelerating Subgraph Enumeration Using Auxiliary Graphs, Juelin Liu, University of Massachusetts Amherst, Sandeep Polisetty, University of Massachusetts Amherst, Hui Guan, University of Massachusetts Amherst, Marco Serafini, University of Massachusetts Amherst
Parallelizing Maximal Clique Enumeration on GPUs, Mohammad Almasri, University of Illinois at Urbana-Champaign, Yen-Hsiang Chang, University of Illinois at Urbana-Champaign, Izzat El Hajj, American University of Beirut, Rakesh Nagi, University of Illinois at Urbana-Champaign, Jinjun Xiong, University at Buffalo, Wen-mei Hwu, NVIDIA, University of Illinois at Urbana-Champaign
Performance Characterization of Popular DNN Models on Out-of-Order CPUs, Pablo Prieto, Universidad de Cantabria, Pablo Abad, Universidad de Cantabria, Jose Angel Gregorio, Universidad de Cantabria, Valentin Puente, Universidad de Cantabria
PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback, Hussein Elnawawy, North Carolina State University, James Tuck, North Carolina State University, Gregory Byrd, North Carolina State University
SDM: Sharing-enabled Disaggregated Memory System with Cache Coherent Compute Express Link, Hyokeun Lee, North Carolina State University, Kwanseok Choi, Seoul National University, Hyuk-Jae Lee, Seoul National University, Jaewoong Sim, Seoul National University
Separating Mechanism from Policy in STM, Yaodong Sheng, Lehigh University, Ahmed Hassan, Lehigh University, Michael Spear, Lehigh University
SimplePIM: A Software Framework For Productive And Efficient In-Memory Processing, Jinfan Chen, ETH Zurich, Juan Gomez Luna, ETH Zurich, Izzat El Hajj, American University of Beirut, YuXin Guo, ETH Zurich, Onur Mutlu, ETH Zurich
SpecCheck: A Tool for Systematic Identification of Vulnerable Transient Execution in gem5, Zack McKevitt, University of Colorado Boulder, Ashutosh Trivedi, University of Colorado Boulder, Tamara Silbergleit Lehman, University of Colorado Boulder
TSUNAMI: a GPU implementation of the WFA algorithm, Giulia Gerometta, Politecnico di Milano, Alberto Zeni, Politecnico di Milano, Marco D. Santambrogio, Politecnico di Milano, Italy
UWOmp_pro: UWOmp++ with Point-to-Point Synchronization, Reduction and Schedules, Aditya Agrawal, IIT Madras, V. Krishna Nandivada, IIT Madras
Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework on Multi-DPU PIM Architecture, Donghyeon Kim, Hanyang University, Taehoon Kim, Hanyang University, Inyong Hwang, Yonsei University, Taehyeong Park, Yonsei University, Hanjun Kim, Yonsei University, Youngsok Kim, Yonsei University, Yongjun Park, Yonsei University
mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis, Alexander Brauckmann, University of Edinburgh, Elizabeth Polgreen, University of Edinburgh, Tobias Grosser, University of Edinburgh, Michael O’Boyle, University of Edinburgh
A CPU-FPGA Holistic Source-To-Source Compilation Approach for Partitioning and Optimizing C/C++ Applications, Tiago Santos, Faculty of Engineering, University of Porto, João M. P. Cardoso, Faculty of Engineering, University of Porto, João Bispo, Faculty of Engineering, University of Porto
Dynamic Allocation of Processor Cores to Graph Applications on Commodity Servers, Lucia Pons, Universitat Politècnica de València, Julio Sahuquillo, Universitat Politècnica de València, Timothy M. Jones, University of Cambridge
Breaking the Complexity Barrier: Enhancing Quality of Service in Simultaneous Multithreading Processors, Gürhan Küçük, Yeditepe University, Onur Demir, Yeditepe University, Sercan Sari, Yeditepe University, Yiğit Bilgin, Yeditepe University, Uğur Nezir, Yeditepe University, Mehmet Erdem Çakır, Yeditepe University
QeiHaN: An Energy-Efficient DNN Accelerator that Leverages Logarithmic Quantization in Near-Data Processing Architectures, Bahareh Khabbazan, Polytechnic University of Catalonia, Barcelona Tech (UPC), Marc Riera Villanueva, Polytechnic University of Catalonia, Antonio Gonzalez, Polytechnic University of Catalonia
Quickloop: An efficient, FPGA-accelerated exploration of parameterized DNN accelerators, Tayyeb Mahmood, Incheon National University, Kashif Inayat, Barcelona Supercomputing Center, Jaeyong Chung, Incheon National University
Retargeting Applications for Heterogeneous Systems with the Tribble Source-to-Source Framework, Luís Miguel Sousa, Faculty of Engineering, University of Porto / INESC TEC, Nuno Paulino, Faculty of Engineering, University of Porto / INESC TEC, João Bispo, Faculty of Engineering, University of Porto / INESC TEC
SLIDEX: Sliding Window Extension for Image Processing, Raúl Taranco, Polytechnic University of Catalonia, Jose Maria Arnau, Polytechnic University of Catalonia, Antonio Gonzalez, Polytechnic University of Catalonia
Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs, Marta Navarro, Universitat Politècnica de València, Josué Feliu Pérez, Universitat Politècnica de València, Salvador Petit, Universitat Politècnica de València, Maria E. Gómez, Universitat Politècnica de València, Victor Lixin, HiSilicon, Julio Sahuquillo, Universitat Politècnica de València
SparseFT: Sparsity-aware Fault Tolerance for Reliable CNN Inference on GPUs, Gwangeun Byeon, Sungkyunkwan University, Seungtae Lee, Sungkyunkwan University, Seongwook Kim, Sungkyunkwan University, Yongjun Kim, Sungkyunkwan University, Prashant J. Nair, University of British Columbia, Seokin Hong, Sungkyunkwan University
Conference: October 21–25, 2023