Program

Workshop Date: May 19, 2023

All times are in Eastern Daylight Time (EDT)

8:30 - 8:40 Welcome from the organizers

Session 1

8:40 - 9:25 Scalable Automated Design and Development of Deep Neural Networks for Scientific and Engineering Applications

Invited talk by Dr. Prasanna Balaprakash, Director of AI Programs at Oak Ridge National Laboratory.

Abstract:

Deep learning is becoming crucial for tackling the increasing modeling complexity of scientific and engineering applications. However, designing high-performing deep neural network (DNN) models can be a challenging and time-consuming task that requires expertise. To address this challenge, we have developed DeepHyper [1], a software package that automates the design and development of DNN models for scientific and engineering applications through scalable neural architecture and hyperparameter search. Our approach emphasizes deep learning over parallel and distributed infrastructures, enabling us to efficiently design and train DNNs for a wide range of scientific applications. In this talk, we will present our recent work on using DeepHyper to automatically generate an ensemble of DNNs at scale and using them to estimate data (aleatoric) and model (epistemic) uncertainties. Our approach enables us to leverage the power of parallel and distributed infrastructures to scale the training of DNNs and improve their performance, while reducing the time and expertise required for manual architecture design and hyperparameter tuning.

[1] DeepHyper: https://deephyper.readthedocs.io/en/latest/
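
As a rough illustration of the ensemble-based uncertainty estimation mentioned in the abstract, the sketch below separates data (aleatoric) and model (epistemic) uncertainty from the predictions of an ensemble of probabilistic regressors. It is a minimal NumPy example based on the standard law-of-total-variance decomposition, not DeepHyper's API; the function and array names are placeholders.

```python
# Minimal sketch (plain NumPy, not DeepHyper's API) of separating data
# (aleatoric) and model (epistemic) uncertainty from an ensemble of
# probabilistic regressors via the law of total variance.
import numpy as np

def decompose_uncertainty(means, variances):
    """means, variances: arrays of shape (n_members, n_points) holding each
    ensemble member's predicted mean and predicted (aleatoric) variance."""
    mean_prediction = means.mean(axis=0)   # ensemble-averaged prediction
    aleatoric = variances.mean(axis=0)     # average predicted data noise
    epistemic = means.var(axis=0)          # disagreement between members
    total = aleatoric + epistemic          # total predictive variance
    return mean_prediction, aleatoric, epistemic, total

# Hypothetical usage: 5 ensemble members evaluated on 100 test points.
rng = np.random.default_rng(0)
member_means = rng.normal(size=(5, 100))
member_vars = rng.uniform(0.1, 0.5, size=(5, 100))
pred, alea, epis, tot = decompose_uncertainty(member_means, member_vars)
```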

Speaker Bio:

Prasanna Balaprakash is the Director of AI Programs at Oak Ridge National Laboratory, where he leads the development and application of AI and machine learning (ML) solutions to solve problems of national importance. His research interests span artificial intelligence, machine learning, optimization, and high-performance computing. He is a recipient of the U.S. Department of Energy's 2018 Early Career Award. Prior to joining Oak Ridge, he was an R&D lead and computer scientist at Argonne National Laboratory. He earned his Ph.D. from CoDE-IRIDIA at the Université Libre de Bruxelles in Brussels, Belgium, where he was a recipient of the European Commission's Marie Curie and Belgian F.R.S.-FNRS Aspirant fellowships.

9:25 - 9:55 Can hierarchical client clustering mitigate the data heterogeneity effect in federated learning?

Seungjun Lee, Miri Yu, Daegun Yoon, Sangyoon Oh (Ajou University, South Korea)

9:55 - 10:30 Break

Session 2

10:30 - 11:00 Scaling up Deep Learning: Efficiency in AI

Invited talk by Dr. Angela Dalton, Director at AMD Research.

Abstract:

In recent years, the usage of deep learning has surged. We are still in the early stages of AI proliferation, but the energy consumption for even a single training run of today’s large language models is already significant. Moreover, inference is expected to be the largest energy consumer, accounting for over 70% of the total energy for a model. Growth in energy consumption for AI models is unsustainable, and unless we can find a way to drastically increase the world's energy production, we must make computing more efficient. This talk will explore the challenges of scaling deep learning in an energy-efficient way and discuss potential research directions.

Speaker Bio:

Dr. Angela Dalton is a Director at AMD Research, where she leads a portfolio of research and advanced development efforts on innovative hardware and software technologies for high performance computing, artificial intelligence, and advanced computing capabilities. She manages AMD’s External Research Office, which fosters relationships with universities to drive innovation and education. Prior to joining AMD in November 2020, Angela spent twelve years at the Johns Hopkins University Applied Physics Laboratory (JHU/APL), where she was both the technical leader for strategic DoD projects and deputy director of a branch of ~250 scientists and engineers across four technical groups working to develop and advance capabilities that assure mission-critical communications for U.S. Government sponsors.

Angela has a B.S. in computer engineering from Virginia Tech as well as an M.S. and a Ph.D. in computer science from Duke University. She was an instructor and postdoctoral fellow at the University of Texas at Austin before joining JHU/APL. 

11:00 - 11:30 Skip Connections in Spiking Neural Networks: An Analysis of Their Effect on Network Training

Hadjer Benmeziane (Univ. Polytechnique Hauts-de-France, France), Amine Ziad Ounnoughene (Belmihoub Abd El Rahmane High School, Algeria), Imane Hamzaoui (Ecole Nationale Supérieure d'Informatique, Algeria), Younes Bouhadjar (Peter Grünberg Institute, Germany)

11:30 - 12:00 Ray-based Elastic Distributed Data Parallel Framework with Distributed Data Cache

Haoran Lin (Shandong University, China)

12:00 - 13:30 Break

Session 3

13:30 - 14:15 On Distributed Training of Foundation Models: Challenges and Observations

Invited talk by Dr. Supriyo Chakraborty, Senior Research Scientist at IBM Research.

Abstract:

We are currently at an inflection point in the ongoing AI revolution. The emergence of highly parallelizable transformer-based neural architectures, along with self-supervised learning, has made it possible to use widely available unlabeled datasets to train large foundation models. These models have shown remarkable performance across various benchmarks and continue to exhibit new and improved emergent properties as we scale across parameter and data sizes.

Over a year ago, our team at IBM Research embarked on a mission to perform distributed training of these large-scale foundation models on cost-effective, cloud-native commodity hardware. In this presentation, we will discuss the challenges we faced, from selecting model architectures and hyperparameters to making parallelization choices for distributed training at scale, and the valuable lessons we learned.
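
For readers unfamiliar with the parallelization choices mentioned above, the sketch below shows one common baseline: synchronous data parallelism with PyTorch DistributedDataParallel. It is a generic illustration with a placeholder model and synthetic data, not the training stack described in the talk.

```python
# Generic data-parallel training sketch with PyTorch DDP; launch with
# `torchrun --nproc_per_node=<gpus> train.py`. Placeholder model and data,
# shown only to illustrate one common parallelization choice.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    rank = dist.get_rank()
    device = rank % torch.cuda.device_count()
    torch.cuda.set_device(device)

    model = torch.nn.Linear(1024, 1024).to(device)   # placeholder model
    model = DDP(model, device_ids=[device])          # wrap for gradient sync
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                           # placeholder training loop
        x = torch.randn(32, 1024, device=device)     # synthetic batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                              # gradients all-reduced here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```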

Speaker Bio:

Dr. Supriyo Chakraborty is a senior research scientist working with the Distributed Training group at IBM Research. In this role, he is responsible for designing efficient architectures and training mechanisms for large foundation models. Dr. Chakraborty has extensive experience in Information Theoretic Privacy and Adversarial Machine Learning, and he has authored more than 50 papers in top-tier international conferences and journals. He holds over 20 patents and has received several prestigious awards, including the Qualcomm Innovation Fellowship Award, multiple Outstanding Technical Achievement Awards, and a Master Inventor Award at IBM. His research has been recognized with a Best Demonstration Award and a Best Paper Honorable Mention Award. Dr. Chakraborty is a Senior Member of the IEEE.

14:15 - 14:45 A Parallel Machine Learning Workflow for Neutron Scattering Data Analysis

Tianle Wang (Brookhaven National Laboratory, USA), Sudip Seal (Oak Ridge National Laboratory, USA), Ramakrishnan Kannan (Oak Ridge National Laboratory, USA), Cristina Garcia-Cardona (Los Alamos National Laboratory, USA), Thomas Proffen (Oak Ridge National Laboratory, USA), Shantenu Jha (Brookhaven National Laboratory, USA)

14:45 - 15:15 Closing Remarks