Ammar Ahmad Awan
Research Areas
Advanced Neural Network Applications, Parallel Computing and Optimization Techniques, Advanced Data Storage Technologies, Stochastic Gradient Optimization Techniques, Distributed and Parallel Computing Systems
Most-Cited Works
- DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (2022), 208 cited
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (2024), 147 cited
- S-Caffe (2017), 134 cited
- An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures (2017), 75 cited
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (2022), 55 cited
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (2016), 51 cited
- Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation (2019), 47 cited
- GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training (2020), 45 cited
- OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training (2018), 37 cited
- NV-group (2020), 37 cited