About Me

I am currently a Ph.D. candidate at S-Lab, College of Computing and Data Science, Nanyang Technological University, Singapore, supervised by Prof. Tianwei Zhang. Before that, I received my M.Sc. degree in Electrical Engineering from the National University of Singapore in 2022 and my B.Eng. degree in Information Engineering from Zhejiang University in 2020.

Research Interests

  • Distributed Training
  • Systems for Graph Learning
  • Machine Learning for Systems

Publications

TorchGT: A Holistic System for Large-scale Graph Transformer Training
Meng Zhang*, Jie Sun*, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
[Paper]

Sylvie: 3D-adaptive and Universal System for Large-scale Graph Neural Network Training
Meng Zhang, Qinghao Hu, Cheng Wan, Haozhao Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
IEEE International Conference on Data Engineering (ICDE), 2024
[Paper] [Code]

Characterization of Large Language Model Development in the Datacenter
Qinghao Hu*, Zhisheng Ye*, Zerui Wang*, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024
[Paper]

FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices
Haozhao Wang, Yabo Jia, Meng Zhang, Qinghao Hu, Hao Ren, Peng Sun, Yonggang Wen, Tianwei Zhang
The Web Conference (WWW), 2024
[Paper]

Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Qinghao Hu*, Meng Zhang*, Peng Sun, Yonggang Wen, Tianwei Zhang
Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Distinguished Paper Award
[Paper] [Code]

Hydro: Surrogate-based Hyperparameter Tuning Service in Datacenters
Qinghao Hu, Zhisheng Ye, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2023
[Paper] [Code]

Preprint

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication
Meng Zhang, Qinghao Hu, Peng Sun, Yonggang Wen, Tianwei Zhang
arXiv, 2023
[Paper]

* denotes equal contribution

Experiences

System Research Intern | NDS Group @ Shanghai AI Lab
Jun 2023 - present

Research Intern | Tencent JARVIS Lab
Oct 2020 - Feb 2021

Research Intern | Singapore University of Technology and Design
Advisor: Prof. Simon Perrault
Jul 2019 - Sep 2021

Professional Services

[EuroSys 2023] Shadow Committee Member
[MLSys 2023] AE Committee Member
[OSDI 2023] Presenter & AE Committee Member