About Me

I am currently a Ph.D. candidate under the supervision of Prof. Tianwei Zhang at S-Lab, College of Computing and Data Science, Nanyang Technological University, Singapore. Before that, I received my M.Sc. degree in Electrical Engineering from the National University of Singapore in 2022 and my B.Eng. degree in Information Engineering from Zhejiang University in 2020.
My CV can be found here.

Research Interests

  • Distributed Training
  • Systems for Graph Learning
  • Machine Learning for Systems

Publications

TorchGT: A Holistic System for Large-scale Graph Transformer Training
Meng Zhang*, Jie Sun*, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024  
[Paper]

Sylvie: 3D-adaptive and Universal System for Large-scale Graph Neural Network Training
Meng Zhang, Qinghao Hu, Cheng Wan, Haozhao Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
IEEE International Conference on Data Engineering (ICDE), 2024
[Paper] [Code]

Characterization of Large Language Model Development in the Datacenter
Qinghao Hu*, Zhisheng Ye*, Zerui Wang*, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024
[Paper]

FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices
Haozhao Wang, Yabo Jia, Meng Zhang, Qinghao Hu, Hao Ren, Peng Sun, Yonggang Wen, Tianwei Zhang
The Web Conference (WWW), 2024
[Paper]

Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Qinghao Hu*, Meng Zhang*, Peng Sun, Yonggang Wen, Tianwei Zhang
Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Distinguished Paper Award
[Paper] [Code]

Hydro: Surrogate-based Hyperparameter Tuning Service in Datacenters
Qinghao Hu, Zhisheng Ye, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2023
[Paper] [Code]

Preprint

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication
Meng Zhang, Qinghao Hu, Peng Sun, Yonggang Wen, Tianwei Zhang
arXiv, 2023
[Paper]

* denotes equal contribution

Experiences

System Research Intern | NDS Group @ Shanghai AI Lab
Jun 2023 - present

Research Intern | Tencent JARVIS Lab
Oct 2020 - Feb 2021

Research Intern | Singapore University of Technology and Design
Advisor: Prof. Simon Perrault
Jul 2019 - Sep 2021

Professional Services

[EuroSys 2023] Shadow Committee Member
[MLSys 2023] AE Committee Member
[OSDI 2023] Presenter & AE Committee Member