Residential College | false |
Status | Published |
Title | InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing |
Authors | Han, Ziyi¹; Zhou, Ruiting²; Xu, Chengzhong³; Zeng, Yifan¹; Zhang, Renli¹ |
Publication Date | 2024-10 |
Source Publication | IEEE Transactions on Parallel and Distributed Systems |
ISSN | 1045-9219 |
Volume | 35; Issue: 10; Pages: 1735-1748 |
Abstract | As AI applications proliferate, it is critical to increase the throughput of online DNN inference services. Multi-Process Service (MPS) improves GPU utilization through spatial sharing, but it also brings unique challenges. First, the interference between co-located DNN models deployed on the same GPU must be accurately modeled. Second, inference tasks arrive dynamically online, and each task must be served within a bounded time to meet its service-level objective (SLO). Third, GPU resource fragmentation becomes more severe. To address these three challenges, we propose InSS, an Intelligent Scheduling orchestrator for multi-GPU inference servers with spatio-temporal Sharing, which aims to maximize system throughput. InSS makes two key innovations: i) an interference-aware analytical latency model that estimates task latency; and ii) a two-stage intelligent scheduler that, coupled with this latency model, jointly optimizes model placement and GPU resource allocation and adaptively decides the batch size. Our prototype implementation on four NVIDIA A100 GPUs shows that InSS improves throughput by up to 86% compared with state-of-the-art GPU schedulers while satisfying SLOs. We further demonstrate the scalability of InSS on 64 GPUs. |
Keyword | DNN Inference ; GPU Resource Management ; Online Scheduling |
DOI | 10.1109/TPDS.2024.3430063 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Theory & Methods ; Engineering, Electrical & Electronic |
WOS ID | WOS:001288220700002 |
Publisher | IEEE COMPUTER SOC, 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1314 |
Scopus ID | 2-s2.0-85199038778 |
Document Type | Journal article |
Collection | Faculty of Science and Technology ; Department of Computer and Information Science |
Corresponding Author | Zhou, Ruiting |
Affiliation | 1. School of Cyber Science and Engineering, Wuhan University, Wuhan, China; 2. School of Computer Science and Engineering, Southeast University, Nanjing, China; 3. Faculty of Science and Technology, University of Macau, Taipa, China |
Recommended Citation GB/T 7714 | Han Ziyi, Zhou Ruiting, Xu Chengzhong, et al. InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing[J]. IEEE Transactions on Parallel and Distributed Systems, 2024, 35(10): 1735-1748. |
APA | Han, Z., Zhou, R., Xu, C., Zeng, Y., & Zhang, R. (2024). InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing. IEEE Transactions on Parallel and Distributed Systems, 35(10), 1735-1748. https://doi.org/10.1109/TPDS.2024.3430063 |
MLA | Han, Ziyi, et al. "InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing." IEEE Transactions on Parallel and Distributed Systems 35.10 (2024): 1735-1748. |