Residential College: false
Status: Published
InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing
Han, Ziyi (1); Zhou, Ruiting (2); Xu, Chengzhong (3); Zeng, Yifan (1); Zhang, Renli (1)
2024-10
Source Publication: IEEE Transactions on Parallel and Distributed Systems
ISSN: 1045-9219
Volume: 35; Issue: 10; Pages: 1735-1748
Abstract

As AI applications proliferate, it is critical to increase the throughput of online DNN inference services. Multi-Process Service (MPS) improves GPU utilization through spatial sharing, but it also brings unique challenges. First, interference between co-located DNN models deployed on the same GPU must be accurately modeled. Second, inference tasks arrive dynamically online, and each task must be served within a bounded time to meet its service-level objective (SLO). Third, resource fragmentation becomes more severe. To address these three challenges, we propose an Intelligent Scheduling orchestrator for multi-GPU inference servers with spatio-temporal Sharing (InSS), which aims to maximize system throughput. InSS exploits two key innovations: i) an interference-aware analytical latency model that estimates task latency; and ii) a two-stage intelligent scheduler that jointly optimizes model placement and GPU resource allocation, and adaptively decides the batch size by coupling with the latency model. Our prototype implementation on four NVIDIA A100 GPUs shows that InSS improves throughput by up to 86% compared to state-of-the-art GPU schedulers while satisfying SLOs. We further demonstrate the scalability of InSS on 64 GPUs.
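
The record does not include the paper's analytical model or scheduling algorithm, so the following Python snippet is only a minimal, hypothetical sketch of the idea the abstract describes: estimating a co-located model's latency with a simple linear interference term, then greedily picking the largest batch size that still meets the SLO. The class ModelProfile, the functions estimate_latency_ms and pick_batch_size, and every coefficient in the example are assumptions for illustration, not InSS's actual formulation.

# Minimal illustrative sketch (not the paper's model): a hypothetical
# interference-aware latency estimate plus a greedy SLO-bounded batch-size choice.
from dataclasses import dataclass

@dataclass
class ModelProfile:
    base_latency_ms: float      # assumed solo latency at batch size 1 with the whole GPU
    batch_slope_ms: float       # assumed extra latency per additional request in a batch
    interference_coeff: float   # assumed slowdown per unit of co-located GPU utilization

def estimate_latency_ms(profile: ModelProfile, batch_size: int,
                        gpu_share: float, colocated_util: float) -> float:
    """Hypothetical estimate: scale solo latency by the inverse of the allocated GPU
    share, grow it linearly with batch size, and inflate it by a linear interference
    term from models co-located on the same GPU via spatial sharing."""
    solo = (profile.base_latency_ms + profile.batch_slope_ms * (batch_size - 1)) / gpu_share
    return solo * (1.0 + profile.interference_coeff * colocated_util)

def pick_batch_size(profile: ModelProfile, gpu_share: float,
                    colocated_util: float, slo_ms: float, max_batch: int = 64) -> int:
    """Largest batch size whose estimated latency still meets the SLO; throughput
    grows with batch size, so search from the top down."""
    for b in range(max_batch, 0, -1):
        if estimate_latency_ms(profile, b, gpu_share, colocated_util) <= slo_ms:
            return b
    return 0  # no feasible batch size under this placement

if __name__ == "__main__":
    # Illustrative numbers only: 50% of a GPU, co-located load at 30%, 50 ms SLO.
    resnet = ModelProfile(base_latency_ms=8.0, batch_slope_ms=0.6, interference_coeff=0.4)
    print(pick_batch_size(resnet, gpu_share=0.5, colocated_util=0.3, slo_ms=50.0))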

Keywords: DNN Inference; GPU Resource Management; Online Scheduling
DOI: 10.1109/TPDS.2024.3430063
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Theory & Methods; Engineering, Electrical & Electronic
WOS ID: WOS:001288220700002
Publisher: IEEE Computer Society, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314
Scopus ID: 2-s2.0-85199038778
Document Type: Journal article
Collection: Faculty of Science and Technology; Department of Computer and Information Science
Corresponding Author: Zhou, Ruiting
Affiliation:
1. School of Cyber Science and Engineering, Wuhan University, Wuhan, China
2. School of Computer Science and Engineering, Southeast University, Nanjing, China
3. Faculty of Science and Technology, University of Macau, Taipa, China
Recommended Citation
GB/T 7714: Han, Ziyi, Zhou, Ruiting, Xu, Chengzhong, et al. InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing[J]. IEEE Transactions on Parallel and Distributed Systems, 2024, 35(10): 1735-1748.
APA: Han, Ziyi, Zhou, Ruiting, Xu, Chengzhong, Zeng, Yifan, & Zhang, Renli (2024). InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing. IEEE Transactions on Parallel and Distributed Systems, 35(10), 1735-1748.
MLA: Han, Ziyi, et al. "InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference with Spatio-Temporal Sharing." IEEE Transactions on Parallel and Distributed Systems 35.10 (2024): 1735-1748.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.