
ASPLOS’25 Tutorial - AIBrix: An Open Source Large-Scale LLM Inference Infrastructure for System Research

Abstract

The rapid advancements in large language model (LLM) inference have spurred numerous innovations. Our findings suggest that for LLMs to be effectively deployed in production environments, optimization at the engine level alone is insufficient. Successful production deployment demands a holistic approach that integrates optimizations across three key layers: the model layer, the engine layer, and the system layer.

AIBrix is an open-source system-level solution designed to address the complexities of LLM inference in production environments. It provides a seamless platform for turning large models into scalable APIs, focusing on critical system aspects such as LLM-specific autoscaling strategies, model-locality scheduling, cost-efficient management of heterogeneous hardware, and efficient colocation of online and offline requests. AIBrix facilitates cutting-edge research in large-scale LLM inference by offering researchers a flexible framework for exploring system-level challenges, accelerating innovation in areas beyond engine optimization. Ideas from recent papers, including ServerlessLLM (OSDI ’24) as well as QLM and Preble (ASPLOS ’24), have been integrated into AIBrix for benchmarking.
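To illustrate the kind of system-layer policy described above, the sketch below shows a minimal prefix-cache-aware router in the spirit of model-locality scheduling: requests are steered toward the replica that already holds the longest matching KV-cache prefix, with load as a tiebreaker. All names here (`Pod`, `route_request`, `shared_prefix_len`) are invented for illustration and are not AIBrix's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """A hypothetical inference replica (illustrative only, not AIBrix's API)."""
    name: str
    active_requests: int = 0
    # Token prefixes (as tuples) assumed to be resident in this pod's KV cache.
    cached_prefixes: set = field(default_factory=set)

def shared_prefix_len(prompt_tokens, prefixes):
    """Length of the longest cached prefix that matches the prompt."""
    best = 0
    for p in prefixes:
        if prompt_tokens[:len(p)] == p:
            best = max(best, len(p))
    return best

def route_request(prompt_tokens, pods):
    """Prefer the pod with the longest cached prefix; break ties by lowest load."""
    return max(
        pods,
        key=lambda pod: (
            shared_prefix_len(prompt_tokens, pod.cached_prefixes),
            -pod.active_requests,
        ),
    )
```

A real scheduler would also track cache evictions and replica health, but this captures the core locality/load trade-off that papers such as Preble explore.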

Location & Time

Venue Map & Directions

Tentative Schedule

Note: All slides will be made available shortly before the tutorial.

Organizer

Jiaxin Shan is a Software Engineer in the ByteDance Infrastructure Research Lab. He received an MS degree from the University of Pittsburgh. His research interests focus on ML infrastructure and serverless systems. He is a co-chair of Kubernetes WG-Serving and the Kubeflow community.
Le Xu is a Researcher at ByteDance. She received her Ph.D. from UIUC, advised by Professor Indranil Gupta. Her research focuses on distributed systems, streaming systems, and AI systems. She has authored several publications in top-tier conferences, including NSDI, SoCC, and EuroSys.
Shuowei Jin is a fifth-year Ph.D. candidate at the University of Michigan, advised by Professor Morley Mao. His research focuses on enhancing LLM inference efficiency through algorithm and systems co-design.
Rong Kang is a Research Engineer at ByteDance. He obtained his Ph.D. from Tsinghua University. Passionate about the synergy between AI and systems, his academic and engineering interests center on AI for DB, DB for AI, and LLM serving.
Linhui Xu is a Research Engineer at ByteDance. He holds a master's degree from the Institute of Computing Technology, Chinese Academy of Sciences, and is interested in AI for DB and LLM acceleration.
Liguang Xie manages a computer systems research team at ByteDance. He received a Ph.D. in computer engineering from Virginia Tech. His research interests include optimization and algorithm design for wireless networks and LLM inference systems.

Contact us

For any further questions, please contact Jiaxin Shan or connect on LinkedIn.