Lixiang Han

I am currently a fourth-year Ph.D. student at the Department of Computer Science, City University of Hong Kong, under the guidance of Prof. Zhenjiang Li. Prior to my Ph.D. studies, I completed my bachelor’s degree in Computer Science and Technology at Xiamen University in 2021.

My research focuses on creating resource-efficient learning systems for mobile edge devices.

news

May 21, 2024	Happy to attend INFOCOM 2024 in Vancouver, Canada.
May 07, 2024	Thrilled to be awarded the MobiSys 2024 Student Travel Grant.
Apr 26, 2024	Pantheon is accepted to `MobiSys` 2024.
Dec 01, 2023	DTMM is accepted to `INFOCOM` 2024.

selected publications

MobiSys ’24

Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Lixiang Han, Zimu Zhou , and Zhenjiang Li

In ACM International Conference on Mobile Systems, Applications, and Services , 2024

Abs PDF Code Slides Website

GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single task inference, their value is also particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main technology to ensure multitasking timeliness, but mobile edges primarily offer two priorities for task queues, and existing methods thus achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort, permitting a real-time task to preempt best-effort ones. However, the efficacy diminishes significantly when other real-time tasks run concurrently, but this is already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing and lack special preemption support, mainly following FIFO in GPU scheduling. Clouds handle concurrent task execution, but focus on allocating one or more GPUs per complex model, whereas on mobile edges, DNNs mainly vie for one GPU. This paper introduces Pantheon, designed to offer fine-grained preemption, enabling real-time tasks to preempt each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient. Efficient preemption can be realized through software design by innovative scheduling and novel exploitation of the nested redundancy principle for DNN models. Evaluation on a diverse set of DNNs shows substantial improvements in deadline miss rate and accuracy of Pantheon over state-of-the-art methods.
INFOCOM ’24

DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning

Lixiang Han, Zhen Xiao , and Zhenjiang Li

In IEEE International Conference on Computer Communications , 2024

Abs PDF

Abstract—DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML), which explores extending the reach of machine learning to many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak capability of embedded devices, it is necessary to compress models by pruning enough weights before deploying. Although pruning has been studied extensively on many computing platforms, two key issues with pruning methods are exacerbated on MCUs: models need to be deeply compressed without significantly compromising accuracy, and they should perform efficiently after pruning. Current solutions only achieve one of these objectives, but not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. Therefore, we propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models. It can be integrated into commercial ML frameworks for practical deployment, and a prototype system has been developed. Extensive experiments on various models show promising gains compared to state-of-the-art methods.

awards

Research Tuition Scholarship, CityU, 2024.

Student Travel Grant, ACM MobiSys, 2024.