publications | Lixiang Han

2024

MobiSys ’24

Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Lixiang Han, Zimu Zhou, and Zhenjiang Li

In ACM International Conference on Mobile Systems, Applications, and Services, 2024


GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is also particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main technology to ensure multitasking timeliness, but mobile edges primarily offer two priorities for task queues, and existing methods thus achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort, permitting a real-time task to preempt best-effort ones. However, their efficacy diminishes significantly when other real-time tasks run concurrently, which is already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing and lack special preemption support, mainly following FIFO in GPU scheduling. Clouds handle concurrent task execution, but focus on allocating one or more GPUs per complex model, whereas on mobile edges, DNNs mainly vie for one GPU. This paper introduces Pantheon, designed to offer fine-grained preemption, enabling real-time tasks to preempt each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient. Efficient preemption can be realized through software design by innovative scheduling and novel exploitation of the nested redundancy principle for DNN models. Evaluation on a diverse set of DNNs shows substantial improvements in deadline miss rate and accuracy of Pantheon over state-of-the-art methods.

INFOCOM ’24

DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning

Lixiang Han, Zhen Xiao, and Zhenjiang Li

In IEEE International Conference on Computer Communications, 2024


DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML), which explores extending the reach of machine learning to many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak capability of embedded devices, it is necessary to compress models by pruning enough weights before deployment. Although pruning has been studied extensively on many computing platforms, two key issues with pruning methods are exacerbated on MCUs: models need to be deeply compressed without significantly compromising accuracy, and they should run efficiently after pruning. Current solutions achieve only one of these objectives, not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. Therefore, we propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models. It can be integrated into commercial ML frameworks for practical deployment, and a prototype system has been developed. Extensive experiments on various models show promising gains compared to state-of-the-art methods.

TMC

Finger Tracking Using Wrist-Worn EMG Sensors

Jiani Cao, Yang Liu, Lixiang Han, and 1 more author

IEEE Transactions on Mobile Computing, 2024


This paper introduces WETrak, a finger tracking system using wrist-worn electromyography (EMG) sensors. Recent finger tracking methods mainly employ EMG sensors on armbands. Compared to a range of contactless methods using cameras or wireless signals, they are not limited by high computational costs, privacy concerns, and mobility, while unlike other wearable-based approaches, they do not require the deployment of sensors on the user’s hands. However, users need to wear an additional armband on their forearm each time solely for tracking purposes, which hinders the widespread adoption of finger tracking in practice. This paper investigates the feasibility of moving EMG sensors from the forearm to the wrist for finger tracking. WETrak inherits the advantages of existing EMG-based armband tracking while avoiding the limitation of requiring additional armbands, which brings a strong incentive for integrating EMG sensors into wrist-worn wearables in the future. As sensor placement varies, we find new challenges in determining good locations to place sensors so that they gather enough useful information to capture all finger movements, and in ensuring accurate tracking with low-quality signals. In this paper, we introduce new, efficient solutions to these problems. We develop a prototype, and the results show that WETrak outperforms the state-of-the-art method and performs consistently well under various settings.