I am an Associate Professor of Computer Science at Renmin University of China (RUC). My research interests lie in high-performance data systems for AI and big data analytics, particularly cloud-native databases and data lakehouses, AI-native storage systems, and hardware-aware data processing. Before joining RUC as a faculty member, I worked as a Postdoctoral Researcher at the DIAS Lab at EPFL and as a Senior Database Kernel Engineer at Tencent.
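I am actively looking for motivated students to work on Big Data Systems and Data for AI. Outstanding undergraduates interested in big data systems and AI data systems are welcome to join my group for internships and early research training, and students planning to pursue a master's or Ph.D. degree are welcome to contact me. Feel free to email me.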
Research
My research focuses on building efficient data systems that bridge the gap between real-world demands and system capabilities. Key areas include:
- Cloud-native Data Lakes — disaggregated storage and compute architectures, serverless query processing, and cost-efficient elastic analytics.
- AI-native (multimodal) Data Systems — novel storage systems designed for AI workloads, enabling efficient data access for machine learning and analytics.
- Hardware-aware data processing — data systems optimized for modern hardware including new storage devices, accelerators, and heterogeneous architectures.
- Automated Database Diagnosis and Optimization — fine-grained, non-intrusive performance diagnosis and tuning for database systems.
Open Source
Pixels — An efficient storage and compute engine for both on-prem and cloud-native data analytics. Pixels features an optimized columnar storage format, serverless query acceleration using cloud functions, natural-language-aided data analytics, and flexible pricing with service-level guarantees. Key columnar storage techniques from Pixels are incorporated into China's National Standard (GB/T 41818-2022) on analytical data storage. Query execution technologies have been adopted by major cloud vendors for database product prototyping.
Experience
- Associate Professor, School of Information, Renmin University of China, 2025 – Present
- Assistant Professor, School of Information, Renmin University of China, 2023 – 2025
- Postdoctoral Researcher, DIAS Lab, EPFL, 2020 – 2023
- Senior Database Kernel Engineer, TDSQL, Tencent, 2018 – 2020
- Ph.D. in Computer Science, Renmin University of China, 2012 – 2018
- Visiting Ph.D. Student, The Ohio State University, 2015 – 2016
- Research Intern, Systems & Algorithms Group, Microsoft Research Asia, 2014 – 2015
Students
Student Competition Achievements:
- National Runner-up (2024 & 2025) in the National College Student Computer System Capability Competition — PolarDB Track
- National Third Prize (2025) in the Database Kernel Track
- 2024 Beijing Outstanding Undergraduate Thesis (1 of 4 in STEM across RUC)
Teaching
- Introduction to Computer Systems II (ICS2) — Spring, 2024 and later
- Practical Database Development — Fall, 2024–2026
- Open Source Software Practice — Fall, 2026
Publications
*corresponding, ≈equal contribution
- Demonstrating DBdoctor: A Fine-grained and Non-intrusive Performance Diagnosis Platform for Databases [Demo]
  Proceedings of the 2026 ACM International Conference on Management of Data (SIGMOD'26).
- DBdoctor: A Fine-grained and Non-intrusive Performance Diagnosis Platform for Databases
  Proceedings of the 42nd IEEE International Conference on Data Engineering (ICDE'26).
- PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices [Demo]
  Proceedings of the 41st IEEE International Conference on Data Engineering (ICDE'25).
- Serverless Query Processing with Flexible Performance SLAs and Prices
  arXiv preprint arXiv:2409.01388.
- Using Cloud Functions as Accelerator for Elastic Data Analytics
  Proceedings of the 2023 ACM International Conference on Management of Data (SIGMOD'23).
- Pixels: An Efficient Column Store for Cloud Data Lakes
  Proceedings of the 38th IEEE International Conference on Data Engineering (ICDE'22).
- Columnar Storage Optimization and Caching for Data Lakes
  Proceedings of the 25th International Conference on Extending Database Technology (EDBT'22).
- Storage Management in Smart Data Lake
  1st International Workshop on Data Analytics and Machine Learning Made Simple (SIMPLIFY'21).
- Pixels: Multiversion Wide Table Store for Data Lakes [Abstract]
  10th Annual Conference on Innovative Data Systems Research (CIDR'20).
- HDFS存储和优化技术研究综述 (Survey on Storage and Optimization Techniques of HDFS)
  Journal of Software, 2020, 31(1): 137-161.
- Rainbow: Adaptive Layout Optimization for Wide Tables [Demo]
  Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE'18).
- Wide Table Layout Optimization based on Column Ordering and Duplication
  Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD'17).
- A Fast Data Ingestion and Indexing Scheme for Real-Time Log Analytics
  The 17th Asia-Pacific Web Conference (APWeb'15).
- A Study of SQL-on-Hadoop Systems
  The 5th Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE'14).
- Spark上的等值连接优化 (Equi-join Optimization on Spark)
  Journal of East China Normal University (Natural Science), 2014(5): 261-270.
- MetKB: Enriching RDF Knowledge Bases with Web Entity-Attribute Tables [Demo]
  Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM'13).
- Efficient SPARQL Query Evaluation in a Database Cluster
  2013 IEEE International Congress on Big Data (BigData Congress'13).
Awards
- China Patent Gold Award (24th, First Inventor) — the first Gold Award in the database field. Patent on database transaction processing, applied in Tencent TDSQL.
- National & Beijing Overseas Talent Programs — selected for national-level and Beijing municipal-level overseas talent recruitment programs.
- Outstanding Advisor & Special Contribution Award — 2024–2025 National Computer System Capability Competition for College Students.
- Beijing Outstanding Undergraduate Thesis Advisor, 2024.