Bowen Qu

About Me

👋 Hi, I’m Bowen (Brian) Qu. My research interests include Vision-Language Models and MLLM Reasoning. The logo of this website is my lovely cat - Baka (巴卡 in Chinese)!

Interests
  • Vision-Language Models (VLM)
  • MLLM Reasoning
  • AI-Generated Image/Video Quality Assessment
Education
  • Master of Science

    Peking University (PKU)

  • Bachelor of Engineering

    Huazhong University of Science and Technology (HUST)

🔥 News
  • 2025.02: 🎉🎉🎉 ChartMoE is selected as an ICLR 2025 Oral (top 1.8%)!
  • 2025.01: 🎉🎉 ChartMoE is accepted by ICLR 2025!
  • 2024.10: 🎉🎉 We release Aria, a native LMM that excels on text, code, images, video, PDFs, and more!
  • 2024.09: 💥💥 We release ChartMoE, an MLLM with an MoE connector, for advanced chart 1️⃣understanding, 2️⃣replotting, 3️⃣editing, 4️⃣highlighting, and 5️⃣transformation.
  • 2023.12: 💥 MPP-Qwen-Next is released! Don’t let limited resources (24GB of VRAM) limit your imagination: all 7B/14B LLaVA-like training is conducted on RTX 3090 GPUs via pipeline parallelism.
Selected Outputs

🌟 is me. * Equal Contribution (i.e.: Co-First Author). 📧 Corresponding Author.

Core-Authored Publications

(2025). ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding. ICLR 2025 Oral (top 1.8%).
(2024). Aria: An Open Multimodal Native Mixture-of-Experts Model. Technical Report.
Selected Projects

😺 I enjoy open-sourcing. Here is a selection of projects that I’ve led or to which I was a core contributor.