Home
Bio
My name is Ge Chunjiang (葛春江). I am a fifth-year Ph.D. candidate in the Department of Automation at Tsinghua University, advised by Prof. Gao Huang. My research interests lie in computer vision and multimodal foundation models, with the ultimate goal of enabling machine learning models to understand and interact with the open world. Before joining the Department of Automation, I received a B.S. in Mathematics and Physics (数理基础科学) from the Department of Physics, Tsinghua University. I am actively seeking postdoctoral positions and industry opportunities.
My research interests include:
- Multimodal Foundation Models. I think foundation models should be grounded in the physical world, so vision is essential for understanding the real world. My work focuses on improving foundation models' visual capabilities, e.g., on high-resolution images (ConvLLaVA, LLaVA-UHD) and long videos. I am also interested in integrating visual understanding and generation.
- Computer Vision Architectures. I am interested in building efficient and effective vision architectures, e.g., convolutional neural networks, self-attention, and linear attention, to accelerate inference and reduce model size. My work includes the integration of self-attention and convolution (ACMix) and efficient linear attention (MILA).
- Learning Robust and Generalizable Representations. I am interested in learning robust and generalizable representations through, e.g., domain adaptation (DAPrompt) and causal prediction (SEAD). I am also interested in making models' behavior more controllable (D3PO).
If you are interested in my work or in personal development, feel free to contact me. I can set aside 30 minutes per week to talk with students. You can reach me by email or, if you use WeChat, via WeChat.
News
- [2024/05] I am excited to announce that our project and paper, ConvLLaVA, has been released. We employ a hierarchical backbone for high-resolution understanding, which is both efficient and effective. Collaboration is welcome!
- [2023/06] I established a GitHub repo for collecting papers on foundation models. Pull requests and collaboration are welcome.
Selected Publications
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng
TL;DR: We propose to employ a five-stage ConvNeXt as the visual encoder of an LMM to compress visual tokens, which greatly improves both efficiency and performance on high-resolution benchmarks.
Domain Adaptation via Prompt Learning
Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, Gao Huang
TL;DR: We propose a novel domain adaptation method, DAPrompt, which learns a set of domain-specific prompts to avoid the information loss resulting from domain alignment.
On the Integration of Self-Attention and Convolution
Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang
TL;DR: We propose an operator, ACMix, which integrates convolution and self-attention while sharing most of the computation between them.