Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Yang Shi1,2,*†, Jiaheng Liu3*, Yushuo Guan2*, Zhenhua Wu2, Yuanxing Zhang2, Zihao Wang2,
Weihong Lin2, Jingyun Hua2, Zekun Wang2, Xinlong Chen4, Bohan Zeng1, Wentao Zhang1,
Fuzheng Zhang2, Wenjing Yang, Di Zhang2
1Peking University, 2Kuaishou Technology, 3Nanjing University, 4CASIA
*Equal Contribution. †Work done during an internship at Kuaishou Technology. Corresponding Author.