An Approach to Complex Visual Data Interpretation with Vision-Language Models
Published at The 1st Large Vision - Language Model Learning and Applications Workshop, ACCV 2024
December 8, 2024
We adapted the MMMU benchmarks and combined prompt engineering with a voting-based ensemble method to improve the performance of Large Vision-Language Models on complex visual data interpretation, achieving a top score of 0.85 in the LAVA Workshop 2024 challenge.
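As a rough illustration of what a voting-based ensemble can look like, the sketch below takes the answers produced for one question (for example, from several prompt variants or several models) and returns the most frequent one. This is a minimal sketch under stated assumptions, not the method used in the paper: it assumes multiple-choice answers given as option letters, a plain plurality vote, and ties broken in favor of the answer seen first. The function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the earliest answer.

    Assumption: answers are short option labels such as "A" or "b",
    so trimming whitespace and upper-casing is enough to normalize them.
    """
    if not answers:
        raise ValueError("need at least one answer to vote on")

    # Normalize so that "b", "B " and "B" count as the same vote.
    normalized = [a.strip().upper() for a in answers]
    counts = Counter(normalized)
    best = max(counts.values())

    # Walk the answers in their original order so ties resolve
    # deterministically to the first answer that reached the top count.
    for answer in normalized:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable: at least one answer has the top count")


if __name__ == "__main__":
    # Three hypothetical runs answering the same multiple-choice question.
    print(majority_vote(["A", "b", "B "]))
```

A plurality vote like this only helps when the individual runs make partly independent errors; weighting votes by per-run validation accuracy is a common variation, but whether the paper does so is not stated here.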