Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Abstract
Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications such as visual question answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medicine. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. The model goes through three stages of parameter-efficient training on three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5%, and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.
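The fusion the abstract describes, a domain-adapted vision encoder connected to a domain-adapted language model through a trainable projection and updated with parameter-efficient methods, can be sketched roughly as below. This is a minimal illustration assuming LoRA-style low-rank adapters and a linear projector; the class names, dimensions, and freezing choices are hypothetical, not the paper's exact implementation.

```python
# Minimal sketch of parameter-efficient vision-language fusion.
# Assumptions: LoRA-style adapters, a linear projector, and encoders that
# return token embeddings; none of this is taken from the paper's code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so the
        self.scale = alpha / rank           # layer initially matches the base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class MedVQAFusion(nn.Module):
    """Fuses a (frozen) vision encoder with a (frozen) language model by
    projecting visual tokens into the language model's embedding space."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vis_dim: int, lm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g., a domain-adapted ViT
        self.language_model = language_model  # e.g., a domain-adapted LLM
        self.projector = nn.Linear(vis_dim, lm_dim)  # trainable fusion layer

    def forward(self, image: torch.Tensor,
                question_embeds: torch.Tensor) -> torch.Tensor:
        # (B, N, vis_dim) visual tokens -> (B, N, lm_dim) pseudo-word tokens
        vis_tokens = self.projector(self.vision_encoder(image))
        # Prepend visual tokens to the question embeddings; assumes the
        # language model accepts input embeddings directly.
        inputs = torch.cat([vis_tokens, question_embeds], dim=1)
        return self.language_model(inputs)
```

A plausible reading of the three-stage schedule, under the same assumptions, is that an early alignment stage trains only the projector while later stages also update the LoRA parameters on the language model; in every stage the large pretrained backbones stay frozen, which is what makes the training parameter-efficient.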