Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Abstract
Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications such as visual question answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medicine. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. The model goes through three stages of parameter-efficient training on three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5%, and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.
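The fusion the abstract describes, a domain-adapted vision encoder connected to a domain-adapted language model through a trainable projection and updated with parameter-efficient methods, can be sketched roughly as below. This is a minimal illustration assuming LoRA-style low-rank adapters and a linear projector; the class names, dimensions, and freezing choices are hypothetical, not the paper's exact implementation.

```python
# Minimal sketch of parameter-efficient vision-language fusion.
# Assumptions: LoRA-style adapters, a linear projector, and encoders that
# return token embeddings; none of this is taken from the paper's code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so the
        self.scale = alpha / rank           # layer initially matches the base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class MedVQAFusion(nn.Module):
    """Fuses a (frozen) vision encoder with a (frozen) language model by
    projecting visual tokens into the language model's embedding space."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vis_dim: int, lm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g., a domain-adapted ViT
        self.language_model = language_model  # e.g., a domain-adapted LLM
        self.projector = nn.Linear(vis_dim, lm_dim)  # trainable fusion layer

    def forward(self, image: torch.Tensor,
                question_embeds: torch.Tensor) -> torch.Tensor:
        # (B, N, vis_dim) visual tokens -> (B, N, lm_dim) pseudo-word tokens
        vis_tokens = self.projector(self.vision_encoder(image))
        # Prepend visual tokens to the question embeddings; assumes the
        # language model accepts input embeddings directly.
        inputs = torch.cat([vis_tokens, question_embeds], dim=1)
        return self.language_model(inputs)
```

A plausible reading of the three-stage schedule, under the same assumptions, is that an early alignment stage trains only the projector while later stages also update the LoRA parameters on the language model; in every stage the large pretrained backbones stay frozen, which is what makes the training parameter-efficient.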