Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling