Computer Vision for Automated Medical Diagnosis
(CVAMD @ ICCV 2025)

October 19, Honolulu, Hawai'i

Location: Hawai'i Convention Center 314

Online Proceedings: https://openaccess.thecvf.com/ICCV2025_workshops/CVAMD

About The Workshop

Rapid advances in computer vision are revolutionizing many long-standing automated medical diagnosis tasks. Emerging trends—such as Large Language Models (LLMs), Foundation Models (FMs), advanced learning paradigms (e.g., un-/semi-/self-supervised learning), and considerations of fairness and generalization—remain underexplored for secure and reliable automated medical diagnosis. Distinctively, this workshop emphasizes integrating insights from clinicians and radiologists alongside technical discussions to better advance the field.

Resources

Useful links and materials for CVAMD 2025 participants

Call for Papers

We welcome three types of submissions for oral or poster presentation at the workshop:

  • 📄 Long Papers – Up to 8 pages (excluding references). Accepted long papers will be published in the ICCV 2025 Workshop Proceedings.
  • 📝 Extended Abstracts – Up to 4 pages (excluding references), non-archived. Great for sharing work-in-progress or previously published work.
  • 🌟 Highlights – A concise 1-page summary, non-archived. Perfect for showcasing recently accepted or notable papers and boosting visibility and engagement.

🧠 Topics of Interest

  • 🔥 Foundation Models (FMs) and Large Language Models (LLMs) in healthcare
  • 🤖 AI Agents for medical decision-making and workflow automation
  • 🧾 Interpretable & trustworthy AI for medical diagnosis
  • 📊 Predicting clinical outcomes from medical image analysis
  • 🔍 Multimodal biomedical image analysis
  • ⚖️ Fairness, robustness, and generalization in medical computer vision
  • 🧠 Embedding medical knowledge in vision systems
  • 🧾 Generating diagnostic reports from medical images
  • 🧭 Clinical reasoning-aware vision system design
  • 🧹 Learning robust representations from noisy annotations
  • 🩺 Advances in disease diagnosis and management with computer vision
  • 🧬 Medical anomaly and out-of-distribution prediction
  • 🖼️ Medical image registration and classification (MRI/CT/PET)
  • 📌 Organ and lesion segmentation/detection
  • 📈 Longitudinal studies with computer vision
  • ♾️ Lifelong and active learning in medical vision

🏆 Review & Selection: All submissions will be evaluated by the program committee based on relevance and quality. Oral and poster presentations will be selected accordingly.

📚 Note: Accepted long papers will be published alongside the ICCV 2025 proceedings.

📢 Submission Guidelines

  • 📘 Long Paper Track (4–8 pages, with Proceedings)

    Submit your original research (4–8 pages, excluding references) in the ICCV 2025 format. Accepted papers will be published in the official ICCV workshop proceedings.
    Please ensure your submission adheres to the ICCV 2025 Dual Submission Policy. Work must be sufficiently original and not under review elsewhere.

  • 📝 Extended Abstracts (2–4 pages, non-archived)

    Ideal for work-in-progress or previously published studies relevant to the workshop. Submissions must be 2–4 pages (excluding references).
    These abstracts will not be included in the official proceedings, making them suitable for showcasing ongoing work or gaining community feedback. Please verify the double submission policy of your target venue if you plan to submit elsewhere later.

  • 🌟 Highlights Track (1-page summary + external link)

    Showcase your recently accepted or notable work (e.g., from NeurIPS, ICLR, CVPR). Simply provide a concise 1-page summary with a link to the full paper.
    This track is lightly curated—no full review process—and is a great opportunity to increase visibility and spark discussion.

All papers can be submitted through OpenReview.

🏆 Awards

  • 🥇 Best Paper Award – One winner per track
  • 🎓 Best Student Paper Award – Recognizing exceptional work led by a student author, one per track
  • 🖼️ Best Poster Award – Honoring the most impactful poster presentation in each track

In addition, each track will feature a selection of oral and poster presentations. The number of slots for each format will be finalized and announced following the review process.

📅 Important Dates

  • Submission Opens: May 1
  • Submission Deadline: June 30, 23:59 (extended from June 21)
  • Notification of Acceptance: July 10, 23:59
  • Camera-Ready Deadline: August 16, 23:59

🕓 All times are in Anywhere on Earth (AoE).

📄 Format

Please use the official ICCV 2025 Submission Template to prepare your manuscript.

Submissions must be in PDF format and fully anonymized. By submitting a paper, the authors agree that at least one of them will present the work if accepted.

🧑‍⚖️ Reviewer Recruitment

We’re actively seeking reviewers to support the community. If you're interested, please sign up via this form.

Invited Speakers

Sheng Liu

Stanford University

Akshay Chaudhari

Stanford

Faisal Mahmood

Harvard

Lei Xing

Stanford

Daguang Xu

NVIDIA

Serena Yeung

Stanford

Adam Yala

UC Berkeley

James Zou

Stanford

Event Schedule

Welcome remarks and introduction

Invited Talk 1: Beyond Autopilot: Building the AI Copilot for Healthcare

Sheng Liu (Stanford)

Oral Session 1

  • [7] UD-Mamba: A pixel-level uncertainty-driven Mamba model for medical image segmentation
    Weiren Zhao, Feng Wang, Yanran Wang, Yutong Xie, Qi Wu, Yuyin Zhou
  • [20] Advancing Prognosis Prediction Using Spatial Omics-Enriched Histopathology
    Tianyi Wang, Ruibang Luo, Zhenqin Wu
  • [21] PMC-Vid: A Large-Scale Biomedical Video Captioning Dataset
    Yosuke Yamagishi, Kuniaki Saito, Atsushi Hashimoto, Yoshitaka Ushiku

Poster session I and coffee break

Location: Exhall II; 211–246
Presented Posters • Paper IDs ≤ 48

Invited Talk 2: Pathways for Radiology Foundation Models To Enter the Clinic

Akshay Chaudhari (Stanford)

Invited Talk 3: Multimodal, Generative and Agentic AI for Pathology

Faisal Mahmood (Harvard)

Invited Talk 4: Foundations and Applications of AI Foundation Models

Lei Xing (Stanford)

Lunch break

Oral Session 2

  • [32] MK-UNet: Multi-kernel Lightweight CNN for Medical Image Segmentation
    Md Mostafijur Rahman, Radu Marculescu
  • [35] A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in China
    Yan-Jie Zhou, Yujian Hu, Zhengyao Ding, Le Lu, Minfeng Xu, Hongkun Zhang
  • [37] VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis
    Andrew Hoopes, Victor Ion Butoi, John Guttag, Adrian V Dalca
  • [39] Comparison of Digital Histology AI Models with Low-Dimensional Genomic and Clinical Models in Survival Modeling for Prostate Cancer
    Aidan McLoughlin, Ho Yin HO, Xin Zhao, Alexander Karl Hakanasson, Alireza Moradi, Qi Joslove Xu, Yang Liu

Invited Talk 5: Early Cancer Detection by Computed Tomography and Artificial Intelligence

Zongwei Zhou (JHU)

Invited Talk 6: Enabling Medical VLMs to Think Like Doctors: Integrating Domain Models and Clinical Reasoning

Daguang Xu (NVIDIA)
Abstract: Recent advances in vision-language models (VLMs) have shown great promise for automated medical diagnosis, yet their reasoning capabilities and reliability remain limited compared to human experts. In this talk, I will present strategies to improve the performance and interpretability of medical VLMs by integrating domain-specific knowledge and doctor-like reasoning. First, we demonstrate that medical VLMs can achieve higher accuracy by leveraging existing healthcare models—such as classification, segmentation, and detection networks—during the learning process. Second, we introduce a two-step training paradigm to align VLM reasoning with clinical practice: (1) supervised fine-tuning with chain-of-thought (CoT) annotations from physicians, and (2) reinforcement learning with Q&A datasets using answer-only supervision. To reduce annotation costs, we generate pseudo-CoT from radiology reports using large language models, enabling scalable training. Our final system not only achieves improved diagnostic accuracy but also provides interpretable, step-by-step reasoning that mimics the decision-making process of human doctors. This work highlights a pathway towards safer, more trustworthy AI systems in medical imaging and healthcare applications.

Invited Talk 7: Multimodal Generative Models for Science

Serena Yeung (Stanford)

Oral Session 3

  • [44] Latent Gene Diffusion for Spatial Transcriptomics Completion
    Paula Cárdenas, Leonardo Manrique, Daniela Vega, Daniela Ruiz, Pablo Arbelaez
  • [49] Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
    Nazanin Moradinasab, Saurav Sengupta, Jiebei Liu, Sana Syed, Donald E. Brown
  • [72] A Dynamic Agent Framework for Large Language Model Reasoning for Medical and Visual Question Answering
    Ziyan Xiao, Ruiyang Zhang, Yushi Feng, Lingting Zhu, Liang Peng, Lequan Yu

Poster session II and coffee break

Location: Exhall II; 211–246
Presented Posters • Paper IDs > 48

Invited Talk 8: AI for Personalized Cancer Care

Adam Yala (UC Berkeley)
Abstract: Early detection significantly improves outcomes across many cancers, motivating major investments in population-wide screening programs, such as low-dose CT for lung cancer. To make screening more effective, we must simultaneously improve early detection for patients who will develop cancer while minimizing the harms of overscreening. Advancing this Pareto frontier requires progress across three fronts: (1) accurately predicting patient outcomes from all available data, (2) designing intervention strategies tailored to risk, and (3) evaluating and translating these strategies into clinical practice. In this talk, I will present ongoing work across all three areas, driven by the goal of using every available bit of patient data to personalize care.

Invited Talk 9: Generative Multiagent Systems for Advancing Scientific Research

James Zou (Stanford)

Oral Session 4

  • [73] Automated Assessment of Aesthetic Outcomes in Facial Plastic Surgery
    Pegah Varghaei, Kiran Abraham-Aggarwal, Manoj T. Abraham, Arun Ross
  • [80] MedBLINK: Probing Visual Perception and Trustworthiness in Multimodal Language Models for Medicine
    Mahtab Bigverdi, Wisdom Oluchi Ikezogwo, Kevin Minghan Zhang, Hyewon Jeong, MingYu Lu, Sungjae Cho, Linda Shapiro, Ranjay Krishna
  • [81] RadAgent: an agentic system for automatic radiotherapy treatment planning
    Sheng Liu, Siqi Wang, James Zou, Lei Xing
  • [83] Memory-Guided Personalization for Physician-Specific Diagnostic Inference
    Jong-hyuk Ahn, Seo-Yeon Choi, Kyungsu Lee

Award Ceremony and Closing Remarks

Organizers

This workshop is organized by

Fuying Wang

University of Hong Kong

Sheng Liu

Stanford

Qingyue Wei

Stanford

Yi Lin

Cornell

Lequan Yu

University of Hong Kong

Angelica Aviles-Rivero

Tsinghua University

Tingying Peng

Helmholtz AI

Yifan Peng

Weill Cornell Medicine

Atlas Wang

UT Austin

Supporting Organization

This workshop is sponsored by