Computer Vision for Automated Medical Diagnosis
(CVAMD @ ICCV 2025)

October 19, Honolulu, Hawai'i

Location: Hawai'i Convention Center 314

Online Proceedings: https://openaccess.thecvf.com/ICCV2025_workshops/CVAMD

About The Workshop

Rapid advances in computer vision are revolutionizing many long-standing automated medical diagnosis tasks. Emerging trends—such as Large Language Models (LLMs), Foundation Models (FMs), advanced learning paradigms (e.g., un-/semi-/self-supervised learning), and considerations of fairness and generalization—remain underexplored for secure and reliable automated medical diagnosis. Distinctively, this workshop emphasizes integrating insights from clinicians and radiologists alongside technical discussions to better advance the field.

Resources

Useful links and materials for CVAMD 2025 participants

Call for Papers

We welcome three types of submissions for oral or poster presentation at the workshop:

  • 📄 Long Papers – Up to 8 pages (excluding references). Accepted long papers will be published in the ICCV 2025 Workshop Proceedings.
  • 📝 Extended Abstracts – Up to 4 pages (excluding references), non-archived. Great for sharing work-in-progress or previously published work.
  • 🌟 Highlights – A concise 1-page summary, non-archived. Perfect for showcasing recently accepted or notable papers and boosting visibility and engagement.

🧠 Topics of Interest

  • 🔥 Foundation Models (FMs) and Large Language Models (LLMs) in healthcare
  • 🤖 AI Agents for medical decision-making and workflow automation
  • 🧾 Interpretable & trustworthy AI for medical diagnosis
  • 📊 Predicting clinical outcomes from medical image analysis
  • 🔍 Multimodal biomedical image analysis
  • ⚖️ Fairness, robustness, and generalization in medical computer vision
  • 🧠 Embedding medical knowledge in vision systems
  • 🧾 Generating diagnostic reports from medical images
  • 🧭 Clinical reasoning-aware vision system design
  • 🧹 Learning robust representations from noisy annotations
  • 🩺 Advances in disease diagnosis and management with computer vision
  • 🧬 Medical anomaly and out-of-distribution prediction
  • 🖼️ Medical image registration and classification (MRI/CT/PET)
  • 📌 Organ and lesion segmentation/detection
  • 📈 Longitudinal studies with computer vision
  • ♾️ Lifelong and active learning in medical vision

🏆 Review & Selection: All submissions will be evaluated by the program committee based on relevance and quality. Oral and poster presentations will be selected accordingly.

📚 Note: Accepted long papers will be published alongside the ICCV 2025 proceedings.

📢 Submission Guidelines

  • 📘 Long Paper Track (4–8 pages, with Proceedings)

    Submit your original research (4–8 pages, excluding references) in the ICCV 2025 format. Accepted papers will be published in the official ICCV workshop proceedings.
    Please ensure your submission adheres to the ICCV 2025 Dual Submission Policy. Work must be sufficiently original and not under review elsewhere.

  • 📝 Extended Abstracts (2–4 pages, non-archived)

    Ideal for work-in-progress or previously published studies relevant to the workshop. Submissions must be 2–4 pages (excluding references).
    These abstracts will not be included in the official proceedings, making them suitable for showcasing ongoing work or gaining community feedback. Please verify the double submission policy of your target venue if you plan to submit elsewhere later.

  • 🌟 Highlights Track (1-page summary + external link)

    Showcase your recently accepted or notable work (e.g., from NeurIPS, ICLR, CVPR). Simply provide a concise 1-page summary with a link to the full paper.
    This track is lightly curated—no full review process—and is a great opportunity to increase visibility and spark discussion.

All papers can be submitted through OpenReview.

🏆 Awards

  • 🥇 Best Paper Award – One winner per track
  • 🎓 Best Student Paper Award – Recognizing exceptional work led by a student author, one per track
  • 🖼️ Best Poster Award – Honoring the most impactful poster presentation in each track

In addition, each track will feature a selection of oral and poster presentations. The number of slots for each format will be finalized and announced following the review process.

📅 Important Dates

  • Submission Opens: May 1
  • Submission Deadline: June 30, 23:59 (extended from June 21)
  • Notification of Acceptance: July 10, 23:59
  • Camera-Ready Deadline: August 16, 23:59

🕓 All times are in Anywhere on Earth (AoE).

📄 Format

Please use the official ICCV 2025 Submission Template to prepare your manuscript.

Submissions must be in PDF format and fully anonymized. By submitting a paper, the authors agree that at least one of them will present the work if accepted.

🧑‍⚖️ Reviewer Recruitment

We’re actively seeking reviewers to support the community. If you're interested, please sign up via this form.

Invited Speakers

Sheng Liu

Stanford University

Akshay Chaudhari

Stanford

Faisal Mahmood

Harvard

Lei Xing

Stanford

Daguang Xu

NVIDIA

Serena Yeung

Stanford

Adam Yala

UC Berkeley

James Zou

Stanford

Event Schedule

Welcome remarks and introduction

Invited Talk 1: Beyond Autopilot: Building the AI Copilot for Healthcare

Sheng Liu (Stanford)

Oral Session 1

  • [7] UD-Mamba: A pixel-level uncertainty-driven Mamba model for medical image segmentation
    Weiren Zhao, Feng Wang, Yanran Wang, Yutong Xie, Qi Wu, Yuyin Zhou
  • [20] Advancing Prognosis Prediction Using Spatial Omics-Enriched Histopathology
    Tianyi Wang, Ruibang Luo, Zhenqin Wu
  • [21] PMC-Vid: A Large-Scale Biomedical Video Captioning Dataset
    Yosuke Yamagishi, Kuniaki Saito, Atsushi Hashimoto, Yoshitaka Ushiku

Poster session I and coffee break

Location: Exhall II; 211–246
Presented Posters • Paper IDs ≤ 48

Invited Talk 2: Pathways for Radiology Foundation Models To Enter the Clinic

Akshay Chaudhari (Stanford)

Invited Talk 3: Multimodal, Generative and Agentic AI for Pathology

Faisal Mahmood (Harvard)

Invited Talk 4: Foundations and Applications of AI Foundation Models

Lei Xing (Stanford)

Lunch break

Oral Session 2

  • [32] MK-UNet: Multi-kernel Lightweight CNN for Medical Image Segmentation
    Md Mostafijur Rahman, Radu Marculescu
  • [35] A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in China
    Yan-Jie Zhou, Yujian Hu, Zhengyao Ding, Le Lu, Minfeng Xu, Hongkun Zhang
  • [37] VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis
    Andrew Hoopes, Victor Ion Butoi, John Guttag, Adrian V Dalca
  • [39] Comparison of Digital Histology AI Models with Low-Dimensional Genomic and Clinical Models in Survival Modeling for Prostate Cancer
    Aidan McLoughlin, Ho Yin HO, Xin Zhao, Alexander Karl Hakanasson, Alireza Moradi, Qi Joslove Xu, Yang Liu

Invited Talk 5: Early Cancer Detection by Computed Tomography and Artificial Intelligence

Zongwei Zhou (JHU)

Invited Talk 6: Enabling Medical VLMs to Think Like Doctors: Integrating Domain Models and Clinical Reasoning

Daguang Xu (NVIDIA)
Abstract: Recent advances in vision-language models (VLMs) have shown great promise for automated medical diagnosis, yet their reasoning capabilities and reliability remain limited compared to human experts. In this talk, I will present strategies to improve the performance and interpretability of medical VLMs by integrating domain-specific knowledge and doctor-like reasoning. First, we demonstrate that medical VLMs can achieve higher accuracy by leveraging existing healthcare models—such as classification, segmentation, and detection networks—during the learning process. Second, we introduce a two-step training paradigm to align VLM reasoning with clinical practice: (1) supervised fine-tuning with chain-of-thought (CoT) annotations from physicians, and (2) reinforcement learning with Q&A datasets using answer-only supervision. To reduce annotation costs, we generate pseudo-CoT from radiology reports using large language models, enabling scalable training. Our final system not only achieves improved diagnostic accuracy but also provides interpretable, step-by-step reasoning that mimics the decision-making process of human doctors. This work highlights a pathway towards safer, more trustworthy AI systems in medical imaging and healthcare applications.

Invited Talk 7: Multimodal Generative Models for Science

Serena Yeung (Stanford)

Oral Session 3

  • [44] Latent Gene Diffusion for Spatial Transcriptomics Completion
    Paula Cárdenas, Leonardo Manrique, Daniela Vega, Daniela Ruiz, Pablo Arbelaez
  • [49] Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
    Nazanin Moradinasab, Saurav Sengupta, Jiebei Liu, Sana Syed, Donald E. Brown
  • [72] A Dynamic Agent Framework for Large Language Model Reasoning for Medical and Visual Question Answering
    Ziyan Xiao, Ruiyang Zhang, Yushi Feng, Lingting Zhu, Liang Peng, Lequan Yu

Poster session II and coffee break

Location: Exhall II; 211–246
Presented Posters • Paper IDs > 48

Invited Talk 8: AI for Personalized Cancer Care

Adam Yala (UC Berkeley)
Abstract: Early detection significantly improves outcomes across many cancers, motivating major investments in population-wide screening programs, such as low-dose CT for lung cancer. To make screening more effective, we must simultaneously improve early detection for patients who will develop cancer while minimizing the harms of overscreening. Advancing this Pareto frontier requires progress across three fronts: (1) accurately predicting patient outcomes from all available data, (2) designing intervention strategies tailored to risk, and (3) evaluating and translating these strategies into clinical practice. In this talk, I will present ongoing work across all three areas, driven by the goal of using every available bit of patient data to personalize care.

Invited Talk 9: Generative Multiagent Systems for Advancing Scientific Research

James Zou (Stanford)

Oral Session 4

  • [73] Automated Assessment of Aesthetic Outcomes in Facial Plastic Surgery
    Pegah Varghaei, Kiran Abraham-Aggarwal, Manoj T. Abraham, Arun Ross
  • [80] MedBLINK: Probing Visual Perception and Trustworthiness in Multimodal Language Models for Medicine
    Mahtab Bigverdi, Wisdom Oluchi Ikezogwo, Kevin Minghan Zhang, Hyewon Jeong, MingYu Lu, Sungjae Cho, Linda Shapiro, Ranjay Krishna
  • [81] RadAgent: an agentic system for automatic radiotherapy treatment planning
    Sheng Liu, Siqi Wang, James Zou, Lei Xing
  • [83] Memory-Guided Personalization for Physician-Specific Diagnostic Inference
    Jong-hyuk Ahn, Seo-Yeon Choi, Kyungsu Lee

Award Ceremony and Closing Remarks

Organizers

This workshop is organized by

Fuying Wang

University of Hong Kong

Sheng Liu

Stanford

Qingyue Wei

Stanford

Yi Lin

Cornell

Lequan Yu

University of Hong Kong

Angelica Aviles-Rivero

Tsinghua University

Tingying Peng

Helmholtz AI

Yifan Peng

Weill Cornell Medicine

Atlas Wang

UT Austin

Supporting Organization

This workshop is sponsored by