VLDB 2026 (pending...)

Med-CRAFT

Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal

A novel pipeline that positions knowledge graphs as the central engine for automated generation of medical VideoQA datasets, bridging the gap between scalability and interpretability.

Abstract

Medical Video Question Answering (VideoQA) faces a dichotomy: manual annotation ensures structural rigor but lacks scalability, while end-to-end automatic generation offers scalability but sacrifices reasoning control. We propose Med-CRAFT, a pipeline that generates M3-Med-Auto, a large-scale dataset grounded in explicit visual evidence. This approach preserves the structural interpretability of manual methods while achieving the scalability of automatic ones.

Key Contributions

Automated Pipeline

Full automation from entity extraction to question generation.

Knowledge Graph Centric

Uses KGs as the unifying abstraction for controllable reasoning.

Multi-Hop Reasoning

Generates complex queries with explicit reasoning traces.

M3-Med-Auto Dataset

A large-scale benchmark grounded in visual evidence.

Methodology

1. Entity Extraction

Extracts entities from multi-modal signals (ASR, OCR) and grounds them visually.

2. KG Construction

Builds a Cross-modal Knowledge Graph capturing spatial, temporal, and logical relations.

3. Question Generation

Traverses the graph to generate multi-hop questions with precise temporal answers.
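The three stages above can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, relation names, time spans, and the helpers `build_graph`, `enumerate_paths`, and `make_question` are all hypothetical, chosen only to show how a fixed-length graph traversal could yield a question, an explicit reasoning trace, and a temporal answer.

```python
# Toy stand-in for the output of entity extraction and KG construction.
# All entities, relations, and time spans below are invented for illustration.
TRIPLES = [
    ("scalpel", "is used in", "incision"),
    ("incision", "is followed by", "suturing"),
    ("incision", "is shown with", "retractor"),
]

# Hypothetical visual grounding: entity -> (start, end) in seconds.
SPANS = {
    "scalpel": (12.0, 30.0),
    "incision": (30.0, 95.0),
    "suturing": (95.0, 140.0),
    "retractor": (40.0, 60.0),
}


def build_graph(triples):
    """Index (head, relation, tail) triples as an adjacency list."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def enumerate_paths(graph, start, hops):
    """Return every simple path of exactly `hops` edges that begins at `start`."""
    paths = []

    def walk(node, path, visited):
        if len(path) == hops:
            paths.append(list(path))
            return
        for relation, nxt in graph.get(node, []):
            if nxt in visited:  # skip cycles so each hop reaches a new entity
                continue
            path.append((node, relation, nxt))
            walk(nxt, path, visited | {nxt})
            path.pop()

    walk(start, [], {start})
    return paths


def make_question(path, spans):
    """Turn one path into a question, its reasoning trace, and a time span."""
    start = path[0][0]
    answer_entity = path[-1][2]
    chain = ", then ".join(f"'{relation}'" for _, relation, _ in path)
    question = (
        f"Starting from '{start}', follow the relations {chain}. "
        "When is the resulting entity shown?"
    )
    return {
        "question": question,
        "reasoning_trace": path,              # the explicit multi-hop trace
        "answer_span": spans[answer_entity],  # temporal answer in seconds
    }


graph = build_graph(TRIPLES)
for path in enumerate_paths(graph, "scalpel", hops=2):
    print(make_question(path, SPANS))
```

Because each question is derived from a concrete path, the number of hops can be set directly through `hops`, and the stored trace records which edges justify the answer.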

M3-Med-Auto Dataset

Dataset Statistics

Large-scale VideoQA pairs
Multi-hop reasoning paths
Precise temporal grounding
Download Dataset

Licensed under CC BY-NC-ND 4.0

Video Download Policy

Due to YouTube's Terms of Service and user privacy agreements, we strictly do not provide direct download links for the raw video files. The dataset contains YouTube Video IDs and timestamps. Researchers must download the videos independently using the provided tools/scripts.
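Since the dataset ships only video IDs and timestamps, a record has to be resolved to a location on YouTube before the segment can be inspected. The sketch below builds a standard watch URL that starts playback at a given offset. The record layout (`video_id`, `start`, `end`) and the ID value are placeholders, not the dataset's actual schema, and the helper `watch_url` is hypothetical; it does not download anything.

```python
from urllib.parse import urlencode


def watch_url(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that begins playback at `start_seconds`."""
    # YouTube's `t` parameter takes whole seconds, so the offset is truncated.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


# Placeholder record; field names and values are assumptions for illustration.
record = {"video_id": "abc123XYZ_0", "start": 95.7, "end": 140.0}
print(watch_url(record["video_id"], record["start"]))
```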

Authors

Shenxi Liu · Kan Li · Mingyang Zhao · Yuhang Tian · Shoujun Zhou · Bin Li

Beijing Institute of Technology · The Hong Kong Polytechnic University · Shenzhen Institute of Advanced Technology, CAS

Citation

@article{medcraft2026,
  title={Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal},
  author={Liu, Shenxi and Li, Kan and Zhao, Mingyang and Tian, Yuhang and Zhou, Shoujun and Li, Bin},
  journal={PVLDB},
  volume={14},
  number={1},
  year={2026}
}