BDIViz

BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation

📑 Table of Contents

Introduction
Features
Getting Started
Demo
- Video Demo
- Live Demo
Documentation & User Manual
Publication
Releases & New Features
Technical Details

Introduction

BDIViz is an interactive web-based visualization tool developed as part of the ARPA-H ASKEM/BDF project to support schema matching and value mapping tasks in biomedical data integration. It helps researchers align their raw datasets with standardized formats such as the Genomic Data Commons (GDC) and Proteomic Data Commons (PDC).

The tool is visualization-driven and keeps an expert in the loop: heatmaps, explanations, and value comparisons give users a visual interface for reviewing and refining how raw biomedical datasets align with standardized data schemas.

BDIViz is model-agnostic: it can be used with any schema matching model. It is designed to work with BDI-Kit, a Python library for schema matching and value mapping. BDI-Kit includes supervised and unsupervised schema matching algorithms, as well as tools for data preprocessing and feature extraction.

Open Source: BDIViz is MIT-licensed and available on GitHub.

Features

BDIViz is designed to be intuitive, powerful, and research-ready. Explore the key features that make BDIViz an essential tool for biomedical data integration:

🔍 Interactive Heatmap: Visualize schema matching candidates across all columns using an intuitive heatmap interface. Color-coded cues help quickly identify strong and weak matches.
📊 Value Comparison: Easily compare unique values between source and target columns using fuzzy matching and histograms to validate semantic alignment.
🤖 LLM Agent Explanations: Get contextual explanations for candidate matches powered by large language models. Understand the rationale behind every recommendation.
⏪ Timeline & Undo/Redo: Track every decision with a timeline view and easily reverse or reapply actions using undo/redo support.
📤 Export & Integration: Export your final mappings as CSV or JSON to use in downstream tools, or integrate with other pipelines such as Beaker or BDI-Kit.
🎯 Control Panel: Adjust similarity threshold and navigate source columns with an intuitive control interface.
🧬 Built for Biomedicine: Tailored for biomedical researchers, BDIViz supports common data commons like GDC and PDC, and was co-designed with domain experts through the ARPA-H ASKEM project.
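
As a rough illustration of the value comparison idea, the sketch below pairs each unique source value with its closest target value using Python's standard difflib module. The helper name best_value_matches, the sample values, and the 0.6 cutoff are assumptions made for this example; the fuzzy matcher that BDIViz actually uses may work differently.

```python
from difflib import SequenceMatcher


def best_value_matches(source_values, target_values, cutoff=0.6):
    """Pair each source value with its most similar target value.

    Hypothetical helper, not part of BDIViz. Similarity is the
    case-insensitive difflib ratio in [0, 1]; when the best score is
    below `cutoff`, the target is reported as None.
    """
    matches = {}
    for src in source_values:
        best_target, best_score = None, 0.0
        for tgt in target_values:
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best_target, best_score = tgt, score
        if best_score < cutoff:
            best_target = None  # too dissimilar: leave for expert review
        matches[src] = (best_target, round(best_score, 2))
    return matches


if __name__ == "__main__":
    source = ["M", "Female", "unk"]
    target = ["male", "female", "unknown"]
    for value, (match, score) in best_value_matches(source, target).items():
        print(f"{value!r} -> {match!r} (similarity {score})")
```

Short codes such as "M" tend to score poorly under pure string similarity, which is one reason a visual side-by-side comparison and an expert decision remain useful.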

Getting Started

Installation

First, install the required dependencies:

npm i .

Running Locally

To run locally with Gemini-2.5-flash:

npm run build && npm run start

To run locally with GPT-4.1-mini:

npm run build && LLM_PROVIDER=openai npm run start

Docker

Pre-built Docker images are available for both AMD64 and ARM64 architectures. Pull the image that matches your machine:

docker pull edenwu/bdi-viz-react:amd64
# or
docker pull edenwu/bdi-viz-react:arm64

Demo

Video Demo

Live Demo

Try BDIViz online: https://bdiviz.users.hsrn.nyu.edu/dashboard/

Documentation & User Manual

Complete user manual with detailed guides, tutorials, and API documentation: https://vida-nyu.github.io/bdi-viz-manual/

The user manual includes:

Getting started guide
Feature walkthroughs
Best practices for schema matching
Integration examples
Troubleshooting tips

Publication

BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation
IEEE VIS 2025

Releases & New Features

v0.0.3 - MITRE ARPA-H Release (Latest)

New Features & Improvements:

Session Management: Users can now create and switch between multiple sessions within the web application, enabling better task organization for different schema-matching or value-mapping projects.
Interactive Filtering: Filtering now supports direct interaction with the source and target axes, so users can filter by category or by specific node for more precise exploration.
Canvas Interaction: Added the ability to click on an empty area of the heatmap canvas to create new candidate nodes between selected source-target attributes.
Collaborative Comments: Users can now add and view comments on individual nodes, supporting shared curation and discussion between collaborators.
Agent Streaming and API Enhancements:
- Added a new /api/agent/explore GET endpoint that streams agent updates using Server-Sent Events (SSE), providing real-time feedback for thoughts, tool calls, and results.
- Implemented event-streaming logic with a queue and threading system to handle agent events efficiently, normalize payloads for SSE, and ensure backward compatibility with older agent response formats.
Bug Fixes & Stability: Various fixes and optimizations across the frontend and backend to improve reliability, responsiveness, and performance.
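
The queue-and-thread streaming pattern described in the agent streaming notes can be sketched as follows. This is a minimal illustration using only the Python standard library, not the BDIViz backend code: the function names, the event names (thought, tool_call, result), and the payload fields are all assumptions. A worker thread pushes events onto a queue, and a generator drains the queue and serializes each event in the Server-Sent Events wire format.

```python
import json
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def format_sse(event, data):
    """Serialize one event in Server-Sent Events wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_agent(events):
    """Stand-in for the agent: emits a thought, a tool call, and a result."""
    events.put(("thought", {"text": "Comparing column names"}))
    events.put(("tool_call", {"tool": "value_overlap"}))
    events.put(("result", {"status": "done"}))
    events.put(_DONE)


def stream_agent_events():
    """Yield SSE-formatted strings as the worker thread produces events."""
    events = queue.Queue()
    worker = threading.Thread(target=run_agent, args=(events,), daemon=True)
    worker.start()
    while True:
        item = events.get()  # blocks until the agent emits something
        if item is _DONE:
            break
        event, data = item
        yield format_sse(event, data)
    worker.join()


if __name__ == "__main__":
    for chunk in stream_agent_events():
        print(chunk, end="")
```

In a web framework, a generator like stream_agent_events would typically be returned as the response body with the text/event-stream content type, so that a browser EventSource can consume it.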

View Release on GitHub

v0.0.2 - CRA MITRE Release

Feedback Addressed:

Provide a global explanation in the interface for the criteria used to automatically accept attributes (easy matches), potentially through hover tooltips.
Restructure the attribute dropdown menu to better group and differentiate matched, selected, and unmatched attributes.
Explore ways to provide more detailed information alongside the heat map visualization, such as a detailed view or additional labels.
Add labels or information tags to the drag-and-drop area to indicate the acceptable file types and formats.
Retouch UpSet Plot.

View Release on GitHub

v0.0.1 - VIS 2025

Initial release for IEEE VIS 2025 publication, featuring core functionality including:

Schema matching with multiple algorithms
LLM-powered validation and explanations
Interactive heatmap visualization
Value comparison tables
Export capabilities

View Release on GitHub

View all releases: https://github.com/VIDA-NYU/bdi-viz/releases

Technical Details

BDIViz combines multiple schema matching methods with LLM-based validation to improve biomedical schema matching accuracy. The system employs an ensemble approach that:

Combines multiple matching algorithms for robust results
Uses LLM-based validation to reduce false positives
Provides interactive visualizations for expert curation
Enables efficient workflow through coordinated views and heatmaps

The system is designed to be method-agnostic, allowing integration with various schema matching algorithms and adaptation to application-specific needs.
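
To make the ensemble idea concrete, here is a minimal sketch under stated assumptions. The text above does not say how BDIViz combines matcher outputs, so simple averaging is assumed here, and the validate callable is a hypothetical stand-in for the LLM check. The function name, threshold, and sample scores are likewise illustrative only.

```python
from statistics import mean


def ensemble_candidates(matcher_scores, threshold=0.5, validate=None):
    """Average per-matcher scores and keep candidates that pass validation.

    matcher_scores: list of dicts mapping (source, target) -> score in
    [0, 1], one dict per matching algorithm. A matcher that omits a pair
    contributes 0 for that pair.
    validate: optional callable (source, target) -> bool standing in for
    an LLM-based check. Hypothetical, not BDIViz's interface.
    """
    pairs = set().union(*matcher_scores)
    combined = {
        pair: mean(scores.get(pair, 0.0) for scores in matcher_scores)
        for pair in pairs
    }
    kept = [
        (pair, score)
        for pair, score in combined.items()
        if score >= threshold and (validate is None or validate(*pair))
    ]
    # Strongest candidates first, as a ranked list for expert review.
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    name_based = {("sex", "gender"): 0.5, ("sex", "race"): 0.25}
    value_based = {("sex", "gender"): 1.0, ("sex", "race"): 0.0}
    for pair, score in ensemble_candidates([name_based, value_based]):
        print(pair, score)
```

Keeping the validator as a plain callable mirrors the method-agnostic design: any matcher that produces scored pairs, and any check that accepts or rejects a pair, can be swapped in.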
