AMIA NLP Working Group · Dataset Registry

Clinical & Biomedical NLP · FAIR-Aligned Catalog · 2026

Find the right dataset for clinical NLP research

A standardized, community-maintained registry of clinical and biomedical NLP datasets curated by the AMIA NLP Working Group. Each entry is documented using a 33-variable FAIR-aligned metadata framework.

60 Datasets
33 Metadata Variables
10 NLP Tasks
Last Updated: 2026

FAIR-aligned metadata · Updated 2025

Dataset | Primary Task | Setting | Language | Access | Year | PID / DOI