@HealthNLPorg

HealthNLP

HealthNLP.org is home to natural language processing (NLP) projects related to health, such as mining the clinical narrative of the electronic medical record. We are dedicated to making state-of-the-art clinical NLP projects publicly available to those on the cutting edge of biomedical research and AI.

At this time HealthNLP has over 20 public repositories. Several repositories contain tools that are reusable in any project with the relevant needs, such as the HNLP-TimeNorm tool, which creates normalized ISO 8601 codes from raw-text temporal expressions. Other repositories contain code and resources developed for a single purpose with specific data, but they can be adapted for other sites and data. For instance, the LGT-SACT project, which extracts timelines for Systemic Anti-Cancer Therapies, can be given a different tokenizer and a different LLM prompt to generate timelines for other event types.
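To make the idea of normalizing a temporal expression to ISO 8601 concrete, here is a minimal Python sketch. It is a toy, rule-based illustration only and is not the HNLP-TimeNorm API: the function name `normalize_expression`, the anchor-date parameter, and the handful of supported phrases are all assumptions made for this example.

```python
from datetime import date, timedelta


def normalize_expression(text: str, anchor: date) -> str:
    """Map a few relative time expressions to ISO 8601 strings.

    Hypothetical helper for illustration; it does not reflect how
    HNLP-TimeNorm is implemented or called.
    """
    lowered = text.strip().lower()

    # Day-level expressions resolve to a calendar date (YYYY-MM-DD).
    day_offsets = {"today": 0, "yesterday": -1, "tomorrow": 1}
    if lowered in day_offsets:
        return (anchor + timedelta(days=day_offsets[lowered])).isoformat()

    # Week-level expressions resolve to an ISO week (YYYY-Www).
    if lowered == "last week":
        iso_year, iso_week, _ = (anchor - timedelta(weeks=1)).isocalendar()
        return f"{iso_year}-W{iso_week:02d}"

    raise ValueError(f"unsupported expression: {text!r}")


# Resolve expressions relative to an assumed document date.
note_date = date(2024, 3, 15)
print(normalize_expression("yesterday", note_date))  # 2024-03-14
print(normalize_expression("last week", note_date))  # 2024-W10
```

The anchor date matters because expressions such as "yesterday" carry no meaning on their own; in clinical text the note's creation date would be a natural choice, though that too is an assumption here.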

At a glance, the HealthNLP repositories include:

HNLP-TimeNorm : Provides models for finding natural language expressions of dates and times and converting them to a normalized form.

LGT-SACT : Extracts and normalizes temporal information from clinical notes using fine-tuned LLMs, specifically to build Systemic Anti-Cancer Therapy (SACT) timelines.

chemoTimelines Docker : Dockerizable source code for the baseline system for the Chemotherapy Treatment Timelines Extraction from the Clinical Narrative shared task.

chemoTimelines Eval : Evaluation code for ChemoTimelines 2025.

rt-ctae-eval : Evaluation and annotation adjudication tool for the ACS-CTAE Label Studio project, using lseval as a backend.

lseval : A basic version of the core functionality we use with anaforatools, adapted for Label Studio annotations.

radiotherapy_end2end : An end-to-end natural language processing system for automatically extracting radiotherapy events from clinical texts.

ctae_pre_annotation : cTAKES module for generating Label Studio pre-annotation JSON from clinical text for the CTAE project.

acs-lung-cns-eda : EDA/Computing note counts for ACS project lung cancer patients with at least one CNS adverse event.

acs-lung-cardiac-eda : EDA/Computing note counts for ACS project lung cancer patients with at least one cardiac event.

bwrobitterman_label_studio_setup : Repository for managing the Label Studio startup scripts etc.

rt-signature-docker : DeepPhe RT Signature Docker with fixes for running and Label Studio output.

Associated Projects

Some of our projects are modern upgrades to other public projects such as Apache cTAKES and DeepPhe.

Popular repositories

  1. chemoTimelinesEval

    Evaluation code for ChemoTimelines 2025

    Python 4 3

  2. chemoTimelinesBaselineSystem

    Dockerizable source code for the baseline system for the Chemotherapy Treatment Timelines Extraction from the Clinical Narrative shared task.

    Scala 2 1

  3. hnlp-timenorm

    Provides models for finding natural language expressions of dates and times and converting them to a normalized form.

    Scala 1

  4. ctae_pre_annotation

    cTAKES-based tool for converting data from the ACS CTAE project to Label Studio pre-annotation data

    Bluespec

  5. ctae_umls_terms

    Originally from Sean Finan's Jupyter notebook

    Python

  6. acs-lung-cardiac-eda

    EDA/Computing note counts for ACS project lung cancer patients with at least one cardiac event

    Python 1

