hms-dbmi/pic-sure-hpds

Latest commit: a2d1e52 · Mar 20, 2025

pic-sure-hpds

PIC-SURE-HPDS was built from the ground up to support biomedical informatics use cases without requiring massive clustering as datasets increase in scale. As a result, PIC-SURE-HPDS can manage arbitrarily large datasets with very little computing power.

For clinical data, each dataset is stored as two files: a metadata file and a data file. The metadata file contains the internal data dictionary, high-level dataset-specific information, and the file offsets of each variable's data within the data file. The data file contains data for three concepts: the patient index, the numerical index, and the categorical index. See "How to load phenotypic data into HPDS" for loading instructions.
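To make the metadata/data split concrete, here is a minimal sketch of offset-based lookup: a metadata map records where each variable's bytes start in a single data file, so one variable can be read without loading the rest. This is an illustration only and not the HPDS on-disk format; the class `OffsetLookupSketch`, the `Extent` record, the variable names, and the `id:value` payload strings are all hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetLookupSketch {

    /** Hypothetical metadata entry: where one variable's bytes live in the data file. */
    record Extent(long offset, int length) {}

    /** Writes each variable's payload back to back and records its extent as metadata. */
    static Map<String, Extent> write(Path dataFile, Map<String, String> variables) throws IOException {
        Map<String, Extent> metadata = new LinkedHashMap<>();
        try (RandomAccessFile out = new RandomAccessFile(dataFile.toFile(), "rw")) {
            out.setLength(0); // start from an empty data file
            for (Map.Entry<String, String> entry : variables.entrySet()) {
                byte[] payload = entry.getValue().getBytes(StandardCharsets.UTF_8);
                metadata.put(entry.getKey(), new Extent(out.getFilePointer(), payload.length));
                out.write(payload);
            }
        }
        return metadata;
    }

    /** Reads one variable by seeking to its recorded offset; other variables are never touched. */
    static String read(Path dataFile, Map<String, Extent> metadata, String variable) throws IOException {
        Extent extent = metadata.get(variable);
        if (extent == null) {
            throw new IllegalArgumentException("Unknown variable: " + variable);
        }
        byte[] buffer = new byte[extent.length()];
        try (RandomAccessFile in = new RandomAccessFile(dataFile.toFile(), "r")) {
            in.seek(extent.offset());
            in.readFully(buffer);
        }
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dataFile = Files.createTempFile("sketch-data", ".bin");
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("age", "1:42,2:57"); // made-up patientId:value pairs
        variables.put("sex", "1:F,2:M");
        Map<String, Extent> metadata = write(dataFile, variables);
        System.out.println(read(dataFile, metadata, "sex"));
        Files.delete(dataFile);
    }
}
```

The point of the design is that the cost of answering a query depends on the variables it touches, not on the total size of the data file.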

For genomic data, variants that are not represented in the database are not stored. Genomic sample data is stored separately from variant annotations in HPDS. Variant annotations are stored using the same numerical index and categorical index described above, indexing variant IDs instead of patient IDs. See "How to load genomic data into HPDS" for loading instructions.
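As a rough illustration of how the same two index shapes can be keyed by variant ID rather than patient ID, the sketch below builds a categorical index (value to IDs) and a numerical index (sorted value to IDs, for range lookups). It is an assumption-laden toy, not the HPDS implementation: the class names, the annotation values, and the variant ID strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class AnnotationIndexSketch {

    /** Categorical index: annotation value -> IDs that carry that value. */
    static final class CategoricalIndex {
        private final Map<String, Set<String>> idsByValue = new HashMap<>();

        void add(String id, String value) {
            idsByValue.computeIfAbsent(value, k -> new TreeSet<>()).add(id);
        }

        Set<String> withValue(String value) {
            return idsByValue.getOrDefault(value, Set.of());
        }
    }

    /** Numerical index: values kept sorted so a range query is a sub-map scan. */
    static final class NumericalIndex {
        private final NavigableMap<Double, Set<String>> idsByValue = new TreeMap<>();

        void add(String id, double value) {
            idsByValue.computeIfAbsent(value, k -> new TreeSet<>()).add(id);
        }

        /** Returns IDs whose value lies in [min, max]; requires min <= max. */
        Set<String> inRange(double min, double max) {
            Set<String> result = new TreeSet<>();
            idsByValue.subMap(min, true, max, true).values().forEach(result::addAll);
            return result;
        }
    }

    public static void main(String[] args) {
        // Hypothetical variant IDs and annotations; the keys could equally be patient IDs.
        CategoricalIndex consequence = new CategoricalIndex();
        consequence.add("chr1:100:A:G", "missense");
        consequence.add("chr1:250:C:T", "synonymous");

        NumericalIndex alleleFrequency = new NumericalIndex();
        alleleFrequency.add("chr1:100:A:G", 0.01);
        alleleFrequency.add("chr1:250:C:T", 0.20);

        System.out.println(consequence.withValue("missense"));
        System.out.println(alleleFrequency.inRange(0.0, 0.05));
    }
}
```

Because neither index cares what its IDs mean, the same structure can serve patient-keyed phenotype data and variant-keyed annotation data.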

Pre-requisites

  • Java 21
  • Before contributing code, please set up our git hook: cp code-formatting/pre-commit.sh .git/hooks/pre-commit
    • To skip formatting on a block of code, wrap it in spotless:off and spotless:on comments
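As an example of the formatting toggle described above, the sketch below wraps a hand-aligned array literal in spotless:off and spotless:on comments so the formatter leaves its layout alone. It assumes the repository's Spotless configuration honors these toggle comments, as the bullet implies; the class `FormattingToggleSketch` and its contents are hypothetical.

```java
public class FormattingToggleSketch {

    // spotless:off
    // Hand-aligned layout that an automatic formatter would otherwise reflow.
    static final int[][] IDENTITY = {
        {1, 0, 0},
        {0, 1, 0},
        {0, 0, 1},
    };
    // spotless:on

    /** Sums the main diagonal; code outside the toggle is formatted as usual. */
    static int trace() {
        int sum = 0;
        for (int i = 0; i < IDENTITY.length; i++) {
            sum += IDENTITY[i][i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Keeping the toggled region as small as possible means the rest of the file still benefits from the pre-commit formatting hook.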