Add Polygenic Risk Score Functionality #367

Open
aneilbaboo opened this issue Nov 29, 2018 · 0 comments

Overview

We're going to work with Sekar Kathiresan, who pioneered one of the most important new techniques in genetic risk analysis. It applies a simple function, essentially a weighted sum of risk-allele counts, to millions of variant calls to produce a single risk score for a disease area.
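To make the "simple function" concrete, here is a minimal sketch of a polygenic score as a weighted sum of effect-allele dosages. The variant IDs, weights, and dosages are made-up illustration values, and `polygenic_risk_score` is a hypothetical helper name; real weight tables would come from the published methods (e.g. LDpred-adjusted GWAS effect sizes) and cover millions of variants.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Return the weighted sum of effect-allele dosages.

    dosages: variant ID -> effect-allele dosage (0.0 to 2.0; imputed
             calls may be fractional).
    weights: variant ID -> per-allele weight from a risk table.
    Variants missing from `dosages` are skipped (an assumption made
    here for simplicity; a real pipeline might impute or flag them).
    """
    return sum(
        weight * dosages[variant_id]
        for variant_id, weight in weights.items()
        if variant_id in dosages
    )


# Hypothetical three-variant risk table and one user's dosages.
example_weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
example_dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0004": 0.4}

example_score = polygenic_risk_score(example_dosages, example_weights)
```

Only `rs0001` and `rs0002` appear in both inputs, so they are the only variants that contribute to `example_score`. A raw score like this is usually interpreted relative to a reference population distribution rather than on its own.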

For example, his lab reported using the method to evaluate cardiac risk based on 6.6M variants from imputed data sets: http://www.kathiresanlab.org/our-publications/genome-wide-polygenic-scores-for-common-diseases-identify-individuals-with-risk-equivalent-to-monogenic-mutations/

The same technique has been successfully applied to four other major diseases: atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

Literature

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

https://www.nature.com/articles/s41588-018-0183-z

PDF

Supplementary material: https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0183-z/MediaObjects/41588_2018_183_MOESM1_ESM.pdf

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Describes the LDpred algorithm

https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1

PDF

Supplemental - PDF

The Materials and Methods section shows where the data is:
https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1#secsectitle0160

Projecting the performance of risk score from GWAS studies

Describes the model-building algorithm
https://www.nature.com/articles/ng.2579

PDF

Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

The original polygenic risk score paper
https://www.researchgate.net/publication/232772602_Common_polygenic_variation_contributes_to_risk_of_schizophrenia_and_bipolar_disorder

PDF

A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease

The coronary artery disease GWAS that provides the summary statistics
https://www.researchgate.net/publication/281643470_A_comprehensive_1000_Genomes-based_genome-wide_association_meta-analysis_of_coronary_artery_disease

PDF

Extra data:
http://www.cardiogramplusc4d.org/data-downloads/

A worldwide survey of haplotype variation and linkage disequilibrium in the human genome

Widely cited paper from Jonathan Pritchard's lab on linkage disequilibrium across populations

https://web.stanford.edu/group/pritchardlab/publications/ConradEtAl06a.pdf

PDF

Criticism

Polygenic Risk Scores, a Biased Prediction

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0610-x

Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations

https://www.cell.com/ajhg/pdfExtended/S0002-9297(17)30107-6

PDF - paper plus supplemental materials

India

Our new focus on India means that we will develop a simpler product for the MVP. Instead of offering the two-sided capabilities, we'll offer a core set of reports. The initial thinking is a set of polygenic risk scores plus Ancestry that come with the initial purchase and do not have an author. This gets us off the hook of building out two-sided market functionality (including all the complexities of data transfer, authoring tools, payment management, communication tools, validation, ratings, etc.) in favor of a much simpler product.

Implementation Thoughts

This technique is not a great fit for our current architecture. However, because we don't have to offer it as a report-author capability, we can greatly simplify our work by running the risk score calculations at impute time. We will include the polygenic risk tables in the bioinformatics repo, run the calculations in the Python container, and store the computed scores for each user in a new table. We can then build a simple bespoke report for these scores using plain old React and the GraphQL API.
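As a rough sketch of the impute-time flow described above, the snippet below scores one user against several risk tables and writes the results to a new table. Everything named here is an assumption for illustration: the table name `polygenic_risk_scores`, its columns, the helper functions, and the use of an in-memory SQLite database as a stand-in for whatever database we actually use.

```python
import sqlite3
from typing import Mapping


def create_score_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) per-user score table if it is missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS polygenic_risk_scores (
            user_id TEXT NOT NULL,
            disease TEXT NOT NULL,
            score   REAL NOT NULL,
            PRIMARY KEY (user_id, disease)
        )
        """
    )


def compute_score(
    dosages: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Weighted sum of dosages over variants present in both mappings."""
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def store_scores(
    conn: sqlite3.Connection,
    user_id: str,
    dosages: Mapping[str, float],
    risk_tables: Mapping[str, Mapping[str, float]],
) -> None:
    """Compute one score per disease and upsert the rows for this user."""
    rows = [
        (user_id, disease, compute_score(dosages, weights))
        for disease, weights in risk_tables.items()
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO polygenic_risk_scores "
        "(user_id, disease, score) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Made-up risk tables and dosages, standing in for the real tables
# that would live in the bioinformatics repo.
demo_risk_tables = {
    "coronary_artery_disease": {"rs0001": 0.5, "rs0002": 0.25},
    "type_2_diabetes": {"rs0002": 1.0},
}
demo_dosages = {"rs0001": 2.0, "rs0002": 1.0}

demo_conn = sqlite3.connect(":memory:")
create_score_table(demo_conn)
store_scores(demo_conn, "user-123", demo_dosages, demo_risk_tables)
demo_rows = demo_conn.execute(
    "SELECT disease, score FROM polygenic_risk_scores "
    "WHERE user_id = ? ORDER BY disease",
    ("user-123",),
).fetchall()
```

Keying the table on `(user_id, disease)` means re-running imputation simply overwrites a user's earlier scores, and the report only needs a single lookup per user through the API.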
