PIVOT (Personalised Identification of driVer Oncogenes and Tumour suppressors) is a tool used to identify personalised tumour suppressor genes (TSGs) and oncogenes (OGs) using multi-omic data. Genes are labelled as TSG or OG for each patient.
If you use PIVOT in your work, please cite
Sudhakar M, Rengaswamy R and Raman K (2022) Multi-Omic Data Improve Prediction of Personalized Tumor Suppressors and Oncogenes. Frontiers in Genetics 13:854190. doi: 10.3389/fgene.2022.854190
PIVOT is the first supervised tool to predict personalised driver genes and label them based on functionality as TSG or OG. The best model is trained on multi-omic features with labels from Bailey et al. Our model predicts well-known driver genes as well as new driver genes that are frequently altered across samples. Predictions are made for genes that are mutated or altered by copy number variations. PIVOT predicts labels on individual samples and can be used on as few as a single patient. The features are independent of other samples, so PIVOT can be used in a clinical setting. PIVOT also identifies rare drivers altered in as few as one sample.
While our best model depends on multi-omic features, we also publish models built on SNV features alone and on RNA features alone, which can be used with SNV and RNA data respectively.
The TCGA data for BRCA, COAD, LGG and LUAD was downloaded from GDC. The preprocessed data can be found below:
The top-level directories contain code, data and output folders.
.
├── code # All the code for analysis
├── data
│ ├── domains # pre-processed domain files required for feature generation
│ │ └── pfam
│ ├── driver genes # List of driver genes for labels
│ │ ├── Bailey et al
│ │ ├── CGC
│ │ ├── CIViC
│ │ └── Martelotto et al
│ ├── miRNA # pre-processed miRNA files required for feature generation
│ ├── network # pre-processed network files required for feature generation
│ └── neutral.txt # List of neutral genes for labels
├── output
│ ├── GDC_BRCA # Results for cancer-type BRCA
│ │ ├── multiomic # Metrics, plots for all models built using multi-omic features
│ │ ├── predict # Predictions using PIVOT
│ │ ├── RNA # Metrics, plots for all models built using RNA features
│ │ └── SNV # Metrics, plots for all models built using SNV features
│ ├── GDC_COAD # Results for cancer-type COAD
│ │ └── ...
│ ├── GDC_LGG # Results for cancer-type LGG
│ │ └── ...
│ └── GDC_LUAD # Results for cancer-type LUAD
│ └── ...
└── README.md
The code folder contains all the files used for building the feature matrix, building the models and the tissue-specific analysis.
.
├── ...
├── code # All the code for analysis
│ ├── analyse_predictions_BRCA.ipynb # Analyse and plot data from predictions of BRCA
│ ├── analyse_predictions_COAD.ipynb # Analyse and plot data from predictions of COAD
│ ├── analyse_predictions_LUAD.ipynb # Analyse and plot data from predictions of LUAD
│ ├── PIVOT_predict.ipynb # Notebook to generate features and predict labels
│ ├── multiomic_classifier.py # Generates all multi-omic models
│ ├── preprocess_CNV.ipynb # Pre-process CNV data
│ ├── preprocess_drivers.ipynb # Pre-process driver lists
│ ├── preprocess_miRNA.ipynb # Pre-process miRNA data
│ ├── preprocess_networks.ipynb # Pre-process network data
│ ├── preprocess_SNV.ipynb # Pre-process SNV data
│ ├── preprocessRNA.Rmd # Pre-process RNA data
│ ├── rna_classifier.py # Generates all RNA models
│ └── snp_classifier.py # Generates all SNV models
└── ...
Download PIVOT from GitHub and add the folder to PYTHONPATH.
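If you prefer not to edit PYTHONPATH globally, the folder can also be added for a single Python session or notebook. The snippet below is a minimal sketch of that approach; the helper name `add_to_python_path` and the location `path/to/PIVOT` are placeholders and not part of PIVOT, so replace the path with wherever you downloaded the repository.

```python
import sys
from pathlib import Path


def add_to_python_path(folder):
    """Prepend `folder` to sys.path for the current session (only once)."""
    resolved = str(Path(folder).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder location of the downloaded repository; adjust to your system.
pivot_dir = add_to_python_path("path/to/PIVOT")
print("Added to sys.path:", pivot_dir)
```

This only affects the running interpreter. Setting the PYTHONPATH environment variable in your shell makes the change persistent across sessions.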
PIVOT requires the following dependencies to run smoothly:
- Python >3
- numpy 1.20.3
- pandas 1.3.4
- sklearn 0.24.2
- imblearn 0.8.0
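To compare your environment against the versions listed above, a short check such as the following may help. It is a sketch and not part of PIVOT: it assumes Python 3.8 or later (for `importlib.metadata`) and assumes that sklearn and imblearn are installed under their PyPI distribution names, scikit-learn and imbalanced-learn. The helper `check_versions` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the dependency list, keyed by PyPI distribution name
# (assumption: sklearn -> scikit-learn, imblearn -> imbalanced-learn).
REQUIRED = {
    "numpy": "1.20.3",
    "pandas": "1.3.4",
    "scikit-learn": "0.24.2",
    "imbalanced-learn": "0.8.0",
}


def check_versions(required):
    """Return {name: (wanted, installed_or_None, exact_match)}."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(REQUIRED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: required {wanted}, installed {installed} [{status}]")
```

A MISMATCH does not necessarily mean PIVOT will fail; it only flags that the installed version differs from the one listed here.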
- Download COAD data.
- Open the PIVOT_predict.ipynb notebook and follow the steps.
- Grant BT/PR16710/BID/7/680/2016 from the Department of Biotechnology, Government of India.
- Centre for Integrative Biology and Systems mEdicine
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI)
- Indian Institute of Technology Madras Grant: SB20210841BTMHRD008752
- Malvika Sudhakar acknowledges the HTRA fellowship from the Ministry of Education, Government of India