sunilpanda14/PolyToxiQ

PolyToxiQ: A Polymer Toxicity Prediction Tool Using PSMILES Strings

About This Project

This application predicts the toxicity level of polymers based on their PSMILES string representation using transfer learning techniques and Tox21 molecular fingerprinting.

Methodology

  • AutoGluon & Scikit-learn: We used AutoGluon's TabularPredictor to build a machine learning model that classifies polymers into three toxicity levels (High, Medium, and Low). The model was trained on a curated Tox21 dataset (4974 entries) annotated with known toxicity properties and level of concern (LoC).

  • Cosine Similarity: We calculate the cosine similarity between the polyBERT-generated fingerprint of the input polymer and the Tox21 molecule fingerprints in our reference database. This metric measures how closely two structures align in fingerprint vector space. In general it ranges from -1 to 1: a value of 1 means the two fingerprints point in the same direction, and values near 0 indicate little similarity.

  • Zero-Shot Transfer Learning: The AutoGluon model pre-trained on the Tox21 molecule dataset is transferred to polymers, which lets us make predictions for novel polymer structures that were not present in the training data.
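
To make the cosine similarity step concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the function names `cosine_similarity` and `most_similar`, the dictionary layout of the reference database, and the 3-dimensional toy vectors are all assumptions made for illustration. Real fingerprints would be much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, reference):
    """Return (name, similarity) for the reference fingerprint closest to query.

    `reference` is a hypothetical {name: fingerprint} mapping.
    """
    return max(
        ((name, cosine_similarity(query, fp)) for name, fp in reference.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors standing in for real fingerprints (values are made up).
reference = {
    "molecule_a": [1.0, 0.0, 0.0],
    "molecule_b": [0.0, 1.0, 1.0],
}
query = [0.0, 2.0, 2.0]

name, score = most_similar(query, reference)
print(name, round(score, 3))
```

Because cosine similarity depends only on direction, the query above matches `molecule_b` even though its components are twice as large.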

Toxicity Classification Levels:

Polymers are classified into three levels of concern (High, Medium, and Low) according to their toxicity properties, expressed as a Hazard Criteria value (0 <= Hazard Criteria <= 8). The properties considered are:

  • Persistence, bioaccumulation (BIOACCUM), carcinogenicity (CARCINOGEN), mutagenicity (MUTA), reproductive toxicity (REPROTOX), specific target organ toxicity (STOT), endocrine-disrupting chemicals (EDC), and aquatic toxicity (AQUATOX)
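
One way to read the 0 to 8 range is as a count of how many of the eight properties are flagged for a compound. The sketch below assumes that interpretation; it is not taken from the project's code. The key `PERSISTENT` and the function name `hazard_score` are hypothetical (the other keys reuse the abbreviations listed above), and the mapping from score to High, Medium, or Low is deliberately left out because it is not specified in the text.

```python
# The eight hazard properties. "PERSISTENT" is a hypothetical key;
# the remaining keys are the abbreviations used in this README.
HAZARD_CRITERIA = (
    "PERSISTENT",
    "BIOACCUM",
    "CARCINOGEN",
    "MUTA",
    "REPROTOX",
    "STOT",
    "EDC",
    "AQUATOX",
)


def hazard_score(flags):
    """Count how many hazard properties are flagged (assumed interpretation).

    `flags` maps a property name to True/False; missing properties count as
    not flagged. The result is always between 0 and 8.
    """
    unknown = set(flags) - set(HAZARD_CRITERIA)
    if unknown:
        raise ValueError(f"unknown hazard properties: {sorted(unknown)}")
    return sum(1 for name in HAZARD_CRITERIA if flags.get(name, False))


# Made-up example: a compound flagged for two of the eight properties.
example = {"CARCINOGEN": True, "AQUATOX": True, "EDC": False}
print(hazard_score(example))
```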
