Annotating ligand binding sites in tRNA synthetases #2

Open
miquelduranfrigola opened this issue Feb 19, 2025 · 4 comments
@miquelduranfrigola
Member

miquelduranfrigola commented Feb 19, 2025

Hi @arnaucoma24 ,

Below I have generated, with GPT, some thoughts about how to annotate the tRNA synthetases structurally. I think it is very important to know "what is what" in every structure.

🧬 Annotating Ligand Binding Sites in tRNA Synthetases

⚠️ This issue was initially generated by GPT and may require verification for accuracy.

📌 Objective

We need to structurally annotate the key ligand-binding sites in every available structure of tRNA synthetases in our dataset. This will help in standardizing site identification across different synthetases and facilitate further computational analyses, such as molecular docking and mutational studies.

🔑 Key Binding Sites

1. Amino Acid Binding Site

  • Recognizes and binds the correct amino acid.
  • Located in the catalytic domain of the enzyme.
  • Selectivity is determined by size, charge, and hydrogen bonding interactions.
  • Some aaRSs contain an editing site to correct mischarged amino acids.

2. tRNA Binding Site

  • Binds the acceptor stem of tRNA at the 3' CCA terminus, where aminoacylation occurs.
  • Specific interactions with the anticodon loop (especially in Class I aaRSs).
  • RNA-protein interactions involve hydrogen bonding and stacking interactions.

3. ATP Binding Site

  • ATP is required to activate the amino acid, forming an aminoacyl-adenylate intermediate.
  • The ATP-binding pocket is within the catalytic domain and often coordinates with metal ions (e.g., Mg²⁺) for stability.

4. Editing Site (Proofreading Site)

  • Some aaRSs (especially for small or chemically similar amino acids) contain an editing site that hydrolyzes mischarged amino acids.
  • Functions in pre-transfer (before tRNA attachment) or post-transfer (after tRNA attachment) proofreading to ensure fidelity.

5. Class-Specific Differences

  • Class I aaRSs:
    • Typically bind ATP and tRNA in a Rossmann fold-containing catalytic domain.
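    • Aminoacylate the 2'-OH of the ribose of the tRNA's terminal adenosine.
  • Class II aaRSs:
    • Use a different catalytic architecture, built around an antiparallel β-sheet flanked by α-helices.
    • Typically aminoacylate the 3'-OH (PheRS is a known exception that uses the 2'-OH).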

✅ Tasks

  1. Identify and annotate these sites in all available tRNA synthetase structures.
  2. Standardize the annotation using structural and sequence-based alignment methods.
  3. Cross-check with experimental data (e.g., from PDB structures and mutagenesis studies).
  4. Document the annotations for future machine learning and docking studies.
@arnaucoma24
Collaborator

arnaucoma24 commented Feb 20, 2025

Hi @miquelduranfrigola ! Thanks for the global overview, useful way to start the discussion.

1. Identify and annotate these sites in all available tRNA synthetase structures.

  • I'm using P2Rank to detect pockets in all structures of each protein (AF2, AF3, Chai-1 and SwissModel). I've just contacted the authors to ask what the best practice is for interpreting the results. In brief, P2Rank detects many pockets, most of them false positives. So far I have usually considered the top-N (e.g. N=2) pockets per structure, which gives high recall but low precision. I'm not sure if this is exactly what we want here.
  • Are we planning to use ligand information from AlphaFill at this point? That is, to detect/identify pockets on the basis of the predicted ligands' locations.
  • Each detected pocket will have an associated 3D coordinate (x,y,z) and a set of residues defining the pocket. We need to automate the visualization of these results (see Automate Pymol Session File Generation #3).
  • This is important: where should we detect pockets? We agreed to do it in aligned_structures but I think it makes much more sense to do it in aligned_relaxed_structures, especially if we plan to dock molecules in further steps. In any case, I don't expect results to change dramatically.
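
As a rough illustration of the "pocket = 3D coordinate + residue set" idea and the top-N selection mentioned above, here is a minimal sketch. It assumes a P2Rank-style predictions CSV with columns `name`, `rank`, `score`, `probability`, `center_x`, `center_y`, `center_z` and `residue_ids` (space-separated `chain_resnum` entries, with whitespace-padded cells). The column names should be checked against the actual output files; `Pocket`, `parse_predictions` and `top_n` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Pocket:
    name: str
    rank: int
    score: float
    probability: float
    center: tuple  # (x, y, z) of the pocket center
    residues: list  # e.g. ["A_45", "A_46"]


def parse_predictions(text: str) -> list:
    """Parse the text of a predictions CSV into Pocket records.

    Assumption: column names follow the layout described in the lead-in.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    pockets = []
    for raw in reader:
        # Header names and cell values may be padded with spaces: strip both.
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
        pockets.append(
            Pocket(
                name=row["name"],
                rank=int(row["rank"]),
                score=float(row["score"]),
                probability=float(row["probability"]),
                center=(
                    float(row["center_x"]),
                    float(row["center_y"]),
                    float(row["center_z"]),
                ),
                residues=row["residue_ids"].split(),
            )
        )
    return pockets


def top_n(pockets: list, n: int = 2) -> list:
    """Return the n best-ranked pockets (rank 1 is the best)."""
    return sorted(pockets, key=lambda p: p.rank)[:n]
```

Keeping the residue identifiers as plain strings makes it easy to compare pockets across structures later, provided the numbering is consistent between models.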

2. Standardize the annotation using structural and sequence-based alignment methods.

For each protein, the main outcome will be a set of structures, each with a set of detected pockets (3D coordinates + involved residues). Residue numbering in AF2 is standard, meaning residue N in the structure corresponds to residue N in the UniProt sequence. I need to check whether this is also the case for AF3, Chai-1 and SwissModel. Apart from this, is there something I'm missing?
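
The numbering check described above could be automated along these lines. This is only a sketch: it assumes the models are available in PDB format (fixed-column ATOM records) with standard amino acids, and `numbering_mismatches` is a hypothetical helper name. Models delivered as mmCIF would need a different parser.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def numbering_mismatches(pdb_text: str, uniprot_seq: str, chain: str = "A") -> list:
    """List residues whose type disagrees with the UniProt sequence.

    Residue N in the structure is compared with position N (1-based) of
    uniprot_seq. Returns (resnum, found, expected) tuples; an empty list
    means the mapping is direct for every residue present in the model.
    """
    mismatches = []
    for line in pdb_text.splitlines():
        # One CA atom per residue is enough to read its name and number.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain:
            continue
        resnum = int(line[22:26])
        found = THREE_TO_ONE.get(line[17:20], "X")  # "X" = non-standard residue
        in_range = 1 <= resnum <= len(uniprot_seq)
        expected = uniprot_seq[resnum - 1] if in_range else None
        if found != expected:
            mismatches.append((resnum, found, expected))
    return mismatches
```

An empty result for every model of a protein would support using the residue numbers directly as UniProt positions.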

3. Cross-check with experimental data (e.g., from PDB structures and mutagenesis studies).

Is the main goal here to gather experimental evidence in the PDB for the pockets we detect in predicted structures?

4. Document the annotations for future machine learning and docking studies.

Should be easy. I'm thinking of a plain CSV file with all the results: pocket coordinates, residues involved, etc.
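
A plain CSV of this kind could be written with the standard library alone. The sketch below is only an illustration: the column names in `FIELDS` and the function `write_pocket_table` are hypothetical, not an agreed schema.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column layout: one row per detected pocket.
FIELDS = [
    "protein", "structure", "pocket_rank", "probability",
    "center_x", "center_y", "center_z", "residues",
]


def write_pocket_table(records: Iterable[dict], handle: TextIO) -> None:
    """Write one CSV row per pocket to an open text handle.

    Each record is a dict keyed by FIELDS; "residues" is a list of
    residue identifiers and is stored as one space-separated cell.
    """
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["residues"] = " ".join(rec["residues"])
        writer.writerow(row)
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends, so that line endings are not doubled on Windows.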

Let me know what you think 😃

@miquelduranfrigola
Member Author

Thanks @arnaucoma24, this is very helpful, and thanks for reaching out to the P2Rank authors.

In response to your points:

  1. RE: binding sites
  • Also following the authors' suggestion, I would work with a probability criterion rather than the top-N, or perhaps a combination of both. We will compare cavities to suggest cross-pharmacology, so I believe that at this stage recall is more important than precision.
  • About AlphaFill, I would say that this information is not crucial, but it would be worth investigating whether the ligands provided by AlphaFill make sense or not. On a quick look, I was not very convinced by the proposed ligands, which is why I decided not to use AlphaFill (yet). But I am happy to reconsider.
  • I agree, it makes more sense on the aligned_relaxed_structures. I asked you to do it on the aligned_structures because I was unsure about the relaxation and how long it would take to compute. I agree that results will not change dramatically, so feel free to do whatever is most convenient at this stage.
  2. RE: standardize annotation
    No, there is nothing you are missing.

  3. RE: cross-check with experimental data
    Yes, basically it would be good to check in tRNA synthetase structures in the PDB whether ligands are bound in the regions of the cavities. Those structures can be orthologs, in my opinion. I am aware this can be a very large task, so I would not spend a lot of time on it: just a few quick manual checks, if this is easy to do, to gain confidence.

  4. RE: document
    Great!
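
For the quick cross-check against experimental structures mentioned under point 3, one possible sketch is shown below. It assumes the experimental PDB entry has already been superposed onto the predicted structure (so coordinates share a frame) and is in PDB format. The 8 Å cutoff and the name `ligands_near_pocket` are arbitrary, hypothetical choices.

```python
import math


def ligands_near_pocket(pdb_text, center, cutoff=8.0, ignore=("HOH",)):
    """Find hetero groups with at least one atom within `cutoff` Å of a pocket center.

    Returns a dict mapping (resname, chain, resnum) to the smallest
    atom-to-center distance found for that group. Waters are skipped.
    """
    hits = {}
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()
        if resname in ignore:
            continue
        # Fixed-column PDB coordinates: x, y, z in columns 31-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        dist = math.dist(xyz, center)
        if dist <= cutoff:
            key = (resname, line[21], int(line[22:26]))
            hits[key] = min(dist, hits.get(key, dist))
    return hits
```

A pocket whose center has ATP, an amino acid or an aminoacyl-adenylate analogue nearby in an ortholog structure would be a reasonable sign that the detected cavity is a genuine binding site.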

@arnaucoma24
Collaborator

Hi @miquelduranfrigola !

Several updates:

  1. RE: binding sites.
  • Following the authors' recommendations: for each structure, we keep pockets with a probability above K = 0.2, and always at least the top N = 3 per structure. On average, we get 4-5 pockets per structure. After that, we filter out pockets that have at least one residue with pLDDT < 65 (AF2, AF3, Chai-1) or QSQE < 0.65 (SwissModel), which discards about 25-30% of the pockets. The cut-offs are arbitrary: the usual recommendations are 70 and 0.7, so I've been slightly less restrictive.
  • AlphaFill is still not considered at this stage. Let me know if I should prioritize it.
  2. RE: standardize annotation.
  • I've dug into it a bit and everything looks the same to me. How exactly did you download the AF3, Chai-1 and SwissModel structures? Directly from a database (e.g. AFDB) or by providing the UniProt sequence? In any case, I've checked several proteins manually and the mapping is indeed direct.
  3. RE: cross-check with experimental data.

    Still to be done!

  4. RE: document.

    All results are stored in processed/pocket_detection_data.csv. For each protein, we include the list of structures we detect pockets in, together with pocket scores, probabilities, coordinates, residues involved and confidence metrics of the residues.
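
The selection rule from point 1 (probability above 0.2, but never fewer than the top 3 pockets, followed by the per-residue confidence filter) can be sketched as below. This is one reading of the rule, not the actual pipeline code: pockets are plain dicts with hypothetical keys `rank`, `probability` and `residues`, and `residue_conf` is a hypothetical mapping from residue identifier to pLDDT.

```python
def select_pockets(pockets, min_prob=0.2, min_n=3):
    """Keep pockets with probability > min_prob, plus the min_n best-ranked ones.

    Assumption: rank 1 is the best pocket, so the top min_n are always kept
    even when their probability is below the threshold.
    """
    ranked = sorted(pockets, key=lambda p: p["rank"])
    return [
        p for i, p in enumerate(ranked)
        if i < min_n or p["probability"] > min_prob
    ]


def is_confident(pocket, residue_conf, cutoff):
    """True if every residue lining the pocket has confidence >= cutoff."""
    return all(residue_conf[res] >= cutoff for res in pocket["residues"])


def filter_pockets(pockets, residue_conf, cutoff=65.0):
    """Apply the selection rule, then drop pockets with any low-confidence residue."""
    return [
        p for p in select_pockets(pockets)
        if is_confident(p, residue_conf, cutoff)
    ]
```

For SwissModel structures the same function would be called with the corresponding 0-1 confidence values and `cutoff=0.65`.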

@miquelduranfrigola
Member Author

Thanks @arnaucoma24 - all of this sounds fantastic.

RE: 1. No need to use AlphaFill at this stage in my opinion.
RE: 2. I downloaded AF3, Chai1 and SwissModel structures manually, after submitting the UniProt sequences.
