feat: hackathon/Sheng + Inigo PR (Team Sanofi Germany) #134
Conversation
Hi @inigoo18 and Sheng, thanks for the PR 👍 These are thorough, working Jupyter notebooks that perform model annotation by running RAG over SBML and PDF files, with alignment to KG nodes. Just a quick question about the approach implemented here, especially how the neighboring KG nodes are taken into consideration (without training any network).
Good afternoon @awmulyadi! Thanks for the follow-up :) The idea of neighborhood aggregation came from traditional methods like message passing, as well as embedding techniques like Node2Vec and GraphSAGE. Since I'm used to working with GNNs, and we had species and nodes in vector format, this approach seemed like a good fit. For instance, in this paper the authors process text documents through a relation graph and apply convolutional operations on top to obtain embeddings that reflect the surrounding structure. This other paper might also interest you: they aggregate features from neighboring text nodes later in the pipeline.

It is true that the proportions we use are fixed; we didn't have enough time to push for a more sophisticated working solution. While creating the notebook, we also identified a major weak point of our approach: assigning the same weight to every N-hop edge is unlikely to give an optimal embedding. Instead, each edge could carry an individual weight, higher the more related the neighbor node is to the current node. It would be interesting to let a deep learning model assign these weights automatically, based on attention.

One last thing: the BERT model we use isn't fine-tuned on domain-specific biological data, so fine-tuning it might improve the quality of the embeddings. Hope that was useful; feel free to reach out with any other questions :)
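To make the attention idea concrete, here is a minimal sketch (not code from the PR): each neighbor is scored by its scaled dot-product similarity to the center node, the scores are softmax-normalized, and the weighted neighbor mean is blended with the center embedding. The 0.5/0.5 blend and the function name `attention_aggregate` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_aggregate(center: torch.Tensor, neighbors: list[torch.Tensor]) -> torch.Tensor:
    """Blend a node embedding with an attention-weighted mean of its neighbors.

    Illustrative only: scores are scaled dot products against the center node,
    normalized with a softmax, so closely related neighbors weigh more.
    """
    if not neighbors:
        return center
    stacked = torch.stack(neighbors)                      # (k, d)
    scores = stacked @ center / center.shape[0] ** 0.5    # (k,) similarity scores
    weights = F.softmax(scores, dim=0)                    # (k,) attention weights
    neighbor_mix = (weights.unsqueeze(1) * stacked).sum(dim=0)
    return 0.5 * center + 0.5 * neighbor_mix              # fixed blend, for illustration
```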
This PR first obtains additional background on each species using GenAI. Then we use a BERT model to embed these descriptions into numerical representations.
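A minimal sketch of that embedding step, assuming the Hugging Face `transformers` API and the generic `bert-base-uncased` checkpoint (the exact model and pooling used in the notebooks may differ):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_text(text: str) -> torch.Tensor:
    """Embed a text into a single vector by mean-pooling BERT's last hidden states."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)  # (hidden_size,)

# `species_background` stands in for the GenAI-generated descriptions.
species_background = {"glucose": "Glucose is a simple sugar and a key energy source ..."}
species_embeddings = {name: embed_text(text) for name, text in species_background.items()}
```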
As for the KG, we first enrich it using STARK (that notebook was already present in the project), then embed each node's enriched textual representation with BERT. Note that we also take the neighboring nodes into consideration, so each node's embedding accounts for related terms.
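The fixed-proportion neighborhood aggregation could look roughly like this; the blending factor `alpha` and the use of `networkx` are assumptions for illustration, not the notebook's exact values:

```python
import networkx as nx
import torch

def aggregate_neighbors(graph: nx.Graph, node_emb: dict, alpha: float = 0.7) -> dict:
    """Blend each node's embedding with the mean of its 1-hop neighbors' embeddings.

    Every neighbor gets the same fixed proportion (1 - alpha), which is the
    limitation discussed in the conversation above.
    """
    aggregated = {}
    for node, emb in node_emb.items():
        neighbor_embs = [node_emb[n] for n in graph.neighbors(node) if n in node_emb]
        if neighbor_embs:
            aggregated[node] = alpha * emb + (1 - alpha) * torch.stack(neighbor_embs).mean(dim=0)
        else:
            aggregated[node] = emb
    return aggregated
```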
Finally, we compare each species we want to annotate against all nodes in the KG by similarity, fetch the most similar node, and return its associated code.
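That matching step could be sketched as a cosine-similarity search; the dictionary layout and the CHEBI identifiers below are placeholders:

```python
import torch
import torch.nn.functional as F

def best_match(species_vec: torch.Tensor, kg_embeddings: dict) -> tuple:
    """Return the (code, score) of the KG node most similar to the species vector."""
    codes = list(kg_embeddings.keys())
    matrix = torch.stack([kg_embeddings[c] for c in codes])          # (n_nodes, d)
    scores = F.cosine_similarity(matrix, species_vec.unsqueeze(0))   # (n_nodes,)
    best = int(scores.argmax())
    return codes[best], float(scores[best])

# Toy usage: random vectors stand in for the real BERT-based embeddings.
kg_embeddings = {"CHEBI:17234": torch.randn(768), "CHEBI:15377": torch.randn(768)}
code, score = best_match(torch.randn(768), kg_embeddings)
print(code, score)
```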