FEATURE: Require multi-PDF support in Question_and_answer.py (tool) of pdf_agent.py (agent) #131

Open
NimeshKotian opened this issue Mar 5, 2025 · 2 comments
Labels: enhancement (New feature or request), T2S

Comments

@NimeshKotian
Contributor

The current implementation of question_and_answer.py only processes one PDF at a time. This feature request proposes adding support for processing multiple PDFs simultaneously, which would streamline workflows for users handling multiple documents.

Many use cases require analyzing multiple documents at once. Handling each PDF individually is inefficient, especially when comparing or aggregating data across several PDFs.

I would like the tool to accept multiple PDFs as input, passed as a list of dictionaries, and process them either sequentially or concurrently. The output should be either consolidated into a single result or clearly separated by document, so that users can work with several documents easily.
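
To make the request concrete, here is a rough sketch of the interface I have in mind. It is only an illustration, not the existing tool's API: the function name `answer_over_pdfs`, the dictionary keys `id` and `text`, and the `answer_one` callable (a stand-in for whatever the current single-PDF logic does) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_over_pdfs(pdfs, question, answer_one, concurrent=True, consolidate=False):
    """Run a single-PDF question-answering callable over several PDFs.

    pdfs: list of dicts with the hypothetical keys "id" and "text".
    answer_one: callable (text, question) -> str, standing in for the
        current single-PDF logic.
    """

    def run(pdf):
        return pdf["id"], answer_one(pdf["text"], question)

    if concurrent:
        # pool.map yields results in input order, so output order is stable.
        with ThreadPoolExecutor(max_workers=4) as pool:
            pairs = list(pool.map(run, pdfs))
    else:
        pairs = [run(pdf) for pdf in pdfs]

    if consolidate:
        # One combined string, with each answer labelled by its document.
        return "\n\n".join(f"[{pdf_id}] {answer}" for pdf_id, answer in pairs)
    # Otherwise keep the answers clearly separated by document.
    return dict(pairs)
```

Threads are used here on the assumption that the per-PDF work is mostly I/O or model calls; a different executor may suit CPU-bound parsing better.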

We might also need an additional tool that decides which PDF's content to use based on the user's query.
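
As a very rough idea of what such a selection tool could look like, the sketch below ranks PDFs by how many query words appear in their text. This is a deliberately naive keyword-overlap stand-in, and a real implementation would more likely compare embeddings. The function names and the `id`/`text` keys are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_pdfs(pdfs, query, top_k=1):
    """Return the ids of the top_k PDFs sharing the most words with the query.

    pdfs: list of dicts with the hypothetical keys "id" and "text".
    PDFs with no overlapping words are never selected.
    """
    query_tokens = tokenize(query)
    scored = []
    for pdf in pdfs:
        overlap = len(query_tokens & tokenize(pdf["text"]))
        if overlap:
            scored.append((overlap, pdf["id"]))
    # Highest overlap first; ties are broken alphabetically by id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pdf_id for _, pdf_id in scored[:top_k]]
```

The selected ids could then be used to filter the list passed to the question-answering tool, so that only relevant documents are processed.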

Multi-PDF support would greatly benefit users such as researchers and analysts who regularly work with multiple documents. This enhancement would save time and simplify the process of comparing or extracting information from several PDFs simultaneously.

@gurdeep330
Member

Hi @NimeshKotian

Thanks for creating the issue! 🚀

To enhance this iteration, let's integrate NVIDIA's RAPIDS library. Specifically, we should explore incorporating FAISS with cuVS for GPU-accelerated vector search.

For reference, NVIDIA has a detailed technical blog on cuVS and FAISS optimizations. Additionally, a useful FAISS tutorial from LangChain might help with implementation.

Let me know what you think!

@gurdeep330 added this to the Talk2Scholars milestone on Mar 10, 2025

@dmccloskey
Member

@NimeshKotian A team from our recent hackathon developed several scripts using FAISS (see #137). These could be a useful starting point for simple index building and saving with FAISS.
