Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing pointer semantics to std::unique_ptr #58

Closed
wants to merge 13 commits into from
Closed

Changing pointer semantics to std::unique_ptr #58

wants to merge 13 commits into from

Conversation

polterguy
Copy link
Contributor

Changed all occurrences of vector<float> * to std::unique_ptr<vector<float>> to avoid memory leaks.

@polterguy
Copy link
Contributor Author

Just throw away my previous pull request. This is much cleaner and less hassle and does exactly what it claims to do ... :)

@polterguy
Copy link
Contributor Author

Well, at least now it's dropping. This is from our Kubernetes cluster on a deployment with one POD having the lib and roughly 15,000 vectors from OpenAI's text-embedding-ada-002 (1,500+dimensions). The peak is after a delete operation.

Before my fix it didn't even drop at all after deleting 8,000 indexes. Now at least it goes down. I'll work more on this in the weekend, but already I suspect 75% of the leaks are mostly gone. Which for us (running in a Kubernetes cluster) is kind of a big deal ...

I'll weed out the rest of the leaks tomorrow or Sunday. If somebody with more knowledge about SQLite's API can run through that part of the code it would be nice. However, I suspect you did a good job on this part. It was only the C++ parts that was a little bit "challenging" ...

Screenshot 2023-06-09 at 8 55 44 PM

@polterguy
Copy link
Contributor Author

This last drop is after a re-index operation (deleting everything in vss_ virtual table and re-creating records) - Still some work to be done, bu at least now it is dropping (which it didn't do at all before I started) ...

Screenshot 2023-06-09 at 9 15 47 PM

@asg017
Copy link
Owner

asg017 commented Jun 9, 2023

Thank you @polterguy for this PR! I'll be taking a closer look soon.

One thing I want to call out: memory when writing to a vss0 table (either inserting or deleting) gets a little wonky. Write only happen on a COMMIT, in which the following occurs:

  • Any deleted vectors are "deleted" in the in-memory faiss index
  • Any inserted vectors are "added" to the in-memory faiss index
  • The entire Faiss in-memory index is copied again into memory and written to the sqlite DB

All to say: Any INSERT/DELETE on a vss0 table will copy the Faiss index in-memory , temporarily until it's safely saved into the sqlite DB.

This is mostly due to the limitations of Faiss's write_index method. It can be improved with a custom IOWriter, which I'm tracking in #1

But in general, yeah sqlite-vss and sqilte-vector has some significant memory problem. I'll make a "tracking" issue for this shortly.

@asg017
Copy link
Owner

asg017 commented Jun 9, 2023

Tracking in #59

@asg017
Copy link
Owner

asg017 commented Jun 12, 2023

@polterguy let me know when you'd like me to review this!

@polterguy
Copy link
Contributor Author

Thx, give me a couple of more days, I'm still trying to wrap my head around SQLite C/++ API ... :/

@asg017
Copy link
Owner

asg017 commented Jun 12, 2023

No problem, let me know if you need any help with any parts of that.

In general, small quick PRs > larger more encompassing changes, so even if you just wanna merge a few std::unique_ptr changes I'd be more than happy to do so

@polterguy
Copy link
Contributor Author

You OK if I apply some consistent coding standard, and some slightly longer variable names? I'll be a bit "dramatic", but I'll try to keep the spirit of the library ...

@asg017
Copy link
Owner

asg017 commented Jun 13, 2023

Longer variable names and improved coding standards would be fine! Once this PR is merged I'll most likely go back and update a few things, including adding clang-format for formatting

@polterguy polterguy closed this by deleting the head repository Jun 13, 2023
@polterguy
Copy link
Contributor Author

Just letting you know I haven't flunked on you. Creating a PR tomorrow. I've 100% perfectly stabilised memory usage, plugged some 10 to 20 leaks, apprx. 98% of the leaks are gone in my code base now, and code is higher quality, better naming conventions, and coding std (used clan-format) ...

@eelco
Copy link

eelco commented Jun 15, 2023

@polterguy Appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants