Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added parameters for Clickhouse flush interval and max buffer size #1073

Merged
merged 1 commit into from
May 25, 2021

Conversation

ameshkov
Copy link
Contributor

@ameshkov ameshkov commented May 23, 2021

Changes

Currently, both "flush interval" and "max buffer size" are hard-coded to relatively small values.
This may cause an unnecessarily high load on the Clickhouse instance.
It makes sense to make those configurable.

Tests

  • This PR does not require tests

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

Docs haven't been updated, but if you're okay with this PR, I'll open a PR there.

Currently, both are hard-coded to relatively small values.
This may cause an unnecessary high load on the Clickhouse instance.
It makes sense to make those configurable.
@ukutaht
Copy link
Contributor

ukutaht commented May 25, 2021

I think this is a good addition. I missed the fact that we flush sessions every second instead of every 5-10 seconds which seems more reasonable to me.

What values are you using? Or is this PR in preparation for testing on your end? The values that are hard-coded at the moment are pretty arbitrary so I'm curious to hear what settings work well in the wild.

@ukutaht ukutaht merged commit 90cfed0 into plausible:master May 25, 2021
@ukutaht
Copy link
Contributor

ukutaht commented May 25, 2021

Would you be so kind as to document the changes as well? https://github.com/plausible/docs/blob/master/docs/self-hosting-configuration.md

@ameshkov
Copy link
Contributor Author

ameshkov commented May 25, 2021

Hi @ukutaht, thanks for accepting the PR, I'll prepare the documentation changes asap.

What values are you using? Or is this PR in preparation for testing on your end? The values that are hard-coded at the moment are pretty arbitrary so I'm curious to hear what settings work well in the wild.

Tbh, we had to configure clickhouse-bulk to mitigate this issue without the need to fork Plausible.

Regarding the acceptable values, a lot depends on your Clickhouse instance's overall load and how many clients write to it in parallel. In our case, we have a pretty huge instance under heavy load from several applications so adding writes from Plausible every second was problematic.

Clickhouse devs suggest this:

We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second

If Plausible is the only client of Clickhouse and there's just a single Plausible node, I guess 5 secs / 10k buffer would be okay. But once you start scaling it and adding more nodes, you should probably consider using higher values.

@ukutaht
Copy link
Contributor

ukutaht commented May 25, 2021

Ah OK that makes sense. Indeed once every second is probably too fast, even for a single node.

I'll see what this update does to our Clickhouse host metrics and I'll probably try a 10 second interval as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants