
chapter47_part3:/520_Post_Deployment/30_indexing_perf.asciidoc #56

Merged (2 commits) on Oct 22, 2016

Conversation

@chenryn commented Mar 17, 2016

Something beyond translation to discuss here. Regarding the `index.translog.flush_threshold_size` setting, I feel what the book says isn't quite right... In ES, segment creation is determined by the indexing buffer size and, more importantly, by `refresh_interval`; increasing the translog size doesn't produce larger segments. As I understand it, raising the translog flush threshold has little impact on write performance anyway; the translog is sequential I/O, how expensive can it really be?

@medcl (Member) commented Mar 19, 2016

Actually the two settings do different things. Flush controls the commit, and only after a commit are segments fully persisted; the index buffer is just the in-memory cache used while segments are being written. The two don't conflict.

@chenryn (Author) commented Mar 19, 2016

But all a commit does is sync segments from the filesystem cache to disk for good. Whether a segment ends up small or large simply comes from how much data arrived within one `refresh_interval`, or, if the indexing buffer fills up before `refresh_interval` elapses, from however much that buffer could hold, doesn't it?

@medcl (Member) commented Mar 20, 2016

Yeah, your understanding of the index buffer is right. The indexing buffer caps the in-memory indexing cache, and therefore also the maximum size of a segment produced during indexing (the theoretical upper bound for a segment generated by writes); when it fills up, a new segment goes to disk. The translog, on the other hand, exists to guarantee data safety: it is flushed periodically, and a flush also performs a segment commit. Only once the segments have been committed to disk does the translog flush count as successful, so enlarging the translog avoids frequent commit operations. Also, a larger segment doesn't automatically mean better write performance; you have to weigh both the I/O operations and the time they take, so bigger is not always better. At write time you don't need large segments at all; it's at query time that the number of segments matters, which is why there are periodic merge operations to reduce I/O during queries.
Normally a commit kicks in well before the indexing buffer limit is reached. If you waited until the buffer was full to commit, you would run out of memory or lose data.
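For readers following the thread, the knobs being debated can be sketched roughly as follows. This is a sketch only: the index name `logs` and all values are illustrative assumptions, not recommendations, with setting names as in the 1.x-era documentation:

```
# Node level, in elasticsearch.yml: heap shared by all active shards
# for the in-memory indexing buffer (illustrative value)
indices.memory.index_buffer_size: 10%

# Index level, dynamic: a larger translog flush threshold means
# less frequent flushes, and therefore less frequent commits
PUT /logs/_settings
{
  "index.translog.flush_threshold_size": "512mb"
}
```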

If you are in an indexing-heavy environment,((("indexing", "performance tips")))((("post-deployment", "indexing performance tips"))) such as indexing logs, you may be willing to sacrifice some search performance for faster indexing rates. In these scenarios, searches tend to be rare and are usually performed by people internal to your organization. They are willing to wait several seconds for a search, as opposed to a consumer facing a search that must return in milliseconds.
Inline review comment:

比如索引的是日志 ("such as indexing logs")
=>
比如索引的是基础设施日志 ("such as indexing infrastructure logs")

而不像普通消费者 ("as opposed to a consumer")
=>
而不像普通客户 ("as opposed to a customer"; saying "consumer" here feels a bit odd...)

Reply from the author (chenryn):

I think "consumer" here is meant to distinguish the enterprise market from the consumer market. "Customer" is too vague, since enterprises are customers too. Or should we just say "ordinary internet user"?

However, the tips presented in this section apply _only_ to versions 1.3 and above. Those releases contain multiple performance improvements and bug fixes that directly impact indexing. In fact, some of these recommendations will _reduce_ performance on older versions because of the presence of bugs or performance defects.
Inline review comment:

I feel like "bug" maybe doesn't need to be translated at all (the draft renders it as 故障)...

sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential
patterns that compress well. In contrast, IDs such as UUID-4 are essentially
random, which offer poor compression and slow down Lucene.
In contrast, if your index starts out with zero replicas and you enable replicas only after the writes have finished, the recovery process is essentially a byte-for-byte network transfer. Compared with re-running the indexing process, this is quite efficient.
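The zero-replica bulk-load pattern described above can be sketched with a dynamic settings update; the index name `logs` and the replica count are illustrative assumptions:

```
# Disable replicas before the bulk import
PUT /logs/_settings
{ "number_of_replicas": 0 }

# ... bulk indexing happens here ...

# Re-enable replicas afterwards; recovery is then a plain
# byte-for-byte segment copy over the network, not re-indexing
PUT /logs/_settings
{ "number_of_replicas": 1 }
```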
Inline review comment:

比重复索引过程这个算是相当高效的了。
=>
相比重复索引过程,这个算是相当高效的了。
(smoother Chinese wording for "Compared with re-indexing, this is quite efficient.")

Segment merging is computationally expensive,((("indexing", "performance tips", "segments and merging")))((("merging segments")))((("segments", "merging"))) and can also eat up a lot of disk I/O. Merges are scheduled to operate in the background because they can take a long time to finish, especially large segments. This is normally fine, because the rate of large segment merges is relatively rare.
Inline review comment:

尤其是比较大的短。
=>
尤其是比较大的段。
(typo: 短 "short" should be 段 "segment")

以你为大规模的段合并的概率是很罕见的。
=>
因为大规模段合并的概率是很小的。
(typo: 以你为 should be 因为 "because"; also smoother wording)

Reply from the author (chenryn):

Two typos in a row... oops, sorry.


Finally, a few other things worth keeping in mind:

- If your search results do not need near-real-time accuracy, consider dropping the `index.refresh_interval`((("indexing", "performance tips", "other considerations")))((("refresh_interval setting"))) of each index to `30s`. If you are doing a large bulk import, you can disable refreshes for the duration of the import by setting this value to `-1`. Don't forget to re-enable it when you are finished.
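The bullet above corresponds to a dynamic settings update along these lines; the index name `logs` and the `30s` value are illustrative assumptions:

```
# Disable refreshes entirely for the duration of a big bulk import
PUT /logs/_settings
{ "refresh_interval": "-1" }

# ... bulk import ...

# Don't forget to turn refreshes back on when done
PUT /logs/_settings
{ "refresh_interval": "30s" }
```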
Inline review comment:

About 近实时 ("near real-time")... 准实时 feels a bit smoother to me.
But either way is fine, really...

@cch123 commented Oct 12, 2016

LGTM

@medcl (Member) commented Oct 22, 2016

LGTM

@medcl medcl merged commit 5f8c44f into elasticsearch-cn:cn Oct 22, 2016
tangmisi pushed a commit to tangmisi/elasticsearch-definitive-guide that referenced this pull request Oct 31, 2016
chapter47_part3:/520_Post_Deployment/30_indexing_perf.asciidoc (elasticsearch-cn#56)

* chapter47_part3:/520_Post_Deployment/30_indexing_perf.asciidoc

* Revised according to review feedback