-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an aggregator for IPv4 and IPv6 subnets #82410
Add an aggregator for IPv4 and IPv6 subnets #82410
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
@@ -65,6 +65,7 @@ | |||
"geotile_grid", | |||
"global", | |||
"histogram", | |||
"ip_prefix", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@elasticmachine run elasticsearch-ci/part-1 |
I am aware I need to add documentation about this new aggregation...I am just waiting for some feedback about the interface. |
The API makes a lot of sense to me - this would be a very good fit for how Kibana does things today 👍 |
This is a sample request including an aggregation on an IPv4 field and an aggregation on an IPv6 field: curl -X GET "localhost:9200/network/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"/16": {
"ip_prefix": {
"field": "ipv4",
"prefix_len": 16,
"is_ipv6": false,
"min_doc_count": 1,
"append_prefix_len": false,
"keyed": false
}
},
"/64": {
"ip_prefix": {
"field": "ipv6",
"prefix_len": 64,
"is_ipv6": true,
"min_doc_count": 1,
"append_prefix_len": false,
"keyed": false
}
}
}
}' And this is the response: {
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"/16" : {
"buckets" : [
{
"key" : "192.168.0.0",
"netmask" : "255.255.0.0",
"doc_count" : 4,
"is_ipv6" : false,
"prefix_len" : 16
},
{
"key" : "192.169.0.0",
"netmask" : "255.255.0.0",
"doc_count" : 1,
"is_ipv6" : false,
"prefix_len" : 16
}
]
},
"/64" : {
"buckets" : [
{
"key" : "2001:db8:85a3::",
"doc_count" : 1,
"is_ipv6" : true,
"prefix_len" : 64
},
{
"key" : "2001:db8:85ab::",
"doc_count" : 1,
"is_ipv6" : true,
"prefix_len" : 64
},
{
"key" : "2001:dc4:11b2:7::",
"doc_count" : 1,
"is_ipv6" : true,
"prefix_len" : 64
},
{
"key" : "2001:dc4:11b3::",
"doc_count" : 1,
"is_ipv6" : true,
"prefix_len" : 64
},
{
"key" : "2004:db3:85f4:10::",
"doc_count" : 1,
"is_ipv6" : true,
"prefix_len" : 64
}
]
}
}
} |
NOTE: I put the code under |
You can create a new package under One thing we should probably test - what happens if we get mixed IPv4 and IPv6 addresses in the same run? That doesn't seem like something the user would do on purpose, but for example if they're using an index pattern, it's not hard to imagine that the same field might have changed from IPv4 in one index to IPv6 in another. We should do something sensible in that case, even if it's just failing the shard if the stored data doesn't match what the aggregation expected. |
I am adding a test were I index IPv6 addresses and then try to aggregate with |
I was wondering if we should, in some way, encode the IP version in the |
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine run elasticsearch-ci/bwc |
@elasticmachine update branch |
@elasticmachine run elasticsearch-ci/part-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things that I'd still like to fix before merging. Looking good overall!
|
||
@Override | ||
protected LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException { | ||
return config.getValuesSource() == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config.getValuesSource()
should never return null. You want to check config.hasValues()
. And you shouldn't even need to do that - ValuesSourceAggregatorFactory#createInternal()
will check if there are values to aggregate, and call createUnmapped()
if there aren't, which in turn can build a no-op aggregator.
// offset (12) to apply the subnet to the last 4 bytes (byes 12, 13, 14, 15) | ||
// if the subnet mask is just a 4-bytes subnet mask. | ||
int offset = subnetMask.length == 4 ? 12 : 0; | ||
byte[] subnet = new byte[subnetMask.length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think we should be allocating short lived objects here. I think you can just pass in the BytesRef
and mutate it in place.
Hi @salvatore-campagna, I've created a changelog YAML for you. |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
* upstream/master: (100 commits) Avoid duplicate _type fields in v7 compat layer (elastic#83239) Bump bundled JDK to 17.0.2+8 (elastic#83243) [DOCS] Correct header syntax (elastic#83275) Add unit tests for indices.recovery.max_bytes_per_sec default values (elastic#83261) [DOCS] Add note that write indices are not replicated (elastic#82997) Add notes on indexing to kNN search guide (elastic#83188) Fix get-snapshot-api :docs:integTest (elastic#83273) FilterPathBasedFilter support match fieldname with dot (elastic#83178) Fix compilation issues in example-plugins (elastic#83258) fix ClusterStateListener javadoc (elastic#83246) Speed up Building Indices Lookup in Metadata (elastic#83241) Mute whole suite for elastic#82502 (elastic#83252) Make PeerFinder log messages happier (elastic#83222) [Docs] Add supported _terms_enum field types (elastic#83244) Add an aggregator for IPv4 and IPv6 subnets (elastic#82410) [CI] Fix 70_time_series/default sort yaml test failures (elastic#83217) Update test-failure Issue Template to include "needs:triage" label elastic#83226 Add an index->step cache to the PolicyStepsRegistry (elastic#82316) Improve support for joda datetime to java datetime transition in Painless (elastic#83099) Fix joda migration for week based methods in Painless (elastic#83232) ... # Conflicts: # x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
This PR addresses a request for an IP address subnet aggregator (see #57964).
Parameters accepted by the aggregator include:
prefix_length
(integer, required): defines the network size of the subnet maskis_ipv6
(boolean, optional, default: false): defines whether the prefix applies to IPv6 (true) or IPv4 (false) IP addressesmin_doc_count
(integer, optional, default: 1): defines the minimum number of documents for a bucket to be returned in the resultsappend_prefix_length
(boolean, optional, default: false): defines if the prefix length is appended to the IP address key when returning resultskeyed
(boolean, optional, default: false): defines whether the result is returned keyed or as an array of bucketsEach bucket returned by the aggregator represents a different subnet. IPv4 subnets also include a
netmask
field set to the subnet mask value (i.e. "255.255.0.0" for a /16 subnet).Related to: #57964 and elastic/kibana#68424