Very high memory usage with serde_json::Value #635
Comments
@dtolnay I started working on a crate to address these issues: https://github.com/Diggsey/ijson. It is functionally complete but needs a lot more testing, etc. to get to a point where I can recommend people actually use it. That said, it demonstrates that significant improvements are possible. Is this something you'd be interested in bringing into…
This came to me as a bit of an unpleasant surprise when my AWS Lambdas started running out of memory. I was sizing them based on what is being retrieved from the DB. For example, ElasticSearch returns an 8,683 KB document; I deserialize it into `Value`, and the next RAM reading shows a delta of 98,484 KB of RAM use. That's more than 10x the original size. @dtolnay, David, is this high memory consumption a necessary price to pay for speed?
@rimutaka serde_json is much more efficient at deserializing into structs, compared to the…
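A minimal sketch of the difference, assuming a made-up `Hit` struct whose name and fields are purely illustrative:

```rust
use serde::Deserialize;

// Hypothetical shape of one search hit; any struct matching your documents works.
#[derive(Deserialize)]
struct Hit {
    id: String,
    score: f64,
    title: String,
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{"id":"a1","score":0.9,"title":"hello"}"#;

    // Typed deserialization: each field lands in a fixed-size Rust type,
    // with no per-node enum and no owned key strings.
    let hit: Hit = serde_json::from_str(raw)?;

    // Dynamic deserialization: every node becomes a Value (32 bytes on a
    // typical 64-bit target) and every object key its own heap String.
    let dynamic: serde_json::Value = serde_json::from_str(raw)?;

    println!("{} vs {}", hit.title, dynamic["title"]);
    Ok(())
}
```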
@Diggsey, thanks for the suggestion. Do you know if `Value` is more compact if I deserialize into a struct and then convert it into `Value`?
It would only be more compact if some fields are dropped as part of the deserialization into a struct (if, say, they are not required).
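A sketch of that round trip, assuming a hypothetical trimmed-down `Summary` struct that declares only the fields that are actually needed:

```rust
use serde::{Deserialize, Serialize};
use serde_json::{to_value, Value};

// Only the fields we care about are declared; anything else in the source
// JSON is dropped during deserialization. Names here are made up.
#[derive(Deserialize, Serialize)]
struct Summary {
    id: String,
    title: String,
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{"id":"a1","title":"hello","body":"a huge field we never use"}"#;

    // JSON -> struct -> Value: the resulting Value contains only the declared
    // fields, so it is smaller than deserializing `raw` into Value directly.
    // If the struct mirrors the JSON exactly, nothing is saved.
    let summary: Summary = serde_json::from_str(raw)?;
    let compact: Value = to_value(&summary)?;

    assert!(compact.get("body").is_none());
    println!("{}", compact);
    Ok(())
}
```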
Memory allocation log for processing 10 MB of JSON data:
I can understand high memory consumption when JSON is converted into `Value`, because the size of collections is not known up front, so more is allocated than needed to make parsing faster.
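To make the over-allocation point concrete with plain `Vec` behaviour (the exact growth pattern inside serde_json may differ, so treat this as a generic illustration):

```rust
fn main() {
    // A collection built by repeated pushes grows geometrically, so it usually
    // ends up holding more capacity than elements. The arrays inside a freshly
    // deserialized Value are built the same incremental way, since the parser
    // cannot know their length up front.
    let mut v = Vec::new();
    for i in 0..1_000 {
        v.push(i);
    }
    println!("len = {}, capacity = {}", v.len(), v.capacity()); // e.g. 1000 vs 1024

    // The spare capacity can be released, at the cost of a reallocation.
    v.shrink_to_fit();
    println!("after shrink_to_fit: capacity = {}", v.capacity());
}
```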
FWIW I think I encountered this on the current version of the…
Unfortunately, due to `Value` being completely public, I don't know how much can be done about this without breaking changes. However, a couple of times I've run into problems with exceptionally high memory usage when using a `Value`.

I don't think there's a bug here, just that common uses seem to be much more memory intensive than similar code in dynamic languages, where this kind of data is already heavily optimised.

I think it comes from several factors:
- Each `Value` is 32 bytes on a 64-bit system, even though the majority of `Value`s will be leaf nodes (numbers, strings, nulls, etc.) which don't need that much space. If `Value` were more highly optimized for leaf nodes, I think this could easily be halved (see the sketch after this list).
- Maps are optimized for access time rather than space efficiency, and this is made worse because there are lots of "empty" `Value` slots, each of which is another 32 bytes.
- Strings are owned. When converting from a struct with `to_value`, object keys will all be known statically, and those strings will already be embedded in the program as static data, so using a `Cow<'static, str>` could dramatically reduce memory usage.
- Strings are exclusively owned. When deserializing into a `Value`, it's likely that there will be lots of duplicate strings, but there is no possibility for them to be shared with the current `Value` representation.
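The 32-byte figure is easy to check; a quick sketch (the numbers are typical for 64-bit targets and may vary with platform or feature flags):

```rust
use std::mem::size_of;

fn main() {
    // The widest Value variants (String, Vec<Value>, Map) are 24 bytes on a
    // 64-bit target, and the enum discriminant pads the whole thing to 32.
    println!("Value:  {} bytes", size_of::<serde_json::Value>());

    // Leaf payloads need far less than the full enum:
    println!("bool:   {} bytes", size_of::<bool>());
    println!("f64:    {} bytes", size_of::<f64>());
    println!("String: {} bytes", size_of::<String>());
}
```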
I think a more space-efficient `Value` type could be introduced:

- Keys could be stored as a pointer-sized union of `&'static str, Arc<String>`, using a tag in the low bits to differentiate.
- The deserializer could automatically intern strings as they are deserialized.
- `Value` could be shrunk to 16 bytes, and store short strings inline.
- `Map`s could use a simple `Vec` representation for small numbers of elements to avoid any wasted space; the improved cache-coherency could also improve performance (a rough sketch of this idea follows the list).
- All access to "compact values" should be done via methods to allow further optimisations in the future.
- There would also need to be a version of the `json!()` macro that produced this compact type.
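As a rough, hypothetical sketch of the `Vec`-backed small-map idea (this is not ijson's actual implementation, just an illustration of the trade-off):

```rust
/// A toy map that keeps entries in insertion order in a flat Vec and looks
/// keys up with a linear scan. For the handful-of-keys objects that dominate
/// most JSON, this wastes no space on empty slots and stays cache-friendly.
struct SmallMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Eq, V> SmallMap<K, V> {
    fn new() -> Self {
        SmallMap { entries: Vec::new() }
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        // Replace in place if the key already exists, otherwise append.
        for (k, v) in &mut self.entries {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    fn get(&self, key: &K) -> Option<&V> {
        // O(n) lookup, but n is tiny for typical JSON objects and the scan
        // touches a single contiguous allocation.
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}

fn main() {
    let mut m = SmallMap::new();
    m.insert("id", 1);
    m.insert("score", 2);
    assert_eq!(m.get(&"id"), Some(&1));
    assert_eq!(m.get(&"missing"), None);
}
```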