    a2f9ea42
    Bug#23031146: INSERTING 64K SIZE RECORDS TAKE TOO MUCH TIME
    Knut Anders Hatlen authored
    If a JSON value consists of a large sub-document which is wrapped in
    many levels of JSON arrays or objects, serialization of the JSON value
    may take a very long time to complete.
    
    This is caused by how the serialization switches between the small
    storage format (used by documents that need less than 64KB) and the
    large storage format. When it detects that the large storage format
    has to be used, it redoes the serialization of the current
    sub-document using the large format. But this re-serialization has to
    be redone again when the parent of the sub-document is switched from
    small format to large format. For deeply nested documents, the inner
    parts end up being re-serialized again and again.
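
    The effect can be seen in a small stand-alone sketch of this
    bottom-up switching. The names here (Node, serialize,
    kSmallFormatMax) are made up for illustration and are not the
    actual json_binary interface; the headers and offset bytes of the
    binary format are omitted, since only the re-serialization
    pattern matters.

        #include <cstddef>
        #include <string>
        #include <vector>

        struct Node {                  // a JSON array/object with children
          std::vector<Node> children;
          std::string scalar;          // leaf payload if children is empty
        };

        constexpr std::size_t kSmallFormatMax = 64 * 1024;  // 64KB limit

        // Serialize one node; `large` selects the large storage format.
        std::string serialize(const Node &node, bool large) {
          std::string out;
          for (const Node &child : node.children) {
            // First serialize the child in the parent's current format.
            std::string sub = serialize(child, large);
            if (!large && sub.size() > kSmallFormatMax) {
              // Old behaviour: redo the child in the large format right
              // away, even though the parent is still being built in the
              // small format and will have to be redone as well.
              sub = serialize(child, /*large=*/true);
            }
            out += sub;
          }
          out += node.scalar;
          return out;
        }

    In this simplified model, a large leaf wrapped in d levels of
    arrays gets serialized on the order of d times, once for every
    ancestor that is switched over to the large format.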
    
    This patch changes how the switch between the formats is done. Instead
    of starting with re-serializing the inner parts, it now starts with
    the outer parts. If a sub-document exceeds the maximum size for the
    small format, we know that the parent document will exceed it and need
    to be re-serialized too. Re-serializing an inner document is therefore
    a waste of time if we haven't already expanded its parent. By starting
    with expanding the outer parts of the JSON document, we avoid the
    wasted work and speed up the serialization.
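
    Sketched in the same simplified model, the new order of work
    looks roughly like this: one attempt in the small format that
    simply gives up on overflow, followed by at most one full pass in
    the large format, starting from the outermost document.
    try_serialize is again a made-up name, not the actual
    implementation.

        #include <cstddef>
        #include <optional>
        #include <string>
        #include <vector>

        struct Node {
          std::vector<Node> children;
          std::string scalar;
        };

        constexpr std::size_t kSmallFormatMax = 64 * 1024;

        // Serialize in the requested format, but in the small format give
        // up as soon as the output exceeds the limit instead of patching
        // up the children one by one.
        std::optional<std::string> try_serialize(const Node &node,
                                                 bool large) {
          std::string out;
          for (const Node &child : node.children) {
            std::optional<std::string> sub = try_serialize(child, large);
            if (!sub) return std::nullopt;    // propagate overflow upwards
            out += *sub;
            if (!large && out.size() > kSmallFormatMax) return std::nullopt;
          }
          out += node.scalar;
          if (!large && out.size() > kSmallFormatMax) return std::nullopt;
          return out;
        }

        std::string serialize(const Node &root) {
          // Expand the outer parts first: if anything overflows the small
          // format, redo the whole document once in the large format.
          if (auto small = try_serialize(root, /*large=*/false))
            return *small;
          return *try_serialize(root, /*large=*/true);
        }

    With this order, every sub-document is serialized at most twice
    in the model: once in the abandoned small-format attempt and once
    in the large-format pass.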