    WL#7306 "Improve MDL performance and scalability by implementing lock-free · 0c60d406
    Dmitry Lenev authored
    lock acquisition for DML" and fix for bug #18077895 "WL7305 PUSH (7249)
    CAUSES SYSBENCH OLTP_RO 1 THREAD TO DROP UP TO -8%".
    
    The idea of this task is to change the acquisition of unobtrusive locks on
    the fast path, which currently involves acquisition of MDL_lock::m_rwlock,
    some checks, increment of the packed counter and release of m_rwlock, to
    a single atomic compare-and-swap operation.
    Similarly, the release of a lock on the fast path becomes a single atomic
    compare-and-swap (in the absence of obtrusive locks, and assuming we are
    not releasing the last lock for this MDL_lock object) instead of
    acquisition of MDL_lock::m_rwlock, decrement of the packed counter, some
    checks and release of m_rwlock.
    As a result these operations become at least twice as cheap as their old
    versions, which has a nice effect on performance/scalability.
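
    For illustration only, a fast-path acquisition then boils down to a
    compare-and-swap loop on an atomic packed counter. The sketch below uses
    hypothetical names (fast_path_acquire, unobtrusive_lock_increment) and
    std::atomic rather than the actual MDL_lock members:

      #include <atomic>
      #include <cstdint>

      static std::atomic<uint64_t> m_fast_path_state{0};

      /*
        Try to account for one more fast-path ("unobtrusive") lock by adding
        the per-lock-type increment to the packed counter in a single CAS.
        The release path mirrors this with a subtraction.
      */
      static bool fast_path_acquire(uint64_t unobtrusive_lock_increment)
      {
        uint64_t old_state= m_fast_path_state.load(std::memory_order_relaxed);
        uint64_t new_state;
        do
        {
          new_state= old_state + unobtrusive_lock_increment;
        }
        while (!m_fast_path_state.compare_exchange_weak(old_state, new_state));
        return true;
      }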
    
    Acquisition/release of locks on the slow path (i.e. unobtrusive locks in
    the presence of obtrusive locks, and obtrusive locks themselves) still has
    to use the old approach involving locking/unlocking MDL_lock::m_rwlock
    and checks of the MDL_lock::m_granted/m_waiting bitmaps/lists.
    
    This patch implements the above idea by performing the following
    three transformations:
    
    I)   MDL_lock::m_fast_path_granted_count is replaced with an atomic
         MDL_lock::m_fast_path_state member, which in the ideal case of
         "fast path" acquisition/release is checked and changed using CAS
         without holding any mutexes.
    II)  Since we would like to check in the same atomic CAS operation that
         the MDL_lock object was not destroyed, its m_is_destroyed member is
         replaced by an IS_DESTROYED bit flag in the m_fast_path_state
         packed counter.
    III) Similarly, since we also would like to check in the same atomic CAS
         that there are no granted or pending obtrusive locks, we add a
         HAS_OBTRUSIVE bit flag to m_fast_path_state, while keeping
         MDL_lock::m_obtrusive_locks_granted_waiting_count. This flag is set
         when we are about to try acquiring an obtrusive lock and cleared
         once the last granted or pending obtrusive lock goes away. (A sketch
         of the resulting state-word layout follows this list.)
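
    To make the flag/counter packing concrete, the state word can be pictured
    roughly as follows. Bit positions are illustrative assumptions, not the
    actual definitions from sql/mdl.h:

      #include <cstdint>

      typedef uint64_t fast_path_state_t;

      /* Flags packed into the upper bits of MDL_lock::m_fast_path_state. */
      static const fast_path_state_t IS_DESTROYED=  1ULL << 62;
      static const fast_path_state_t HAS_OBTRUSIVE= 1ULL << 61;
      /*
        The remaining low bits hold packed counters of granted fast-path
        ("unobtrusive") locks, one bit range per lock type.
      */

      /* A fast-path CAS may only proceed when none of the flags are set. */
      static bool fast_path_allowed(fast_path_state_t state)
      {
        return !(state & (IS_DESTROYED | HAS_OBTRUSIVE));
      }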
    
    
    Most of the remaining changes in this patch are necessary in order to fix
    bug #18077895 "WL7305 PUSH (7249) CAUSES SYSBENCH OLTP_RO 1 THREAD TO DROP
    UP TO -8%".
    
    This bug manifested itself as a slowdown for single-connection workloads
    in cases where the server had served many concurrent connections in the
    past, or where many dormant connections existed alongside the single
    active one.
    
    In such scenarios the release of a metadata lock meant that the MDL_lock
    object became unused, was removed from the lock-free hash containing all
    lock objects, and was returned to the allocator. The latter operation
    involved scanning the pins of all current and past connections, which
    became fairly expensive in these circumstances.
    
    This patch solves the problem by no longer releasing MDL_lock objects and
    removing them from the hash as soon as they become unused. Instead we
    keep unused objects in MDL_map and start evicting them only once their
    number passes a certain threshold and the ratio of unused to total lock
    objects is high enough. We evict random unused objects, so on average
    frequently used objects stay in the hash and rarely used objects go away.
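
    A rough sketch of the eviction trigger, with hypothetical names and
    constants (the actual thresholds live in the MDL_map implementation):

      #include <atomic>

      /* Illustrative stand-in for the relevant MDL_map bookkeeping. */
      struct LockMapStats
      {
        std::atomic<int> unused_objects{0};
        std::atomic<int> total_objects{0};
      };

      static const int    UNUSED_OBJECTS_THRESHOLD=  1000;
      static const double UNUSED_TO_TOTAL_MIN_RATIO= 0.25;

      /* Called when a lock object becomes unused; true means "evict one". */
      static bool should_evict_random_unused(LockMapStats *stats)
      {
        int unused= ++stats->unused_objects;
        int total=  stats->total_objects.load();
        return unused > UNUSED_OBJECTS_THRESHOLD &&
               unused > total * UNUSED_TO_TOTAL_MIN_RATIO;
      }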
    
    The above idea is implemented by:
    
    a) Introducing a new HAS_SLOW_PATH flag in the MDL_lock::m_fast_path_state
       member, which indicates whether there are any tickets in the
       MDL_lock::m_granted and m_waiting lists, or whether we are about to
       try to add one. Thanks to this flag, it is possible to distinguish
       between used and unused MDL_lock objects in the atomic
       compare-and-swap operations used to implement fast-path acquisition
       and release of locks.
    b) Changing the code which releases locks so that it no longer removes
       unused MDL_lock objects from the hash and deletes them afterwards.
       Instead we atomically increment the newly introduced
       MDL_map::m_unused_lock_objects counter. Similarly, on the first
       acquisition of a lock for a previously unused MDL_lock we atomically
       decrement this counter.
    c) When incrementing MDL_map::m_unused_lock_objects takes the counter
       over the threshold value and the unused/total objects ratio is high
       enough, we try to reduce the number of unused objects. We look up a
       random unused object in MDL_map, mark it as destroyed, remove it from
       the hash and return it to the allocator (see the sketch after this
       list). As a consequence the MDL_map::remove() method has become
       MDL_map::remove_random_unused().
    d) To support the change described in c), a new lf_hash_random_match()
       function was introduced, which allows us to efficiently find a random
       object matching a certain condition in an LF_HASH.
    e) Also to support the change described in c), a new PRNG was added to
       the MDL_context class. This PRNG is used as the source of randomness
       for look-ups of random unused objects.
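
    A minimal sketch of the eviction step from c): an MDL_lock is "unused"
    when its packed state has no flags and no fast-path counters set, and it
    is atomically marked as destroyed before being removed from the hash, so
    that any concurrent fast-path acquirer either sees IS_DESTROYED and
    retries, or wins the race and keeps the object alive. Names follow the
    earlier hypothetical sketches:

      /* Succeeds only if the object is still completely unused. */
      static bool try_mark_destroyed(std::atomic<fast_path_state_t> *state)
      {
        fast_path_state_t expected= 0;  /* no counters, no flags */
        return state->compare_exchange_strong(expected, IS_DESTROYED);
      }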
    
    Unit tests were added covering the handling of unused MDL_lock objects
    and the new lf_hash_random_match() function.
    
    
    Finally, this patch fixes a violation of the pinning protocol which was
    introduced by WL7305 and which occurred when the MDL subsystem failed to
    look up an MDL_lock object in the lock-free hash. The LF_HASH
    documentation was updated to reflect the need to call
    lf_hash_search_unpin() in this case.
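
    A sketch of the corrected calling pattern, assuming the lf_hash_search()
    and lf_hash_search_unpin() entry points from mysys and illustrative
    variable names (m_locks, pins, mdl_key):

      void *result= lf_hash_search(&m_locks, pins, mdl_key->ptr(),
                                   mdl_key->length());
      if (result == NULL || result == MY_ERRPTR)
      {
        /*
          Both a miss and an out-of-memory error leave the search pin in
          place, so it has to be released explicitly before bailing out.
        */
        lf_hash_search_unpin(pins);
        return NULL;
      }
      MDL_lock *lock= static_cast<MDL_lock *>(result);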