Skip to content
  • Daogang Qu's avatar
    e3e1cd11
    Wl#7361 MSR: per-channel replication filters · e3e1cd11
    Daogang Qu authored
    There are per-channel and global replication filters. Each channel uses
    *only* its own per-channel replication filters to filter the event
    stream. It never uses global replication filters to filter the event
    stream. (A new channel would copy global replication filters to its
    per-channel replication filters if there are no per-channel replication
    filters and there are global replication filters on the filter type
    when it is being configured.)
    
    The per-channel replication filters and global replication filters can
    be configured in two ways:
    A) startup options: --replicate-*
    B) SQL commands: CHANGE REPLICATION FILTER
    
    Additionally, behavior for the following statements needs to be specified:
    C) RESET SLAVE [ALL] [FOR CHANNEL]
    D) SHOW SLAVE STATUS [FOR CHANNEL]
    
    Query, troubleshoot, monitor replication filters and do statistics:
    E) CREATE A NEW performance_schema.replication_applier_filters
    
    Show the global replication filters:
    F) CREATE A NEW performance_schema.replication_applier_global_filters
    
    A) Startup options: --replicate-*
    =================================
    
    The current startup options are extended by allowing to specify channel_name in
    filter variable to configure per-channel replication filters as follows.
      --replicate-do-db=<channel_name>:<database_id>
      --replicate-ignore-db=<channel_name>:<database_id>
      --replicate-do-table=<channel_name>:<table_id>
      --replicate-ignore-table=<channel_name>:<table_id>
      --replicate-rewrite-db=<channel_name>:<db1->db2>
      --replicate-wild-do-table=<channel_name>:<table regexid>
      --replicate-wild-ignore-table=<channel_name>:<table regexid>
    
    ---- Syntax ----
    
    Each command line parameter optionally takes a channel_name followed by a colon,
    further followed by the filter specification. Note that the first colon is
    interpreted as a separator, others are literal colons.
    
    ---- Semantics ----
    
    Without specifying channel_name in filter variable, the startup options
    shall act on the default channel. See below.
    
      --replicate-do-db=:<database_id>
      --replicate-ignore-db=:<database_id>
      --replicate-do-table=:<table_id>
      --replicate-ignore-table=:<table_id>
      --replicate-rewrite-db=:<from_db>-><to_db>
      --replicate-wild-do-table=:<table regex>
      --replicate-wild-ignore-table=:<table regex>
    
    Without specifying channel_name and a followed 'colon' in filter variable, the
    startup options shall configure the global replication filters. See below.
      --replicate-do-db=<database_id>
      --replicate-ignore-db=<database_id>
      --replicate-do-table=<table_id>
      --replicate-ignore-table=<table_id>
      --replicate-rewrite-db=<from_db>-><to_db>
      --replicate-wild-do-table=<table regex>
      --replicate-wild-ignore-table=<table regex>
    
    If the user specifies a per-channel replication filter through a command-line
    option (or in a configuration file) for a slave replication channel which
    does not exist as of now (i.e not present in slave info tables yet), then
    the per-channel replication filter is discarded with the following warning:
      "There are per-channel replication filter(s) configured for channel '%.192s'
    which does not exist. The filter(s) have been discarded."
    
    If the user specifies a per-channel replication filter through a command-line
    option (or in a configuration file) for group replication channels
    'group_replication_recovery' and 'group_replication_applier' which is
    disallowed, then the per-channel replication filter is discarded with
    the following warning:
      "There are per-channel replication filter(s) configured for group replication
    channel '%.192s' which is disallowed. The filter(s) have been discarded."
    
    How global and per-channel replication filters work together?
    - Any global replication filter option will add the filter to global
      replication filters on the filter type, not add the filter to every
      channel on the filter type.
    - Any per-channel replication filter option will add the filter to
      per-channel replication filters of the specified channel on the
      filter type.
    - Every slave replication channel will copy global replication filters
      to its per-channel replication filters if there are no per-channel
      replication filters and there are global replication filters on the
      filter type when it is being configured.
    
    Example: Suppose channels '' and 'ch1' exist before the server starts,
      the command line options --replicate-do-db=db1
      --replicate-do-db=ch1:db2 --replicate-do-db=db3
      --replicate-ignore-db=db4 --replicate-ignore-db=:db5
      would result in:
        global replication filters: do_db=db1,db3, ignore_db=db4
        default channel: do_db=db1,db3 ignore_db=db5
        ch1: do_db=db2 ignore_db=db4
    
    Note: GROUP REPLICATION channels should not be configurable using
      --replicate* nor CHANGE REPLICATION FILTER, and should not inherit
      from global filters.
    
    BTW: if user specifies multiple replicate-rewrite-db=FROM->TO options
    having the same FROM database, all are added together (put into the
    rewrite_do list) and the first one takes affect. The global replication
    filters and per-channel filters have the same behavior in the worklog.
    So there is no change on this, since a channel uses either global or
    per-channel rewrite filters on a filter type.
    
    B) SQL commands: CHANGE REPLICATION FILTER
    ==========================================
    
    Dynamic replication filters are currently settable using the
    CHANGE REPLICATION FILTER statement. We extend this command to
    introduce dynamic replication filters per channel, by allowing
    a FOR CHANNEL <channel_name> clause as follows.
    
    ---- Syntax ----
    
    CHANGE REPLICATION FILTER filter [, filter...] [FOR CHANNEL <channel_name>]
    
    filter:
        REPLICATE_DO_DB = (db_list)
      | REPLICATE_IGNORE_DB = (db_list)
      | REPLICATE_DO_TABLE = (tbl_list)
      | REPLICATE_IGNORE_TABLE = (tbl_list)
      | REPLICATE_WILD_DO_TABLE = (wild_tbl_list)
      | REPLICATE_WILD_IGNORE_TABLE = (wild_tbl_list)
      | REPLICATE_REWRITE_DB = (db_pair_list)
    
    ---- Semantics ----
    
    1) If an explicit FOR CHANNEL clause is provided, the statement acts on that
       configured slave replication channel removing any existing replication
       filter if it has the same filter type as one of specified replication
       filters, and replacing them with the specified ones. Filter types that
       were not explicitly listed in the statement are not modified. The statement
       is disallowed with an error 'ER_SLAVE_CONFIGURATION' on slave replication
       channel if it is not configured. The statement is disallowed with an error
       'ER_SLAVE_CHANNEL_OPERATION_NOT_ALLOWED' on group replication channels.
    
    2) CHANGE REPLICATION FILTER filter [, filter...] with no FOR CHANNEL clause
       does the following, both for every configured slave replication channel's
       per-channel filter and for the global replication filters: For every filter
       type, if the filter type is listed in the statement, then any existing
       filter rules of that type are replaced by the filter rules specified in
       the statement, otherwise the old value of the type is retained. The
       statement does not act on group replication channels, because replication
       filters on group replication channels are disallowed. For example,
    
    C. SQL COMMAND: RESET SLAVE [ALL] [FOR CHANNEL]
    ===============================================
    
    1) "RESET SLAVE FOR CHANNEL '<channel_name>'" does not remove the replication
       channel specified by 'FOR CHANNEL' clause, so it shall retain replication
       filters of the channel. It throws an error 'ER_SLAVE_CHANNEL_DOES_NOT_EXIST'
       if the channel does not exist. So this statement is not changed by the worklog.
    
    2) "RESET SLAVE" does not remove any replication channel, so it shall retain
       all per-channel replication filters and all global replication filters.
       So this statement is not changed by the worklog.
    
    3) "'RESET SLAVE ALL FOR CHANNEL '<channel_name>'" removes the replication
       channel specified by 'FOR CHANNEL' clause, so it shall remove all
       per-channel replication filters of the channel if the channel exists.
       Then SELECT * FROM performance_schema.replication_applier_filters
       and SHOW SLAVE STATUS proves there's no channel anymore and therefore
       its replication filters are gone too. It still throws an error
       'ER_SLAVE_CHANNEL_DOES_NOT_EXIST' if the channel does not exist as before.
    
    4) "RESET SLAVE ALL" with no FOR CHANNEL clause removes all replication
       channels, so it shall remove all per-channel replication filters but
       does not touch all global replication filters. When the new empty
       channel is being configured, it therefore uses the global replication
       filters (copies all global replication filters to its own per-channel
       replication filters). A user who wants to remove all global and
       per-channel filters can use the statement: CHANGE REPLICATION FILTER
       Replicate_Do_DB = (), Replicate_Ignore_DB = (),
       Replicate_Do_Table = (), Replicate_Ignore_Table = (),
       Replicate_Wild_Do_Table = (), Replicate_Wild_Ignore_Table = (),
       Replicate_Rewrite_DB = ().
    
    D. SQL COMMAND: SHOW SLAVE STATUS [FOR CHANNEL <channel_name>]
    ==============================================================
    
    SHOW SLAVE STATUS FOR CHANNEL <channel_name> shall show per-channel
    replication filters for the specified channel, or throw an error
    'ER_SLAVE_CHANNEL_DOES_NOT_EXIST' if the channel does not exist.
    SHOW SLAVE STATUS with no FOR CHANNEL clause shall show the
    per-channel replication filters on every channel.
    
    E. CREATE A NEW performance_schema.replication_applier_filters
    ==============================================================
    
    We shall introduce a new dedicated P_S table to display per-channel
    replication filters for usability. So create and maintain the new
    P_S table with the following columns:
      1) Channel_name: the name of the channel;
      2) Filter_name: REPLICATE_DO_DB, REPLICATE_IGNORE_DB,
                      REPLICATE_DO_TABLE, REPLICATE_IGNORE_TABLE,
                      REPLICATE_WILD_DO_TABLE, REPLICATE_WILD_IGNORE_TABLE,
                      REPLICATE_REWRITE_DB;
      3) Filter_rule: The values that user has configured with startup
                      options: --replicate-* or through CHANGE REPLICATION
                      FILTER command (This also includes empty set when user
                      unsets the rules).
      4) Configured_by: ENUM(STARTUP_OPTIONS, CHANGE_REPLICATION_FILTER,
                        STARTUP_OPTIONS_FOR_CHANNEL,
                        CHANGE_REPLICATION_FILTER_FOR_CHANNEL); (These
                        enumeration constants are the most self-descriptive
                        set of identifiers, and supporting all the use
                        cases: U1. Reflect the configured commands;
                        U2. Determine if the filter has been persisted;
                        U3. Debugging by a confused user, or learn the
                        logic of default filters by playing with
                        different ways to set them.)
      5) Active_since: Timestamp of when the configuration took place;
                       (To a new channel copying the global replication filters as
                        its own per-channel filters, set 'active_since'
                        to channel creation time.)
      6) Counter: the hit counter of the filter since last configuration;
    
      Note: (4) and (5) are important to troubleshooting. (6) is more about
      statistics (and monitoring).
    
    F) CREATE A NEW performance_schema.replication_applier_global_filters
    ======================================================================
    
    We shall introduce a new dedicated P_S table to display all global
    replication filters for usability. So create and maintain the new
    P_S table with the following columns:
      1) Filter_name: REPLICATE_DO_DB, REPLICATE_IGNORE_DB,
                      REPLICATE_DO_TABLE, REPLICATE_IGNORE_TABLE,
                      REPLICATE_WILD_DO_TABLE, REPLICATE_WILD_IGNORE_TABLE,
                      REPLICATE_REWRITE_DB;
      2) Filter_rule: The values that user has configured with startup
                      options: --replicate-* or through CHANGE REPLICATION
                      FILTER command (This also includes empty set when user
                      unsets the rules).
      3) Configured_by: ENUM(STARTUP_OPTIONS, CHANGE_REPLICATION_FILTER);
      4) Active_since: Timestamp of when the configuration took place;
    e3e1cd11
    Wl#7361 MSR: per-channel replication filters
    Daogang Qu authored
    There are per-channel and global replication filters. Each channel uses
    *only* its own per-channel replication filters to filter the event
    stream. It never uses global replication filters to filter the event
    stream. (A new channel would copy global replication filters to its
    per-channel replication filters if there are no per-channel replication
    filters and there are global replication filters on the filter type
    when it is being configured.)
    
    The per-channel replication filters and global replication filters can
    be configured in two ways:
    A) startup options: --replicate-*
    B) SQL commands: CHANGE REPLICATION FILTER
    
    Additionally, behavior for the following statements needs to be specified:
    C) RESET SLAVE [ALL] [FOR CHANNEL]
    D) SHOW SLAVE STATUS [FOR CHANNEL]
    
    Query, troubleshoot, monitor replication filters and do statistics:
    E) CREATE A NEW performance_schema.replication_applier_filters
    
    Show the global replication filters:
    F) CREATE A NEW performance_schema.replication_applier_global_filters
    
    A) Startup options: --replicate-*
    =================================
    
    The current startup options are extended by allowing to specify channel_name in
    filter variable to configure per-channel replication filters as follows.
      --replicate-do-db=<channel_name>:<database_id>
      --replicate-ignore-db=<channel_name>:<database_id>
      --replicate-do-table=<channel_name>:<table_id>
      --replicate-ignore-table=<channel_name>:<table_id>
      --replicate-rewrite-db=<channel_name>:<db1->db2>
      --replicate-wild-do-table=<channel_name>:<table regexid>
      --replicate-wild-ignore-table=<channel_name>:<table regexid>
    
    ---- Syntax ----
    
    Each command line parameter optionally takes a channel_name followed by a colon,
    further followed by the filter specification. Note that the first colon is
    interpreted as a separator, others are literal colons.
    
    ---- Semantics ----
    
    Without specifying channel_name in filter variable, the startup options
    shall act on the default channel. See below.
    
      --replicate-do-db=:<database_id>
      --replicate-ignore-db=:<database_id>
      --replicate-do-table=:<table_id>
      --replicate-ignore-table=:<table_id>
      --replicate-rewrite-db=:<from_db>-><to_db>
      --replicate-wild-do-table=:<table regex>
      --replicate-wild-ignore-table=:<table regex>
    
    Without specifying channel_name and a followed 'colon' in filter variable, the
    startup options shall configure the global replication filters. See below.
      --replicate-do-db=<database_id>
      --replicate-ignore-db=<database_id>
      --replicate-do-table=<table_id>
      --replicate-ignore-table=<table_id>
      --replicate-rewrite-db=<from_db>-><to_db>
      --replicate-wild-do-table=<table regex>
      --replicate-wild-ignore-table=<table regex>
    
    If the user specifies a per-channel replication filter through a command-line
    option (or in a configuration file) for a slave replication channel which
    does not exist as of now (i.e not present in slave info tables yet), then
    the per-channel replication filter is discarded with the following warning:
      "There are per-channel replication filter(s) configured for channel '%.192s'
    which does not exist. The filter(s) have been discarded."
    
    If the user specifies a per-channel replication filter through a command-line
    option (or in a configuration file) for group replication channels
    'group_replication_recovery' and 'group_replication_applier' which is
    disallowed, then the per-channel replication filter is discarded with
    the following warning:
      "There are per-channel replication filter(s) configured for group replication
    channel '%.192s' which is disallowed. The filter(s) have been discarded."
    
    How global and per-channel replication filters work together?
    - Any global replication filter option will add the filter to global
      replication filters on the filter type, not add the filter to every
      channel on the filter type.
    - Any per-channel replication filter option will add the filter to
      per-channel replication filters of the specified channel on the
      filter type.
    - Every slave replication channel will copy global replication filters
      to its per-channel replication filters if there are no per-channel
      replication filters and there are global replication filters on the
      filter type when it is being configured.
    
    Example: Suppose channels '' and 'ch1' exist before the server starts,
      the command line options --replicate-do-db=db1
      --replicate-do-db=ch1:db2 --replicate-do-db=db3
      --replicate-ignore-db=db4 --replicate-ignore-db=:db5
      would result in:
        global replication filters: do_db=db1,db3, ignore_db=db4
        default channel: do_db=db1,db3 ignore_db=db5
        ch1: do_db=db2 ignore_db=db4
    
    Note: GROUP REPLICATION channels should not be configurable using
      --replicate* nor CHANGE REPLICATION FILTER, and should not inherit
      from global filters.
    
    BTW: if user specifies multiple replicate-rewrite-db=FROM->TO options
    having the same FROM database, all are added together (put into the
    rewrite_do list) and the first one takes affect. The global replication
    filters and per-channel filters have the same behavior in the worklog.
    So there is no change on this, since a channel uses either global or
    per-channel rewrite filters on a filter type.
    
    B) SQL commands: CHANGE REPLICATION FILTER
    ==========================================
    
    Dynamic replication filters are currently settable using the
    CHANGE REPLICATION FILTER statement. We extend this command to
    introduce dynamic replication filters per channel, by allowing
    a FOR CHANNEL <channel_name> clause as follows.
    
    ---- Syntax ----
    
    CHANGE REPLICATION FILTER filter [, filter...] [FOR CHANNEL <channel_name>]
    
    filter:
        REPLICATE_DO_DB = (db_list)
      | REPLICATE_IGNORE_DB = (db_list)
      | REPLICATE_DO_TABLE = (tbl_list)
      | REPLICATE_IGNORE_TABLE = (tbl_list)
      | REPLICATE_WILD_DO_TABLE = (wild_tbl_list)
      | REPLICATE_WILD_IGNORE_TABLE = (wild_tbl_list)
      | REPLICATE_REWRITE_DB = (db_pair_list)
    
    ---- Semantics ----
    
    1) If an explicit FOR CHANNEL clause is provided, the statement acts on that
       configured slave replication channel removing any existing replication
       filter if it has the same filter type as one of specified replication
       filters, and replacing them with the specified ones. Filter types that
       were not explicitly listed in the statement are not modified. The statement
       is disallowed with an error 'ER_SLAVE_CONFIGURATION' on slave replication
       channel if it is not configured. The statement is disallowed with an error
       'ER_SLAVE_CHANNEL_OPERATION_NOT_ALLOWED' on group replication channels.
    
    2) CHANGE REPLICATION FILTER filter [, filter...] with no FOR CHANNEL clause
       does the following, both for every configured slave replication channel's
       per-channel filter and for the global replication filters: For every filter
       type, if the filter type is listed in the statement, then any existing
       filter rules of that type are replaced by the filter rules specified in
       the statement, otherwise the old value of the type is retained. The
       statement does not act on group replication channels, because replication
       filters on group replication channels are disallowed. For example,
    
    C. SQL COMMAND: RESET SLAVE [ALL] [FOR CHANNEL]
    ===============================================
    
    1) "RESET SLAVE FOR CHANNEL '<channel_name>'" does not remove the replication
       channel specified by 'FOR CHANNEL' clause, so it shall retain replication
       filters of the channel. It throws an error 'ER_SLAVE_CHANNEL_DOES_NOT_EXIST'
       if the channel does not exist. So this statement is not changed by the worklog.
    
    2) "RESET SLAVE" does not remove any replication channel, so it shall retain
       all per-channel replication filters and all global replication filters.
       So this statement is not changed by the worklog.
    
    3) "'RESET SLAVE ALL FOR CHANNEL '<channel_name>'" removes the replication
       channel specified by 'FOR CHANNEL' clause, so it shall remove all
       per-channel replication filters of the channel if the channel exists.
       Then SELECT * FROM performance_schema.replication_applier_filters
       and SHOW SLAVE STATUS proves there's no channel anymore and therefore
       its replication filters are gone too. It still throws an error
       'ER_SLAVE_CHANNEL_DOES_NOT_EXIST' if the channel does not exist as before.
    
    4) "RESET SLAVE ALL" with no FOR CHANNEL clause removes all replication
       channels, so it shall remove all per-channel replication filters but
       does not touch all global replication filters. When the new empty
       channel is being configured, it therefore uses the global replication
       filters (copies all global replication filters to its own per-channel
       replication filters). A user who wants to remove all global and
       per-channel filters can use the statement: CHANGE REPLICATION FILTER
       Replicate_Do_DB = (), Replicate_Ignore_DB = (),
       Replicate_Do_Table = (), Replicate_Ignore_Table = (),
       Replicate_Wild_Do_Table = (), Replicate_Wild_Ignore_Table = (),
       Replicate_Rewrite_DB = ().
    
    D. SQL COMMAND: SHOW SLAVE STATUS [FOR CHANNEL <channel_name>]
    ==============================================================
    
    SHOW SLAVE STATUS FOR CHANNEL <channel_name> shall show per-channel
    replication filters for the specified channel, or throw an error
    'ER_SLAVE_CHANNEL_DOES_NOT_EXIST' if the channel does not exist.
    SHOW SLAVE STATUS with no FOR CHANNEL clause shall show the
    per-channel replication filters on every channel.
    
    E. CREATE A NEW performance_schema.replication_applier_filters
    ==============================================================
    
    We shall introduce a new dedicated P_S table to display per-channel
    replication filters for usability. So create and maintain the new
    P_S table with the following columns:
      1) Channel_name: the name of the channel;
      2) Filter_name: REPLICATE_DO_DB, REPLICATE_IGNORE_DB,
                      REPLICATE_DO_TABLE, REPLICATE_IGNORE_TABLE,
                      REPLICATE_WILD_DO_TABLE, REPLICATE_WILD_IGNORE_TABLE,
                      REPLICATE_REWRITE_DB;
      3) Filter_rule: The values that user has configured with startup
                      options: --replicate-* or through CHANGE REPLICATION
                      FILTER command (This also includes empty set when user
                      unsets the rules).
      4) Configured_by: ENUM(STARTUP_OPTIONS, CHANGE_REPLICATION_FILTER,
                        STARTUP_OPTIONS_FOR_CHANNEL,
                        CHANGE_REPLICATION_FILTER_FOR_CHANNEL); (These
                        enumeration constants are the most self-descriptive
                        set of identifiers, and supporting all the use
                        cases: U1. Reflect the configured commands;
                        U2. Determine if the filter has been persisted;
                        U3. Debugging by a confused user, or learn the
                        logic of default filters by playing with
                        different ways to set them.)
      5) Active_since: Timestamp of when the configuration took place;
                       (To a new channel copying the global replication filters as
                        its own per-channel filters, set 'active_since'
                        to channel creation time.)
      6) Counter: the hit counter of the filter since last configuration;
    
      Note: (4) and (5) are important to troubleshooting. (6) is more about
      statistics (and monitoring).
    
    F) CREATE A NEW performance_schema.replication_applier_global_filters
    ======================================================================
    
    We shall introduce a new dedicated P_S table to display all global
    replication filters for usability. So create and maintain the new
    P_S table with the following columns:
      1) Filter_name: REPLICATE_DO_DB, REPLICATE_IGNORE_DB,
                      REPLICATE_DO_TABLE, REPLICATE_IGNORE_TABLE,
                      REPLICATE_WILD_DO_TABLE, REPLICATE_WILD_IGNORE_TABLE,
                      REPLICATE_REWRITE_DB;
      2) Filter_rule: The values that user has configured with startup
                      options: --replicate-* or through CHANGE REPLICATION
                      FILTER command (This also includes empty set when user
                      unsets the rules).
      3) Configured_by: ENUM(STARTUP_OPTIONS, CHANGE_REPLICATION_FILTER);
      4) Active_since: Timestamp of when the configuration took place;
Loading