MariaDB Server: MDEV-6459

max_relay_log_size and sql_slave_skip_counter misbehave on PPC64

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.13
    • Fix Version/s: 10.0.14
    • Component/s: None
    • Labels: None
    • Environment: PPC64 RHEL 6.5

      Description

      The following tests fail on PPC64 because the max_relay_log_size and sql_slave_skip_counter system variables return wrong values: multi_source.skip_counter, rpl.rpl_auto_increment, rpl.rpl_mdev6020, rpl.rpl_skip_replication, rpl.rpl_stm_max_relay_size, sys_vars.max_relay_log_size_basic, sys_vars.sql_slave_skip_counter_basic.

      Buildbot log: http://buildbot.askmonty.org/buildbot/builders/bintar-rhel6-p8/builds/211/steps/test/logs/stdio

      Most failures look like the following:

      @@ -21,17 +21,17 @@
       set global sql_slave_skip_counter = 2;
       select @@global.sql_slave_skip_counter;
       @@global.sql_slave_skip_counter
      -2
      +8589934592
      

              Activity

              svoj Sergey Vojtovich added a comment -

              Kristian, please review the fix for this bug.

              The patch has been pushed to 10.0.13:

              revno: 4293
              revision-id: svoj@mariadb.org-20140718154521-mwoz6ezimga0axcj
              parent: svoj@mariadb.org-20140718111625-uch1ssbh8kf6i4ib
              committer: Sergey Vojtovich <svoj@mariadb.org>
              branch nick: 10.0
              timestamp: Fri 2014-07-18 19:45:21 +0400
              message:
                MDEV-6459 - max_relay_log_size and sql_slave_skip_counter
                            misbehave on PPC64
                
                There was a mix of ulong and uint casts/variables which caused
                incorrect value to be passed to/retrieved from max_relay_log_size
                and sql_slave_skip_counter.
                
                This mix failed to work on big-endian PPC64 where sizeof(int)= 4,
                sizeof(long)= 8. E.g. session_var(thd, uint)= 1 will in fact store
                0x100000000.
              
              serg Sergei Golubchik added a comment -

              Try to avoid long in sysvars; prefer int or longlong instead. They are much more stable across architectures: int is typically 32-bit and longlong is 64-bit, but long can be either, so the variable gets different limits on different platforms. This makes documenting the variable (and writing test cases) rather complicated.

              svoj Sergey Vojtovich added a comment -

              The max value for sql_slave_skip_counter is UINT_MAX, and for max_relay_log_size it is 1024L*1024*1024. Both fit in a 32-bit unsigned integer.

              I'm not sure whether there was a good reason to choose ulong over another type. Since Kristian wrote this code, I'd rather hand this recommendation off to him.

              knielsen Kristian Nielsen added a comment -

              > I'm not sure whether there was a good reason to choose ulong over
              > another type. Since Kristian wrote this code, I'd rather hand this
              > recommendation off to him.

              I don't think I created the code for max_relay_log_size and
              sql_slave_skip_counter. As far as I know, those variables existed long
              before I started working on replication.

              Generally, I would agree with Serg that it's best to avoid using ulong. Using
              ulonglong seems fine here.

              I have noticed that binlog sizes and offsets tend to use 32-bit values
              throughout the replication code (which is generally wrong for file
              offsets). I suspect other bugs related to this are still lingering.

              Using ulonglong by default when adding or otherwise changing code seems a
              reasonable approach to me, except where performance concerns suggest a
              32-bit type (which does not seem to be the case here).

              Kristian.

                People

                • Assignee: Sergei Golubchik (serg)
                • Reporter: Sergey Vojtovich (svoj)
                • Votes: 0
                • Watchers: 2

                    Time Tracking

                    • Estimated: Not Specified
                    • Remaining: 0 minutes
                    • Logged: 10 minutes