  1. MariaDB Server
  2. MDEV-6161

server hangs in simple query with large strings (myisam)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 5.5.35, 5.5.36
    • Fix Version/s: N/A
    • Component/s: OTHER
    • Labels:
      None
    • Environment:
      fedora-20

      Description

      Some queries on a sample table cause the server to hang:

      Command | Time | State        | Info                                                                                                 
      Query   | 1597 | Sending data | SELECT count(h.id)
      

      with 0 disk activity

      hanging query:

      27 Query   SELECT count(h.id) FROM wp_bp_album_hash h
      where  h.id!=8164 AND h.hashv !='' AND huffman_dist(h.hashv,'00BB0000 00000000 00BB9C80 00000000 00C09BF9 00000000 CCC00C04 000000CC DC4C0000 00000CCD FFDDCC0C 00CCCFFB DDFFDC40 00CFFFFD FFF3FFCC DDFDCCDD FFFFDD0C 0CCCCDDF CCCCC4CC 0C0DDDDD CCCC40C0 0000CCCC DCFFF400 0000C04C DCDDC000 00000CDD DD040000 0000CCCD C0000000 00000000 00000000 00000000')< 400
      

      while the next query works fine (in 2.941s):

      27 Query     SELECT count(h.id) FROM wp_bp_album_hash h
      where  h.id!=8164 AND h.hashk !='' AND huffman_dist(h.hashk,'00990000 00000000 00F99910 00000000 00009999 00000000 00000005 00000000 D4540000 0000004C FD444000 00044DFF CCDDC400 004D9DDC DFFFFF40 D99C444C FDDDCD00 000044CF 44444000 0009DC44 44400000 00000046 C4999400 00000044 D1DD4000 00000DDD 59000000 0000055D 00000000 00000000 00000000 00000000')< 390
      

      huffman_dist - a small UDF which calculates "BIT_COUNT( a ^ b )" word-by-word for LONG hashes (I can attach a compiled one)
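For readers without the UDF at hand, here is a minimal sketch of what a word-by-word BIT_COUNT(a ^ b) could look like. The UDF source is not shown in this report, so everything here is an assumption: the name `bit_distance` is hypothetical, and the hash is assumed to be a string of space-separated hexadecimal words, as in the queries above.

```python
def popcount(value):
    """Count the set bits of a non-negative integer."""
    return bin(value).count("1")


def bit_distance(hash_a, hash_b):
    """Sum BIT_COUNT(a ^ b) over corresponding hex words of two hashes.

    Assumption: both hashes are space-separated hexadecimal words and
    contain the same number of words.
    """
    words_a = hash_a.split()
    words_b = hash_b.split()
    if len(words_a) != len(words_b):
        raise ValueError("hashes must contain the same number of words")
    return sum(
        popcount(int(a, 16) ^ int(b, 16))
        for a, b in zip(words_a, words_b)
    )


if __name__ == "__main__":
    # The second word differs by 0xF, i.e. four bits.
    print(bit_distance("00BB0000 0000000F", "00BB0000 00000000"))  # → 4
```

A function like this does a bounded amount of work per row, so if the real UDF behaves similarly, a hang with near-zero CPU would point away from the distance calculation itself.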

      table in attachment


          Attachments

          1. err.txt
            46 kB
          2. err1.txt
            4 kB
          3. hash.sql.gz
            3.51 MB
          4. mariadb.spec
            27 kB
          5. my.cnf.tgz
            0.8 kB
          6. trace.log
            32 kB
          7. udf_hash.so
            5 kB

            Activity

            elenst Elena Stepanova added a comment -

            Hi,

            Thanks.
            Did you take the stack trace snapshot once, or did you try it several times?
            In the latter case, were there any changes in the stack traces between the snapshots?

            Also, how is the CPU usage when it hangs?

            vde vde added a comment -

            Once. There is not much time before it hangs.

            CPU load is about 0 - the hanging threads don't eat CPU

            elenst Elena Stepanova added a comment -

            What about stack traces during hanging (not before)? Do they look any different?

            I'm asking about stack traces and CPU because a server that is really hanging (forever waiting for something) and a server going through an endless loop often look the same: unavailable, unresponsive. The difference is that for a server in a loop, consecutive stack traces differ and CPU load is [relatively] high. Since your CPU is about 0, I assume it's not the case.
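The rule of thumb above can be written down as a small heuristic. This is only a sketch: the function name `classify_stall`, the 5% CPU threshold, and the idea of comparing snapshots as plain strings are all assumptions made for illustration, not part of any MariaDB tooling.

```python
def classify_stall(stack_snapshots, cpu_percent, busy_cpu_threshold=5.0):
    """Guess whether an unresponsive server is blocked or spinning.

    stack_snapshots: stack trace texts taken a short time apart.
    cpu_percent: CPU usage of the server process during the stall.
    busy_cpu_threshold: assumed cut-off between "idle" and "busy".
    """
    if len(stack_snapshots) < 2:
        # A single snapshot cannot show whether the stacks are moving.
        return "inconclusive"

    traces_change = len(set(stack_snapshots)) > 1
    cpu_busy = cpu_percent >= busy_cpu_threshold

    if traces_change and cpu_busy:
        return "likely endless loop"
    if not traces_change and not cpu_busy:
        return "likely blocked wait"
    # Mixed signals, e.g. identical traces with high CPU.
    return "inconclusive"
```

With only one snapshot, as in this report, the function returns "inconclusive", which is why several snapshots are requested.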

            vde vde added a comment -

            >stack traces during hanging (not before)
            I can't do much debugging on the production server because we would lose all subscribers... "killall -9 mysqld" is run automatically when "mysqladmin ping" fails.

            Yes, it seems that threads are waiting for events forever and eventually use up the entire thread pool, so the server becomes unresponsive.

            elenst Elena Stepanova added a comment -

            I tried again to reproduce it, but there is too much mystery in here.

            The query quoted in the description hangs in 'Sending data', which can be anything; there is no way to know without the stack trace and more information (complete process list etc.). It can even be a problem in the UDF, e.g. something causing an endless loop (only in this case CPU usage should be noticeable).

            But the attached processlist and stack trace are from a different occasion, a query which starts with SELECT SQL_SMALL_RESULT h.id, h.hashv, huffman_dist(h.hashv,. It stops not on 'Sending data', but on 'Copying to tmp table'. Unfortunately, there is no complete query anywhere, so we can't even know why it wants to create a temporary table in the first place. Also, the stack trace does not look like something hanging; again, it looks more like an endless loop. Without a few consecutive stack traces, there is no way to tell for sure.

            So, if the problem still exists, we need:

            • at least a couple (better 3-4) stack traces taken with a short time interval;
            • complete processlist;
            • complete query which is being run;
            • server error log.
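One possible way to take several stack traces a short interval apart is sketched below. It assumes gdb is installed and may attach to the server process; the helper names (`gdb_backtrace_command`, `collect_snapshots`) and the default count and interval are hypothetical choices, not something prescribed in this report.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb invocation that prints all thread backtraces and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def run_and_capture(cmd):
    """Run a command and return its combined stdout/stderr as text."""
    done = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return done.stdout


def collect_snapshots(pid, count=4, interval_s=2.0,
                      run=run_and_capture, sleep=time.sleep):
    """Take `count` backtrace snapshots of process `pid`, `interval_s` apart.

    `run` and `sleep` are injectable so the loop can be tried without gdb.
    """
    snapshots = []
    for i in range(count):
        snapshots.append(run(gdb_backtrace_command(pid)))
        if i < count - 1:
            sleep(interval_s)
    return snapshots
```

Attaching gdb pauses the server briefly for each snapshot, so on a production machine this would have to run before any watchdog kills the process. The complete processlist can be captured alongside with SHOW FULL PROCESSLIST.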

              People

              • Assignee:
                Unassigned
                Reporter:
                vde vde