  1. MariaDB Server
  2. MDEV-5618

TokuDB tests fail when building 5.5.35 in buildd at Launchpad.net

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.35
    • Fix Version/s: 5.5.37
    • Component/s: None
    • Labels:

      Description

      I've been working on Debian packaging. My latest version, https://github.com/ottok/mariadb-5.5, builds OK on localhost with git-buildpackage and dpkg-buildpackage, and also on another build machine that runs git-buildpackage with pbuilder chroots.

      However, when I upload the same package to Launchpad.net, every build that includes TokuDB fails because the test run has TokuDB-related failures:

      Only  1358  of 3485 completed.
      --------------------------------------------------------------------------
      The servers were restarted 533 times
      Spent 1884.283 of 2996 seconds executing testcases
      
      Check of testcase failed for: rpl.rpl_ddl
      
      Too many failed: Failed 10/952 tests, 98.95% were successful.
      
      Failing test(s): rpl-tokudb.tokudb_innodb_xa_crash tokudb_alter_table.ai_part tokudb_alter_table.drop_add_pk_part_104 tokudb_alter_table.hcad_part tokudb_alter_table.rename_column_cold_part_104
      

      The above is from the build log at https://launchpadlibrarian.net/165042892/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.35-1~trusty1~ppa3_FAILEDTOBUILD.txt.gz

      More build logs at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all

      The build failure for saucy-amd64 is identical.

      Launchpad uses buildd to build, and since that is the main difference from the other build environments, I suspect there is some issue with 5.5.35 and buildd.

            Activity

            elenst Elena Stepanova added a comment -

            I ran the test for trusty build binaries inside a trusty chroot like this:
            root@htpc:/tmp/buildd/mariadb-5.5-5.5.35/mysql-test# ./mtr storage/tokudb/ft-index/portability/tests/test-cpu-freq.cc

            The CPU went 100% and the system was unresponsive for 40 minutes

            It is a cc unit test (hence the name), it has nothing to do with MTR. In fact, it should not have even started since its name does not meet MTR requirements; but I saw the behavior you described several times when I had my mtr script corrupted.

            I'm sorry, but I think my time budget for this is used up for now, and I'll upload without TokuDB. It can be added back later.

            I see. I'm afraid without any clear information on what is happening inside those machines, there is not much we can do, but I'll keep it open in case something new comes up.

            elenst Elena Stepanova added a comment -

            Please comment to re-open or ping me on IRC when/if you have more information so that we can proceed with this.

            otto Otto Kekäläinen added a comment -

            I intend to re-engineer how TokuDB is built/tested in Debian builds to get this one solved.

            tmcallaghan Tim Callaghan added a comment -

            From the maria-dev list:

            The answer to your first question is, that's how CMake works. CMake's Cross Compiling guide says that it can't guess the target processor details, and you're supposed to provide that information either by explicitly setting the variables, or by providing a toolchain file: http://www.cmake.org/Wiki/CMake_Cross_Compiling

            I would be surprised if launchpad.net's infrastructure did not include suitable toolchain files, but this really isn't my area of expertise. If you can find suitable ones to use, then you should use them, otherwise I think you should probably just add something to the rules file to set those variables explicitly.
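            As a side note on the 32-bit/64-bit check itself: the processor name reported by the build host says nothing about the target, but the compiler always knows the pointer width of the code it is generating. The following is only a minimal sketch in C of that idea, not what the TokuDB build does; the function name target_pointer_bits is hypothetical. CMake exposes the same information as CMAKE_SIZEOF_VOID_P.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Refuse to compile for targets that are neither 32-bit nor 64-bit.
 * This is evaluated by the compiler for the *target*, so it does not
 * depend on which CPU the build host or chroot host reports. */
_Static_assert(sizeof(void *) == 4 || sizeof(void *) == 8,
               "unsupported pointer width");

/* Hypothetical helper: pointer width, in bits, of the target being
 * compiled for (32 in an i386 chroot, 64 in an amd64 one). */
unsigned target_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

            In an i386 chroot running on an amd64 kernel, a check like this follows the compiler's target rather than the kernel's reported machine type, which is the distinction the question was about.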

            Regarding your second problem, it sounds like your packaging scripts aren't properly linking with jemalloc as the first library, with --whole-archive. I say that because we get a failure (from Elena's stacktrace) inside jemalloc code when calling free() inside the library constructor:

            #2 <signal handler called>
            #3 extent_ad_comp (a=0x7fff22e3f930, b=0x0) at extra/jemalloc/src/extent.c:32
            #4 jemalloc_internal_extent_tree_ad_search (rbtree=rbtree@entry=0x7f5ceadfb0c0 <huge>, key=key@entry=0x7fff22e3f930) at extra/jemalloc/src/extent.c:38
            #5 0x00007f5ceab7fab8 in jemalloc_internal_huge_salloc (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/huge.c:229
            #6 0x00007f5ceab6e335 in jemalloc_internal_isalloc (demote=false, ptr=0x7f5cedf7ce00) at include/jemalloc/internal/jemalloc_internal.h:863
            #7 free (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/jemalloc.c:1267
            #8 0x00007f5ceaac78de in toku_get_processor_frequency_cpuinfo (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:371
            #9 toku_os_get_processor_frequency (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:409
            #10 0x00007f5ceaac7a5d in toku_portability_init () at storage/tokudb/ft-index/portability/portability.cc:139
            #11 0x00007f5ceaaf72bc in toku_ft_layer_init () at storage/tokudb/ft-index/ft/ft-ops.cc:6275
            #12 0x00007f5ceaa80f55 in GLOBAL_I_65535_0_libtokufractaltree_static.a_0x235798 () at storage/tokudb/ft-index/src/ydb_lib.cc:103

            I've seen this happen before when a buffer is allocated with the system allocator's malloc() (as likely happens in this call to getline(3) https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L360), and then the fractal tree tries to free it with jemalloc (https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L371).
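            To make the pattern concrete, here is a minimal sketch in C of the getline(3)/free(3) pairing described above. It is an illustration under assumptions, not the actual portability.cc code: the function name read_cpu_hz and the parsing format are hypothetical. The point is that getline() obtains its buffer from whatever malloc/realloc the C library resolves to, so the free() call at the end is only safe when it resolves to that same allocator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the first "cpu MHz" value found in a
 * /proc/cpuinfo-style file, converted to Hz, or 0 if none is found
 * or the file cannot be opened. */
unsigned long long read_cpu_hz(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    char *line = NULL;  /* getline() allocates and grows this buffer */
    size_t cap = 0;
    unsigned long long hz = 0;
    double mhz;

    while (getline(&line, &cap, fp) != -1) {
        /* Whitespace in the format matches the tabs used in cpuinfo. */
        if (sscanf(line, "cpu MHz : %lf", &mhz) == 1) {
            hz = (unsigned long long)(mhz * 1000000.0);
            break;
        }
    }

    /* The buffer came from the allocator used inside getline(). If this
     * free() is bound to a different allocator (for example jemalloc
     * linked only into a plugin), the pointer is unknown to it and the
     * call can crash, as in the stack trace above. */
    free(line);
    fclose(fp);
    return hz;
}
```

            When a single allocator is used by the whole process, both calls agree and this code is harmless; the failure mode only appears when the two sides are bound to different allocators.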

            Please make sure that jemalloc is being linked properly (as the first library, and with --whole-archive) into mysqld. It is not sufficient to only link it to ha_tokudb.so, because in that case the process (mysqld) will be using the system allocator, and there will be some possibly inlined calls to jemalloc's interface inside ha_tokudb.so. If you need help with this, please show me how your linking is being done and I'll try to give the right advice. If it is against policy to ship with the allocator statically linked in a binary, then you should make sure jemalloc isn't linked in ha_tokudb.so anywhere, but I strongly recommend against that.

            On Mon, Apr 14, 2014 at 5:01 PM, Rich Prohaska <prohaska@tokutek.com> wrote:
            Hello Otto,
            Have not investigated these problems yet. Created a tokudb issue to track: https://github.com/Tokutek/mariadb-5.5/issues/53

            On Mon, Apr 14, 2014 at 5:03 AM, Otto Kekäläinen <otto@seravo.fi> wrote:
            Hello Richard,

            Any chance of getting your comments on this..? Thanks!

            2014-04-01 12:25 GMT+03:00 Otto Kekäläinen <otto@seravo.fi>:
            > Hello Rick,
            >
            > Last year I spent a lot of time packaging MariaDB 5.5 for Debian and
            > finally this year it has landed in Ubuntu 14.04 and Debian testing.
            > Unfortunately the Debian/Ubuntu version does not include TokuDB and I
            > need your help to get it there.
            >
            > In 5.5.35 (I think) the TokuDB plugin was added to MariaDB but I had
            > issues getting it to build 100% correctly and I eventually dropped it
            > (added build parameter -DWITHOUT_TOKUDB=true), as getting MariaDB in
            > Debian at all was a bigger priority than getting it there with every
            > possible plugin.
            >
            > The root cause seems to be that when Debian and Ubuntu packages are
            > built in chroot environments (the build systems of Debian and Ubuntu
            > use pbuilder/sbuilder systems, see
            > https://en.wikipedia.org/wiki/Debian_build_toolchain#Isolated_build_environments)
            > the code that builds the plugin does not seem to correctly detect the
            > CPU features. It seems to read the values from the build machine and
            > not the inputted target values (in a cross-compile situation).
            >
            >
            > There are two related issues that need a solution:
            >
            >
            > 1) Currently the code that checks what the architecture is
            > (32-bit/64-bit) is in the first lines of
            > https://bazaar.launchpad.net/~maria-captains/maria/10.0/view/head:/storage/tokudb/CMakeLists.txt.
            > This works well for real and virtual machines, but it does not seem
            > to work in the pbuilder/sbuilder chroots, as CMAKE_SYSTEM_PROCESSOR
            > always shows the chroot host CPU, not the cross-compile target CPU.
            >
            > Could you please investigate pbuilder/sbuilder and search for some
            > solution that works for reliable target CPU checking?
            >
            >
            > 2) When building TokuDB in Ubuntu (amd64) sbuilder environments
            > something crashes in the 'toku_os_get_processor_frequency'
            > function. For this too, could you investigate the sbuilder chroot
            > environment and figure out what goes on and how to fix it?
            >
            > Issue 2 has a bug report with the (a bit messy) debugging history
            > documented: https://mariadb.atlassian.net/browse/MDEV-5618
            >
            >
            > Both of these issues require learning a bit about sbuilder CPU
            > things, so I assume it is most efficient if the same person looks
            > into both of these.
            >
            >
            > Thanks!
            >

            otto Otto Kekäläinen added a comment -

            Later versions of TokuDB have built OK on Launchpad, e.g. https://launchpadlibrarian.net/174495437/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.37-1~trusty1~ppa6_UPLOADING.txt.gz

            Thus I can close this particular issue though other issues with TokuDB builds remain (MDEV-6449).


              People

              • Assignee: Otto Kekäläinen
              • Reporter: Otto Kekäläinen