  1. MariaDB Server
  2. MDEV-6675

Use PGO in builds to help reduce icache miss overhead

    Details

    • Type: Task
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      I wrote about this in January:

      https://lists.launchpad.net/maria-developers/msg06693.html
      http://kristiannielsen.livejournal.com/17676.html
      http://kristiannielsen.livejournal.com/18168.html

      Even for simple queries, profiling shows that icache misses are a major
      performance bottleneck. The total amount of code executed is larger than
      the icache, and prefetching is not sufficiently effective, so the CPU spends
      most of its time waiting for new instructions to be fetched and decoded.
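
      As a rough illustration of how such a measurement might be taken (the
      description does not say which profiler was used), the sketch below
      assumes Linux `perf` and its generic `L1-icache-load-misses` event, which
      is not available on every CPU; `perf list` shows what a given machine
      supports. The function names `measure_icache` and `mpki` are made up for
      this example.

```shell
#!/bin/sh
# Sketch only: measure_icache and mpki are hypothetical helper names.
set -eu

# measure_icache PID SECONDS
# Count icache load misses and retired instructions of a running process
# (for example the server) for the given number of seconds.
measure_icache() {
    perf stat -e L1-icache-load-misses,instructions -p "$1" -- sleep "$2"
}

# mpki MISSES INSTRUCTIONS
# Convert the two counters into misses per thousand instructions, which makes
# runs of different length comparable.
mpki() {
    awk -v m="$1" -v i="$2" 'BEGIN { printf "%.2f\n", m * 1000 / i }'
}
```

      Comparing the misses-per-thousand-instructions figure between two builds
      under the same load is one way to check whether a build change actually
      reduces icache pressure.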

      A partial but easy-to-implement fix is to use GCC profile-guided
      optimisations. Tests have shown this to significantly reduce icache misses, as
      well as causing other small improvements, for a nice total speedup in
      single-threaded performance.

      I already have a script that generates a suitable test load, and the commands
      needed to build using PGO:

      https://github.com/knielsen/gen_profile_load

        mkdir bld
        cd bld
        cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" ..
        make
      
        tests/gen_profile_load
      
        cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" ..
        make
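
      The three phases above (instrumented build, training load, optimised
      rebuild) could be wrapped in one script along these lines. This is a
      sketch under assumptions, not the integration the task asks for:
      `SRC_DIR`, `BLD_DIR` and `DRY_RUN` are names introduced here, and the
      script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: SRC_DIR, BLD_DIR and DRY_RUN are names made up for this example.
set -eu

SRC_DIR=${SRC_DIR:-$PWD}            # top of the source tree
BLD_DIR=${BLD_DIR:-$SRC_DIR/bld}    # out-of-source build directory
DRY_RUN=${DRY_RUN:-1}               # 1 = only print the commands, 0 = run them

COMMON_FLAGS="-Wno-maybe-uninitialized -g -O3"

# Print a command, and execute it unless this is a dry run.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# configure EXTRA_FLAGS: (re)configure the build tree with phase-specific flags.
configure() {
    run cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 \
        -DCMAKE_BUILD_TYPE=RelWithDebInfo \
        "-DCMAKE_C_FLAGS_RELWITHDEBINFO=$COMMON_FLAGS $1" \
        "-DCMAKE_CXX_FLAGS_RELWITHDEBINFO=$COMMON_FLAGS $1" \
        "$SRC_DIR"
}

pgo_build() {
    run mkdir -p "$BLD_DIR"
    run cd "$BLD_DIR"

    # Phase 1: instrumented build that records execution counts.
    configure "--coverage"
    run make

    # Phase 2: run the training load so the counters are written out.
    run tests/gen_profile_load

    # Phase 3: rebuild, letting GCC use the recorded profile.
    configure "-fprofile-use -fprofile-correction"
    run make
}

pgo_build
```

      Keeping the common flags in one variable makes it harder for the
      instrumented and the optimised build to drift apart in anything other
      than the profiling options.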
      

      It just needs to be integrated into the .deb build scripts (both native
      Debian and the MariaDB 3rd-party repos) and into the bintar scripts.

            Activity

            knielsen Kristian Nielsen added a comment -

            Hm, I have a patch for using PGO when building .debs.

            But I did a quick test with a bunch of simple queries, and the PGO binaries were not seen to be faster. In fact, they were seen to be a few percent slower.

            So I need to analyse this and find the explanation before proceeding, to see whether the PGO idea is viable at all.

            knielsen Kristian Nielsen added a comment -

            I made a patch to use PGO in the debian package builds.

            But then a quick benchmark showed that the resulting binaries were slower, not faster, than the original. This probably needs to be understood before going further with this task.

            knielsen Kristian Nielsen added a comment -

            I attached my patch to this issue.

            This is mainly extending debian/rules to build with profiling, then run the profile load, then build again using PGO. And it uses the load generator from here:

            https://github.com/knielsen/gen_profile_load

            It also includes the simple test script that showed poorer performance of the PGO binaries.

            The patch should be complete (it is based on an older 10.0 tree). But the issue that even simple performance tests become slower using PGO probably needs to be investigated before using this ...
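
            For illustration only, a comparison of this kind could be scripted roughly as follows. This is not the test script attached to the issue; `time_cmd` and `pct_change` are hypothetical helpers, and timing with `date +%s` has only one-second resolution, so the load needs to run long enough for a difference of a few percent to show.

```shell
#!/bin/sh
# Sketch only: time_cmd and pct_change are hypothetical helper names.
set -eu

# time_cmd N CMD...
# Run CMD N times and print the elapsed wall-clock time in whole seconds.
time_cmd() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# pct_change BASE NEW
# Relative change of NEW versus BASE in percent; positive means NEW is slower
# when both values are elapsed times.
pct_change() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) * 100 / b }'
}

# Example use (the wrapper script names are assumptions):
#   base=$(time_cmd 20 ./run_queries_against_base_server)
#   pgo=$(time_cmd 20 ./run_queries_against_pgo_server)
#   pct_change "$base" "$pgo"
```

            Repeating each measurement several times and alternating between the two servers helps separate a real slowdown of a few percent from run-to-run noise.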


              People

              • Assignee: knielsen Kristian Nielsen
              • Reporter: knielsen Kristian Nielsen
              • Votes: 0
              • Watchers: 1
