  1. MariaDB Server
  2. MDEV-7439

Power8 builders running out of disk space

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: N/A
    • Fix Version/s: N/A
    • Component/s: Tests
    • Labels:
      None

      Description

      The Power8 VMs have very little available disk space, about 30-50 GB (minus the space taken by the OS). Each of them keeps 3 build directories: release, debug and packages. Each directory is up to 7 GB, which adds up to 21 GB.
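As a rough check of these numbers, here is a minimal sketch. The function names are hypothetical, and the OS footprint is not given in the report, so it is left out of the calculation.

```python
# Rough disk budget for one Power8 builder host, using only the figures from the report.
BUILD_DIRS = ("release", "debug", "packages")
MAX_DIR_GB = 7  # "up to 7 GB" per build directory


def build_dirs_total_gb(dirs=BUILD_DIRS, per_dir_gb=MAX_DIR_GB):
    """Worst-case space taken by the persistent build directories."""
    return len(dirs) * per_dir_gb


def headroom_gb(disk_gb):
    """Space left for the OS, source trees and everything else."""
    return disk_gb - build_dirs_total_gb()


if __name__ == "__main__":
    for disk_gb in (30, 50):  # the reported range of available disk space
        print(disk_gb, "GB disk:", headroom_gb(disk_gb), "GB left for everything else")
```

On the smaller hosts, that leaves single-digit gigabytes before the OS and the source trees are even counted.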

      This makes builds hit out-of-disk-space errors frequently. Please add a step to the bb configuration to clean up the build dir when a build is completed, or solve this problem in any other reasonable way.

      Also please remove "xtra" step, it is covered anyway by "xtra-big".

      There are also AT7.x and AT8.x installed on p8-rhel7, which take almost 5 GB. Is it worth removing one version?

              Activity

              elenst Elena Stepanova added a comment - - edited

              I've made the following changes to get it working for now:

              1) Added a step to all p8_* factories:

              f_p8_rhel6_bintar.addStep(RemoveDirectory(
                      name="remove_build",
                      dir=WithProperties("%(distdirname)s"),
                      alwaysRun=True))
              

              It is to remove the <builder>/build subdir after the test.
              It is the very last step, after all archiving, uploads etc., so it should be safe; and it's set to be executed always, even when the build fails.
              Before that, the directory was only removed at the beginning of the corresponding test.
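The "run always, even when the build fails" behaviour can be illustrated outside buildbot with a plain `try`/`finally`. This is only an analogy, not how buildbot implements `alwaysRun`; `run_build` and `build_fn` are hypothetical names.

```python
import shutil
from pathlib import Path


def run_build(build_dir, build_fn):
    """Run build_fn(build_dir), then remove build_dir whether or not it succeeded."""
    build_dir = Path(build_dir)
    try:
        return build_fn(build_dir)
    finally:
        # Runs on success and on failure, like a final step marked alwaysRun=True.
        # It does not run if the process is killed outright, so leftovers remain possible.
        shutil.rmtree(build_dir, ignore_errors=True)
```

The exception from a failed build still propagates to the caller; only the cleanup is guaranteed.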

              2) Added the following auxiliary step to p8_* package factories:

              f_p8_trusty_deb.addStep(SetPropertyFromCommand(
                      property="distdirname",
                      command=["sh", "-c", WithProperties("pwd")],
                      ))
              

              It is to facilitate the previous change. I suppose it's needed since the bintar factories had it already.

              3) For each p8_* factory, added git clean in getCompileStep before running cmake:

                  getCompileStep(["sh", "-c", "git clean -dfX && export PATH=/opt/at8.0/bin:$PATH && cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON && make -j4 package"],
              ...
              

              Explanation:
              For each builder (but not for each branch!) there is a persistent folder <common_path>/<builder_name>/source. When a build for a particular revision is run, it goes to the folder, resets it to the required revision, fetches, etc. When that's done, it copies the entire <common_path>/<builder_name>/source dir to <common_path>/<builder_name>/build and builds there.
              So, technically the source tree should be clean of all build stuff. Apparently, it is not always the case, e.g. it's possible that somebody builds there manually. If it happens, the folder will contain previously built binaries and CMakeCache.txt, all of which can conflict with the current build attempt. If it conflicts badly, e.g. CMakeCache.txt belongs to a different version, it's not that bad, because the build just fails right away with a proper message; it gets more complicated when the folder contains unexpected binaries, e.g. somebody ran a build in source with default options, so it built all engines as dynamic libraries; then, the buildbot build attempts to build without Connect engine, and really does so, but ha_connect.so still exists in the folder, gets picked up by MTR, which causes all kinds of oddities. Both situations actually happened already. So, adding git clean is supposed to protect from this.
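In `git clean -dfX`, `-d` includes untracked directories, `-f` forces the removal, and `-X` restricts it to files ignored by Git, which is where build products normally end up. To see whether a source tree has been polluted this way before cleaning it, a small check like the one below could be used. It is a hypothetical helper, not part of buildbot or of the actual configuration, and the patterns are taken only from the two examples mentioned above.

```python
from pathlib import Path

# Illustrative patterns only, based on the examples in the comment above.
STALE_PATTERNS = ("CMakeCache.txt", "*.so")


def find_stale_artifacts(source_dir, patterns=STALE_PATTERNS):
    """Return sorted relative paths of leftover build products under source_dir."""
    root = Path(source_dir)
    found = set()
    for pattern in patterns:
        for path in root.rglob(pattern):
            relative = path.relative_to(root)
            if ".git" in relative.parts:
                continue  # skip repository metadata
            if path.is_file():
                found.add(relative.as_posix())
    return sorted(found)
```

The helper only reports what it finds; the actual removal is left to `git clean`.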

              4) Commented out xtra test, as requested.

              elenst Elena Stepanova added a comment - - edited

              Step 3) above is necessary anyway, step 4) was requested, and step 2) is innocent; but I consider step 1) a temporary solution.
              Even though it's set to alwaysRun, it's not completely safe – if a few builds in a row are interrupted abruptly, they can still leave stuff there, and we'll get the space problem again.
              I think currently the space usage there is generally excessive.

              Let's take power8-vlp01 as an example.

              It is set to run one build at a time, which is natural, since these are not VMs.
              It runs three builders: p8-rhel6-bintar, p8-rhel6-bintar-debug and p8-rhel6-rpm, for a number of branches.
              For each builder, it has a "home" folder (a.k.a. builddir, configured in the buildbot config) /home/buildbot/maria-slave/power8-vlp01-bintar, /home/buildbot/maria-slave/power8-vlp01-bintar-debug, /home/buildbot/maria-slave/power8-vlp03-rpm.

              Each folder contains a persistent cloned copy of the source tree ( /home/buildbot/maria-slave/power8-vlp01-bintar/source etc.).
              When a particular builder is active, it goes into its own builddir, removes the stale build subfolder, updates its own source folder, copies it into its own new build folder, builds there, runs the tests, archives and uploads if necessary, exits.
              This makes no sense. Since they run one at a time, they can just as well use one common builddir. That would solve the problem of having six source trees and three binary dirs on the machine simultaneously: there would be only one persistent source dir, and one build dir which would be re-created on each build anyway. Then step 1) above would be unnecessary.
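The per-builder cycle described above can be sketched roughly as follows. This is a simplified illustration, not buildbot's code: `prepare_build_dir` is a hypothetical name, and updating the checkout to the required revision is left as a comment.

```python
import shutil
from pathlib import Path


def prepare_build_dir(builddir):
    """Recreate <builddir>/build as a fresh copy of <builddir>/source."""
    builddir = Path(builddir)
    source = builddir / "source"
    build = builddir / "build"
    if build.exists():
        shutil.rmtree(build)  # remove the stale build subfolder
    # (updating the source checkout to the required revision would happen here)
    shutil.copytree(source, build)  # the build then runs inside this copy
    return build
```

Since every builder repeats this inside its own builddir, each one keeps its own source tree plus, until the next run or an explicit cleanup, its own build tree.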

              So, once the current urgent releases are out, I suggest doing the following (first on power8-vlp01):

              • copy /home/buildbot/maria-slave/power8-vlp01-bintar into /home/buildbot/maria-slave/power8-vlp01 (it should still contain the source subfolder as it does now);
              • reconfigure the p8-rhel6-* builders to use builddir: "power8-vlp01" instead of builddir: "power8-vlp01-bintar" etc. as they do now;
              • let it run once for each builder to see it works;
              • remove /home/buildbot/maria-slave/power8-vlp01-* folders;
              • repeat the exercise for other p8 slaves;
              • remove the step 1) that I added earlier.

              Daniel Bartholomew, Sergey Vojtovich, any objections?

              svoj Sergey Vojtovich added a comment -

              No objections from my side.

              dbart Daniel Bartholomew added a comment -

              No objections from me.

              elenst Elena Stepanova added a comment -

              Sadly, the one-dir approach turned out to be impossible, buildbot just does not allow it:

              2015-09-13 02:35:15+0300 [-] duplicate builder builddir 'power8-vlp01'
              2015-09-13 02:35:15+0300 [-] duplicate builder builddir 'power8-vlp01'
              2015-09-13 02:35:15+0300 [-] reconfig aborted without making any changes
              

              So, I'm keeping the previously described (and introduced) solution with removing the builddir after a test; let's see how it goes.
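The constraint behind that log message can be checked before attempting a reconfig. The sketch below is not buildbot's own validation code; it assumes, for illustration only, that each builder is represented as a dict with `name` and `builddir` keys.

```python
from collections import Counter


def duplicate_builddirs(builders):
    """Return the sorted builddir values used by more than one builder."""
    counts = Counter(b["builddir"] for b in builders)
    return sorted(d for d, n in counts.items() if n > 1)


# The attempted layout: all three builders pointed at one common builddir.
attempted = [
    {"name": "p8-rhel6-bintar", "builddir": "power8-vlp01"},
    {"name": "p8-rhel6-bintar-debug", "builddir": "power8-vlp01"},
    {"name": "p8-rhel6-rpm", "builddir": "power8-vlp01"},
]

if __name__ == "__main__":
    print(duplicate_builddirs(attempted))
```

An empty result means every builder has its own builddir, which is the only layout the reconfig accepted here.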


                People

                • Assignee:
                  elenst Elena Stepanova
                  Reporter:
                  svoj Sergey Vojtovich
                • Votes:
                  0
                  Watchers:
                  4
