Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6032

Slow DELETE Query (> 7000 seconds) on small table (60k rows)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 10.0.11
    • Fix Version/s: 10.0.11
    • Component/s: None
    • Labels:
    • Environment:

      Description

      Two tables:

      aggregated_data.xx_aggregated_ad_revenue (~1150 rows) and xx_tracker.ad_revenue (~60,000 rows), running the following delete query as a stored procedure on our old machine with Percona Server (GPL), Release rel29.0, Revision 315 (5.5.27-29.0-log) takes seconds up to 1-2 minutes, same query on MariaDB 10 takes more than 7000 seconds.

      
      -- DB: aggregated_data
      -- Command: Query
      -- Time: 7103
      -- State: updating
      DELETE FROM aggregated_data.xx_aggregated_ad_revenue
      			WHERE network = 'networkA' AND booking_date IN (
      				SELECT CONCAT(SUBSTRING(a.booking_date,-4), SUBSTRING(a.booking_date,3,2), SUBSTRING(a.booking_date,1,2)) AS booking_date
      				FROM xx_tracker.ad_revenue a
      				JOIN
      				(
      					SELECT CONCAT(SUBSTRING(booking_date,-4), SUBSTRING(booking_date,3,2), SUBSTRING(booking_date,1,2)) AS booking_date, MAX(created_ts) AS max_ts
      					FROM xx_tracker.ad_revenue b
      					WHERE network = 'networkA'
      					GROUP BY b.booking_date
      				) AS i
      				ON CONCAT(SUBSTRING(a.booking_date,-4), SUBSTRING(a.booking_date,3,2), SUBSTRING(a.booking_date,1,2)) = i.booking_date AND i.max_ts = a.created_ts
      				WHERE a.created_ts >= NOW() - INTERVAL 1 DAY
      			)
      

      Even though the same indices exist on Percona and MariaDB (tables have been created by dumping the definitions are exactly the same) - MariaDB is using a key (EXPLAIN of subselect: http://puu.sh/7Zjuh.png), where Percona does not (EXPLAIN of subselect: http://puu.sh/7ZjuT.png - but Percona is way faster.
      EXPLAIN DELETE for MariaDB 10: http://puu.sh/7ZjDl.png

      Indices on aggregated_data.xx_aggregated_ad_revenue:
      1: PRIMARY id, created_date
      2: created_date
      3: network

      Indices on xx_tracker.ad_revenue:
      1. PRIMARY id, created_date
      2. created_date
      3. network

      Running the Subselect of the DELETE alone is fast (< 1-2 seconds). The query is a stored procedure which spends 95% of the time in the "Sending data" state.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            james_woods Tobias Feldhaus added a comment -

            Dropping the index on the (network) column in xx_tracker.ad_revenue and creating an index on (booking_date, network) fixed the performance problem and leads to the following query plan:

            http://puu.sh/7Zw3q.png

            Show
            james_woods Tobias Feldhaus added a comment - Dropping the index on the (network) column in xx_tracker.ad_revenue and creating an index on (booking_date, network) fixed the performance problem and leads to the following query plan: http://puu.sh/7Zw3q.png
            Hide
            psergey Sergei Petrunia added a comment -

            If dropping/creating an index has changed the query plan, this is an indication that poor query plan was chosen because the storage engine (I guess this was InnoDB?) provided the optimizer with bad estimates.

            Then, one may ask why InnoDB in Percona didn't return bad estimates while InnoDB in MariaDB did. I have no answer for this.

            Show
            psergey Sergei Petrunia added a comment - If dropping/creating an index has changed the query plan, this is an indication that poor query plan was chosen because the storage engine (I guess this was InnoDB?) provided the optimizer with bad estimates. Then, one may ask why InnoDB in Percona didn't return bad estimates while InnoDB in MariaDB did. I have no answer for this.
            Hide
            psergey Sergei Petrunia added a comment -

            A single-table DELETE ... WHERE x IN (...) is unable to make use of subquery optimizations (common missing feature of MariaDB 5.5+ and MySQL 5.6+).

            Tobias Feldhaus, could you try a "fake" multi-table DELETE? Instead of

            DELETE FROM aggregated_data.xx_aggregated_ad_revenue WHERE
            

            run

            DELETE aggregated_data.xx_aggregated_ad_revenue FROM aggregated_data.xx_aggregated_ad_revenue WHERE
            

            Please first note the EXPLAIN and then compare execution speed..

            Show
            psergey Sergei Petrunia added a comment - A single-table DELETE ... WHERE x IN (...) is unable to make use of subquery optimizations (common missing feature of MariaDB 5.5+ and MySQL 5.6+). Tobias Feldhaus , could you try a "fake" multi-table DELETE? Instead of DELETE FROM aggregated_data.xx_aggregated_ad_revenue WHERE run DELETE aggregated_data.xx_aggregated_ad_revenue FROM aggregated_data.xx_aggregated_ad_revenue WHERE Please first note the EXPLAIN and then compare execution speed..
            Hide
            psergey Sergei Petrunia added a comment -

            Another question: is the issue repeatable? If one takes a mysqldump and loads it into server, do they get a good query plan or a bad one? (Or, is there any way to reliably get a bad query plan? If yes, this will make it possible to debug)

            Show
            psergey Sergei Petrunia added a comment - Another question: is the issue repeatable? If one takes a mysqldump and loads it into server, do they get a good query plan or a bad one? (Or, is there any way to reliably get a bad query plan? If yes, this will make it possible to debug)
            Hide
            psergey Sergei Petrunia added a comment -

            Yet another thing: one of subqueries has "GROUP BY b.booking_date". With current optimizer, GROUP BY is going prevent FROM-subquery from being merged into the upper query. Can you remove the "GROUP BY" part and see what happens?

            (And this too is unrelated to the question of why it worked bad before dropping an index and started work better after re-creating it. However, it could be that there is little one could do about that problem, so it's better to work around it)

            Show
            psergey Sergei Petrunia added a comment - Yet another thing: one of subqueries has "GROUP BY b.booking_date". With current optimizer, GROUP BY is going prevent FROM-subquery from being merged into the upper query. Can you remove the "GROUP BY" part and see what happens? (And this too is unrelated to the question of why it worked bad before dropping an index and started work better after re-creating it. However, it could be that there is little one could do about that problem, so it's better to work around it)
            Hide
            james_woods Tobias Feldhaus added a comment -

            Sorry for not coming back to this issue for so long. We ended up optimizing the query and getting rid of non-date types in the table (to no longer need to parse through varchar fields via the SUBSTRING field). This made the query run in under a minute.

            Show
            james_woods Tobias Feldhaus added a comment - Sorry for not coming back to this issue for so long. We ended up optimizing the query and getting rid of non-date types in the table (to no longer need to parse through varchar fields via the SUBSTRING field). This made the query run in under a minute.
            Hide
            psergey Sergei Petrunia added a comment -

            Good to know there was some solution. I guess, it's not possible to get the dataset that demonstrated the problem anymore? In this case, it would be hard to analyze if there was something we could improve in the optimizer, so I have to close this issue.

            Show
            psergey Sergei Petrunia added a comment - Good to know there was some solution. I guess, it's not possible to get the dataset that demonstrated the problem anymore? In this case, it would be hard to analyze if there was something we could improve in the optimizer, so I have to close this issue.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                james_woods Tobias Feldhaus
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: