Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Assume strict mode is enabled and I have to update 10000 rows in a table with 100000 rows. The updates are independent of each other, so a single update xbt_files set leechers = 1 where fid in (1,2) doesn't work: each row needs its own values.
      Executing 10000 separate queries isn't efficient. There's a nice trick via insert ... on duplicate key update, but it breaks in strict mode (and strict mode is good, so I don't want to disable it).

      Another way is to execute the updates via update ... case ... when ... then ... else ... end. However, this way can be 20x slower (for 10000 updates) and the syntax is worse.
      So I'd like to request support for bulk updates. I think batch/bulk updates are a common operation and deserve proper and efficient support.

      Possible syntax:

      update2 xbt_files (fid, leechers, seeders, completed) values (3,1,2,3),(2,4,5,6) set
      leechers = values(leechers), seeders = values(seeders), completed = values(completed);
      

      The syntax is just a possibility, the semantics are the most important part.
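The requested semantics can be approximated client-side today by batching parameterized per-row updates. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL (the table name matches the report; the schema is trimmed for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table xbt_files (
    fid integer primary key,
    leechers int not null default 0,
    seeders int not null default 0,
    completed int not null default 0)""")
con.executemany("insert into xbt_files (fid) values (?)", [(1,), (2,), (3,)])

# Each row gets its own independent values -- a single
# "update ... where fid in (...)" cannot express this.
rows = [(1, 2, 3, 3), (4, 5, 6, 2)]  # (leechers, seeders, completed, fid)
con.executemany(
    "update xbt_files set leechers = ?, seeders = ?, completed = ? where fid = ?",
    rows)

print(con.execute("select * from xbt_files order by fid").fetchall())
# → [(1, 0, 0, 0), (2, 4, 5, 6), (3, 1, 2, 3)]
```

This still issues one statement per row on the server side, which is exactly the inefficiency the feature request wants to avoid; it only shows the intended per-row semantics.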

      How to repeat:

      set sql_mode = '';
      
      drop table if exists xbt_files;
      
      create table xbt_files
      (
       fid int not null auto_increment,
       info_hash blob not null,
       leechers int not null default 0,
       seeders int not null default 0,
       completed int not null default 0,
       flags int not null default 0,
       mtime int not null,
       ctime int not null,
       primary key (fid),
       unique key (info_hash(20))
      );
      
      insert into xbt_files (info_hash, mtime, ctime) values ('oRUImUWFyXvKItw3tVfB',
      unix_timestamp(), unix_timestamp()), ('7XnB2eksvUgHvavG3YPJ', unix_timestamp(),
      unix_timestamp()), ('9EsYv9y72FuHsL5yQYf8', unix_timestamp(), unix_timestamp());
      
      insert into xbt_files (fid, leechers, seeders, completed) values (2,6268,4510,38453) on
      duplicate key update leechers = values(leechers), seeders = values(seeders), completed =
      values(completed);
      
      update xbt_files set leechers = case fid when 2 then 4 else leechers end, seeders = case
      fid when 2 then 5 else seeders end, completed = case fid when 2 then 6 else completed end
      where fid in (2);
      
      set sql_mode = 'strict_all_tables';
      
      insert into xbt_files (fid, leechers, seeders, completed) values (3,6268,4510,38453) on
      duplicate key update leechers = values(leechers), seeders = values(seeders), completed =
      values(completed);
      # ERROR 1364 (HY000): Field 'info_hash' doesn't have a default value
      
      update xbt_files set leechers = case fid when 3 then 7 else leechers end, seeders = case
      fid when 3 then 8 else seeders end, completed = case fid when 3 then 9 else completed end
      where fid in (3);
      
      select * from xbt_files;
      


              Activity

              knielsen Kristian Nielsen added a comment -

              Re: Bulk Update statement
              Have you tried sending multiple UPDATE statements in a single request?

              I think you would be able to achieve most of the needed efficiency this way.

              olafvanderspek Olaf van der Spek added a comment -

              Re: Bulk Update statement
              TBH I haven't. Why do you think this would achieve comparable performance?

              knielsen Kristian Nielsen added a comment -

              Re: Bulk Update statement
              > TBH I haven't. Why do you think this would achieve comparable performance?

              The main overhead in executing 10000 statements one after the other is in round-trips between the client and server.

              By sending multiple statements in batches, this overhead is eliminated.
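The batching Kristian describes can be sketched as follows. With a real MySQL client you would enable multi-statement support (CLIENT_MULTI_STATEMENTS) and use proper escaping; here sqlite3's executescript stands in, with values interpolated only for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table xbt_files (fid integer primary key, leechers int default 0)")
con.executemany("insert into xbt_files (fid) values (?)", [(1,), (2,), (3,)])

# Build one batch of many UPDATE statements and send it in a single call,
# avoiding one client/server round-trip per statement.
updates = {2: 4, 3: 7}  # fid -> new leechers value
batch = "".join(
    f"update xbt_files set leechers = {v} where fid = {k};\n"
    for k, v in updates.items())
con.executescript(batch)

print(con.execute("select fid, leechers from xbt_files order by fid").fetchall())
# → [(1, 0), (2, 4), (3, 7)]
```

This removes the per-statement round-trip, though, as the follow-up comment notes, the server still parses, locks, and executes each statement individually.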

              olafvanderspek Olaf van der Spek added a comment -

              Re: Bulk Update statement
              That way, the total query size is 86 bytes per row vs 20 bytes per row. That's no good; queries are already 1+ MB.
              All of this has to be sent, transferred, received and parsed. Then you need to lock 10000 times, and you might need to commit 10000 times.

              Hmm, I guess benchmarking settles this dispute faster.

              psergey Sergei Petrunia added a comment -

              Re: Bulk Update statement
              How about using a temporary table and a multi table UPDATE statement:

              -- create a temp table with update 'specifications':
              --   fid=3 : change (leechers, seeders, completed) to (7,8,9)
              --   fid=2 : change (leechers, seeders, completed) to (4,5,6)
              create temporary table updates (
              fid int,
              new_leechers int,
              new_seeders int,
              new_completed int
              );
              insert into updates values (3, 7, 8, 9), (2, 4, 5, 6);

              update xbt_files, updates
              set
              xbt_files.leechers = updates.new_leechers,
              xbt_files.seeders = updates.new_seeders,
              xbt_files.completed = updates.new_completed
              where
              xbt_files.fid=updates.fid;
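Sergei's temp-table approach, translated into a runnable sketch. sqlite3 stands in for MariaDB here; since older SQLite versions lack multi-table UPDATE, the same join is expressed with correlated subqueries:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table xbt_files (
    fid integer primary key, leechers int default 0,
    seeders int default 0, completed int default 0)""")
con.executemany("insert into xbt_files (fid) values (?)", [(2,), (3,)])

con.execute("""create temporary table updates (
    fid int, new_leechers int, new_seeders int, new_completed int)""")
con.executemany("insert into updates values (?, ?, ?, ?)",
                [(3, 7, 8, 9), (2, 4, 5, 6)])

# Same effect as the multi-table UPDATE: join each target row to its
# 'specification' row and copy the new values over.
con.execute("""
    update xbt_files set
        leechers  = (select new_leechers  from updates u where u.fid = xbt_files.fid),
        seeders   = (select new_seeders   from updates u where u.fid = xbt_files.fid),
        completed = (select new_completed from updates u where u.fid = xbt_files.fid)
    where fid in (select fid from updates)""")

print(con.execute("select * from xbt_files order by fid").fetchall())
# → [(2, 4, 5, 6), (3, 7, 8, 9)]
```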

              olafvanderspek Olaf van der Spek added a comment -

              Re: Bulk Update statement
              Doesn't work because the same row might be updated multiple times. It's also slower and requires 3 queries instead of 1.

              ratzpo Rasmus Johansson added a comment -

              Launchpad bug id: 692669

              serg Sergei Golubchik added a comment -

              I suppose the simplest workaround would indeed be to use insert ... on duplicate key update, and specify values for all columns as required. The values don't matter if they'll never be used anyway.

              It's a workaround, I agree. A nicer solution would be not to issue “ERROR 1364 (HY000): Field 'info_hash' doesn't have a default value” error if there's a unique key conflict. But it's not easy to implement in a reasonable way.
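Serg's workaround, sketched with sqlite3's upsert (ON CONFLICT ... DO UPDATE) standing in for MySQL's ON DUPLICATE KEY UPDATE; the empty info_hash is the dummy value that satisfies strict mode:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table xbt_files (
    fid integer primary key,
    info_hash blob not null,
    leechers int not null default 0)""")
con.execute("insert into xbt_files (fid, info_hash) values (3, 'oRUImUWFyXvKItw3tVfB')")

# Supply a dummy info_hash so the inserted row image is complete; on a
# key conflict only the update branch runs and the dummy is discarded.
con.execute("""
    insert into xbt_files (fid, info_hash, leechers)
    values (3, '', 6268)
    on conflict (fid) do update set leechers = excluded.leechers""")

print(con.execute("select fid, info_hash, leechers from xbt_files").fetchall())
# → [(3, 'oRUImUWFyXvKItw3tVfB', 6268)]
```

Note the existing info_hash survives untouched: the dummy value never reaches the row, which is exactly why "the values don't matter".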


                People

                • Assignee:
                  Unassigned
                  Reporter:
                  olafvanderspek Olaf van der Spek
                • Votes:
                  0
                • Watchers:
                  1
