Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4414

INDEX - SUBSTRING, LEFT and others string functions that could be optimized with index

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: 10.2
    • Component/s: None
    • Labels:
      None

      Description

      Hi guys, i have a table like this:
      CREATE TABLE mov_boleto (
      cc_hash_key VARCHAR(75) NOT NULL DEFAULT '',
      linha_digitavel VARCHAR(54) NOT NULL DEFAULT '',
      PRIMARY KEY (cc_hash_key),
      INDEX linha_digitavel (linha_digitavel)
      )

      with 500k rows
      i need to search part of 'linha_digitavel' field (the left part):
      LEFT(linha_digitavel,37), but i'm having a problem....

      check the query

      SELECT cc_hash_key
      FROM mov_boleto WHERE
      left(linha_digitavel,37) IN ("34191.09008 27471.621287 27580.420001","34191.09008 27471.701287 27580.422001");

      the problem is the optimization...
      "Using where", 521069 rows

      could it use the index linha_digitavel? like the "LIKE" operator?
      SELECT cc_hash_key
      FROM mov_boleto WHERE
      linha_digitavel like "34191.09008 27471.621287 27580.420001%" OR
      linha_digitavel like "34191.09008 27471.621287 27580.422001%" OR
      linha_digitavel ="34191.09008 27471.621287 27580.420001" OR
      linha_digitavel ="34191.09008 27471.621287 27580.422001"

      explain=> "Using index condition; Using where", rows =2

      could SUBSTRING(char field,1,?) and LEFT(char_field,?)
      FUNCTIONS be optimized for search?

      --------
      i'm thinking about a query rewrite in sql_select.cc or opt_range.cc, i think that opt_range.cc is better
      but we can optimize ENUM too when MDEV-4419 is done, in this case sql_select.cc is a better place to optimize...

      the point is, optimize rewrinting this:

      LEFT(column,1234) = "some string"
      or
      SUBSTR(column,1,1234) = "some string"
      

      to

      (LEFT(column,1234) = "some string" AND column LIKE "some string%")
      or
      (SUBSTR(column,1,1234) = "some string" AND column LIKE "some string%")
      

      and this:

      LEFT(column,1234) IN ('string1','string2','string3')
      

      to

      (LEFT(column,1234) IN ('string1','string2','string3') AND 
      (
        LEFT(column,1234) LIKE 'string1%' OR 
        LEFT(column,1234) LIKE 'string2%' OR 
        LEFT(column,1234) LIKE 'string3%'
      )
      )
      

      or rewrite the LEFT(column,1234) to a virtual column, if it's exists, in this case we can use index

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              rspadim roberto spadim added a comment -

              i tryed
              explain
              SELECT cc_hash_key
              FROM mov_boleto WHERE
              CAST(linha_digitavel AS char(37)) in
              ("34191.09008 27471.621287 27580.420001","34191.09008 27471.621287 27580.422001")

              no sucess, >500k rows in explain

              Show
              rspadim roberto spadim added a comment - i tryed explain SELECT cc_hash_key FROM mov_boleto WHERE CAST(linha_digitavel AS char(37)) in ("34191.09008 27471.621287 27580.420001","34191.09008 27471.621287 27580.422001") no sucess, >500k rows in explain
              Hide
              rspadim roberto spadim added a comment -

              feature request MDEV-4415 , could help optimizing this query without many (column LIKE value OR column LIKE value2 OR ....)

              Show
              rspadim roberto spadim added a comment - feature request MDEV-4415 , could help optimizing this query without many (column LIKE value OR column LIKE value2 OR ....)
              Hide
              rspadim roberto spadim added a comment -

              maybe MDEV-4430 could be used here too

              Show
              rspadim roberto spadim added a comment - maybe MDEV-4430 could be used here too

                People

                • Assignee:
                  Unassigned
                  Reporter:
                  rspadim roberto spadim
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:

                    Time Tracking

                    Estimated:
                    Original Estimate - 1 day, 2 hours
                    1d 2h
                    Remaining:
                    Remaining Estimate - 1 day, 2 hours
                    1d 2h
                    Logged:
                    Time Spent - Not Specified
                    Not Specified