Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6357

QUERY CACHE - Query Cache And Limit Clause

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 10.0.10
    • Fix Version/s: N/A
    • Component/s: None
    • Labels:

      Description

      In reading about the query cache, I came across an undocumented situation.

      https://mariadb.com/kb/en/query-cache/

      Basically, the cache should consume queries that always return the same results. With that said, I have noticed that the query cache will cache a query with a LIMIT clause without an ORDER BY clause. This seems incorrect because the LIMIT could return different results for different queries.

      For completeness sake, I would suggest not caching queries that use a LIMIT without an ORDER BY clause.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            danblack Daniel Black added a comment -

            Just because a LIMIT without an ORDER BY could return a different result doesn't mean that the result will be returned from the cache is incorrect.

            If the output needs a deterministic order then a ORDER BY should be used.

            If no caching is needed then SQL_NO_CACHE can be used.

            I'd be of the impression failing to cache LIMIT-full and ORDER BY-less queries would be an unnecessary change in behaviour but that's just my opinion.

            What aspect of this is undocumented? There's nothing special about ORDER BY or LIMIT so its not documented as such.

            Show
            danblack Daniel Black added a comment - Just because a LIMIT without an ORDER BY could return a different result doesn't mean that the result will be returned from the cache is incorrect. If the output needs a deterministic order then a ORDER BY should be used. If no caching is needed then SQL_NO_CACHE can be used. I'd be of the impression failing to cache LIMIT-full and ORDER BY-less queries would be an unnecessary change in behaviour but that's just my opinion. What aspect of this is undocumented? There's nothing special about ORDER BY or LIMIT so its not documented as such.
            Hide
            elenst Elena Stepanova added a comment -

            I agree with Daniel's comment, or maybe I don't quite understand the nature of complaint.

            Could you please describe a situation where the result would actually be wrong because of using the query cache?
            I don't expect you to create a test case, just words and pseudo-code would be enough, as long as it's clear what goes wrong at the end.

            Show
            elenst Elena Stepanova added a comment - I agree with Daniel's comment, or maybe I don't quite understand the nature of complaint. Could you please describe a situation where the result would actually be wrong because of using the query cache? I don't expect you to create a test case, just words and pseudo-code would be enough, as long as it's clear what goes wrong at the end.
            Hide
            elenst Elena Stepanova added a comment -

            Closing as incomplete for now, if you have any additional information, please comment to re-open.

            Show
            elenst Elena Stepanova added a comment - Closing as incomplete for now, if you have any additional information, please comment to re-open.
            Hide
            belugabehr BELUGABEHR added a comment -

            Elena,

            Thank you for following up on this trivial remark. I ask that you don't sink too much time into thinking about this. I am probably thinking about this too hard myself.

            "It also means [the] query has to be exactly the same and deterministic."
            http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/

            It's the word "deterministic" that I find interesting in the description. I honestly cannot imagine a true business scenario, but a limit clause without a order by is not deterministic and the result set could change row order between queries. No? To store an un-ordered result list does not seem true to form.

            Within the documentation for replication, it states:

            "Statement-based replication of LIMIT clauses in DELETE, UPDATE, and INSERT ... SELECT statements is unsafe since the order of the rows affected is not defined."

            So, it does seem to me that caching a result set that has not been first ordered may not be the desired behavior.

            Show
            belugabehr BELUGABEHR added a comment - Elena, Thank you for following up on this trivial remark. I ask that you don't sink too much time into thinking about this. I am probably thinking about this too hard myself. "It also means [the] query has to be exactly the same and deterministic." http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/ It's the word "deterministic" that I find interesting in the description. I honestly cannot imagine a true business scenario, but a limit clause without a order by is not deterministic and the result set could change row order between queries. No? To store an un-ordered result list does not seem true to form. Within the documentation for replication, it states: "Statement-based replication of LIMIT clauses in DELETE, UPDATE, and INSERT ... SELECT statements is unsafe since the order of the rows affected is not defined." So, it does seem to me that caching a result set that has not been first ordered may not be the desired behavior.
            Hide
            danblack Daniel Black added a comment -

            ah. the select without a order by isn't deterministic in the sense that exactly which rows are returned isn't predictable. With no data changes to the tables (or global config changes) however two sequential queries will return the same result (there's no chance the query plan will change). Because this is the case it may as well go in the query cache. Data changes will invalidate the cache anyway.

            Show
            danblack Daniel Black added a comment - ah. the select without a order by isn't deterministic in the sense that exactly which rows are returned isn't predictable. With no data changes to the tables (or global config changes) however two sequential queries will return the same result (there's no chance the query plan will change). Because this is the case it may as well go in the query cache. Data changes will invalidate the cache anyway.
            Hide
            elenst Elena Stepanova added a comment -

            Yes, using the word "deterministic" may be not completely accurate in this case; but then again, it is not official documentation, and besides later in the same post Peter introduces the definition of "deterministic" query, which actually fits fine a query with LIMIT without ORDER BY:

            Query must be deterministic – Query might provide same result no matter how many times it is run, if data remains the same. So if query works with current data, uses non-deterministic functions such as UUID(), RAND(), CONNECTION_ID() etc it will not be cached.

            This query indeed might provide same result – even though there is no obligation to (unlike a query with CONNECTION_ID(), which might not produce the same result if it's run from different connections).

            As far as I can see, the official MySQL manual avoids claiming that a query must be deterministic, and instead just provides a list of exceptions explicitly.

            Show
            elenst Elena Stepanova added a comment - Yes, using the word "deterministic" may be not completely accurate in this case; but then again, it is not official documentation, and besides later in the same post Peter introduces the definition of "deterministic" query, which actually fits fine a query with LIMIT without ORDER BY: Query must be deterministic – Query might provide same result no matter how many times it is run, if data remains the same. So if query works with current data, uses non-deterministic functions such as UUID(), RAND(), CONNECTION_ID() etc it will not be cached. This query indeed might provide same result – even though there is no obligation to (unlike a query with CONNECTION_ID(), which might not produce the same result if it's run from different connections). As far as I can see, the official MySQL manual avoids claiming that a query must be deterministic, and instead just provides a list of exceptions explicitly.

              People

              • Assignee:
                Unassigned
                Reporter:
                belugabehr BELUGABEHR
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Due:
                  Created:
                  Updated:
                  Resolved: