Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8322

Distinguish between time and date strings more carefully

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1, 10.0, 5.5
    • Fix Version/s: 10.1
    • Component/s: None
    • Labels:
      None
    • Sprint:
      10.1.6-1, 10.1.6-2

      Description

      We need to distinguish better between date ad time strings when parsing strings in str_to_date(), str_to_time(), etc.

      The problem is well described in this letter from Sergei:

      Hi, Alexander!

      On Jun 15, Alexander Barkov wrote:

      >> (preference matters, as it tells how to parse ambiguous strings like
      >> "10:10:10").
      >
      > I'd say '10:10:10' should be unambiguously treated as time.
      > Colon is never used to delimit date parts. Is it?
      

      yes, I believe delimiters are pretty much ignored in our code. so any
      delimiter can be used anywhere.

      > Date parts are usually delimited by as follows:
      > '01-01-01'
      > '01.01.01'
      > '01/01/01'
      >
      > But this is a kind of separate issue. Would you like me to create a task 
      > for this?
      

      The way it works now - after your patch - there's no much need for a
      "preference" flag. The only issue I've uncovered in testing was related
      to parsing strings with time preference. Like in

        WHERE time_column > '2010-12-11'
      

      The code is

        my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time,
                            ulonglong fuzzydate, MYSQL_TIME_STATUS *status)
        {
        ...
          /* Check first if this is a full TIMESTAMP */
          if (length >= 12)
          {                                             /* Probably full timestamp */
            (void) str_to_datetime(str, length, l_time,
                                   (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY,
                                    status);
      

      Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time).

      I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like

         patterns[]=
         {
           { 'yyyy-mm-dd' , parse_date },
           { 'hh:mm:ss.uuuuuu', parse_time },
           ...
         }
      

      Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc).

        Gliffy Diagrams

          Attachments

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                bar Alexander Barkov
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Agile