Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6845

Unable to specify escape character for FMT

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.0.14
    • Fix Version/s: 10.0.15
    • Labels:
      None
    • Environment:
      Ubuntu trusty

      Description

      The request path in the 2nd line of the following sample has a trailing quotation mark, which is escaped with a backslash:

      imusic.dk:80 5.255.253.131 - - [20/Jul/2014:10:47:25 +0200] "GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1" 302 266 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
      imusic.dk:80 157.55.39.105 - - [20/Jul/2014:10:47:25 +0200] "GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd\" HTTP/1.1" 301 335 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
      

      Here's a CREATE string to query the log data using the CONNECT engine:

      CREATE TABLE access_logs (
        server_name CHAR(30)          NOT NULL FIELD_FORMAT = ' %n%s%n',
        remote_host CHAR(15)          NOT NULL FIELD_FORMAT = ' %n%s%n',
        time        DATETIME          NOT NULL FIELD_FORMAT = ' - - [%n%s%n +0200]' 
          DATE_FORMAT = 'DD/MMM/YYYY:hh:mm:ss',
        request     VARCHAR(16384)    NOT NULL FIELD_FORMAT = ' "%n%[^"]%n"',
        status      SMALLINT UNSIGNED NOT NULL FIELD_FORMAT = ' %n%d%n',
        bytes_sent  INT UNSIGNED      NOT NULL FIELD_FORMAT = ' %n%d%n',
        referer     VARCHAR(16384)    NOT NULL FIELD_FORMAT = ' "%n%[^"]%m',
        user_agent  VARCHAR(512)      NOT NULL FIELD_FORMAT = '" "%n%[^"]%n"'
      )
        ENGINE = CONNECT
        TABLE_TYPE = FMT
        FILE_NAME = '/tmp/text.txt';
      

      Querying the first line works well:

      MariaDB [test]> SELECT * FROM access_logs LIMIT 1;
      +--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
      | server_name  | remote_host   | time                | request                                                 | status | bytes_sent | referer | user_agent                                                          |
      +--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
      | imusic.dk:80 | 5.255.253.131 | 2014-07-20 10:47:02 | GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 |    302 |        266 | -       | Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots) |
      +--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
      1 row in set, 1 warning (0.01 sec)
      

      For the second line, I get an error:

      MariaDB [test]> SELECT * FROM access_logs LIMIT 2 OFFSET 1;
      ERROR 1296 (HY000): Got error 122 'Bad format line 2 field 5 of access_logs' from CONNECT
      

      I suppose the engine chokes on the escaped quotation mark (\"), and assume that this case requires the ability to specify an escape character (which in turn might prove troublesome, in case the parsing is sscanf-based).

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            Unfortunately the parsing of FMT tables is sscanf-based indeed. I see no solution for this problem so far.

            However, if the number of such lines is not to big in your table, what you can do temporarily is:

            ALTER TABLE access_logs OPTION_LIST='maxerr=5000,accept=1';
            

            Where maxerr is set to a number larger than the number of bad lines. Doing so the query:

            SELECT * FROM access_logs;
            

            returns:

            server_name remote_host time request status bytes_sent referer user_agent
            imusic.dk:80 5.255.253.131 2014-07-20 10:47:02 GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 302 266 - Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
            imusic.dk:80 157.55.39.105 2014-07-20 10:47:02 GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd|0 0

            NOTE: According to the documentation, it would have been enough to specify

            ALTER TABLE access_logs OPTION_LIST='accept';
            

            However this does not work. This bug will be fixed in next version.

            By the way, when doing the SELECT command, I got two warnings:

            Level Code Message
            Warning 1105 Out of range value for column time at row 1
            Warning 1105 Out of range value for column time at row 2

            Indeed the result seems truncated.
            This is another bug that I have fixed for new versions.

            OEM:
            Of course, another solution would be to write an OEM table type with parsing based on lex/yacc. It is what could be done for a customer having a need for using such files. It is precisely the purpose of the OEM type.

            Show
            bertrandop Olivier Bertrand added a comment - - edited Unfortunately the parsing of FMT tables is sscanf-based indeed. I see no solution for this problem so far. However, if the number of such lines is not to big in your table, what you can do temporarily is: ALTER TABLE access_logs OPTION_LIST='maxerr=5000,accept=1'; Where maxerr is set to a number larger than the number of bad lines. Doing so the query: SELECT * FROM access_logs; returns: server_name remote_host time request status bytes_sent referer user_agent imusic.dk:80 5.255.253.131 2014-07-20 10:47:02 GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 302 266 - Mozilla/5.0 (compatible; YandexImages/3.0; + http://yandex.com/bots ) imusic.dk:80 157.55.39.105 2014-07-20 10:47:02 GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd|0 0 NOTE: According to the documentation, it would have been enough to specify ALTER TABLE access_logs OPTION_LIST='accept'; However this does not work. This bug will be fixed in next version. By the way, when doing the SELECT command, I got two warnings: Level Code Message Warning 1105 Out of range value for column time at row 1 Warning 1105 Out of range value for column time at row 2 Indeed the result seems truncated. This is another bug that I have fixed for new versions. OEM: Of course, another solution would be to write an OEM table type with parsing based on lex/yacc. It is what could be done for a customer having a need for using such files. It is precisely the purpose of the OEM type.

              People

              • Assignee:
                bertrandop Olivier Bertrand
                Reporter:
                crishoj Christian Rishøj
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 2 hours
                  2h