Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Won't Fix
-
Affects Version/s: 10.0.14
-
Fix Version/s: 10.0.15
-
Component/s: Storage Engine - Connect
-
Labels:None
-
Environment:Ubuntu trusty
Description
The request path in the 2nd line of the following sample has a trailing quotation mark, which is escaped with a backslash:
imusic.dk:80 5.255.253.131 - - [20/Jul/2014:10:47:25 +0200] "GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1" 302 266 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)" imusic.dk:80 157.55.39.105 - - [20/Jul/2014:10:47:25 +0200] "GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd\" HTTP/1.1" 301 335 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
Here's a CREATE string to query the log data using the CONNECT engine:
CREATE TABLE access_logs (
server_name CHAR(30) NOT NULL FIELD_FORMAT = ' %n%s%n',
remote_host CHAR(15) NOT NULL FIELD_FORMAT = ' %n%s%n',
time DATETIME NOT NULL FIELD_FORMAT = ' - - [%n%s%n +0200]'
DATE_FORMAT = 'DD/MMM/YYYY:hh:mm:ss',
request VARCHAR(16384) NOT NULL FIELD_FORMAT = ' "%n%[^"]%n"',
status SMALLINT UNSIGNED NOT NULL FIELD_FORMAT = ' %n%d%n',
bytes_sent INT UNSIGNED NOT NULL FIELD_FORMAT = ' %n%d%n',
referer VARCHAR(16384) NOT NULL FIELD_FORMAT = ' "%n%[^"]%m',
user_agent VARCHAR(512) NOT NULL FIELD_FORMAT = '" "%n%[^"]%n"'
)
ENGINE = CONNECT
TABLE_TYPE = FMT
FILE_NAME = '/tmp/text.txt';
Querying the first line works well:
MariaDB [test]> SELECT * FROM access_logs LIMIT 1;
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
| server_name | remote_host | time | request | status | bytes_sent | referer | user_agent |
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
| imusic.dk:80 | 5.255.253.131 | 2014-07-20 10:47:02 | GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 | 302 | 266 | - | Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots) |
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
1 row in set, 1 warning (0.01 sec)
For the second line, I get an error:
MariaDB [test]> SELECT * FROM access_logs LIMIT 2 OFFSET 1; ERROR 1296 (HY000): Got error 122 'Bad format line 2 field 5 of access_logs' from CONNECT
I suppose the engine chokes on the escaped quotation mark (\"), and assume that this case requires the ability to specify an escape character (which in turn might prove troublesome, in case the parsing is sscanf-based).
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Unfortunately the parsing of FMT tables is sscanf-based indeed. I see no solution for this problem so far.
However, if the number of such lines is not to big in your table, what you can do temporarily is:
Where maxerr is set to a number larger than the number of bad lines. Doing so the query:
returns:
NOTE: According to the documentation, it would have been enough to specify
However this does not work. This bug will be fixed in next version.
By the way, when doing the SELECT command, I got two warnings:
Indeed the result seems truncated.
This is another bug that I have fixed for new versions.
OEM:
Of course, another solution would be to write an OEM table type with parsing based on lex/yacc. It is what could be done for a customer having a need for using such files. It is precisely the purpose of the OEM type.