Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6217

CONNECT engine cannot handle varchar fields containing newline

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.10
    • Fix Version/s: 10.0
    • Component/s: None
    • Labels:
    • Environment:
      windows 7 64 bit (actually v10.0.10 without galera)

      Description

      Given a CSV file with a varchar field containing a newline:

      f1,f2
      1,"simple ""text"" field"
      2,"includes newline
      in field"
      3,"simple"
      

      I can create a CONNECT CSV table:

      CREATE TABLE `simple` (
        `f1` int(1) NOT NULL,
        `f2` varchar(200) NOT NULL
      ) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='csv' `FILE_NAME`='c:/Program Files/MariaDB 10.0/data/mw/simple.csv' `SEP_CHAR`=',' `QCHAR`='"' `HEADER`=1;
      

      But selecting from the table fails:

      MariaDB [mw]> select * from simple;
      ERROR 1296 (HY000): Got error 122 'Missing ending quote in simple field 2 line 2' from CONNECT
      

      This makes the CONNECT CSV table unusable for my application.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            Sure enough for most file table types (except BIN, VEC and XML) CONNECT regards a physical line of the file as a table row.
            For it your CSV file has four rows, the second and third being wrong because not having the required number of fields.
            Currently, if you have only a few fields containing newline characters, a bypass is to use the MAXERR option to skip them:

            alter table simple option_list='maxerr=10';
            

            Giving a MAXERR value big enough to skip all wrong lines (each newline character will make 2 wrong lines)
            Another alternative is to use the CSV engine that handles field containing newline characters.

            I will see if I can do the same in the special case of CSV tables but it is not in my list of priorities, sorry about that...

            Show
            bertrandop Olivier Bertrand added a comment - - edited Sure enough for most file table types (except BIN, VEC and XML) CONNECT regards a physical line of the file as a table row. For it your CSV file has four rows, the second and third being wrong because not having the required number of fields. Currently, if you have only a few fields containing newline characters, a bypass is to use the MAXERR option to skip them: alter table simple option_list='maxerr=10'; Giving a MAXERR value big enough to skip all wrong lines (each newline character will make 2 wrong lines) Another alternative is to use the CSV engine that handles field containing newline characters. I will see if I can do the same in the special case of CSV tables but it is not in my list of priorities, sorry about that...

              People

              • Assignee:
                bertrandop Olivier Bertrand
                Reporter:
                martinwaite Martin WaIte
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: