MariaDB Server / MDEV-7375

FEDERATED + DISCOVERY can corrupt UTF8 columns

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.15
    • Fix Version/s: 10.0
    • Labels:
      None
    • Environment:
      Windows 7

      Description

      On the remote server, running:

      create table t1 (line char(32) character set UTF8 not null) engine=myisam;
      insert into t1 values('Et si on était déjà à noël ?');
      select * from t1;
      

      returns:

      line
      Et si on était déjà à noël ?

      On the local server, a FEDERATED table to access it can be created as:

      create table t1 engine=FEDERATED
      connection='mysql://root:tinono@localhost:3307/test/t1';
      select * from t1;
      

      returns:

      line
      Et si on

      and issues one warning:

      Level Code Message
      Warning 1366 Incorrect string value: '\xE9tait ...' for column 'line' at row 1

      Note: This can be corrected by specifying the default charset of the local FEDERATED table as DEFAULT CHARSET=UTF8, or by explicitly defining the local table column without specifying its character set. However, this is not clearly documented.


              Activity

              elenst Elena Stepanova added a comment (edited):

              Alexander Barkov,

              The behavior doesn't look right to me...
              With the automatic discovery, the federated table's structure is exactly like the remote one, with charset specified for the column. And it doesn't work. On the other hand, if the column charset is not specified on the federated table and thus stays latin1, it works fine. At the very least it's counter-intuitive.
              What do you think?

              bertrandop Olivier Bertrand added a comment:

              The explanation is that FEDERATED (and now CONNECT) sets the connection charset to the local table's default charset before connecting. Here the local table's charset defaults to latin1, and so does the connection charset. The UTF8 column is therefore converted to latin1 on the connection, which is why it should not be declared as UTF8 on the local table. If it must remain UTF8, the local table's default charset must be set to UTF8. The connection charset will then be UTF8 and the column contents will not be converted.

              Note that this does not occur with CONNECT because, in the discovery process, CONNECT ignores the column charset specification of the remote table, which can also be wrong in some cases. However, this shows that the whole process could be reconsidered, or at least properly documented.
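The conversion path described in the comment above can be modeled byte by byte. The following Python sketch is not MariaDB code. The function names (`fetch_over_connection`, `printable`, `store_in_column`) are hypothetical. It rests on two assumptions: the remote server converts the value to the connection charset, and the local FEDERATED table then interprets the received bytes in the column's own charset, keeping only the valid prefix and warning about the rest. Python's `latin-1` and `utf-8` codecs stand in for MariaDB's latin1 and utf8, which agree on every character used here.

```python
"""Hypothetical model of the FEDERATED charset mismatch in MDEV-7375.

Not MariaDB code: it only mimics the byte-level conversions described above.
"""

REMOTE_VALUE = "Et si on était déjà à noël ?"  # stored in a utf8 column on the remote server

# Python codec names standing in for MariaDB charsets (assumption: MariaDB's
# latin1 is cp1252-based, but é, à and ë have the same byte values in both).
CODECS = {"latin1": "latin-1", "utf8": "utf-8"}


def fetch_over_connection(value: str, connection_charset: str) -> bytes:
    """Assumed behavior: the remote server converts the value to the connection charset."""
    return value.encode(CODECS[connection_charset])


def printable(raw: bytes, limit: int = 6) -> str:
    """Render bytes roughly like error 1366 does: \\xNN for non-ASCII, '...' when cut."""
    shown = "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02X}" for b in raw[:limit])
    return shown + ("..." if len(raw) > limit else "")


def store_in_column(raw: bytes, column_charset: str, width: int = 32) -> tuple[str, list[str]]:
    """Assumed behavior: the local table reads the bytes in the column's own charset.

    Bytes that are invalid in that charset truncate the value at the first bad
    byte and produce a warning. Returns (value as SELECT shows it, warnings).
    """
    warnings = []
    try:
        text = raw.decode(CODECS[column_charset])
    except UnicodeDecodeError as err:
        # Keep only the valid prefix, like the "Et si on" result in the report.
        text = raw[: err.start].decode(CODECS[column_charset])
        warnings.append(
            f"Incorrect string value: '{printable(raw[err.start:])}' "
            "for column 'line' at row 1"
        )
    # CHAR(32): at most 32 characters; trailing pad spaces are stripped on retrieval.
    return text[:width].rstrip(" "), warnings


if __name__ == "__main__":
    cases = [
        ("discovered utf8 column, latin1 connection", "latin1", "utf8"),
        ("local DEFAULT CHARSET=utf8", "utf8", "utf8"),
        ("local column left as latin1", "latin1", "latin1"),
    ]
    for label, conn_cs, col_cs in cases:
        value, warns = store_in_column(fetch_over_connection(REMOTE_VALUE, conn_cs), col_cs)
        print(f"{label}: {value!r} {warns}")
```

In the first case, the latin1 byte 0xE9 for "é" is not valid UTF-8. The model therefore keeps only `'Et si on'` and produces the same warning text as in the description. Both workarounds from the description return the full sentence without warnings, because the bytes are read in the same charset they were sent in.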


                People

                • Assignee:
                  bar Alexander Barkov
                  Reporter:
                  bertrandop Olivier Bertrand
                • Votes:
                  0
                  Watchers:
                  3
