Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.17
    • Fix Version/s: None
    • Labels:
    • Environment:
      CentOS release 6.5 (X86_64), PureData Release 7.0 (P-3) , NetezzaSQL ODBCDriver

      Description

      I am trying to make a CONNECT table to a table on an IBM database.
      It seems to me no test has been done yet for the IBM server.

      The original table includes columns in NVARCHAR(N) for UTF-8 characters.

      http://www-01.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_odiagntz_016.htm

      I have two problems.

      1.
      When I try creating a CONNECT table without specifying the column definition,
      I get the below error.

      Error (Code 1105): Unsupported SQL type -9

      It doesn't happen when the table does not have any NVARCHAR(N) columns,
      so I guess the CONNECT engine is not familiar with this data type

      2.
      When I specify the column definition such as VARCHAR(32),
      the CONNECT table is created with no error.
      But I get ? marks for all UTF-8 characters.

      MariaDB [test]> show variables like '%char%';
      +--------------------------+----------------------------+
      | Variable_name            | Value                      |
      +--------------------------+----------------------------+
      | character_set_client     | utf8                       |
      | character_set_connection | utf8                       |
      | character_set_database   | utf8                       |
      | character_set_filesystem | binary                     |
      | character_set_results    | utf8                       |
      | character_set_server     | utf8                       |
      | character_set_system     | utf8                       |
      | character_sets_dir       | /usr/share/mysql/charsets/ |
      +--------------------------+----------------------------+
      8 rows in set (0.06 sec)
      
      MariaDB [test]> SELECT c1 FROM CONNECT_TABLE;
      +-------+
      | c1      |
      +-------+
      | ???     |
      | JCB     |
      | ???     |
      | ???     |
      | ???     |
      | ???     |
      | ???     |
      | ???     |
      | UFJ     |
      | ???     |
      +-------+
      10 rows in set (0.00 sec)
      

      I found it is not a problem of the ODBC driver, queries on isql returns the UTF-8 values correctly.
      Changing the table's DEFAULT CHARSET or DATA_CHARSET did not work.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            Indeed types NCHAR or NVARCHAR correspond to SQL_WCHAR (-8) and SQL_WVARCHAR (-9). These types were not recognized by CONNECT. It will be fixed just by handling them as CHAR and VARCHAR with a warning (NOTE) saying that the column contains wide characters.

            Remains the issue of UTF8 characters displayed as '?'. It looks as if the column text was translated twice. The first time, UTF8 characters are replaced by, say, latin1 characters that are not recognized the second time and replaced by '?'.

            Perhaps this could be avoided by setting character_set_client to latin1. Indeed the character_set_results variable only applies to MYSQL tables but does not for ODBC tables.

            Show
            bertrandop Olivier Bertrand added a comment - - edited Indeed types NCHAR or NVARCHAR correspond to SQL_WCHAR (-8) and SQL_WVARCHAR (-9). These types were not recognized by CONNECT. It will be fixed just by handling them as CHAR and VARCHAR with a warning (NOTE) saying that the column contains wide characters. Remains the issue of UTF8 characters displayed as '?'. It looks as if the column text was translated twice. The first time, UTF8 characters are replaced by, say, latin1 characters that are not recognized the second time and replaced by '?'. Perhaps this could be avoided by setting character_set_client to latin1. Indeed the character_set_results variable only applies to MYSQL tables but does not for ODBC tables.
            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            I don't know about IBM but, according to the Oracle documentation, the NCHAR, NVARCHAR types (N for Native) are character strings supposely containing text in native character set. It seems that when writing and reading, the data source (at least Oracle) automatically does the translation between the internal coding (Unicode as UTF8 or UT16) and the native character set.
            Thereby, the table charset and the character_set_client variable should not be set to UTF8 but to the native character set. Note that with Oracle there is no issue of mixed character sets as the default charset is defined for the entire database.
            I think that there is probably nothing CONNECT can do in addition to address such problems.

            Show
            bertrandop Olivier Bertrand added a comment - - edited I don't know about IBM but, according to the Oracle documentation, the NCHAR, NVARCHAR types (N for Native) are character strings supposely containing text in native character set. It seems that when writing and reading, the data source (at least Oracle) automatically does the translation between the internal coding (Unicode as UTF8 or UT16) and the native character set. Thereby, the table charset and the character_set_client variable should not be set to UTF8 but to the native character set. Note that with Oracle there is no issue of mixed character sets as the default charset is defined for the entire database. I think that there is probably nothing CONNECT can do in addition to address such problems.
            Hide
            takuya Takuya Aoki added a comment -

            Hello Olivier,
            I understand the translation is happening twice and not working correctly.
            (the first time by the CONNECT engine and the second time by the client?)

            Do you mean that the table charset is for the first translation and character_set_client for the second?
            By the way, I don't understand what native character set stands for.

            Show
            takuya Takuya Aoki added a comment - Hello Olivier, I understand the translation is happening twice and not working correctly. (the first time by the CONNECT engine and the second time by the client?) Do you mean that the table charset is for the first translation and character_set_client for the second? By the way, I don't understand what native character set stands for.
            Hide
            takuya Takuya Aoki added a comment -

            Netezza seems to handle all characters in UTF8 and does not do any conversions.

            http://www-01.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_odiagntz_016.htm

            NCHAR/NVARCHAR data is always stored in UTF-8. The database character defines the encoding for CHAR and VARCHAR columns and is defined when creating the database with the CREATE DATABASE command; the default is latin9. Note that, at the time of writing these lines, Netezza V6 does not yet support a different database character set than latin9.
            
            No automatic character set conversion is done by the Netezza software, this means that the application/client character set must match the database character set.
            
            Show
            takuya Takuya Aoki added a comment - Netezza seems to handle all characters in UTF8 and does not do any conversions. http://www-01.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_odiagntz_016.htm NCHAR/NVARCHAR data is always stored in UTF-8. The database character defines the encoding for CHAR and VARCHAR columns and is defined when creating the database with the CREATE DATABASE command; the default is latin9. Note that, at the time of writing these lines, Netezza V6 does not yet support a different database character set than latin9. No automatic character set conversion is done by the Netezza software, this means that the application/client character set must match the database character set.

              People

              • Assignee:
                bertrandop Olivier Bertrand
                Reporter:
                takuya Takuya Aoki
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: