MariaDB Server: MDEV-3435

LP:488040 - Support for contractions between non-ASCII characters and Croatian collation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      From Neven Jacmenovic:

      The feature we desperately need in MariaDB is proper support for a Croatian utf8 collation based on the Croatian alphabet (http://en.wikipedia.org/wiki/Gajica), so that we can finally sort Croatian words (names, etc.) properly. MySQL doesn't support it, and without it we can't consider MySQL server, or MariaDB for that matter, as a choice for, e.g., migrating government institutions to an open-source platform in the near future. Most, if not all, of those organizations now use MS SQL instead of open-source solutions.

      AFAIK, the countries that would benefit from the same implementation (alongside Croatia) are Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script).

      MySQL already has a built-in latin2 Croatian collation (latin2_croatian_ci) and a CP1250 Croatian collation (cp1250_croatian_ci), but those implementations lack digraph support, i.e. single letters written with two characters (http://www.collation-charts.org/mysql60/mysql604.latin2_croatian_ci.html), so they are useless for us. Without proper support for digraphs, we will never be able to use ORDER BY properly (a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž).

      The closest to Croatian is the Slovenian collation (utf8_slovenian_ci) built into MySQL, but it also lacks digraphs, so it cannot be adapted (http://www.collation-charts.org/mysql60/mysql604.utf8_slovenian_ci.html).

      Right now, we are forced to use the utf8_general_ci collation, which, of course, doesn't know how to order the Croatian alphabet. I've attached a mysqldump with the Croatian alphabet. The valid ordering is: a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž.
      "DŽ", "NJ" and "LJ" are SINGLE letters.

      I submitted an S4 feature request to MySQL some time ago, and the MySQL dev team started discussing it, but nothing happened (http://bugs.mysql.com/44523).

      Please, MariaDB developers, make our native language suck less!

            Activity

            Ante Karamatić added a comment:

            test_croatian_order.sql
            LPexportBug488040_test_croatian_order.sql

            Ante Karamatić added a comment:

            As explained at http://www.collation-charts.org/articles/croatian.htm, this patch does more than just add support for a Croatian UTF-8 collation. It is based on Alexander's patch for MySQL 5.1 (http://www.collation-charts.org/articles/utf8_croatian_ci.diff), and you could probably get it by pulling from MySQL 6.
            maria.croatian.diff
            LPexportBug488040_maria.croatian.diff

            Michael Widenius added a comment:

            re: [Bug 488040] [NEW] Support for contractions between non-ASCII characters and Croatian collation

            Hi!

            >>>>> "Ante" == Ante Karamatić <Ante> writes:

            Ante> Public bug reported:
            >> From Neven Jacmenovic:

            Ante> The feature we desperately need in MariaDB is proper support for
            Ante> Croatian utf8 collation based on Croatian alphabet
            Ante> (http://en.wikipedia.org/wiki/Gajica) so we can finally sort croatian
            Ante> words (names etc) properly. MySQL don't have support for it, without
            Ante> this, we can't consider MySQL server or MariaDB for that matter, a
            Ante> choice for eg. government migration to open-source platform in near
            Ante> future. Most, if not all of those organizations now use MS SQL instead
            Ante> of open source solutions.

            <cut>

            The Croatian character sets have been pushed into MariaDB 5.1-merge and
            should be in the default MariaDB 5.1 tree tomorrow.

            Regards,
            Monty

            Ante Karamatić added a comment:

            There's an update for this bug; the patch is attached. It is explained at:

            http://www.collation-charts.org/

            "Dec 2, 2009. An updated version of the Croatian collation patch for MySQL-5.1 is available. It works a little bit more accurate when optimizing a LIKE query for UCS2 columns, in case of non-ASCII contractions:

            SELECT a FROM t1 WHERE a LIKE 'dž%';

            The previous version could potentially lose some rows."
            ctype-ucs2.c.v0-v1.diff
            LPexportBug488040_ctype-ucs2.c.v0-v1.diff
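            The changelog quoted above does not say exactly which case the earlier patch mishandled. The following is therefore only a toy illustration of the general hazard, not the actual MySQL/MariaDB code. An index range for LIKE 'prefix%' is commonly built by padding the prefix's sort key with a minimum and a maximum weight. With contractions, a pattern character such as d can combine with the next character of a stored row (dž) into a letter that sorts outside that range. All names here (naive_range(), safe_range(), index_lookup()) are hypothetical, and LIKE is assumed to compare characters one at a time.

```python
import unicodedata

# Toy Croatian collation (letters only): dž, lj, nj are single letters.
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
    "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = {"dž", "lj", "nj"}
CONTRACTION_HEADS = {d[0] for d in DIGRAPHS}  # {"d", "l", "n"}
MAX_RANK = len(ALPHABET)  # pad weight that sorts after every real letter


def sort_key(word):
    """Primary weights; raises KeyError for characters outside ALPHABET."""
    w = unicodedata.normalize("NFC", word).lower()
    key, i = [], 0
    while i < len(w):
        step = 2 if w[i:i + 2] in DIGRAPHS else 1
        key.append(RANK[w[i:i + step]])
        i += step
    return key


def like_prefix(value, prefix):
    """Hypothetical LIKE 'prefix%' semantics: case-insensitive character prefix."""
    return value.lower().startswith(prefix.lower())


def naive_range(prefix):
    """Key range [lo, hi] obtained by padding the prefix's own sort key."""
    lo = sort_key(prefix)
    return lo, lo + [MAX_RANK]


def safe_range(prefix):
    """If the last pattern character can start a contraction, the next row
    character may turn it into a different, higher-ranked letter, so leave it
    out of hi. In this alphabet every contraction ranks above its head, so lo
    can stay as it is."""
    lo = sort_key(prefix)
    if prefix and prefix[-1].lower() in CONTRACTION_HEADS:
        return lo, sort_key(prefix[:-1]) + [MAX_RANK]
    return lo, lo + [MAX_RANK]


def index_lookup(rows, bounds, prefix):
    """Scan rows whose key is in [lo, hi], then re-check the LIKE condition."""
    lo, hi = bounds
    return [r for r in rows if lo <= sort_key(r) <= hi and like_prefix(r, prefix)]


if __name__ == "__main__":
    rows = ["cesta", "dan", "džep", "đak", "lopta"]
    print([r for r in rows if like_prefix(r, "d")])   # ['dan', 'džep']
    print(index_lookup(rows, naive_range("d"), "d"))  # ['dan']  (džep is lost)
    print(index_lookup(rows, safe_range("d"), "d"))   # ['dan', 'džep']
```

            Widening the upper bound only costs extra candidate rows, which the final LIKE re-check discards. A range that is too narrow silently drops matching rows, which is the kind of symptom the changelog describes.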

            Rasmus Johansson added a comment:

            Launchpad bug id: 488040


              People

              • Assignee: Michael Widenius
              • Reporter: Ante Karamatić
              • Votes: 0
              • Watchers: 0
