Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-794

LP:861763 - Inconsistent use of column alias In ORDER BY clause that specified collation

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.5.28
    • Fix Version/s: 10.0
    • Component/s: None
    • Labels:

      Description

      This bug is created to track MySQL BUG#59449, and to review and
      either apply or reimplement a contributed patch for this bug.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            timour Timour Katchaounov added a comment -

            Re: Inconsistent use of column alias In ORDER BY clause that specified collation
            Much shorter test case:

            CREATE TABLE Language (
            LanguageID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
            LanguageName VARCHAR(40) NOT NULL,
            PRIMARY KEY (LanguageID),
            KEY LanguageName (LanguageName) )
            ENGINE=MyISAM
            DEFAULT CHARSET=utf8
            COLLATE=utf8_unicode_ci;

            INSERT INTO Language VALUES
            (1, 'English'),
            (2, 'Spanish'),
            (3, 'American Sign Language'),
            (4, 'Argentinian Sign Language');

            – works as expected - no collation - sorted by alias to a function result (alias has name matching a real field from a queried table)
            SELECT LanguageID,
            substr(LanguageName, 2) as LanguageName
            FROM Language
            ORDER BY LanguageName;

            – broken - collated - sorted by alias to a function (coalesce) result (alias has name matching a real field from a queried table)
            SELECT LanguageID,
            substr(LanguageName, 2) as LanguageName
            FROM Language
            ORDER BY LanguageName COLLATE utf8_spanish_ci;

            – broken - collated - sorted by alias to a column (alias has name matching a real field from a queried table)
            SELECT LanguageID as LanguageName,
            substr(LanguageName, 2) as LanguageNameSubstr
            FROM Language
            ORDER BY LanguageName COLLATE utf8_spanish_ci;

            drop table Language;

            Show
            timour Timour Katchaounov added a comment - Re: Inconsistent use of column alias In ORDER BY clause that specified collation Much shorter test case: CREATE TABLE Language ( LanguageID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT, LanguageName VARCHAR(40) NOT NULL, PRIMARY KEY (LanguageID), KEY LanguageName (LanguageName) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; INSERT INTO Language VALUES (1, 'English'), (2, 'Spanish'), (3, 'American Sign Language'), (4, 'Argentinian Sign Language'); – works as expected - no collation - sorted by alias to a function result (alias has name matching a real field from a queried table) SELECT LanguageID, substr(LanguageName, 2) as LanguageName FROM Language ORDER BY LanguageName; – broken - collated - sorted by alias to a function (coalesce) result (alias has name matching a real field from a queried table) SELECT LanguageID, substr(LanguageName, 2) as LanguageName FROM Language ORDER BY LanguageName COLLATE utf8_spanish_ci; – broken - collated - sorted by alias to a column (alias has name matching a real field from a queried table) SELECT LanguageID as LanguageName, substr(LanguageName, 2) as LanguageNameSubstr FROM Language ORDER BY LanguageName COLLATE utf8_spanish_ci; drop table Language;
            Hide
            timour Timour Katchaounov added a comment -

            Re: Inconsistent use of column alias In ORDER BY clause that specified collation
            Analysis:
            This is an architectural problem in the way ORDER BY expressions
            and function arguments are resolved.

            Order by expressions are resolved by the procedure find_order_in_list().
            This procedure correctly defines the name resolution order to search
            first in the SELECT clause and to prefer aliases. The name resolution is
            implemented via the call:

            /* Lookup the current GROUP/ORDER field in the SELECT clause. */
            select_item= find_item_in_list(order_item, fields, &counter,
            REPORT_EXCEPT_NOT_FOUND, &resolution);

            Note however that find_item_in_list() works only for fields (Item_field)
            and not functions.

            Thus functions in the ORDER clause are not resolved against the
            SELECT clause, and find_order_in_list() proceeds until it calls
            Item::fix_fields() for these Items.

            For functions, fix_fields() calls recursively fix_fields() for its
            arguments. Finally we end up calling Item_field::fix_fields() for
            the 'LanguageName' argument of e.g. the Collate function.

            However Item_field::fix_fields() is designed to resolve fields
            mainly in the WHERE and other clauses, where the search order
            is different from the one in the ORDER clause. Specifically,
            fix_fields() searches first in the FROM clause. As a result, matching
            fields in the FROM clause have priority over aliases in the SELECT
            clause.

            This explains why the first example is resolved correctly, and the
            next two - incorrectly.

            Show
            timour Timour Katchaounov added a comment - Re: Inconsistent use of column alias In ORDER BY clause that specified collation Analysis: This is an architectural problem in the way ORDER BY expressions and function arguments are resolved. Order by expressions are resolved by the procedure find_order_in_list(). This procedure correctly defines the name resolution order to search first in the SELECT clause and to prefer aliases. The name resolution is implemented via the call: /* Lookup the current GROUP/ORDER field in the SELECT clause. */ select_item= find_item_in_list(order_item, fields, &counter, REPORT_EXCEPT_NOT_FOUND, &resolution); Note however that find_item_in_list() works only for fields (Item_field) and not functions. Thus functions in the ORDER clause are not resolved against the SELECT clause, and find_order_in_list() proceeds until it calls Item::fix_fields() for these Items. For functions, fix_fields() calls recursively fix_fields() for its arguments. Finally we end up calling Item_field::fix_fields() for the 'LanguageName' argument of e.g. the Collate function. However Item_field::fix_fields() is designed to resolve fields mainly in the WHERE and other clauses, where the search order is different from the one in the ORDER clause. Specifically, fix_fields() searches first in the FROM clause. As a result, matching fields in the FROM clause have priority over aliases in the SELECT clause. This explains why the first example is resolved correctly, and the next two - incorrectly.
            Hide
            timour Timour Katchaounov added a comment -

            Re: Inconsistent use of column alias In ORDER BY clause that specified collation
            Solution:

            In summary, the problem that needs to be solved is that
            column references in the ORDER/GROUP BY clauses must
            be resolved in a different way from column references in
            the WHERE and SELECT clauses.

            The implementation in find_order_in_list() is limited only
            to Item_field and Item_ref. In general the approach to resolve
            ORDER/GROUP BY clauses in a different way compared to the
            rest seems wrong.

            As discussed with Sanja, the right way to have a flexible and
            generic name resolution that supports different name resolution
            algorithms, is to reuse the Item::fix_fields mechanism in the
            following way.

            Items have to be marked correctly with their location in the query.
            The name resolution algorithms for each group of clauses have to
            be implemented as separate methods of the most suitable class.
            Then the corresponding fix_fields() method should select the
            correct name resolution algorithm based on the location of the
            item in the query.

            Notice that when e.g. an Item_field in an ORDER/GROUP BY clause
            is inside a subquery, the name resolution algorithm must be changed
            so that it first searches the subquery itself in the same way the WHERE
            clause is resolved, and only if not found in the subquery, name resolution
            should proceed to search the outer query, giving preference to aliases.

            This is a small re-engineering task, and not a simple bug fix. My estimate
            is at least 4-5 full developer days if there are no nasty surprises. Changing
            name resolution is quite risky, so this task should be planned for a release
            after 5.3.

            Show
            timour Timour Katchaounov added a comment - Re: Inconsistent use of column alias In ORDER BY clause that specified collation Solution: In summary, the problem that needs to be solved is that column references in the ORDER/GROUP BY clauses must be resolved in a different way from column references in the WHERE and SELECT clauses. The implementation in find_order_in_list() is limited only to Item_field and Item_ref. In general the approach to resolve ORDER/GROUP BY clauses in a different way compared to the rest seems wrong. As discussed with Sanja, the right way to have a flexible and generic name resolution that supports different name resolution algorithms, is to reuse the Item::fix_fields mechanism in the following way. Items have to be marked correctly with their location in the query. The name resolution algorithms for each group of clauses have to be implemented as separate methods of the most suitable class. Then the corresponding fix_fields() method should select the correct name resolution algorithm based on the location of the item in the query. Notice that when e.g. an Item_field in an ORDER/GROUP BY clause is inside a subquery, the name resolution algorithm must be changed so that it first searches the subquery itself in the same way the WHERE clause is resolved, and only if not found in the subquery, name resolution should proceed to search the outer query, giving preference to aliases. This is a small re-engineering task, and not a simple bug fix. My estimate is at least 4-5 full developer days if there are no nasty surprises. Changing name resolution is quite risky, so this task should be planned for a release after 5.3.
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 861763

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 861763

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                timour Timour Katchaounov
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: