Details
-
Type:
Bug
-
Status: Open
-
Priority:
Minor
-
Resolution: Unresolved
-
Affects Version/s: 5.5.32, 5.1.67, 5.2.14, 5.3.12
-
Fix Version/s: 5.5
-
Component/s: None
-
Labels:None
Description
This script:
DROP TABLE IF EXISTS t1; CREATE TABLE t1 (s1 VARCHAR (60) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI) ENGINE = MyISAM; CREATE FULLTEXT INDEX i ON t1 (s1);INSERT INTO t1 VALUES ('a'),('b'),('c'),('d'),('ÓÓÓÓ'),('OOOO'),(NULL),('ÓÓÓÓ ÓÓÓÓ'),('OOOOOOOO'); SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI);
returns wrong results:
+-------------------+ | s1 | +-------------------+ | ÓÓÓÓ | | OOOO | | ÓÓÓÓ ÓÓÓÓ | +-------------------+
The problem is that in the collation utf8_polish_ci
the letter "O WITH ACUTE" is not equal to "O":
MariaDB [test]> SELECT 'O'='Ó' COLLATE utf8_polish_ci;
+---------------------------------+
| 'O'='Ó' COLLATE utf8_polish_ci |
+---------------------------------+
| 0 |
+---------------------------------+
Therefore, returning the records containing accented letter
in the above query is wrong.
Further investigation with EXPLAIN:
MySQL [test]> EXPLAIN SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI);
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | fulltext | i | i | 0 | NULL | 1 | Using where |
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
tells that MATCH erroneously chooses the non-suitable index "i"
whose collation is NOT equal to the collation of the operation.
If I further drop the index and use BOOLEAN MODE:
ALTER TABLE t1 DROP INDEX i; SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI IN BOOLEAN MODE);
it still returns wrong results:
+-------------------+ | s1 | +-------------------+ | ÓÓÓÓ | | OOOO | | ÓÓÓÓ ÓÓÓÓ | +-------------------+
It seems MATCH always uses the collation of the field and does not follow the
standard argument collation aggregation rules.
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions