Hi,
If I understand the problem correctly, all complications aside, you have the following:
drop table if exists t1;
create table t1 (id int, last_reply_date datetime) engine=InnoDB;
insert into t1 values (1, '2013-01-11 01:01:01'), (2, '2013-02-12 02:02:02'), (1, '2013-03-13 03:03:03'), (1, '2013-04-04 04:04:04'), (2, '2013-05-05 05:05:05');
and you rely on the fact that the following query will always return the biggest value of last_reply_date for each id:
select temp.* from ( select * from t1 order by last_reply_date desc ) temp group by temp.id;
This is a wrong expectation.
The MySQL (not MariaDB, but MySQL) manual is very specific about it:
http://dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
<quote>
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
</quote>
This is what you have in the initial query. You have a select list of non-aggregated columns (`temp`.* ) and only `temp`.`id` in the GROUP BY clause. Thus, your query is subject to this described non-deterministic behavior. It is a pure coincidence that currently MySQL chooses to execute such queries in the way that it does pick a highest value. It can change any time, with any bug fix, and won't even require a new GA version because it would not be a change in the existing functionality.
MariaDB 5,5 optimizer is more advanced and attempts to execute the query differently, thus the non-deterministic result turns out to be different.
You can try to temporarily switch back to the old behavior by setting the following in your MariaDB config file:
optimizer_switch=derived_merge=off,derived_with_keys=off
To see if it helps, you can try it at runtime first, by running
set global optimizer_switch='derived_merge=off,derived_with_keys=off';
set optimizer_switch='derived_merge=off,derived_with_keys=off';
and seeing if you started getting expected results after that.
But I can't emphasize it strongly enough, it is just a temporary and highly unreliable solution. Whether you choose MySQL or MariaDB at the end, the non-deterministic behavior will stay such. There have been many things that recently changed in MySQL 5.6 comparing to MySQL 5.5 that made people complain about regressions and incompatibility, while in fact they were legitimate changes touching scenarios where people used to rely on pure luck.
Hi,
If I understand the problem correctly, all complications aside, you have the following:
drop table if exists t1;
create table t1 (id int, last_reply_date datetime) engine=InnoDB;
insert into t1 values (1, '2013-01-11 01:01:01'), (2, '2013-02-12 02:02:02'), (1, '2013-03-13 03:03:03'), (1, '2013-04-04 04:04:04'), (2, '2013-05-05 05:05:05');
and you rely on the fact that the following query will always return the biggest value of last_reply_date for each id:
select temp.* from ( select * from t1 order by last_reply_date desc ) temp group by temp.id;
This is a wrong expectation.
The MySQL (not MariaDB, but MySQL) manual is very specific about it:
http://dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
<quote>
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
</quote>
This is what you have in the initial query. You have a select list of non-aggregated columns (`temp`.* ) and only `temp`.`id` in the GROUP BY clause. Thus, your query is subject to this described non-deterministic behavior. It is a pure coincidence that currently MySQL chooses to execute such queries in the way that it does pick a highest value. It can change any time, with any bug fix, and won't even require a new GA version because it would not be a change in the existing functionality.
MariaDB 5,5 optimizer is more advanced and attempts to execute the query differently, thus the non-deterministic result turns out to be different.
You can try to temporarily switch back to the old behavior by setting the following in your MariaDB config file:
optimizer_switch=derived_merge=off,derived_with_keys=off
To see if it helps, you can try it at runtime first, by running
set global optimizer_switch='derived_merge=off,derived_with_keys=off';
set optimizer_switch='derived_merge=off,derived_with_keys=off';
and seeing if you started getting expected results after that.
But I can't emphasize it strongly enough, it is just a temporary and highly unreliable solution. Whether you choose MySQL or MariaDB at the end, the non-deterministic behavior will stay such. There have been many things that recently changed in MySQL 5.6 comparing to MySQL 5.5 that made people complain about regressions and incompatibility, while in fact they were legitimate changes touching scenarios where people used to rely on pure luck.