revno: 3402.50.240
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.6
timestamp: Thu 2012-02-23 15:38:33 +0530
message:
Bug#11829861 - SUBSTRING_INDEX() RESULTS "OMIT" CHARACTER WHEN USED
INSIDE LOWER()
PROBLEM:
The output of the function substring_index could lose characters
when used inside string conversion functions like lower().
Ex:
SET @user_at_host = 'root@mytinyhost-PC.local';
SELECT LOWER(SUBSTRING_INDEX(@user_at_host, '@', -1));
mytinyhost-pc. ocal
ANALYSIS:
In the function Item_func_substr_index::val_str(), the final
evaluated string (Item_func_substr_index::tmp_value) is marked
as constant after the first evaluation. (The reason for this
is given in Bug#14676.)
Once evaluated, we try to convert this string to lower case.
While doing so, we call the function "copy_if_not_alloced".
This function either returns the source string as is or
reallocates the destination and copies into it, based on the
"alloced length"s of the strings passed. Since "tmp_value" is
marked as constant, the "alloced length" for that string becomes
zero, thereby forcing a reallocation and then a subsequent
copy, which results in the damaged output shown above.
Note that the source string (tmp_value) passed to
"copy_if_not_alloced" points to an address inside the
destination string, which is the original string. The copy
therefore overwrites its own source, hence the missing letters.
Code Snippets:
Item_str_conv::val_str(str) // conversion to lower case
{
  res= Item_func_substr_index::val_str(str);
  // res is actually pointing to an address inside str
  res= copy_if_not_alloced(str, res, res->length());
}
copy_if_not_alloced(to, from, from_length)
{
  // (earlier checks on the alloced lengths omitted)
  if (to->realloc(from_length))
    return from; // Actually an error
  if ((to->str_length= min(from->str_length, from_length)))
    memcpy(to->Ptr, from->Ptr, to->str_length);
}
If we did not mark "tmp_value" as const, we would have
returned from "copy_if_not_alloced" much earlier, avoiding
the overwriting.
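The condition described above can be modelled in isolation. The
following is only a sketch: StrHandle, points_into, copy_is_forced
and overlapping_copy_hazard are hypothetical names, not the server's
String class, and the early-return test (alloced length compared
against the requested length) is an assumption drawn from the
analysis above. It performs no copy; it only reports when a forced
copy would read from inside its own destination.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical, simplified stand-in for a string handle; not MySQL's String.
struct StrHandle {
  const char *ptr;             // start of the characters
  std::size_t length;          // number of characters in use
  std::size_t alloced_length;  // 0 models a string marked as constant
};

// True when `from` starts inside the buffer owned by `to_buf`.
bool points_into(const std::string &to_buf, const StrHandle &from) {
  std::less<const char *> before;  // well-defined ordering for any pointers
  const char *begin = to_buf.data();
  const char *end = begin + to_buf.size();
  return !before(from.ptr, begin) && before(from.ptr, end);
}

// Assumed early-return test: the copy is skipped when the source already
// has at least `needed` bytes allocated. True means the copy is forced.
bool copy_is_forced(const StrHandle &from, std::size_t needed) {
  return from.alloced_length < needed;
}

// The hazardous combination: a forced copy whose source lives in the
// destination buffer.
bool overlapping_copy_hazard(const std::string &to_buf, const StrHandle &from) {
  return copy_is_forced(from, from.length) && points_into(to_buf, from);
}
```

With a handle onto the part of 'root@mytinyhost-PC.local' after the
'@', an alloced_length of zero reports the hazard, while a non-zero
alloced_length covering the substring does not.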
So the fix is to not mark "tmp_value" as const, as there is
no need for it. The fix for Bug#14676 is kept intact by
allocating a temporary buffer to get the delimiter. That problem
was seen because "tmp_value" was used both to get the delimiter
and to return the evaluated string.
Also, there is one more bug present in this function, associated
with Bug#42404: the substring_index function returns inconsistent
results when the delimiter is present at offset "0" while the
count is negative and its magnitude is greater than the number of
times the delimiter is present in the string.
Currently, if the delimiter is present at offset "0", we skip
setting "tmp_value" (which contains the final evaluated string)
and instead return the previously set "tmp_value". This was the
reason for the inconsistent results stated in the problem
description.
With this fix, we return the original string if the count is
non-zero at the end of the loop.
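The intended behaviour can be illustrated with a minimal sketch for
single-byte strings. This is not the server code: the name
substring_index_sketch is hypothetical, std::string stands in for
the server's String class, and returning an empty string for a zero
count or an empty delimiter is an assumption. The point shown is the
last rule: when the count is still non-zero after the scan, the
original string is returned.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of SUBSTRING_INDEX semantics for single-byte strings.
std::string substring_index_sketch(const std::string &s,
                                   const std::string &delim, long count) {
  // Assumed handling of degenerate input: empty result.
  if (count == 0 || delim.empty() || s.empty()) return std::string();

  if (count > 0) {
    // Scan left to right; keep everything before the count-th delimiter.
    std::size_t pos = 0;
    while (true) {
      std::size_t hit = s.find(delim, pos);
      if (hit == std::string::npos) return s;  // count still non-zero
      if (--count == 0) return s.substr(0, hit);
      pos = hit + delim.size();
    }
  }

  // Negative count: scan right to left; keep everything after the
  // |count|-th delimiter counted from the right.
  std::size_t last_start = std::string::npos;
  while (true) {
    std::size_t hit = s.rfind(delim, last_start);
    if (hit == std::string::npos) return s;  // count still non-zero
    if (++count == 0) return s.substr(hit + delim.size());
    // Delimiter at (or too close to) offset 0: nothing left to scan,
    // and the count is still non-zero, so return the original string.
    if (hit < delim.size()) return s;
    last_start = hit - delim.size();
  }
}
```

For example, substring_index_sketch("@a@b", "@", -2) yields "a@b",
while a count of -3 exceeds the two delimiters present and yields
the original "@a@b".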
Pushed into 5.3 and 5.5.