This is the problem briefly discussed with Monty on IRC: the result of the 2nd execution of a prepared statement depends on seemingly unimportant factors. I will collect all the information I have so that it's not forgotten, but I suggest putting it aside until more pressing tasks are finished. It would be good to investigate it though, as it's one of those things that are impossible to analyze when they happen in an external user's environment.
Assorted notes and observations:
the result depends even on insignificant blank spaces in the query (remove a few of them and the failure is gone), on the name of the basedir, etc.;
I could reproduce it on at least two different machines (Ubuntu Precise 64-bit and Debian Wheezy 64-bit), so it's not limited to a particular machine or OS flavor;
I could reproduce it via the MySQL client, although it required some additional, seemingly useless actions, so it's not limited to MTR either;
the result seems persistent on the same machine with the same build, but it might differ between machines (the query that causes a failure on one machine did not do so on the other at first; I had to resort to a slightly longer query to make the bug re-appear on the 2nd machine);
the problem appears at least with BUILD/compile-pentium-debug-max-no-ndb builds;
the problem is reproducible with --mysqld=--debug;
the problem stops appearing if the test is run with --valgrind-mysqld;
I could reproduce it on 5.3 (current tree) and on 5.3.12; could not reproduce on 5.5 so far, but due to the fragility of the test case it does not mean that it does not exist in 5.5.
Two complete test cases with the data are attached, mdev5600_bad.test and mdev5600_good.test. They only differ by one space in the query (where 3 right brackets come in a row):
The first query on the 2nd execution as a PS returns an empty set. The second one returns a result set.
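The way the two variants are exercised can be sketched roughly as follows. This is only an illustrative helper, not part of the attached tests: the function names, the statement name `stmt` and the sample query are hypothetical, and the regex naively touches every run of right brackets (including any inside string literals). It assumes the query contains no backslashes.

```python
import re


def with_bracket_spacing(query: str, n: int) -> str:
    """Return `query` with exactly n blanks between adjacent ')' characters.

    Naive: it does not skip brackets inside string literals.
    """
    return re.sub(r"\)\s*(?=\))", ")" + " " * n, query)


def build_ps_test(query: str, stmt_name: str = "stmt") -> str:
    """Build SQL that prepares `query` and executes it twice.

    Single quotes are doubled so the query survives PREPARE ... FROM '...'.
    Backslashes are NOT handled (assumption: the query has none).
    """
    escaped = query.replace("'", "''")
    return "\n".join([
        f"PREPARE {stmt_name} FROM '{escaped}';",
        f"EXECUTE {stmt_name};",   # 1st execution
        f"EXECUTE {stmt_name};",   # 2nd execution, the one whose result varies
        f"DEALLOCATE PREPARE {stmt_name};",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical query, standing in for the real one from the attachments.
    sample = "SELECT a FROM t1 WHERE a IN (SELECT b FROM t2 WHERE (b > (1) ))"
    for blanks in (0, 1):
        print(build_ps_test(with_bracket_spacing(sample, blanks)))
```

Feeding each generated script to the server and comparing the output of the two EXECUTE statements would show which spacing, if any, triggers the empty result on a given machine.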
The test is very unclean, with ugly names and probably excessive data, but I cannot safely clean it up due to the nature of the problem.
I ran the bad and good tests with debug, extracted the fragments that belong to the 2nd execution, converted them using convert-dbug-for-diff and attached the diff as mdev5600_trace_diff_between_bad_and_good. In case something went wrong in that process, there are also compressed full trace files, mdev5600_bad.trace.gz and mdev5600_good.trace.gz (also uploaded to hasky:/tmp).
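If the full traces need to be compared again, a generic line-by-line diff could look like the sketch below. It is a stand-in for the final diff step only and does not reproduce whatever normalization convert-dbug-for-diff performs; the function names and the `bad.trace` / `good.trace` labels are hypothetical.

```python
import difflib
import gzip


def read_trace(path: str) -> list[str]:
    """Read a gzip-compressed trace file into a list of lines (no newlines)."""
    with gzip.open(path, "rt", errors="replace") as fh:
        return fh.read().splitlines()


def trace_diff(bad_lines: list[str], good_lines: list[str]) -> list[str]:
    """Return a unified diff between two traces as a list of lines."""
    return list(
        difflib.unified_diff(
            bad_lines,
            good_lines,
            fromfile="bad.trace",
            tofile="good.trace",
            lineterm="",
        )
    )
```

Raw debug traces usually contain addresses and other values that change between runs, so an unnormalized diff like this one is likely to be much noisier than the attached one.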
I have also set up the test on perro under mdev5600 folder.
Indication of the good result (it's the tail of the result set from the 2nd execution, followed by DROP):
Indication of the bad result (it's the tail of the result set from the 1st execution, followed by the second EXECUTE with an empty result set, followed by DROP):