Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-2572

LP:823930 - Wrong result with semijoin materialization and blob fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Two test cases from subselect_sj_mat that test materialization
      with blobs produce incorrect empty result with semijoin
      materialization. The two failing test cases are the ones that test
      blobs with size 1024, and 1025. Here I extracted and simplified
      a bit the test case marked with:

      1. BLOB == 1024 (group_concat_max_len == 1024).
        For length 1025 the wrong result is the same.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            timour Timour Katchaounov added a comment -

            Re: Wrong result with semijoin materialization and blob fields
            create table t1_1024 (a1 blob(1024), a2 blob(1024));
            create table t2_1024 (b1 blob(1024), b2 blob(1024));
            create table t3_1024 (c1 blob(1024), c2 blob(1024));

            insert into t1_1024 values
            (concat('1 - 00', repeat('x', 1018)), concat('2 - 00', repeat('x', 1018)));
            insert into t1_1024 values
            (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
            insert into t1_1024 values
            (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));

            insert into t2_1024 values
            (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
            insert into t2_1024 values
            (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));
            insert into t2_1024 values
            (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018)));

            insert into t3_1024 values
            (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
            insert into t3_1024 values
            (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));
            insert into t3_1024 values
            (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018)));
            insert into t3_1024 values
            (concat('1 - 04', repeat('x', 1018)), concat('2 - 04', repeat('x', 1018)));

            – Correct result
            set @@optimizer_switch='semijoin=off,materialization=on,in_to_exists=off';

            select left(a1,7), left(a2,7)
            from t1_1024
            where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0');

            – Wrong result
            set @@optimizer_switch='semijoin=on,materialization=on,in_to_exists=off';

            select left(a1,7), left(a2,7)
            from t1_1024
            where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0');

            Show
            timour Timour Katchaounov added a comment - Re: Wrong result with semijoin materialization and blob fields create table t1_1024 (a1 blob(1024), a2 blob(1024)); create table t2_1024 (b1 blob(1024), b2 blob(1024)); create table t3_1024 (c1 blob(1024), c2 blob(1024)); insert into t1_1024 values (concat('1 - 00', repeat('x', 1018)), concat('2 - 00', repeat('x', 1018))); insert into t1_1024 values (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018))); insert into t1_1024 values (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018))); insert into t2_1024 values (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018))); insert into t2_1024 values (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018))); insert into t2_1024 values (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018))); insert into t3_1024 values (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018))); insert into t3_1024 values (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018))); insert into t3_1024 values (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018))); insert into t3_1024 values (concat('1 - 04', repeat('x', 1018)), concat('2 - 04', repeat('x', 1018))); – Correct result set @@optimizer_switch='semijoin=off,materialization=on,in_to_exists=off'; select left(a1,7), left(a2,7) from t1_1024 where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0'); – Wrong result set @@optimizer_switch='semijoin=on,materialization=on,in_to_exists=off'; select left(a1,7), left(a2,7) from t1_1024 where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0');
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result with semijoin materialization and blob fields
            Looking at EXPLAIN outputs, I see that

            • The query with "Correct result" settings does not use Materialization (even though optimizer_switch value instructs it to do it)
            • The query with "Wrong result" settings does use Materialization.

            It seems, the problem is that SJ-Materialization attempts to handle this case, while it should refuse it, like regular materialization did.

            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result with semijoin materialization and blob fields Looking at EXPLAIN outputs, I see that The query with "Correct result" settings does not use Materialization (even though optimizer_switch value instructs it to do it) The query with "Wrong result" settings does use Materialization. It seems, the problem is that SJ-Materialization attempts to handle this case, while it should refuse it, like regular materialization did.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result with semijoin materialization and blob fields
            In the case of non-semijoin materialization, we proceed as far as entering
            subselect_hash_sj_engine::init(). Inside that function, we execute up to this
            code:

            if (result_sink->create_result_table(thd, tmp_columns, TRUE,
            tmp_create_options,
            name, TRUE, TRUE))
            DBUG_RETURN(TRUE);

            tmp_table= result_sink->table;
            result= result_sink;

            /*
            If the subquery has blobs, or the total key lenght is bigger than
            some length, or the total number of key parts is more than the
            allowed maximum (currently MAX_REF_PARTS == 16), then the created
            index cannot be used for lookups and we can't use hash semi
            join. If this is the case, delete the temporary table since it
            will not be used, and tell the caller we failed to initialize the
            engine.
            */
            if (tmp_table->s->keys == 0)
            {

            And then we find out that "tmp_table->s->keys == 0" which causes us not to use
            materialization strategy, and switch to IN->EXISTS instead.

            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result with semijoin materialization and blob fields In the case of non-semijoin materialization, we proceed as far as entering subselect_hash_sj_engine::init(). Inside that function, we execute up to this code: if (result_sink->create_result_table(thd, tmp_columns, TRUE, tmp_create_options, name, TRUE, TRUE)) DBUG_RETURN(TRUE); tmp_table= result_sink->table; result= result_sink; /* If the subquery has blobs, or the total key lenght is bigger than some length, or the total number of key parts is more than the allowed maximum (currently MAX_REF_PARTS == 16), then the created index cannot be used for lookups and we can't use hash semi join. If this is the case, delete the temporary table since it will not be used, and tell the caller we failed to initialize the engine. */ if (tmp_table->s->keys == 0) { And then we find out that "tmp_table->s->keys == 0" which causes us not to use materialization strategy, and switch to IN->EXISTS instead.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result with semijoin materialization and blob fields
            It is difficult to reuse this approach for SJ-Materialization, because SJ-Materialization creates temp.table after join optimization has been finished. At that point, it is not easy to go back and redo the join optimization.

            A better option would be to make subquery_types_allow_materialization() do its job properly and detect all cases where materialization is not applicable.

            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result with semijoin materialization and blob fields It is difficult to reuse this approach for SJ-Materialization, because SJ-Materialization creates temp.table after join optimization has been finished. At that point, it is not easy to go back and redo the join optimization. A better option would be to make subquery_types_allow_materialization() do its job properly and detect all cases where materialization is not applicable.
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 823930

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 823930

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                timour Timour Katchaounov
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: