MariaDB Server / MDEV-5901

EITS: killing the server leaves statistical tables in "marked as crashed" state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.9
    • Fix Version/s: 10.0.10
    • Component/s: None
    • Labels:

      Description

      If the following sequence of operations is performed:

      • run an action that updates the statistical tables (e.g. ANALYZE TABLE ... PERSISTENT FOR ALL)
      • kill the server
      • start the server again

      then any subsequent action that attempts to read from the EITS tables can no longer open them. Opening a table fails with the "table marked as crashed" error.

      This task is about making the EITS tables more resilient to this scenario.

      There are two things to be done:
      1. Flush the statistical tables to disk as soon as any modification has been made (similar to what is done for mysql.proc).
      2. Enable auto-repair for the statistical tables, as is done for regular MyISAM tables.
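A toy model can make item 1 concrete. It assumes, as a simplification, that a MyISAM-style table keeps a persisted "in use" counter (the hypothetical `on_disk_open_count` below) that is raised on the first modification and cleared when the table is flushed; a kill leaves it raised, which is what appears after restart as "marked as crashed". `ToyTableFile` and `marked_as_crashed_after_kill` are illustrative names, not MyISAM code.

```cpp
#include <cassert>

// Hypothetical, simplified model of a MyISAM-style table file (not real MyISAM code).
// Assumption: a persisted counter stays non-zero while the table has unflushed changes.
struct ToyTableFile {
  int on_disk_open_count = 0;  // survives a kill; non-zero means "not closed cleanly"
  bool changed = false;        // in-memory only; lost on a kill

  void write_row() {
    if (!changed) {
      ++on_disk_open_count;  // first modification marks the file as "in use" on disk
      changed = true;
    }
    // ... the row data itself would be written here ...
  }

  // Item 1: flush right after modifying, which clears the on-disk marker.
  void flush() {
    if (changed) {
      --on_disk_open_count;
      changed = false;
    }
    assert(on_disk_open_count >= 0);
  }
};

// Models a restart after a kill: only the on-disk state is inspected.
bool marked_as_crashed_after_kill(const ToyTableFile& t) {
  return t.on_disk_open_count != 0;
}
```

In this model, a kill that lands between `write_row()` and `flush()` still leaves the table marked as crashed, so flushing narrows the window rather than closing it entirely.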


            Activity

            psergey Sergei Petrunia added a comment -

            Hint from Monty: check out the code in sp.cc:

                if (table->file->ha_write_row(table->record[0]))
                  ret= SP_WRITE_ROW_FAILED;
                /* Make change permanent and avoid 'table is marked as crashed' errors */
                table->file->extra(HA_EXTRA_FLUSH);
            

            Note the HA_EXTRA_FLUSH call. We will need to add the same call to the code that writes to the EITS tables.
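The sp.cc pattern could be applied to statistics writes roughly as follows. The interface here (`ToyHandler`, `TOY_EXTRA_FLUSH`, `write_stat_row_and_flush`, `RecordingHandler`) is a hypothetical stand-in for the server's handler API, used only so the sketch compiles on its own. As in the sp.cc excerpt, the flush is issued whether or not the write succeeded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the handler API; names are illustrative only.
enum ToyExtraOp { TOY_EXTRA_FLUSH };

class ToyHandler {
 public:
  virtual ~ToyHandler() = default;
  virtual int write_row(const std::string& row) = 0;  // 0 on success
  virtual int extra(ToyExtraOp op) = 0;
};

// Hypothetical helper: write one statistics row, then flush immediately,
// mirroring sp.cc (the flush happens even if the write failed).
int write_stat_row_and_flush(ToyHandler& h, const std::string& row) {
  int err = h.write_row(row);
  // Make the change permanent and avoid 'table is marked as crashed' errors.
  h.extra(TOY_EXTRA_FLUSH);
  return err;
}

// Test double that records the order of calls.
class RecordingHandler : public ToyHandler {
 public:
  std::vector<std::string> calls;
  int write_result = 0;  // set non-zero to simulate a failed write

  int write_row(const std::string& row) override {
    calls.push_back("write:" + row);
    return write_result;
  }
  int extra(ToyExtraOp op) override {
    assert(op == TOY_EXTRA_FLUSH);
    calls.push_back("flush");
    return 0;
  }
};
```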

            psergey Sergei Petrunia added a comment -

            I'm also trying to investigate what is needed for auto-repair.

            1. Auto-repair doesn't work for the mysql.proc table.
            I ran a CREATE PROCEDURE ... and killed the server right after ha_myisam::write_row(). When I restarted the server and tried to use mysql.proc again, I got:

            mysql> create procedure p4() begin select now(); end //
            ERROR 145 (HY000): Table './mysql/proc' is marked as crashed and should be repaired

            2. Auto-repair does work for regular tables.

            mysql> insert into t21 values (2);
            ERROR 2013 (HY000): Lost connection to MySQL server during query  
            ^^ -- I intentionally kill the server
            
            mysql> select * from t21;
            ERROR 2006 (HY000): MySQL server has gone away
            No connection. Trying to reconnect...
            Connection id:    3
            Current database: test
            
            +------+
            | a    |
            +------+
            |    1 |
            |    2 |
            +------+
            2 rows in set, 7 warnings (18 min 49.03 sec)
            
            mysql> show warnings\G
            Message: Table './test/t21' is marked as crashed and should be repaired
            Message: Table 't21' is marked as crashed and should be repaired
            Message: 1 client is using or hasn't closed the table properly
            Message: Size of datafile is: 14       Should be: 7
            Message: Record-count is not ok; is 2   Should be: 1
            Message: Found 2 key parts. Should be: 1
            Message: Number of rows changed from 1 to 2
            7 rows in set (0.00 sec)
            

            Code-wise, auto-repair happens in open_table() and open_tables(). In open_table(), there is this code:

                  else if (share->crashed)
                    (void) ot_ctx->request_backoff_action(Open_table_context::OT_REPAIR,
                                                          table_list);
            

            and open_tables() has:

                  error= open_and_process_table(thd, thd->lex, tables, counter,
                                                flags, prelocking_strategy,
                                                has_prelocking_list, &ot_ctx,
                                                &new_frm_mem);
            
                  if (error)
                  {
                    if (ot_ctx.can_recover_from_failed_open())
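The retry idea behind can_recover_from_failed_open() can be modeled as a small loop: a failed open records a requested repair, the caller performs it, and then retries. All names here (`ToyCatalog`, `ToyOpenContext`, `open_with_auto_repair`) are hypothetical, and the real code does more between attempts (such as closing the tables opened so far), which this sketch omits.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of open -> request repair -> repair -> retry.
enum class ToyAction { NONE, REPAIR };

struct ToyOpenContext {
  ToyAction action = ToyAction::NONE;
  std::string failed_table;
  bool can_recover_from_failed_open() const { return action != ToyAction::NONE; }
};

struct ToyCatalog {
  std::set<std::string> crashed;  // tables currently marked as crashed
  int repairs = 0;

  // Returns true on error, following the server's convention.
  bool open_table(const std::string& name, ToyOpenContext& ctx) {
    if (crashed.count(name) != 0) {
      ctx.action = ToyAction::REPAIR;  // analogous to requesting OT_REPAIR
      ctx.failed_table = name;
      return true;
    }
    return false;
  }

  void repair(const std::string& name) {
    crashed.erase(name);
    ++repairs;
  }
};

// Returns true if the table was eventually opened.
bool open_with_auto_repair(ToyCatalog& cat, const std::string& name, int max_attempts = 3) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    ToyOpenContext ctx;
    if (!cat.open_table(name, ctx))
      return true;
    if (!ctx.can_recover_from_failed_open())
      return false;  // unrecoverable error
    // Omitted: closing already-opened tables before recovery.
    cat.repair(ctx.failed_table);
  }
  return false;
}
```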
            
            psergey Sergei Petrunia added a comment -

            When I debug a failure to open a statistical table, I see a difference (compared to a regular table) in this call:

            Open_table_context::request_backoff_action (this=0x7ffff7e9ede0, action_arg=Open_table_context::OT_REPAIR,

            Here,

            (gdb) print action_arg
            $312 = Open_table_context::OT_REPAIR
            (gdb) print m_has_locks
            $313 = true

            and because of that, no action is taken.

            psergey Sergei Petrunia added a comment -

            The reason is that open_and_lock_tables() is structured like this:

              if (open_tables(thd, &tables, &counter, flags, prelocking_strategy))
                goto err;
              ...
              if (lock_tables(thd, tables, counter, flags))
                goto err;
            
              (void) read_statistics_for_tables_if_needed(thd, tables);
            

            Statistical tables are opened after the regular tables have been opened and locked (I wonder why we can't open them at the same time). Because of that, the deadlock prevention logic prevents the repair.
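The effect of this ordering can be shown with a toy session in which, as a simplification of the behavior seen in the debugger, a crashed table is repaired on open only if no locks are held yet. `ToySession` and `toy_run_statement` are hypothetical. The `stats_first` variant only illustrates the "open them at the same time" question and is not a proposed fix; in the real server, forcing the repair runs into other problems.

```cpp
#include <cassert>
#include <set>
#include <string>

struct ToySession {
  std::set<std::string> crashed;  // tables marked as crashed on disk
  bool has_locks = false;         // plays the role of m_has_locks
  int repairs = 0;

  // Returns false if the open fails. A crashed table is repaired only when
  // no locks are held yet (simplified deadlock-prevention rule).
  bool open_table(const std::string& name) {
    if (crashed.count(name) == 0)
      return true;
    if (has_locks)
      return false;  // repair refused: the "marked as crashed" error surfaces
    crashed.erase(name);
    ++repairs;
    return true;
  }
};

// stats_first == false mirrors the order in open_and_lock_tables() quoted above.
bool toy_run_statement(ToySession& s, bool stats_first) {
  assert(!s.has_locks);
  bool ok = s.open_table("test.t1");                  // open_tables()
  if (stats_first)
    ok = ok && s.open_table("mysql.column_stats");    // hypothetical alternative order
  s.has_locks = true;                                 // lock_tables()
  if (!stats_first)
    ok = ok && s.open_table("mysql.column_stats");    // read_statistics_for_tables_if_needed()
  s.has_locks = false;                                // end of statement
  return ok;
}
```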

            psergey Sergei Petrunia added a comment -

            If I force Open_table_context::request_backoff_action() to allow the repair, I get an assertion failure:

            thd->mdl_context.is_lock_owner(MDL_key::TABLE, table->s->db.str, table->s->table_name.str, MDL_SHARED)

            in close_thread_table() for table test.t10.

            psergey Sergei Petrunia added a comment -

            It seems that auto-repair (item #2) is difficult to do. For now, I will only implement flushing (item #1).


              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0
                Watchers:
                2
