MariaDB Server / MDEV-7943

pthread_getspecific() takes 0.76% in OLTP RO

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1
    • Fix Version/s: 10.1.6
    • Component/s: OTHER
    • Labels:
      None

      Description

      Data comes from a Sandy Bridge system running sysbench OLTP RO in 1 thread against 1 table.

      Call graphs:

      -   0.76%  mysqld  libpthread-2.15.so   [.] pthread_getspecific
         - pthread_getspecific
            + 19.28% trx_is_interrupted(trx_t const*)
            + 8.56% net_real_write
            + 7.94% vio_io_wait
            + 5.19% execute_sqlcom_select(THD*, TABLE_LIST*)
            + 4.35% my_free
            + 3.82% String_list::append_str(st_mem_root*, char const*)
            + 3.70% my_real_read(st_net*, unsigned long*, char)
            + 3.26% Item_equal::add_const(Item*, Item*)
            + 3.11% MYSQLparse(THD*)
            + 3.04% make_select(TABLE*, unsigned long long, unsigned long long, Item*, bool, int*)
            + 2.62% Item_equal::Item_equal(Item*, Item*, bool)
            + 2.61% Item_func::fix_fields(THD*, Item**)
            + 2.41% get_best_combination(JOIN*)
            + 2.39% st_select_lex::init_query()
            + 2.16% check_simple_equality(Item*, Item*, Item*, COND_EQUAL*)
            + 1.80% Item_ident::Item_ident(Name_resolution_context*, char const*, char const*, char const*)
            + 1.79% build_equal_items(JOIN*, Item*, COND_EQUAL*, List<TABLE_LIST>*, bool, COND_EQUAL**, bool) [clone .constprop.262]
            + 1.77% mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_
            + 1.63% st_select_lex::add_joined_table(TABLE_LIST*)
            + 1.59% make_leaves_list(List<TABLE_LIST>&, TABLE_LIST*, bool, TABLE_LIST*)
            + 1.55% my_malloc
            + 1.44% DsMrr_impl::dsmrr_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*)
            + 1.34% Item_bool_func2::Item_bool_func2(Item*, Item*)
            + 1.31% Item_int::Item_int(char const*, long long, unsigned int)
            + 1.17% st_select_lex::add_item_to_list(THD*, Item*)
            + 1.06% Eq_creator::create(Item*, Item*) const
            + 0.85% cmp_item::get_comparator(Item_result, Item*, charset_info_st const*)
            + 0.85% st_select_lex::save_leaf_tables(THD*)
            + 0.72% ha_innobase::multi_range_read_init(st_range_seq_if*, void*, unsigned int, unsigned int, st_handler_buffer*)
            + 0.71% Item_func::setup_args_and_comparator(THD*, Arg_comparator*)
            + 0.61% key_and(RANGE_OPT_PARAM*, SEL_ARG*, SEL_ARG*, unsigned int) [clone .part.152]
            + 0.60% get_quick_keys(PARAM*, QUICK_RANGE_SELECT*, st_key_part*, SEL_ARG*, unsigned char*, unsigned int, unsigned char*, unsigned int)
            + 0.56% Item_func_between::Item_func_between(Item*, Item*, Item*)
            + 0.52% sql_memdup(void const*, unsigned long)
            + 0.51% Item_cache::get_cache(Item const*, Item_result)
      

      The most frequent caller is trx_is_interrupted()/thd_kill_level(): it calls current_thd unconditionally.
      Note: it may be fixed in Monty's fastconnect tree.
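
      To make the cost pattern concrete, here is a minimal sketch of the two call styles. It is not MariaDB code: `Session`, `session_key`, `current_session()`, `is_interrupted_lookup()` and `is_interrupted_arg()` are hypothetical stand-ins for THD, the thread-specific key behind current_thd, and the interrupt check. The first variant performs a pthread_getspecific() lookup on every call; the second relies on the caller already holding the pointer.

      ```cpp
      #include <pthread.h>
      #include <cassert>

      // Hypothetical stand-in for the server's per-connection THD object.
      struct Session {
        int killed;
      };

      static pthread_key_t session_key;
      static pthread_once_t session_key_once = PTHREAD_ONCE_INIT;

      // Creates the thread-specific key exactly once per process.
      static void make_session_key() {
        int rc = pthread_key_create(&session_key, nullptr);
        assert(rc == 0);
        (void)rc;
      }

      // Binds a session to the calling thread.
      void set_current_session(Session *s) {
        pthread_once(&session_key_once, make_session_key);
        int rc = pthread_setspecific(session_key, s);
        assert(rc == 0);
        (void)rc;
      }

      // Analogue of current_thd: one pthread_getspecific() call per use.
      Session *current_session() {
        pthread_once(&session_key_once, make_session_key);
        return static_cast<Session *>(pthread_getspecific(session_key));
      }

      // Pattern described in the report: the lookup happens unconditionally,
      // even if the caller already has the session at hand.
      bool is_interrupted_lookup() {
        Session *s = current_session();
        return s != nullptr && s->killed != 0;
      }

      // Alternative: the caller passes the session, so no lookup is needed.
      bool is_interrupted_arg(const Session *s) {
        return s != nullptr && s->killed != 0;
      }
      ```

      A caller that already owns a `Session *` can use `is_interrupted_arg(s)` and skip the thread-specific lookup entirely; whether this matters in practice depends on how hot the call site is.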

              Activity

              serg Sergei Golubchik added a comment -

              One option would be to use thread-local variables in gcc. They might be faster (needs to be tested), and with macros one can easily hide the underlying implementation (getspecific or TLS) from the caller.
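
              A rough sketch of what such a macro layer could look like follows. All names are assumptions made for illustration: `Session` stands in for THD, `USE_COMPILER_TLS` is a hypothetical build switch, and `CURRENT_SESSION()` / `SET_CURRENT_SESSION()` are hypothetical macros. The compiler-TLS branch uses C++11 `thread_local` (gcc also accepts `__thread`); the other branch uses pthread_getspecific(). Callers see the same two macros either way.

              ```cpp
              #include <pthread.h>
              #include <cassert>

              // Hypothetical stand-in for THD.
              struct Session {
                int id;
              };

              #ifdef USE_COMPILER_TLS  // hypothetical build switch

              // Compiler-supported thread-local storage.
              static thread_local Session *tls_session = nullptr;
              #define CURRENT_SESSION() (tls_session)
              #define SET_CURRENT_SESSION(s) (tls_session = (s))

              #else

              // POSIX thread-specific data behind the same macro names.
              static pthread_key_t session_key;
              static pthread_once_t session_key_once = PTHREAD_ONCE_INIT;

              static void make_session_key() {
                int rc = pthread_key_create(&session_key, nullptr);
                assert(rc == 0);
                (void)rc;
              }

              static inline Session *get_session_slot() {
                pthread_once(&session_key_once, make_session_key);
                return static_cast<Session *>(pthread_getspecific(session_key));
              }

              static inline void set_session_slot(Session *s) {
                pthread_once(&session_key_once, make_session_key);
                pthread_setspecific(session_key, s);
              }

              #define CURRENT_SESSION() (get_session_slot())
              #define SET_CURRENT_SESSION(s) (set_session_slot(s))

              #endif
              ```

              Because call sites only use the macros, the two implementations could be swapped and benchmarked against each other without touching the callers.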

              svoj Sergey Vojtovich added a comment -

              Sergei Golubchik, please review 3 patches for this task.

              svoj Sergey Vojtovich added a comment -

              Sergei Golubchik, please also review the third patch for this task.

              kaamos Alexey Kopytov added a comment -

              Out of curiosity, what happened to the thread-local variables idea? Has it proved to be not fast enough to replace pthread_getspecific() calls?

              svoj Sergey Vojtovich added a comment -

              Alexey Kopytov, according to my study (with no good benchmarks, though), TLS should be faster than pthread_getspecific(), but still slower than passing function arguments.

              So far we have reduced the number of pthread_getspecific() calls from ~1100 to ~300 per OLTP RO transaction. Alas, there are other workloads that won't benefit from this.

              The plan is: pass THD through wherever possible; otherwise fall back to TLS where there are worthy cases.
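
              The plan can be illustrated with a small counting sketch. Nothing here is taken from the actual patches: `Session`, `tls_session`, `tls_lookups`, `read_row()`, `read_row_legacy()`, `run_statement()` and `run_statement_legacy()` are hypothetical names, and the lookup counter exists only to show how many thread-local lookups each style performs.

              ```cpp
              #include <cassert>

              // Hypothetical stand-in for THD.
              struct Session {
                int rows_read = 0;
              };

              // TLS fallback for code paths that cannot receive the session as an argument.
              static thread_local Session *tls_session = nullptr;
              // Instrumentation only: counts how often the fallback lookup is used.
              static thread_local unsigned long tls_lookups = 0;

              Session *current_session() {
                ++tls_lookups;
                return tls_session;
              }

              // Preferred style: the session arrives as a function argument.
              void read_row(Session *s) { ++s->rows_read; }

              // Legacy style: no session parameter, so each call does a lookup.
              void read_row_legacy() { ++current_session()->rows_read; }

              // Resolves the session once at the entry point and passes it down.
              void run_statement(int rows) {
                Session *s = current_session();
                assert(s != nullptr);
                for (int i = 0; i < rows; ++i) read_row(s);
              }

              // Performs one lookup per row because the leaf function needs it.
              void run_statement_legacy(int rows) {
                for (int i = 0; i < rows; ++i) read_row_legacy();
              }
              ```

              With the session bound to the thread, `run_statement(100)` performs a single lookup while `run_statement_legacy(100)` performs 100, which is the kind of reduction the comment above describes at a much larger scale.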

              kaamos Alexey Kopytov added a comment -

              I see, thanks. I was asking, because I was considering the same idea for Percona Server a few years ago. Leveraging thread-local storage looked like a low-hanging fruit to optimize all those pthread_getspecific() call sites without introducing invasive code changes, but I never got around to evaluating it.

              svoj Sergey Vojtovich added a comment -

              Sergei Golubchik, please review another patch for this bug:

              [Commits] a5799f5: MDEV-7943 - pthread_getspecific() takes 0.76% in OLTP RO
              
              svoj Sergey Vojtovich added a comment -

              The number of pthread_getspecific() calls per OLTP RO transaction was reduced from ~1100 to 290. Further improvements (if any) will be done separately.


                People

                • Assignee: svoj Sergey Vojtovich
                • Reporter: svoj Sergey Vojtovich
                • Votes: 0
                • Watchers: 4

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0m
                • Logged: 40m