Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

            Activity

            psergey Sergei Petrunia added a comment -

            == Data mapping from HBase to SQL ==

            An HBase table consists of rows, which are identified by a row key. Each row
            has an arbitrary (potentially very large) number of columns. (Looking at the
            HBase documentation, one can conclude that HBase specifically targets two use
            cases:

            • a column family has few columns
            • a column family has so many columns that the "get column by name"
              operation is specifically optimized (TODO: exact link!))

            One can see two ways to map that to SQL tables:

            === Per-row mapping ===

            Let each row in an HBase table be mapped to a row from the SQL point of view:

            SELECT * FROM hbase_table;

            row-id  column1  column2
            ------  -------  -------
            row1    data     data
            row2    data     data
            row3    data     data

            This is the most straightforward mapping. However, accessing some of the
            HBase data will be difficult. For example, in HBase it is natural to have
            this query:

            given a row-id, get a list of columns in the row (maybe with their values)

            In this mapping, the result of this query will be one row, and there is no
            convenient way to return an arbitrarily long list of columns in one row.

            Table DDL could look like this:

            create table hbase_tbl_rows (
              row_id binary(MAX_HBASE_ROWID_LEN),
              columns blob,
              key (row_id)
            ) engine=hbase_row;

            1. TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?

            === Per-cell mapping ===

            The HBase shell has a 'scan' command; here's an example of its output:

            hbase(main):007:0> scan 'testtable'
            ROW      COLUMN+CELL
            myrow-1  column=colfam1:q1, timestamp=1297345476469, value=value-1
            myrow-2  column=colfam1:q2, timestamp=1297345495663, value=value-2
            myrow-2  column=colfam1:q3, timestamp=1297345508999, value=value-3

            Here, one HBase row produces multiple rows in the output. Each output row
            represents one (row_id, column) combination, so rows with multiple columns
            (and multiple revisions of column data) are easily represented.

            create table hbase_tbl_cells (
              row_id binary(MAX_HBASE_ROWID_LEN),
              column_name binary(N),
              timestamp TIMESTAMP,
              value binary(N)
            ) engine=hbase_cell;

            1. TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?
            • NOTE: It is nice to have the table DDL independent of the HBase table.
              This saves us from the need to synchronize table DDLs between HBase and
              MySQL (which NDB Cluster had to do, and they ended up implementing a
              rather complex subsystem for it).

            == Consistency and transactionality ==

            • HBase has single-record transactions. Does this mean that the HBase engine
              will have MyISAM-like characteristics? E.g., if we fail in the middle of
              an UPDATE, there is no way to go back?

            ratzpo Rasmus Johansson added a comment - edited

            Finalize the description and any questions we might have. Let's then send it for review.

            psergey Sergei Petrunia added a comment -

            == Data mapping from HBase to SQL ==

            An HBase table consists of rows, which are identified by a row key. Each row
            has an arbitrary (potentially very large) number of columns. (Looking at the
            HBase documentation, one can conclude that HBase specifically targets two use
            cases:

            • a column family has few columns
            • a column family has so many columns that the "get column by name"
              operation is specifically optimized (TODO: exact link!))

            One can see two ways to map that to SQL tables:

            === Per-row mapping ===

            Let each row in an HBase table be mapped to a row from the SQL point of view:

            SELECT * FROM hbase_table;

            row-id  column1  column2
            ------  -------  -------
            row1    data     data
            row2    data     data
            row3    data     data

            This is the most straightforward mapping. However, accessing some of the
            HBase data will be difficult. For example, in HBase it is natural to have
            this query:

            given a row-id, get a list of columns in the row (maybe with their values)

            In this mapping, the result of this query will be one row, and there is no
            convenient way to return an arbitrarily long list of columns in one row.

            Table DDL could look like this:

            CREATE TABLE hbase_tbl_rows (
              row_id BINARY(MAX_HBASE_ROWID_LEN),
              columns BLOB,
              PRIMARY KEY (row_id)
            ) ENGINE=hbase_row;

            1. TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?
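
            For illustration, here is how the column-list query from above degenerates
            under this mapping (a sketch with hypothetical data, not part of the
            original design):

            SELECT columns FROM hbase_tbl_rows WHERE row_id = 'row1';
            -- returns a single row whose `columns` blob is opaque to the SQL layer;
            -- the column list cannot be returned as a result set of its own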

            === Per-cell mapping ===

            The HBase shell has a 'scan' command; here's an example of its output:

            hbase(main):007:0> scan 'testtable'
            ROW      COLUMN+CELL
            myrow-1  column=colfam1:q1, timestamp=1297345476469, value=value-1
            myrow-2  column=colfam1:q2, timestamp=1297345495663, value=value-2
            myrow-2  column=colfam1:q3, timestamp=1297345508999, value=value-3

            Here, one HBase row produces multiple rows in the output. Each output row
            represents one (row_id, column) combination, so rows with multiple columns
            (and multiple revisions of column data) are easily represented.

            CREATE TABLE hbase_tbl_cells (
              row_id BINARY(MAX_HBASE_ROWID_LEN),
              column_name BINARY(MAX_HBASE_NAME_LEN),
              timestamp TIMESTAMP,
              value BLOB,
              PRIMARY KEY (row_id, column_name)
            ) ENGINE=hbase_cell;

            1. TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?
            • NOTE: It is nice to have the SQL table DDL independent of the content of
              the backing HBase table. This saves us from the need to synchronize
              table DDLs between HBase and MySQL (NDB Cluster had to do this, and they
              ended up implementing a rather complex subsystem for it).
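
            By contrast with the per-row mapping, the column-list query becomes natural
            here (a sketch against the DDL above, with hypothetical data; not part of
            the original design):

            SELECT column_name, value
            FROM hbase_tbl_cells
            WHERE row_id = 'myrow-2';
            -- one result row per (row_id, column) combination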

            == Consistency, transactions, etc ==

            • HBase has single-record transactions. Does this mean that the HBase
              storage engine will have MyISAM-like characteristics? E.g., if we fail
              in the middle of a multi-row UPDATE, there is no way to go back.
            • How should writes be made? Should UPDATEs/INSERTs use the checkAndPut
              operation so that they don't overwrite data that is already there?
              (See the SQL analogy after this list.)
            • Are writes important at all? (E.g., if the first version provided only
              read-only access, would that be useful?)
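
            A SQL analogy of checkAndPut semantics (a sketch against the hypothetical
            hbase_tbl_cells table above; HBase's actual checkAndPut compares a cell
            value on the server before applying the put):

            UPDATE hbase_tbl_cells
            SET value = 'value-1-new'
            WHERE row_id = 'myrow-1'
              AND column_name = 'colfam1:q1'
              AND value = 'value-1';
            -- the write succeeds only if the value we previously read is still there
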
            psergey Sergei Petrunia added a comment -

            == Results of discussion with Monty ==

            As a first milestone, implement only the HBase row <--> MySQL row mapping
            (see above for the DDL of hbase_tbl_rows). The `columns` blob will contain
            all of the columns.
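
            If MariaDB dynamic columns are used to pack all HBase columns into the
            `columns` blob, the earlier column-list question could be answered like this
            (a sketch with hypothetical data; COLUMN_LIST is an existing dynamic-column
            function):

            SELECT row_id, COLUMN_LIST(columns)
            FROM hbase_tbl_rows
            WHERE row_id = 'row1';
            -- COLUMN_LIST returns the set of dynamic columns packed into the blob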

            === Partial blob writes/reads ===
            Currently, MySQL uses this approach to work with blobs: if any part of the
            query accesses a blob field, the storage engine retrieves the entire blob
            and provides it to the SQL layer as a malloc'ed char*. It is not possible
            to read or write only a part of the blob.

            We'll need to extend the storage engine API somehow to accommodate working
            on parts of BLOBs.

            TODO: how exactly will the optimizer/HBase SE recognize that we need to read
            only some particular column? Hook into table condition pushdown?
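
            The access pattern in question might look like this (a sketch; COLUMN_GET is
            MariaDB's dynamic-columns accessor, and hbase_tbl_rows is the hypothetical
            table above):

            SELECT COLUMN_GET(columns, 1 AS CHAR)
            FROM hbase_tbl_rows
            WHERE row_id = 'row1';
            -- ideally, the engine would fetch only the one HBase column that backs
            -- dynamic column 1, rather than the entire `columns` blob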

            === Need column names for Dynamic columns ===

            Dynamic columns currently identify columns by numbers. HBase identifies them by
            string names.

            There are two possible approaches (see the sketch after this list):

            • Genuinely support arbitrary column names
            • Store the names in the .frm file << TODO: Monty mentioned this has some
              advantage, but I don't have access to where I've written this down.
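
            To illustrate the mismatch (a sketch using MariaDB's existing dynamic-column
            functions; the named variant at the end is hypothetical and does not exist):

            INSERT INTO hbase_tbl_rows (row_id, columns)
              VALUES ('row1', COLUMN_CREATE(1, 'value-1'));
            -- today: dynamic columns are addressed by number
            SELECT COLUMN_GET(columns, 1 AS CHAR) FROM hbase_tbl_rows;
            -- what HBase needs: addressing by a string name such as 'colfam1:q1',
            -- e.g. COLUMN_GET(columns, 'colfam1:q1' AS CHAR)  -- hypothetical
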
            timour Timour Katchaounov added a comment -

            Per Monty's request, I investigated whether Cassandra provides any C/C++
            API, and whether such an API would be easier to program against compared
            to HBase.

            Cassandra provides the following three levels of APIs, ordered by ease of use:

            • High-level clients - the APIs most recommended by Cassandra. Clients exist
              for many different environments/languages. The one for C++ is:
              http://snapwebsites.org/project/libqtcassandra. It is used in only one
              project AFAIU, therefore I would not rely on this one much.
            • Thrift API (http://wiki.apache.org/cassandra/API). Thrift can generate a
              C++ client API (same as HBase).
            • Internal API "StorageProxy". This is a native Java API; as such, it would
              have to be accessed via JNI. The Cassandra documentation advises strongly
              against it because it is unstable and low-level.
              (source: http://wiki.apache.org/cassandra/ClientOptions)

            In summary, I see no advantages in using Cassandra with respect to its API.
            The most reasonable choice seems to be Thrift; however, HBase provides a
            Thrift API as well. It may be that Cassandra has other advantages, such as
            a richer set of data access primitives, but I did not compare Cassandra and
            HBase in this respect.

            larsgeorge Lars George added a comment -

            Please note that HBase trunk (termed "singularity") is changing the RPC to
            ProtoBufs, just like Hadoop Common has done. That way it will be really easy
            to talk straight to the RPC natively. Obviously, this is not yet released,
            but it seems like a good place to start, given that work here still seems
            pending. Maybe one storage driver for Thrift HBase, and a later one for
            ProtoBuf HBase, if you don't want to wait?

            psergey Sergei Petrunia added a comment -

            Lars, thanks for the note.

            Alas, it seems that, for now, MDEV-431 "Cassandra storage engine" satisfies
            all the demand there was for MySQL-to-cloud-NoSQL connectivity.

            psergey Sergei Petrunia added a comment -

            Right now, nobody has this work in their plans. Feel free to reopen if/when that changes.


              People

              • Assignee: psergey Sergei Petrunia
              • Reporter: ratzpo Rasmus Johansson
              • Votes: 0
              • Watchers: 5

              Time Tracking

              • Original Estimate: 0m
              • Remaining Estimate: 0m
              • Time Spent: 1d 3h