Apache HBase Interview Questions & Answer [GUIDE TO CRACK]
HBase Interview Questions and Answers

Last updated on 03rd Jul 2020, Blog, Interview Questions

About author

AjithKumar (Lead Engineer - Director Level )

A lead engineer with 7+ years of experience in his industry domain, he has also been a technical blog writer for the past 4 years, sharing informative content for job seekers.


Looking for Apache HBase interview questions that are frequently asked by employers? Here is the blog on Apache HBase interview questions in our Hadoop Interview Questions series. I hope you have not missed the earlier blogs in the series.

After going through the HBase interview questions, you will get an in-depth knowledge of questions that are frequently asked by employers in Hadoop interviews related to HBase. This will definitely help you to kickstart your career as a Big Data Engineer and become a Big Data certified professional. 

In case you have attended any HBase interview previously, we encourage you to add your questions in the comments tab. We will be happy to answer them, and spread the word to the community of fellow job seekers.

1) Explain what is Hbase?

Ans: 

  • HBase is a column-oriented database management system which runs on top of HDFS (Hadoop Distributed File System). HBase is not a relational data store, and it does not support a structured query language like SQL.
  • In HBase, a master node manages the cluster, while region servers store portions of the tables and perform the work on the data.

2) Explain why to use Hbase?

Ans: 

  • High capacity storage system
  • Distributed design to cater large tables
  • Column-Oriented Stores
  • Horizontally Scalable
  • High performance & Availability
  • The base goal of HBase is billions of rows, millions of columns, and thousands of versions
  • Unlike HDFS (Hadoop Distributed File System), it supports random, real-time CRUD operations

3) Mention what are the key components of Hbase?

Ans: 

  • Zookeeper: It coordinates between the client and the HBase Master
  • HBase Master: The HBase Master monitors the Region Servers
  • RegionServer: The RegionServer monitors the Regions
  • Region: It contains the in-memory data store (MemStore) and the HFiles
  • Catalog Tables: The catalog tables consist of ROOT and META

4) Explain what HBase consists of?

Ans: 

  • HBase consists of a set of tables
  • Each table contains rows and columns, like a traditional database
  • Each table must contain an element defined as a primary key (the row key)
  • An HBase column denotes an attribute of an object

5) Mention how many operational commands there are in HBase?

Ans: 

There are five types of operational commands in HBase:

  • Get
  • Put
  • Delete
  • Scan
  • Increment

6) Explain what is WAL and Hlog in Hbase?

Ans: 

WAL (Write Ahead Log) is similar to the MySQL BIN log; it records all the changes that occur to the data. It is a standard Hadoop sequence file and it stores HLogKeys. These keys consist of a sequential number as well as the actual data, and are used to replay data that has not yet been persisted after a server crash. So, in case of server failure, the WAL works as a lifeline and recovers the lost data.

7) When you should use Hbase?

Ans: 

  • Data size is huge: When you have millions or billions of records to operate on
  • Complete redesign: When you move from an RDBMS to HBase, consider it a complete re-design rather than merely a port
  • SQL-less commands: You can do without features like transactions, inner joins, typed columns, etc.
  • Infrastructure investment: You need a large enough cluster for HBase to be really useful

8) In Hbase what is column families?

Ans: 

Column families comprise the basic unit of physical storage in HBase, to which features like compression are applied.

9) Explain what is the row key?

Ans: 

The row key is defined by the application. As the combined key is prefixed by the row key, it enables the application to define the desired sort order. It also allows logical grouping of cells and makes sure that all cells with the same row key are co-located on the same server.
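
Because HBase sorts rows lexicographically by the bytes of the row key, key design directly controls physical grouping. A toy Python sketch of this ordering (illustration only, not HBase code; the keys are invented):

```python
# HBase compares row keys byte-by-byte, so keys sharing a prefix
# (e.g. the same user id) end up adjacent on disk and on one server.
row_keys = [b"user2#2020-07-03", b"user1#2020-07-01",
            b"user1#2020-07-02", b"user10#2020-07-01"]

sorted_keys = sorted(row_keys)  # plain byte-wise sort, like HBase

for key in sorted_keys:
    print(key.decode())
```

Note that "user10" sorts between the "user1#" and "user2" entries; this is why fixed-width or delimiter-terminated key components are a common row-key design practice.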

10) Explain deletion in Hbase? Mention what are the three types of tombstone markers in Hbase?

Ans: 

When you delete a cell in HBase, the data is not actually deleted; instead, a tombstone marker is set, making the deleted cells invisible. Deleted cells are actually removed during compactions.

Three types of tombstone markers are there:

  • Version delete marker: Marks a single version of a column for deletion
  • Column delete marker: Marks all versions of a column for deletion
  • Family delete marker: Marks all columns of a column family for deletion

11) Explain how HBase actually deletes a row?

Ans: 

  • In HBase, whatever you write is first held in RAM and then flushed to disk, and these disk writes are immutable, barring compaction. During deletion, major compactions process the delete markers while minor compactions don't. A normal delete results in a delete tombstone marker; the data it masks, along with the marker itself, is removed during major compaction.
  • Also, if you delete data and then add more data with an earlier timestamp than the tombstone's timestamp, subsequent Gets may be masked by the delete/tombstone marker, and hence you will not see the inserted value until after the next major compaction.
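
The delete/compaction behaviour described above can be modelled in a few lines of Python (a toy sketch with invented names, not the HBase API):

```python
# Toy model: a delete writes a tombstone instead of removing data,
# reads hide masked cells, and a major compaction drops both the
# masked data and the tombstone itself.
store = []            # list of (row, timestamp, value) cells
TOMBSTONE = object()  # marker value standing in for a delete

def put(row, ts, value):
    store.append((row, ts, value))

def delete(row, ts):
    store.append((row, ts, TOMBSTONE))  # masks cells at ts or older

def get(row):
    cells = [c for c in store if c[0] == row]
    marker = max((ts for _, ts, v in cells if v is TOMBSTONE), default=-1)
    live = sorted((ts, v) for _, ts, v in cells
                  if v is not TOMBSTONE and ts > marker)
    return live[-1][1] if live else None

def major_compact():
    global store
    markers = {r: ts for r, ts, v in store if v is TOMBSTONE}
    store = [(r, ts, v) for r, ts, v in store
             if v is not TOMBSTONE and ts > markers.get(r, -1)]

put("row1", 1, "a")
delete("row1", 2)     # tombstone at ts=2 masks the ts=1 value
put("row1", 3, "b")   # newer than the tombstone, so visible
print(get("row1"))    # b
major_compact()
print(len(store))     # 1 -- only the surviving cell remains
```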

12) Explain what happens if you alter the block size of a column family on an already occupied database?

Ans: 

   When you alter the block size of a column family, the new data occupies the new block size while the old data remains within the old block size. During compaction, old data takes on the new block size. New files, as they are flushed, use the new block size, whereas existing data continues to be read correctly. After the next major compaction, all data will have been rewritten to the new block size.

13) Mention the difference between Hbase and Relational Database?

Ans: 

HBase vs. Relational Database:

  • HBase is schema-less; a relational database is schema-based
  • HBase is a column-oriented data store; a relational database is row-oriented
  • HBase stores de-normalized data; a relational database stores normalized data
  • HBase contains sparsely populated tables; a relational database contains thin tables
  • Partitioning is automated in HBase; a relational database has no built-in support for partitioning

14) What is HBaseFsck class?

Ans: 

HBase ships with a tool called hbck, implemented by the HBaseFsck class. It offers several command-line switches that influence its behavior.

15) What are the main key structures of HBase?

Ans: 

The row key and the column key are the two most important key structures used in HBase.

16) Discuss how you can use filters in Apache HBase

Ans: 

Filters were introduced in Apache HBase 0.92. They help you conduct server-side filtering when accessing HBase over the HBase shell or Thrift.

17) Does HBase support an SQL-like syntax, yes or no?

Ans: 

No, unfortunately, SQL support for HBase is not available currently. However, by using Apache Phoenix, we can retrieve data from HBase through SQL queries.

18) What is the meaning of compaction in HBase?

Ans: 

At times of heavy incoming writes, it is impossible to achieve optimal performance with one file per store. HBase therefore combines these HFiles to reduce the number of disk seeks for every read. This process is known as compaction in HBase.
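
Conceptually, a compaction is a merge of several already-sorted files into one. A minimal Python sketch (illustration only; real HFiles hold key/value cells, not bare keys):

```python
import heapq

# Three small sorted "HFiles" for one store:
hfile1 = [b"row1", b"row4", b"row7"]
hfile2 = [b"row2", b"row5"]
hfile3 = [b"row3", b"row6"]

# Compaction merges them into a single sorted file, so a read
# consults one file (one disk seek) instead of three.
compacted = list(heapq.merge(hfile1, hfile2, hfile3))
print(compacted)
```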

19) How will you implement joins in HBase?

Ans: 

HBase does not support joins directly, but join queries can be implemented by using MapReduce jobs to retrieve data from multiple HBase tables.

20) Explain JMX with respect to HBase

Ans: 

Java Management Extensions (JMX) is the standard way for Java applications to export their status; HBase uses it to expose its metrics.


    21) What is the use of MasterServer?

    Ans: 

    The Master Server assigns regions to the region servers and also handles load balancing across the cluster.

    22) Define the Term Thrift

    Ans: 

    Apache Thrift is written in C++. It provides schema compilers for various programming languages like C++, Perl, PHP, Python, Ruby, and more.

    23) Why use HColumnDescriptor class?

    Ans: 

    Details regarding a column family, such as compression settings and the number of versions, are stored in HColumnDescriptor.

    24) How is data written into HBase?

    Ans: 

    Data is written into HBase in several steps. First, when the user updates data in an HBase table, an entry is made in a commit log, known in HBase as the write-ahead log (WAL). Next, the data is stored in the in-memory MemStore. If the data in memory exceeds the maximum value, it is flushed to disk as an HFile. Once the data is flushed, the commit logs can be discarded.
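
    The write path above can be sketched as a toy Python model (the names and the flush threshold are invented for illustration; this is not HBase code):

```python
# 1. append to the WAL, 2. buffer in the MemStore, 3. flush to an HFile
wal = []          # write-ahead log, replayed after a crash
memstore = {}     # in-memory buffer, one per column family
hfiles = []       # immutable files flushed to disk
FLUSH_THRESHOLD = 3   # toy stand-in for the MemStore flush size

def write(row, value):
    wal.append((row, value))   # durable log entry comes first
    memstore[row] = value      # then the in-memory store
    if len(memstore) >= FLUSH_THRESHOLD:
        hfiles.append(dict(sorted(memstore.items())))  # flush as "HFile"
        memstore.clear()       # corresponding WAL entries may be discarded

for i in range(4):
    write(f"row{i}", f"v{i}")

print(len(hfiles), len(memstore))  # 1 1
```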

    25) When would you use HBase?

    Ans: 

    • HBase is used in cases where we need random read and write operations, and it can perform a large number of operations per second on large data sets.
    • HBase gives strong data consistency.
    • It can handle very large tables with billions of rows and millions of columns on top of a commodity hardware cluster.

    26) What is the use of get() method?

    Ans: 

    The get() method is used to read data from a table.

    27) Define the difference between Hive and HBase?

    Ans: 

    • Apache Hive is a data warehousing infrastructure built on top of Hadoop. It helps in querying data stored in HDFS for analysis using Hive Query Language (HQL), which is a SQL-like language, that gets translated into MapReduce jobs. Hive performs batch processing on Hadoop.
    • Apache HBase is a NoSQL key/value store which runs on top of HDFS. Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs. HBase partitions the tables, and the tables are further split into column families. 
    • Hive and HBase are two different Hadoop based technologies – Hive is an SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database of Hadoop. We can use them together. Hive can be used for analytical queries while HBase for real-time querying. Data can even be read and written from HBase to Hive and vice-versa.

    28) Explain the data model of HBase.

    Ans: 

    HBase comprises:

    • Set of tables.
    • Each table consists of column families and rows.
    • Row key acts as a Primary key in HBase.
    • Any access to HBase tables uses this Primary Key.
    • Each column qualifier present in HBase denotes attributes corresponding to the object which resides in the cell.

    29)  Define column families?

    Ans: 

    Column Family is a collection of columns, whereas row is a collection of column families.

    30) Define standalone mode in HBase?

    Ans: 

    It is a default mode of HBase. In standalone mode, HBase does not use HDFS—it uses the local file system instead—and it runs all HBase daemons and a local ZooKeeper in the same JVM process.


    31)  What is decorating Filters?

    Ans: 

     It is useful to modify, or extend, the behavior of a filter to gain additional control over the returned data. These types of filters are known as decorating filters; examples include SkipFilter and WhileMatchFilter.

    32)  What is RegionServer?

    Ans: 

    A table can be divided into several regions. A group of regions is served to the clients by a Region Server.

    33) What are the data manipulation commands of HBase?

    Ans: 

    Data Manipulation commands of HBase are:

    • put – Puts a cell value at a specified column in a specified row in a particular table.
    • get – Fetches the contents of a row or a cell.
    • delete – Deletes a cell value in a table.
    • deleteall – Deletes all the cells in a given row.
    • scan – Scans and returns the table data.
    • count – Counts and returns the number of rows in a table.
    • truncate – Disables, drops, and recreates a specified table.

    34) Which code is used to open a connection in HBase?

    Ans: 

    The following code is used to open an HBase connection; here, users is the HBase table:

    • Configuration myConf = HBaseConfiguration.create();
    • HTable table = new HTable(myConf, "users");

    (In recent HBase versions HTable is deprecated; the equivalent is ConnectionFactory.createConnection(myConf).getTable(TableName.valueOf("users")).)

    35) What is the use of the truncate command?

    Ans: 

    It is used to disable, drop and recreate the specified tables.

    36) What happens when you issue a delete command in HBase?

    Ans: 

    • Once you issue a delete command in HBase for a cell, column, or column family, it is not deleted instantly. Instead, a tombstone marker is inserted. A tombstone is a piece of special data stored along with the standard data, and it hides all the deleted data.
    • The actual data is deleted at the time of major compaction. In a major compaction, HBase merges and rewrites the smaller HFiles of a region into a new HFile. In this process, the same column families are placed together in the new HFile, and deleted and expired cells are dropped. Scan and Get results filter out the deleted cells.

    37)  What are different tombstone markers in HBase?

    Ans: 

    There are three types of tombstone markers in HBase:

    • Version Marker: Marks only one version of a column for deletion.
    • Column Marker: Marks the whole column (i.e. all versions) for deletion.
    • Family Marker: Marks the whole column family (i.e. all the columns in the column family) for deletion.

    38) HBase block size is configured on which level?

    Ans: 

    The block size is configured per column family and the default value is 64 KB. This value can be changed as per requirements.

    39)  Which command is used to run HBase Shell?

    Ans: 

     The ./bin/hbase shell command is used to run the HBase shell. Execute this command from the HBase directory.

    40) Which command is used to show the current HBase user?

    Ans: 

    The whoami command is used to show the current HBase user.

    41) What is the full form of MSLAB?

    Ans: 

    MSLAB stands for MemStore-Local Allocation Buffer. Whenever a request thread needs to insert data into a MemStore, it doesn't allocate the space for that data from the heap at large, but rather from a memory arena dedicated to the target region.

    42) Define LZO?

    Ans: 

    Lempel-Ziv-Oberhumer (LZO) is a lossless data compression algorithm that focuses on decompression speed.

    43)  What is HBase Fsck?

    Ans: 

    HBase comes with a tool called hbck which is implemented by the HBaseFsck class. HBaseFsck (hbck) is a tool for checking for region consistency and table integrity problems and repairing a corrupted HBase. It works in two basic modes – a read-only inconsistency identifying mode and a multi-phase read-write repair mode.

    44) Explain about HBase in brief?

    Ans: 

     HBase is a column-oriented distributed database residing within the Hadoop environment. It can store huge amounts of data, from terabytes to petabytes. HBase is scalable, distributed big data storage on top of the Hadoop ecosystem. It has some special features when compared with traditional relational database models.

    45) Explain HBase features in Hadoop?

    Ans: 

    • Storage system capable of storing huge volumes of data 
    • Distributed design to handle large tables
    • Horizontally Scalable
    • Column Oriented Stores

    46) What are Hlog and HFile?

    Ans: 

    HLog is the write-ahead log file, also known as the WAL, and HFile is the real data storage file. Data is first written to the write-ahead log file and also written to the MemStore. Once the MemStore is full, its contents are flushed to disk as HFiles.

    47) What is HFile?

    Ans: 

    The HFile is HBase's underlying storage format. Each HFile belongs to a column family, whereas a column family may have multiple HFiles. However, a single HFile never contains data for multiple column families.

    48) Explain CAP theorem?

    Ans: 

    • CAP Theorem stands for Consistency, Availability and Partition tolerance. Let us look at the real meaning of each term.
    • Consistency implies that at any moment in time, all the nodes are able to see the same set of data. Availability makes sure that we receive a response for every request, whether the request succeeds or fails. It's like a handshaking mechanism in computer networks. Partition tolerance makes sure that the system continues to work despite occasional message loss or failure of part of the system. Systems with partition tolerance work well regardless of physical network partitions.

    49)  HBase follows which features of CAP theorem?

    Ans: 

    A column-oriented database like HBase provides consistency and partition tolerance.

    50) Explain two differences between HDFS and HBase in terms of storage?

    Ans: 

    The key difference between HDFS and HBase lies in the operations and processing aspects of the data. For batch processing and high-latency operations, HDFS is best suited. For low-latency operations, HBase suits well. In HDFS, we access data primarily through MapReduce jobs. HBase, in contrast, provides access to single rows out of billions of records.


    51)  State two differences between HBase and RDBMS?

    Ans: 

    • RDBMS is a row oriented database having a fixed schema. 
    • HBase is a column oriented storage having a flexible schema.

    52) Name some column oriented databases other than HBase?

    Ans: 

    Cassandra is a popular column-oriented (wide-column) database similar to HBase; MongoDB and CouchDB, often mentioned alongside it, are document-oriented NoSQL stores. These databases can store large data sets and provide random access to the stored data.

    53)  Explain the column families in HBase?

    Ans: 

    Column families constitute the basic unit of physical storage in HBase upon which advanced features like compression are applied.

    54) Explain the row key in HBase?

    Ans: 

     Row keys are byte arrays, defined by the application, that uniquely identify a row. A row key does not have a data type. Row keys determine the sort order, and they make sure that cells with the same row key are located on the same server.

    55) Explain HBase data flow in brief?

    Ans: 

    • There is bidirectional communication between the client and the HBase Master (HMaster), which handles DDL operations, region assignment, etc., and Zookeeper, which acts as a distributed coordination service in HBase to maintain live cluster state.
    • For read and write operations, the client contacts the HRegion servers. In an HBase architecture, there can be multiple region servers. The Master is responsible for assigning regions to region servers. It also checks the health of the region servers periodically. The log files of each region server are stored by the HLog present in it.

    56)  Explain about HMaster?

    Ans: 

     HMaster is one of the key services of HBase. The major tasks of the HMaster are to provide stable performance and maintain the nodes in a cluster. It also executes administrative operations and allocates services to the different region servers.

    57)  Explain how Hregion servers will work in HBase?

    Ans: 

    It will perform the following functions in communication with HMaster and Zookeeper.

    • Hosting and managing regions
    • Automatically splitting the regions
    • Handling read and write operations
    • Communicating directly with the client

    58) Explain about Zoo keeper in HBase?

    Ans: 

    Zookeeper acts as a monitoring server. It provides synchronization in a distributed environment and stores configuration info. If a client wants to communicate with region servers, the client needs to approach Zookeeper for access to the region servers.

    59)  Explain different type of read and write operations provided by HBase?

    Ans: 

    HBase provides random read and write operations. Data can be accessed via shell commands and various APIs: the REST API, the Java client API, Avro, and Thrift.

    60) Explain the partitioning provided by HBase?

    Ans: 

    HBase provides automatic partitioning.

    61) What is the difference between row and column oriented databases?

    Ans: 

     Row-oriented databases provide easy read and write operations on records and are well suited for OLTP systems. Column-oriented databases are well suited for OLAP systems.

    62) What is Hlog in HBase?

    Ans: 

    The HLog stores the write-ahead log files of HBase; an HLog instance is present on every region server.

    63)  Explain about HRegions in HBase?

    Ans: 

    When a write request is received by a region server, it transfers the request to the appropriate region. A region has multiple stores, one per column family. On receipt of a write request, based on the column family, it is routed to the appropriate store. The two main components of an HStore are the MemStore and the HFile. The MemStore is kept in the region server's main memory, while HFiles are written to HDFS.

    64) Which type of data HBase can process?

    Ans: 

    HBase can store and process structured and semi-structured data.

    65) How many filters available in apache HBase?

    Ans: 

    In Apache HBase, we have 18 filters. Some of them are PageFilter, ColumnPrefixFilter, and TimestampsFilter.

    66)  What are data model operations in HBase?

    Ans: 

     The four data model operations present in HBase are Get, Put, Scan and Delete.

    Compare RDBMS with HBase:

    • Schema: an RDBMS has a fixed schema; HBase has no fixed schema
    • Query language: an RDBMS supports a structured, powerful query language; HBase has a simple query language
    • Transaction processing: an RDBMS supports ACID transactions; HBase is eventually consistent and does not support ACID transactions

    67)  What do you understand by CAP theorem and which features of CAP theorem does HBase follow?

    Ans: 

    CAP stands for Consistency, Availability and Partition Tolerance:

    • Consistency –At a given point of time, all nodes in a cluster will be able to see the same set of data.
    • Availability- Every request generates a response, regardless of whether it is a success or a failure.
    • Partition Tolerance – System continues to work even if there is failure of part of the system or intermittent message loss.
    • HBase is a column-oriented database providing partition tolerance and consistency.

    68)  Name few other popular column oriented databases like HBase.

    Ans: 

     CouchDB, MongoDB, Cassandra

    69) What do you understand by Filters in HBase?

    Ans: 

    HBase filters enhance the effectiveness of working with large data stored in tables by allowing users to add limiting selectors to a query and eliminate the data that is not required. Filters have access to the complete row to which they are applied.

    HBase has 18 filters:

    • TimestampsFilter
    • PageFilter
    • MultipleColumnPrefixFilter
    • FamilyFilter
    • ColumnPaginationFilter
    • SingleColumnValueFilter
    • RowFilter
    • QualifierFilter
    • ColumnRangeFilter
    • ValueFilter
    • PrefixFilter
    • SingleColumnValueExcludeFilter
    • ColumnCountGetFilter
    • InclusiveStopFilter
    • DependentColumnFilter
    • FirstKeyOnlyFilter
    • KeyOnlyFilter
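
    The point of server-side filtering is that rows are discarded where the data lives instead of being shipped to the client. A toy Python sketch of a PrefixFilter-style scan (invented function names, not the HBase API):

```python
rows = {b"user1#a": "x", b"user1#b": "y", b"order9": "z"}

def scan_with_filter(rows, predicate):
    # the "server" applies the predicate before returning anything,
    # so only matching rows cross the network
    return {k: v for k, v in sorted(rows.items()) if predicate(k)}

prefix_filter = lambda key: key.startswith(b"user1#")  # like PrefixFilter
result = scan_with_filter(rows, prefix_filter)
print(sorted(result))  # [b'user1#a', b'user1#b']
```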

    70)  Explain about the data model operations in HBase.

    Ans: 

    • Put Method: To store data in HBase
    • Get Method: To retrieve data stored in HBase.
    • Delete Method: To delete the data from HBase tables.

    71) How will you back up an HBase cluster?

    Ans: 

    HBase cluster backups are performed in 2 ways:

    • Live Cluster Backup
    • Full Shutdown Backup
    • In the live cluster backup strategy, the CopyTable utility is used to copy data from one table to another on the same cluster or another cluster. The Export utility can also be used to dump the contents of a table onto HDFS on the same cluster.
    • In the full shutdown backup approach, a periodic complete shutdown of the HBase cluster is performed, so that the Master and Region Servers go down and there is no chance of losing in-flight changes to metadata or StoreFiles. However, this kind of approach can only be used for back-end analytic capacity, not for applications that serve front-end webpages.

    72) Does HBase support SQL like syntax?

    Ans: 

    SQL-like support for HBase is not natively available. With Apache Phoenix, users can retrieve data from HBase through SQL queries.

    73) Is it possible to iterate through the rows of HBase table in reverse order?

    Ans: 

    No. Column values are put on disk with the length of the value written first, followed by the actual value bytes. To iterate through these values in reverse order, the bytes of the actual value would have to be written twice.

    74)  Should the region server be located on all DataNodes?

    Ans: 

     Yes. Region Servers run on the same machines as DataNodes.

    75)  Suppose that your data is stored in collections, for instance some binary data, message data or metadata is all keyed on the same value. Will you use HBase for this?

    Ans: 

    Yes, it is ideal to use HBase whenever key based access to data is required for storing and retrieving.

    76)   Assume that an HBase table Student is disabled. Can you tell me how will I access the student table using Scan command once it is disabled?

    Ans: 

    Any HBase table that is disabled cannot be accessed using Scan command.

    77)  What do you understand by compaction?

    Ans: 

    During periods of heavy incoming writes, it is not possible to achieve optimal performance with one file per store. Thus, HBase combines all these HFiles to reduce the number of disk seeks for every read. This process is referred to as compaction in HBase.

    78) Explain about the various table design approaches in HBase?

    Ans: 

    • Tall-narrow and flat-wide are the two HBase table design approaches. Which approach to use depends on what you want to achieve and how you want to use the data. The performance of HBase depends heavily on the row key, and hence directly on how data is accessed.
    • At a high level, the difference between the flat-wide and tall-narrow approaches is similar to the difference between get and scan. Full scans are costly in HBase because of the ordered row-key storage policy. The tall-narrow approach can be used with a composite row key so that focused scans can be performed on a logical group of entries.
    • Ideally, the tall-narrow approach is used when there are a large number of rows and a small number of columns, whereas the flat-wide approach is used when there are a small number of rows and a large number of columns.
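
    The two designs can be contrasted with a toy per-user event layout in Python (illustrative only; real HBase cells also carry timestamps):

```python
# Flat-wide: one row per user, one column per event
flat_wide = {"user1": {"event:1": "login", "event:2": "click"}}

# Tall-narrow: one row per (user, event); the key encodes the grouping
tall_narrow = {"user1#1": {"event": "login"},
               "user1#2": {"event": "click"}}

# In the tall-narrow layout, a focused scan over the "user1#" key
# range reads this user's events row by row instead of one huge row:
user1_events = [cols["event"] for key, cols in sorted(tall_narrow.items())
                if key.startswith("user1#")]
print(user1_events)  # ['login', 'click']
```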

    79) Which one would you recommend for HBase table design approach – tall-narrow or flat-wide?

    Ans: 

    There are several factors to be considered when deciding between flat-wide (millions of columns and limited keys) and tall-narrow (millions of keys with limited columns), however, a tall-narrow approach is often recommended because of the following reasons:

    • Under extreme scenarios, a flat-wide approach might end up with a single row per region, resulting in poor performance and scalability.
    • Table scans are often more efficient than multiple individual reads. Considering that only a subset of the row data will be required, the tall-narrow table design approach will provide better performance than the flat-wide approach.

    80) What is the best practice on deciding the number of column families for an HBase table?

    Ans: 

     It is ideal not to exceed 15 column families per HBase table, because every column family in HBase is stored in its own set of files, so a large number of column families means reading and merging many files.

    81) What is MemStore?

    Ans: 

    The MemStore is a write buffer used in HBase to accumulate data in memory before it is permanently written. When the MemStore fills up, its contents are flushed to disk and form a new HFile. There is one MemStore per column family.

    82) What is the difference between HBase and HDFS?

    Ans: 

    • HDFS is the distributed file system in Hadoop for storing large files, but it does not provide tabular storage. HDFS exposes a file-system interface, much like a local file system (NTFS or FAT) does. Data in HDFS is accessed through MapReduce jobs and is well suited for high-latency batch processing operations.
    • HBase is a column-oriented database on Hadoop that runs on top of HDFS and stores data in tabular format. HBase is like a database management system that communicates with HDFS to write logical tabular data to the physical file system. One can access single rows out of the billions of records HBase holds, and it is well suited for low-latency operations. HBase puts data in indexed StoreFiles on HDFS for high-speed lookups.

    83) What do you know about Apache HBase?

    Ans: 

    Apache HBase is an open-source, Hadoop-based, non-relational, distributed database management system. It is ideal for hosting large tables, consisting of large numbers of rows and columns, on top of clustered commodity hardware. Moreover, it is a versioned, real-time database that gives the user read/write access. HBase has features like:

    • Modular and linear scalability.
    • Strictly consistent reads/writes.
    • Configurable and automatic sharding of tables.
    • Automatic failover support between Region Servers.
    • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    • Easy-to-use Java API for client access.
    • Bloom filters and block cache for real-time queries.
    • Server-side filters for query predicate push-down.
    • Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encoding.
    • An extensible JRuby-based shell.
    • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.

    84) What are the different operational commands used in Apache HBase?

    Ans: 

     These are: Put, Get, Delete, Scan, and Increment.

    85) What are the best reasons to choose HBase as DBMS for Hadoop?

    Ans: 

    Some of the properties of Apache HBase make it the best choice as DBMS for Hadoop. Here are some of them:

    • Scalability
    • It is ideal for handling large tables which can fit billions of rows and columns
    • Users can read/write on the database in real time.
    • Compatible with Hadoop, as both are Java-based.
    • Vast operational support for CRUD operations.

    86) What are the different key components of HBase?

    Ans: 

    HBase key components are:

    • Regions: These are the horizontally divided row ranges of an HBase table. A region contains the MemStore and the HFiles.
    • Region Server: This component monitors the regions.
    • HBase Master (HMaster): This component is responsible for region assignment and also monitors the region servers.
    • Zookeeper: It acts as the distributed coordination service between the client and the HBase Master and maintains the server state in the cluster. It monitors which servers are available and alive, and notifies when a server fails.

    87) What is Row Key?

    Ans: 

    Row Key is a unique identifier in an HBase table which is used to logically group table cells, ensuring that all cells with the same Row Key are placed on the same server. Internally, a Row Key is a byte array.
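Because row keys are kept in sorted byte order and a table is split into contiguous key ranges (regions), every cell with a given row key lands in the same region and hence on the same server. A minimal sketch of that placement logic, with made-up split keys:

```python
import bisect

# Regions are contiguous, sorted row-key ranges; each region lives on one server.
# split_keys mark the start boundary of every region after the first.
split_keys = [b"g", b"p"]   # region 0: [..,"g"), region 1: ["g","p"), region 2: ["p",..)

def region_for(row_key: bytes) -> int:
    """All cells with the same row key map to the same region (same server)."""
    return bisect.bisect_right(split_keys, row_key)
```

The byte-wise sort order is also why careful row-key design matters: sequential keys pile writes onto one region, while well-distributed keys spread load.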

    88) What are the differences between an RDBMS table and an HBase table?

    Ans: 

    • An RDBMS is a schema-based database, whereas HBase is schema-less: only column families are defined up front.
    • An RDBMS has no built-in support for automatic partitioning, whereas HBase partitions tables automatically into regions.
    • An RDBMS stores data in normalized form, whereas HBase stores de-normalized data.

    89) What do you mean by WAL?

    Ans: 

    WAL stands for Write-Ahead Log. This HBase log records every change to table data, irrespective of the mode of change, before the change is applied. The log file itself is a standard Hadoop sequence file. Its main purpose is to recover data if a region server crashes before in-memory changes have been flushed to disk.
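The idea can be sketched as an append-only log that is replayed on restart. This is a simplified model of write-ahead logging in general, not HBase's actual WAL format:

```python
# Write-ahead logging in miniature: every mutation is appended to a durable
# log before it touches the in-memory store, so a crash loses nothing.

class MiniStore:
    def __init__(self):
        self.wal = []        # append-only log (a list stands in for a sequence file)
        self.memstore = {}   # volatile in-memory state

    def put(self, row, value):
        self.wal.append(("put", row, value))  # 1. log the change first
        self.memstore[row] = value            # 2. then apply it in memory

    def crash(self):
        self.memstore = {}                    # memory is lost; the log survives

    def recover(self):
        for op, row, value in self.wal:       # replay the log to rebuild state
            if op == "put":
                self.memstore[row] = value
```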

    90) What are the different catalog tables in HBase?

    Ans: 

    • There are two main catalog tables in HBase: -ROOT- and .META.. The -ROOT- table tracks the location of the .META. table, and the .META. table stores the locations of the regions in the HBase system.

    91) What are tombstone markers in HBase, and how many types of tombstone markers are there?

    Ans: 

    When a user deletes a cell in an HBase table, the cell becomes invisible in the table but remains on the server in the form of a marker, commonly known as a tombstone marker. Tombstone markers are physically removed from the server during compaction.

    There are three types of tombstone markers:

    • Version delete
    • Family delete
    • Column delete
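The delete-as-marker scheme can be sketched like this. It is a conceptual model only; real HBase tombstones also carry timestamps and a scope (version, column, or family, as listed above):

```python
TOMBSTONE = object()  # sentinel standing in for HBase's delete marker

class TombstoneStore:
    def __init__(self):
        self.cells = {}

    def put(self, key, value):
        self.cells[key] = value

    def delete(self, key):
        self.cells[key] = TOMBSTONE     # nothing is removed yet, only marked

    def get(self, key):
        v = self.cells.get(key)
        return None if v is TOMBSTONE else v   # reads hide tombstoned cells

    def compact(self):
        # Compaction physically drops the tombstone markers.
        self.cells = {k: v for k, v in self.cells.items() if v is not TOMBSTONE}
```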

    92) Mention a few scenarios in which you should consider HBase as a database.

    Ans: 

    Here are a few scenarios :

    • When we need to handle large volumes of data and large data operations
    • When an application demands key-based data retrieval
    • When we need to handle a variable schema
    • When the data is in the form of collections
    • When the application can live without RDBMS features such as inner joins and multi-row transactions, which HBase does not provide

    93) What is the difference between HBase and Hive?

    Ans: 

    Apache HBase and Apache Hive have quite different infrastructures, although both are built on top of Hadoop.

    • Apache Hive is a data warehouse built on top of Hadoop, whereas Apache HBase is a NoSQL, key/value-based data store that runs on top of Hadoop HDFS.
    • Apache Hive allows querying data stored on HDFS via HQL, which is translated into MapReduce jobs for analysis. In contrast, Apache HBase operations run against its database in real time rather than as MapReduce jobs.
    • Apache Hive is suited to high-latency batch analysis of data organized through its partitioning feature, whereas HBase supports low-latency random access to large volumes of data through its key/value design.
    • Apache Hive does not have a versioning feature, whereas Apache HBase provides versioned data.

    94) Name some of the important filters in HBase.

    Ans: 

    RowFilter, FamilyFilter, ColumnPrefixFilter, PageFilter, and InclusiveStopFilter.
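Conceptually, these filters are predicates evaluated on the server during a scan, so unwanted rows never cross the network (predicate push-down). A minimal sketch of that idea in pure Python; the parameter names are illustrative, not HBase API names:

```python
from itertools import islice

def scan(rows, predicate=None, page_size=None):
    """Server-side filtering in miniature: rows are tested against a predicate
    before being returned (like RowFilter and friends), and page_size caps the
    number of rows returned (like PageFilter)."""
    matched = (item for item in sorted(rows.items())
               if predicate is None or predicate(*item))
    return list(islice(matched, page_size))
```

Example use: scanning only rows whose key starts with a given prefix, a page at a time.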

    95) What is the column family? What is its purpose in HBase?

    Ans: 

     A column family is a key in an HBase table that represents a logical division of the data. It impacts the data storage format in HDFS, and every HBase table must have at least one column family.
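The storage-level role of a column family can be sketched by addressing cells as "family:qualifier" and keeping a separate store per family, since each family's data is stored and flushed separately on disk. A conceptual model, not HBase internals:

```python
from collections import defaultdict

class FamilyTable:
    """Column families are declared up front (like an HBase schema); each
    family gets its own store, mirroring the per-family on-disk layout."""
    def __init__(self, families):
        self.families = set(families)
        self.stores = {cf: defaultdict(dict) for cf in families}

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.stores[family][row][qualifier] = value   # qualifiers are free-form

    def get(self, row):
        out = {}
        for cf, store in self.stores.items():
            for q, v in store.get(row, {}).items():
                out[f"{cf}:{q}"] = v
        return out
```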

    96) What is BlockCache?

    Ans: 

      HBase BlockCache is the read cache of HBase. It keeps the most frequently read data blocks in the JVM heap so that data from HFiles can be served from memory, avoiding disk reads. Each region server has a BlockCache shared by its regions. A Block in the BlockCache is the unit of data read at a time, and an HFile is a sequence of blocks with an index over those blocks.
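The caching behaviour can be sketched with a small LRU cache over block reads. Pure Python, with a simplified eviction policy standing in for HBase's actual cache implementation:

```python
from collections import OrderedDict

class BlockCache:
    """LRU cache in miniature: hot blocks stay in memory, cold ones are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def read_block(self, block_id, load_from_disk):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # cache hit: mark most recently used
            return self.blocks[block_id]
        self.disk_reads += 1                    # cache miss: go to "disk"
        data = load_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data
```

Repeated reads of the same hot block are then served entirely from memory, which is exactly the access pattern BlockCache is designed for.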

