KNOW Top 25+ Data Warehouse [BI] Interview Questions & Answers
Data Warehouse Interview Questions and Answers

KNOW Top 25+ Data Warehouse [BI] Interview Questions & Answers

Last updated on 03rd Jul 2020, Blog, Interview Questions

About author

Karthikeyan (Technical Lead - Hive )

(5.0) | 17212 Ratings 2149

These Data Warehouse Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of Data Warehouse. As per my experience good interviewers hardly plan to ask any particular question during your interview, normally questions start with some basic concept of the subject and later they continue based on further discussion and what you answer.we are going to cover top 100 Data Warehouse Interview questions along with their detailed answers. We will be covering Data Warehouse scenario based interview questions, Data Warehouse interview questions for freshers as well as Data Warehouse interview questions and answers for experienced. 

1. What is Datawarehousing?

Ans:

A Datawarehouse is the repository of a data and it is used for Management decision support system. Datawarehouse consists of wide variety of data that has high level of business conditions at a single point in time.

In single sentence, it is repository of integrated information which can be available for queries and analysis.

2. What is Business Intelligence?

Ans:

Business Intelligence is also known as DSS – Decision support system which refers to the technologies, application and practices for the collection, integration and analysis of the business related information or data. Even, it helps to see the data on the information itself.

3. What is Dimension Table?

Ans:

Dimension table is a table which contain attributes of measurements stored in fact tables. This table consists of hierarchies, categories and logic that can be used to traverse in nodes.

4. What is Fact Table?

Ans:

Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables.Example – If the business process is manufacturing of bricks

Average number of bricks produced by one person/machine – measure of the business process

5. What are the stages of Datawarehousing?

Ans:

There are four stages of Datawarehousing:

datawarehouse
  • Offline Operational Database
  • Offline Data Warehouse
  • Real Time Datawarehouse
  • Integrated Datawarehouse

6. What is Data Mining?

Ans:

Data Mining is set to be a process of analyzing the data in different dimensions or perspectives and summarizing into a useful information. Can be queried and retrieved the data from database in their own format.

7. What is OLTP?

Ans:

OLTP is abbreviated as On-Line Transaction Processing, and it is an application that modifies the data whenever it received and has large number of simultaneous users.

8. What is OLAP?

Ans:

OLAP is abbreviated as Online Analytical Processing, and it is set to be a system which collects, manages, processes multi-dimensional data for analysis and management purposes.

9.What is ODS?

Ans:

ODS is abbreviated as Operational Data Store and it is a repository of real time operational data rather than long term trend data.

10. What is the difference between View and Materialized View?

Ans:

  • A view is nothing but a virtual table which takes the output of the query and it can be used in place of tables.
  • A materialized view is nothing but an indirect access to the table data by storing the results of a query in a separate schema.

11. What is ETL?

Ans:

ETL is abbreviated as Extract, Transform and Load. ETL is a software which is used to reads the data from the specified data source and extracts a desired subset of data. Next, it transform the data using rules and lookup tables and convert it to a desired state.

Then, load function is used to load the resulting data to the target database.

12. What is VLDB?

Ans:

VLDB is abbreviated as Very Large Database and its size is set to be more than one terabyte database. These are decision support systems which is used to server large number of users.

13.What is real-time datawarehousing?

Ans:

Real-time datawarehousing captures the business data whenever it occurs. When there is business activity gets completed, that data will be available in the flow and become available for use instantly.

14. What are Aggregate tables?

Ans:

Aggregate tables are the tables which contain the existing warehouse data which has been grouped to certain level of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.This table reduces the load in the database server and increases the performance of the query.

15.What is a data warehouse? List the types of Data warehouse architectures.

Ans:

A data warehouse is the electronic storage of an organization’s historical data for the purpose of data analytics. In other words, a data warehouse contains a wide variety of data that supports the decision-making process in an organization.

There are mainly 3 types of data warehouse architectures:

Types of Data Warehouse Architectures

  • Single-tier architecture

The objective of a single layer is to minimize the amount of data stored by removing data redundancy. 

It is not frequently used in practice.

  • Two-tier architecture

This architecture separates physically available sources from the data warehouse.

This architecture is not expandable & does not support a large number of end-users.

Because of network limitations, this architecture faces connectivity issues.

  • Three-tier architecture

It is the most widely used architecture that is consist of the Top, Middle and Bottom Tier.

  1. Bottom Tier:  Usually a relational database of the Datawarehouse serves as the bottom tier where Data is cleansed, transformed, and loaded.
  2. Middle Tier: This application tier is an OLAP server & presents an abstracted view of the database which acts as a mediator between the end-user and the database.
  3. Top-Tier: The top tier is a front-end client layer channels data out of the data warehouse.

16. Define data analytics in the context of data warehousing.

Ans:

Data analytics is the science of examining raw data with the purpose of drawing business-driven conclusions about that data.

The role of a data warehouse is to enable data analysis.

17. What is a subject-oriented data warehouse?

Ans:

Subject-oriented data warehouses are those that store data around a particular “subject” such as customer, sales, product, among others.

18. List the types of OLAP servers.

Ans:

  • Relational OLAP
  • Multidimensional OLAP
  • Hybrid OLAP
  • Specialized SQL Servers

19. List some of the functions performed by OLAP.

Ans:

Some of the major functions performed by OLAP include “roll-up”, “drill-down”, “slice”, “dice”, and “pivot”.

 20. What is a star schema?

Ans:

Star schema is a schema used in data warehousing where a single fact table references a number of dimension tables. In a star schema, “keys” from all the dimension tables flow into the fact table. This entity-relationship diagram resembles a star, hence it is named a Star schema.

    Subscribe For Free Demo

    21. What is a snow flake schema?

    Ans:

    Just like the star schema, a single fact table references number of other dimension tables in snow flake scheme. Here however, these dimension tables are further normalized into multiple related tables. As these tables are further snow flaked into smaller tables, this schema is called a snow flake schema.

    22. What is the language that is used for schema definition?

    Ans:

    Data Mining Query Language (DMQL) is used for schema definition.

    23. What are the different types of “dimension”?

    Ans:

    • Conformed dimension
    • Junk dimension
    • Degenerated dimension
    • Role Playing dimension

    24. What is a mini dimension?

    Ans:

    Mini dimensions are dimensions that are used when a large number of rapidly changing attributes are separated into smaller tables.

    25. Define fact-less fact.

    Ans:

    Fact-less fact is a fact table that does not contain any value. Such a table only contains keys from different dimension tables.

    26. What is a data cube?

    Ans:

    A data cube helps represent data in multiple facets. Data cubes are defined by dimensions and facts.

    27. What do you understand by the ER model?

    Ans:

    ER model or entity-relationship model is a methodology for data modeling wherein the goal of modeling is to normalize the data by reducing redundancy.

    28. What do you understand by dimensional modeling?

    Ans:

    Dimensional model is a methodology that consists of “dimensions” and “fact tables”. Fact tables are used to store various transactional measurements from “dimension tables” that qualifies the data.

    29. What is a data mart?

    Ans:

    Data mart is a subset of organizational data. In other words, it is a collection of data specific to a particular group within an organization.

    30. What is data aggregation?

    Ans:

    Data aggregation is the broad definition for any process that enables information gathering expression in a summary form, for statistical analysis.

    Course Curriculum

    Get Comprehensive Data Warehouse Training to Build Your Career

    • Instructor-led Sessions
    • Real-life Case Studies
    • Assignments
    Explore Curriculum

    31. What is summary information?

    Ans:

    Summary Information is the location within data warehouse where predefined aggregations are stored.

    32.List any five applications of data warehouse.

    Ans:

     Some applications include financial services, banking services, customer goods, retail sectors, controlled manufacturing.

    33.What is the very basic difference between data warehouse and operational databases?

    Ans:

    A data warehouse contains historical information that is made available for analysis of the business whereas an operational database contains current information that is required to run the business.

    34.List the Schema that a data warehouse system can implements.

    Ans:

    A data Warehouse can implement star schema, snowflake schema, and fact constellation schema.

    35.List the process that are involved in Data Warehousing.

    Ans:

     Data Warehousing involves data cleaning, data integration and data consolidations.

    36.List the functions of data warehouse tools and utilities.

    Ans:

    The functions performed by Data warehouse tool and utilities are Data Extraction, Data Cleaning, Data Transformation, Data Loading and Refreshing.

    37.What do you mean by Data Extraction?

    Ans:

    Data extraction means gathering data from multiple heterogeneous sources.

    38.Define metadata?

    Ans:

    Metadata is simply defined as data about data. In other words, we can say that metadata is the summarized data that leads us to the detailed data.

    39.What does Metadata Respiratory contain?

    Ans:

     Metadata respiratory contains definition of data warehouse, business metadata, operational metadata, data for mapping from operational environment to data warehouse, and the algorithms for summarization.

    40. How does a Data Cube help?

    Ans:

    Data cube helps us to represent the data in multiple dimensions. The data cube is defined by dimensions and facts.

    41.Define dimension?

    Ans:

    The dimensions are the entities with respect to which an enterprise keeps the records.

    42.What is Virtual Warehouse?

    Ans:

     The view over an operational data warehouse is known as virtual warehouse.

    43.List the phases involved in the data warehouse delivery process.

    Ans:

    The stages are IT strategy, Education, Business Case Analysis, technical Blueprint, Build the version, History Load, Ad hoc query, Requirement Evolution, Automation, and Extending Scope.

    44. Define load manager.

    Ans:

    A load manager performs the operations required to extract and load the process. The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse.

    45.Define the functions of a load manager.

    Ans:

    A load manager extracts data from the source system. Fast load the extracted data into temporary data store. Perform simple transformations into structure similar to the one in the data warehouse.

    46.Define a warehouse manager.

    Ans:

     Warehouse manager is responsible for the warehouse management process. The warehouse manager consist of third party system software, C programs and shell scripts. The size and complexity of warehouse manager varies between specific solutions.

    47.Define the functions of a warehouse manager.

    Ans:

     The warehouse manager performs consistency and referential integrity checks, creates the indexes, business views, partition views against the base data, transforms and merge the source data into the temporary store into the published data warehouse, backs up the data in the data warehouse, and archives the data that has reached the end of its captured life.

    48.What does the Query Manager responsible for?

    Ans:

    Query Manager is responsible for directing the queries to the suitable tables.

    49.Which one is faster, Multidimensional OLAP or Relational OLAP?

    Ans:

     Multidimensional OLAP is faster than Relational OLAP.

    50.How many dimensions are selected in Slice operation?

    Ans:

    Only one dimension is selected for the slice operation.

    Course Curriculum

    Advance Your Skills with Data Warehouse Training Course

    Weekday / Weekend BatchesSee Batch Details

    51. What are some benefits of cloud-based data warehouses when compared to on-premise solutions?

    Ans:

     In the last few years, cloud computing is prevalent, and now most of the companies prefer to use cloud-based data warehouses over traditionally used on-site warehouses. Below are the top reasons for companies using cloud-based data warehouses:

    • It is easy and practical to scale data warehouse in the cloud.
    • It is economical to store data warehouse on the cloud as it eliminates the hardware and licensing cost, which is required for on-site warehouse setup.

    The cloud data warehouse is optimized for data analytics because it uses Massively Parallel Processing (MPP) and columnar storage, which are known for offering better performance and helps in executing complex queries.

    52. Name essential approaches to data warehouse design.

    Ans:

     There are two data warehouse design approaches, Kimball and Inmon.

    • Inmon approach or top-down was proposed by Mr. Bill Inmon, the Father of data warehousing. In this approach, first, it is recommended to prepare a data warehouse, and then Data Marts are created. Through this strategy, the data warehouse becomes the central point of Corporate Information Factory (CIF), which acts as a logical framework for BI.
    • Kimball approach, also known as a bottom-up approach, suggests creating Data Mart first and later integrating it to a more massive data warehouse to complete a data warehouse. This integration of Data Mart is known as a data warehouse bus (BUS) architecture.

    53. Name 3 types of Data Mart.

    Ans:

     Below are the 3 different types of Data Marts:

    • Dependent – It sources organizational data from a single data warehouse and helps in developing more Data Marts.
    • Independent – Here, no data is dependent on central or enterprise data warehouse, and data can be used separately for conducting an independent analysis.
    • Hybrid – It helps in ad hoc integration and is used when a data warehouse comprises inputs from different sources.

    54. What are the different stages of data warehouse decision support evolution?

    Ans:

     Below are the 5 stages involved in data warehouse decision support evolution:

    • Report
    • Analyze
    • Predict
    • Operationalize
    • Active warehousing

    55. Name the components of data warehousing.

    Ans:

    Below are the 5 components of data warehousing:

    • Data Warehouse Database
    • Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
    • Metadata
    • Query Tools
    • Data warehouse Bus Architecture

    56. How is data warehousing related to business intelligence?

    Ans:

    Modern IT units run at the core of many businesses driving innovation and integration at each level. At one time, IT departments functioned as a stand-alone unit. Today, they are much more essential for companies that have undergone digital transformation because the infrastructure of these businesses is primarily influenced by technology.

    57. How Can We Run The Graph? What Is The Procedure For That? How Can We Schedule The Graph In Unix?

    Ans:

    If you want to run the graph through GDE then after save the graph just press F5 button of your keyboard, it will run automatically. If you want to run through the shell script then you have to fire the command at your UNIX box.

    58.What Is A Real-time Data Warehouse? How Is It Different From Near To Real-time Data Warehouse?

    Ans:

    As the term suggests, a real-time data warehouse is a system, which reflects all changes to its sources in real time. As simple as it sounds, this is still an area of active research in the field. In traditional DWH, the operational system(s) are kept separate from the DWH for a good reason.

    The Operational systems are designed to accept inputs or changes to data regularly, hence have a good chance of being regularly queried. On the other hand, a DWH is supposed to do just the opposite – it is used to query data for reports only. No changes to data, through user actions is expected (or designed). The only inputs could come from the ETL feed at stipulated times. The ETL would source its data from the Operational systems just explained above.

    To create a real-time DWH we would have to merge both systems (several ways are being explored), a concept that is against the reason of creating a DWH. Bigger challenges occur in terms of updating aggregated data in facts at real time, still maintaining the surrogate keys.

    Besides, we would need lightening fast hardware to try this.Near Real time DWH is a trade-off between the conventional design and the dream of all clients today. The frequency of ETL updates in higher in this case for e.g. once in 2 hours. We can also analyze and use selective refreshes at shorter time intervals, while complete refreshes may still be kept further apart. Selective refreshes would look at only those tables that get updated regularly.

    59. What Is Difference Between Drill & Scope Of Analysis?

    Ans:

    Drilling can be done in drill down, up, through, and across; scope is the overall view of the drill exercise.

    60. I Have Two Universes Created By Two Difference Database Can We Join Them In Designer & Report Level? How?

    Ans:

    We can link one universe to other universe in Universe parameters.

    61. For Faster Process, What We Will Do With The Universe?

    Ans:

    For a faster process create aggregate tables and write better sql so that the process would fast.

    62.What Is Type 2 Version Dimension?

    Ans:

    Version dimension is the SCD type II in real time it using because of it will maintain the current data and full historical data.

    63. What Is Unit Testing?

    Ans:

    The Developer created the mapping that can be tested independently by the developer individually.

    64. What Is Informatica Architecture?

    Ans:

    Informatica Architecture contains Repository, Repository server, Repository server administration console, sources, repository server and Data warehousing and it have the Designer, Work for manager, work for monitor combination of all these are called Informatica Architecture.

    65. What Is Data Analysis? Where It Will Be Used?

    Ans:

    Data analysis: consider that you are running a business and u store the data of that; in some form say in register or in a comp and at the year end you want know the profit or loss then it called data analysis .Data analysis use: then u want to know which product was sold the highest and if the business is running in a loss then finding, where we went wrong we do analysis.

    66. What Are Data Modeling And Data Mining? Where It Will Be Used?

    Ans:

    Data modeling is the process of designing a data base model. In this data model data will be stored in two types of table fact table and dimension table.

    Fact table contains the transaction data and dimension table contains the master data. Data mining is process of finding the hidden trends is called the data mining.

    67. What Is “method/1”?

    Ans:

    Method 1 is system develop lifecycle create by Arthur Anderson a while back.

    68. After The Generation Of A Report To Whom We Have To Deploy Or What We Do After The Completion Of A Report?

    Ans:

    The generated report will be sent to the concerned business users through web or LAN.

    69.How to load the time dimensions?

    Ans:

    Time dimensions can be loaded with the help of a program where 100 years are represented with one row per day.

    70.Define the conformed fact.

    Ans:

    A conformed fact is a table that can be used across multiple data marts and multiple fact tables.

    Data Warehouse Sample Resumes! Download & Edit, Get Noticed by Top Employers! Download

    71. What are non-additive facts?

    Ans:

    Non-additive facts cannot be summed up for any of the dimension available in the fact table. If there is any change in the dimension then the same facts can be useful.

    72. Define the concept of Datamart in brief.

    Ans:

    A Datamart is a special version of the data warehouse that contains a snapshot of the operational data and helps business people to make decisions based on past experiences and trends. A data mart is more focused on easy access to information from fact tables.

    73. How can you differentiate the Data Warehouse and OLAP?

    Ans:

    A data warehouse is a place where whole data is stored for analysis, OLAP is used for data analysis, aggregations, summation, and information planning at the minor level.

    74. What is the Entity-Relationship diagram?

    Ans:

    An Entity-Relationship diagram illustrates the relationship among entities in a database. This diagram shows the structure of different tables and links between tables.

    75. How can you define a subject-oriented data warehouse?

    Ans:

    A subject-oriented database stores data around a subject such as customers, sales, people etc.

    76.Application of data warehousing system?

    Ans:

    • Consistent and Quality of data: Data warehouse system is consistent and cost-effective for various industries for collection their customer data through various resources
    • Cost reduction: A data warehouse system reduce cost by storing all electronically collected data in a data warehouse
    • Accessibility: A data warehouse system data can have easy access to data time to time for business improvement and reporting.

    77. Name some data mining tools used in data warehouse system?

    Ans:

    Orange Data mining, R software Environment, WEKA Data Mining, RapidMiner, KNIME.

    78.What is data visualization?

    Ans:

    Data warehouses are causing a surge in the popularity of data visualization techniques for looking at data. Data visualization is not a separate class of tools; rather, it is a method of presenting the output of all the tools in such as Orange Data Mining, R Software environment, WEKA Data Mining etc.. a way that the entire problem and/or the solution (e.g. a result of a Relational or multidimensional query, or the result of data mining) is clearly visible to domain experts and even casual observers.

    79. Name some data visualization tools for presenting data reports?

    Ans:

    Tableau, ZingChart, Tibco Spotfire, Google Chart, Sigma Plot, Mini Tab

    80. What are the languages used in Data cleansing?

    Ans:

    R – Programming language, SQL- Structure Query Language, Advance Excel Macros.

    81. How will you do data transformation in data warehouse platform?

    Ans:

    Data will be transferred by different database tools such as MySQL Tools, MS-Access Tools this tools is connected to a server such as SQL Server, Oracle Server. After setting up the environment by using SQL with the support of Shell scripting language will be able to transfer the data to the data warehouse system.

    82.What is the application of a data warehouse platform in the healthcare industry?

    Ans:

    Data warehouse system is very much benefited in the Healthcare industry by processing genomic and proteomic analysis. This report can be collected from different sources of patients and stored in a data warehouse for analytics and reporting to understand the disease and its improvement. For better drug and improvement of a drug, data warehouse technology is essential.

    Are you looking training with Right Jobs?

    Contact Us
    Get Training Quote for Free