KNOW Top 25+ Data Warehousing Interview Questions & Answers
Data Warehouse Interview Questions and Answers

KNOW Top 25+ Data Warehousing Interview Questions & Answers

Last updated on 04th Jul 2020, Blog, Interview Questions

About author

Ramanan (Sr Data Warehousing Architect )

He is a Proficient Technical Expert for Respective Industry Domain & Serving 8+ Years. Also, Dedicated to Imparts the Informative Knowledge's to Freshers. He Share's this Blogs for us.

(5.0) | 15212 Ratings 1289

These DATA WAREHOUSE Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of DATA WAREHOUSE. As per my experience good interviewers hardly plan to ask any particular question during your interview, normally questions start with some basic concept of the subject and later they continue based on further discussion and what you answer.

1) What is Datawarehousing?

Ans:

  • A Datawarehouse is the repository of a data and it is used for Management decision support system. Datawarehouse consists of wide variety of data that has high level of business conditions at a single point in time.
  • In single sentence, it is repository of integrated information which can be available for queries and analysis.

2) What is Business Intelligence?

Ans:

Business Intelligence is also known as DSS – Decision support system which refers to the technologies, application and practices for the collection, integration and analysis of the business related information or data. Even, it helps to see the data on the information itself.

3) What is Dimension Table?

Ans:

Dimension table is a table which contain attributes of measurements stored in fact tables. This table consists of hierarchies, categories and logic that can be used to traverse in nodes.

4)What is Fact Table?

Ans:

  • Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables.Example – If the business process is manufacturing of bricks
  • Average number of bricks produced by one person/machine – measure of the business process

5) What are the stages of Datawarehousing?

Ans:

There are four stages of Datawarehousing:

Data Warehousing Interview Questions and Answers
  • Offline Operational Database
  • Offline Data Warehouse
  • Real Time Datawarehouse
  • Integrated Datawarehouse

6) What is Data Mining?

Ans:

Data Mining is set to be a process of analyzing the data in different dimensions or perspectives and summarizing into a useful information. Can be queried and retrieved the data from database in their own format.

7) What is OLTP?

Ans:

OLTP is abbreviated as On-Line Transaction Processing, and it is an application that modifies the data whenever it received and has large number of simultaneous users.

8)What is the application of a data warehouse platform in the healthcare industry?

Ans:

Data warehouse system is very much benefited in the Healthcare industry by processing genomic and proteomic analysis. This report can be collected from different sources of patients and stored in a data warehouse for analytics and reporting to understand the disease and its improvement. For better drug and improvement of a drug, data warehouse technology is essential.

9)What is ODS?

Ans:

ODS is abbreviated as Operational Data Store and it is a repository of real time operational data rather than long term trend data.

10) What is the difference between View and Materialized View?

Ans:

  • A view is nothing but a virtual table which takes the output of the query and it can be used in place of tables.
  • A materialized view is nothing but an indirect access to the table data by storing the results of a query in a separate schema.

11) What is ETL?

Ans:

  • ETL is abbreviated as Extract, Transform and Load. ETL is a software which is used to reads the data from the specified data source and extracts a desired subset of data. Next, it transform the data using rules and lookup tables and convert it to a desired state.
  • Then, load function is used to load the resulting data to the target database.

12) What is VLDB?

Ans:

VLDB is abbreviated as Very Large Database and its size is set to be more than one terabyte database. These are decision support systems which is used to server large number of users.

13)What is real-time datawarehousing?

Ans:

Real-time datawarehousing captures the business data whenever it occurs. When there is business activity gets completed, that data will be available in the flow and become available for use instantly.

14) What are Aggregate tables?

Ans:

Aggregate tables are the tables which contain the existing warehouse data which has been grouped to certain level of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.This table reduces the load in the database server and increases the performance of the query.

15)What is a data warehouse? List the types of Data warehouse architectures.

Ans:

A data warehouse is the electronic storage of an organization’s historical data for the purpose of data analytics. In other words, a data warehouse contains a wide variety of data that supports the decision-making process in an organization.

  • There are mainly 3 types of data warehouse architectures:
  1. Types of Data Warehouse Architectures
  • Single-tier architecture
  • The objective of a single layer is to minimize the amount of data stored by removing data redundancy. 
  • It is not frequently used in practice.
  • Two-tier architecture
  • This architecture separates physically available sources from the data warehouse.
  • This architecture is not expandable & does not support a large number of end-users.
  • Because of network limitations, this architecture faces connectivity issues.
  • Three-tier architecture

It is the most widely used architecture that is consist of the Top, Middle and Bottom Tier.

  1. Bottom Tier:  Usually a relational database of the Datawarehouse serves as the bottom tier where Data is cleansed, transformed, and loaded.
  2. Middle Tier: This application tier is an OLAP server & presents an abstracted view of the database which acts as a mediator between the end-user and the database.
  3. Top-Tier: The top tier is a front-end client layer channels data out of the data warehouse.

16) Define data analytics in the context of data warehousing.

Ans:

  • Data analytics is the science of examining raw data with the purpose of drawing business-driven conclusions about that data.
  • The role of a data warehouse is to enable data analysis.

17) What is a subject-oriented data warehouse?

Ans:

Subject-oriented data warehouses are those that store data around a particular “subject” such as customer, sales, product, among others.

18)What does OLAP stand for?

Ans:

OLAP stands for On Line Analytical Processing. It is a system which collects, manages, and processes multi-dimensional data for analysis and management.

19)What does OLTP stand for?

Ans:

OLTP stands for On Line Transaction Processing. It is a system which modifies the data whenever it received, to a large number of concurrent users.

20)List the types of OLAP servers.

Ans:

  • Relational OLAP
  • Multidimensional OLAP
  • Hybrid OLAP
  • Specialized SQL Servers

    Subscribe For Free Demo

    21) List some of the functions performed by OLAP.

    Ans:

    Some of the major functions performed by OLAP include “roll-up”, “drill-down”, “slice”, “dice”, and “pivot”.

    22) What is a star schema?

    Ans:

    Star schema is a schema used in data warehousing where a single fact table references a number of dimension tables. In a star schema, “keys” from all the dimension tables flow into the fact table. This entity-relationship diagram resembles a star, hence it is named a Star schema.

    23) What is a snow flake schema?

    Ans:

    Just like the star schema, a single fact table references number of other dimension tables in snow flake scheme. Here however, these dimension tables are further normalized into multiple related tables. As these tables are further snow flaked into smaller tables, this schema is called a snow flake schema.

    24) What is the language that is used for schema definition?

    Ans:

    Data Mining Query Language (DMQL) is used for schema definition.

    25) What are the different types of “dimension”?

    Ans:

    • Conformed dimension
    • Junk dimension
    • Degenerated dimension
    • Role Playing dimension

    26)What is a mini dimension?

    Ans:

    Mini dimensions are dimensions that are used when a large number of rapidly changing attributes are separated into smaller tables.

    27) Define fact-less fact.

    Ans:

    Fact-less fact is a fact table that does not contain any value. Such a table only contains keys from different dimension tables.

    28)How will you do data transformation in data warehouse platform?

    Ans:

    Data will be transferred by different database tools such as MySQL Tools, MS-Access Tools this tools is connected to a server such as SQL Server, Oracle Server. After setting up the environment by using SQL with the support of Shell scripting language will be able to transfer the data to the data warehouse system.

    29) What is a data cube?

    Ans:

    A data cube helps represent data in multiple facets. Data cubes are defined by dimensions and facts.

    30) What do you understand by the ER model?

    Ans:

    ER model or entity-relationship model is a methodology for data modeling wherein the goal of modeling is to normalize the data by reducing redundancy.

    31) What do you understand by dimensional modeling?

    Ans:

    Dimensional model is a methodology that consists of “dimensions” and “fact tables”. Fact tables are used to store various transactional measurements from “dimension tables” that qualifies the data.

    32) What is VLDB in the context of data warehousing?

    Ans:

    VLDB stands for Very Large Database. The size of a VLDB is preset to more than one terabyte.

    33) What is a data mart?

    Ans:

    Data mart is a subset of organizational data. In other words, it is a collection of data specific to a particular group within an organization.

    34) What is data aggregation?

    Ans:

    Data aggregation is the broad definition for any process that enables information gathering expression in a summary form, for statistical analysis.

    35) What is summary information?

    Ans:

    Summary Information is the location within data warehouse where predefined aggregations are stored.

    Course Curriculum

    Learn On-Demand Data Warehousing Course from Real Time Experts

    Weekday / Weekend BatchesSee Batch Details

    36)What does subject-oriented data warehouse signify?

    Ans:

     Subject oriented signifies that the data warehouse stores the information around a particular subject such as product, customer, sales, etc.

    37)List any five applications of data warehouse.

    Ans:

     Some applications include financial services, banking services, customer goods, retail sectors, controlled manufacturing.

    38)What is the very basic difference between data warehouse and operational databases?

    Ans:

    A data warehouse contains historical information that is made available for analysis of the business whereas an operational database contains current information that is required to run the business.

    39)List the Schema that a data warehouse system can implements.

    Ans:

    A data Warehouse can implement star schema, snowflake schema, and fact constellation schema.

    40)List the process that are involved in Data Warehousing.

    Ans:

     Data Warehousing involves data cleaning, data integration and data consolidations.

    41)List the functions of data warehouse tools and utilities.

    Ans:

    The functions performed by Data warehouse tool and utilities are Data Extraction, Data Cleaning, Data Transformation, Data Loading and Refreshing.

    42)What do you mean by Data Extraction?

    Ans:

    Data extraction means gathering data from multiple heterogeneous sources.

    43)Define metadata?

    Ans:

    Metadata is simply defined as data about data. In other words, we can say that metadata is the summarized data that leads us to the detailed data.

    44)What does Metadata Respiratory contain?

    Ans:

     Metadata respiratory contains definition of data warehouse, business metadata, operational metadata, data for mapping from operational environment to data warehouse, and the algorithms for summarization.

    45)How does a Data Cube help?

    Ans:

    Data cube helps us to represent the data in multiple dimensions. The data cube is defined by dimensions and facts.

    46)Define dimension?

    Ans:

    The dimensions are the entities with respect to which an enterprise keeps the records.

    47)Explain data mart.

    Ans:

    Data mart contains the subset of organization-wide data. This subset of data is valuable to specific groups of an organization. In other words, we can say that a data mart contains data specific to a particular group.

    48)What is Virtual Warehouse?

    Ans:

     The view over an operational data warehouse is known as virtual warehouse.

    49)List the phases involved in the data warehouse delivery process.

    Ans:

    The stages are IT strategy, Education, Business Case Analysis, technical Blueprint, Build the version, History Load, Ad hoc query, Requirement Evolution, Automation, and Extending Scope.

    50) Define load manager.

    Ans:

    A load manager performs the operations required to extract and load the process. The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse.

    51)Define the functions of a load manager.

    Ans:

    A load manager extracts data from the source system. Fast load the extracted data into temporary data store. Perform simple transformations into structure similar to the one in the data warehouse.

    52)Define a warehouse manager.

    Ans:

     Warehouse manager is responsible for the warehouse management process. The warehouse manager consist of third party system software, C programs and shell scripts. The size and complexity of warehouse manager varies between specific solutions.

    53)Define the functions of a warehouse manager.

    Ans:

     The warehouse manager performs consistency and referential integrity checks, creates the indexes, business views, partition views against the base data, transforms and merge the source data into the temporary store into the published data warehouse, backs up the data in the data warehouse, and archives the data that has reached the end of its captured life.

    54) What are the languages used in Data cleansing?

    Ans:

    R – Programming language, SQL- Structure Query Language, Advance Excel Macros.

    55)What does the Query Manager responsible for?

    Ans:

    Query Manager is responsible for directing the queries to the suitable tables.

    Course Curriculum

    Advance your Career with Data Warehousing Training By World Class Faculty

    • Instructor-led Sessions
    • Real-life Case Studies
    • Assignments
    Explore Curriculum

    56)Name some data visualization tools for presenting data reports?

    Ans:

    Tableau, ZingChart, Tibco Spotfire, Google Chart, Sigma Plot, Mini Tab

    57)Which one is faster, Multidimensional OLAP or Relational OLAP?

    Ans:

     Multidimensional OLAP is faster than Relational OLAP.

    58)List the functions performed by OLAP.

    Ans:

    OLAP performs functions such as roll-up, drill-down, slice, dice, and pivot.

    59)How many dimensions are selected in Slice operation?

    Ans:

    Only one dimension is selected for the slice operation.

    60) What are some benefits of cloud-based data warehouses when compared to on-premise solutions?

    Ans:

     In the last few years, cloud computing is prevalent, and now most of the companies prefer to use cloud-based data warehouses over traditionally used on-site warehouses. Below are the top reasons for companies using cloud-based data warehouses:

    • It is easy and practical to scale data warehouse in the cloud.
    • It is economical to store data warehouse on the cloud as it eliminates the hardware and licensing cost, which is required for on-site warehouse setup.

    The cloud data warehouse is optimized for data analytics because it uses Massively Parallel Processing (MPP) and columnar storage, which are known for offering better performance and helps in executing complex queries.

    61)Name essential approaches to data warehouse design.

    Ans:

     There are two data warehouse design approaches, Kimball and Inmon.

    • Inmon approach or top-down was proposed by Mr. Bill Inmon, the Father of data warehousing. In this approach, first, it is recommended to prepare a data warehouse, and then Data Marts are created. Through this strategy, the data warehouse becomes the central point of Corporate Information Factory (CIF), which acts as a logical framework for BI.
    • Kimball approach, also known as a bottom-up approach, suggests creating Data Mart first and later integrating it to a more massive data warehouse to complete a data warehouse. This integration of Data Mart is known as a data warehouse bus (BUS) architecture.

    62) Which are the different types of data warehousing?

    Ans:

     There are three types of data warehousing:

    • Enterprise Data Warehouse
    • It merges organizational data from its different functional areas in a centralized manner. It helps with data extracting and transforming and offers a detailed overview of any particular object in the data model.
    • Operational Data Store
    • It gives an option to produce the date from the database instantly and supports business operations by integrating contrast data from multiple sources.
    • Data Mart
    • Data Mart stores data from a particular functional area, and it comprises a subset of data that is saved in the data warehouse. It helps the analyst in swiftly analyzing the data by shrinking the volume of a large chunk of data.

    63)Name 3 types of Data Mart.

    Ans:

     Below are the 3 different types of Data Marts:

    • Dependent – It sources organizational data from a single data warehouse and helps in developing more Data Marts.
    • Independent – Here, no data is dependent on central or enterprise data warehouse, and data can be used separately for conducting an independent analysis.
    • Hybrid – It helps in ad hoc integration and is used when a data warehouse comprises inputs from different sources.

    64)What is data warehouse architecture?

    Ans:

    Conceptualized with a relational database management system (RDBMS), data warehouse architecture serves as a central repository for informational data. Here, the central repository includes several key components that make the environment operative, compliant, and accessible to operational systems.

    65) What is the three-tier architecture of a data warehouse?

    Ans:

     Below is the three-tier data warehouse architecture:

    • Bottom Tier
    • It represents the data warehouse database server, which is also known as the relational database system. It uses backend tools and utilities that are used to feed data and perform functions like – Extract, Clean, Load, and Refresh.
    • Middle Tier
    • It represents the OLAP Server, which is a form of extended relational database management system. It is known to implement multidimensional data and operations.
    • Top Tier
    • It factors the front-end client layer and holds query, analysis, and data mining tools.

    66) What are the different stages of data warehouse decision support evolution?

    Ans:

     Below are the 5 stages involved in data warehouse decision support evolution:

    • Report
    • Analyze
    • Predict
    • Operationalize
    • Active warehousing

    67) Name the components of data warehousing.

    Ans:

    Below are the 5 components of data warehousing:

    • Data Warehouse Database
    • Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
    • Metadata
    • Query Tools
    • Data warehouse Bus Architecture

    68) How is data warehousing related to business intelligence?

    Ans:

    Modern IT units run at the core of many businesses driving innovation and integration at each level. At one time, IT departments functioned as a stand-alone unit. Today, they are much more essential for companies that have undergone digital transformation because the infrastructure of these businesses is primarily influenced by technology.

    69) How Can We Run The Graph? What Is The Procedure For That? How Can We Schedule The Graph In Unix?

    Ans:

    If you want to run the graph through GDE then after save the graph just press F5 button of your keyboard, it will run automatically. If you want to run through the shell script then you have to fire the command at your UNIX box.

    70)What Is A Real-time Data Warehouse? How Is It Different From Near To Real-time Data Warehouse?

    Ans:

    • As the term suggests, a real-time data warehouse is a system, which reflects all changes to its sources in real time. As simple as it sounds, this is still an area of active research in the field. In traditional DWH, the operational system(s) are kept separate from the DWH for a good reason.
    • The Operational systems are designed to accept inputs or changes to data regularly, hence have a good chance of being regularly queried. On the other hand, a DWH is supposed to do just the opposite – it is used to query data for reports only. No changes to data, through user actions is expected (or designed). The only inputs could come from the ETL feed at stipulated times. The ETL would source its data from the Operational systems just explained above.
    • To create a real-time DWH we would have to merge both systems (several ways are being explored), a concept that is against the reason of creating a DWH. Bigger challenges occur in terms of updating aggregated data in facts at real time, still maintaining the surrogate keys.
    • Besides, we would need lightening fast hardware to try this.Near Real time DWH is a trade-off between the conventional design and the dream of all clients today. The frequency of ETL updates in higher in this case for e.g. once in 2 hours. We can also analyze and use selective refreshes at shorter time intervals, while complete refreshes may still be kept further apart. Selective refreshes would look at only those tables that get updated regularly.

    71) What Is Difference Between Drill & Scope Of Analysis?

    Ans:

    Drilling can be done in drill down, up, through, and across; scope is the overall view of the drill exercise.

    72)I Have Two Universes Created By Two Difference Database Can We Join Them In Designer & Report Level? How?

    Ans:

    We can link one universe to other universe in Universe parameters.

    73) For Faster Process, What We Will Do With The Universe?

    Ans:

    For a faster process create aggregate tables and write better sql so that the process would fast.

    74)What Is Type 2 Version Dimension?

    Ans:

    Version dimension is the SCD type II in real time it using because of it will maintain the current data and full historical data.

    75)What Is Unit Testing?

    Ans:

    The Developer created the mapping that can be tested independently by the developer individually.

    76) What Is Informatica Architecture?

    Ans:

    Informatica Architecture contains Repository, Repository server, Repository server administration console, sources, repository server and Data warehousing and it have the Designer, Work for manager, work for monitor combination of all these are called Informatica Architecture.

    77) What Is Data Analysis? Where It Will Be Used?

    Ans:

    Data analysis: consider that you are running a business and u store the data of that; in some form say in register or in a comp and at the year end you want know the profit or loss then it called data analysis .Data analysis use: then u want to know which product was sold the highest and if the business is running in a loss then finding, where we went wrong we do analysis.

    78)What Are Data Modeling And Data Mining? Where It Will Be Used?

    Ans:

    • Data modeling is the process of designing a data base model. In this data model data will be stored in two types of table fact table and dimension table.
    • Fact table contains the transaction data and dimension table contains the master data. Data mining is process of finding the hidden trends is called the data mining.
    Data Warehouse Sample Resumes! Download & Edit, Get Noticed by Top Employers! Download

    79) What Is “method/1”?

    Ans:

    Method 1 is system develop lifecycle create by Arthur Anderson a while back.

    80) After The Generation Of A Report To Whom We Have To Deploy Or What We Do After The Completion Of A Report?

    Ans:

    The generated report will be sent to the concerned business users through web or LAN.

    81) Define the concept of Business Intelligence.

    Ans:

    Business Intelligence is also named Decision Support Systems that refers to technologies, applications, and practices for the collection, integration, and analysis of business-related information or data.

    82) What are the dimension tables?

    Ans:

    A dimension table contains attributes of measurements stored in fact tables. This table is made up of hierarchies, nodes, and categories that can be used to traverse in nodes.

    83)Define the meaning of Fact table.

    Ans:

    A Fact Table contains the measurements of business processes and it will contain the foreign keys for dimension tables.

    84)How will you define the ETL?

    Ans:

    ETL means Extract, Transform, and Load. This is a software application that can read the data from a particular data source and extracts the needed subset of data. In the next step, it will transform the data using lookup tables or rules and convert it to the desired state. In the end, load function is used to load the resulting data from the target database.

    85)What do you mean by the real-time data warehousing?

    Ans:

    A real-time data warehouse captures the business data as soon as it occurs. When a business activity gets completed, data will become available for instant use.

    86) Define the meaning of aggregate tables.

    Ans:

    Aggregate table contains the data of an existing warehouse that has been grouped to a certain level of dimensions. This is easy to retrieve data from aggregate tables as compared to original tables because of a large number of records. It can reduce the load in the database server and increases the overall performance of a query.

    87)How to load the time dimensions?

    Ans:

    Time dimensions can be loaded with the help of a program where 100 years are represented with one row per day.

    88)Define the conformed fact.

    Ans:

    A conformed fact is a table that can be used across multiple data marts and multiple fact tables.

    89)What are non-additive facts?

    Ans:

    Non-additive facts cannot be summed up for any of the dimension available in the fact table. If there is any change in the dimension then the same facts can be useful.

    90) Define the concept of Datamart in brief.

    Ans:

    A Datamart is a special version of the data warehouse that contains a snapshot of the operational data and helps business people to make decisions based on past experiences and trends. A data mart is more focused on easy access to information from fact tables.

    91) How can you differentiate the Data Warehouse and OLAP?

    Ans:

    A data warehouse is a place where whole data is stored for analysis, OLAP is used for data analysis, aggregations, summation, and information planning at the minor level.

    92) What is the Entity-Relationship diagram?

    Ans:

    An Entity-Relationship diagram illustrates the relationship among entities in a database. This diagram shows the structure of different tables and links between tables.

    93) How can you define a subject-oriented data warehouse?

    Ans:

    A subject-oriented database stores data around a subject such as customers, sales, people etc.

    94)Application of data warehousing system?

    Ans:

    • Consistent and Quality of data: Data warehouse system is consistent and cost-effective for various industries for collection their customer data through various resources
    • Cost reduction: A data warehouse system reduce cost by storing all electronically collected data in a data warehouse
    • Accessibility: A data warehouse system data can have easy access to data time to time for business improvement and reporting.

    95)Name some data mining tools used in data warehouse system?

    Ans:

    Orange Data mining, R software Environment, WEKA Data Mining, RapidMiner, KNIME.

    96)What is data visualization?

    Ans:

    Data warehouses are causing a surge in the popularity of data visualization techniques for looking at data. Data visualization is not a separate class of tools; rather, it is a method of presenting the output of all the tools in such as Orange Data Mining, R Software environment, WEKA Data Mining etc.. a way that the entire problem and/or the solution (e.g. a result of a Relational or multidimensional query, or the result of data mining) is clearly visible to domain experts and even casual observers.



    Are you looking training with Right Jobs?

    Contact Us
    Get Training Quote for Free