25+ [SURE SHOT] Dimensional Data Modeling Interview Questions & Answers
Last updated on 04th Jul 2020, Blog, Interview Questions
Data modeling interview questions are those designed for candidates to display introductory to expert level knowledge of data modeling principles and practices. In addition to emphasizing your skills in data modeling, these interview questions also seek to extract your experience with data modeling tools, principles and resources. Employers use data modeling interview questions to determine if you have the technical data modeling skills for the role. Using the right set of data modeling interview questions employers can determine your level of experience with data models, where your data modeling skills lie and in what ways you can be developed. This helps employers choose the best candidate for the role they are trying to fill.
Q1. What do you understand by Data Modelling?
Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then the logical model and finally move to the physical model.Generally, the data models are created in data analysis &design phase of software development life cycle.
Q2. Explain your understanding of different data models?
There are three types of data models:
The level of complexity and detail increases from conceptual to logical to a physical data model.The conceptual model shows a very basic high level of design while the physical data model shows a very detailed view of design.
- Conceptual Model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.
- Logical Model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.
- Physical Data Model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.
Q3. Throw some light on your experience in Data Modelling with respect to projects you have worked on till date?
This was the very first question in one of my Data Modelling interviews. So, before you step into the interview discussion, you should have a very clear picture of how data modeling fits into the assignments you have worked upon.
I have worked on a project for a health insurance provider company where we have interfaces build in Informatics that transforms and process the data fetched from Facets database and sends out useful information to vendors.
Note: Facets is an end to end solution to manage all the information for health care industry. The facets database in my project was created with SQL server 2012.
We had different entities that were linked together. These entities were subscriber, member, healthcare provider, claim, bill, enrollment, group, eligibility, plan/product, commission, capitation, etc.
Q4. What are the different design schemas in Data Modelling? Explain with the example?
There are two different kinds of schema in data modeling:
- Star Schema
- Snowflake Schema
Now, I will be explaining each of these schemas one by one.
The simplest of the schemas is star schema where we have a fact table in the center that references multiple dimension tables around it. All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.
Q5. Which scheme did you use in your project &why?
The star schema is quite simple, flexible and it is in de-normalized form.
In a snowflake schema, the level of normalization increases. The fact table here remains the same as in star schema. However, the dimension tables are normalized. Due to several layers of dimension tables, it looks like a snowflake and thus it is named as snowflake schema.
Q6. Which schema is better – star or snowflake?
Since star schema is in de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. Coming to the snowflake schema, since it is in normalized form, it will require a number of joins as compared to a star schema, the query will be complex and execution will be slower than star schema.Another significant difference between these two schemas is that snowflake schema does not contain redundant data and thus it is easy to maintain. On the contrary, star schema has a high level of redundancy and thus it is difficult to maintain.Now, which one to choose for your project? If the purpose of your project is to do more of dimension analysis, you should go for snowflake schema.
Q7. What do you understand by dimension and attribute?
Dimensions represent qualitative data. For Example, plan, product, class are all dimensions.A dimension table contains descriptive or textual attributes. For Example, the product category &product name are the attributes of the product dimension.
Q8. What is a fact &a fact table?
Facts represent quantitative data.
For Example: the net amount due is a fact. A fact table contains numerical data and foreign keys from related dimensional tables.
Q9. What are the different types of dimensions you have come across? Explain each of them in detail with an example?
There are typically five types of dimensions:
- Conformed dimensions: A Dimension that is utilized as a part of different areas are called a conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses.
- Junk Dimension: It is a dimension table comprising of attributes that dont have a place in the fact table or in any of the current dimension tables. Generally, these are properties like flags or indicators.
- Role-Playing Dimension: These are the dimensions that are utilized for multiple purposes in the same database.
- For Example: a date dimension can be used for Date of Claim, Billing dateor Plan Term date. So, such a dimension will be called a Role-playing dimension. The primary key of the Date dimension will be associated with multiple foreign keys in the fact table.
- Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs
- Type-0: These are the dimensions where attribute value remains steady with time. For Example, Subscribers DOB is a type-0 SCD because it will always remain the same irrespective of the time.
- Type-1: These are the dimensions where the previous value of the attribute is replaced by the current value. No history is maintained in the Type-1 dimension. For Example, Subscribers address (where the business requires to keep the only current address of subscriber) can be a Type-1 dimension.
- Type-2: These are the dimensions where unlimited history is preserved. For Example, Subscribers address (where the business requires to keep a record of all the previous addresses of the subscriber). In this case, multiple rows for a subscriber will be inserted in the table with his/her different addresses. There will be some column(s) that will identify the current address. For Example, Start dateand End date. The row where End datevalue will be blank would contain the subscribers current address and all other rows will be having previous addresses of the subscriber.
- Type-3: These are the type of dimensions where limited history is preserved. And we use an additional column to maintain the history. For Example, Subscribers address (where the business requires to keep a record of current &just one previous address). In this case, we can dissolve the address column into two different columns – current addressand previous address. So, instead of having multiple rows, we will be having just one-row showing current as well as the previous address of the subscriber.
- Type-4: In this type of dimension, the historical data is preserved in a separate table. The main dimension table holds only the current data. For Example, the main dimension table will have only one row per subscriber holding its current address. All other previous addresses of the subscriber will be kept in the separate history table. This type of dimension is hardly ever used.
- Degenerated Dimension: A degenerated dimension is a dimension that is not a fact but presents in the fact table as a primary key. It does not have its own dimension table. We can also call it as a single attribute dimension table.
But, instead of keeping it separately in a dimension table and putting an additional join, we put this attribute in the fact table directly as a key. Since it does not have its own dimension table, it can never act as a foreign key in the fact table.
Q10. Give your idea regarding factless fact? And why do we use it?
Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.
At times, certain situations may arise in the business where you need to have a factless fact table.
For Example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.
You can see that the above table does not contain any measure. Now, if you want to answer the below question, you can do easily using the above single factless fact table rather than having two separate fact tables:
How many employees of a particular department were present on a particular day:
So, the factless fact table offers flexibility to the design.
Q11. Distinguish between OLTP and OLAP?
OLTP stands for the Online Transaction Processing System &OLAP stands for the Online Analytical Processing System. OLTP maintains the transactional data of the business &is highly normalized generally. On the contrary, OLAP is for analysis and reporting purposes &it is in de-normalized form.
This difference between OLAP and OLTP also gives you the way to choosing the design of schema. If your system is OLTP, you should go with star schema design and if your system is OLAP, you should go with snowflake schema.
Q12. What do you understand by data mart?
Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments.
For Example, I used to work for a health insurance provider company that had different departments in it like Finance, Reporting, Sales and so forth.
We had a data warehouse that was holding the information pertaining to all these departments and then we have few data marts built on top of this data warehouse. These DataMart were specific to each department. In simple words, you can say that a DataMart is a subset of a data warehouse.
Q13. What are the different types of measures?
We have three types of measures, namely:
- Non- additive measures
- Semi- additive measures
- Additive measures
- Non-additive measures are the ones on top of which no aggregation function can be applied. For Example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc. is a non-additive measure.
- Semi- additive measures are the ones on top of which some (but not all) aggregation functions can be applied. For Example, fee rate or account balance.
- Additive measures are the ones on top of which all aggregation functions can be applied. For Example, units purchased.
Q14. What is a Surrogate key? How is it different from a primary key?
Surrogate Key is a unique identifier or a system-generated sequence number key that can act as a primary key. It can be a column or a combination of columns. Unlike a primary key, it is not picked up from the existing application data fields.
Q15. Is this true that all databases should be in 3NF?
It is not mandatory for a database to be in 3NF. However, if your purpose is the easy maintenance of data, less redundancy, and efficient access then you should go with a de-normalized database.
Q16. Have you ever came across the scenario of recursive relationships? If yes, how did you handle it?
A recursive relationship occurs in the case where an entity is related to itself. Yes, I have come across such a scenario.
Talking about the health care domain, it is a possibility that a health care provider (say, a doctor) is a patient to any other health care provider. Because, if the doctor himself falls ill and needs surgery, he will have to visit some other doctor for getting the surgical treatment.
So, in this case, the entity – health care provider is related to itself. A foreign key to the health insurance providers number will have to present in each members (patient) record.
Q17. List out a few common mistakes encountered during Data Modelling?
Few common mistakes encountered during Data Modelling are:
- Building massive data models: Large data models are like to have more design faults. Try to restrict your data model to not more than 200 tables.
- Lack of purpose: If you do not know that what is your business solution is intended for, you might come up with an incorrect data model. So having clarity on the business purpose is very important to come up with the right data model.
- Inappropriate use of surrogate keys: Surrogate key should not be used unnecessarily. Use surrogate key only when the natural key cannot serve the purpose of a primary key.
- Unnecessary de-normalization: Dont denormalize until and unless you have a solid &clear business reason to do so because de-normalization creates redundant data which is difficult to maintain.
Q18. What is the number of child tables that can be created out from a single parent table?
The number of child tables that can be created out of the single parent table is equal to the number of fields/columns in the parent table that are non-keys.
Q19. Employee health details are hidden from his employer by the health care provider. Which level of data hiding is this? Conceptual, physical or external?
This is the scenario of an external level of data hiding.
Q20. What is the form of fact table &dimension table?
Generally, the fact table is in normalized form and the dimension table is in de-normalized form.
Q21. What particulars you would need to come up with a conceptual model in a health care domain project?
For a health care project, below details would suffice the requirement to design a basic conceptual model.
Different categories of health care plans and products:
- Type of subscription (group or individual).
- Set of health care providers.
- Claim and billing process overview.
Q22. Tricky one: If a unique constraint is applied to a column then will it throw an error if you try to insert two nulls into it?
No, it will not throw any error in this case because a null value is unequal to another null value. So, more than one null will be inserted in the column without any error.
Q23. Can you quote an example of a sub-type and super-type entity?
Yes, lets say we have these different entities: vehicle, car, bike, economy car, family car, sports car.
Here, a vehicle is a super-type entity. Car and bike are its sub-type entities. Furthermore, economy cars, sports cars, and family cars are sub-type entities of its super-type entity- car.
A super-type entity is the one that is at a higher level. Sub-type entities are ones that are grouped together on the basis of certain characteristics. For Example, all bikes are two-wheelers and all cars are four-wheelers. And since both are vehicles, so their super-type entity is vehicle.
Q24. What is the significance of metadata?
Metadata is data about data. It tells you what kind of data is actually stored in the system, what is its purpose and for whom it is intended.
Q25. What is the difference between star flake and snow flake schema?
Well in star schema you just enter your desired facts and all the primary keys of your dimensional tables in Fact table. And fact tables primary is the union of its all dimension table key. In star schema dimensional tables are usually not in BCNF form.
Its almost like starschema but in this our dimension tables are in 3rd NF, so more dimensions tables. And these dimension tables are linked by primary, foreign key relation.
Q26. What is data sparsity and how it effect on aggregation?
Data sparsity is term used for how much data we have for a particular dimension/entity of the model.
It affects aggregation depending on how deep the combination of members of the sparse dimension make up. If the combination is a lot and those combination do not have any factual data then creating space to store those aggregations will be a waste as a result, the database will become huge.
Q27. What is the difference between hashed file stage and sequential file stage in relates to DataStage Server?
- In data stage server jobs,can we use sequential file stage for a lookup instead of hashed file stage.If yes ,then what’s the advantage of a Hashed File stage over sequential file stage
- search is faster in hash files as you can directly get the address of record directly by hash algorithm as records are stored like that but in case of sequential file u must compare all the records.
Q28. When should you consider denormalization?
Denationalization is used when there is a lot of tables involved in retrieving data. Denormalization is done in dimensional modelling used to construct a data warehouse. This is not usually done for databases of transnational systems.
Q29. What is ERD?
Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.
Q30. What is third normal form?
An entity is in the third normal form if it is in the second normal form and all of its attributes are not transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key, but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.
For 3NF, first, the table must be in 2NF, plus, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not other non-key fields for its existence. This is very similar to to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship to the PK is good, because we established that in 2NF.
Get Dimensional Data Modeling Training from Real Time Experts
- Instructor-led Sessions
- Real-life Case Studies
Q31. What is an artificial (derived) primary key? When should it be used?
Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate. A derived key is an artificial key that you create. A natural key is one that is already part of the database.
Q32. What is second normal form?
An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.
If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:
If the table has a one-column primary key, the attribute must depend on that key.
If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.
If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.
If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.
Q33. What is Enterprise Data Modeling?
The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modeling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.
Enterprise Data Modeling is sometimes called as a global business model and the entire information about the enterprise would be captured in the form of entities.
Q34. Logical V/s Physical Data Model?
When a data modeler works with the client, his title may be a logical data modeler or a physical data modeler or a combination of both. A logical data modeler designs the data model to suit business requirements, creates and maintains the lookup data, compare the versions of data model, maintains change log, generate reports from data model and whereas a physical data modeler has to know about the source and target databases properties.
A physical data modeler should know the technical-know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys, indexes and how to match indexes to SQL code. It would be good if the physical data modeler knows about replication, clustering and so on.
Q35. Relational vs Dimensional?
Relational Data Modeling is used in OLTP systems which are transaction oriented and Dimensional Data Modeling is used in OLAP systems which are analytical based. In a data warehouse environment, staging area is designed on OLTP concepts, since data has to be normalized, cleansed and profiled before loaded into a data warehouse or data mart. In OLTP environment, lookups are stored as independent tables in detail whereas these independent tables are merged as a single dimension in an OLAP environment like data warehouse.
|Relational Data Modeling||Dimensional Data Modeling|
|Data is stored in RDBMS||Data is stored in RDBMS or Multidimensional databases|
|Tables are units of storage||Cubes are units of storage|
|Data is normalized and used for OLTP. Optimized for OLTP processing||Data is denormalized and used in data warehouse and datamart. Optimized for OLAP|
|Several tables and chains of relationships among them||Few tables and fact tables are connected to dimensional tables|
|Volatile(several updates) and time variant||Non volatile and time-invariant|
|Detailed level of transactional data|
Q36. What is Data Modeling Development Cycle?
- Gathering Business Requirements – First Phase: Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
- Conceptual Data Modeling(CDM) – Second Phase: This data model includes all major entities, relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.
- Logical Data Modeling(LDM) – Third Phase: This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.
- Physical Data Modeling(PDM) – Fourth Phase: This is a complete model that includes all required tables, columns, relationship, database properties for the physical implementation of the database.
- Database – Fifth Phase: DBAs instruct the data modeling tool to create SQL code from physical data model. Then the SQL code is executed in server to create databases.
Standardization Needs and Modeling data:
Several data modelers may work on the different subject areas of a data model and all data modelers should use the same naming convention, writing definitions and business rules.
Nowadays, business to business transactions(B2B) are quite common, and standardization helps in understanding the business in a better way. Inconsistency across column names and definition would create a chaos across the business.
For example: when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if a proper standardization is maintained across the organization.
Q37. What is a Table?
A table consists of data stored in rows and columns. Columns, also known as fields, show data in vertical alignment. Rows, also called a record or tuple, represent datas horizontal alignment.
Q38. What is Normalization?
Database normalization is the process of designing the database in such a way that it reduces data redundancy without sacrificing integrity.
Q39. What Does a Data Modeler Use Normalization For?
The purposes of normalization are:
- Remove useless or redundant data
- Reduce data complexity
- Ensure relationships between the tables in addition to the data residing in the tables
- Ensure data dependencies and that the data is stored logically.
Q40. What Are the Critical Relationship Types Found in a Data Model? Describe Them:
The main relationship types are:
- Identifying: A relationship line normally connects parent and child tables. But if a child tables reference column is part of the tables primary key, the tables are connected by a thick line, signifying an identifying relationship.
- Non-identifying: If a child tables reference column is NOT a part of the tables primary key, the tables are connected by a dotted line, signifying a no-identifying relationship.
- Self-recursive: A recursive relationship is a standalone column in a table connected to the primary key in the same table.
Q41. What is an Enterprise Data Model?
This is a data model that consists of all the entries required by an enterprise.
- Data Scientist Master’s Program
- In Collaboration with IBMEXPLORE COURSE
- Intermediate Interview Questions.
Q42. What Are the Most Common Errors You Can Potentially Face in Data Modeling?
These are the errors most likely encountered during data modeling:
- Building overly broad data models: If tables are run higher than 200, the data model becomes increasingly complex, increasing the likelihood of failure
- Unnecessary surrogate keys: Surrogate keys must only be used when the natural key cannot fulfill the role of a primary key
- The purpose is missing: Situations may arise where the user has no clue about the businesss mission or goal. Its difficult, if not impossible, to create a specific business model if the data modeler doesnt have a workable understanding of the companys business model
- Inappropriate denormalization: Users shouldnt use this tactic unless there is an excellent reason to do so. Denormalization improves read performance, but it creates redundant data, which is a challenge to maintain.
Q43. What is a Slowly Changing Dimension?
These are dimensions used to manage both historical data and current data in data-warehousing. There are four different types of slowly changing dimensions: SCD Type 0 through SCD Type 3.
Q44. What is Granularity?
Granularity represents the level of information stored in a table. Granularity is defined as high or low. High granularity data contains transaction-level data. Low granularity has low-level information only, such as that found in fact tables.
Q45. What Are Subtype and Supertype Entities?
Entities can be broken down into several sub-entities, or grouped by specific features. Each sub-entity has relevant attributes and is called a subtype entity. Attributes common to every entity are placed in a higher or super level entity, which is why they are called supertype entities.
Q46. Whats the Difference Between Forward and Reverse Engineering, in the Context of Data Models?
Forward engineering is a process where Data Definition Language (DDL) scripts are generated from the data model itself. DDL scripts can be used to create databases. Reverse Engineering creates data models from a database or scripts. Some data modeling tools have options that connect with the database, allowing the user to engineer a database into a data model.
Q47. Whats a Confirmed Dimension?
If a dimension is confirmed, its attached to at least two fact tables.
Q48. Why Are NoSQL Databases More Useful than Relational Databases?
NoSQL databases have the following advantages:
- They can store structured, semi-structured, or unstructured data
- They have a dynamic schema, which means they can evolve and change as quickly as needed
- NoSQL databases have sharding, the process of splitting up and distributing data to smaller databases for faster access
- They offer failover and better recovery options thanks to the replication
- Its easily scalable, growing or shrinking as necessary.
Q49. Whats a Junk Dimension?
This is a grouping of low-cardinality attributes like indicators and flags, removed from other tables, and subsequently junkedinto an abstract dimension table. They are often used to initiate Rapidly Changing Dimensions within data warehouses.
Q50. Name the possible type of a Data Model?
It could be physical data model and the logical data model, enterprise data model, conceptual data model, relational data model, OLTP data model etc.
Best Hands-on Practical Dimensional Data Modeling Course to Build Your SkillsWeekday / Weekend BatchesSee Batch Details
Q51. What is contained by the logical data model?
A logical data model contains entity, attributes, primary key, alternate key, Inversion keys, rule, definition, business relation etc.
- Learn SQL Server in the Easiest Way
- Learn from the videos
- Learn anytime anywhere
- Pocket-friendly mode of learning
- Complimentary eBook available
- Buy Self-learning at Discounted Price.
Q52. How will you differentiate logical data model from logical data modeling?
A logical data model is related to the business requirements and it is used for the actual implementation of the data. The approach that is used for creating a logical data model is called the logical data modeling.
Q53. What is an entity (Table)?
A database is consisting of multiple rows and columns which is called a table together. Further, each column has a specific datatype and based on conditions constraints are set of the columns.
Q54. What is an attribute (Column)?
A Column is defined as the vertical alignment of data and information stored for that particular column.
Q55. What is a row?
A row is the set of tuples, records, or it could be taken as the horizontal arrangement of the data.
Q56. What do you understand by the primary key constraint in a database?
The primary key constraint is set on a column to avoid null values or duplicate values. In simple words, a column containing unique items can be defined as the primary key constraint. It could be the bank number, security number or more.
Q57. What do you know about foreign key constraint?
The Primary key can be defined for the parent table and foreign key is set for the child table. The foreign key constraint always refers to the primary key constraint in the main table:
- SQL Server Training &Certification
- No cost for a Demo Class
- Industry Expert as your Trainer
- Available as per your schedule
- Customer Support Available
- Enrol For a Free Demo Class
- Advanced-Data Modeling Interview Questions.
Q58. Why composite word is added before any key constraint?
When the same constraint is enforced on multiple columns then the composite word is added before that particular key constraint.
Q59. Name a few popular relationships within a data model?
These are identifying, non-identifying, and self-recursive relationships in a data model.
Q60. What do you mean by the identifying relationships in a data model?
As you know the parent table and the child table both are connected together with a thin line. When the referenced column in a child table is a part of the primary key in the parent table then those relationship is drawn by a thick line and it is named as the identifying relationships in a data model.
Q61. Is there exists any non-identifying relationship too?
In most of the cases, a parent table and the child table both are connected together with a thin line. When the referenced column in a child table is not a part of the primary key in the parent table then those relationship is drawn by a dotted line and it is named as the non-identifying relationships in a data model.
Q62. How will you define the cardinality in a data model?
Cardinalities are used to define relationships and it could be one-to-one, one-to-many, or many-to-many etc. Higher the value of cardinality, there will be more unique values within a column.
Q63. What is a constraint? Why constraints are important for a database?
This is a rule imposed on the data. A different type of constraints could be unique, null values, foreign keys, composite key or check constraint etc.
Q64. Define unique constraint for a database?
This constraint is added to avoid duplicate values within a column.
Q65. Define the check constraint?
A check constraint is useful to define the range of values within a column.
Q66. What is an index in a database?
An Index is composed of a set of columns or a single column that is needed for fast retrieval of data.
Q67. What is the sequence?
A sequence could be defined as the database object that is needed for the creation of a unique number.
Q68. What is ETL process in data warehousing?
ETL is Extract Transform Load. It is a process of fetching data from different sources, converting the data into a consistent and clean form and load into the data warehouse. Different tools are available in the market to perform ETL jobs.
Q69.What are Data Model ?
When a enterprise logical data model is transformed to a physical data model, supertypes and subtypes may not be as is. i.e. the logical and physical structure of supertypes and subtypes may be entirely different. A data modeler has to change that according to the physical and reporting requirement.
When a enterprise logical data model is transformed to a physical data model, length of table names, column names etc may exceed the maximum number of the characters allowed by the database. So a data modeler has to manually edit that and change the physical names according to database or organizations standards.
One of the important things to note is the standardization of the data model. Since a same attribute may be present in several entities, the attribute names and data types should be standardized and a conformed dimension should be used to connect to the same attribute present in several tables.
Standard Abbreviation document is a must so that all data structure names would be consistent across the data model.
Q70. Explain the difference between data mining and data warehousing.
Data warehousing is merely extracting data from different sources, cleaning the data and storing it in the warehouse. Where as data mining aims to examine or explore the data using queries. These queries can be fired on the data warehouse. Explore the data in data mining helps in reporting, planning strategies, finding meaningful patterns etc.
E.g: a data warehouse of a company stores all the relevant information of projects and employees. Using Data mining, one can use this data to generate different reports like profits generated etc.
Q71. What is an OLTP system and OLAP system?
- OLTP: Online Transaction and Processing helps and manages applications based on transactions involving high volume of data. Typical example of a transaction is commonly observed in Banks, Air tickets etc. Because OLTP uses client server architecture, it supports transactions to run cross a network.
- OLAP: Online analytical processing performs analysis of business data and provides the ability to perform complex calculations on usually low volumes of data. OLAP helps the user gain an insight on the data coming from different sources (multi dimensional).
Q72. What is PDAP?
A data cube stores data in a summarized version which helps in a faster analysis of data. The data is stored in such a way that it allows reporting easily.
using a data cube A user may want to analyze weekly, monthly performance of an employee. Here, month and week could be considered as the dimensions of the cube.
Q73. What is snow flake scheme design in database?
A snowflake Schema in its simplest form is an arrangement of fact tables and dimension tables. The fact table is usually at the center surrounded by the dimension table. Normally in a snow flake schema the dimension tables are further broken down into more dimension table.
E.g: Dimension tables include employee, projects and status. Status table can be further broken into status_weekly, status_monthly.