1. What are the most common tools used by data analysts?
Ans:
Data analysts commonly use tools like Microsoft Excel for basic analysis, SQL for database queries and visualization tools like Power BI or Tableau. For deeper analysis, programming languages such as Python (with Pandas and NumPy) and R are popular.
2. How should a dataset’s missing data be handled?
Ans:
Missing data can be handled by removing the affected rows or columns when the impact is small, or by filling the gaps with the mean, median or mode. For time series, forward or backward filling is common. More advanced methods include predicting the missing values with a model or adding a flag that marks which values were missing.
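As a minimal sketch of these options in Pandas, the example below uses a small made-up table; the column names (`day`, `units`, `region`) and all values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: daily sales with gaps (names and values are illustrative).
df = pd.DataFrame(
    {
        "day": pd.date_range("2024-01-01", periods=5, freq="D"),
        "units": [10.0, np.nan, 14.0, np.nan, 20.0],
        "region": ["north", None, "north", "south", "north"],
    }
)

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the median and categorical gaps with the mode.
filled = df.copy()
filled["units"] = filled["units"].fillna(filled["units"].median())
filled["region"] = filled["region"].fillna(filled["region"].mode()[0])

# Option 3: forward fill, which carries the last known value ahead in time order.
ffilled = df.sort_values("day").copy()
ffilled["units"] = ffilled["units"].ffill()

# Option 4: keep the gaps but flag them so later analysis can account for them.
flagged = df.copy()
flagged["units_missing"] = flagged["units"].isna()
```

Which option fits depends on how much data is missing and why; dropping rows is simplest but discards information, while filling keeps the row count at the cost of introducing estimated values.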
3. Describe how a database and a data warehouse differ from one another.
Ans:
A database stores current transactional data and is optimized for fast reads and writes in day-to-day operations, while a data warehouse holds large amounts of historical and aggregated data optimized for analysis, reporting and business intelligence.
4. What is the significance of data cleaning in data analysis?
Ans:
Data cleaning ensures the dataset is accurate and consistent, which is crucial for reliable analysis. Clean data prevents misleading results and supports trustworthy business decisions.
5. What is data normalization and why is it important?
Ans:
Data normalization organizes the data in a database to eliminate duplication and improve integrity. It splits data into related tables, which maintains consistency and enables efficient queries.
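To make the idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The flat order list, the table names (`customers`, `orders`) and the columns are hypothetical; the point is only that repeated customer details end up stored once and are referenced by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized input: customer details repeat on every order row (made-up data).
flat_orders = [
    (1, "Asha", "asha@example.com", "Keyboard"),
    (2, "Asha", "asha@example.com", "Monitor"),
    (3, "Ben", "ben@example.com", "Mouse"),
]

conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product TEXT NOT NULL
    );
    """
)

for order_id, name, email, product in flat_orders:
    # Store each customer once; the UNIQUE email turns repeats into a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)",
        (name, email),
    )
    customer_id = conn.execute(
        "SELECT customer_id FROM customers WHERE email = ?", (email,)
    ).fetchone()[0]
    # The order row keeps only a reference to the customer, not a copy.
    conn.execute(
        "INSERT INTO orders (order_id, customer_id, product) VALUES (?, ?, ?)",
        (order_id, customer_id, product),
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the split, changing a customer's email means updating one row in `customers` instead of every matching order row.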
6. How do you create a pivot table in Excel?
Ans:
In Excel, to construct a pivot table, choose your data range, go to the “Insert” tab and choose “PivotTable.” Then place fields in Rows, Columns, Values and Filters to summarize data dynamically.
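For readers who also work in Python, a roughly equivalent summary can be sketched with Pandas' `pivot_table`. This is an analogy to the Excel steps, not part of Excel itself, and the `sales` table and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sales records (illustrative values only).
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [100, 150, 200, 50, 25],
    }
)

# Filters area: in Pandas, filter the rows before pivoting if needed.
pivot = pd.pivot_table(
    sales,
    index="region",     # corresponds to the Rows area
    columns="product",  # corresponds to the Columns area
    values="revenue",   # corresponds to the Values area
    aggfunc="sum",      # summarize by summing revenue
    fill_value=0,       # show 0 where a combination has no data
)
```

Here `pivot.loc["East", "A"]` holds the total revenue for product A in the East region.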
7. Can you explain what a join is in SQL and the different types of joins?
Ans:
A SQL join combines rows from two or more tables based on a related column. INNER JOIN returns only matching records, LEFT JOIN returns all records from the left table plus any matches from the right, RIGHT JOIN does the same for the right table, FULL OUTER JOIN returns all records from both tables, SELF JOIN joins a table to itself and CROSS JOIN returns every combination of rows.
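The sketch below runs several of these joins against an in-memory SQLite database through Python's `sqlite3` module. The `employees` and `departments` tables and their contents are hypothetical. RIGHT JOIN is left out because older SQLite versions do not support it; swapping the table order in a LEFT JOIN gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER,
        manager_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employees VALUES
        (1, 'Asha', 1, NULL),
        (2, 'Ben', 1, 1),
        (3, 'Chen', NULL, 1);
    """
)

# INNER JOIN: only employees that have a matching department.
inner_rows = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "INNER JOIN departments d ON e.dept_id = d.id ORDER BY e.id"
).fetchall()

# LEFT JOIN: every employee, with NULL (None) where no department matches.
left_rows = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.dept_id = d.id ORDER BY e.id"
).fetchall()

# SELF JOIN: pair each employee with their manager from the same table.
self_rows = conn.execute(
    "SELECT e.name, m.name FROM employees e "
    "INNER JOIN employees m ON e.manager_id = m.id ORDER BY e.id"
).fetchall()

# CROSS JOIN: every employee combined with every department (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]
```

Chen has no department, so Chen appears in `left_rows` with `None` but is absent from `inner_rows`.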
8. What is data visualization and why is it important in data analysis?
Ans:
Data visualization uses charts, graphs and dashboards to represent data visually. It helps identify trends and patterns quickly, which makes data easier to understand and speeds up decisions.
9. How do you perform data validation?
Ans:
Data validation checks that data is correct and fits rules like formats or ranges. It can be done using Excel functions, SQL constraints or validation scripts and by comparing with original data sources.
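A validation script of the kind mentioned above might look like the following sketch. The field names (`email`, `age`, `signup_date`), the simple email pattern and the 0 to 120 age range are assumptions chosen for illustration, not fixed rules.

```python
import re
from datetime import datetime

# Hypothetical rule: a deliberately simple email shape check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Format rule: the email must look like name@domain.tld.
    if not EMAIL_PATTERN.match(record.get("email", "")):
        problems.append("email: invalid format")

    # Range rule: the age must be an integer between 0 and 120.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age: must be an integer between 0 and 120")

    # Format rule: the date must parse as YYYY-MM-DD.
    try:
        datetime.strptime(record.get("signup_date", ""), "%Y-%m-%d")
    except ValueError:
        problems.append("signup_date: expected YYYY-MM-DD")

    return problems


records = [
    {"email": "asha@example.com", "age": 34, "signup_date": "2024-03-01"},
    {"email": "not-an-email", "age": 150, "signup_date": "03/01/2024"},
]

# Map each record's position to the list of problems found in it.
report = {i: validate_record(r) for i, r in enumerate(records)}
```

Returning a list of problems per record, rather than stopping at the first failure, makes it easier to report every issue back to the data owner in one pass.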
10. Explain the concept of data modeling.
Ans:
Data modeling defines the structure of a database by setting up tables, fields, relationships and constraints. It ensures that data is stored logically and consistently and can be queried efficiently.
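As a rough illustration, the sketch below defines a tiny model in SQLite via Python's `sqlite3` module and shows the constraints rejecting bad rows. The `products` and `sales` tables, their fields and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        unit_price REAL NOT NULL CHECK (unit_price >= 0)
    );
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        sold_on TEXT NOT NULL
    );
    """
)

conn.execute("INSERT INTO products VALUES (1, 'Widget', 2.50)")
conn.execute("INSERT INTO sales VALUES (1, 1, 4, '2024-01-15')")


def try_insert(sql):
    """Run an INSERT and report whether the model's constraints accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Relationship: a sale must point at an existing product.
orphan_ok = try_insert("INSERT INTO sales VALUES (2, 99, 1, '2024-01-16')")
# Constraint: the quantity must be positive.
negative_ok = try_insert("INSERT INTO sales VALUES (3, 1, -5, '2024-01-16')")

# The relationship also makes analytical queries straightforward.
revenue = conn.execute(
    "SELECT SUM(s.quantity * p.unit_price) "
    "FROM sales s JOIN products p ON p.product_id = s.product_id"
).fetchone()[0]
```

Both invalid inserts are rejected by the database itself, so consistency does not depend on every application remembering to check.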