1. What are the most common tools used by data analysts?
Ans:
Data analysts often use Microsoft Excel for basic tasks, SQL for querying databases, and visualization tools like Power BI or Tableau. For advanced analysis, programming languages like Python (with Pandas and NumPy) and R are widely used.
2. How should a dataset’s missing data be handled?
Ans:
Missing data can be handled by removing the affected rows or columns when only a few values are missing, or by filling the gaps with the mean, median or mode. In time series, forward or backward filling is common. More advanced methods include predicting the missing values with a model or adding an indicator that flags which values were missing.
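As a rough illustration of the simpler strategies above, here is a minimal sketch in plain Python. The function names `fill_with_mean` and `forward_fill` and the sample readings are hypothetical, and `None` stands in for a missing value; in practice a library such as Pandas would usually be used instead.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to impute from
    avg = mean(observed)
    return [avg if v is None else v for v in values]

def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical sample data with three missing readings.
readings = [10.0, None, 14.0, None, None, 18.0]
mean_filled = fill_with_mean(readings)
ffilled = forward_fill(readings)
```

Mean filling suits values with no natural order, while forward filling assumes the most recent observation is still the best estimate, which is why it is mostly used for time series.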
3. Describe how a database and a data warehouse differ from one another.
Ans:
A database stores current transactional data for quick operations while a data warehouse holds large volumes of historical, aggregated data designed for analysis, reporting and business intelligence.
4. What is the significance of data cleaning in data analysis?
Ans:
Data cleaning removes errors and inconsistencies so that the dataset is accurate and reliable. This step is essential to avoid misleading results and to support trustworthy business decisions.
5. What is data normalization and why is it important?
Ans:
Data normalization organizes the data in a relational database to reduce duplication and improve integrity. It splits data into related tables so that each fact is stored once, which keeps the data consistent and supports efficient querying.
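The idea can be sketched in plain Python by splitting a flat list of order rows into two related tables. All field names and sample rows here are hypothetical, and a real database would do this with separate tables and keys rather than in-memory dictionaries.

```python
# Flat, denormalized rows: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "customer_email": "ana@example.com", "customer_name": "Ana", "amount": 25.0},
    {"order_id": 2, "customer_email": "ana@example.com", "customer_name": "Ana", "amount": 40.0},
    {"order_id": 3, "customer_email": "ben@example.com", "customer_name": "Ben", "amount": 15.0},
]

customers = {}  # keyed by email; each customer is stored once
orders = []     # each order refers to a customer by customer_id only

for row in flat_orders:
    email = row["customer_email"]
    if email not in customers:
        customers[email] = {
            "customer_id": len(customers) + 1,
            "name": row["customer_name"],
        }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": customers[email]["customer_id"],
        "amount": row["amount"],
    })
```

After the split, a change to a customer's name touches one record instead of every order that customer placed.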
6. How do you create a pivot table in Excel?
Ans:
To create a pivot table in Excel, select your data range, go to the “Insert” tab and click “PivotTable.” Then drag fields into Rows, Columns, Values and Filters to dynamically summarize your data.
7. Can you explain what a join is in SQL and the different types of joins?
Ans:
A SQL join combines rows from two or more tables based on related columns. INNER JOIN returns only matching records; LEFT JOIN returns all records from the left table plus any matches from the right, with NULLs where there is no match; RIGHT JOIN does the same starting from the right table; FULL OUTER JOIN returns all records from both tables; SELF JOIN joins a table with itself; CROSS JOIN returns every possible combination of rows.
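To make the differences concrete, the following sketch runs three of these joins against an in-memory SQLite database using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical. RIGHT JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None) when there is no order.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

conn.close()
```

Here the customer without orders appears only in the LEFT JOIN result, paired with a missing amount.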
8. What is data visualization and why is it important in data analysis?
Ans:
Data visualization presents data through charts, graphs and dashboards. It helps quickly spot trends and patterns, making complex data easier to understand and decisions faster to make.
9. How do you perform data validation?
Ans:
Data validation ensures data accuracy by checking values against expected formats, ranges and business rules. It can be done using Excel functions, SQL constraints or scripts, or by cross-verifying against the original data sources.
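A script-based check might look like the sketch below, which applies one format check, one range check and one rule check to a single record. The field names, the age limits and the deliberately loose email pattern are all assumptions made for illustration.

```python
import re

# Loose format check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record):
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []

    # Format check.
    if not EMAIL_PATTERN.match(record.get("email", "")):
        errors.append("email: invalid format")

    # Range check (assumed limits of 0 to 120).
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age: out of range")

    # Rule check: ISO dates (YYYY-MM-DD) compare correctly as strings.
    if record.get("end_date", "") < record.get("start_date", ""):
        errors.append("dates: end before start")

    return errors

good = {"email": "ana@example.com", "age": 34,
        "start_date": "2024-01-01", "end_date": "2024-06-30"}
bad = {"email": "not-an-email", "age": 150,
       "start_date": "2024-06-30", "end_date": "2024-01-01"}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

Collecting every error instead of stopping at the first one makes it easier to report all the problems in a record at once.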
10. Explain the concept of data modeling.
Ans:
Data modeling defines how data is structured in a database by specifying tables, fields, relationships and constraints. It ensures data is stored logically and consistently for efficient querying.
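As a small sketch of what such a model looks like in practice, the example below defines two related tables with constraints in an in-memory SQLite database through Python's built-in `sqlite3` module. The `department` and `employee` tables, their columns and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0),
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Analytics');
    INSERT INTO employee VALUES (1, 'Ana', 5000.0, 1);
""")

def try_insert(sql):
    """Run an INSERT and report whether the model's constraints accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Violates the relationship: department 99 does not exist.
bad_fk = try_insert("INSERT INTO employee VALUES (2, 'Ben', 4000.0, 99)")
# Violates the CHECK constraint: salary cannot be negative.
bad_check = try_insert("INSERT INTO employee VALUES (3, 'Cy', -1.0, 1)")
# Satisfies every constraint.
ok = try_insert("INSERT INTO employee VALUES (4, 'Dee', 4500.0, 1)")

conn.close()
```

Because the rules live in the schema itself, invalid rows are rejected no matter which application tries to write them.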