1. What are the most common tools used by data analysts?
Ans:
Data analysts work with a range of tools depending on the type of task and business need. Microsoft Excel is commonly used for basic tasks like filtering, sorting and creating simple charts. Visualization tools such as Power BI and Tableau are popular for building interactive dashboards and reports. Programming languages such as R and Python are used for more complex analysis: performing statistical operations, automating processes and handling large datasets.
2. How should a dataset’s missing data be handled?
Ans:
An essential part of data preparation is handling missing data. If only a small amount of data is missing and it has little bearing on the final results, the affected rows can be safely removed. Otherwise, missing values are often filled in using summary statistics such as the mean, median or mode, or by applying forward fill or backward fill for time-based data. In more complex cases, machine learning techniques are used to estimate the missing values, or the gaps are flagged for further investigation.
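As a brief illustrative sketch (assuming Python with pandas, which the section already names as a common analyst tool, and a hypothetical DataFrame with a "sales" column), these approaches might look like:

```python
import pandas as pd
import numpy as np

# Hypothetical dataset with gaps (for illustration only)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "sales": [100.0, np.nan, 120.0, np.nan, 150.0, 160.0],
})

# Option 1: drop rows when the missing data is negligible
dropped = df.dropna(subset=["sales"])

# Option 2: fill with a summary statistic such as the mean
filled_mean = df.assign(sales=df["sales"].fillna(df["sales"].mean()))

# Option 3: forward/backward fill for time-ordered data
filled_time = df.assign(sales=df["sales"].ffill().bfill())
```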
3. Describe how a database and a data warehouse differ from one another.
Ans:
A database is designed for day-to-day operations and stores current, transactional data used in applications like banking or e-commerce. It allows quick data retrieval and updates. In contrast, a data warehouse is built for long-term storage and analysis of historical data collected from various sources. It supports advanced queries, reporting and business intelligence tasks, making it ideal for trend analysis and forecasting.
4. What is the significance of data cleaning in data analysis?
Ans:
Data cleaning is essential to ensure the information used in analysis is reliable, consistent and free from errors. Inaccurate or duplicated data can distort results and lead to poor decisions. Cleaning the data enhances the quality of insights, strengthens report accuracy and ensures that analysis supports decision making within the organization.
5. What is data normalization and why is it important?
Ans:
Data normalization is the process of organizing data in a relational database to minimize redundancy and maintain data integrity. It involves splitting large tables into smaller related ones and linking them using foreign keys. This approach ensures that the data remains consistent, avoids duplication and improves the performance and clarity of database queries.
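A minimal sketch of this idea, using Python's built-in sqlite3 module and hypothetical customers/orders tables (names chosen purely for illustration): customer details are stored once and referenced from orders through a foreign key instead of being repeated on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Customer details live in one place...
conn.execute("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT
)""")

# ...and each order only stores a reference to the customer
conn.execute("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL
)""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'asha@example.com')")
conn.execute("INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 90.0)")
```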
6. How do you create a pivot table in Excel?
Ans:
To create a pivot table in Excel, first select the dataset you wish to examine. Next, go to the Insert tab and choose PivotTable. You can place it in a new worksheet or the existing one. Once created, you can drag fields into the Rows, Columns, Values and Filters areas to generate summaries, compare metrics and gain insights quickly without writing formulas.
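For comparison only (this is not part of Excel itself), the same kind of summary can be built programmatically; here is a hedged sketch using pandas with invented region/product/sales columns:

```python
import pandas as pd

# Hypothetical sales data (columns invented for illustration)
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "sales":   [100, 150, 200, 120],
})

# Rough equivalent of dragging region to Rows, product to Columns,
# and sales to Values with a Sum aggregation
summary = pd.pivot_table(df, index="region", columns="product",
                         values="sales", aggfunc="sum")
print(summary)
```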
7. Can you explain what a join is in SQL and the different types of joins?
Ans:
A join in SQL is used to combine information from two or more tables based on a related column. An INNER JOIN returns only records with matching values in both tables. A LEFT JOIN returns all records from the left table and matched data from the right. A RIGHT JOIN does the opposite, bringing all from the right and matching from the left. A SELF JOIN connects a table to itself, while a CROSS JOIN pairs every row from one table with every row from another, producing a Cartesian product.
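A small runnable sketch of INNER and LEFT joins, using Python's sqlite3 with hypothetical customers/orders tables similar to the normalization example above (RIGHT and CROSS joins follow the same syntax; RIGHT JOIN support depends on the SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (101, 1, 250.0);
""")

# INNER JOIN: only customers who have at least one matching order
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
""").fetchall()   # [('Asha', 250.0)]

# LEFT JOIN: every customer, with NULL where no order matches
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
""").fetchall()   # [('Asha', 250.0), ('Ben', None)]
```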
8. What is data visualization and why is it important in data analysis?
Ans:
Data visualization involves representing data in visual formats like charts, graphs or dashboards. It allows analysts and business users to quickly see trends, spot anomalies and understand relationships within data. By making complex information easier to interpret and share, visualization supports quicker, clearer and more effective decision-making.
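As an illustrative sketch (assuming Python with matplotlib and a made-up monthly sales series), even a basic chart can surface a trend at a glance:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, for illustration only
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 150, 145, 170, 190]

plt.plot(months, sales, marker="o")
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.tight_layout()
plt.show()
```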
9. How do you perform data validation?
Ans:
Data validation ensures that data meets the required quality standards before analysis. It involves checking that data follows correct formats, is within expected ranges and adheres to business rules. Tools like Excel formulas, SQL constraints, scripts or ETL platforms help perform these checks. Cross-referencing the data with original sources is also a key step in confirming its accuracy and consistency.
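A hedged sketch of such checks in Python with pandas, using invented column names (order_id, amount, email) and rules chosen only for illustration:

```python
import pandas as pd

# Hypothetical incoming records (values chosen to trip the checks)
df = pd.DataFrame({
    "order_id": [101, 102, 102],
    "amount":   [250.0, -5.0, 90.0],
    "email":    ["asha@example.com", "not-an-email", "ben@example.com"],
})

# Format check: a very rough email pattern
bad_format = ~df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Range / business-rule check: amounts must be positive
out_of_range = df["amount"] <= 0

# Uniqueness check: order_id should not repeat
duplicates = df["order_id"].duplicated(keep=False)

issues = df[bad_format | out_of_range | duplicates]
print(issues)   # rows failing at least one validation rule
```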
10. Explain the concept of data modeling.
Ans:
Data modeling is the process of designing the logical structure of a database. It defines how data is stored, organized and connected through tables, fields and relationships. Good data modeling ensures consistency, accuracy and efficient querying, helping developers and analysts maintain data integrity and streamline access to information across systems.