1. What does a data analyst do?
Ans:
A data analyst collects, processes, and interprets data to find patterns, trends, and insights that help guide business decisions. They transform raw data into actionable intelligence through reporting, visualization, and analysis. Their goal is to help stakeholders make smart, data-driven decisions based on facts rather than assumptions.
2. How do you ensure the data you work with is accurate and trustworthy?
Ans:
I ensure data accuracy by performing thorough data validation, removing duplicates, checking for inconsistencies, and handling missing or outlier values. I use techniques like data profiling and cross-checking with known sources. I also follow data governance practices and apply business rules to maintain high-quality, reliable data.
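The checks described above can be sketched in plain Python. This is a minimal, illustrative example, not a production validation pipeline; the record structure and field names ("order_id", "amount") are hypothetical, and the business rule (amounts must be non-negative) is assumed for the sake of the demo.

```python
# A minimal sketch of basic validation checks on a hypothetical list of
# order records; field names ("order_id", "amount") are illustrative.
records = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},    # missing value
    {"order_id": 2, "amount": 95.0},    # duplicate key
    {"order_id": 3, "amount": -40.0},   # violates an assumed rule: amount >= 0
]

seen, duplicates, missing, rule_violations = set(), [], [], []
for r in records:
    if r["order_id"] in seen:
        duplicates.append(r["order_id"])
    seen.add(r["order_id"])
    if r["amount"] is None:
        missing.append(r["order_id"])
    elif r["amount"] < 0:
        rule_violations.append(r["order_id"])

print(duplicates, missing, rule_violations)  # records flagged for review
```

In practice the same idea scales up via data-profiling tools or library functions (e.g. duplicate and null checks in SQL or pandas), but the logic is the same: every record is tested against known keys and business rules before it enters analysis.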
3. What is data cleaning and why do we need it?
Ans:
Data cleaning, also known as data scrubbing, is the process of detecting and correcting errors, inconsistencies, or missing values in datasets. Clean data ensures more accurate analysis, prevents misleading results, and helps businesses make reliable decisions. Without data cleaning, even advanced models can produce flawed outputs.
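As a small illustration of scrubbing in action, the snippet below standardizes a hypothetical column of city names: it trims whitespace, unifies casing, drops blank or missing entries, and removes the duplicates that inconsistent formatting created. The data is invented for the example.

```python
# A small illustration of data scrubbing on hypothetical city names:
# trimming whitespace, unifying case, and removing duplicates.
raw = ["  New York", "new york", "CHICAGO ", "Chicago", "", None]

cleaned = []
for city in raw:
    if not city or not city.strip():      # drop missing/blank entries
        continue
    cleaned.append(city.strip().title())  # standardize formatting

unique_cities = sorted(set(cleaned))
print(unique_cities)  # ['Chicago', 'New York']
```

Before cleaning, a count of unique cities would report four or more values; after cleaning there are two, which is exactly the kind of silent error that skews an analysis.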
4. What tools do you use for working with data?
Ans:
- Excel for quick summaries and simple analysis
- SQL for querying and managing databases
- Power BI or Tableau for dashboards and visuals
- Python for deeper statistical analysis and automation
5. What’s the difference between a primary key and a foreign key in SQL?
Ans:
A primary key uniquely identifies each record in a table and ensures that no duplicate or null values exist in that column. A foreign key is a field in one table that links to the primary key in another, helping to establish relationships between tables. This allows structured data storage and relational database integrity.
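One way to see both keys at work is with Python's built-in sqlite3 module. The tables and data below are illustrative (a customers table whose primary key is referenced by an orders table); note that SQLite only enforces foreign keys when the pragma is enabled.

```python
import sqlite3

# A minimal sketch of the primary/foreign key relationship; the tables
# ("customers", "orders") and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this to enforce FKs
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each customer
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
except sqlite3.IntegrityError as e:
    print("Rejected:", e)  # the foreign key constraint blocks the orphan row
```

The rejected insert is the point: the foreign key guarantees every order belongs to a real customer, which is what "relational integrity" means in practice.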
6. How do you handle missing or incomplete data in a dataset?
Ans:
I start by analyzing the extent and pattern of the missing data. Depending on the situation, I might:
- Remove the rows or columns if the missing data is minimal.
- Fill in values using the mean, median, mode, or a placeholder.
- Use predictive models or domain-specific logic to estimate missing values.
- Flag missing values for further review in reporting.
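The fill-and-flag options above can be sketched with the standard library's statistics module. The ages column is a hypothetical example, with None marking missing entries.

```python
from statistics import mean, median

# A sketch of simple imputation on a hypothetical numeric column;
# None marks missing entries.
ages = [25, 30, None, 45, None, 28]

observed = [a for a in ages if a is not None]
mean_filled   = [a if a is not None else mean(observed) for a in ages]
median_filled = [a if a is not None else median(observed) for a in ages]
flags         = [a is None for a in ages]  # keep a flag column for reporting

print(mean_filled)    # gaps filled with the mean of observed values (32)
print(median_filled)  # gaps filled with the median (29)
```

Which strategy is right depends on the data: the median is more robust to outliers, while dropping rows is only safe when the missingness is small and random.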
7. Can you explain data normalization in simple terms?
Ans:
Data normalization is the process of organizing data in a database to reduce redundancy and improve efficiency. It involves dividing large tables into smaller ones and linking them through relationships. This ensures that each piece of information is stored only once, which saves space and improves data consistency and scalability.
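A toy example makes the idea concrete. Below, a flat table that repeats department details on every employee row is split into two related structures, so each department name is stored exactly once. The table and field names are invented for illustration.

```python
# A simple illustration of normalization: a flat table that repeats
# department info is split into two linked tables (names are illustrative).
flat = [
    {"emp": "Ana",  "dept_id": 10, "dept_name": "Sales"},
    {"emp": "Ben",  "dept_id": 10, "dept_name": "Sales"},
    {"emp": "Cruz", "dept_id": 20, "dept_name": "HR"},
]

# Department details are stored once, keyed by dept_id...
departments = {row["dept_id"]: row["dept_name"] for row in flat}
# ...and employees keep only the key that links back to a department.
employees = [{"emp": row["emp"], "dept_id": row["dept_id"]} for row in flat]

print(departments)  # {10: 'Sales', 20: 'HR'}
```

If "Sales" is later renamed, the change happens in one place instead of on every employee row, which is exactly the consistency benefit normalization buys.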
8. What is a pivot table in Excel and how do you use it?
Ans:
A pivot table in Excel is a powerful tool that lets you quickly summarize and analyze large datasets. I use pivot tables to group, count, sum, or average data based on different categories or fields. It's especially useful for generating dynamic reports, making comparisons, and identifying patterns or trends in data.
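The same group-and-summarize operation can be expressed in plain Python, which helps show what a pivot table is doing under the hood. This sketch totals sales per region, analogous to dragging Region to Rows and summing Sales in Excel; the data is hypothetical.

```python
from collections import defaultdict

# A pivot-table-style summary in plain Python: total sales per region,
# analogous to Region in Rows and Sum of Amount in Values in Excel.
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 150},
]

totals = defaultdict(int)
for row in sales:
    totals[row["region"]] += row["amount"]

print(dict(totals))  # {'East': 250, 'West': 250}
```

Excel (and libraries like pandas) add interactivity and multiple aggregation functions on top, but the core operation is this grouping step.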
9. What’s the difference between correlation and causation?
Ans:
Correlation occurs when two variables show a relationship or move together in some way; for example, as one increases, the other might also increase. Causation, on the other hand, means that one variable directly causes a change in the other. Just because two things are correlated doesn’t mean one causes the other; this distinction is key in analysis.
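A quick numeric sketch: the Pearson correlation coefficient measures how strongly two series move together, on a scale from -1 to 1. The series below (the classic ice-cream-and-drownings example, with made-up numbers) are perfectly correlated, yet neither causes the other; both are driven by a third factor, summer heat.

```python
from math import sqrt

# Pearson correlation computed by hand for two illustrative series.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ice_cream_sales = [10, 20, 30, 40]   # made-up monthly figures
drownings       = [1, 2, 3, 4]       # moves with sales, but caused by summer heat
print(round(pearson(ice_cream_sales, drownings), 2))  # 1.0: correlated, not causal
```

A coefficient of 1.0 says only that the series move together; establishing causation requires controlled experiments or causal-inference methods, not a correlation number.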
10. Why is data visualization important?
Ans:
Data visualization transforms complex data into visual formats like charts, graphs, or dashboards, making it easier to understand and interpret. It helps non-technical stakeholders grasp trends, outliers, and key insights at a glance. Effective visualizations also support faster decision-making, better storytelling, and clear communication of the value of data.
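Real dashboards are built in tools like Power BI or Tableau (mentioned above), but even a tiny text-based chart shows why visuals beat raw numbers: the biggest and smallest quarters jump out immediately. The revenue figures are hypothetical.

```python
# A toy text bar chart over hypothetical quarterly revenue; real work
# would use Power BI, Tableau, or a plotting library instead.
revenue = {"Q1": 4, "Q2": 7, "Q3": 3, "Q4": 9}

lines = [f"{quarter} | {'#' * value} {value}" for quarter, value in revenue.items()]
chart = "\n".join(lines)
print(chart)
```

Scanning the bars, Q4 is clearly the peak and Q3 the trough; spotting that in a table of figures takes noticeably longer, and the gap widens as datasets grow.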