1. What are the key duties of a Data Analyst?
Ans:
A data analyst collects, processes, and interprets data to help organizations make informed decisions. They clean datasets, generate reports, and use analytical tools to identify trends and extract meaningful insights.
2. How proficient are you with SQL? Can you share a basic SQL query to fetch data from a database?
Ans:
I’m proficient in SQL and can write effective queries. Example: SELECT name, age FROM employees WHERE department = 'Sales'; This fetches names and ages from the Sales department.
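To try the query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the three rows are assumptions made up for illustration; only the SELECT statement comes from the answer.

```python
import sqlite3

# Hypothetical in-memory table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", 31, "Sales"), ("Ben", 45, "Finance"), ("Cleo", 28, "Sales")],
)

# Same query as in the answer, with the department passed as a parameter
# and an ORDER BY added so the result order is predictable.
sales_rows = conn.execute(
    "SELECT name, age FROM employees WHERE department = ? ORDER BY name",
    ("Sales",),
).fetchall()
print(sales_rows)
conn.close()
```

Passing the department as a parameter instead of pasting it into the SQL string is a common habit that avoids quoting mistakes.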
3. What measures do you take to maintain data quality during analysis?
Ans:
I look for missing values, duplicates, incorrect data types, and outliers, applying validation checks. Additionally, I cross-verify data with original sources to ensure accuracy.
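As a rough sketch, the checks listed above could look like this in plain Python. The records, the field names, and the 0 to 120 age range are assumptions for illustration. The range rule is a simple stand-in for outlier detection, not a statistical method.

```python
# Hypothetical records; ids and ages are invented for illustration.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},     # exact duplicate of the previous record
    {"id": 4, "age": "41"},   # number stored as text
    {"id": 5, "age": 31},
    {"id": 6, "age": 240},    # implausible value
]

# Missing values: ids of records with no age.
missing = [r["id"] for r in records if r["age"] is None]

# Duplicates: ids of records whose (id, age) pair was already seen.
seen = set()
duplicates = []
for r in records:
    key = (r["id"], r["age"])
    if key in seen:
        duplicates.append(r["id"])
    seen.add(key)

# Incorrect data types: age present but not an integer.
wrong_type = [
    r["id"] for r in records
    if r["age"] is not None and not isinstance(r["age"], int)
]

# Outliers, using an assumed valid range of 0 to 120 years.
out_of_range = [
    r["id"] for r in records
    if isinstance(r["age"], int) and not 0 <= r["age"] <= 120
]

print(missing, duplicates, wrong_type, out_of_range)
```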
4. What is data cleaning, and why is it important?
Ans:
Data cleaning involves fixing or removing incorrect, inconsistent, or incomplete data. It is crucial for ensuring the reliability of analysis results and drawing valid conclusions.
5. Which tools do you typically use for data analysis?
Ans:
Common tools include Excel, SQL, Python (Pandas, NumPy), R, Power BI, Tableau, and Google Sheets.
6. How do a primary key and a foreign key differ in SQL?
Ans:
- A primary key uniquely identifies each record in a table.
- A foreign key links one table to another by referencing the primary key of the related table.
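The difference can be seen in a small sketch with Python's sqlite3 module. The customers and orders tables and the helper function rejected are hypothetical names chosen for illustration. Note that SQLite enforces foreign keys only after the PRAGMA shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# customer_id is the primary key of customers and a foreign key in orders.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second customer with the same primary key value is refused.
duplicate_pk_rejected = rejected("INSERT INTO customers VALUES (1, 'Ben')")
# An order pointing at a customer that does not exist is refused.
orphan_fk_rejected = rejected("INSERT INTO orders VALUES (101, 99)")

print(duplicate_pk_rejected, orphan_fk_rejected)
conn.close()
```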
7. How do you address missing or incomplete data in a dataset?
Ans:
- Deleting incomplete records.
- Imputing missing values using mean, median, or mode.
- Applying predictive models for imputation.
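The first two options can be sketched with Python's statistics module. The list of ages is made up for illustration, and None is assumed to mark a missing value. Model-based imputation is left out of this sketch.

```python
from statistics import mean, median, mode

# Hypothetical column with two missing entries.
ages = [25, None, 30, 25, None, 40]
observed = [a for a in ages if a is not None]

# Option 1: delete incomplete records.
dropped = observed


def fill(values, replacement):
    """Replace every missing entry with the given value."""
    return [replacement if v is None else v for v in values]


# Option 2: impute with a summary statistic of the observed values.
mean_filled = fill(ages, mean(observed))
median_filled = fill(ages, median(observed))
mode_filled = fill(ages, mode(observed))

print(dropped)
print(mean_filled)
print(median_filled)
print(mode_filled)
```

The median is often preferred over the mean when the column contains extreme values, because it is less affected by them.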
8. Can you explain what data normalization means?
Ans:
Normalization organizes a database to reduce redundancy by splitting data into related tables and defining relationships via foreign keys.
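A small illustration of the idea, using plain Python structures in place of real tables. The field names and rows are assumptions. The flat list repeats each customer's name and city on every order, and the split version stores them once.

```python
# Hypothetical denormalized data: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ana", "city": "Lisbon", "amount": 50},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ana", "city": "Lisbon", "amount": 20},
    {"order_id": 3, "customer_id": 11, "customer_name": "Ben", "city": "Porto", "amount": 75},
]

customers = {}  # one entry per customer, keyed by customer_id
orders = []     # each order keeps only a reference to its customer

for row in flat_orders:
    customers[row["customer_id"]] = {"name": row["customer_name"], "city": row["city"]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],  # plays the role of a foreign key
            "amount": row["amount"],
        }
    )

print(customers)
print(orders)
```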
9. What is a pivot table, and how is it used in Excel?
Ans:
A pivot table allows quick summarization and reorganization of data by grouping, filtering, and aggregating. It is useful for analyzing large datasets like sales by region.
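Excel builds pivot tables through its interface, so the following is only a Python analogue of the same grouping and aggregation. The sales rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
sales = [
    ("North", "Widget", 100),
    ("North", "Gadget", 50),
    ("South", "Widget", 70),
    ("North", "Widget", 30),
    ("South", "Gadget", 20),
]

# Rows of the pivot are regions, columns are products, values are summed amounts.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in sales:
    pivot[region][product] += amount

# Row totals, like the "Grand Total" column of a pivot table.
region_totals = {region: sum(products.values()) for region, products in pivot.items()}

for region in sorted(pivot):
    print(region, dict(pivot[region]), region_totals[region])
```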
10. What is the difference between causation and correlation?
Ans:
Correlation means two variables tend to move together, which does not by itself show that one influences the other. Causation means a change in one variable directly produces a change in the other.
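A classic illustration is two quantities that rise together because both depend on a third factor. The sketch below computes the Pearson correlation coefficient by hand. The ice cream and sunburn figures are made up, and the hot-weather explanation is an assumption of the example, not something the numbers prove.

```python
from math import sqrt

# Hypothetical daily figures; both tend to rise on hot days.
ice_cream_sales = [200, 230, 260, 300, 310, 350]
sunburn_cases = [5, 7, 8, 11, 12, 15]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))
```

The coefficient comes out close to 1, yet selling ice cream does not cause sunburn. Establishing causation needs a controlled experiment or a careful study design, not just a high coefficient.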
11. Why is data visualization crucial in data analysis?
Ans:
Visualization makes complex data easier to understand, uncovers patterns, and helps stakeholders grasp insights quickly through charts, graphs, and dashboards.
12. How would you describe a complex data analysis project to a non-technical audience?
Ans:
I would use simple language, focus on the business impact, support explanations with visuals, and avoid jargon to highlight what the results mean for their goals.
13. What is regression analysis? Can you name some types of regression?
Ans:
Regression estimates relationships between variables:
- Linear regression predicts continuous outcomes.
- Logistic regression predicts binary outcomes.
- Multiple regression uses several predictors.
- Polynomial regression models nonlinear trends.
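As an example of the simplest case, the sketch below fits a linear regression with one predictor using the ordinary least squares formulas. The data points are invented and lie exactly on the line y = 2x + 1, so the fitted values are easy to check.

```python
# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new x of 6
print(slope, intercept, prediction)
```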
14. What does the ETL process involve?
Ans:
ETL stands for Extract, Transform, Load. The process has three steps:
- Extract data from different sources.
- Transform it to fit analysis requirements.
- Load it into a data warehouse or database.
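The three steps can be sketched end to end with the Python standard library. Everything specific here is an assumption for illustration: the CSV text stands in for a source system, the sales table stands in for a warehouse, and the function names extract, transform, and load are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with untidy region names and one bad amount.
RAW_CSV = "region,amount\n north ,100\nSouth,not_a_number\nsouth,50\n"


def extract(text):
    """Extract: read rows from the source (a CSV string in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy region names, convert amounts, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        clean.append((row["region"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

Keeping each step in its own function makes it easier to test the cleaning rules separately from the reading and writing code.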
15. How do you ensure data quality throughout the analysis?
Ans:
By performing comprehensive data cleaning, handling missing data and outliers, verifying consistency, and validating sources through cross-checking with reliable references.