1. What distinguishes data analytics from data science?
Ans:
- Data Science focuses on creating predictive models and using algorithms to forecast future trends.
- Data Analytics primarily involves examining historical data to extract insights and patterns.
- While Data Science is broader and includes machine learning, Data Analytics is more about understanding past performance.
2. What are the key responsibilities of a Data Analyst?
Ans:
- Gather data from various sources, then clean and organize it.
- Analyze the data to find trends, patterns and useful insights.
- Prepare reports and visualizations to help teams make informed decisions.
3. How should a dataset's missing data be handled?
Ans:
- Fill missing values using statistical methods like mean, median or mode.
- Remove records with missing information if they are few and don’t affect results.
- Predict missing values using algorithms based on other related data points.
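The first two options above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming missing values are represented as `None`; the helper names `impute` and `drop_incomplete` and the sample data are hypothetical, not part of any library.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    # Pick the statistic named by `strategy` and compute it once.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

def drop_incomplete(rows):
    """Keep only the rows (dicts) that have no missing (None) fields."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Hypothetical sample data for illustration only.
ages = [25, None, 31, 40, None, 31]
filled_mean = impute(ages, "mean")      # gaps filled with 31.75
filled_median = impute(ages, "median")  # gaps filled with 31.0
filled_mode = impute(ages, "mode")      # gaps filled with 31

records = [{"id": 1, "age": 25}, {"id": 2, "age": None}]
complete = drop_incomplete(records)     # only the record with id 1 remains
```

The median is usually the safer fill for skewed numeric columns, while the mode suits categorical ones.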
4. What distinguishes structured data from unstructured data?
Ans:
Structured data is neatly organized in tables with rows and columns, like Excel sheets or databases. Unstructured data includes text files, images, videos and emails that do not have a fixed format.
5. What are the main phases of a data analysis project?
Ans:
The process generally includes understanding the problem, gathering relevant data, cleaning and preparing it, performing analysis, creating visual reports and sharing findings with stakeholders.
6. What differentiates supervised from unsupervised learning?
Ans:
Supervised learning uses labeled data to train models for prediction, while unsupervised learning identifies hidden patterns in unlabeled data without predefined categories.
7. What is cross-validation in machine learning?
Ans:
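Cross-validation splits the data into several parts (folds) and trains and tests the model multiple times, each time holding out a different fold for testing. Averaging the results gives a more reliable estimate of the model's performance on unseen data and helps detect overfitting.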
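A minimal sketch of k-fold cross-validation in plain Python might look like the following. The function names `k_fold_splits` and `cross_validate`, the mean-predictor "model" and the sample data are all hypothetical and chosen only to keep the example self-contained; in practice the data is usually shuffled before splitting.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, return all k scores."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores

# Hypothetical baseline "model": always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_absolute_error(model, xs, ys):
    return sum(abs(model - y) for y in ys) / len(ys)

xs = list(range(6))
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
scores = cross_validate(xs, ys, 3, fit_mean, mean_absolute_error)
average_error = sum(scores) / len(scores)
```

A large gap between the per-fold scores, or between training and held-out error, is a sign that the model does not generalize well.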
8. What information does a confusion matrix provide?
Ans:
It compares actual versus predicted results in classification tasks, showing counts of true positives, true negatives, false positives and false negatives. These counts are the basis for metrics such as accuracy, precision and recall.
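As a rough illustration for the binary case, the four counts can be computed directly from two label lists. The function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

# Hypothetical labels for illustration only.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

c = confusion_counts(actual, predicted)
accuracy = (c["tp"] + c["tn"]) / len(actual)   # share of all predictions that are correct
precision = c["tp"] / (c["tp"] + c["fp"])      # share of predicted positives that are correct
recall = c["tp"] / (c["tp"] + c["fn"])         # share of actual positives that were found
```

Precision and recall are undefined when their denominators are zero, so real code would need to guard against that case.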
9. How do you choose important features from a dataset?
Ans:
Feature selection techniques include recursive feature elimination, model-based importance scoring and correlation checks (against the target and between features). Dropping weak or redundant features can improve model performance and reduce complexity.
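The correlation check is the simplest of these to sketch. The example below ranks features by the absolute Pearson correlation with the target; the names `pearson` and `rank_features` and the toy data are hypothetical, and a correlation filter like this only captures linear relationships.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def rank_features(features, target):
    """Sort feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy dataset for illustration only.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 42, 38, 42],
    "constant": [1, 1, 1, 1, 1],
}
exam_score = [52, 58, 61, 70, 74]

ranking = rank_features(features, exam_score)
top_feature = ranking[0][0]
```

Here `hours_studied` ranks first, while the constant column scores zero and is an obvious candidate for removal.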
10. Can you explain how the K-Nearest Neighbors (KNN) algorithm works?
Ans:
KNN predicts the category or value of a new data point by finding the ‘k’ closest points in the training data, typically by Euclidean distance. For classification it takes a majority vote of their labels; for regression it averages their values.
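A minimal sketch of this idea, assuming Euclidean distance and a brute-force search over all training points, is shown below. The function name `knn_predict`, its `task` parameter and the sample points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train_points, train_labels, query, k=3, task="classify"):
    """Predict for `query` from its k nearest training points."""
    # Sort all training points by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    labels = [label for _, label in neighbors]
    if task == "classify":
        # Majority vote among the neighbors' labels.
        return Counter(labels).most_common(1)[0][0]
    # Regression: average the neighbors' numeric values.
    return sum(labels) / len(labels)

# Hypothetical 2-D points in two clusters, for illustration only.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6, 5), k=3)

# Regression variant on 1-D inputs.
xs = [(1,), (2,), (3,), (10,)]
values = [10.0, 20.0, 30.0, 100.0]
estimate = knn_predict(xs, values, (2.2,), k=2, task="regress")
```

An odd `k` avoids tied votes in two-class problems, and features are normally scaled first so that no single feature dominates the distance.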