1. What makes data science different from traditional data analysis?
Ans:
Data Science goes beyond analyzing past data by using advanced tools like Python, machine learning and statistics to uncover insights and predict future trends. Unlike traditional analysis, which focuses mostly on reporting historical results, Data Science solves complex problems and supports decision-making with predictive models and algorithms.
2. How is supervised learning different from unsupervised learning?
Ans:
Supervised learning works with labeled data, meaning both input and correct output are known, similar to learning under guidance. Conversely, unsupervised learning works with unlabeled data and identifies hidden patterns or clusters, like grouping similar customers without prior knowledge of categories.
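A minimal sketch of the contrast, using scikit-learn with made-up toy data: the classifier is given labels, while KMeans must discover the two groups on its own.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: two obvious groups of 1-D points
X = [[1], [2], [8], [9]]

# Supervised: correct outputs (labels) are provided up front
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[1.5]])[0]  # a new point near the first group

# Unsupervised: no labels; KMeans finds the clusters itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignments discovered from the data
```

Note that KMeans returns arbitrary cluster IDs (0 or 1 in either order) because it was never told which group is which, which is exactly the labeled/unlabeled distinction.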
3. What is overfitting in machine learning and how can it be avoided?
Ans:
Overfitting happens when a model learns the training data too precisely, including its noise, which reduces accuracy on new data. To prevent it, techniques like cross-validation, simplifying the model or using regularization are applied, helping the model generalize well to unseen data.
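Two of those techniques can be combined in a few lines of scikit-learn; the synthetic data below is purely illustrative. Ridge adds an L2 regularization penalty, and cross-validation scores the model only on held-out folds it never trained on.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=50)  # signal + small noise

# Regularization: the alpha penalty shrinks coefficients,
# discouraging the model from chasing noise
model = Ridge(alpha=1.0)

# Cross-validation: 5 train/test splits, each scored on unseen data
scores = cross_val_score(model, X, y, cv=5)
```

A large gap between training score and cross-validated score is the classic symptom of overfitting.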
4. Can you explain the bias-variance tradeoff?
Ans:
Bias is the error from wrong assumptions in a model, while variance is the error from being too sensitive to small changes in data. A balanced model avoids high bias (underfitting) and high variance (overfitting), ensuring accurate predictions for new datasets.
5. What are the main differences between R and Python in data science?
Ans:
Python is versatile, great for machine learning, application development and production-level solutions. R, however, excels at statistical analysis, visualization and academic research, making it ideal for detailed statistical computations and reporting.
6. How do you handle missing data in datasets?
Ans:
Missing data can be managed by removing incomplete rows, filling gaps with mean, median or mode values or using techniques like interpolation or predictive modeling. Proper handling ensures models learn accurately and avoid introducing bias.
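The two simplest strategies look like this in pandas, on a small made-up table: dropping incomplete rows versus filling numeric gaps with the median and categorical gaps with the mode.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["NY", "LA", None, "NY"],
})

# Option 1: remove any row containing a missing value
dropped = df.dropna()

# Option 2: impute - median for numbers, mode for categories
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
```

Dropping is safe when few rows are affected; imputation preserves sample size but can bias the distribution if the data is not missing at random, which is why the answer above stresses careful handling.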
7. What is feature engineering?
Ans:
Feature engineering involves creating new input variables from existing data to improve model performance. It includes cleaning, transforming and combining features so that models can better capture patterns and make more accurate predictions.
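A small pandas sketch with hypothetical customer columns: deriving a ratio and a date part, two of the most common engineered features.

```python
import pandas as pd

df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-05", "2023-06-20"]),
    "total_spend": [200.0, 50.0],
    "n_orders": [4, 2],
})

# Combining features: spend per order often predicts better than either raw column
df["avg_order_value"] = df["total_spend"] / df["n_orders"]

# Transforming features: extract the month to capture seasonality
df["signup_month"] = df["signup_date"].dt.month
```

The model never sees "seasonality" or "customer value" directly; engineered columns like these are how that domain knowledge reaches it.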
8. What is the difference between classification and regression problems?
Ans:
Classification predicts categorical outcomes, such as whether an email is spam or not, while regression predicts continuous values like house prices or temperature. Both are supervised learning tasks, but the choice depends on whether the target variable is categorical or numeric.
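The same algorithm family often handles both tasks; what changes is the target. A toy scikit-learn sketch (labels and prices invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [8], [9]]

# Classification: the target is a category
clf = DecisionTreeClassifier(random_state=0).fit(X, ["spam", "spam", "ham", "ham"])
label = clf.predict([[1.5]])[0]  # a category, not a number

# Regression: the target is a continuous value
reg = DecisionTreeRegressor(random_state=0).fit(X, [100.0, 110.0, 300.0, 320.0])
price = reg.predict([[9]])[0]  # a number on a continuous scale
```

Checking whether the target column is categorical or numeric is usually the first step in choosing between the two.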
9. What is a confusion matrix in classification?
Ans:
A confusion matrix shows how well a classification model performs by comparing actual versus predicted outcomes. It breaks predictions into true positives, true negatives, false positives and false negatives, helping to assess accuracy and error types.
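With scikit-learn the four cells fall out directly; the labels below are a made-up example of six predictions.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1]  # model's predictions

# Rows are actual classes, columns are predicted classes;
# ravel() flattens the 2x2 matrix into the four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```

Here the model made one false positive (predicted 1, actually 0) and one false negative (predicted 0, actually 1), the two error types the matrix separates.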
10. What are precision and recall?
Ans:
Precision shows the percentage of correctly predicted positive cases out of all predicted positives, while recall measures how many actual positive cases the model identified correctly. Together, they indicate a model's effectiveness in making accurate predictions.
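Both metrics follow directly from the confusion-matrix counts; a short scikit-learn sketch on the same kind of made-up labels:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

# Precision = TP / (TP + FP): of everything predicted positive, how much was right?
precision = precision_score(y_true, y_pred)

# Recall = TP / (TP + FN): of all actual positives, how many were found?
recall = recall_score(y_true, y_pred)
```

Here both work out to 2/3: two true positives against one false positive (hurting precision) and one false negative (hurting recall). Which metric matters more depends on whether false alarms or misses are costlier.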