1. What is Data Science?
Ans:
Data Science is the process of using data to understand problems, discover patterns and support better decisions. It blends math, programming, statistics and domain knowledge to turn raw data into meaningful insights.
2. What are the essential elements of Data Science?
Ans:
The key parts of Data Science include collecting data from sources, cleaning messy or missing values, analyzing the data, building models to make predictions and interpreting the results to guide decisions or actions.
3. What is a confusion matrix?
Ans:
A confusion matrix is a tool used to measure how well a classification model performs. It shows how many predictions were correct or incorrect, broken down into true positives, true negatives, false positives and false negatives.
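As a minimal sketch, assuming binary labels where `1` marks the positive class, the four cells of a confusion matrix can be counted in plain Python. The function name `confusion_counts` and the sample labels are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

In practice, scikit-learn's `sklearn.metrics.confusion_matrix` returns the same counts; for labels 0 and 1 it lays them out as `[[TN, FP], [FN, TP]]`.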
4. What are some common metrics used to evaluate model performance?
Ans:
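Common evaluation metrics include accuracy (the share of all predictions that are correct), precision (the share of positive predictions that are actually positive), recall (the share of actual positives the model finds), F1 score (the harmonic mean of precision and recall) and ROC-AUC (the model’s ability to separate classes across all decision thresholds).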
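The formulas behind these metrics can be sketched directly from confusion-matrix counts. This is a minimal illustration, not a library API: returning 0.0 when a denominator is zero is an assumed convention, and `roc_auc` uses the pairwise definition of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting half).

```python
def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

print(classification_metrics(tp=3, tn=3, fp=1, fn=1))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))  # 0.75
```

The pairwise loop is O(positives × negatives), which is fine for illustration; libraries such as scikit-learn (`roc_auc_score`) compute the same quantity more efficiently.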
5. What is feature engineering?
Ans:
Feature engineering is the process of creating new features or transforming existing ones so that the data better represents the problem. It helps machine learning models make better predictions by highlighting the most useful information in the data.
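A minimal sketch of three common feature-engineering moves (a ratio, a date turned into an age, and a binary flag), assuming a hypothetical housing record with `price`, `area_sqm` and `built_year` fields. The field names, the reference year and the 100 m² threshold are all illustrative assumptions.

```python
def engineer_features(record, reference_year=2024):
    """Return a copy of `record` with derived features added.

    Field names and thresholds are hypothetical, chosen for illustration.
    """
    features = dict(record)
    # Ratio feature: normalizes price by size.
    features["price_per_sqm"] = record["price"] / record["area_sqm"]
    # Transformation: a raw year becomes a more directly useful age.
    features["age_years"] = reference_year - record["built_year"]
    # Binary flag derived from a numeric column.
    features["is_large"] = 1 if record["area_sqm"] >= 100 else 0
    return features

house = {"price": 300000, "area_sqm": 120, "built_year": 2004}
print(engineer_features(house))
# {'price': 300000, 'area_sqm': 120, 'built_year': 2004,
#  'price_per_sqm': 2500.0, 'age_years': 20, 'is_large': 1}
```

Which derived features actually help depends on the model and the domain; they are usually validated by checking whether model performance improves on held-out data.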
6. How do you handle missing data?
Ans:
Missing data can be managed by removing rows or columns, filling values using the mean, median or mode, using models that can handle missing values or predicting missing entries using other available information.
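Two of these strategies, dropping incomplete rows and filling with the mean, median or mode, can be sketched with Python's standard `statistics` module. The row format (a list of dicts with `None` for missing values) and the helper names `drop_incomplete` and `impute` are assumptions made for this example.

```python
import statistics

def drop_incomplete(rows):
    """Remove any row that has at least one missing (None) value."""
    return [r for r in rows if None not in r.values()]

def impute(rows, column, strategy="mean"):
    """Fill missing values in `column` using a statistic of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # works for categorical values too
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

rows = [
    {"age": 30, "city": "Paris"},
    {"age": None, "city": "Lyon"},
    {"age": 40, "city": None},
    {"age": 20, "city": "Paris"},
]
print(len(drop_incomplete(rows)))             # 2
print(impute(rows, "age", "median")[1])       # {'age': 30, 'city': 'Lyon'}
print(impute(rows, "city", "mode")[2])        # {'age': 40, 'city': 'Paris'}
```

With pandas, the equivalent operations are `DataFrame.dropna` and `DataFrame.fillna`.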
7. What is overfitting and how can it be prevented?
Ans:
A model is said to be overfit when it performs well on training data but fails on new data, because it has learned too many details of the training set, including noise. To prevent it, use simpler models, regularization or more training data, and use cross-validation to detect it early.
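To make the training-versus-validation gap concrete, here is a minimal k-fold cross-validation sketch. It assumes a hypothetical one-dimensional dataset and uses a 1-nearest-neighbor model as a deliberately overfit "memorizer": it scores zero error on the data it was trained on, but not on held-out folds. The helper names are illustrative, and the folds are contiguous for simplicity; real data would normally be shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def one_nn_predict(train_x, train_y, x):
    # Memorizes the training set: return the y of the closest training x.
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def cross_validate(xs, ys, k):
    """Average training and validation MSE of the 1-NN model over k folds."""
    train_errors, val_errors = [], []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        train_errors.append(mse(ty, [one_nn_predict(tx, ty, x) for x in tx]))
        val_y = [ys[i] for i in val_idx]
        val_errors.append(mse(val_y, [one_nn_predict(tx, ty, xs[i]) for i in val_idx]))
    return sum(train_errors) / k, sum(val_errors) / k

# Hypothetical data: y = 2x plus fixed "noise".
xs = list(range(10))
noise = [0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 0.3, -0.4, 0.0]
ys = [2 * x + e for x, e in zip(xs, noise)]
train_err, val_err = cross_validate(xs, ys, k=5)
print(train_err, val_err)  # training error is 0.0; validation error is not
```

A large gap between the two averages is the usual warning sign of overfitting.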
8. What is a random forest and how does it work?
Ans:
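Random forest is a machine learning technique that combines many decision trees. Each tree is trained on a random bootstrap sample of the data and considers only a random subset of features when splitting, and the forest combines the trees' predictions by majority vote (classification) or averaging (regression). This typically improves accuracy and reduces the chance of overfitting compared with a single tree.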
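The sketch below shows the two sources of randomness (bootstrap rows and random feature subsets) plus majority voting. To stay short, it makes a simplifying assumption: every "tree" is a depth-1 decision stump, whereas real random forests grow much deeper trees. The function names, `n_trees`, `max_features` and the toy data are all hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) among feature_ids
    that misclassifies the fewest training rows."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of features this tree may split on.
        features = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    # Majority vote across all trees.
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]

rows = [(1, 1), (2, 2), (3, 1), (7, 8), (8, 9), (9, 7)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

For real work, a library implementation such as scikit-learn's `RandomForestClassifier` handles deep trees, per-split feature sampling and many other details.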
9. Describe the steps in the Data Science workflow.
Ans:
The workflow starts with defining the problem, then collecting and cleaning data, exploring patterns, and building and testing models, and finally deploying the solution and monitoring its performance so it can be improved over time.
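The workflow can be sketched as a chain of small functions, one per stage. Everything here is hypothetical: the dataset (hours studied versus exam score), the simple least-squares line used as the "model", the ordered hold-out split, and the monitoring rule in `needs_retraining` are illustrative choices, not a prescribed pipeline.

```python
def collect():
    # Hypothetical raw data: (hours_studied, exam_score); None marks a missing score.
    return [(1, 52), (2, 55), (3, None), (4, 64), (5, 68), (6, 71), (7, 77), (8, 80)]

def clean(rows):
    # Drop rows with any missing value.
    return [(x, y) for x, y in rows if x is not None and y is not None]

def explore(rows):
    ys = [y for _, y in rows]
    return {"n": len(rows), "min_score": min(ys), "max_score": max(ys)}

def fit_line(rows):
    # Ordinary least squares for score = a + b * hours.
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    return my - b * mx, b

def mse(model, rows):
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)

def needs_retraining(live_mse, baseline_mse, tolerance=1.5):
    # Assumed monitoring rule: retrain when live error exceeds tolerance x baseline.
    return live_mse > tolerance * baseline_mse

rows = clean(collect())
summary = explore(rows)
train, test = rows[:5], rows[5:]  # simple ordered hold-out split
model = fit_line(train)
test_mse = mse(model, test)
print(summary, model, test_mse)
```

After deployment, the same `mse` check would be run on fresh labeled data and compared with `test_mse` as the baseline.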
10. How do you ensure the quality of your data?
Ans:
To ensure data quality, remove duplicates, correct errors, handle missing values, standardize formats and verify the trustworthiness of data sources. Clean and reliable data leads to better model performance.
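Several of these checks can be sketched together: standardize formats first, then remove duplicates, then validate. The record fields, the deliberately loose email pattern and the 0 to 120 age range are assumptions chosen for illustration.

```python
import re

def standardize(record):
    """Normalize formats so equivalent values compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
        "age": int(record["age"]),
    }

def validate(record):
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    # Loose sanity check (an assumption), not a full email validator.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]):
        errors.append("invalid email")
    if not 0 <= record["age"] <= 120:
        errors.append("age out of range")
    return errors

def clean_records(raw):
    seen, accepted, rejected = set(), [], []
    for rec in map(standardize, raw):
        if rec["email"] in seen:
            continue  # duplicate once formats are standardized
        seen.add(rec["email"])
        problems = validate(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            accepted.append(rec)
    return accepted, rejected

raw = [
    {"email": " Ana@Example.com ", "country": "fr", "age": "34"},
    {"email": "ana@example.com", "country": "FR", "age": "34"},   # duplicate
    {"email": "bob@example", "country": "us", "age": "41"},       # bad email
    {"email": "cy@example.org", "country": "de", "age": "150"},   # bad age
]
accepted, rejected = clean_records(raw)
print(accepted)                    # [{'email': 'ana@example.com', 'country': 'FR', 'age': 34}]
print([p for _, p in rejected])    # [['invalid email'], ['age out of range']]
```

Standardizing before deduplicating matters here: the first two records only match once whitespace and letter case are normalized.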