1. What is Data Science?
Ans:
Data science is the practice of analyzing data to understand problems, discover patterns and support better decision-making. It blends knowledge from mathematics, statistics, computer science and domain expertise to transform raw data into meaningful insights.
2. Which components are fundamental to data science?
Ans:
Data science involves several key steps: collecting data from various sources, cleaning it by correcting or removing errors, analyzing it to uncover trends, building predictive models using algorithms and interpreting the results to guide decisions.
3. What is a confusion matrix?
Ans:
A confusion matrix is a table used to assess how well a classification model performs. It sorts predictions into true positives and true negatives (correct predictions) and false positives and false negatives (errors), from which measures such as accuracy, precision and recall are computed.
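As an illustration, the four cells can be counted directly from paired labels. This is a minimal sketch for the binary case; the function name `confusion_counts` and the sample labels are made up for the example.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classifier (hypothetical helper)."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["tp" if actual == positive else "fp"] += 1
        else:
            # Predicted negative: an error if the actual label was positive.
            counts["fn" if actual == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "tn", "fp", "fn")}

# Illustrative labels only, not real data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_counts(y_true, y_pred)
print(cm)
```

Here three positives and three negatives are predicted correctly, one negative is flagged as positive and one positive is missed.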
4. What metrics are commonly used to evaluate model performance?
Ans:
For classification, model performance is often measured using accuracy (how often the model is correct), precision (the share of positive predictions that are actually positive), recall (the share of actual positives the model finds) and the F1 score, the harmonic mean of precision and recall. The ROC-AUC metric shows how well the model separates the classes across decision thresholds.
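The definitions above reduce to a few ratios. The sketch below assumes binary labels and takes the four confusion-matrix counts as inputs; the function names and the numbers passed in are illustrative, and the AUC is computed by its pairwise definition (the chance that a random positive is scored above a random negative), which is only practical for small inputs.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores, positive=1):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative counts and scores only.
metrics = classification_metrics(tp=4, tn=3, fp=1, fn=2)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
print(metrics, auc)
```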
5. What is feature engineering?
Ans:
Feature engineering is the process of creating or modifying input features to improve a machine learning model’s performance. It involves selecting, transforming and combining data attributes to help the model make better predictions.
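A small example may help. The sketch below derives new features from a raw record by transforming a date and combining two numeric fields. The record layout (`order_date`, `total`, `items`) is hypothetical and chosen only for illustration.

```python
from datetime import date

def engineer_features(order):
    """Derive model inputs from a raw order record (hypothetical schema)."""
    weekday = order["order_date"].weekday()  # 0 = Monday, 6 = Sunday
    return {
        # Transform: turn a raw date into a categorical signal.
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),
        # Combine: two raw attributes become one ratio feature.
        "avg_item_price": (order["total"] / order["items"]
                           if order["items"] else 0.0),
    }

# Illustrative record only.
features = engineer_features(
    {"order_date": date(2024, 3, 16), "total": 120.0, "items": 4}
)
print(features)
```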
6. How do you handle missing data?
Ans:
Missing data can be handled by removing incomplete rows or columns, filling gaps with the mean, median or most common value, using models that tolerate missing values, or predicting the missing values from other features.
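Two of these options, dropping incomplete rows and filling gaps with the column mean, can be sketched in a few lines. The example assumes rows are dictionaries with `None` marking a missing value; the helper names and the sample rows are illustrative.

```python
from statistics import mean

def drop_incomplete(rows):
    """Keep only rows in which no value is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_mean(rows, column):
    """Return copies of the rows with gaps in `column` filled by its mean.

    Assumes at least one observed value; statistics.mean raises otherwise.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [{**row, column: fill if row[column] is None else row[column]}
            for row in rows]

# Illustrative rows only.
rows = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": 58000},
]
complete = drop_incomplete(rows)
imputed = impute_mean(rows, "age")
print(len(complete), imputed[1])
```

Dropping rows loses the second record entirely, while imputation keeps it at the cost of an estimated age, which is the usual trade-off between the two approaches.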
7. What is overfitting and how can it be avoided?
Ans:
Overfitting happens when a model learns noise and details from training data too well, causing poor results on new data. To prevent it, you can use simpler models, apply cross-validation, use regularization methods or increase the amount of training data.
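Cross-validation, one of the safeguards mentioned above, rests on splitting the data into folds so that every sample is held out exactly once. The following is a minimal sketch of that split; the function name is illustrative, and the folds are contiguous, so in practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

A model is then trained on each `train` set and scored on the matching `validation` set; a large gap between training and validation scores is a sign of overfitting.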
8. What is a random forest and how does it work?
Ans:
A random forest is an ensemble machine learning technique that builds many decision trees, each trained on a random sample of the data and typically a random subset of features. It combines their predictions, by majority vote for classification or by averaging for regression, to improve accuracy and reduce overfitting.
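The idea can be shown with a deliberately stripped-down sketch. To stay short it uses one-split trees (decision stumps) instead of full decision trees, draws a bootstrap sample and one random feature per tree, and combines the trees by majority vote. All names and the toy data are assumptions made for illustration; a real project would normally rely on a library implementation.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Pick the threshold on one feature that misclassifies the fewest samples."""
    values = sorted({row[feature] for row in X})
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # Only one distinct value in the sample: always predict the majority.
        label = majority(y)
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample row indices with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)

# Toy data: both features separate the two classes.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.5], [7.0, 8.0], [8.0, 7.0], [9.0, 9.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [1.5, 1.5]), predict(forest, [8.5, 8.5]))
```

Because each tree sees a different sample, individual trees make different mistakes, and the vote tends to cancel them out.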
9. What are the steps in the Data Science workflow?
Ans:
The typical workflow includes defining the problem, gathering data, cleaning and preparing it, exploring the data for insights, building and training models, testing their effectiveness and finally deploying and monitoring the results.
10. How do you ensure the quality of your data?
Ans:
Data quality is maintained by removing duplicates, correcting errors, handling missing values, standardizing data formats and verifying that data sources are reliable and trustworthy.
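Two of these checks, standardizing formats and removing duplicates, can be sketched as follows. The record fields (`email`, `country`) and the normalization rules are hypothetical; real rules depend on the data set.

```python
def clean_records(records):
    """Standardize formats, then drop duplicates while preserving order."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize first so that formatting variants compare as equal.
        normalized = {
            "email": rec["email"].strip().lower(),
            "country": rec["country"].strip().upper(),
        }
        key = tuple(sorted(normalized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned

# Illustrative records only.
records = [
    {"email": " Ana@Example.com ", "country": "us"},
    {"email": "ana@example.com", "country": "US "},
    {"email": "li@example.com", "country": "cn"},
]
cleaned = clean_records(records)
print(cleaned)
```

Standardizing before deduplicating matters here: the first two records differ only in case and whitespace, so they are recognized as the same entry only after normalization.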