1. What is data science and how is it different from data analytics?
Ans:
Data science is the process of collecting, analyzing, and interpreting large data sets using a variety of tools and methods. Data analytics focuses more narrowly on analyzing existing data to find trends and solve problems. Data science is broader: it includes data analytics along with machine learning and predictive modeling.
2. What does a data scientist do in a company?
Ans:
A data scientist helps companies make smart decisions by analyzing data, creating models, and finding useful patterns. They help solve business problems using data.
3. What’s the difference between structured and unstructured data?
Ans:
Structured data is organized in rows and columns, such as Excel sheets or database tables. Unstructured data doesn’t follow a fixed format; examples include emails, images, and videos.
4. What are the key steps of a data science project?
Ans:
- Understanding the problem
- Collecting data
- Cleaning data
- Analyzing it
- Building models
- Interpreting the results
5. How do you deal with missing values in data?
Ans:
You can remove rows with missing values, fill them with the average or most common value, or predict them using other data.
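As a rough illustration of the first two options, here is a minimal sketch in plain Python. The column `ages` and the helper names are made up for this example, and missing entries are assumed to be marked as `None`.

```python
from statistics import mean, mode

# Hypothetical toy column; None marks a missing value (an assumption of this sketch).
ages = [25, None, 31, 40, None, 28]

def drop_missing(values):
    """Option 1: remove the entries that are missing."""
    return [v for v in values if v is not None]

def fill_with_mean(values):
    """Option 2a: replace missing entries with the average of the observed ones."""
    average = mean(drop_missing(values))
    return [average if v is None else v for v in values]

def fill_with_mode(values):
    """Option 2b: replace missing entries with the most common observed value."""
    most_common = mode(drop_missing(values))
    return [most_common if v is None else v for v in values]

print(drop_missing(ages))    # the four observed ages
print(fill_with_mean(ages))  # gaps filled with the mean of the observed ages
print(fill_with_mode(["red", None, "blue", "red"]))  # gap filled with "red"
```

Filling with the mean suits numeric columns, while the most common value is the usual choice for categories.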
6. What differentiates supervised learning from unsupervised learning?
Ans:
Supervised learning uses labeled data to train the model (we know the correct answers). Unsupervised learning finds patterns in data without labels.
7. What is cross-validation?
Ans:
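It’s a method to check whether your model works well on data it has not seen. You split the data into several parts (folds), train the model on all but one part, test it on the part left out, and repeat until every part has been used for testing once.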
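To make the splitting concrete, here is a minimal sketch of k-fold index generation in plain Python. The function name `k_fold_indices` is made up for this example, and it skips shuffling, which real projects usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train on", train, "test on", test)
```

In a real run you would fit the model on each `train` set, score it on the matching `test` set, and average the scores.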
8. What is a confusion matrix?
Ans:
It’s a table that shows how well a classification model performed. It includes true positives, true negatives, false positives, and false negatives.
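The four counts can be tallied by hand, as in this small sketch. The label lists are invented for illustration, and `1` is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, really positive
        elif p == positive:
            fp += 1   # predicted positive, really negative
        elif a == positive:
            fn += 1   # predicted negative, really positive
        else:
            tn += 1   # predicted negative, really negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels for eight samples.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```

Metrics such as accuracy, precision, and recall are all computed from these four numbers.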
9. How do you choose which features are important?
Ans:
You can use techniques like correlation, importance scores from models, or removing features one by one to see their impact.
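As one example of the correlation approach, the sketch below ranks features by the absolute Pearson correlation with the target. The housing data and feature names are made up, and correlation only captures linear relationships, so treat this as a first filter rather than a full method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical features and target (house price).
features = {
    "size_sqm":    [50, 60, 80, 100, 120],
    "door_number": [7, 3, 9, 1, 5],
}
price = [150, 180, 240, 300, 360]

scores = {name: pearson(values, price) for name, values in features.items()}
# Rank by strength of the relationship, ignoring its sign.
ranked = sorted(features, key=lambda name: abs(scores[name]), reverse=True)
print(ranked, scores)
```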
10. How does KNN (k-nearest neighbors) work?
Ans:
KNN finds the k closest data points to a new point and predicts its value based on them: the most common class among the neighbors for classification, or their average for regression. It’s like asking nearby neighbors for advice.
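A bare-bones classifier along these lines might look like the sketch below. The points and labels are invented, Euclidean distance is assumed, and the function name `knn_predict` is just for this example.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training samples by Euclidean distance to the query point.
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pair: dist(pair[0], query))
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical 2-D training data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

Because distances drive the result, features are usually scaled before using KNN.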
11. How do decision trees work?
Ans:
A decision tree splits data into branches based on questions. Each step leads to a decision until the final result is reached.
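The question-by-question idea can be shown with a tiny hand-written tree. This is only a sketch: the features, thresholds, and outcomes are hypothetical, and a real algorithm would learn the questions from data instead of having them typed in.

```python
# Each internal node asks "is sample[feature] <= threshold?"; leaves are plain strings.
tree = {
    "question": ("age", 30),
    "yes": {
        "question": ("income", 40000),
        "yes": "no_loan",
        "no": "loan",
    },
    "no": "loan",
}

def predict(node, sample):
    """Follow the branches until a leaf (the final decision) is reached."""
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if sample[feature] <= threshold else node["no"]
    return node

print(predict(tree, {"age": 25, "income": 30000}))
print(predict(tree, {"age": 45, "income": 10000}))
```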
12. What is SVM and where is it used?
Ans:
Support Vector Machine (SVM) is a model that finds the best line or boundary to separate different classes in data. It’s used in image recognition, spam detection, etc.
13. How does Naive Bayes work?
Ans:
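Naive Bayes predicts outcomes based on past data using simple probabilities from Bayes’ theorem. It is called "naive" because it assumes all features are independent of each other within a class, which is rarely exactly true but often works well in practice.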
14. What is k-means clustering used for?
Ans:
K-means groups similar data points together into clusters. It’s useful for customer segmentation, grouping similar users, etc.
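Here is a minimal sketch of the usual assign-then-update loop. The points are invented, and the starting centroids are picked by hand for this example; real implementations choose them randomly or with a smarter scheme.

```python
from math import dist

def k_means(points, centroids, max_iterations=10):
    """Alternate between assigning points and moving centroids until stable."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```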
15. What is a neural network?
Ans:
A neural network is a model inspired by the human brain. It takes inputs, processes them through layers, and gives an output. It is used in image and voice recognition.
16. What are ensemble methods?
Ans:
Ensemble methods are techniques that combine many models to improve accuracy. Examples are Random Forest and Gradient Boosting.
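The simplest way to combine models is a majority vote, sketched below. The three prediction lists are invented, and this shows only the voting idea: Gradient Boosting in particular combines its models differently, by adding them one after another.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label that most models predicted."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical predictions from three separate models on the same four messages.
model_predictions = [
    ["spam", "ham",  "spam", "ham"],   # model 1
    ["spam", "spam", "spam", "ham"],   # model 2 (wrong on the second message)
    ["ham",  "ham",  "spam", "ham"],   # model 3 (wrong on the first message)
]

# zip(*...) groups the three votes for each message together.
ensemble = [majority_vote(votes) for votes in zip(*model_predictions)]
print(ensemble)
```

Each model makes a different mistake, so the vote cancels the individual errors.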
17. How do you manage outliers in data?
Ans:
You can remove, transform, or adjust outliers. Sometimes, they are important and need to be studied carefully.
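One common way to spot outliers is the 1.5 × IQR rule, sketched here with the standard library. The readings are made up, and both the 1.5 factor and the default quartile method of `statistics.quantiles` are conventions rather than the only choice.

```python
from statistics import quantiles

def iqr_bounds(values, factor=1.5):
    """Return (low, high) limits; values outside them are flagged as outliers."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
low, high = iqr_bounds(readings)

outliers = [v for v in readings if v < low or v > high]   # study these first
cleaned = [v for v in readings if low <= v <= high]       # option: remove
capped = [min(max(v, low), high) for v in readings]       # option: adjust (clip)
print(outliers, cleaned, capped)
```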
18. What are some ways to scale features?
Ans:
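Scaling means bringing all features to a comparable range. Two common methods are normalization (min-max scaling, which maps values to the range 0 to 1) and standardization (which rescales values to have mean 0 and standard deviation 1).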
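Both methods fit in a few lines of plain Python. This sketch uses an invented list of heights and assumes the values are not all identical, since that would cause a division by zero.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Normalization: map values linearly onto the range 0 to 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical heights in centimeters.
heights = [150, 160, 170, 180, 190]
normalized = min_max_scale(heights)
z_scores = standardize(heights)
print(normalized)
print(z_scores)
```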
19. What is one-hot encoding?
Ans:
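It converts categories into numbers by creating one column per category and marking each row with a 1 in the column of its category and 0s everywhere else, so that machine learning models can understand them.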
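A small sketch of the idea, using an invented list of colors. The column order (alphabetical here) is a choice made for this example.

```python
def one_hot_encode(values):
    """Return the category order and one 0/1 row per input value."""
    categories = sorted(set(values))  # one column per distinct category
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```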
20. Why is feature selection important?
Ans:
It removes unnecessary features, reduces training time, and improves model performance by focusing only on the most informative parts of the data.