1. How are supervised and unsupervised learning techniques different?
Ans:
Supervised learning uses labeled datasets where each input has a known output, allowing the model to learn patterns for predictions. Unsupervised learning works with unlabeled data to uncover hidden structures, relationships, or clusters without predefined outcomes.
2. What is overfitting, and how can it be prevented in models?
Ans:
Overfitting occurs when a model memorizes its training data, including noise, and therefore performs poorly on new data. It can be reduced by simplifying the model, applying L1/L2 regularization, gathering more training data, or stopping training early; cross-validation helps detect it.
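To make the effect of L2 regularization concrete, here is a minimal sketch for the simplest possible case: a one-feature linear model with no intercept. The function name fit_slope and the toy data are assumptions made for illustration, not part of any library.

```python
def fit_slope(xs, ys, l2=0.0):
    """Slope w for the model y ~ w * x (no intercept) with an optional L2 penalty.

    Minimizes sum((y - w*x)^2) + l2 * w^2, whose closed-form solution is
    w = sum(x*y) / (sum(x*x) + l2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


# Toy data (made up for this sketch) that lies exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_slope(xs, ys)            # 28 / 14 = 2.0
w_ridge = fit_slope(xs, ys, l2=14.0)   # 28 / (14 + 14) = 1.0
print(w_plain, w_ridge)
```

The penalty pulls the weight toward zero. On noisy data this shrinkage trades a little bias for lower variance, which is how regularization limits overfitting.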
3. How is a confusion matrix used in machine learning?
Ans:
A confusion matrix evaluates classification models by comparing predicted labels to actual ones. It shows true positives, true negatives, false positives, and false negatives, helping calculate metrics like accuracy, precision, recall, and F1-score to assess performance.
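As a rough illustration, the four cells can be counted directly from paired labels. The helper name confusion_counts and the sample labels below are made up for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)   # 3, 1, 3, 1
accuracy = (tp + tn) / (tp + fp + tn + fn)          # 6 / 8 = 0.75
print(tp, fp, tn, fn, accuracy)
```

Precision, recall, and F1-score are all computed from these same four counts.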
4. What is a Support Vector Machine (SVM), and when is it applied?
Ans:
SVM is a supervised learning algorithm used mainly for classification and occasionally for regression. It finds the hyperplane that separates the classes with the largest margin. Kernel functions let SVM handle data that is not linearly separable by implicitly mapping it into a higher-dimensional space.
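The kernel idea can be checked by hand on a small case. The sketch below assumes 2-D inputs and a homogeneous degree-2 polynomial kernel, and shows that the kernel value equals a dot product in an explicit 3-D feature space. The function names are illustrative only.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    dot = x[0] * z[0] + x[1] * z[1]
    return dot ** 2


def poly2_features(x):
    """Explicit map phi for 2-D inputs such that phi(x) . phi(z) = (x . z)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x = (1.0, 2.0)
z = (3.0, 4.0)

# Computed in the original 2-D space: (1*3 + 2*4)^2 = 121.
k_implicit = poly2_kernel(x, z)

# Computed in the 3-D feature space: 9 + 48 + 64 = 121 (up to rounding).
k_explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))

print(k_implicit, k_explicit)
```

An SVM only needs these pairwise kernel values, so it never has to build the higher-dimensional features explicitly.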
5. How does deep learning differ from classical machine learning?
Ans:
Traditional ML typically relies on manual feature engineering and works well on simpler or structured-data tasks, using models like linear regression or decision trees. Deep learning uses multi-layered neural networks to learn features and complex patterns automatically, excelling in image recognition, NLP, and audio processing.
6. Which Python libraries are most useful for AI/ML, and why?
Ans:
Pandas and NumPy help with data manipulation and numeric calculations, while scikit-learn provides traditional ML algorithms. TensorFlow and PyTorch support deep learning. Together, they simplify preprocessing, training, evaluation, and deployment.
7. How should missing or inconsistent data be handled before modeling?
Ans:
Missing or corrupted data can be addressed by deleting the affected rows, imputing values with the mean, median, or mode, or using predictive imputation. After cleaning, numeric features may be normalized or scaled and categorical features encoded to prepare the data for effective model training.
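A minimal sketch of two of these steps, median imputation followed by min-max scaling, using only the standard library. Missing values are assumed to be represented as None, and the column of ages is invented for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with two missing entries.
ages = [25, None, 31, 40, None, 29]

filled = impute_median(ages)     # median of [25, 29, 31, 40] is 30.0
scaled = min_max_scale(filled)   # 25 maps to 0.0, 40 maps to 1.0
print(filled)
print(scaled)
```

In a real pipeline the fill value and the scaling range would be computed on the training split only and then reused on validation and test data, so that no information leaks from the held-out sets.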
8. What is cross-validation, and why is it important?
Ans:
Cross-validation estimates how well a model generalizes by splitting the data into multiple folds. The model trains on all but one fold and is tested on the held-out fold, rotating until every fold has served as the test set. This exposes overfitting and gives a more reliable estimate of performance on unseen data than a single train/test split.
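The fold rotation can be sketched in a few lines. This version assumes contiguous, unshuffled folds to keep the output easy to read, and in practice the data is usually shuffled first. The generator name k_fold_indices is made up for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

A model would be fitted on each train set and scored on the matching test set, and the k scores averaged to give the final estimate.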
9. How do precision and recall differ, and why are both necessary?
Ans:
Precision is the fraction of predicted positives that are actually positive, TP / (TP + FP), while recall is the fraction of actual positives that the model identifies, TP / (TP + FN). Precision matters most when false positives are costly, recall when missing a positive is risky. Improving one often lowers the other, so both are reported, often combined as the F1-score.
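One way to see why both metrics are needed is to move the decision threshold of a scoring classifier and watch them pull in opposite directions. The scores and labels below are invented for this sketch.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true labels and model scores, sorted by score for readability.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.7)     # (1.0, 0.5)
lenient = precision_recall(y_true, scores, threshold=0.25)   # (4/6, 1.0)
print(strict, lenient)
```

The strict threshold makes no false alarms but misses half of the positives, while the lenient one finds every positive at the cost of two false alarms.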
10. How is a machine learning model implemented in real-world applications?
Ans:
After training and validation, a model is typically exposed as a REST API built with a web framework such as Flask or FastAPI. Hosted on a server or cloud platform, it receives input data and returns predictions in real time, with monitoring and model versioning to maintain reliability.
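The core of such a prediction endpoint can be sketched without committing to any web framework: parse the request body, validate it, run the model, and return a status code with a JSON response. Everything named here (the linear stand-in model, MODEL_VERSION, the "features" field, handle_predict_request) is a hypothetical choice for illustration; a real service would load a trained model and wire this function into a Flask or FastAPI route.

```python
import json

# Stand-in "model": fixed weights instead of a trained artifact (assumption).
MODEL_VERSION = "demo-0.1"
WEIGHTS = [0.5, -0.25]
BIAS = 1.0


def predict(features):
    """Linear score used here only as a placeholder for a real model."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_predict_request(body):
    """Take a raw JSON request body and return (status_code, json_response)."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        valid = (
            isinstance(features, list)
            and len(features) == len(WEIGHTS)
            and all(isinstance(x, (int, float)) and not isinstance(x, bool)
                    for x in features)
        )
        if not valid:
            raise ValueError("features must be a list of %d numbers" % len(WEIGHTS))
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
        return 400, json.dumps({"error": str(exc)})
    response = {"prediction": predict(features), "model_version": MODEL_VERSION}
    return 200, json.dumps(response)


status, reply = handle_predict_request('{"features": [2.0, 4.0]}')
print(status, reply)   # 200, prediction 1.0 = 1.0 + 0.5*2.0 - 0.25*4.0
```

Returning the model version with every prediction makes it easier to trace a bad result back to a specific deployment when monitoring flags a problem.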