1. How Does Supervised Learning Differ From Unsupervised Learning?
Ans:
Supervised learning relies on labeled data, where each input comes with a known output. The model learns from these examples to make predictions on new data. In contrast, unsupervised learning works with unlabeled data, leaving the model to discover patterns or structure on its own, for example by clustering similar records or reducing dimensionality.
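A minimal scikit-learn sketch of the contrast (the iris dataset here is purely illustrative): the classifier is fit on both inputs and labels, while KMeans sees only the inputs.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model is fit on inputs X *and* labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Predicted class:", clf.predict(X[:1]))

# Unsupervised: KMeans sees only X and groups it into clusters by itself.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Assigned clusters:", km.labels_[:5])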
2. What Is Overfitting In Machine Learning And How Can It Be Prevented?
Ans:
Overfitting occurs when a model fits the training data too closely, memorizing noise and irrelevant details, which leads to poor performance on new, unseen data. To prevent it, you can use simpler models, apply regularization techniques (like L1 or L2), perform cross‑validation, and keep a properly held‑out test set. Increasing the amount of training data also helps the model generalize more effectively.
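A brief sketch of one of these remedies, L2 regularization via scikit-learn's Ridge, compared under cross-validation (the synthetic data is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))            # few samples, many features: easy to overfit
y = X[:, 0] + 0.1 * rng.normal(size=50)  # only the first feature actually matters

for model in (LinearRegression(), Ridge(alpha=1.0)):
    scores = cross_val_score(model, X, y, cv=5)  # cross-validated R^2
    print(type(model).__name__, scores.mean().round(3))
```

The L2 penalty shrinks the coefficients on the 19 irrelevant features, which typically shows up as a better cross-validated score than the unregularized fit.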
3. What Is A Confusion Matrix And Why Is It Useful For Classification Tasks?
Ans:
A confusion matrix summarizes the performance of a classification model by comparing predicted labels with actual labels. It counts true positives, true negatives, false positives, and false negatives. These four values are then used to calculate metrics like accuracy, precision, recall, and F1‑score, providing insight not only into overall correctness but also into the types of errors the model makes.
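A small sketch using scikit-learn's metrics on hand-written labels (the arrays below are illustrative, not real model output):

```python
from sklearn.metrics import confusion_matrix, classification_report

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
# Precision, recall, and F1 are all derived from the same four counts.
print(classification_report(y_true, y_pred))
```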
4. What Is A Support Vector Machine (SVM) And When Do We Use It?
Ans:
A Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification, and occasionally for regression. It identifies the hyperplane that separates data points of different classes with the largest possible margin. SVMs can handle both linear and non-linear data by applying kernel functions, which implicitly map the data into higher-dimensional spaces; this is useful when the data isn't linearly separable.
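A short sketch of a non-linear SVM, using scikit-learn's make_moons as a toy dataset that isn't linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps points into a higher-dimensional space,
# letting the model draw a non-linear boundary in the original space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```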
5. What Are The Differences Between Traditional Machine Learning And Deep Learning?
Ans:
Traditional machine learning typically relies on manual feature extraction and is effective for simpler tasks using algorithms like linear regression, decision trees, or SVMs. Deep learning, on the other hand, employs multi-layered neural networks that automatically learn complex patterns from raw data, making it ideal for tasks like image recognition, natural language processing, and speech analysis. While deep learning demands more data and computational resources, it excels at handling complex problems.
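As a rough illustration of the structural difference, the sketch below (assuming PyTorch is installed) fits a decision tree and a small neural network on the same toy data; the network's hidden layer learns its own intermediate features rather than working only with the given ones:

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Traditional ML: one call fits a model directly on the supplied features.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Deep learning: stacked layers trained by gradient descent.
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.Adam(net.parameters(), lr=0.05)
Xt, yt = torch.tensor(X, dtype=torch.float32), torch.tensor(y)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(Xt), yt)
    loss.backward()
    opt.step()

print("Tree accuracy:", tree.score(X, y))
print("Net accuracy:", (net(Xt).argmax(1) == yt).float().mean().item())
```

On a tiny tabular dataset like this the tree is the more natural choice; the network's advantage appears on raw, high-dimensional inputs such as images or text.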
6. Which Python Libraries Or Tools Are Commonly Used In Machine Learning And Why?
Ans:
Common Python libraries include Pandas and NumPy for data manipulation and numerical computations, scikit‑learn for traditional machine learning tasks like regression, classification, and clustering, and frameworks such as TensorFlow or PyTorch for deep learning and neural networks. These tools streamline data preparation, model training, evaluation, and deployment, making the development process faster and more efficient.
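A minimal sketch of how these pieces typically fit together (the tiny DataFrame is invented for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Pandas handles tabular data loading and manipulation.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 57, 61, 68, 74]})

# scikit-learn handles the modeling step on top of the prepared data.
model = LinearRegression().fit(df[["hours"]], df["score"])
print("Predicted score for 6 hours:", model.predict(pd.DataFrame({"hours": [6]})))
```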
7. How Would You Handle Missing Or Corrupted Data In A Dataset Before Training A Model?
Ans:
Missing or corrupted data can be addressed by deleting the affected records, imputing values using the mean, median, or mode, or applying more advanced methods like interpolation or predictive imputation, depending on the situation. After cleaning, data is often normalized or scaled, and categorical features are encoded if necessary. Proper preprocessing ensures the model is trained on clean, consistent, and reliable data.
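A short sketch of two of these options, dropping rows versus median imputation, on an invented DataFrame with deliberately missing values:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "income": [50_000, 62_000, np.nan, 58_000]})

# Option 1: drop any row that contains a missing value.
dropped = df.dropna()

# Option 2: impute each column's missing entries with its median.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)
```

Dropping rows is safest when missingness is rare; imputation preserves sample size but should be fit on the training split only to avoid leakage.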
8. Explain Cross‑Validation And Why It Is Important In Model Evaluation.
Ans:
Cross‑validation is a method used to evaluate a model’s generalization ability by splitting the data into multiple folds. The model is trained on some folds and tested on the remaining fold(s), and this process is repeated for all fold combinations. This approach helps prevent overfitting and gives a more reliable estimate of how the model will perform on unseen data, ensuring that evaluation isn’t biased by a single train/test split.
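A brief sketch of 5-fold cross-validation with scikit-learn (the iris dataset is used only as a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 5 folds serves once as the held-out test set.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```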
9. What Is The Difference Between Precision And Recall, And Why Are Both Important?
Ans:
Precision indicates the proportion of predicted positive cases that are actually positive, whereas recall measures the proportion of actual positive cases correctly identified by the model. Precision is crucial when false positives carry a high cost, and recall is key when false negatives are costly. Balancing both is important, as improving one can often reduce the other, and the optimal trade-off depends on the specific problem.
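Concretely, precision = TP / (TP + FP) and recall = TP / (TP + FN). A small sketch computing both on illustrative labels:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # 2 TP, 1 FP, 2 FN, 3 TN

print("Precision:", precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.67
print("Recall:", recall_score(y_true, y_pred))         # 2 / (2 + 2) = 0.50
```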
10. How Can A Machine Learning Model Be Deployed For Real-World Use After Training?
Ans:
Once a model is trained and validated, it can be deployed by packaging it and exposing it as a REST API using a web framework such as Flask or FastAPI. The model is hosted on a server or cloud platform, allowing applications to send data and receive predictions in real time. Continuous monitoring and version control ensure the model stays reliable and up-to-date after deployment.
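A minimal FastAPI sketch, assuming the trained model was saved earlier with joblib; the file name model.joblib and the single-list input format are hypothetical choices for illustration:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical path to the saved model

class Features(BaseModel):
    values: list[float]  # one row of input features

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run locally with: uvicorn app:app --reload  (assuming this file is app.py)
```

In practice the endpoint would also validate input shapes and log requests, feeding the monitoring mentioned above.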