1. How do supervised and unsupervised learning approaches differ?
Ans:
Supervised learning predicts outcomes from labeled data, where each input is paired with a known output (as in classification and regression). Unsupervised learning, in contrast, works with unlabeled data to uncover hidden groupings, trends, or patterns (as in clustering). In short, one forecasts results; the other discovers structure.
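The contrast can be sketched on the same 1-D points, first with labels and then without. This is a toy pure-Python illustration, not a library implementation; the function names (`nearest_centroid_predict`, `two_means_groups`) are invented here.

```python
def nearest_centroid_predict(train, labels, x):
    """Supervised: labeled points are used to predict a label for a new x."""
    centroids = {}
    for lbl in set(labels):
        pts = [p for p, l in zip(train, labels) if l == lbl]
        centroids[lbl] = sum(pts) / len(pts)
    return min(centroids, key=lambda lbl: abs(centroids[lbl] - x))

def two_means_groups(points, iters=10):
    """Unsupervised: the same points, without labels, split into two clusters
    by a tiny 1-D k-means (k=2)."""
    c0, c1 = min(points), max(points)  # simple initialization
    for _ in range(iters):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)

train = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]
print(nearest_centroid_predict(train, labels, 4.9))  # -> high
print(two_means_groups(train))  # two clusters found without any labels
```

The supervised function needs the `labels` list to work at all; the unsupervised one recovers the same grouping from the raw numbers alone.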
2. What is model overfitting, and how can it be avoided?
Ans:
Overfitting happens when a model fits the training data too closely, learning noise along with the signal, which reduces performance on new data. It can be minimized by simplifying the model architecture, applying regularization, using cross-validation, increasing the dataset size, or stopping training early.
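Regularization, one of the remedies listed above, can be shown in miniature with a one-parameter least-squares fit. This is a hedged sketch with an invented helper (`fit_slope`); an L2 (ridge) penalty simply shrinks the fitted coefficient so it follows the noise less aggressively.

```python
def fit_slope(xs, ys, ridge_lambda=0.0):
    """Closed-form slope for y ≈ w*x.

    With ridge_lambda > 0 this is the 1-D ridge-regression solution:
    w = Σxy / (Σx² + λ), which shrinks w toward zero.
    """
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + ridge_lambda
    return num / den

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]  # roughly y = x, with noise
w_plain = fit_slope(xs, ys)
w_ridge = fit_slope(xs, ys, ridge_lambda=5.0)
print(w_plain, w_ridge)  # the regularized slope is smaller in magnitude
```

The same shrinking effect is what penalties like L2 weight decay provide in larger models.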
3. How is a confusion matrix used to evaluate models?
Ans:
A confusion matrix compares predicted labels with actual outcomes in classification tasks. It highlights true positives, true negatives, false positives, and false negatives, enabling computation of performance metrics like accuracy, precision, recall, and F1-score.
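The four counts and the metrics derived from them can be computed by hand; this is a small pure-Python sketch (the helper name `confusion_counts` is invented here), treating label 1 as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn)  # -> 3 3 1 1
```

Each metric is just a different ratio of the same four cells, which is why the matrix is the usual starting point for evaluation.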
4. What is a Support Vector Machine, and when is it applicable?
Ans:
A Support Vector Machine (SVM) is a supervised algorithm that finds the decision boundary separating classes with the largest margin. With the kernel trick, it can also handle data that is not linearly separable. It is particularly effective for classification tasks where classes have a clear margin of separation, and for high-dimensional data.
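A minimal usage sketch, assuming scikit-learn (mentioned later in this list) is installed; the toy points and labels here are invented for illustration.

```python
from sklearn.svm import SVC

# Two well-separated 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

# Linear kernel for linearly separable data; kernel="rbf" would handle
# non-linear boundaries via the kernel trick.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # -> [0 1]
```

The parameter `C` trades margin width against training errors: smaller `C` tolerates more misclassified points in exchange for a wider margin.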
5. How does deep learning differ from conventional machine learning?
Ans:
Conventional ML relies on manually crafted features and performs well on structured data. Deep learning uses multi-layered neural networks to automatically extract intricate patterns from raw data, making it suitable for images, text, audio, and other unstructured data.
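The "multi-layered" part can be made concrete with a forward pass through a tiny two-layer network. This is a pure-Python sketch with fixed toy weights; in a real deep network the weights are learned by backpropagation, and the hidden layer plays the role of the automatically extracted features described above.

```python
def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """One fully connected layer: out[i] = Σ_j weights[i][j] * v[j] + bias[i]."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

# Toy fixed weights (learned by training in practice).
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

x = [2.0, 1.0]                # raw input
h = relu(dense(x, W1, b1))    # hidden layer: learned feature representation
out = dense(h, W2, b2)        # output layer
print(out)  # -> [2.5]
```

Stacking more such layers is what lets deep models build up intricate patterns from raw pixels, tokens, or audio samples instead of hand-crafted features.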
6. Which Python libraries are essential for AI/ML workflows, and why?
Ans:
Python is preferred for its readability and rich ecosystem. Libraries like Pandas and NumPy assist with data handling, scikit-learn provides classical ML algorithms, and TensorFlow/PyTorch support deep learning, simplifying tasks from preprocessing to model deployment.
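A small sketch of the Pandas/NumPy handoff mentioned above, assuming both libraries are installed; the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Pandas for tabular data handling...
df = pd.DataFrame({"height": [1.6, 1.7, 1.8], "weight": [55.0, 70.0, 80.0]})

# ...and NumPy for the numerical work (here, per-column standardization).
X = df.to_numpy()
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))  # each column is now centered near 0
```

scikit-learn estimators accept arrays like `X_scaled` directly, which is what makes this preprocessing-to-model pipeline so short in practice.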
7. How should missing or inconsistent data be managed before training?
Ans:
Incomplete or noisy data can be addressed by removing affected entries, imputing values with mean, median, or predictive approaches, and encoding or scaling features. Proper preprocessing ensures the model learns accurately without bias from poor-quality data.
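The imputation and encoding steps above can be sketched with Pandas (assumed installed); the example table and column names are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "age":  [25.0, None, 35.0, None],
    "city": ["NY", "LA", None, "NY"],
})

# Numeric column: impute missing values with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical column: impute with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode()[0])

# One-hot encode the categorical column for the model.
encoded = pd.get_dummies(df, columns=["city"])
print(df["age"].tolist())  # -> [25.0, 30.0, 35.0, 30.0]
```

Median or model-based imputation drops in the same way; the key point is that no missing or non-numeric values reach the training step.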
8. What is the role of cross-validation in machine learning?
Ans:
Cross-validation divides data into multiple subsets, training the model on some and validating on others in rotation. This prevents overfitting, improves generalization, and provides a more reliable measure of model performance on unseen data.
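The rotation scheme can be written out in a few lines of pure Python; the helper name `k_fold_indices` is invented here (scikit-learn's `KFold` does the same job in practice).

```python
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Each sample lands in the validation set exactly once across the k splits.
    """
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)

for train_idx, val_idx in k_fold_indices(6, 3):
    print(train_idx, val_idx)  # train on 4 samples, validate on the other 2
```

Averaging the validation score over all k splits gives the more reliable performance estimate the answer describes.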
9. How do precision and recall differ, and why is balancing them important?
Ans:
Precision measures the proportion of predicted positives that are correct, while recall measures how many actual positives are identified. Balancing both is critical because optimizing one alone usually degrades the other: a stricter decision threshold raises precision but lowers recall, and a looser one does the reverse.
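The threshold trade-off can be demonstrated directly on a handful of model scores; this is a pure-Python sketch with invented scores and a helper (`precision_recall`) named for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when positives are scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]

print(precision_recall(y_true, scores, 0.8))  # strict: perfect precision, low recall
print(precision_recall(y_true, scores, 0.3))  # lenient: perfect recall, lower precision
```

The F1-score (the harmonic mean of the two) is the usual single number for judging this balance.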
10. How is a trained AI/ML model implemented in real-world systems?
Ans:
Once trained and tested, models are deployed via APIs or frameworks like Flask, FastAPI, or cloud services. Applications send input data to the model to generate predictions in real time, while continuous monitoring ensures accuracy and stability over time.
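The request/response contract behind such an API can be sketched framework-free with the standard library; `predict` here is a hypothetical stand-in for a trained model (a real deployment would load one from disk), and `handle_request` plays the role a Flask or FastAPI route would fill.

```python
import json

def predict(features):
    """Hypothetical model stand-in; real systems load a trained model artifact."""
    return {"label": "high" if sum(features) > 5 else "low"}

def handle_request(body: str) -> str:
    """What an API endpoint does per request: parse JSON input,
    run the model, and return a JSON prediction."""
    features = json.loads(body)["features"]
    return json.dumps(predict(features))

print(handle_request('{"features": [2, 4]}'))  # -> {"label": "high"}
```

Wrapping `handle_request` in an actual web route, plus logging each prediction for the monitoring mentioned above, is essentially all a basic deployment adds.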