1. How do predictive and pattern-discovery learning differ?
Ans:
Predictive learning (supervised) uses labeled data, pairing inputs with known outputs to forecast results. Pattern-discovery learning (unsupervised) works on unlabeled data to uncover hidden structures, clusters, or trends. Essentially, supervised predicts outcomes, while unsupervised explores underlying patterns.
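The contrast can be sketched in a few lines (a minimal illustration assuming scikit-learn is available; the dataset is synthetic):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Predictive (supervised): learns the mapping X -> y from the labels.
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

# Pattern discovery (unsupervised): y is never seen; groups emerge from X alone.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_
```

The classifier needs `y` to train; KMeans is given only `X` and still recovers the two groups.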
2. What is overfitting, and how can it be mitigated?
Ans:
Overfitting occurs when a model memorizes the training data, including noise, resulting in poor performance on new data. It can be controlled by simplifying the model, applying L1/L2 regularization, using cross-validation, increasing training data, or stopping training early to prevent memorization.
3. How is a confusion matrix applied in model evaluation?
Ans:
A confusion matrix compares predicted outcomes with actual labels for classification problems. It shows true positives, true negatives, false positives, and false negatives, allowing calculation of accuracy, precision, recall, and F1-score to assess model performance comprehensively.
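For a binary problem, the four cells can be read directly from scikit-learn's `confusion_matrix` (a minimal example with hand-made labels):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Here TP=3, TN=3, FP=1, FN=1, giving precision and recall of 0.75 each.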
4. What is a Support Vector Machine (SVM) and when should it be used?
Ans:
SVM is a supervised learning algorithm that finds the decision boundary separating classes with the maximum margin. Kernel functions allow it to handle non-linear relationships by implicitly mapping data into higher-dimensional spaces. It is best suited to classification tasks, particularly on small-to-medium datasets where classes are separable by a clear margin or a kernel-induced non-linear boundary.
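The effect of kernels shows up clearly on a non-linear dataset (a sketch using scikit-learn's `SVC` on the synthetic two-moons data):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear = SVC(kernel="linear").fit(X, y)              # straight-line boundary
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)     # curved, kernel-induced boundary
```

The linear kernel cannot follow the interleaved moons, while the RBF kernel separates them almost perfectly.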
5. How does traditional machine learning differ from deep learning?
Ans:
Traditional ML relies on manually engineered features and works well on structured datasets. Deep learning uses multi-layer neural networks to automatically extract complex features from raw data, making it ideal for unstructured inputs like images, text, and audio.
6. Which Python libraries are most useful for AI/ML, and why?
Ans:
Python is widely used for AI/ML due to its simplicity and rich ecosystem. Pandas and NumPy facilitate data manipulation, scikit-learn provides classical ML algorithms, and TensorFlow or PyTorch handle deep learning, making preprocessing, modeling, and evaluation more efficient.
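A tiny end-to-end taste of that ecosystem (illustrative data; assumes pandas, NumPy, and scikit-learn are installed):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# pandas holds and manipulates the data...
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 8.1]})

# ...and scikit-learn fits a model on it (NumPy arrays underneath).
model = LinearRegression().fit(df[["x"]], df["y"])
slope = model.coef_[0]   # close to 2, the trend in the data
```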
7. How should incomplete or inconsistent data be handled before modeling?
Ans:
Missing or corrupted data can be handled by removing affected records, imputing values using mean, median, or mode, or using predictive methods. Scaling and encoding features afterward ensures the dataset is clean and ready for effective model training.
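A minimal sketch of imputation followed by scaling (assuming pandas and scikit-learn; the toy columns are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [25.0, np.nan, 35.0, 45.0],
                   "income": [50.0, 60.0, np.nan, 80.0]})

# Fill each missing value with its column's median...
filled = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(df),
                      columns=df.columns)

# ...then standardize so every feature has mean 0 and unit variance.
scaled = StandardScaler().fit_transform(filled)
```

After imputation the missing `age` becomes the column median (35), and the scaled array contains no NaNs.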
8. What is cross-validation, and why is it important?
Ans:
Cross-validation splits data into multiple folds, training the model on some folds and testing on others in rotation. It helps prevent overfitting, ensures better generalization, and provides a more reliable estimate of model performance on unseen data.
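In scikit-learn this is a one-liner with `cross_val_score` (shown here on the built-in iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold serves once as the held-out test set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
```

Averaging the five fold scores gives a more trustworthy performance estimate than a single train/test split.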
9. How do precision and recall differ, and why are both metrics necessary?
Ans:
Precision is the fraction of positive predictions that are actually positive (TP / (TP + FP)); recall is the fraction of actual positives the model finds (TP / (TP + FN)). Precision matters when false positives are costly, as in spam filtering, while recall matters when false negatives are costly, as in disease screening. Both are necessary because improving one alone can degrade the other; the F1-score combines them into a single balanced measure.
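The trade-off is easy to see with scikit-learn's metric functions on hand-made labels:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # TP=2, FN=2, FP=1

p = precision_score(y_true, y_pred)  # 2 / (2 + 1)
r = recall_score(y_true, y_pred)     # 2 / (2 + 2)
```

Here precision is 2/3 but recall is only 1/2: the model is fairly trustworthy when it says "positive," yet it misses half of the real positives.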
10. How can a machine learning model be deployed in practical applications?
Ans:
Trained models can be deployed using REST APIs or frameworks like Flask and FastAPI on servers or cloud platforms. Applications send input data to the model for real-time predictions, while monitoring ensures consistent performance and accuracy over time.
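A minimal Flask sketch of this pattern (assuming Flask and scikit-learn are installed; the `/predict` route name and JSON payload shape are illustrative choices, and in practice you would load a previously saved model rather than train at startup):

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train at startup for the sketch; a real service would load a serialized model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]       # e.g. [5.1, 3.5, 1.4, 0.2]
    label = int(model.predict([features])[0])
    return jsonify({"prediction": label})

# app.run(port=8000)  # behind a production WSGI server in real deployments
```

Clients POST feature vectors as JSON and receive the predicted class back, which is the real-time pattern described above.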