1. What is the role of a machine learning classifier, and how does it function?
Ans:
A classifier is an AI model that categorizes input data into specific groups. It learns from examples with known labels during training and predicts categories for unseen data. For example, it can distinguish spam emails from legitimate ones by recognizing patterns in prior messages.
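The spam example can be sketched with a deliberately tiny word-count classifier. This is an illustration only, not a production method: the `train` and `predict` helpers and the four training messages are hypothetical, and the scoring rule (sum of word counts per label) is a simplification chosen for brevity.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

# Hypothetical labeled examples ("ham" means legitimate mail)
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(predict(model, "claim your free prize"))        # spam
print(predict(model, "agenda for the team meeting"))  # ham
```

The two phases match the answer above: `train` learns from labeled examples, and `predict` assigns a category to text it has not seen before.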
2. How do bagging and boosting differ in ensemble machine learning?
Ans:
Bagging trains multiple independent models of the same type, each on a bootstrap sample of the training data, and combines their outputs by voting or averaging to reduce variance. Boosting instead trains models sequentially, with each model focusing on the mistakes made by the previous ones, which mainly reduces bias and can improve accuracy on difficult datasets.
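A minimal sketch of the bagging half, assuming a one-dimensional feature and decision stumps as the base model. The `fit_stump` and `bagging` helpers and the toy data are hypothetical. Each stump is trained on a bootstrap sample (same size as the data, drawn with replacement) and the ensemble predicts by majority vote. Boosting is not shown here.

```python
import random
from collections import Counter

def fit_stump(points):
    """Find the threshold on a 1-D feature that misclassifies the fewest points."""
    best = None
    for threshold, _ in points:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(points, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and combine them by voting."""
    rng = random.Random(seed)
    # Each stump sees its own bootstrap sample: len(points) draws with replacement
    models = [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data: (feature, label), with class 0 on the left and class 1 on the right
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
ensemble = bagging(data)
print(ensemble(1.5), ensemble(8.5))
```

Because the stumps are trained independently, an unlucky bootstrap sample affects only one vote, which is where the variance reduction comes from.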
3. How does supervised learning differ from unsupervised learning?
Ans:
Supervised learning relies on labeled datasets to map inputs to outputs, allowing predictions on new data. Unsupervised learning works with unlabeled data, discovering hidden structures, groupings, or patterns without prior guidance. The method chosen depends on whether known outcomes are available.
4. What does the bias-variance tradeoff represent in model development?
Ans:
The bias-variance tradeoff describes the balance between underfitting and overfitting. High bias leads to overly simple models that miss patterns, while high variance results in models that are sensitive to noise in the training data. The objective is to find a level of model complexity that generalizes well to unseen data.
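For squared-error loss, the tradeoff can be written as a decomposition of the expected prediction error at a point x. The notation is an assumption introduced here, not part of the answer above: the data follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simpler models tend to raise the first term and lower the second, and more flexible models tend to do the reverse. The noise term cannot be reduced by any choice of model.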
5. How are K-Nearest Neighbors (KNN) and K-Means clustering distinct?
Ans:
KNN is a supervised algorithm that predicts the class of a new point from the labels of its K closest training points. K-Means is unsupervised: it partitions data into K clusters by assigning each point to its nearest cluster centroid. KNN needs labeled training data, whereas K-Means can discover groupings without labels.
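The contrast can be sketched side by side in plain Python. The `knn_predict` and `k_means` helpers, the toy points, and the starting centroids are all hypothetical. Real implementations usually pick the starting centroids more carefully and stop when assignments no longer change.

```python
import math
from collections import Counter

def knn_predict(labeled, query, k=3):
    """Supervised: vote among the k labeled points closest to the query."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_means(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

labeled = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
           ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(labeled, (2, 2)))  # A

# K-Means receives the same points with the labels stripped off
unlabeled = [point for point, _ in labeled]
centroids, clusters = k_means(unlabeled, centroids=[(0, 0), (10, 10)])
print(centroids)
```

Note that `knn_predict` cannot run without the labels, while `k_means` never sees them and still recovers the two groups.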
6. What is overfitting in machine learning, and how can it be prevented?
Ans:
Overfitting happens when a model memorizes the training dataset, including its noise, and as a result predicts poorly on new data. Cross-validation helps detect it, and common remedies include regularization, simplifying the model, and collecting more training data to improve generalization.
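As one illustration of the cross-validation idea, the sketch below splits sample indices into k folds so that every sample is held out for validation exactly once. The `k_fold_indices` helper is hypothetical, and it does not shuffle, which a real workflow would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "validate:", validation)
# train: [2, 3, 4, 5] validate: [0, 1]
# train: [0, 1, 4, 5] validate: [2, 3]
# train: [0, 1, 2, 3] validate: [4, 5]
```

A model that scores well on its training indices but poorly on the held-out indices across folds is likely overfitting.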
7. Which programming languages or libraries are ideal for AI/ML, and why?
Ans:
Python is preferred due to its readability and extensive library ecosystem. Pandas and NumPy facilitate data manipulation, scikit-learn offers classic ML algorithms, and TensorFlow/PyTorch support deep learning. These tools streamline all stages of model development from preprocessing to deployment.
8. How is a confusion matrix utilized, and what insights does it provide?
Ans:
A confusion matrix compares predicted labels with true labels for classification tasks. It breaks down results into true positives, true negatives, false positives, and false negatives, allowing calculation of metrics like accuracy, precision, recall, and F1-score to assess model performance.
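The four counts and the metrics derived from them can be computed directly for a binary task. This is a sketch with hypothetical helper names (`confusion_counts`, `metrics`) and made-up labels. Returning 0.0 when a denominator is zero is one convention among several.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up true and predicted labels for eight samples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)            # (3, 3, 1, 1)
print(metrics(*counts))  # every metric is 0.75 for this data
```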
9. What are the main categories of machine learning, and when are they used?
Ans:
The three primary types are supervised, unsupervised, and reinforcement learning. Supervised learning predicts outcomes with labeled data, unsupervised learning finds hidden structures in unlabeled data, and reinforcement learning optimizes behavior through rewards in interactive environments.
10. How do you select the most suitable machine learning algorithm for a task?
Ans:
Choosing an algorithm depends on the type of data, the dataset size, and the problem type: classification, regression, or clustering. Linear regression fits linear patterns, decision trees or ensemble models handle more complex relationships, and deep learning networks are often the best fit for high-dimensional or unstructured data like images and text.