1. How can supervised learning be distinguished from unsupervised learning?
Ans:
Supervised learning uses labeled datasets to teach models the relationship between inputs and outputs, enabling accurate predictions or classifications. Unsupervised learning works with unlabeled data to detect hidden patterns, clusters, or structures, which makes it especially useful for grouping similar users or discovering natural segments in data.
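The distinction can be seen in a minimal sketch, assuming scikit-learn is available; the synthetic data and the choice of LogisticRegression and KMeans are illustrative, not prescribed.

```python
# Supervised vs unsupervised in a few lines (scikit-learn assumed;
# data is synthetic for illustration).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Supervised: labels y guide the model toward an input-output mapping.
clf = LogisticRegression().fit(X, y)
print("Predicted class:", clf.predict(X[:1]))

# Unsupervised: no labels; KMeans groups rows by similarity alone.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])
```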
2. How does transfer learning improve machine learning models?
Ans:
Transfer learning enhances model performance by applying knowledge from a pre-trained model on a large dataset to a smaller, specialized task. This reduces training time, optimizes resource use and allows models to perform well even with limited data. Leveraging prior knowledge improves accuracy and efficiency for specific applications.
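A common pattern is to reuse a pre-trained image backbone and retrain only a new output layer. The sketch below assumes PyTorch and torchvision (0.13 or newer for the weights argument); resnet18 and the 10-class head are illustrative choices.

```python
# Transfer-learning sketch: reuse ImageNet features, train a new head only.
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a large dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor so its weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for the smaller, specialized task.
model.fc = nn.Linear(model.fc.in_features, 10)
# Training now updates only model.fc, reusing the learned representations.
```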
3. What is overfitting in machine learning and why is it a concern?
Ans:
Overfitting occurs when a model memorizes not only meaningful patterns but also noise from the training data, which reduces accuracy on new data. Techniques like regularization, pruning and cross-validation help control model complexity and verify that performance holds on unseen data. Avoiding overfitting ensures models generalize well across different datasets and real-world scenarios.
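As a rough sketch, regularization and cross-validation can be combined in a few lines with scikit-learn; the Ridge model, alpha values and synthetic data below are illustrative.

```python
# Combating overfitting with L2 regularization plus cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=50, noise=10.0, random_state=0)

# Ridge penalizes large weights (alpha sets the regularization strength),
# while 5-fold cross-validation estimates performance on unseen data.
for alpha in (0.01, 1.0, 100.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha}: mean R^2 = {scores.mean():.3f}")
```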
4. What are GANs and how do they function?
Ans:
Generative Adversarial Networks consist of two neural networks: a generator that creates synthetic data and a discriminator that assesses its authenticity. These networks train competitively, pushing the generator to produce more realistic outputs over time. GANs are widely used for image synthesis, content creation and data augmentation tasks.
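A toy training step, assuming PyTorch, shows the adversarial setup; the fully connected networks, dimensions and single batch below are placeholders for a real convolutional GAN trained over many iterations.

```python
# Minimal GAN sketch: one discriminator step and one generator step.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real = torch.randn(32, data_dim)  # stand-in for a batch of real data

# Discriminator step: learn to separate real samples from generated ones.
fake = generator(torch.randn(32, latent_dim)).detach()
d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
         loss_fn(discriminator(fake), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to fool the discriminator into labeling fakes as real.
fake = generator(torch.randn(32, latent_dim))
g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```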
5. How do attention mechanisms enhance transformer models?
Ans:
Attention mechanisms allow transformers to focus on the most important parts of an input sequence. Self-attention enables each element to consider relationships with all others, capturing context efficiently. This mechanism significantly boosts performance in language understanding, translation and AI content generation tasks.
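The core computation can be sketched in plain NumPy; the single-head, projection-free version below is a simplification of the full transformer attention layer, shown only to illustrate the mechanism.

```python
# Scaled dot-product self-attention (single head, no learned projections).
import numpy as np

def self_attention(X):
    """X has shape (sequence_length, d_model); queries, keys and values
    are all X itself, as in basic self-attention."""
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                  # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # context-weighted mix

X = np.random.rand(5, 8)        # 5 tokens, 8-dimensional embeddings
print(self_attention(X).shape)  # (5, 8): each token now encodes its context
```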
6. What is the importance of feature engineering in machine learning?
Ans:
Feature engineering converts raw data into meaningful inputs by selecting, refining, or creating variables that highlight key patterns. Effective features help algorithms learn efficiently, improve prediction accuracy and enhance generalization. Poorly designed features can limit even advanced models’ performance.
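A small pandas sketch illustrates the idea; the columns and derived features below are hypothetical examples, not a fixed recipe.

```python
# Feature-engineering sketch: derive informative variables from raw columns.
import pandas as pd

df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-05", "2023-03-20"]),
    "last_login":  pd.to_datetime(["2023-06-01", "2023-06-15"]),
    "total_spend": [120.0, 480.0],
    "n_orders":    [3, 12],
})

# Derived features often carry more signal than the raw columns themselves.
df["days_active"] = (df["last_login"] - df["signup_date"]).dt.days
df["avg_order_value"] = df["total_spend"] / df["n_orders"]
df["signup_month"] = df["signup_date"].dt.month  # captures seasonality
print(df)
```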
7. How is missing data usually managed in datasets?
Ans:
Missing data can be addressed by imputing values using the mean, median, or mode, or by using algorithms that handle missing entries naturally. In cases of extensive missing data, removing affected rows or columns may be necessary. The chosen approach depends on the dataset size and the impact of the missing information.
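Both strategies are easy to sketch with pandas and scikit-learn; the toy DataFrame and the choice of median imputation below are illustrative.

```python
# Handling missing values: impute a statistic, or drop affected rows.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "income": [50_000, 62_000, np.nan, 58_000]})

# Option 1: fill gaps with a summary statistic such as the median.
imputed = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(df), columns=df.columns
)

# Option 2: drop rows containing missing values when they are few.
dropped = df.dropna()
print(imputed, dropped, sep="\n\n")
```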
8. How do Random Forest and XGBoost differ in approach?
Ans:
Random Forest constructs multiple independent decision trees and averages their predictions to reduce overfitting and variance. XGBoost builds trees sequentially, with each tree correcting errors from the previous one, achieving high predictive accuracy. Both are effective ensemble methods, but they reduce error differently: Random Forest relies on bagging, while XGBoost relies on gradient boosting.
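A side-by-side sketch, assuming scikit-learn plus the separate xgboost package, makes the contrast concrete; the synthetic data and hyperparameters here are illustrative.

```python
# Bagged trees (Random Forest) vs boosted trees (XGBoost) on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Random Forest: many independent trees whose predictions are averaged.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# XGBoost: trees added sequentially, each correcting residual errors.
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0)

for name, model in [("Random Forest", rf), ("XGBoost", xgb)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```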
9. How is the performance of a machine learning model measured?
Ans:
Model evaluation varies by task type. Classification models are assessed using accuracy, precision, recall, F1-score and ROC AUC, while regression models rely on metrics like mean squared error, mean absolute error and R-squared. These metrics determine reliability and readiness for real-world deployment.
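A quick sketch with scikit-learn's metrics module, using toy labels purely for illustration:

```python
# Common evaluation metrics for classification and regression.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, r2_score)

# Classification metrics compare predicted classes to true classes.
y_true, y_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 0]
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))

# Regression metrics measure numeric error between predictions and targets.
y_true_r, y_pred_r = [2.5, 0.0, 2.1], [3.0, -0.1, 2.0]
print("MSE:", mean_squared_error(y_true_r, y_pred_r))
print("R^2:", r2_score(y_true_r, y_pred_r))
```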
10. What ethical considerations are important when creating AI systems?
Ans:
Ethical AI emphasizes fairness by reducing bias in training data, transparency in model decisions and protection of user privacy. Building accountable and explainable systems ensures trust and prevents misuse. Following ethical standards aligns AI with societal values and ensures safe deployment.