# Machine Learning with Python Interview Questions and Answers

Last updated on 4th Jul 2020

These Machine Learning interview questions have been designed specially to acquaint you with the nature of questions you may encounter during your interview for the subject of Machine Learning. In my experience, good interviewers hardly plan to ask any particular question during an interview; normally questions start with some basic concept of the subject and continue based on further discussion and your answers. We are going to cover top Machine Learning interview questions along with their detailed answers, including scenario-based questions, questions for freshers, and questions and answers for experienced candidates.

**1. What’s the trade-off between bias and variance?**

**Ans:**

Bias is an error due to erroneous or overly simplistic assumptions in the learning algorithm you’re using, which can cause the model to underfit your data and miss real structure. Variance is an error due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain some variance — in order to get the optimally reduced amount of error, you’ll have to trade off bias and variance. You don’t want either high bias or high variance in your model.

**2. What is the difference between supervised and unsupervised machine learning?**

**Ans:**

**3. How is KNN different from k-means clustering?**

**Ans:**

The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn’t — and is thus unsupervised learning.
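A minimal sketch of the supervised side (illustrative, with made-up points and labels): KNN cannot even run without the labels attached to the training data, whereas k-means would take the points alone.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest labeled points."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# KNN needs (point, label) pairs; k-means would take only the points.
labeled = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b")]
```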

**4. Explain how a ROC curve works.**

**Ans:**

**5. Define precision and recall.**

**Ans:**

**6. What is Bayes’ Theorem? How is it useful in a machine learning context?**

**Ans:**

Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of the true positive rate of the condition sample and the false positive rate of the population. Say a flu test detects the flu 60% of the time in people who actually have it, but it also comes back positive 50% of the time for people who don’t have the flu, and the overall population only has a 5% chance of having the flu. Would you actually have a 60% chance of having the flu after a positive test?

Bayes’ Theorem says no. It gives the posterior probability as (0.6 × 0.05) (true positive rate times prior) / ((0.6 × 0.05) + (0.5 × 0.95)) (plus false positive rate times the rest of the population) ≈ 0.0594, i.e. only a 5.94% chance of actually having the flu after a positive test.
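The arithmetic above is a direct plug-in to Bayes’ theorem; as a quick sketch (numbers from the worked example):

```python
def posterior(p_pos_given_flu, p_flu, p_pos_given_no_flu):
    """P(flu | positive test) via Bayes' theorem."""
    numerator = p_pos_given_flu * p_flu                       # TP rate x prior
    evidence = numerator + p_pos_given_no_flu * (1 - p_flu)   # + FP rate x (1 - prior)
    return numerator / evidence

p = posterior(0.6, 0.05, 0.5)   # the flu example from above, ~0.0594
```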

Bayes’ Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That’s something important to consider when you’re faced with machine learning interview questions.

**7. Why is “Naive” Bayes naive?**

**Ans:**

**8. Explain the difference between L1 and L2 regularization.**

**Ans:**

**9. What’s your favorite algorithm, and can you explain it to me in less than a minute?**

**Ans:**

**10. What’s the difference between Type I and Type II error?**

**Ans:**

Don’t think that this is a trick question! Many machine learning interview questions will be an attempt to lob basic questions at you just to make sure you’re on top of your game and you’ve covered all of your bases.

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn’t, while Type II error means that you claim nothing is happening when in fact something is.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.

**11. What’s a Fourier transform?**

**Ans:**

A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. A common intuition: given a smoothie, it’s how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes and phases that match any time signal. A Fourier transform converts a signal from the time domain to the frequency domain; it’s a very common way to extract features from audio signals or other time series such as sensor data.
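A small sketch of time-to-frequency conversion with NumPy (the signal is made up: a 5 Hz tone plus a weaker 20 Hz tone):

```python
import numpy as np

fs = 100                      # sampling rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)   # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Convert from the time domain to the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
dominant = freqs[np.argmax(spectrum)]   # the strongest cycle in the mix
```

The peak of the spectrum recovers the 5 Hz component, the dominant "ingredient" of the signal.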

**12. What’s the difference between probability and likelihood?**

**Ans:**

**13. What is deep learning, and how does it contrast with other machine learning algorithms?**

**Ans:**

Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. In that sense, deep learning represents a family of algorithms that learn representations of data automatically through the use of neural nets, rather than relying on hand-engineered features.

**14. What’s the difference between a generative and discriminative model?**

**Ans:**

A generative model will learn categories of data while a discriminative model will simply learn the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.

**15. What cross-validation technique would you use on a time series dataset?**

**Ans:**

With standard k-folds cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data; it is inherently ordered chronologically. If a pattern emerges in later time periods, for example, your model may still pick up on it even though that effect doesn’t hold in earlier periods!

You’ll want to do something like forward chaining where you’ll be able to model on past data then look at forward-facing data.

- fold 1 : training [1], test [2]
- fold 2 : training [1 2], test [3]
- fold 3 : training [1 2 3], test [4]
- fold 4 : training [1 2 3 4], test [5]
- fold 5 : training [1 2 3 4 5], test [6]
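The folds above can be generated with a few lines of Python (a sketch; the indices stand for time periods):

```python
def forward_chaining_splits(n_periods):
    """Yield (train, test) index lists for expanding-window time-series CV:
    train on periods [0..end), test on period `end`."""
    for end in range(1, n_periods):
        yield list(range(end)), [end]

folds = list(forward_chaining_splits(6))
```

Each fold trains only on data that precedes its test period, so the model never sees the future.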

**16. How is a decision tree pruned?**

**Ans:**

Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning.

Reduced error pruning is perhaps the simplest version: starting at the leaves, replace each node with its most popular class. If predictive accuracy doesn’t decrease, keep the change. While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.

**17. Which is more important to you: model accuracy or model performance?**

**Ans:**

This question tests your grasp of the nuances of machine learning model performance! Machine learning interview questions often look towards the details. There are models with higher accuracy that can perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance, and at that, a sometimes misleading one. For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a vast minority of cases were fraud. However, this would be useless for a predictive model — a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn’t the be-all and end-all of model performance.

**18. What’s the F1 score? How would you use it?**

**Ans:**

The F1 score is a measure of a model’s performance. It is the harmonic mean of the precision and recall of the model, with results tending to 1 being the best and those tending to 0 being the worst. You would use it in classification tasks where true negatives don’t matter much.
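As a quick sketch of the formula (harmonic mean of precision and recall):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A model with perfect precision but zero recall still scores 0, which is exactly why F1 is preferred over plain accuracy when true negatives dominate.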

**19. How would you handle an imbalanced dataset?**

**Ans:**

An imbalanced dataset is when you have, for example, a classification test and 90% of the data is in one class. That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the other category of data! Here are a few tactics to get over the hump:

1- Collect more data to even the imbalances in the dataset.

2- Resample the dataset to correct for imbalances.

3- Try a different algorithm altogether on your dataset.

What’s important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
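Tactic 2 (resampling) can be sketched in plain Python; this toy oversampler (illustrative only: the `label_key` argument and the sample rows are made up) duplicates minority-class rows at random until the classes match:

```python
import random

random.seed(0)   # deterministic for the example

def oversample_minority(rows, label_key):
    """Randomly duplicate minority-class rows until all classes are balanced."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(random.choices(group, k=target - len(group)))
    return balanced

data = [{"y": 0}] * 9 + [{"y": 1}]       # the 90/10 imbalance from the text
balanced = oversample_minority(data, "y")
```

Under-sampling the majority class works the same way in reverse; in practice, validate on data resampled *after* the train/test split to avoid leakage.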

**20. When should you use classification over regression?**

**Ans:**

Classification produces discrete values and maps a dataset into strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points. You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (e.g., if you wanted to know whether a name was male or female rather than just how correlated it was with male and female names).

**21. Name an example where ensemble techniques might be useful.**

**Ans:**

Ensemble techniques use a combination of learning algorithms to optimize better predictive performance. They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data).

You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method and demonstrate how they could increase predictive power.

**22. How do you ensure you’re not overfitting with a model?**

**Ans:**

This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.

**There are three main methods to avoid overfitting:**

1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.

2- Use cross-validation techniques such as k-folds cross-validation.

3- Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.

**23. What evaluation approaches would you work to gauge the effectiveness of a machine learning model?**

**Ans:**

You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data. You should then implement a selection of performance metrics, such as the F1 score, the accuracy, and the confusion matrix. What’s important here is to demonstrate that you understand the nuances of how a model is measured and how to choose the right performance measures for the right situations.
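A confusion matrix for binary labels is simple to compute by hand; a minimal sketch (the labels here are made up):

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1       # predicted positive, actually positive
        elif t == 0 and p == 0:
            counts["tn"] += 1       # predicted negative, actually negative
        elif t == 0 and p == 1:
            counts["fp"] += 1       # predicted positive, actually negative
        else:
            counts["fn"] += 1       # predicted negative, actually positive
    return counts

cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Precision is tp / (tp + fp) and recall is tp / (tp + fn), so these same counts feed the F1 score.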

**24. How would you evaluate a logistic regression model?**

**Ans:**

A subsection of the question above. You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction, etc.) and bring up a few examples and use cases.

**25. What’s the “kernel trick” and how is it useful?**

**Ans:**

The kernel trick involves kernel functions that enable operating in higher-dimensional spaces without explicitly calculating the coordinates of points within those dimensions: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives them the very useful property of obtaining higher-dimensional inner products while being computationally cheaper than the explicit calculation of the coordinates. Many algorithms can be expressed in terms of inner products, so using the kernel trick enables us to effectively run those algorithms in a high-dimensional space with lower-dimensional data.
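A concrete sketch: the degree-2 polynomial kernel below returns exactly the inner product of explicit degree-2 feature maps, without ever constructing those features (2-D inputs are assumed for the explicit map):

```python
import math

def poly2_kernel(x, y):
    """(x . y)^2 computed directly in the original space."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return [x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1]]

x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, y)                            # no feature map needed
via_features = sum(a * b for a, b in zip(phi(x), phi(y)))  # explicit coordinates
```

Both routes give the same inner product; for kernels like the RBF kernel the implicit feature space is infinite-dimensional, so the shortcut isn’t just cheaper but the only option.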

**26. What are the three stages for creating a model in machine learning?**

**Ans:**

- Model building
- Model testing
- Applying the model

**27. Suppose you are working with a dataset. How do you choose key variables?**

**Ans:**

Some methods used to select key variables:

- Using Lasso regression.
- Using Random Forest and plotting a variable importance chart.
- Using linear regression.

**28. Why is Naive Bayes “naive”?**

**Ans:**

Naive Bayes is “naive” because it assumes that all features of a dataset are equally important and independent. As we know, this assumption is rarely true in real-world situations.

**29. How is KNN different from k-means?**

**Ans:**

K-Nearest Neighbors is a supervised classification algorithm, while k-means is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, in order for KNN to work you need labeled data against which to classify an unlabeled point. K-means clustering requires only a set of unlabeled points and a threshold: the algorithm takes the unlabeled points and gradually learns how to cluster them into groups by computing the distances between different points.

The significant difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn’t, and is thus unsupervised.

**30. Which is more important to you: model accuracy or model performance?**

**Ans:**

This question tests your grip on the nuances of machine learning model performance! Machine learning interview questions often head towards the details. There are models with greater accuracy that have less predictive power; how can that be?

Well, model accuracy is only a subset of model performance, and sometimes a misleading one. For example, if you wanted to detect fraud in a massive dataset with millions of samples, and only a very small number of cases were fraud, the most accurate model would most likely predict no fraud at all. However, this would be useless for prediction: a model designed to detect fraud that insists there is no fraud! Questions like these help you demonstrate that you understand model accuracy isn’t the whole story.

**31. When should you use classification over regression?**

**Ans:**

Classification maps a dataset into distinct values and strict categories, while regression gives continuous results that let you distinguish differences between individual points. You would use classification if you want your results to reflect the membership of data points in certain explicit categories (for example, predicting whether a name is male or female rather than just how strongly it correlates with male and female names).

**32. What is overfitting?**

**Ans:**

Overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data. Intuitively, overfitting occurs when the model fits the data too well. Specifically, an overfit model shows low bias but high variance. Overfitting is often the result of an excessively complicated model, and it can be detected by fitting multiple models and comparing their predictive accuracy on held-out test data using validation or cross-validation.

**33. What is underfitting?**

**Ans:**

Underfitting occurs when a statistical model or machine learning algorithm does not capture the underlying trend of the data. Intuitively, underfitting occurs when the model does not fit the data well enough. Specifically, an underfit model shows high bias but low variance. Underfitting is often the result of an excessively simple model.

**34. How do you make sure that you are not overfitting a model?**

**Ans:**

This is a restatement of a fundamental problem in machine learning: the possibility of overfitting the training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.

**35. What are the main guidelines to avoid overfitting?**

**Ans:**

- Keep the model simpler: reduce variance by taking fewer variables and parameters into account, thereby removing some of the noise in the training data.
- Use cross-validation techniques such as k-folds cross-validation.
- Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.

**36. How do you handle unbalanced datasets?**

**Ans:**

An unbalanced dataset is when you have, for example, a classification task and 90% of the data is in one class. This leads to problems: an accuracy of 90% can be misleading if the model has no predictive power on the other 10% of the data.

**37. What is reinforcement learning?**

**Ans:**

Reinforcement learning is a type of machine learning, and thus a branch of artificial intelligence. It allows machines and software agents to automatically determine the ideal behavior within a given environment in order to maximize performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.

In fact, reinforcement learning is defined by a particular type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent observes its current state and must decide on the best action to take. When this step is repeated, the problem is known as a Markov Decision Process.

**38. What is a decision tree?**

**Ans:**

A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions. It starts from a single box (the root) and branches out into possible outcomes, much like a tree.

**39. What is a random forest?**

**Ans:**

**40. What are measures of central tendency?**

**Ans:**

Examples: mean, median, mode.

**41. When do we use Pearson’s correlation coefficient?**

**Ans:**

For example, a Pearson correlation can be used to assess whether an increase in temperature at your production facility is associated with a decrease in the thickness of your chocolate coating.

**42. What is the standard deviation, how is it calculated?**

**Ans:**

Step 1: Find the mean.

Step 2: For each data point, find the square of its distance to the mean.

Step 3: Sum the values from step 2.

Step 4: Divide by the number of data points.

Step 5: Take the square root.
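The five steps translate directly into Python (population standard deviation; the sample values are made up):

```python
import math

def std_dev(values):
    """Population standard deviation, following the five steps above."""
    mean = sum(values) / len(values)               # step 1: find the mean
    squared = [(v - mean) ** 2 for v in values]    # step 2: squared distances
    total = sum(squared)                           # step 3: sum them
    variance = total / len(values)                 # step 4: divide by the count
    return math.sqrt(variance)                     # step 5: take the square root

s = std_dev([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, variance 4, std dev 2.0
```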

**43. What is Z Score?**

**Ans:**

**44. What is Type I and Type II Error?**

**Ans:**

Type I error: a Type I error occurs when the researcher rejects a null hypothesis that is actually true. The probability of committing a Type I error is called the significance level, and is often denoted by α.

Type II error: a Type II error occurs when the researcher fails to reject a null hypothesis that is actually false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of *not* committing a Type II error is called the power of the test.

**45. What is a residual?**

**Ans:**

Residual = observed value − estimated value: e = y − ŷ.

Both the sum and the mean of the residuals are equal to zero: Σe = 0 and ē = 0.
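The zero-sum property can be checked numerically by fitting an ordinary least-squares line (with an intercept) to made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])         # hypothetical observations

slope, intercept = np.polyfit(x, y, 1)     # least-squares line with intercept
y_hat = slope * x + intercept              # estimated values
e = y - y_hat                              # residuals e = y - y_hat
```

For any OLS fit that includes an intercept, the residuals sum to zero up to floating-point error.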

**46. What is a Sample Model Test?**

**Ans:**

**47. What is F Statistics?**

**Ans:**

**48. What is ANOVA?**

**Ans:**

- One-way ANOVA (one independent variable).
- Two-way ANOVA (two independent variables).

**49. What is data preprocessing in machine learning in python?**

**Ans:**

- Preprocessing refers to the transformations applied to the data before feeding it to an algorithm.
- Data preprocessing is a technique used to convert raw data into a clean dataset. Data is collected from various sources in a raw format that is not practical for analysis.
- To get the best results from a model in a machine learning project, the underlying data must be well prepared and formatted.

**50. What is hypothesis testing? What is it used for?**

**Ans:**

It is a very important method in statistics. It is used to evaluate two mutually exclusive statements about a population and determine which statement is best supported by the sample data. Drawing a statistical inference in this way is a hypothesis test. Two related terms are normalization and standard normalization.

**51. What are the parameters of hypothesis testing?**

**Ans:**

- The null hypothesis – it assumes that there is no real effect and that any observed difference is due to chance.
- The alternative hypothesis – it is used for analyzing whether a real effect exists. It is tested against the null hypothesis, and states that the population parameter is smaller, greater, or simply different from the value assumed under the null hypothesis.

**52. Why is categorical data difficult for machine learning algorithms?**

**Ans:**

For example, customers are commonly described by country, gender, age, name, etc., and a commodity by product type, manufacturer, seller, etc. This is very easy for people to interpret but difficult for machine learning algorithms, for various reasons:

- Most machine learning models are algebraic, and so require numerical input.
- ML packages have to convert categorical data into numerical form mechanically.
- Categorical variables can contain a large number of levels, each appearing in only a small number of examples.

**53. Name the categories of Machine Learning Algorithms with Python?**

**Ans:**

- Supervised – feedback is given to the computer in the form of labeled training data. The system observes sample inputs and their desired outputs and learns a general rule that maps inputs to outputs.
- Unsupervised – no labels are given to the Python machine learning algorithm; only a set of inputs is provided. It is left on its own to find structure in the input. Unsupervised learning can be a goal in itself, such as clustering or association, or a means toward further learning.

**54. Explain Anova?**

**Ans:**

**55. Mention the reasons why Python is best for machine learning.**

**Ans:**

- It is very simple and readable for both experienced developers and beginners. It allows us to finish projects with less code.
- Python contains numerous libraries and frameworks, such as Keras, TensorFlow, and Scikit-learn, which save development time.
- It is portable and extensible, with strong community and corporate support.

**56. What is scikit-learn?**

**Ans:**

- A simple and efficient tool for data mining and data analysis.
- Accessible and reusable for everyone in various contexts.
- Built on top of NumPy and SciPy, and usable commercially.

**57. Define the uses of PCA?**

**Ans:**

- It is used for finding interrelations between variables in the data.
- It is used for interpreting and visualizing data.
- Analysis becomes simpler as the number of variables drops.
- It is often used for visualizing genetic distance and relatedness between populations.
- It operates on a square, symmetric matrix, such as a pure sums-of-squares matrix or a sums-of-squares-and-cross-products matrix.

**58. How do you compute the dot product of two vectors x and y?**

**Ans:**

**59. What is k-means?**

**Ans:**

**60. What are the types of supervised machine learning algorithms?**

**Ans:**

**61. Explain how a decision tree is pruned.**

**Ans:**

**62. How to detect fraud in datasets?**

**Ans:**

For example, suppose you want to detect fraud in a huge dataset with millions of samples. When only a vast minority of the cases are fraud, a highly accurate model may simply forecast no fraud at all, which makes accuracy alone a misleading way to find the fraudulent cases.

**63. Name the extensions built on regularized linear regression.**

**Ans:**

**64. Define Ridge regression?**

**Ans:**

**65. What are the techniques to manage an Imbalanced Dataset?**

**Ans:**

- Use the right evaluation metrics for the model, choosing measures suited to the problem.
- Resample the unbalanced dataset with the help of two methods known as under-sampling and over-sampling.
- Use K-fold cross-validation properly to reduce the impact of imbalanced data.
- Ensemble several differently resampled datasets together.
- Resample with various ratios between the rare and the abundant class.
- Cluster the abundant class.
- Design your own models.

**66. How to change consumer evolution?**

**Ans:**

**67. How do you detect heteroscedasticity in a simple regression model?**

**Ans:**

**68. Mention the NumPy and Scipy?**

**Ans:**

- NumPy – used for basic operations such as sorting, indexing, and elementary functions on arrays of a single data type. It provides the core numerical-Python multidimensional array object. Its core is written in C, which makes operations on the data fast.
- SciPy – known as Scientific Python, it includes higher-level algebraic functions and supports operations such as integration, differentiation, and gradient optimization. It is popular because of its speed. It does not define its own array type; it is more function-oriented, building on top of NumPy.

**69. What is a task (T) in machine learning?**

**Ans:**

**70. What is a quantitative metric?**

**Ans:**

**71. What is experience (E) in machine learning?**

**Ans:**

**72. Define the libraries for machine learning?**

**Ans:**

**73. What are the filter methods?**

**Ans:**

**74. Which method uses a greedy search to find a suitable feature subset?**

**Ans:**

**75. Define an evolutionary algorithm for feature selection?**

**Ans:**

**76. Name the challenges and applications of Machine Learning.**

**Ans:**

**Challenges**

- Low-quality data generates issues connected with data processing.
- Data acquisition, feature extraction, and retrieval are very time-consuming tasks.
- Absence of expert resources
- Errors of overfitting and underfitting
- Curse of dimensionality
- Problems in deployment

**Application**

- Emotion analysis
- Sentiment analysis
- Error detection and prevention
- Weather forecasting and prediction
- Fraud analysis and prevention

**77. Define KNN?**

**Ans:**

**78. How a KNN algorithm performs?**

**Ans:**

**79. What is the bed of capacity?**

**Ans:**

**80. How do you determine the number of neighbors (k) in KNN?**

**Ans:**

**81. How to enhance KNN?**

**Ans:**