25+ [REAL-TIME] Machine Learning Interview Questions & Answers

Last updated on 23rd Jun 2020, Blog, Interview Questions

About author

Vignesh (Sr Technical Project Manager )

He is a proficient technical expert in his industry domain, serving for 8+ years. He is also dedicated to imparting informative knowledge to freshers, and shares these blogs for us.


Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

1. What are the types of machine learning?


  • Supervised learning
  • Unsupervised learning
  • Reinforcement Learning

2. What is Supervised learning in machine learning?


Supervised learning: When you know the target variable for the problem statement, it becomes supervised learning. This can be applied to perform regression and classification.

Example: Linear Regression and Logistic Regression.
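As a sketch, both supervised examples above can be fit in a few lines of scikit-learn; the toy data here is made up:

```python
# Supervised learning sketch: the target y is known for every training row.
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: the target is continuous (toy data follows y = 2x)
X = [[1], [2], [3], [4]]
y = [2.0, 4.0, 6.0, 8.0]
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]])[0])        # the model has learned y ≈ 2x

# Classification: the target is a discrete label
labels = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, labels)
print(clf.predict([[4]])[0])
```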

3. What is Unsupervised learning in machine learning?


Unsupervised learning: When you do not know the target variable for the problem statement, it becomes unsupervised learning. This is widely used to perform clustering.

Example: K-Means and Hierarchical clustering.
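A minimal K-Means sketch with scikit-learn, assuming a made-up dataset with two obvious clusters and k chosen by hand:

```python
# Unsupervised learning sketch: K-Means groups unlabeled points.
from sklearn.cluster import KMeans

X = [[1, 1], [1, 2], [9, 9], [9, 10]]   # two obvious clusters, no labels given
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # points 0 and 1 share one label; points 2 and 3 share the other
```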

4. What are the commonly used python packages?


  • Numpy
  • Pandas
  • scikit-learn
  • Matplotlib

5. What are the commonly used R packages?


  • Caret
  • Data.Table
  • Reshape
  • Reshape2
  • E1071
  • DMwR
  • Dplyr
  • Lubridate

6. Name the commonly used algorithms.


  • Linear regression
  • Logistic regression
  • Random Forest
  • KNN

7. What is precision?


The ratio of true positives to all predicted positives: TP / (TP + FP). It is one of the most commonly used metrics for classification. The range is from 0 to 1, where 1 represents 100%.

8. What is recall?


The ratio of true positives to all actual positives: TP / (TP + FN). The range is again from 0 to 1.

9. Which metric acts like accuracy in classification problem statements?


  • F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
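A quick hand computation of the formula above, using hypothetical confusion counts:

```python
# Precision, recall and F1 from toy confusion-matrix counts.
tp, fp, fn = 8, 2, 4          # hypothetical counts of a classifier's mistakes

precision = tp / (tp + fp)    # 8 / 10 = 0.8
recall = tp / (tp + fn)       # 8 / 12 ≈ 0.667
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))
```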

10. What is a normal distribution?


When the data is distributed symmetrically around the center, such that the mean, median and mode are equal.

11. What is overfitting?


Overfitting shows up as a high inconsistency between the training error and the test error, which leads to a serious business problem. If the error rate on the training set is low and the error rate on the test set is high, we can conclude that the model is overfitting.

12. What is underfitting?


Underfitting shows up as poor predictions on both the training and the test data, which leads to a serious business problem. If the error rate on the training set is high and the error rate on the test set is also high, we can conclude that the model is underfitting.

13. What is a univariate analysis?


An analysis that is applied to one attribute at a time is called a univariate analysis.

Boxplot is one of the widely used univariate models. Scatter plot and cook’s distance are other methods used for bivariate and multivariate analysis.

14. Name a few methods for Missing Value Treatments.


Central Imputation: This method uses central tendencies. All the missing values are filled with the mean or median for numerical attributes and the mode for categorical attributes.

KNN (K Nearest Neighbour) imputation: Distances between two or more attributes are calculated using Euclidean distance, and the nearest neighbours are used to treat the missing values. Mean and mode will again be used, as in central imputation.
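A sketch of the KNN approach using scikit-learn's KNNImputer on made-up numeric data; the missing cell is filled from the two nearest complete rows:

```python
# KNN imputation sketch: the NaN is replaced using the nearest neighbours.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [np.nan, 6.0]])          # one missing value in column 0

imputer = KNNImputer(n_neighbors=2)
filled = imputer.fit_transform(X)
print(filled[2, 0])  # mean of the 2 nearest rows' first column
```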

15. What is the Pearson correlation?


Correlation between predicted and actual data can be examined and understood using this method.

  • The range is from -1 to +1.
  • -1 refers to a perfect negative correlation, whereas +1 refers to a perfect positive correlation.
  • The formula is r = Cov(x, y) / (Sd(x) * Sd(y)).
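The same coefficient can be computed with NumPy; the perfectly linear toy data below is made up:

```python
# Pearson r: covariance divided by the product of standard deviations.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])     # y = 2x, a perfect positive relationship
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))                 # +1.0 for a perfect positive correlation
```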

16. How and by what methods data visualizations can be effectively used?


 In addition to giving insights in a very effective and efficient manner, data visualization can also be used in such a way that it is not only restricted to bar, line or some stereotypic graphs. Data can be represented in a much more visually pleasing manner.

One thing that has to be taken care of is conveying the intended insight or finding correctly to the audience. Once the baseline is set, the innovative and creative part can help you come up with better-looking and functional dashboards. There is a fine line between a simple, insightful dashboard and an awesome-looking dashboard with zero fruitful insight.

17. How to understand the problems faced during data analysis?


 Most of the problems faced during hands on analysis or data science is because of poor understanding of the problem in hand and concentrating more on tools, end results and other aspects of the project.

Breaking the problem down to a granular level and understanding takes a lot of time and practice to master. Coming back to square one in data science projects can be seen in a lot of companies and even in your own project or kaggle problems.

18. Advantages of Tableau Prep?


Tableau Prep reduces a lot of time, just as its parent software (Tableau) does when creating impressive visualizations. The tool has a lot of potential in taking professionals from data cleaning and merging steps to creating final usable data that can be linked to Tableau Desktop for visualizations and business insights. A lot of manual tasks are reduced, and the time saved can be used to make better findings and insights.

19. What is the common perception about visualization?


People think of visualization as just charts and summary information. But visualizations go beyond that and drive business with a lot of underlying principles. Learning design principles can help anyone build effective and efficient visualizations, and the Tableau Prep tool can drastically increase the time we can spend on the more important parts. The only issue with Tableau is that it is paid, and companies need to pay to leverage that awesome tool.

20. What are the time series algorithms?


Time series algorithms like ARIMA, ARIMAX, SARIMA and Holt-Winters are very interesting to learn and to use for solving a lot of complex business problems. Data preparation for time series analysis plays a vital role: stationarity, seasonality, cycles and noise need time and attention. Take as much time as you need to make the data right. Then you can run any model on top of it.


    21. How to choose the right chart in case of creating a viz?


    Using the right chart to represent data is one of the key aspects of data visualization and design principle. You will always have options to choose from when deciding on a chart. But fixing to the right chart comes only by experience, practice and deep understanding of end-user needs. That dictates everything in the dashboard.

    22. Where to seek help in case of discrepancies in Tableau?


    When you face any issue regarding Tableau, try searching in the Tableau community forum. It is one of the best places to get your queries answered. You can always write your question and get the query answered within an hour or a day. You can always post on LinkedIn and follow people.

    23. Now companies are heavily investing their money and time to make the dashboards. Why?


    To make stakeholders more aware about the business through data. Working on visualization projects helps you develop one of the key skills every data scientist should possess i.e. Thinking from the shoes of the end user.

    If you’re learning any visualization tool, download a dataset from kaggle. Building charts and graphs for the dashboard should be the last step. Research more about the domain and think about the KPIs you would like to see in the dashboard if you’re going to be the end user. Then start building the dashboard piece by piece.

    24. How can I achieve accuracy in the first model that I built?


     Building machine learning models involves a lot of interesting steps. A 90% accuracy model doesn't come on the very first attempt. You have to apply a lot of feature selection techniques to get to that point, which means it involves a lot of trial and error. The process will help you learn new concepts in statistics, math and probability.

    25. What is the basic responsibility of a Data Scientist?


     As a data scientist, we have the responsibility to make complex things simple enough that anyone without context should understand what we are trying to convey.

    The moment, we start explaining even the simple things the mission of making the complex simple goes away. This happens a lot when we are doing data visualization.

    Less is more. Rather than pushing too much information to the reader’s brain, we need to figure out how easily we can help them consume a dashboard or a chart. The process is simple to say but difficult to implement. You must bring the complex business value out of a self-explanatory chart. It’s a skill every data scientist should strive towards and good to have in their arsenal.

    26. How do I become a SAS analyst?


    • Step 1: Earn a College Degree. Businesses prefer SAS programmers who have completed a statistics or computer science bachelor’s degree program.
    • Step 2: Acquire SAS Certification.
    • Step 3: Consider Getting an Advanced Degree.
    • Step 4: Gain SAS Program Coding Work Background.

    27. What makes SAS stand out over other data analytics tools?


     Ease of learning: SAS is remarkably easy to learn, and it offers the most suitable option for those who already know SQL. R, on the other hand, has a steep learning curve and a low-level programming style.

    Data handling capabilities: SAS is on par with the leading tools, including R and Python, and when it comes to handling huge data it is an excellent platform.

    Graphical capabilities: SAS comes with functional graphical capabilities, and with a limited amount of learning it is possible to customize the plots.

    Better tool management: SAS releases its updates in a controlled environment, which is why it is well tested. R and Python, by contrast, have open contribution, so the risk of errors in the latest development is higher.

    28. What is RUN-Group processing?


    To use RUN-group processing, you start the procedure and then submit many RUN-groups.

    A RUN-group is a group of statements that contains at least one action statement and ends with a RUN statement. It can contain other SAS statements such as AXIS, BY, GOPTIONS, LEGEND, or WHERE.

    29. What is BY-group processing?


     BY-group processing is a method of processing observations from one or more SAS data sets that are grouped or ordered by the values of one or more shared variables. All data sets being combined must include one or more BY variables.

    30. What is the right way to validate the SAS program?


    Write OPTIONS OBS=0 at the beginning of the code and run it: the program is compiled without processing any observations, so it acts as a syntax check, and any errors are recognized in the log by the highlighted colors.


    31. Do you know any SAS functions and Call Routines?


    Arguments to SAS functions and CALL routines can be variable names, constants, or any SAS expression, including another function. Multiple arguments are separated with commas.

    32. What’s the trade-off between bias and variance?


     Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.

    Variance is an error due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.

    The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain some variance — in order to get the optimally reduced amount of error, you’ll have to tradeoff bias and variance. You don’t want either high bias or high variance in your model.

    33. What is deep learning?


    Deep learning is a subset of machine learning that uses multi-layered neural networks to learn representations from data.

    34. What is the F1 score?


    The F1 score is defined as a measure of a model’s performance.

    35. How is F1 score used?


    The F1 score is the harmonic mean of the precision and recall of a model. An F1 score of 1 is classified as best, and 0 as worst.

    36. What is the difference between Machine learning Vs Data Mining?


    • Data mining is about working on large amounts of data and extracting patterns from it, to the level where unusual and unknown patterns are identified.
    • Machine learning is the study of the design and development of algorithms that give computers the ability to learn.

    37. What are confounding variables?


    These are extraneous variables in a statistical model that correlate directly or inversely with both the dependent and the independent variable. The study fails to account for the confounding factor.

    38. How can you randomize the items of a list in place in Python?


    Consider the example shown below:

    • from random import shuffle
    • x = ['Data', 'Class', 'Blue', 'Flag', 'Red', 'Slow']
    • shuffle(x)
    • print(x)
    • One possible output of the above code is shown below.
    • ['Red', 'Data', 'Blue', 'Slow', 'Class', 'Flag']

    39. How to get indices of N maximum values in a NumPy array?


    We can get the indices of N maximum values in a NumPy array using the below code:

    • import numpy as np
    • arr = np.array([1, 3, 2, 4, 5])
    • print(arr.argsort()[-3:][::-1])


    [4 3 1]

    40. How do you make 3D plots/visualizations using NumPy/SciPy?


    Like 2D plotting, 3D graphics is beyond the scope of NumPy and SciPy, but just as in the 2D case, packages exist that integrate with NumPy. Matplotlib provides basic 3D plotting in the mplot3d subpackage, whereas Mayavi provides a wide range of high-quality 3D visualization features, utilizing the powerful VTK engine.

    41. What are the types of biases that can occur during sampling?


     Some simple types of selection bias are described below. Undercoverage, for example, occurs when some members of the population are inadequately represented in the sample, such as a survey that relies on a sampling frame drawn from telephone directories and car registration lists.

    • Selection bias
    • Undercoverage bias
    • Survivorship bias

    42. Which Python library is used for data visualization?


    Plotly, also called Plot.ly because of its main online platform, is an interactive online visualization tool used for data analytics, scientific graphs, and other visualizations. It has some great APIs, including one for Python.

    43. Write code to sort a DataFrame in Python in descending order.


    • DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
    • Sorts by the values along either axis.
    • Parameters:
    • by: str or list of str. Name or list of names to sort by. If axis is 0 or 'index', by may contain index levels and/or column labels; if axis is 1 or 'columns', by may contain column levels and/or index labels.
    • axis: {0 or 'index', 1 or 'columns'}, default 0. The axis to be sorted.
    • ascending: bool or list of bool, default True. Sort ascending vs. descending. Specify a list for multiple sort orders; if this is a list of bools, it must match the length of by.
    • inplace: bool, default False. If True, perform the operation in place.
    • kind: {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'. Choice of sorting algorithm; mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
    • na_position: {'first', 'last'}, default 'last'. 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
    • Returns: sorted_obj: DataFrame
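A minimal runnable example of the descending sort the question asks for; the column name "score" is hypothetical:

```python
# Sorting a DataFrame in descending order with sort_values.
import pandas as pd

df = pd.DataFrame({"score": [3, 1, 2]})
out = df.sort_values(by="score", ascending=False)   # descending order
print(out["score"].tolist())
```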

    44. Why should you use NumPy arrays instead of nested Python lists?


    Let’s say you have a list of numbers, and you want to add 1 to every element of the list.

    In regular python, you would do:

    • a = [6, 2, 1, 4, 3]
    • b = [e + 1 for e in a]

    Whereas with numpy, you simply have to do:

    • import numpy as np
    • a = np.array([6, 2, 1, 4, 3])
    • b = a + 1

    It also works for every numpy mathematics function: you can take the exponential of every element of a list using np.exp for example.

    45. Why is an import statement required in Python?


    To be able to use any functionality, the respective code logic needs to be accessible to the Python interpreter. With the help of the import statement, we can use specific scripts. However, there are thousands of such scripts available, and they cannot all be loaded at once. Hence we use the import statement to pull in only the scripts that we want to use.

    • import pandas as pd
    • import numpy as np

    46. What is the alias in the import statement? Why is it used?


     Aliases are used in import statements for ease of use. If the imported module has a long name, for example import multiprocessing, then every time we want to access any script present in the multiprocessing module we need to type the word multiprocessing.

    However, if an alias is used, as in import multiprocessing as mp, we can simply replace the word multiprocessing with mp.

    47. Are the aliases used for a module fixed/static ?


    No, the aliases are not fixed. An alias can be named as per your convenience. However, the documentation of a module sometimes specifies the alias conventionally used, for ease of understanding.

    48. Why is “Naive” Bayes naive?


    Despite its practical applications, especially in text mining, Naive Bayes is considered “Naive” because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components. This implies the absolute independence of features — a condition probably never met in real life.

    As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked pickles and ice cream would probably naively recommend you a pickle ice cream.

    49. What is a nonparametric test used for?


    Nonparametric tests do not assume that the data follows a specific distribution. They can be used whenever the data does not meet the assumptions of parametric tests.
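One common nonparametric test is the Mann-Whitney U test, available in SciPy; the two small samples below are hypothetical:

```python
# Mann-Whitney U test: compares two samples without assuming normality.
from scipy.stats import mannwhitneyu

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]        # completely separated from a
stat, p = mannwhitneyu(a, b)
print(p < 0.05)             # a small p-value suggests the samples differ
```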

    50. What are the pros and cons of the Decision Trees algorithm?


    • Pros: Easy to interpret. Will ignore irrelevant independent variables since information gain will be minimal. Can handle missing data. Fast modelling.
    • Cons: Many combinations are possible to create a tree. There are chances that it might not find the best tree possible.

    51. Name some common classification algorithms.


    Common classifiers: Logistic Regression, Naive Bayes Classifier, Decision Trees, Random Forest, Neural Networks, K-Nearest Neighbours.


    52. What are pros and cons of Naive Bayes algorithm?


    • Pros: Big data is handled easily
    • Multiclass performance is good and accurate
    • It is not process-intensive
    • Cons: Assumes independence of predictor variables.

    53. What are the types of Skewness?


    A dataset can be skewed right or skewed left; these are the two types.

    54. What is skewed data?


    A data distribution whose values are pulled towards the right or the left rather than being symmetric.

    55. What is the skewness of this data? 27;28;30;32;34;38;41;42;43;44;46;53;56; 62


    The data set is skewed left, since the mean (about 41.1) is slightly below the median (41.5).
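The claim can be checked numerically: when the mean falls below the median, the data is skewed left.

```python
# Checking the skew direction of the question's dataset by comparing mean and median.
data = [27, 28, 30, 32, 34, 38, 41, 42, 43, 44, 46, 53, 56, 62]

mean = sum(data) / len(data)           # ≈ 41.14
median = (data[6] + data[7]) / 2       # 41.5 (14 values, so average the middle pair)
print(mean < median)                   # True means skewed left
```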

    56. What is an outlier?


     An outlier is a value that lies far away from the rest of the values in the data set.

    57. Mention the characteristics of symmetric data distribution?


    The mean is equal to the median and the tails of the distribution are balanced.

    58. What are the applications of data science?


    Optical character recognition, recommendation engines, filtering algorithms, personal assistants, advertising, surveillance, autonomous driving, facial recognition and more.

    59. Define EDA?


    EDA [exploratory data analysis] is an approach to analysing data to summarise their main characteristics, often with visual methods.

    60. What are the steps in exploratory data analysis?


    • Make a summary of observations
    • Describe the central tendencies or core part of the dataset
    • Describe the shape of the data
    • Identify potential associations
    • Develop insight into errors, missing values and major deviations

     61. What are the types of data available in Enterprises?


    • Structured data
    • Unstructured data
    • Big data from social media, surveys, pictures, audio, video, drawings, maps
    • Machine-generated data from instruments
    • Real-time data feeds

    62. What are the various types of analysis on type of data?


    • Univariate: 1 variable
    • Bivariate: 2 variables
    • Multivariate: more than 2 variables

    63. What is the difference between primary data and secondary data?


    • Primary data is collected by the interested party itself, afresh and for the first time.
    • Secondary data has been collected by someone else and is being reused by you.

    64. What is the difference between qualitative & quantitative ?


    The quantitative method analyses data based on numbers; the qualitative method analyses data by attributes.

    65. What is histogram?


    A histogram is an accurate representation of the distribution of numerical data based on occurrences/frequencies.

    66. What are the common measures of central tendencies?


    • Mean
    • Median
    • Mode

    67. What are quartiles?


     Quartiles are three points in the data, that divide the data into four groups. Each group consists of a quarter of data.
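A quick sketch with NumPy's percentile function; the eight-value dataset is made up:

```python
# Quartiles: the 25th, 50th and 75th percentiles split the data into four groups.
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
q1, q2, q3 = np.percentile(data, [25, 50, 75])   # the three quartile points
print(q1, q2, q3)
```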

    68. What are the commonly used error metrics in regression tasks?


    • MSE (Mean Squared Error): average of the squared errors
    • RMSE (Root Mean Squared Error): square root of the MSE
    • MAPE (Mean Absolute Percentage Error): average of the absolute percentage errors
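These three metrics can be computed by hand with NumPy; the actual/predicted values below are toy numbers:

```python
# Computing MSE, RMSE and MAPE on toy regression predictions.
import numpy as np

actual = np.array([100.0, 200.0, 300.0])
pred = np.array([110.0, 190.0, 330.0])

mse = np.mean((actual - pred) ** 2)                      # average squared error
rmse = np.sqrt(mse)                                      # same units as the target
mape = np.mean(np.abs((actual - pred) / actual)) * 100   # percentage error
print(round(mse, 2), round(rmse, 2), round(mape, 2))
```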

    69. What are the commonly used error metrics for classification tasks?


    • F1 score
    • Accuracy
    • Sensitivity
    • Specificity
    • Recall
    • Precision

    70. What is it called when there are more than 1 explanatory variable in the regression task?


     Multiple linear regression


    71. What are residuals in a regression task?


     The difference between the predicted value and the actual value is called the residual.

    72. What’s a Fourier transform?


    A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. Or as this more intuitive tutorial puts it, given a smoothie, it’s how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier transform converts a signal from time to frequency domain — it’s a very common way to extract features from audio signals or other time series such as sensor data.

    73. How is a decision tree pruned?


    Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning.

    Reduced error pruning is perhaps the simplest version: starting at the leaves, replace each node with its most popular class; if the replacement doesn't decrease predictive accuracy, keep the change. While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.

    74. Can Random forest be used for classification and regression?


    Yes, it can be used.

    75. What is R square value?


    The R-squared value tells us how closely the regression line fits the actual values.

    76. What are some common ways of imputation?


    Mean imputation, median imputation, KNN imputation, Stochastic regression, substitution

    77. How would you handle an imbalanced dataset?


    An imbalanced dataset is when you have, for example, a classification test and 90% of the data is in one class. That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the other category of data! Here are a few tactics to get over the hump:

    • Collect more data to even the imbalances in the dataset.
    • Resample the dataset to correct for imbalances.
    • Try a different algorithm altogether on your dataset.

    What’s important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.

    78. When should you use classification over regression?


    Classification produces discrete values and maps the dataset into strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points. You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories.

    79. What parameter is used to update the data without explicitly assigning data to a variable.


     Inplace is used to assign the result of a function back to the object itself. If inplace=True, there is no need to explicitly assign the result to a variable.

    80. What is the difference between a dictionary and a set?


    A dictionary has key-value pairs; a set does not. A set holds only unique elements.

    81. How to create a series with letters as index?


    pd.Series({'a': 1, 'b': 2}) will create a series with a and b as the index and 1 and 2 as their respective values.

    82. Which function can be used to filter a DataFrame?


    The query function can be used to filter a dataframe.

    83. What is the function to create a test train split?


     from sklearn.model_selection import train_test_split. This function is used to create a train/test split from the data.
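A minimal usage sketch of train_test_split, which lives in sklearn.model_selection; the toy data is made up:

```python
# Splitting data into train and test sets with scikit-learn.
from sklearn.model_selection import train_test_split

X = list(range(10))
y = [0, 1] * 5
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 20% held out for testing
print(len(X_train), len(X_test))
```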

    84. What is pickling?


     Pickling is the process of serializing a Python data structure and saving it to a physical drive or hard disk.

    85. What is unpickling?


     Unpickling is used to read a pickled file from hard disk or physical storage drive.
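A small sketch of both operations with the standard pickle module; the file name is hypothetical:

```python
# Pickling (serialize to disk) and unpickling (read back) a Python object.
import os
import pickle
import tempfile

data = {"model": "demo", "accuracy": 0.9}
path = os.path.join(tempfile.gettempdir(), "demo.pkl")

with open(path, "wb") as f:     # pickling: write the object to disk
    pickle.dump(data, f)

with open(path, "rb") as f:     # unpickling: load the object back
    restored = pickle.load(f)

print(restored == data)
```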

    86. What are the most common web frameworks of Python?


    Django and Flask.

    87. How to convert a number of series to a dataframe?


    pd.DataFrame(data={'col1': series1, 'col2': series2})

    88. How to select a section of a dataframe?


    Using iloc and loc functions the rows and columns can be selected.
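A toy illustration of the difference between the two selectors; the DataFrame below is made up:

```python
# loc selects by label, iloc selects by integer position.
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])
print(df.loc["y", "a"])    # by label
print(df.iloc[2, 0])       # by position
```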

    89. How are exceptions handled in Python?


     Exceptions can be handled using the try except statements.

    90. Is multiprocessing possible in python?


    Yes it is possible using the multiprocessing module.

    91. Can the values be replaced in tuples?


    No, values cannot be replaced in a tuple because a tuple is immutable.

    92. What are lambda functions in Python and how it is different from def (defining functions) in Python?


     Lambda function in Python is used for evaluating an expression and then returning a value. Whereas def needs a function name, and the program logic is broken into smaller chunks. Lambda is an inline function consisting of only a single expression, It can take any number of arguments.
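A minimal side-by-side sketch of the two forms:

```python
# A lambda is a single-expression anonymous function; def names a full function body.
square = lambda n: n * n           # inline, one expression only

def square_def(n):                 # named, can hold multiple statements
    result = n * n
    return result

print(square(4), square_def(4))    # both compute the same thing
```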

    93. Name an example where ensemble techniques might be useful.


    Ensemble techniques use a combination of learning algorithms to optimize better predictive performance. They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data). 

    You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method and demonstrate how they could increase predictive power.

    94. How to differentiate from KNN and K-means clustering?


     KNN stands for K-Nearest Neighbours; it is classified as a supervised algorithm. K-means is an unsupervised clustering algorithm.

    95. What is your opinion on our current data process  ?


    When this type of question is asked, listen carefully to the use case being described, and respond in a constructive and insightful manner. Based on your responses, the interviewer gets a chance to review whether you would give vague replies to their team or bring real value.

    96. How do you ensure you’re not overfitting with a model?


    This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.

    There are three main methods to avoid overfitting:

    • Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.
    • Use cross-validation techniques such as k-folds cross-validation.
    • Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.
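As a sketch, the second and third tactics can be combined: cross-validating a LASSO model with k-folds. The synthetic dataset below is made up, with only the first feature carrying signal:

```python
# k-fold cross-validation of a LASSO (L1-regularized) regression model.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=100)   # only feature 0 matters

# 5-fold cross-validation returns one R^2 score per held-out fold.
scores = cross_val_score(Lasso(alpha=0.1), X, y, cv=5)
print(len(scores))
```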

    97. Explain about the capture of the correlation between continuous and categorical variables?


     It is possible to do that using the ANCOVA technique. ANCOVA stands for Analysis of Covariance. It is used to calculate the association between continuous and categorical variables.

    98. Difference between an Array and a Linked list?


     An array is an ordered collection of objects stored at contiguous locations. A linked list is a group of objects arranged in sequential order, where each element holds a reference to the next.

    99. Difference between “long” and “wide” format data?


     In the wide format, each subject's repeated responses are in a single row, and each response is in a separate column. In the long format, each row is one time point per subject. You can recognize data in wide format by the fact that columns generally represent groups.

    100. How would you evaluate a logistic regression model?


     You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction, etc.) and bring up a few examples and use cases.
