# READ BEST Deep Learning Interview Questions & Answers

Last updated on 04th Jul 2020

These Deep Learning interview questions have been designed to acquaint you with the kind of questions you may encounter during an interview on the subject of Deep Learning. In my experience, good interviewers rarely plan to ask any particular question; they normally start with some basic concept of the subject and continue based on further discussion and your answers. We are going to cover the top 100 Deep Learning interview questions along with their detailed answers, including scenario-based questions, questions for freshers, and questions and answers for experienced candidates.

**Q1) What is meant by deep learning?**

**Ans:**

Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised, or unsupervised.

**Q2) Which data visualization libraries do you use, and why are they useful?**

**Ans:**

It is valuable to be able to explain your views on proper data visualization and your personal preferences when it comes to tools. Popular choices include R’s ggplot, Python’s seaborn and matplotlib, and tools such as Plot.ly and Tableau.

**Q3) Where do you regularly source data-sets?**

**Ans:**

This type of question is a real tie-breaker. Anyone going into an interview should be prepared for questions of this kind, because the answer shows how genuine your interest in Machine Learning is.

**Q4) What is the cost function?**

**Ans:**

A cost function is a measure of the accuracy of the neural network with respect to a given training sample and its expected output. It is a single value, not a vector, because it rates the performance of the neural network as a whole. A common example is the mean squared error:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$$
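As a minimal sketch of the MSE cost above, the function below averages the squared differences between predictions and targets. The function name and the sample numbers are illustrative assumptions, not part of the original answer.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and targets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n


# Illustrative values: the errors are -1, 0 and 2, so the squares sum to 5.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

The result is one number for the whole sample, which is what makes it usable as a single training objective.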

**Q5) What are the benefits of mini-batch gradient descent?**

**Ans:**

- It is more computationally efficient than stochastic gradient descent.
- It improves generalization by finding flat minima.
- Mini-batches help approximate the gradient of the entire data set, which can help avoid local minima.

**Q6) What is meant by gradient descent?**

**Ans:**

Gradient descent is an optimization algorithm used to find the values of the parameters that minimize the cost function. It is an iterative algorithm that moves in the direction of steepest descent, as defined by the negative of the gradient.
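A minimal sketch of the idea, assuming a one-parameter cost `(w - 3)^2` chosen purely for illustration: each step moves the parameter against the gradient, scaled by a learning rate.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Hypothetical cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum of this cost is at w = 3.
w_opt = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_opt)
```

With this cost and learning rate, the distance to the minimum shrinks by a constant factor on every step, so the loop converges quickly; a learning rate that is too large would make it diverge instead.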

**Q7) What is meant by backpropagation?**

**Ans:**

- Forward propagate the training data through the network to generate the output.
- Using the target value and the output value, compute the error derivative with respect to the output activations.
- Backpropagate to compute the derivative of the error with respect to the activations of the previous layer, and continue for all the hidden layers.
- Using the previously calculated derivatives for the output and all hidden layers, compute the error derivatives with respect to the weights.
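The steps above can be traced on a deliberately tiny network. This is a sketch under stated assumptions: one input, one tanh hidden unit, one linear output, and a loss of `0.5 * (y_hat - y)^2`; all names are hypothetical.

```python
import math


def forward(w1, w2, x):
    """Forward pass: input -> tanh hidden unit -> linear output."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backward(w1, w2, x, y):
    """Error derivatives with respect to both weights via the chain rule."""
    h, y_hat = forward(w1, w2, x)
    d_out = y_hat - y             # derivative w.r.t. the output activation
    d_w2 = d_out * h              # derivative w.r.t. the output weight
    d_h = d_out * w2              # propagate back to the hidden activation
    d_pre = d_h * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_pre * x              # derivative w.r.t. the hidden weight
    return d_w1, d_w2


print(backward(0.5, -0.3, 1.5, 1.0))
```

A useful sanity check for any hand-written backward pass is to compare it with a finite-difference estimate of the same derivative.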

**Q8) What is a convex hull?**

**Ans:**

The convex hull is the outer boundary of each of the two groups of data points. Once the convex hulls have been created, we get the maximum margin hyperplane (MMH) as the perpendicular bisector between the two convex hulls. The MMH is the line that attempts to create the greatest separation between the two groups.

**Q9) Do you have experience with Spark or other big data tools for machine learning?**

**Ans:**

Spark and similar big data tools are in high demand right now because they can handle very large data sets at speed. Be honest if you don’t have experience with the tools demanded, but also take a look at job descriptions and see which tools pop up.

**Q10) How do you handle missing data?**

**Ans:**

You can find the missing data in a data set and then either drop the affected rows or columns, or decide to replace the missing values with another value. In Python's Pandas library there are two helpful methods for this, `isnull()` and `dropna()`; `fillna()` replaces missing values with a value you choose.
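A short sketch of these options with pandas, assuming a small made-up DataFrame; the column names and fill values are illustrative only.

```python
import pandas as pd

# Hypothetical data with one missing age and one missing city.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "age": [25.0, None, 40.0],
        "city": ["Pune", "Delhi", None],
    }
)

missing_per_column = df.isnull().sum()   # count missing values per column
rows_dropped = df.dropna()               # drop rows with any missing value
cols_dropped = df.dropna(axis=1)         # drop columns with any missing value
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(filled)
```

Dropping is simplest but discards information; filling keeps every row at the cost of introducing values that were never observed.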

**Q11) What does auto-encoder mean?**

**Ans:**

An autoencoder is an unsupervised machine learning algorithm that uses backpropagation, where the target values are set to be equal to the inputs. Internally, it has a hidden layer that describes a code used to represent the input.

**Q12) Explain about Machine Learning in industry.**

**Ans:**

Robots are replacing humans in various areas. This is because robots are programmed so that they can perform tasks based on the data they gather from sensors. They learn from this data and behave intelligently.

**Q13) What are the different algorithm techniques in Machine Learning?**

**Ans:**

- Reinforcement Learning
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Transduction
- Learning to Learn

**Q14) Difference between supervised and unsupervised machine learning?**

**Ans:**

Supervised learning is a method that requires labeled training data, while unsupervised learning does not need data labeling.

**Q15) What is the advantage of Naive Bayes?**

**Ans:**

- The classifier converges faster than discriminative models, so it needs less training data.
- Its main drawback is that it cannot learn interactions between features.

**Q16) What are the functions of Supervised Learning?**

**Ans:**

- Classifications
- Speech recognition
- Regression
- Predict time series
- Annotate strings

**Q17) What are the functions of Unsupervised Learning?**

**Ans:**

- To find clusters in the data
- To find low-dimensional representations of the data
- To find interesting directions in the data
- To find interesting coordinates and correlations
- To find novel observations

**Q18) How do you understand Machine Learning Concepts?**

**Ans:**

Machine learning is an application of artificial intelligence that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

**Q19) What are the roles of activation functions?**

**Ans:**

The activation function is used to introduce non-linearity into the neural network, helping it to learn more complex functions. Without it, the neural network would only be able to learn a linear function, that is, a linear combination of its input data.

**Q20) What is a Boltzmann Machine?**

**Ans:**

A Boltzmann Machine is used to optimize the solution of a problem. Its job is essentially to optimize the weights and the quantity for the given problem.

- It uses a recurrent structure.
- If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine.

**Q21) What is Overfitting in Machine Learning?**

**Ans:**

Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship, or when a model is excessively complex.

**Q22) How can you avoid overfitting?**

**Ans:**

- Lots of data
- Cross-validation

**Q23) What are the conditions when Overfitting happens?**

**Ans:**

One important reason for overfitting is that the criterion used to train the model is not the same as the criterion used to judge its efficacy: the model is fitted to the training data but judged on data it has not seen.

**Q24) What are the advantages of decision trees?**

**Ans:**

- Decision trees are easy to interpret
- Nonparametric
- There are comparatively few parameters to tune

**Q25) What are the three stages to build hypotheses or models in machine learning?**

**Ans:**

- Model building
- Model testing
- Applying the model

**Q26) What are parametric models and Non-Parametric models?**

**Ans:**

- Parametric models are those with a finite number of parameters. To predict new data, you only need to know the parameters of the model.
- Non-parametric models are those with an unbounded number of parameters, which allows more flexibility. To predict new data, you need to know the parameters of the model and the state of the data that has been observed.

**Q27) What are some use cases where machine learning algorithms can be applied?**

**Ans:**

- Fraud Detection
- Face detection
- Natural language processing
- Market Segmentation
- Text Categorization
- Bioinformatics

**Q28) What are the popular algorithms for Machine Learning?**

**Ans:**

- Decision Trees
- Probabilistic networks
- Nearest Neighbor
- Support vector machines
- Neural Networks

**Q29) Define univariate, bivariate, and multivariate analysis?**

**Ans:**

If an analysis involves only one variable, it is called univariate analysis, e.g. a pie chart or a histogram. If an analysis involves two variables, it is called bivariate analysis; for example, to see how population varies with age we can draw a scatter plot. A multivariate analysis involves more than two variables; for example, in regression analysis we see the effect of several variables on the response variable.

**Q30) How does missing value imputation lead to selection bias?**

**Ans:**

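Case-wise deletion removes the entire row when a single value is missing in a specific column, so the rows that remain may no longer represent the population. Imputation by the mean can bias the distribution, distorting statistics such as the standard deviation, regression coefficients, and correlations.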

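**Q31) What is bootstrap sampling?**

**Ans:**

Bootstrap sampling creates resampled data sets from the empirical data by drawing with replacement. The statistics computed on these resamples are known as bootstrap replicates.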
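A minimal sketch of bootstrap replicates using only the standard library. The helper name, the sample values, and the choice of the mean as the statistic are assumptions for illustration.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_replicates=1000, seed=0):
    """Compute a statistic on resamples drawn with replacement from data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_replicates)]


sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8]  # illustrative measurements
means = bootstrap_replicates(sample, statistics.mean)

# The spread of the replicates estimates the uncertainty of the sample mean.
print(statistics.mean(means), statistics.stdev(means))
```

Each resample has the same size as the original data, and any point may appear in it several times or not at all.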

**Q32) What is permutation sampling?**

**Ans:**

Also known as a randomization test, it is the process of testing a statistic by repeatedly reshuffling the data labels to see whether the observed difference between two samples could have arisen by chance.
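The reshuffling can be sketched as follows, assuming the test statistic is the absolute difference of the two sample means; the function name and the example groups are hypothetical.

```python
import random


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Share of label reshuffles whose mean difference is at least as large
    as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the group labels at random
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    return count / n_permutations


p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

Clearly separated groups give a small value, while identical groups give a value of 1 because every reshuffle matches the observed difference of zero.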

**Q33) What is the total sum of squares?**

**Ans:**

The sum of the squared differences between each individual point and the overall (grand) mean.

**Q34) What is the sum of squares within?**

**Ans:**

The sum of the squared differences between each individual point and the mean of its own group.

**Q35) What is the sum of squares between?**

**Ans:**

The sum of the squared differences between each group mean and the overall (grand) mean, counted once for each data point in the group.
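The three sums of squares are linked by the identity total = within + between. A small sketch with made-up groups shows the decomposition; the function and variable names are assumptions.

```python
def sums_of_squares(groups):
    """Return (total, within, between) sums of squares for a list of groups."""
    all_points = [x for group in groups for x in group]
    grand_mean = sum(all_points) / len(all_points)
    sst = sum((x - grand_mean) ** 2 for x in all_points)
    ssw = 0.0
    ssb = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ssw += sum((x - group_mean) ** 2 for x in group)
        # One squared deviation of the group mean per point in the group.
        ssb += len(group) * (group_mean - grand_mean) ** 2
    return sst, ssw, ssb


groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
sst, ssw, ssb = sums_of_squares(groups)
print(sst, ssw, ssb)  # 60.0 6.0 54.0
```

Here most of the variation lies between the groups, which is the situation an ANOVA is designed to detect.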

**Q36) What is a p-value?**

**Ans:**

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, under the assumption that the null hypothesis is true.

**Q37) What is the R^2 value?**

**Ans:**

It measures the goodness of fit for a linear regression model.

**Q38) What does it mean to have a high R^2 value?**

**Ans:**

R^2 measures the percentage of variance in the dependent variable that is explained by the independent variables together. A high R^2 means that the model explains most of that variance.
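As a sketch, R^2 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the numbers are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]   # hypothetical model predictions
r2 = r_squared(y_true, y_pred)
print(r2)
```

A model that always predicts the mean of the observations scores 0, and a model that predicts every observation exactly scores 1.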

**Q40) What are residuals in a regression model?**

**Ans:**

A residual is the difference between an actual observation and the value predicted for it by the regression model.

**Q41) What are fitted values? Calculate the fitted value for Y=7X+8 when X=5.**

**Ans:**

A fitted value is the response of the model when the predictor values are plugged into it. Here Y = 7 × 5 + 8 = 43.
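The arithmetic can be checked directly: plugging X = 5 into Y = 7X + 8 gives 35 + 8 = 43. The sketch below also computes a residual for a hypothetical observed value of 45, which is an assumption added for illustration.

```python
def fitted_value(x, slope=7, intercept=8):
    """Response of the model Y = slope * X + intercept for a given X."""
    return slope * x + intercept


def residual(observed, x):
    """Actual observation minus the fitted value."""
    return observed - fitted_value(x)


print(fitted_value(5))   # 7 * 5 + 8 = 43
print(residual(45, 5))   # 45 - 43 = 2
```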

**Q42) What pattern should residual vs fitted plots show in a regression analysis?**

**Ans:**

No pattern. If the plot shows a pattern, the regression coefficients cannot be trusted.

**Q43) What is overfitting and underfitting?**

**Ans:**

Overfitting occurs when a model is excessively complex and cannot generalize well; an overfitted model has poor predictive performance on new data. Underfitting occurs when the model is not able to capture the trends in the data.

**Q44) Define precision and recall?**

**Ans:**

Recall = True Positives / (True Positives + False Negatives). Precision = True Positives / (True Positives + False Positives).
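A small sketch that counts the three quantities in these formulas from lists of labels. The label lists are made up, and returning 0.0 when a denominator is zero is a convention assumed here rather than something the formulas dictate.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class marked as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
# 3 true positives, 1 false positive, 1 false negative.
print(precision_recall(y_true, y_pred))
```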

**Q45) What are type 1 and type 2 errors?**

**Ans:**

False positives are termed Type 1 errors; false negatives are termed Type 2 errors.

**Q46) What is ensemble learning?**

**Ans:**

The art of combining multiple learning algorithms to achieve a model with higher predictive power, for example bagging and boosting.

**Q47) What is the difference between supervised and unsupervised machine learning algorithms?**

**Ans:**

In supervised learning we use a labelled dataset and try to learn from that data; unsupervised modeling involves data that is not labelled.

**Q48) What is named entity recognition?**

**Ans:**

It is the task of identifying and classifying entities in textual data, such as people, places, organizations, and dates, in order to answer questions like “who, when, where, what”.

**Q49) What is tf-idf?**

**Ans:**

It is a measure of the weight of a term in text data, used mainly in text mining. It signifies how important a word is to a document.

- tf -> term frequency (count of the term appearing in a document)
- idf -> inverse document frequency (terms that appear in fewer documents get a higher weight)
- tf-idf -> tf * idf
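A from-scratch sketch of the product above. It assumes one common variant: tf is the term count divided by the document length, and idf is `log(N / document frequency)`. Libraries use slightly different smoothing, so exact numbers will differ; the documents are made up and assumed to be non-empty.

```python
import math


def tf_idf(documents):
    """Return one {term: tf * idf} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = {}
    for tokens in tokenized:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

A word that appears in every document, such as "the" here, gets a weight of zero, while a word unique to one document gets the highest weight.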

**Q50) What is the difference between regression and deep neural networks? Is regression better than neural networks?**

**Ans:**

In some applications neural networks fit better than regression; this usually happens when nonlinearities are involved. On the other hand, a linear regression model has fewer parameters to estimate than a neural network for the same set of input variables. Neural networks therefore need more data during optimization in order to generalize well and capture nonlinear associations.

**Q51) How are node values calculated in a feed forward neural network?**

**Ans:**

The weights are multiplied by the node/input values and the products are summed up to generate the value of each node in the next layer.
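A minimal sketch of that weighted sum for one layer. The optional bias term and the example weights are assumptions added for illustration, and no activation function is applied here.

```python
def layer_forward(inputs, weights, biases=None):
    """Value of each node in the next layer.

    weights[j][i] is the weight from input i to node j.
    """
    if biases is None:
        biases = [0.0] * len(weights)
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]


inputs = [2, 3]
hidden = layer_forward(inputs, weights=[[1, 1], [-1, 1]])  # two hidden nodes
output = layer_forward(hidden, weights=[[2, -1]])          # one output node
print(hidden, output)
```

Stacking calls like this, one per layer, is the whole of forward propagation once an activation function is applied to each layer's result.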

**Q52) Name two activation functions used in deep neural networks?**

**Ans:**

Sigmoid, softmax, relu, leaky relu, tanh.

**Q53) What is the use of activation functions in neural networks?**

**Ans:**

Activation functions are used to capture the nonlinearity present in the data.

**Q54) How are the weights calculated which determine interactions in neural networks?**

**Ans:**

Training the model sets the weights so as to optimize predictive accuracy.

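**Q55) Which layer in a deep learning model would capture a more complex or higher order interaction?**

**Ans:**

The last layer. Each layer builds on the outputs of the one before it, so later layers capture more complex, higher-order interactions.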

**Q56) What is gradient descent?**

**Ans:**

It is an algorithm that iteratively minimizes a loss function to find the optimal weights for a neural network.

**Q57) Imagine a loss function vs weights plot depicting a gradient descent. At what point of the curve would we achieve optimal weights?**

**Ans:**

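At a minimum of the curve, where the slope is zero. Ideally this is the global minimum, although gradient descent may settle in a local minimum.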

**Q58) How does the slope of the tangent to the curve of loss function vs weights help us in getting optimal weights for a neural network?**

**Ans:**

The slope of the curve at any point tells us the direction in which the loss increases, so we move the weights in the opposite direction to achieve a smaller value of the loss function.

**Q59) What is the learning rate in gradient descent?**

**Ans:**

A value that controls how large a step we take towards the optimal weights. Weights are updated by subtracting the product of the learning rate and the slope.

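**Q60) If in backward propagation you have gone through 9 iterations of calculating slopes and updated the weights simultaneously, how many times must you have done forward propagation?**

**Ans:**

Nine times, since every backward propagation step relies on a forward propagation pass made with the current weights.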

**Q61) How does ReLU activation function work? Define its value for -5 and +7**

**Ans:**

For all x >= 0 the output is x, and for all x < 0 the output is 0. With the ReLU activation function, -5 returns 0 and +7 returns +7.
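The rule is short enough to write out directly. This sketch also includes the slope of ReLU, using the common convention (an assumption here) that the slope at exactly 0 is taken to be 0.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return x if x > 0 else 0


def relu_slope(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1 if x > 0 else 0


print(relu(-5), relu(7))              # 0 7
print(relu_slope(-5), relu_slope(7))  # 0 1
```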

**Q62) What is a batch in deep neural networks?**

**Ans:**

It is common to calculate slopes on only a subset of the data, known as a batch, for computational efficiency.

**Q63) What is an epoch in deep neural networks?**

**Ans:**

When the entire dataset has passed through both forward and backward propagation once, with the weights updated along the way, one epoch is complete.

**Q64) Imagine you have 2000 training samples and batch size is set to 200 how many iterations will it take to complete 1 epoch?**

**Ans:**

10 iterations, since 2000 / 200 = 10.
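The count can be checked by actually splitting a dataset into batches. The helper names are hypothetical, and the sketch assumes that a final, smaller batch still counts as one iteration.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def batches(data, batch_size):
    """Yield consecutive slices of data, each of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = list(range(2000))
print(iterations_per_epoch(len(samples), 200))  # 2000 / 200 = 10
print(len(list(batches(samples, 200))))         # 10 batches of 200 samples
```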

**Q65) What is stochastic gradient descent?**

**Ans:**

When slopes are calculated on one batch at a time (in the strictest sense, on a single sample at a time) rather than on the whole dataset, it is referred to as stochastic gradient descent.

**Q66) What is data normalization?**

**Ans:**

Data normalization is a technique used to scale all values in a dataset to fit within a specific range. It is important for achieving good convergence in deep learning models.

**Q67) How can we normalize the data? State a method used for the same?**

**Ans:**

(Observation - feature mean) / standard deviation.
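A sketch of this standardization using the standard library. It subtracts the feature mean from each observation and divides by the standard deviation; using the population standard deviation is an assumption made here, and the sample values are illustrative.

```python
import statistics


def standardize(values):
    """Scale values to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


feature = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
z_scores = standardize(feature)
print(z_scores)
```

After scaling, values below the mean are negative and values above it are positive, which keeps features with different units on a comparable scale.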

**Q68) Explain the dying neuron problem?**

**Ans:**

It occurs when a neuron takes a value < 0 for all rows of the data. With the ReLU activation function the neuron then produces an output of 0, so its slope is also 0 and its weights stop being updated.

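**Q69) Explain vanishing gradients?**

**Ans:**

It occurs when many layers have very small slopes, for example when we use a tanh activation function in a deep network. The slopes are multiplied together during backpropagation, so the updates to the early layers shrink towards zero.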

**Q70) What is model capacity?**

**Ans:**

It describes how complex a pattern a model can capture. In deep neural networks it grows with the number of hidden layers included.

**Q71) What is regularization?**

**Ans:**

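The process of constraining a model by adjusting its tuning parameters, for example by adding a penalty on large weights, so that it does not overfit and generalizes better.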

**Q72) What are hyperparameters in deep neural networks?**

**Ans:**

Hyperparameters are settings that describe the structure of a neural network, such as the number of hidden layers. Some hyperparameters, such as the learning rate, also decide how the model should be trained to achieve optimum results.

**Q73) What is dropout in deep neural networks?**

**Ans:**

Dropout is a regularization technique that randomly ignores a fraction of the nodes during training to avoid overfitting, which increases the generalizing power of the model.
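A minimal sketch of the idea, assuming the "inverted dropout" variant in which the surviving activations are rescaled during training so that nothing needs to change at prediction time. The function signature is hypothetical.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Zero each activation with probability `rate` while training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at prediction time
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


layer_output = [1.0] * 1000
dropped = dropout(layer_output, rate=0.5, rng=random.Random(0))
print(dropped.count(0.0), "of", len(dropped), "activations were dropped")
```

Because a different random subset of nodes is ignored on every pass, the network cannot rely too heavily on any single node.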

**Q74) Explain exploding gradients?**

**Ans:**

Large error gradients accumulate during training, which results in very large updates to the weights and makes the network unstable.

**Q75) What are autoencoders?**

**Ans:**

A neural network architecture trained with backpropagation in which the targets are set equal to the inputs.

**Q76) What is homoscedasticity and heteroscedasticity?**

**Ans:**

When the variance of the errors is constant across observations it is termed homoscedasticity; when the variance of the errors is not constant it is termed heteroscedasticity.

**Q77) Difference between adjusted R^2 and R^2**

**Ans:**

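R^2 measures the proportion of variation in the dependent variable explained by the independent variables, and it never decreases when a predictor is added. Adjusted R^2 penalizes the number of predictors, so it increases only when a new variable improves the model more than would be expected by chance.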
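The usual adjustment is `1 - (1 - R^2) * (n - 1) / (n - p - 1)`, where n is the number of observations and p the number of predictors. The sketch below applies it to made-up numbers; the function name is an assumption.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 for a model with an intercept and n_predictors variables."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Same R^2 of 0.9 on 50 observations, with 5 and then 10 predictors.
adj_5 = adjusted_r_squared(0.9, n_samples=50, n_predictors=5)
adj_10 = adjusted_r_squared(0.9, n_samples=50, n_predictors=10)
print(adj_5, adj_10)
```

For the same R^2, the model with more predictors gets the lower adjusted value, which is the penalty at work.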

**Q78) In case residual vs fitted plots are showing a pattern and are not distributed evenly or have some outliers how should it be handled?**

**Ans:**

In such a case a variable transformation should be tried, for example log, x^2, or x^3.

**Q79) Difference between collinearity and correlation?**

**Ans:**

Correlation is the measure of the strength of the linear relationship between two variables, whereas if in a linear regression one of the predictors is derived from or associated with another predictor, the two are said to be collinear.