35+ LATEST Data Science [ Python ] Interview Questions & Answers
Data Science with Python Interview Questions and Answers


Last updated on 04th Jul 2020, Blog, Interview Questions

About author

Sasikumar (Sr Project Manager )

He is a proficient technical expert in his industry domain with 8+ years of experience. He is also dedicated to imparting knowledge to freshers, and he shares these blogs with us.


Python’s growing adoption in data science has made it a competitor to the R programming language. With its libraries maturing over time to cover most data science needs, many people are moving from R to Python. Even so, R remains a popular choice among data scientists, and the shift toward Python has not been large enough to displace R altogether. We have highlighted the pros and cons of both languages for data science in our Python vs R article. Many data scientists learn both Python and R to offset the limitations of each, and being prepared in both languages will help in data science job interviews.

1) Name a few libraries in Python used for Data Analysis and Scientific computations.


NumPy, SciPy, pandas, scikit-learn, Matplotlib, Seaborn

2)  Which library would you prefer for plotting in Python language: Seaborn or Matplotlib?


Matplotlib is the Python library used for plotting, but it needs a lot of fine-tuning to make plots look polished. Seaborn helps data scientists create statistically meaningful and aesthetically appealing plots. The answer to this question depends on the requirements for plotting the data.

3) Which method in pandas.tools.plotting is used to create a scatter plot matrix?


scatter_matrix. In current pandas releases it is available as pandas.plotting.scatter_matrix.

4) How can you check if a data set or time series is Random?


To check whether a dataset is random, use a lag plot. If the lag plot for the given dataset does not show any structure, the data is random.
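A lag plot graphs each value against the one that follows it. As a rough numerical companion to that picture, the sketch below computes the lag-1 correlation in plain Python. This is a swapped-in summary statistic, not the lag plot itself, and the helper name `lag1_correlation` is hypothetical. A value near 0 suggests no structure, while a value near 1 or -1 suggests a strong one.

```python
import random


def lag1_correlation(series):
    """Pearson correlation between series[t] and series[t + 1]."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)                       # fixed seed so the run is repeatable
noise = [rng.random() for _ in range(1000)]  # no structure expected
trend = [0.01 * t for t in range(1000)]      # perfectly structured series

print(round(lag1_correlation(noise), 3))
print(round(lag1_correlation(trend), 3))
```

The random series should give a value close to 0 and the trending series a value close to 1.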

5) What are the possible ways to load an array from a text data file in Python? How can the efficiency of the code to load data file be improved?


numpy.loadtxt()
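Here is a minimal sketch of loading text data into an array. It uses an in-memory string in place of a real file, and it also shows numpy.genfromtxt(), which is an addition to the answer above and is useful when some fields are missing.

```python
import io

import numpy as np

# In-memory stand-in for a text file with two whitespace-separated columns.
text = "1.0 2.0\n3.0 4.0\n5.0 6.0\n"

# loadtxt expects a complete, regular table.
a = np.loadtxt(io.StringIO(text), dtype=np.float64)

# genfromtxt tolerates missing fields and fills them with nan.
csv_text = "1.0,2.0\n3.0,\n5.0,6.0\n"
b = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(a.shape)
print(np.isnan(b[1, 1]))
```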

6) Which is the standard missing-data marker used in pandas?


NaN

7) Which Python library would you prefer to use for Data Munging?


pandas

8) Write the code to sort an array in NumPy by the nth column?


This can be achieved with the argsort() function. If there is an array x and you would like to sort its rows by the nth column (counting from 1), the code is x[x[:, n-1].argsort()]
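A small worked example of the argsort() approach, with a made-up 3x3 array:

```python
import numpy as np

x = np.array([[3, 9, 1],
              [1, 5, 7],
              [2, 7, 4]])

n = 2  # sort by the 2nd column (counting from 1), i.e. index n - 1

# argsort returns the row order that sorts the chosen column;
# indexing with that order rearranges whole rows.
sorted_x = x[x[:, n - 1].argsort()]
print(sorted_x)
```

The rows come back ordered by the values 5, 7, 9 in the second column.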

9) Which Python library is built on top of Matplotlib and pandas to ease data plotting?


Seaborn

10) Which plot will you use to assess the uncertainty of a statistic?


A bootstrap plot.

11) What is pylab?


A package that combines NumPy, SciPy and Matplotlib into a single namespace.

12) Which Python library is used for Machine Learning?


scikit-learn

13) How can you copy objects in Python?


The functions used to copy objects in Python are:

  • copy.copy() for a shallow copy
  • copy.deepcopy() for a deep copy

However, not every object can be copied with these functions (modules and open files, for example, cannot). Dictionaries also have their own copy() method, and sequences can be shallow-copied by slicing.
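The difference between a shallow and a deep copy shows up as soon as the copied object contains other mutable objects. A minimal sketch with made-up data:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)    # new outer list, inner lists are shared
deep = copy.deepcopy(original)   # inner lists are copied as well

original[0].append(99)

print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]

# Built-in shortcuts that also make shallow copies
d = {"a": 1}
d_copy = d.copy()
s = [1, 2, 3]
s_copy = s[:]
```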

14) What is the difference between tuples and lists in Python?


Lists are mutable, whereas tuples are immutable: once created, a tuple cannot be changed. Because of that, tuples can be hashed and used as keys for dictionaries (as long as their elements are hashable). Tuples suit fixed sequences whose order matters, for example a set of actions that need to be executed in sequence, geographic locations, or the list of points on a specific route.

15) What is PEP8?


PEP 8 is the style guide for Python code. It consists of coding guidelines that help programmers write readable code that other people can easily work with later on.

16) Is all the memory freed when Python exits?


No, it is not, because objects that are referenced from the global namespaces of Python modules are not always deallocated when Python exits.

17) What does __init__.py do?


__init__.py is a file, often empty, that marks a directory as a Python package so that the modules inside it can be imported. It also provides an easy way to organize the files. If there is a module maindir/subdir/module.py, an __init__.py is placed in all the directories so that the module can be imported using the following command: import maindir.subdir.module
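To see the layout in action, the sketch below builds maindir/subdir/module.py in a temporary directory and imports it. The temporary location and the `GREETING` constant are assumptions made only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build maindir/subdir/module.py inside a temporary directory.
root = tempfile.mkdtemp()
subdir = os.path.join(root, "maindir", "subdir")
os.makedirs(subdir)

# Empty __init__.py files mark both directories as packages.
for package_dir in (os.path.join(root, "maindir"), subdir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()

with open(os.path.join(subdir, "module.py"), "w") as f:
    f.write("GREETING = 'hello from module'\n")

sys.path.insert(0, root)  # make the temporary tree importable
module = importlib.import_module("maindir.subdir.module")
print(module.GREETING)
```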

18) What is the difference between the range() and xrange() functions in Python?


In Python 2, range() returns a list, whereas xrange() returns an object that generates the numbers on demand. In Python 3, xrange() no longer exists and range() itself produces its numbers lazily.

19) How can you randomize the items of a list in place in Python? 


random.shuffle(lst) can be used to randomize the items of a list in place.
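A short sketch of an in-place shuffle. The seed is set only so that repeated runs give the same order. It is not needed in normal use.

```python
import random

lst = [1, 2, 3, 4, 5]

random.seed(42)                # fixed seed, for a repeatable demonstration only
result = random.shuffle(lst)   # shuffles lst in place and returns None

print(lst)
print(result)
```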

20) What is a pass in Python?


The pass statement in Python is a no-operation statement: it indicates that nothing is to be done.


    21) If you are given the first and last names of employees, which data type in Python will you use to store them?


    You can use a list in which each element holds both the first name and the last name, or use a dictionary.

    22) What happens when you execute the statement mango=banana in Python?


    A NameError will occur when this statement is executed, because the name banana has not been defined.

    23) Optimize the below python code-

    • word = 'word'
    • print(word.__len__())


    • print('word'.__len__())

    24) What is monkey patching in Python?


    Monkey patching is a technique that lets the programmer modify or extend other code at runtime. It comes in handy in testing, but it is not good practice in a production environment because it can make the code difficult to debug.
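A minimal sketch of monkey patching, using a made-up `Greeter` class: a method is replaced on the class at runtime and then restored.

```python
class Greeter:
    def greet(self):
        return "hello"


def shout(self):
    return "HELLO"


g = Greeter()

original_greet = Greeter.greet   # keep a handle so the patch can be undone
Greeter.greet = shout            # replace the method on the class at runtime
patched = g.greet()

Greeter.greet = original_greet   # restore the original behaviour
restored = g.greet()

print(patched, restored)  # HELLO hello
```

Note that the existing instance `g` picks up the patch, because method lookup goes through the class each time.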

    25) Which tool in Python will you use to find bugs if any?


    Pylint and PyChecker. Pylint checks whether a module satisfies the coding standards. PyChecker is a static analysis tool that helps find bugs in the source code.

    26) How are arguments passed in Python- by reference or by value?


    Strictly speaking, neither, because the passing semantics in Python are different from both. In all cases, Python passes arguments by value, where every value is a reference to an object. A function can therefore mutate a mutable argument, but it cannot rebind the caller's variable.
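The two function names below are hypothetical. The sketch contrasts mutating the object that was passed in with rebinding the local parameter name.

```python
def append_item(items):
    items.append(4)   # mutates the object the caller also refers to


def rebind(items):
    items = [0]       # rebinds the local name only; the caller is unaffected
    return items


numbers = [1, 2, 3]
append_item(numbers)
rebind(numbers)

print(numbers)  # [1, 2, 3, 4]
```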

    27) You are given a list of N numbers. Write a single list comprehension in Python that creates a new list containing only the even values found at even indices of the original list. For instance, if list[4] holds an even value, it has to be included in the new output list because its index is even, but if list[5] holds an even value, it should not be included because its index is not even.


    •  [x for x in list[::2] if x % 2 == 0]

    The above code takes all the numbers at even indices (0, 2, 4, ...) and then discards the odd values.
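A worked example with made-up data. A slice with step 2 that starts at index 0 keeps the even indices, and the condition then keeps only the even values.

```python
numbers = [2, 4, 5, 6, 8, 10, 3, 12]

# numbers[::2] keeps indices 0, 2, 4, 6 -> [2, 5, 8, 3]
# the condition then drops the odd values 5 and 3
result = [x for x in numbers[::2] if x % 2 == 0]

print(result)  # [2, 8]
```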

    28) Explain the usage of decorators.


    Decorators in Python are used to modify or inject code in functions or classes. Using decorators, you can wrap a function or method call so that a piece of code is executed before or after the original code. Decorators can be used to check permissions, modify or track the arguments passed to a method, log the calls to a specific method, etc.
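A minimal sketch of a decorator that tracks the arguments of every call. The names `log_calls` and `add` are hypothetical.

```python
import functools


def log_calls(func):
    """Record the arguments of every call before running the function."""
    @functools.wraps(func)          # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    return a + b


total = add(2, 3)
print(total)       # 5
print(add.calls)   # [((2, 3), {})]
```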

    29) How can you check whether a pandas data frame is empty or not?


    The attribute df.empty is used to check whether a data frame is empty or not.

    30) What will be the output of the below Python code

    • def multipliers():
    •     return [lambda x: i * x for i in range(4)]
    • print([m(2) for m in multipliers()])


    The output of the above code will be [6, 6, 6, 6]. Because of late binding, the value of the variable i is looked up only when one of the functions returned by multipliers() is called, and by then the loop has finished with i equal to 3.
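The sketch below reproduces the late-binding behaviour and shows one common workaround, a default argument that captures the value of i when each lambda is defined. The name `multipliers_fixed` is hypothetical.

```python
def multipliers():
    # Each lambda looks up i when it is called; by then the loop has ended with i == 3.
    return [lambda x: i * x for i in range(4)]


def multipliers_fixed():
    # The default argument i=i stores the current value of i in each lambda.
    return [lambda x, i=i: i * x for i in range(4)]


late = [m(2) for m in multipliers()]
early = [m(2) for m in multipliers_fixed()]

print(late)   # [6, 6, 6, 6]
print(early)  # [0, 2, 4, 6]
```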


    31) What do you mean by list comprehension?


    A list comprehension is a concise expression that creates a new list by performing some operation on each item of an iterable, optionally filtering the items with a condition.


    • import string
    • [ord(j) for j in string.ascii_uppercase]
      [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]

    32) What will be the output of the below code

    • word = 'aeioubcdfg'
    • print(word[:3] + word[3:])


    The output for the above code will be: 'aeioubcdfg'.

    When the first slice ends at the index where the second slice starts, the two slices cover the whole string without overlap, so joining them with the "+" operator reproduces the original string.

    33) What will be the output of the below code

    • vowels = ['a', 'e', 'i', 'o', 'u']
    • print(vowels[8:])


    The output for the above code will be an empty list []. Many people expect an IndexError here, because the code refers to an index that exceeds the number of members in the list. However, the code is asking for a slice that starts beyond the end of the list, and slicing never raises an IndexError. It simply returns an empty list.
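The contrast between slicing and indexing past the end of a list can be checked directly:

```python
vowels = ['a', 'e', 'i', 'o', 'u']

out_of_range_slice = vowels[8:]   # slicing past the end is allowed
print(out_of_range_slice)         # []

try:
    vowels[8]                     # indexing past the end is not
except IndexError as exc:
    error_message = str(exc)

print(error_message)  # list index out of range
```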

    34) Can the lambda forms in Python contain statements?


    No, their syntax is restricted to a single expression, and they are used for creating function objects which are returned at runtime.

    35) Why Do Data Scientists Need Python?


    Data science goes beyond simple data analysis and requires that you be able to work with more advanced tools. Thus, if you work with big data and need to perform complex computations or create aesthetically pleasing and interactive plots, Python is one of the most efficient solutions out there.

    Python’s readability and simple syntax make it relatively easy to learn, even for non-programmers. Moreover, Python has plenty of data analysis libraries that make your work easier.

    36) What are the data types used in Python?


    Python has the following built-in data types:

    • Number (float, integer)
    • String
    • Tuple
    • List
    • Set
    • Dictionary

    Numbers, strings, and tuples are immutable data types, meaning they cannot be modified during runtime. Lists, sets, and dictionaries are mutable, which means they can be modified during runtime.

    37) Explain the difference between lists and tuples.


    Both lists and tuples are made up of elements, which are values of any Python data type. However, these data types have a number of differences:

    • Lists are mutable, while tuples are immutable.
    • Lists are created with square brackets (e.g., my_list = [a, b, c]), while tuples are enclosed in parentheses (e.g., my_tuple = (a, b, c)).
    • Lists are generally slower than tuples: because tuples are immutable, they take less memory and are quicker to create.

    38) What is a Python dictionary?


    A dictionary is one of the built-in data types in Python. It defines a mapping of unique keys to values. The mapping was unordered in older Python versions, but since Python 3.7 dictionaries preserve insertion order. Dictionaries are indexed by keys, and the values can be any valid Python data type (even a user-defined class). Notably, dictionaries are mutable, which means they can be modified. A dictionary is created with curly braces and indexed using the square bracket notation.

    Here’s an example:

    • my_dict = {'name': 'Hugh Jackman', 'age': 50, 'films': ['Logan', 'Deadpool 2', 'The Front Runner']}
    • my_dict['age']

    Here, the keys include name, age, and films. As you can see, the corresponding values can be of different data types, including numbers, strings, and lists. Notice how the value 50 is accessed via the corresponding key age.

    39) What are lambda functions?


    Lambda functions are anonymous functions in Python. They’re very helpful when you need to define a function that’s very short and consists of only one expression. So, instead of formally defining the small function with a specific name, body, and return statement, you can write everything in one short line of code using a lambda function.

    Here’s an example of how lambda functions are defined and used:

    • (lambda x, y, z: (x+y) ** z)(3,2,2)

    In this example, we’ve defined an anonymous function that has three arguments and takes the sum of the first two arguments (x and y) to the power of the third argument (z). As you can see, the syntax of a lambda function is much more concise than that of a standard function.

    40) Explain list comprehensions and how they’re used in Python.


    List comprehensions provide a concise way to create lists.

    A list is traditionally created using square brackets. But with a list comprehension, these brackets contain an expression followed by a for clause and then if clauses, when necessary. Evaluating the given expression in the context of these for and if clauses produces a list.

    It’s best explained with an example:

    • old_list = [1, 0, -2, 4, -3]
    • new_list = [x**2 for x in old_list if x > 0]
    • print(new_list)


    Here, we’re creating a new list by taking the elements of the old list to the power of 2, but only for the elements that are strictly positive. The list comprehension allows us to solve this task in just one line of code.

    41) What is a negative index, and how is it used in Python?

    A negative index is used in Python to index a list, string, or any other container class in reverse order (from the end). Thus, [-1] refers to the last element, [-2] refers to the second-to-last element, and so on.

    Here are two examples:

    • numbers = [2, 5, 4, 7, 5, 6, 9]
    • print(numbers[-1])


    • text = "I love data science"
    • print(text[-3])


    42) Name some well-known Python data analysis libraries.


    If you’re doing data analysis with Python, you’re likely going to use:

    • NumPy
    • Pandas
    • Matplotlib
    • Seaborn
    • scikit-learn

    These libraries will help you work with arrays and DataFrames, build professional-looking plots, and run machine learning models.

    43) What is pandas?


    Pandas is a Python open-source library that provides high-performance and flexible data structures and data analysis tools that make working with relational or labeled data both easy and intuitive.

    44) Write Python code to create an employees DataFrame from the “HR.csv” file.


    We can create a DataFrame from a CSV file by using the read_csv() function from the pandas library. By convention, pandas is imported as pd. So the code is simply the following:

    • import pandas as pd
    • employees = pd.read_csv('HR.csv')

    45) What is the default missing value marker in pandas, and how can you detect all missing values in a DataFrame?


    In pandas, the default missing value marker is NaN.

    You can detect all missing values in a DataFrame by using the isna() method from the pandas library:

    • employees.isna()


    To explore the composition of a column, including how many missing values it has, use the info() function; it shows the number of non-null entries in each column together with the total number of entries in that column (the difference, of course, is the number of missing values!).

    • employees.info()

    RangeIndex: 100 entries, 0 to 99
    Data columns (total 3 columns):
    ID 100 non-null int64
    Department 92 non-null object
    Age 94 non-null float64
    dtypes: float64(1), int64(1), object(1)
    memory usage: 2.4+ KB

    46) Write Python code to select the Department and Age columns from the employees DataFrame.


    You can select multiple columns from a DataFrame by passing in their names to double square brackets:

    • employees[['Department', 'Age']]


    47) Add a new column named Active to the employees DataFrame with all 1s to indicate each employee’s active employment status.


    You can create a new column with a particular value for all entries by simply assigning this value to the whole column:

    • employees['Active'] = 1
    • employees.head()


    48) Write Python code to plot the distribution of employees by age.


    To plot the distribution of employees by age, you’ll simply need to create a histogram from the Age column of the employees DataFrame. This can be done by calling the hist() function on the selected column:

    • employees['Age'].hist()

    49) What libraries do data scientists use to plot data in Python?


    Matplotlib is the main library used for plotting data in Python. However, the plots created with this library need lots of fine-tuning to look shiny and professional. For that reason, many data scientists prefer Seaborn, which allows you to create appealing and meaningful plots with only one line of code.

    50) What Python IDEs are the most popular in data science?


    Granted, you won’t be asked this exact question, but it’s likely that your prospective employer will still want to get a better sense of how you work with Python and what exposure you have to the language.

    51) What is the difference between a list and a tuple?


    • Lists are mutable. They can be modified after creation.
    • Tuples are immutable. Once a tuple is created, it cannot be changed.
    • Lists have order. They are ordered sequences, typically of the same type of object, e.g., all user names ordered by creation date: ["Seth", "Ema", "Eli"]
    • Tuples have structure. Different data types may exist at each index, e.g., a database record in memory: (2, "Ema", "2020-04-16") # id, name, created_at

    52) How is string interpolation performed?


    Without importing the Template class, there are 3 ways to interpolate strings.

    • name = 'Chris'
    • # 1. f strings
    • print(f'Hello {name}')
    • # 2. % operator
    • print('Hey %s %s' % (name, name))
    • # 3. format
    • print("My name is {}".format(name))
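For comparison, here is a short sketch of the Template class mentioned above, which lives in the standard string module. The `greeting` variable is only for illustration.

```python
from string import Template

name = "Chris"

# $name is a placeholder that substitute() fills in by keyword.
greeting = Template("Hello $name").substitute(name=name)

print(greeting)  # Hello Chris
```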

