# 35+ Best [ R ] Interview Questions & Answers [FREQUENTLY ASKED]

Last updated on 17th Aug 2020, Blog, Interview Questions

Here is a list of *Top 50 R Interview Questions and Answers* you must prepare. This blog covers all the important questions which can be asked in your interview on R. These R interview questions will give you an edge in the burgeoning analytics market where global and local enterprises, big or small, are looking for professionals with certified expertise in R.

R is a programming language which can be as useful as you want it to be. It’s a tool at your disposal which can be used for multiple purposes such as statistical analysis, data visualization, data manipulation, predictive modelling, forecast analysis and the list goes on. R is used by the top companies such as Google, Facebook and Twitter.

**1) What are the different data structures in R? Briefly explain them.**

**Ans:**

Broadly speaking, these are the data structures available in R:

Data Structure | Description |
---|---|
Vector | A vector is a sequence of data elements of the same basic type. Members in a vector are called components. |
List | Lists are R objects that can contain elements of different types, such as numbers, strings, vectors or another list. |
Matrix | A matrix is a two-dimensional data structure. Matrices can be built by binding vectors of the same length. All the elements of a matrix must be of the same type (numeric, logical, character, complex). |
Data frame | A data frame is more general than a matrix, i.e. different columns can have different data types (numeric, character, logical, etc.). It combines features of matrices and lists, like a rectangular list. |

**2.What is the use of the lm() function?**

**Ans:**

‘lm’ stands for linear model. In R, the lm() function is used to fit linear regression models. The two most important arguments given to lm() are formula and data: the formula defines the regression model, and data is the dataset on which the regression is run.
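As a minimal sketch of these two arguments in action, the example below fits a simple regression on the built-in mtcars dataset. The choice of mpg as the response and wt as the predictor is only an illustrative assumption.

```r
# Fit miles-per-gallon as a linear function of car weight.
# mtcars ships with base R; the variable choice is illustrative only.
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(fit)

# Full regression summary (coefficients, R-squared, residuals)
summary(fit)
```

The fitted object can then be passed to functions such as predict() or plot() for further analysis.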

**3)Give an example usage of tapply() method?**

**Ans:**

Consider two ordered vectors

1. students distributed across various schools (s1 is the school of the first student, s2 is the school of the second student, etc)

- > students <- c("s1","s2","s1","s3","s3","s2")

2. Percentage of each student’s marks

- > marks <- c(80,90,75,67,96,67)
- > means <- tapply(marks,students,mean)
- > means
- s1 s2 s3
- 77.5 78.5 81.5

The function tapply() applies the function mean() to the first argument, marks, grouped by the second argument, students.

**4) How to modify and construct lists? Show with an example?**

**Ans:**

Lists Construction:

- > Lst <- list(name="Jack", age=23, no.cars=3, cars.names = c("Wagon", "Bumper", "Jazz"))

List Modification:

- > Lst$cars.names[1] <- "WagonR" OR > Lst[[4]][1] <- "WagonR"

**5) Explain what is R?**

**Ans:**

R is a programming language and software environment for statistical computing and graphics. It is used for data analysis by analysts, quants, statisticians, data scientists and others.

**6) List out some of the functions that R provides?**

**Ans:**

R provides functions for:

- Mean
- Median
- Distribution
- Covariance
- Regression
- Non-linear
- Mixed Effects
- GLM
- GAM, etc.

**7) Explain how you can start the R commander GUI?**

**Ans:**

Typing the command library("Rcmdr") into the R console starts the R Commander GUI.

**8) In R, how can you import data?**

**Ans:**

You can use R Commander to import data in R, and there are three ways through which you can enter data into it:

- You can enter data directly via Data > New Data Set
- You can import data from a plain text (ASCII) file or from other files (SPSS, Minitab, etc.)
- You can read a data set either by typing the name of the data set or by selecting the data set in the dialog box

**9) Mention what the R language does not do?**

**Ans:**

- Though R can easily connect to a DBMS, it is not itself a database
- R does not consist of any graphical user interface
- Though it connects to Excel/Microsoft Office easily, the R language does not provide any spreadsheet view of data

**10) Explain how comments are written in R?**

**Ans:**

In R, you can add a comment anywhere in the program by prefacing the line with a # sign, for example

- # subtraction
- # division
- # note order of operations exists

**11) How much will you rate yourself in R?**

**Ans:**

In an interview, the interviewer may ask you to rate yourself in a specific technology like R. The answer depends on your knowledge and work experience in R.

**12) What challenges did you face while working on R?**

**Ans:**

This question may be specific to your technology and completely depends on your past work experience. So you need to just explain the challenges you faced related to R in your Project.

**13) What was your role in last Project related to R?**

**Ans:**

It’s based on your role and responsibilities assigned to you and what functionality you implemented using R in your project. This question is generally asked in every interview.

**14) How much experience do you have in R?**

**Ans:**

Here you can tell about your overall work experience on R.

**15) Have you done any R Certification or Training?**

**Ans:**

It depends on the candidate, i.e. whether you have done any R training or certification. Certifications and training are not essential but are good to have.

**16) When should you apply “next” statement in R?**

**Ans:**

A data scientist will use next to skip an iteration in a loop. As an example:

- val1 <- 1:30

- for(val in val1){
- if(val == 25){
- next
- }
- print(val)
- }

This piece of code iterates through the numbers from 1 to 30. It skips 25 because the next statement abandons that iteration and moves on to the next value, so the output is 1-24 and 26-30.

**17) What is the value of equation1(3) for the following R code?**

**Ans:**

- > num <- 4
- > equation1 <- function (val)
- + {
- + num <- 3
- + num^3 + equation2(val)
- + }
- > equation2 <- function (val)
- + {
- + val*num
- + }

The value is 39. Inside equation1, the local num is 3, so num^3 is 27. R uses lexical scoping, so equation2 looks up num in the global environment, where it is 4, and returns 3*4 = 12.

**18) Which method is used for exporting the data in R?**

**Ans:**

There are many ways to export data into other formats, such as SPSS, SAS, Stata and Excel spreadsheets.
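For the common case of plain-text export, a minimal sketch using base R's write.csv() follows. The data frame and the temporary file path are placeholders chosen for illustration; formats such as SPSS, SAS or Stata generally need an add-on package.

```r
# A small illustrative data frame (hypothetical data)
scores <- data.frame(id = 1:3, score = c(80, 90, 75))

# Write to a temporary CSV file; row.names = FALSE drops the row index column
out_file <- tempfile(fileext = ".csv")
write.csv(scores, out_file, row.names = FALSE)

# Read it back to confirm the round trip
read.csv(out_file)
```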

**19) Can you write and explain some of the most common syntax in R?**

**Ans:**

Again, this is an easy—but crucial—one to nail. For the most part, this can be demonstrated through any other code you might write for other R interview questions, but sometimes this is asked as a standalone. Some of the basic syntax for R that’s used most often might include:

- The # sign: as in many other languages, # introduces a comment line. The interpreter ignores the line, so it can be used to make code more readable by reminding future readers what blocks of code are intended to do.

- Quotes (""): quotes operate as one might expect; they denote a string data type in R.

- The <- operator: one of the quirks of R, the usual assignment operator is <- rather than the more familiar =, although = also works for assignment. This is an essential thing for those using R to know, so it would be good to display your knowledge of it if the question comes up.

- \ : the backslash, or reverse virgule, is the escape character in R. An escape character is used to “escape” (or ignore) the special meaning of certain characters in R and, instead, treat them literally.

**20) How do you list the preloaded datasets in R?**

**Ans:**

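Call the data() function with no arguments; it lists the datasets available in the currently loaded packages. I’m running R version 3.5.1 for Windows on my machine with a standard, default install, and in RStudio the output is a listing of the datasets package, including entries such as mtcars, iris and airquality.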

**21)What are some advantages of R?**

**Ans:**

It’s important to be familiar with the advantages and disadvantages of certain languages and ecosystems. R is no exception. So what are the advantages of R?

Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s widely accessible, free to use, and extensible.

Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the wheel as a data scientist.

Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.

**22) What are the disadvantages of R?**

**Ans:**

Just as you should know what R does well, you should understand its failings.

Memory and performance. In comparison to Python, R is often said to be the lesser language in terms of memory and performance. This is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.


Open source. Being open source has its disadvantages as well as its advantages. For one, there’s no governing body managing R, so there’s no single source for support or quality control. This also means that sometimes the packages developed for R are not the highest quality.

Security. R was not built with security in mind, so it must rely on external resources to mind these gaps.

**23) Compare R & Python**

**Ans:**

R programming Language | Python programming language |
---|---|
Model Building is similar to Python | Model Building is similar to R. |
Model Interpretability is good | Model Interpretability is not good |
Production is not better than Python. | Production is good |
R has good community support over Python. | Community Support is not better than R |
Data Science Libraries are same as Python. | Data Science Libraries are same as R. |
R has good data visualizations libraries and tools | Data visualization is not better than R |
R has a steep learning curve. | Learning Curve in Python is easier than learning R. |

**24) Write code to accomplish a task**

**Ans:**

**25) When is it appropriate to use the “next” statement in R?**

**Ans:**

This code iterates through a range of numbers from 1 to 20 and prints the values. I don’t want to print 15, though, so I’ve used the next statement to skip that iteration and move on to other values. The output would print 1-14 and 16-20.
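The loop being described is not shown here; a sketch consistent with the description might look like this (the variable seen exists only so the result can be inspected afterwards):

```r
# Print 1 to 20, skipping 15
seen <- c()
for (i in 1:20) {
  if (i == 15) {
    next  # jump straight to the next iteration
  }
  print(i)
  seen <- c(seen, i)
}
```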

**26) How do you assign a variable in R?**

**Ans:**

- myVar <- 15
- print(myVar)

Notice the following is also valid:

- myVar = 15
- print(myVar)

You can also assign variables the other way around in R.

- "helloWorld" -> myVar

The string "helloWorld" is now stored in the myVar variable. Note that reversing the statement does something different: it tries to assign the value of myVar to a variable named helloWorld, and produces an error if myVar is not defined, as in:

- myVar -> "helloWorld"

Error: object 'myVar' not found

Scoping of variables is something to consider as well. <<- acts as the “superassignment” operator and is useful for closures. A good example on how to use this can be found on stackoverflow.
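As a rough illustration of superassignment inside a closure, the sketch below builds a counter. The names make_counter and count are hypothetical, chosen only for this example.

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- modifies count in the enclosing environment,
    # instead of creating a new local variable
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```

Each call to make_counter() creates a fresh enclosing environment, so separate counters do not interfere with each other.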

**27) What are the different data types/objects in R?**

**Ans:**

This question checks that you really *know* R and are not winging it. Unlike languages such as C, R doesn’t ask users to declare a data type when assigning a variable. Instead, everything in R is an R data object. When you assign a variable in R, you assign it a data object, and that object’s data type determines the data type of the variable. The most commonly used data objects include:

- Vectors
- Matrices
- Lists
- Arrays
- Factors
- Data frames

**28) What are the objects you use most frequently?**

**Ans:**

This question is meant to gather a sense of your experiences in R. Simply think about some recent work you’ve done in R and explain the data objects you use most often. If you use arrays frequently, explain why and how you’ve used them.

**29) Why use R?**

**Ans:**

This is a variant of the “advantages of R” question. Reasons to use R include its open-source nature and the fact that it’s a versatile tool for statistical plotting, analysis, and portrayal. Don’t be afraid to give some personal reasons as well. Maybe you simply love the assignment operator in R or feel that it’s more elegant than other languages—but always remember to explicate. You should be answering follow-up questions before they’re even asked.

**30) What are some of your favorite functions in R?**

**Ans:**

As a user of R, you should be able to come up with some functions on the spot and describe them. Functions that save time and, as a result, money will always be something an interviewer likes to hear about.

**31) Write a custom function in R**

**Ans:**

Sometimes you’ll be asked to create a custom function on the fly. An example of a custom function from quick-r’s guide:

- myFunction <- function(arg1, arg2, … ){
- statements
- return(object)
- }

Functions can be simple or complex, but they should make your code more extensible, readable, and efficient. This is a chance to show your ingenuity and experience.
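A concrete instance of that template might look like the following sketch. The function name and its behaviour (rescaling a numeric vector to the 0-1 range) are illustrative choices, not part of the quoted guide.

```r
# Rescale a numeric vector so its values lie between 0 and 1
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(10, 15, 20))  # 0.0 0.5 1.0
```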

**32) How do you import data in R?**

**Ans:**

Let’s use CSV as an example, as it’s a very common data format. Simply make sure the file is saved in a CSV format, then use the read function to import the data.

yourRDateHere <- read.csv("Data.csv", header = TRUE)

Though not required, strictly speaking, the argument header = TRUE is used to ensure that labels are not parsed as data. Making this argument false means you either don’t have labels in your CSV or you want them to be part of the data output in R.

To do the same with data in a .txt file, simply change read.csv to read.table.

**33) How do you install a package in R?**

**Ans:**

There are many ways to install a package in R. Some even include using the GUI. We’re coders, so we’re not going to give those attention.

Type the following into your console and hit enter:

- install.packages("package_name")

Followed by:

- library(package_name)

It’s that simple. The first command installs the package and the second loads the package into the session.

**34) How can you produce correlations and covariances?**

**Ans:**

Correlations are produced by the cor() function and covariances by the cov() function.

**35) What is the difference between a matrix and a data frame?**

**Ans:**

A data frame can contain columns of different data types, but a matrix can contain only one type of data.
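Both functions can be tried on the built-in mtcars dataset; the columns used below are an arbitrary choice for illustration.

```r
# Correlation and covariance between two numeric columns
r_val <- cor(mtcars$mpg, mtcars$wt)
c_val <- cov(mtcars$mpg, mtcars$wt)

# Passing several numeric columns returns a full correlation matrix
r_mat <- cor(mtcars[, c("mpg", "wt", "hp")])
r_mat
```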

**36) What is difference between lapply and sapply?**

**Ans:**

lapply always returns its output as a list, whereas sapply tries to simplify the output to a vector or matrix and falls back to a list when it cannot.
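A short sketch of the difference, using a small made-up list:

```r
nums <- list(a = 1:3, b = 4:6)

l_out <- lapply(nums, sum)  # always a list
s_out <- sapply(nums, sum)  # simplified to a named vector here

class(l_out)  # "list"
class(s_out)  # "integer"
```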

**37) What is the difference between seq(4) and seq_along(4)?**

**Ans:**

seq(4) produces the vector from 1 to 4, c(1,2,3,4), whereas seq_along(4) produces a sequence as long as its argument. Since 4 is a vector of length 1, the result is c(1).
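The behaviour is easy to check in the console; the vector v below is a made-up example showing where seq_along() is typically used.

```r
seq(4)        # 1 2 3 4
seq_along(4)  # 1, because the input has length 1

# seq_along() is most useful for indexing over an existing vector
v <- c(10, 20, 30)
seq_along(v)  # 1 2 3
```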

**38)What is the use of with() in R?**

**Ans:**

We use the with() function to write simpler code by applying an expression to a data set. Its syntax looks like this:

with(randomDataSet, expression.test(sample))
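As an illustration with a real dataset, the sketch below computes the same quantity with and without with(), using the built-in mtcars data. The ratio of mpg to wt is an arbitrary expression picked for the example.

```r
# Without with(): the data frame name is repeated
plain <- mean(mtcars$mpg / mtcars$wt)

# With with(): columns are referenced directly
compact <- with(mtcars, mean(mpg / wt))
```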

**39)What is the use of by() in R?**

**Ans:**

Like with(), by() can help write DRY (don’t repeat yourself) code.

You can use by() to apply a function to a data frame split by factors. Its usage is something like this:

by(data, factor, function, …)

The data frame plugged into this function is split into data frames (by row) subsetted by the values of factor(s), and a function is then applied to each subset.
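A small sketch of by() on the built-in mtcars data follows; grouping by cyl and averaging mpg is just an illustrative choice.

```r
# Mean mpg for each number of cylinders
res <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))
res
```

Each group (4, 6 and 8 cylinders) is passed to the function as its own data frame.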


**40) What is a factor variable, and why would you use one?**

**Ans:**

A factor variable is a form of categorical variable that accepts either numeric or character string values. The most salient reason to use a factor variable is that it can be used in statistical modeling with great accuracy. Another reason is that they are more memory efficient.

Simply use the factor() function to create a factor variable.
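A brief sketch of creating and inspecting a factor; the sizes data and the level order are invented for illustration.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # "small" "medium" "large"
as.integer(sizes)  # 1 3 2 1 (the internal integer codes)
table(sizes)       # counts per level
```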

**41)When is it appropriate to use the which() function?**

**Ans:**

The which() function takes a logical vector and returns the indices (positions) of all elements that are TRUE.

To get a sense of how this works, plug in the built-in letters vector and search for the index of a specific letter using which().

In my console, I’ve checked the letters vector, which contains the English alphabet in lowercase. I’ve used which() to find the positions of a, z, and m, which returned the indexes 1, 26, and 13, respectively, because these are their positions in the vector, just as they are their positions in the alphabet.
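The console session described above can be reproduced along these lines:

```r
which(letters == "a")  # 1
which(letters == "z")  # 26
which(letters == "m")  # 13

# which() returns every matching position, not just the first
which(c(FALSE, TRUE, TRUE))  # 2 3
```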

**42) How do you read a CSV file in R?**

**Ans:**

We’ve covered this already with the import process. Simply use the read.csv() function.

yourRDateHere <- read.csv("Data.csv", header = TRUE)

**43) What are 3 sorting algorithms available in R?**

**Ans:**

The sort() function, which orders a vector or factor, supports three sorting methods, listed and described below.

**Radix:** Usually the most performant algorithm, this is a non-comparative sorting algorithm that avoids overhead. It’s stable, and it’s the default algorithm for integer vectors and factors.

**Quick Sort:** This method “uses Singleton (1969)’s implementation of Hoare’s Quicksort method and is only available when x is numeric (double or integer) and partial is NULL,” according to R Documentation. It’s not considered a stable sort.

**Shell:** This method “uses Shellsort (an O(n^(4/3)) variant from Sedgewick (1986)),” according to R Documentation.

**44) Can you create an R decision tree?**

**Ans:**

A decision tree is a familiar graph for data scientists. It represents choices and results through the graphical form of a tree. To keep things simple, let’s just go over the basics.

Install the party package to get started with making the tree.

- install.packages("party")

This gives you access to a fancy new function: ctree(), and, at its most basic, this is all we need to create a tree. First, let’s grab some data from our package; make sure the package is loaded.

- library(party)

Now we have access to some new data sets. Part of the strucchange package that bundles with party includes data on youth homicides in Boston called BostonHomicide. Let’s use that one. You can print the data to the screen if you like.

- print(BostonHomicide)

Now we’ll create the tree. The usage of ctree() goes something like this:

- ctree(*formula*, *dataset*)

We’ve got our data set. I’ll assign it to a variable for simplicity.

- inputData <- BostonHomicide

Now we can determine our formula and create the tree.

- treeAnalysis <- ctree(year~population+homicides+unemploy, data = inputData)

Let’s plot it!

- plot(treeAnalysis)

Here’s the result I got.

**45) Why is R useful for data science?**

**Ans:**

R turns otherwise hours of graphically intensive jobs into minutes and keystrokes. In reality, you probably wouldn’t encounter the language of R outside the realm of data science or an adjacent field. It’s great for linear modeling, nonlinear modeling, time-series analysis, plotting, clustering, and so much more.

Simply put, R is designed for data manipulation and visualization, so it’s natural that it would be used for data science.

**46) Describe how R can be used for predictive analysis**

**47) In R programming, how are missing values represented?**

**Ans:**

In R missing values are represented by NA which should be in capital letters.
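A minimal sketch of how NA behaves in practice, using a made-up vector:

```r
x <- c(4, NA, 7)

is.na(x)               # FALSE TRUE FALSE
mean(x)                # NA, because one value is missing
mean(x, na.rm = TRUE)  # 5.5
```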

**48)Explain how you can start the R commander GUI.**

**Ans:**

Loading the package with the command library(Rcmdr) starts the R Commander GUI.

**49)What is the memory limit of R?**

**Ans:**

On a 32-bit system the memory limit is 3 GB, although most versions are limited to 2 GB; on a 64-bit system the memory limit is 8 TB.

**50)What is the function which is used for merging of data frames horizontally in R?**

**Ans:**

The merge() function is used to merge two data frames.

E.g. Sum <- merge(dataframe1, dataframe2, by = "ID")
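The call above uses placeholder names; a small self-contained sketch, with made-up data frames df_a and df_b, might look like this:

```r
df_a <- data.frame(ID = 1:3, name = c("Ann", "Bob", "Cy"))
df_b <- data.frame(ID = c(2, 3, 4), score = c(88, 92, 75))

# Inner join by default: only IDs present in both data frames are kept
merged <- merge(df_a, df_b, by = "ID")
merged
```

The all, all.x and all.y arguments of merge() switch to outer, left and right joins respectively.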

**51) How do you concatenate strings in R?**

**Ans:**

Concatenating strings in R is less than intuitive. You don’t use a . operator, nor a + operator, and forget about the & operator. In fact, you don’t use an operator at all. Concatenating strings in R requires the use of the paste() function. Here’s an example:

- hello <- "Hello,"
- world <- "World."
- paste(hello, world)
- [1] "Hello, World."

I’ve stored Hello, and World. in variables aptly named hello and world. With paste(), I’ve simply plugged in the two variables, and it concatenates them such that it creates the single phrase "Hello, World."

An oddity with this function is that it will automatically insert spaces between the terms. Try it out with some numbers.

paste(1,2,3,4)

[1] "1 2 3 4"

This can be a sort of “gotcha” with this question. If we don’t want spaces, we can adjust the sep parameter in the function, which defaults to a space " ".

paste(1,2,3,4,sep="")

[1] "1234"

**52) How many data structures does R have?**

**Ans:**

There are 5 data structures in R: vector, matrix and array, which are homogeneous, and list and data frame, which are heterogeneous.

**53) Explain how data is aggregated in R.**

**Ans:**

Data is aggregated by collapsing it over one or more BY variables with a summary function. This is done with the aggregate() function, in which the BY variables must be supplied as a list.
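A sketch of aggregate() on the built-in mtcars data follows; the grouping column and the summarised columns are illustrative choices.

```r
# Mean mpg and hp for each cylinder count.
# The grouping variable is passed as a (named) list.
agg <- aggregate(mtcars[, c("mpg", "hp")],
                 by = list(cyl = mtcars$cyl),
                 FUN = mean)
agg
```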

**54) How many sorting algorithms are available?**

**Ans:**

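Five commonly discussed sorting algorithms are listed below. These are general-purpose algorithms; R’s own sort() function offers the radix, quick and shell methods.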

- Bubble Sort
- Selection Sort
- Merge Sort
- Quick Sort
- Bucket Sort

**55) How to create a new variable in R programming?**

**Ans:**

To create a new variable, use the assignment operator ‘<-’.

For example: mydata$sum <- mydata$x1 + mydata$x2

**56)What is the workspace in R?**

**Ans:**

Workspace is the current R working environment which includes any user defined objects like vector, lists etc.

**57) How to determine data type of an object?**

**Ans:**

class() is used to determine data type of an object. See the example below –

- x <- factor(1:5)
- class(x)

It returns "factor".

To determine the structure of an object, use the str() function:

- str(x) returns "Factor w/ 5 levels ..."

Example 2 :

- xx <- data.frame(var1=c(1:5))
- class(xx)

It returns “data.frame”.

- str(xx) returns ‘data.frame’ : 5 obs. of 1 variable: $ var1: int

**58) What is the use of mode() function?**

**Ans:**

It returns the storage mode of an object.

- x <- factor(1:5)
- mode(x)

The above mode function returns "numeric".

- x <- data.frame(var1=c(1:5))

- mode(x)

It returns list.

**59)Which data structure is used to store categorical variables?**

**Ans:**

R has a special data structure called “factor” to store categorical variables. It tells R that a variable is nominal or ordinal by making it a factor.

- gender = c(1,2,1,2,1,2)
- gender = factor(gender)
- gender

**60)How to check the frequency distribution of a categorical variable?**

**Ans:**

The table function is used to calculate the count of each category of a categorical variable.

- gender = factor(c("m","f","f","m","f","f"))
- table(gender)

If you want to include the % of values in each group, you can store the result in a data frame using the data.frame function and then calculate the column percent.

- t = data.frame(table(gender))
- t$percent= round(t$Freq / sum(t$Freq)*100,2)

**61)How to check the cumulative frequency distribution of a categorical variable**

**Ans:**

The cumsum function is used to calculate the cumulative sum of the frequencies of a categorical variable.

- gender = factor(c("m","f","f","m","f","f"))
- x = table(gender)
- cumsum(x)

If you want to see the cumulative percentage of values, see the code below:

- t = data.frame(table(gender))
- t$cumfreq = cumsum(t$Freq)
- t$cumpercent= round(t$cumfreq / sum(t$Freq)*100,2)

**62)How to produce histogram**

**Ans:**

The hist function is used to produce the histogram of a variable.

- df = sample(1:100, 25)
- hist(df, right=FALSE)

To improve the layout of the histogram, you can use the code below:

- colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
- hist(df, right=FALSE, col=colors, main="Main Title", xlab="X-Axis Title")

**63)How to produce bar graph**

**Ans:**

First calculate the frequency distribution with table function and then apply barplot function to produce bar graph

- mydata = sample(LETTERS[1:5],16,replace = TRUE)
- mydata.count= table(mydata)
- barplot(mydata.count)

*To improve the layout of bar graph, you can use the code below:*

- colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
- barplot(mydata.count, col=colors, main="Main Title", xlab="X-Axis Title")

**64) How to produce Pie Chart**

**Ans:**

First calculate the frequency distribution with table function and then apply pie function to produce pie chart.

- mydata = sample(LETTERS[1:5],16,replace = TRUE)
- mydata.count= table(mydata)
- pie(mydata.count, col=rainbow(12))


**65) Multiplication of 2 vectors having different length**

**Ans:**

For example, you have two vectors as defined below

- x <- c(4,5,6)
- y <- c(2,3)

If you run z <- x*y, what would be the output? What would be the length of z?

It returns 8 15 12 with the warning message “longer object length is not a multiple of shorter object length”. The length of z is 3 as it has three elements.

**First step:** R multiplies the first element of vector x, i.e. 4, by the first element of vector y, i.e. 2, and the result is 8. In the second step, it multiplies the second element of vector x, i.e. 5, by the second element of vector y, i.e. 3, and the result is 15. In the next step, because y has run out of elements, R recycles it and multiplies the first element of the smaller vector (y) by the last element of the bigger vector (x), giving 12.

Suppose the vector x would contain four elements as shown below :

- x <- c(4,5,6,7)
- y <- c(2,3)
- x*y

It returns 8 15 12 21. It works like this : (4*2) (5*3) (6*2) (7*3)

**66) What are the different data structures R contain?**

**Ans:**

R contains primarily the following data structures:

- Vector
- Matrix
- Array
- List
- Data frame
- Factor

The first three data types (vector, matrix, array) are homogeneous in behavior. It means all contents must be of the same type. The fourth and fifth data types (list, data frame) are heterogeneous in behavior. It implies they allow different types. And the factor data type is used to store categorical variable.


**67)How to combine data frames?**

**Ans:**

Let’s prepare 2 vectors for demonstration:

- x = c(1:5)
- y = c("m","f","f","m","f")

The cbind() function is used to combine vectors or data frames by columns.

- z=cbind(x,y)

The rbind() function is used to combine vectors or data frames by rows.

- z = rbind(x,y)

While using the cbind() function, make sure the number of rows is equal in both datasets. While using the rbind() function, make sure both the number and the names of columns are the same. If the names of columns are not the same, wrong data may be appended to columns or records might go missing.

**68) How to combine data by rows when different number of columns?**

**Ans:**

When the number of columns in the datasets is not equal, the rbind() function doesn’t work to combine data by rows. For example, we have two data frames df and df2. The data frame df has 2 columns and df2 has only 1 variable. See the code below:

- df = data.frame(x = c(1:4), y = c("m","f","f","m"))
- df2 = data.frame(x = c(5:8))

The bind_rows() function from dplyr package can be used to combine data frames when number of columns do not match.

- library(dplyr)
- combdf = bind_rows(df,df2)

**69) What are valid variable names in R?**

**Ans:**

A valid variable name consists of letters, numbers and the dot or underscore characters. A variable name can start with either a letter or a dot that is not followed by a number.

A variable name such as .1var is not valid. But .var1 is valid.

A variable name cannot be a reserved word. The reserved words are listed below:

if else repeat while function for in next break

TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_

A variable name can be at most 10,000 bytes long.

**70) What is the use of with() and by() functions? What are its alternatives?**

**Ans:**

Suppose you have a data frame as shown below –

- df=data.frame(x=c(1:6), y=c(1,2,4,6,8,12))

You are asked to perform this calculation: (x+y) + (x-y). Most R programmers would write code like this:

- (df$x + df$y) + (df$x - df$y)

Using the with() function, you can refer to your data frame once and make the above code more compact and simpler:

- with(df, (x+y) + (x-y))

A similar result can be obtained with the pipe operator and mutate() from the dplyr package. See the code below:

- library(dplyr)

- df %>% mutate((x+y) + (x-y))

by() function in R

The by() function is equivalent to group by function in SQL. It is used to perform calculation by a factor or a categorical variable. In the example below, we are computing mean of variable var2 by a factor var1.

- df = data.frame(var1=factor(c(1,2,1,2,1,2)), var2=c(10:15))
- with(df, by(df, var1, function(x) mean(x$var2)))

The group_by() function in the dplyr package can perform the same task.

- library(dplyr)

- df %>% group_by(var1)%>% summarise(mean(var2))


**71) How to rename a variable?**

**Ans:**

In the example below, we are renaming variable var1 to variable1.

- df = data.frame(var1=c(1:5))

- colnames(df)[colnames(df) == 'var1'] <- 'variable1'

The rename() function in dplyr package can also be used to rename a variable.

- library(dplyr)
- df= rename(df, variable1=var1)

**72) What is the use of which() function in R?**

**Ans:**

The which() function returns the position of elements of a logical vector that are TRUE. In the example below, we are figuring out the row number wherein the maximum value of a variable x is recorded.

It returns 3 as 10 is the maximum value and it is at 3rd row in the variable x.
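The example code itself is not shown above; a sketch consistent with the description, using hypothetical data in which 10 is the maximum and sits in the third row, could be:

```r
# Hypothetical data: the maximum (10) is in the third row
df <- data.frame(x = c(1, 8, 10, 4))

which(df$x == max(df$x))  # 3
which.max(df$x)           # 3, a shortcut for the first maximum
```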

**73)How to calculate first non-missing value in variables?**

**Ans:**

Suppose you have three variables X, Y and Z and you need to extract first non-missing value in each rows of these variables.

- data = read.table(text="
- X Y Z
- NA 1 5
- 3 NA 2
- ", header=TRUE)

The coalesce() function in dplyr package can be used to accomplish this task.

- library(dplyr)
- data %>% mutate(var=coalesce(X,Y,Z))


**74) How to calculate max value for rows?**

**Ans:**

Let’s create a sample data frame

- dt1 = read.table(text="
- X Y Z
- 7 NA 5
- 2 4 5
- ", header=TRUE)

With the apply() function, we can tell R to apply the max function rowwise. The argument na.rm = TRUE tells R to ignore missing values while calculating the max value. If it is not used, the result for a row containing a missing value would be NA.

- dt1$var = apply(dt1,1, function(x) max(x,na.rm = TRUE))


**75) Count number of zeros in a row**

**Ans:**

- dt2 = read.table(text="
- A B C
- 8 0 0
- 6 0 5
- ", header=TRUE)
- apply(dt2,1, function(x) sum(x==0))

**76) Does the following code work?**

- ifelse(df$var1==NA, 0,1)

**Ans:**

It does not work. A logical comparison with NA returns NA rather than TRUE or FALSE.

This code works: ifelse(is.na(df$var1), 0,1)

**77) What would be the final value of x after running the following program?**

- x = 3

- mult <- function(j)
- {
- x = j * 2
- return(x)
- }
- mult(2)
- [1] 4

**Answer:** The value of ‘x’ will remain 3.

This is because the assignment inside the function creates a local variable x that exists only while the function runs; the x defined outside the function is untouched. If you want to change the value of the outer x when running the function, you can use the following program:

- x = 3
- mult <- function(j)
- {
- x <<- j * 2
- return(x)
- }
- mult(2)
- x

The operator “<<-” tells R to search in the parent environment for an existing definition of the variable we want to be assigned.

**78) How to convert a factor variable to numeric**

**Ans:**

Applying as.numeric() directly to a factor returns its internal integer codes (the positions of the levels) and not the original values. Hence, you need to convert a factor variable to character before converting it to numeric.

- a <- factor(c(5, 6, 7, 7, 5))
- a1 = as.numeric(as.character(a))

**79) How to concatenate two strings?**

**Ans:**

The paste() function is used to join two strings. A single space is the default separator between two strings.

- a = "Deepanshu"
- b = "Bhalla"
- paste(a, b)

It returns “Deepanshu Bhalla”

If you want to change the default single-space separator, use the sep argument. For example, sep="," uses a comma as the separator.

- paste(a, b, sep=",") returns "Deepanshu,Bhalla"

**80) How to extract first 3 characters from a word**

**Ans:**

The substr() function is used to extract substrings from a character vector. The syntax of the substr function is substr(character_vector, starting_position, end_position)

- x = "AXZ2016"
- substr(x,1,3)

**81) How to extract last name from full name**

**Ans:**

The last name is the final comma-separated part of the name. For example, Jhonson is the last name of "Dave,Jon,Jhonson".

- dt2 = read.table(text="
- var
- Sandy,Jones
- Dave,Jon,Jhonson
- ", header=TRUE)

The word() function of the stringr package is used to extract words from a string. -1 in the second parameter denotes the last word.

- library(stringr)
- dt2$var2 = word(dt2$var, -1, sep = ",")
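If stringr is not available, the same result can be sketched in base R with a regular expression. This is an alternative approach, not the method used above, and the vector `full_names` is a hypothetical stand-in for the column:

```r
# Hypothetical full names, comma-separated as in the example above
full_names <- c("Sandy,Jones", "Dave,Jon,Jhonson")

# Remove everything up to and including the last comma
last_names <- sub(".*,", "", full_names)
print(last_names)
```

Because `.*` is greedy, the pattern matches up to the last comma, leaving "Jones" and "Jhonson".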

**82) How to remove leading and trailing spaces?**

**Ans:** The trimws() function is used to remove leading and trailing spaces.

- a = " David Banes "
- trimws(a)

It returns “David Banes”.

**83) How to generate random numbers between 1 and 100**

**Ans:**

The runif() function generates random numbers drawn from a uniform distribution; the min and max arguments set the range.

- rand = runif(100, min = 1, max = 100)
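Note that runif() returns real numbers rather than whole numbers. A minimal sketch of both cases follows; the seed value and the names `rand_real` and `rand_int` are arbitrary choices made for this example:

```r
set.seed(42)  # arbitrary seed so the draws are reproducible

# 100 real numbers drawn uniformly between 1 and 100
rand_real <- runif(100, min = 1, max = 100)

# 100 whole numbers between 1 and 100, sampled with replacement
rand_int <- sample(1:100, size = 100, replace = TRUE)

head(rand_real)
head(rand_int)
```

Use sample() when the interviewer expects integers, and runif() when fractional values are acceptable.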

**84) How to apply LEFT JOIN in R?**

**Ans:**

LEFT JOIN implies keeping all rows from the left table (data frame) along with the matching rows from the right table. In the merge() function, all.x=TRUE denotes a left join.

- df1=data.frame(ID=c(1:5), Score=runif(5,50,100))
- df2=data.frame(ID=c(3,5,7:9), Score2=runif(5,1,100))
- comb = merge(df1, df2, by ="ID", all.x = TRUE)
- # Left Join (SQL Style)
- library(sqldf)
- comb = sqldf('select df1.*, df2.* from df1 left join df2 on df1.ID = df2.ID')
- # Left Join with dplyr package
- library(dplyr)
- comb = left_join(df1, df2, by = "ID")
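To see what the left join produces, here is a small sketch with fixed, made-up scores instead of random ones, so the result is reproducible:

```r
# Fixed, made-up scores so the result is reproducible
df1 <- data.frame(ID = 1:5, Score = c(55, 60, 65, 70, 75))
df2 <- data.frame(ID = c(3, 5, 7, 8, 9), Score2 = c(10, 20, 30, 40, 50))

# all.x = TRUE keeps every row of df1, matched or not
comb <- merge(df1, df2, by = "ID", all.x = TRUE)
print(comb)
```

All five rows of df1 are kept. Score2 is filled in for IDs 3 and 5 and is NA for IDs 1, 2 and 4, which have no match in df2.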

**85) Which packages are used for exporting of data?**

**Ans:**

For Excel, the xlsReadWrite package is used; for SAS, SPSS and Stata, the foreign package is used.

**86) How are impossible values represented in R?**

**Ans:**

In R, NaN (Not a Number) is used to represent impossible values, such as the result of 0/0.
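A brief sketch of how NaN arises and how it differs from NA; the variable names are chosen only for this example:

```r
nan_value <- 0 / 0        # undefined result
inf_minus <- Inf - Inf    # also undefined

is.nan(nan_value)   # TRUE
is.nan(inf_minus)   # TRUE
is.na(nan_value)    # TRUE: NaN also counts as missing
is.nan(NA)          # FALSE: NA is not NaN
```

In other words, is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.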

**87) How to calculate cartesian product of two datasets**

**Ans:**

The cartesian product implies cross product of two tables (data frames). For example, df1 has 5 rows and df2 has 5 rows. The combined table would contain 25 rows (5*5)

- comb = merge(df1,df2,by=NULL)
- # CROSS JOIN (SQL Style)
- library(sqldf)
- comb2 = sqldf('select * from df1 join df2 ')
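The row count of a cross join can be checked on a tiny example. The data frames `a` and `b` below are hypothetical and exist only for this sketch:

```r
# Two small hypothetical tables with 2 and 3 rows
a <- data.frame(letter = c("x", "y"))
b <- data.frame(number = 1:3)

# by = NULL pairs every row of a with every row of b
cross <- merge(a, b, by = NULL)
print(cross)
nrow(cross)
```

The result has 2 * 3 = 6 rows and one column from each input.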

**88) Unique rows common to both the datasets?**

**Ans:**

- df1=data.frame(ID=c(1:5), Score=c(50:54))
- df2=data.frame(ID=c(3,5,7:9), Score=c(52,60:63))
- library(dplyr)
- comb = intersect(df1,df2)
- library(sqldf)
- comb2 = sqldf('select * from df1 intersect select * from df2 ')


**89) How to measure execution time of a program in R?**

**Ans:**

There are multiple ways to measure running time of code. Some frequently used methods are listed below –

R Base Method

- start.time <- Sys.time()
- runif(5555,1,1000)
- end.time <- Sys.time()
- end.time - start.time
- # With tictoc package
- library(tictoc)
- tic()
- runif(5555,1,1000)
- toc()
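Another base R option, not listed above, is system.time(), which times a single expression. A minimal sketch (the name `timing` is chosen for this example):

```r
# system.time() evaluates the expression and reports user, system and elapsed time
timing <- system.time({
  x <- runif(5555, 1, 1000)
})

print(timing["elapsed"])
```

This avoids bookkeeping with start and end variables when only one block of code needs to be timed.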

**90) Which package is generally used for fast data manipulation on large datasets?**

**Ans:**

The package data.table performs fast data manipulation on large datasets. See the comparison between dplyr and data.table.

- # Load required packages
- library(nycflights13)
- library(tictoc)
- library(dplyr)
- library(data.table)
- # Load data; as.data.table() makes a copy, so flights stays a tibble for dplyr
- data(flights)
- df = as.data.table(flights)
- # Using data.table package
- tic()
- df[arr_delay > 30 & dest == "IAH",
- .(avg = mean(arr_delay),
- size = .N),
- by = carrier]
- toc()
- # Using dplyr package
- tic()
- flights %>% filter(arr_delay > 30 & dest == "IAH") %>%
- group_by(carrier) %>% summarise(avg = mean(arr_delay), size = n())
- toc()

**Result**: data.table took 0.04 seconds, whereas dplyr took 0.07 seconds, so data.table needed roughly 40% less time in this run. Since the dataset used in the example is of medium size, the absolute difference is barely noticeable. As the size of the data grows, the difference in execution time gets bigger.

**91) How to read large CSV file in R?**

**Ans:**

We can use fread() function of data.table package.

- library(data.table)
- yyy = fread("C:/Users/Dave/Example.csv", header = TRUE)

We can also use read.big.matrix() function of bigmemory package.

**92) What is the difference between the following two programs ?**

**Ans:**

- temp = data.frame(v1<-c(1:10),v2<-c(5:14))
- temp = data.frame(v1=c(1:10),v2=c(5:14))

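In the first case, the <- operator performs an assignment inside the function call, so the code creates two vectors v1 and v2 in the workspace as a side effect, plus a data frame temp whose 2 columns get improper names derived from the expressions themselves. The second code uses = to name the arguments, so it creates only a data frame temp with the proper column names v1 and v2.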
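The difference is easy to verify in a fresh session. The sketch below uses the names `temp1` and `temp2` (chosen for this example) so both results can be compared side by side:

```r
# <- assigns v1 and v2 in the workspace and leaves the columns unnamed by the user
temp1 <- data.frame(v1 <- c(1:10), v2 <- c(5:14))

# = names the arguments, so the columns are called v1 and v2
temp2 <- data.frame(v1 = c(1:10), v2 = c(5:14))

names(temp1)   # names generated from the expressions
names(temp2)   # "v1" "v2"
exists("v1")   # TRUE, created as a side effect of the first call
```

Both data frames hold the same values; only the column names and the side effects differ.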

**93) How to remove all the objects**

**Ans:**

rm(list=ls())

**94) What are the various sorting algorithms in R?**

**Ans:**

Five major sorting algorithms:

- Bubble Sort
- Selection Sort
- Merge Sort
- Quick Sort
- Bucket Sort

**95) Sort data by multiple variables**

**Ans:**

Create a sample data frame

- mydata = data.frame(score = ifelse(sign(rnorm(25))==-1,1,2),
- experience= sample(1:25))

Task : You need to sort the score variable in ascending order and then sort the experience variable in descending order.

R Base Method

- mydata1 <- mydata[order(mydata$score, -mydata$experience),]

With dplyr package

- library(dplyr)
- mydata1 = arrange(mydata, score, desc(experience))
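Because the sample data above is random, the effect of the sort is easier to check on a small fixed data frame. The values below are made up for this sketch:

```r
# Small, fixed data frame so the sorted order can be verified by eye
mydata <- data.frame(score = c(2, 1, 2, 1), experience = c(5, 3, 9, 7))

# Ascending score; the minus sign reverses the order of experience within each score
sorted <- mydata[order(mydata$score, -mydata$experience), ]
print(sorted)
```

The rows with score 1 come first (experience 7, then 3), followed by the rows with score 2 (experience 9, then 5). The minus sign works only for numeric columns.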

**96) Drop Multiple Variables?**

**Ans:**

Suppose you need to remove 3 variables – x, y and z from data frame “mydata”.

R Base Method

- df = subset(mydata, select = -c(x,y,z))

With dplyr package

- library(dplyr)
- df = select(mydata, -c(x,y,z))

**97) How to save everything in R session**

**Ans:**

save.image(file="dt.RData")

**98) How does R handle missing values?**

**Ans:**

Missing values are represented by capital NA.

To create a new data frame without any missing values, you can use the code below:

- df <- na.omit(mydata)
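A minimal sketch of counting and dropping missing values, using a hypothetical data frame `mydata` with made-up contents:

```r
# Hypothetical data frame with one NA in each column
mydata <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# Number of missing values per column
na_counts <- colSums(is.na(mydata))
print(na_counts)

# na.omit() drops every row that contains at least one NA
clean <- na.omit(mydata)
print(clean)
```

Only the first row is complete, so `clean` keeps a single row. Note that na.omit() removes a row even if just one of its columns is missing.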