In this article, we will discuss how to import an Excel file in the R programming language. There are two approaches to importing an Excel file into R, and both are described below.
File in use:
Method 1: Using read_excel()
In this approach, the user calls the read_excel() function from the readxl package, passing the name of the file as the argument. The readxl package can import both .xlsx and .xls files. It is not part of base R, so install it once with install.packages("readxl") if it is not already available.
Syntax: read_excel(path, sheet = NULL, col_types = NULL)
Parameters:
- path: Path to the .xls or .xlsx file to read.
- sheet: Name or position of the sheet to read. Defaults to the first sheet.
- col_types: Column types. NULL (the default) guesses them from the data.
Returns:
A tibble, which is a data frame with some extra classes.
Example:
library(readxl)
gfg_data <- read_excel('Data_gfg.xlsx')
gfg_data
Output:
Method 2: Using inbuilt menu Options of Rstudio
This approach is easier than the previous one because the user does not need to type any code in the console to import the Excel file. The user only needs to work in the Environment window of RStudio.
Environment window of the Rstudio:
Steps to import an Excel file using the Import Dataset option in the Environment window of RStudio:
Step 1: Select the Import Dataset option in the Environment window.
Step 2: Select “From Excel” under the Import Dataset option.
Step 3: Click Browse and select the Excel file to be imported.
Step 4: Click Import. The selected Excel file is now imported into R.
Before importing, the user can adjust the options in the dialog: the name given to the dataset, the sheet to read (for example, the second sheet of a two-sheet workbook), Max Rows (the maximum number of rows to read), Skip (the number of rows to skip at the top), and NA (a value that should be treated as NA wherever it appears in the data).
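The dialog options described above correspond to arguments of read_excel(). The following is a minimal sketch of an equivalent call, assuming the example workbook datasets.xlsx that ships with readxl stands in for your own file. The option values are illustrative and are not what RStudio would generate for your data.

```r
library(readxl)

# Assumption: a workbook bundled with readxl is used in place of your own file
path <- readxl_example("datasets.xlsx")

mtcars_xl <- read_excel(
  path,
  sheet = 2,    # "Sheet" option: read the second sheet of the workbook
  skip = 0,     # "Skip" option: number of rows to skip before reading
  n_max = 10,   # "Max Rows" option: read at most 10 data rows
  na = "NA"     # "NA" option: string that should be treated as missing
)

nrow(mtcars_xl)
```

Writing the call out like this keeps the import reproducible, since the same options can be rerun from a script instead of being clicked through again.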
There is another way to reach the same import dialog in RStudio.
Step 1: Click on File.
Step 2: In the File menu, click Import Dataset, then choose From Excel.
readxl
Overview
The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies, so it’s easy to
install and use on all operating systems. It is designed to work with
tabular data.
readxl supports both the legacy .xls format and the modern xml-based
.xlsx format. The libxls C library is used to support .xls, which
abstracts away many of the complexities of the underlying binary format.
To parse .xlsx, we use the RapidXML C++ library.
Installation
The easiest way to install the latest released version from CRAN is to
install the whole tidyverse.
install.packages("tidyverse")
NOTE: you will still need to load readxl explicitly, because it is not a
core tidyverse package loaded via library(tidyverse).
Alternatively, install just readxl from CRAN:
install.packages("readxl")
Or install the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/readxl")
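In scripts that other people run, it can help to install readxl only when it is missing. A small sketch of that pattern, assuming a CRAN mirror is reachable if an install is needed:

```r
# Install readxl only if it is not already available, then load it
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}
library(readxl)

# Report which version was loaded
packageVersion("readxl")
```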
Cheatsheet
You can see how to read data with readxl in the data import
cheatsheet, which also covers similar functionality in the related
packages readr and googlesheets4.
Usage
readxl includes several example files, which we use throughout the
documentation. Use the helper readxl_example() with no arguments to
list them or call it with an example filename to get the path.
readxl_example()
#>  [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
#>  [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
#>  [9] "type-me.xls"   "type-me.xlsx"
readxl_example("clippy.xls")
#> [1] "/private/tmp/Rtmpjectat/temp_libpath3b7822c649d8/readxl/extdata/clippy.xls"
read_excel() reads both xls and xlsx files and detects the format from
the extension.
xlsx_example <- readxl_example("datasets.xlsx")
read_excel(xlsx_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> # … with 147 more rows
xls_example <- readxl_example("datasets.xls")
read_excel(xls_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> # … with 147 more rows
List the sheet names with excel_sheets().
excel_sheets(xlsx_example)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"
Specify a worksheet by name or number.
read_excel(xlsx_example, sheet = "chickwts")
#> # A tibble: 71 × 2
#>   weight feed
#>    <dbl> <chr>
#> 1    179 horsebean
#> 2    160 horsebean
#> 3    136 horsebean
#> # … with 68 more rows
read_excel(xls_example, sheet = 4)
#> # A tibble: 1,000 × 5
#>     lat  long depth   mag stations
#>   <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 -20.4  182.   562   4.8       41
#> 2 -20.6  181.   650   4.2       15
#> 3 -26    184.    42   5.4       43
#> # … with 997 more rows
There are various ways to control which cells are read. You can even
specify the sheet here, if providing an Excel-style cell range.
read_excel(xlsx_example, n_max = 3)
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = "C1:E4")
#> # A tibble: 3 × 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>
#> 1          1.4         0.2 setosa
#> 2          1.4         0.2 setosa
#> 3          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_rows(1:4))
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#>   Sepal.Width Petal.Length Petal.Width
#>         <dbl>        <dbl>       <dbl>
#> 1         3.5          1.4         0.2
#> 2         3            1.4         0.2
#> 3         3.2          1.3         0.2
#> # … with 147 more rows
read_excel(xlsx_example, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> # … with 1 more row
If NAs are represented by something other than blank cells, set the
na argument.
read_excel(xlsx_example, na = "setosa")
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 <NA>
#> 2          4.9         3            1.4         0.2 <NA>
#> 3          4.7         3.2          1.3         0.2 <NA>
#> # … with 147 more rows
If you are new to the tidyverse conventions for data import, you may
want to consult the data import
chapter in R for Data Science.
readxl will become increasingly consistent with other packages, such as
readr.
Articles
Broad topics are explained in these
articles:
- Cell and Column Types
- Sheet Geometry: how to specify which cells to read
- readxl Workflows: Iterating over multiple tabs or worksheets, stashing a csv snapshot
We also have some focused articles that address specific aggravations
presented by the world’s spreadsheets:
- Column Names
- Multiple Header Rows
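The workflow of iterating over worksheets mentioned above can be sketched with base R alone. This is one possible approach, not necessarily the one the linked article uses; it assumes the bundled datasets.xlsx example file and stores each sheet in a named list.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Read every sheet into a list, one tibble per worksheet
sheet_names <- excel_sheets(path)
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheet_names

# Number of rows read from each sheet
vapply(all_sheets, nrow, integer(1))
```

Naming the list elements after the sheets means a single sheet can later be pulled out with, for example, all_sheets$quakes.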
Features
- No external dependency on, e.g., Java or Perl.
- Re-encodes non-ASCII characters to UTF-8.
- Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.
- Discovers the minimal data rectangle and returns that, by default. User can exert more control with range, skip, and n_max.
- Column names and types are determined from the data in the sheet, by default. User can also supply via col_names and col_types and control name repair via .name_repair.
- Returns a tibble, i.e. a data frame with an additional tbl_df class. Among other things, this provides nicer printing.
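As an illustration of supplying names and types yourself instead of relying on the defaults, here is a hedged sketch using the bundled datasets.xlsx file. The snake_case column names are hypothetical and chosen only for this example.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

iris_xl <- read_excel(
  path,
  sheet = "iris",
  skip = 1,  # skip the header row in the sheet, since names are supplied below
  col_names = c("sepal_length", "sepal_width",
                "petal_length", "petal_width", "species"),
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

# Inspect the resulting column classes
sapply(iris_xl, class)
```

Because col_names is given as a character vector, every remaining row is treated as data, which is why the original header row is skipped explicitly.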
Other relevant packages
Here are some other packages with functionality that is complementary to
readxl and that also avoid a Java dependency.
Writing Excel files: The example files datasets.xlsx and datasets.xls
were created with the help of openxlsx (and Excel).
openxlsx provides “a high level interface to writing, styling and
editing worksheets”.
l <- list(iris = iris, mtcars = mtcars, chickwts = chickwts, quakes = quakes)
openxlsx::write.xlsx(l, file = "inst/extdata/datasets.xlsx")
writexl is a new option in
this space, first released on CRAN in August 2017. It’s a portable and
lightweight way to export a data frame to xlsx, based on
libxlsxwriter. It is much
more minimalistic than openxlsx, but on simple examples, appears to be
about twice as fast and to write smaller files.
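A minimal round trip with writexl and readxl might look like the following sketch. It assumes both packages are installed and writes to a temporary file so that nothing in the working directory is touched.

```r
library(readxl)
library(writexl)

# Write a small built-in data frame to a temporary .xlsx file
out_path <- tempfile(fileext = ".xlsx")
write_xlsx(head(chickwts), path = out_path)

# Read it back with readxl to confirm the file is usable
round_trip <- read_excel(out_path)
round_trip
```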
Non-tabular data and formatting:
tidyxl is focused on
importing awkward and non-tabular data from Excel. It also “exposes cell
content, position and formatting in a tidy structure for further
manipulation”.
Excel is a spreadsheet developed by Microsoft, which allows you to manage data in a very simple way. Until 2007, the XLS was the main file extension. However, in the 2007 release the XLSX (XML-based) extension was introduced to become the default workbook format. In this tutorial you will learn how to read Excel files into R and RStudio with several packages.
How to import Excel files into R?
If you need to read an Excel file in R, you will need to use a specific package. There are several options, but the best packages for reading Excel files are arguably openxlsx and readxl, as they don’t depend on Java (unlike the xlsx and XLConnect packages) or Perl (unlike the gdata package).
Note that external dependencies can cause errors when loading those packages, although for huge datasets the packages that rely on them may be faster than the alternatives.
If you are using RStudio you can go to File → Import Dataset → From Excel.... Then, you can browse your Excel file and customize the output (the name of the variable, the sheet, cells range, …). You can also see a preview of the code that will be executed in the backend and of the data that will be loaded:
Note that, with this approach, you will need to have the readxl package installed.
Read XLSX without JAVA in R: readxl and openxlsx
readxl package
The readxl package is part of the tidyverse, created by Hadley Wickham (chief scientist at RStudio) and his team. This package supports XLS files via the libxls C library and XLSX files via the RapidXML C++ library, without external dependencies.
The package ships with some example Excel files (XLS and XLSX) stored in its installation folder, so to keep the following examples reproducible we are going to use the clippy.xlsx file, whose first sheet is as follows:
To get the path of the sample Excel file you can use the readxl_example function. Once you have it, or the path of your own Excel file, you can use the excel_sheets function to check the sheet names, if needed.
# install.packages("readxl")
library(readxl)
# Get the path of a sample XLSX dataset of the package
file_path <- readxl_example("clippy.xlsx")
# Check the Sheet names of the Excel file
excel_sheets(file_path) # "list-column" "two-row-header"
The generic function of the package to read Excel files into R is the read_excel function, which guesses the file type (XLS or XLSX) depending on the file extension and the file itself.
read_excel(file_path)
# A tibble: 4 x 2
name value
<chr> <chr>
1 Name Clippy
2 Species paperclip
3 Approx date of death 39083
4 Weight in grams 0.9
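The value 39083 in the output above is how Excel stores a date internally: a serial day count. Because this column mixes text and numbers, it is imported as character and the date is not converted automatically. A small sketch of converting it by hand, assuming the workbook uses the Windows (1900) date system, for which 1899-12-30 is the origin commonly used in R:

```r
# The serial number arrives as text in the mixed-type column
serial <- as.numeric("39083")

# Convert the Excel serial day count to an R Date (1900 date system assumed)
as.Date(serial, origin = "1899-12-30")
#> [1] "2007-01-01"
```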
The sheet argument allows you to specify the sheet you want to load, passing its name or the corresponding number of the tab. Note that, by default, the function loads the first Excel sheet.
# Selecting the other sheet of the Excel file
read_excel(file_path, sheet = "two-row-header")
read_excel(file_path, sheet = 2) # Equivalent
# A tibble: 2 x 4
name species death weight
<chr> <chr> <chr> <chr>
1 (at birth) (office supply type) (date is approximate) (in grams)
2 Clippy paperclip 39083 0.9
You can also skip rows with the skip argument of the function:
# Skip first row
read_excel(file_path, skip = 1)
# A tibble: 3 x 2
Name Clippy
<chr> <chr>
1 Species paperclip
2 Approx date of death 39083
3 Weight in grams 0.9
Note that you could also specify a range of cells to be selected with the range argument. In this case, the skip argument won’t be taken into account if you specify it.
read_excel(file_path, range = "B1:B5")
# A tibble: 4 x 1
value
<chr>
1 Clippy
2 paperclip
3 39083
4 0.9
In addition, if you want to avoid reading the column names, you can set the col_names argument to FALSE:
read_excel(file_path, col_names = FALSE)
New names:
* `` -> ...1
* `` -> ...2
...1 ...2
1 name value
2 Name Clippy
3 Species paperclip
4 Approx date of death 39083
5 Weight in grams 0.9
However, you may have noticed that the output is of class tibble (a modern type of data frame). If you want the output to be of class data.frame you will need to use the as.data.frame function as follows:
data <- read_excel(file_path, skip = 1)
as.data.frame(data)
Name Clippy
1 Species paperclip
2 Approx date of death 39083
3 Weight in grams 0.9
Recall that the read_excel function guesses the file format. Nonetheless, if you know the format of the file you are going to read, you can use the corresponding function below to avoid guessing:
# If you know the extension of your Excel file
# use one of these functions instead
# For XLS files
read_xls(readxl_example("clippy.xls"))
# For XLSX files
read_xlsx(file_path)
openxlsx package
The openxlsx package uses Rcpp and, as it doesn’t depend on Java, it is an interesting alternative to the readxl package for reading an Excel file in R. The differences with respect to the previous package are that the output is of class data.frame by default instead of tibble, and that its main use is not just importing Excel files, as it also provides a wide variety of functions to write, style and edit Excel files.
The function to read XLSX files is named read.xlsx:
# install.packages("openxlsx")
library(openxlsx)
read.xlsx(file_path)
name value
1 Name Clippy
2 Species paperclip
3 Approx date of death 39083
4 Weight in grams 0.9
As with the function of the previous package, there are several arguments you can customize, such as sheet, startRow or colNames. If you want to select specific cells you can use the rows and cols arguments. Type ?read.xlsx or help(read.xlsx) for additional information.
read.xlsx(file_path, cols = 1:2, rows = 2:3)
Name Clippy
1 Species paperclip
The xlsx package
Although this package requires Java to be installed on your computer, it is very popular. The main functions to import Excel files are read.xlsx and read.xlsx2. The second has slightly different default arguments and does more work in Java, achieving better performance.
# install.packages("xlsx")
library(xlsx)
# A sheet must be given, either by index or by name
read.xlsx(file_path, sheetIndex = 1)
read.xlsx2(file_path, sheetIndex = 1)
You can customize several arguments, such as sheetIndex, sheetName, header, rowIndex and colIndex, among others. Run ?read.xlsx or help(read.xlsx) for additional details.
XLConnect package
An alternative to the xlsx package is XLConnect, which allows writing, reading and formatting Excel files. To load an Excel file into R you can use the readWorksheetFromFile function as follows. We recommend typing ??XLConnect to look for additional information on the arguments of each function of the package.
# install.packages("XLConnect")
library(XLConnect)
data <- readWorksheetFromFile(file_path, sheet = "list-column",
startRow = 1, endRow = 5,
startCol = 1, endCol = 2)
In case you want to load multiple sheets, it is recommended to use the loadWorkbook function and then load each sheet with the readWorksheet function:
load <- loadWorkbook(file_path)
data <- readWorksheet(load, sheet = "list-column",
startRow = 1, endRow = 5,
startCol = 1, endCol = 2)
data2 <- readWorksheet(load, sheet = "two-row-header",
startRow = 1, endRow = 3,
startCol = 1, endCol = 4)
Moreover, this package provides a function to load Excel named regions. Analogous to the previous example, you can import just a region with the readNamedRegionFromFile function, specifying the file name (if the file is in your working directory) or the file path and the region name.
data <- readNamedRegionFromFile(file, # File path
name, # Region name
...) # Arguments of readNamedRegion()
If you want to load multiple named regions you can load the workbook with the loadWorkbook function and then import each region with the readNamedRegion function.
load <- loadWorkbook(file_path)
data <- readNamedRegion(load, name_Region_1, ...)
data2 <- readNamedRegion(load, name_Region_2, ...)
It is worth mentioning that if you are experiencing issues with the packages that require Java, you can get and set the Java path in R with the following code:
# Prints the path of JAVA Home in R
Sys.getenv("JAVA_HOME")
# Sets the path of JAVA
Sys.setenv(JAVA_HOME = "path_to_jre_java_folder")
Note that you will need to specify the path to the jre folder inside the Java folder of your computer, which you should find inside Program Files.
Convert XLSX files to CSV in R
Finally, you could also convert your Excel files to CSV format and read the CSV file in R. For this purpose, you can use the convert function of the rio package. An alternative would be to save the Excel file directly as CSV from the menu of Microsoft Excel.
# install.packages("rio")
library(rio)
convert(file_path, "file.csv")
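If you would rather not add rio as a dependency, the same conversion can be approximated with readxl and base R. This is a sketch of that alternative, not what convert does internally; it assumes the bundled datasets.xlsx example file and writes to a temporary CSV path.

```r
library(readxl)

xlsx_path <- readxl_example("datasets.xlsx")
csv_path <- tempfile(fileext = ".csv")

# Read one sheet and write it out as CSV without row names
chickwts_xl <- read_excel(xlsx_path, sheet = "chickwts")
write.csv(chickwts_xl, csv_path, row.names = FALSE)

# Read the CSV back to check the conversion
chickwts_csv <- read.csv(csv_path)
head(chickwts_csv)
```

Note that a CSV file holds a single table, so each sheet of a workbook needs its own file.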
Looking to import an Excel file into R?
If so, you’ll see the full steps to import your file using the readxl package.
To start, here is a template that you can use to import an Excel file into R:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")
And if you want to import a specific sheet within the Excel file, then you may use this template:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx", sheet = "Your sheet name")
Note: For previous versions of Excel, use the file extension of .xls
Step 1: Install the readxl package
In the R Console, type the following command to install the readxl package:
install.packages("readxl")
Follow the instructions to complete the installation. You may want to check the following guide that explains how to install a package in R.
Step 2: Prepare your Excel File
Let’s suppose that you have an Excel file with some data about products:
Product | Price |
Refrigerator | 1200 |
Oven | 750 |
Dishwasher | 900 |
Coffee Maker | 300 |
And let’s say that the Excel file name is product_list, and your goal is to import that file into R.
Step 3: Import the Excel file into R
In order to import your file, you’ll need to apply the following template in the R Editor:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")
For demonstration purposes, let’s assume that an Excel file is stored under the following path:
C:\Users\Ron\Desktop\Test\product_list.xlsx
Where:
- product_list is the actual file name; and
- .xlsx is the Excel file extension. For previous versions of Excel, use the file extension of .xls
Note that a double backslash (‘\\’) must be used within the path name in R code. By using a double backslash, you’ll avoid the following error in R:
Error: '\U' used without hex digits in character string starting ""C:\U"
Here is the complete code to import the Excel file for our example:
library("readxl")
read_excel("C:\\Users\\Ron\\Desktop\\Test\\product_list.xlsx")
You’ll need to adjust the path to reflect the location where the Excel file is stored on your computer. Once you run the code in R, you’ll get the same values as in the Excel file:
Product Price
1 Refrigerator 1200
2 Oven 750
3 Dishwasher 900
4 Coffee Maker 300
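The Windows path from the example can be written in several equivalent ways in R. The sketch below only builds the path strings and does not read any file. The raw-string form assumes R 4.0.0 or later.

```r
# Escaped backslashes
double_backslash <- "C:\\Users\\Ron\\Desktop\\Test\\product_list.xlsx"

# Raw string (R >= 4.0.0): backslashes need no escaping
raw_string <- r"(C:\Users\Ron\Desktop\Test\product_list.xlsx)"

# Forward slashes, which R also accepts on Windows
forward_slash <- "C:/Users/Ron/Desktop/Test/product_list.xlsx"

# file.path() joins the components with forward slashes
built <- file.path("C:", "Users", "Ron", "Desktop", "Test", "product_list.xlsx")

identical(double_backslash, raw_string)
identical(forward_slash, built)
```

Any of these strings can be passed as the path argument of read_excel().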
Alternatively, you may also want to check the following guide that explains how to export your data to an Excel file.
The easiest way to import an Excel file into R is by using the read_excel() function from the readxl package.
This function uses the following syntax:
read_excel(path, sheet = NULL)
where:
- path: Path to the xls/xlsx file
- sheet: The sheet to read. This can be the name of the sheet or the position of the sheet. If this is not specified, the first sheet is read.
This tutorial provides an example of how to use this function to import an Excel file into R.
Example: Import an Excel File into R
Suppose I have an Excel file saved in the following location:
C:\Users\Bob\Desktop\data.xlsx
The file contains the following data:
The following code shows how to import this Excel file into R:
#install and load readxl package
install.packages('readxl')
library(readxl)
#import Excel file into R
data <- read_excel('C:\\Users\\Bob\\Desktop\\data.xlsx')
Note that we used double backslashes (\\) in the file path to avoid the following common error:
Error: '\U' used without hex digits in character string starting ""C:\U"
We can use the following code to quickly view the data:
#view entire dataset
data
#A tibble: 5 x 3
team points assists
<chr> <dbl> <dbl>
1 A 78 12
2 B 85 20
3 C 93 23
4 D 90 8
5 E 91 14
We can see that R imported the Excel file and automatically determined that team was a string variable while points and assists were numerical variables.
Additional Resources
The following tutorials explain how to import other file types into R:
How to Import CSV Files into R
How to Import SAS Files into R
How to Manually Enter Raw Data in R