Contents
- Reading and Writing Excel Files With R Using readxl and writexl
- read_excel() method in readxl Package:
- write_xlsx() method in writexl package:
- Reading Data From Excel Files (xls|xlsx) into R
- Preliminary tasks
- Copying data from Excel and import into R
- On Windows system
- On Mac OSX system
- Importing Excel files into R using readxl package
- Installing and loading readxl package
- Using readxl package
- Importing Excel files using xlsx package
- Installing and loading xlsx package
- Using xlsx package
- Read more
- Summary
- Related articles
- Infos
- Reading Data From Excel Files (xls,xlsx,csv) into R-Quick Guide
- Reading Data From Excel Files into R
- 1. readxl package
- 2. xlsx Package
- 3. openxlsx Package
- 4. XLConnect package
- R Tutorial: Importing Data from Excel
- Importing Data from Excel
Reading and Writing Excel Files With R Using readxl and writexl
This article discusses reading and writing Excel files with the readxl and writexl packages of the R programming language.
read_excel() method in readxl Package:
The readxl package reads data from Excel files, that is, files in .xls and .xlsx format. Its read_excel() function takes the path of the Excel file and returns its contents. Load the readxl library before calling read_excel().
write_xlsx() method in writexl package:
The writexl package provides write_xlsx(), which writes a data frame to an Excel sheet in .xlsx format (it does not write the older .xls format). The function accepts a data frame and the name of the Excel file to create. Load the writexl library before calling write_xlsx().
write_xlsx(dataframeName, "excelFile", col_names = TRUE)
Parameters
- dataframeName – Name of the data frame that contains the data.
- excelFile – Name of the Excel file to which the data frame is written.
- col_names – Write column names at the top of the file if set to TRUE.
Syntax to install and import the writexl package:
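A minimal sketch of installing and loading writexl, then writing a data frame and reading it back with readxl. The data frame `scores` and the temporary file path are made up for illustration, and the install line is commented out on the assumption that the packages may already be present.

```r
# install.packages(c("readxl", "writexl"))  # run once if the packages are missing
library(writexl)
library(readxl)

# Hypothetical example data
scores <- data.frame(name = c("Ann", "Bob"), score = c(91, 78))

# Write to a temporary .xlsx file; col_names = TRUE writes the header row
xlsx_path <- tempfile(fileext = ".xlsx")
write_xlsx(scores, xlsx_path, col_names = TRUE)

# Read the file back; read_excel() returns a tibble
scores_back <- read_excel(xlsx_path)
print(scores_back)
```

In real use, replace the temporary path with the file name you want, for example a path inside your working directory.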
Reading Data From Excel Files (xls,xlsx,csv) into R-Quick Guide
Posted on June 14, 2021 by finnstats in R bloggers
Many people still store their datasets in Excel and then run into difficulties loading them into R for analysis. R has several functions that make this easier.
In this tutorial we describe how to read Excel data in xls or xlsx format into R. This can be done using the readxl, xlsx, openxlsx, or XLConnect package.
Reading Data From Excel Files into R
1. readxl package
If the readxl package is not installed, install it with install.packages("readxl").
Load the package with library(readxl).
read_excel() reads both xls and xlsx files; pass it the file path.
You can also choose a file interactively with read_excel(file.choose()). This is time-consuming, so it is not recommended.
If the workbook has multiple sheets, use the sheet argument.
You can specify the sheet by its name,
or by its index.
Excel sheets sometimes contain missing values. Blank cells are read into R as NA; if missing values are coded in some other way, set the na argument so that they are also read as NA.
To read multiple Excel files, list the files and read each one in turn.
To include the files in subdirectories as well, list the files recursively.
If all the sheets have the same column names, you can combine them with bind_rows().
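The three steps above might be sketched as follows. This is an illustration under stated assumptions, not the original author's code: the folder and file names are hypothetical, the sketch creates two small workbooks in a temporary folder so that it can run on its own, and writexl is used only to create those files.

```r
library(readxl)
library(writexl)  # used here only to create the example files
library(dplyr)

# Create a hypothetical folder with one workbook at the top level
# and one in a subdirectory
data_dir <- file.path(tempdir(), "excel_demo")
dir.create(file.path(data_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
write_xlsx(data.frame(id = 1:2, value = c(10, 20)),
           file.path(data_dir, "a.xlsx"))
write_xlsx(data.frame(id = 3:4, value = c(30, 40)),
           file.path(data_dir, "sub", "b.xlsx"))

# List the Excel files; recursive = TRUE also searches subdirectories
files <- list.files(data_dir, pattern = "\\.xlsx$", full.names = TRUE,
                    recursive = TRUE)

# Read every file into a list of data frames
sheets <- lapply(files, read_excel)

# Stack them; this works because all files share the same column names
combined <- bind_rows(sheets)
print(combined)
```

Drop recursive = TRUE if only the files directly inside the folder should be read.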
2. xlsx Package
Another package is xlsx, a Java-based solution for reading, writing and formatting Excel files in R.
If it is not installed, install it with install.packages("xlsx").
Load the package with library(xlsx).
How to use the xlsx package
The xlsx package has two main functions for reading: read.xlsx() and read.xlsx2().
For bigger files, read.xlsx2() is recommended because it loads faster than read.xlsx().
Both functions take the form read.xlsx(file, sheetIndex, header = TRUE), where:
- file is the file path
- sheetIndex is the index of the sheet to be read
- header is a logical value. If header is TRUE, the first row is used as column names.
Another way of importing data is to copy it from Excel and read it from the clipboard.
On a Windows system, copy the cells and then call read.table(file = "clipboard", sep = "\t", header = TRUE).
This is not the best way of importing data into R.
3. openxlsx Package
The openxlsx package is another alternative to the readxl package.
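A minimal sketch of reading a workbook with openxlsx. The example data and the temporary file are assumptions made so the snippet runs on its own; openxlsx works with .xlsx files, not the older .xls format.

```r
library(openxlsx)

# Hypothetical example data written to a temporary workbook
xlsx_path <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(id = 1:3, label = c("a", "b", "c")), xlsx_path)

# read.xlsx() from openxlsx reads one sheet, chosen by index or by name
my_data <- read.xlsx(xlsx_path, sheet = 1)
print(my_data)
```

The xlsx package also exports a function named read.xlsx(), so if both packages are loaded it is safer to write openxlsx::read.xlsx() explicitly.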
4. XLConnect package
XLConnect is an alternative to the xlsx package.
To read several sheets, load the workbook with loadWorkbook()
and pass the sheet names to readWorksheet().
With this package you can also import a named region.
Several named regions can be read in the same way.
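A sketch of reading several sheets and a named region with XLConnect. The sheet names, the region name `region1` and the data are hypothetical; the workbook is built inside the snippet so that it is self-contained, and a working Java installation is assumed.

```r
library(XLConnect)  # requires a working Java installation

# Build a hypothetical workbook with two sheets and one named region
xlsx_path <- tempfile(fileext = ".xlsx")
wb <- loadWorkbook(xlsx_path, create = TRUE)
createSheet(wb, name = "first")
createSheet(wb, name = "second")
writeWorksheet(wb, data.frame(x = 1:3), sheet = "first")
writeWorksheet(wb, data.frame(y = 4:6), sheet = "second")
createName(wb, name = "region1", formula = "first!$A$1:$A$4")
saveWorkbook(wb)

# Read several sheets at once; the result is a named list of data frames
wb <- loadWorkbook(xlsx_path)
sheets <- readWorksheet(wb, sheet = c("first", "second"))

# Read a named region (the first row of the region is used as the header)
region <- readNamedRegion(wb, name = "region1")
print(region)
```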
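If you have a csv file, read it with read.csv().
Java errors can sometimes occur when reading Excel files with the Java-based packages. You can avoid them by setting the Java path in R.
Sys.getenv("JAVA_HOME") prints the path of the Java home as seen by R.
Sys.setenv() sets JAVA_HOME to the path of the Java installation.
The jre folder is located inside the Java folder on your computer (under Program Files on Windows).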
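The two calls might look like this. The path below is only a placeholder for illustration; replace it with the location of the jre folder on your own machine.

```r
# Print the current value of JAVA_HOME as seen by R ("" if it is unset)
print(Sys.getenv("JAVA_HOME"))

# Point JAVA_HOME at the jre folder; this path is a hypothetical placeholder
java_path <- "C:/Program Files/Java/jre"
Sys.setenv(JAVA_HOME = java_path)

# Load the Java-based package only after the variable has been set
# library(xlsx)
```

The setting lasts only for the current R session, so it has to be repeated, or placed in a startup file, for later sessions.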
R Tutorial: Importing Data from Excel
Importing Data from Excel
Excel is a spreadsheet application that many institutions use to store data. This tutorial gives a brief overview of reading, writing and manipulating the data in Excel files using R. We will learn about various R packages and extensions to read and import Excel files. At the end of this section, we cover some common problems encountered while loading Excel files and spreadsheet data.
Before we import the data from an Excel spreadsheet into R, there are some standard practices for tidying your data that avoid unnecessary errors.
- The first column of the spreadsheet is used to identify the sample dataset, therefore it should be a unique key id. Similarly the first row is reserved for header, describing the scheme of the data.
- Concatenating words in the cells should be done using ‘.’. For example, ‘Sample.data’.
- The names and header of the data scheme should usually avoid symbols.
- All missing data points in the Excel spreadsheet should be indicated with ‘NA’.
Before you import the Excel data into R, set R's working directory to the folder that contains the file.
Before we look into the packages available to extract data from an Excel spreadsheet, we will show you simple R commands that can do the job. The utils package is one of the core packages; it contains a bunch of basic utility functions, and the following commands are part of it.
The first argument of read.table() is the name of the text file in double quotes. If the data file has a header row describing the data schema, set the second argument, header, to TRUE. This function works for files saved in .txt format.
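A small sketch of the read.table() call described above. The file contents are invented for illustration, and the file is written to a temporary location so the example runs on its own; in practice the file would be a .txt export from Excel in your working directory.

```r
# Create a small tab-delimited text file to stand in for exported Excel data
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t3.5", "2\t4.0"), txt_path)

# First argument: the file name; header = TRUE because the top row
# holds the column names
my_data <- read.table(txt_path, header = TRUE, sep = "\t")
print(my_data)
```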
Reading data from an Excel file is easy and can be done using several packages. You can export the Excel file to a comma-delimited file and import it using the method shown in the tutorial Importing Data from Flat Files in R. Another method is to use the xlsx package, which is the one used here to access Excel files. The first row should contain variable names.
When using the read.xlsx() function, you must specify either the sheet index or the sheet name. For bigger datasets, use the read.xlsx2() function instead.
Additionally, the user can specify the end row, or limit the import to certain row and column indices. The xlsx package does more than import data from Excel files: it can also manipulate the data and write data frames to spreadsheets. Data frames are written to an Excel workbook with the function write.xlsx().
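As an illustration of the points above, the sketch below writes a data frame with write.xlsx() and then reads back only part of it. The data frame and the temporary file are hypothetical, and the xlsx package needs Java to be installed.

```r
library(xlsx)  # requires Java

# Hypothetical data frame written to a temporary workbook
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.9, 3.3))
xlsx_path <- tempfile(fileext = ".xlsx")
write.xlsx(df, xlsx_path, sheetName = "Sheet1", row.names = FALSE)

# Read the header plus the first three data rows (rows 1 to 4 of the sheet)
# and only the first two columns
part <- read.xlsx(xlsx_path, sheetIndex = 1, endRow = 4, colIndex = 1:2)
print(part)
```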
Apart from the xlsx package, there is the gdata package, which has functions that read data in the Excel format. gdata provides a cross-platform solution for importing data from Excel files into R. The read.xls function in particular reads data from an Excel spreadsheet and returns a data frame. Take for example a sample Excel spreadsheet named ‘Sample_Sheet.xls’; to use this method, you need a Perl runtime on your system.
This function converts the Sample_Sheet.xls file into a temporary .csv or tab-delimited file using Perl. When read.xls runs, R locates the Excel file and looks for Perl on the system path. If it does not find perl.exe, R returns an error. To avoid this error, pass the location of the Perl executable through the function's perl argument.
gdata has several other functions that convert an Excel file into other formats, such as xls2csv(), xls2tab() and xls2sep().
The input arguments for these functions are the same as for the read.xls() function.
Another package that can import data from Excel is XLConnect. Its loadWorkbook function reads the entire workbook, and the readWorksheet function then loads the worksheets into R. Java must be installed for this package to work. The package also provides functions to create Excel workbooks and export data to them.
Other arguments can be added after the sheet argument, such as startRow, startCol, endRow or endCol, to limit the cells that are imported from the Excel workbook. Another argument, ‘region’, can also be used in this function to give the range of starting and ending rows and columns.
Reading Data From Excel Files (xls|xlsx) into R
Previously, we described the essentials of R programming and some best practices for preparing your data. We also provided quick start guides for reading and writing txt and csv files using R base functions as well as the more modern readr package, which is about ten times faster than the R base functions.
In this article, you’ll learn how to read data from Excel xls or xlsx file formats into R. This can be done either by:
- copying data from Excel
- using readxl package
- or using xlsx package
Preliminary tasks
- Launch RStudio as described here: Running RStudio and setting up your working directory
- Prepare your data as described here: Best practices for preparing your data
Copying data from Excel and importing it into R
On Windows system
- Open the Excel file containing your data: select and copy the data (ctrl + c)
- Type the R code below to import the copied data from the clipboard into R and store the data in a data frame (my_data):
my_data <- read.table(file = "clipboard",
                      sep = "\t", header = TRUE)
On Mac OSX system
- Select and copy the data (Cmd + c)
- Use the function pipe("pbpaste") to import the data you’ve copied (with Cmd + c):
my_data <- read.table(pipe("pbpaste"), sep = "\t", header = TRUE)
Importing Excel files into R using readxl package
The readxl package, developed by Hadley Wickham, can be used to easily import Excel files (xls|xlsx) into R without any external dependencies.
Installing and loading readxl package
- Install
install.packages("readxl")
- Load
library("readxl")
Using readxl package
The readxl package comes with the function read_excel() to read xls and xlsx files
- Read both xls and xlsx files
# Loading
library("readxl")
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")
The above R code assumes that the files “my_file.xls” and “my_file.xlsx” are in your current working directory. To find your current working directory, call the function getwd() in the R console.
- It’s also possible to choose a file interactively using the function file.choose(), which I recommend if you’re a beginner in R programming:
my_data <- read_excel(file.choose())
If you use the R code above in RStudio, you will be asked to choose a file.
- Specify sheet with a number or name
# Specify sheet by its name
my_data <- read_excel("my_file.xlsx", sheet = "data")
# Specify sheet by its index
my_data <- read_excel("my_file.xlsx", sheet = 2)
- Case of missing values: NA (not available). If NAs are represented by something other than blank cells (example: “---”), set the na argument:
my_data <- read_excel("my_file.xlsx", na = "---")
Importing Excel files using xlsx package
The xlsx package, a java-based solution, is one of the powerful R packages to read, write and format Excel files.
Installing and loading xlsx package
- Install
install.packages("xlsx")
- Load
library("xlsx")
Using xlsx package
There are two main functions in xlsx package for reading both xls and xlsx Excel files: read.xlsx() and read.xlsx2() [faster on big files compared to read.xlsx function].
The simplified formats are:
read.xlsx(file, sheetIndex, header=TRUE)
read.xlsx2(file, sheetIndex, header=TRUE)
- file: file path
- sheetIndex: the index of the sheet to be read
- header: a logical value. If TRUE, the first row is used as column names.
Example of usage:
library("xlsx")
my_data <- read.xlsx(file.choose(), 1) # read first sheet
Summary
- Read Excel files using readxl package: read_excel(file.choose(), sheet = 1)
- Read Excel files using xlsx package: read.xlsx(file.choose(), sheetIndex = 1)
Infos
This analysis has been performed using R (ver. 3.2.3).
In this article, we will be discussing two different techniques to read or import an excel file in R.
Approach
- Import module
- Pass path of the file to required function
- Read file
- Display content
Method 1: Using read_excel() from readxl
The read_excel() function imports (reads) an Excel file. It is available only after the readxl library has been loaded in R.
Syntax:
read_excel(path)
Example:
library(readxl)
Data_gfg <- read_excel("Data_gfg.xlsx")
Data_gfg
Method 2: Using read.xlsx() from xlsx
The read.xlsx() function comes from the xlsx library and is used to read (import) an Excel file in R. It needs either a sheet index or a sheet name.
Syntax:
read.xlsx(path, sheetIndex)
Example:
install.packages("xlsx")
library(xlsx)
# read.xlsx() needs a sheet index or a sheet name
Data_gfg <- read.xlsx("Data_gfg.xlsx", sheetIndex = 1)
Data_gfg
In this tutorial, we will learn how to work with Excel files in R statistical programming environment. It will provide an overview of how to use R to load xlsx files and write spreadsheets to Excel.
In the first section, we will go through, with examples, how to use R to read an Excel file. More specifically, we are going to learn how to:
- read specific columns from a spreadsheet,
- import multiple spreadsheets and combine them to one dataframe,
- read many Excel files,
- import Excel datasets using RStudio
Furthermore, in the last part we are going to focus on how to export dataframes to Excel files. More specifically, we are going to learn how to:
- write Excel files and rename the sheet,
- write to multiple sheets,
- write multiple dataframes to an Excel file
How to Install R Packages
Now, before we continue with this Excel in R tutorial we are going to learn how to install the needed packages. In this post, we are going to use tidyverse's readxl and the xlsx package to read xlsx files to dataframes.
Note, we are mainly using xlsx, in this post, because readxl cannot write Excel files, only import them into R.
# Install tidyverse
install.packages("tidyverse")
# or just readxl
install.packages("readxl")
# how to install xlsx
install.packages("xlsx")
Now, Tidyverse comes with a lot of useful packages. For example, using the package dplyr (part of Tidyverse) you can remove duplicates in R, and rename a column in R’s dataframe.
How to install RStudio
In the final example, we are going to read xlsx files in R using the integrated development environment RStudio. Now, RStudio is quite easy to install. In this post, we will cover two methods for installing RStudio.
Here are two steps for installing RStudio:
- Download RStudio here
- Click on the installation file and follow the instructions
Now, there’s another option to get both R statistical programming environment and the great general-purpose language of Python. That is, to install the Anaconda Python distribution.
Note, RStudio is a great Integrated Development Environment for carrying out data visualization and analysis using R. RStudio is mainly for R but we can also use other programming languages ( e.g., Python). That is, we typically don’t use RStudio for importing xlsx files only.
How to Read Excel Files to R Dataframes
Can R read xlsx files? In this section, we are going to find out that the answer is, of course, “yes”. We are going to learn how to load Excel files using Tidyverse (e.g., readxl).
More specifically, in this section, we are going to learn how to read Excel files and spreadsheets to dataframes in R. In the read Excel examples we will read xlsx files from both the hard drive and URLs.
How to Import an Excel file in R using read_excel
First, we are going to load the r-package(s) we need. How do I load a package in R? It can be done either by using the library or require functions. In the next code chunk, we are going to load readxl so we can use the read_excel function to read Excel files into R dataframes.
require(readxl)
If we look at the documentation for the function, read_excel, that we are going to use in this tutorial we can see that it takes a range of arguments.
Now it’s time to learn how to use read_excel to read in data from an Excel file. The easiest way to use this method is to pass the file name as a character. If we don’t pass any other parameters, such as sheet name, it will read the first sheet in the index. In the first example we are not going to use any parameters:
df <- read_excel("example_sheets2.xlsx")
head(df)
Here, the read_excel function reads the data from the Excel file into a tibble object. We can, if we want to, change this tibble to a dataframe.
df <- as.data.frame(df)
Now, after importing the data from the Excel file you can carry on with data manipulation if needed. It is, for instance, possible to remove a column, by name and index, with the R-package dplyr. Furthermore, if you installed tidyverse you will have a lot of tools that enable you to do descriptive statistics in R, and create scatter plots with ggplot2.
Importing an Excel File to R in Two Easy Steps:
Here’s a quick answer to the question “how do I import Excel data into R?” Importing an Excel file into an R dataframe only requires two steps, given that we know the path, or URL, to the Excel file:
- Load the readxl package: first, type library(readxl) in e.g. your R script
- Import the XLSX file: second, use the read_excel function to load the .xlsx (or .xls) file
We now know how to easily load an Excel file in R and can continue with learning more about the read_excel function.
Reading Specific Columns using read_excel
In this section, we are going to learn how to read specific columns from an Excel file using R. Note, here we will also use the read.xlsx function from the package xlsx.
Loading Specific Columns using read_excel in R
Reading only some columns from an Excel sheet may be good if we, for instance, have large xlsx files and we don’t want to read all columns in the Excel file. When using readxl and the read_excel function we will use the range parameter together with cell_cols.
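A minimal sketch of the range parameter used together with cell_cols. The workbook and its column names are invented for illustration and written with writexl so that the snippet can run on its own.

```r
library(readxl)
library(writexl)  # used here only to create the example file

# Hypothetical workbook with four columns
xlsx_path <- tempfile(fileext = ".xlsx")
write_xlsx(data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12), xlsx_path)

# Read only columns B to C; cell_cols() also accepts numbers, e.g. cell_cols(2:3)
df_cols <- read_excel(xlsx_path, range = cell_cols("B:C"))
print(df_cols)
```

cell_cols() selects a contiguous block of columns; to pick columns that are not next to each other, one option is to read the enclosing block and drop the unwanted columns afterwards.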
When using read.xlsx to import Excel files in R, we can use the parameter colIndex to select specific columns from the sheet. For example, to create a dataframe with only three of the columns, we put their column indices (here 1, 2, and 3) in a vector:
require(xlsx)
cols <- c(1, 2, 3)
df <- read.xlsx('MLBPlayerSalaries.xlsx',
sheetName='MLBPlayerSalaries', colIndex=cols)
head(df)
Handling Missing Data when we Import Excel File(s) in R
If someone has coded the data and used some kind of value to represent missing values in our dataset, we need to tell R, and the read_excel function, what these values are. In the next example, we are going to use the na parameter of the read_excel function. Here “-99” is what is coded as missing values.
Read Excel Example with Missing Data
In the example below, we are using the parameter na and we are putting in a character (i.e., “-99”):
df <- read_excel('SimData/example_sheets2.xlsx', 'Session2',
na = '-99')
head(df, 6)
The example datasets we’ve used in the how to use R to read Excel files tutorial can be found here and here.
How to Skip Rows when Importing an xlsx File in R
In this section, we will learn how to skip rows when loading an Excel file into R. Here’s a link to the example xlsx file.
In the following examples of reading xlsx files in R, we are going to use both read_excel and read.xlsx to read a specific sheet. Furthermore, we are also going to skip the first 2 rows in the Excel file.
Skip Rows using read_excel
Here, we will use the parameter sheet and put the characters ‘Session1’ to read the sheet named ‘Session1’. In a previous example, we just added the character ‘Session2’ to read that sheet.
Note, the first sheet will be read if we don’t use the sheet parameter. In this example, the important part is the parameter skip = 2. We use this to skip the first two rows:
df <- read_excel('SimData/example_sheets.xlsx',
sheet='Session1', skip = 2)
head(df, 4)
How to Skip Rows when Reading Excel Files in R using read.xlsx
When working with read.xlsx we use the startRow parameter to skip the first 2 rows in the Excel sheet.
df <- read.xlsx('SimData/example_sheets.xlsx',
sheetName='Session1', startRow=3)
Reading Multiple Excel Sheets in R
In this section of the R read excel tutorial, we are going to learn how to read multiple sheets into R dataframes.
There are two sheets, ‘Session1’ and ‘Session2’, in the example xlsx file (example_sheets2.xlsx). In this file, the sheets hold data from two experimental sessions.
We are now learning how to read multiple sheets using readxl. More specifically, we are going to read the sheets ‘Session1’ and ‘Session2’. First, we are going to use the function excel_sheets to print the sheet names:
xlsx_data <- "SimData/example_sheets.xlsx"
excel_sheets(path = xlsx_data)
Now if we want to read all the existing sheets in an Excel document we create a variable, called sheet_names.
After we have created this variable we use the lapply function and loop through the list of sheets, use the read_excel function, and end up with the list of dataframes (excel_sheets):
sheet_names <- excel_sheets(path = xlsx_data)
excel_sheets <- lapply(sheet_names , function(x) read_excel(path = xlsx_data, sheet = x))
str(excel_sheets)
After reading the sheets, we may want to join the data from all of them (in this case, sessions). Binding the dataframes together is quite easy. We just use the do.call function with rbind on the list of dataframes:
df <- do.call("rbind", excel_sheets)
head(df)
Again, there might be other tasks that we need to carry out. For instance, we can also create dummy variables in R.
Reading Many Excel Files in R
In this section of the R read excel tutorial, we will learn how to load many files into an R dataframe.
For example, in some cases, we may have a bunch of Excel files containing data from different experiments or experimental sessions. In the next example, we are going to work with read_excel, again, together with the lapply function.
However, this time we just have a character vector with the file names and then we also use the paste0 function to paste the subfolder where the files are.
xlsx_files <- c("example_concat.xlsx",
"example_concat1.xlsx",
"example_concat3.xlsx")
dataframes <- lapply(xlsx_files, function(x)
read_excel(path = paste0("simData/", x)))
Finally, we use the do.call function, again, to bind the dataframes together into one.
df <- do.call("rbind", dataframes)
tail(df)
Note, if we want, we can also use the bind_rows function from the R package dplyr (part of tidyverse).
dplyr::bind_rows(dataframes)
Reading all Files in a Directory in R
In this section, we are going to learn how to read all xlsx files in a directory. Knowing this may come in handy if we store every xlsx file in a folder and don’t want to create a character vector, like above, by hand. In the next example, we are going to use R’s Sys.glob function to get a character vector of all Excel files.
xlsx_files <- Sys.glob('./simData/*.xlsx')
After we have a character vector with all the file names that we want to import to R, we just use lapply and do.call (see previous code chunks).
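Put together, the whole workflow can be sketched as follows. This is an illustration under stated assumptions: the temporary folder, the file names part1.xlsx to part3.xlsx, and the data are hypothetical, and writexl is used only to create files to read.

```r
library(readxl)
library(writexl)

# Create a temporary folder with three small xlsx files (hypothetical data).
data_dir <- file.path(tempdir(), "xlsx_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write_xlsx(data.frame(file_id = i, value = c(1, 2)),
             file.path(data_dir, paste0("part", i, ".xlsx")))
}

# Collect the file paths, read each file, and bind the results row-wise.
xlsx_files <- Sys.glob(file.path(data_dir, "*.xlsx"))
dataframes <- lapply(xlsx_files, read_excel)
all_data <- do.call("rbind", dataframes)
tail(all_data)
```

Because Sys.glob returns full paths here, there is no need to paste the folder name onto each file name.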
Setting the Data type for data or columns
We can also, if we like, set the data type for the columns. Let’s read example_sheets2.xlsx again. In the read_excel example below, we use the col_types argument to set the data type of each column.
df <- read_excel('SimData/example_sheets2.xlsx',
col_types=c("text", "text", "numeric",
"numeric", "text"),
sheet='Session1')
str(df)
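Besides type names such as "text" and "numeric", col_types also accepts "skip", which drops a column while reading. Here is a short sketch using the datasets.xlsx example file that ships with readxl, rather than the tutorial's own file:

```r
library(readxl)

# datasets.xlsx ships with readxl; its first sheet holds the iris data.
path <- readxl_example("datasets.xlsx")

# One type per column: keep two numeric columns, skip two, read Species as text.
iris_subset <- read_excel(path,
                          col_types = c("numeric", "numeric",
                                        "skip", "skip", "text"))
str(iris_subset)
```

The vector needs one entry per column in the sheet, in the order the columns appear.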
Importing Excel Files in RStudio
Before we continue this Excel in R tutorial, we are going to learn how to load xlsx files into R using RStudio, which answers the question of how to import an Excel file into RStudio. This is quite simple: open up RStudio, click on the Environment tab (on the right in the IDE), and then click Import Dataset.
Now we’ll get a dropdown menu and we can choose from different types of sources. As we are going to work with Excel files we choose “From Excel…”:
In the next step, we click “Browse” and go to the folder where our Excel data is located.
Now we get some alternatives. For instance, we can change the name of the dataframe to “df”, if we want (see image below). Furthermore, before we import the Excel file in RStudio we can also specify how the missing values are coded as well as rows to skip.
Finally, when we have set everything as we want we can hit the Import button in RStudio to read the datafile.
Writing R Dataframes to Excel
Excel files can, of course, be created in R. In this section, we will learn how to write an Excel file using R. Here, we use the R package xlsx to write .xlsx files. More specifically, to write to an Excel file we will use the write.xlsx function.
We will start by creating a dataframe with some variables.
df <- data.frame("Age" = c(21, 22, 20, 19, 18, 23), "Names" = c("Andreas", "George", "Steve",
"Sarah", "Joanna", "Hanna"))
str(df)
Now that we have a dataframe to write to xlsx we start by using the write.xlsx function from the xlsx package.
library(xlsx)
write.xlsx(df, 'names_ages.xlsx',
sheetName = "Sheet1")
In the output below the effect of not using any parameters is evident. If we don’t use the parameter sheetName we get the default sheet name, ‘Sheet1’.
As can be noted in the image below, the Excel file has a column (‘A’) containing numbers. These are the row names of the dataframe.
In the next example we are going to give the sheet another name and we will set the row.names parameter to FALSE.
write.xlsx(df, 'names_ages.xlsx',
sheetName = "Names and Ages",
row.names=FALSE)
As can be seen, in the image above, we get a new sheet name and we don’t have the indexes as a column in the Excel sheet. Note, if you get the error ‘could not find function “write.xlsx”‘ it may be that you did not load the xlsx library.
Writing Multiple Dataframes to an Excel File
In this section, we are going to learn how to write multiple dataframes to one Excel file. More specifically, we will use R and the xlsx package to write many dataframes to multiple sheets in an Excel file.
First, we start by creating three dataframes and add them to a list.
df1 <-data.frame('Names' = c('Andreas', 'George', 'Steve',
'Sarah', 'Joanna', 'Hanna'),
'Age' = c(21, 22, 20, 19, 18, 23))
df2 <- data.frame('Names' = c('Pete', 'Jordan', 'Gustaf',
'Sophie', 'Sally', 'Simone'),
'Age' = c(22, 21, 19, 19, 29, 21))
df3 <- data.frame('Names' = c('Ulrich', 'Donald', 'Jon',
'Jessica', 'Elisabeth', 'Diana'),
'Age' = c(21, 21, 20, 19, 19, 22))
dfs <- list(df1, df2, df3)
Next, we are going to create a workbook using the createWorkbook function.
wb <- createWorkbook(type="xlsx")
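Finally, we are going to write a custom function that we are going to use together with the lapply function, later. In the code chunk below, the function takes an index, creates a sheet named after that index, and adds the corresponding dataframe to the sheet:
add_dataframes <- function(i){
  # dfs[[i]] extracts the dataframe itself; dfs[i] would return a list
  df = dfs[[i]]
  sheet_name = paste0("Sheet", i)
  sheet = createSheet(wb, sheet_name)
  addDataFrame(df, sheet=sheet, row.names=FALSE)
}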
It’s time to use the lapply function with our custom R function. On the second row, in the code chunk below, we are writing the workbook to an xlsx file using the saveWorkbook function:
lapply(seq_along(dfs), function(x) add_dataframes(x))
saveWorkbook(wb, 'multiple_Sheets.xlsx')
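The xlsx package depends on Java. As a hedged alternative sketch (a different package from the one used above, not the approach this tutorial follows), the write_xlsx function from writexl accepts a named list of dataframes and writes each element to its own sheet. The sheet names Group1 and Group2 and the temporary file are assumptions for illustration.

```r
library(writexl)
library(readxl)

df1 <- data.frame(Names = c("Andreas", "George"), Age = c(21, 22))
df2 <- data.frame(Names = c("Pete", "Jordan"), Age = c(22, 21))

# Each list name becomes a sheet name in the workbook.
out_file <- tempfile(fileext = ".xlsx")
write_xlsx(list(Group1 = df1, Group2 = df2), out_file)

# Check the result by listing the sheets that were written.
written_sheets <- excel_sheets(out_file)
written_sheets
```

This trades the styling options of xlsx for a shorter call with no workbook object to manage.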
Summary: How to Work With Excel Files in R
In this working with Excel in R tutorial we have learned how to:
- Read Excel files and Spreadsheets using read_excel and read.xlsx
- Load Excel files to dataframes:
- Import Excel sheets and skip rows
- Merging many sheets to a dataframe
- Reading many Excel files into one dataframe
- Write a dataframe to an Excel file
- Creating many dataframes and writing them to an Excel file with many sheets
Excel is the most popular spreadsheet software used to store tabular data. So, it’s important to be able to efficiently import and export data from these files.
R’s xlsx package makes it easy to read, write, and format excel files.
The xlsx Package
The xlsx package provides the necessary tools to interact with both .xls and .xlsx format files from R.
In order to get started you first need to install and load the package.
# Install and load xlsx package
install.packages("xlsx")
library("xlsx")
Read an Excel file
Suppose you have the following Excel file.
You can read the contents of an Excel worksheet using the read.xlsx() or read.xlsx2() function.
The read.xlsx() function reads the data and creates a data frame.
# Read the first excel worksheet
library(xlsx)
mydata <- read.xlsx("mydata.xlsx", sheetIndex=1)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston
read.xlsx() vs read.xlsx2()
Both functions work in much the same way, except that read.xlsx() is slow for large data sets (worksheets with more than 100 000 cells).
In contrast, read.xlsx2() is faster on big files.
Specify a File Name
When you specify the filename only, it is assumed that the file is located in the current folder. If it is somewhere else, you can specify the exact path that the file is located at.
Remember! While specifying the exact path, characters prefaced by \ (like \n \r \t etc.) are interpreted as special characters.
You can avoid this by:
- Changing the backslashes to forward slashes like:
"C:/data/myfile.xlsx"
- Using double backslashes like:
"C:\\data\\myfile.xlsx"
# Specify absolute path like this
mydata <- read.xlsx("C:/data/mydata.xlsx", sheetIndex = 1)
# or like this
mydata <- read.xlsx("C:\\data\\mydata.xlsx", sheetIndex = 1)
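The two path styles can be checked in base R without any Excel file. This sketch only builds and prints path strings; the path itself is the illustrative one used above and does not need to exist.

```r
# Build a path from its parts; file.path() joins them with forward slashes.
p1 <- file.path("C:", "data", "mydata.xlsx")

# In a string literal each backslash must be doubled; the stored string
# contains single backslashes.
p2 <- "C:\\data\\mydata.xlsx"

# Convert backslashes to forward slashes (fixed = TRUE treats "\\" literally).
p3 <- gsub("\\", "/", p2, fixed = TRUE)

print(p1)
cat(p2, "\n")
print(p3)
```

Building paths with file.path avoids the escaping question altogether.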
Specify Worksheet
When you use the read.xlsx() function, along with a filename you also need to specify the worksheet that you want to import data from.
To specify the worksheet, you can pass either an integer indicating the position of the worksheet (for example, sheetIndex=1) or the name of the worksheet (for example, sheetName="Sheet1").
The following two lines do exactly the same thing; they both import the data in the first worksheet (called Sheet1):
mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
mydata <- read.xlsx("mydata.xlsx", sheetName = "Sheet1")
Import the Data as is
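The read.xlsx() function can coerce character data into a factor (categorical variable). Note that since R 4.0.0 the base R default for stringsAsFactors is FALSE, so newer setups may already return character columns. You can see which one you got by inspecting the structure of your data frame.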
# By default, character data is coerced into a factor
mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
str(mydata)
'data.frame': 3 obs. of 4 variables:
$ name: Factor w/ 3 levels "Amy","Bob","Sam": 2 3 1
$ age : num 25 30 20
$ job : Factor w/ 2 levels "Developer","Manager": 2 1 1
$ city: Factor w/ 3 levels "Houston","New York",..: 3 2 1
If you want your data interpreted as strings rather than factors, set the stringsAsFactors parameter to FALSE.
# Set stringsAsFactors parameter to FALSE to interpret the data as is
mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
stringsAsFactors = FALSE)
str(mydata)
'data.frame': 3 obs. of 4 variables:
$ name: chr "Bob" "Sam" "Amy"
$ age : num 25 30 20
$ job : chr "Manager" "Developer" "Developer"
$ city: chr "Seattle" "New York" "Houston"
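The same switch exists in base R's data.frame function, so its effect can be tried without an Excel file. A small sketch that reuses the names from the example above:

```r
# The same values stored twice: once as factors, once as plain strings.
as_factor <- data.frame(name = c("Bob", "Sam", "Amy"),
                        stringsAsFactors = TRUE)
as_text <- data.frame(name = c("Bob", "Sam", "Amy"),
                      stringsAsFactors = FALSE)

class(as_factor$name)
class(as_text$name)

# An existing factor column can be converted back to character.
as_factor$name <- as.character(as_factor$name)
```

Converting afterwards with as.character is an option when the data has already been read with factors.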
Read Specific Range
If you want to read a range of rows, specify the rowIndex argument.
# Read first three lines of a file
mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
rowIndex = 1:3)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
If you want to read a range of columns, specify the colIndex argument.
# Read first two columns of a file
mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
colIndex = 1:2)
mydata
name age
1 Bob 25
2 Sam 30
3 Amy 20
Specify Starting Row
Sometimes the excel file (like the file below) may contain notes, comments, headers, etc. at the beginning which you may not want to include.
To start reading data from a specified row in the Excel worksheet, pass the startRow argument.
# Read excel file from third row
mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
startRow = 3)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston
Write Data to an Excel File
To write data to an Excel file, use the write.xlsx() function and pass the data in the form of a matrix or data frame.
# Export data from R to an excel workbook
df
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston
write.xlsx(df, file = "mydata.xlsx")
Notice that the write.xlsx() function prepends each row with a row name by default. If you don’t want row labels in your Excel file, set row.names to FALSE.
# Remove row labels while writing an excel File
write.xlsx(df, file="mydata.xlsx",
row.names = FALSE)
To set the name of the current worksheet, specify the sheetName argument.
# Rename current worksheet
write.xlsx(df, file="mydata.xlsx",
row.names = FALSE,
sheetName = "Records")
Add Multiple Datasets at once
To add multiple data sets in the same Excel workbook, you have to set the append argument to TRUE.
# Write the first data set
write.xlsx(iris, file = "mydata.xlsx",
sheetName = "IRIS", append = FALSE)
# Add a second data set
write.xlsx(mtcars, file = "mydata.xlsx",
sheetName = "CARS", append = TRUE)
# Add a third data set
write.xlsx(Titanic, file = "mydata.xlsx",
sheetName = "TITANIC", append = TRUE)
Create and Format an Excel Workbook
Sometimes you may wish to create a .xlsx file with some formatting. With the help of xlsx package, you can edit titles, borders, column width, format data table, add plot and much more.
The following example shows how to do so:
Step 1. Create a new excel workbook
You can create a new workbook using the createWorkbook() function.
# create new workbook
wb <- createWorkbook()
Step 2. Define cell styles for formatting the workbook
In R, using the CellStyle() function you can create your own cell styles to change the appearance of, for example:
- The sheet title
- The row and column names
- Text alignment for the columns
- Cell borders around the columns
# define style for title
title_style <- CellStyle(wb) +
Font(wb, heightInPoints = 16,
isBold = TRUE)
# define style for row and column names
rowname_style <- CellStyle(wb) +
Font(wb, isBold = TRUE)
colname_style <- CellStyle(wb) +
Font(wb, isBold = TRUE) +
Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") +
Border(color = "black",
position =c("TOP", "BOTTOM"),
pen =c("BORDER_THIN", "BORDER_THIN"))
Step 3. Create worksheet and add title
Before you add data, you have to create an empty worksheet in the workbook. You can do this by using the createSheet() function.
# create a worksheet named 'Data'
ws <- createSheet(wb, sheetName = "Data")
Step 4. Add sheet title
Here’s how you can add a title.
# create a new row
rows <- createRow(ws, rowIndex = 1)
# create a cell in the row to contain the title.
sheetTitle <- createCell(rows, colIndex = 1)
# set the cell value
setCellValue(sheetTitle[[1,1]], "Vapor Pressure of Mercury")
# set the cell style
setCellStyle(sheetTitle[[1,1]], title_style)
Step 5. Add a table into a worksheet
With the addDataFrame() function, you can add the data table to the newly created worksheet.
The example below adds the built-in pressure dataset starting at row 3.
# add data table to worksheet
addDataFrame(pressure, sheet = ws, startRow = 3, startColumn = 1,
colnamesStyle = colname_style,
rownamesStyle = rowname_style,
row.names = FALSE)
Step 6. Add a plot into a worksheet
You can add a plot in the worksheet using the addPicture() function.
# create a png plot
png("plot.png", height=900, width=1600, res=250, pointsize=8)
plot(pressure, xlab = "Temperature (deg C)",
ylab = "Pressure (mm of Hg)",
main = "pressure data: Vapor Pressure of Mercury",
col="red", pch=19, type="b")
dev.off()
# Create a new sheet to contain the plot
sheet <-createSheet(wb, sheetName = "plot")
# Add the plot created previously
addPicture("plot.png", sheet, scale = 1, startRow = 2,
startColumn = 1)
# Remove the plot from the disk
res<-file.remove("plot.png")
Step 7. Change column width
Now change the column width to fit the contents.
# change column width of first 2 columns
setColumnWidth(sheet = ws, colIndex = 1:2, colWidth = 15)
Step 8. Save the workbook
Finally, save the workbook with the saveWorkbook() function.
# save workbook
saveWorkbook(wb, file = "mydata.xlsx")
Step 9. View the result
readxl
Overview
The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies, so it’s easy to
install and use on all operating systems. It is designed to work with
tabular data.
readxl supports both the legacy .xls format and the modern xml-based
.xlsx format. The libxls C library is used to support .xls, which
abstracts away many of the complexities of the underlying binary format.
To parse .xlsx, we use the RapidXML C++ library.
Installation
The easiest way to install the latest released version from CRAN is to
install the whole tidyverse.
install.packages("tidyverse")
NOTE: you will still need to load readxl explicitly, because it is not a
core tidyverse package loaded via library(tidyverse).
Alternatively, install just readxl from CRAN:
install.packages("readxl")
Or install the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/readxl")
Cheatsheet
You can see how to read data with readxl in the data import
cheatsheet, which also covers similar functionality in the related
packages readr and googlesheets4.
Usage
readxl includes several example files, which we use throughout the
documentation. Use the helper readxl_example() with no arguments to
list them or call it with an example filename to get the path.
readxl_example()
#>  [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
#>  [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
#>  [9] "type-me.xls"   "type-me.xlsx"
readxl_example("clippy.xls")
#> [1] "/private/tmp/Rtmpjectat/temp_libpath3b7822c649d8/readxl/extdata/clippy.xls"
read_excel() reads both xls and xlsx files and detects the format from
the extension.
xlsx_example <- readxl_example("datasets.xlsx")
read_excel(xlsx_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> # … with 147 more rows
xls_example <- readxl_example("datasets.xls")
read_excel(xls_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> # … with 147 more rows
List the sheet names with excel_sheets().
excel_sheets(xlsx_example)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"
Specify a worksheet by name or number.
read_excel(xlsx_example, sheet = "chickwts")
#> # A tibble: 71 × 2
#>   weight feed
#>    <dbl> <chr>
#> 1    179 horsebean
#> 2    160 horsebean
#> 3    136 horsebean
#> # … with 68 more rows
read_excel(xls_example, sheet = 4)
#> # A tibble: 1,000 × 5
#>     lat  long depth   mag stations
#>   <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 -20.4  182.   562   4.8       41
#> 2 -20.6  181.   650   4.2       15
#> 3 -26    184.    42   5.4       43
#> # … with 997 more rows
There are various ways to control which cells are read. You can even
specify the sheet here, if providing an Excel-style cell range.
read_excel(xlsx_example, n_max = 3)
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = "C1:E4")
#> # A tibble: 3 × 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>
#> 1          1.4         0.2 setosa
#> 2          1.4         0.2 setosa
#> 3          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_rows(1:4))
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#>   Sepal.Width Petal.Length Petal.Width
#>         <dbl>        <dbl>       <dbl>
#> 1         3.5          1.4         0.2
#> 2         3            1.4         0.2
#> 3         3.2          1.3         0.2
#> # … with 147 more rows
read_excel(xlsx_example, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> # … with 1 more row
If NAs are represented by something other than blank cells, set the
na argument.
read_excel(xlsx_example, na = "setosa")
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 <NA>
#> 2          4.9         3            1.4         0.2 <NA>
#> 3          4.7         3.2          1.3         0.2 <NA>
#> # … with 147 more rows
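The na argument takes a character vector, so several missing-value codes can be declared at once. A minimal sketch, assuming writexl is installed to create a small stand-in file; the codes "missing" and "-99" and the data are hypothetical:

```r
library(readxl)
library(writexl)

# A stand-in file where missing scores are coded in two different ways.
path <- tempfile(fileext = ".xlsx")
write_xlsx(data.frame(id = 1:3,
                      score = c("12", "missing", "-99")),
           path)

# Treat both codes as missing values while reading.
scores <- read_excel(path, na = c("missing", "-99"))
scores
```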
If you are new to the tidyverse conventions for data import, you may
want to consult the data import
chapter in R for Data Science.
readxl will become increasingly consistent with other packages, such as
readr.
Articles
Broad topics are explained in these articles:
- Cell and Column Types
- Sheet Geometry: how to specify which cells to read
- readxl Workflows: Iterating over multiple tabs or worksheets, stashing a csv snapshot
We also have some focused articles that address specific aggravations
presented by the world’s spreadsheets:
- Column Names
- Multiple Header Rows
Features
- No external dependency on, e.g., Java or Perl.
- Re-encodes non-ASCII characters to UTF-8.
- Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.
- Discovers the minimal data rectangle and returns that, by default. User can exert more control with range, skip, and n_max.
- Column names and types are determined from the data in the sheet, by default. User can also supply via col_names and col_types and control name repair via .name_repair.
- Returns a tibble, i.e. a data frame with an additional tbl_df class. Among other things, this provides nicer printing.
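Two of these features, the tibble return type and the POSIXct datetime columns, can be checked with the deaths.xlsx example file that ships with readxl. This is a short sketch; the cell range is the one commonly used with this file in readxl's own material.

```r
library(readxl)

# deaths.xlsx ships with readxl; its data sits in cells A5:F15,
# surrounded by notes above and below.
deaths <- read_excel(readxl_example("deaths.xlsx"), range = "A5:F15")

class(deaths)                     # a tibble is a data frame with extra classes
class(deaths[["Date of birth"]])  # datetime columns are read as POSIXct
```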
Other relevant packages
Here are some other packages with functionality that is complementary to
readxl and that also avoid a Java dependency.
Writing Excel files: The example files datasets.xlsx and datasets.xls
were created with the help of openxlsx (and Excel). openxlsx provides
“a high level interface to writing, styling and editing worksheets”.
l <- list(iris = iris, mtcars = mtcars, chickwts = chickwts, quakes = quakes)
openxlsx::write.xlsx(l, file = "inst/extdata/datasets.xlsx")
writexl is a new option in
this space, first released on CRAN in August 2017. It’s a portable and
lightweight way to export a data frame to xlsx, based on
libxlsxwriter. It is much
more minimalistic than openxlsx, but on simple examples, appears to be
about twice as fast and to write smaller files.
Non-tabular data and formatting:
tidyxl is focused on
importing awkward and non-tabular data from Excel. It also “exposes cell
content, position and formatting in a tidy structure for further
manipulation”.
Importing Data from Excel
Excel is a spreadsheet application, which is widely used by many institutions to store data. This tutorial will give a brief of reading, writing and manipulating the data in Excel files using R. We will learn about various R packages and extensions to read and import Excel files. At the end of this section, we have written about some common problems encountered while loading Excel files and spreadsheet data.
Before we import the data from an Excel spreadsheet into R, there are some standard practices for tidying your data to avoid unnecessary errors.
- The first column of the spreadsheet is used to identify the sample dataset, therefore it should be a unique key id. Similarly the first row is reserved for header, describing the scheme of the data.
- Concatenating words in the cells should be done using ‘.’. For example, ‘Sample.data’.
- The names and header of the data scheme should usually avoid symbols.
- All missing data points in the Excel spreadsheet should be indicated with ‘NA’.
Before you import the Excel data in R, you need to set R's working directory.
getwd() # get the current working directory
setwd("") # set the working directory (the path goes inside the quotes)
Before we look into the packages available to extract data from an Excel spreadsheet, we will show you simple R commands that can do the job. The utils package is one of the core packages; it contains a bunch of basic utility functions, and the following commands are part of this package.
- read.table()
dataset <- read.table("",
header = TRUE)
The first argument of read.table() function is the name of the text file within the double quotes and if the data file has a header for data schema in the top row, the second argument will be true. This function will work for files, which are saved in .txt format.
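As a runnable illustration, the sketch below first writes a small tab-delimited text file to a temporary location and then reads it back with read.table. The file contents are hypothetical.

```r
# Write a small tab-delimited text file to a temporary location
# (hypothetical data), then read it back.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tage",
             "1\tBob\t25",
             "2\tSam\tNA"),
           txt_file)

# header = TRUE uses the first row as column names; "NA" becomes a missing value.
dataset <- read.table(txt_file, header = TRUE, sep = "\t")
str(dataset)
```

Setting sep explicitly keeps fields that contain spaces intact.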
Reading data from an Excel file is easy and can be done using several packages. You can export the Excel file to a comma-delimited file and import it using the method shown in the tutorial Importing Data from Flat Files in R. Another method to import data into R from Excel is the xlsx package. The first row should contain variable names.
# read in the Excel sheet from workbook sample_excel.xls
# variable names are in the first row
library(xlsx)
sampledata <- read.xlsx("sample_excel.xls",
sheetName = "sample_sheet1")
When using the read.xlsx function, we must specify the sheet index or the sheet name. If the dataset is large, the read.xlsx2() function is used instead.
Sample.data <- read.xlsx2("sample_excel.xls",
sheetName = "sample_sheet1",
startRow = 100,
colIndex = 100)
Additionally in the function above, user can mention the end row or the data import can be limited to certain row and column index. xlsx package does a lot more than importing data from Excel files, it can also manipulate the data and write data frames into the spreadsheets. The data frames can be written to Excel workbook using the function write.xlsx().
write.xlsx(Sample.data,
"Sample_Sheet.xls",
sheetName = "sample_sheet1")
Apart from the xlsx package, we have gdata package, which has functions that can read from data in the Excel format. gdata provides a cross platform solution for importing data from Excel files into R. The read.xls function in particular can read data from an Excel spreadsheet and gives data frame as output. Take for example a sample Excel spreadsheet, named ‘Sample_Sheet.xls’ and to use this method, you would require Perl runtime in your system.
library(gdata) # load the gdata package
sample_data = read.xls("Sample_Sheet.xls") # read data from the sheet
This function converts the Sample_Sheet.xls file into a temporary .csv or .tab limited file using Perl. While executing read.xls function, R will search for a path to the excel file and looks out for Perl on its way. If it doesn’t find perl.exe, then R will return an error. To avoid this error, another argument for the function can be given to search for the Perl executable file.
sample.data <- read.xls("Sample_Sheet.xlsx",
sheet = 1,
perl = "C:/Perl/bin/perl.exe")
gdata has several other functions to convert the Excel file into various other formats. Such as:
- xls2sep()
- xls2csv()
- xls2tab()
- xls2tsv()
The input arguments for these functions are the same as those for the read.xls() function.
Another package that can do the job of importing data from Excel is the XLConnect package using the loadWorkbook function. This function can be used to read the entire workbook, followed by readWorksheet function to load the worksheets into R. Java is required to be pre-installed for this package to work. This package also provides function to create Excel workbooks, and export data to them.
library(XLConnect)
Sample_Workbook = loadWorkbook("Sample_Sheet.xls")
Sample_Data = readWorksheet(Sample_Workbook,
sheet = "Sheet1")
Other arguments can also be added after the sheet argument, such as startCol, startRow, endCol, or endRow, to indicate and limit the cells that are to be imported from the Excel workbook. Another argument, ‘region’, can also be used in this function to specify the range of starting and ending rows and columns.