Read data from Excel file in R

In this article, we will discuss how to import an Excel file in the R programming language. There are two approaches to importing an Excel file into R, and both are discussed below.

Method 1: Using read_excel()

In this approach, the user calls the read_excel() function from the readxl package with the name of the file as the argument. The readxl package can import both .xlsx and .xls files. It is not part of base R: it is installed together with the tidyverse, or on its own with install.packages("readxl"), and must be loaded with library(readxl) before use.

Syntax: read_excel(path, sheet = NULL)

Parameters:

  • path:-Path to the .xls or .xlsx file to read.
  • sheet:-Name or position of the sheet to read. If not specified, the first sheet is read.

Returns:

A tibble, i.e. a data frame with an additional tbl_df class.

Example:

# Load the readxl package
library(readxl)

# Read the first sheet of the workbook into a tibble
gfg_data <- read_excel('Data_gfg.xlsx')

# Print the imported data
gfg_data

Method 2: Using inbuilt menu Options of Rstudio

This approach is easier than the previous one because the user does not need to type any code in the console. The import is done entirely from the Environment window of RStudio.

Steps to import an Excel file using the Import Dataset option from the Environment window of RStudio:

Step 1: Select the Import Dataset option in the Environment window.

Step 2: Select “From Excel” under the Import Dataset option, since the file to import is an Excel file.

Step 3: Click Browse and select the Excel file to be imported.

Step 4: Click Import. The selected Excel file is then imported into R.

Before importing, the user can adjust the import options, such as the name given to the dataset and the sheet to read. For example, if the workbook has two sheets and the second one is needed, it is chosen with the Sheet option. The Max Rows box sets how many rows to read from the data, and the Skip box sets how many rows to skip. In the NA box, the user can enter a value; wherever that value appears in the data, it is imported as NA.

There is also another way to open the same import dialog in RStudio:
Step 1: Click on File.
Step 2: In File, click Import Dataset, then choose From Excel.
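
The dialog fields described above correspond to arguments of read_excel(), so the same import can also be written as code. The following is a minimal sketch, assuming the readxl package is installed; it uses the datasets.xlsx example workbook that ships with readxl as a stand-in for your own file, and the values chosen for sheet, n_max, skip and na are arbitrary illustrations.

```r
library(readxl)

# Bundled example workbook (stand-in for the path to your own file)
path <- readxl_example("datasets.xlsx")

imported <- read_excel(
  path,
  sheet = 2,     # Sheet option: second sheet of the workbook
  n_max = 5,     # Max Rows box: read at most 5 data rows
  skip  = 0,     # Skip box: rows to skip before the header row
  na    = "NA"   # NA box: cell text that should become NA
)

# Number of data rows that were read
nrow(imported)
```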

readxl

Overview

The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies, so it’s easy to
install and use on all operating systems. It is designed to work with
tabular data.

readxl supports both the legacy .xls format and the modern xml-based
.xlsx format. The libxls C library
is used to support .xls, which abstracts away many of the complexities
of the underlying binary format. To parse .xlsx, we use the
RapidXML C++ library.

Installation

The easiest way to install the latest released version from CRAN is to
install the whole tidyverse.

install.packages("tidyverse")

NOTE: you will still need to load readxl explicitly, because it is not a
core tidyverse package loaded via library(tidyverse).

Alternatively, install just readxl from CRAN:

install.packages("readxl")

Or install the development version from GitHub:

#install.packages("pak")
pak::pak("tidyverse/readxl")

Cheatsheet

You can see how to read data with readxl in the data import cheatsheet,
which also covers similar functionality in the related packages readr
and googlesheets4.

Usage

readxl includes several example files, which we use throughout the
documentation. Use the helper readxl_example() with no arguments to
list them or call it with an example filename to get the path.

readxl_example()
#>  [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
#>  [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
#>  [9] "type-me.xls"   "type-me.xlsx"
readxl_example("clippy.xls")
#> [1] "/private/tmp/Rtmpjectat/temp_libpath3b7822c649d8/readxl/extdata/clippy.xls"

read_excel() reads both xls and xlsx files and detects the format from
the extension.

xlsx_example <- readxl_example("datasets.xlsx")
read_excel(xlsx_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa 
#> # … with 147 more rows

xls_example <- readxl_example("datasets.xls")
read_excel(xls_example)
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa 
#> # … with 147 more rows

List the sheet names with excel_sheets().

excel_sheets(xlsx_example)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"

Specify a worksheet by name or number.

read_excel(xlsx_example, sheet = "chickwts")
#> # A tibble: 71 × 2
#>   weight feed     
#>    <dbl> <chr>    
#> 1    179 horsebean
#> 2    160 horsebean
#> 3    136 horsebean
#> # … with 68 more rows
read_excel(xls_example, sheet = 4)
#> # A tibble: 1,000 × 5
#>     lat  long depth   mag stations
#>   <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 -20.4  182.   562   4.8       41
#> 2 -20.6  181.   650   4.2       15
#> 3 -26    184.    42   5.4       43
#> # … with 997 more rows

There are various ways to control which cells are read. You can even
specify the sheet here, if providing an Excel-style cell range.

read_excel(xlsx_example, n_max = 3)
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = "C1:E4")
#> # A tibble: 3 × 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>  
#> 1          1.4         0.2 setosa 
#> 2          1.4         0.2 setosa 
#> 3          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_rows(1:4))
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa
read_excel(xlsx_example, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#>   Sepal.Width Petal.Length Petal.Width
#>         <dbl>        <dbl>       <dbl>
#> 1         3.5          1.4         0.2
#> 2         3            1.4         0.2
#> 3         3.2          1.3         0.2
#> # … with 147 more rows
read_excel(xlsx_example, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> # … with 1 more row

If NAs are represented by something other than blank cells, set the
na argument.

read_excel(xlsx_example, na = "setosa")
#> # A tibble: 150 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 <NA>   
#> 2          4.9         3            1.4         0.2 <NA>   
#> 3          4.7         3.2          1.3         0.2 <NA>   
#> # … with 147 more rows

If you are new to the tidyverse conventions for data import, you may
want to consult the data import
chapter in R for Data Science.
readxl will become increasingly consistent with other packages, such as
readr.

Articles

Broad topics are explained in these articles:

  • Cell and Column Types
  • Sheet Geometry: how to specify which cells to read
  • readxl Workflows: Iterating over multiple tabs or worksheets,
    stashing a csv snapshot

We also have some focused articles that address specific aggravations
presented by the world’s spreadsheets:

  • Column Names
  • Multiple Header Rows
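
As a rough illustration of the "iterating over multiple worksheets" workflow mentioned above, the sketch below reads every sheet of the bundled datasets.xlsx example into a named list. It assumes readxl is installed; the variable names sheet_names and all_sheets are only illustrative, and this is one possible approach rather than the one the articles prescribe.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Names of all worksheets in the workbook
sheet_names <- excel_sheets(path)

# Read each sheet into its own tibble and keep them in a named list
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheet_names

# Number of rows per sheet
sapply(all_sheets, nrow)
```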

Features

  • No external dependency on, e.g., Java or Perl.

  • Re-encodes non-ASCII characters to UTF-8.

  • Loads datetimes into POSIXct columns. Both Windows (1900) and
    Mac (1904) date specifications are processed correctly.

  • Discovers the minimal data rectangle and returns that, by default.
    User can exert more control with range, skip, and n_max.

  • Column names and types are determined from the data in the sheet, by
    default. User can also supply via col_names and col_types and
    control name repair via .name_repair.

  • Returns a tibble, i.e. a data frame with an additional tbl_df class.
    Among other things, this provides nicer printing.
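
To make the col_types and col_names features more concrete, here is a small sketch on the iris sheet of the bundled datasets.xlsx file. It assumes readxl is installed; the replacement column names are made up for illustration.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Declare the column types explicitly instead of letting readxl guess:
# four numeric columns followed by one text column
iris_typed <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

# Supply custom column names; skip = 1 drops the header row of the sheet
# so that it is not read as data
iris_named <- read_excel(
  path,
  sheet = "iris",
  skip = 1,
  col_names = c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "species")
)

names(iris_named)
```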

Other relevant packages

Here are some other packages with functionality that is complementary to
readxl and that also avoid a Java dependency.

Writing Excel files: The example files datasets.xlsx and
datasets.xls were created with the help of
openxlsx (and Excel).
openxlsx provides “a high level interface to writing, styling and
editing worksheets”.

l <- list(iris = iris, mtcars = mtcars, chickwts = chickwts, quakes = quakes)
openxlsx::write.xlsx(l, file = "inst/extdata/datasets.xlsx")

writexl is a new option in
this space, first released on CRAN in August 2017. It’s a portable and
lightweight way to export a data frame to xlsx, based on
libxlsxwriter. It is much
more minimalistic than openxlsx, but on simple examples, appears to be
about twice as fast and to write smaller files.
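
A minimal round-trip sketch with writexl, assuming both writexl and readxl are installed: a named list of data frames is written to a temporary .xlsx file, one sheet per list element, and then read back. The temporary file path is only for illustration.

```r
library(readxl)

# Temporary output file for the example
tmp <- tempfile(fileext = ".xlsx")

# A named list becomes one worksheet per element
writexl::write_xlsx(list(iris = iris, mtcars = mtcars), path = tmp)

# Check the sheet names and read one sheet back
excel_sheets(tmp)
mtcars_back <- read_excel(tmp, sheet = "mtcars")
dim(mtcars_back)
```

Note that data frame row names (such as the car names in mtcars) are not written as a column, so convert them to a regular column first if you need them.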

Non-tabular data and formatting:
tidyxl is focused on
importing awkward and non-tabular data from Excel. It also “exposes cell
content, position and formatting in a tidy structure for further
manipulation”.

Excel is a spreadsheet developed by Microsoft, which allows you to manage data in a very simple way. Until 2007, the XLS was the main file extension. However, in the 2007 release the XLSX (XML-based) extension was introduced to become the default workbook format. In this tutorial you will learn how to read Excel files into R and RStudio with several packages.

  • 1 How to import Excel files into R?
    • 1.1 Import Excel data into RStudio from the menu
    • 1.2 Read XLSX without JAVA in R: readxl and openxlsx
      • 1.2.1 readxl package
      • 1.2.2 openxlsx package
    • 1.3 The xlsx package
    • 1.4 XLConnect package
  • 2 Convert XLSX files to CSV in R

How to import Excel files into R?

If you need to read an Excel file in R, you will need to use a specific package to achieve it. There are several options, but the best packages for reading Excel files could be openxlsx and readxl, as they depend neither on Java (unlike the xlsx and XLConnect packages) nor on Perl (unlike the gdata package).

Note that external dependencies can cause errors when loading the packages that rely on them, although for huge datasets those packages may be faster than the alternatives.

If you are using RStudio you can go to File → Import Dataset → From Excel.... Then, you can browse your Excel file and customize the output (the name of the variable, the sheet, cells range, …). You can also see a preview of the code that will be executed in the backend and of the data that will be loaded:

[Figure: Read Excel file in RStudio with the menu]

Note that, with this approach, you will need to have installed the readxl package.

Read XLSX without JAVA in R: readxl and openxlsx

readxl package

The readxl package is part of the tidyverse package, created by Hadley Wickham (chief scientist at RStudio) and his team. This package supports XLS via the libxls C library and XLSX files via the RapidXML C++ library without using external dependencies.

The package provides some Excel (XLS and XLSX) files stored in the installation folder of the package, so in order to create a reproducible example, the following examples use the clippy.xlsx file.

In order to load the path of the sample Excel file you can make use of the readxl_example function. Once loaded, or once you have the path of your own Excel file, you can use the excel_sheets function to check the Excel file sheet names, if needed.

# install.packages("readxl")
library(readxl)

# Get the path of a sample XLSX dataset of the package
file_path <- readxl_example("clippy.xlsx")

# Check the Sheet names of the Excel file
excel_sheets(file_path) # "list-column" "two-row-header"

The generic function of the package to read Excel files into R is the read_excel function, which guesses the file type (XLS or XLSX) depending on the file extension and the file itself.

read_excel(file_path)
# A tibble: 4 x 2
  name                 value    
  <chr>                <chr>    
1 Name                 Clippy   
2 Species              paperclip
3 Approx date of death 39083    
4 Weight in grams      0.9 

The sheet argument allows you to specify the sheet you want to load, passing its name or the corresponding number of the tab. Note that, by default, the function loads the first Excel sheet.

# Selecting the other sheet of the Excel file
read_excel(file_path, sheet = "two-row-header")
read_excel(file_path, sheet = 2) # Equivalent
# A tibble: 2 x 4
  name       species              death                 weight    
  <chr>      <chr>                <chr>                 <chr>     
1 (at birth) (office supply type) (date is approximate) (in grams)
2 Clippy     paperclip            39083                 0.9 

You can also skip rows with the skip argument of the function:

# Skip first row
read_excel(file_path, skip = 1)
# A tibble: 3 x 2
  Name                 Clippy   
  <chr>                <chr>    
1 Species              paperclip
2 Approx date of death 39083    
3 Weight in grams      0.9  

Note that you could also specify a range of cells to be selected with the range argument. In this case, the skip argument won’t be taken into account if you specify it.

read_excel(file_path, range = "B1:B5")
# A tibble: 4 x 1
  value    
  <chr>    
1 Clippy   
2 paperclip
3 39083    
4 0.9 
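
The precedence of range over skip can be checked directly. The sketch below, which assumes readxl is installed, reads the same range with and without a skip value and compares the two results; the variable names are illustrative.

```r
library(readxl)

file_path <- readxl_example("clippy.xlsx")

# Same cell range, once without and once with a skip value
with_range <- read_excel(file_path, range = "B1:B5")
with_range_and_skip <- read_excel(file_path, range = "B1:B5", skip = 2)

# TRUE if skip had no effect on the result
isTRUE(all.equal(with_range, with_range_and_skip))
```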

In addition, if you want to avoid reading the column names, you can set the col_names argument to FALSE:

read_excel(file_path, col_names = FALSE)
New names:
* `` -> ...1
* `` -> ...2
                  ...1      ...2
1                 name     value
2                 Name    Clippy
3              Species paperclip
4 Approx date of death     39083
5      Weight in grams       0.9

However, you may have noticed that the output is of class tibble (a modern type of data frame). If you want the output to be of class data.frame you will need to use the as.data.frame function as follows:

data <- read_excel(file_path, skip = 1)
as.data.frame(data)
                  Name    Clippy
1              Species paperclip
2 Approx date of death     39083
3      Weight in grams       0.9

Recall that the read_excel function guesses the file extension. Nonetheless, if you know the file extension you are going to read you can use the corresponding function of the following to avoid guessing:

# If you know the extension of your Excel file
# use one of these functions instead

# For XLS files
read_xls(readxl_example("clippy.xls"))

# For XLSX files
read_xlsx(file_path)

openxlsx package

The openxlsx package uses Rcpp and, as it doesn’t depend on Java, it is an interesting alternative to the readxl package to read an Excel file in R. The differences with respect to the previous package are that the output is of class data.frame by default instead of tibble and that its main use is not just importing Excel files, as it also provides a wide variety of functions to write, style and edit Excel files.

The function to read XLSX files is named read.xlsx:

# install.packages("openxlsx")
library(openxlsx)

read.xlsx(file_path)
                  name     value
1                 Name    Clippy
2              Species paperclip
3 Approx date of death     39083
4      Weight in grams       0.9

As in the function of the previous package, there are several arguments you can customize, such as sheet, startRow or colNames. If you want to select specific cells you can make use of the rows and cols arguments. Remember that you can type ?read.xlsx or help(read.xlsx) for additional information.

read.xlsx(file_path, cols = 1:2, rows = 2:3)
     Name    Clippy
1 Species paperclip
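
As a further sketch of the sheet, rows and cols arguments, the example below reads a small block from the mtcars sheet of the datasets.xlsx file bundled with readxl. It assumes both openxlsx and readxl are installed; readxl is used here only to locate the example file.

```r
library(openxlsx)

# Example workbook shipped with readxl (any .xlsx path works here)
xlsx_path <- readxl::readxl_example("datasets.xlsx")

# Header row plus the first five data rows, first three columns only
mtcars_block <- read.xlsx(xlsx_path, sheet = "mtcars", rows = 1:6, cols = 1:3)

class(mtcars_block)
dim(mtcars_block)
```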

The xlsx package

Although this package requires Java to be installed on your computer, it is very popular. The main functions to import Excel files are read.xlsx and read.xlsx2. The second has slightly different default arguments and does more work in Java, achieving better performance.

# install.packages("xlsx")
library(xlsx)

# Both functions need the sheet to read, given by index or by name
read.xlsx(file_path, sheetIndex = 1)
read.xlsx2(file_path, sheetIndex = 1)

You can customize several arguments as sheetIndex, sheetName, header, rowIndex, colIndex, among others. Run ?read.xlsx or help(read.xlsx) for additional details.

XLConnect package

An alternative to the xlsx package is XLConnect, which allows writing, reading and formatting Excel files. In order to load an Excel file into R you can use the readWorksheetFromFile function as follows. We recommend typing ??XLConnect to look for additional information about the arguments of each function of the package.

# install.packages("XLConnect")
library(XLConnect)

data <- readWorksheetFromFile(file_path, sheet = "list-column",
                              startRow = 1, endRow = 5,
                              startCol = 1, endCol = 2)

In case you want to load multiple sheets, it is recommended to use the loadWorkbook function and then load each sheet with the readWorksheet function:

load <- loadWorkbook(file_path)

data <- readWorksheet(load, sheet = "list-column",
                      startRow = 1, endRow = 5,
                      startCol = 1, endCol = 2)

data2 <- readWorksheet(load, sheet = "two-row-header",
                       startRow = 1, endRow = 3,
                       startCol = 1, endCol = 4)

Moreover, this package provides a function to load Excel named regions. Analogous to the previous example, you can import just a region with the readNamedRegionFromFile, specifying the file name (if the file is in your working directory) or the file path and the region name.

data <- readNamedRegionFromFile(file, # File path
                                name, # Region name
                                ...)  # Arguments of readNamedRegion()

If you want to load multiple named regions you can load the workbook with the loadWorkbook function and then import each region with the readNamedRegion function.

load <- loadWorkbook(file_path)

data <- readNamedRegion(load, name_Region_1, ...)
data2 <- readNamedRegion(load, name_Region_2, ...)

It is worth mentioning that if you are experiencing issues with the packages that require Java, you can get and set the path of Java in R with the following code:

# Prints the path of JAVA Home in R
Sys.getenv("JAVA_HOME")

# Sets the path of JAVA
Sys.setenv(JAVA_HOME = "path_to_jre_java_folder")

Note that you will need to specify the path to the jre folder inside the Java folder of your computer, which on Windows you should find inside Program Files.

Convert XLSX files to CSV in R

Finally, you could also convert your Excel files into CSV format and read the CSV file in R. For this purpose, you can use the convert function of the rio package. An alternative would be saving the Excel file directly as CSV from the menu of Microsoft Excel.

# install.packages("rio")
library(rio)

convert(file_path, "file.csv")
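
If you prefer not to install rio, a similar CSV snapshot can be produced with readxl and base R. This is only a sketch of an alternative, not what convert() does internally; it assumes readxl is installed and writes to a temporary file chosen for illustration.

```r
library(readxl)

xlsx_path <- readxl_example("datasets.xlsx")
csv_path <- tempfile(fileext = ".csv")

# Read one sheet and save it as CSV without row names
iris_tbl <- read_excel(xlsx_path, sheet = "iris")
write.csv(iris_tbl, csv_path, row.names = FALSE)

# Read the CSV back to confirm the snapshot
iris_csv <- read.csv(csv_path)
dim(iris_csv)
```

Unlike converting the whole workbook in Excel, this writes a single sheet, so repeat the read and write steps for each sheet you need.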

Looking to import an Excel file into R?

If so, you’ll see the full steps to import your file using the readxl package.

To start, here is a template that you can use to import an Excel file into R:

library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")

And if you want to import a specific sheet within the Excel file, then you may use this template:

library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx", sheet = "Your sheet name")

Note: For previous versions of Excel, use the file extension of .xls

Step 1: Install the readxl package

In the R Console, type the following command to install the readxl package:

install.packages("readxl")

Follow the instructions to complete the installation. You may want to check the following guide that explains how to install a package in R.

Step 2: Prepare your Excel File

Let’s suppose that you have an Excel file with some data about products:

Product        Price
Refrigerator   1200
Oven           750
Dishwasher     900
Coffee Maker   300

And let’s say that the Excel file name is product_list, and your goal is to import that file into R.

Step 3: Import the Excel file into R

In order to import your file, you’ll need to apply the following template in the R Editor:

library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")

For demonstration purposes, let’s assume that an Excel file is stored under the following path:

C:\Users\Ron\Desktop\Test\product_list.xlsx

Where:

  • product_list is the actual file name; and
  • .xlsx is the Excel file extension. For previous versions of Excel, use the file extension of .xls

Note that a double backslash (‘\\’) must be used within the path name when it is written in R code. By adding a double backslash, you’ll avoid the following error in R:

Error: '\U' used without hex digits in character string starting ""C:\U"

Here is the complete code to import the Excel file for our example:

library("readxl")
read_excel("C:\\Users\\Ron\\Desktop\\Test\\product_list.xlsx")

You’ll need to adjust the path to reflect the location where the Excel file is stored on your computer. Once you run the code in R, you’ll get the same values as in the Excel file:

  Product       Price
1 Refrigerator   1200
2 Oven            750
3 Dishwasher      900
4 Coffee Maker    300
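
Doubling the backslashes is not the only way to write a Windows path in R. The sketch below shows the same example path written three ways; the raw string form requires R 4.0.0 or later. No file is opened here, the strings are only compared.

```r
# Escaped backslashes
p_escaped <- "C:\\Users\\Ron\\Desktop\\Test\\product_list.xlsx"

# Forward slashes, which R also accepts on Windows
p_forward <- "C:/Users/Ron/Desktop/Test/product_list.xlsx"

# Raw string (R >= 4.0.0): backslashes are taken literally
p_raw <- r"(C:\Users\Ron\Desktop\Test\product_list.xlsx)"

# The escaped and raw forms are the same string
identical(p_escaped, p_raw)
```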

The easiest way to import an Excel file into R is by using the read_excel() function from the readxl package.

This function uses the following syntax:

read_excel(path, sheet = NULL)

where:

  • path: Path to the xls/xlsx file
  • sheet: The sheet to read. This can be the name of the sheet or the position of the sheet. If this is not specified, the first sheet is read.

This tutorial provides an example of how to use this function to import an Excel file into R.

Example: Import an Excel File into R

Suppose I have an Excel file saved in the following location:

C:\Users\Bob\Desktop\data.xlsx

The file contains three columns (team, points and assists) and five rows of data.

The following code shows how to import this Excel file into R:

#install and load readxl package
install.packages('readxl')
library(readxl)

#import Excel file into R
data <- read_excel('C:\\Users\\Bob\\Desktop\\data.xlsx')

Note that double backslashes (\\) are needed in the file path to avoid the following common error:

Error: '\U' used without hex digits in character string starting ""C:\U"

We can use the following code to quickly view the data:

#view entire dataset
data

#A tibble: 5 x 3
 team  points  assists
 <chr>   <dbl>   <dbl>
1 A         78      12
2 B         85      20
3 C         93      23
4 D         90       8
5 E         91      14

We can see that R imported the Excel file and automatically determined that team was a string variable while points and assists were numerical variables.
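
To inspect the guessed column types programmatically rather than reading them off the printed tibble, you can query the class of each column. A small sketch, assuming readxl is installed and using the chickwts sheet of its bundled datasets.xlsx file in place of the data.xlsx file above:

```r
library(readxl)

chick <- read_excel(readxl_example("datasets.xlsx"), sheet = "chickwts")

# Class of every column, as guessed by read_excel()
col_classes <- sapply(chick, class)
col_classes
```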

Additional Resources

The following tutorials explain how to import other file types into R:

How to Import CSV Files into R
How to Import SAS Files into R
How to Manually Enter Raw Data in R
