How to open excel file in python - Word и Excel - помощь в работе с программами

How do I open a file that is an Excel file for reading in Python?

I’ve opened text files, for example, sometextfile.txt with the reading command. How do I do that for an Excel file?

asked Jul 13, 2010 at 16:26

Edit:
In the newer version of pandas, you can pass the sheet name as a parameter.

file_name =  # path to file + file name
sheet =  # sheet name or sheet number or list of sheet numbers and names

import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5))  # print first 5 rows of the dataframe

Check the docs for examples on how to pass sheet_name:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Old version:
you can use pandas package as well….

When you are working with an excel file with multiple sheets, you can use:

import pandas as pd
xl = pd.ExcelFile(path + filename)
xl.sheet_names

>>> [u'Sheet1', u'Sheet2', u'Sheet3']

df = xl.parse("Sheet1")
df.head()

df.head() will print first 5 rows of your Excel file

If you’re working with an Excel file with a single sheet, you can simply use:

import pandas as pd
df = pd.read_excel(path + filename)
print df.head()

answered Jun 25, 2013 at 7:16

Try the xlrd library.

[Edit] — from what I can see from your comment, something like the snippet below might do the trick. I’m assuming here that you’re just searching one column for the word ‘john’, but you could add more or make this into a more generic function.

from xlrd import open_workbook

book = open_workbook('simple.xls',on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)

        # Attempt to find a matching row (search the first column for 'john')
        rowIndex = -1
        for cell in sheet.col(0): # 
            if 'john' in cell.value:
                break

        # If we found the row, print it
        if row != -1:
            cells = sheet.row(row)
            for cell in cells:
                print cell.value

        book.unload_sheet(name)

answered Jul 13, 2010 at 16:29

Jon CageJon Cage

36k36 gold badges135 silver badges213 bronze badges

This isn’t as straightforward as opening a plain text file and will require some sort of external module since nothing is built-in to do this. Here are some options:

http://www.python-excel.org/

If possible, you may want to consider exporting the excel spreadsheet as a CSV file and then using the built-in python csv module to read it:

http://docs.python.org/library/csv.html

answered Jul 13, 2010 at 16:29

Donald MinerDonald Miner

38.5k8 gold badges90 silver badges117 bronze badges

There’s the openpxyl package:

>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print wb2.get_sheet_names()
['Sheet2', 'New Title', 'Sheet1']

>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way to load a worksheet
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
>>>     print row[0].value()

answered Jun 22, 2015 at 18:57

wordsforthewisewordsforthewise

13.1k5 gold badges83 silver badges113 bronze badges

This may help:

This creates a node that takes a 2D List (list of list items) and pushes them into the excel spreadsheet. make sure the IN[]s are present or will throw and exception.

this is a re-write of the Revit excel dynamo node for excel 2013 as the default prepackaged node kept breaking. I also have a similar read node. The excel syntax in Python is touchy.

thnx @CodingNinja — updated : )

###Export Excel - intended to replace malfunctioning excel node

import clr

clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
##AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel                            C:Program FilesMicrosoft OfficeOffice15EXCEL.EXE 
##Need to Verify interop for version 2015 is 15 and node attachemnt for it.
from Microsoft.Office.Interop import  * ##Excel
################################Initialize FP and Sheet ID
##Same functionality as the excel node
strFileName = IN[0]             ##Filename
sheetName = IN[1]               ##Sheet
RowOffset= IN[2]                ##RowOffset
ColOffset= IN[3]                ##COL OFfset
Data=IN[4]                      ##Data
Overwrite=IN[5]                 ##Check for auto-overwtite
XLVisible = False   #IN[6]      ##XL Visible for operation or not?

RowOffset=0
if IN[2]>0:
    RowOffset=IN[2]             ##RowOffset

ColOffset=0
if IN[3]>0:
    ColOffset=IN[3]             ##COL OFfset

if IN[6]<>False:
    XLVisible = True #IN[6]     ##XL Visible for operation or not?

################################Initialize FP and Sheet ID
xlCellTypeLastCell = 11                 #####define special sells value constant
################################
xls = Excel.ApplicationClass()          ####Connect with application
xls.Visible = XLVisible                 ##VISIBLE YES/NO
xls.DisplayAlerts = False               ### ALerts

import os.path

if os.path.isfile(strFileName):
    wb = xls.Workbooks.Open(strFileName, False)     ####Open the file 
else:
    wb = xls.Workbooks.add#         ####Open the file 
    wb.SaveAs(strFileName)
wb.application.visible = XLVisible      ####Show Excel
try:
    ws = wb.Worksheets(sheetName)       ####Get the sheet in the WB base

except:
    ws = wb.sheets.add()                ####If it doesn't exist- add it. use () for object method
    ws.Name = sheetName



#################################
#lastRow for iterating rows
lastRow=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
#lastCol for iterating columns
lastCol=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column
#######################################################################
out=[]                                  ###MESSAGE GATHERING

c=0
r=0
val=""
if Overwrite == False :                 ####Look ahead for non-empty cells to throw error
    for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
        for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
            if col.Value2 >"" :
                OUT= "ERROR- Cannot overwrite"
                raise ValueError("ERROR- Cannot overwrite")
##out.append(Data[0]) ##append mesage for error
############################################################################

for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
    for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
        ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 = col.__str__()

##run macro disbled for debugging excel macro
##xls.Application.Run("Align_data_and_Highlight_Issues")

answered Apr 29, 2018 at 20:03

Apsis0215Apsis0215

931 silver badge9 bronze badges

import pandas as pd 
import os 
files = os.listdir('path/to/files/directory/')
desiredFile = files[i]
filePath = 'path/to/files/directory/%s'
Ofile = filePath % desiredFile
xls_import = pd.read_csv(Ofile)

Now you can use the power of pandas DataFrames!

answered Dec 2, 2015 at 18:31

This code worked for me with Python 3.5.2. It opens and saves and excel. I am currently working on how to save data into the file but this is the code:

import csv
excel = csv.writer(open("file1.csv", "wb"))

Pikamander2

7,0323 gold badges47 silver badges68 bronze badges

answered Nov 3, 2017 at 21:33

Источник

Время на прочтение
10 мин

Количество просмотров 290K

Первая часть статьи была опубликована тут.

Как читать и редактировать Excel файлы при помощи openpyxl

ПЕРЕВОД
Оригинал статьи — www.datacamp.com/community/tutorials/python-excel-tutorial
Автор — Karlijn Willems

Эта библиотека пригодится, если вы хотите читать и редактировать файлы .xlsx, xlsm, xltx и xltm.

Установите openpyxl using pip. Общие рекомендации по установке этой библиотеки — сделать это в виртуальной среде Python без системных библиотек. Вы можете использовать виртуальную среду для создания изолированных сред Python: она создает папку, содержащую все необходимые файлы, для использования библиотек, которые потребуются для Python.

Перейдите в директорию, в которой находится ваш проект, и повторно активируйте виртуальную среду venv. Затем перейдите к установке openpyxl с помощью pip, чтобы убедиться, что вы можете читать и записывать с ним файлы:

# Activate virtualenv
$ source activate venv

# Install `openpyxl` in `venv`
$ pip install openpyxl

Теперь, когда вы установили openpyxl, вы можете начать загрузку данных. Но что именно это за данные? Например, в книге с данными, которые вы пытаетесь получить на Python, есть следующие листы:

Функция load_workbook () принимает имя файла в качестве аргумента и возвращает объект рабочей книги, который представляет файл. Это можно проверить запуском type (wb). Не забудьте убедиться, что вы находитесь в правильной директории, где расположена электронная таблица. В противном случае вы получите сообщение об ошибке при импорте.

# Import `load_workbook` module from `openpyxl`
from openpyxl import load_workbook

# Load in the workbook
wb = load_workbook('./test.xlsx')

# Get sheet names
print(wb.get_sheet_names())

Помните, вы можете изменить рабочий каталог с помощью os.chdir (). Фрагмент кода выше возвращает имена листов книги, загруженной в Python. Вы можете использовать эту информацию для получения отдельных листов книги. Также вы можете проверить, какой лист активен в настоящий момент с помощью wb.active. В приведенном ниже коде, вы также можете использовать его для загрузки данных на другом листе книги:

# Get a sheet by name 
sheet = wb.get_sheet_by_name('Sheet3')

# Print the sheet title 
sheet.title

# Get currently active sheet
anotherSheet = wb.active

# Check `anotherSheet` 
anotherSheet

На первый взгляд, с этими объектами Worksheet мало что можно сделать. Однако, можно извлекать значения из определенных ячеек на листе книги, используя квадратные скобки [], к которым нужно передавать точную ячейку, из которой вы хотите получить значение.

Обратите внимание, это похоже на выбор, получение и индексирование массивов NumPy и Pandas DataFrames, но это еще не все, что нужно сделать, чтобы получить значение. Нужно еще добавить значение атрибута:

# Retrieve the value of a certain cell
sheet['A1'].value

# Select element 'B2' of your sheet 
c = sheet['B2']

# Retrieve the row number of your element
c.row

# Retrieve the column letter of your element
c.column

# Retrieve the coordinates of the cell 
c.coordinate

Помимо value, есть и другие атрибуты, которые можно использовать для проверки ячейки, а именно row, column и coordinate:

Атрибут row вернет 2;
Добавление атрибута column к “С” даст вам «B»;
coordinate вернет «B2».

Вы также можете получить значения ячеек с помощью функции cell (). Передайте аргументы row и column, добавьте значения к этим аргументам, которые соответствуют значениям ячейки, которые вы хотите получить, и, конечно же, не забудьте добавить атрибут value:

# Retrieve cell value 
sheet.cell(row=1, column=2).value

# Print out values in column 2 
for i in range(1, 4):
     print(i, sheet.cell(row=i, column=2).value)

Обратите внимание: если вы не укажете значение атрибута value, вы получите <Cell Sheet3.B1>, который ничего не говорит о значении, которое содержится в этой конкретной ячейке.

Вы используете цикл с помощью функции range (), чтобы помочь вам вывести значения строк, которые имеют значения в столбце 2. Если эти конкретные ячейки пусты, вы получите None.
Более того, существуют специальные функции, которые вы можете вызвать, чтобы получить другие значения, например get_column_letter () и column_index_from_string.

В двух функциях уже более или менее указано, что вы можете получить, используя их. Но лучше всего сделать их явными: пока вы можете получить букву прежнего столбца, можно сделать обратное или получить индекс столбца, перебирая букву за буквой. Как это работает:

# Import relevant modules from `openpyxl.utils`
from openpyxl.utils import get_column_letter, column_index_from_string

# Return 'A'
get_column_letter(1)

# Return '1'
column_index_from_string('A')

Вы уже получили значения для строк, которые имеют значения в определенном столбце, но что нужно сделать, если нужно вывести строки файла, не сосредотачиваясь только на одном столбце?

Конечно, использовать другой цикл.

Например, вы хотите сосредоточиться на области, находящейся между «A1» и «C3», где первый указывает левый верхний угол, а второй — правый нижний угол области, на которой вы хотите сфокусироваться. Эта область будет так называемой cellObj, которую вы видите в первой строке кода ниже. Затем вы указываете, что для каждой ячейки, которая находится в этой области, вы хотите вывести координату и значение, которое содержится в этой ячейке. После окончания каждой строки вы хотите выводить сообщение-сигнал о том, что строка этой области cellObj была выведена.

# Print row per row
for cellObj in sheet['A1':'C3']:
      for cell in cellObj:
              print(cells.coordinate, cells.value)
      print('--- END ---')

Обратите внимание, что выбор области очень похож на выбор, получение и индексирование списка и элементы NumPy, где вы также используете квадратные скобки и двоеточие чтобы указать область, из которой вы хотите получить значения. Кроме того, вышеприведенный цикл также хорошо использует атрибуты ячейки!

Чтобы визуализировать описанное выше, возможно, вы захотите проверить результат, который вернет вам завершенный цикл:

('A1', u'M')
('B1', u'N')
('C1', u'O')
--- END ---
('A2', 10L)
('B2', 11L)
('C2', 12L)
--- END ---
('A3', 14L)
('B3', 15L)
('C3', 16L)
--- END ---

Наконец, есть некоторые атрибуты, которые вы можете использовать для проверки результата импорта, а именно max_row и max_column. Эти атрибуты, конечно, являются общими способами обеспечения правильной загрузки данных, но тем не менее в данном случае они могут и будут полезны.

# Retrieve the maximum amount of rows 
sheet.max_row

# Retrieve the maximum amount of columns
sheet.max_column

Это все очень классно, но мы почти слышим, что вы сейчас думаете, что это ужасно трудный способ работать с файлами, особенно если нужно еще и управлять данными.
Должно быть что-то проще, не так ли? Всё так!

Openpyxl имеет поддержку Pandas DataFrames. И можно использовать функцию DataFrame () из пакета Pandas, чтобы поместить значения листа в DataFrame:

# Import `pandas` 
import pandas as pd

# Convert Sheet to DataFrame
df = pd.DataFrame(sheet.values)
Если вы хотите указать заголовки и индексы, вам нужно добавить немного больше кода:
# Put the sheet values in `data`
data = sheet.values

# Indicate the columns in the sheet values
cols = next(data)[1:]

# Convert your data to a list
data = list(data)

# Read in the data at index 0 for the indices
idx = [r[0] for r in data]

# Slice the data at index 1 
data = (islice(r, 1, None) for r in data)

# Make your DataFrame
df = pd.DataFrame(data, index=idx, columns=cols)

Затем вы можете начать управлять данными при помощи всех функций, которые есть в Pandas. Но помните, что вы находитесь в виртуальной среде, поэтому, если библиотека еще не подключена, вам нужно будет установить ее снова через pip.

Чтобы записать Pandas DataFrames обратно в файл Excel, можно использовать функцию dataframe_to_rows () из модуля utils:

# Import `dataframe_to_rows`
from openpyxl.utils.dataframe import dataframe_to_rows

# Initialize a workbook 
wb = Workbook()

# Get the worksheet in the active workbook
ws = wb.active

# Append the rows of the DataFrame to your worksheet
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)

Но это определенно не все! Библиотека openpyxl предлагает вам высокую гибкость в отношении того, как вы записываете свои данные в файлы Excel, изменяете стили ячеек или используете режим только для записи. Это делает ее одной из тех библиотек, которую вам точно необходимо знать, если вы часто работаете с электронными таблицами.

И не забудьте деактивировать виртуальную среду, когда закончите работу с данными!

Теперь давайте рассмотрим некоторые другие библиотеки, которые вы можете использовать для получения данных в электронной таблице на Python.

Готовы узнать больше?

Чтение и форматирование Excel файлов xlrd
Эта библиотека идеальна, если вы хотите читать данные и форматировать данные в файлах с расширением .xls или .xlsx.

# Import `xlrd`
import xlrd

# Open a workbook 
workbook = xlrd.open_workbook('example.xls')

# Loads only current sheets to memory
workbook = xlrd.open_workbook('example.xls', on_demand = True)

Если вы не хотите рассматривать всю книгу, можно использовать такие функции, как sheet_by_name () или sheet_by_index (), чтобы извлекать листы, которые необходимо использовать в анализе.

# Load a specific sheet by name
worksheet = workbook.sheet_by_name('Sheet1')

# Load a specific sheet by index 
worksheet = workbook.sheet_by_index(0)

# Retrieve the value from cell at indices (0,0) 
sheet.cell(0, 0).value

Наконец, можно получить значения по определенным координатам, обозначенным индексами.
О том, как xlwt и xlutils, соотносятся с xlrd расскажем дальше.

Запись данных в Excel файл при помощи xlrd

Если нужно создать электронные таблицы, в которых есть данные, кроме библиотеки XlsxWriter можно использовать библиотеки xlwt. Xlwt идеально подходит для записи и форматирования данных в файлы с расширением .xls.

Когда вы вручную хотите записать в файл, это будет выглядеть так:

# Import `xlwt` 
import xlwt

# Initialize a workbook 
book = xlwt.Workbook(encoding="utf-8")

# Add a sheet to the workbook 
sheet1 = book.add_sheet("Python Sheet 1") 

# Write to the sheet of the workbook 
sheet1.write(0, 0, "This is the First Cell of the First Sheet") 

# Save the workbook 
book.save("spreadsheet.xls")

Если нужно записать данные в файл, то для минимизации ручного труда можно прибегнуть к циклу for. Это позволит немного автоматизировать процесс. Делаем скрипт, в котором создается книга, в которую добавляется лист. Далее указываем список со столбцами и со значениями, которые будут перенесены на рабочий лист.

Цикл for будет следить за тем, чтобы все значения попадали в файл: задаем, что с каждым элементом в диапазоне от 0 до 4 (5 не включено) мы собираемся производить действия. Будем заполнять значения строка за строкой. Для этого указываем row элемент, который будет “прыгать” в каждом цикле. А далее у нас следующий for цикл, который пройдется по столбцам листа. Задаем условие, что для каждой строки на листе смотрим на столбец и заполняем значение для каждого столбца в строке. Когда заполнили все столбцы строки значениями, переходим к следующей строке, пока не заполним все имеющиеся строки.

# Initialize a workbook
book = xlwt.Workbook()

# Add a sheet to the workbook
sheet1 = book.add_sheet("Sheet1")

# The data
cols = ["A", "B", "C", "D", "E"]
txt = [0,1,2,3,4]

# Loop over the rows and columns and fill in the values
for num in range(5):
      row = sheet1.row(num)
      for index, col in enumerate(cols):
          value = txt[index] + num
          row.write(index, value)

# Save the result
book.save("test.xls")

В качестве примера скриншот результирующего файла:

Теперь, когда вы видели, как xlrd и xlwt взаимодействуют вместе, пришло время посмотреть на библиотеку, которая тесно связана с этими двумя: xlutils.

Коллекция утилит xlutils

Эта библиотека в основном представляет собой набор утилит, для которых требуются как xlrd, так и xlwt. Включает в себя возможность копировать и изменять/фильтровать существующие файлы. Вообще говоря, оба этих случая подпадают теперь под openpyxl.

Использование pyexcel для чтения файлов .xls или .xlsx

Еще одна библиотека, которую можно использовать для чтения данных таблиц в Python — pyexcel. Это Python Wrapper, который предоставляет один API для чтения, обработки и записи данных в файлах .csv, .ods, .xls, .xlsx и .xlsm.

Чтобы получить данные в массиве, можно использовать функцию get_array (), которая содержится в пакете pyexcel:

# Import `pyexcel`
import pyexcel

# Get an array from the data
my_array = pyexcel.get_array(file_name="test.xls")
 
Также можно получить данные в упорядоченном словаре списков, используя функцию get_dict ():
# Import `OrderedDict` module 
from pyexcel._compact import OrderedDict

# Get your data in an ordered dictionary of lists
my_dict = pyexcel.get_dict(file_name="test.xls", name_columns_by_row=0)

# Get your data in a dictionary of 2D arrays
book_dict = pyexcel.get_book_dict(file_name="test.xls")

Однако, если вы хотите вернуть в словарь двумерные массивы или, иными словами, получить все листы книги в одном словаре, стоит использовать функцию get_book_dict ().

Имейте в виду, что обе упомянутые структуры данных, массивы и словари вашей электронной таблицы, позволяют создавать DataFrames ваших данных с помощью pd.DataFrame (). Это упростит обработку ваших данных!

Наконец, вы можете просто получить записи с pyexcel благодаря функции get_records (). Просто передайте аргумент file_name функции и обратно получите список словарей:

# Retrieve the records of the file
records = pyexcel.get_records(file_name="test.xls")

Записи файлов при помощи pyexcel

Так же, как загрузить данные в массивы с помощью этого пакета, можно также легко экспортировать массивы обратно в электронную таблицу. Для этого используется функция save_as () с передачей массива и имени целевого файла в аргумент dest_file_name:

# Get the data
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Save the array to a file
pyexcel.save_as(array=data, dest_file_name="array_data.xls")

Обратите внимание: если указать разделитель, то можно добавить аргумент dest_delimiter и передать символ, который хотите использовать, в качестве разделителя между “”.

Однако, если у вас есть словарь, нужно будет использовать функцию save_book_as (). Передайте двумерный словарь в bookdict и укажите имя файла, и все ОК:

# The data
2d_array_dictionary = {'Sheet 1': [
                                   ['ID', 'AGE', 'SCORE']
                                   [1, 22, 5],
                                   [2, 15, 6],
                                   [3, 28, 9]
                                  ],
                       'Sheet 2': [
                                    ['X', 'Y', 'Z'],
                                    [1, 2, 3],
                                    [4, 5, 6]
                                    [7, 8, 9]
                                  ],
                       'Sheet 3': [
                                    ['M', 'N', 'O', 'P'],
                                    [10, 11, 12, 13],
                                    [14, 15, 16, 17]
                                    [18, 19, 20, 21]
                                   ]}

# Save the data to a file                        
pyexcel.save_book_as(bookdict=2d_array_dictionary, dest_file_name="2d_array_data.xls")

Помните, что когда используете код, который напечатан в фрагменте кода выше, порядок данных в словаре не будет сохранен!

Чтение и запись .csv файлов

Если вы все еще ищете библиотеки, которые позволяют загружать и записывать данные в CSV-файлы, кроме Pandas, рекомендуем библиотеку csv:

# import `csv`
import csv

# Read in csv file 
for row in csv.reader(open('data.csv'), delimiter=','):
      print(row)
      
# Write csv file
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
outfile = open('data.csv', 'w')
writer = csv.writer(outfile, delimiter=';', quotechar='"')
writer.writerows(data)
outfile.close()

Обратите внимание, что NumPy имеет функцию genfromtxt (), которая позволяет загружать данные, содержащиеся в CSV-файлах в массивах, которые затем можно помещать в DataFrames.

Финальная проверка данных

Когда данные подготовлены, не забудьте последний шаг: проверьте правильность загрузки данных. Если вы поместили свои данные в DataFrame, вы можете легко и быстро проверить, был ли импорт успешным, выполнив следующие команды:

# Check the first entries of the DataFrame
df1.head()

# Check the last entries of the DataFrame
df1.tail()

Note: Используйте DataCamp Pandas Cheat Sheet, когда вы планируете загружать файлы в виде Pandas DataFrames.

Если данные в массиве, вы можете проверить его, используя следующие атрибуты массива: shape, ndim, dtype и т.д.:

# Inspect the shape 
data.shape

# Inspect the number of dimensions
data.ndim

# Inspect the data type
data.dtype

Что дальше?

Поздравляем, теперь вы знаете, как читать файлы Excel в Python Но импорт данных — это только начало рабочего процесса в области данных. Когда у вас есть данные из электронных таблиц в вашей среде, вы можете сосредоточиться на том, что действительно важно: на анализе данных.

Если вы хотите глубже погрузиться в тему — знакомьтесь с PyXll, которая позволяет записывать функции в Python и вызывать их в Excel.

Источник

Improve Article

Save Article

Like Article

Read

Discuss

Improve Article

Save Article

Like Article

One can retrieve information from a spreadsheet. Reading, writing, or modifying the data can be done in Python can be done in using different methods. Also, the user might have to go through various sheets and retrieve data based on some criteria or modify some rows and columns and do a lot of work. Here, we will see the different methods to read our excel file.

Required Module

pip install xlrd

Input File:

Method 1: Reading an excel file using Python using Pandas

In this method, We will first import the Pandas module then we will use Pandas to read our excel file. You can read more operations using the excel file using Pandas in this article. Click here

Python3

import pandas as pd

dataframe1 = pd.read_excel('book2.xlsx')

print(dataframe1)

Output:

Method 2: Reading an excel file using Python using openpyxl

The load_workbook() function opens the Books.xlsx file for reading. This file is passed as an argument to this function. The object of the dataframe.active has been created in the script to read the values of the max_row and the max_column properties. These values are used in the loops to read the content of the Books2.xlsx file. You can read other operations using openpyxl in this article.

Python3

import openpyxl

dataframe = openpyxl.load_workbook("Book2.xlsx")

dataframe1 = dataframe.active

for row in range(0, dataframe1.max_row):

for col in dataframe1.iter_cols(1, dataframe1.max_column):

print(col[row].value)

Output:

Method 3: Reading an excel file using Python using Xlwings

Xlwings can be used to insert data in an Excel file similarly as it reads from an Excel file. Data can be provided as a list or a single input to a certain cell or a selection of cells. You can read other operations using Xlwings in this article.

Python3

import xlwings as xw

ws = xw.Book("Book2.xlsx").sheets['Sheet1']

v1 = ws.range("A1:A7").value

print("Result:", v1, v2)

Output:

Result: ['Name  Age    Stream  Percentage', 
'0      Ankit   18      Math          95', 
'1      Rahul   19   Science          90', 
'2    Shaurya   20  Commerce          85', 
'3  Aishwarya   18      Math          80', 
'4   Priyanka   19   Science          75', 
None]

RECOMMENDED ARTICLE – How to Automate an Excel Sheet in Python?

Like Article

Save Article

Источник

Microsoft Excel is one of the most powerful spreadsheet software applications in the world, and it has become critical in all business processes. Companies across the world, both big and small, are using Microsoft Excel to store, organize, analyze, and visualize data.

As a data professional, when you combine Python with Excel, you create a unique data analysis bundle that unlocks the value of the enterprise data.

In this tutorial, we’re going to learn how to read and work with Excel files in Python.

After you finish this tutorial, you’ll understand the following:

Loading Excel spreadsheets into pandas DataFrames
Working with an Excel workbook with multiple spreadsheets
Combining multiple spreadsheets
Reading Excel files using the xlrd package

In this tutorial, we assume you know the fundamentals of pandas DataFrames. If you aren’t familiar with the pandas library, you might like to try our Pandas and NumPy Fundamentals – Dataquest.

Let’s dive in.

Reading Spreadsheets with Pandas

Technically, multiple packages allow us to work with Excel files in Python. However, in this tutorial, we’ll use pandas and xlrd libraries to interact with Excel workbooks. Essentially, you can think of a pandas DataFrame as a spreadsheet with rows and columns stored in Series objects. Traversability of Series as iterable objects allows us to grab specific data easily. Once we load an Excel workbook into a pandas DataFrame, we can perform any kind of data analysis on the data.

Before we proceed to the next step, let’s first download the following spreadsheet:

Sales Data Excel Workbook — xlsx ver.

The Excel workbook consists of two sheets that contain stationery sales data for 2020 and 2021.

NOTE

Although Excel spreadsheets can contain formula and also support formatting, pandas only imports Excel spreadsheets as flat files, and it doesn’t support spreadsheet formatting.

To import the Excel spreadsheet into a pandas DataFrame, first, we need to import the pandas package and then use the read_excel() method:

import pandas as pd
df = pd.read_excel('sales_data.xlsx')

display(df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2020-01-06	East	Jones	Pencil	95	1.99	189.05	True
1	2020-02-09	Central	Jardine	Pencil	36	4.99	179.64	True
2	2020-03-15	West	Sorvino	Pencil	56	2.99	167.44	True
3	2020-04-01	East	Jones	Binder	60	4.99	299.40	False
4	2020-05-05	Central	Jardine	Pencil	90	4.99	449.10	True
5	2020-06-08	East	Jones	Binder	60	8.99	539.40	True
6	2020-07-12	East	Howard	Binder	29	1.99	57.71	False
7	2020-08-15	East	Jones	Pencil	35	4.99	174.65	True
8	2020-09-01	Central	Smith	Desk	32	125.00	250.00	True
9	2020-10-05	Central	Morgan	Binder	28	8.99	251.72	True
10	2020-11-08	East	Mike	Pen	15	19.99	299.85	False
11	2020-12-12	Central	Smith	Pencil	67	1.29	86.43	False

If you want to load only a limited number of rows into the DataFrame, you can specify the number of rows using the nrows argument:

df = pd.read_excel('sales_data.xlsx', nrows=5)
display(df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2020-01-06	East	Jones	Pencil	95	1.99	189.05	True
1	2020-02-09	Central	Jardine	Pencil	36	4.99	179.64	True
2	2020-03-15	West	Sorvino	Pencil	56	2.99	167.44	True
3	2020-04-01	East	Jones	Binder	60	4.99	299.40	False
4	2020-05-05	Central	Jardine	Pencil	90	4.99	449.10	True

Skipping a specific number of rows from the begining of a spreadsheet or skipping over a list of particular rows is available through the skiprows argument, as follows:

df = pd.read_excel('sales_data.xlsx', skiprows=range(5))
display(df)

	2020-05-05 00:00:00	Central	Jardine	Pencil	90	4.99	449.1	True
0	2020-06-08	East	Jones	Binder	60	8.99	539.40	True
1	2020-07-12	East	Howard	Binder	29	1.99	57.71	False
2	2020-08-15	East	Jones	Pencil	35	4.99	174.65	True
3	2020-09-01	Central	Smith	Desk	32	125.00	250.00	True
4	2020-10-05	Central	Morgan	Binder	28	8.99	251.72	True
5	2020-11-08	East	Mike	Pen	15	19.99	299.85	False
6	2020-12-12	Central	Smith	Pencil	67	1.29	86.43	False

The code above skips the first five rows and returns the rest of the data. Instead, the following code returns all the rows except for those with the mentioned indices:

df = pd.read_excel('sales_data.xlsx', skiprows=[1, 4,7,10])
display(df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2020-02-09	Central	Jardine	Pencil	36	4.99	179.64	True
1	2020-03-15	West	Sorvino	Pencil	56	2.99	167.44	True
2	2020-05-05	Central	Jardine	Pencil	90	4.99	449.10	True
3	2020-06-08	East	Jones	Binder	60	8.99	539.40	True
4	2020-08-15	East	Jones	Pencil	35	4.99	174.65	True
5	2020-09-01	Central	Smith	Desk	32	125.00	250.00	True
6	2020-11-08	East	Mike	Pen	15	19.99	299.85	False
7	2020-12-12	Central	Smith	Pencil	67	1.29	86.43	False

Another useful argument is usecols, which allows us to select spreadsheet columns with their letters, names, or positional numbers. Let’s see how it works:

df = pd.read_excel('sales_data.xlsx', usecols='A:C,G')
display(df)

	OrderDate	Region	Rep	Total
0	2020-01-06	East	Jones	189.05
1	2020-02-09	Central	Jardine	179.64
2	2020-03-15	West	Sorvino	167.44
3	2020-04-01	East	Jones	299.40
4	2020-05-05	Central	Jardine	449.10
5	2020-06-08	East	Jones	539.40
6	2020-07-12	East	Howard	57.71
7	2020-08-15	East	Jones	174.65
8	2020-09-01	Central	Smith	250.00
9	2020-10-05	Central	Morgan	251.72
10	2020-11-08	East	Mike	299.85
11	2020-12-12	Central	Smith	86.43

In the code above, the string assigned to the usecols argument contains a range of columns with : plus column G separated by a comma. Also, we’re able to provide a list of column names and assign it to the usecols argument, as follows:

df = pd.read_excel('sales_data.xlsx', usecols=['OrderDate', 'Region', 'Rep', 'Total'])
display(df)

	OrderDate	Region	Rep	Total
0	2020-01-06	East	Jones	189.05
1	2020-02-09	Central	Jardine	179.64
2	2020-03-15	West	Sorvino	167.44
3	2020-04-01	East	Jones	299.40
4	2020-05-05	Central	Jardine	449.10
5	2020-06-08	East	Jones	539.40
6	2020-07-12	East	Howard	57.71
7	2020-08-15	East	Jones	174.65
8	2020-09-01	Central	Smith	250.00
9	2020-10-05	Central	Morgan	251.72
10	2020-11-08	East	Mike	299.85
11	2020-12-12	Central	Smith	86.43

The usecols argument accepts a list of column numbers, too. The following code shows how we can pick up specific columns using their indices:

df = pd.read_excel('sales_data.xlsx', usecols=[0, 1, 2, 6])
display(df)

	OrderDate	Region	Rep	Total
0	2020-01-06	East	Jones	189.05
1	2020-02-09	Central	Jardine	179.64
2	2020-03-15	West	Sorvino	167.44
3	2020-04-01	East	Jones	299.40
4	2020-05-05	Central	Jardine	449.10
5	2020-06-08	East	Jones	539.40
6	2020-07-12	East	Howard	57.71
7	2020-08-15	East	Jones	174.65
8	2020-09-01	Central	Smith	250.00
9	2020-10-05	Central	Morgan	251.72
10	2020-11-08	East	Mike	299.85
11	2020-12-12	Central	Smith	86.43

Working with Multiple Spreadsheets

Excel files or workbooks usually contain more than one spreadsheet. The pandas library allows us to load data from a specific sheet or combine multiple spreadsheets into a single DataFrame. In this section, we’ll explore how to use these valuable capabilities.

By default, the read_excel() method reads the first Excel sheet with the index 0. However, we can choose the other sheets by assigning a particular sheet name, sheet index, or even a list of sheet names or indices to the sheet_name argument. Let’s try it:

df = pd.read_excel('sales_data.xlsx', sheet_name='2021')
display(df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2021-01-15	Central	Gill	Binder	46	8.99	413.54	True
1	2021-02-01	Central	Smith	Binder	87	15.00	1305.00	True
2	2021-03-07	West	Sorvino	Binder	27	19.99	139.93	True
3	2021-04-10	Central	Andrews	Pencil	66	1.99	131.34	False
4	2021-05-14	Central	Gill	Pencil	53	1.29	68.37	False
5	2021-06-17	Central	Tom	Desk	15	125.00	625.00	True
6	2021-07-04	East	Jones	Pen Set	62	4.99	309.38	True
7	2021-08-07	Central	Tom	Pen Set	42	23.95	1005.90	True
8	2021-09-10	Central	Gill	Pencil	47	1.29	9.03	True
9	2021-10-14	West	Thompson	Binder	57	19.99	1139.43	False
10	2021-11-17	Central	Jardine	Binder	11	4.99	54.89	False
11	2021-12-04	Central	Jardine	Binder	94	19.99	1879.06	False

The code above reads the second spreadsheet in the workbook, whose name is 2021. As mentioned before, we also can assign a sheet position number (zero-indexed) to the sheet_name argument. Let’s see how it works:

df = pd.read_excel('sales_data.xlsx', sheet_name=1)
display(df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2021-01-15	Central	Gill	Binder	46	8.99	413.54	True
1	2021-02-01	Central	Smith	Binder	87	15.00	1305.00	True
2	2021-03-07	West	Sorvino	Binder	27	19.99	139.93	True
3	2021-04-10	Central	Andrews	Pencil	66	1.99	131.34	False
4	2021-05-14	Central	Gill	Pencil	53	1.29	68.37	False
5	2021-06-17	Central	Tom	Desk	15	125.00	625.00	True
6	2021-07-04	East	Jones	Pen Set	62	4.99	309.38	True
7	2021-08-07	Central	Tom	Pen Set	42	23.95	1005.90	True
8	2021-09-10	Central	Gill	Pencil	47	1.29	9.03	True
9	2021-10-14	West	Thompson	Binder	57	19.99	1139.43	False
10	2021-11-17	Central	Jardine	Binder	11	4.99	54.89	False
11	2021-12-04	Central	Jardine	Binder	94	19.99	1879.06	False

As you can see, both statements take in either the actual sheet name or sheet index to return the same result.

Sometimes, we want to import all the spreadsheets stored in an Excel file into pandas DataFrames simultaneously. The good news is that the read_excel() method provides this feature for us. In order to do this, we can assign a list of sheet names or their indices to the sheet_name argument. But there is a much easier way to do the same: to assign None to the sheet_name argument. Let’s try it:

all_sheets = pd.read_excel('sales_data.xlsx', sheet_name=None)

Before exploring the data stored in the all_sheets variable, let’s check its data type:

type(all_sheets)

dict

As you can see, the variable is a dictionary. Now, let’s reveal what is stored in this dictionary:

for key, value in all_sheets.items():
    print(key, type(value))

2020 <class 'pandas.core.frame.DataFrame'>
2021 <class 'pandas.core.frame.DataFrame'>

The code above shows that the dictionary’s keys are the Excel workbook sheet names, and its values are pandas DataFrames for each spreadsheet. To print out the content of the dictionary, we can use the following code:

for key, value in all_sheets.items():
    print(key)
    display(value)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2020-01-06	East	Jones	Pencil	95	1.99	189.05	True
1	2020-02-09	Central	Jardine	Pencil	36	4.99	179.64	True
2	2020-03-15	West	Sorvino	Pencil	56	2.99	167.44	True
3	2020-04-01	East	Jones	Binder	60	4.99	299.40	False
4	2020-05-05	Central	Jardine	Pencil	90	4.99	449.10	True
5	2020-06-08	East	Jones	Binder	60	8.99	539.40	True
6	2020-07-12	East	Howard	Binder	29	1.99	57.71	False
7	2020-08-15	East	Jones	Pencil	35	4.99	174.65	True
8	2020-09-01	Central	Smith	Desk	32	125.00	250.00	True
9	2020-10-05	Central	Morgan	Binder	28	8.99	251.72	True
10	2020-11-08	East	Mike	Pen	15	19.99	299.85	False
11	2020-12-12	Central	Smith	Pencil	67	1.29	86.43	False

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2021-01-15	Central	Gill	Binder	46	8.99	413.54	True
1	2021-02-01	Central	Smith	Binder	87	15.00	1305.00	True
2	2021-03-07	West	Sorvino	Binder	27	19.99	139.93	True
3	2021-04-10	Central	Andrews	Pencil	66	1.99	131.34	False
4	2021-05-14	Central	Gill	Pencil	53	1.29	68.37	False
5	2021-06-17	Central	Tom	Desk	15	125.00	625.00	True
6	2021-07-04	East	Jones	Pen Set	62	4.99	309.38	True
7	2021-08-07	Central	Tom	Pen Set	42	23.95	1005.90	True
8	2021-09-10	Central	Gill	Pencil	47	1.29	9.03	True
9	2021-10-14	West	Thompson	Binder	57	19.99	1139.43	False
10	2021-11-17	Central	Jardine	Binder	11	4.99	54.89	False
11	2021-12-04	Central	Jardine	Binder	94	19.99	1879.06	False

Combining Multiple Excel Spreadsheets into a Single Pandas DataFrame

Having one DataFrame per sheet allows us to have different columns or content in different sheets.

But what if we prefer to store all the spreadsheets’ data in a single DataFrame? In this tutorial, the workbook spreadsheets have the same columns, so we can combine them with the concat() method of pandas.

If you run the code below, you’ll see that the two DataFrames stored in the dictionary are concatenated:

combined_df = pd.concat(all_sheets.values(), ignore_index=True)
display(combined_df)

	OrderDate	Region	Rep	Item	Units	Unit Cost	Total	Shipped
0	2020-01-06	East	Jones	Pencil	95	1.99	189.05	True
1	2020-02-09	Central	Jardine	Pencil	36	4.99	179.64	True
2	2020-03-15	West	Sorvino	Pencil	56	2.99	167.44	True
3	2020-04-01	East	Jones	Binder	60	4.99	299.40	False
4	2020-05-05	Central	Jardine	Pencil	90	4.99	449.10	True
5	2020-06-08	East	Jones	Binder	60	8.99	539.40	True
6	2020-07-12	East	Howard	Binder	29	1.99	57.71	False
7	2020-08-15	East	Jones	Pencil	35	4.99	174.65	True
8	2020-09-01	Central	Smith	Desk	32	125.00	250.00	True
9	2020-10-05	Central	Morgan	Binder	28	8.99	251.72	True
10	2020-11-08	East	Mike	Pen	15	19.99	299.85	False
11	2020-12-12	Central	Smith	Pencil	67	1.29	86.43	False
12	2021-01-15	Central	Gill	Binder	46	8.99	413.54	True
13	2021-02-01	Central	Smith	Binder	87	15.00	1305.00	True
14	2021-03-07	West	Sorvino	Binder	27	19.99	139.93	True
15	2021-04-10	Central	Andrews	Pencil	66	1.99	131.34	False
16	2021-05-14	Central	Gill	Pencil	53	1.29	68.37	False
17	2021-06-17	Central	Tom	Desk	15	125.00	625.00	True
18	2021-07-04	East	Jones	Pen Set	62	4.99	309.38	True
19	2021-08-07	Central	Tom	Pen Set	42	23.95	1005.90	True
20	2021-09-10	Central	Gill	Pencil	47	1.29	9.03	True
21	2021-10-14	West	Thompson	Binder	57	19.99	1139.43	False
22	2021-11-17	Central	Jardine	Binder	11	4.99	54.89	False
23	2021-12-04	Central	Jardine	Binder	94	19.99	1879.06	False

Now the data stored in the combined_df DataFrame is ready for further processing or visualization. In the following piece of code, we’re going to create a simple bar chart that shows the total sales amount made by each representative. Let’s run it and see the output plot:

total_sales_amount = combined_df.groupby('Rep').Total.sum()
total_sales_amount.plot.bar(figsize=(10, 6))

Output

Reading Excel Files Using `xlrd`

Although importing data into a pandas DataFrame is much more common, another helpful package for reading Excel files in Python is xlrd. In this section, we’re going to scratch the surface of how to read Excel spreadsheets using this package.

NOTE

The xlrd package doesn’t support xlsx files due to a potential security vulnerability. So, we use the xls version of the sales data. You can download the xls version from the link below:
Sales Data Excel Workbook — xls ver.

Let’s see how it works:

import xlrd
excel_workbook = xlrd.open_workbook('sales_data.xls')

Above, the first line imports the xlrd package, then the open_workbook method reads the sales_data.xls file.

We can also open an individual sheet containing the actual data. There are two ways to do so: opening a sheet by index or by name. Let’s open the first sheet by index and the second one by name:

excel_worksheet_2020 = excel_workbook.sheet_by_index(0)
excel_worksheet_2021 = excel_workbook.sheet_by_name('2021')

Now, let’s see how we can print a cell value. The xlrd package provides a method called cell_value() that takes in two arguments: the cell’s row index and column index. Let’s explore it:

print(excel_worksheet_2020.cell_value(1, 3))

Pencil

We can see that the cell_value function returned the value of the cell at row index 1 (the 2nd row) and column index 3 (the 4th column).
Excel

The xlrd package provides two helpful properties: nrows and ncols, returning the number of nonempty spreadsheet’s rows and columns respectively:

print('Columns#:', excel_worksheet_2020.ncols)
print('Rows#:', excel_worksheet_2020.nrows)

Columns#: 8
Rows#: 13

Knowing the number of nonempty rows and columns in a spreadsheet helps us with iterating over the data using nested for loops. This makes all the Excel sheet data accessible via the cell_value() method.

Conclusion

This tutorial discussed how to load Excel spreadsheets into pandas DataFrames, work with multiple Excel sheets, and combine them into a single pandas DataFrame. We also explored the main aspects of the xlrd package as one of the simplest tools for accessing the Excel spreadsheets data.

Источник

Installation¶

Install openpyxl using pip. It is advisable to do this in a Python virtualenv
without system packages:

Note

There is support for the popular lxml library which will be used if it
is installed. This is particular useful when creating large files.

Warning

To be able to include images (jpeg, png, bmp,…) into an openpyxl file,
you will also need the “pillow” library that can be installed with:

or browse https://pypi.python.org/pypi/Pillow/, pick the latest version
and head to the bottom of the page for Windows binaries.

Working with a checkout¶

Sometimes you might want to work with the checkout of a particular version.
This may be the case if bugs have been fixed but a release has not yet been
made.

$ pip install -e hg+https://foss.heptapod.net/openpyxl/openpyxl/@3.1#egg=openpyxl

Create a workbook¶

There is no need to create a file on the filesystem to get started with openpyxl.
Just import the Workbook class and start work:

>>> from openpyxl import Workbook
>>> wb = Workbook()

A workbook is always created with at least one worksheet. You can get it by
using the Workbook.active property:

Note

This is set to 0 by default. Unless you modify its value, you will always
get the first worksheet by using this method.

You can create new worksheets using the Workbook.create_sheet() method:

>>> ws1 = wb.create_sheet("Mysheet") # insert at the end (default)
# or
>>> ws2 = wb.create_sheet("Mysheet", 0) # insert at first position
# or
>>> ws3 = wb.create_sheet("Mysheet", -1) # insert at the penultimate position

Sheets are given a name automatically when they are created.
They are numbered in sequence (Sheet, Sheet1, Sheet2, …).
You can change this name at any time with the Worksheet.title property:

Once you gave a worksheet a name, you can get it as a key of the workbook:

>>> ws3 = wb["New Title"]

You can review the names of all worksheets of the workbook with the
Workbook.sheetname attribute

>>> print(wb.sheetnames)
['Sheet2', 'New Title', 'Sheet1']

You can loop through worksheets

>>> for sheet in wb:
...     print(sheet.title)

You can create copies of worksheets within a single workbook:

Workbook.copy_worksheet() method:

>>> source = wb.active
>>> target = wb.copy_worksheet(source)

Note

Only cells (including values, styles, hyperlinks and comments) and
certain worksheet attributes (including dimensions, format and
properties) are copied. All other workbook / worksheet attributes
are not copied — e.g. Images, Charts.

You also cannot copy worksheets between workbooks. You cannot copy
a worksheet if the workbook is open in read-only or write-only
mode.

Playing with data¶

Accessing one cell¶

Now we know how to get a worksheet, we can start modifying cells content.
Cells can be accessed directly as keys of the worksheet:

This will return the cell at A4, or create one if it does not exist yet.
Values can be directly assigned:

There is also the Worksheet.cell() method.

This provides access to cells using row and column notation:

>>> d = ws.cell(row=4, column=2, value=10)

Note

When a worksheet is created in memory, it contains no cells. They are
created when first accessed.

Warning

Because of this feature, scrolling through cells instead of accessing them
directly will create them all in memory, even if you don’t assign them a value.

Something like

>>> for x in range(1,101):
...        for y in range(1,101):
...            ws.cell(row=x, column=y)

will create 100×100 cells in memory, for nothing.

Accessing many cells¶

Ranges of cells can be accessed using slicing:

>>> cell_range = ws['A1':'C2']

Ranges of rows or columns can be obtained similarly:

>>> colC = ws['C']
>>> col_range = ws['C:D']
>>> row10 = ws[10]
>>> row_range = ws[5:10]

You can also use the Worksheet.iter_rows() method:

>>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2):
...    for cell in row:
...        print(cell)
<Cell Sheet1.A1>
<Cell Sheet1.B1>
<Cell Sheet1.C1>
<Cell Sheet1.A2>
<Cell Sheet1.B2>
<Cell Sheet1.C2>

Likewise the Worksheet.iter_cols() method will return columns:

>>> for col in ws.iter_cols(min_row=1, max_col=3, max_row=2):
...     for cell in col:
...         print(cell)
<Cell Sheet1.A1>
<Cell Sheet1.A2>
<Cell Sheet1.B1>
<Cell Sheet1.B2>
<Cell Sheet1.C1>
<Cell Sheet1.C2>

Note

For performance reasons the Worksheet.iter_cols() method is not available in read-only mode.

If you need to iterate through all the rows or columns of a file, you can instead use the
Worksheet.rows property:

>>> ws = wb.active
>>> ws['C9'] = 'hello world'
>>> tuple(ws.rows)
((<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>),
(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>),
(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>),
(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>),
(<Cell Sheet.A5>, <Cell Sheet.B5>, <Cell Sheet.C5>),
(<Cell Sheet.A6>, <Cell Sheet.B6>, <Cell Sheet.C6>),
(<Cell Sheet.A7>, <Cell Sheet.B7>, <Cell Sheet.C7>),
(<Cell Sheet.A8>, <Cell Sheet.B8>, <Cell Sheet.C8>),
(<Cell Sheet.A9>, <Cell Sheet.B9>, <Cell Sheet.C9>))

or the Worksheet.columns property:

>>> tuple(ws.columns)
((<Cell Sheet.A1>,
<Cell Sheet.A2>,
<Cell Sheet.A3>,
<Cell Sheet.A4>,
<Cell Sheet.A5>,
<Cell Sheet.A6>,
...
<Cell Sheet.B7>,
<Cell Sheet.B8>,
<Cell Sheet.B9>),
(<Cell Sheet.C1>,
<Cell Sheet.C2>,
<Cell Sheet.C3>,
<Cell Sheet.C4>,
<Cell Sheet.C5>,
<Cell Sheet.C6>,
<Cell Sheet.C7>,
<Cell Sheet.C8>,
<Cell Sheet.C9>))

Note

For performance reasons the Worksheet.columns property is not available in read-only mode.

Values only¶

If you just want the values from a worksheet you can use the Worksheet.values property.
This iterates over all the rows in a worksheet but returns just the cell values:

for row in ws.values:
   for value in row:
     print(value)

Both Worksheet.iter_rows() and Worksheet.iter_cols() can
take the values_only parameter to return just the cell’s value:

>>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):
...   print(row)

(None, None, None)
(None, None, None)

Data storage¶

Once we have a Cell, we can assign it a value:

>>> c.value = 'hello, world'
>>> print(c.value)
'hello, world'

>>> d.value = 3.14
>>> print(d.value)
3.14

Saving to a file¶

The simplest and safest way to save a workbook is by using the
Workbook.save() method of the Workbook object:

>>> wb = Workbook()
>>> wb.save('balances.xlsx')

Warning

This operation will overwrite existing files without warning.

Note

The filename extension is not forced to be xlsx or xlsm, although you might have
some trouble opening it directly with another application if you don’t
use an official extension.

As OOXML files are basically ZIP files, you can also open it with your
favourite ZIP archive manager.

If required, you can specify the attribute wb.template=True, to save a workbook
as a template:

>>> wb = load_workbook('document.xlsx')
>>> wb.template = True
>>> wb.save('document_template.xltx')

Saving as a stream¶

If you want to save the file to a stream, e.g. when using a web application
such as Pyramid, Flask or Django then you can simply provide a
NamedTemporaryFile():

>>> from tempfile import NamedTemporaryFile
>>> from openpyxl import Workbook
>>> wb = Workbook()
>>> with NamedTemporaryFile() as tmp:
        wb.save(tmp.name)
        tmp.seek(0)
        stream = tmp.read()

Warning

You should monitor the data attributes and document extensions
for saving documents in the document templates and vice versa,
otherwise the result table engine can not open the document.

Note

The following will fail:

>>> wb = load_workbook('document.xlsx')
>>> # Need to save with the extension *.xlsx
>>> wb.save('new_document.xlsm')
>>> # MS Excel can't open the document
>>>
>>> # or
>>>
>>> # Need specify attribute keep_vba=True
>>> wb = load_workbook('document.xlsm')
>>> wb.save('new_document.xlsm')
>>> # MS Excel will not open the document
>>>
>>> # or
>>>
>>> wb = load_workbook('document.xltm', keep_vba=True)
>>> # If we need a template document, then we must specify extension as *.xltm.
>>> wb.save('new_document.xlsm')
>>> # MS Excel will not open the document

Loading from a file¶

You can use the openpyxl.load_workbook() to open an existing workbook:

>>> from openpyxl import load_workbook
>>> wb = load_workbook(filename = 'empty_book.xlsx')
>>> sheet_ranges = wb['range names']
>>> print(sheet_ranges['D18'].value)
3

Note

There are several flags that can be used in load_workbook.

data_only controls whether cells with formulae have either the
formula (default) or the value stored the last time Excel read the sheet.
keep_vba controls whether any Visual Basic elements are preserved or
not (default). If they are preserved they are still not editable.

Warning

openpyxl does currently not read all possible items in an Excel file so
shapes will be lost from existing files if they are opened and saved with
the same name.

Errors loading workbooks¶

Sometimes openpyxl will fail to open a workbook. This is usually because there is something wrong with the file.
If this is the case then openpyxl will try and provide some more information. Openpyxl follows the OOXML specification closely and will reject files that do not because they are invalid. When this happens you can use the exception from openpyxl to inform the developers of whichever application or library produced the file. As the OOXML specification is publicly available it is important that developers follow it.

You can find the spec by searching for ECMA-376, most of the implementation specifics are in Part 4.

This ends the tutorial for now, you can proceed to the Simple usage section

Источник

Method 1: Reading an excel file using Python using Pandas

Python3

Method 2: Reading an excel file using Python using openpyxl

Python3

Method 3: Reading an excel file using Python using Xlwings

Python3

Reading Spreadsheets with Pandas

Working with Multiple Spreadsheets

Combining Multiple Excel Spreadsheets into a Single Pandas DataFrame

Reading Excel Files Using xlrd

Conclusion

Installation¶

Working with a checkout¶

Create a workbook¶

Playing with data¶

Accessing one cell¶

Accessing many cells¶

Values only¶

Data storage¶

Saving to a file¶

Saving as a stream¶

Loading from a file¶

Errors loading workbooks¶

Reading Excel Files Using `xlrd`