Merge Excel files into one in Python

When working with Excel files, we often need to merge several workbooks into one. The traditional approach is a VBA macro inside Excel, which does the job but is a multi-step process that is not easy to understand. Another approach is to copy the contents of long Excel files into one by hand, which is time-consuming, tedious, and error-prone.

This task can be done quickly with a few lines of Python using the pandas module. First, we need to install the module with pip. pandas relies on the openpyxl package to read and write .xlsx files, so we install it as well.

Use the following command in the terminal:

pip install pandas openpyxl

Method 1: Using dataframe.append()

The pandas DataFrame.append() function appends the rows of another DataFrame to the end of the given DataFrame and returns a new DataFrame object. Columns that are not in the original DataFrame are added as new columns, and the new cells are populated with NaN. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on those versions use pandas.concat() (Method 2).

Syntax : DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

Parameters :

  • other : DataFrame or Series/dict-like object, or list of these
  • ignore_index : If True, do not use the index labels. default False.
  • verify_integrity : If True, raise ValueError on creating index with duplicates. default False.
  • sort : Sort columns if the columns of self and other are not aligned. default False.

Returns: appended DataFrame
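Because DataFrame.append() is not available in pandas 2.0 and later, the sketch below illustrates the same row-wise behaviour with pd.concat() instead. The two small frames and their column names are made up for illustration.

```python
import pandas as pd

# Two hypothetical frames that share only the "item" column.
first = pd.DataFrame({"item": ["apple", "pear"], "qty": [3, 5]})
second = pd.DataFrame({"item": ["plum"], "price": [0.4]})

# Row-wise concatenation; ignore_index=True renumbers the rows 0..n-1.
merged = pd.concat([first, second], ignore_index=True)

# Columns missing from one input are filled with NaN in the result.
print(merged)
```

The result has three rows and the columns item, qty, and price, with NaN wherever an input frame had no value for that column.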

Example:

Excel Used: FoodSales1-1, FoodSales2-1
 

Python3

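import glob

import pandas as pd

# Folder that contains the Excel files to merge
path = "C:/downloads"

# Paths of all .xlsx files in that folder
file_list = glob.glob(path + "/*.xlsx")

# Read each workbook (first sheet only) into a DataFrame
excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

# Append the DataFrames one by one.
# DataFrame.append() needs pandas < 2.0; it was removed in pandas 2.0.
excl_merged = pd.DataFrame()
for excl_file in excl_list:
    excl_merged = excl_merged.append(excl_file, ignore_index=True)

# Write the merged data to a new workbook
excl_merged.to_excel('total_food_sales.xlsx', index=False)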

Output :

‘total_food_sales.xlsx’

Method 2: Using pandas.concat()

The pandas.concat() function does all the heavy lifting of concatenating pandas objects along an axis, with optional set logic (union or intersection) applied to the indexes (if any) on the other axes.

Syntax: concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)

Parameters:

  • objs: Series or DataFrame objects
  • axis: axis to concatenate along; default = 0 //along rows
  • join: way to handle indexes on other axis; default = ‘outer’
  • ignore_index: if True, do not use the index values along the concatenation axis; default = False
  • keys: sequence to add an identifier to the result indexes; default = None
  • levels: specific levels (unique values) to use for constructing a MultiIndex; default = None
  • names: names for the levels in the resulting hierarchical index; default = None
  • verify_integrity: check whether the new concatenated axis contains duplicates; default = False
  • sort: sort non-concatenation axis if it is not already aligned when join is ‘outer’; default = False
  • copy: if False, do not copy data unnecessarily; default = True

Returns: a pandas dataframe with concatenated data.
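To make a few of these parameters concrete, here is a minimal sketch of join, ignore_index, keys, and names. The frames, the column names, and the "BankA"/"BankB" labels are illustrative assumptions, not data from the example files.

```python
import pandas as pd

a = pd.DataFrame({"date": ["2020-01-01"], "close": [10.0]})
b = pd.DataFrame({"date": ["2020-01-02"], "close": [11.5], "volume": [900]})

# join="outer" (the default) keeps the union of columns, filling gaps with NaN.
outer = pd.concat([a, b], ignore_index=True)

# join="inner" keeps only the columns present in every input.
inner = pd.concat([a, b], join="inner", ignore_index=True)

# keys= labels each input; the labels become the outer level of a MultiIndex.
labelled = pd.concat([a, b], keys=["BankA", "BankB"], names=["bank", "row"])

print(outer.columns.tolist())
print(inner.columns.tolist())
print(labelled.index.tolist())
```

Using keys is one way to keep track of which input each row came from after the files have been stacked.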

Example:

In the last example, we worked with only two Excel files of a few rows each. Let’s try merging more files, each containing approximately 5000 rows and 7 columns. We have 5 files (BankE, BankD, BankC, BankB, BankA) holding historical stock data for the respective banks. Let’s merge them into a single ‘Bank_Stocks.xlsx’ file using the pandas.concat() method.

Python3

import glob

import pandas as pd

path = "C:/downloads"

file_list = glob.glob(path + "/*.xlsx")

excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

excl_merged = pd.concat(excl_list, ignore_index=True)

excl_merged.to_excel('Bank_Stocks.xlsx', index=False)

Output :

Bank_Stocks.xlsx

Solution with openpyxl only (without a bunch of other dependencies).

This script should take care of merging together an arbitrary number of xlsx documents, whether they have one or multiple sheets. It will preserve the formatting.

There’s a function to copy sheets in openpyxl, but it is only from/to the same file. There’s also a function insert_rows somewhere, but by itself it won’t insert any rows. So I’m afraid we are left to deal (tediously) with one cell at a time.

As much as I dislike using for loops and would rather use something compact and elegant like list comprehension, I don’t see how to do that here as this is a side-effect show.

Credit to this answer on copying between workbooks.

#!/usr/bin/env python3

#USAGE
#mergeXLSX.py <a bunch of .xlsx files> ... output.xlsx
#
#where output.xlsx is the unified file

#This works FROM/TO the xlsx format. Libreoffice might help to convert from xls.
#localc --headless  --convert-to xlsx somefile.xls

import sys
from copy import copy

from openpyxl import load_workbook,Workbook

def createNewWorkbook(manyWb):
    for wb in manyWb:
        for sheetName in wb.sheetnames:
            o = theOne.create_sheet(sheetName)
            safeTitle = o.title
            copySheet(wb[sheetName],theOne[safeTitle])

def copySheet(sourceSheet,newSheet):
    for row in sourceSheet.rows:
        for cell in row:
            newCell = newSheet.cell(row=cell.row, column=cell.col_idx,
                    value= cell.value)
            if cell.has_style:
                newCell.font = copy(cell.font)
                newCell.border = copy(cell.border)
                newCell.fill = copy(cell.fill)
                newCell.number_format = copy(cell.number_format)
                newCell.protection = copy(cell.protection)
                newCell.alignment = copy(cell.alignment)

filesInput = sys.argv[1:]
theOneFile = filesInput.pop(-1)
myfriends = [ load_workbook(f) for f in filesInput ]

#try this if you are bored
#myfriends = [ load_workbook(f) for k in range(200) for f in filesInput ]

theOne = Workbook()
del theOne['Sheet'] #We want our new book to be empty. Thanks.
createNewWorkbook(myfriends)
theOne.save(theOneFile)

Tested with openpyxl 2.5.4, python 3.4.
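The script reads back o.title after create_sheet() instead of reusing sheetName. A small in-memory sketch of why that matters when two input workbooks contain sheets with the same name (the sheet name "Data" is just an example):

```python
from openpyxl import Workbook

wb = Workbook()
del wb["Sheet"]  # drop the default empty sheet, as the script above does

first = wb.create_sheet("Data")
second = wb.create_sheet("Data")  # same requested name

# openpyxl adjusts the second title so that sheet names stay unique,
# so the sheet must be looked up by the title it actually received.
print(first.title, second.title)
print(wb.sheetnames)
```

Looking the sheet up with theOne[sheetName] would return the first "Data" sheet both times, so the second workbook's cells would overwrite the first one's.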

Description

We have several Excel workbooks of the same kind. Each workbook contains several sheets with an identical structure. We need to combine all the workbooks, and all the sheets in them, into a single table in one pass.

We will solve this task with the pandas module.

Solution

First, import the required modules.

# Import modules
import pandas as pd
import os
import glob

Set the directory that contains the files.

# Change the working directory
os.chdir('data')

Build the list of workbooks to combine.

# List of Excel files to combine
xl_files = glob.glob('*.xlsx')

Create the DataFrame that will hold the tables.

# Empty DataFrame that will collect every sheet of every workbook
combined = pd.DataFrame()

Read each file and combine all the tables.

# Loop over the files
for xl_file in xl_files:
    # Create an ExcelFile object
    xl_file_obj = pd.ExcelFile(xl_file)
    # Loop over the sheets
    for sheet_name in xl_file_obj.sheet_names:
        # Read the current sheet of the workbook
        data = pd.read_excel(xl_file_obj,
                             sheet_name=sheet_name)
        # Add a column with the workbook name
        data['workbook'] = xl_file
        # Add a column with the sheet name
        data['sheet'] = sheet_name
        # Append to the combined DataFrame
        # (DataFrame.append was removed in pandas 2.0; there, use
        # combined = pd.concat([combined, data]) instead)
        combined = combined.append(data)

Write the result to an Excel workbook.

combined.to_excel('sales_combined.xlsx',
                  index=False)
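For pandas 2.0 and later, the same idea can be sketched without DataFrame.append by collecting the sheets in a list and calling pd.concat() once. The helper name combine_sheets and the demo workbook names below are hypothetical. The input is shaped like the dict that pd.read_excel(path, sheet_name=None) returns, and is built in memory here so the sketch runs without any files.

```python
import pandas as pd


def combine_sheets(workbooks):
    """Stack every sheet of every workbook into one DataFrame.

    `workbooks` maps a workbook name to a {sheet name: DataFrame} dict,
    the shape returned by pd.read_excel(path, sheet_name=None).
    """
    frames = []
    for workbook_name, sheets in workbooks.items():
        for sheet_name, data in sheets.items():
            # assign() returns a copy, so the input frames stay untouched.
            frames.append(data.assign(workbook=workbook_name, sheet=sheet_name))
    # One concat call at the end avoids re-copying the data on every iteration.
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for two workbooks (hypothetical names and values).
demo = {
    "north.xlsx": {"Jan": pd.DataFrame({"sales": [1, 2]}),
                   "Feb": pd.DataFrame({"sales": [3]})},
    "south.xlsx": {"Jan": pd.DataFrame({"sales": [4]})},
}
combined = combine_sheets(demo)
print(combined)
```

With real files, the input could be built as {path: pd.read_excel(path, sheet_name=None) for path in glob.glob('*.xlsx')}.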

Functions used

  • os.getcwd
  • os.chdir
  • glob.glob
  • pandas.ExcelFile
  • pandas.DataFrame
  • pandas.read_excel
  • pandas.DataFrame.append
  • pandas.DataFrame.to_excel

By Lenin Mishra in python, Jun 21, 2022

Combine multiple Excel files into one using Openpyxl module in Python 3.

Combining multiple Excel sheets into one in Python

Prerequisites

  1. Reading Excel data with Openpyxl
  2. Writing to Excel with Openpyxl

Let’s say you have a directory with multiple Excel files containing sales data of every month.

Objective: You would like to store the data from all those separate Excel files as separate sheets in one Excel workbook named “yearly_sales.xlsx”.

Steps to achieve the objective

  1. Find the absolute path of the Excel files.
  2. Iterate through each file and create a sheet of the same name in your destination file “yearly_sales.xlsx”.
  3. Copy the data from the Excel file to the sheet.

Step 1 — Finding the absolute path of the Excel files

There are multiple ways to find the absolute path of files in a directory in Python. For this article, we will use os.walk() function.

Code

import os

# Raw string, so that backslashes such as \U are not treated as escape sequences
dir_containing_files = r"C:\Users\91824\PycharmProjects\pythonProject\sales_2020"

for root, dir, filenames in os.walk(dir_containing_files):
    for file in filenames:
        file_name = file.split('.')[0]
        # Absolute Path for Excel files
        file_path = os.path.abspath(os.path.join(root, file))
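A slightly more defensive variant of the walk above, as a sketch. The helper name find_excel_files is hypothetical. It keeps only .xlsx files, skips the "~$" lock files that Excel creates while a workbook is open, and uses os.path.splitext so that a name such as "sales.2020.xlsx" keeps its full stem.

```python
import os


def find_excel_files(directory):
    """Return (sheet_name, absolute_path) pairs for every .xlsx file found."""
    found = []
    for root, _dirs, filenames in os.walk(directory):
        for filename in sorted(filenames):
            stem, ext = os.path.splitext(filename)
            # Skip non-Excel files and Excel's "~$" lock files.
            if ext.lower() != ".xlsx" or filename.startswith("~$"):
                continue
            found.append((stem, os.path.abspath(os.path.join(root, filename))))
    return found
```

Each returned pair can be used in the next steps: the stem as the name of the new sheet and the path as the file to load.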

Step 2 — Creating sheets

First you need to create an Excel workbook — “yearly_sales.xlsx”. Use the file_name from the above step to create new sheets in “yearly_sales.xlsx”.

Code

import os
#=====New Code====#
from openpyxl import Workbook
#=================#

# Raw string, so that backslashes such as \U are not treated as escape sequences
dir_containing_files = r"C:\Users\91824\PycharmProjects\pythonProject\sales_2020"

#=====New Code====#
dest_wb = Workbook()
#=================#

for root, dir, filenames in os.walk(dir_containing_files):
    for file in filenames:
        file_name = file.split('.')[0]
        # Absolute Path for Excel files
        file_path = os.path.abspath(os.path.join(root, file))

        #=====New Code====#

        # Create new sheet in destination Workbook
        dest_wb.create_sheet(file_name)
        dest_ws = dest_wb[file_name]
        #=================#

#=====New Code====#
dest_wb.save("yearly_sales.xlsx")
#=================#


Step 3 — Copying data to sheets

The final step is to copy data from each of those Excel files to the newly created sheets in “yearly_sales.xlsx”.

Code

import os
from openpyxl import Workbook

#=====New Code====#
from openpyxl import load_workbook
#=================#

# Raw string, so that backslashes such as \U are not treated as escape sequences
dir_containing_files = r"C:\Users\91824\PycharmProjects\pythonProject\sales_2020"

dest_wb = Workbook()

for root, dir, filenames in os.walk(dir_containing_files):
    for file in filenames:
        file_name = file.split('.')[0]
        # Absolute Path for Excel files
        file_path = os.path.abspath(os.path.join(root, file))

        # Create new sheet in destination Workbook
        dest_wb.create_sheet(file_name)
        dest_ws = dest_wb[file_name]

        # =====New Code====#

        # Read source data
        source_wb = load_workbook(file_path)
        source_sheet = source_wb.active
        for row in source_sheet.rows:
            for cell in row:
                dest_ws[cell.coordinate] = cell.value
        # =================#

dest_wb.save("yearly_sales.xlsx")

All the data should be copied to the sheets in “yearly_sales.xlsx”.
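One detail worth noting: Workbook() starts with an empty sheet named "Sheet", which will also end up in the destination file unless it is removed. The sketch below shows the copy step on in-memory workbooks with the default sheet dropped first. The helper name copy_values and the sample data are made up for illustration.

```python
from openpyxl import Workbook


def copy_values(source_ws, dest_ws):
    """Copy cell values (not styles) from one worksheet to another."""
    for row in source_ws.iter_rows():
        for cell in row:
            dest_ws[cell.coordinate] = cell.value


# In-memory stand-in for one monthly file (hypothetical data).
source_wb = Workbook()
source_ws = source_wb.active
source_ws.append(["product", "units"])
source_ws.append(["widget", 12])

dest_wb = Workbook()
# Remove the default "Sheet" so the destination only holds copied sheets.
dest_wb.remove(dest_wb.active)
dest_ws = dest_wb.create_sheet("january")
copy_values(source_ws, dest_ws)

print(dest_wb.sheetnames)
print([[c.value for c in row] for row in dest_ws.iter_rows()])
```

Only values are copied here. Fonts, fills, and number formats would need to be copied per cell as well.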

Last updated on April 9, 2023

In this short tutorial, I’ll show you how to use Python to combine multiple Excel files into one master spreadsheet. Imagine that you have dozens of Excel files with the same data fields, and your job is to aggregate sheets from those files. Manually doing this job is super inefficient, and Python will save you a lot of time in the long run, so let’s all work smarter!

Note that this article talks about appending Excel files with the same format/data fields. Merging multiple datasets is a different task.

If you are new to Python, this series Integrate Python with Excel offers some tips on how to use Python to supercharge your Excel spreadsheets.

The workflow

To solve the problem, we’ll need to follow the workflow below:

  1. Identify the files we need to combine
  2. Get data from the file
  3. Move data from step 2) to a master dataset (we will call it “dataframe”)
  4. Repeat steps 2-3 for each file
  5. Save the master dataset into an Excel spreadsheet

Import libraries

Alright, let’s see how to code the above workflow in Python. For this exercise, we’ll need to use two Python libraries: os and pandas. If you want to follow along, feel free to grab the source code and files used in this tutorial from here. Although you can combine as many Excel files as you wish, we’ll use three files to demonstrate the process.

If you need help with installing Python or libraries, here’s a guide on how to do that.

The os library provides a way to use operating-system-dependent functionality, such as manipulating folder and file paths. We use this library to get all the Excel file names, including their paths.

The pandas library is a widely used tool for data analysis and manipulation. We use this library to load Excel data into Python, manipulate the data, and create the master spreadsheet.

We’ll start by importing these two libraries. Then find the current working directory, as well as all the file names within the directory.

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd) 

The variable cwd shows the path to the current working directory, and the variable files is a list of all the file names within the current working directory. Notice there are non-Excel files, and we don’t want to open those, so we’ll handle that soon.

Next, we create an empty dataframe df for storing the data for master spreadsheet. We loop through all the files within the current working directory, but only process the Excel files whose name ends with “.xlsx”. This is done by this line of code
if file.endswith('.xlsx'):

pd.read_excel() will read Excel data into Python and store it as a pandas DataFrame object. Be aware that this method reads only the first tab/sheet of the Excel file by default. If your Excel file contains more than 1 sheet, continue reading to the next section.

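df.append() appends the data from one file to the master dataframe. Think of copying a block of data from one Excel file and pasting it into another, except that instead of opening Excel, the data is held in your computer’s memory. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat([df, new_data]) does the same job.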

df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        # pd.concat() replaces df.append(), which was removed in pandas 2.0
        df = pd.concat([df, pd.read_excel(file)], ignore_index=True)
df.head()

The above code does the following:

  1. Loop through all the files in the current working directory, determine if a file is Excel by checking the file name ends with “.xlsx”.
  2. If yes, read the file content (data), and append/add it to the master dataframe variable called df.
  3. Save the master dataframe into an Excel spreadsheet.

We can examine the master dataframe by checking df.head(), which shows the first 5 rows of the data.

Seems good! Just another quick check to make sure we have loaded everything in the DataFrame. df.shape will show us the dimension (36 rows, 5 columns) of the data:
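This check can also be automated. A minimal sketch, assuming the per-file DataFrames are kept in a list (called parts here, with made-up data standing in for the pd.read_excel() results):

```python
import pandas as pd

# Hypothetical per-file frames standing in for the pd.read_excel() results.
parts = [
    pd.DataFrame({"order_id": [1, 2], "amount": [9.5, 3.0]}),
    pd.DataFrame({"order_id": [3], "amount": [7.25]}),
]
df = pd.concat(parts, ignore_index=True)

expected_rows = sum(len(part) for part in parts)
rows, cols = df.shape  # (number of rows, number of columns)

# Every input row should appear exactly once in the combined frame.
assert rows == expected_rows, f"expected {expected_rows} rows, got {rows}"
print(rows, cols)
```

If the column count is larger than expected, the input files probably do not all share the same headers.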

Everything looks good, so let’s output the data back into Excel. The last line df.to_excel() will do that.

Combine multiple sheets from the same Excel file

I talked about the two techniques to read multiple sheets from the same Excel file, so I won’t repeat it. However, I’ll walk through an example here with a slightly different setting.

We have 2 files, each containing a number of sheets. We don’t know how many sheets are in each file, but we know the format is the same for all sheets. Our goal is to aggregate all sheets into one spreadsheet (and one file).

The workflow is similar:

  1. Get all Excel files
  2. Loop through the Excel files
  3. For each file, loop through all sheets
  4. Read each sheet into a dataframe, then combine all dataframes together.

df_total = pd.DataFrame()
for file in files:  # loop through Excel files
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)
        sheets = excel_file.sheet_names
        for sheet in sheets: # loop through sheets inside an Excel file
            df = excel_file.parse(sheet_name = sheet)
            # pd.concat() replaces df_total.append(), which was removed in pandas 2.0
            df_total = pd.concat([df_total, df])
df_total.to_excel('combined_file.xlsx')

Putting it all together

Below is the full code put together. 10 lines of code will help you combine all your Excel files or sheets into one master spreadsheet. Enjoy!

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd)  

## Method 1 gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = pd.concat([df, pd.read_excel(file)], ignore_index=True)  # df.append() was removed in pandas 2.0
df.head() 
df.to_excel('total_sales.xlsx')



## Method 2 gets all sheets of a given file
df_total = pd.DataFrame()
for file in files:                         # loop through Excel files
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)
        sheets = excel_file.sheet_names
        for sheet in sheets:               # loop through sheets inside an Excel file
            df = excel_file.parse(sheet_name = sheet)
            df_total = pd.concat([df_total, df])  # df_total.append() was removed in pandas 2.0
df_total.to_excel('combined_file.xlsx')
