dreftymac
31.1k26 gold badges118 silver badges181 bronze badges
asked Oct 10, 2019 at 15:28
Here is one way to do it using XlsxWriter:
import pandas as pd
# Create a Pandas dataframe from some data.
data = [10, 20, 30, 40, 50, 60, 70, 80]
df = pd.DataFrame({'Rank': data,
'Country': data,
'Population': data,
'Data1': data,
'Data2': data})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_table.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object. Turn off the default
# header and index and skip one row to allow us to insert a user defined
# header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df.shape
# Create a list of column headers, to use in add_table().
column_settings = []
for header in df.columns:
column_settings.append({'header': header})
# Add the table.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
# Make the columns wider for clarity.
worksheet.set_column(0, max_col - 1, 12)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Output:
Update: I’ve added a similar example to the XlsxWriter docs: Example: Pandas Excel output with a worksheet table
answered Aug 10, 2020 at 19:54
jmcnamarajmcnamara
37k6 gold badges86 silver badges105 bronze badges
1
You can’t do it with to_excel
. A workaround is to open the generated xlsx file and add the table there with openpyxl:
import pandas as pd
df = pd.DataFrame({'Col1': [1,2,3], 'Col2': list('abc')})
filename = 'so58326392.xlsx'
sheetname = 'mySheet'
with pd.ExcelWriter(filename) as writer:
if not df.index.name:
df.index.name = 'Index'
df.to_excel(writer, sheet_name=sheetname)
import openpyxl
wb = openpyxl.load_workbook(filename = filename)
tab = openpyxl.worksheet.table.Table(displayName="df", ref=f'A1:{openpyxl.utils.get_column_letter(df.shape[1])}{len(df)+1}')
wb[sheetname].add_table(tab)
wb.save(filename)
Please note the all table headers must be strings. If you have an un-named index (which is the rule) the first cell (A1) will be empty which leads to file corruption. To avoid this give your index a name (as shown above) or export the dataframe without the index using:
df.to_excel(writer, sheet_name=sheetname, index=False)
answered Oct 10, 2019 at 16:37
StefStef
28.2k2 gold badges23 silver badges51 bronze badges
3
Another workaround, if you don’t want to save, re-open, and re-save, is to use xlsxwriter. It can write ListObject tables directly, but does not do so directly from a dataframe, so you need to break out the parts:
import pandas as pd
import xlsxwriter as xl
df = pd.DataFrame({'Col1': [1,2,3], 'Col2': list('abc')})
filename = 'output.xlsx'
sheetname = 'Table'
tablename = 'TEST'
(rows, cols) = df.shape
data = df.to_dict('split')['data']
headers = []
for col in df.columns:
headers.append({'header':col})
wb = xl.Workbook(filename)
ws = wb.add_worksheet()
ws.add_table(0, 0, rows, cols-1,
{'name': tablename
,'data': data
,'columns': headers})
wb.close()
The add_table()
function expects 'data'
as a list of lists, where each sublist represents a row of the dataframe, and 'columns'
as a list of dicts for the header where each column is specified by a dictionary of the form {'header': 'ColumnName'}
.
answered Aug 10, 2020 at 19:17
Rob BulmahnRob Bulmahn
9957 silver badges10 bronze badges
I created a package to write properly formatted excel tables from pandas: pandas-xlsx-tables
from pandas_xlsx_tables import df_to_xlsx_table
import pandas as pd
data = [10, 20, 30, 40, 50, 60, 70, 80]
df = pd.DataFrame({'Rank': data,
'Country': data,
'Population': data,
'Strings': [f"n{n}" for n in data],
'Datetimes': [pd.Timestamp.now() for _ in range(len(data))]})
df_to_xlsx_table(df, "my_table", index=False, header_orientation="diagonal")
You can also do the reverse with xlsx_table_to_df
answered Oct 22, 2021 at 19:05
Thijs DThijs D
7623 silver badges20 bronze badges
1
Based on the answer of @jmcnamara, but as a convenient function and using «with» statement:
import pandas as pd
def to_excel(df:pd.DataFrame, excel_name: str, sheet_name: str, startrow=1, startcol=0):
""" Exports pandas dataframe as a formated excel table """
with pd.ExcelWriter(excel_name, engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name=sheet_name, startrow=startrow, startcol=startcol, header=True, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
max_row, max_col = df.shape
olumn_settings = [{'header': header} for header in df.columns]
worksheet.add_table(startrow, startcol, max_row+startrow, max_col+startcol-1, {'columns': column_settings})
# style columns
worksheet.set_column(startcol, max_col + startcol, 21)
answered Sep 21, 2022 at 21:16
Ziur OlpaZiur Olpa
1,5221 gold badge11 silver badges25 bronze badges
Write Excel with Python Pandas. You can write any data (lists, strings, numbers etc) to Excel, by first converting it into a Pandas DataFrame and then writing the DataFrame to Excel.
To export a Pandas DataFrame as an Excel file (extension: .xlsx, .xls), use the to_excel()
method.
Related course: Data Analysis with Python Pandas
installxlwt, openpyxl
to_excel()
uses a library called xlwt and openpyxl internally.
- xlwt is used to write .xls files (formats up to Excel2003)
- openpyxl is used to write .xlsx (Excel2007 or later formats).
Both can be installed with pip. (pip3 depending on the environment)
1 |
$ pip install xlwt |
Write Excel
Write DataFrame to Excel file
Importing openpyxl is required if you want to append it to an existing Excel file described at the end.
A dataframe is defined below:
1 |
import pandas as pd |
You can specify a path as the first argument of the to_excel() method
.
Note: that the data in the original file is deleted when overwriting.
The argument new_sheet_name
is the name of the sheet. If omitted, it will be named Sheet1
.
1 |
df.to_excel('pandas_to_excel.xlsx', sheet_name='new_sheet_name') |
Related course: Data Analysis with Python Pandas
If you do not need to write index (row name), columns (column name), the argument index, columns is False.
1 |
df.to_excel('pandas_to_excel_no_index_header.xlsx', index=False, header=False) |
Write multiple DataFrames to Excel files
The ExcelWriter object allows you to use multiple pandas. DataFrame objects can be exported to separate sheets.
As an example, pandas. Prepare another DataFrame object.
1 |
df2 = df[['a', 'c']] |
Then use the ExcelWriter() function like this:
1 |
with pd.ExcelWriter('pandas_to_excel.xlsx') as writer: |
You don’t need to call writer.save(), writer.close() within the blocks.
Append to an existing Excel file
You can append a DataFrame to an existing Excel file. The code below opens an existing file, then adds two sheets with the data of the dataframes.
Note: Because it is processed using openpyxl, only .xlsx files are included.
1 |
path = 'pandas_to_excel.xlsx' |
Related course: Data Analysis with Python Pandas
Python Pandas is a Python data analysis
library. It can read, filter and re-arrange small and large data sets and
output them in a range of formats including Excel.
Pandas writes Excel xlsx files using either openpyxl or XlsxWriter.
Using XlsxWriter with Pandas
To use XlsxWriter with Pandas you specify it as the Excel writer engine:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Close the Pandas Excel writer and output the Excel file. writer.close()
The output from this would look like the following:
See the full example at Example: Pandas Excel example.
Accessing XlsxWriter from Pandas
In order to apply XlsxWriter features such as Charts, Conditional Formatting
and Column Formatting to the Pandas output we need to access the underlying
workbook and worksheet objects. After
that we can treat them as normal XlsxWriter objects.
Continuing on from the above example we do that as follows:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1']
This is equivalent to the following code when using XlsxWriter on its own:
workbook = xlsxwriter.Workbook('filename.xlsx') worksheet = workbook.add_worksheet()
The Workbook and Worksheet objects can then be used to access other XlsxWriter
features, see below.
Adding Charts to Dataframe output
Once we have the Workbook and Worksheet objects, as shown in the previous
section, we we can use them to apply other features such as adding a chart:
# Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Create a chart object. chart = workbook.add_chart({'type': 'column'}) # Get the dimensions of the dataframe. (max_row, max_col) = df.shape # Configure the series of the chart from the dataframe data. chart.add_series({'values': ['Sheet1', 1, 1, max_row, 1]}) # Insert the chart into the worksheet. worksheet.insert_chart(1, 3, chart)
The output would look like this:
See the full example at Example: Pandas Excel output with a chart.
Formatting of the Dataframe output
XlsxWriter and Pandas provide very little support for formatting the output
data from a dataframe apart from default formatting such as the header and
index cells and any cells that contain dates or datetimes. In addition it
isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you
would probably be better off using Xlsxwriter directly with raw data taken
from Pandas. However, some formatting options are available.
For example it is possible to set the default date and datetime formats via
the Pandas interface:
writer = pd.ExcelWriter("pandas_datetime.xlsx", engine='xlsxwriter', datetime_format='mmm d yyyy hh:mm:ss', date_format='mmmm dd yyyy')
Which would give:
See the full example at Example: Pandas Excel output with datetimes.
It is possible to format any other, non date/datetime column data using
set_column()
:
# Add some cell formats. format1 = workbook.add_format({'num_format': '#,##0.00'}) format2 = workbook.add_format({'num_format': '0%'}) # Set the column width and format. worksheet.set_column(1, 1, 18, format1) # Set the format but not the column width. worksheet.set_column(2, 2, None, format2)
See the full example at Example: Pandas Excel output with column formatting.
Adding a Dataframe to a Worksheet Table
As explained in Working with Worksheet Tables, tables in Excel are a way of grouping a range
of cells into a single entity, like this:
The way to do this with a Pandas dataframe is to first write the data without
the index or header, and by starting 1 row forward to allow space for the
table header:
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
We then create a list of headers to use in add_table()
:
column_settings = [{'header': column} for column in df.columns]
Finally we add the Excel table structure, based on the dataframe shape and
with the column headers we generated from the dataframe columns:
(max_row, max_col) = df.shape worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
See the full example at Example: Pandas Excel output with a worksheet table.
Adding an autofilter to a Dataframe output
As explained in Working with Autofilters, autofilters in Excel are a
way of filtering a 2d range of data to only display rows that match a user
defined criteria.
The way to do this with a Pandas dataframe is to first write the data without
the index (unless you want to include it in the filtered data):
df.to_excel(writer, sheet_name='Sheet1', index=False)
We then get the dataframe shape and add the autofilter:
worksheet.autofilter(0, 0, max_row, max_col - 1)
We can also add an optional filter criteria. The placeholder “Region” in the
filter is ignored and can be any string that adds clarity to the expression:
worksheet.filter_column(0, 'Region == East')
However, it isn’t enough to just apply the criteria. The rows that don’t match
must also be hidden. We use Pandas to figure our which rows to hide:
for row_num in (df.index[(df['Region'] != 'East')].tolist()): worksheet.set_row(row_num + 1, options={'hidden': True})
This gives us a filtered worksheet like this:
See the full example at Example: Pandas Excel output with an autofilter.
Handling multiple Pandas Dataframes
It is possible to write more than one dataframe to a worksheet or to several
worksheets. For example to write multiple dataframes to multiple worksheets:
# Write each dataframe to a different worksheet. df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3')
See the full example at Example: Pandas Excel with multiple dataframes.
It is also possible to position multiple dataframes within the same
worksheet:
# Position the dataframes in the worksheet. df1.to_excel(writer, sheet_name='Sheet1') # Default position, cell A1. df2.to_excel(writer, sheet_name='Sheet1', startcol=3) df3.to_excel(writer, sheet_name='Sheet1', startrow=6) # Write the dataframe without the header and index. df4.to_excel(writer, sheet_name='Sheet1', startrow=7, startcol=4, header=False, index=False)
See the full example at Example: Pandas Excel dataframe positioning.
Passing XlsxWriter constructor options to Pandas
XlsxWriter supports several Workbook()
constructor options such as
strings_to_urls()
. These can also be applied to the Workbook
object
created by Pandas using the engine_kwargs
keyword:
writer = pd.ExcelWriter('pandas_example.xlsx', engine='xlsxwriter', engine_kwargs={'options': {'strings_to_numbers': True}})
Note, versions of Pandas prior to 1.3.0 used this syntax:
writer = pd.ExcelWriter('pandas_example.xlsx', engine='xlsxwriter', options={'strings_to_numbers': True})
Saving the Dataframe output to a string
It is also possible to write the Pandas XlsxWriter DataFrame output to a
byte array:
import pandas as pd import io # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) output = io.BytesIO() # Use the BytesIO object as the filehandle. writer = pd.ExcelWriter(output, engine='xlsxwriter') # Write the data frame to the BytesIO object. df.to_excel(writer, sheet_name='Sheet1') writer.close() xlsx_data = output.getvalue() # Do something with the data...
Note: This feature requires Pandas >= 0.17.
In this tutorial, you’ll learn how to save your Pandas DataFrame or DataFrames to Excel files. Being able to save data to this ubiquitous data format is an important skill in many organizations. In this tutorial, you’ll learn how to save a simple DataFrame to Excel, but also how to customize your options to create the report you want!
By the end of this tutorial, you’ll have learned:
- How to save a Pandas DataFrame to Excel
- How to customize the sheet name of your DataFrame in Excel
- How to customize the index and column names when writing to Excel
- How to write multiple DataFrames to Excel in Pandas
- Whether to merge cells or freeze panes when writing to Excel in Pandas
- How to format missing values and infinity values when writing Pandas to Excel
Let’s get started!
The Quick Answer: Use Pandas to_excel
To write a Pandas DataFrame to an Excel file, you can apply the .to_excel()
method to the DataFrame, as shown below:
# Saving a Pandas DataFrame to an Excel File
# Without a Sheet Name
df.to_excel(file_name)
# With a Sheet Name
df.to_excel(file_name, sheet_name='My Sheet')
# Without an Index
df.to_excel(file_name, index=False)
Understanding the Pandas to_excel Function
Before diving into any specifics, let’s take a look at the different parameters that the method offers. The method provides a ton of different options, allowing you to customize the output of your DataFrame in many different ways. Let’s take a look:
# The many parameters of the .to_excel() function
df.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options=None)
Let’s break down what each of these parameters does:
Parameter | Description | Available Options |
---|---|---|
excel_writer= |
The path of the ExcelWriter to use | path-like, file-like, or ExcelWriter object |
sheet_name= |
The name of the sheet to use | String representing name, default ‘Sheet1’ |
na_rep= |
How to represent missing data | String, default '' |
float_format= |
Allows you to pass in a format string to format floating point values | String |
columns= |
The columns to use when writing to the file | List of strings. If blank, all will be written |
header= |
Accepts either a boolean or a list of values. If a boolean, will either include the header or not. If a list of values is provided, aliases will be used for the column names. | Boolean or list of values |
index= |
Whether to include an index column or not. | Boolean |
index_label= |
Column labels to use for the index. | String or list of strings. |
startrow= |
The upper left cell to start the DataFrame on. | Integer, default 0 |
startcol= |
The upper left column to start the DataFrame on | Integer, default 0 |
engine= |
The engine to use to write. | openpyxl or xlsxwriter |
merge_cells= |
Whether to write multi-index cells or hierarchical rows as merged cells | Boolean, default True |
encoding= |
The encoding of the resulting file. | String |
inf_rep= |
How to represent infinity values (as Excel doesn’t have a representation) | String, default 'inf' |
verbose= |
Whether to display more information in the error logs. | Boolean, default True |
freeze_panes= |
Allows you to pass in a tuple of the row, column to start freezing panes on | Tuple of integers with length 2 |
storage_options= |
Extra options that allow you to save to a particular storage connection | Dictionary |
.to_excel()
methodHow to Save a Pandas DataFrame to Excel
The easiest way to save a Pandas DataFrame to an Excel file is by passing a path to the .to_excel()
method. This will save the DataFrame to an Excel file at that path, overwriting an Excel file if it exists already.
Let’s take a look at how this works:
# Saving a Pandas DataFrame to an Excel File
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx')
Running the code as shown above will save the file with all other default parameters. This returns the following image:
You can specify a sheetname by using the sheet_name=
parameter. By default, Pandas will use 'sheet1'
.
# Specifying a Sheet Name When Saving to Excel
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', sheet_name='Your Sheet')
This returns the following workbook:
In the following section, you’ll learn how to customize whether to include an index column or not.
How to Include an Index when Saving a Pandas DataFrame to Excel
By default, Pandas will include the index when saving a Pandas Dataframe to an Excel file. This can be helpful when the index is a meaningful index (such as a date and time). However, in many cases, the index will simply represent the values from 0 through to the end of the records.
If you don’t want to include the index in your Excel file, you can use the index=
parameter, as shown below:
# How to exclude the index when saving a DataFrame to Excel
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', index=False)
This returns the following Excel file:
In the following section, you’ll learn how to rename an index when saving a Pandas DataFrame to an Excel file.
How to Rename an Index when Saving a Pandas DataFrame to Excel
By default, Pandas will not named the index of your DataFrame. This, however, can be confusing and can lead to poorer results when trying to manipulate the data in Excel, either by filtering or by pivoting the data. Because of this, it can be helpful to provide a name or names for your indices.
Pandas makes this easy by using the index_label=
parameter. This parameter accepts either a single string (for a single index) or a list of strings (for a multi-index). Check out below how you can use this parameter:
# Providing a name for your Pandas index
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', index_label='Your Index')
This returns the following sheet:
How to Save Multiple DataFrames to Different Sheets in Excel
One of the tasks you may encounter quite frequently is the need to save multi Pandas DataFrames to the same Excel file, but in different sheets. This is where Pandas makes it a less intuitive. If you were to simply write the following code, the second command would overwrite the first command:
# The wrong way to save multiple DataFrames to the same workbook
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', sheet_name='Sheet1')
df.to_excel('filename.xlsx', sheet_name='Sheet2')
Instead, we need to use a Pandas Excel Writer to manage opening and saving our workbook. This can be done easily by using a context manager, as shown below:
# The Correct Way to Save Multiple DataFrames to the Same Workbook
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
with pd.ExcelWriter('filename.xlsx') as writer:
df.to_excel(writer, sheet_name='Sheet1')
df.to_excel(writer, sheet_name='Sheet2')
This will create multiple sheets in the same workbook. The sheets will be created in the same order as you specify them in the command above.
This returns the following workbook:
How to Save Only Some Columns when Exporting Pandas DataFrames to Excel
When saving a Pandas DataFrame to an Excel file, you may not always want to save every single column. In many cases, the Excel file will be used for reporting and it may be redundant to save every column. Because of this, you can use the columns=
parameter to accomplish this.
Let’s see how we can save only a number of columns from our dataset:
# Saving Only a Subset of Columns to Excel
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', columns=['A', 'B'])
This returns the following Excel file:
How to Rename Columns when Exporting Pandas DataFrames to Excel
Continuing our discussion about how to handle Pandas DataFrame columns when exporting to Excel, we can also rename our columns in the saved Excel file. The benefit of this is that we can work with aliases in Pandas, which may be easier to write, but then output presentation-ready column names when saving to Excel.
We can accomplish this using the header=
parameter. The parameter accepts either a boolean value of a list of values. If a boolean value is passed, you can decide whether to include or a header or not. When a list of strings is provided, then you can modify the column names in the resulting Excel file, as shown below:
# Modifying Column Names when Exporting a Pandas DataFrame to Excel
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', header=['New_A', 'New_B', 'New_C'])
This returns the following Excel sheet:
How to Specify Starting Positions when Exporting a Pandas DataFrame to Excel
One of the interesting features that Pandas provides is the ability to modify the starting position of where your DataFrame will be saved on the Excel sheet. This can be helpful if you know you’ll be including different rows above your data or a logo of your company.
Let’s see how we can use the startrow=
and startcol=
parameters to modify this:
# Changing the Start Row and Column When Saving a DataFrame to an Excel File
import pandas as pd
df = pd.DataFrame.from_dict(
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', startcol=3, startrow=2)
This returns the following worksheet:
How to Represent Missing and Infinity Values When Saving Pandas DataFrame to Excel
In this section, you’ll learn how to represent missing data and infinity values when saving a Pandas DataFrame to Excel. Because Excel doesn’t have a way to represent infinity, Pandas will default to the string 'inf'
to represent any values of infinity.
In order to modify these behaviors, we can use the na_rep=
and inf_rep=
parameters to modify the missing and infinity values respectively. Let’s see how we can do this by adding some of these values to our DataFrame:
# Customizing Output of Missing and Infinity Values When Saving to Excel
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict(
{'A': [1, np.NaN, 3], 'B': [4, 5, np.inf], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', na_rep='NA', inf_rep='INFINITY')
This returns the following worksheet:
How to Merge Cells when Writing Multi-Index DataFrames to Excel
In this section, you’ll learn how to modify the behavior of multi-index DataFrames when saved to Excel. By default Pandas will set the merge_cells=
parameter to True
, meaning that the cells will be merged. Let’s see what happens when we set this behavior to False
, indicating that the cells should not be merged:
# Modifying Merge Cell Behavior for Multi-Index DataFrames
import pandas as pd
import numpy as np
from random import choice
df = pd.DataFrame.from_dict({
'A': np.random.randint(0, 10, size=50),
'B': [choice(['a', 'b', 'c']) for i in range(50)],
'C': np.random.randint(0, 3, size=50)})
pivot = df.pivot_table(index=['B', 'C'], values='A')
pivot.to_excel('filename.xlsx', merge_cells=False)
This returns the Excel worksheet below:
How to Freeze Panes when Saving a Pandas DataFrame to Excel
In this final section, you’ll learn how to freeze panes in your resulting Excel worksheet. This allows you to specify the row and column at which you want Excel to freeze the panes. This can be done using the freeze_panes=
parameter. The parameter accepts a tuple of integers (of length 2). The tuple represents the bottommost row and the rightmost column that is to be frozen.
Let’s see how we can use the freeze_panes=
parameter to freeze our panes in Excel:
# Freezing Panes in an Excel Workbook Using Pandas
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict(
{'A': [1, np.NaN, 3], 'B': [4, 5, np.inf], 'C': [7, 8, 9]}
)
df.to_excel('filename.xlsx', freeze_panes=(3,4))
This returns the following workbook:
Conclusion
In this tutorial, you learned how to save a Pandas DataFrame to an Excel file using the to_excel method. You first explored all of the different parameters that the function had to offer at a high level. Following that, you learned how to use these parameters to gain control over how the resulting Excel file should be saved. For example, you learned how to specify sheet names, index names, and whether to include the index or not. Then you learned how to include only some columns in the resulting file and how to rename the columns of your DataFrame. You also learned how to modify the starting position of the data and how to freeze panes.
Additional Resources
To learn more about related topics, check out the tutorials below:
- How to Use Pandas to Read Excel Files in Python
- Pandas Dataframe to CSV File – Export Using .to_csv()
- Introduction to Pandas for Data Science
- Official Documentation: Pandas to_excel
Время чтения 4 мин.
Python Pandas — это библиотека для анализа данных. Она может читать, фильтровать и переупорядочивать небольшие и большие наборы данных и выводить их в различных форматах, включая Excel. ExcelWriter() определен в библиотеке Pandas.
Содержание
- Что такое функция Pandas.ExcelWriter() в Python?
- Синтаксис
- Параметры
- Возвращаемое значение
- Пример программы с Pandas ExcelWriter()
- Что такое функция Pandas DataFrame to_excel()?
- Запись нескольких DataFrames на несколько листов
- Заключение
Метод Pandas.ExcelWriter() — это класс для записи объектов DataFrame в файлы Excel в Python. ExcelWriter() можно использовать для записи текста, чисел, строк, формул. Он также может работать на нескольких листах.Для данного примера необходимо, чтоб вы установили на свой компьютер библиотеки Numpy и Pandas.
Синтаксис
pandas.ExcelWriter(path, engine= None, date_format=None, datetime_format=None, mode=’w’,**engine_krawgs) |
Параметры
Все параметры установлены на значения по умолчанию.
Функция Pandas.ExcelWriter() имеет пять параметров.
- path: имеет строковый тип, указывающий путь к файлу xls или xlsx.
- engine: он также имеет строковый тип и является необязательным. Это движок для написания.
- date_format: также имеет строковый тип и имеет значение по умолчанию None. Он форматирует строку для дат, записанных в файлы Excel.
- datetime_format: также имеет строковый тип и имеет значение по умолчанию None. Он форматирует строку для объектов даты и времени, записанных в файлы Excel.
- Mode: это режим файла для записи или добавления. Его значение по умолчанию — запись, то есть ‘w’.
Возвращаемое значение
Он экспортирует данные в файл Excel.
Пример программы с Pandas ExcelWriter()
Вам необходимо установить и импортировать модуль xlsxwriter. Если вы используете блокнот Jupyter, он вам не понадобится; в противном случае вы должны установить его.
Напишем программу, показывающую работу ExcelWriter() в Python.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
import pandas as pd import numpy as np import xlsxwriter # Creating dataset using dictionary data_set = { ‘Name’: [‘Rohit’, ‘Arun’, ‘Sohit’, ‘Arun’, ‘Shubh’], ‘Roll no’: [’01’, ’02’, ’03’, ’04’, np.nan], ‘maths’: [’93’, ’63’, np.nan, ’94’, ’83’], ‘science’: [’88’, np.nan, ’66’, ’94’, np.nan], ‘english’: [’93’, ’74’, ’84’, ’92’, ’87’]} # Converting into dataframe df = pd.DataFrame(data_set) # Writing the data into the excel sheet writer_obj = pd.ExcelWriter(‘Write.xlsx’, engine=‘xlsxwriter’) df.to_excel(writer_obj, sheet_name=‘Sheet’) writer_obj.save() print(‘Please check out the Write.xlsx file.’) |
Выход:
Please check out the Write.xlsx file. |
Содержимое файла Excel следующее.
В приведенном выше коде мы создали DataFrame, в котором хранятся данные студентов. Затем мы создали объект для записи данных DataFrame на лист Excel, и после записи данных мы сохранили лист. Некоторые значения в приведенном выше листе Excel пусты, потому что в DataFrame эти значения — np.nan. Чтобы проверить данные DataFrame, проверьте лист Excel.
Что такое функция Pandas DataFrame to_excel()?
Функция Pandas DataFrame to_excel() записывает объект на лист Excel. Мы использовали функцию to_excel() в приведенном выше примере, потому что метод ExcelWriter() возвращает объект записи, а затем мы используем метод DataFrame.to_excel() для его экспорта в файл Excel.
Чтобы записать один объект в файл Excel .xlsx, необходимо только указать имя целевого файла. Для записи на несколько листов необходимо создать объект ExcelWriter с именем целевого файла и указать лист в файле для записи.
На несколько листов можно записать, указав уникальное имя листа. При записи всех данных в файл необходимо сохранить изменения. Обратите внимание, что создание объекта ExcelWriter с уже существующим именем файла приведет к удалению содержимого существующего файла.
Мы также можем написать приведенный выше пример, используя Python с оператором.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
import pandas as pd import numpy as np import xlsxwriter # Creating dataset using dictionary data_set = { ‘Name’: [‘Rohit’, ‘Arun’, ‘Sohit’, ‘Arun’, ‘Shubh’], ‘Roll no’: [’01’, ’02’, ’03’, ’04’, np.nan], ‘maths’: [’93’, ’63’, np.nan, ’94’, ’83’], ‘science’: [’88’, np.nan, ’66’, ’94’, np.nan], ‘english’: [’93’, ’74’, ’84’, ’92’, ’87’]} # Converting into dataframe df = pd.DataFrame(data_set) with pd.ExcelWriter(‘WriteWith.xlsx’, engine=‘xlsxwriter’) as writer: df.to_excel(writer, sheet_name=‘Sheet’) print(‘Please check out the WriteWith.xlsx file.’) |
Выход:
Please check out the WriteWith.xlsx file. |
Вы можете проверить файл WriteWith.xlsx и просмотреть его содержимое. Это будет то же самое, что и файл Write.xlsx.
Запись нескольких DataFrames на несколько листов
В приведенном выше примере мы видели только один лист для одного фрейма данных. Мы можем написать несколько фреймов с несколькими листами, используя Pandas.ExcelWriter.
Давайте напишем пример, в котором мы создадим три DataFrames и сохраним эти DataFrames в файле multiplesheet.xlsx с тремя разными листами.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
import pandas as pd import numpy as np import xlsxwriter # Creating dataset using dictionary data_set = { ‘Name’: [‘Rohit’, ‘Arun’, ‘Sohit’, ‘Arun’, ‘Shubh’], ‘Roll no’: [’01’, ’02’, ’03’, ’04’, np.nan], ‘maths’: [’93’, ’63’, np.nan, ’94’, ’83’], ‘science’: [’88’, np.nan, ’66’, ’94’, np.nan], ‘english’: [’93’, ’74’, ’84’, ’92’, ’87’]} data_set2 = { ‘Name’: [‘Ankit’, ‘Krunal’, ‘Rushabh’, ‘Dhaval’, ‘Nehal’], ‘Roll no’: [’01’, ’02’, ’03’, ’04’, np.nan], ‘maths’: [’93’, ’63’, np.nan, ’94’, ’83’], ‘science’: [’88’, np.nan, ’66’, ’94’, np.nan], ‘english’: [’93’, ’74’, ’84’, ’92’, ’87’]} data_set3 = { ‘Name’: [‘Millie’, ‘Jane’, ‘Michael’, ‘Bobby’, ‘Brown’], ‘Roll no’: [’01’, ’02’, ’03’, ’04’, np.nan], ‘maths’: [’93’, ’63’, np.nan, ’94’, ’83’], ‘science’: [’88’, np.nan, ’66’, ’94’, np.nan], ‘english’: [’93’, ’74’, ’84’, ’92’, ’87’]} # Converting into dataframe df = pd.DataFrame(data_set) df2 = pd.DataFrame(data_set2) df3 = pd.DataFrame(data_set3) with pd.ExcelWriter(‘multiplesheet.xlsx’, engine=‘xlsxwriter’) as writer: df.to_excel(writer, sheet_name=‘Sheet’) df2.to_excel(writer, sheet_name=‘Sheet2’) df3.to_excel(writer, sheet_name=‘Sheet3’) print(‘Please check out the multiplesheet.xlsx file.’) |
Выход:
Вы можете видеть, что есть три листа, и каждый лист имеет разные столбцы имени.
Функция to_excel() принимает имя листа в качестве параметра, и здесь мы можем передать три разных имени листа, и этот DataFrame сохраняется на соответствующих листах.
Заключение
Если вы хотите экспортировать Pandas DataFrame в файлы Excel, вам нужен только класс ExcelWriter(). Класс ExcelWrite() предоставляет объект записи, а затем мы можем использовать функцию to_excel() для экспорта DataFrame в файл Excel.
Introduction
Just like with all other types of files, you can use the Pandas library to read and write Excel files using Python as well. In this short tutorial, we are going to discuss how to read and write Excel files via DataFrame
s.
In addition to simple reading and writing, we will also learn how to write multiple DataFrame
s into an Excel file, how to read specific rows and columns from a spreadsheet, and how to name single and multiple sheets within a file before doing anything.
If you’d like to learn more about other file types, we’ve got you covered:
- Reading and Writing JSON Files in Python with Pandas
- Reading and Writing CSV Files in Python with Pandas
Reading and Writing Excel Files in Python with Pandas
Naturally, to use Pandas, we first have to install it. The easiest method to install it is via pip
.
If you’re running Windows:
$ python pip install pandas
If you’re using Linux or MacOS:
$ pip install pandas
Note that you may get a ModuleNotFoundError
or ImportError
error when running the code in this article. For example:
ModuleNotFoundError: No module named 'openpyxl'
If this is the case, then you’ll need to install the missing module(s):
$ pip install openpyxl xlsxwriter xlrd
Writing Excel Files Using Pandas
We’ll be storing the information we’d like to write to an Excel file in a DataFrame
. Using the built-in to_excel()
function, we can extract this information into an Excel file.
First, let’s import the Pandas module:
import pandas as pd
Now, let’s use a dictionary to populate a DataFrame
:
df = pd.DataFrame({'States':['California', 'Florida', 'Montana', 'Colorodo', 'Washington', 'Virginia'],
'Capitals':['Sacramento', 'Tallahassee', 'Helena', 'Denver', 'Olympia', 'Richmond'],
'Population':['508529', '193551', '32315', '619968', '52555', '227032']})
The keys in our dictionary will serve as column names. Similarly, the values become the rows containing the information.
Now, we can use the to_excel()
function to write the contents to a file. The only argument is the file path:
df.to_excel('./states.xlsx')
Here’s the Excel file that was created:
Please note that we are not using any parameters in our example. Therefore, the sheet within the file retains its default name — «Sheet 1». As you can see, our Excel file has an additional column containing numbers. These numbers are the indices for each row, coming straight from the Pandas DataFrame
.
We can change the name of our sheet by adding the sheet_name
parameter to our to_excel()
call:
df.to_excel('./states.xlsx', sheet_name='States')
Similarly, adding the index
parameter and setting it to False
will remove the index column from the output:
df.to_excel('./states.xlsx', sheet_name='States', index=False)
Now, the Excel file looks like this:
Writing Multiple DataFrames to an Excel File
It is also possible to write multiple dataframes to an Excel file. If you’d like to, you can set a different sheet for each dataframe as well:
income1 = pd.DataFrame({'Names': ['Stephen', 'Camilla', 'Tom'],
'Salary':[100000, 70000, 60000]})
income2 = pd.DataFrame({'Names': ['Pete', 'April', 'Marty'],
'Salary':[120000, 110000, 50000]})
income3 = pd.DataFrame({'Names': ['Victor', 'Victoria', 'Jennifer'],
'Salary':[75000, 90000, 40000]})
income_sheets = {'Group1': income1, 'Group2': income2, 'Group3': income3}
writer = pd.ExcelWriter('./income.xlsx', engine='xlsxwriter')
for sheet_name in income_sheets.keys():
income_sheets[sheet_name].to_excel(writer, sheet_name=sheet_name, index=False)
writer.save()
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
Here, we’ve created 3 different dataframes containing various names of employees and their salaries as data. Each of these dataframes is populated by its respective dictionary.
We’ve combined these three within the income_sheets
variable, where each key is the sheet name, and each value is the DataFrame
object.
Finally, we’ve used the xlsxwriter
engine to create a writer
object. This object is passed to the to_excel()
function call.
Before we even write anything, we loop through the keys of income
and for each key, write the content to the respective sheet name.
Here is the generated file:
You can see that the Excel file has three different sheets named Group1
, Group2
, and Group3
. Each of these sheets contains names of employees and their salaries with respect to the date in the three different dataframes in our code.
The engine parameter in the to_excel()
function is used to specify which underlying module is used by the Pandas library to create the Excel file. In our case, the xlsxwriter
module is used as the engine for the ExcelWriter
class. Different engines can be specified depending on their respective features.
Depending upon the Python modules installed on your system, the other options for the engine attribute are: openpyxl
(for xlsx
and xlsm
), and xlwt
(for xls
).
Further details of using the xlsxwriter
module with Pandas library are available at the official documentation.
Last but not least, in the code above we have to explicitly save the file using writer.save()
, otherwise it won’t be persisted on the disk.
Reading Excel Files with Pandas
In contrast to writing DataFrame
objects to an Excel file, we can do the opposite by reading Excel files into DataFrame
s. Packing the contents of an Excel file into a DataFrame
is as easy as calling the read_excel()
function:
students_grades = pd.read_excel('./grades.xlsx')
students_grades.head()
For this example, we’re reading this Excel file.
Here, the only required argument is the path to the Excel file. The contents are read and packed into a DataFrame
, which we can then preview via the head()
function.
Note: Using this method, although the simplest one, will only read the first sheet.
Let’s take a look at the output of the head()
function:
Pandas assigns a row label or numeric index to the DataFrame
by default when we use the read_excel()
function.
We can override the default index by passing one of the columns in the Excel file as the index_col
parameter:
students_grades = pd.read_excel('./grades.xlsx', sheet_name='Grades', index_col='Grade')
students_grades.head()
Running this code will result in:
In the example above, we have replaced the default index with the «Grade» column from the Excel file. However, you should only override the default index if you have a column with values that could serve as a better index.
Reading Specific Columns from an Excel File
Reading a file in its entirety is useful, though in many cases, you’d really want to access a certain element. For example, you might want to read the element’s value and assign it to a field of an object.
Again, this is done using the read_excel()
function, though, we’ll be passing the usecols
parameter. For example, we can limit the function to only read certain columns. Let’s add the parameter so that we read the columns that correspond to the «Student Name», «Grade» and «Marks Obtained» values.
We do this by specifying the numeric index of each column:
cols = [0, 1, 3]
students_grades = pd.read_excel('./grades.xlsx', usecols=cols)
students_grades.head()
Running this code will yield:
As you can see, we are only retrieving the columns specified in the cols
list.
Conclusion
We’ve covered some general usage of the read_excel()
and to_excel()
functions of the Pandas library. With them, we’ve read existing Excel files and written our own data to them.
Using various parameters, we can alter the behavior of these functions, allowing us to build customized files, rather than just dumping everything from a DataFrame
.
Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article
In this article, we will see how to export different DataFrames to different excel sheets using python.
Pandas provide a function called xlsxwriter for this purpose. ExcelWriter() is a class that allows you to write DataFrame objects into Microsoft Excel sheets. Text, numbers, strings, and formulas can all be written using ExcelWriter(). It can also be used on several worksheets.
Syntax:
pandas.ExcelWriter(path, date_format=None, mode=’w’)
Parameter:
- path: (str) Path to xls or xlsx or ods file.
- date_format: Format string for dates written into Excel files (e.g. ‘YYYY-MM-DD’). str, default None
- mode: {‘w’, ‘a’}, default ‘w’. File mode to use (write or append). Append does not work with fsspec URLs.
The to_excel() method is used to export the DataFrame to the excel file. To write a single object to the excel file, we have to specify the target file name. If we want to write to multiple sheets, we need to create an ExcelWriter object with target filename and also need to specify the sheet in the file in which we have to write. The multiple sheets can also be written by specifying the unique sheet_name. It is necessary to save the changes for all the data written to the file.
Syntax:
DataFrame.to_excel(excel_writer, sheet_name=’Sheet1′,index=True)
Parameter:
- excel_writer: path-like, file-like, or ExcelWriter object (new or existing)
- sheet_name: (str, default ‘Sheet1’). Name of the sheet which will contain DataFrame.
- index: (bool, default True). Write row names (index).
Create some sample data frames using pandas.DataFrame function. Now, create a writer variable and specify the path in which you wish to store the excel file and the file name, inside the pandas excelwriter function.
Example: Write Pandas dataframe to multiple excel sheets
Python3
import
pandas as pd
data_frame1
=
pd.DataFrame({
'Fruits'
: [
'Appple'
,
'Banana'
,
'Mango'
,
'Dragon Fruit'
,
'Musk melon'
,
'grapes'
],
'Sales in kg'
: [
20
,
30
,
15
,
10
,
50
,
40
]})
data_frame2
=
pd.DataFrame({
'Vegetables'
: [
'tomato'
,
'Onion'
,
'ladies finger'
,
'beans'
,
'bedroot'
,
'carrot'
],
'Sales in kg'
: [
200
,
310
,
115
,
110
,
55
,
45
]})
data_frame3
=
pd.DataFrame({
'Baked Items'
: [
'Cakes'
,
'biscuits'
,
'muffins'
,
'Rusk'
,
'puffs'
,
'cupcakes'
],
'Sales in kg'
: [
120
,
130
,
159
,
310
,
150
,
140
]})
print
(data_frame1)
print
(data_frame2)
print
(data_frame3)
with pd.ExcelWriter(
"path to filefilename.xlsx"
) as writer:
data_frame1.to_excel(writer, sheet_name
=
"Fruits"
, index
=
False
)
data_frame2.to_excel(writer, sheet_name
=
"Vegetables"
, index
=
False
)
data_frame3.to_excel(writer, sheet_name
=
"Baked Items"
, index
=
False
)
Output:
The output showing the excel file with different sheets got saved in the specified location.
Example 2: Another method to store the dataframe in an existing excel file using excelwriter is shown below,
Create dataframe(s) and Append them to the existing excel file shown above using mode= ‘a’ (meaning append) in the excelwriter function. Using mode ‘a’ will add the new sheet as the last sheet in the existing excel file.
Python3
import
pandas as pd
data_frame1
=
pd.DataFrame({
'Fruits'
: [
'Appple'
,
'Banana'
,
'Mango'
,
'Dragon Fruit'
,
'Musk melon'
,
'grapes'
],
'Sales in kg'
: [
20
,
30
,
15
,
10
,
50
,
40
]})
data_frame2
=
pd.DataFrame({
'Vegetables'
: [
'tomato'
,
'Onion'
,
'ladies finger'
,
'beans'
,
'bedroot'
,
'carrot'
],
'Sales in kg'
: [
200
,
310
,
115
,
110
,
55
,
45
]})
data_frame3
=
pd.DataFrame({
'Baked Items'
: [
'Cakes'
,
'biscuits'
,
'muffins'
,
'Rusk'
,
'puffs'
,
'cupcakes'
],
'Sales in kg'
: [
120
,
130
,
159
,
310
,
150
,
140
]})
data_frame4
=
pd.DataFrame({
'Cool drinks'
: [
'Pepsi'
,
'Coca-cola'
,
'Fanta'
,
'Miranda'
,
'7up'
,
'Sprite'
],
'Sales in count'
: [
1209
,
1230
,
1359
,
3310
,
2150
,
1402
]})
with pd.ExcelWriter(
"path_to_file.xlsx"
, mode
=
"a"
, engine
=
"openpyxl"
) as writer:
data_frame4.to_excel(writer, sheet_name
=
"Cool drinks"
)
Output:
Writing Large Pandas DataFrame to excel file in a zipped format.
If the output dataframe is large, you can also store the excel file as a zipped file. Let’s save the dataframe which we created for this example. as excel and store it as a zip file. The ZIP file format is a common archive and compression standard.
Syntax:
ZipFile(file, mode=’r’)
Parameter:
- file: the file can be a path to a file (a string), a file-like object, or a path-like object.
- mode: The mode parameter should be ‘r’ to read an existing file, ‘w’ to truncate and write a new file, ‘a’ to append to an existing file, or ‘x’ to exclusively create and write a new file.
Import the zipfile package and create sample dataframes. Now, specify the path in which the zip file has to be stored, This creates a zip file in the specified path. Create a file name in which the excel file has to be stored. Use to_excel() function and specify the sheet name and index to store the dataframe in multiple sheets
Example: Write large dataframes in ZIP format
Python3
import
zipfile
import
pandas as pd
data_frame1
=
pd.DataFrame({
'Fruits'
: [
'Appple'
,
'Banana'
,
'Mango'
,
'Dragon Fruit'
,
'Musk melon'
,
'grapes'
],
'Sales in kg'
: [
20
,
30
,
15
,
10
,
50
,
40
]})
data_frame2
=
pd.DataFrame({
'Vegetables'
: [
'tomato'
,
'Onion'
,
'ladies finger'
,
'beans'
,
'bedroot'
,
'carrot'
],
'Sales in kg'
: [
200
,
310
,
115
,
110
,
55
,
45
]})
data_frame3
=
pd.DataFrame({
'Baked Items'
: [
'Cakes'
,
'biscuits'
,
'muffins'
,
'Rusk'
,
'puffs'
,
'cupcakes'
],
'Sales in kg'
: [
120
,
130
,
159
,
310
,
150
,
140
]})
data_frame4
=
pd.DataFrame({
'Cool drinks'
: [
'Pepsi'
,
'Coca-cola'
,
'Fanta'
,
'Miranda'
,
'7up'
,
'Sprite'
],
'Sales in count'
: [
1209
,
1230
,
1359
,
3310
,
2150
,
1402
]})
with zipfile.ZipFile(
"path_to_file.zip"
,
"w"
) as zf:
with zf.
open
(
"filename.xlsx"
,
"w"
) as
buffer
:
with pd.ExcelWriter(
buffer
) as writer:
data_frame1.to_excel(writer, sheet_name
=
"Fruits"
, index
=
False
)
data_frame2.to_excel(writer, sheet_name
=
"Vegetables"
, index
=
False
)
data_frame3.to_excel(writer, sheet_name
=
"Baked Items"
, index
=
False
)
data_frame4.to_excel(writer, sheet_name
=
"Cool Drinks"
, index
=
False
)
Output:
Sample output of zipped excel file
Like Article
Save Article