I had a similar issue when I had to read a bunch of .xls files in a folder and combine them into a single DataFrame. It turned out the error arose because .txt files had been forcibly saved as .xls files. Excel also reported an error when I tried to open one of the files:
"The file format and extension of 'filename.xls' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?"
Doing the following resolved it for me:
import glob
import os
import pandas as pd
path = r'C:\tmp'  # use your path
all_files = glob.glob(os.path.join(path, "*.xls"))
# The files are really tab-separated text, so read them with the CSV reader.
df_from_each_file = (pd.read_csv(f, delimiter="\t") for f in all_files)
# Concatenate the individual files into one DataFrame.
df1 = pd.concat(df_from_each_file, ignore_index=True)
If pd.read_csv doesn't work, you can experiment to find out which Python file reader is able to read your original file format.
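One way to run that experiment is to try several readers in order and keep the first one that does not raise. The sketch below uses only the standard library so it runs anywhere; the names first_working_reader, read_as_zip and read_tab_separated are hypothetical stand-ins, and with pandas the candidates would be calls such as pd.read_excel and pd.read_csv(path, delimiter="\t"). List strict readers first, because a lenient text reader may "succeed" on almost any input.

```python
import csv
import os
import tempfile
import zipfile


def first_working_reader(path, readers):
    """Return (name, result) from the first reader that parses `path` without raising."""
    errors = {}
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # deliberately broad: every reader fails differently
            errors[name] = repr(exc)
    raise ValueError(f"no reader could parse {path!r}: {errors}")


def read_as_zip(path):
    """Stand-in for an .xlsx reader: an .xlsx file is a ZIP archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_tab_separated(path):
    """Read the file as tab-separated text and return its rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh, delimiter="\t"))


# A text file that was saved with an .xls extension, as in the answer above.
fd, fake_xls = tempfile.mkstemp(suffix=".xls")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as fh:
    fh.write("name\tqty\nbolt\t4\n")

readers = [("zip-based", read_as_zip), ("tab-separated", read_tab_separated)]
reader_name, rows = first_working_reader(fake_xls, readers)
os.remove(fake_xls)
print(reader_name, rows)
```

The strict ZIP reader rejects the file, so the tab-separated reader is the one that ends up handling it.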
P.S: Edited based on the comment from Yona
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
df_product = pd.read_excel("tmp/Presentaciones.xls")
Problem description
I got an error when I tried to open product.xls with pd.read_excel (the "NDC database file - Excel version (zip format)" downloaded from https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory). I tried different engines, but I always got an error.
pandas version: 1.2.4
xlrd version: 2.0.1
openpyxl version: 3.0.7
Expected Output
Output of pd.read_excel("tmp/Presentaciones.xls")
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1071, in __init__
ext = inspect_excel_format(
File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 965, in inspect_excel_format
raise ValueError("File is not a recognized excel file")
ValueError: File is not a recognized excel file
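The traceback ends in inspect_excel_format, which suggests the file was rejected because of its content rather than its extension. A minimal sketch for looking at the leading bytes yourself follows. The function name sniff_container and its return labels are hypothetical, the two constants are the standard OLE2 and ZIP magic numbers, and this is not pandas' own implementation.

```python
import os
import tempfile

XLS_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # OLE2 compound file (classic .xls)
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP archive (.xlsx is ZIP-based)


def sniff_container(path):
    """Classify a file by its first bytes: 'ole2', 'zip', or 'unknown'."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(XLS_SIGNATURE):
        return "ole2"
    if head.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def _write_temp(data):
    """Write `data` to a temporary file named *.xls and return its path."""
    fd, path = tempfile.mkstemp(suffix=".xls")
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Three files that all carry an .xls extension but differ in content.
samples = {
    "ole2": XLS_SIGNATURE + b"\x00" * 8,
    "zip": ZIP_SIGNATURE + b"\x00" * 8,
    "unknown": b"col_a\tcol_b\n1\t2\n",
}
results = {}
for expected, data in samples.items():
    sample_path = _write_temp(data)
    results[expected] = sniff_container(sample_path)
    os.remove(sample_path)
print(results)
```

An "unknown" result means the file is something other than these two containers (plain text, HTML, or an older binary layout), whatever its extension says. A "zip" result only shows the container type, not that the archive is a valid workbook.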
Errors when reading Excel files in Python
I got an error when reading XLSX files with pandas:
ValueError: File is not a recognized excel file
The cause is the XLSX file itself: it may have been saved incorrectly by WPS or Excel, in a form pandas cannot recognize.
Recreate the XLSX file and make sure it is saved completely.
ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
This is a new problem: the xlrd module is missing. Install it with pip:
pip install xlrd
If the installed xlrd version is below 2.0, the read problem should now be resolved.
But if the installed xlrd version is 2.0 or higher, there will still be an error:
ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.
This happens because xlrd 2.0 and later support only XLS-format Excel files. Install the openpyxl module with pip:
pip install openpyxl
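Before installing or downgrading anything, it can help to see which versions are actually present. The sketch below reads them from package metadata with the standard library; the helper names installed_version, major_version and xlrd_reads_xlsx are hypothetical, and the 2.0 cutoff is taken from the error message above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading integer of a version string such as '2.0.1'."""
    return int(version.split(".")[0])


def xlrd_reads_xlsx(xlrd_version):
    """Per the error message above, xlrd 2.0 and later read only .xls files."""
    return major_version(xlrd_version) < 2


# Report what the current interpreter has installed.
for package in ("pandas", "xlrd", "openpyxl"):
    print(package, installed_version(package))
```

A None in the output corresponds to the "Missing optional dependency" case, and an xlrd version for which xlrd_reads_xlsx returns False corresponds to the "only the xls format is supported" case.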
import xlrd
import openpyxl
After importing the modules, you can read Excel files in a variety of ways. You can also skip the imports: both modules only need to be installed among the interpreter's external libraries.
For example, in PyCharm's settings, the Project: Interpreter option lists both modules.
I want to run a readability analysis on text data stored in an Excel file. The part of the code that I adapted is below:
import time, datetime
import pandas as pd
from textstat.textstat import textstat
from openpyxl import load_workbook
ExcelFile = 'Readability.xlsx'
Sheet = 'Raw Data'
Field_ID = 0
book = load_workbook(ExcelFile)
writer = pd.ExcelWriter(ExcelFile, engine='openpyxl')
writer.book = book
df = pd.read_excel(ExcelFile, sheet_name=Sheet)
After running it, I get the following error:
Traceback (most recent call last):
File "\\file\Users\R$\rtf13\Home\Desktop\readability_using_textstat.py", line 19, in <module>
df = pd.read_excel(ExcelFile, sheet_name=Sheet)
File "C:\Python39\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 1071, in __init__
ext = inspect_excel_format(
File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 965, in inspect_excel_format
raise ValueError("File is not a recognized excel file")
ValueError: File is not a recognized excel file
In addition, the Excel file ends up corrupted after the code runs. I am using pandas 1.2.4, openpyxl 3.0.7, and xlrd 1.2.0 (because later versions do not work with .xlsx files). Any advice is appreciated. Thanks.
1 Answer
Best answer
pd.ExcelWriter is used for writing pd.DataFrame objects.
df = pd.read_excel(ExcelFile)
# Append mode is only supported by the openpyxl engine.
with pd.ExcelWriter(ExcelFile, mode='a', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name=Sheet)
Update
Try this:
df = pd.read_excel(r'<path_to_file>Readability.xlsx', engine='openpyxl') # UPDATED: pip install openpyxl
# Append mode is only supported by the openpyxl engine.
with pd.ExcelWriter(ExcelFile, mode='a', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name=Sheet)
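The question also reports that the workbook ends up corrupted. One plausible cause, stated here as an assumption, is that a writer was opened on the same path before the file had been read. A defensive pattern that does not depend on the exact cause is to modify a copy and swap it into place only if the write succeeds. The helper name update_file_safely is hypothetical, and the demo uses plain byte files so it runs without pandas.

```python
import os
import shutil
import tempfile


def update_file_safely(path, modify):
    """Run `modify(tmp_path)` on a copy of `path`, then swap the copy into place.

    If `modify` raises, the original file is left exactly as it was.
    """
    directory = os.path.dirname(os.path.abspath(path))
    suffix = os.path.splitext(path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp_path)  # work on a copy, never on the original
        modify(tmp_path)
        os.replace(tmp_path, path)  # rename within the same directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def _append_line(p):
    with open(p, "ab") as fh:
        fh.write(b"new sheet\n")


def _fail_midway(p):
    with open(p, "wb") as fh:  # truncates the copy, as a crashed writer would
        fh.write(b"partial")
    raise RuntimeError("writer crashed")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "workbook.bin")
with open(target, "wb") as fh:
    fh.write(b"original\n")

try:
    update_file_safely(target, _fail_midway)
except RuntimeError:
    pass
with open(target, "rb") as fh:
    after_failure = fh.read()

update_file_safely(target, _append_line)
with open(target, "rb") as fh:
    after_success = fh.read()
leftovers = os.listdir(workdir)
shutil.rmtree(workdir)
```

With pandas, the modify callback could be a small function that opens pd.ExcelWriter(tmp_path, mode='a', engine='openpyxl') and calls df.to_excel on it, with df read from the original file beforehand.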
Anurag Dhadse
May 27, 2021 at 03:43
Problem:
When trying to read an .xlsx file using pandas pd.read_excel() you see this error message:
XLRDError: Excel xlsx file; not supported
Solution:
The xlrd library only supports .xls files, not .xlsx files. In order to make pandas able to read .xlsx files, install openpyxl:
sudo pip3 install openpyxl
After that, retry running your script (if you are running a Jupyter Notebook, be sure to restart the notebook to reload pandas!).
If the error still persists, you have two choices:
Choice 1 (preferred): Update pandas
Pandas 1.1.3 doesn’t automatically select the correct XLSX reader engine, but pandas 1.3.1 does:
sudo pip3 install --upgrade pandas
If you are running a Jupyter Notebook, be sure to restart the notebook to load the updated pandas version!
Choice 2: Explicitly set the engine in pd.read_excel()
Add engine='openpyxl' to your pd.read_excel() command, for example:
pd.read_excel('my.xlsx', engine='openpyxl')
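If a script handles both .xls and .xlsx files, the engine can be chosen from the extension instead of being hard-coded. This is a minimal sketch under the rule stated above (xlrd for .xls, openpyxl for .xlsx); the names ENGINE_BY_EXTENSION and engine_for are hypothetical, and the extension is trusted as-is, so a mislabeled file will still fail when it is read.

```python
import os

# Assumed mapping, following the text above: xlrd for .xls, openpyxl for .xlsx.
ENGINE_BY_EXTENSION = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
}


def engine_for(path):
    """Return the pandas reader engine name for `path`, based only on its extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return ENGINE_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no Excel engine configured for {ext!r}") from None


print(engine_for("my.xlsx"))
```

The result can then be passed along, for example pd.read_excel(path, engine=engine_for(path)).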