File is not a recognized excel file python

I had a similar issue when I had to read and combine a bunch of .xls files in a folder into a single dataframe. It turned out the error arose because .txt files had been forcibly saved as .xls files. Excel also showed an error when I tried to open such a file, which said

«The file format and extension of ‘filename.xls’ don’t match. The file
could be corrupted or unsafe. Unless you trust its source, don’t open
it. Do you want to open it anyway?»

Doing the following resolved it for me:

import glob 
import os 
import pandas as pd

path = r'C:\tmp'  # use your path

all_files = glob.glob(os.path.join(path, "*.xls"))
# read each file with the CSV reader, using tab as the delimiter
df_from_each_file = (pd.read_csv(f, delimiter="\t") for f in all_files)
# concatenate the individual files into one dataframe
df1 = pd.concat(df_from_each_file, ignore_index=True)

If pd.read_csv doesn't work, you can try the other pandas readers in turn to find one that can parse the file's actual format.
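One way to find out what a mislabeled file really is, before picking a reader, is to look at its first bytes. The sketch below is only an illustration and not part of the original answer: `sniff_format` is a hypothetical helper, and it only separates the legacy binary .xls container, ZIP-based workbooks such as .xlsx, and everything else (which includes tab-delimited text saved with an .xls extension).

```python
# Leading bytes that identify the two real Excel container formats.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx and other ZIP-based files


def sniff_format(path):
    """Guess a file's real format from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "zip-based"
    return "text-or-unknown"
```

A file that comes back as "text-or-unknown" is a candidate for pd.read_csv; one that comes back as "xls" or "zip-based" should go to pd.read_excel.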

P.S: Edited based on the comment from Yona

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

df_product = pd.read_excel("tmp/Presentaciones.xls")

Problem description

I got an error when I tried to open product.xls with pd.read_excel (the "NDC database file - Excel version (zip format)" downloaded from https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory). I tried different engines, but I always got an error.

pandas version: 1.2.4
xlrd version: 2.0.1
openpyxl version: 3.0.7

Expected Output

Output of pd.read_excel("tmp/Presentaciones.xls")

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1071, in __init__
    ext = inspect_excel_format(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 965, in inspect_excel_format
    raise ValueError("File is not a recognized excel file")
ValueError: File is not a recognized excel file
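Since the download is described as "zip format", it can help to check whether a given file is an .xlsx workbook or an ordinary archive that merely contains the data file. An .xlsx workbook is a ZIP archive with an `xl/workbook.xml` member, so that is a cheap test. This is a minimal sketch using only the standard library: `describe_zip` is a hypothetical helper, and nothing here is confirmed about the actual FDA file.

```python
import zipfile


def describe_zip(path):
    """Report whether a file is an xlsx workbook, an ordinary ZIP archive, or neither."""
    if not zipfile.is_zipfile(path):
        return "not a zip"
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    if "xl/workbook.xml" in names:
        return "xlsx workbook"
    return "plain archive containing: " + ", ".join(names)
```

If the result is a plain archive, the member has to be extracted first, and the extracted file may still need its own format check.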

Errors when reading Excel files with Python

I got an error when reading an XLSX file with pandas:

ValueError: File is not a recognized excel file

The cause is the XLSX file itself: it may have been saved by WPS or Excel in a format that pandas cannot recognize.

The fix is to re-create the XLSX file and save it again properly.

ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

A new problem appears here: the xlrd module is missing. Install it with pip:

pip install xlrd

If the installed xlrd version is below 2.0, reading should now work.
If the installed xlrd version is 2.0 or higher, there is still an error:

ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.

This happens because xlrd 2.0 and later support only the .xls format. Install the openpyxl module with pip:

pip install openpyxl
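The version rule described above can be written down as a small check. This is a sketch rather than anything pandas itself exposes: `xlrd_reads_xlsx` is a hypothetical helper that takes a version string such as the one in `xlrd.__version__`.

```python
def xlrd_reads_xlsx(version):
    """Return True if this xlrd version can still read .xlsx (releases before 2.0)."""
    major = int(version.split(".")[0])
    return major < 2


for version in ("1.2.0", "2.0.1"):
    reader = "xlrd" if xlrd_reads_xlsx(version) else "openpyxl"
    print(f"xlrd {version}: read .xlsx files with {reader}")
```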

import xlrd
import openpyxl

After importing the modules, you can read the Excel file in a variety of ways. You do not actually have to import them yourself: pandas loads the engine it needs, so both modules only have to be installed in the interpreter's environment.
For example, in PyCharm, the Project: Interpreter page of the settings lists both modules.
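The same check can be done from Python instead of the IDE. A minimal sketch, assuming Python 3.8 or later for `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("xlrd", "openpyxl"):
    print(name, installed_version(name) or "not installed")
```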

I want to run a readability analysis on text data stored in an Excel file. The part of the code I adapted is below:

import time, datetime     
import pandas as pd     
from textstat.textstat import textstat    
from openpyxl import load_workbook    

ExcelFile = 'Readability.xlsx'
Sheet = 'Raw Data'
Field_ID = 0 

book = load_workbook(ExcelFile)
writer = pd.ExcelWriter(ExcelFile, engine='openpyxl')
writer.book = book
df = pd.read_excel(ExcelFile, sheet_name=Sheet)

After running it, I get the following error:

Traceback (most recent call last):
  File "\fileUsersR$rtf13HomeDesktopreadability_using_textstat.py", line 19, in <module>
    df = pd.read_excel(ExcelFile, sheet_name=Sheet)
  File "C:\Python39\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 1071, in __init__
    ext = inspect_excel_format(
  File "C:\Python39\lib\site-packages\pandas\io\excel\_base.py", line 965, in inspect_excel_format
    raise ValueError("File is not a recognized excel file")
ValueError: File is not a recognized excel file

In addition, the Excel file ends up corrupted after the code runs. I am using pandas 1.2.4, openpyxl 3.0.7 and xlrd 1.2.0 (because later versions do not work with .xlsx files). Any advice is appreciated. Thanks.

1 answer

Best answer

pd.ExcelWriter is used for writing pd.DataFrame objects.

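df = pd.read_excel(ExcelFile)
# mode='a' appends to the existing workbook; append mode needs the openpyxl engine
with pd.ExcelWriter(ExcelFile, engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name=Sheet)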
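A likely explanation for both the error and the corrupted file in the question, offered as an assumption about pandas 1.2.x rather than a confirmed diagnosis: the writer there is created in the default write mode before pd.read_excel runs, and a file opened for writing is emptied at once. The sketch below uses plain `open()` as a stand-in for the writer to show that effect; the file name and contents are placeholders.

```python
import os
import tempfile

# Stand-in for an existing workbook: any non-empty file will do.
path = os.path.join(tempfile.mkdtemp(), "Readability.xlsx")
with open(path, "wb") as f:
    f.write(b"PK\x03\x04 pretend workbook contents")
size_before = os.path.getsize(path)

# Opening for writing truncates the file immediately, before anything is written.
handle = open(path, "wb")
size_after = os.path.getsize(path)
handle.close()

print("bytes before:", size_before, "bytes after:", size_after)
```

Reading the workbook first and only then opening a writer in append mode avoids handing an emptied file to the reader.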


Update

Try this:

df = pd.read_excel(r'<path_to_file>Readability.xlsx', engine='openpyxl')  # UPDATED: pip install openpyxl

with pd.ExcelWriter(ExcelFile, engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name=Sheet)

Install openpyxl as well.



Anurag Dhadse
27 May 2021 at 03:43

Problem:

When trying to read an .xlsx file using pandas pd.read_excel() you see this error message:

XLRDError: Excel xlsx file; not supported

Solution:

The xlrd library only supports .xls files, not .xlsx files. In order to make pandas able to read .xlsx files, install openpyxl:

sudo pip3 install openpyxl

After that, retry running your script (if you are running a Jupyter Notebook, be sure to restart the notebook to reload pandas!).

If the error still persists, you have two choices:

Choice 1 (preferred): Update pandas

Pandas 1.1.3 doesn’t automatically select the correct XLSX reader engine, but pandas 1.3.1 does:

sudo pip3 install --upgrade pandas

If you are running a Jupyter Notebook, be sure to restart the notebook to load the updated pandas version!

Choice 2: Explicitly set the engine in pd.read_excel()

Add engine='openpyxl' to your pd.read_excel() command, for example:

pd.read_excel('my.xlsx', engine='openpyxl')
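When a script handles both .xls and .xlsx files, the engine can be chosen from the file extension. A minimal sketch: `pick_engine` and `ENGINE_BY_SUFFIX` are hypothetical names, and the mapping only covers the two engines discussed here.

```python
from pathlib import Path

# Engine names to pass as pd.read_excel(..., engine=...) for each extension.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_engine(path):
    """Return the reader engine name for a spreadsheet path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"no engine known for extension {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]
```

It would be used as pd.read_excel(path, engine=pick_engine(path)). The extension is only a hint, so a mislabeled file still fails at read time.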
