Advantages Of Excel’s Binary XLSB File Format
Excel files with lots of formulas, data and objects (e.g. shapes) can expand considerably in size. Large files take longer to open/save and occupy valuable space on discs or in mailboxes when attached to emails. Learn how to reduce the file size of Excel files, so that your workbooks load and save much faster.
Microsoft Office Excel 2007+ files can be ‘Saved As’ in several other formats besides the default XLSX format. The file formats that are available in the Save As dialog box may vary, depending on what type of sheet is active (e.g. a worksheet, chart sheet).
Save As Binary in Excel 2007
An Excel file saved as a Binary Workbook (XLSB extension) could be significantly smaller than the one saved as an Excel Workbook (.xlsx or .xlsm extensions).
Save As Binary in Excel 2010 or later
The XLSB Format Explained
For the technically minded, the XLSB file format, code-named BIFF12, is a ZIP container based on the Open XML file specification. The key difference between XLSB and XLSX/XLSM is that the file parts within the zipped package are compressed binary components (.bin) encoded in a proprietary format, instead of being readable XML code. Binary files are optimized for performance and can store anything you can create in Excel.
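Because the package is an ordinary ZIP container, its parts can be listed with any ZIP library. The following is a minimal sketch using Python's standard `zipfile` module. The function names and the tiny demo package are hypothetical and only mimic the layout described above; the demo is not a valid workbook.

```python
import io
import zipfile
from collections import Counter


def summarize_parts(source):
    """Count package parts by file extension in a ZIP-based workbook.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        counts = Counter()
        for name in package.namelist():
            # Use the text after the last dot as the extension, if any.
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            counts[ext] += 1
        return dict(counts)


def build_demo_package():
    """Build a tiny in-memory ZIP that imitates the layout only (hypothetical parts)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("xl/workbook.bin", b"\x00\x01")
        package.writestr("xl/worksheets/sheet1.bin", b"\x00\x01")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(summarize_parts(build_demo_package()))
```

Pointing `summarize_parts` at a real .xlsb and a real .xlsx copy of the same workbook should show mostly `.bin` parts in the first and mostly `.xml` parts in the second.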
Facts & Myths About Binary Excel Workbooks
Smaller binary files open and save much faster than Excel workbooks (.xlsx or .xlsm)
Binary file compression offers potential cost savings, because it reduces the disk space required to store files and decreases the bandwidth needed to transport files through e-mail or networks.
Only Excel workbooks smaller than 5MB stored in a OneDrive folder will open in an Excel Web App.
- Binary files have the same RAM memory requirements as other Excel file formats, as compression decreases the space that the file occupies on the hard disk only.
- Excel Binary & macro-enabled workbooks may store VBA/macro code, a potential security concern. Macros cannot be stored in the default Excel .xlsx format.
- Save as Binary is available in Excel 2003, if the Excel 2007 compatibility pack has been installed along with SP3.
- Excel Binary files are encoded in a proprietary compressed file format and not in the open, standards-based XML file format (OpenXML).
- Myth: «I am not sure what is lost, but I know binary workbooks are more compact…»
No data is lost when saving a workbook in binary file format. Binary files use a better compression than the standard ZIP compression used in other Open XML file formats.
- Myth: «You can then go from xlsb back to xlsx and the smaller size prevails…»
Wrong. Saving a binary file as XLSX or XLSM will result in a larger file size, ceteris paribus (with all Excel features stored in the file being the same).
- Myth: «My workbook doesn’t have macros, so I cannot SaveAs in binary format»
Wrong. You can save any workbook (with or without macros) in binary format.
You can protect macros in binary workbooks with Unviewable+ VBA.
- Myth: «Binary workbooks are more prone to file corruption»
Over the years, we have come across about an equal number of binary and non-binary corrupted workbooks. Therefore, the above claim seems to be a myth. Workbook format is not a contributing factor of file corruption. Read about Excel file corruption here.
- Myth: «Customized ribbon UIs are not available in XLSB files»
Wrong. Our 2048 game in Excel has a custom ribbon UI and is saved in a binary format.
- Myth: «Binary workbooks cannot be opened in LibreOffice or OpenOffice»
False! Both suites support the XLSB file format for compatibility with Excel.
- Myth: «Formulas, VLOOKUP in particular, recalculate much faster in binary files»
Excel data structures and algorithms residing in memory (RAM), including calculation, are not affected in any way by the file structure used, apart from Open and Save operations obviously.
For more details, please read:
- File formats that are supported in Excel
- Introducing the Office (2007) Open XML File Formats
Disadvantages Of Binary Files
Binary files cannot be accessed by programs that understand only the XLSX and XLSM file formats. We have come across several file previewers that cannot display Excel binary files. For such programs, you need to save the binary workbook in XLSX or XLSM file format. Use the latter if macros are present in the file.
However, applications that read Excel VBA directly can access binary files without an issue, as VBA code is not stored in XML format! VBA code is not compressed in binary xlsb files. If you want to shrink your VBA code, use our VBA Code Cleaner powered by Ribbon Commander.
Power Query cannot read data from binary XLSB workbooks. If you try to create a Power Query from a binary workbook, Excel shows an odd error after a long delay.
Test If A Binary File Has Macros Without Opening It
There is no macro-free binary file format. To find out whether an XLSB file has a VBA project before opening it, use our Macro Mover add-in, bundled for free with the Ribbon Commander framework. The Macro Mover add-in can detect macros in closed workbooks.
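For readers who prefer a script, the same kind of check can be approximated without opening the workbook in Excel. This is only a sketch and not how the Macro Mover add-in works: it assumes the VBA project is stored under the conventional part name `xl/vbaProject.bin` and does not follow the package relationships. The function and helper names are hypothetical.

```python
import io
import zipfile

# Conventional part name for the VBA project inside the ZIP package.
# Assumption of this sketch: the part has not been stored under another name.
VBA_PART = "xl/vbaproject.bin"


def has_vba_project(source):
    """Return True if the ZIP package contains a VBA project part.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return any(name.lower() == VBA_PART for name in package.namelist())


def build_demo_package(with_macros):
    """Build a tiny in-memory ZIP imitating a workbook layout (not a valid workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/workbook.bin", b"\x00")
        if with_macros:
            package.writestr("xl/vbaProject.bin", b"\x00")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(has_vba_project(build_demo_package(with_macros=True)))
    print(has_vba_project(build_demo_package(with_macros=False)))
```

A workbook that is encrypted with a password to open is not a plain ZIP file, so `zipfile` raises `BadZipFile` for it and this check does not apply.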
The Excel Personal Workbook
The personal macro file, if present, is opened as a hidden workbook every time you start Excel. The personal workbook is saved by default in binary format (Personal.xlsb) in order to speed up opening.
How To Compress VBA In Binary XLSB Files
Unfortunately, VBA code doesn’t properly clean up after itself, so lots of junk gets left behind in performance caches during editing of VBA projects. Not only does this increase the file size, but it may also lead to slow loading or odd behaviour of your program at runtime.
The Ribbon Commander VBA Cleaner removes these redundant caches from Office macro-enabled files. Read more here.
Last Update: Jan 03, 2023
Asked by: Murphy Rath DDS
An XLSB file is an Excel Binary Workbook file. They store information in binary format instead of XML like with most other Excel files (like XLSX). Since XLSB files are binary, they can be read from and written to much faster, making them extremely useful for very large spreadsheets.
What is the difference between XLSB and XLSX?
Since XLSB files are binary, they can be read and written a bit faster, making them useful for larger spreadsheets. … It is reported that an XLSX file generally takes about 4 times longer to load and 2 times longer to save than the equivalent XLSB file, and is about 1.5 times larger.
How do I convert an Excel file to binary?
How to Create a Binary Workbook (XLSB) File with Microsoft Excel
- Click on the “File” tab on the Excel Ribbon menu.
- Proceed to select “Save As” and click on “Browse”.
- Once you are at the “Save As” browsing menu, click on the “Save as type” drop down menu and select “Excel Binary Workbook”.
What is Excel binary workbook disadvantages?
The main disadvantage: Binary Excel files can contain VBA macros. So unless you know the origin of a file, think carefully before opening it. Besides that, all the other disadvantages seem minor, and the format has the advantage of a smaller file size.
Is XLSB faster than Xlsx?
Does XLSB run formulas faster than XLSX? XLSB files are only loaded and saved faster than XLSX files. Afterwards, both formats run in RAM with similar performance on the same Excel engine. Hence, you won’t see your Excel formulas running significantly faster.
Can you convert XLSB to XLSX?
How to convert XLSB to XLSX. Click inside the file drop area to upload XLSB file or drag & drop XLSB file. Click on Convert button. Your XLSB files will be uploaded and converted to XLSX result format.
Which is better XLS or XLSX?
The underlying file format is what makes the main difference between the XLS and XLSX files. … Data is arranged in an XLS file as binary streams in the form of a compound file as described in [MS-XLS]. In contrast, an XLSX file is based on Office Open XML format that stores data in compressed XML files in ZIP format.
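The difference in container types can be seen in the first bytes of a file. The sketch below assumes only the well-known signatures of the two containers: ZIP archives (used by XLSX, XLSM and XLSB) start with `PK\x03\x04`, and compound files (used by XLS) start with `D0 CF 11 E0 A1 B1 1A E1`. The function names are hypothetical.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header
COMPOUND_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound file header


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes, regardless of its extension."""
    if header.startswith(COMPOUND_MAGIC):
        return "compound-file"   # e.g. a legacy .xls workbook
    if header.startswith(ZIP_MAGIC):
        return "zip"             # e.g. .xlsx, .xlsm or .xlsb
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify it."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))


if __name__ == "__main__":
    print(sniff_container(b"PK\x03\x04rest-of-archive"))
    print(sniff_container(COMPOUND_MAGIC + b"\x00" * 8))
```

Because the check looks at content rather than the file name, it also shows that renaming a file's extension does not change its underlying format.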
Why is Excel file so big?
Excel has a “used range” for every sheet in your workbook. The larger this is, the bigger the file size becomes. … Especially in older files, even if cells are blank and have no formatting, Excel may be treating them inside the used range, leading to a larger file size for no reason.
How do I open a binary worksheet in Excel?
However, Excel is the best option for opening XLSB files because it fully supports the formatting of Excel spreadsheets, which may include graphs and the spacing of data fields. To open an XLSB file with Excel, select File → Open → Browse, then choose the XLSB file you would like to open.
What does binary mean in Excel Solver?
A binary constraint is one in which the variable must equal either 0 or 1. To specify a binary constraint, use the Cell Reference box to identify the variable cell that must be binary and then select the bin operator from the unnamed drop-down list box.
What is the smallest Excel file type?
As a general rule, if your file size is small (less than 5 MB), it’s better to stick to XLSX/XLSM formats. Based on what I heard from people and read on many forums, a lot of people prefer to use XLSB as the file format when it comes to using Excel.
How do I change the file type in Excel?
You can change the file type that is used by default when you save a workbook.
- Click the Microsoft Office Button, and then click Excel Options.
- In the Save category, under Save workbooks, in the Save files in this format box, click the file format that you want to use by default.
What does XLSB mean in Excel?
Unlike XLSX (which is based on the Open XML file format), XLSB is a binary Excel workbook file format. XLSB files can be read and written faster, which makes them useful for working with large files.
What does XLSM mean in Excel?
A file with the XLSM file extension is an Excel Macro-Enabled Workbook file created in Excel 2007 or newer.
Does Excel have unlimited rows?
In the latest version of Excel, a maximum of 1,048,576 rows by 16,384 columns are available.
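Both limits are powers of two, and the last column corresponds to the letters XFD. A short sketch of the arithmetic follows; the helper name `column_letters` is hypothetical.

```python
MAX_ROWS = 2 ** 20      # 1,048,576 rows
MAX_COLUMNS = 2 ** 14   # 16,384 columns


def column_letters(index: int) -> str:
    """Convert a 1-based column index to Excel-style letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Column letters form a bijective base-26 system, hence the "- 1".
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


if __name__ == "__main__":
    print(MAX_ROWS, MAX_COLUMNS, column_letters(MAX_COLUMNS))
```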
Does pivot table Increase Size?
Pivot Table creates a copy of the source data and saves it in the file. This increases the file size and also slows down the open / close operations. You can ask Excel NOT to save the copy of the data and save on file size.
How can I reduce my Excel file size?
Ways to Reduce Excel File Size
- Remove unnecessary worksheets, data, and formulas. The number of worksheets and the amount of data contained in an Excel file are directly related to the size of the file. …
- Remove formatting. …
- Remove Pivot Cache. …
- Save in binary format (. …
- Compress the file.
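How much the last step helps depends on the content. As a rough illustration, the sketch below uses Python's standard `zlib` module (the same DEFLATE method that ZIP tools typically use) to compare repetitive XML-like text with data that has no redundancy. The function name and the sample data are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means better)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    xml_like = b'<c r="A1"><v>1</v></c>' * 1000   # highly repetitive markup
    random_like = os.urandom(10_000)              # stands in for already-compressed data
    print(round(compression_ratio(xml_like), 3))
    print(round(compression_ratio(random_like), 3))
```

Repetitive markup shrinks dramatically, while data without redundancy barely changes. This suggests why zipping a workbook that is already a compressed ZIP package tends to save little extra space.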
Why is Excel so slow?
The biggest reason for slow Excel files are formulas that take too long to calculate. So the first tip you can use is to ‘press pause’ on any calculations! … This stops formulas being recalculated after every edit you make. When it’s set to Manual, formulas won’t re-calculate unless you edit an individual cell directly.
Who found MS Excel?
The electronic spreadsheet was essentially invented in 1979 by software pioneer Dan Bricklin, who started up Software Arts with Bob Frankston and created VisiCalc.
What is the most common Excel format?
XLS – Excel file extension
This extension is the most common and default type in the spreadsheet generated by Microsoft office.
How do I convert XLS to XLSX?
Click on the Office button, then Convert. You can also open the .xls file in Excel and then use the Save As command to save it as an «Excel Workbook». Once you select that file type, the file extension will be xlsx and you can delete the old xls version.
How do I convert XLSX to XLS?
To get started, kindly perform the following steps:
- Open File Explorer.
- Go to the View Tab.
- Tick File name extensions under Show/Hide.
- Go to the folder where you Excel file is stored.
- Right-click on the Excel File.
- Select Rename.
- Rename «.XLSX» to «.XLS».
- Click Enter once done.
Do you have any idea what an XLSB file format is?
It is basically an Excel binary workbook file format that stores data in binary format rather than the XML format used by most Excel files.
XLSB files are encoded in binary form that’s why these files can be read and written quicker than other files. Therefore, when you have large files, this binary function helps you a lot. XLSB files are capable enough to store data just the way other Excel workbook formats do. Each workbook has many worksheets and each worksheet has lots of columns and rows containing a huge range of cells, in which you apply formulas.
How You can Open an XLSB File
Microsoft Excel 2007 and later can open and edit XLSB files. With an older version of Excel, you have to use third-party tools to open, edit and save these files. Alternatively, you can use WPS Office Spreadsheet to open an XLSB file. Microsoft’s Excel Viewer can also open an XLSB file and print it.
Keep in mind that third-party tools only help with a specific task; they are not designed to recover your original data once you have changed the file. XLSB files are stored with ZIP compression, so a free zip/unzip tool can open the package, but you will not be able to read or edit the contents the way you could in Excel.
How to Convert an XLSB File
The quickest way to convert an XLSB file is to open it in a program such as Excel and save it in another format. Widely supported target formats include XLS, PDF, CSV, XLSX, XLSM, and TXT.
Key Points to Consider when Working with Large Excel Files
The XLSB file format is not a bad option to opt for when you have larger datasets. Below are some necessary points to consider when you are about to deal with large Excel files.
You can save data files without formatting, because formatting takes up storage space. If you are dealing with a dataset that does not need formatting, you can save the file in .xml or .csv format.
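As a small illustration of how little a formatting-free file has to store, the sketch below writes a few rows as CSV text with Python's standard `csv` module. The function name and sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows of values to CSV text: values only, no formatting."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [["Region", "Sales"], ["North", 1200], ["South", 950]]
    print(rows_to_csv(sample))
```

Only the cell values survive in the output. Fonts, colors, column widths and formulas are not part of the file.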
Deleting unused cells is a quick way to reduce the file size.
When dealing with large Excel files, you may have to turn off automatic calculation, because it can cause Excel to freeze or crash. The usual cause is a large number of formulas: each formula takes time to recalculate after every change. Switching the calculation option to manual lets you recalculate only when you need to.
The Power Query add-in is used to manage big and intricate data. Despite being the number one tool, Excel has its limits when it comes to large datasets. With Power Query, you can handle such datasets more easily and are less constrained by their size.
XLSB Format Pros and Cons
Where a tool offers benefits, you always have to face some limitations as well. The same is the case with Excel formats. Below you will find some benefits besides drawbacks of the XLSB format.
Pros
- You can use VBA coding macros with XLSB format files.
- It produces smaller file sizes.
- XLSB files support formulas longer than the standard 8,192-character limit.
- XLSB files can be read and written faster than other files.
Cons
- XLSB files are compatible with Excel 2007 and later versions only.
- You cannot make changes to the Excel Ribbon menu while working on an XLSB file.
- The format may not be supported by software or services that need XML instead of binary data.
Final Thoughts
Excel has long been a go-to tool for analytical and mathematical work. However, large datasets bring limitations, and a given file format may not be supported by every tool. The Excel binary workbook format needs Excel 2007 or later; with older versions, you will not be able to read or edit the file.
Contents
- File formats that are supported in Excel
- Excel file formats
- Text file formats
- Other file formats
- File formats that use the Clipboard
- File formats that are not supported in Excel
- File formats that are not supported in Excel Starter
- Opening or viewing unsupported file formats
- Need more help?
- XLSX vs XLSB: How To Save A Workbook In Binary Format
- Advantages Of Excel’s Binary XLSB File Format
- The XLSB Format Explained
- Facts & Myths About Binary Excel Workbooks
- [MS-XLS]: Excel Binary File Format (.xls) Structure
- Published Version
- Previous Versions
- Preview Versions
- Development Resources
- Intellectual Property Rights Notice for Open Specifications Documentation
File formats that are supported in Excel
You can save an Excel file in another file format by clicking File > Save As. The file formats that are available in the Save As dialog box vary, depending on what type of sheet is active (a worksheet, chart sheet, or other type of sheet).
Note: Whenever you save a file in another file format, some of its formatting, data, and features might not be transferred.
To open a file that was created in another file format, either in an earlier version of Excel or in another program, click File > Open. If you open an Excel 97-2003 workbook, it automatically opens in Compatibility Mode. To take advantage of the new features of Excel 2010, you can save the workbook to an Excel 2010 file format. However, you also have the option to continue to work in Compatibility Mode, which retains the original file format for backward compatibility.
Excel file formats
Excel Workbook (.xlsx)
The default XML-based file format for Excel 2010 and Excel 2007. Cannot store Microsoft Visual Basic for Applications (VBA) macro code or Microsoft Office Excel 4.0 macro sheets (.xlm).
Excel Macro-Enabled Workbook (code) (.xlsm)
The XML-based and macro-enabled file format for Excel 2016, Excel 2013, Excel 2010, and Excel 2007. Stores VBA macro code or Excel 4.0 macro sheets (.xlm).
Excel Binary Workbook (.xlsb)
The binary file format (BIFF12) for Excel 2010 and Excel 2007.
Template (.xltx)
The default file format for an Excel template for Excel 2010 and Excel 2007. Cannot store VBA macro code or Excel 4.0 macro sheets (.xlm).
Template (code) (.xltm)
The macro-enabled file format for an Excel template in Excel 2010 and Excel 2007. Stores VBA macro code or Excel 4.0 macro sheets (.xlm).
Excel 97-2003 Workbook (.xls)
The Excel 97-2003 Binary file format (BIFF8).
Excel 97-2003 Template (.xlt)
The Excel 97-2003 Binary file format (BIFF8) for an Excel template.
Microsoft Excel 5.0/95 Workbook (.xls)
The Excel 5.0/95 Binary file format (BIFF5).
XML Spreadsheet 2003 (.xml)
XML Spreadsheet 2003 file format (XMLSS).
XML Data (.xml)
XML Data format.
Excel Add-In (.xlam)
The XML-based and macro-enabled Add-In format for Excel 2010 and Excel 2007. An Add-In is a supplemental program that is designed to run additional code. Supports the use of VBA projects and Excel 4.0 macro sheets (.xlm).
Excel 97-2003 Add-In (.xla)
The Excel 97-2003 Add-In, a supplemental program that is designed to run additional code. Supports the use of VBA projects.
Excel 4.0 Workbook (.xlw)
An Excel 4.0 file format that saves only worksheets, chart sheets, and macro sheets. You can open a workbook in this file format in Excel 2010, but you cannot save an Excel file to this file format.
Works 6.0-9.0 spreadsheet (.xlr)
Spreadsheet saved in Microsoft Works 6.0-9.0.
Note: This format is supported in Excel Starter only.
Text file formats
Formatted Text (Space-delimited) (.prn)
Lotus space-delimited format. Saves only the active sheet.
Text (Tab-delimited) (.txt)
Saves a workbook as a tab-delimited text file for use on another Microsoft Windows operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
Text (Macintosh) (.txt)
Saves a workbook as a tab-delimited text file for use on the Macintosh operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
Text (MS-DOS) (.txt)
Saves a workbook as a tab-delimited text file for use on the MS-DOS operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
Unicode Text (.txt)
Saves a workbook as Unicode text, a character encoding standard that was developed by the Unicode Consortium.
CSV (comma delimited) (.csv)
Saves a workbook as a comma-delimited text file for use on another Windows operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
CSV (Macintosh) (.csv)
Saves a workbook as a comma-delimited text file for use on the Macintosh operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
CSV (MS-DOS) (.csv)
Saves a workbook as a comma-delimited text file for use on the MS-DOS operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
DIF (.dif)
Data Interchange Format. Saves only the active sheet.
SYLK (.slk)
Symbolic Link Format. Saves only the active sheet.
Note: If you save a workbook in any text format, all formatting is lost.
Other file formats
DBF 3, DBF 4 (.dbf)
dBase III and IV. You can open these file formats in Excel, but you cannot save an Excel file to dBase format.
OpenDocument Spreadsheet (.ods)
OpenDocument Spreadsheet. You can save Excel 2010 files so they can be opened in spreadsheet applications that use the OpenDocument Spreadsheet format, such as Google Docs and OpenOffice.org Calc. You can also open spreadsheets in the .ods format in Excel 2010. Formatting might be lost when saving and opening .ods files.
PDF (.pdf)
Portable Document Format (PDF). This file format preserves document formatting and enables file sharing. When the PDF format file is viewed online or printed, it retains the format that you intended. Data in the file cannot be easily changed. The PDF format is also useful for documents that will be reproduced by using commercial printing methods.
Note: This format is not supported in Excel 2007.
XPS Document (.xps)
XML Paper Specification (XPS). This file format preserves document formatting and enables file sharing. When the XPS file is viewed online or printed, it retains exactly the format that you intended, and the data in the file cannot be easily changed.
Note: This format is not supported in Excel 2007.
File formats that use the Clipboard
You can paste data from the Microsoft Office Clipboard into Excel by using the Paste or Paste Special command ( Home tab, Clipboard group, Paste button) if the Office Clipboard data is in one of the following formats.
Picture (.wmf or .emf)
Pictures in Windows Metafile Format (WMF) or Windows Enhanced Metafile Format (EMF).
Note: If you copy a Windows metafile picture from another program, Excel pastes the picture as an enhanced metafile.
Bitmap (.bmp)
Pictures stored in Bitmap format (BMP).
Microsoft Excel file formats (.xls)
Binary file formats for Excel versions 5.0/95 (BIFF5), Excel 97-2003 (BIFF8), and Excel 2010 (BIFF12).
SYLK (.slk)
Symbolic Link Format.
DIF (.dif)
Data Interchange Format.
Text (tab-delimited) (.txt)
Tab-separated text format.
CSV (comma-delimited) (.csv)
Comma-separated values format.
Formatted text (Space-delimited) (.rtf)
Rich Text Format (RTF). Only from Excel.
Embedded object (.gif, .jpg, .doc, .xls, or .bmp)
Microsoft Excel objects, objects from properly registered programs that support OLE 2.0 (OwnerLink), and Picture or another presentation format.
Linked object (.gif, .jpg, .doc, .xls, or .bmp)
OwnerLink, ObjectLink, Link, Picture, or other format.
Office drawing object (.emf)
Office drawing object format or Picture (Windows enhanced metafile format, EMF).
Text (.txt)
Display Text, OEM Text.
Single File Web Page (.mht, .mhtml)
Single File Web Page (MHT or MHTML). This file format integrates inline graphics, applets, linked documents, and other supporting items referenced in the document.
Note: This format is not supported in Excel 2007.
Web Page (.htm, .html)
Hypertext Markup Language (HTML).
Note: When you copy text from another program, Excel pastes the text in HTML format, regardless of the format of the original text.
File formats that are not supported in Excel
The following file formats are no longer supported in Excel 2016, Excel 2013, Excel 2010, Excel Starter, and Excel 2007. You cannot open or save files in these file formats.
Excel Chart (.xlc)
Excel 2.0, 3.0, and 2.x file formats
WK1, FMT, WK2, WK3, FM3, WK4 (.wk1, .wk2, .wk3, .wk4, .wks)
Lotus 1-2-3 file formats (all versions)
Microsoft Works (.wks)
Microsoft Works file format (all versions)
DBF 2 (.dbf)
DBASE II file format
WQ1 (.wq1)
Quattro Pro for MS-DOS file format
WB1, WB3 (.wb1, .wb3)
Quattro Pro 5.0 and 7.0 for Windows.
File formats that are not supported in Excel Starter
Additionally, the following file formats are no longer supported in Excel Starter. You cannot open or save files in these file formats.
Excel 97-2003 Add-In
Data source name
Access MDE database
Office Data Connection
Opening or viewing unsupported file formats
If a file format that you want to use is not supported in Excel, you can try the following:
Search the Internet for a company that makes file format converters for file formats that are not supported in Excel.
Save to a file format that another program supports and then export from that program into a file format that Excel supports.
Need more help?
You can always ask an expert in the Excel Tech Community or get support in the Answers community.
[MS-XLS]: Excel Binary File Format (.xls) Structure
Specifies the Excel Binary File Format (.xls) Structure, which is the binary file format used by Microsoft Excel 97, Microsoft Excel 2000, Microsoft Excel 2002, and Microsoft Office Excel 2003.
Intellectual Property Rights Notice for Open Specifications Documentation
What to Know
- An XLSB file is an Excel binary workbook file.
- Open one with Excel Viewer, Excel, or WPS Office Spreadsheet.
- Convert to XLSX, CSV, and others with some of those programs or other spreadsheet software.
This article describes what XLSB files are, how they’re different than other Excel formats, how to open one, and how to convert one to various other formats like PDF, CSV, XLSX, etc.
What Is an XLSB File?
An XLSB file is an Excel binary workbook file. They store information in binary format instead of XML like with most other Excel files (e.g., XLSX).
Since XLSB files are binary, they can be read from and written to much faster, making them extremely useful for very large spreadsheets. When dealing with big spreadsheets, you might also notice smaller file sizes when using XLSB vs XLSX.
XLSB files store spreadsheet data just like any other Excel workbook format. Workbooks can contain multiple worksheets, and within each worksheet is a collection of cells organized by rows and columns where text, images, charts, and formulas can exist.
How to Open an XLSB File
Excel (version 2007 and newer) is the primary software program used to open and edit XLSB files. If you have an earlier version of Excel, you can still open, edit, and save XLSB files with it, but you have to install the free Microsoft Office Compatibility Pack first.
It’s possible for an XLSB file to have macros embedded in it, which have the potential to store malicious code. It’s important to take great care when opening executable file formats like this that you may have received via email or downloaded from websites you’re not familiar with. See our List of Executable File Extensions for a listing of file extensions to avoid and why.
If you don’t have any versions of Microsoft 365 (formerly Microsoft Office), you can use WPS Office Spreadsheet, OpenOffice Calc or LibreOffice Calc to open XLSB files.
Microsoft’s free Excel Viewer lets you open and print XLSB files without needing Excel. Just keep in mind that you can’t make any changes to the file and then save it back to the same format—you’ll need the full Excel program for that.
XLSB files are stored using ZIP compression, so while you can use a free file zip/unzip utility to “open” the file, doing so won’t let you read or edit it like the programs from above can do.
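Because the file is a ZIP package, a quick sanity check is to look at its first bytes. The following is a minimal sketch, not a full format detector: `classifyContainer` and the `Container` enum are names made up for this illustration. It only distinguishes a ZIP package (the container used by XLSB, XLSX and XLSM) from an OLE compound file (the container of the older XLS format); a ZIP signature alone does not prove the file is a workbook.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper types for this sketch.
enum class Container { Zip, OleCompound, Unknown };

// Classify a file from its leading bytes (pass at least the first 8 bytes).
Container classifyContainer(const std::vector<std::uint8_t>& head) {
    // "PK\x03\x04": local file header signature at the start of a ZIP archive.
    static const std::array<std::uint8_t, 4> zipMagic = {{0x50, 0x4B, 0x03, 0x04}};
    // Signature of an OLE compound file, the container of legacy .xls workbooks.
    static const std::array<std::uint8_t, 8> oleMagic =
        {{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}};

    if (head.size() >= zipMagic.size() &&
        std::equal(zipMagic.begin(), zipMagic.end(), head.begin()))
        return Container::Zip;
    if (head.size() >= oleMagic.size() &&
        std::equal(oleMagic.begin(), oleMagic.end(), head.begin()))
        return Container::OleCompound;
    return Container::Unknown;
}
```

A caller would read the first few bytes of the file into a vector and pass them in; telling XLSB apart from XLSX would still require looking at the parts inside the package.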
How to Convert an XLSB File
If you have Excel or Calc, the easiest way to convert an XLSB file is to open the file in the program and then save it back to your computer in another format.
Some file formats supported by these programs include XLSX, XLS, XLSM, CSV, PDF, and TXT.
XLSB Files and Macros
The XLSB format is similar to XLSM—both can embed and run macros if Excel has macro capabilities turned on.
However, an important thing to understand is that XLSM is a macro-specific file format. In other words, the “M” at the end of the file extension indicates that the file may or may not contain macros, while its non-macro counterpart, XLSX, cannot store macros at all.
XLSB, on the other hand, is much like XLSM in that it can be used to store and run macros, but it has no macro-free counterpart the way XLSM has XLSX.
All this really means is that it’s not as easily understood whether or not a macro may exist in the XLSB format, so it’s important to understand where the file came from to ensure that it isn’t loading harmful macros.
More Help With XLSB Files
If your file won’t open with the programs suggested above, the very first thing you should check is that the file extension for your file actually does read as “.XLSB” and not just something that looks similar. It’s really easy to confuse other file formats with XLSB given that their extensions look alike.
For example, you might really be dealing with an XLB file, which doesn’t open the same way in Excel or OpenOffice.
XSB files are similar in how their file extension is spelled, but they’re really XACT Sound Bank files that have nothing to do with Excel or spreadsheets in general. Instead, these Microsoft XACT files reference sound files and describe when they should be played during a video game.
Another to be careful with is XLR. Depending on the age of the file, it might not open in Excel at all.
If you don’t have an XLSB file and that’s why it’s not working with the programs mentioned on this page, then research the file extension you do have so that you can find out which program or website can open or convert your file.
I’m trying to read the contents of a xls-file without the use of any xls-libraries but having problems doing so.
I’m trying to use information I found here. It has a little step-by-step instruction of how to read the file.
Also using this xls-file-specification.
I’m not sure if I even do this step correctly:
3, Open the Workbook stream and scan for the first instance of a BOF record. This is the beginning of the Globals substream.
According to the file-specification or this page with a list of the record-numbers, I should be looking for 2057 (0809h), but the whole file doesn’t contain that record anywhere (I also used a hex editor to try to find it).
But then I read this part on page 20 in the specification:
Byte Swapping: Excel BIFF files are transportable across the MS-DOS/Windows and Apple Macintosh operating systems, among others. To support transportability, Excel writes BIFF files where the low-order byte of the word appears first in the file, followed by the high-order byte.
If I understand that correctly (not sure that I do), big endian is used for the words, so what I’m looking for is actually 2312 (0908h). This seems to be correct, as it is found very early in every file I try.
So then over to the next step:
4, Read the Globals substream, loading the BoundSheet8 records and the SST into memory. For more details, see Globals.
I look for 133(8500h) and it’s found shortly after BOF, good. But the problem lies in the two next steps:
5, From the BoundSheet8 record that corresponds to the substream you want to open, read the first 4 bytes, which contains the lbPlyPos FilePointer.
6, Go to the offset in the stream specified by the lbPlyPos FilePointer. This is the BOF record for the worksheet.
So the following 4 bytes are a pointer to a position in the file I should go to. But reading those bytes in any order gives me a number that is larger than the whole file. And also, this part confuses me: “This is the BOF record for the worksheet.” Wasn’t that what I found in an earlier step? Hmm…
Sorry for my rambling. And I hope I make sense and that someone would be willing to help me a little.
Update:
Okay, I’ve gotten a little further with this. It’s quite confusing to me but it seems that each record is also read as “big endian”, i.e. the last variable in the record is the one that is positioned earliest in the file. Though I don’t know if it applies to values with variable length? So, looking at this, the values of variable length are listed as the last one in a record. But obviously they can’t come first in the file, because there would be no way to know how many bytes to read if that info comes after it?
Anyway, if I ignore this value, skip 2 bytes for dt and A/unused, and read the following 4 bytes as a uint, it turns out as 1130 in my case. Adding that to the position of the first BOF gives me the exact position of the sheet BOF. And that can’t be a coincidence, right?
Now the next problem arises. After that BOF-record the index-record is supposed to follow immediately. But no matter in what way I read in the bytes it still makes no sense…
Here’s what it looks like:
09 08 10 00 00 06 10 00 BB 0D CC 07 00 00 00 00 06 00 00 00 00 02 0E
00 00 00 00 00 1E 00 00 00 00 00 12 00 00 00 3E 02 12 00 B6 06 00 00
00 00 40 00 00 00 00 00 00 00 00 00 00 00 7D 00 0C 00 00 00 00 00 DD
06 0F 00 00 00 00 00 7D 00 0C 00 02 00 02 00 DD 06 0F 00 00 00 00 00
7D 00 0C 00 04 00 04 etc…
The first 2 bytes there being the BOF record 09 08, or 0809 swapped which is 2057 (which represents BOF) so the rest should be the INDEX but doesn’t make sense… I would greatly appreciate if someone could help me with this.
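One way to sanity-check a dump like this is to walk it as a plain sequence of records, each laid out as a 2-byte identifier, a 2-byte length and then that many content bytes, with both 16-bit values stored low-order byte first (as the Byte Swapping passage quoted above describes). The sketch below makes exactly that assumption and nothing more; `Biff8Record`, `readU16LE` and `walkBiff8` are names invented for the illustration, and the content of each record is not interpreted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record header for this sketch.
struct Biff8Record {
    std::uint16_t id;           // record identifier
    std::uint16_t length;       // content length in bytes
    std::size_t contentOffset;  // where the content starts in the buffer
};

// Low-order byte first, then high-order byte.
static std::uint16_t readU16LE(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Walk [id][length][content] records until the buffer ends or a record is cut off.
std::vector<Biff8Record> walkBiff8(const std::vector<std::uint8_t>& b) {
    std::vector<Biff8Record> out;
    std::size_t pos = 0;
    while (pos + 4 <= b.size()) {
        Biff8Record r{readU16LE(b, pos), readU16LE(b, pos + 2), pos + 4};
        if (r.contentOffset + r.length > b.size()) break;  // truncated record
        out.push_back(r);
        pos = r.contentOffset + r.length;
    }
    return out;
}
```

Read this way, the first 60 bytes of the dump above split into three records: 0x0809 with length 16, then 0x0200 with length 14, then 0x023E with length 18. So, under this reading, the record that follows the sheet BOF in this particular file has identifier 0x0200.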
There are many different Excel workbook file types (XLS, XLSB, XLSM, XLSX), but one in particular stands out from the rest: the Excel binary workbook format, which has the XLSB extension (XLS + B for Binary). This type of spreadsheet file differs from other Excel workbook formats (like XLSX and XLSM, the standard workbook and macro-enabled workbook files respectively) in that XLSB workbook files store data in a binary format, whereas the other workbook files store data in XML format.
Storing data in a binary format means that XLSB spreadsheets are read from and written to significantly faster than other workbook format files. These benefits also show up in calculation speed and workbook file size, making binary workbooks significantly faster and significantly smaller respectively. Excel binary spreadsheets have all these benefits while still allowing for the implementation of VBA code macros. Binary workbooks can also load formulas greater than the standard 8192 character limit that is present in other workbook format files, such as XLSX and XLSM. This means that binary spreadsheet files are very beneficial for massively sized spreadsheets with long, complex formulas and long calculation times, which may or may not need to utilize VBA code macros. However, binary XLSB format files are not compatible with versions of Excel prior to Microsoft Excel 2003 and are only compatible with Excel 2003 if the Excel 2007 compatibility pack has been installed along with Service Pack 3. Binary workbook format files can also be incompatible with any software or service that requires XML data instead of binary data, such as certain web servers or certain linked or layered systems.
XLSB format Pros and Cons VS XLSX
Pros:
– Smaller file size
– Faster to read from and write to (XLSB can save to/load from file faster)
– Can load formulas greater than the standard 8192 character limit
– Can have VBA code macros
Cons:
– Can only be loaded by Microsoft Excel 2007 and later versions (or by Excel 2003 with the compatibility pack installed)
– Modification of the Excel Ribbon menu is not allowed while working on an XLSB
– May be incompatible with software or services that require XML data instead of binary data
Create Binary Workbook (EXE file) With DoneEx XCell Compiler From Any Workbook Format
DoneEx XCell Compiler can compile XLSB files! In fact, DoneEx XCell Compiler can compile all Excel workbook files (XLS, XLSX, XLSM, XLSB)! This means that you can just use the Excel workbook type that best suits your needs—from your workbook view and logic to your environment.
After compiling with DoneEx XCell Compiler, you will get the binary workbook in an executable (exe) file format with protected formulas and VBA code, which is more secure than any standard workbook file.
How to Create a Binary Workbook (XLSB) File with Microsoft Excel
The process of creating a binary spreadsheet file is much like creating any other workbook file. Just open Microsoft Excel and work on your workbook; once you are ready to save your work, follow these simple steps:
1. Click on the “File” tab on the Excel Ribbon menu.
2. Proceed to select “Save As” and click on “Browse”.
3. Once you are at the “Save As” browsing menu, click on the “Save as type” drop down menu and select “Excel Binary Workbook”.
4. Now that you have “Excel Binary Workbook” selected as your file type, you can click on the “Save” button to save your workbook as an “Excel Binary Workbook”.
How to Convert an XLSB format file to XLSX, XLSM, or XLS file
If you are working with software, a service, or any type of environment that does not support binary spreadsheet files and requires that workbooks be stored as XML data, you may need to convert your binary spreadsheet file into another workbook file. The process of converting your XLSB file into another workbook format is similar to creating a binary workbook file. Once you have opened the Excel binary workbook you wish to convert, just follow these simple steps:
1. Click on the “File” tab on the Excel Ribbon menu.
2. Proceed to select “Save As” and click on “Browse”.
3. Once you are at the “Save As” browsing menu, click on the “Save as type” drop down menu and select the Excel spreadsheet format that is the most appropriate for you. If you wish to convert your XLSB into a standard Excel workbook (XLSX) that does not support VBA code macros then you need to save your workbook with the “XLSX” filetype, so you need to select the “Excel Workbook” from the “Save as type” drop down menu.
If you wish to convert your binary .xlsb format spreadsheet into a standard Excel Workbook that can support VBA code macros then you need to save your workbook with the “XLSM” filetype, so you need to select the “Excel Macro-Enabled Workbook” from the “Save as type” drop down menu.
If you wish to convert your .XLSB format Workbook into a standard Excel Workbook that can be read by Excel 2003 and prior versions, then you need to save your workbook with the “XLS” filetype, so you need to select the “Excel 97-2003 Workbook” from the “Save as type” drop down menu.
4. Once you have the appropriate filetype that you wish to convert your binary spreadsheet into, just click the “Save” button and you are done! Your binary spreadsheet has now been converted to the filetype that you have chosen in step 3!
- Download OLE read/write (C++) — 9 Kb
- Download BIFF12 Reader (C++) — 404 Kb
- Download OLE read/write (C#) — 10 Kb
- Download BIFF12 Reader (C#) — 404 Kb
Stephane Rodriguez, August 2006 — this document is not endorsed by Microsoft.
Introduction
The new Office 2007 file formats are ZIP files that contain parts, some of which are XML, some of which are native file formats such as JPEG pictures, and the remaining binary parts end up being referred to as BIN parts. BIN parts are of particular interest for the file format consumer or updater since the underlying file formats are undocumented (at the time of writing, August 10 2006) and add several more file formats to deal with.
BIN parts appear in a number of cases. If you insert a VBA macro or an OLE object in a Word 2007, Excel 2007 or Powerpoint 2007 document, then there will be one or more BIN parts of interest. BIN parts are zip entries consisting of files with extension .BIN, that actually contain their own file format depending on the MIME type defined in the relationships part (xxx.rels) :
- VBA macros: vbaProject.bin (MIME type: application/vnd.ms-office.vbaProject)
- OLE objects: oleObjectxxx.bin (MIME type: application/vnd.openxmlformats-officedocument.oleObject)
An example of vbaProject.bin when cracking-open a Word 2007 .DOCM file
An example of oleObjectxxx.bin when cracking-open a Powerpoint 2007 .PPTX file
Before I get into further details about these two BIN parts, there are also BIN parts introduced by a new variant of the Excel 2007 file format known as Excel binary workbook which is a file ending with .XLSB. Apparently for performance reasons, it was decided to store an Excel file using a number of BIN parts instead of XML parts. Those BIN parts are a subset of the XML parts most affected by performance and scalability issues, most noteworthy each worksheet because of its arbitrary size. For some reason, the workbook, styles and a number of other small parts are also BIN parts despite the fact that those contribute only marginally to the overall processing of the workbook. Again, there are underlying file formats to deal with for both the consumer and the implementer.
Some of the BIN parts from an Excel 2007 binary .XLSB file with 3 worksheets
In addition to VBA projects and embedded OLE objects, we find BIN parts in Excel 2007 .XLSB files for the following reasons :
- Workbook part workbook.bin (MIME type: application/vnd.ms-excel.workbook)
- Styles dictionary part styles.bin (MIME type: application/vnd.ms-excel.styles)
- For each worksheet,
- an index part worksheets/binaryIndexxx.bin (MIME type: application/vnd.ms-excel.binIndexWs)
- a worksheet part worksheets/sheetxxx.bin (MIME type: application/vnd.ms-excel.worksheet)
- an optional printer settings part printerSettings/printerSettingsxxx.bin (MIME type: application/vnd.openxmlformats-officedocument.spreadsheetml.printerSettings)
- Optional calculation chain part calcChain.bin (MIME type: application/vnd.ms-excel.calcChain)
- Optional comments parts commentsxxx.bin (MIME type: application/vnd.ms-office.legacyDrawing)
- Optional tables parts tables/tablexxx.bin (MIME type: application/vnd.ms-excel.table)
- Optional connections parts connections.bin (MIME type: application/vnd.ms-excel.connections)
- Optional Chartsheet, Dialogsheet, Macrosheet parts (chartsheets/sheetxxx.bin, dialogsheets/sheetxxx.bin, macrosheets/sheetxxx.bin) (MIME type: application/vnd.ms-excel.chartsheet, application/vnd.ms-excel.dialogsheet, application/vnd.ms-excel.macrosheet)
- Optional Pivot Table parts (pivotTables/pivotTablexxx.bin) (MIME type: application/vnd.ms-excel.pivotTable)
- Optional Pivot Table Cache definition and records parts (pivotCache/pivotCacheDefinitionxxx.bin, pivotCache/pivotCacheRecordsxxx.bin) (MIME type: application/vnd.ms-excel.pivotCacheDefinition, application/vnd.ms-excel.pivotCacheRecords)
- Optional Query tables (queryTables/queryTablexxx.bin) (MIME type: application/vnd.ms-excel.queryTable)
Note that the MIME content types don’t differentiate whether the actual files are stored in XML or BIN.
Out of the full list of parts taken from the Ecma specs, the other optional parts of the Excel 2007 .XLSB file format are generally left as XML, and are thus the same as in regular Excel 2007 files (.XLSX, .XLSM, …):
- Theme part (theme/themexxx.xml)
- Background image part (drawings/imagexxx.xml)
- Core and app-specific document properties parts (docProps/core.xml, docProps/app.xml)
- Custom properties part (customproperty.bin)
- Custom XML mapping part (xmlMaps.xml)
- Chart part (charts/chartxxx.xml)
- Drawing part (drawings/drawingxxx.xml, drawings/legacyDrawingxxx.vml)
- External workbook references part (externalLinks/externalLinkxxx.xml)
- Metadata part (metadata.xml)
- Shared workbook revisions parts (revisions/…)
- Single Cell Table part (tables/tableSingleCellsxxx.xml)
- Volatile dependencies part (volatileDependencies.xml)
A notable exception in the list above is legacy drawings, stored using the VML file format. While VML is XML markup, it requires an outstanding effort to read, write and possibly render from it. When legacy drawings are dropped in, they may (OLE object for instance) or may not (comment for instance) contain relations markup to other parts.
Printer settings in Excel 2007 files are always stored as BIN parts whether it’s .XLSB or not.
Reading or updating vbaProject.bin parts
In previous versions of Word, Excel and Powerpoint, VBA projects were stored as an OLE sub-container of the OLE document container. For the record, .doc / .dot / .xls / .xlt / .xlm / .xla / .ppt / .pot / .ppm / .ppa files are OLE document containers. As pictured below, we have created a Word 97 document and added a VBA macro to it. Notice the Macros sub-container. It contains a VBA container, which contains a number of streams, as well as two other streams.
To view an OLE document container, you can either use one of the tools part of Visual Studio 6.0 known as the DocFile Viewer, or you can use an OLE viewer freely available here.
In previous Word versions, VBA macros were stored as an OLE sub-container (Macros) including a number of streams.
If you double-click on a stream, you can see the actual content. Obviously, each stream is a file format by itself, that needs to be paid special attention to if you are hoping to read or update it. A basic scenario is to make a replacement of one stream by another, and does not imply you know anything that constitutes the streams themselves.
Let’s return to Word 2007, Excel 2007 and Powerpoint 2007. What you can do is extract the vbaProject.bin zip entry from a file with an inserted VBA macro and open it in the OLE viewer. What a surprise indeed when you see the following appear :
vbaProject.bin is the content of the Macros container defined above.
To read or update vbaProject.bin parts, you need native API calls represented by IStream for streams, and IStorage for containers. As a sidenote, the mandatory reliance on native API calls makes any client code unable to execute in a partial trust environment such as Click-Once: executing native code implies full trust.
At the top of the article is a sample code that reads an arbitrary OLE container using C++ and C#.
Reading or updating oleObjectxxx.bin parts
Much like the vbaProject.bin parts, oleObjectxxx.bin parts are OLE sub-containers. You can use the same tool (Doc File viewer or equivalent) to view the content of the file, and you can use the same source code provided in the previous section to read or update that file.
As an example, let’s create a simple Excel 97 document, insert an OLE object in it (a Wordpad document for instance), then close it and view the resulting .xls file in the OLE viewer. Notice a MBD0032B277 sub-container with two streams inside :
In previous versions of Office, embedded OLE objects are stored as sub-containers (MBD0032B277) of the OLE file.
Now let’s return to Word 2007, Excel 2007 or Powerpoint 2007. Just extract the oleObjectxxx.bin part from any such file where you have also inserted an OLE object, and open it in the OLE viewer to see something equivalent to :
oleObjectxxx.bin is the content of the MBD0032B277 container defined above.
What we have figured out so far is that BIN parts in the new file formats contain different underlying structures although they share common interfaces to traverse them (IStream/IStorage). To read and update VBA macros parts and OLE objects parts, you need interfaces to IStream and IStorage, and possibly the knowledge of the underlying content of streams (not a requirement in basic replacement scenarios). That applies equally to Word, Excel and Powerpoint.
With Excel 2007 binary workbooks however, other BIN parts don’t follow the structure and content.
Reading Excel 2007 BIN parts
The remainder of the article describes some of the BIN parts. Source code is provided in C++ and C# to read (and possibly write) those BIN parts. In addition to VBA project parts and OLE objects parts being documented in the first part of this article, in green are the parts about to get documented :
- workbook part workbook.bin (MIME type: application/vnd.ms-excel.workbook)
- styles dictionary part styles.bin (MIME type: application/vnd.ms-excel.styles)
- for each worksheet,
- an index part worksheets/binaryIndexxx.bin (MIME type: application/vnd.ms-excel.binIndexWs)
- a worksheet part worksheets/sheetxxx.bin (MIME type: application/vnd.ms-excel.worksheet)
- an optional printer settings part printerSettings/printerSettingsxxx.bin (MIME type: application/vnd.openxmlformats-officedocument.spreadsheetml.printerSettings)
- optional calculation chain part calcChain.bin (MIME type: application/vnd.ms-excel.calcChain)
- optional comments parts commentsxxx.bin (MIME type: application/vnd.ms-office.legacyDrawing)
- optional tables parts tables/tablexxx.bin (MIME type: application/vnd.ms-excel.table)
- optional connections parts connections.bin (MIME type: application/vnd.ms-excel.connections)
- optional Chartsheet, Dialogsheet, Macrosheet parts (chartsheets/sheetxxx.bin, dialogsheets/sheetxxx.bin, macrosheets/sheetxxx.bin) (MIME type: application/vnd.ms-excel.chartsheet, application/vnd.ms-excel.dialogsheet, application/vnd.ms-excel.macrosheet)
- optional Pivot Table parts (pivotTables/pivotTablexxx.bin) (MIME type: application/vnd.ms-excel.pivotTable)
- optional Pivot Table Cache definition and records parts (pivotCache/pivotCacheDefinitionxxx.bin, pivotCache/pivotCacheRecordsxxx.bin) (MIME type: application/vnd.ms-excel.pivotCacheDefinition, application/vnd.ms-excel.pivotCacheRecords)
- optional Query tables (queryTables/queryTablexxx.bin) (MIME type: application/vnd.ms-excel.queryTable)
Those parts in green are sufficient to read and possibly update arbitrary cells complete with associated formatting in an Excel 2007 workbook.
Introducing BIFF12
Each BIN part may be made up of its own underlying structure. Fortunately, most BIN parts share a common structure known as BIFF12. I made this name up based on the name of the binary file format of older Excel versions, where BIFF stands for Binary Interchange File Format. The history of BIFF is worth an article of its own, but the most interesting thing to know about it is that the latest known major revision was BIFF8, introduced in Excel 97. And the Excel team over at Microsoft were apparently so traumatized by the file format snafu that occurred when customers were forced to migrate from Excel 95 to Excel 97, two different file formats, that they have never considered changing the file format version again, despite adding a ton of new records in BIFF8 in Excel 2000, Excel XP and Excel 2003. Excel 2007 continues that trend in the sense that, if you save back an Excel 2007 file as an “Excel97-2003” compatible file, then Excel 2007 will save the bulk of its features in new BIFF8 records, thereby enabling round-tripping scenarios. BIFF is an OLE stream part inside an OLE document container. VBA projects, OLE objects, document summary properties, and XML maps (Excel 2003) are also stored as separate OLE streams. BIFF8 is basically a sequence of records where each record is identified using two bytes, followed by the length of the record also in two bytes, followed by the record content itself. That’s all there is. If you’d like to get the last public BIFF8 documentation from Microsoft (bundled with the MSO documentation), consider buying MSDN Library CDs of March 1998. BIFF12 kind of inherits this, but chose to make some interesting changes, like disregarding the existing BIFF8 record identifiers.
Just like BIFF8, BIFF12 consists of a sequence of records consisting of an identifier, a length and the record content itself. Where it differs is that both the record identifiers and the record length are encoded using a variable-length technique. It works as follows : the first byte of the record identifier is read. If the most significant bit of that byte is set to 1, then another byte will have to be read, up to a maximum of 4 bytes (i.e. the record identifier can always be stored in a DWORD). This most significant bit is irrelevant and thus appropriate shifting needs to occur to construct a record identifier.
So for example if you read byte 0x80, the most significant bit is set, so you need to read another byte. Let’s assume the other byte is 0x01: the most significant bit is not set, so the record identifier is obtained. Shifting aside, the record identifier is 0x0180. Note that, because you may want to match this record identifier against a number of known record identifiers, you can do the match using the unshifted record identifier, and as a bonus be able to figure out record identifiers in a BIFF12 hexadecimal dump in a straightforward manner.
It works the same for the record length. Shifting to the left in the containing DWORD must occur in order to construct a proper length.
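As a rough illustration of the two readings described above, here is a minimal C++ sketch. The function names `readRawRecordId` and `readVarUInt` are made up for this example and are not taken from the downloadable source. The first keeps the bytes unshifted, as they appear in a hex dump (80 01 gives 0x0180); the second drops the continuation bit of each byte and packs the remaining 7 bits, which is how a length would be reconstructed under the scheme described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unshifted identifier: bytes are combined as-is, continuation bits included,
// so the result matches what is seen in a hex dump (80 01 -> 0x0180).
bool readRawRecordId(const std::vector<std::uint8_t>& buf,
                     std::size_t& offset, std::uint32_t& id) {
    id = 0;
    for (int i = 0; i < 4; ++i) {  // at most 4 bytes
        if (offset >= buf.size()) return false;  // buffer ended mid-value
        const std::uint8_t b = buf[offset++];
        id |= static_cast<std::uint32_t>(b) << (8 * i);
        if ((b & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return true;
}

// Shifted value: 7 payload bits per byte, least significant group first.
bool readVarUInt(const std::vector<std::uint8_t>& buf,
                 std::size_t& offset, std::uint32_t& value) {
    value = 0;
    for (int i = 0; i < 4; ++i) {  // at most 4 bytes
        if (offset >= buf.size()) return false;  // buffer ended mid-value
        const std::uint8_t b = buf[offset++];
        value |= static_cast<std::uint32_t>(b & 0x7F) << (7 * i);
        if ((b & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return true;
}
```

With the byte sequence 94 01 10, for instance, `readRawRecordId` yields 0x194 and consumes two bytes, and a following `readVarUInt` yields a length of 0x10. Both functions advance `offset` past what they consumed, so the record content starts at the resulting offset.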
BIFF12 record structure
record id (variable length) | record length (variable length) | record content (size defined by record length)
record id (variable length) | record length (variable length) | record content (size defined by record length)
… | … | …
Reading BIFF12 records
Note that, because Windows uses the little Endian notation, record identifiers and any two-byte, 4-byte or 8-byte value must be read right-to-left. If you are using C# and store the BIN part in a byte[] array, then you must provide the appropriate function helpers to decode such structure. Here is how to decode words (2 bytes), dwords (4 bytes), single-precision floats (4 bytes), double-precision floats (8 bytes), and strings :
// Assumes: using System; using System.IO; using System.Text;

// Little-endian 16-bit word.
public static UInt16 GetWord(byte[] buffer, UInt32 offset)
{
    UInt16 val = (UInt16)(buffer[offset + 1] << 8);
    val += (UInt16)(buffer[offset + 0]);
    return val;
}

// Little-endian 32-bit dword.
public static UInt32 GetDword(byte[] buffer, UInt32 offset)
{
    return ((UInt32)(buffer[offset + 3]) << 24) +
           ((UInt32)(buffer[offset + 2]) << 16) +
           ((UInt32)(buffer[offset + 1]) << 8) +
           ((UInt32)(buffer[offset + 0]));
}

// 8-byte double, copied into a stream and read back with a BinaryReader.
public double GetDouble(byte[] buffer, UInt32 offset)
{
    double d = 0;
    using (MemoryStream mem = new MemoryStream())
    {
        BinaryWriter bw = new BinaryWriter(mem);
        for (UInt32 i = 0; i < 8; i++)
            bw.Write(buffer[offset + i]);
        mem.Seek(0, SeekOrigin.Begin);
        BinaryReader br = new BinaryReader(mem);
        d = br.ReadDouble();
        br.Close();
        bw.Close();
    }
    return d;
}

// len is the number of 2-byte characters, not the number of bytes.
public static String GetString(byte[] buffer, UInt32 offset, UInt32 len)
{
    StringBuilder sb = new StringBuilder((int)len);
    for (UInt32 i = offset; i < offset + 2 * len; i += 2)
        sb.Append((Char)GetWord(buffer, i));
    return sb.ToString();
}

// Variable-length record identifier: up to 4 bytes, 7 payload bits each;
// a set high bit means another byte follows.
public static bool GetRecordID(byte[] buffer, ref UInt32 offset, ref UInt32 recid)
{
    recid = 0;

    if (offset >= buffer.Length) return false;
    byte b1 = buffer[offset++];
    recid = (UInt32)(b1 & 0x7F);
    if ((b1 & 0x80) == 0) return true;

    if (offset >= buffer.Length) return false;
    byte b2 = buffer[offset++];
    recid = ((UInt32)(b2 & 0x7F) << 7) | recid;
    if ((b2 & 0x80) == 0) return true;

    if (offset >= buffer.Length) return false;
    byte b3 = buffer[offset++];
    recid = ((UInt32)(b3 & 0x7F) << 14) | recid;
    if ((b3 & 0x80) == 0) return true;

    if (offset >= buffer.Length) return false;
    byte b4 = buffer[offset++];
    recid = ((UInt32)(b4 & 0x7F) << 21) | recid;
    return true;
}
As a sidenote, the structure of strings stored inside records is the following : 4 bytes for the length (encoded in little Endian) which defines the number of string characters (not bytes) to follow, followed by such number of Unicode characters (2 bytes each, also encoded in little Endian). The strings are never zero-terminated, and I have never encountered in this reverse engineering game strings encoded in something else other than Unicode.
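The string layout just described (a 4-byte little-endian character count followed by that many 2-byte little-endian characters, no terminator) can be sketched in C++ as follows. This is an illustration only: `readWideString` is a made-up name, and the bounds checks are an addition of this sketch rather than something taken from the downloadable source.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a length-prefixed string of 2-byte characters starting at offset.
// On success, out holds the characters and offset points past the string.
// On failure (buffer too short), offset is left unchanged.
bool readWideString(const std::vector<std::uint8_t>& buf,
                    std::size_t& offset, std::u16string& out) {
    if (offset > buf.size() || buf.size() - offset < 4) return false;

    // 4-byte character count, low-order byte first.
    const std::uint32_t count =
        static_cast<std::uint32_t>(buf[offset]) |
        (static_cast<std::uint32_t>(buf[offset + 1]) << 8) |
        (static_cast<std::uint32_t>(buf[offset + 2]) << 16) |
        (static_cast<std::uint32_t>(buf[offset + 3]) << 24);

    // The count is in characters; each character takes 2 bytes.
    const std::uint64_t bytesNeeded = 2ull * count;
    if (buf.size() - offset - 4 < bytesNeeded) return false;

    std::size_t pos = offset + 4;
    out.clear();
    out.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        out.push_back(static_cast<char16_t>(buf[pos] | (buf[pos + 1] << 8)));
        pos += 2;
    }
    offset = pos;
    return true;
}
```

For example, the eight bytes 02 00 00 00 48 00 69 00 decode to the two-character string "Hi", and the offset ends up 8 bytes further on.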
Once you’ve got this, you can read a BIFF12 structure with code like this :
UInt32 offset = 0;
while (offset < buffer.Length)
{
    UInt32 recid = 0;
    UInt32 reclen = 0;

    // Each record starts with a variable-length id and a variable-length length.
    if (!BaseRecord.GetRecordID(buffer, ref offset, ref recid) ||
        !BaseRecord.GetRecordLen(buffer, ref offset, ref reclen))
    {
        Console.WriteLine("***Damaged buffer***");
        break;
    }

    // h maps record identifiers to their handlers.
    BaseRecord recHandler = (BaseRecord) h[recid];
    if (recHandler != null)
    {
        // Known record: print its tag, then the header and raw content bytes.
        Console.Write(String.Format("<{0}>\r\n[rec=0x{1:X} len=0x{2:X}]",
                                    recHandler.GetTag(), recid, reclen));
        for (int i = 0; i < reclen; i++)
        {
            Console.Write(String.Format(" {0:X2}", buffer[offset + i]));
        }
        Console.WriteLine();

        recHandler.Read(buffer, ref offset, recid, reclen, h, w);
        if (offset == UInt32.MaxValue)
        {
            Console.WriteLine("***Damaged buffer***");
            break;
        }
    }
    else
    {
        // Unknown record: print the header and raw content bytes only.
        Console.Write(String.Format("[rec=0x{0:X} len=0x{1:X}]", recid, reclen));
        for (int i = 0; i < reclen; i++)
        {
            Console.Write(String.Format(" {0:X2}", buffer[offset + i]));
        }
        Console.WriteLine();
    }

    offset += reclen;
    Console.WriteLine();
}
When applying this code to a worksheet BIN part, it produces the following :
// Here is how to read what follows
// For each record known by the BIFF12 reader, we come up with
// the XML markup tag associated to the record.
// This provides clues as how to swap from the XML part to the
// BIN part and vice versa and is easier to understand.
// Followed by square brackets enclosing the record identifier
// and associated length
// Followed by the record content itself
// (i.e. nothing if the length is zero)
// Anytime the record has an underlying structure
// (as with <sheetData>, the structure
// is decoded, and human readable info is provided).

*** Dumping a worksheet part

<worksheet>
[rec=0x181 len=0x0]

<sheetPr>
[rec=0x193 len=0xF] C9 04 02 00 40 00 00 00 00 00 00 00 00 00 00
info : <tabColor rgb=.../>
info : <outlinePr showOutlineSymbols=.../>
info : <pageSetUpPr .../>
info : </sheetPr>

<dimension>
[rec=0x194 len=0x10] 04 00 00 00 04 00 00 00 00 00 00 00 07 00 00 00
info : r1=4, c1=0, r2=4, c2=7

<sheetViews>
[rec=0x185 len=0x0]

<sheetView>
[rec=0x189 len=0x1E] DC 03 00 00 00 00 01 00 00 00 00 00 00 00 40 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00

<selection>
[rec=0x198 len=0x24] 03 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 04 00 00 00 04 00 00 00 00 00 00 00 FF 3F 00 00

</sheetView>
[rec=0x18A len=0x0]

</sheetViews>
[rec=0x186 len=0x0]

<sheetFormatPr>
[rec=0x3E5 len=0xC] FF FF FF FF 08 00 2C 01 00 00 00 01

<cols>
[rec=0x386 len=0x0]
info : colmin=1, colmax=2, width=9,140625, style=1, outline=false, resize=false, hidden=false
info : colmin=4, colmax=5, width=9,140625, style=2, outline=false, resize=false, hidden=false
info : colmin=6, colmax=6, width=9,140625, style=2, outline=true, resize=true, hidden=false
info : colmin=7, colmax=7, width=9,140625, style=0, outline=true, resize=true, hidden=false
info : colmin=8, colmax=8, width=0, style=0, outline=false, resize=true, hidden=true
info : colmin=9, colmax=9, width=11, style=0, outline=false, resize=true, hidden=false

</cols>
[rec=0x387 len=0x0]

<sheetData>
[rec=0x191 len=0x0]
info : row=4, height=405, style=0, outline=false, resize=true, hidden=false
info : col=0, style=0, v:stringindex=0 v:string=a

</sheetData>
[rec=0x192 len=0x0]

[rec=0x497 len=0x42] 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

<printOptions>
[rec=0x3DD len=0x2] 10 00

<pageMargins>
[rec=0x3DC len=0x30] 66 66 66 66 66 66 E6 3F 66 66 66 66 66 66 E6 3F 00 00 00 00 00 00 E8 3F 00 00 00 00 00 00 E8 3F 33 33 33 33 33 33 D3 3F 33 33 33 33 33 33 D3 3F

<pageSetup>
[rec=0x3DE len=0x22] 01 00 00 00 64 00 00 00 2C 01 00 00 2C 01 00 00 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 00 00

<headerFooter>
[rec=0x3DF len=0x1A] 0C 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

[rec=0x3E0 len=0x0]

</worksheet>
[rec=0x182 len=0x0]
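For readers who want to reproduce the record boundaries of such a dump without the BaseRecord handler classes, here is a minimal, self-contained sketch of a record splitter. The names Biff12Records, Record and Split are invented for the illustration. The identifier is assembled the way the constants in this article expect (bytes 81 01 give 0x0181). The length decoding (7 bits per byte, least significant group first, at most 4 bytes) is an assumption on my part, since GetRecordLen is not shown in this section.

```csharp
using System;
using System.Collections.Generic;

public static class Biff12Records
{
    public struct Record
    {
        public uint Id;      // identifier as stored, e.g. 0x0181
        public int Offset;   // where the record content starts in the buffer
        public int Length;   // content length in bytes
    }

    // Walks a BIN part and returns the position of every record in it.
    public static List<Record> Split(byte[] buffer)
    {
        var records = new List<Record>();
        int pos = 0;
        while (pos < buffer.Length)
        {
            // Identifier: bit 7 of a byte says that another byte follows.
            uint id = 0;
            for (int i = 0; i < 4; i++)
            {
                if (pos >= buffer.Length) throw new FormatException("Truncated record id.");
                byte b = buffer[pos++];
                id |= (uint)b << (8 * i);
                if ((b & 0x80) == 0) break;
            }

            // Length (assumed encoding): 7 payload bits per byte.
            int len = 0;
            for (int i = 0; i < 4; i++)
            {
                if (pos >= buffer.Length) throw new FormatException("Truncated record length.");
                byte b = buffer[pos++];
                len |= (b & 0x7F) << (7 * i);
                if ((b & 0x80) == 0) break;
            }

            if (pos + len > buffer.Length) throw new FormatException("Truncated record content.");
            records.Add(new Record { Id = id, Offset = pos, Length = len });
            pos += len;
        }
        return records;
    }
}
```

Feeding it the first bytes of a workbook part (83 01 00 80 01 14 ...) should yield record 0x0183 with length 0 followed by record 0x0180 with length 0x14, which is the split shown in the dumps.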
BIFF12 records for a number of BIN parts
So what are some of those important records you ask?
// Workbook part
public const int BIFF12_DEFINEDNAME = 0x27;
public const int BIFF12_FILEVERSION = 0x0180;
public const int BIFF12_WORKBOOK = 0x0183;
public const int BIFF12_WORKBOOK_END = 0x0184;
public const int BIFF12_BOOKVIEWS = 0x0187;
public const int BIFF12_BOOKVIEWS_END = 0x0188;
public const int BIFF12_SHEETS = 0x018F;
public const int BIFF12_SHEETS_END = 0x0190;
public const int BIFF12_WORKBOOKPR = 0x0199;
public const int BIFF12_SHEET = 0x019C;
public const int BIFF12_CALCPR = 0x019D;
public const int BIFF12_WORKBOOKVIEW = 0x019E;
public const int BIFF12_EXTERNALREFERENCES = 0x02E1;
public const int BIFF12_EXTERNALREFERENCES_END = 0x02E2;
public const int BIFF12_EXTERNALREFERENCE = 0x02E3;
public const int BIFF12_WEBPUBLISHING = 0x04A9;

// Worksheet part
public const int BIFF12_ROW = 0x00;
public const int BIFF12_BLANK = 0x01;
public const int BIFF12_NUM = 0x02;
public const int BIFF12_BOOLERR = 0x03;
public const int BIFF12_BOOL = 0x04;
public const int BIFF12_FLOAT = 0x05;
public const int BIFF12_STRING = 0x07;
public const int BIFF12_FORMULA_STRING = 0x08;
public const int BIFF12_FORMULA_FLOAT = 0x09;
public const int BIFF12_FORMULA_BOOL = 0x0A;
public const int BIFF12_FORMULA_BOOLERR = 0x0B;
public const int BIFF12_COL = 0x3C;
public const int BIFF12_WORKSHEET = 0x0181;
public const int BIFF12_WORKSHEET_END = 0x0182;
public const int BIFF12_SHEETVIEWS = 0x0185;
public const int BIFF12_SHEETVIEWS_END = 0x0186;
public const int BIFF12_SHEETVIEW = 0x0189;
public const int BIFF12_SHEETVIEW_END = 0x018A;
public const int BIFF12_SHEETDATA = 0x0191;
public const int BIFF12_SHEETDATA_END = 0x0192;
public const int BIFF12_SHEETPR = 0x0193;
public const int BIFF12_DIMENSION = 0x0194;
public const int BIFF12_SELECTION = 0x0198;
public const int BIFF12_COLS = 0x0386;
public const int BIFF12_COLS_END = 0x0387;
public const int BIFF12_CONDITIONALFORMATTING = 0x03CD;
public const int BIFF12_CONDITIONALFORMATTING_END = 0x03CE;
public const int BIFF12_CFRULE = 0x03CF;
public const int BIFF12_CFRULE_END = 0x03D0;
public const int BIFF12_ICONSET = 0x03D1;
public const int BIFF12_ICONSET_END = 0x03D2;
public const int BIFF12_DATABAR = 0x03D3;
public const int BIFF12_DATABAR_END = 0x03D4;
public const int BIFF12_COLORSCALE = 0x03D5;
public const int BIFF12_COLORSCALE_END = 0x03D6;
public const int BIFF12_CFVO = 0x03D7;
public const int BIFF12_PAGEMARGINS = 0x03DC;
public const int BIFF12_PRINTOPTIONS = 0x03DD;
public const int BIFF12_PAGESETUP = 0x03DE;
public const int BIFF12_HEADERFOOTER = 0x03DF;
public const int BIFF12_SHEETFORMATPR = 0x03E5;
public const int BIFF12_HYPERLINK = 0x03EE;
public const int BIFF12_DRAWING = 0x04A6;
public const int BIFF12_LEGACYDRAWING = 0x04A7;
public const int BIFF12_COLOR = 0x04B4;
public const int BIFF12_OLEOBJECTS = 0x04FE;
public const int BIFF12_OLEOBJECT = 0x04FF;
public const int BIFF12_OLEOBJECTS_END = 0x0580;
public const int BIFF12_TABLEPARTS = 0x0594;
public const int BIFF12_TABLEPART = 0x0595;
public const int BIFF12_TABLEPARTS_END = 0x0596;

// Shared strings part
public const int BIFF12_SI = 0x13;
public const int BIFF12_SST = 0x019F;
public const int BIFF12_SST_END = 0x01A0;

// Styles
public const int BIFF12_FONT = 0x2B;
public const int BIFF12_FILL = 0x2D;
public const int BIFF12_BORDER = 0x2E;
public const int BIFF12_XF = 0x2F;
public const int BIFF12_CELLSTYLE = 0x30;
public const int BIFF12_STYLESHEET = 0x0296;
public const int BIFF12_STYLESHEET_END = 0x0297;
public const int BIFF12_COLORS = 0x03D9;
public const int BIFF12_COLORS_END = 0x03DA;
public const int BIFF12_DXFS = 0x03F9;
public const int BIFF12_DXFS_END = 0x03FA;
public const int BIFF12_TABLESTYLES = 0x03FC;
public const int BIFF12_TABLESTYLES_END = 0x03FD;
public const int BIFF12_FILLS = 0x04DB;
public const int BIFF12_FILLS_END = 0x04DC;
public const int BIFF12_FONTS = 0x04E3;
public const int BIFF12_FONTS_END = 0x04E4;
public const int BIFF12_BORDERS = 0x04E5;
public const int BIFF12_BORDERS_END = 0x04E6;
public const int BIFF12_CELLXFS = 0x04E9;
public const int BIFF12_CELLXFS_END = 0x04EA;
public const int BIFF12_CELLSTYLES = 0x04EB;
public const int BIFF12_CELLSTYLES_END = 0x04EC;
public const int BIFF12_CELLSTYLEXFS = 0x04F2;
public const int BIFF12_CELLSTYLEXFS_END = 0x04F3;

// Comments
public const int BIFF12_COMMENTS = 0x04F4;
public const int BIFF12_COMMENTS_END = 0x04F5;
public const int BIFF12_AUTHORS = 0x04F6;
public const int BIFF12_AUTHORS_END = 0x04F7;
public const int BIFF12_AUTHOR = 0x04F8;
public const int BIFF12_COMMENTLIST = 0x04F9;
public const int BIFF12_COMMENTLIST_END = 0x04FA;
public const int BIFF12_COMMENT = 0x04FB;
public const int BIFF12_COMMENT_END = 0x04FC;
public const int BIFF12_TEXT = 0x04FD;

// Tables, autofilters and sort state
public const int BIFF12_AUTOFILTER = 0x01A1;
public const int BIFF12_AUTOFILTER_END = 0x01A2;
public const int BIFF12_FILTERCOLUMN = 0x01A3;
public const int BIFF12_FILTERCOLUMN_END = 0x01A4;
public const int BIFF12_FILTERS = 0x01A5;
public const int BIFF12_FILTERS_END = 0x01A6;
public const int BIFF12_FILTER = 0x01A7;
public const int BIFF12_TABLE = 0x02D7;
public const int BIFF12_TABLE_END = 0x02D8;
public const int BIFF12_TABLECOLUMNS = 0x02D9;
public const int BIFF12_TABLECOLUMNS_END = 0x02DA;
public const int BIFF12_TABLECOLUMN = 0x02DB;
public const int BIFF12_TABLECOLUMN_END = 0x02DC;
public const int BIFF12_TABLESTYLEINFO = 0x0481;
public const int BIFF12_SORTSTATE = 0x0492;
public const int BIFF12_SORTCONDITION = 0x0494;
public const int BIFF12_SORTSTATE_END = 0x0495;

// Query tables
public const int BIFF12_QUERYTABLE = 0x03BF;
public const int BIFF12_QUERYTABLE_END = 0x03C0;
public const int BIFF12_QUERYTABLEREFRESH = 0x03C1;
public const int BIFF12_QUERYTABLEREFRESH_END = 0x03C2;
public const int BIFF12_QUERYTABLEFIELDS = 0x03C7;
public const int BIFF12_QUERYTABLEFIELDS_END = 0x03C8;
public const int BIFF12_QUERYTABLEFIELD = 0x03C9;
public const int BIFF12_QUERYTABLEFIELD_END = 0x03CA;

// Connections
public const int BIFF12_CONNECTIONS = 0x03AD;
public const int BIFF12_CONNECTIONS_END = 0x03AE;
public const int BIFF12_CONNECTION = 0x01C9;
public const int BIFF12_CONNECTION_END = 0x01CA;
public const int BIFF12_DBPR = 0x01CB;
public const int BIFF12_DBPR_END = 0x01CC;
Workbook part
How to proceed for each BIN part of interest is in fact quite straightforward. You can create a regular Excel 2007 file that uses a particular feature, for instance a chart, and save it both as .XLSX and .XLSB. Then you can unzip the content into separate folders, put the parts side by side and try to figure out which XML markup corresponds to which BIFF12 record identifier. The XML markup itself is not mandatory at all, it only makes the whole thing approachable for human beings…
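As a small aid for the side-by-side comparison, the following sketch lists the parts of a package so that the .XLSX and .XLSB listings can be put next to each other. It relies on the ZipArchive class from System.IO.Compression; the class name PackageParts and the file names in the usage note are placeholders of mine, not something that ships with the article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class PackageParts
{
    // Returns the part names (ZIP entry paths) of an .xlsx or .xlsb package, sorted.
    public static List<string> List(Stream package)
    {
        var names = new List<string>();

        // leaveOpen = true: the caller stays responsible for closing the stream.
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
                names.Add(entry.FullName);
        }

        names.Sort(StringComparer.Ordinal);
        return names;
    }
}
```

Calling List on a stream opened over a hypothetical Book1.xlsx and again over Book1.xlsb should show the same layout, with .xml parts on one side and .bin parts on the other.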
I have done some of this work for a number of important BIN parts, but I concentrated on what was really needed to make sure I was able to read the content of cells. That’s why, while records are for the most part identified and matched with their XML markup siblings, the record content itself is not always decoded. It’s either because I was too lazy to do it, or because it does not serve the goal, or because it can’t be disambiguated that easily. For instance, if you take a look at a regular workbook BIN part below, you’ll notice a few records with no associated XML markup :
// This is what a workbook part looks like in XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
          xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <fileVersion lastEdited="4" lowestEdited="4" rupBuild="4017"/>
  <workbookPr defaultThemeVersion="123820"/>
  <bookViews>
    <workbookView xWindow="360" yWindow="60" windowWidth="11295" windowHeight="5580"/>
  </bookViews>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
    <sheet name="Sheet2" sheetId="2" r:id="rId2"/>
    <sheet name="Sheet3" sheetId="3" r:id="rId3"/>
  </sheets>
  <calcPr calcId="122211"/>
  <webPublishing codePage="1252"/>
</workbook>

// This is what a workbook part looks like in BIN
83 01 00 80 01 14 04 04 b1 0f 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 99 01 0c 20 00 01
00 ac e3 01 00 00 00 00 00 87 01 00 9e 01 1d 68
01 00 00 3c 00 00 00 1f 2c 00 00 cc 15 00 00 58
02 00 00 00 00 00 00 00 00 00 00 78 88 01 00 8f
01 00 9c 01 28 00 00 00 00 00 00 00 00 01 00 00
00 04 00 00 00 72 00 49 00 64 00 31 00 06 00 00
00 53 00 68 00 65 00 65 00 74 00 31 00 9c 01 28
00 00 00 00 00 00 00 00 02 00 00 00 04 00 00 00
72 00 49 00 64 00 32 00 06 00 00 00 53 00 68 00
65 00 65 00 74 00 32 00 9c 01 28 00 00 00 00 00
00 00 00 03 00 00 00 04 00 00 00 72 00 49 00 64
00 33 00 06 00 00 00 53 00 68 00 65 00 65 00 74
00 33 00 90 01 00 9d 01 19 63 dd 01 00 01 00 00
00 64 00 00 00 fc a9 f1 d2 4d 62 50 3f 01 00 00
00 6a 96 04 06 00 00 00 00 00 00 9a 01 01 00 a9
04 0b 07 00 03 60 00 00 00 e4 04 00 00 9b 01 01
00 84 01 00

// This is how, after breaking the BIN part in records, you can match
// records to the XML markup.
// Note that record identifiers are the two bytes on the left,
// the associated record length is surrounded by parentheses,
// and it's followed by the record content itself.
<workbook> 83 01 (00) <fileVersion lastEdited="4" lowestEdited="4" rupBuild="4017"/> (hint : 4017 = 0x0FB1) 80 01 (14) 04 04 b1 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <workbookPr defaultThemeVersion="123820"/> (hint : 123820 = 0x0001E3AC) 99 01 (0c) 20 00 01 00 ac e3 01 00 00 00 00 00 <bookViews> 87 01 (00) <workbookView xWindow="360" yWindow="60" windowWidth="11295" windowHeight="5580"/> 9e 01 (1d) 68 01 00 00 3c 00 00 00 1f 2c 00 00 cc 15 00 00 58 02 00 00 00 00 00 00 00 00 00 00 78 </bookViews> 88 01 (00) <sheets> 8f 01 (00) <sheet name="Sheet1" sheetId="1" r:id="rId1"/> 9c 01 (28) 00 00 00 00 00 00 00 00 01 00 00 00 sheetid 04 00 00 00 length of string to follow (in characters, not bytes) 72 00 49 00 64 00 31 00 relation identifier 06 00 00 00 length of string to follow (in characters, not bytes) 53 00 68 00 65 00 65 00 74 00 31 00 sheetname <sheet name="Sheet2" sheetId="2" r:id="rId2"/> 9c 01 (28) 00 00 00 00 00 00 00 00 02 00 00 00 sheetid 04 00 00 00 length of string to follow (in characters, not bytes) 72 00 49 00 64 00 32 00 relation identifier 06 00 00 00 length of string to follow (in characters, not bytes) 53 00 68 00 65 00 65 00 74 00 32 00 sheetname <sheet name="Sheet3" sheetId="3" r:id="rId3"/> 9c 01 (28) 00 00 00 00 00 00 00 00 03 00 00 00 sheetid 04 00 00 00 length of string to follow (in characters, not bytes) 72 00 49 00 64 00 33 00 relation identifier 06 00 00 00 length of string to follow (in characters, not bytes) 53 00 68 00 65 00 65 00 74 00 33 00 sheetname </sheets> 90 01 (00) <externalReferences> e1 02 (00) <externalReference r:id="rId4" /> e3 02 (0c) 04 00 00 00 length of string to follow (in characters, not bytes) 72 00 49 00 64 00 34 00 string representing a relation identifier e5 02 (00) ea 02 1c 02 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 </externalReferences> e2 02 (00) <definedName name="externalrange" comment="">[1]Sheet1!$B$3 </definedName> 27 (3c) 00 00 grbits 00 00 00 ff ff ff 
ff nametype 0d 00 00 00 length of string to follow (in characters, not bytes) 65 00 78 00 74 00 65 00 72 00 6e 00 61 00 6c 00 72 00 61 00 6e 00 67 00 65 00 defined name 09 00 00 00 length of formula to follow (in bytes) 3a 01 00 02 00 00 00 01 00 formula 00 00 00 00 00 00 00 00 <definedName name="anotherrange">Sheet1!$B$9:$C$10</definedName> 27 (40) 00 00 grbits 00 00 00 ff ff ff ff nametype 0c 00 00 00 length of string to follow (in characters, not bytes) 61 00 6e 00 6f 00 74 00 68 00 65 00 72 00 72 00 61 00 6e 00 67 00 65 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 00 00 08 00 00 00 09 00 00 00 01 00 02 00 formula 00 00 00 00 ff ff ff ff <definedName name="Database1" localSheetId="1" hidden="1">Sheet2!$A$1:$D$6</definedName> 27 (3a) 01 00 grbits 00 00 00 01 00 00 00 nametype 09 00 00 00 length of string to follow (in characters, not bytes) 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 31 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 01 00 00 00 00 00 05 00 00 00 00 00 03 00 formula 00 00 00 00 ff ff ff ff <definedName name="myrange">Sheet1!$C$2:$D$3</definedName> 27 (36) 00 00 grbits 00 00 00 ff ff ff ff nametype 07 00 00 00 length of string to follow (in characters, not bytes) 6d 00 79 00 72 00 61 00 6e 00 67 00 65 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 00 00 01 00 00 00 02 00 00 00 02 00 03 00 formula 00 00 00 00 ff ff ff ff <definedName name="_xlnm.Print_Area" localSheetId="0">Sheet1!$A$1:$E$7</definedName> 27 (3c) 20 00 grbits 00 00 00 00 00 00 00 nametype 0a 00 00 00 length of string to follow (in characters, not bytes) 50 00 72 00 69 00 6e 00 74 00 5f 00 41 00 72 00 65 00 61 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 00 00 00 00 00 00 06 00 00 00 00 00 04 00 formula 00 00 00 00 ff ff ff ff <definedName name="_xlnm.Print_Titles" localSheetId="0">Sheet1!$2:$3 </definedName> 27 (40) 20 00 grbits 00 00 00 00 00 00 00 nametype 0c 00 00 00 length of 
string to follow (in characters, not bytes) 50 00 72 00 69 00 6e 00 74 00 5f 00 54 00 69 00 74 00 6c 00 65 00 73 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 00 00 01 00 00 00 02 00 00 00 00 00 ff 3f formula 00 00 00 00 ff ff ff ff <definedName name="myrange" hidden="1">Sheet1!$B$2:$B$3 </definedName> 27 (36) 01 00 grbits 00 00 00 ff ff ff ff nametype 07 00 00 00 length of string to follow (in characters, not bytes) 6d 00 79 00 72 00 61 00 6e 00 67 00 65 00 defined name 0f 00 00 00 length of formula to follow (in bytes) 3b 00 00 01 00 00 00 02 00 00 00 01 00 01 00 formula 00 00 00 00 ff ff ff ff <calcPr calcId="122211"/> 9d 01 (19) 63 dd 01 00 01 00 00 00 64 00 00 00 fc a9 f1 d2 4d 62 50 3f 01 00 00 00 6a 96 04 (06) 00 00 00 00 00 00 9a 01 (01) 00 <webPublishing codePage="1252"/> a9 04 (0b) 07 00 03 60 00 00 00 e4 04 00 00 9b 01 (01) 00 </workbook> 84 01 (00)
As you can see, I could not easily find the markup associated to record identifiers 0x0496, 0x019A and 0x019B above. While it’s not blocking at this point, let’s take a few moments to discuss the issue.
The big impedance mismatch
The example clearly shows that the person who wrote the workbook BIN part serializer knows more than the person who developed the workbook XML part serializer. Unfortunately, the opposite is also true, since the XML markup contains namespaces and associated semantics that are, for obvious reasons, nowhere to be found in the BIN part. Where are we going?
Short of Microsoft providing the exact specs for the BIN serializers of every involved part, consumers and implementers of the file format will have to stick to replicating structures that cannot be understood because of a discrepancy between serializers. It goes all the way up to guessing default values of the objects you work with, that’s why it’s such a big deal. This is one of those well-known file format loopholes, the kind that can give a vendor a say in the format’s future as well as in any interoperability scenario, across Windows and non-Windows platforms.
How to read the workbook BIN part using C# or C++ ? Using the source code provided in an attachment to the article, you can easily do that.
Hashtable h = new Hashtable();
h[C.BIFF12_DEFINEDNAME] = new DefinedNameRecord();
h[C.BIFF12_FILEVERSION] = new FileVersionRecord();
h[C.BIFF12_WORKBOOK] = new WorkbookRecord();
h[C.BIFF12_WORKBOOK_END] = new WorkbookEndRecord();
h[C.BIFF12_BOOKVIEWS] = new BookViewsRecord();
h[C.BIFF12_BOOKVIEWS_END] = new BookViewsEndRecord();
h[C.BIFF12_SHEETS] = new SheetsRecord();
h[C.BIFF12_SHEETS_END] = new SheetsEndRecord();
h[C.BIFF12_WORKBOOKPR] = new WorkbookPRRecord();
h[C.BIFF12_SHEET] = new SheetRecord();
h[C.BIFF12_CALCPR] = new CalcPRRecord();
h[C.BIFF12_WORKBOOKVIEW] = new WorkbookViewRecord();
h[C.BIFF12_EXTERNALREFERENCES] = new ExternalReferencesRecord();
h[C.BIFF12_EXTERNALREFERENCES_END] = new ExternalReferencesEndRecord();
h[C.BIFF12_EXTERNALREFERENCE] = new ExternalReferenceRecord();
h[C.BIFF12_WEBPUBLISHING] = new WebPublishingRecord();

Workbook w = new Workbook();

using (FileStream fs = new FileStream(@"..\..\Excel12_files\Book1.xlsb\xl\workbook.bin",
                                      FileMode.Open, FileAccess.Read))
{
    byte[] bufferWorkbookPart = new BinaryReader(fs).ReadBytes((int)fs.Length);
    Read(w, h, bufferWorkbookPart);
}
The workbook provides the list of worksheet references we are interested in. For each worksheet, we can see :
- The sheet ordering id. If you happen to change the order of worksheet tabs in an Excel file, those ids are reordered, and the rest of the file (relationships and parts) is left untouched.
- The sheet relationship identifier, or r:id. You can look up this identifier in the associated workbook.bin.rels file to know what part it refers to.
- The sheet name itself, stored as a 4-byte length followed by a Unicode UCS2 string.
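The three fields above can be pulled out of a sheet record (identifier 0x019C) with a few lines. This is a sketch under the layout visible in the workbook dump of this article, where the sheet id sits after 8 bytes that are left uninterpreted here. The SheetEntry class and its member names are illustrative, and malformed input simply surfaces as an out-of-range exception.

```csharp
using System;
using System.Text;

public sealed class SheetEntry
{
    public uint SheetId;
    public string RelationId;
    public string Name;

    // Parses the content of one sheet record, i.e. the bytes that
    // follow the record identifier and the record length.
    public static SheetEntry Parse(byte[] content)
    {
        int pos = 8; // the first 8 bytes are not decoded in this sketch
        SheetEntry entry = new SheetEntry();
        entry.SheetId = ReadUInt32(content, ref pos);
        entry.RelationId = ReadString(content, ref pos);
        entry.Name = ReadString(content, ref pos);
        return entry;
    }

    private static uint ReadUInt32(byte[] b, ref int pos)
    {
        uint v = (uint)b[pos] | (uint)b[pos + 1] << 8 | (uint)b[pos + 2] << 16 | (uint)b[pos + 3] << 24;
        pos += 4;
        return v;
    }

    // 4-byte character count, then 2 bytes per character (little-endian).
    private static string ReadString(byte[] b, ref int pos)
    {
        int chars = (int)ReadUInt32(b, ref pos);
        string s = Encoding.Unicode.GetString(b, pos, 2 * chars);
        pos += 2 * chars;
        return s;
    }
}
```

With the RelationId in hand, the next step is the lookup in workbook.bin.rels described in the list above.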
Worksheet part
The worksheet part describes the values in cells, along with formulas whenever it applies, and also describes objects mapping to cells or cell ranges (charts, pivot tables, …). Here is an example of a reverse engineered worksheet which contains a double-precision float, a float formula which happens to return a «division by zero» error, and a copy of that cell elsewhere :
Reverse engineering numeric cell values with and without formulas.
// This is what a worksheet part looks like in XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
           xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <dimension ref="D3:D6"/>
  <sheetViews>
    <sheetView tabSelected="1" workbookViewId="0">
      <selection activeCell="D4" sqref="D4"/>
    </sheetView>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15"/>
  <cols>
    <col min="4" max="4" width="12.5703125" customWidth="1"/>
  </cols>
  <sheetData>
    <row r="3" spans="4:4"><c r="D3"><v>2.123456789</v></c></row>
    <row r="4" spans="4:4"><c r="D4" t="e"><f>5/E3</f><v>#DIV/0!</v></c></row>
    <row r="6" spans="4:4"><c r="D6" t="e"><v>#DIV/0!</v></c></row>
  </sheetData>
  <printOptions/>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
  <headerFooter/>
</worksheet>

// This is how, after breaking the BIN part in records, you can match
// records to the XML markup.
// Note that record identifiers are the two bytes on the left,
// the associated record length is surrounded by parentheses,
// and it's followed by the record content itself.
<worksheet/> 81 01 (00) <sheetPr/> (figured out) 93 01 (0f) c9 04 02 00 40 00 00 00 00 00 00 00 00 00 00 <dimension ref="D3:D6"/> 94 01 (10) 02 00 00 00 05 00 00 00 03 00 00 00 03 00 00 00 <sheetViews> 85 01 (00) <sheetView tabSelected="1" workbookViewId="0"> 89 01 (1e) dc 03 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00 <selection activeCell="D4" sqref="D4"/> 98 01 (24) 03 00 00 00 02 00 00 00 02 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 </sheetView> 8a 01 (00) </sheetViews> 86 01 (00) <sheetFormatPr defaultRowHeight="15"/> e5 03 (0c) ff ff ff ff 08 00 2c 01 00 00 00 00 <cols> 86 03 (00) <col min="4" max="4" width="12.5703125" customWidth="1"/> 3c (12) 03 00 00 00 colmin (0-based) 03 00 00 00 colmax (0-based) 92 0c 00 00 width * 256 00 00 00 00 style (0-based) 02 00 flags </cols> 87 03 (00) <sheetData> 91 01 (00) <row r="3" spans="4:4"></row> 00 (19) 02 00 00 00 00 00 00 00 2c 01 00 00 00 01 00 00 00 03 00 00 00 03 00 00 00 <c r="D3"><v>2.123456789</v></c> 05 (10) 03 00 00 00 col (0-based) 00 00 00 00 style (0-based) 1b cb b9 e9 d6 fc 00 40 float (IEEE 8 bytes) <row r="4" spans="4:4"></row> 00 (19) 03 00 00 00 00 00 00 00 2c 01 00 00 00 01 00 00 00 03 00 00 00 03 00 00 00 <c r="D4" t="e"><f>5/E3</f><v> #DIV/0!</v></c> 0b (1e) 03 00 00 00 col (0-based) 00 00 00 00 style (0-based) 07 boolerr (7 = DIV/0) 00 00 grbits 0b 00 00 00 len of formula to follow in bytes 1e 05 00 44 02 00 00 00 04 c0 06 formula (1E = ptgTokenInt (5) 00 00 00 00 44 = ptgTokenRefV (E3) 06 = ptgTokenDiv) <row r="6" spans="4:4"></row> 00 (19) 05 00 00 00 00 00 00 00 2c 01 00 00 00 01 00 00 00 03 00 00 00 03 00 00 00 <c r="D6" t="e"><v>#DIV/0!</v></c> 03 (09) 03 00 00 00 col 00 00 00 00 style 07 boolerr (7 = DIV/0) </sheetData> 92 01 (00) 97 04 (42) 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 01 00 00 00 <conditionalFormatting sqref="B3:C4"> cd 03 (1c) 01 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 01 00 00 00 02 00 00 00 <cfRule type="dataBar" priority="3"> cf 03 (8c) 02 04 00 00 00 03 00 00 00 ff ff ff ff 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 67 00 00 00 67 00 00 00 00 00 00 00 ff ff ff ff <formula>MAX(IF(ISBLANK($B$3:$C$4), "", IF(ISERROR($B$3:$C$4), "", $B$3:$C$4)))</formula> 67 00 00 00 65 02 00 00 00 03 00 00 00 01 00 02 00 61 81 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 43 00 65 02 00 00 00 03 00 00 00 01 00 02 00 61 03 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 1c 00 19 40 00 01 25 02 00 00 00 03 00 00 00 01 00 02 00 19 40 00 01 19 08 03 00 22 03 01 00 19 08 03 00 22 03 01 00 42 01 07 00 00 00 00 00 <formula>MAX(IF(ISBLANK($B$3:$C$4), "", IF(ISERROR($B$3:$C$4), "", $B$3:$C$4)))</formula> 67 00 00 00 65 02 00 00 00 03 00 00 00 01 00 02 00 61 81 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 43 00 65 02 00 00 00 03 00 00 00 01 00 02 00 61 03 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 1c 00 19 40 00 01 25 02 00 00 00 03 00 00 00 01 00 02 00 19 40 00 01 19 08 03 00 22 03 01 00 19 08 03 00 22 03 01 00 42 01 06 00 00 00 00 00 <dataBar> d3 03 (03) 0a 5a 01 <cfvo type="min" val="0" /> d7 03 (18) 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <cfvo type="max" val="0" /> d7 03 (18) 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <color rgb="FF638EC6" /> (hint : color encoding = BGR not RGB) b4 04 (08) 05 ff 00 00 63 8e c6 ff </dataBar> d4 03 (00) </cfRule> d0 03 (00) </conditionalFormatting> ce 03 (00) <conditionalFormatting sqref="B6:C7"> cd 03 (1c) 01 00 00 00 00 00 00 00 01 00 00 00 05 00 00 00 06 00 00 00 01 00 00 00 02 00 00 00 <cfRule type="colorScale" priority="2"> cf 03 (8c) 02 03 00 00 00 02 00 00 00 ff ff ff ff 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 67 00 00 00 67 00 00 00 00 00 00 00 ff ff ff ff <formula>MAX(IF(ISBLANK($B$6:$C$7), "", 
IF(ISERROR($B$6:$C$7), "", $B$6:$C$7)))</formula> 67 00 00 00 65 05 00 00 00 06 00 00 00 01 00 02 00 61 81 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 43 00 65 05 00 00 00 06 00 00 00 01 00 02 00 61 03 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 1c 00 19 40 00 01 25 05 00 00 00 06 00 00 00 01 00 02 00 19 40 00 01 19 08 03 00 22 03 01 00 19 08 03 00 22 03 01 00 42 01 07 00 00 00 00 00 <formula>MAX(IF(ISBLANK($B$6:$C$7), "", IF(ISERROR($B$6:$C$7), "", $B$6:$C$7)))</formula> 67 00 00 00 65 05 00 00 00 06 00 00 00 01 00 02 00 61 81 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 43 00 65 05 00 00 00 06 00 00 00 01 00 02 00 61 03 00 19 02 0b 00 19 40 00 01 17 00 00 19 08 1c 00 19 40 00 01 25 05 00 00 00 06 00 00 00 01 00 02 00 19 40 00 01 19 08 03 00 22 03 01 00 19 08 03 00 22 03 01 00 42 01 06 00 00 00 00 00 <colorScale> d5 03 (00) <cfvo type="min" val="0" /> d7 03 (18) 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <cfvo type="percent" val="50" /> d7 03 (18) 04 00 00 00 00 00 00 00 00 00 49 40 00 00 00 00 00 00 00 00 00 00 00 00 <cfvo type="max" val="0" /> d7 03 (18) 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <color rgb="FFF8696B" /> (hint : color encoding = BGR not RGB) b4 04 (08) 05 ff 00 00 f8 69 6b ff <color rgb="FFFFEB84" /> (hint : color encoding = BGR not RGB) b4 04 (08) 05 ff 00 00 ff eb 84 ff <color rgb="FF63BE7B" /> (hint : color encoding = BGR not RGB) b4 04 (08) 05 ff 00 00 63 be 7b ff </colorScale> d6 03 (00) </cfRule> d0 03 (00) </conditionalFormatting> ce 03 (00) <conditionalFormatting sqref="B9:C10"> cd 03 (1c) 01 00 00 00 00 00 00 00 01 00 00 00 08 00 00 00 09 00 00 00 01 00 00 00 02 00 00 00 <cfRule type="iconSet" priority="1"> cf 03 (2e) 06 00 00 00 04 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff <iconSet iconSet="3TrafficLights2"> d1 03 (06) 04 00 00 00 78 00 <cfvo type="percentile" val="0" /> d7 03 (18) 05 00 00 00 00 00 00 00 
00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 <cfvo type="percentile" val="0" /> d7 03 (18) 05 00 00 00 00 00 00 00 00 80 40 40 01 00 00 00 01 00 00 00 00 00 00 00 <cfvo type="percentile" val="0" /> d7 03 (18) 05 00 00 00 00 00 00 00 00 c0 50 40 01 00 00 00 01 00 00 00 00 00 00 00 </iconSet> d2 03 (00) </cfRule> d0 03 (00) </conditionalFormatting> ce 03 (00) <hyperlink ref="C3" r:id="rId1"/> ee 03 (28) 02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 04 00 00 00 72 00 49 00 64 00 32 00 00 00 00 00 00 00 00 00 00 00 00 00 <printOptions/> dd 03 (02) 10 00 <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/> dc 03 (30) 66 66 66 66 66 66 e6 3f 66 66 66 66 66 66 e6 3f 00 00 00 00 00 00 e8 3f 00 00 00 00 00 00 e8 3f 33 33 33 33 33 33 d3 3f 33 33 33 33 33 33 d3 3f <headerFooter/> df 03 (1a) 0c 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff e0 03 (00) </worksheet> 82 01 (00)
Above, there are examples of underlying structure, one involving altered columns, another involving rows and actual cells. Each cell holds a particular value type, which is reflected by the record identifier, and that identifier differs depending on whether it’s a raw value or a value with an associated formula. Hence,
public const int BIFF12_ROW = 0x00;
public const int BIFF12_BLANK = 0x01;
public const int BIFF12_NUM = 0x02;
public const int BIFF12_BOOLERR = 0x03;
public const int BIFF12_BOOL = 0x04;
public const int BIFF12_FLOAT = 0x05;
public const int BIFF12_STRING = 0x07;
public const int BIFF12_FORMULA_STRING = 0x08;
public const int BIFF12_FORMULA_FLOAT = 0x09;
public const int BIFF12_FORMULA_BOOL = 0x0A;
public const int BIFF12_FORMULA_BOOLERR = 0x0B;
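To show how the record identifier drives the decoding, here is a minimal sketch that reads the two raw cell records seen in the worksheet example: the float in D3 (record 0x05) and the error in D6 (record 0x03). Both start with a 4-byte column and a 4-byte style, as annotated in the dump. The CellRecords class, the Cell struct and the choice to box the value as an object are assumptions made for the illustration only.

```csharp
using System;

public static class CellRecords
{
    public const int BIFF12_BOOLERR = 0x03;
    public const int BIFF12_FLOAT = 0x05;

    public struct Cell
    {
        public uint Column;   // 0-based
        public uint Style;    // 0-based
        public object Value;  // double for a float cell, byte error code for an error cell
    }

    // content holds the bytes that follow the record identifier and length.
    public static Cell Read(int recid, byte[] content)
    {
        Cell cell = new Cell();
        cell.Column = ReadUInt32(content, 0);
        cell.Style = ReadUInt32(content, 4);

        switch (recid)
        {
            case BIFF12_FLOAT:
                cell.Value = ReadDouble(content, 8);  // IEEE double, 8 bytes
                break;
            case BIFF12_BOOLERR:
                cell.Value = content[8];              // e.g. 7 = DIV/0
                break;
            default:
                throw new NotSupportedException(
                    "Record 0x" + recid.ToString("X2") + " is not handled in this sketch.");
        }
        return cell;
    }

    private static uint ReadUInt32(byte[] b, int pos)
    {
        return (uint)b[pos] | (uint)b[pos + 1] << 8 | (uint)b[pos + 2] << 16 | (uint)b[pos + 3] << 24;
    }

    // Assembles the 8 little-endian bytes by hand so the result
    // does not depend on the byte order of the machine running it.
    private static double ReadDouble(byte[] b, int pos)
    {
        long bits = 0;
        for (int i = 7; i >= 0; i--)
            bits = (bits << 8) | b[pos + i];
        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

The formula variants (0x08 to 0x0B) carry the same leading fields followed by flags and the formula bytecode, so they would need extra cases.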
Let’s take a moment to discuss how formulas are stored :
The formula bytecode
The formula bytecode produced by Excel when parsing formulas is documented for the most part (cf. the official documentation covering all the ptgTokens). If you are interested in parsing or evaluating formulas, you need to know that they are stored in postfix order (Reverse Polish Notation), meaning that 5+3 is stored as {5,3,+}. Other than that, you can get bootstrapped thanks to a simple RPN parser/evaluator written in C++ that I have made available here. That being said, Excel 2007 introduces the following changes :
- Because there are a lot more rows and columns per worksheet, cell references in the formula bytecode need wider slots. Rows are now stored in 4 bytes instead of 2 (the former maximum was 65536 rows). There is no slot change for columns, because any cell column reference was in fact already stored in two bytes rather than one, even though the former maximum was 256 columns. The two most significant bits of those two bytes have always been used to distinguish relative rows from absolute rows and relative columns from absolute columns, which leaves 14 bits to address columns. It’s probably not a big surprise, then, that the new maximum number of columns per worksheet in Excel 2007 is 2^14 = 16384. The Excel team could have fixed the row/col limitation problem forever by using a «variable length» encoding, just like they do for the record structure, but they chose not to. So, to summarize :

| Excel version | row encoding | column encoding |
|---|---|---|
| Excel 97-2003 | 2 bytes | 2 bytes (bits 14 and 15 used to store abs/rel flags) |
| Excel 2007 | 4 bytes | 2 bytes (bits 14 and 15 used to store abs/rel flags) |

- There are new concepts in Excel 2007, like being able to address Table columns (or other projections, like Table headers) in plain formulas. This materializes with a new syntax, for instance =SUM(Table1[2004]) instead of =SUM(range) or =SUM(A1:A3). Since the Table concept, complete with columns, is not stored as hidden ranges, it requires new ptgTokens in the formula bytecode to represent it. I have deciphered one of them, ptgToken = 0x18, in an example which materializes Table1[2004] in the formula =SUM(Table1[2004]). If you create a Table called Table1 with a third column called 2004, here is how the formula =SUM(Table1[2004]) gets stored using ptgTokens :

09 (2c)
   05 00 00 00                                 column (0-based)
   00 00 00 00                                 style (0-based)
   00 00 00 00 00 00 20 40                     IEEE float
   10 00                                       grbits (0010 = autoCalc)
   12 00 00 00                                 len of formula in bytes (=SUM(Table1[2004]))
   18 19 00 00 01 00 01 00 00 00 02 00 02 00   ptgTokenTable = 18 + ... (Table1[2004])
   19 10 53 8b                                 ptgTokenAttr optimized SUM
   00 00 00 00
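To make the reference encoding concrete, here is a minimal sketch that decodes the 11 formula bytes of cell D4 from the worksheet example (formula 5/E3) into its RPN items. It only knows the three tokens that appear there (0x1E, 0x44 and 0x06) and does no bounds checking. The class and method names are made up for the illustration, and mapping bit 14 to the column flag and bit 15 to the row flag is an assumption carried over from the older BIFF8 layout.

```csharp
using System;
using System.Collections.Generic;

public static class FormulaTokens
{
    // Decodes formula bytecode into a list of RPN items (operands first, operator last).
    public static List<string> Decode(byte[] formula)
    {
        var items = new List<string>();
        int pos = 0;
        while (pos < formula.Length)
        {
            byte token = formula[pos++];
            switch (token)
            {
                case 0x1E: // ptgTokenInt: 2-byte integer constant
                    items.Add((formula[pos] | formula[pos + 1] << 8).ToString());
                    pos += 2;
                    break;

                case 0x44: // ptgTokenRefV: 4-byte row, then 2-byte column with flags
                {
                    uint row = (uint)formula[pos] | (uint)formula[pos + 1] << 8
                             | (uint)formula[pos + 2] << 16 | (uint)formula[pos + 3] << 24;
                    int colField = formula[pos + 4] | formula[pos + 5] << 8;
                    pos += 6;

                    int col = colField & 0x3FFF;                 // 14 bits of column
                    bool colRelative = (colField & 0x4000) != 0; // bit 14 (assumed)
                    bool rowRelative = (colField & 0x8000) != 0; // bit 15 (assumed)
                    items.Add((colRelative ? "" : "$") + ColumnName(col)
                            + (rowRelative ? "" : "$") + (row + 1));
                    break;
                }

                case 0x06: // ptgTokenDiv
                    items.Add("/");
                    break;

                default:
                    throw new NotSupportedException(
                        "Token 0x" + token.ToString("X2") + " is not handled in this sketch.");
            }
        }
        return items;
    }

    // 0-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA.
    private static string ColumnName(int col)
    {
        string name = "";
        for (int n = col + 1; n > 0; n = (n - 1) / 26)
            name = (char)('A' + (n - 1) % 26) + name;
        return name;
    }
}
```

The column field 04 c0 in that formula reads as 0xC004, that is column 4 (E) with both flag bits set, which is why the reference comes out as the fully relative E3.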
Also, strings can be either stored inline or referred to by an index, in which case they are really stored in a separate part known as the shared strings part. Because, as a consumer, you don’t control how the string is stored, in practice if you are interested in cell values you need to read the shared strings part and build the table of indexed strings prior to reading worksheet parts.
Shared strings part
The shared strings part simply indexes strings as a way to factorize strings that may be used more than once in the worksheets or other parts (a chart title, for instance). It makes reading BIFF12 harder, since the shared strings part must be read before other parts such as worksheets; otherwise you may not be able to make sense of the string indexes you encounter.
The shared strings part is also used to store rich strings, i.e. strings where more than one formatting style is applied, also known as «formatting runs».
While the XML way of describing formatting runs is much like in Word, the BIN way is much like in old BIFF: the raw string is stored, followed by the number of formatting runs and the runs themselves, which are 4-byte pairs {position, style} giving the style to apply from that position in the string (0-based) onwards. A consequence is that if you don’t care about formatting runs, then reading the string is as easy as if there were no formatting runs at all. Here are examples of shared strings :

// shared strings BIN : a
<sst count="1" uniqueCount="1">
9f 01 (08) 01 00 00 00 01 00 00 00
    <si><t>a</t></si>
    13 (07)
        00                         flags (no formatting runs)
        01 00 00 00                len of string (number of Unicode characters)
        61 00                      string "a"
</sst>
a0 01 (00)

// shared strings BIN : a,b, cc
<sst count="3" uniqueCount="3">
9f 01 (08) 03 00 00 00 03 00 00 00
    <si><t>a</t></si>
    13 (07)
        00                         flags (no formatting runs)
        01 00 00 00                len of string (number of Unicode characters)
        61 00                      string "a"
    <si><t>b</t></si>
    13 (07)
        00                         flags (no formatting runs)
        01 00 00 00                len of string (number of Unicode characters)
        62 00                      string "b"
    <si><t>cc</t></si>
    13 (09)
        00                         flags (no formatting runs)
        02 00 00 00                len of string (number of Unicode characters)
        63 00 63 00                string "cc"
</sst>
a0 01 (00)

// shared strings BIN : a,b, cc with string a used twice
<sst count="4" uniqueCount="3">
9f 01 (08) 04 00 00 00 03 00 00 00
    <si><t>a</t></si>
    13 (07)
        00                         flags (no formatting runs)
        01 00 00 00                len of string (number of Unicode characters)
        61 00                      string "a"
    <si><t>b</t></si>
    13 (07)
        00                         flags (no formatting runs)
        01 00 00 00                len of string (number of Unicode characters)
        62 00                      string "b"
    <si><t>cc</t></si>
    13 (09)
        00                         flags (no formatting runs)
        02 00 00 00                len of string (number of Unicode characters)
        63 00 63 00                string "cc"
</sst>
a0 01 (00)

// shared strings BIN : abcd, where bc is in red
<sst count="1" uniqueCount="1">
9f 01 (08) 01 00 00 00 01 00 00 00
    <si><.../t></si>
    13 (19)
        01                         flags (formatting runs follow the string)
        04 00 00 00                len of string (number of Unicode characters)
        61 00 62 00 63 00 64 00    string "abcd"
        02 00 00 00                nb formatting runs
        01 00 01 00                formatting run 1 {pos=1, style=1}
        03 00 00 00                formatting run 2 {pos=3, style=0}
</sst>
a0 01 (00)
The following records are involved :
public const int BIFF12_SI = 0x13;
public const int BIFF12_SST = 0x019F;
public const int BIFF12_SST_END = 0x01A0;
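As an illustration of how a consumer might build the table of indexed strings, here is a minimal C++ sketch that decodes the payload of one BIFF12_SI record. It is written under assumptions that are not spelled out by the article: that the first payload byte is a flag byte whose lowest bit signals formatting runs, and that a 4-byte run count precedes the {position, style} pairs (this is the reading that matches the record lengths in the dumps, including the comments part). All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct FormattingRun {
    std::uint16_t position;  // 0-based character position where the style starts
    std::uint16_t style;     // style index
};

struct SharedString {
    std::u16string text;
    std::vector<FormattingRun> runs;  // empty for plain strings
};

inline std::uint16_t readU16(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint16_t>(b[pos] | (b[pos + 1] << 8));
}

inline std::uint32_t readU32(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Decodes the payload of one BIFF12_SI (0x13) record, header excluded.
SharedString parseSharedStringItem(const Bytes& p) {
    assert(p.size() >= 5);
    SharedString s;
    const bool hasRuns = (p[0] & 0x01) != 0;  // assumption: bit 0 = rich string
    const std::size_t len = readU32(p, 1);    // number of UTF-16 code units
    std::size_t pos = 5;
    assert(pos + 2 * len <= p.size());
    for (std::size_t i = 0; i < len; ++i, pos += 2)
        s.text.push_back(static_cast<char16_t>(readU16(p, pos)));
    if (hasRuns) {
        assert(pos + 4 <= p.size());
        const std::size_t count = readU32(p, pos);  // assumption: 4-byte run count
        pos += 4;
        assert(pos + 4 * count <= p.size());
        for (std::size_t i = 0; i < count; ++i, pos += 4)
            s.runs.push_back({readU16(p, pos), readU16(p, pos + 2)});
    }
    return s;
}
```

Appending each decoded item to a `std::vector<SharedString>` in record order gives the lookup table: the index stored in a cell is simply the position in that vector. A reader that does not care about formatting can ignore `runs` entirely.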
Styles part
The styles part, much like the shared strings part, is a dictionary of factorized formatting styles represented by indexes. The most basic formatting style is known as an XF style, represented by a 0-based index, which groups a number of formatting attributes such as fill pattern, borders, and so on. Cell styles are variations of XF styles, on top of which live new concepts in Excel 2007 such as table styles and themes (themes are defined in a separate part, which is left in XML even in a .XLSB binary file).
Here is an example of styles part, in XML, and then in BIN :
// This is what a styles part looks like in XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main">
  <fonts count="1">
    <font>
      <sz val="11"/>
      <color theme="1"/>
      <name val="Calibri"/>
      <family val="2"/>
      <scheme val="minor"/>
    </font>
  </fonts>
  <fills count="2">
    <fill><patternFill patternType="none"/></fill>
    <fill><patternFill patternType="gray125"/></fill>
  </fills>
  <borders count="1">
    <border><left/><right/><top/><bottom/><diagonal/></border>
  </borders>
  <cellStyleXfs count="1">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
  </cellStyleXfs>
  <cellXfs count="1">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
  </cellXfs>
  <cellStyles count="1">
    <cellStyle name="Normal" xfId="0" builtinId="0"/>
  </cellStyles>
  <dxfs count="0"/>
  <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
  <colors/>
</styleSheet>

// This is how, after breaking the BIN part in records, you can match
// records to the XML markup.
// Note that record identifiers are the one or two bytes on the left,
// the associated record length is surrounded by parentheses,
// and it's followed by the record content itself.
<styleSheet>
96 02 (00)
  <fonts>
  e3 04 (04) 01 00 00 00
    <font> <sz val="11"/> <color theme="1"/> <name val="Calibri"/> <family val="2"/> <scheme val="minor"/> </font>
    2b (27) dc 00 00 00 90 01 00 00 00 02 00 00 07 01 00 00 00 00 00 ff 02 07 00 00 00 43 00 61 00 6c 00 69 00 62 00 72 00 69 00
  </fonts>
  e4 04 (00)
  <fills count="2">
  db 04 (04) 02 00 00 00
    <fill><patternFill patternType="none"/></fill>
    2d (44) 00 00 00 00 03 40 00 00 00 00 00 ff 03 41 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    <fill><patternFill patternType="gray125"/></fill>
    2d (44) 11 00 00 00 03 40 00 00 00 00 00 ff 03 41 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  </fills>
  dc 04 (00)
  <borders count="1">
  e5 04 (04) 01 00 00 00
    <border><left/><right/><top/><bottom/><diagonal/></border>
    2e (33) 00 00 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
  </borders>
  e6 04 (00)
  <cellStyleXfs count="1">
  f2 04 (04) 01 00 00 00
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
    2f (10) ff ff 00 00 00 00 00 00 00 00 00 00 10 10 00 00
  </cellStyleXfs>
  f3 04 (00)
  <cellXfs count="1">
  e9 04 (04) 01 00 00 00
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
    2f (10) 00 00 00 00 00 00 00 00 00 00 00 00 10 10 00 00
  </cellXfs>
  ea 04 (00)
  <cellStyles count="1">
  eb 04 (04) 01 00 00 00
    <cellStyle name="Normal" xfId="0" builtinId="0"/>
    30 (18) 00 00 00 00 01 00 00 ff 06 00 00 00 4e 00 6f 00 72 00 6d 00 61 00 6c 00
  </cellStyles>
  ec 04 (00)
  <dxfs>
  f9 03 (04) 00 00 00 00
  </dxfs>
  fa 03 (00)
  <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
  fc 03 (50) 00 00 00 00 11 00 00 00 54 00 61 00 62 00 6c 00 65 00 53 00 74 00 79 00 6c 00 65 00 4d 00 65 00 64 00 69 00 75 00 6d 00 39 00 11 00 00 00 50 00 69 00 76 00 6f 00 74 00 53 00 74 00 79 00 6c 00 65 00 4c 00 69 00 67 00 68 00 74 00 31 00 36 00
  </tableStyles>
  fd 03 (00)
  <colors>
  d9 03 (00)
  </colors>
  da 03 (00)
</styleSheet>
97 02 (00)
The following records are involved :
public const int BIFF12_FONT = 0x2B;
public const int BIFF12_FILL = 0x2D;
public const int BIFF12_BORDER = 0x2E;
public const int BIFF12_XF = 0x2F;
public const int BIFF12_CELLSTYLE = 0x30;
public const int BIFF12_STYLESHEET = 0x0296;
public const int BIFF12_STYLESHEET_END = 0x0297;
public const int BIFF12_COLORS = 0x03D9;
public const int BIFF12_COLORS_END = 0x03DA;
public const int BIFF12_DXFS = 0x03F9;
public const int BIFF12_DXFS_END = 0x03FA;
public const int BIFF12_TABLESTYLES = 0x03FC;
public const int BIFF12_TABLESTYLES_END = 0x03FD;
public const int BIFF12_FILLS = 0x04DB;
public const int BIFF12_FILLS_END = 0x04DC;
public const int BIFF12_FONTS = 0x04E3;
public const int BIFF12_FONTS_END = 0x04E4;
public const int BIFF12_BORDERS = 0x04E5;
public const int BIFF12_BORDERS_END = 0x04E6;
public const int BIFF12_CELLXFS = 0x04E9;
public const int BIFF12_CELLXFS_END = 0x04EA;
public const int BIFF12_CELLSTYLES = 0x04EB;
public const int BIFF12_CELLSTYLES_END = 0x04EC;
public const int BIFF12_CELLSTYLEXFS = 0x04F2;
public const int BIFF12_CELLSTYLEXFS_END = 0x04F3;
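Breaking a BIN part into records, as done by hand in the dump above, can be sketched in a few lines of C++. The sketch follows the conventions visible in this article: a record identifier takes one byte, or two bytes when the high bit of the first byte is set (combined so that `96 02` gives 0x0296, matching the constants), and the length is stored 7 bits per byte, least significant group first, continuing while the high bit is set. `Record` and `splitRecords` are hypothetical names, and the 4-byte cap on the length is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct Record {
    std::uint32_t id;  // e.g. 0x2B (one byte) or 0x0296 (two bytes)
    Bytes payload;     // record content, header excluded
};

// Splits the raw bytes of a BIN part into its records.
std::vector<Record> splitRecords(const Bytes& part) {
    std::vector<Record> out;
    std::size_t pos = 0;
    while (pos < part.size()) {
        // Identifier: one byte, plus a second one if the high bit is set.
        std::uint32_t id = part[pos++];
        if (id & 0x80) {
            assert(pos < part.size());
            id |= static_cast<std::uint32_t>(part[pos++]) << 8;
        }
        // Length: 7 bits per byte, low group first (assumed at most 4 bytes).
        std::uint32_t len = 0;
        for (int shift = 0; shift < 28; shift += 7) {
            assert(pos < part.size());
            const std::uint8_t b = part[pos++];
            len |= static_cast<std::uint32_t>(b & 0x7F) << shift;
            if (!(b & 0x80)) break;
        }
        assert(pos + len <= part.size());
        const auto first = part.begin() + static_cast<std::ptrdiff_t>(pos);
        out.push_back(Record{id, Bytes(first, first + static_cast<std::ptrdiff_t>(len))});
        pos += len;
    }
    return out;
}
```

A consumer would then dispatch on `Record::id`, for example treating 0x04E3 and 0x04E4 as the opening and closing of the fonts collection. With this length encoding, a length written as `(81 09)` in a dump stands for 1 + 9 × 128 = 1153 bytes.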
Comments part
Comments are apparently complicated to store in the workbook model. Although comments have their own part, they are not referenced directly from anywhere. Rather, the worksheet to which one or more comments are attached references a legacy drawing part (VML markup), which describes a complex graphical construct that, thanks to some weird magic, manages to relate the comments to the shapes used to draw them. It’s unclear to me how BIN consumers/implementers, and XML consumers/implementers as well, are expected to work with comments in a meaningful way.
Here is an example of comments part in XML and in BIN :
// This is what a comment part looks like in XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<comments xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main">
  <authors>
    <author>Administrator</author>
  </authors>
  <commentList>
    <comment ref="C3" authorId="0">
      <text>
        <r>
          <rPr><b/><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
          <t>Administrator:</t>
        </r>
        <r>
          <rPr><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
          <t xml:space="preserve">_x000A_new comment</t>
        </r>
        <r>
          <rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr>
          <t/>
        </r>
      </text>
    </comment>
    <comment ref="C5" authorId="0">
      <text>
        <r>
          <rPr><b/><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
          <t>Administrator:</t>
        </r>
        <r>
          <rPr><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
          <t xml:space="preserve">_x000A_a</t>
        </r>
        <r>
          <rPr><b/><sz val="8"/><color indexed="10"/><rFont val="Tahoma"/><family val="2"/></rPr>
          <t>noth</t>
        </r>
        <r>
          <rPr><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
          <t>er</t>
        </r>
        <r>
          <rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr>
          <t/>
        </r>
      </text>
    </comment>
  </commentList>
</comments>

// This is how, after breaking the BIN part in records, you can match
// records to the XML markup.
// Note that record identifiers are the two bytes on the left,
// the associated record length is surrounded by parentheses,
// and it's followed by the record content itself.
<comments>
f4 04 (00)
  <authors>
  f6 04 (00)
    <author>Administrator</author>
    f8 04 (1e) 0d 00 00 00 41 00 64 00 6d 00 69 00 6e 00 69 00 73 00 74 00 72 00 61 00 74 00 6f 00 72 00
  </authors>
  f7 04 (00)
  <commentList>
  f9 04 (00)
    <comment ref="C3" authorId="0">
    fb 04 (14) 00 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
      <text ...>
      fd 04 (49) 01 1a 00 00 00 41 00 64 00 6d 00 69 00 6e 00 69 00 73 00 74 00 72 00 61 00 74 00 6f 00 72 00 3a 00 0a 00 6e 00 65 00 77 00 20 00 63 00 6f 00 6d 00 6d 00 65 00 6e 00 74 00 03 00 00 00 00 00 02 00 0e 00 01 00 1a 00 00 00
    </comment>
    fc 04 (00)
    <comment ref="C5" authorId="0">
    fb 04 (14) 00 00 00 00 04 00 00 00 04 00 00 00 02 00 00 00 02 00 00 00
      <text ...>
      fd 04 (49) 01 16 00 00 00 41 00 64 00 6d 00 69 00 6e 00 69 00 73 00 74 00 72 00 61 00 74 00 6f 00 72 00 3a 00 0a 00 61 00 6e 00 6f 00 74 00 68 00 65 00 72 00 05 00 00 00 00 00 02 00 0e 00 01 00 10 00 03 00 14 00 01 00 16 00 00 00
    </comment>
    fc 04 (00)
  </commentList>
  fa 04 (00)
</comments>
f5 04 (00)
The following records are involved :
public const int BIFF12_COMMENTS = 0x04F4;
public const int BIFF12_COMMENTS_END = 0x04F5;
public const int BIFF12_AUTHORS = 0x04F6;
public const int BIFF12_AUTHORS_END = 0x04F7;
public const int BIFF12_AUTHOR = 0x04F8;
public const int BIFF12_COMMENTLIST = 0x04F9;
public const int BIFF12_COMMENTLIST_END = 0x04FA;
public const int BIFF12_COMMENT = 0x04FB;
public const int BIFF12_COMMENT_END = 0x04FC;
public const int BIFF12_TEXT = 0x04FD;
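The two BIFF12_COMMENT records in the dump suggest a simple layout, which the following C++ sketch decodes back into an A1-style cell name. The layout used here (author index, first row, last row, first column, last column, all 32-bit little-endian and 0-based) is an assumption inferred from comparing the bytes with ref="C3" and ref="C5"; it is not documented by the article. The names `CommentAnchor`, `parseCommentAnchor`, `columnName` and `cellName` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

inline std::uint32_t readU32(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

struct CommentAnchor {
    std::uint32_t authorId;                 // index into the authors list
    std::uint32_t firstRow, lastRow;        // 0-based (assumed field order)
    std::uint32_t firstColumn, lastColumn;  // 0-based (assumed field order)
};

// Decodes the payload of a BIFF12_COMMENT (0x04FB) record.
CommentAnchor parseCommentAnchor(const Bytes& p) {
    assert(p.size() >= 20);
    CommentAnchor a;
    a.authorId = readU32(p, 0);
    a.firstRow = readU32(p, 4);
    a.lastRow = readU32(p, 8);
    a.firstColumn = readU32(p, 12);
    a.lastColumn = readU32(p, 16);
    return a;
}

// Converts a 0-based column index into letters: 0 -> A, 25 -> Z, 26 -> AA.
std::string columnName(std::uint32_t column) {
    std::string name;
    for (std::uint32_t n = column + 1; n > 0; n = (n - 1) / 26)
        name.insert(name.begin(), static_cast<char>('A' + (n - 1) % 26));
    return name;
}

// Builds an A1-style name from 0-based row and column indexes.
std::string cellName(std::uint32_t row, std::uint32_t column) {
    return columnName(column) + std::to_string(row + 1);
}
```

The same four-integer range layout appears to be used by the Table and autoFilter records for ref="B2:D4" (01, 03, 01, 03), which is what makes this reading plausible.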
Drawings part
Drawing parts are stored in XML, but there are interesting rules:
- A chart or a shape references a drawing part, this way in a worksheet:
<drawing r:id="rId1"/>
- A comment references a legacy drawing part, this way in a worksheet:
<legacyDrawing r:id="rId1"/>
- An OLE object references a legacy drawing part and an embedding part, this way in a worksheet:
<drawing r:id="rId1"/> <oleObjects> <oleObject progId="Paint.Picture" shapeId="1025" r:id="rId2"/> </oleObjects>
- The worksheet only references a drawing as a separate part
- The separate part (drawings/drawingxxx.xml) is not encoded in BIN
- When a drawing part references a chart, it is as a separate part
- When a drawing part references an ole object, it also references a media object (last pic rendering)
- The chart part (charts/chartxxx.xml) is not encoded in BIN
Table part
The Table part is a new concept in Excel 2007 that adds major improvements to the List object concept introduced in Excel 2003. Whenever a Table object is created, it gets referenced in a worksheet as follows :
// Excerpt from a worksheet part
...
<tableParts count="1">
94 05 (04) 01 00 00 00
  <tablePart r:id="rId1"/>
  95 05 (0c) 04 00 00 00 72 00 49 00 64 00 32 00
</tableParts>
96 05 (00)
...

This relates directly to a separate part, stored in a parent tables folder, which is located thanks to the relationship identifier (rId1) and the content of the relationship file (_rels/sheetxxx.bin.rels). This brings us to the Table part itself :
// This is what a table part looks like in XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<table xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main" id="1" name="Table1" displayName="Table1" ref="B2:D4" totalsRowShown="0">
  <autoFilter ref="B2:D4"/>
  <tableColumns count="3">
    <tableColumn id="1" name="Column1"/>
    <tableColumn id="2" name="2003"/>
    <tableColumn id="3" name="2004"/>
  </tableColumns>
  <tableStyleInfo name="TableStyleMedium9" showFirstColumn="0" showLastColumn="0" showRowStripes="1" showColumnStripes="0"/>
</table>

// This is how, after breaking the BIN part in records, you can match
// records to the XML markup.
// Note that record identifiers are the two bytes on the left,
// the associated record length is surrounded by parentheses,
// and it's followed by the record content itself.
<table xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main" id="1" name="Table1" displayName="Table1" ref="B2:D4" totalsRowShown="0">
d7 02 (64) 01 00 00 00 03 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff 06 00 00 00 54 00 61 00 62 00 6c 00 65 00 31 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff
  <autoFilter ref="B2:D4">
  a1 01 (10) 01 00 00 00 03 00 00 00 01 00 00 00 03 00 00 00
  </autoFilter>
  a2 01 (00)
  <tableColumns count="3">
  d9 02 (04) 03 00 00 00
    <tableColumn id="1" name="Column1"/>
    db 02 (3e) 01 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff 07 00 00 00 43 00 6f 00 6c 00 75 00 6d 00 6e 00 31 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    </tableColumn>
    dc 02 (00)
    <tableColumn id="2" name="2003">
    db 02 (38) 02 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff 04 00 00 00 32 00 30 00 30 00 33 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    </tableColumn>
    dc 02 (00)
    <tableColumn id="3" name="2004"/>
    db 02 (38) 03 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff 04 00 00 00 32 00 30 00 30 00 34 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    </tableColumn>
    dc 02 (00)
  </tableColumns>
  da 02 (00)
  <tableStyleInfo name="TableStyleMedium9" showFirstColumn="0" showLastColumn="0" showRowStripes="1" showColumnStripes="0"/>
  81 04 (28) 04 00 11 00 00 00 54 00 61 00 62 00 6c 00 65 00 53 00 74 00 79 00 6c 00 65 00 4d 00 65 00 64 00 69 00 75 00 6d 00 39 00
</table>
d8 02 (00)
The following records are involved :
public const int BIFF12_TABLE = 0x02D7;
public const int BIFF12_TABLE_END = 0x02D8;
public const int BIFF12_TABLECOLUMNS = 0x02D9;
public const int BIFF12_TABLECOLUMNS_END = 0x02DA;
public const int BIFF12_TABLECOLUMN = 0x02DB;
public const int BIFF12_TABLECOLUMN_END = 0x02DC;
public const int BIFF12_AUTOFILTER = 0x01A1;
public const int BIFF12_AUTOFILTER_END = 0x01A2;
public const int BIFF12_FILTERCOLUMN = 0x01A3;
public const int BIFF12_FILTERCOLUMN_END = 0x01A4;
public const int BIFF12_FILTERS = 0x01A5;
public const int BIFF12_FILTERS_END = 0x01A6;
public const int BIFF12_FILTER = 0x01A7;
public const int BIFF12_TABLESTYLEINFO = 0x0481;
public const int BIFF12_SORTSTATE = 0x0492;
public const int BIFF12_SORTCONDITION = 0x0494;
public const int BIFF12_SORTSTATE_END = 0x0495;
Query Table part
The Query Table part defines a data source. A query table is never referenced in a worksheet part. Rather, it’s an implicit field in a Table part, and it’s only made explicit in the relationships file associated with the table (i.e. tablexxx.bin.rels). Here is what a Query Table part looks like :
<queryTable name="Database1" connectionId="1" ...>
bf 03 (20)
    49 1a 10 00 10 00
    01 00 00 00                                      connectionId
    09 00 00 00                                      length of string to follow (in characters, not bytes)
    44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 31 00    query table name
  <queryTableRefresh nextId="5">
  c1 03 (0a) 17 00 05 00 00 00 00 00 00 00
    <queryTableFields count="4">
    c7 03 (04) 04 00 00 00
      <queryTableField id="1" name="ID" tableColumnId="1">
      c9 03 (14)
          10 00 00 00
          01 00 00 00                                query table field id
          01 00 00 00                                table column id
          02 00 00 00                                length of string to follow (in characters, not bytes)
          49 00 44 00                                query table field name
      </queryTableField>
      ca 03 (00)
      <queryTableField id="2" name="TB_Name" tableColumnId="2">
      c9 03 (1e)
          10 00 00 00
          02 00 00 00                                query table field id
          02 00 00 00                                table column id
          07 00 00 00                                length of string to follow (in characters, not bytes)
          54 00 42 00 5f 00 4e 00 61 00 6d 00 65 00  query table field name
      </queryTableField>
      ca 03 (00)
      <queryTableField id="3" name="TB_AGE" tableColumnId="3">
      c9 03 (1c)
          10 00 00 00
          03 00 00 00                                query table field id
          03 00 00 00                                table column id
          06 00 00 00                                length of string to follow (in characters, not bytes)
          54 00 42 00 5f 00 41 00 47 00 45 00        query table field name
      </queryTableField>
      ca 03 (00)
      <queryTableField id="4" name="TB_COUNTRY" tableColumnId="4">
      c9 03 (24)
          10 00 00 00
          04 00 00 00                                query table field id
          04 00 00 00                                table column id
          0a 00 00 00                                length of string to follow (in characters, not bytes)
          54 00 42 00 5f 00 43 00 4f 00 55 00 4e 00 54 00 52 00 59 00    query table field name
      </queryTableField>
      ca 03 (00)
    </queryTableFields>
    c8 03 (00)
  </queryTableRefresh>
  c2 03 (00)
</queryTable>
c0 03 (00)
The following records are involved :
public const int BIFF12_QUERYTABLE = 0x03BF;
public const int BIFF12_QUERYTABLE_END = 0x03C0;
public const int BIFF12_QUERYTABLEREFRESH = 0x03C1;
public const int BIFF12_QUERYTABLEREFRESH_END = 0x03C2;
public const int BIFF12_QUERYTABLEFIELDS = 0x03C7;
public const int BIFF12_QUERYTABLEFIELDS_END = 0x03C8;
public const int BIFF12_QUERYTABLEFIELD = 0x03C9;
public const int BIFF12_QUERYTABLEFIELD_END = 0x03CA;
Connections part
The Connections part holds the connection string to a data source. Unlike the way Query Table parts relate to Table parts, Connections parts don’t relate to Query Table parts by way of relationships (there is no querytablexxx.bin.rels file). Instead, a Query Table part has a connection id attribute in a <queryTable> element (BIFF12_QUERYTABLE record). Connections part files are made available at the workbook level, i.e. shared across all worksheets and related objects, which is the reason why Connections parts relate to the Workbook relationships (workbook.bin.rels). Here is what a Connections part looks like :
<connections> ad 03 (00) <connection id="1" sourceFile="C:Database1.mdb" keepAlive="1" name="Database1" type="5" refreshedVersion="3" background="1" saveData="1"> c9 01 (51) 03 00 02 00 00 00 51 00 09 00 05 00 00 00 01 00 00 00 01 00 00 00 00 10 00 00 00 43 00 3a 00 5c 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 31 00 2e 00 6d 00 64 00 62 00 09 00 00 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 31 00 <dbPr connection=Provider=Microsoft.ACE.OLEDB.12.0; User ID=Admin;Data Source=C:Database1.mdb; Mode=Share Deny Write;Extended Properties=""; Jet OLEDB:System database="";Jet OLEDB:Registry Path=""; Jet OLEDB:Engine Type=5;Jet OLEDB:Database Locking Mode=0; Jet OLEDB:Global Partial Bulk Ops=2; Jet OLEDB:Global Bulk Transactions=1; Jet OLEDB:New Database Password="";Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False; Jet OLEDB:Don't Copy Locale on Compact=False; Jet OLEDB:Compact Without Replica Repair=False; Jet OLEDB:SFP=False;Jet OLEDB: Support Complex Data=False" command="Table1" commandType="3"> cb 01 (81 09) 03 00 00 00 02 34 02 00 00 50 00 72 00 6f 00 76 00 69 00 64 00 65 00 72 00 3d 00 4d 00 69 00 63 00 72 00 6f 00 73 00 6f 00 66 00 74 00 2e 00 41 00 43 00 45 00 2e 00 4f 00 4c 00 45 00 44 00 42 00 2e 00 31 00 32 00 2e 00 30 00 3b 00 55 00 73 00 65 00 72 00 20 00 49 00 44 00 3d 00 41 00 64 00 6d 00 69 00 6e 00 3b 00 44 00 61 00 74 00 61 00 20 00 53 00 6f 00 75 00 72 00 63 00 65 00 3d 00 43 00 3a 00 5c 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 31 00 2e 00 6d 00 64 00 62 00 3b 00 4d 00 6f 00 64 00 65 00 3d 00 53 00 68 00 61 00 72 00 65 00 20 00 44 00 65 00 6e 00 79 00 20 00 57 00 72 00 69 00 74 00 65 00 3b 00 45 00 78 00 74 00 65 00 6e 00 64 00 65 00 64 00 20 00 50 00 72 00 6f 00 70 00 65 00 72 00 74 00 69 00 65 00 73 00 3d 00 22 00 22 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 53 00 79 00 73 00 74 00 65 00 6d 00 20 00 64 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 3d 00 22 00 22 00 3b 00 4a 00 65 
00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 52 00 65 00 67 00 69 00 73 00 74 00 72 00 79 00 20 00 50 00 61 00 74 00 68 00 3d 00 22 00 22 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 45 00 6e 00 67 00 69 00 6e 00 65 00 20 00 54 00 79 00 70 00 65 00 3d 00 35 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 20 00 4c 00 6f 00 63 00 6b 00 69 00 6e 00 67 00 20 00 4d 00 6f 00 64 00 65 00 3d 00 30 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 47 00 6c 00 6f 00 62 00 61 00 6c 00 20 00 50 00 61 00 72 00 74 00 69 00 61 00 6c 00 20 00 42 00 75 00 6c 00 6b 00 20 00 4f 00 70 00 73 00 3d 00 32 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 47 00 6c 00 6f 00 62 00 61 00 6c 00 20 00 42 00 75 00 6c 00 6b 00 20 00 54 00 72 00 61 00 6e 00 73 00 61 00 63 00 74 00 69 00 6f 00 6e 00 73 00 3d 00 31 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 4e 00 65 00 77 00 20 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 20 00 50 00 61 00 73 00 73 00 77 00 6f 00 72 00 64 00 3d 00 22 00 22 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 43 00 72 00 65 00 61 00 74 00 65 00 20 00 53 00 79 00 73 00 74 00 65 00 6d 00 20 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 3d 00 46 00 61 00 6c 00 73 00 65 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 45 00 6e 00 63 00 72 00 79 00 70 00 74 00 20 00 44 00 61 00 74 00 61 00 62 00 61 00 73 00 65 00 3d 00 46 00 61 00 6c 00 73 00 65 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 44 00 6f 00 6e 00 27 00 74 00 20 00 43 00 6f 00 70 00 79 00 20 00 4c 00 6f 00 63 00 61 00 6c 00 65 00 20 00 6f 00 6e 00 20 00 43 00 6f 00 6d 00 70 00 61 00 63 00 74 00 3d 00 46 00 61 00 6c 00 73 00 65 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 43 00 6f 00 6d 00 70 00 61 00 63 00 74 00 20 00 57 00 69 00 74 00 68 00 6f 00 75 00 74 
00 20 00 52 00 65 00 70 00 6c 00 69 00 63 00 61 00 20 00 52 00 65 00 70 00 61 00 69 00 72 00 3d 00 46 00 61 00 6c 00 73 00 65 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 53 00 46 00 50 00 3d 00 46 00 61 00 6c 00 73 00 65 00 3b 00 4a 00 65 00 74 00 20 00 4f 00 4c 00 45 00 44 00 42 00 3a 00 53 00 75 00 70 00 70 00 6f 00 72 00 74 00 20 00 43 00 6f 00 6d 00 70 00 6c 00 65 00 78 00 20 00 44 00 61 00 74 00 61 00 3d 00 46 00 61 00 6c 00 73 00 65 00 06 00 00 00 54 00 61 00 62 00 6c 00 65 00 31 00 </dbPr> cc 01 (00) </connection> ca 01 (00) </connections> ae 03 (00)
The following records are involved :
public const int BIFF12_CONNECTIONS = 0x03AD;
public const int BIFF12_CONNECTIONS_END = 0x03AE;
public const int BIFF12_CONNECTION = 0x01C9;
public const int BIFF12_CONNECTION_END = 0x01CA;
public const int BIFF12_DBPR = 0x01CB;
public const int BIFF12_DBPR_END = 0x01CC;
Pivot table part
I haven’t gone very far into the pivot table area. What I can say, however, is that :
- Unlike other objects applied to cells, such as Tables, pivot tables are not declared or referenced in the worksheet part where they are visible! And that is the case whether Pivot Tables are stored in XML or in BIN.
- Despite that, the worksheet part relationships file (_rels/sheetxxx.bin.rels) actually defines a relation to one or more pivot tables whenever applicable (the Pivot Tables parts are stored in a separate, parent, folder).
- A Pivot Table part (pivotTables/pivotTablexxx.bin) defines its layout and pivot fields, but not the actual data source.
- Like the worksheet, a Pivot Table part does not explicitly relate to a Pivot Cache Definition part (where the data source is defined), although it stores a reference in the associated relationships file (_rels/pivotTablexxx.bin.rels).
- A Pivot Cache Definition part (pivotCache/pivotCacheDefinitionxxx.bin) defines the actual pivot table data source among other things, as in :
// we have created a Pivot Table whose data source is the
// content of object Table1
<cacheSource type="worksheet">
  <worksheetSource name="Table1" r:id="rId2"/>
</cacheSource>
Printer settings part
The printer settings part is really just a binary dump of the WIN32 DEVMODE structure (see MSDN for more information). This information was stored in earlier Excel versions (97 and above) as the BIFF8 [PLS] record.
Index and Calculation Chain parts
Those are caches that only serve the Excel run-time. They contain direct offsets to the content in worksheet BIN parts and are thus unsuitable for programming purposes. Fortunately, in this particular case you can alter BIN parts without worrying about corrupting the workbook, because leaving the index and calculation chain parts out of sync does no harm.
A word on password-protected documents
While not strictly related to BIN file formats, if you happen to password-protect an Excel 2007 workbook, then the resulting file, no matter whether it’s .XLSX or not, is encrypted (with AES by default in Excel 2007) and wrapped in an OLE container. The entire package is encrypted in the EncryptedPackage stream. Here is a screen capture of a password-protected workbook as viewed in an OLE document viewer:
A password-protected Excel 2007 workbook is encrypted in an OLE container.
Needless to say, password-protected workbooks are not expected to be used programmatically…
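Because a password-protected workbook is an OLE container rather than a ZIP package, a reader can detect the situation from the first bytes of the file before trying to open any part. Here is a minimal C++ sketch of that check, using the well-known ZIP local file header signature (`50 4B 03 04`) and the OLE compound file signature (`D0 CF 11 E0 A1 B1 1A E1`). `Container` and `detectContainer` are hypothetical names, and the caller is assumed to have already read the head of the file into memory.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

enum class Container { Zip, Ole, Unknown };

// Classifies a file from its first bytes.
Container detectContainer(const std::vector<std::uint8_t>& head) {
    static const std::array<std::uint8_t, 4> zipMagic = {{0x50, 0x4B, 0x03, 0x04}};
    static const std::array<std::uint8_t, 8> oleMagic = {
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}};

    if (head.size() >= zipMagic.size() &&
        std::equal(zipMagic.begin(), zipMagic.end(), head.begin()))
        return Container::Zip;  // regular .xlsx/.xlsm/.xlsb package
    if (head.size() >= oleMagic.size() &&
        std::equal(oleMagic.begin(), oleMagic.end(), head.begin()))
        return Container::Ole;  // e.g. a password-protected workbook
    return Container::Unknown;
}
```

An OLE result does not prove that the file is an encrypted Excel 2007 workbook (legacy .xls files are OLE containers too), so a fuller check would also look for the EncryptedPackage stream.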
In which order should parts be read?
- Workbook
- SharedStrings
- Styles
- Worksheet(s)
- Optional Tables, Drawings, …
How does one get the value of a cell?
When reading a worksheet part, individual cells are part of a block of records between BIFF12_SHEETDATA and BIFF12_SHEETDATA_END. Excel stores cells row by row, meaning that there is a record which identifies a given row (including information such as the row style, height, whether it’s hidden or not, …), followed by an arbitrary number of actual cell records, identified by the kind of value they store and whether or not their value is governed by a formula. Each cell record provides the column (along with other information). If a cell stores a shared string, then the value is obtained by looking up its index in the shared strings table.
Objects such as hyperlinks, tables, charts, named ranges and pivot tables are built on top of these cells and defined elsewhere, either after the BIFF12_SHEETDATA block of the corresponding worksheet part, or in other parts (named ranges are defined in the workbook part, so that they can be shared across all worksheets, internal and external).
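The lookup described above can be sketched as follows in C++. This is a sketch under stated assumptions: the record identifiers 0x05 (floating-point cell) and 0x07 (shared string cell) are not listed in this section and are assumed here, and the payload layout (4-byte column, 4-byte style, then either an 8-byte IEEE double or a 4-byte shared string index) mirrors the column/style prefix of the formula record shown earlier. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint32_t kCellFloat = 0x05;         // assumed record id
constexpr std::uint32_t kCellSharedString = 0x07;  // assumed record id

struct CellValue {
    std::uint32_t column = 0;  // 0-based column
    std::uint32_t style = 0;   // 0-based style index
    bool isText = false;
    double number = 0.0;
    std::u16string text;
};

inline std::uint32_t readU32(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Fills `out` and returns true for the two cell kinds handled in this sketch.
bool readCell(std::uint32_t id, const Bytes& p,
              const std::vector<std::u16string>& sharedStrings, CellValue& out) {
    if (p.size() < 8) return false;
    out.column = readU32(p, 0);
    out.style = readU32(p, 4);
    if (id == kCellFloat && p.size() >= 16) {
        std::uint64_t bits = 0;
        for (int i = 7; i >= 0; --i)
            bits = (bits << 8) | p[8 + static_cast<std::size_t>(i)];
        std::memcpy(&out.number, &bits, sizeof out.number);  // assumes IEEE 754
        out.isText = false;
        return true;
    }
    if (id == kCellSharedString && p.size() >= 12) {
        const std::uint32_t index = readU32(p, 8);
        if (index >= sharedStrings.size()) return false;
        out.text = sharedStrings[index];  // resolve through the shared strings table
        out.isText = true;
        return true;
    }
    return false;
}
```

The `sharedStrings` vector is the table built from the shared strings part, which is why that part has to be read before any worksheet.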
How does one get the style of a cell?
As mentioned previously, a cell stores formatting style information. Individual cells store a formatting style, a 0-based index in the styles table. In general, those styles are individual cell styles, and refer to the <cellXfs> collection of individual <xf> styles. In turn, each <xf> has an index to the following collections : number formats, borders, fonts, alignment, and fill pattern.
Whenever the cell stores an inline rich string, or has an index to a shared string which in turn is a rich string, then the formatting style of the cell is defined by the formatting runs stored as part of the rich string. Each formatting run defines a style for a fraction of the text.
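The chain of indexes (cell, then <xf>, then the individual collections) can be illustrated with a small C++ sketch. The in-memory types `Font`, `Xf` and `StyleSheet` are hypothetical and only keep a handful of fields named after the XML attributes shown in the styles part; a real reader would fill them from the corresponding records.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Font {
    std::string name;  // e.g. "Calibri"
    double size;       // in points
};

struct Xf {
    std::uint32_t numFmtId;
    std::uint32_t fontId;    // index into StyleSheet::fonts
    std::uint32_t fillId;
    std::uint32_t borderId;
};

struct StyleSheet {
    std::vector<Font> fonts;
    std::vector<Xf> cellXfs;  // indexed by the style stored in each cell
};

// Follows cell style -> <xf> -> font; returns nullptr if an index is out of range.
const Font* fontForCell(const StyleSheet& styles, std::uint32_t cellStyle) {
    if (cellStyle >= styles.cellXfs.size()) return nullptr;
    const Xf& xf = styles.cellXfs[cellStyle];
    if (xf.fontId >= styles.fonts.size()) return nullptr;
    return &styles.fonts[xf.fontId];
}
```

The same two-step lookup applies to fills, borders and number formats. For a rich string, the formatting runs would take precedence over the result for the portions of text they cover.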
Final words, and links to the source code, again
The BIFF12 reader presented in this article, and provided in the sample code below, is a work in progress on the Office 2007 .bin file format which, as we have seen, encompasses a number of underlying file formats. The code provided in C++ and in C# is the basis of a read/write/manipulation library, thanks to the fact that record handlers are entirely responsible for reading/writing/manipulating the corresponding records (and that’s why there are so many classes).
To turn the existing source code into a real manipulation library, you’ll have to create instances of individual records. For instance, in C# instead of doing this:
BaseRecord recHandler = (BaseRecord) h[recid];
You’ll have to do this:
BaseRecord rec = (BaseRecord) Activator.CreateInstance(h[recid].GetType());
And of course implement a Write() method for each record handler.
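For the C++ flavor of the code, the same idea (one fresh record object per record read, each able to write itself back) can be sketched with a factory map. This is only an illustration of the pattern, not the article’s actual classes: `BaseRecord` and `Write()` echo the names used above, while `Read()`, `RawRecord` and `RecordRegistry` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

class BaseRecord {
public:
    virtual ~BaseRecord() = default;
    virtual void Read(const Bytes& payload) = 0;
    virtual Bytes Write() const = 0;
};

// Hypothetical handler that simply keeps the raw payload.
class RawRecord : public BaseRecord {
public:
    void Read(const Bytes& payload) override { bytes_ = payload; }
    Bytes Write() const override { return bytes_; }

private:
    Bytes bytes_;
};

using RecordFactory = std::function<std::unique_ptr<BaseRecord>()>;

// Maps a record id to a factory, so every record gets its own instance.
class RecordRegistry {
public:
    void Register(std::uint32_t id, RecordFactory factory) {
        assert(factory);
        factories_[id] = std::move(factory);
    }

    // Returns a new instance, or nullptr for an unknown record id.
    std::unique_ptr<BaseRecord> Create(std::uint32_t id) const {
        const auto it = factories_.find(id);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::uint32_t, RecordFactory> factories_;
};
```

Registering a handler then looks like `registry.Register(0x13, [] { return std::unique_ptr<BaseRecord>(new RawRecord()); });`, and each call to `Create(0x13)` returns an independent object that can be modified and written back.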
My goal was to decipher most undocumented .bin file formats inside ZIP files, and come up with a way to read the values in the cells of an arbitrary Excel 2007 workbook regardless of the file format.
If you feel like extending the reverse engineering done so far, for instance with the actual deserialization of the less important individual records, then feel free to do so and drop me a line (I can merge your work into this source code).
History
- August 10, 2006: first publication
- August 23, 2006: updated BIFF12 variable length structure (email discussion with Microsoft), added information on Tables, pivot tables, formula ptgTokens, as well as a number of records that I did not see in my first pass (workbook defined names, conditional formattings, data bars, …)
- January 25, 2007: added reverse engineering of more BIFF12 bin parts