Worksheet and workbook specifications and limits
Feature |
Maximum limit |
---|---|
Open workbooks |
Limited by available memory and system resources |
Total number of rows and columns on a worksheet |
1,048,576 rows by 16,384 columns |
Column width |
255 characters |
Row height |
409 points |
Page breaks |
1,026 horizontal and vertical |
Total number of characters that a cell can contain |
32,767 characters |
Characters in a header or footer |
255 |
Maximum number of line feeds per cell |
253 |
Sheets in a workbook |
Limited by available memory (default is 1 sheet) |
Colors in a workbook |
16 million colors (32 bit with full access to 24 bit color spectrum) |
Named views in a workbook |
Limited by available memory |
Unique cell formats/cell styles |
65,490 |
Fill styles |
256 |
Line weight and styles |
256 |
Unique font types |
1,024 global fonts available for use; 512 per workbook |
Number formats in a workbook |
Between 200 and 250, depending on the language version of Excel that you have installed |
Names in a workbook |
Limited by available memory |
Windows in a workbook |
Limited by available memory |
Hyperlinks in a worksheet |
65,530 |
Panes in a window |
4 |
Linked sheets |
Limited by available memory |
Scenarios |
Limited by available memory; a summary report shows only the first 251 scenarios |
Changing cells in a scenario |
32 |
Adjustable cells in Solver |
200 |
Custom functions |
Limited by available memory |
Zoom range |
10 percent to 400 percent |
Reports |
Limited by available memory |
Sort references |
64 in a single sort; unlimited when using sequential sorts |
Undo levels |
100 |
Fields in a data form |
32 |
Workbook parameters |
255 parameters per workbook |
Items displayed in filter drop-down lists |
10,000 |
Noncontiguous cells that can be selected |
2,147,483,648 cells |
Maximum limits of memory storage and file size for Data Model workbooks |
32-bit environment is subject to 2 gigabytes (GB) of virtual address space, shared by Excel, the workbook, and add-ins that run in the same process. A data model’s share of the address space might run up to 500 – 700 megabytes (MB), but could be less if other data models and add-ins are loaded. 64-bit environment imposes no hard limits on file size. Workbook size is limited only by available memory and system resources. Beginning with Excel 2016, Large Address Aware functionality lets 32-bit Excel consume twice the memory when users work on a 64-bit Windows operating system. For more information, see Large Address Aware capability change for Excel. Note: Adding tables to the Data Model increases the file size. If you don’t plan to create complex Data Model relationships using many data sources and data types in your workbook, uncheck the Add this data to the Data Model box when you import or create tables, pivot tables, or data connections. For more information, see Data Model specification and limits. |
Processor Cores |
64 |
File name length |
218 characters — This includes the file path. For example, C:UsernameDocumentsFileName.xlsx. |
Calculation specifications and limits
Feature |
Maximum limit |
---|---|
Number precision |
15 digits |
Smallest allowed negative number |
-2.2251E-308 |
Smallest allowed positive number |
2.2251E-308 |
Largest allowed positive number |
9.99999999999999E+307 |
Largest allowed negative number |
-9.99999999999999E+307 |
Largest allowed positive number via formula |
1.7976931348623158e+308 |
Largest allowed negative number via formula |
-1.7976931348623158e+308 |
Length of formula contents |
8,192 characters |
Internal length of formula |
16,384 bytes |
Iterations |
32,767 |
Worksheet arrays |
Limited by available memory |
Selected ranges |
2,048 |
Arguments in a function |
255 |
Nested levels of functions |
64 |
User defined function categories |
255 |
Number of available worksheet functions |
341 |
Size of the operand stack |
1,024 |
Cross-worksheet dependency |
64,000 worksheets that can refer to other sheets |
Cross-worksheet array formula dependency |
Limited by available memory |
Area dependency |
Limited by available memory |
Area dependency per worksheet |
Limited by available memory |
Dependency on a single cell |
4 billion formulas that can depend on a single cell |
Linked cell content length from closed workbooks |
32,767 |
Earliest date allowed for calculation |
January 1, 1900 (January 1, 1904, if 1904 date system is used) |
Latest date allowed for calculation |
December 31, 9999 |
Largest amount of time that can be entered |
9999:59:59 |
Charting specifications and limits
Feature |
Maximum limit |
---|---|
Charts linked to a worksheet |
Limited by available memory |
Worksheets referred to by a chart |
255 |
Data series in one chart |
255 |
Data points in a data series for 2-D charts |
Limited by available memory |
Data points in a data series for 3-D charts |
Limited by available memory |
Data points for all data series in one chart |
Limited by available memory |
PivotTable and PivotChart report specifications and limits
Feature |
Maximum limit |
---|---|
PivotTable reports on a sheet |
Limited by available memory |
Unique items per field |
1,048,576 |
Row or column fields in a PivotTable report |
Limited by available memory |
Report filters in a PivotTable report |
256 (may be limited by available memory) |
Value fields in a PivotTable report |
256 |
Calculated item formulas in a PivotTable report |
Limited by available memory |
Report filters in a PivotChart report |
256 (may be limited by available memory) |
Value fields in a PivotChart report |
256 |
Calculated item formulas in a PivotChart report |
Limited by available memory |
Length of the MDX name for a PivotTable item |
32,767 |
Length for a relational PivotTable string |
32,767 |
Items displayed in filter drop-down lists |
10,000 |
Workbooks with the «Allow changes by more than one user…» setting enabled
If the Allow changes by more than one user… setting is on for a workbook, then the following information applies. This setting is accessible by clicking the Review tab > Share Workbook. Note that in newer versions of Excel, the Share Workbook button has been hidden. To unhide it, click File > Options > Quick Access Toolbar. Open the list under Choose commands from and select All Commands. Scroll down that list until you see Share Workbook (Legacy). Select that item and click Add. Click OK. The Share Workbook button is now at the top of the Excel window.
Feature |
Maximum limit |
---|---|
Users who can open the file at the same time |
256 |
Personal views in the workbook |
Limited by available memory |
Days that change history is maintained |
32,767 (default is 30 days) |
Workbooks that can be merged at one time |
Limited by available memory |
Cells that can be highlighted |
32,767 |
Colors used to identify changes made by different users when change highlighting is turned on |
32 (each user is identified by a separate color; changes made by the current user are highlighted with navy blue) |
Excel tables in the workbook |
0 (zero) Note: A workbook that contains one or more Excel tables cannot have the Allow changes by more than one user… setting enabled. |
Worksheet and workbook specifications and limits
Feature |
Maximum limit |
---|---|
Open workbooks |
Limited by available memory and system resources |
Total number of rows and columns on a worksheet |
1,048,576 rows by 16,384 columns |
Column width |
255 characters |
Row height |
409 points |
Page breaks |
1,026 horizontal and vertical |
Total number of characters that a cell can contain |
32,767 characters |
Characters in a header or footer |
255 |
Maximum number of line feeds per cell |
253 |
Sheets in a workbook |
Limited by available memory (default is 3 sheets) |
Colors in a workbook |
16 million colors (32 bit with full access to 24 bit color spectrum) |
Named views in a workbook |
Limited by available memory |
Unique cell formats/cell styles |
65,490 |
Fill styles |
256 |
Line weight and styles |
256 |
Unique font types |
1,024 global fonts available for use; 512 per workbook |
Number formats in a workbook |
Between 200 and 250, depending on the language version of Excel that you have installed |
Names in a workbook |
Limited by available memory |
Windows in a workbook |
Limited by available memory |
Hyperlinks in a worksheet |
65,530 hyperlinks |
Panes in a window |
4 |
Linked sheets |
Limited by available memory |
Scenarios |
Limited by available memory; a summary report shows only the first 251 scenarios |
Changing cells in a scenario |
32 |
Adjustable cells in Solver |
200 |
Custom functions |
Limited by available memory |
Zoom range |
10 percent to 400 percent |
Reports |
Limited by available memory |
Sort references |
64 in a single sort; unlimited when using sequential sorts |
Undo levels |
100 |
Fields in a data form |
32 |
Workbook parameters |
255 parameters per workbook |
Items displayed in filter drop-down lists |
10,000 |
Noncontiguous cells that can be selected |
2,147,483,648 cells |
Processor Cores |
64 |
Calculation specifications and limits
Feature |
Maximum limit |
---|---|
Number precision |
15 digits |
Smallest allowed negative number |
-2.2251E-308 |
Smallest allowed positive number |
2.2251E-308 |
Largest allowed positive number |
9.99999999999999E+307 |
Largest allowed negative number |
-9.99999999999999E+307 |
Largest allowed positive number via formula |
1.7976931348623158e+308 |
Largest allowed negative number via formula |
-1.7976931348623158e+308 |
Length of formula contents |
8,192 characters |
Internal length of formula |
16,384 bytes |
Iterations |
32,767 |
Worksheet arrays |
Limited by available memory |
Selected ranges |
2,048 |
Arguments in a function |
255 |
Nested levels of functions |
64 |
User defined function categories |
255 |
Number of available worksheet functions |
341 |
Size of the operand stack |
1,024 |
Cross-worksheet dependency |
64,000 worksheets that can refer to other sheets |
Cross-worksheet array formula dependency |
Limited by available memory |
Area dependency |
Limited by available memory |
Area dependency per worksheet |
Limited by available memory |
Dependency on a single cell |
4 billion formulas that can depend on a single cell |
Linked cell content length from closed workbooks |
32,767 |
Earliest date allowed for calculation |
January 1, 1900 (January 1, 1904, if 1904 date system is used) |
Latest date allowed for calculation |
December 31, 9999 |
Largest amount of time that can be entered |
9999:59:59 |
Charting specifications and limits
Feature |
Maximum limit |
---|---|
Charts linked to a worksheet |
Limited by available memory |
Worksheets referred to by a chart |
255 |
Data series in one chart |
255 |
Data points in a data series for 2-D charts |
Limited by available memory |
Data points in a data series for 3-D charts |
Limited by available memory |
Data points for all data series in one chart |
Limited by available memory |
PivotTable and PivotChart report specifications and limits
Feature |
Maximum limit |
---|---|
PivotTable reports on a sheet |
Limited by available memory |
Unique items per field |
1,048,576 |
Row or column fields in a PivotTable report |
Limited by available memory |
Report filters in a PivotTable report |
256 (may be limited by available memory) |
Value fields in a PivotTable report |
256 |
Calculated item formulas in a PivotTable report |
Limited by available memory |
Report filters in a PivotChart report |
256 (may be limited by available memory) |
Value fields in a PivotChart report |
256 |
Calculated item formulas in a PivotChart report |
Limited by available memory |
Length of the MDX name for a PivotTable item |
32,767 |
Length for a relational PivotTable string |
32,767 |
Items displayed in filter drop-down lists |
10,000 |
Workbooks with the «Allow changes by more than one user…» setting enabled
If the Allow changes by more than one user… setting is on for a workbook, then the following information applies. This setting is accessible by clicking the Review tab > Share Workbook.
Feature |
Maximum limit |
---|---|
Users who can open and share the file at the same time |
256 |
Personal views in the workbook |
Limited by available memory |
Days that change history is maintained |
32,767 (default is 30 days) |
Workbooks that can be merged at one time |
Limited by available memory |
Cells that can be highlighted in the workbook |
32,767 |
Colors used to identify changes made by different users when change highlighting is turned on |
32 (each user is identified by a separate color; changes made by the current user are highlighted with navy blue) |
Excel tables in the workbook |
0 (zero) Note: A workbook that contains one or more Excel tables cannot have the Allow changes by more than one user… setting enabled. |
Worksheet and workbook specifications and limits
Feature |
Maximum limit |
---|---|
Open workbooks |
Limited by available memory and system resources |
Total number of rows and columns on a worksheet |
1,048,576 rows by 16,384 columns |
Column width |
255 characters |
Row height |
409 points |
Page breaks |
1,026 horizontal and vertical |
Total number of characters that a cell can contain |
32,767 characters |
Characters in a header or footer |
255 |
Maximum number of line feeds per cell |
253 |
Sheets in a workbook |
Limited by available memory (default is 3 sheets) |
Colors in a workbook |
16 million colors (32 bit with full access to 24 bit color spectrum) |
Named views in a workbook |
Limited by available memory |
Unique cell formats/cell styles |
65,490 |
Fill styles |
256 |
Line weight and styles |
256 |
Unique font types |
1,024 global fonts available for use; 512 per workbook |
Number formats in a workbook |
Between 200 and 250, depending on the language version of Excel that you have installed |
Names in a workbook |
Limited by available memory |
Windows in a workbook |
Limited by available memory |
Hyperlinks in a worksheet |
65,530 hyperlinks |
Panes in a window |
4 |
Linked sheets |
Limited by available memory |
Scenarios |
Limited by available memory; a summary report shows only the first 251 scenarios |
Changing cells in a scenario |
32 |
Adjustable cells in Solver |
200 |
Custom functions |
Limited by available memory |
Zoom range |
10 percent to 400 percent |
Reports |
Limited by available memory |
Sort references |
64 in a single sort; unlimited when using sequential sorts |
Undo levels |
100 |
Fields in a data form |
32 |
Workbook parameters |
255 parameters per workbook |
Filter drop-down lists |
10,000 |
Calculation specifications and limits
Feature |
Maximum limit |
---|---|
Number precision |
15 digits |
Smallest allowed negative number |
-2.2251E-308 |
Smallest allowed positive number |
2.2251E-308 |
Largest allowed positive number |
9.99999999999999E+307 |
Largest allowed negative number |
-9.99999999999999E+307 |
Largest allowed positive number via formula |
1.7976931348623158e+308 |
Largest allowed negative number via formula |
-1.7976931348623158e+308 |
Length of formula contents |
8,192 characters |
Internal length of formula |
16,384 bytes |
Iterations |
32,767 |
Worksheet arrays |
Limited by available memory |
Selected ranges |
2,048 |
Arguments in a function |
255 |
Nested levels of functions |
64 |
User defined function categories |
255 |
Number of available worksheet functions |
341 |
Size of the operand stack |
1,024 |
Cross-worksheet dependency |
64,000 worksheets that can refer to other sheets |
Cross-worksheet array formula dependency |
Limited by available memory |
Area dependency |
Limited by available memory |
Area dependency per worksheet |
Limited by available memory |
Dependency on a single cell |
4 billion formulas that can depend on a single cell |
Linked cell content length from closed workbooks |
32,767 |
Earliest date allowed for calculation |
January 1, 1900 (January 1, 1904, if 1904 date system is used) |
Latest date allowed for calculation |
December 31, 9999 |
Largest amount of time that can be entered |
9999:59:59 |
Charting specifications and limits
Feature |
Maximum limit |
---|---|
Charts linked to a worksheet |
Limited by available memory |
Worksheets referred to by a chart |
255 |
Data series in one chart |
255 |
Data points in a data series for 2-D charts |
32,000 |
Data points in a data series for 3-D charts |
4,000 |
Data points for all data series in one chart |
256,000 |
PivotTable and PivotChart report specifications and limits
Feature |
Maximum limit |
---|---|
PivotTable reports on a sheet |
Limited by available memory |
Unique items per field |
1,048,576 |
Row or column fields in a PivotTable report |
Limited by available memory |
Report filters in a PivotTable report |
256 (may be limited by available memory) |
Value fields in a PivotTable report |
256 |
Calculated item formulas in a PivotTable report |
Limited by available memory |
Report filters in a PivotChart report |
256 (may be limited by available memory) |
Value fields in a PivotChart report |
256 |
Calculated item formulas in a PivotChart report |
Limited by available memory |
Length of the MDX name for a PivotTable item |
32,767 |
Length for a relational PivotTable string |
32,767 |
Workbooks with the «Allow changes by more than one user…» setting enabled
If the Allow changes by more than one user… setting is on for a workbook, then the following information applies. This setting is enabled when using Shared Workbooks.
Feature |
Maximum limit |
---|---|
Users who can open and share the workbook at the same time |
256 |
Personal views in the workbook |
Limited by available memory |
Days that change history is maintained |
32,767 (default is 30 days) |
Workbooks that can be merged at one time |
Limited by available memory |
Cells that can be highlighted |
32,767 |
Colors used to identify changes made by different users when change highlighting is turned on |
32 (each user is identified by a separate color; changes made by the current user are highlighted with navy blue) |
Excel tables in the workbook |
0 (zero) Note: A workbook that contains one or more Excel tables cannot have the Allow changes by more than one user… setting enabled. |
Top of Page
Have a look at this link** and this one (and this for 2013). A quick summary:
+-----------------+-----------+--------------+---------------------+
| | Max. Rows | Max. Columns | Max. Cols by letter |
+-----------------+-----------+--------------+---------------------+
| Excel 365* | 1,048,576 | 16,384 | XFD |
| Excel 2013 | 1,048,576 | 16,384 | XFD |
| Excel 2010 | 1,048,576 | 16,384 | XFD |
| Excel 2007 | 1,048,576 | 16,384 | XFD |
| Excel 2003 | 65,536 | 256 | IV |
| Excel 2002 (XP) | 65,536 | 256 | IV |
| Excel 2000 | 65,536 | 256 | IV |
| Excel 97 | 65,536 | 256 | IV |
| Excel 95 | 16,384 | 256 | IV |
| Excel 5 | 16,384 | 256 | IV |
+-----------------+-----------+--------------+---------------------+
*Excel 365 unverified.
**This (web archive) link probably will not work with your browser, but the information is in the page source.
-
Excel Howtos, Interview Questions, Power Pivot, Power Query
-
Last updated on September 17, 2019
Chandoo
As part of our Excel Interview Questions series, today let’s look at another interesting challenge. How-to handle more than million rows in Excel?
You may know that Excel has a physical limit of 1 million rows (well, its 1,048,576 rows). But that doesn’t mean you can’t analyze more than a million rows in Excel.
The trick is to use Data Model.
Excel data model can hold any amount of data
Introduced in Excel 2013, Excel Data Model allows you to store and analyze data without having to look at it all the time. Think of Data Model as a black box where you can store data and Excel can quickly provide answers to you.
Because Data Model is held in your computer memory rather than spreadsheet cells, it doesn’t have one million row limitation. You can store any volume of data in the model. The speed and performance of this just depends on your computer processor and memory.
How-to load large data sets in to Model?
Let’s say you have a large data-set that you want to load in to Excel.
If you don’t have something handy, here is a list of 18 million random numbers, split into 6 columns, 3 million rows.
Step 1 – Connect to your data thru Power Query
Go to Data ribbon and click on “Get Data”. Point to the source where your data is (CSV file / SQL Query / SSAS Cube etc.)
Step 2 – Load data to Data Model
In Power Query Editor, do any transformations if needed. Once you are ready to load, click on “Close & Load To..” button.
Tell Power Query that you want to make a connection, but load data to model.
Now, your data model is buzzing with more than million cells.
Step 3 – Analyze the data with Pivot Tables
Go and insert a pivot table (Insert > Pivot Table)
Excel automatically picks Workbook Data Model. You can now see all the fields in your data and analyze by calculating totals / averages etc.
You can also build measures (thru Power Pivot, another powerful feature of Excel) too.
How to view & manage the data model
Once you have a data model setup, you can use,
- Data > Queries & Connections: to view and adjust connection settings
- Relationships: to set up and manage relationships between multiple tables in your data model
- Manage Data Model: to manage the data model using Power Pivot
Alternative answer – Can I not use Excel…
Of course, Excel is not built for analyzing such large volumes of data. So, if possible, you should try to analyze such data with tools like Power BI [What is Power BI?] This gives you more flexibility, processing power and options.
Watch the answer & demo of Excel Data Model
I made a video explaining the interview question, answer and a quick demo of Excel data model with 2 million rows. Check it out below or on my YouTube Channel.
Resources to learn about Excel Data Model
- How to set up data model and connect multiple tables
- A quick but detailed introduction to Power Query
- Create measures in Excel Pivot Tables – Quick example
- Create data model in Excel – Support.Office.com
- How to use Data Model in Excel – Excelgorilla.com
How do you analyze large volumes of data in Excel?
What about you? Do you use the data model option to analyze large volumes of data? What other methods do you rely on? Please post your tips & ideas in the comments section.
Share this tip with your colleagues
Get FREE Excel + Power BI Tips
Simple, fun and useful emails, once per week.
Learn & be awesome.
-
13 Comments -
Ask a question or say something… -
Tagged under
data model, excel interview questions, power pivot, power query, videos
-
Category:
Excel Howtos, Interview Questions, Power Pivot, Power Query
Welcome to Chandoo.org
Thank you so much for visiting. My aim is to make you awesome in Excel & Power BI. I do this by sharing videos, tips, examples and downloads on this website. There are more than 1,000 pages with all things Excel, Power BI, Dashboards & VBA here. Go ahead and spend few minutes to be AWESOME.
Read my story • FREE Excel tips book
Excel School made me great at work.
5/5
From simple to complex, there is a formula for every occasion. Check out the list now.
Calendars, invoices, trackers and much more. All free, fun and fantastic.
Power Query, Data model, DAX, Filters, Slicers, Conditional formats and beautiful charts. It’s all here.
Still on fence about Power BI? In this getting started guide, learn what is Power BI, how to get it and how to create your first report from scratch.
Related Tips
13 Responses to “How can you analyze 1mn+ rows data – Excel Interview Question – 02”
-
Thanks for your time to give us!
-
Leon Nguyen says:
How can i make Vlookup work for 1 million rows and not freeze up my computer? I have 2 sets of data full of customer account numbers and i want to do Vlookup to match them up. Thank you.
-
blahblahblacksheep says:
At that point, I would just import both spreadsheets into an Access database, and run a query that connects them based on a common field/ID (eg: account_ID). Excel cells are all variants, so they take up a lot of memory. But, in Access, you can specify the data type of each field (text, int, decimal, etc). Then identify primary keys on each table (eg: account_ID). Just create a quick Access database, import the spreadsheets, make a query, connect them, and drag-n-drop the fields you want for the query output. Excel is good for pivot tables and such on pre-aggregated data. When you’re trying to do massive data merges (like VLOOKUP on million rows), it’s time to shift the data into a data-oriented solution which can spit out an aggregate data query which you can then export to excel and do the rest of your analysis on. You might be able to leverage the data model he suggested. Put both spreadsheets into the Data Model as data sources, then see if you can open up a drag-n-drop data model screen to hook them together via Primary & Foreign key values (again, Account_ID or such). Another way is to run the VLOOKUP, wait for it to finish, then copy the column that had values filled in and paste-as values. If you have to do multiple columns of VLOOKUP, just do them one-at-a-time, and then copy/paste-as values. This means your spreadsheet won’t be agile (IE: won’t update if you change the source data of the VLOOKUP). But, if you’re just trying to merge data sets quick-n-dirty style, it can be done. Trying to run multiple VLOOKUPS on a million record data set at the same time will kill your computer. So, just do them one-at-a-time and paste-as values.
-
Matthew says:
This is of course what we all did when the limit was 50K. Seriously, Excel is for Granny’s grocery list. There are better ways to do this. I will allow it is better than text file processing, but really.
-
-
-
Myles says:
«well, its 1,048,576 rows»
it’s -
Steve says:
The international standard abbreviation for Million is capital M. NOT mn. You shouldn’t make up your own terms. It takes away from any sense we have that you know what you are talking about. BTW, lower case m is milli or 1/1000. Lower case n is nano or 1/10^-9, or one billionth.
-
Thanks for the detail Steve.
-
Andrew says:
1mn is million in some financial cases and some countries
1MM is million in some industries
1kk is also million in some countries.That you dont know about this could take away any sense anyone has that you know what you are talking about, or leave the impression that you are looking to be rude.
I am also pretty sure you understood it to be 1 million and you were not opening the article expecting it to show you 1 nanometer in rows in excel.
-
-
Veronica says:
You are the best
-
RUPAM says:
CAN I USE THIS IN COMPARING INPUT FILE DATA AND TABLE DATA TO CONFIRM THE DATA CORRECTION?
-
Allan says:
Rupam
If you send me details I will show you how I would tackle COMPARING INPUT FILE DATA AND TABLE DATA
Allan
-
-
allan brayshaw says:
‘Introduction
‘An alternative approach to Excel handling large volumes of data is to use Collections to consolidate the data before making it
‘available for Excel to process. This technique is simple to code and much faster to run than asking Excel to process large volumes of data.
‘Consider a large dataset containing 5 years files of Date/customer account number/customer name/many address lines/phone number/transaction value
‘and where customers will generate many transactions per day.
‘Let’s assume the business requirement is to provide a report of total transaction value per customer by year/month.
‘Rather than placing these records in an Excel worksheet, use VB to read the 5 yearly files and consolidate the information in a memory
‘collection by year/month rather than year/month/day.
‘With, say, 300,000 rows and 10 columns of data, Excel has 3 million cells to process.
‘These 300,000 records could be consolidated into a maximum of 60,000 records:- 5 years x 12 months x 1,000 customers each with a total transaction value.
‘at the same time any rows or columns not contributing to the report can be excluded. (10 columns reduced to 4 in this example)
‘In this example, consolidation reduces the data to 60,000 rows and 4 columns (i.e. 240,000 cells).
‘Therefore Excel data volumes reduce to 240,000 cells irrespective of data volume processed.
‘Using a memory Collection to consolidate the data results in Excel having far less data to process and less data means less time to process.
‘and the Excel 1 million row limit would not apply to the input data unless the consolidated data exceeds this limit — very unlikely.
‘The code below assumes the data is already held in an Excel worksheet but this could easily be converted to reading from one or more files.
‘The earliest record I can find of me using the Collection Class was 2008 (I was working as a Performance engineer at BT when the Excel row limit was 64k
‘and I had 1.2 million 999 calls to analyse) but there is also a Dictionary class which I hear has benefits over the Collections class.
‘I have never used this class but see details https://www.experts-exchange.com/articles/3391/Using-the-Dictionary-Class-in-VBA.html
‘Setup a demo as follows:
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘Copy/Paste the following text into an Excel code Module in an empty .xlsm Workbook and rename 2 sheets Data and Consolidation
‘Insert a new Class Module — Class1
‘Cut/Paste the following 34 rows of text to create a Class1 definition record of only those data fields required for analysis
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘Option Explicit
‘Private pYYMM As String
‘Private pAccountNumber As String
‘Private pCustomerName As String
‘Private pTotalValue As Single
‘
‘Property Let YYMM(xx As String) ‘YYMM
‘ pYYMM = xx
‘End Property
‘Property Get YYMM() As String
‘ YYMM = pYYMM
‘End Property
‘Property Let AccountNumber(xx As String) ‘AccountNumber
‘ pAccountNumber = xx
‘End Property
‘Property Get AccountNumber() As String
‘ AccountNumber = pAccountNumber
‘End Property
‘Property Let CustomerName(xx As String) ‘CustomerName
‘ pCustomerName = xx
‘End Property
‘Property Get CustomerName() As String
‘ CustomerName = pCustomerName
‘End Property
‘Property Let TotalValue(xx As Single) ‘TotalValue
‘ pTotalValue = xx
‘End Property
‘Property Get TotalValue() As Single
‘ TotalValue = pTotalValue
‘End Property
»»»»»»»»»»»»»»»»»»»»»»»»»
»’Cut/Paste up to here and remove the first ‘ comment on each row after pasting into Class1»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘Code module content
Option Explicit
Dim NewRecord As New Class1 ‘record area (definition of new Class1 record)
Dim Class1Collection As New Collection ‘A collection of keyed records held in memory
Dim Class1Key As String ‘Collection Key must be a String
Dim WalkRecord As Class1
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘This code creates the text key used to access each Class1 record in memory by year, month and account number:-
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
Dim Rw As Long
Dim StartTime As Date
Sub Consolidate()
StartTime = Now()
Sheets(«Data»).Select
For Rw = 2 To Range(«A1»).CurrentRegion.Rows.Count
Class1Key = Format(Range(«A» & Rw).Value, «yymm») & Range(«B» & Rw).Value ‘Key = yymm & AccountNumber
»»»»»»»»»»»»»»»»
‘This code populates the Class1 record:-
»»»»»»»»»»»»»»»»
NewRecord.YYMM = Format(Range(«A» & Rw).Value, «yymm») ‘YYMM
NewRecord.AccountNumber = Range(«B» & Rw).Value ‘AccountNumber
NewRecord.CustomerName = Range(«C» & Rw).Value ‘CustomerName
NewRecord.TotalValue = Range(«J» & Rw).Value ‘TotalValue
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘This code writes the Class record into memory and, if an identical key already exists, accumulate TotalValue:-
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
On Error Resume Next
Call Class1Collection.Add(NewRecord, Class1Key)
Select Case Err.Number
Case Is = 0
On Error GoTo 0
‘******** NEW RECORD INSERTED OK HERE *********
Case Is = 457
On Error GoTo 0
‘******** CONSOLIDATE DUPLICATE KEY CASES HERE ************
Class1Collection(Class1Key).TotalValue = Class1Collection(Class1Key).TotalValue + NewRecord.TotalValue
Case Else
‘******** UNEXPECTED ERROR *****************
Err.Raise Err.Number
Stop
End Select
Set NewRecord = Nothing
Next Rw
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘ When all the records have been processed, This code writes Collection entries to a WorkSheet:-
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
Sheets(«Consolidation»).Select
Range(«A1:D1»).Value = Array(«Date», «account number», «customer name», «total value»)
Rw = 2
For Each WalkRecord In Class1Collection ‘walk collection and write rows
Cells(Rw, 1) = «20» & Left(WalkRecord.YYMM, 2) & «/» & Right(WalkRecord.YYMM, 2)
Cells(Rw, 2) = WalkRecord.AccountNumber
Cells(Rw, 3) = WalkRecord.CustomerName
Cells(Rw, 4) = WalkRecord.TotalValue
Rw = Rw + 1
Next WalkRecord
Columns(«A:A»).NumberFormat = «dd/mm/yyyy;@»
Columns(«D:D»).NumberFormat = «#,##0.00»
Columns(«A:D»).HorizontalAlignment = xlCenter
Columns(«A:D»).EntireColumn.AutoFit
Set Class1Collection = Nothing
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘This code sorts the Consolidated records into AccountNumber within YY/MM
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
Rw = Range(«A1»).CurrentRegion.Rows.Count
ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Clear
ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Add Key _
:=Range(«A2:A» & Rw), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Add Key _
:=Range(«B2:B» & Rw), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
With ActiveWorkbook.Worksheets(«Consolidation»).Sort
.SetRange Range(«A1:D» & Rw)
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
Range(«A2»).Select
MsgBox («Runtime » & Format(Now() — StartTime, «hh:mm:ss»)) ‘300k records takes 1 min 46 seconds on my ancient HP laptop
End Sub
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘Now run macro Generate to create 300k test rows, followed by macro Consolidate to demonstrate consolidation by data collection.
‘In my experience the improvement in Excel run time is so spectacular that no further action is required
‘but further run time improvements could result from avoiding calling the Format function twice every row,
‘or holding CustomerName in another collection and accessing during the Walk process, or using the Dictionary class?
‘Note that is advisable to run macro ClearSheets before saving your WorkBook.
‘Also the Record constant can be manually adjust up to a maximum of 1048575 if required
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
Sub Generate() ‘Generates test data in Sheet Data
Const StartDate As Date = 42370 ’01/01/2016
Const FinishDate As Date = 44196 ’31/12/2020
Const Records As Long = 300000 ‘number of records to generate — max 1048575
Const Customers As Integer = 1000 ‘number of Customers
Const StartValue As Integer = 5 ‘£5
Const FinishValue As Integer = 100 ‘£100
StartTime = Now()
Sheets(«Data»).Select
Range(«A1:J1»).Value = Array(«Date», «account number», «customer name», «address 1», «address 2», «address 3», «address 4», «address 5», «phone number», «tx value»)
For Rw = 2 To Records + 1
Cells(Rw, 1) = WorksheetFunction.RandBetween(StartDate, FinishDate) ‘Date
Cells(Rw, 2) = WorksheetFunction.RandBetween(1, Customers) ‘Customer number
Range(Cells(Rw, 3), Cells(Rw, 9)) = Array(«Customer » & Cells(Rw, 2), «Not required», «Not required», «Not required», «Not required», «Not required», «Not required»)
Cells(Rw, 10) = WorksheetFunction.RandBetween(StartValue, FinishValue) ‘Value
Next Rw
Columns(«A:A»).NumberFormat = «dd/mm/yyyy;@»
Columns(«J:J»).NumberFormat = «#,##0.00»
Columns(«A:J»).HorizontalAlignment = xlCenter
Columns(«A:J»).EntireColumn.AutoFit
Range(«A2»).Select
MsgBox («Runtime » & Format(Now() — StartTime, «hh:mm:ss»)) ‘300k records takes 1 min 50 seconds on my ancient HP laptop
End Sub
Sub ClearSheets() ‘delete all data before saving
Sheets(«Data»).Select
Range(«A1»).CurrentRegion.Delete Shift:=xlUp
Range(«A2»).Select
Sheets(«Consolidation»).Select
Range(«A1»).CurrentRegion.Delete Shift:=xlUp
Range(«A2»).Select
End Sub
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘Finally, a data collection can comprise of values and keys without requiring a Class record.
‘Such a collection can be used to identify data duplicates (e.g. value/key=row number/cellcontent) or hold a lookup table (e.g. value/key=name/account)
‘Most applications process far more data than they report therefore using collections benefits most data sets irrespective of volume.
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
»Example of generating subtotals by writing additional records into the collection
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
‘Sub ConsolidateWithSubtotals()
‘Dim SubTotalRecord As New Class1 ‘subtotals
‘ StartTime = Now()
‘ Sheets(«Data»).Select
‘ For Rw = 2 To Range(«A1»).CurrentRegion.Rows.Count
‘ Class1Key = Format(Range(«A» & Rw).Value, «yymm») & Range(«B» & Rw).Value ‘Key = yymm & AccountNumber
»»»»»»»»»»»»»»»»’
»Next populate the Class1 record:-
»»»»»»»»»»»»»»»»’
‘ NewRecord.YYMM = Format(Range(«A» & Rw).Value, «yymm») ‘YYMM
‘ NewRecord.AccountNumber = Range(«B» & Rw).Value ‘AccountNumber
‘ NewRecord.CustomerName = Range(«C» & Rw).Value ‘CustomerName
‘ NewRecord.TotalValue = Range(«J» & Rw).Value ‘TotalValue
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»’
»Next write the Class record into memory and, if an identical key already exists, accumulate TotalValue:-
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘ On Error Resume Next
‘ Call Class1Collection.Add(NewRecord, Class1Key)
‘ Select Case Err.Number
‘ Case Is = 0
‘ On Error GoTo 0
‘ ‘******** NEW RECORD INSERTED OK HERE *********
‘ Case Is = 457
‘ On Error GoTo 0
‘ ‘******** CONSOLIDATE DUPLICATE KEY CASES HERE ************
‘ Class1Collection(Class1Key).TotalValue = Class1Collection(Class1Key).TotalValue + NewRecord.TotalValue
‘ Case Else
‘ ‘******** UNEXPECTED ERROR *****************
‘ Err.Raise Err.Number
‘ Stop
‘ End Select
‘ Set NewRecord = Nothing
»»»»»»»»»»»’
»accumulate subtotals
»»»»»»»»»»»’
‘ Class1Key = Format(Range(«A» & Rw).Value, «yymm») ‘Key = yymm
‘ SubTotalRecord.YYMM = Format(Range(«A» & Rw).Value, «yymm») ‘YYMM
‘ SubTotalRecord.AccountNumber = «Subtotal»
‘ SubTotalRecord.CustomerName = «»
‘ SubTotalRecord.TotalValue = Range(«J» & Rw).Value ‘TotalValue
‘ On Error Resume Next
‘ Call Class1Collection.Add(SubTotalRecord, Class1Key)
‘ Select Case Err.Number
‘ Case Is = 0
‘ On Error GoTo 0
‘ ‘******** NEW RECORD INSERTED OK HERE *********
‘ Case Is = 457
‘ On Error GoTo 0
‘ ‘******** CONSOLIDATE DUPLICATE KEY CASES HERE ************
‘ Class1Collection(Class1Key).TotalValue = Class1Collection(Class1Key).TotalValue + SubTotalRecord.TotalValue
‘ Case Else
‘ ‘******** UNEXPECTED ERROR *****************
‘ Err.Raise Err.Number
‘ Stop
‘ End Select
‘ Set SubTotalRecord = Nothing
‘ Next Rw
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
» When all the records have been processed, write Collection entries to a WorkSheet:-
»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
‘ Sheets(«Consolidation»).Select
‘ Range(«A1:D1»).Value = Array(«Date», «account number», «customer name», «total value»)
‘ Rw = 2
‘ For Each WalkRecord In Class1Collection ‘walk collection and write rows
‘ If WalkRecord.AccountNumber = «Subtotal» Then
‘ Cells(Rw, 1) = «20» & Left(WalkRecord.YYMM, 2) & «/» & Right(WalkRecord.YYMM, 2)
‘ Cells(Rw, 2) = «Subtotal 20» & Left(WalkRecord.YYMM, 2) & «/» & Right(WalkRecord.YYMM, 2)
‘ Cells(Rw, 3) = WalkRecord.CustomerName
‘ Cells(Rw, 4) = WalkRecord.TotalValue
‘ Else
‘ Cells(Rw, 1) = «20» & Left(WalkRecord.YYMM, 2) & «/» & Right(WalkRecord.YYMM, 2)
‘ Cells(Rw, 2) = WalkRecord.AccountNumber
‘ Cells(Rw, 3) = WalkRecord.CustomerName
‘ Cells(Rw, 4) = WalkRecord.TotalValue
‘ End If
‘ Rw = Rw + 1
‘ Next WalkRecord
‘ Set Class1Collection = Nothing
»
‘ Rw = Range(«A1»).CurrentRegion.Rows.Count
‘ ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Clear
‘ ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Add Key _
‘ :=Range(«A2:A» & Rw), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
‘ ActiveWorkbook.Worksheets(«Consolidation»).Sort.SortFields.Add Key _
‘ :=Range(«B2:B» & Rw), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
‘ With ActiveWorkbook.Worksheets(«Consolidation»).Sort
‘ .SetRange Range(«A1:D» & Rw)
‘ .Header = xlYes
‘ .MatchCase = False
‘ .Orientation = xlTopToBottom
‘ .SortMethod = xlPinYin
‘ .Apply
‘ End With
‘ Range(«A2»).Select
‘ MsgBox («Runtime » & Format(Now() — StartTime, «hh:mm:ss»)) ‘300k records takes 1 min 46 seconds on my ancient HP laptop
‘End Sub