Excel is not a database… Here’s why.
So it happened again. 16,000 COVID-19 tests in England got lost, and MS Excel gets the blame. The reason? It was being relied on for huge amounts of data, leading to big problems.
After more than 20 years as a consultant and system developer, I certainly have seen many creative and sometimes, quite frankly, life-threatening solutions where a spreadsheet has developed uncontrollably into a critical business or documentation system (and by “creative solutions”, I really mean horror stories).
Excel is a popular tool, and for good reason. It’s flexible, and it is often really quick to get started with for performing calculations and creating diagrams. Also, with integration into SharePoint, Excel can be set up quickly as a multi-user environment.
But, it certainly also has some shortcomings.
Some of these are technical. In the opening example of lost COVID-19 test data, a text file was apparently imported into Excel with more lines (records) than the spreadsheet could hold, and the excess records were lost.
Now, experienced and Excel-savvy users may very well be aware of these limitations, but when you are in a hurry (and in crisis mode), they are easily forgotten. It’s also not easy to know the exact limits that apply at the time, as they can be version-dependent. Even if the limitation had been discovered in time, some sort of workaround would still have had to be figured out. Maybe splitting the spreadsheet into several sheets?
What is a “Real Database”?
In this case, it seems that Excel was used as a replacement for a “real database”, and in many cases that is a pretty bad idea, certainly when you are handling more than a million records and most definitely when the data relates to serious health issues. It’s not surprising, though, that Excel gets mistaken for a real database: to a non-database user, a spreadsheet looks very much like one.
By a “real database”, I mean systems like PostgreSQL, SQL Server, Oracle, or even more lightweight systems like SQLite. These systems focus on storing large amounts of data with integrity, meaning you can set up rules that protect your precious data. For the case of COVID-19, this could mean ensuring that the same test is not registered twice.
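As a rough illustration of such a rule, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, the column names, and the idea of a lab-issued test_id are assumptions made up for this example, not details of the actual COVID-19 reporting system.

```python
import sqlite3

# Hypothetical schema: one row per test, identified by a lab-issued test_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE covid_tests (
        test_id  TEXT PRIMARY KEY,  -- the same test cannot be stored twice
        taken_on TEXT NOT NULL,
        result   TEXT NOT NULL CHECK (result IN ('positive', 'negative'))
    )
    """
)


def register_test(test_id, taken_on, result):
    """Insert one test; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO covid_tests VALUES (?, ?, ?)",
                (test_id, taken_on, result),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = register_test("T-0001", "2020-10-01", "positive")
duplicate = register_test("T-0001", "2020-10-01", "positive")
row_count = conn.execute("SELECT COUNT(*) FROM covid_tests").fetchone()[0]
```

The second call is refused by the database itself, so the rule holds no matter which tool or person tries to load the data.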
However, loading a “real database” with data can be daunting as there are so many different ways of importing data. And certainly automating the continuous update of the database can be a real challenge.
Excel and Databases Can Still Work Together
There are efficient ways of setting up automated workflows for these database scenarios. The best solution is often to use the right tools for the right purpose, and Excel can still be a part of that solution. A good example can be found at the European Environment Agency, where these scenarios have been carefully thought through after many years of experience. They regularly collect environmental data from a great number of member states. Mr. Jan Bliki, Head of the group on data management, explains:
“Excel is still the easiest input tool among non-database users. It’s hard to replace if users are that used to that format. In our case, we ask to deliver multiple files using the same template: Excel files that only store a fraction of the data (for example, by hospital and by day). Then, we use a central service that brings all these daily results together into one single database using a single ETL tool. To make easy use of such a database, a BI tool could be linked so those same users would still have the ability to produce an overview of all results.”
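The pattern Jan describes, many small template files merged into one central database, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: short CSV strings stand in for the real Excel deliveries, SQLite stands in for the central database, and the file names, columns, and load_delivery helper are all hypothetical. A real ETL tool adds scheduling, logging, and QA on top of this.

```python
import csv
import io
import sqlite3

# Stand-ins for per-hospital, per-day deliveries (CSV text instead of real Excel files).
deliveries = {
    "hospital_a_2020-10-01.csv": "hospital,day,tests\nA,2020-10-01,120\n",
    "hospital_b_2020-10-01.csv": "hospital,day,tests\nB,2020-10-01,95\n",
    "hospital_a_2020-10-02.csv": "hospital,day,tests\nA,2020-10-02,130\n",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_tests ("
    " hospital TEXT NOT NULL, day TEXT NOT NULL, tests INTEGER NOT NULL,"
    " PRIMARY KEY (hospital, day))"  # one row per hospital and day
)


def load_delivery(text):
    """Load one small template file into the central table; return rows loaded."""
    rows = [
        (r["hospital"], r["day"], int(r["tests"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    with conn:
        conn.executemany("INSERT INTO daily_tests VALUES (?, ?, ?)", rows)
    return len(rows)


loaded = sum(load_delivery(text) for text in deliveries.values())
total_tests = conn.execute("SELECT SUM(tests) FROM daily_tests").fetchone()[0]
```

Because each file holds only a fraction of the data, no single delivery comes anywhere near a spreadsheet’s size limits, while the combined table can grow as large as the database allows.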
Key Benefits of a Real Database
There are some really good points here:
- A single database helps preserve the integrity of your data.
- The automation capabilities of an ETL tool, such as FME, can include QA rules that send alerts when something suspicious happens. Automation also makes it easier to keep the database up to date.
- As Jan also mentions, using BI alongside a database is a great way to provide feedback and insights to users.
So how can an ETL tool like FME handle validation and QA in a process like this? Think of the database as a secure bank vault. To get in (and out), data has to present the right credentials. FME is the safety system checking that your data brings the right credentials and, if not, warning you that something fishy is going on.
A few examples of checks you could include:
- Are there more than a certain number of rows in the Excel sheet? This could have helped prevent the COVID-19 incident mentioned at the beginning.
- Are there duplicate records?
- Are data-values within reasonable ranges?
- Do all fields have values?
The list goes on. One of the greatest benefits of having an automated process is that you can keep adding checks as the situation evolves, making the safety system stronger with time.
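To give a feel for what such checks amount to, here is a small sketch in plain Python. It is not FME’s API or configuration; the validate_rows function, the field names, and the 0 to 120 age range are assumptions invented for the example. The row limit is the 1,048,576-row ceiling of a modern Excel worksheet.

```python
EXCEL_MAX_ROWS = 1_048_576  # row limit of a single .xlsx worksheet


def validate_rows(rows, required=("test_id", "age"), max_rows=EXCEL_MAX_ROWS):
    """Return a list of human-readable warnings for a batch of records."""
    warnings = []
    # Check 1: a sheet that is full to the limit may have been truncated.
    if len(rows) >= max_rows:
        warnings.append(f"row count {len(rows)} reaches the sheet limit {max_rows}")
    seen = set()
    for i, row in enumerate(rows, start=1):
        # Check 2: every required field has a value.
        for field in required:
            if row.get(field) in (None, ""):
                warnings.append(f"row {i}: missing {field}")
        # Check 3: no duplicate records.
        test_id = row.get("test_id")
        if test_id in seen:
            warnings.append(f"row {i}: duplicate test_id {test_id}")
        seen.add(test_id)
        # Check 4: values within a reasonable range.
        age = row.get("age")
        if isinstance(age, (int, float)) and not 0 <= age <= 120:
            warnings.append(f"row {i}: age {age} out of range")
    return warnings


sample = [
    {"test_id": "T-1", "age": 34},
    {"test_id": "T-1", "age": 34},    # duplicate
    {"test_id": "T-2", "age": 430},   # implausible age
    {"test_id": "T-3", "age": None},  # missing value
]
problems = validate_rows(sample)
```

Adding a new check is a matter of appending one more condition, which is exactly what makes an automated pipeline easy to strengthen over time.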
Don’t Blame Individual Tools
To sum this up, I honestly think that blaming the spreadsheet’s shortcomings for accidents, like the one with the COVID-19 tests, is far too easy and avoids the core problem. The tool has most likely been used in a way that it was not intended for. And most importantly: Any information owner should take measures to treat their data as the valuable resource it really is!
Want to learn more about Excel? Check out the “5 Ways FME Can Improve Your Excel Data” webinar hosted by Safe Software co-founder Dale Lutz.
Working in the GIS field, it’s very common to gather data for a project from various ad hoc sources, and often that data is delivered in the form of an Excel spreadsheet. I’ll preface this by saying that there’s nothing wrong with delivering data in this way. Excel is a powerful and widely used tool in the business world, and just about everybody who’s ever worked in an office has opened an Excel spreadsheet. But too often in my GIS career I’ve come across projects that use Excel spreadsheets as a master repository, storing large amounts of tabular data in them. It’s not hard to imagine how data integrity can quickly become compromised when people are copying, pasting, emailing, editing, and saving spreadsheets, both on their local disks and in shared network folders.
So why isn’t Excel a database? It has rows and columns, and you can sort data, create formulas, and query data with features like VLOOKUP, right? While those things may be true, there are several reasons Excel is not a database:
- Excel does not, by default, constrain a column to one specific primitive datatype (integer, floating-point decimal, boolean, string, etc.). In a database table, each column or field can be constrained to allow only one datatype. Often there will be a column that should contain only a specific kind of value, like an integer for an ID number. Unless someone deliberately sets up Data Validation rules on the cells, Excel will not prevent users from entering an invalid value for the ID. A database table would throw an error, not allowing text to be used for the integer ID column. A database table can also disallow null or empty values for a column, requiring every row to have a value, which is useful for ensuring each row has an ID number, for example. A plain worksheet does not enforce this.
- A traditional Excel file on a shared network drive cannot be opened and edited by multiple users at the same time. This is a very limiting factor experienced by many users who share files on network drives. I have personally run into this issue quite a few times, having to ask somebody on the team to close a file that they had open so I could make some edits. Inevitably, somebody (me included) would make a copy of the file to work on locally, and then we’d have issues reconciling the edits each person made with the master copy. A database is designed to allow hundreds or thousands of users to concurrently query, view, and edit the data.
- Excel is slooooow. Like really slow. The file size can balloon when there are many rows and columns, many formulas, and special formatting, filters, etc. in the worksheets. Sorting worksheets with many rows can be painfully slow, and out-of-memory errors can happen when the content of the Excel file becomes too large. Databases are designed to maximize performance and offer nearly limitless space, depending on hardware configuration of course. (SQL Server can handle databases of up to 524,272 terabytes!) Excel files should be out of the question when data tables have hundreds of thousands of rows or more of data.
- Excel does not allow you to query or join data from multiple tables. Well, it does, kinda… features like PivotTables, VLOOKUP, and HLOOKUP can provide functionality similar to some database queries by summarizing data, searching for matching criteria, etc. I won’t get into how to use these features, but they do not quite give the same functionality and flexibility as a database provides. A significant limiting factor of Excel is how awkward it is to link or join tables between files; these operations work best when all the worksheets live in the same Excel file. Where Excel really falls short, however, is one-to-many relationships. This is an important concept in data relationships, and these kinds of table joins cannot easily be performed in Excel.
- Last, but certainly not least, Excel doesn’t provide any data backup or recovery tools. You may have some automatic file backups on your computer’s hard drive or on a shared network drive, but these may be insufficient if an Excel file is destroyed or altered. A database will have built-in features to back up data, and sometimes allow the data to be rolled back to a given point in time. Bringing data back online after a hardware failure is generally an easy process with most database software. A database can also keep logs of all changes to the tables, making it possible to undo a specific table update if an error was made. On a related note, databases will offer options to securely access your data as well. Excel files offer little in the way of protecting the data from unauthorized viewers.
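To make a few of the points above concrete, here is a minimal sketch using Python’s built-in sqlite3 module. SQLite is used only because it needs no server; the sites and samples tables and their columns are made up for illustration. The sketch shows typed and non-null columns, a one-to-many join, and undoing a mistaken update before it is committed. Full backup and point-in-time recovery are server features that this small example does not attempt to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE sites (
        site_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE samples (
        sample_id INTEGER PRIMARY KEY,
        site_id   INTEGER NOT NULL REFERENCES sites(site_id),
        depth_m   REAL NOT NULL CHECK (typeof(depth_m) IN ('real', 'integer'))
    );
    INSERT INTO sites VALUES (1, 'North well'), (2, 'South well');
    INSERT INTO samples VALUES (10, 1, 3.5), (11, 1, 7.0), (12, 2, 2.25);
    """
)


def try_insert(sample_id, site_id, depth_m):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO samples VALUES (?, ?, ?)", (sample_id, site_id, depth_m)
            )
        return True
    except sqlite3.IntegrityError:
        return False


text_rejected = not try_insert(13, 1, "deep")  # text in a numeric column
null_rejected = not try_insert(14, None, 1.0)  # missing site_id

# One-to-many: each site row matches any number of sample rows.
samples_per_site = conn.execute(
    "SELECT s.name, COUNT(*) FROM sites s "
    "JOIN samples p ON p.site_id = s.site_id "
    "GROUP BY s.site_id ORDER BY s.site_id"
).fetchall()

# A mistaken bulk update can be undone before it is committed.
conn.execute("UPDATE samples SET depth_m = 0")
conn.rollback()
depths = [r[0] for r in conn.execute("SELECT depth_m FROM samples ORDER BY sample_id")]
```

Both bad inserts are refused, the join counts two samples for one site and one for the other, and the depths are unchanged after the rollback.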
To summarize, Excel is a great tool for data analysis, presentation, and mathematical and statistical calculations. It is a not so great tool for storing large amounts of data. It’s often difficult to maintain data integrity using Excel, especially among multiple users. When the content of the file becomes too large, Excel performance drops significantly making viewing and editing the data difficult.
For any readers out there who currently find themselves in a situation where they feel Excel is not quite the right tool for data storage, I would urge you to explore using a database for your project. There are quite a few options out there, both free and paid, for setting up a database. It’s probably a little beyond the scope of this post to discuss all the various database storage and retrieval types and the specific database software, but since I’m a .NET developer I would suggest trying SQL Server Express. It’s easy to set up a local database server and begin experimenting with migrating data from Excel into database tables. You may also already have Microsoft Access installed on your machine as part of Microsoft Office.
Many spreadsheet packages are available on the market, but Microsoft Excel is the most popular and the most widely used among them.
Some people confuse Excel with a database and think that Excel is an example of one. That is not the case: Excel is not a database, and in this article I share the main reasons why.
So, why is Excel not a database? There are several reasons. In Excel, data is stored in cells, and a cell can hold any type of information, whereas a database stores only raw data. Excel also lets you manipulate data and perform calculations directly in the sheet, and it cannot handle as much data as a database can. For these reasons, Excel is not a database.
Read on for the details.
Why Is Excel Not a Database? [Major Reasons]
There is a lot of confusion about the difference between data tables, databases, and Excel spreadsheets. Most people think these things are similar, but they are quite different from each other.
In this section, I discuss why Excel is not a database.
Spreadsheets:
The first major point is that Excel is not a database; it is a spreadsheet package. A spreadsheet is a kind of electronic ledger, an electronic version of paper accounting worksheets. It was created mainly to help people who need to store their accounting information digitally in tabular form.
It can contain a large amount of data, store that data in tabular form, and use the existing data in calculations.
These things are also possible in a database, which is why most people get confused and think Excel and databases are closely related. They are nevertheless different.
Stores any type of Information:
In a spreadsheet such as Excel, every cell is treated as a unique entity, and it can store any type of information.
A cell can store a date, an integer value, or a string, and you can also apply formatting to the text. This is not how databases work: a database stores only the raw data.
Besides that, a database presets the type of data that a given field may contain. For example, if a user tries to insert a string into a field that holds date values, the database will show an error. This doesn’t happen in Excel: if you type text into a column of date values, you will not get an error message.
This happens mainly because an Excel cell accepts whatever value is entered, whereas a database column enforces its declared type.
Data Stores:
In Excel or other spreadsheets, data is stored in cells. In a database, data is stored in the records of a table, meaning you count the records in a table, not the number of cells, to express how long the table is.
In a database, you cannot pick a font color or size; all that matters is the information being stored. In Excel, on the other hand, you can change the font color, size, and other things.
In short, a database stores no formatting, while Excel allows it.
Calculations:
This is another substantial difference. In spreadsheets such as Excel, different cells can contain different calculations, such as functions and formulas. This means that if you want to add two integers, the result is stored in another cell.
In a database, calculations and operations are based on the existing data and are done after the data is retrieved. Databases offer objects such as views in which you can perform calculations. A view contains columns that can be ordinary columns or columns holding a certain type of calculation.
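Here is a small sketch of what a view with a calculated column can look like, using SQLite through Python’s built-in sqlite3 module. The order_lines table and the order_totals view are invented names for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_lines (
        line_id    INTEGER PRIMARY KEY,
        quantity   INTEGER NOT NULL,
        unit_price REAL NOT NULL
    );
    INSERT INTO order_lines VALUES (1, 2, 10.0), (2, 5, 1.5);

    -- The view stores no data; line_total is computed when the view is queried.
    CREATE VIEW order_totals AS
        SELECT line_id, quantity, unit_price,
               quantity * unit_price AS line_total
        FROM order_lines;
    """
)

totals = conn.execute(
    "SELECT line_id, line_total FROM order_totals ORDER BY line_id"
).fetchall()
```

Unlike a formula cell, the calculation lives in one place (the view definition) and applies to every row, so it cannot be overwritten in one row and left intact in another.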
Multiple worksheets:
An Excel workbook can contain multiple worksheets, so you can create a table on each worksheet and then use the sheets to create relations between the tables. In practice, such relations are limited in Excel.
In a database, by contrast, you set up relations between the tables directly, which improves the performance of operations and increases the speed at which you can manipulate your dataset.
Handling Data:
An Excel worksheet cannot hold more than 1,048,576 rows (roughly 1 million), and there is no way around this limit within a single sheet.
In this case, it is better to use a database, which can handle more than 10 million records.
Multi-User property:
When it comes to multi-user work, spreadsheets lag behind: essentially every person must update their own copy of the spreadsheet with new data. For instance, if there is a new purchase to register or a last name to change in the customer column, every user must make these changes manually.
Databases, on the other hand, provide a stable structure with controlled access permissions and user restrictions. One person can make a change that is usually visible to everybody instantly, which increases efficiency.
Therefore, in terms of data consistency, databases are more powerful than Excel.
Duplicate Information:
By enforcing data consistency and data integrity, databases eliminate duplicate information, which is another way to save space and increase efficiency.
In Excel, there is always a possibility of duplicate information.
In Conclusion:
In this article I have covered some of the main reasons why Excel is not a database. I hope you liked it, and if you have any questions, you can ask them in the comment section.
If this article was useful to you, please share it and follow our website for more helpful and informative articles.
So, you’ve been using Microsoft Excel for a while and you’re now wondering whether Excel can be considered a database. Not to worry, I’ve done my research on whether Excel is a database. Here’s the short answer:
Excel is not a database. Excel is spreadsheet software, and it cannot be considered a database because it lacks the data integrity, proper structure, table relationships, and database keys that exist in databases. However, Excel can be used as a temporary substitute for storing small amounts of data.
Excel is likely one of your most-used applications at work or even in school projects. If you’re like me, you use Excel to input and store data like a database. Do read on as I’ll be sharing more about why Excel is not a database and answering some related questions!
Excel VS Database: What Are Their Differences?
The difference between Excel and a database is that Excel is a spreadsheet tool for storing small-scale data, while databases are high-integrity stores that only show data when queried. Excel lacks the data integrity, proper indexing structure, database keys, and table relationships found in most database systems.
Now that you’re slightly clearer about how different Excel and databases are, you must be wondering what actually sets them apart. Let’s have a quick look at the summary table below!
Comparison Between Excel and a Database
# | Differences | Excel | Database |
1 | Storage Location | Stored in flat files | Stored in either database files or on cloud storage |
2 | Storage Size | Small data | Small and big data |
3 | Integrity of Data | Low integrity, can be overwritten easily | High integrity with reading and writing controls |
4 | How Data is Shown | Shown as a whole | Shows only data from specific queries |
5 | Flexibility | Highly flexible formulas and functions | Flexible only through complicated queries |
These are the first 5 differences I could think of when doing the comparison between Excel and a database. Now let’s zoom in to each of these categories to pick out what sets them apart!
1. Storage Location
When we think of Excel, we almost always imagine a bunch of cells put together in a spreadsheet. This all comes in the form of a workbook flat file.
Data in Excel is typically stored in cells and stored in a workbook file. These files are called flat files.
In contrast, actual databases are quite different in terms of storage location. In fact, databases offer many more options for storage than you might think! Let’s have a quick look at some common examples of where data is stored:
- Traditional databases
- Data warehouses
- Data lakes
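Traditional databases are the closest of the three to Excel and its flat files, since they also hold structured rows and columns. They are simpler than warehouses and lakes, though they can still hold far more data than a workbook.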
Data warehouses are silos of data that have been taken from multiple sources, for the purpose of analytics. These can be stored either physically or in a cloud.
Data lakes are the most versatile of the three, being the ones that can store a variety of data types, such as images, audio recordings, and other unstructured data types! The data is almost always stored in the cloud because of how large the data is.
2. Storage Size
You have probably come across situations where Excel hit its size limits. This is something almost all Excel users know.
Excel can hold only a limited amount of data, which may not be enough for large-scale storage.
For example, when working with Excel files, I’ve personally encountered scenarios where Excel would crash when opening up CSV or Excel files that are holding too much data.
Having a small storage size in Excel might be fine for personal use but will be a big problem for businesses.
According to Microsoft, an Excel worksheet can hold a maximum of 1,048,576 rows. You most likely don’t want to store your data in such an unstable state at such high volumes.
Databases are made purely for data storage and technologies in databases have enabled larger storage solutions.
For example, MongoDB databases can be hosted in MongoDB’s cloud service. With a much higher storage limit, you can be confident that the data entered into a database can be stored. Another example of such large capacities is Azure SQL, which, at the time of writing, could store up to 120GB of data through its Standard Series option.
3. Integrity of Data
If you’re keeping data in storage, you’d likely expect it to have high data integrity and not be modified easily. Excel handles this poorly, because of its lack of version control and its lack of strict user controls.
When storing data on Excel, you’re working with many different empty cells that can be filled independently of each other. In most cases, you’d like them to be linked to each other, to give some integrity to the data.
Moreover, Excel doesn’t have much version control over any modifications made to the tables.
In contrast to Excel, databases are very well-structured, with all modifications to data going through SQL queries or through programming. This makes databases a much more robust option to store data!
4. How Data is Shown
I’m sure you’re aware that Excel is great when visualizing data as you create calculations and use formulas. Excel gives us some kind of visual feedback as we work on our data, which can be quite awesome, but it comes with its drawbacks.
Excel visualizes all data in a spreadsheet at once and not only the data you need. This means that large amounts of data cannot be shown at once without slowing down the Excel program.
Databases, on the other hand, present only the data requested by the specific queries you write! This means that through the use of Structured Query Language (SQL) you can pick out and visualize only the data you need, without long processing times when loading the visuals.
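As a quick illustration, here is a sketch using SQLite through Python’s built-in sqlite3 module. The readings table, the station names, and the threshold are made up for the example. The table holds 10,000 rows, but the query hands back only the handful that match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, station TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (station, value) VALUES (?, ?)",
    [(f"S{i % 100}", float(i)) for i in range(10_000)],
)

# Only rows for one station above a threshold come back, not all 10,000.
subset = conn.execute(
    "SELECT id, value FROM readings WHERE station = ? AND value > ? ORDER BY value",
    ("S7", 9000.0),
).fetchall()
```

Here subset contains 10 rows, while the other 9,990 never leave the database.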
5. Flexibility
In terms of flexibility, Excel is perfect for small data quantities. Excel allows you to work on calculations using formulas within cells. This makes data manipulation very agile and flexible.
However, in databases, flexibility is limited only by how complicated your SQL queries are.
With SQL, you can also perform complex queries to transform the data, but in a more controlled and structured way. This is good for large quantities of data since queries are still very flexible even at such a large scale.
What is Considered as a Database?
The term database is often used loosely in the workplace, which may have caused confusion about what a database actually is.
Let’s have a short answer first below:
The term database is commonly used interchangeably with database management system (DBMS). Most databases are organized collections of data or files, stored electronically on a computer and managed by a DBMS.
Here are some common examples of databases that most data analysts use:
- SQLite
- MongoDB
- MySQL
- Microsoft SQL Server
- Azure SQL Server
- Google BigQuery
In my career as a data analyst so far, I’ve come across most of the above but have seen MySQL as the most common out there. However, the closest database to Excel is SQLite, which is the most lightweight.
Why is Excel Not a Database?
By now you know that Excel shouldn’t be considered a database. Let’s take a deeper look at some reasons why:
6 Reasons Why Excel is Not a Database:
- Excel lacks data integrity
- Excel does not have proper indexing structure
- Excel does not use database keys
- Excel cannot use table relationships
- Excel does not use an RDBMS
- Excel is limited in storage
Now let’s understand what each of these reasons means:
1. Excel lacks data integrity
Excel handles data integrity very differently from databases. It does not have any version control over data and all data can be overwritten easily, which is not the case for databases.
2. Excel does not have proper indexing structure
Databases typically have a database index that allows data to be processed faster, making queries very efficient.
For example, when looking for a book in a library, you’d look at the proper codes and author names to find the book you want. This is very similar to the database index and is missing within Excel.
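The library analogy maps onto a database index fairly directly. As a rough sketch (SQLite through Python’s built-in sqlite3 module; the books table and the idx_books_author index are invented for the example), you can ask SQLite how it plans to run a query before and after the index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books (author, title) VALUES (?, ?)",
    [(f"Author {i % 500}", f"Title {i}") for i in range(5_000)],
)


def plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(str(row[-1]) for row in rows)  # last column is the plan text


query = "SELECT title FROM books WHERE author = ?"
before = plan(query, ("Author 42",))  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_books_author ON books (author)")
after = plan(query, ("Author 42",))   # the plan now mentions idx_books_author
```

The exact wording of the plan text varies between SQLite versions, but the index name only shows up in the second plan, which is the database’s way of saying it can jump straight to the matching rows.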
3. Excel does not use database keys
Databases typically use a key system to create relationships between tables and the base Excel software doesn’t support that.
Database keys are the way to identify a record within a table in a database. Some common forms of keys include the primary key, the foreign key, and the composite key.
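To show the three kinds of keys side by side, here is a minimal sketch in SQLite through Python’s built-in sqlite3 module. The students, courses, and enrolments tables are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrolments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),  -- foreign key
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),    -- foreign key
        PRIMARY KEY (student_id, course_id)                           -- composite key
    );
    INSERT INTO students VALUES (1, 'Ada');
    INSERT INTO courses  VALUES (100, 'Databases');
    INSERT INTO enrolments VALUES (1, 100);
    """
)


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


orphan_rejected = rejected("INSERT INTO enrolments VALUES (2, 100)")     # no student 2
duplicate_rejected = rejected("INSERT INTO enrolments VALUES (1, 100)")  # same pair twice
```

The foreign keys stop an enrolment from pointing at a student who does not exist, and the composite key stops the same student from being enrolled in the same course twice.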
4. Excel cannot use table relationships
Excel doesn’t support table relationships the way databases do, unless you use add-ons or Power Query. Because it lacks database keys, base Excel cannot relate data tables through operations like joins and unions.
5. Excel does not use an RDBMS
Excel typically stores data in spreadsheets, which are found in workbooks. These workbooks are edited in Excel, which is spreadsheet software, rather than managed by a relational database management system (RDBMS) as relational databases are.
6. Excel is limited in storage
Excel is rather limited in its data storage capabilities compared to databases. It doesn’t have much storage space because it’s limited to the data that fits within spreadsheet workbooks.
To put Excel data into larger storage, workbooks would have to be combined and loaded into a database such as Microsoft Access.
Why is Excel Wrongly Used as a Database?
Excel is wrongly used as a database because of its similarities in data storage to databases. Excel stores data in tables similar to those in databases and is commonly incorrectly used as a database. However, Excel and databases are vastly different in terms of data integrity, storage location, and storage size.
What are Some Excel Alternatives that are Databases?
Now that we’ve confirmed that Excel should not be used as a database, you must be curious to know the alternatives. Here are some Excel alternatives that I can think of:
Common Excel Alternatives that Are Actual Databases
- Microsoft Access
- SQLite
- MySQL
Related Questions
Can Excel be Used as a Database?
Excel cannot be used as a database. Excel is a spreadsheet program that lacks data integrity, proper structure, table relationships, and database keys that exist in databases. However, Excel can be used as temporary data collection on a small scale.
What are the Disadvantages of Using Excel as a Database?
- Limited Data Types
- No Versioning System
- No Table Relationships
- Lack of Data Security
- Only Useful for Small Datasets
Is Excel a Spreadsheet or a Database?
Excel is a spreadsheet. It allows data transformation and data analysis, and it stores data in Excel workbooks. However, Excel is not a database. Databases require data integrity, proper structure, table relationships, and database keys that Excel does not have.
Is Excel a Flat File Database?
Excel is sometimes described as a flat-file database because it stores data in standalone workbook files with no indexing structure and no relationships between tables. However, Excel is not a true database because it lacks the data integrity, proper structure, table relationships, and database keys that databases have.
Is Microsoft Access a Database?
Microsoft Access is a database. Microsoft Access uses a relational database management system for data storage. It has proper indexing structures, supports table relationships, has database keys, and has high data integrity compared to flat files in Excel. Access also supports database querying using SQL.
Final Thoughts
We all use Excel so often in our daily work that we tend to forget that Excel should never have been seen as a database. Hope this article helps to clear all the confusion! If you’re still confused, the answer is Excel is NOT a database. Thanks for reading!
Austin Chia
I’m a tech nerd, data analyst, and data scientist hungry to learn new skills, tools, and software. I love sharing content with my years of experience in data science, marketing, and tech startups.
Since its initial release in 1985, Microsoft Excel has grown to become a necessity for companies everywhere. It’s the most widely used spreadsheet software among the business community, and has been a robust tool for simple analysis and budgeting.
The problem, however, is that a high percentage of BI and marketing analysts have been incorrectly using it as an ad-hoc database and reporting system rather than for its intended purpose. Whether fully realized or not, it’s starting to become a massive problem for many analysts, as data bloat and manual data blending have bogged down the software. This, in turn, makes it difficult to gain actionable insights and costs a considerable amount of time and opportunity. It’s a condition we at Alight have diagnosed as Excel Hell.
Sadly, this reliance on Excel results largely from a broad familiarity with the software. Everyone knows Excel and just about everyone is comfortable with it. Thus, it’s commonly used. Yet, despite everyone’s familiarity with it, that doesn’t make Excel a database.
For a number of reasons, a true database is a far better option:
A database connects data tables automatically.
Unlike Excel, database systems allow users to enter data only once, because other records can be tied to it rather than retyped. As DB Pros explain:
“For example, a Company record can be entered once, and then you can add multiple Contacts that are tied to that Company. This eliminates the need to enter the address, phone number, website URL, etc. more than once. In Excel, if you want a complete list to report on, you must enter that information for each row that includes that Company.”
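The Company and Contacts arrangement in that quote can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not DB Pros' own setup: the table names, column names, and sample values are all assumptions.

```python
import sqlite3

# Hypothetical schema; every name and value here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE company (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        phone   TEXT,
        website TEXT
    );
    CREATE TABLE contact (
        id         INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES company(id),
        full_name  TEXT NOT NULL
    );
""")

# The company's details are entered exactly once.
conn.execute(
    "INSERT INTO company (id, name, phone, website) "
    "VALUES (1, 'Acme Ltd', '555-0100', 'acme.example')"
)
# Each contact stores only the ID of its company.
conn.executemany(
    "INSERT INTO contact (company_id, full_name) VALUES (?, ?)",
    [(1, "Ann Lee"), (1, "Bob Ray")],
)

# A change made in one place shows up on every related row of the report.
conn.execute("UPDATE company SET phone = '555-0199' WHERE id = 1")

report = conn.execute("""
    SELECT contact.full_name, company.name, company.phone
    FROM contact
    JOIN company ON company.id = contact.company_id
    ORDER BY contact.full_name
""").fetchall()

for row in report:
    print(row)
```

In a spreadsheet, the same phone number change would have to be repeated on every row that mentions the company.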
As Judith Allen notes, data recovery is appreciably easier.
When storing a significant quantity of data in Excel, the program has a habit of slowing down substantially, often to a near crawl (as mentioned above). Databases are designed specifically to handle large datasets, making it relatively easy to recover or find a defined and specific data point in short order.
Databases can be edited by multiple people in real-time.
Having multiple users interacting with and changing a database at the same time is a huge benefit. It increases efficiency and saves a ton of time. This reason alone makes databases a substantially better option. Excel, in theory, does offer this feature through its cloud-based OneDrive system. However, not only is it a heavily watered-down version of Excel desktop, it still hasn’t been widely adopted throughout the business community.
This is by no means an exhaustive list, but it should provide convincing evidence as to why Excel should not be used as a database.
If you’re feeling really ambitious and want to truly maximize your analytics capabilities, it might be worth considering a data warehouse. According to Oracle, in essence, it’s a relational database, which is used for query and analysis as opposed to transactional processing—the customary purpose of traditional database systems.
At Alight, we’ve built our ChannelMix platform on top of a data warehouse. Through an automated process we aggregate and integrate all data sources into one packaged set, cleaning it up and delivering it in a report-ready format.
We developed ChannelMix because, like so many out there, we desperately needed to get out of Excel Hell. So, if you want to see what it’s all about, shoot us a note. We’ll give you the lowdown.
There are hordes of business people who use Excel every day and swear it’s a great database. Excel is a great spreadsheet application, but it’s not really a database at all. Excel is, however, an excellent complement to any database, since it can turn row after row of data into attractive and comprehensive reports and charts.
What You Should Already Know
You should already be a believer in Why Use a Database.
You should already know the following key concepts:
- separation of presentation, behavior and data is a good thing
- what is a database table
- what is Microsoft Excel
What You Will Learn
You will have a conceptual understanding of:
- what are common features in modern databases
- what is a relational database (vs other types of databases)
- why Excel really isn’t a database
What are Common Features in Modern Databases
After our tirade about why Excel really isn’t a database, let’s look again at some features that look suspiciously familiar between Excel and modern databases.
Modern Databases include Tables
A table is a unit of organization for data within a database.
Much like Excel has columns and rows, a table has fields and records. In Excel, you may put a word in the top cell of each column to describe the data that appear below it. For example, in a spreadsheet containing monthly expenses, column B may represent expenses that occurred in the month of February.
In fact, a table, all by itself, looks an awful lot like a spreadsheet.
THE WORLD USED TO BE FLAT. There is a type of database called a flat-file database which resembles one or more tables that don’t have much to do with each other – many modern databases used to be flat-file databases up until very recently — FileMaker Pro being one of them. Explains the name, doesn’t it?
Databases Can Work with Database Files Much Larger Than Available RAM.
Databases are designed to refer to information without having to load all of it into memory. Just load a 100 MB Excel file or text file and you will see a huge performance hit. A well-designed database, on the other hand, doesn’t need to load its entire bulk into memory.
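As a rough sketch of that idea, the snippet below stores rows in an on-disk SQLite file and reads them back in batches through a cursor, so the Python process holds only one batch at a time. The table name, row count, and batch size are arbitrary assumptions chosen for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database file in a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "readings.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO reading (value) VALUES (?)",
    ((float(i),) for i in range(100_000)),
)
conn.commit()

# Walk the result set in batches instead of loading every row at once.
cursor = conn.execute("SELECT value FROM reading")
rows_seen = 0
total = 0.0
while True:
    batch = cursor.fetchmany(1_000)  # at most 1,000 rows in memory per step
    if not batch:
        break
    rows_seen += len(batch)
    total += sum(value for (value,) in batch)

print(rows_seen, total)
conn.close()
```

For a simple sum like this, the database could do the work itself with `SELECT SUM(value)`; the batch loop is shown only to make the row-by-row access pattern visible.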
NOT ALL DATABASES ARE FAST THOUGH. Valentina takes speed farther than almost any other database on the market when it comes to handling how data is managed in memory. But almost any modern database is going to be faster than pokey Excel in handling thousands of records.
TRAPPED IN A CELL. Excel uses a workbook format, each one of which can contain up to 256 worksheets. Each sheet can have roughly 65,000 rows (65,536 in the legacy .xls format) and 256 columns. Calculate that out and you have a measly 4.3 billion cells or so. That sounds like a lot, doesn’t it? Just try opening a workbook with a sheet that contains only 10,000 rows of information and try scrolling it. Prepare to take coffee breaks between each scroll!
Modern Databases Are Naturally Multi-User
Databases manage information so that it’s easy to have a lot of users accessing it at the same time. Excel documents are meant to be used on a single computer at a time.
HEY, THERE IS A SINGLE-USER VERSION OF VALENTINA, YOU LIAR! Okay, you caught us! But we have a defense — there is a single connection version of Valentina. The innate engine of Valentina can handle an almost unlimited number of connections.
I’VE SEEN DATABASES THAT SAY THEY ARE SINGLE USER. Yes, there are database systems that are single user. They were not built with a network in mind. Our biased point of view is that a single-user-only database isn’t keeping up with the times.
What is a Relational Database (vs other types of databases)
A relational database works with the tables inside it in a more intelligent way and in a way that would make the boys at ANSI be really proud. A relational database has more than one table, and a column in one table directly relates to a column in another – however not all information needs to be stored in the referring column.
A Relational DB Example for a Bookstore
Consider an order system at a bookstore. This order system may include a database that has three tables:
A CUSTOMER TABLE. Each customer has a unique ID that ensures that the five people named John Smith in the table do not get mixed up.
AN ORDER TABLE. Each order has a unique ID as well, so it is possible to differentiate between the order for Harry Potter vs Darth Vader yesterday at 11:00 AM and the one that came in at 2:00 PM.
A BOOK TABLE. Each book edition has a unique ID, so it is possible to tell apart the First, Second, and Fifth Editions of Harry Potter vs Darth Vader.
These three tables can be joined in a way that relates them with each other without really physically joining them together.
For example, each actual order is associated with a unique buyer (from the customer table) and a unique book edition (from the book table). These IDs can be extremely small — maybe an integer. Then, instead of having the string “Harry Potter vs Darth Vader” duplicated thousands of times in your Order Table — making that table bloat terribly — it could simply refer to the book ID 1295.
The common element between each of the tables is an ID – that relates records without any nasty and unnatural physical joining that the lads at ANSI would find terribly distasteful.
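The bookstore example can be sketched with Python's built-in sqlite3 module. This is one possible layout under stated assumptions: the column names, the sample customers, and the order times are invented for illustration, and the order table is called book_order because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE book (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        edition INTEGER NOT NULL
    );
    CREATE TABLE book_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        book_id     INTEGER NOT NULL REFERENCES book(id),
        placed_at   TEXT NOT NULL
    );
""")

# Two different customers who happen to share a name.
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "John Smith"), (2, "John Smith")],
)
# The long title is stored once, under book ID 1295.
conn.execute(
    "INSERT INTO book (id, title, edition) "
    "VALUES (1295, 'Harry Potter vs Darth Vader', 1)"
)
# Each order carries only small integer IDs, not the names or titles.
conn.executemany(
    "INSERT INTO book_order (customer_id, book_id, placed_at) VALUES (?, ?, ?)",
    [(1, 1295, "11:00"), (2, 1295, "14:00")],
)

# The join relates the three tables at query time through their IDs.
orders = conn.execute("""
    SELECT book_order.id, customer.id, customer.name, book.title, book_order.placed_at
    FROM book_order
    JOIN customer ON customer.id = book_order.customer_id
    JOIN book     ON book.id = book_order.book_id
    ORDER BY book_order.id
""").fetchall()

for row in orders:
    print(row)
```

The two John Smiths stay distinct because the orders point at customer IDs 1 and 2, not at the name.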
WHAT’S OLD IS NEW. Prior to the pioneering work of Dr E.F. Codd in relational databases (see Dr Codd’s 12 rules), there was a type of database called a Network Database. Network Databases stored units of information that naturally pointed to the next unit of information, kind of like how web pages can be linked together on the internet with hyperlinks. Navigating a network database was referred to as “walking the set”.
In this series we are exploring Microsoft Excel from a data governance perspective. Many organizations use Excel heavily as the primary place where data is entered. This is understandable for younger organizations that are still trying to figure out best practices. However, Excel has many limitations that other database platforms do not, and maturing organizations should re-examine their use of Excel as a means of data entry and storage.
Why is Excel not a great database?
1. Excel allows too much data redundancy
One of the key principles of good database design is that data should only be maintained in one place, though it can be referenced from many places. In relational databases this is enforced using tables and keys. A table will be the key store of one piece of information, and a unique identifier/key will allow other tables to reference that information without copying all the data. Excel, with its multiple worksheet structure, does little to enforce the use of keys or the storing of data in one place only.
What often happens in an Excel spreadsheet is that the same data is copied from one worksheet to another, or even from one spreadsheet to another. As the data is moved into different locations it is inevitably changed and then it becomes difficult to know what the «authoritative» source of data is.
The process of separating data into tables and keys is called database normalization and it is important for a number of reasons. Moving your Excel spreadsheet into a database allows you to break apart the data into logical chunks that can be maintained in one place so that you don’t have to remember all the different worksheets where the data might appear.
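To make the idea concrete, here is a small sketch in plain Python that splits flat, spreadsheet-style rows into a customer table and an order table linked by a key. The sample rows and field names are invented, and the sketch assumes the customer name identifies a customer uniquely in the flat data, which real data often does not guarantee.

```python
# Flat, spreadsheet-style rows: the customer's details repeat on every order.
flat_rows = [
    {"order_id": 1, "customer": "Acme Ltd", "city": "Leeds", "item": "Widget"},
    {"order_id": 2, "customer": "Acme Ltd", "city": "Leeds", "item": "Gadget"},
    {"order_id": 3, "customer": "Birch Co", "city": "York", "item": "Widget"},
]

customers = {}  # customer name -> customer record with a generated ID
orders = []     # order records that reference customers by ID only

for row in flat_rows:
    name = row["customer"]
    if name not in customers:
        # First time we see this customer: store the details once.
        customers[name] = {
            "id": len(customers) + 1,
            "name": name,
            "city": row["city"],
        }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": customers[name]["id"],
        "item": row["item"],
    })

customer_table = list(customers.values())

print(customer_table)
print(orders)
```

After the split, a change of city for Acme Ltd is a single edit in the customer table rather than one edit per order row.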
2. Excel does not do robust data type enforcement by default
Another reason why Excel is not a great database is that it does not enforce data types by default. In a relational database if you set a field constraint the underlying system enforces that constraint rigidly to the extent that it does not permit data to be saved that does not fit the data type of the column. In Excel you can set a data type into a column, but by default all that means is that Excel tries to massage whatever you type into that column format. In a spreadsheet with thousands of rows it’s hard to find the data points that Excel might have interpreted wrongly.
There is a way to specify and enforce data type constraints on Excel fields but as this behavior is not the default behavior few people know or take the time to set these constraints.
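For comparison, this is roughly what rigid enforcement looks like on the database side. The sketch uses Python's built-in sqlite3 module with CHECK constraints; the expense table, its columns, and the sample values are assumptions for illustration. SQLite is more permissive about types than most database servers, which is why the type rule is spelled out explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE expense (
        id       INTEGER PRIMARY KEY,
        -- must be a valid ISO date such as 2024-02-14
        spent_on TEXT NOT NULL CHECK (spent_on IS date(spent_on)),
        -- must be stored as a non-negative number, never as free text
        amount   REAL NOT NULL CHECK (typeof(amount) = 'real' AND amount >= 0)
    )
""")

candidates = [
    ("2024-02-14", 12.5),      # valid
    ("2024-02-15", "twelve"),  # amount is text, not a number
    ("14/02/2024", 8.0),       # date is not in ISO format
]

rejected = []
for spent_on, amount in candidates:
    try:
        conn.execute(
            "INSERT INTO expense (spent_on, amount) VALUES (?, ?)",
            (spent_on, amount),
        )
    except sqlite3.IntegrityError:
        # The database refuses to save the row at all.
        rejected.append((spent_on, amount))

saved = conn.execute("SELECT spent_on, amount FROM expense").fetchall()

print("saved:", saved)
print("rejected:", rejected)
```

The bad rows never reach the table, so there is nothing to hunt for later among thousands of rows.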
Here is a list of some common data types and the benefits of enforcing data integrity.
3. Excel makes true data collaboration tricky
Relational databases have a robust transactional system which allows people to operate only on parts of the database that are not currently being changed and to group operations together into units called transactions. These changes, once committed, can then be propagated to everyone else working on the database in real time. This ensures that two people don’t make contradictory changes to the database.
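A transaction can be sketched in a few lines with Python's built-in sqlite3 module. The stock table and the move_stock helper are hypothetical names made up for this example; the point is only that the two updates succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stock (
        location TEXT PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity >= 0)
    )
""")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("warehouse", 10), ("shop", 0)])
conn.commit()


def move_stock(conn, amount):
    """Move units from warehouse to shop as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE stock SET quantity = quantity + ? WHERE location = 'shop'",
                (amount,),
            )
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE location = 'warehouse'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = move_stock(conn, 4)    # succeeds: both updates are committed
second = move_stock(conn, 50)  # fails: warehouse would go negative, so the
                               # shop update is rolled back as well

levels = dict(conn.execute("SELECT location, quantity FROM stock"))
print(first, second, levels)
```

The failed move leaves no half-finished change behind, which is the guarantee a merged-by-hand master spreadsheet cannot give.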
Collaboration using Microsoft Excel often means copying the spreadsheet to multiple people who each make their own changes (or even submit their own spreadsheets), and these spreadsheets are merged together into a master spreadsheet by another person. Because there is no one «authoritative» version of the spreadsheet it is easy to end up with confusion as people make changes in various places.
4. Excel is hard to secure
This might be the biggest reason of all why widespread use of Excel can be a data governance nightmare. If an Excel spreadsheet is not password protected (and most of them are not), all it takes to expose the data is for someone to obtain access to the file in question. This happens in several ways — someone can accidentally send the file to the wrong party, someone can physically or through the network get access to the computer where the spreadsheet is stored. Excel spreadsheets often find their way onto USB keys as well and these are misplaced with alarming frequency.
Relational databases such as SQL Server, Postgres, Oracle, and MySQL are somewhat easier to secure. In these cases there is only one place where the data is stored — presumably on a network or cloud server. This provides two layers of protection around the data — network security and database security.
Backups of relational databases are stored in a format where one must restore the database to a compatible version of the server before one can access the data. This means that it requires a bit of extra sophistication to get access to the data — even if someone gets access to a backup of the data they have to find a way to restore it on a comparable server before they can look at the data. Encrypted backups mitigate this issue even further. This is in contrast to the ease of getting access to data in an Excel spreadsheet — one must simply have a copy of Microsoft Excel on their machine to open a spreadsheet, and almost everyone has Excel.
Looking for an organization that can help you move your critical Excel spreadsheets to a more stable and secure platform? Learn more about our data analytics services or contact us and let’s start the conversation.
Microsoft Excel is a versatile spreadsheet software program with broad applications that countless organizations utilize every day. But is Excel a database management system that businesses should be using for product development or tracking? Simply put: no.
Here are six reasons why using Excel as a product database could end up resulting in wasted time and money…and aggravation. Learn more about how robust product information management (PIM) software is a better option.
1. A database connects data tables automatically; Excel does not
Excel updates cells when you enter new information, but it doesn’t automatically carry that data across all related fields. Additionally, transferring information from one spreadsheet to another data table is a time-consuming, manual task that’s vulnerable to human error.
Product information management, or PIM, software eliminates these issues by instantly and automatically updating information across all related product entries and data tables.
2. No version control
Using Excel as a database puts you at risk of working with inaccurate information, and wasting time. Because updates are only available after users have actively saved changes, and files can be saved to any location, there can be multiple versions with conflicting or outdated data to manage.
With real-time updates across the platform, PIM software eliminates the possibility of inadvertently ending up with numerous versions containing different information floating around your organization.
3. Handling large datasets can be a difficult task
Excel is helpful for organizing datasets up to a certain size, but a common, known issue with this software is that responsiveness typically slows to a crawl once you’ve reached the program’s upper data limit. Program bugs and software crashes can also become common headaches as file sizes grow.
Thanks to its extensive storage capacity, PIM software easily processes large, complex datasets with no loss of system performance.
4. Cannot be edited by multiple people at once
Since updating information in an Excel file requires that all changes are saved at an individual workstation, simultaneous edits can’t be made by multiple users. This limitation can cause users to mistakenly work with outdated data and hinder real-time collaboration.
PIM’s centralized database allows multiple users at any number of workstations to access and edit information at any time.
5. Filtering and finding data is limited
Finding the data you need in Excel is limited to searching using Control+F and column filters. This method of searching can be manageable, especially for those who are intimately familiar with the spreadsheet. But with a variety of teams needing easy access to product data, isn’t providing a modern search experience worth it?
With a robust search capability that offers multiple pathways to find information, PIM software keeps your data organized and readily accessible with just a few clicks. In fact, you can replicate a search experience similar to what your customers would find on Amazon or Home Depot.
6. Excel can’t manage visual content
Businesses that try to use Excel as a database often struggle with the disconnect between text-based information and visual media. Excel isn’t designed to support high-resolution images and videos, and the only point of connection you have between product information and related media is a URL. This leads to obvious issues when dealing with product information.
Accessing your media from Excel takes at least a few extra steps, and off-site colleagues and collaborators can’t access visual media at all if you use a local hard drive for storage.
PIM systems store text-based and visual media in one interface. Typically, the visual media will come via integration with a digital asset management (DAM) system, which is the central source for an organization’s entire library of photos, videos, illustrations, documents and other marketing content.
What’s the Solution?
Given its limitations, do you want to trust Excel with your critical information or product data? PIM software is a comprehensive database solution developed specifically with the needs of product management and e-commerce in mind.
Experience the benefits of freeing yourself from Excel firsthand. Widen builds DAM and PIM software to help businesses get content to market faster.
A very common way for businesses to store data is within Excel. People are convinced of Excel’s superiority as a database. I mean, after all, Excel is familiar, right? It’s easy and it’s user-friendly and you’ve been using it for years. There is just one problem with this – Excel is not a database.
Excel was made as a spreadsheet application. Sure, it is a great complement to any database due to its ability to turn your rows of data into nice looking charts and reports. But when it comes down to it, storing your data is not a job Excel was made to do.
On a recent project, a client delivered their thousands of lines of customer data to us across multiple Excel files. If we were to combine all of this data into one workbook, Excel’s performance would drop dramatically. Conversely, keeping all the data sets separate would make the files much smaller and easier to work with performance-wise, but analysing the data across multiple Excel files would prove far too difficult.
By merging all the datasets together into SQL Server, and through robust analysis methods and our Data Profiling and Deduplication tools, we were able to eliminate thousands of lines of bad data and duplicates that would be near enough impossible to pick up using Excel analysis tools. We shrank our client’s database to less than half of its original size by removing records that were not fit for purpose and fixing those that remained, thereby leaving our client with good data that they could actually use.
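The snippet below is not the tooling described above, which the article does not detail. It is a generic sketch of how duplicates can be found and removed once several files have been merged into one table, using Python's built-in sqlite3 module in place of SQL Server. The table name, columns, sample rows, and the choice of a normalized email address as the matching rule are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_import (source_file TEXT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customer_import VALUES (?, ?, ?)",
    [
        ("a.xlsx", "Ann Lee", "ann@example.com"),
        ("b.xlsx", "ann lee ", "ANN@example.com"),  # same person, messier entry
        ("b.xlsx", "Bob Ray", "bob@example.com"),
    ],
)

# Profile the data: which normalized emails appear more than once?
duplicates = conn.execute("""
    SELECT lower(trim(email)) AS email_key, COUNT(*) AS copies
    FROM customer_import
    GROUP BY email_key
    HAVING COUNT(*) > 1
""").fetchall()

# Keep the first imported row per normalized email and delete the rest.
conn.execute("""
    DELETE FROM customer_import
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM customer_import
        GROUP BY lower(trim(email))
    )
""")

remaining = conn.execute(
    "SELECT name FROM customer_import ORDER BY rowid"
).fetchall()

print(duplicates)
print(remaining)
```

Real deduplication usually needs fuzzier matching than an exact email comparison, but even this simple rule works across all source files at once, which is hard to do when the data sits in separate workbooks.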
A big incentive to not use Excel for storing your data is if you think about how easily you could corrupt or even lose that data. There will probably be several copies of one database sitting with various different people across your organisation, thus making version control difficult. Even storing the file on a shared server will not eliminate the chances of people overwriting each other’s work. If multiple users will need read and write access to your data then it’s time to look for other, more viable options than Excel.
Of course you can use Excel as a simple, flat database. But in our experience most businesses require a far more complex solution to store their data, as datasets get larger and larger, require better collaboration and security methods and need to be protected from bad data entry. All this is just not supported by Excel’s core functionality.
If you are still storing your data within Excel and would like to talk about finding a better fit for your needs, contact Acuate and we will find you your ideal solution.