What is SQL used for in Excel?

This is the first of several articles on SQL for Excel users.  The aim is to show you why it’s so useful for Excel users to be familiar with SQL and to introduce some basic concepts and queries.

Table of contents

  • 1. Why is it important to learn SQL?
  • 2. What is SQL?
  • 3. Getting SQL Server
  • 4. Performing a simple Select query
  • 5. How to filter and sort
  • 6. Querying Multiple Tables – the JOIN statement
  • 7. Aggregating Data – the GROUP BY statement
  • 8. Summary

Why is it important to learn SQL?

In a previous article, SQL and Excel: Why you need both, we showed that for many businesses the use of both SQL Server and Excel is an optimum approach to data management.  In today’s world, there is more data than ever, and everybody in the workplace needs to be comfortable using data in some way.  Excel is the data tool of choice for the vast majority of people, but as we highlighted in the previous article, it has some shortcomings.  This is where SQL Server steps in, and why you need to use both.

Organizations that use SQL have dedicated people or teams to manage their SQL estates.  These Database Administrators (DBAs) and other professionals are experts in their field, so why should you need to learn how to use SQL?  Furthermore, a tool like the SQL Spreads Excel Add-In makes it very easy for Excel users to connect to a SQL table and update it from within Excel.  The answer, of course, is that you don’t have to learn SQL in order to use data from a SQL database, but there are some real advantages to learning some SQL as an Excel user:

  • it makes it easier to talk ‘data’ to your DBA team;
  • it will help you understand the importance of data structure, and may improve the way you use Excel;
  • you can start using SQL query tools (like SQL Server Management Studio) to view and query data yourself, instead of asking a DBA to do it for you;
  • learning a new skill can be a fun and rewarding exercise.

Finally, it’s worth pointing out that as an Excel user, you are already familiar with many of the key concepts in SQL.  It’s therefore just a case of learning some new terminology.  For example, when you use the following in Excel, there is a corresponding feature or query in SQL:

  • Tables / Sheets: tables
  • Filtering: a query with a WHERE clause
  • Sorting: a query with an ORDER BY clause
  • Removing duplicates: a query with the DISTINCT keyword
  • IF statements: a query with a CASE expression
  • Pivot tables: a query with a GROUP BY clause (plus SUM or COUNT), or the PIVOT operator
  • VLOOKUP (or INDEX/MATCH): a query with a JOIN

There are more, but this is a great starting point, and we’ll go through most of these in this article.
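Two items from the list above that this article does not come back to, removing duplicates and IF statements, can be sketched with Python’s built-in sqlite3 module. This is a minimal sketch under assumptions: the in-memory Product table and its rows are hypothetical, and SQLite stands in for SQL Server, although DISTINCT and CASE are written the same way in T-SQL.

```python
import sqlite3

# Hypothetical sample data, used only to illustrate the Excel-to-SQL mapping.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Color TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [("Bike A", "Red", 1500.0), ("Bike B", "Red", 800.0), ("Helmet", "Black", 35.0)],
)

# Excel's "Remove Duplicates" on the Color column ~ SELECT DISTINCT
colors = [row[0] for row in conn.execute(
    "SELECT DISTINCT Color FROM Product ORDER BY Color")]

# Excel's =IF(ListPrice > 1000, "Premium", "Standard") ~ a CASE expression
tiers = conn.execute(
    """
    SELECT Name,
           CASE WHEN ListPrice > 1000 THEN 'Premium' ELSE 'Standard' END AS Tier
    FROM Product
    ORDER BY Name
    """
).fetchall()

print(colors)
print(tiers)
```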

What is SQL?

SQL (Structured Query Language) is used to access and manage data in a relational database.  Relational database management systems (RDBMS) are the pre-eminent database technology, and almost all of them use SQL.  Commonly used RDBMSs include Microsoft SQL Server, MySQL, and Oracle.  We’re going to focus our articles on Microsoft SQL Server.

Data is stored in tables in a relational database. Those tables usually look like one sheet in Excel, with rows and columns.  We can use a set of instructions (written in SQL) to retrieve data and perform analysis.  Strictly speaking, because we’ll be using Microsoft SQL Server, we’ll be using the Transact SQL (T-SQL) flavor of SQL.  T-SQL is a proprietary Microsoft extension of the SQL language which provides all the functionality of SQL but with some added extras.

Generally, when most people say, “use SQL”, they are referring to the underlying RDBMS and the language as one.

The diagram below shows how queries are written in SQL Server Management Studio (SSMS) and executed against data stored in tables in a database on a database server.  A few other things to note:

  • multiple users can run queries against a database from their own local SSMS installations;
  • a database server can have multiple databases installed on it;
  • a database can have multiple tables in it.

Diagram: SSMS and SQL Server

Getting SQL Server

To start using SQL Server you can either download and install it (on your own machine or a server), or use the cloud-based Azure SQL suite.  We’ll look at the download and install option first.

Installing SQL Server

At the time of writing, the latest version of SQL Server is SQL Server 2019 (15.x).  There are several editions available: Enterprise, Standard, Developer, and Express.  Express is free and is a really good way for you to learn SQL (Developer is also free, but is licensed for development and testing only).  You can get the download for SQL Server Express here.

The installation process is very easy to follow, but if you need more information, there is a good article here.

It’s important to note that SQL Server Management Studio (SSMS), the tool you’ll use to write and run queries, is a separate download from SQL Server itself (the installation wizard provides a link to it).  Once both are installed, you can open SSMS and start querying data!  Before you can do that, however, you’ll need some data to query.  A good way to start is with the AdventureWorks sample database from Microsoft.  You can get the file and installation instructions here.

When you open SSMS, you’ll see the Object Explorer on the left.  Here you can see the name of your database server and below it, the list of databases.

Screenshot: SSMS screen layout

Expand the AdventureWorks database, and then its Tables folder, to see all the tables in the database.

Screenshot: Object Explorer

Azure SQL

First off, you’ll need an Azure subscription.  If you don’t already have one, you can sign up for a free Azure account here.

Once your account is set up you can go to the Azure portal home page.  On the home page, click on Azure SQL under the available services (if you don’t see it, you can search for it in the top search box).

Azure Portal Home

On the next screen, click on the ‘Create Azure SQL resource’ button.

Azure SQL home

On the next screen, click on the SQL Database option.

Azure SQL Deployment options

Full instructions for setting up your database in Azure SQL are here.

OK, now it’s time to start querying data!

Note: the examples in the following sections refer to the AdventureWorks database; if you have installed the AdventureWorksLT sample database as part of your Azure SQL configuration, some of the table names will differ slightly.

Performing a simple Select query

We’re going to run a simple query to select some data from one of the tables in the AdventureWorks database.  Click on AdventureWorks, then right-click and select ‘New Query’ (you can also click ‘New Query’ in the menu bar).  In the new query window that opens up, type the following:

SELECT Name, ProductNumber, Color
FROM Production.Product

Then press the ‘Execute’ button (or press F5) to see the results displayed in the lower window.

Screenshot: new query and results

Tip: you can copy the results directly from the results pane and paste them into Excel!  Just click the grey header cell in the top-left corner of the results table to select everything, right-click and select ‘Copy with Headers’.  You can then paste it into Excel.

Screenshot: copying results

Let’s recap what we did, and look closely at the query we ran:

Screenshot: the simple SELECT statement

This SELECT statement simply gets the specified columns from the specified table and returns the results. Now, you may ask yourself, how do I know which columns to specify if I don’t know what columns there are in the table?  You can, of course, expand the table in the Object Explorer to see all the columns in the table (including their data types).

Screenshot: table columns in Object Explorer

Whilst it’s useful to see the column names, you often need to see the actual data to check whether you need to return it in a query.  There is a quick shortcut to do this: select the Production.Product table, right-click and click ‘Select Top 1000 Rows’.

Screenshot: Select Top 1000 Rows

The query is displayed in a new query editor and then executed to show the results in the bottom pane.  You can see that a TOP clause has been added to the query.  Also note that the auto-generated script refers to the table with the fully qualified name format of [database].[schema].[table].  This is best practice because it avoids problems if you have the same table names across databases.

Screenshot: Select Top 1000 Rows results

Tip: you can also get a quick overview of data in a table by typing the following:

SELECT TOP (10) *
FROM AdventureWorks.Production.Product

The * in the query basically says “show me every column” in the table.  A DBA would frown at you if you ran SELECT * without a limit, because on a large table it could return a huge amount of data and consume unnecessary resources; that’s why we limit the query with the TOP clause.

How to filter and sort

Let’s say that we only want to look at items in the Product table that are finished goods, and not components of products.  To do this, we need to add a WHERE clause, which specifies the condition(s) that must be met for a row to be returned.  In the example below, we’ve added the condition ‘FinishedGoodsFlag = 1’ after WHERE.

SELECT Name, ProductNumber, Color, ListPrice 
FROM AdventureWorks.Production.Product 
WHERE FinishedGoodsFlag = 1

If we also want to see only finished goods with a list price greater than 1000, we simply extend the conditions after WHERE using AND:

SELECT Name, ProductNumber, Color, ListPrice 
FROM AdventureWorks.Production.Product 
WHERE FinishedGoodsFlag = 1 AND ListPrice > 1000

There are lots of other conditions that you can add, such as requiring values to fall within a range of dates (BETWEEN) or to appear in a list (IN), and you can add complexity by combining multiple nested conditions with AND, OR and parentheses.
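As a minimal sketch of BETWEEN, IN and nested conditions, the snippet below uses Python’s built-in sqlite3 module and a small hypothetical Product table (the names and values are made up, and the dates are stored as ISO-8601 text, which SQLite compares correctly as strings). The WHERE clauses themselves are written the same way in T-SQL.

```python
import sqlite3

# Hypothetical rows; column names loosely follow the AdventureWorks Product table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (Name TEXT, Color TEXT, ListPrice REAL, "
    "FinishedGoodsFlag INTEGER, SellStartDate TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?, ?)",
    [
        ("Road Bike", "Red", 1500.0, 1, "2013-05-30"),
        ("Mountain Bike", "Black", 2100.0, 1, "2012-05-30"),
        ("Touring Bike", "Blue", 1200.0, 1, "2013-05-30"),
        ("Chainring", "Black", 20.0, 0, "2008-04-30"),
    ],
)

# BETWEEN: a range of dates (inclusive at both ends)
in_2013 = conn.execute(
    "SELECT Name FROM Product "
    "WHERE SellStartDate BETWEEN '2013-01-01' AND '2013-12-31' ORDER BY Name"
).fetchall()

# IN, plus a nested condition grouped with parentheses
picked = conn.execute(
    "SELECT Name FROM Product "
    "WHERE FinishedGoodsFlag = 1 AND (Color IN ('Red', 'Blue') OR ListPrice > 2000) "
    "ORDER BY Name"
).fetchall()

print(in_2013)
print(picked)
```

Without the parentheses, AND would bind more tightly than OR, so the component row would be judged by a different condition; grouping makes the intent explicit.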

Finally, we want to order the results from most expensive to least expensive.  To do this, we add an ORDER BY clause at the end, using DESC for descending order:

SELECT Name, ProductNumber, Color, ListPrice 
FROM AdventureWorks.Production.Product 
WHERE FinishedGoodsFlag = 1 AND ListPrice > 1000 
ORDER BY ListPrice DESC

Screenshot: SELECT with filtering and sorting

Querying Multiple Tables – the JOIN statement

Once you start playing around in SQL and writing some simple select statements you quickly realize that the data you need to see in the results is usually in more than one table.  The classic example is when a table contains an ID number instead of an actual name in a column – to get the name, I need to look up the ID in another table.  In Excel, we would use either the VLOOKUP function or the INDEX / MATCH functions.  So, how do we do it in SQL?          

Let’s go back to our Product table in the AdventureWorks database.  In the table, there is a column called ‘ProductSubcategoryID’.

Screenshot: the ProductSubcategoryID column

I want to look up the actual subcategory name using the ID.  To do this, I need to JOIN the Product table and the ProductSubcategory table.  Let’s look at the query and the results and then work out how it works:

SELECT P.Name AS 'ProductName', SC.Name AS 'Subcategory' 
FROM AdventureWorks.Production.Product AS P 
INNER JOIN AdventureWorks.Production.ProductSubcategory AS SC ON P.ProductSubcategoryID = SC.ProductSubcategoryID 
WHERE FinishedGoodsFlag = 1 
ORDER BY P.Name

Screenshot: JOIN query results with subcategory

The INNER JOIN works using the fact that there is a common column between the two tables we want to join; in our example, it is the ProductSubcategoryID column.  The query therefore basically says “return the columns specified (ProductName and Subcategory) from the two related tables where values in the ID columns are equal”.  Rows that have no match in the other table, such as products with no subcategory, are left out of the results.  Here is more of a breakdown:

  1. The INNER JOIN statement is added after the first related table is specified – the Product table. Note that we have added a ‘table alias’, P, for the Product table; this just makes it easier to read the query when we have long table names.  Next, we need to specify the table on the other side of the join – the ProductSubcategory table.  Again, we give it an alias, SC.
  2. Now we need to specify the columns we want to match values on – the ProductSubcategoryID column in each table. Note that we need to qualify the column names with the respective table alias names.
  3. The two columns that we want returned by the query are both called Name, so we need to give each a new name with a column alias (e.g. ‘ProductName’ and ‘Subcategory’).

Diagram: how the JOIN statement works
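To see why an INNER JOIN can return fewer rows than the Product table has, here is a small sketch using Python’s sqlite3 module with hypothetical rows loosely modeled on AdventureWorks. A product with no subcategory (a NULL ProductSubcategoryID) is dropped by the INNER JOIN, whereas a LEFT JOIN keeps it with a NULL subcategory, which is closer to how a VLOOKUP fills every row and shows #N/A where there is no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductSubcategory (ProductSubcategoryID INTEGER, Name TEXT);
    CREATE TABLE Product (Name TEXT, ProductSubcategoryID INTEGER);
    INSERT INTO ProductSubcategory VALUES (1, 'Mountain Bikes'), (2, 'Road Bikes');
    INSERT INTO Product VALUES
        ('Mountain-100', 1), ('Road-150', 2), ('Adjustable Race', NULL);
    """
)

# INNER JOIN: only products whose ID finds a match in ProductSubcategory
inner = conn.execute(
    """
    SELECT P.Name AS ProductName, SC.Name AS Subcategory
    FROM Product AS P
    INNER JOIN ProductSubcategory AS SC
        ON P.ProductSubcategoryID = SC.ProductSubcategoryID
    ORDER BY P.Name
    """
).fetchall()

# LEFT JOIN: every product; an unmatched lookup comes back as NULL (None)
left = conn.execute(
    """
    SELECT P.Name AS ProductName, SC.Name AS Subcategory
    FROM Product AS P
    LEFT JOIN ProductSubcategory AS SC
        ON P.ProductSubcategoryID = SC.ProductSubcategoryID
    ORDER BY P.Name
    """
).fetchall()

print(inner)
print(left)
```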

As you can see, querying data from multiple tables is quite straightforward. Let’s extend our example a little more by also adding the Product Category to our query.  To do this we’ll need to join to the ProductCategory table.  Here is the query:

SELECT P.Name AS 'ProductName', SC.Name AS 'Subcategory', C.Name AS 'Category' 
FROM AdventureWorks.Production.Product AS P 
INNER JOIN AdventureWorks.Production.ProductSubcategory AS SC ON P.ProductSubcategoryID = SC.ProductSubcategoryID 
INNER JOIN AdventureWorks.Production.ProductCategory AS C ON SC.ProductCategoryID = C.ProductCategoryID 
WHERE FinishedGoodsFlag = 1 
ORDER BY P.Name 

And here are the results:      

Screenshot: products with subcategory and category

Aggregating Data – the GROUP BY statement

In the example we just looked at we ended up with a list of products that shows the sub-category and category they belong to.  How can we get a count of how many products are in each category and sub-category?  In Excel, it would be quick and easy to create a pivot table to do this.  In SQL, we can use the GROUP BY statement:  

SELECT C.Name AS 'Category', SC.Name AS 'Subcategory', COUNT(*) AS 'CountofProducts' 
FROM AdventureWorks.Production.Product AS P 
INNER JOIN AdventureWorks.Production.ProductSubcategory AS SC ON P.ProductSubcategoryID = SC.ProductSubcategoryID 
INNER JOIN AdventureWorks.Production.ProductCategory AS C ON SC.ProductCategoryID = C.ProductCategoryID
WHERE FinishedGoodsFlag = 1 
GROUP BY C.Name, SC.Name 
ORDER BY C.Name  

As you can see, the GROUP BY clause is added after the WHERE clause (and before ORDER BY).  We just need to list the columns that we want to group by.  The aggregation itself is included in the list of columns we want to return in the SELECT statement; in our example, we want to COUNT how many products are in each category and sub-category.

Screenshot: GROUP BY results
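The same grouping can be tried without SQL Server. This is a sketch assuming Python’s sqlite3 module and a handful of hypothetical rows in tables shaped like the three AdventureWorks tables used above; each output row corresponds to one row of a pivot table with Category and Subcategory as row labels and a count of products as the value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductCategory (ProductCategoryID INTEGER, Name TEXT);
    CREATE TABLE ProductSubcategory
        (ProductSubcategoryID INTEGER, ProductCategoryID INTEGER, Name TEXT);
    CREATE TABLE Product
        (Name TEXT, ProductSubcategoryID INTEGER, FinishedGoodsFlag INTEGER);
    INSERT INTO ProductCategory VALUES (1, 'Bikes'), (3, 'Clothing');
    INSERT INTO ProductSubcategory VALUES
        (1, 1, 'Mountain Bikes'), (2, 1, 'Road Bikes'), (21, 3, 'Jerseys');
    INSERT INTO Product VALUES
        ('Mountain-100', 1, 1), ('Mountain-200', 1, 1),
        ('Road-150', 2, 1), ('Jersey L', 21, 1), ('Jersey M', 21, 1);
    """
)

# One output row per (Category, Subcategory) pair, with a count of products
rows = conn.execute(
    """
    SELECT C.Name AS Category, SC.Name AS Subcategory, COUNT(*) AS CountofProducts
    FROM Product AS P
    INNER JOIN ProductSubcategory AS SC
        ON P.ProductSubcategoryID = SC.ProductSubcategoryID
    INNER JOIN ProductCategory AS C
        ON SC.ProductCategoryID = C.ProductCategoryID
    WHERE P.FinishedGoodsFlag = 1
    GROUP BY C.Name, SC.Name
    ORDER BY C.Name, SC.Name
    """
).fetchall()

print(rows)
```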

But where are the Category sub-totals?  To get these, we need to use the ROLLUP extension to the GROUP BY statement:  

SELECT  C.Name AS 'Category', SC.Name AS 'Subcategory', COUNT(P.Name) AS ProductCount 
FROM AdventureWorks.Production.Product AS P 
INNER JOIN AdventureWorks.Production.ProductSubcategory AS SC ON P.ProductSubcategoryID = SC.ProductSubcategoryID 
INNER JOIN AdventureWorks.Production.ProductCategory AS C ON SC.ProductCategoryID = C.ProductCategoryID 
WHERE FinishedGoodsFlag = 1 
GROUP BY ROLLUP (C.Name, SC.Name)    

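The Category sub-totals are added to the results as new rows with NULL in the Subcategory column, along with a grand-total row in which both Category and Subcategory are NULL.

Screenshot: GROUP BY with ROLLUP results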
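SQLite does not support ROLLUP, so if you want to experiment outside SQL Server, the subtotal and grand-total rows can be emulated with UNION ALL. This sketch uses Python’s sqlite3 module and a hypothetical flattened ProductList table (one row per product, with its category and subcategory already looked up); it is a stand-in technique, not how SQL Server evaluates ROLLUP. NULL comes back as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductList (Category TEXT, Subcategory TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO ProductList VALUES (?, ?, ?)",
    [
        ("Bikes", "Mountain Bikes", "Mountain-100"),
        ("Bikes", "Mountain Bikes", "Mountain-200"),
        ("Bikes", "Road Bikes", "Road-150"),
        ("Clothing", "Jerseys", "Jersey L"),
    ],
)

# Emulates GROUP BY ROLLUP (Category, Subcategory): detail rows, one subtotal
# per Category (Subcategory = NULL), and a grand total (both NULL).
# The outer ORDER BY puts each subtotal after its detail rows and the grand
# total last.
rollup = conn.execute(
    """
    SELECT * FROM (
        SELECT Category, Subcategory, COUNT(*) AS ProductCount
        FROM ProductList GROUP BY Category, Subcategory
        UNION ALL
        SELECT Category, NULL, COUNT(*) FROM ProductList GROUP BY Category
        UNION ALL
        SELECT NULL, NULL, COUNT(*) FROM ProductList
    )
    ORDER BY Category IS NULL, Category, Subcategory IS NULL, Subcategory
    """
).fetchall()

for row in rollup:
    print(row)
```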

There is also a PIVOT operator in T-SQL; this is covered in more detail in our How to Create Pivot Tables in SQL Server article.

Summary

Excel users have a lot to gain from learning how to write some basic SQL queries.  In this article on SQL for Excel users, we’ve run through some simple examples which will help you get started.  There are a lot of good online resources to help you learn more, and we’ll also cover some more advanced scenarios and examples in future articles.

In the meantime, if you are an Excel user and need to view and make updates to data stored in SQL Server, then try out the SQL Spreads Add-in for Excel.  Download a free trial version here.

*This article was originally published on August 27, 2021 and was updated on July 4, 2022 to include updated information on the latest version of SQL Server.


Andy McDonald

Andy has worked 20+ years in the Engineering, Financial, and IT sectors with data analysis and presentation using tools such as SQL Server, Excel, Power Query and Power BI.

Writes for SQL Spreads about Excel and SQL Server and how to tie those two together.

You can do a lot of powerful things with Excel, for example connecting to other data sources. In this article, we are going to look at how we can use SQL within Excel.

What is SQL?

SQL stands for “Structured Query Language.” Microsoft SQL Server is just one of the many databases that use it. In a database:

  • you store data in tables
  • you can run SQL queries to retrieve data.

The advantages of using a database such as Microsoft SQL Server to store your data include:

  • The data is strongly typed, meaning that you cannot, for example, store text such as “abc” in a date field. This gives your data a basic level of validation as soon as it is entered.
  • It can be a central data repository for your data on multiple projects.
  • Multiple people can access the same data at the same time. This reduces duplication and inconsistencies.
  • It is also well protected: relational database management systems have built-in security, and Microsoft SQL Server offers several layers of it.

For more information, please see my Udemy article “Excel vs SQL Server.”

How do you access a SQL Server? First, you need a data connection. If you are using a SQL Server at work, your IT department will give you the details of your server. These will include:

  • The Server Name. You can also take this from the Connection String, if you have one.
  • Authentication Method: You will use either:
    • Windows Authentication, using your Windows username and password
    • SQL Server Authentication, using a separate username and password

If you have Microsoft SQL Server on your own computer, then the server name could be “localhost” or “.”, and you will probably use Windows Authentication.
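A connection string is a list of Key=Value pairs separated by semicolons, and the server name sits under the Server (or Data Source) key. The sketch below pulls out the server name and checks for Windows Authentication; the helper names are hypothetical, the example strings use placeholder server and database names, and the parser is deliberately simple (it does not handle quoted values that contain semicolons).

```python
from typing import Dict, Optional


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' into a dict with lower-cased keys."""
    parts: Dict[str, str] = {}
    for pair in conn_str.split(";"):
        if "=" not in pair:
            continue  # skip empty segments, e.g. after a trailing ';'
        key, value = pair.split("=", 1)
        parts[key.strip().lower()] = value.strip()
    return parts


def server_name(conn_str: str) -> Optional[str]:
    """Return the server name, given either as 'Server' or 'Data Source'."""
    parts = parse_connection_string(conn_str)
    return parts.get("server") or parts.get("data source")


def uses_windows_auth(conn_str: str) -> bool:
    """True if the string asks for Windows (integrated) authentication."""
    parts = parse_connection_string(conn_str)
    trusted = parts.get("trusted_connection", "").lower() in ("true", "yes")
    integrated = parts.get("integrated security", "").lower() in ("true", "sspi")
    return trusted or integrated


# Placeholder examples: a local server with Windows Authentication, and a
# remote server with SQL Server Authentication (separate username/password).
local = "Server=localhost;Database=AdventureWorks;Trusted_Connection=True;"
remote = ("Data Source=myserver.example.com;Initial Catalog=Sales;"
          "User ID=report_user;Password=secret;")

print(server_name(local), uses_windows_auth(local))    # localhost True
print(server_name(remote), uses_windows_auth(remote))  # myserver.example.com False
```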

You can use this connection to retrieve the Microsoft SQL Server data.

There are three different places in Excel where you can load SQL data:

  • In the main Excel window
  • In the Get and Transform window (also known as the Power Query editor)
  • In the Power Pivot window (also known as the Data Model)

We will have a look at each of these places.

Connecting SQL to the main Excel window

The main Excel window is the one you use every time you open Excel. To load data from SQL Server, go to Data – Get Data – From Database – From SQL Server Database. This has superseded previously used methods such as Microsoft Query.

You will then have to provide the Server Name.

There are four kinds of SQL Server data sources that you could query to return results:

  1. You may want the data from a table. This is the raw data.
  2. You may want the results of a previously created view. A view is a saved query, so its results may already include some analysis done in SQL Server.
  3. You may want the results from a stored procedure. This could be a more complex analysis, or one that involves parameters. For example, you may just want all sales from the state of Florida. Here, ‘Florida’ would be a parameter.
  4. You may want to run an ad hoc SQL query using the SELECT statement.

If you want to run a stored procedure or an ad hoc query, then at this stage you will need to click on “Advanced options” and write the query in the box provided. You will also need to enter the name of the database.
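SQLite, which is bundled with Python, has no stored procedures, but a parameterized query shows the same idea as the Florida example above: the SQL text stays fixed and the value is passed in separately. The Sales table, its columns and the sales_for_state helper are hypothetical; in SQL Server the equivalent would typically be a stored procedure with a parameter such as @State.

```python
import sqlite3

# Hypothetical Sales table standing in for real SQL Server data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (OrderID INTEGER, State TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "Florida", 120.0), (2, "Texas", 80.0), (3, "Florida", 45.5)],
)


def sales_for_state(state):
    """Return (OrderID, Amount) rows for one state; 'state' is the parameter."""
    # The ? placeholder is filled in by the driver, so the value is never
    # pasted into the SQL text itself.
    return conn.execute(
        "SELECT OrderID, Amount FROM Sales WHERE State = ? ORDER BY OrderID",
        (state,),
    ).fetchall()


florida = sales_for_state("Florida")
texas = sales_for_state("Texas")
print(florida)
```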

Next, you need to provide the authentication mode and any credentials required.

If you want to retrieve the contents of a table or view, you can select it in the Navigator window. If you then click “Load,” it will be loaded into your Excel workbook. We will look at what happens if you click “Transform Data” in the next section of this article.

Once you have made the link, it will load the data into an Excel Table. You can then use it just like other data stored in a table.

You can refresh the data whenever you want by right-clicking inside the table and choosing Refresh, or by going to Table Design – Refresh.

Connecting SQL to Get and Transform

The second way to connect to SQL data is by using the Get and Transform window.

This follows the same process for connecting to SQL Server as mentioned above, except that you press “Transform Data” instead of Load.

Once you have done this, then the data is in the Get and Transform window, also known as the Power Query Editor. 

You can also load data directly from the Power Query Editor. To do this, go to Home – New Source.

You can then perform additional manipulations before the data is transferred into Excel. For example, you might want to:

  • Remove some columns (Home – Choose Columns / Remove Columns) or rows (Home – Keep Rows / Remove Rows)
  • Add additional columns using formulas. (However, Power Query uses a formula language called M, which differs significantly from Excel formulas.)
  • Summarise the data using the Group By function

Doing this in Power Query reduces the amount of data that goes into Excel. Power Query can also reduce the amount of data that it receives from SQL Server, through a process called query folding. For example, you could retrieve all the contents of a table into Power Query, limit the number of rows to just 50, and reduce the number of columns to just two.

This reduction will be incorporated into the SQL statement so that Excel only retrieves the needed rows and columns from SQL Server. This reduces network traffic and increases the speed of retrieving that data.
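Here is a rough illustration of what query folding saves, using Python’s sqlite3 module and a hypothetical 1,000-row, five-column Orders table. It compares fetching everything and trimming it afterwards with pushing the “keep 50 rows, keep 2 columns” steps into the SQL itself. The SQL is hand-written for the sketch; it is not the exact statement Power Query generates, and SQL Server would express the row limit with TOP rather than SQLite’s LIMIT.

```python
import sqlite3

# Hypothetical Orders table: 1,000 rows x 5 columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Customer TEXT, Region TEXT, "
             "Product TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [(i, f"Customer {i % 40}", "North" if i % 2 else "South", f"Item {i % 7}", i * 1.5)
     for i in range(1, 1001)],
)

# Without folding: every row and column crosses the connection, and the
# trimming happens afterwards on the client.
everything = conn.execute("SELECT * FROM Orders ORDER BY OrderID").fetchall()
client_side = [(row[0], row[4]) for row in everything[:50]]

# With folding: the row and column steps become part of the SQL, so the
# database returns only what is needed.
folded = conn.execute(
    "SELECT OrderID, Amount FROM Orders ORDER BY OrderID LIMIT 50"
).fetchall()

values_without_folding = len(everything) * len(everything[0])
values_with_folding = len(folded) * len(folded[0])
print(values_without_folding, values_with_folding)
```

Both approaches end up with the same 50 rows; the difference is how much data has to travel from the database first.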

When you leave the Power Query window by going to Home – Close & Load, it loads the data into an Excel Table as before.

However, if you go to “Home – Close & Load To…” instead, you could then:

  • Use it in a Pivot Table or Chart without loading the data in Excel as a Table.
  • Save it as a Connection (without loading the data into an Excel Table).

If you save it as a connection, you can use it later as the data source in any new Pivot Tables.

In the dialog that “Close & Load To…” opens, there is a checkbox for “Add this data to the Data Model.” If you tick this, Excel will load the data into Power Pivot, also known as the Data Model. We’ll have a look at the Data Model in the next part of this article.

Connecting SQL to Power Pivot

The third way of connecting SQL to Excel directly is by using the Data Model, also known as Power Pivot. To open the Data Model, you need to go to Data – Manage Data Model. 

Then you can import the data into Power Pivot by going to Home – Get External Data – From Database – From SQL Server. You then connect to SQL Server in a similar process as before.

Once you have imported the data, you can create calculated columns or measures. Power Pivot uses a formula language called DAX to build formulas. DAX looks much like Excel formulas, but adds many functions for working with tables and relationships.

Once you have finished, you can then create a Power Pivot Table by going to Home – PivotTable – PivotTable.

This allows you to create Pivot Tables or charts from this data.

Where to go for more information

I hope that you have enjoyed this article.

Are you interested in Power Query or Power Pivot? Then why not join me in my “Analyzing and Visualizing Data with Microsoft Excel” course, where we have a look at these topics, together with Pivot Tables.

Do you want to learn SQL statements quickly? Please have a look at my “SQL Server Essentials in an hour” course. We will look at the six principal clauses of the SQL SELECT Statement: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.

Most Excel spreadsheets need you to manually enter data into cells before analyzing it or performing calculations using formulae or other functions. You may also use Excel to get data from a large data source, such as an Access database, a SQL Server database, or even a huge text file. SQL statements in Excel allow you to connect to an external data source, parse fields or table contents, and import data without having to enter it manually. After importing external data using SQL commands, you may sort, analyze, and perform any necessary computations. Here, we will discuss how to execute SQL statements in MS Excel. For this, an open-source package called ‘xlwings’ is required, so before we begin running SQL queries in MS Excel, we will have to install xlwings. Running SQL queries in MS Excel with xlwings this way requires Windows and Python.

Installation of xlwings

Make sure you have installed pip for Python beforehand. If not, refer to this GeeksforGeeks link. Once you have installed pip, open your Command Prompt, type pip install xlwings, and hit Enter. Once this command has finished, type xlwings addin install and hit Enter. Now open Excel, and you’ll find an xlwings tab added to the ribbon.

Steps to run SQL queries in Excel

Step 1: Creation of Tables in Excel.

To demonstrate running SQL queries in Excel, this article uses two tables created in the same Excel workbook: an Employee table and a Department table, as depicted below:

Table 1: Employee Table.


Table 2: Department Table.


Step 2: Write the SQL query in Excel.

Type in the SQL query to be executed in Excel. (You may first Merge & Center the cells and then type in the SQL query). 
Note: refer to the tables by letter, in the order in which you pass them to the function. When only one table is used, refer to it as ‘a’/‘A’. If there are two tables, for example when joins are used, use ‘a’/‘A’ for the first table and ‘b’/‘B’ for the second table.


Step 3: Running the SQL query in Excel.

For executing the SQL query, type in =sql( in a new cell, where you need the retrieved data to be displayed. Then, click on the Insert Function option, displayed to the left of the Formula Bar.


On clicking the Insert Function option, a dialog box appears, which requires 2 inputs – Query and Tables. For the Query input, select the SQL query cell (above step) or simply manually type in the query to be executed.

For the Tables input, click and drag to select the entire table to be used in the SQL query, including its header row. If there is more than one table, add the other table(s) in the same way in the Tables input. After this, click on the OK button, and presto, the data is retrieved!
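One way to picture what happens when you call =sql(query, table1, table2) is that each selected range is loaded into a throwaway in-memory database under the names a, b and so on, and the query runs against that. The function below is a simplified sketch of that idea using Python’s sqlite3 module; it is an illustration, not xlwings’ own implementation, and the sample ranges are hypothetical (the first row of each range holds the column headers).

```python
import sqlite3
from string import ascii_lowercase


def sql(query, *tables):
    """Run 'query' against 2D ranges loaded as tables a, b, c, ...

    Each range is a list of rows; its first row holds the column headers.
    Returns the result as a 2D list with a header row, ready to spill.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, table in zip(ascii_lowercase, tables):
            headers, *rows = table
            # Quote headers so names with spaces work (embedded quotes are not handled).
            cols = ", ".join('"' + h + '"' for h in headers)
            marks = ", ".join("?" for _ in headers)
            conn.execute(f"CREATE TABLE {name} ({cols})")
            conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)
        cursor = conn.execute(query)
        header = [col[0] for col in cursor.description]
        return [header] + [list(row) for row in cursor.fetchall()]
    finally:
        conn.close()


employees = [["Name", "Age", "Dept"], ["Asha", 29, "Sales"], ["Ben", 45, "IT"]]
departments = [["D_Name", "D_City"], ["Sales", "Pune"], ["IT", "Delhi"]]

result = sql(
    "SELECT a.Name AS Name, b.D_City AS City "
    "FROM a INNER JOIN b ON a.Dept = b.D_Name ORDER BY a.Name",
    employees,
    departments,
)
print(result)
```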


Output: Now you can see the output of the SQL Query. 


More Sample SQL Queries in Excel

  • SELECT STATEMENT: The SELECT statement is used to get information from a database. The information returned is saved in a result table known as the result set.

Select statement syntax: SELECT Age FROM a


SELECT Name, Gender FROM a 


  • WHERE CLAUSE: To filter records, use the WHERE clause. It is used to extract only records that meet a predefined requirement.

Where clause syntax: SELECT * FROM a WHERE Gender = 'Female'


  • OR OPERATOR: The OR operator is used to filter records based on more than one condition. It shows a record if any of the conditions separated by OR is TRUE.

Or operator syntax: SELECT * FROM a WHERE Gender = 'MALE' OR Age < 40


  • NOT OPERATOR: The NOT operator shows a record if the condition is NOT TRUE.

Not operator syntax: SELECT * FROM a WHERE NOT Gender = 'Female'


  • MIN() FUNCTION: The MIN() function returns the smallest value in the selected column.

Min function syntax: SELECT MIN(Age) FROM a

This returns only the minimum age.

  • AVG() FUNCTION: AVG() returns the average value of a numerical column.

Avg function syntax: SELECT AVG(Age) FROM a

This returns only the average age.

  • GROUP BY STATEMENT: The SQL GROUP BY clause is used in conjunction with the SELECT statement to group identical data. In a SELECT statement, the GROUP BY clause comes after the WHERE clause and before the ORDER BY clause.

Group By statement syntax: SELECT AVG(Salary) AS Avg_Sal, Gender FROM a GROUP BY Gender 


  • SQL INNER JOIN: The INNER JOIN keyword selects records from both tables that have matching values.  

Inner join syntax: SELECT a.Name, a.Dept, b.D_Name, b.D_City FROM a INNER JOIN b ON a.Dept = b.D_Name
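To try the sample queries above without Excel, the sketch below runs them with Python’s sqlite3 module against two small hypothetical tables named a (employees) and b (departments); the names, ages and salaries are invented. One thing to watch for if the engine behind the function is SQLite: text comparisons with = are case-sensitive by default, so 'MALE' would not match a cell containing 'Male', which is why the sketch uses 'Male'.

```python
import sqlite3

# Hypothetical stand-ins for the Employee (a) and Department (b) tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (Name TEXT, Gender TEXT, Age INTEGER, Salary REAL, Dept TEXT);
    CREATE TABLE b (D_Name TEXT, D_City TEXT);
    INSERT INTO a VALUES
        ('Asha',  'Female', 29, 50000, 'Sales'),
        ('Ben',   'Male',   45, 70000, 'IT'),
        ('Chloe', 'Female', 41, 60000, 'IT'),
        ('Dev',   'Male',   52, 65000, 'HR');
    INSERT INTO b VALUES ('Sales', 'Pune'), ('IT', 'Delhi');
    """
)


def q(query):
    """Run a query and return all rows as tuples."""
    return conn.execute(query).fetchall()


females = q("SELECT Name FROM a WHERE Gender = 'Female' ORDER BY Name")
or_rows = q("SELECT Name FROM a WHERE Gender = 'Male' OR Age < 40 ORDER BY Name")
not_rows = q("SELECT Name FROM a WHERE NOT Gender = 'Female' ORDER BY Name")
min_age = q("SELECT MIN(Age) FROM a")
avg_age = q("SELECT AVG(Age) FROM a")
avg_sal = q("SELECT Gender, AVG(Salary) AS Avg_Sal FROM a GROUP BY Gender ORDER BY Gender")
# Dev's department (HR) has no row in b, so the INNER JOIN drops him.
joined = q("SELECT a.Name, b.D_City FROM a INNER JOIN b ON a.Dept = b.D_Name ORDER BY a.Name")
```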


26 Sep ’22 by Antonio Nakić-Alfirević

SQL Query function in Excel

If you’re reading this article you probably know that Google Sheets has a QUERY function that allows you to run SQL-like queries against data in the sheet. This function lets you do all sorts of gymnastics with the data in your sheet, be it filtering, aggregating, or pivoting data.

Being a fully-fledged desktop app, Excel tends to be more feature-rich than Google Sheets. This is especially true in the data analytics department where Excel shines with advanced Excel functions as well as Power Query functionality.

However, Excel doesn’t natively have a QUERY function that you can use in cells on the sheet.

In this blog post, I’m going to show you how to add a QUERY function to Excel and give a few examples of how to use it.

First look

Let’s start by taking a look at the function in action.

Simple SELECT query examples


The function is pretty straightforward. It accepts the SQL query as the first parameter and returns a table with the results of the query.

The results automatically spill to the necessary amount of space. This spilling behavior relies on the dynamic array functionality that’s available in Excel 365 (but isn’t in earlier versions of Microsoft Excel).

Works with Excel tables

In Google Sheets, the QUERY function references data by address (e.g. “A1:B10”) while columns are referenced by letters (e.g. A, B, C…).

This works but has some drawbacks:

  • It makes the query sensitive to the location of the data. If the data is moved or if columns are reordered, the query will break.
  • It makes the query difficult to read since it uses range addresses and column letters instead of table and column names (e.g. Employees, DateOfBirth…)
  • Adding or removing rows can break the query. For example, if the provided range is “A1:H10” the query will only take into account the first 10 rows. If additional rows are added, the query will not take them into account. You can get around this by omitting the end row number (e.g. “A1:H”), but this means that there must be no other content below the data range.

Excel, on the other hand, allows explicitly defining tables (aka ListObjects) that delineate the areas that hold data. Each Excel table has a name, as do its columns. This makes Excel tables very similar to database tables and makes them easier to work with from SQL.

Explicitly defined tables in Excel


Full SQL syntax support (SQLite)

Under the hood, the Windy.Query function is powered by SQLite – a small but powerful embedded database engine.

When called, the function passes the query to the built-in SQLite engine which has an adapter that lets it use Excel tables as its data source.

This means that the entire SQLite syntax is available for use in queries. In comparison, in Google Sheets, the query syntax is rather limited. It only supports a single table (no joins) and a very small set of built-in functions.

Examples of use

Since the engine under the hood is SQLite, queries can use all operations available in SQLite, including table joins, temp tables, common table expressions, window functions, etc. Let’s go over some examples of how to use these in Excel.

Joining tables

Here’s an example of a simple one-to-many join:

Simple one-to-many SQL join


The usual way of doing a simple operation such as this one in Excel would be to use XLOOKUP or Power Query, but SQL is now another option. And if we needed anything more complex than a simple join, SQL would quickly shine as the most powerful and convenient option of the three.

Merging table rows (union)

Another way we might want to combine two (or more) tables is to combine their rows. We can do this with a SQL UNION operator.

The tables might have some rows in common. If we want to keep only one instance of such rows, we would use the regular UNION operator. If we want to keep every row, including both copies of rows that appear in both tables, we would use the UNION ALL operator.

Merging rows from two tables
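The difference between UNION and UNION ALL is easiest to see on two small tables that share some rows. This sketch uses Python’s sqlite3 module with hypothetical Employees and Employees_New tables (the employee names are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employees (Id INTEGER, Name TEXT);
    CREATE TABLE Employees_New (Id INTEGER, Name TEXT);
    INSERT INTO Employees VALUES (1, 'Ana'), (2, 'Bruno'), (3, 'Cleo');
    INSERT INTO Employees_New VALUES (2, 'Bruno'), (3, 'Cleo'), (4, 'Dina');
    """
)

# UNION removes duplicate rows: Bruno and Cleo appear once.
union_rows = conn.execute(
    "SELECT Id, Name FROM Employees UNION SELECT Id, Name FROM Employees_New ORDER BY Id"
).fetchall()

# UNION ALL keeps every row: Bruno and Cleo appear twice.
union_all_rows = conn.execute(
    "SELECT Id, Name FROM Employees UNION ALL SELECT Id, Name FROM Employees_New ORDER BY Id"
).fetchall()

print(len(union_rows), len(union_all_rows))
```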


Finding differences between two tables

In the previous example, we had two tables that had some rows in common and some rows not. Let’s assume, for example, that the first table contains last year’s list of employees and the second table is the new list of employees.

If we wanted to find out the differences between the two tables, we could easily do that with a bit of SQL.

All of the rows that are in the first table but not in the second one we will mark as “deleted”. All of the rows that are in the second table but not in the first one we will mark as “added”. Here’s what that SQL query looks like:

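select
	e.id, e.name, 'deleted' as status
from
	employees e
where
	-- ids present last year but missing from the new list
	not exists (select 1 from employees_new en where en.id = e.id)
union
select
	en.id, en.name, 'added' as status
from
	employees_new en
where
	-- ids present in the new list but missing from last year's
	not exists (select 1 from employees e where e.id = en.id)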

And here’s what the result looks like:

Calculating a diff between two tables
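To try the diff logic outside Excel, here is a minimal sketch that runs essentially the same query with Python’s sqlite3 module against two small hypothetical tables. Note that rows are matched by id only, so an employee whose name changed but whose id stayed the same is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (id integer, name text);
    create table employees_new (id integer, name text);
    insert into employees values (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    insert into employees_new values (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
""")

# Explicit aliases let the compound query's ORDER BY refer to the id column
DIFF_QUERY = """
    select e.id as id, e.name as name, 'deleted' as status
    from employees e
    where not exists (select 1 from employees_new en where en.id = e.id)
    union
    select en.id, en.name, 'added'
    from employees_new en
    where not exists (select 1 from employees e where e.id = en.id)
    order by id
"""
diff = conn.execute(DIFF_QUERY).fetchall()
print(diff)
```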

Ranking rows

Another useful thing we might want to do is rank rows based on some criteria. For example, suppose we have a table with a list of cities. For each city we have its population and the country it belongs to.

Our task is to find the top 3 cities in each country based on population. Here’s how we might do that in SQL.

-- we use this CTE so we can reference the calculated 'rank_pop' column in the where clause
with cte as (
	select
		city,
		country,
		population,
		-- using the RANK() window function; DESC so the most populous city gets rank 1
		RANK() OVER (PARTITION BY country ORDER BY population DESC) as rank_pop
	from
		cities c)
select
	*
from
	cte
where
	-- filtering by the 'rank_pop' column from the CTE
	rank_pop <= 3
order by country, rank_pop

Rank cities by population in each country, return top 3 per country

This query is a bit more complex than the previous ones. It uses a common table expression and a window function (RANK), which shows that queries can contain fairly complex SQL.
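A minimal sketch of the same top-3-per-country query, run with Python’s sqlite3 module on a small hypothetical cities table. It assumes the SQLite library bundled with Python is version 3.25 or newer, since that is when SQLite added window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cities (city text, country text, population integer);
    insert into cities values
        ('A1', 'A', 500), ('A2', 'A', 400), ('A3', 'A', 300), ('A4', 'A', 200),
        ('B1', 'B', 900), ('B2', 'B', 100);
""")

TOP3_QUERY = """
    with cte as (
        select city, country, population,
               -- rank within each country, largest population first
               rank() over (partition by country order by population desc) as rank_pop
        from cities)
    select city, country, rank_pop
    from cte
    where rank_pop <= 3
    order by country, rank_pop
"""
top3 = conn.execute(TOP3_QUERY).fetchall()
print(top3)
```

RANK() gives tied rows the same rank, so a country can return more than three rows when populations tie; ROW_NUMBER() would cap it at exactly three.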

Queries can also make use of dozens of built-in SQLite functions. Various specialized extended functions such as RegexReplace, GPSDist (GPS distance between two points) and LevDist (fuzzy text matching) are also available.
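How Windy.Query implements functions like LevDist is not described here, but SQLite in general lets the host application register its own scalar functions. As a sketch of that mechanism only, the following registers a Levenshtein distance function with Python’s sqlite3 module under the hypothetical SQL name levdist and uses it for fuzzy matching on a hypothetical customers table.

```python
import sqlite3

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as levdist(a, b) (hypothetical name)
conn.create_function("levdist", 2, levenshtein)
conn.executescript("""
    create table customers (name text);
    insert into customers values ('Jon Smith'), ('John Smyth'), ('Jane Doe');
""")

# Find names within two edits of the target spelling
matches = conn.execute(
    "select name, levdist(name, :target) as dist from customers "
    "where levdist(name, :target) <= 2 order by dist, name",
    {"target": "John Smith"},
).fetchall()
print(matches)
```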

Updating tables

OK, this next example is a bit of a hack, but a useful one… The query you supply doesn’t need to be a SELECT query. You can do UPDATE/INSERT/DELETE statements as well, and these will modify the data in the target Excel tables.

Updating Excel tables with SQL

This can be a handy way to clean and transform data in your tables in place, without having to export/import the data to an external database (e.g. SQL Server, MySQL, PostgreSQL…).

This works because the SQLite engine isn’t copying the data. Rather, it’s using an adapter that lets it access live data in the Excel table.
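A minimal sketch of such an in-place clean-up. It runs against an ordinary in-memory SQLite table using Python’s sqlite3 module rather than a live Excel table, and the contacts table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contacts (id integer, email text);
    insert into contacts values (1, '  Ann@Example.com '), (2, 'bob@example.com');
""")

# Clean the data in place: trim surrounding whitespace and lower-case every address
conn.execute("update contacts set email = lower(trim(email))")
conn.commit()

cleaned = conn.execute("select id, email from contacts order by id").fetchall()
print(cleaned)
```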

How does the function see Excel tables?

At first glance, it might seem strange that the query can access your workbook tables. After all, we did not pass them in as parameters, and functions normally only work with parameters that are passed to them.

However, the Windy.Query function is aware of the workbook it’s being called from and can read data from the workbook’s tables without them being passed in as parameters. This makes the function much easier to call, especially when working with multiple tables.

Column Headers

Results returned by the Windy.Query function can optionally include headers. This is controlled by the second parameter of the function.

Toggling column headers on/off

The text in the column headers is determined by the SQL query itself, so you can easily rename result columns by aliasing them in the select list.

Aliasing column headers
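Because header text comes from the query, a client can read it back from the result metadata. This small sketch uses Python’s sqlite3 module (an illustration of SQLite’s behavior, not of Windy.Query itself) with a hypothetical employees table to show aliases surfacing as column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (id integer, name text);
    insert into employees values (1, 'Ann');
""")

# Double-quoted aliases may contain spaces
cur = conn.execute('select id as "Employee ID", name as "Full Name" from employees')

# cursor.description holds one tuple per result column; item 0 is the column name
headers = [col[0] for col in cur.description]
print(headers)
```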

Automatically refresh results

By default, the SQL query runs as a one-off operation when you enter the formula but does not refresh if the source tables change. However, if you want the query to refresh whenever one of the source tables changes, you can easily do so by setting the autoRefresh argument to true.

Automatic refresh of results when source data changes

Note that the auto-refresh functionality relies on Excel’s RTD (Real-Time Data) server. Excel throttles RTD updates so that functions don’t overwhelm it with frequent updates. The default throttle interval is 2 seconds (2,000 ms), meaning the function will not update more than once every 2 seconds. To improve responsiveness, you can lower this value to something like 20 ms. The simplest way to do this is through the “Configure” dialog in the QueryStorm runtime’s ribbon.

Adjusting the RTD interval in Excel

Passing parameters

When needed, SQL queries can use values from cells as parameters. To use a cell as a parameter in a query, start by giving the cell a name (named range).

Give the parameter cell a name

Once the cell has a name, you can reference it in the query using the @paramName or $paramName syntax.

Use the parameter in the SQL query
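SQLite itself accepts :name, @name and $name as named parameter placeholders, which is presumably what this syntax relies on. The following is a minimal sketch with Python’s sqlite3 module in which a dictionary stands in for the named cell; how Windy.Query actually passes the cell value to the engine is an assumption here, and the cities table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cities (city text, population integer);
    insert into cities values ('Alpha', 50000), ('Beta', 250000), ('Gamma', 1200000);
""")

# Hypothetical stand-in for the value of a named cell called min_pop
params = {"min_pop": 100000}

# The value is bound separately from the SQL text rather than pasted into it
big = conn.execute(
    "select city from cities where population >= @min_pop order by population",
    params,
).fetchall()
print(big)
```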

If automatic refresh is turned on, results will automatically refresh whenever one of the parameter cells changes its value.

Automatic refresh of results when parameter cell changes

Performance

This is all well and good for small tables, you might think, but how does it handle large data sets? Well, it handles them quite well. The function can read source tables of 100k rows and 10 columns within a few milliseconds and can return that much data in a second or two. In addition, all columns are automatically indexed, so searches and joins are fast as well.

This means the function performs well both in terms of data throughput and in terms of computation.
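The effect of an index is easy to see in SQLite’s query plan. This sketch uses Python’s sqlite3 module on a hypothetical 10,000-row table: without an index the lookup scans the whole table, and after creating one SQLite searches the index instead. It only illustrates the general principle; how Windy.Query builds its automatic indexes is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cities (city text, country text, population integer)")
conn.executemany(
    "insert into cities values (?, ?, ?)",
    [(f"city{i}", f"country{i % 50}", i) for i in range(10_000)],
)

def plan(sql: str) -> str:
    """Return SQLite's query-plan description for sql as a single string."""
    rows = conn.execute("explain query plan " + sql, ("country7",)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the readable detail

query = "select * from cities where country = ?"
before = plan(query)  # expect a full table SCAN
conn.execute("create index idx_cities_country on cities(country)")
after = plan(query)   # expect a SEARCH using idx_cities_country
print(before)
print(after)
```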

OK, so is this better than the Google Sheets version of the QUERY function?

Yes, duh. Did you read the previous sections? 😛

Installing the Windy.Query function

So how do you install this function into your Excel? It’s a simple 2-step process.

Step 1 is to install the QueryStorm Runtime add-in (if you don’t already have it). This is a free, 4MB add-in for Excel that lets you install and use various extensions for Excel. It’s basically an app store for Excel.

Step 2 is to click the “Extensions” button in the “QueryStorm” tab in the Excel ribbon, find the Windy.Query package in the “Online” tab, and install it.

What happens if I share the workbook with a user who doesn’t have the function?

Nothing bad. If the other user doesn’t have the function installed, they will see the last results of the query that were returned on your machine. They just won’t be able to refresh the results.

Advanced SQL query editor

Writing SQL queries in the formula bar can get a bit unwieldy. To make queries easier to write, it’s better to use a proper editor, preferably one that offers syntax highlighting and code completion for SQL and knows about the tables in your workbook.

For this purpose, I recommend using the QueryStorm IDE. This is an advanced IDE that lets you use SQL in Excel. You can write the query in the QueryStorm code editor and then paste the query into the Windy.Query function when you’re happy with it (if needed).

Using the QueryStorm IDE to write SQL queries in Excel

The IDE does more than just let you use SQL in Excel. You can also use it to create and share functions and add-ins for Excel. In fact, the QueryStorm IDE was used to create the Windy.Query function itself.

The IDE has a free community version for individuals and small companies, while users in larger companies can make use of the free trial license. For paid licenses, check out the pricing page.

You can read more in this blog post that’s dedicated to the QueryStorm SQL IDE.

Video demonstration

For a video demonstration of the Windy.Query function, take a look at the following video:

3 Ways to Perform an Excel SQL Query

Posted 07 Aug at 12:08h in Excel VBA

Howdee! Excel is a great tool for performing data analysis. However, sometimes getting the data we need into Excel can be cumbersome and take a lot of time when going through other systems. You’re also at the mercy of how a disparate system exports data, and may need an additional step between exporting and getting the data into the format you need. If you have access to the database where the data is housed, you can circumvent these steps and create your own custom Excel SQL query.

To follow along with the demos below, you’ll need an instance of SQL Server installed on your desktop. If you don’t have one, you can download the trial version, developer version, or free Express version here. I’ll be working with the free Developer version in this article. I’m also using a sample database that you can download here. The easiest way to install it is with SQL Server Management Studio (SSMS), which is available here. Once you open SSMS, it should automatically detect your local server instance. You must ensure your SQL Server User is running as the “Local Client”, and then you can create a blank database and restore that database from the backup file. If you have issues accomplishing this, let me know in the comments and I’ll elaborate on how this is done.

If you are familiar enough with SQL and have access to your own data, you can skip these steps and use your data. Otherwise, I recommend downloading these tools before getting started. If you’re new to SQL, I highly recommend the SQL Essential Training courses on Lynda.com. Now, on to why you’re all here…

Excel SQL Query Using Get Data

This option is the most straightforward approach to creating an Excel SQL query. However, it is important to note that this approach is only available in Excel 2013 and later and will not currently work on Mac OSX. To get started, select “Get Data” → “From Database” → “From SQL Server Database” as shown in the screen grab. A prompt will pop up asking for your server name and the target database you want to query (you can get this information from SSMS). Enter this information and select “OK”. This lets you browse the available tables in that database to import, and you can remove columns and filter tables before importing. If you do not know how to write SQL queries yet, this is one approach you can take.

However, if you select the “Advanced” dropdown arrow, you can write your own custom Excel SQL query. I usually create my query in SSMS or Visual Studio and then paste the final query into this window, because the window has no IntelliSense and it can be difficult to spot errors in your query. Once you select OK, Excel will ask you to confirm credentials, and you may get an error about encryption. This is common when connecting to databases in this manner and nothing to worry about. The next screen shows a preview of your data, and you can select “Load” to import it.

This will create a table on a new tab, and you’ll also notice a new pane on the right titled “Queries & Connections”. It displays the name of your query (defaults to “Query1”, “Query2”, etc.), and you can rename the query by right-clicking it and selecting “Rename”. You can also edit the query from here: it opens an interface with a sample of your data where you can add/remove columns, filter your data, or edit your source query.

Now that you’ve set up this Excel SQL query, you can simply refresh the data set with fresh data anytime by clicking “Refresh All” on the “Data” ribbon. A quick side note here. If you pivot this data, “Refresh All” will refresh pivot tables first and then the query. To update your pivot table, you’ll need to refresh all twice or update your pivot table manually. To me, one of the downsides of this approach is the results are always returned in a table. I personally do not like working with tables in Excel. That’s where using VBA for your SQL query can come in handy.

Excel SQL Query Using VBA

Using VBA to create your Excel SQL query is not as straightforward as the previous approach, but it can still be an extremely useful method depending on your situation. I particularly like that the data is not returned to a table unless you designate it to be. This technique will work on older versions of Microsoft Excel but will not work on Mac OSX versions of Excel, since it uses an ADO connection.

To get started, open the VBA editor by pressing Alt+F11. Before writing any code, you’ll need to make sure the “Microsoft ActiveX Data Objects 2.0 Library” is referenced from the VBA project. To do this, click “Tools” → “References” in the menu at the top of the VBA editor. In the popup, make sure the library is checked as shown below. This allows the project to use the ADO connectors to create the connection to your database. Next, let’s dimension a few variables.

Dim Conn As New ADODB.Connection
Dim recset As New ADODB.Recordset
Dim sqlQry As String, sConnect As String

The Conn variable will represent the connection between our VBA project and the SQL database. The recset variable will hold the record set that runs our Excel SQL query over that connection and receives its results. Finally, sqlQry is a string holding the SQL query itself, and sConnect is a string holding the connection string the database requires. Let’s look at how to use these variables to perform a SQL query.


sqlQry = "select top 1000 si.InvoiceID, si.InvoiceDate, sc.CustomerName from Sales.Invoices si" & _
             " left join sales.Customers sc on sc.CustomerID = si.CustomerID"

sConnect = "Driver={SQL Server};Server=[Your Server Name Here]; Database=[Your Database Here];Trusted_Connection=yes;"

Conn.Open sConnect

Set recset = New ADODB.Recordset

    recset.Open sqlQry, Conn
    Sheet2.Cells(2, 1).CopyFromRecordset recset
    recset.Close

Conn.Close

Set recset = Nothing

While this may look complex, each step is relatively simple. First, we set our sqlQry variable to a string containing our SQL query. We then create a connection string that the next command uses to connect to the database: “Conn.Open” opens the connection, and “sConnect” is the string it uses to do so. “Trusted_Connection=yes” means the connection is established with Windows authentication, i.e. using the credentials of the Windows account you’re logged in as.

Now that the connection is open, we can open a new record set, passing it the SQL command (the sqlQry variable) and the connection to use (the Conn variable). We can then use the VBA method “CopyFromRecordset” to paste the record set anywhere in our workbook. It’s important to close both the record set and the connection at this point. You also want to set your recset variable to Nothing so it does not tie up resources.

One of the downsides of this method is that you must explicitly tell Excel to do some things that the previous approach did automatically. First, CopyFromRecordset does not copy column headers, so you must write the column labels yourself. Second, the existing data is not automatically cleared before the new results are imported, so you must tell Excel to do that as well. Here is the final code with those commands added.


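Sub SQL_Example()
    ' Requires a reference to "Microsoft ActiveX Data Objects x.x Library" (Tools > References)
    Dim Conn As New ADODB.Connection
    Dim recset As New ADODB.Recordset
    Dim sqlQry As String, sConnect As String

    ' Clear the previous results before importing fresh data
    Sheet2.Cells.ClearContents

    ' The SQL statement to run against the database
    sqlQry = "select top 1000 si.InvoiceID, si.InvoiceDate, sc.CustomerName from Sales.Invoices si" & _
             " left join sales.Customers sc on sc.CustomerID = si.CustomerID"

    ' Windows authentication (Trusted_Connection) against the named server and database
    sConnect = "Driver={SQL Server};Server=[Your Server Name Here]; Database=[Your Database Name Here];Trusted_Connection=yes;"

    Conn.Open sConnect
    Set recset = New ADODB.Recordset

    ' Run the query and paste the rows starting at A2, leaving row 1 for headers
    recset.Open sqlQry, Conn
    Sheet2.Cells(2, 1).CopyFromRecordset recset
    recset.Close

    ' Close the connection and release both objects
    Conn.Close
    Set recset = Nothing
    Set Conn = Nothing

    ' CopyFromRecordset does not copy column names, so write the headers explicitly
    Sheet2.Cells(1, 1) = "Invoice ID"
    Sheet2.Cells(1, 2) = "Invoice Date"
    Sheet2.Cells(1, 3) = "Customer Name"
End Sub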
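The hard-coded header labels above work, but they have to be kept in sync with the select list by hand. ADO also exposes each column name through recset.Fields(i).Name, so a loop over the record set’s fields could write the headers instead. The sketch below shows the same idea in Python’s sqlite3 module (not VBA), with hypothetical invoices and customers tables standing in for the sample database and a left join mirroring the original query.

```python
import sqlite3

def query_to_grid(conn, sql, params=()):
    """Run sql and return a header row followed by the data rows, like a sheet range."""
    cur = conn.execute(sql, params)
    header = [col[0] for col in cur.description]  # names come from the query itself
    grid = [header] + [list(row) for row in cur.fetchall()]
    cur.close()
    return grid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table invoices (invoice_id integer, customer_id integer);
    create table customers (customer_id integer, customer_name text);
    insert into invoices values (1, 10), (2, 11);
    insert into customers values (10, 'Acme');
""")

grid = query_to_grid(conn, """
    select i.invoice_id as "Invoice ID", c.customer_name as "Customer Name"
    from invoices i
    left join customers c on c.customer_id = i.customer_id
    order by i.invoice_id
""")
conn.close()
print(grid)
```

Invoice 2 has no matching customer, so the left join returns None (NULL) for its name instead of dropping the row.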

I prefer this approach when I want the user to be able to pass parameters to my Excel SQL query. For example, I can add a dropdown of customer names, let the user pick one, and pass that value to my SQL query in a where clause.
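A safe way to pass such a dropdown value is as a bound parameter rather than by concatenating it into the SQL string; in ADO that is done with an ADODB.Command object and its Parameters collection. The sketch below shows the same principle in Python’s sqlite3 module with a hypothetical invoices table; it is an illustration, not the author’s VBA code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table invoices (invoice_id integer, customer_name text);
    insert into invoices values (1, 'Acme'), (2, 'Globex'), (3, 'Acme');
""")

selected_customer = "Acme"  # hypothetical value picked from a dropdown

# The ? placeholder binds the value separately from the SQL text, so a name
# such as O'Brien cannot break (or inject into) the query
rows = conn.execute(
    "select invoice_id from invoices where customer_name = ? order by invoice_id",
    (selected_customer,),
).fetchall()
print(rows)
```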

As you can see, both the built-in Excel SQL query and the VBA method have pros and cons. I employ both in my everyday work depending on what situation I find myself in.

Excel SQL Query Using Microsoft Query

This is likely the most complex option, but it has the added advantage of being compatible with some versions of Mac OSX. I won’t pretend to be an expert at creating Mac OSX-compatible tools for Excel, but I have successfully used this implementation to create an embedded Excel SQL query for Macs in the past.

I also like this method because you can create popup-style parameters. For example, you can prompt the user to input date range parameters at the time the SQL query is run. Like the first example, running this query is as easy as clicking “Refresh All” on the Data ribbon. Let’s dive into the details.

To get started, click “Get Data” on the Data ribbon. In the drop-down menu, select “From Other Sources” and, finally, “From Microsoft Query”.

This will open a wizard for you to choose your data source. Double-click “<New Data Source>” and you’ll be prompted to enter some information about your data source. Option 1 can be any name you wish that describes your data source. Option 2 should be “SQL Server”. Click “Connect” and a third window will pop up where you can enter the server and login information. Be sure to select the “Options>>” dropdown so you can choose the database you want to connect to.

You’ll now have a new data source in the original window. Double-click the data source to bring up a table import wizard. If you want to import an entire table, you can do so here and even filter and sort the data using the import wizard. However, if you want to use your own custom query, as we have been, just select any field, go through the wizard and import the data. When you come to the screen that asks whether you want to return the data to Excel or view/edit it in Microsoft Query, choose to return the data to Excel. It will then prompt you to select where in your workbook you want the data returned.

The query will return the data in a table format. To change it to your own custom SQL query, let’s follow these steps:

  • Click anywhere in the data table.
  • On the Excel Data ribbon, in the “Queries & Connections” group, “Properties” will no longer be grayed out as it normally is. Click it.
  • In the popup, you’ll see another Properties icon. Click it.
  • In this popup, select the “Definition” tab and paste your SQL query in the “Command Text” input box.

Now you’ve built an Excel SQL Query that can be refreshed anytime the workbook is refreshed. In my screengrab, the “Parameters” button is greyed out. If you want to add parameters to your query, you do so by adding “?” in your command text. That looks like this.

This creates a parameter the end user can interact with. You can have the user be prompted to enter an input when the workbook is refreshed, select a default value, or have it linked to a cell in the workbook. Even though this option is cumbersome to set up, I really enjoy using it. It allows me a lot of flexibility to have the user interact with the data. As I touched on in the beginning, I’ve had success using this option on Microsoft Office for Mac OSX. I don’t want to say this will work 100% of the time on a Mac because I’ve also had it fail. If anyone has any input on this, I’d love to hear from you.
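Each “?” is a positional parameter: the values are supplied in order when the query runs. Microsoft Query sends the command text to SQL Server over ODBC, so the sketch below is only an analogy in Python’s sqlite3 module, using a hypothetical invoices table and a hypothetical date range as the user’s input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table invoices (invoice_id integer, invoice_date text);
    insert into invoices values (1, '2024-01-15'), (2, '2024-02-10'), (3, '2024-03-05');
""")

# One value is bound to each ? in order, much as Microsoft Query asks
# for (or reads from a cell) one value per parameter marker
COMMAND_TEXT = """
    select invoice_id from invoices
    where invoice_date between ? and ?
    order by invoice_id
"""
start_date, end_date = "2024-02-01", "2024-03-31"  # hypothetical user input
rows = conn.execute(COMMAND_TEXT, (start_date, end_date)).fetchall()
print(rows)
```

ISO-formatted date strings (YYYY-MM-DD) sort in date order, which is why BETWEEN works on them as text here.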

Let me know your thoughts on these approaches in the comments! What other ways do you creatively get data into Excel from SQL data sources?

Cheers!

R
