- Importance of an XML Parser
- Build XML Parser Using VBA
- Conclusion
This article will teach us how to parse XML files in VBA.
Importance of an XML Parser
As a Microsoft Excel user, it is common that you might receive some data in the form of an XML file. You will have to retrieve the information from the XML file and use it in your sheets or VBA macros according to your requirement.
A way to do this is to treat it as a text file and parse the information. But this is not an elegant way to parse XML files since the information is stored well-structured using tags, and treating it as a text file negates this concept.
Therefore, we will have to make use of an XML Parser. An XML Parser reads the XML file and retrieves the relevant data so it can be used readily.
Build XML Parser Using VBA
We can parse an XML file using VBA and load its data into an Excel sheet. The method we will use relies on the XML DOM (Document Object Model) implementation, which represents the XML file as an object that we can then manipulate as required.
To start parsing your XML file through VBA, you must perform a simple sequence of steps, explained below.
To parse XML through VBA, you need MSXML 4.0 or later on your system.
- Add Reference to Microsoft XML
First, you need to add a reference to Microsoft XML, V6.0 in the VBA Editor. This is how it is done:
- Open the VBA Editor from the Developer tab in Excel.
- Go to Tools > References.
- Scroll down and check Microsoft XML, V6.0, then click OK.
Note that the version of Microsoft XML depends on the operating system and Microsoft Office installed on your computer.
- Write VBA Code to Load the XML File Into XML DOM
Suppose we have the following XML file:
<?xml version="1.0" encoding="ISO8859-1" ?>
<menu>
    <food>
        <name>Halwa Puri</name>
        <price>$7.50</price>
        <description>Halwa Puri is from Indian and Pakistani cuisines, having the sweet Halwa and the savory Puri which is a fried flatbread.</description>
        <calories>900</calories>
    </food>
</menu>
We can use the following code to parse this XML file through VBA by creating an XML DOM object:
Sub XMLParser()
    Dim xDoc As MSXML2.DOMDocument60
    Dim node As IXMLDOMElement
    Set xDoc = New MSXML2.DOMDocument60
    With xDoc
        .async = False
        .validateOnParse = True
        If .Load("D:\VBA\example.xml") = False Then
            Debug.Print .parseError.reason, .parseError.ErrorCode
            Exit Sub
        End If
        Set node = .SelectSingleNode("//price")
        MsgBox node.Text
    End With
End Sub
In the code above, we first declare a variable xDoc of the MSXML2.DOMDocument60 type. We append 60 to the type name because we are using version 6.0 of Microsoft XML; without the 60, this code generates the compile-time error "User-defined type not defined".
Next, we use the With statement to indicate that we are working with the xDoc variable. The .async property specifies whether asynchronous download is permitted, and the .validateOnParse property indicates whether the parser should validate the XML document.
After that, we use the .Load function to load the specified XML file into the DOM variable. Here, you can change the path and file name to match the file on your computer.
The next two lines handle the case in which the XML file is not loaded properly. To test that loading has worked, we select one node from the file by its name, price.
Note that the node name is case-sensitive and must match the name used in your XML file. Finally, we display the price in a message box using the node.Text property.
Output: a message box displays the price, $7.50.
This shows that the file was loaded and parsed successfully.
One way to use the XML file data is to store it in an Excel sheet. Let us make a few changes to the code above to store the data in the Excel sheet:
Sub XMLParser()
    Dim xDoc As MSXML2.DOMDocument60
    Dim list As MSXML2.IXMLDOMNodeList
    Dim node As MSXML2.IXMLDOMNode
    Dim osh As Worksheet
    Dim oRow As Long
    Set xDoc = New MSXML2.DOMDocument60
    Set osh = ThisWorkbook.Sheets("Sheet1")
    oRow = 1
    With xDoc
        .async = False
        .validateOnParse = True
        If .Load("D:\VBA\example.xml") = False Then
            Debug.Print .parseError.reason, .parseError.ErrorCode
            Exit Sub
        End If
        Set list = .SelectNodes("//price")
        ' Write each price to column A, starting at row 2
        For Each node In list
            oRow = oRow + 1
            osh.Range("A" & oRow).Value = node.Text
        Next node
    End With
End Sub
Here, we retrieve all the price nodes and store them in the sheet. In this example, there is only one price node, so a single value is written to cell A2 of Sheet1.
You can tweak the code according to your XML file and requirements.
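As one possible tweak, the sketch below writes every child element of each food node to its own column instead of only the price. The function name FoodFieldsToSheet and its parameters are illustrative assumptions, not part of the original code. It uses late binding and assumes MSXML 6.0 is installed.

```vba
' Sketch only: FoodFieldsToSheet is a hypothetical helper name.
' Late binding is used, so no reference to Microsoft XML is required.
Function FoodFieldsToSheet(xmlText As String, osh As Worksheet) As Long
    Dim xDoc As Object, food As Object, field As Object
    Dim oRow As Long, oCol As Long

    Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xDoc.async = False
    If Not xDoc.LoadXML(xmlText) Then
        Debug.Print xDoc.parseError.reason
        Exit Function                ' returns 0 when the XML cannot be parsed
    End If

    oRow = 1                         ' row 1 is left free for headers
    For Each food In xDoc.SelectNodes("//food")
        oRow = oRow + 1
        oCol = 0
        ' "*" selects only the child elements of the current <food> node
        For Each field In food.SelectNodes("*")
            oCol = oCol + 1
            osh.Cells(oRow, oCol).Value = field.Text
        Next field
    Next food

    FoodFieldsToSheet = oRow - 1     ' number of <food> rows written
End Function
```

Called with the sample file's contents and ThisWorkbook.Sheets("Sheet1"), this would place name, price, description, and calories in A2:D2.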
Conclusion
This sums up our discussion of how to parse XML files through VBA. In this article, we have learned how to build an XML parser using XML DOM in VBA.
Read XML using Excel VBA
XML is a file format that is widely used to transfer data over the internet or between two systems on different platforms. The most widely used and familiar XML file on the internet is Sitemap.xml, which lists the major links of a website.
Other widely used file formats for data transfer are JSON and CSV. In this article, we are going to learn how to read an XML file using the XML DOM (Document Object Model).
Excel VBA XML Parser
Using this tutorial, you can build an XML parser using Excel VBA. Let's start with this step-by-step procedure. Open an Excel workbook and open the VB Editor by pressing Alt + F11. Then follow these important steps.
- Add a reference to “Microsoft XML, V6.0” from the Excel VB Editor.
- VB Editor -> Menu -> Tools -> References
- Scroll down till Microsoft XML, V2.0 or 3.0 or 6.0 appears. The version of XML depends on the OS & Office version installed on your machine.
- Click OK.
- Now, copy and paste the code into your VBE.
- Download a file from the internet or, if you already have a file, modify the XML file path in the code.
- Run the code by pressing F5.
'--------------------------------------------------------------------------------
'Code by author@officetricks.com
'Visit https://officetricks.com to get more Free & Fully Functional VBA Codes
'--------------------------------------------------------------------------------
Public Sub Xml_To_Excel()
    Dim fname As String, osh As Worksheet, oRow As Long
    Dim Node As MSXML2.IXMLDOMNode
    'With the V6.0 reference the type is DOMDocument60;
    'use MSXML2.DOMDocument instead if you referenced V3.0
    Dim xDoc As MSXML2.DOMDocument60
    Dim list As MSXML2.IXMLDOMNodeList

    'Create XML DOM Object
    Set xDoc = New MSXML2.DOMDocument60
    xDoc.async = False 'Wait until the file is fully loaded
    Set osh = ThisWorkbook.Sheets("Sheet2")
    oRow = 1

    'This is only a sample xml file - Change the File path to your Xml file path
    fname = "http://www.xmlfiles.com/examples/simple.xml"

    'Load Xml file to Object & Process Each node.
    If xDoc.Load(fname) Then
        Set list = xDoc.SelectNodes("//breakfast-menu/food")
        For Each Node In list
            oRow = oRow + 1
            '***Note: node names are case-sensitive***
            osh.Range("A" & oRow) = Node.SelectSingleNode("name").Text
            osh.Range("B" & oRow) = Node.Text
        Next
    Else
        MsgBox "Error Occurred"
    End If
    MsgBox "Process Completed"
End Sub
This code uses the XML DOM to parse each node from the input XML file and then writes the values to the Excel sheet one by one.
In my previous articles, we discussed how VBA in Excel can be used for reporting, creating ribbons for your macros, and how to connect an Excel file as a database with SQL support. Working with a database within Excel is a very convenient feature you may use for gathering and storing of data. Nowadays, however, modern object databases and big data platforms prefer formats like JSON (Avro) or XML in general.
Definitions and Declarations
Most modern languages like Python or Ruby have standard XML parsers in-built. As VBA has been here for decades, neither much maintained, nor developed, the support is not that straight-forward. There is, however, a good tool set you may use for processing XMLs in your macros. Firstly, let’s define the objects we will work with:
Public Function ParseXML(p_path As String) As Object
    Dim objDom As Object     '// DOMDocument
    Dim strData As String
    Dim objStream As Object  '// ADODB.Stream, late-bound so no ADO reference is needed
    Set objDom = CreateObject("Msxml2.DOMDocument.3.0")  '// Using MSXML 3.0;
                                                         '// you may use DOMDocument.4.0 for MSXML 4.0
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Charset = "UTF-8"
    objStream.Open
    objStream.LoadFromFile p_path
    strData = objStream.ReadText()
    objStream.Close
    Set objStream = Nothing
    objDom.LoadXML strData
    Set ParseXML = objDom
End Function
The code is quite self-explanatory. I pasted the whole function, so feel free to use it directly as-is, just pass the source XML file path as an argument and don’t forget about error handling.
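For the error handling mentioned above, one option is to check the return value of LoadXML and read the parseError object. The following is a minimal sketch under that assumption; the function name TryParseXML and the errText parameter are hypothetical and not part of the original function.

```vba
' Sketch only: TryParseXML is a hypothetical name.
' Returns the DOM document on success, or Nothing and a message in errText.
Public Function TryParseXML(xmlText As String, ByRef errText As String) As Object
    Dim objDom As Object
    Set objDom = CreateObject("Msxml2.DOMDocument.3.0")
    objDom.async = False

    If objDom.LoadXML(xmlText) Then
        errText = ""
        Set TryParseXML = objDom
    Else
        ' parseError describes where and why parsing stopped
        With objDom.parseError
            errText = "Line " & .Line & ", position " & .linepos & ": " & .reason
        End With
        Set TryParseXML = Nothing
    End If
End Function
```

A caller can then test the result with Is Nothing and show errText to the user instead of continuing with an empty document.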
Accessing XML Data
Now let’s have a look at the functions we may use for data extraction. The basic looping through the XML nodes may be implemented as follows:
For Each listNode In rootNode.ChildNodes
If listNode.HasChildNodes Then
' do something
End If
Next listNode
Accessing the data of the actual node:
<SingleNode Id="N1" Text="NodeValue" Required="true" Look="Standard">
TheText
</SingleNode>
Would follow the below syntax:
str = listNode.BaseName ' Extracts "SingleNode" value
str = listNode.Attributes(0).Text ' Extracts "N1" value
str = listNode.Text ' Extracts "TheText" value
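Reading attributes by position depends on their order in the file. As a hedged alternative, the sketch below reads an attribute by name with getAttribute, which returns Null when the attribute is absent. The function name ReadRootAttribute is an illustrative assumption, and MSXML 6.0 is assumed to be installed.

```vba
' Sketch only: ReadRootAttribute is a hypothetical helper name.
' Returns the named attribute of the root element, or "" if it is missing.
Function ReadRootAttribute(xmlText As String, attrName As String) As String
    Dim doc As Object, value As Variant
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML(xmlText) Then Exit Function

    ' getAttribute returns Null when the attribute does not exist
    value = doc.DocumentElement.getAttribute(attrName)
    If Not IsNull(value) Then ReadRootAttribute = value
End Function
```

For the SingleNode element shown above, asking for "Look" would give "Standard" regardless of where the attribute appears in the tag.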
Other standard traversing methods you might need are also supported – for a more comprehensive overview, you may want to check this article. Node referencing:
XML representation:
<?xml version="1.0" encoding="utf-8"?>
<RootElement>
<Node ID="1">
<Node ID="1.1" />
</Node>
<Node ID="2">
<Node ID="2.1" />
<Node ID="2.2" />
<Node ID="2.3" />
<Node ID="2.4" />
</Node>
<Node ID="3" />
</RootElement>
In case you need to access a specific node directly and you don’t need control of the actual traversing, you may also use the XPath methods.
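As a rough illustration of the XPath approach for the Node/ID structure above, the sketch below selects a node directly by its ID attribute. The function name FindNodeById is hypothetical, and the code assumes MSXML 6.0, where XPath is the default selection language.

```vba
' Sketch only: FindNodeById is a hypothetical helper name.
' Returns the first <Node> whose ID attribute equals id, or Nothing.
' Note: an id containing an apostrophe would break this simple query string.
Function FindNodeById(xmlText As String, id As String) As Object
    Dim doc As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML(xmlText) Then Exit Function

    Set FindNodeById = doc.SelectSingleNode("//Node[@ID='" & id & "']")
End Function
```

With the sample document, FindNodeById(xml, "2.3").ParentNode.getAttribute("ID") would give "2", without any manual looping.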
That is it for now, have fun working with your XMLs. Is there anything else you would like to know about VBA and macros in MS Office? Just drop me a message and I might discuss it in the next article. Thanks for reading!
VBA-XMLConverter
Status: Incomplete, Under Development
XML conversion and parsing for VBA (Excel, Access, and other Office applications).
Tested in Windows Excel 2013 and Excel for Mac 2011, but should apply to 2007+.
- For Windows-only support, include a reference to «Microsoft Scripting Runtime»
- For Mac support or to skip adding a reference, include VBA-Dictionary.
Example
Dim XML As Object
Set XML = XMLConverter.ParseXML( _
    "<?xml version=""1.0""?>" & _
    "<messages>" & _
    "<message id=""1"" date=""2014-1-1"">" & _
    "<from><name>Tim Hall</name></from>" & _
    "<body>Howdy!</body>" & _
    "</message>" & _
    "</messages>" _
)
Debug.Print XML("documentElement")("nodeName") ' -> "messages"
Debug.Print XML("documentElement")("childNodes")(1)("attributes")("id") ' -> "1"
Debug.Print XML("documentElement")("childNodes")(1)("childNodes")(2)("text") ' -> "Howdy!"
Debug.Print XMLConverter.ConvertToXML(XML) ' -> "<?xml version="1.0"?><messages>...</messages>"
XML files are one of the most common types of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (unlike XML or JSON files) is relatively easy: you can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy, because the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary, as opposed to tabular data, which usually has a constant number of columns separated with a specific delimiter.
Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let's, however, as always, start with a short introduction to how XML files are structured before we dive into the examples.
Loading XML document in VBA
The MSXML2.DOMDocument object allows you to easily traverse an XML structure and extract any XML node and/or attribute needed. Let's look at the example below.
Below we start by loading the XML document. Notice that I am setting the load to be performed synchronously and no validation to be carried out when parsing the document. Feel free to change these options if needed.
Sub TestXML()
    Dim XDoc As Object, root As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    Set root = XDoc.DocumentElement
    '...
End Sub
Alternatively, load an XML document from a string:
Sub TestXML()
    Dim XDoc As Object, root As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.LoadXML ("<root><child></child></root>")
    Set root = XDoc.DocumentElement
    '...
End Sub
That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.
XML DOM nodes in VBA
For the examples below I will use the following example XML:
<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name>Recon</Name>
        <TO>John;Bob;Rob;Chris</TO>
        <CC>Jane;Ashley</CC>
        <BCC>Brent</BCC>
    </List>
    <List>
        <Name>Safety Metrics</Name>
        <TO>Tom;Casper</TO>
        <CC>Ashley</CC>
        <BCC>John</BCC>
    </List>
    <List>
        <Name>Performance Report</Name>
        <TO>Huck;Ashley</TO>
        <CC>Tom;Andrew</CC>
        <BCC>John;Seema</BCC>
    </List>
</DistributionLists>
The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:
Node Reference | Type | Description |
---|---|---|
parentNode | [XDoc.DocumentElement] | The parent node, one node higher in the DOM hierarchy |
firstChild | [XDoc.DocumentElement] | The first child node, first node lower in the DOM hierarchy |
lastChild | [XDoc.DocumentElement] | The last child node, last node lower in the DOM hierarchy |
childNodes | [Array of type XDoc.DocumentElement] | All child nodes of the current node, all nodes lower in the DOM hierarchy |
nextSibling | [XDoc.DocumentElement] | Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node |
previousSibling | [XDoc.DocumentElement] | Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node |
All the above references allow you to move freely within the XML DOM.
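To illustrate how these references combine, here is a minimal sketch that walks the List nodes using FirstChild and NextSibling rather than a For Each loop. The function name SiblingNames is hypothetical, and MSXML 6.0 is assumed to be installed.

```vba
' Sketch only: SiblingNames is a hypothetical helper name.
' Walks the children of the root via FirstChild / NextSibling and
' returns the text of each child's <Name> element, separated by "|".
Function SiblingNames(xmlText As String) As String
    Dim doc As Object, node As Object, nameNode As Object
    Dim result As String

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML(xmlText) Then Exit Function

    Set node = doc.DocumentElement.FirstChild
    Do While Not node Is Nothing
        If node.NodeType = 1 Then                 ' 1 = NODE_ELEMENT
            Set nameNode = node.SelectSingleNode("Name")
            If Not nameNode Is Nothing Then
                If Len(result) > 0 Then result = result & "|"
                result = result & nameNode.Text
            End If
        End If
        Set node = node.NextSibling               ' Nothing after the last sibling
    Loop

    SiblingNames = result
End Function
```

For the example XML this would return the three list names joined by "|".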
ChildNodes
Let’s start by extracting the first list and printing its XML and text contents. The basis for moving around the XML DOM is ChildNodes.
Sub TestXML()
    Dim XDoc As Object, lists As Object, getFirstChild As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    'Get first child ( same as ChildNodes(0) )
    Set getFirstChild = lists.FirstChild
    'Print first child XML
    Debug.Print getFirstChild.XML
    'Print first child Text
    Debug.Print getFirstChild.Text
    Set XDoc = Nothing
End Sub
This is the result:
'Print first child XML
<List>
    <Name>Recon</Name>
    <TO>John;Bob;Rob;Chris</TO>
    <CC>Jane;Ashley</CC>
    <BCC>Brent</BCC>
</List>
'Print first child Text
Recon John;Bob;Rob;Chris Jane;Ashley Brent
Traversing through the whole XML in VBA
Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).
Sub TestXML()
    Dim XDoc As Object, lists As Object
    Dim listNode As Object, fieldNode As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    'Traverse all elements 2 branches deep
    For Each listNode In lists.ChildNodes
        Debug.Print "---Email---"
        For Each fieldNode In listNode.ChildNodes
            Debug.Print "[" & fieldNode.BaseName & "] = [" & fieldNode.Text & "]"
        Next fieldNode
    Next listNode
    Set XDoc = Nothing
End Sub
This is the result:
---Email---
[Name] = [Recon]
[TO] = [John;Bob;Rob;Chris]
[CC] = [Jane;Ashley]
[BCC] = [Brent]
---Email---
[Name] = [Safety Metrics]
[TO] = [Tom;Casper]
[CC] = [Ashley]
[BCC] = [John]
---Email---
[Name] = [Performance Report]
[TO] = [Huck;Ashley]
[CC] = [Tom;Andrew]
[BCC] = [John;Seema]
Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.
XML Document example node references
Now that we have the hang of our XML document, here is a map, based on the example XML above, of how to reach various elements of the XML file using node references:
- DistributionLists [DocumentElement]
  - List [ChildNodes(0)]
    - Name: Recon [ChildNodes(0).ChildNodes(0).Text]
    - TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).Text]
    - CC: Jane;Ashley
    - BCC: Brent
  - (…)
  - List [ChildNodes(2)]
    - Name: Performance Report [ChildNodes(2).ChildNodes(0).Text]
    - TO: Huck;Ashley
    - CC: Tom;Andrew
    - BCC: John;Seema
XPath in VBA
Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/.NextSibling properties, we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. A query is written as a single string and lets you extract any number of nodes (0 or more) that match it.
If you want to learn XPath I can recommend this overview:
https://www.w3schools.com/xml/xpath_syntax.asp
Now let’s jump into an example:
Example 1: Extract all Lists
Sub TestXML()
    Dim XDoc As Object, lists As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    Set lists = XDoc.SelectNodes("//DistributionLists/List")
    Set XDoc = Nothing
End Sub
Example 2: Extracting all TO fields
Set toFields = XDoc.SelectNodes("//DistributionLists/List/TO")
Example 3: Extracting the first and last Name field
'Indexes are zero-based here because MSXML2.DOMDocument (MSXML 3.0)
'defaults to the XSLPattern selection language
Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")
Set lastNameField = XDoc.SelectNodes("//DistributionLists/List[2]/Name")
Example 4: Extracting all child List nodes (Name, TO, CC, BCC)
Set listChildrenField = XDoc.SelectNodes("//DistributionLists/List/*")
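One caveat about the indexed queries: positions such as List[0] are zero-based only under the XSLPattern selection language, which is the default for MSXML 3.0 documents. Under standard XPath (the default in MSXML 6.0) positions start at 1. The sketch below switches an MSXML 3.0 document to XPath explicitly; the function name FirstListName is an illustrative assumption.

```vba
' Sketch only: FirstListName is a hypothetical helper name.
' Selects the first List's Name using standard, one-based XPath indexing.
Function FirstListName(xmlText As String) As String
    Dim doc As Object, node As Object
    Set doc = CreateObject("MSXML2.DOMDocument.3.0")
    doc.async = False
    ' Switch from the MSXML 3.0 default (XSLPattern) to standard XPath
    doc.setProperty "SelectionLanguage", "XPath"
    If Not doc.LoadXML(xmlText) Then Exit Function

    Set node = doc.SelectSingleNode("/DistributionLists/List[1]/Name")
    If Not node Is Nothing Then FirstListName = node.Text
End Function
```

Setting the selection language explicitly keeps the same query string working if the code is later moved to MSXML 6.0.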
XML Attributes in VBA
Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.
<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name attribute="some">Recon</Name>
Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.
Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")
Debug.Print firstNameField(0).Attributes(0).Text 'Result: "some"
Creating XML documents
Creating documents is also quite straightforward in VBA.
Dim XDoc As Object, root As Object, elem As Object
Set XDoc = CreateObject("MSXML2.DOMDocument")
Set root = XDoc.createElement("Root")
XDoc.appendChild root
'Add child to root
Set elem = XDoc.createElement("Child")
root.appendChild elem
'Add Attribute to the child
Dim rel As Object
Set rel = XDoc.createAttribute("Attrib")
rel.NodeValue = "Attrib value"
elem.setAttributeNode rel
'Save the XML file
XDoc.Save "C:\my_file.xml"
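A slightly shorter variant is sketched below: it sets the attribute with setAttribute, gives the child element some text, and returns the markup as a string instead of saving it. The function name BuildSampleXml and the values used are illustrative assumptions, and MSXML 6.0 is assumed to be installed.

```vba
' Sketch only: BuildSampleXml is a hypothetical helper name.
' Builds <Root><Child Attrib="...">...</Child></Root> and returns its markup.
Function BuildSampleXml(attribValue As String, childText As String) As String
    Dim doc As Object, root As Object, elem As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")

    Set root = doc.createElement("Root")
    doc.appendChild root

    Set elem = doc.createElement("Child")
    elem.setAttribute "Attrib", attribValue   ' shortcut for createAttribute + setAttributeNode
    elem.Text = childText                     ' special characters are escaped on output
    root.appendChild elem

    BuildSampleXml = root.XML
End Function
```

Returning the string makes it easy to inspect the result with Debug.Print before deciding where to save it.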
Reading XML into Excel
One interesting use for Excel is using it as a platform for calling web services.
While the Web Browser is the obvious platform for sending and receiving web based data,
sometimes Excel is a better application to analyse formatted lists or numerical data.
Excel gives you the ability to import and query data via web services.
Excel VBA — Loading XML string defined in program into VBA XML DOM Document
In this example we define an XML string, load the string and use it to populate a DOM document.
Sub XMLTest01()
    'In Tools > References, add reference to "Microsoft XML, vX.X" before running.
    'create instance of the DOMDocument object:
    Dim xmlDoc As MSXML2.DOMDocument
    Set xmlDoc = New MSXML2.DOMDocument
    Dim strXML As String
    'create XML string
    strXML = "<fullName>" & _
             "<firstName>Bob</firstName>" & _
             "<lastName>Smith</lastName>" & _
             "</XXXfullName>"
    ' use XML string to create a DOM, on error show error message
    If Not xmlDoc.LoadXML(strXML) Then
        Err.Raise xmlDoc.parseError.ErrorCode, , xmlDoc.parseError.reason
    End If
End Sub
XML error message: note the error in the closing fullName tag of the XML string.
Some VBA vocabulary
MSXML2.DOMDocument
The DOMDocument object represents the top node in the tree. It implements all of the base Document Object Model (DOM) document methods and provides additional members that support Extensible Stylesheet Language (XSL) and XML transformations. Only one object can be created: the document. All other objects are accessed or created from the document.
LoadXML
Loads an XML document using the supplied string.
Loading XML in Excel VBA — Enhanced error reporting
Sub XMLTest02()
    'In Tools > References, add reference to "Microsoft XML, vX.X" before running.
    'create instance of the DOMDocument class:
    Dim xmlDoc As MSXML2.DOMDocument
    Set xmlDoc = New MSXML2.DOMDocument
    Dim strErrText As String
    Dim xmlError As MSXML2.IXMLDOMParseError
    Dim strXML As String
    'create XML string
    strXML = "<fullName>" & _
             "<firstName>Bob</firstName>" & _
             "<lastName>Smith</lastName>" & _
             "</XXXfullName>"
    ' use XML string to create a DOM, on error show error message
    If Not xmlDoc.LoadXML(strXML) Then
        ' get the ParseError object
        Set xmlError = xmlDoc.parseError
        With xmlError
            strErrText = "Your XML Document failed to load " & _
                "due to the following error." & vbCrLf & _
                "Error #: " & .ErrorCode & ": " & .reason & _
                "Line #: " & .Line & vbCrLf & _
                "Line Position: " & .linepos & vbCrLf & _
                "Position In File: " & .filepos & vbCrLf & _
                "Source Text: " & .srcText & vbCrLf
        End With
        ' Display error & exit the procedure
        MsgBox strErrText, vbExclamation
        Set xmlDoc = Nothing
        Exit Sub
    End If
End Sub
XML error message: note the error in the closing fullName tag of the XML string.
Reading an XML file into Excel
'In Tools > References, add reference to "Microsoft XML, vX.X" before running.
Sub subReadXMLStream()
    Dim xmlDoc As MSXML2.DOMDocument
    Dim xEmpDetails As MSXML2.IXMLDOMNode
    Dim xParent As MSXML2.IXMLDOMNode
    Dim xChild As MSXML2.IXMLDOMNode
    Dim Col As Integer, Row As Integer
    Set xmlDoc = New MSXML2.DOMDocument
    xmlDoc.async = False
    xmlDoc.validateOnParse = False
    ' load the XML file from the URL, on error show error message
    If Not xmlDoc.Load("http://itpscan.info/blog/excel/xml/schedule.xml") Then
        Err.Raise xmlDoc.parseError.ErrorCode, , xmlDoc.parseError.reason
    End If
    Set xEmpDetails = xmlDoc.DocumentElement
    Set xParent = xEmpDetails.FirstChild
    Row = 1
    Col = 1
    Dim xmlNodeList As IXMLDOMNodeList
    Set xmlNodeList = xmlDoc.SelectNodes("//record")
    ' one worksheet row per <record>, one column per child element
    For Each xParent In xmlNodeList
        For Each xChild In xParent.ChildNodes
            Worksheets("Sheet1").Cells(Row, Col).Value = xChild.Text
            Debug.Print Row & " - "; Col & " - " & xChild.Text
            Col = Col + 1
        Next xChild
        Row = Row + 1
        Col = 1
    Next xParent
End Sub
Some VBA vocabulary
MSXML2.IXMLDOMNode
The IXMLDOMNode object provides methods that represent the core functionality of any node.
async
XML DOM property that specifies whether asynchronous download is permitted.
validateOnParse
XML DOM property that indicates whether the parser should validate this document.
FirstChild
Gets the first child of the node.
output in MS Excel:
Clear cache
This code is useful if you are using Apache Basic Authorization and the user changes their user ID/password.
Even if the code creates a brand new XMLHttpReq object and sets this header to the new information,
it logs in to the server as the first user, presumably from cached credentials. This code effectively clears the cache
in most browsers and lets the user log in with a new username/password combination.
Sub subClearCache()
    ' force browser to clear cache
    Dim myURL As String, Result As String
    ' must hold the Base64 encoding of "username:password" before sending
    Dim Base64EncodedUsernamePassword As String
    myURL = "http://172.16.50.250/blackberry/BBTESTB01.pgm"
    Dim oHttp As New MSXML2.XMLHTTP
    oHttp.Open "POST", myURL, False
    oHttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    oHttp.setRequestHeader "Cache-Control", "no-cache"
    oHttp.setRequestHeader "Pragma", "no-cache"
    oHttp.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    oHttp.setRequestHeader "Authorization", "Basic " & Base64EncodedUsernamePassword
    oHttp.send "PostArg1=PostArg1Value"
    Result = oHttp.responseText
End Sub
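The code above expects the Base64-encoded "username:password" string but does not show how to build it. One common approach, sketched here as an assumption rather than the author's method, uses the bin.base64 data type of an MSXML element. The function name EncodeBasicCredentials is hypothetical.

```vba
' Sketch only: EncodeBasicCredentials is a hypothetical helper name.
' Returns the Base64 encoding of "user:pwd" for a Basic Authorization header.
Function EncodeBasicCredentials(user As String, pwd As String) As String
    Dim doc As Object, el As Object
    Dim bytes() As Byte

    ' Convert the VBA (Unicode) string to single-byte ANSI characters
    bytes = StrConv(user & ":" & pwd, vbFromUnicode)

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    Set el = doc.createElement("b64")
    el.DataType = "bin.base64"       ' the element's text becomes Base64 of the typed value
    el.nodeTypedValue = bytes

    ' Long values are wrapped with line feeds; strip them for use in a header
    EncodeBasicCredentials = Replace(el.Text, vbLf, "")
End Function
```

The result can be assigned to Base64EncodedUsernamePassword before the request is sent. Non-ASCII characters are encoded using the system's ANSI code page, which may not match what the server expects.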