Parsing an XML file into Excel


Task:

Process all XML files in the selected folder and produce a report in Excel format (one row of the Excel table corresponds to one XML file).

Description:

When launched, the parser creates a new Excel file (with 46 columns) and displays a dialog box for choosing the folder that contains the XML files
(the XML files may be located in subfolders; the parser checks subfolders up to 10 levels deep).
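The folder scan described above can be sketched in VBA with `Application.FileDialog` and the late-bound `Scripting.FileSystemObject`. This is a minimal sketch, not the parser's actual implementation. The procedure names `CollectXmlFiles` and `PickFolderAndCollect` are hypothetical, and `MaxDepth = 10` simply mirrors the nesting limit stated above (depth 0 is the selected folder itself).

```vba
Option Explicit

Private Const MaxDepth As Long = 10   'hypothetical: nesting limit described above

'Recursively adds the full paths of *.xml files in folderPath to files,
'descending at most MaxDepth levels of subfolders.
Public Sub CollectXmlFiles(ByVal fso As Object, ByVal folderPath As String, _
                           ByVal depth As Long, ByVal files As Collection)
    Dim f As Object, subFolder As Object
    For Each f In fso.GetFolder(folderPath).Files
        If LCase$(fso.GetExtensionName(f.Path)) = "xml" Then files.Add f.Path
    Next f
    If depth >= MaxDepth Then Exit Sub          'stop descending at the limit
    For Each subFolder In fso.GetFolder(folderPath).SubFolders
        CollectXmlFiles fso, subFolder.Path, depth + 1, files
    Next subFolder
End Sub

'Asks the user for a folder and prints the XML files found in it.
Public Sub PickFolderAndCollect()
    Dim files As New Collection
    Dim fso As Object
    Dim p As Variant
    With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select the folder with XML files"
        If .Show <> -1 Then Exit Sub            'user cancelled the dialog
        Set fso = CreateObject("Scripting.FileSystemObject")
        CollectXmlFiles fso, .SelectedItems(1), 0, files
    End With
    For Each p In files
        Debug.Print p
    Next p
End Sub
```

Passing the `Collection` into the recursive call lets every level append to the same list, so the caller gets one flat list of paths that can then be processed one file per worksheet row.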

To extract the required data, the parser uses the «HTML: поиск тегов» ("HTML: tag search") action.
For each parser column, it searches for a specified tag whose property equals the column's name and takes the value of the required attribute from the tag it finds.
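The «HTML: поиск тегов» action belongs to the author's parser program and is not shown here. As a rough VBA equivalent, the same lookup can be expressed as an MSXML XPath query. In this sketch, `GetColumnValue` is a hypothetical helper, and the element name `field` with attributes `name` and `value` used in the example are placeholders; adapt them to the structure of your files.

```vba
Option Explicit

'Returns the value of attribute valueAttr on the first element named tagName
'whose attribute keyAttr equals columnName, or "" if no such element/attribute exists.
'Note: a columnName containing an apostrophe would break this XPath string.
Public Function GetColumnValue(ByVal doc As Object, ByVal tagName As String, _
                               ByVal keyAttr As String, ByVal columnName As String, _
                               ByVal valueAttr As String) As String
    Dim el As Object
    Dim v As Variant
    'Builds a query such as //field[@name='Amount']
    Set el = doc.SelectSingleNode("//" & tagName & "[@" & keyAttr & "='" & columnName & "']")
    If el Is Nothing Then Exit Function
    v = el.getAttribute(valueAttr)              'Null when the attribute is missing
    If Not IsNull(v) Then GetColumnValue = v
End Function
```

For example, if `doc` (an `MSXML2.DOMDocument.6.0`) contains `<field name="Amount" value="100"/>`, then `GetColumnValue(doc, "field", "name", "Amount", "value")` returns `"100"`. Looping this call over the header row yields one worksheet row per file.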

To test the parser, extract the XML files to be processed from the archive into any folder,
and after starting the parser, select that folder in the dialog box.

For help configuring the program to parse XML files, contact me by email at order@excelvba.ru or on Skype at ExcelVBA.ru

  1. Importance of an XML Parser
  2. Build XML Parser Using VBA
  3. Conclusion

Parsing XML in Microsoft Excel VBA

This article explains how to parse XML files in VBA.

Importance of an XML Parser

As a Microsoft Excel user, you may often receive data in the form of an XML file. You then have to retrieve the information from the XML file and use it in your sheets or VBA macros as required.

One way to do this is to treat the XML as a plain text file and parse it manually. This is not an elegant approach: XML stores information in a well-defined structure of tags, and treating it as plain text ignores that structure.

Therefore, we will have to make use of an XML Parser. An XML Parser reads the XML file and retrieves the relevant data so it can be used readily.

Build XML Parser Using VBA

We can parse an XML file with VBA and transfer the data to an Excel sheet. The method used here relies on the XML DOM (Document Object Model), which represents the XML file as a tree of objects that we can navigate and manipulate as required.

To start parsing your XML file through VBA, you must perform a simple sequence of steps. These are explained below.

To parse XML through VBA, you need MSXML 4.0 or later on your system. The examples below use MSXML 6.0.
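If you are not sure which MSXML version is installed, you can create the DOM object with late binding, which needs no library reference. The following is a sketch; `CreateXmlDocument` is a hypothetical helper name. It tries MSXML 6.0 first and falls back to MSXML 3.0 if that is available; because MSXML 3.0 defaults to the older XSLPattern query language, the helper switches it to XPath. With late binding, declare your variables `As Object` and expect no IntelliSense.

```vba
Option Explicit

'Creates an MSXML DOM document without a library reference (late binding).
'Tries MSXML 6.0, then MSXML 3.0; returns Nothing if neither can be created.
Public Function CreateXmlDocument() As Object
    Dim doc As Object
    On Error Resume Next
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    If doc Is Nothing Then
        Set doc = CreateObject("MSXML2.DOMDocument.3.0")
        'MSXML 3.0 defaults to XSLPattern queries; use XPath like MSXML 6.0
        If Not doc Is Nothing Then doc.setProperty "SelectionLanguage", "XPath"
    End If
    On Error GoTo 0
    If Not doc Is Nothing Then doc.async = False   'make Load/LoadXML synchronous
    Set CreateXmlDocument = doc
End Function
```

Usage: `Dim xDoc As Object: Set xDoc = CreateXmlDocument()`, then check `xDoc Is Nothing` before calling `xDoc.Load`.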

  • Add Reference to Microsoft XML

    First, you need to add a reference to Microsoft XML, v6.0 in the VBA Editor. This is how it is done:

    Open the VBA Editor from the Developer tab in Excel (or press Alt+F11).

  • Go to Tools > References.

  • Scroll down, check Microsoft XML, v6.0, and then click OK.


    Note that the version of Microsoft XML depends on the operating system and Microsoft Office installed on your computer.

  • Write VBA Code to Load the XML File Into XML DOM

    Suppose we have the following XML file:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <menu>
    <food>
    <name> Halwa Puri </name>
    <price> $7.50 </price>
    <description> Halwa Puri is from Indian and Pakistani cuisines, having the sweet Halwa and the savory Puri which is a fried flatbread. </description>
    <calories> 900 </calories>
    </food>
    </menu>
    

    We can use the following code to parse this XML file through VBA by making an XML DOM object in the following way:

    Sub XMLParser()
    
        Dim xDoc As New MSXML2.DOMDocument60
        Dim node As MSXML2.IXMLDOMNode
    
        With xDoc
            .async = False              'load the whole file before continuing
            .validateOnParse = True     'validate the document while parsing
    
            If Not .Load("D:\VBA\example.xml") Then
                Debug.Print .parseError.reason, .parseError.ErrorCode
                Exit Sub
            End If
    
            'Node names in XPath are case-sensitive
            Set node = .SelectSingleNode("//price")
            If Not node Is Nothing Then MsgBox node.Text
        End With
    
    End Sub
    

In the code above, we first create a variable xDoc of type MSXML2.DOMDocument60. The suffix 60 is there because we are using version 6.0 of Microsoft XML; without the 60, this code generates the compile error "User-defined type not defined".

Next, we work with the xDoc variable through a With statement. Setting the .async property to False makes the load synchronous, so the code waits until the whole document has been loaded, and the .validateOnParse property indicates whether the parser should validate the XML document while parsing it.

After that, we use the .Load method to load the specified XML file into the DOM object. Change the path and file name to match the file on your computer.

The If block handles errors: if the XML file cannot be loaded, it prints the parser's reason and error code to the Immediate window and exits. To test whether the loading has worked, we select a node from the file by its name, price, using the XPath expression //price.

You should note that the node name is case-sensitive and must be specified according to your XML file. Finally, we display the price using the node.Text property in a message box.
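Because node names are case-sensitive, a mismatch such as `//Price` does not raise an error: `SelectSingleNode` simply returns `Nothing`, and the next `.Text` call fails. A small guard helps. This sketch assumes the Microsoft XML, v6.0 reference added earlier; `NodeTextOrDefault` is a hypothetical helper name.

```vba
Option Explicit

'Returns the text of the first node matching xpath, or defaultValue when nothing matches.
'SelectSingleNode returns Nothing (it does not raise an error) when no node matches,
'e.g. when the name differs in case: "//Price" does not match <price>.
Public Function NodeTextOrDefault(ByVal doc As MSXML2.DOMDocument60, ByVal xpath As String, _
                                  Optional ByVal defaultValue As String = "") As String
    Dim node As MSXML2.IXMLDOMNode
    Set node = doc.SelectSingleNode(xpath)
    If node Is Nothing Then
        NodeTextOrDefault = defaultValue
    Else
        NodeTextOrDefault = node.Text
    End If
End Function
```

With the menu file loaded, `NodeTextOrDefault(xDoc, "//Price", "n/a")` returns `"n/a"` instead of stopping the macro.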

Output: a message box that displays the price read from the file.

This shows that the file was loaded correctly.

One way to use the XML file data is to store it in an Excel sheet. Let us make a few changes to the code above to store the data in the Excel sheet:

Sub XMLParser()

    Dim xDoc As New MSXML2.DOMDocument60
    Dim list As MSXML2.IXMLDOMNodeList
    Dim node As MSXML2.IXMLDOMNode
    Dim osh As Worksheet
    Dim oRow As Long

    Set osh = ThisWorkbook.Sheets("Sheet1")
    oRow = 1

    With xDoc
        .async = False
        .validateOnParse = True

        If Not .Load("D:\VBA\example.xml") Then
            Debug.Print .parseError.reason, .parseError.ErrorCode
            Exit Sub
        End If

        'Write every <price> node to column A, starting at row 2
        Set list = .SelectNodes("//price")
        For Each node In list
            oRow = oRow + 1
            osh.Range("A" & oRow).Value = node.Text
        Next node
    End With

End Sub

Here, we retrieve all the price nodes and store them in the sheet. This example file has only one price node, so its value is written to cell A2 of Sheet1.

You can tweak the code according to your XML file and requirements.
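As one such tweak, the sketch below writes one row per `<food>` element, with name, price and calories in separate columns. It assumes the menu XML and file path used above, the Microsoft XML, v6.0 reference, and a sheet named "Sheet1"; `FoodToSheet` is a hypothetical name. It also assumes every `<food>` has all three child elements, otherwise `SelectSingleNode` returns `Nothing` and `.Text` fails.

```vba
Option Explicit

'Writes one row per <food> element: name, price and calories in columns A to C.
Sub FoodToSheet()
    Dim xDoc As New MSXML2.DOMDocument60
    Dim food As MSXML2.IXMLDOMNode
    Dim osh As Worksheet
    Dim oRow As Long

    Set osh = ThisWorkbook.Sheets("Sheet1")
    xDoc.async = False
    If Not xDoc.Load("D:\VBA\example.xml") Then
        Debug.Print xDoc.parseError.reason, xDoc.parseError.ErrorCode
        Exit Sub
    End If

    osh.Range("A1:C1").Value = Array("Name", "Price", "Calories")   'header row
    oRow = 1
    For Each food In xDoc.SelectNodes("//food")
        oRow = oRow + 1
        'These XPaths are relative to the current <food> element
        osh.Cells(oRow, 1).Value = Trim$(food.SelectSingleNode("name").Text)
        osh.Cells(oRow, 2).Value = Trim$(food.SelectSingleNode("price").Text)
        osh.Cells(oRow, 3).Value = Trim$(food.SelectSingleNode("calories").Text)
    Next food
End Sub
```

Using relative paths (`"name"` rather than `"//name"`) keeps each row's values tied to the same `<food>` element.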

Conclusion

This sums up our discussion on the method to parse XML files through VBA. In this article, we have learned how to build an XML parser using XML DOM in VBA.

Read XML using Excel VBA

XML is a file format widely used to transfer data over the internet or between two systems running on different platforms. A familiar example on the web is sitemap.xml, which lists the main pages of a website.

Other widely used file formats for data transfer are JSON and CSV. In this article, we are going to learn how to read an XML file using the XML DOM (Document Object Model).
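Sitemap files are a good example of a common pitfall: their elements belong to the namespace `http://www.sitemaps.org/schemas/sitemap/0.9`, so an unprefixed XPath such as `//loc` finds nothing. The sketch below binds a prefix with MSXML's `SelectionNamespaces` property. `SitemapUrls` and the prefix `sm` are hypothetical names; only the namespace URI must match the file.

```vba
Option Explicit

'Returns the <loc> URLs listed in a sitemap.xml file as a Collection of strings.
Public Function SitemapUrls(ByVal sitemapPath As String) As Collection
    Dim xDoc As Object, loc As Object
    Dim urls As New Collection
    Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xDoc.async = False
    If xDoc.Load(sitemapPath) Then
        'Bind the arbitrary prefix "sm" to the sitemap namespace for XPath queries
        xDoc.setProperty "SelectionNamespaces", _
            "xmlns:sm='http://www.sitemaps.org/schemas/sitemap/0.9'"
        For Each loc In xDoc.SelectNodes("//sm:url/sm:loc")
            urls.Add loc.Text
        Next loc
    Else
        Debug.Print "Load failed: " & xDoc.parseError.reason
    End If
    Set SitemapUrls = urls
End Function
```

Usage with a hypothetical local copy: `For Each u In SitemapUrls("C:\Temp\sitemap.xml"): Debug.Print u: Next u` (declare `u As Variant`).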

Excel VBA XML Parser

Using this tutorial, you can build an XML parser in Excel VBA. Let's start with this step-by-step procedure. Open an Excel workbook and open the VB Editor by pressing Alt + F11. Then follow these steps.

  1. Add a reference to “Microsoft XML, v6.0” in the Excel VB Editor.
    • VB Editor -> Menu -> Tools -> References
    • Scroll down until Microsoft XML, v6.0 appears. The available versions (for example v3.0 or v6.0) depend on the OS and Office version installed on your machine.
    • Click OK.
  2. Copy and paste the code below into a module in the VBE.
  3. Download an XML file from the internet, or use one you already have, and change the XML file path in the code.
  4. Run the code by pressing F5.
'--------------------------------------------------------------------------------
'Code by author@officetricks.com
'Visit https://officetricks.com to get more Free & Fully Functional VBA Codes
'--------------------------------------------------------------------------------
Public Sub Xml_To_Excel()
    Dim xDoc As MSXML2.DOMDocument60
    Dim list As MSXML2.IXMLDOMNodeList
    Dim Node As MSXML2.IXMLDOMNode
    Dim osh As Worksheet
    Dim fname As String
    Dim oRow As Long

    'Create XML DOM Object (requires the Microsoft XML, v6.0 reference)
    Set xDoc = New MSXML2.DOMDocument60
    xDoc.async = False          'wait until the file is fully loaded
    Set osh = ThisWorkbook.Sheets("Sheet2")
    oRow = 1

    'This is only a sample xml file - Change the File path to your Xml file path
    fname = "http://www.xmlfiles.com/examples/simple.xml"

    'Load Xml file to Object & Process Each node.
    If xDoc.Load(fname) Then
        '***Note: node names are case-sensitive***
        Set list = xDoc.SelectNodes("//breakfast_menu/food")
        For Each Node In list
            oRow = oRow + 1
            osh.Range("A" & oRow) = Node.SelectSingleNode("name").Text
            'Node.Text returns the text of all child elements joined together
            osh.Range("B" & oRow) = Node.Text
        Next
        MsgBox "Process Completed"
    Else
        MsgBox "Error Occurred: " & xDoc.parseError.reason
    End If
End Sub

This code uses the XML DOM to parse each node of the input XML file and writes the values to the worksheet one row at a time.

In my previous articles, we discussed how VBA in Excel can be used for reporting, for creating ribbons for your macros, and for connecting to an Excel file as a database with SQL support. Working with a database within Excel is a very convenient way to gather and store data. Nowadays, however, modern object databases and big data platforms prefer formats such as JSON, Avro, or XML in general.

Definitions and Declarations

Most modern languages, such as Python or Ruby, have built-in standard XML parsers. VBA has been around for decades and is no longer actively developed, so its XML support is less straightforward. There is, however, a good tool set you can use for processing XML in your macros. First, let's define the objects we will work with:

Public Function ParseXML(p_path As String) As Object
    'Early-bound ADODB.Stream requires a reference to
    '"Microsoft ActiveX Data Objects x.x Library"
    Dim objDom As Object                                    '// DOMDocument
    Dim strData As String
    Dim objStream As ADODB.Stream

    Set objDom = CreateObject("Msxml2.DOMDocument.3.0")     '// Using MSXML 3.0;
        'you may use "Msxml2.DOMDocument.6.0" for MSXML 6.0
    Set objStream = New ADODB.Stream 'CreateObject("ADODB.Stream")

    objStream.Charset = "UTF-8"                             '// decode the file as UTF-8
    objStream.Open
    objStream.LoadFromFile p_path

    strData = objStream.ReadText()

    objStream.Close
    Set objStream = Nothing

    objDom.LoadXML strData                                  '// parse the decoded text

    Set ParseXML = objDom
End Function

The code is largely self-explanatory. The function is complete, so you can use it as-is: pass the path of the source XML file as the argument, and remember to add error handling.
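One place error handling matters is `LoadXML`: for malformed XML it does not raise a VBA run-time error, it only returns `False` and fills `parseError`. A minimal sketch of a check you could call instead of `objDom.LoadXML strData` is shown below; `TryLoadXml` is a hypothetical helper. A missing file is a separate case, since `LoadFromFile` raises an ordinary VBA error that `On Error` can catch.

```vba
Option Explicit

'Parses xmlText into doc. Returns True on success; on failure returns False and
'puts the parser's reason and error code into errMsg.
Public Function TryLoadXml(ByVal doc As Object, ByVal xmlText As String, _
                           ByRef errMsg As String) As Boolean
    errMsg = vbNullString
    TryLoadXml = doc.LoadXML(xmlText)
    If Not TryLoadXml Then
        errMsg = doc.parseError.reason & " (code " & doc.parseError.errorCode & ")"
    End If
End Function
```

A caller can then write `If Not TryLoadXml(objDom, strData, msg) Then MsgBox msg` and stop before any node lookups run on an empty document.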

Accessing XML Data

Now let’s have a look at the functions we may use for data extraction. The basic looping through the XML nodes may be implemented as follows:

For Each listNode In rootNode.ChildNodes
    If listNode.HasChildNodes Then
        ' do something
    End If
Next listNode
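The loop above visits only the direct children of `rootNode`. To walk the whole tree, the same loop can call itself for every element that has children. This is a sketch; `ListElements` is a hypothetical name, and it skips non-element nodes (text, comments) by checking `NodeType` against the DOM constant for elements, 1.

```vba
Option Explicit

'Returns the names of all elements below node, one per line,
'indented by two spaces per nesting level.
Public Function ListElements(ByVal node As Object, Optional ByVal depth As Long = 0) As String
    Const NODE_ELEMENT As Long = 1
    Dim child As Object
    Dim result As String
    For Each child In node.ChildNodes
        If child.NodeType = NODE_ELEMENT Then
            result = result & String$(depth * 2, " ") & child.BaseName & vbLf
            'Recurse into the element's own children
            If child.HasChildNodes Then result = result & ListElements(child, depth + 1)
        End If
    Next child
    ListElements = result
End Function
```

Usage: `Debug.Print ListElements(xDoc.DocumentElement)`.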

Consider the following node:

<SingleNode Id="N1" Text="NodeValue" Required="true" Look="Standard">
    TheText
</SingleNode>

Its name, first attribute, and text can be read with the following syntax:

str = listNode.BaseName            ' Extracts "SingleNode" value
str = listNode.Attributes(0).Text  ' Extracts "N1" value
str = listNode.Text                ' Extracts "TheText" value
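`Attributes(0)` depends on attribute order. Reading attributes by name is usually more robust. The sketch below loads the `SingleNode` example from a string; the procedure name is hypothetical. `getNamedItem` returns `Nothing` and `getAttribute` returns `Null` when the attribute does not exist.

```vba
Option Explicit

'Reads attributes of the <SingleNode> example by name instead of by position.
Sub ReadAttributesByName()
    Dim doc As Object
    Dim listNode As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML("<SingleNode Id=""N1"" Text=""NodeValue"" Required=""true"" " & _
                       "Look=""Standard"">TheText</SingleNode>") Then Exit Sub
    Set listNode = doc.DocumentElement
    Debug.Print listNode.Attributes.getNamedItem("Id").Text   'N1
    Debug.Print listNode.getAttribute("Look")                 'Standard
    Debug.Print IsNull(listNode.getAttribute("Missing"))      'True: no such attribute
End Sub
```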

Other standard traversing methods you might need are also supported – for a more comprehensive overview, you may want to check this article. Node referencing:

XML representation:

<?xml version="1.0" encoding="utf-8"?>
<RootElement>
    <Node ID="1">
        <Node ID="1.1" />
    </Node>
    <Node ID="2">
        <Node ID="2.1" />
        <Node ID="2.2" />
        <Node ID="2.3" />
        <Node ID="2.4" />
    </Node>
    <Node ID="3" />
</RootElement>
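The navigation code for this sample is not reproduced here, so the following is a sketch (not the original author's code) of referencing its nodes by position and by relationship. It loads the same structure from a string without indentation; `ChildNodes.Item` is zero-based. If you load the indented file instead, MSXML's default `preserveWhiteSpace = False` should keep the indentation out of `ChildNodes`.

```vba
Option Explicit

'Navigates the RootElement sample by index and by parent/sibling links.
Sub ReferenceNodes()
    Dim doc As Object, root As Object, n As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML("<RootElement><Node ID=""1""><Node ID=""1.1""/></Node>" & _
        "<Node ID=""2""><Node ID=""2.1""/><Node ID=""2.2""/><Node ID=""2.3""/><Node ID=""2.4""/></Node>" & _
        "<Node ID=""3""/></RootElement>") Then Exit Sub
    Set root = doc.DocumentElement
    Debug.Print root.ChildNodes.Length                          '3
    Set n = root.ChildNodes.Item(1).ChildNodes.Item(2)          'Node 2 -> its third child
    Debug.Print n.getAttribute("ID")                            '2.3
    Debug.Print n.ParentNode.getAttribute("ID")                 '2
    Debug.Print n.NextSibling.getAttribute("ID")                '2.4
    Debug.Print root.FirstChild.FirstChild.getAttribute("ID")   '1.1
    Debug.Print root.LastChild.getAttribute("ID")               '3
End Sub
```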

In case you need to access a specific node directly and you don’t need control of the actual traversing, you may also use the XPath methods.
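For the RootElement sample, a few XPath queries could look like the sketch below (procedure name hypothetical). It uses MSXML 6.0, whose default query language is XPath.

```vba
Option Explicit

'Selects nodes from the RootElement sample with XPath instead of walking the tree.
Sub XPathExamples()
    Dim doc As Object, n As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML("<RootElement><Node ID=""1""><Node ID=""1.1""/></Node>" & _
        "<Node ID=""2""><Node ID=""2.1""/><Node ID=""2.2""/><Node ID=""2.3""/><Node ID=""2.4""/></Node>" & _
        "<Node ID=""3""/></RootElement>") Then Exit Sub
    'Element with ID 2.3 at any depth
    Debug.Print doc.SelectSingleNode("//Node[@ID='2.3']").getAttribute("ID")   '2.3
    'Direct children of the top-level Node with ID 2
    Debug.Print doc.SelectNodes("/RootElement/Node[@ID='2']/Node").Length      '4
    'All Node elements at any depth: 3 top-level + 5 nested
    Debug.Print doc.SelectNodes("//Node").Length                               '8
    For Each n In doc.SelectNodes("/RootElement/Node")
        Debug.Print n.getAttribute("ID")                                       '1, then 2, then 3
    Next n
End Sub
```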

That is it for now, have fun working with your XMLs. Is there anything else you would like to know about VBA and macros in MS Office? Just drop me a message and I might discuss it in the next article. Thanks for reading!

Below are two methods to output the fields you need. Note that the XML you posted does not declare the namespace prefix "xa:", so it is not fully well-formed XML. I've removed the prefix in the example so that MSXML2.DOMDocument doesn't throw a parse error.

Option Explicit
Sub XMLMethod()
Dim XMLString As String
Dim XMLDoc As Object
Dim boolValue As Boolean
Dim xmlDocEl As Object
Dim xMeContext As Object
Dim xChild As Object
Dim xorder As Object


    XMLString = Sheet1.Range("A1").Value

    'Remove xa: in this example
    'reason : "Reference to undeclared namespace prefix: 'xa'."
    'Shouldn't need to do this if full XML is well formed containing correct namespace
    XMLString = Replace(XMLString, "xa:", vbNullString)

    Set XMLDoc = CreateObject("MSXML2.DOMDocument")
    'XMLDoc.setProperty "SelectionNamespaces", "xa:"

    'XMLDoc.Load "C:\Users\ooo\Desktop\test.xml" 'load from file
    boolValue = XMLDoc.LoadXML(XMLString)  'load from string

    Set xmlDocEl = XMLDoc.DocumentElement
    Set xMeContext = xmlDocEl.SelectSingleNode("//MeContext")
        Debug.Print Split(xMeContext.XML, """")(1)
    For Each xChild In xmlDocEl.ChildNodes

        If xChild.NodeName = "Orders" Then
            For Each xorder In xChild.ChildNodes
                Debug.Print Split(xorder.XML, """")(1)
                Debug.Print xorder.Text
            Next xorder

        ElseIf xChild.Text = "" Then
            Debug.Print Split(xChild.XML, """")(1)
        Else
            Debug.Print xChild.Text
        End If


    Next xChild

    'Output:
    'ABCe0552553
    'ABCe05525531
    '1
    'Cust1234
    'Smith
    'New York
    '101
    'MP3 Player
    '102
    'Radio


End Sub
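`Split(node.XML, """")(1)` returns whatever the first quoted value in the element's markup happens to be, so it breaks if attributes are reordered or added. If the value you need is an attribute named `id` (an assumption inferred from the `id="` pattern used in this answer's regex), reading it by name is sturdier. `IdOf` is a hypothetical helper; for example, `Debug.Print IdOf(xMeContext)`.

```vba
Option Explicit

'Returns the id attribute of an element, or "" when the element has no such attribute.
'Assumes the attribute is named "id"; getAttribute returns Null when it is missing.
Public Function IdOf(ByVal el As Object) As String
    Dim v As Variant
    v = el.getAttribute("id")
    If Not IsNull(v) Then IdOf = v
End Function
```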

The following version uses a regular expression instead. It is really only useful if the XML always has exactly the structure of your example; regex is not recommended for parsing XML in general unless you value speed over reliability.

Option Explicit

Sub RegexMethod()
Dim XMLString As String
Dim oRegex As Object
Dim regexArr As Object
Dim rItem As Object

    'Assumes Sheet1.Range("A1").Value holds example XMLString
    XMLString = Sheet1.Range("A1").Value

    Set oRegex = CreateObject("vbscript.regexp")
    With oRegex
        .Global = True
        .Pattern = "(id=""|>)(.+?)(""|</)"
        Set regexArr = .Execute(XMLString)

        'No lookbehind so replace unwanted chars
        .Pattern = "(id=""|>|""|</)"
        For Each rItem In regexArr
            'Change Debug.Print to fill an array to write to Excel
            Debug.Print .Replace(rItem, vbNullString)
        Next rItem
    End With

    'Output:
    'ABCe0552553
    'ABCe05525531
    '1
    'Cust1234
    'Smith
    'New York
    '101
    'MP3 Player
    '102
    'Radio


End Sub

EDIT: A slight update that collects the output in an array and writes it to a range:

Option Explicit

Sub RegexMethod()
Dim XMLString As String
Dim oRegex As Object
Dim regexArr As Object
Dim rItem As Object
Dim writeArray(1 To 1, 1 To 10) As Variant
Dim col As Long

    'Assumes Sheet1.Range("A1").Value holds example XMLString
    XMLString = Sheet1.Range("A1").Value

    Set oRegex = CreateObject("vbscript.regexp")
    With oRegex
        .Global = True
        .Pattern = "(id=""|>)(.+?)(""|</)"
        Set regexArr = .Execute(XMLString)

        'No lookbehind so replace unwanted chars
        .Pattern = "(id=""|>|""|</)"
        For Each rItem In regexArr
            'Change Debug.Print to fill an array to write to Excel
            Debug.Print .Replace(rItem, vbNullString)

            col = col + 1
            writeArray(1, col) = .Replace(rItem, vbNullString)
        Next rItem
    End With

    Sheet1.Range("A5:J5").Value = writeArray


End Sub


Sub XMLMethod()
Dim XMLString As String
Dim XMLDoc As Object
Dim boolValue As Boolean
Dim xmlDocEl As Object
Dim xMeContext As Object
Dim xChild As Object
Dim xorder As Object
Dim writeArray(1 To 1, 1 To 10) As Variant
Dim col As Long


    XMLString = Sheet1.Range("A1").Value

    'Remove xa: in this example
    'reason : "Reference to undeclared namespace prefix: 'xa'."
    'Shouldn't need to do this if full XML is well formed
    XMLString = Replace(XMLString, "xa:", vbNullString)

    Set XMLDoc = CreateObject("MSXML2.DOMDocument")
    'XMLDoc.setProperty "SelectionNamespaces", "xa:"

    'XMLDoc.Load "C:\Users\ooo\Desktop\test.xml" 'load from file
    boolValue = XMLDoc.LoadXML(XMLString)  'load from string

    Set xmlDocEl = XMLDoc.DocumentElement
    Set xMeContext = xmlDocEl.SelectSingleNode("//MeContext")
        'Debug.Print Split(xMeContext.XML, """")(1)
        col = col + 1
        writeArray(1, col) = Split(xMeContext.XML, """")(1)
    For Each xChild In xmlDocEl.ChildNodes

        If xChild.NodeName = "Orders" Then
            For Each xorder In xChild.ChildNodes
                col = col + 1
                'Debug.Print Split(xorder.XML, """")(1)
                writeArray(1, col) = Split(xorder.XML, """")(1)
                col = col + 1
                'Debug.Print xorder.Text
                writeArray(1, col) = xorder.Text
            Next xorder
        ElseIf xChild.Text = "" Then
            col = col + 1
            'Debug.Print Split(xChild.XML, """")(1)
            writeArray(1, col) = Split(xChild.XML, """")(1)
        Else
            col = col + 1
            'debug.Print xChild.Text
            writeArray(1, col) = xChild.Text
        End If


    Next xChild

    Sheet1.Range("A5:J5").Value = writeArray


End Sub
