This article will demonstrate how to parse a string in VBA.
Often we will use Excel to import data from other files. Occasionally this data might be in a format that is not very user friendly, or that we need to amend to bring the information into Excel in a logical way. There are a number of string functions we can use in Excel VBA to extract the data correctly from the string that is brought in.
VBA Split Function
If the string comes into Excel as a delimited string (ie separated by commas, semi-colons, etc.), we can easily split the string into the individual values by using the Split function.
For example, say we have this string of names:
“John, Mary, Jack, Fred, Melanie, Steven, Paul, Robert”
Using the split function, we can return these names to Excel individually:
Sub SplitText()
Dim strT As String
Dim strArray() As String
Dim name As Variant
'populate the string with names
strT = "John,Mary,Jack,Fred,Melanie,Steven,Paul,Robert"
'populate the array and indicate the delmiter
strArray = Split(strT, ",")
'loop through each name and display in immediate window
For Each name In strArray
Debug.Print name
Next
End Sub
VBA Left, Right and Mid Functions
We can also extract data from strings by using the Left, Right and Mid functions. They are not as efficient as using the Split function to get multiple values from a string, but if you need to separate a line into specific areas, they can be useful.
For example, say our file name is “C:DataTestFile.xls” . Now this includes the drive, the folder on the drive, the name of the file and the file extension.
To get the drive that the file is stored on we can use:
LEFT(“C:DataTestFile.xls”, 1) – which will return C.
To get the Path including the drive we can use:
LEFT(“C:DataTestFile.xls”, 7) – which will return C:Data.
To get the name of the file only, we can use MID:
MID(“C:DataTestFile.xls”, 9,8) – which will return TestFile
To get the extension of the file we can use:
RIGHT(“C:DataTestFile.xls”, 3)
Sub ExtractData()
Dim strData As String
Dim strLeft As String
Dim strRight As String
Dim strMid As String
'populate the string
strData = "C:DataTestFile.xls"
'break down the name
strLeft = Left(strData, 7)
strMid = Mid(strData, 9, 8)
strRight = Right(strData, 3)
'return the results
MsgBox "The path is " & strLeft & ", the File name is " & strMid & " and the extension is " & strRight
End Sub
The result of which would be:
VBA Replace Function
Another useful string function to manipulate strings in Excel, is the Replace function. This can be used to remove anything from a string and replace it with something else. This is particularly useful if the string that you have brought into Excel has characters that your coding will not recognize, or will mess up your data.
For example:
Consider the following string:
“John””Mary””Jack””Fred””Melanie””Steven””Paul””Robert”””
We can replace the double-quotes with commas using the Replace function.
Sub ExtractData()
Dim StrData As String
StrData = "John""Mary""Jack""Fred""Melanie""Steven""Paul""Robert"""
StrData = Replace(StrData, """", ",")
MsgBox StrData
End Sub
VBA Coding Made Easy
Stop searching for VBA code online. Learn more about AutoMacro — A VBA Code Builder that allows beginners to code procedures from scratch with minimal coding knowledge and with many time-saving features for all users!
Learn More!
I have a macro that send an XMLHTTP request to a server and it gets as response a plain text string, not a JSON format string or other standard formats (at least for what I know).
I would like to parse the output string in order to access the data in an structured approach in the same fashion as the parseJson subroutine in this link
My problem is I am not good with regular expressions and I am not able to modify the routine for my needs.
The string that I need to parse has the following structure:
- The string is a single line
- Each single parameter is defined by its parameter name the equal simbol, its value and ending with; «NID=3;» or «SID=Test;»
- Parameter can be collected in «structures» starts and end with the symbol | and they are identified with their name followed by ; such as |STEST;NID=3;SID=Test;|
- A structure can contain also other structures
An example of a output string is the following
|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|
In this case there is a macro structure KC which contains a structure AD. The structure AD is composed by the parameters PE, PF and 2 structures CD. And finaly the structures CD have the parameters PE and HP
So I would like to parse the string to obtain an Object/Dictionary that reflects this structure, can you help me?
Adds after the first answers
Hi all, thank you for your help, but I think I should make more clear the output that I would like to get.
For the example string that I have, I would like to have an object with the following structure:
<KC>
<AD>
<PE>5</PE>
<PF>3</PF>
<CD>
<PE>5</PE>
<HP>test</HP>
</CD>
<CD>
<PE>3</PE>
<HP>abc</HP>
</CD>
</AD>
</KC>
So I started to wrote a possible working code base on some hint from @Nvj answer and the answer in this link
Option Explicit
Option Base 1
Sub Test()
Dim strContent As String
Dim strState As String
Dim varOutput As Variant
strContent = "|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|"
Call ParseString(strContent, varOutput, strState)
End Sub
Sub ParseString(ByVal strContent As String, varOutput As Variant, strState As String)
' strContent - source string
' varOutput - created object or array to be returned as result
' strState - Object|Array|Error depending on processing to be returned as state
Dim objTokens As Object
Dim lngTokenId As Long
Dim objRegEx As Object
Dim bMatched As Boolean
Set objTokens = CreateObject("Scripting.Dictionary")
lngTokenId = 0
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "|[A-Z]{2};" 'Pattern for the name of structures
Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "str"
.Pattern = "[A-Z]{2}=[^|=;]+;" 'Pattern for parameters name and values
Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "par"
End With
End Sub
Sub Tokenize(objTokens, objRegEx, strContent, lngTokenId, bMatched, strType)
Dim strKey As String
Dim strKeyPar As String
Dim strKeyVal As String
Dim strWork As String
Dim strPar As String
Dim strVal As String
Dim strLevel As String
Dim strRes As String
Dim lngCopyIndex As Long
Dim objMatch As Object
strRes = ""
lngCopyIndex = 1
With objRegEx
For Each objMatch In .Execute(strContent)
If strType = "str" Then
bMatched = True
With objMatch
strWork = Replace(.Value, "|", "")
strWork = Replace(strWork, ";", "")
strLevel = get_Level(strWork)
strKey = "<" & lngTokenId & strLevel & strType & ">"
objTokens(strKey) = strWork
strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
lngCopyIndex = .FirstIndex + .Length + 1
End With
lngTokenId = lngTokenId + 1
ElseIf strType = "par" Then
strKeyPar = "<" & lngTokenId & "par>"
strKeyVal = "<" & lngTokenId & "val>"
strKey = strKeyPar & strKeyVal
bMatched = True
With objMatch
strWork = Replace(.Value, ";", "")
strPar = Split(strWork, "=")(0)
strVal = Split(strWork, "=")(1)
objTokens(strKeyPar) = strPar
objTokens(strKeyVal) = strVal
strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
lngCopyIndex = .FirstIndex + .Length + 1
End With
lngTokenId = lngTokenId + 2
End If
Next
strContent = strRes & Mid(strContent, lngCopyIndex, Len(strContent) - lngCopyIndex + 1)
End With
End Sub
Function get_Level(strInput As String) As String
Select Case strInput
Case "KC"
get_Level = "L1"
Case "AD"
get_Level = "L2"
Case "CD"
get_Level = "L3"
Case Else
MsgBox ("Error")
End
End Select
End Function
This function creates a dictionary with an item for each structure name, parameter name and parameter value as shown in the figure
Thanks to the function get_Level
the items associated to structures have a level that should help to preserve the original hierarchy of the data.
So what I am missing is a function to create an object that has the original structure of the input string. This is what the Retrieve
function do in this answer link, but I do not know how to adapt it to my case
Парсинг текстовой строки (обрезка строки) |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
||||||||
Ответить |
Всем доброго времени суток. Хочу предложить метод парсинга JSON-строки c помощью RegEx для Excel VBA. В отличие от достаточно известного способа преобразования JSON-строки в объект с помощью ScriptControl:
Sub Vulnerability()
' вредоносная JSON-строка, полученная в ответе web-сервера, имеет доступ к файловой системе и многому другому
jsonString = "{a:(function(){(new ActiveXObject('Scripting.FileSystemObject')).CreateTextFile('C:\Test.txt')})()}"
' в данном случае создается файл на диске C:
Set jsonObj = jsonDecode(jsonString)
End Sub
Function jsonDecode(jsonString As Variant)
Set sc = CreateObject("ScriptControl"): sc.Language = "JScript"
Set jsonDecode = sc.Eval("(" + jsonString + ")")
End Function
данный метод не создает уязвимостей системы. Объекты {} представлены Scripting.Dictionary, что позволяет обращаться к их свойствам и методам: .Count, .Items, .Keys, .Exists(), .Item(). Массивы [] являются обычными VB-массивами с индексацией с нуля, поэтому количество элементов можно определить с помощью UBound(). Ниже привожу код с некоторыми примерами использования:
Option Explicit
Sub JsonTest()
Dim strJsonString As String
Dim varJson As Variant
Dim strState As String
Dim varItem As Variant
' преобразование JSON-строки в объект
' корневой элемент может быть объектом {} или массивом []
strJsonString = "{""a"":[{}, 0, ""value"", [{""stuff"":""content""}]], b:null}"
ParseJson strJsonString, varJson, strState
' проверка структуры шаг за шагом
Select Case False ' если хоть одна из проверок неудачна, цепочка прервется
Case IsObject(varJson) ' если корневой JSON-элемент является объектом,
Case varJson.Exists("a") ' имеющим свойство a,
Case IsArray(varJson("a")) ' являющимся массивом
Case UBound(varJson("a")) >= 3 ' не менее чем с 4 элементами,
Case IsArray(varJson("a")(3)) ' и 4-ый элемент - это массив,
Case UBound(varJson("a")(3)) = 0 ' в котором единственный элемент
Case IsObject(varJson("a")(3)(0)) ' является объектом,
Case varJson("a")(3)(0).Exists("stuff") ' имеющим свойство stuff,
Case Else
' тогда вывести значение этого свойства.
MsgBox "Проверка структуры шаг за шагом" & vbCrLf & varJson("a")(3)(0)("stuff")
End Select
' прямой доступ к свойству при известной структуре
MsgBox "Прямой доступ к свойству" & vbCrLf & varJson.Item("a")(3)(0).Item("stuff") ' content
' Обход каждого элемента массива
For Each varItem In varJson("a")
' показать структуру элемента
MsgBox "Структура элемента:" & vbCrLf & BeautifyJson(varItem)
Next
' показать структуру целиком, начиная с корневого элемента
MsgBox "Структура целиком, начиная с корневого элемента:" & vbCrLf & BeautifyJson(varJson)
End Sub
Sub BeautifyTest()
' поместите JSON-строку в файл "desktopsource.json"
' переработанная JSON-строка будет сохранена в файл "desktopresult.json"
Dim strDesktop As String
Dim strJsonString As String
Dim varJson As Variant
Dim strState As String
Dim strResult As String
Dim lngIndent As Long
strDesktop = CreateObject("WScript.Shell").SpecialFolders.Item("Desktop")
strJsonString = ReadTextFile(strDesktop & "source.json", -2)
ParseJson strJsonString, varJson, strState
If strState <> "Error" Then
strResult = BeautifyJson(varJson)
WriteTextFile strResult, strDesktop & "result.json", -1
End If
CreateObject("WScript.Shell").PopUp strState, 1, , 64
End Sub
Sub ParseJson(ByVal strContent As String, varJson As Variant, strState As String)
' strContent - исходная JSON-строка
' varJson - созданный объект или массив, возвращаемый в качестве результата
' strState - строка Object|Array|Error, в зависимости от результата преобразования
Dim objTokens As Object
Dim objRegEx As Object
Dim bMatched As Boolean
Set objTokens = CreateObject("Scripting.Dictionary")
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
' спецификация http://www.json.org/
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = """(?:\""|[^""])*""(?=s*(?:,|:|]|}))"
Tokenize objTokens, objRegEx, strContent, bMatched, "str"
.Pattern = "(?:[+-])?(?:d+.d*|.d+|d+)e(?:[+-])?d+(?=s*(?:,|]|}))"
Tokenize objTokens, objRegEx, strContent, bMatched, "num"
.Pattern = "(?:[+-])?(?:d+.d*|.d+|d+)(?=s*(?:,|]|}))"
Tokenize objTokens, objRegEx, strContent, bMatched, "num"
.Pattern = "b(?:true|false|null)(?=s*(?:,|]|}))"
Tokenize objTokens, objRegEx, strContent, bMatched, "cst"
.Pattern = "b[A-Za-z_]w*(?=s*:)" ' неспецифицированные имена свойств без кавычек
Tokenize objTokens, objRegEx, strContent, bMatched, "nam"
.Pattern = "s"
strContent = .Replace(strContent, "")
.MultiLine = False
Do
bMatched = False
.Pattern = "<d+(?:str|nam)>:<d+(?:str|num|obj|arr|cst)>"
Tokenize objTokens, objRegEx, strContent, bMatched, "prp"
.Pattern = "{(?:<d+prp>(?:,<d+prp>)*)?}"
Tokenize objTokens, objRegEx, strContent, bMatched, "obj"
.Pattern = "[(?:<d+(?:str|num|obj|arr|cst)>(?:,<d+(?:str|num|obj|arr|cst)>)*)?]"
Tokenize objTokens, objRegEx, strContent, bMatched, "arr"
Loop While bMatched
.Pattern = "^<d+(?:obj|arr)>$" ' неспецифицированный массив в качестве корневого элемента
If Not (.Test(strContent) And objTokens.Exists(strContent)) Then
varJson = Null
strState = "Error"
Else
Retrieve objTokens, objRegEx, strContent, varJson
strState = IIf(IsObject(varJson), "Object", "Array")
End If
End With
End Sub
Sub Tokenize(objTokens, objRegEx, strContent, bMatched, strType)
Dim strKey As String
Dim strRes As String
Dim lngCopyIndex As Long
Dim objMatch As Object
strRes = ""
lngCopyIndex = 1
With objRegEx
For Each objMatch In .Execute(strContent)
strKey = "<" & objTokens.Count & strType & ">"
bMatched = True
With objMatch
objTokens(strKey) = .Value
strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
lngCopyIndex = .FirstIndex + .Length + 1
End With
Next
strContent = strRes & Mid(strContent, lngCopyIndex, Len(strContent) - lngCopyIndex + 1)
End With
End Sub
Sub Retrieve(objTokens, objRegEx, strTokenKey, varTransfer)
Dim strContent As String
Dim strType As String
Dim objMatches As Object
Dim objMatch As Object
Dim strName As String
Dim varValue As Variant
Dim objArrayElts As Object
strType = Left(Right(strTokenKey, 4), 3)
strContent = objTokens(strTokenKey)
With objRegEx
.Global = True
Select Case strType
Case "obj"
.Pattern = "<d+w{3}>"
Set objMatches = .Execute(strContent)
Set varTransfer = CreateObject("Scripting.Dictionary")
For Each objMatch In objMatches
Retrieve objTokens, objRegEx, objMatch.Value, varTransfer
Next
Case "prp"
.Pattern = "<d+w{3}>"
Set objMatches = .Execute(strContent)
Retrieve objTokens, objRegEx, objMatches(0).Value, strName
Retrieve objTokens, objRegEx, objMatches(1).Value, varValue
If IsObject(varValue) Then
Set varTransfer(strName) = varValue
Else
varTransfer(strName) = varValue
End If
Case "arr"
.Pattern = "<d+w{3}>"
Set objMatches = .Execute(strContent)
Set objArrayElts = CreateObject("Scripting.Dictionary")
For Each objMatch In objMatches
Retrieve objTokens, objRegEx, objMatch.Value, varValue
If IsObject(varValue) Then
Set objArrayElts(objArrayElts.Count) = varValue
Else
objArrayElts(objArrayElts.Count) = varValue
End If
varTransfer = objArrayElts.Items
Next
Case "nam"
varTransfer = strContent
Case "str"
varTransfer = Mid(strContent, 2, Len(strContent) - 2)
varTransfer = Replace(varTransfer, """", """")
varTransfer = Replace(varTransfer, "\", "")
varTransfer = Replace(varTransfer, "/", "/")
varTransfer = Replace(varTransfer, "b", Chr(8))
varTransfer = Replace(varTransfer, "f", Chr(12))
varTransfer = Replace(varTransfer, "n", vbLf)
varTransfer = Replace(varTransfer, "r", vbCr)
varTransfer = Replace(varTransfer, "t", vbTab)
.Global = False
.Pattern = "\u[0-9a-fA-F]{4}"
Do While .Test(varTransfer)
varTransfer = .Replace(varTransfer, ChrW(("&H" & Right(.Execute(varTransfer)(0).Value, 4)) * 1))
Loop
Case "num"
varTransfer = Evaluate(strContent)
Case "cst"
Select Case LCase(strContent)
Case "true"
varTransfer = True
Case "false"
varTransfer = False
Case "null"
varTransfer = Null
End Select
End Select
End With
End Sub
Function BeautifyJson(varJson As Variant) As String
Dim strResult As String
Dim lngIndent As Long
BeautifyJson = ""
lngIndent = 0
BeautyTraverse BeautifyJson, lngIndent, varJson, vbTab, 1
End Function
Sub BeautyTraverse(strResult As String, lngIndent As Long, varElement As Variant, strIndent As String, lngStep As Long)
Dim arrKeys() As Variant
Dim lngIndex As Long
Dim strTemp As String
Select Case VarType(varElement)
Case vbObject
If varElement.Count = 0 Then
strResult = strResult & "{}"
Else
strResult = strResult & "{" & vbCrLf
lngIndent = lngIndent + lngStep
arrKeys = varElement.Keys
For lngIndex = 0 To UBound(arrKeys)
strResult = strResult & String(lngIndent, strIndent) & """" & arrKeys(lngIndex) & """" & ": "
BeautyTraverse strResult, lngIndent, varElement(arrKeys(lngIndex)), strIndent, lngStep
If Not (lngIndex = UBound(arrKeys)) Then strResult = strResult & ","
strResult = strResult & vbCrLf
Next
lngIndent = lngIndent - lngStep
strResult = strResult & String(lngIndent, strIndent) & "}"
End If
Case Is >= vbArray
If UBound(varElement) = -1 Then
strResult = strResult & "[]"
Else
strResult = strResult & "[" & vbCrLf
lngIndent = lngIndent + lngStep
For lngIndex = 0 To UBound(varElement)
strResult = strResult & String(lngIndent, strIndent)
BeautyTraverse strResult, lngIndent, varElement(lngIndex), strIndent, lngStep
If Not (lngIndex = UBound(varElement)) Then strResult = strResult & ","
strResult = strResult & vbCrLf
Next
lngIndent = lngIndent - lngStep
strResult = strResult & String(lngIndent, strIndent) & "]"
End If
Case vbInteger, vbLong, vbSingle, vbDouble
strResult = strResult & varElement
Case vbNull
strResult = strResult & "Null"
Case vbBoolean
strResult = strResult & IIf(varElement, "True", "False")
Case Else
strTemp = Replace(varElement, """", """")
strTemp = Replace(strTemp, "", "\")
strTemp = Replace(strTemp, "/", "/")
strTemp = Replace(strTemp, Chr(8), "b")
strTemp = Replace(strTemp, Chr(12), "f")
strTemp = Replace(strTemp, vbLf, "n")
strTemp = Replace(strTemp, vbCr, "r")
strTemp = Replace(strTemp, vbTab, "t")
strResult = strResult & """" & strTemp & """"
End Select
End Sub
Function ReadTextFile(strPath As String, lngFormat As Long) As String
' lngFormat -2 - System default, -1 - Unicode, 0 - ASCII
With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 1, False, lngFormat)
ReadTextFile = ""
If Not .AtEndOfStream Then ReadTextFile = .ReadAll
.Close
End With
End Function
Sub WriteTextFile(strContent As String, strPath As String, lngFormat As Long)
With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 2, True, lngFormat)
.Write (strContent)
.Close
End With
End Sub