Microsoft Word 2003 XML

WordProcessingML

  • Filename extension: .XML (XML document)
  • Developed by: Microsoft
  • Type of format: Document file format
  • Extended from: XML, DOC

DataDiagramingML

  • Filename extensions: .VDX (XML Drawing), .VSX (XML Stencil), .VTX (XML Template)
  • Developed by: Microsoft
  • Type of format: Diagramming vector graphics
  • Extended from: XML, VSD, VSS, VST

SpreadsheetML

  • Filename extension: .XML (XML Spreadsheet)
  • Developed by: Microsoft
  • Type of format: Spreadsheet
  • Extended from: XML, XLS

The Microsoft Office XML formats are XML-based document formats (or XML schemas) introduced in versions of Microsoft Office prior to Office 2007. Microsoft Office XP introduced a new XML format for storing Excel spreadsheets and Office 2003 added an XML-based format for Word documents.

These formats were succeeded by Office Open XML (ECMA-376) in Microsoft Office 2007.

File formats

  • Microsoft Office Word 2003 XML Format — WordProcessingML or WordML (.XML)
  • Microsoft Office Excel 2002 and Excel 2003 XML Format — SpreadsheetML (.XML)
  • Microsoft Office Visio 2003 XML Format — DataDiagramingML (.VDX, .VSX, .VTX)
  • Microsoft Office InfoPath 2003 XML Format — XML FormTemplate (.XSN) (Compressed XML templates in a Cabinet file)

Limitations and differences with Office Open XML

Besides differences in the schema, there are several other differences between the earlier Office XML schema formats and Office Open XML.

  • Office Open XML documents are stored as multiple parts compressed in a ZIP file that conforms to the Open Packaging Conventions, so the header, footer, comments and similar parts of a document are each stored separately. Microsoft Office XML formats are instead stored as a single, monolithic, uncompressed XML file, which makes them quite large compared with OOXML and the legacy binary Microsoft Office formats. Embedded items such as pictures are stored as binary-encoded blocks within the XML.
  • XML Spreadsheet documents cannot store Visual Basic for Applications macros, auditing tracer arrows, chart and other graphic objects, custom views, drawing object layers, outlining, scenarios, shared workbook information and user-defined function categories.[1] In contrast, the newer Office Open XML formats support full document fidelity.
  • The formats have poor backward compatibility with versions of Word and Excel older than the one in which they were introduced. For example, Word 2002 cannot open Word 2003 XML files unless a third-party converter add-in is installed.[2] Microsoft has released a Word 2003 XML Viewer, which allows WordProcessingML files saved by Word 2003 to be viewed as HTML from within Internet Explorer.[3] For Office Open XML, Microsoft provides converters for Office 2003, Office XP and Office 2000.
  • Office Open XML formats are also defined for PowerPoint 2007, equation editing (Office MathML), vector drawing, charts and text art (DrawingML).
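
The packaging difference described in the first bullet can be observed directly. The following is a minimal Python sketch, not part of the original article: it assumes the file contents are already in memory, and the function name classify_office_file and its return labels are hypothetical. It only checks for a ZIP container and then inspects the root element of a flat XML file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Root element names (in ElementTree's {namespace}tag form) taken from the
# examples in this document.
WORD_2003_ROOT = "{http://schemas.microsoft.com/office/word/2003/wordml}wordDocument"
EXCEL_2003_ROOT = "{urn:schemas-microsoft-com:office:spreadsheet}Workbook"


def classify_office_file(data: bytes) -> str:
    """Return a rough label for an Office document held in memory."""
    # Office Open XML files are ZIP packages (Open Packaging Conventions).
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip-package"
    # Office 2003 XML files are one flat XML document: inspect the root.
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "unknown"
    if root.tag == WORD_2003_ROOT:
        return "word-2003-xml"
    if root.tag == EXCEL_2003_ROOT:
        return "excel-2003-xml"
    return "other-xml"


flat = (b'<?xml version="1.0"?>'
        b'<w:wordDocument '
        b'xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"/>')
print(classify_office_file(flat))
```

A ZIP check alone does not prove that a file is Office Open XML; it only separates packaged files from the flat, single-file formats discussed here.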

Word XML format example

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
   xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
   xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
   xmlns:o="urn:schemas-microsoft-com:office:office"
   w:macrosPresent="no"
   w:embeddedObjPresent="no"
   w:ocxPresent="no"
   xml:space="preserve">
  <o:DocumentProperties>
    <o:Title>This is the title</o:Title>
    <o:Author>Darl McBride</o:Author>
    <o:LastAuthor>Bill Gates</o:LastAuthor>
    <o:Revision>1</o:Revision>
    <o:TotalTime>0</o:TotalTime>
    <o:Created>2007-03-15T23:05:00Z</o:Created>
    <o:LastSaved>2007-03-15T23:05:00Z</o:LastSaved>
    <o:Pages>1</o:Pages>
    <o:Words>6</o:Words>
    <o:Characters>40</o:Characters>
    <o:Company>SCO Group, Inc.</o:Company>
    <o:Lines>1</o:Lines>
    <o:Paragraphs>1</o:Paragraphs>
    <o:CharactersWithSpaces>45</o:CharactersWithSpaces>
    <o:Version>11.6359</o:Version>
  </o:DocumentProperties>
  <w:fonts>
    <w:defaultFonts
       w:ascii="Times New Roman"
       w:fareast="Times New Roman"
       w:h-ansi="Times New Roman"
       w:cs="Times New Roman" />
  </w:fonts>

  <w:styles>
    <w:versionOfBuiltInStylenames w:val="4" />
    <w:latentStyles w:defLockedState="off" w:latentStyleCount="156" />
    <w:style w:type="paragraph" w:default="on" w:styleId="Normal">
      <w:name w:val="Normal" />
      <w:rPr>
        <wx:font wx:val="Times New Roman" />
        <w:sz w:val="24" />
        <w:sz-cs w:val="24" />
        <w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA" />
      </w:rPr>
    </w:style>
    <w:style w:type="paragraph" w:styleId="Heading1">
      <w:name w:val="heading 1" />
      <wx:uiName wx:val="Heading 1" />
      <w:basedOn w:val="Normal" />
      <w:next w:val="Normal" />
      <w:rsid w:val="00D93B94" />
      <w:pPr>
        <w:pStyle w:val="Heading1" />
        <w:keepNext />
        <w:spacing w:before="240" w:after="60" />
        <w:outlineLvl w:val="0" />
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Arial" w:h-ansi="Arial" w:cs="Arial" />
        <wx:font wx:val="Arial" />
        <w:b />
        <w:b-cs />
        <w:kern w:val="32" />
        <w:sz w:val="32" />
        <w:sz-cs w:val="32" />
      </w:rPr>
    </w:style>
    <w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont">
      <w:name w:val="Default Paragraph Font" />
      <w:semiHidden />
    </w:style>
    <w:style w:type="table" w:default="on" w:styleId="TableNormal">
      <w:name w:val="Normal Table" />
      <wx:uiName wx:val="Table Normal" />
      <w:semiHidden />
      <w:rPr>
        <wx:font wx:val="Times New Roman" />
      </w:rPr>
      <w:tblPr>
        <w:tblInd w:w="0" w:type="dxa" />
        <w:tblCellMar>
          <w:top w:w="0" w:type="dxa" />
          <w:left w:w="108" w:type="dxa" />
          <w:bottom w:w="0" w:type="dxa" />
          <w:right w:w="108" w:type="dxa" />
        </w:tblCellMar>
      </w:tblPr>
    </w:style>
    <w:style w:type="list" w:default="on" w:styleId="NoList">
      <w:name w:val="No List" />
      <w:semiHidden />
    </w:style>
  </w:styles>
  <w:docPr>
    <w:view w:val="print" />
    <w:zoom w:percent="100" />
    <w:doNotEmbedSystemFonts />
    <w:proofState w:spelling="clean" w:grammar="clean" />
    <w:attachedTemplate w:val="" />
    <w:defaultTabStop w:val="720" />
    <w:punctuationKerning />
    <w:characterSpacingControl w:val="DontCompress" />
    <w:optimizeForBrowser />
    <w:validateAgainstSchema />
    <w:saveInvalidXML w:val="off" />
    <w:ignoreMixedContent w:val="off" />
    <w:alwaysShowPlaceholderText w:val="off" />
    <w:compat>
      <w:breakWrappedTables />
      <w:snapToGridInCell />
      <w:wrapTextWithPunct />
      <w:useAsianBreakRules />
      <w:dontGrowAutofit />
    </w:compat>
  </w:docPr>
  <w:body>
    <wx:sect>
      <w:p>
        <w:r>
          <w:t>This is the first paragraph</w:t>
        </w:r>
      </w:p>
      <wx:sub-section>
        <w:p>
          <w:pPr>
            <w:pStyle w:val="Heading1" />
          </w:pPr>
          <w:r>
            <w:t>This is a heading</w:t>
          </w:r>
        </w:p>
        <w:sectPr>
          <w:pgSz w:w="12240" w:h="15840" />
          <w:pgMar w:top="1440"
                   w:right="1800"
                   w:bottom="1440"
                   w:left="1800"
                   w:header="720"
                   w:footer="720"
                   w:gutter="0" />
          <w:cols w:space="720" />
          <w:docGrid w:line-pitch="360" />
        </w:sectPr>
      </wx:sub-section>
    </wx:sect>
  </w:body>
</w:wordDocument>
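
Because the whole document is one XML file, its text can be read with any namespace-aware XML parser. The following Python sketch uses the standard library's ElementTree on a shortened copy of the example above; the function name paragraphs is hypothetical, and the sketch ignores tables, fields and other content that a real document may contain.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}
W = "{%s}" % NS["w"]

# Shortened copy of the Word example: only the body is kept.
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body>
    <wx:sect>
      <w:p><w:r><w:t>This is the first paragraph</w:t></w:r></w:p>
      <wx:sub-section>
        <w:p>
          <w:pPr><w:pStyle w:val="Heading1" /></w:pPr>
          <w:r><w:t>This is a heading</w:t></w:r>
        </w:p>
      </wx:sub-section>
    </wx:sect>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_text):
    """Return (style_id, text) for every w:p in the body, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iterfind(".//w:body//w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        # Attributes are namespace-qualified too, hence the W prefix.
        style_id = style.get(W + "val") if style is not None else None
        # A paragraph's text is spread over one or more runs (w:r/w:t).
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        result.append((style_id, text))
    return result


for style_id, text in paragraphs(SAMPLE):
    print(style_id, text)
```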

Excel XML spreadsheet example

<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="https://www.w3.org/TR/html401/">
<Worksheet ss:Name="CognaLearn+Intedashboard">
<Table>
<Column ss:Index="1" ss:AutoFitWidth="0" ss:Width="110"/>
<Row>
<Cell><Data ss:Type="String">ID</Data></Cell>
<Cell><Data ss:Type="String">Project</Data></Cell>
<Cell><Data ss:Type="String">Reporter</Data></Cell>
<Cell><Data ss:Type="String">Assigned To</Data></Cell>
<Cell><Data ss:Type="String">Priority</Data></Cell>
<Cell><Data ss:Type="String">Severity</Data></Cell>
<Cell><Data ss:Type="String">Reproducibility</Data></Cell>
<Cell><Data ss:Type="String">Product Version</Data></Cell>
<Cell><Data ss:Type="String">Category</Data></Cell>
<Cell><Data ss:Type="String">Date Submitted</Data></Cell>
<Cell><Data ss:Type="String">OS</Data></Cell>
<Cell><Data ss:Type="String">OS Version</Data></Cell>
<Cell><Data ss:Type="String">Platform</Data></Cell>
<Cell><Data ss:Type="String">View Status</Data></Cell>
<Cell><Data ss:Type="String">Updated</Data></Cell>
<Cell><Data ss:Type="String">Summary</Data></Cell>
<Cell><Data ss:Type="String">Status</Data></Cell>
<Cell><Data ss:Type="String">Resolution</Data></Cell>
<Cell><Data ss:Type="String">Fixed in Version</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">0000033</Data></Cell>
<Cell><Data ss:Type="String">CognaLearn Intedashboard</Data></Cell>
<Cell><Data ss:Type="String">janardhana.l</Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
<Cell><Data ss:Type="String">normal</Data></Cell>
<Cell><Data ss:Type="String">text</Data></Cell>
<Cell><Data ss:Type="String">always</Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
<Cell><Data ss:Type="String">GUI</Data></Cell>
<Cell><Data ss:Type="String">2016-10-14</Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
<Cell><Data ss:Type="String">public</Data></Cell>
<Cell><Data ss:Type="String">2016-10-14</Data></Cell>
<Cell><Data ss:Type="String">IE8 browser_Modules screen tool tip text is shown twice</Data></Cell>
<Cell><Data ss:Type="String">new</Data></Cell>
<Cell><Data ss:Type="String">open</Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
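
A spreadsheet in this format can be read the same way. The sketch below is a simplified illustration in Python, with a hypothetical function name read_rows and a two-column extract of the example above. It converts cells typed as Number to floats and leaves everything else as text; it does not handle the ss:Index attribute, which can be used to skip cells, or multiple worksheets.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Two-column extract of the Excel example above.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Issues">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">ID</Data></Cell>
        <Cell><Data ss:Type="String">Project</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="Number">0000033</Data></Cell>
        <Cell><Data ss:Type="String">CognaLearn Intedashboard</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def read_rows(xml_text):
    """Return the worksheet rows as lists of Python values."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iterfind(".//ss:Worksheet/ss:Table/ss:Row", NS):
        values = []
        for cell in row.iterfind("ss:Cell", NS):
            data = cell.find("ss:Data", NS)
            if data is None:
                values.append(None)  # cell without a Data element
                continue
            text = data.text or ""
            if data.get("{%s}Type" % SS) == "Number":
                values.append(float(text))
            else:
                values.append(text)
        rows.append(values)
    return rows


for row in read_rows(SAMPLE):
    print(row)
```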

See also

  • List of document markup languages
  • Comparison of document markup languages

References

  1. "Features and limitations of XML Spreadsheet format" (broken link). Archived from the original on 2007-10-09. Retrieved 2007-11-01.
  2. "Polar WordML add-in" (broken link). Archived from the original on 2009-04-11. Retrieved 2007-11-01.
  3. Word 2003 XML Viewer
  • Overview of Office 2003 Developer Technologies
  • Office 2003 XML. ISBN 0-596-00538-5

External links

  • MSDN: XML Spreadsheet Reference
  • MSDN: Word 2003 XML Reference
  • Lawsuit about XML patent

One of Microsoft Office 2003’s most significant new features is the integration of XML technology.

The XML features of Word 2003 are a great way to ensure that you can always get to the information stored within documents. This article focuses on taking advantage of Word 2003’s XML features from within your applications.

The .doc file format that is still present in Word 2003 is essentially a proprietary binary format; sadly, .doc files are difficult to extract information from. By saving documents in the new XML format, you can easily retrieve information trapped inside of Word 2003 documents by using little more than XPath queries.

New features included in Word 2003 also allow you to force users into entering data into an XML document without their knowledge! Essentially, you can annotate a document with an XML schema and then protect the document, only allowing the user to add or edit information in specific locations throughout the document. This way, when the user saves the document, the data is written directly to an XML document, allowing it to be easily consumed by another application or a database.

Another cool idea for using XML with Word 2003 documents is the ability to transform XML into other formats. As of this writing, there is an XSLT provided by Microsoft that takes a Word 2003 XML document and transforms it into an HTML document for viewing in a Web browser. Of course, my first reaction to this was “What good is that? I can save a document as HTML, right?” Then I realized that I have complete control over this transformation by designing my own XSLT, unlike the Save as HTML functionality from previous versions.

But these ideas are outside the topic of this article, which is focused on the ability to manipulate a Word 2003 document (saved as XML) from within code. Before Word 2003, all you could effectively do was to either use automation or to be really handy with the RTF format (and open the RTF using Word). With the ability of Word 2003 to both save as and read from XML, you can create sophisticated Word 2003 documents by processing and manipulating XML.

If you’re not sure why you might try something like this, here are a few ideas:

  • You can create documents from data within an application, such as form letters.
  • You can send Word 2003 documents to a client workstation over the Internet as XML and have it correctly interpreted at the client workstation as a Word 2003 document.
  • You can return Word 2003 documents from Web services.

So, to get a better feel for how this may benefit your own applications, let’s walk through the creation of a Word 2003 template, save it as XML, and then manipulate the document (using data provided by a user) to produce a final document for use in the application.

Creating a Schema

The first step in this process is to create a schema for the data that you can insert into the Word 2003 document template. Although you don’t actually need to have a schema, it’s a bit easier to work with the document if you apply a schema to it. Without the schema, you’d have to use a feature like bookmarks, which are rendered like the following XML snippet:

<aml:annotation aml:id="0" 
    w:type="Word.Bookmark.Start" 
    w:name="ContactName"/>
<w:p>
  <w:r>
    <w:t>[ContactName]</w:t>
  </w:r>
<aml:annotation aml:id="0" 
    w:type="Word.Bookmark.End"/>
</w:p>

Notice how the bookmark, named ContactName in this example, is delimited by two empty annotation elements. The only thing that distinguishes these elements is the type attribute values of Word.Bookmark.Start and Word.Bookmark.End. This is slightly more complex than applying a schema to the document, which produces the XML in the following snippet:

<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[ContactName]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>

Because I’m starting from scratch, the schema approach seems to be a slightly easier way to go. But I can imagine a situation where you are migrating your approach from an earlier version of Word and where your documents are marked up with bookmarks. As you can see, it’s still possible to use the bookmarks, just a tiny bit more work than using an attached schema.
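
To illustrate the extra work that bookmarks involve, here is a rough Python sketch that collects the text lying between a bookmark's start and end markers. It is not the article's code. The aml namespace URI is an assumption (it is the one Word 2003 normally declares, but the snippets above do not show it), the wrapper element in the sample exists only to make the fragment parseable, and the function name bookmark_text is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/wordml}"
# Assumption: namespace URI that Word 2003 declares for the aml prefix.
AML = "{http://schemas.microsoft.com/aml/2001/core}"

SAMPLE = """<w:body
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core">
  <w:p><w:r><w:t>Dear </w:t></w:r></w:p>
  <aml:annotation aml:id="0" w:type="Word.Bookmark.Start"
      w:name="ContactName"/>
  <w:p>
    <w:r><w:t>[ContactName]</w:t></w:r>
    <aml:annotation aml:id="0" w:type="Word.Bookmark.End"/>
  </w:p>
</w:body>"""


def bookmark_text(root):
    """Map each bookmark name to the text between its start and end markers."""
    open_marks = {}  # aml:id -> bookmark name, for bookmarks currently open
    found = {}
    for el in root.iter():  # iter() walks the tree in document order
        if el.tag == AML + "annotation":
            kind = el.get(W + "type")
            mark_id = el.get(AML + "id")
            if kind == "Word.Bookmark.Start":
                name = el.get(W + "name")
                open_marks[mark_id] = name
                found[name] = ""
            elif kind == "Word.Bookmark.End":
                # The end marker carries no name, only the matching id.
                open_marks.pop(mark_id, None)
        elif el.tag == W + "t":
            for name in open_marks.values():
                found[name] += el.text or ""
    return found


print(bookmark_text(ET.fromstring(SAMPLE)))
```

With an attached schema the same lookup is a single query for the ns0:ContactName element, which is why the schema approach is slightly less work.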

For example, using the Northwind Customers table from SQL Server, I’ve created a very simple schema that is listed in its entirety in Listing 1.

This simple schema points out another advantage to using a schema-based approach: Word 2003 enforces the restrictions defined in the schema for the document. Any violations appear as errors in Word 2003’s task pane feature, but you can also validate the document against the schema with any XML validation tool.
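
As a rough illustration of what such a restriction check involves, the Python sketch below reads the maxLength facets out of a schema shaped like Listing 1 and reports values that exceed them. Python's standard library has no XML Schema validator, so this hand-rolled check covers only that one facet and is not a substitute for a real validation tool; the function names max_lengths and violations are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Two elements in the same shape as the schema in Listing 1.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://schemas.eps-software.com/NWindTest">
  <xs:element name="City">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="15" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="ContactName">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="30" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def max_lengths(xsd_text):
    """Return {element name: maxLength} for top-level schema elements."""
    root = ET.fromstring(xsd_text)
    limits = {}
    for el in root.iterfind("xs:element", XS):
        facet = el.find("xs:simpleType/xs:restriction/xs:maxLength", XS)
        if facet is not None:
            limits[el.get("name")] = int(facet.get("value"))
    return limits


def violations(limits, values):
    """Return the names whose text is longer than the schema allows."""
    return [name for name, text in values.items()
            if name in limits and len(text) > limits[name]]


limits = max_lengths(SCHEMA)
print(violations(limits, {"City": "Anchorage", "ContactName": "x" * 31}))
```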

The schema that you create can be as simple or as complex as you like. What is important is how to mark up the Word 2003 document with this schema so that you get the desired XML output from your application.

Making a Word 2003 Template

With the schema out of the way, let’s see how to apply it to a Word 2003 document. Start by creating or opening a document in Word 2003 with the desired boilerplate text. You may wish to highlight or somehow mark the locations for XML placeholders in your document so you can find them easily when it comes time to edit the document. My convention is to write the node names into the text of the document, and surround them with square brackets (e.g., [ContactName]). These become the placeholders for the schema elements in the document.

To apply a schema, open the Tools menu and select the Templates and Add-ins… option. This opens the dialog box where you can manage the XML schemas that can be applied to Word 2003 documents. Select the XML Schema page to view the current list of attached schemas. If the list is blank, or the desired schema is not listed, click the Add Schema… button. After adding a schema, you are prompted to provide an alias for the schema, simply to make it easier to reference because the namespace is usually long and difficult to read. Once you’ve added your schema and provided the alias, it appears in the list on the XML Schema page. Enable the checkbox next to the desired schema, and then close the dialog box.

Once you press the OK button on the Templates dialog box, Word 2003 automatically displays the XML Structure task pane. If it doesn’t, you can press Ctrl+F1 to make the task pane appear, and then select the XML Structure page from the drop-down list at the top of the pane.

Now that a schema has been attached, you can apply the elements from the schema to the document. Depending upon how your schema is constructed, you may or may not see any elements in the lower part of the XML Structure task pane. In the example schema, because there is no parent element, all of the nodes initially appear in the list.

To apply the elements, select an area of your document (it doesn’t have to contain any text) and then choose one of the available elements to apply. When selecting the first element to apply, Word 2003 prompts you to define how you wish to apply this first element, either to the entire document or only to what you have selected. I’ve gotten into the habit of always applying the elements to the selection, as that seems to be what I’d want in most situations anyway. Continue to highlight text and apply the elements as desired.

After making your selections and applying the schema, you may or may not see much of a difference in your document. This depends on whether or not you have selected the Show XML tags in the document checkbox in the XML Structure task pane. With this option selected, you’ll see the start and end tags graphically represented in your Word 2003 document, as shown in Figure 1.

Figure 1: A Word 2003 document that has been marked up with the schema from Listing 1 looks like this.

Now that you’ve applied the schema to your document, save it as an XML file so that you can parse it with your application code. To do this, start by choosing the Save As… option from the File menu. In the Save As Type drop-down list, choose XML Document (*.xml). You will then see some additional controls to the right of the drop-down list that are specific to the XML format, as shown in Figure 2.

Figure 2: Choose the XML format from the lower portion of the Save As… dialog box showing the XML options.

None of the checkboxes should be selected for this example, as you do not want to apply a transform or save only the data without the tags. This ensures that all of the information you have entered into your document is written out to XML.

Tips for Saving as XML

To make things a little cleaner in the XML output, you will want to ensure that you either spell everything correctly (not very likely if you use my naming convention for the placeholder text) or that you ignore any spelling errors flagged by Word 2003. If you leave in something that the Word 2003 spelling checker doesn’t like, the resultant XML looks similar to the following snippet:

<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[</w:t>
    </w:r>
    <w:proofErr w:type="spellStart" />
    <w:r>
      <w:t>ConyactName</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd" />
    <w:r>
      <w:t>]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>

As you can see, with the proofing errors, this changes the expected XML, because Word 2003 has embedded some proofErr elements. Once you handle the spelling errors (e.g., right-click the error in the document and choose “Ignore All”), the XML appears as shown in this snippet:

<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[ContactName]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>

Also, be aware of where your paragraph marks appear in relation to your applied schema elements. In the snippet shown above, the [ContactName] text appears on a line all by itself. This places a paragraph element (the w:p element) completely within the ContactName element.

If, on the other hand, you placed ContactName on the same line as some other text or another element, the paragraph element won’t appear within the ContactName element but outside of it. Because my document contains both of these examples, the code will have to handle both situations appropriately.
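
Both effects described in this section, runs split by proofErr elements and the varying position of the paragraph element, can be seen by reading placeholder text back out of the XML. The following Python sketch is an illustration only: the wrapper element and sample text are made up, the function name placeholder_text is hypothetical, and the ns0 namespace URI is the target namespace from Listing 1.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/wordml}"
NS0 = "{http://schemas.eps-software.com/NWindTest}"

SAMPLE = """<w:body
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:ns0="http://schemas.eps-software.com/NWindTest">
  <ns0:ContactName>
    <w:p>
      <w:r><w:t>[</w:t></w:r>
      <w:proofErr w:type="spellStart" />
      <w:r><w:t>ConyactName</w:t></w:r>
      <w:proofErr w:type="spellEnd" />
      <w:r><w:t>]</w:t></w:r>
    </w:p>
  </ns0:ContactName>
  <w:p>
    <w:r><w:t>Dear </w:t></w:r>
    <ns0:ContactName><w:r><w:t>[ContactName]</w:t></w:r></ns0:ContactName>
  </w:p>
</w:body>"""


def placeholder_text(element):
    """Join every w:t below the element, however many runs it was split into."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


root = ET.fromstring(SAMPLE)
for el in root.iter(NS0 + "ContactName"):
    # The first element wraps a whole paragraph; the second sits inside one.
    wraps_paragraph = el.find(W + "p") is not None
    print(wraps_paragraph, placeholder_text(el))
```

Reading the text is easy in either layout. Replacing it is harder when the placeholder has been split into three runs, which is why clearing the spelling flags before saving keeps the later processing simple.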

Opening the XML File

Now that you’ve saved the document as XML, you can see the document on your hard drive with its XML extension. When you double-click it, it opens up within Word 2003, not in your associated program for XML files (which is, by default, Microsoft Internet Explorer). This is because there is a processing instruction at the top of the XML document that declares the ProgID to use when opening this XML file, as shown in this snippet:

<?xml version="1.0" encoding="UTF-8" 
  standalone="yes" ?>
<?mso-application 
  progid="Word.Document"?>
<w:wordDocument . . .

If you comment out the second line of this document and then save it, you no longer launch Word 2003 when double-clicking the XML file. I found this useful during testing so that I could quickly view the XML produced by saving the Word 2003 document as XML.
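
The processing instruction can also be inspected or dropped programmatically. The Python sketch below uses xml.dom.minidom, which keeps top-level processing instructions as child nodes of the document. The function names are hypothetical, and removing the instruction is used here as a stand-in for commenting it out by hand.

```python
import re
from xml.dom import minidom

SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    b'<?mso-application progid="Word.Document"?>\n'
    b'<w:wordDocument '
    b'xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"/>'
)


def _is_mso_pi(node):
    return (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == "mso-application")


def mso_progid(xml_bytes):
    """Return the progid named by the mso-application instruction, if any."""
    doc = minidom.parseString(xml_bytes)
    for node in doc.childNodes:
        if _is_mso_pi(node):
            match = re.search(r'progid="([^"]+)"', node.data)
            return match.group(1) if match else None
    return None


def without_mso_pi(xml_bytes):
    """Return the document re-serialized without the instruction."""
    doc = minidom.parseString(xml_bytes)
    for node in list(doc.childNodes):
        if _is_mso_pi(node):
            doc.removeChild(node)
    return doc.toxml(encoding="utf-8")


print(mso_progid(SAMPLE))
```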

Creating the Output

Now that the template has been defined and annotated as desired, you can write a small program to read data from an XML file and merge this data with the template. For this example, I’ve used a console application (as I don’t need a GUI) and chose Visual Basic .NET as the language.

First, look at the XML data in Listing 2 that I’ll merge with the document. It contains a single record from the Northwind database on SQL Server.

To make the example easier, I’ve saved this as a file called NWData.xml. In the real world, I’d probably capture the desired data in a Web page or Windows application and then retrieve the data from a database instead of a disk file.

There are more elements in this XML file compared to what I’ve applied in the Word 2003 document. That means I’ll have to be certain to skip these elements when processing the file; perhaps they’ll be added to other document templates in the future.

The code (the complete listing is shown in Listing 3) uses the XmlDocument class from the .NET Framework to do the bulk of the work. The code starts by loading both the data file and the Word 2003 template file into separate XML DOM objects. The Word 2003 document (saved as XML) is loaded through a method of a class instantiated as the oProcess object.

Dim oProcess As New WordXMLTest
Dim sDocPath As String
Dim sDataPath As String
Dim sSaveFile As String

sDocPath = "sample2.xml"
sDataPath = "NWdata.xml"
sSaveFile = "OutFile.xml"

Try
  'load the WordXML into a DOM
  oProcess.LoadFile(sDocPath)

  'load data into DOM 
  xmlDataDoc.Load(sDataPath)

Next, select the nodes from the data document with a simple XPath query, and iterate through them with a For-Next loop. Note that this code only assumes that a single customer record exists in the XML file. If there are multiple customers, add another outer loop to iterate through each customer record.

'iterate through data nodes
xmlNodes = xmlDataDoc.SelectNodes( _ 
          "/results/customers/*")
'replace Word doc area with data 
If Not xmlNodes Is Nothing Then
  For i = 0 To xmlNodes.Count - 1
    xmlNode = xmlNodes(i)
    sNodeName = xmlNode.Name
    sNewText = xmlNode.InnerText

    oProcess.ProcessNodes( _ 
        sNodeName, sNewText)
  Next
End If
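
The outer loop needed for several customer records is not shown in the article. As a hedged sketch of the idea, written in Python rather than VB.NET, the data file could be read into one dictionary per customers element; the function name customer_records and the second sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same shape as Listing 2, shortened, with a made-up second record.
DATA = """<results>
  <customers>
    <CustomerID>OLDWO</CustomerID>
    <ContactName>Rene Phillips</ContactName>
    <City>Anchorage</City>
  </customers>
  <customers>
    <CustomerID>EXMPL</CustomerID>
    <ContactName>Example Contact</ContactName>
    <City>Example City</City>
  </customers>
</results>"""


def customer_records(xml_text):
    """Return one {element name: text} dictionary per customers element."""
    root = ET.fromstring(xml_text)  # the root element is <results>
    records = []
    for customer in root.iterfind("customers"):  # the outer loop
        # The inner loop visits every child, like "/results/customers/*".
        records.append({child.tag: child.text or "" for child in customer})
    return records


for record in customer_records(DATA):
    print(record["CustomerID"], record["ContactName"])
```

Each dictionary would then be merged into a fresh copy of the template, producing one output document per customer.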

For the ProcessNodes method, the desired node name and new text are passed as parameters. A separate method is used because in my template, I have the ContactName element in two locations within the document. I want to ensure that both of these locations are replaced with the same name.

So, in the ProcessNodes method, the specified node name is used to create XPath queries to retrieve lists of matching nodes. Then each query is executed with the SelectNodes method on the Word 2003 XML DOM object, oXMLWordDoc.

Public Sub ProcessNodes( _ 
       ByVal sNodeName As String, _
       ByVal sNodeValue As String)
'replace the node(s) in the document
'with the specified value
Dim oNodeList As XmlNodeList

'get nodes that have 
'embedded paragraph marks
oNodeList = _ 
  oXMLWordDoc.SelectNodes( _ 
    "//ns0:" + sNodeName + "//w:p", _ 
    oNSMgr)

The interesting part of the code is the XPath queries. There are two of them, to ensure that you catch every node with the specified name: some schema elements wrap an entire paragraph, while others are embedded within a paragraph, so one query accounts for each situation.

If Not oNodeList Is Nothing Then
  FillNodes(oNodeList, sNodeValue)
End If
'get nodes that do NOT have 
'embedded paragraph marks 
oNodeList = _ 
  oXMLWordDoc.SelectNodes( _ 
    "//ns0:" + sNodeName, oNSMgr)

If Not oNodeList Is Nothing Then
  FillNodes(oNodeList, sNodeValue)
End If

The namespace prefixes in these queries require that you pass an XmlNamespaceManager object, which is part of .NET’s System.Xml namespace, to the SelectNodes method. Otherwise, the SelectNodes call fails with an error. The XmlNamespaceManager object, stored in a property of the WordXMLTest class, is populated within the New method, so it runs when the WordXMLTest class is instantiated.

The namespace URIs come directly from the Word 2003 XML file and may vary depending upon the target namespace declared in your schema and what Word 2003 assigns as a prefix to your schema.
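
The same requirement exists in other XML libraries. As a point of comparison only, the Python sketch below uses ElementTree, where a plain dictionary plays the role of the namespace manager. The ns0 URI is the target namespace from Listing 1, and the prefix itself is only a local alias that must map to the URI used in the file.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, the ElementTree counterpart of a namespace manager.
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "ns0": "http://schemas.eps-software.com/NWindTest",
}

DOC = """<w:body
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:ns0="http://schemas.eps-software.com/NWindTest">
  <ns0:ContactName>
    <w:p><w:r><w:t>[ContactName]</w:t></w:r></w:p>
  </ns0:ContactName>
</w:body>"""

root = ET.fromstring(DOC)

# With the map, the prefixed query resolves to the right elements.
matches = root.findall(".//ns0:ContactName", NS)
print(len(matches))


def find_without_map():
    """Run the same query with no prefix map and return the error text."""
    try:
        root.findall(".//ns0:ContactName")
    except SyntaxError as exc:  # ElementTree rejects the unknown prefix
        return str(exc)
    return None


print(find_without_map())
```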

The FillNodes method referenced in the ProcessNodes method receives a node list object and a new node value as parameters. It changes the contents of the specified nodes on the oXMLWordDoc object.

Private Sub FillNodes( _
    ByVal oNodeList As XmlNodeList, _
    ByVal sNodeValue As String)
Dim i As Integer
Dim oXMLNode, oInnerNode As XmlNode

For i = 0 To oNodeList.Count - 1
  oXMLNode = oNodeList(i)
  'find the text element of the first run in this node
  oInnerNode = _
    oXMLNode.SelectSingleNode( _
      "w:r/w:t", oNSMgr)
  If Not oInnerNode Is Nothing Then
    oInnerNode.InnerText = sNodeValue
  End If
Next
End Sub

The replacement actually occurs on the text between the <w:t> and </w:t> tags that appear within the specified node object. This ensures that no formatting is lost, as font and paragraph properties are specified in the elements that surround the <w:t> element.
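
For readers who do not use .NET, the two-query replacement can be sketched in Python with ElementTree. This is a loose port for illustration, not the article's code: the template is a made-up fragment with ContactName in both positions, and the function names mirror the VB.NET methods only for readability.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "ns0": "http://schemas.eps-software.com/NWindTest",
}
W = "{%s}" % NS["w"]

TEMPLATE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:ns0="http://schemas.eps-software.com/NWindTest">
  <w:body>
    <ns0:ContactName>
      <w:p><w:r><w:t>[ContactName]</w:t></w:r></w:p>
    </ns0:ContactName>
    <w:p>
      <w:r><w:t>Dear </w:t></w:r>
      <ns0:ContactName><w:r><w:t>[ContactName]</w:t></w:r></ns0:ContactName>
      <w:r><w:t>,</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def fill_nodes(nodes, value):
    """Set the text of the first run directly below each node."""
    for node in nodes:
        text_el = node.find("w:r/w:t", NS)
        if text_el is not None:
            text_el.text = value  # formatting elements are left untouched


def process_nodes(root, name, value):
    """Replace the placeholder text wherever the schema element occurs."""
    # Query 1: paragraphs wrapped by the schema element.
    fill_nodes(root.findall(".//ns0:%s//w:p" % name, NS), value)
    # Query 2: the schema element itself, for elements inside a paragraph.
    fill_nodes(root.findall(".//ns0:%s" % name, NS), value)


def merge(template_xml, data):
    root = ET.fromstring(template_xml)
    for name, value in data.items():
        process_nodes(root, name, value)  # names not in the template match nothing
    return root


merged = merge(TEMPLATE, {"ContactName": "Rene Phillips",
                          "Phone": "(907) 555-7584"})
print([t.text for t in merged.iter(W + "t")])
```

Two caveats apply to this sketch. ET.fromstring discards the mso-application processing instruction, and ElementTree may rewrite namespace prefixes when serializing, so output produced this way would need extra handling before Word opens it as expected.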

The last bit is to take the modified XML and save it to disk with a different file name so that it can be viewed. This is done by calling the Save method on the Word 2003 XML DOM object:

  'write out the new Doc file. 
  oProcess.save(sSaveFile)
. . .
Class WordXMLTest
  Public oXMLWordDoc As _ 
    New XmlDocument
  Public oNSMgr As _ 
    New XmlNamespaceManager( _ 
        oXMLWordDoc.NameTable)

  Public Sub save( _ 
    ByVal sFileName As String)
    
    oXMLWordDoc.Save(sFileName)
  End Sub

The Final Output

After running the program, you should now be able to double-click the output file and see the output in Word 2003, as shown in Figure 3.

Figure 3: After XML processing has the customized data in place, the final document looks official.

If you double-click the output XML file and it doesn’t load in Word 2003, most likely you followed my earlier advice about commenting out the processing instruction in your template file so that you could view the XML in your registered XML application. Simply remove the comment so that the processing instruction becomes active again, allowing the document to open directly in Word 2003.

Another trick for ensuring that the document opens in Word 2003 is to force a DOC extension on the final output of the program. For example, to force the OutFile.XML file to open in Word 2003, rename the file as Outfile.XML.DOC.

Summary

You have seen how to take a Word document and process it using XML, a new feature of Word 2003. By marking up the desired document with an associated XML schema and saving it as XML, you’ve exposed the contents of the document through XML. With a little processing, the Word 2003 XML file is easily merged with XML data and can act as a template for a multitude of documents.

Listing 1: A simple XML schema based upon Northwind’s Customers table

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  targetNamespace="http://schemas.eps-software.com/NWindTest"
  xmlns:eps="http://schemas.eps-software.com/NWindTest">
  <xs:element name="Address">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="60" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="City">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="15" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="ContactName">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="30" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="ContactTitle">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="30" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="Country">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="15" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="Fax">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="24" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="PostalCode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="10" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="Region">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="15" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

Listing 2: The XML data to be merged with the marked-up document

<?xml version="1.0" encoding="utf-8" ?>
<results>
  <customers>
    <CustomerID>OLDWO</CustomerID>
    <CompanyName>Old World Delicatessen</CompanyName>
    <ContactName>Rene Phillips</ContactName>
    <ContactTitle>Sales Representative</ContactTitle>
    <Address>2743 Bering St.</Address>
    <City>Anchorage</City>
    <Region>AK</Region>
    <PostalCode>99508</PostalCode>
    <Country>USA</Country>
    <Phone>(907) 555-7584</Phone>
    <Fax>(907) 555-2880</Fax>
  </customers>
</results>

Listing 3: The complete VB.NET code that merges the XML with the marked-up document

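Imports System.Xml

'The module name is assumed; the original listing omits the opening line.
Module Module1
  Sub Main()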
    Dim oProcess As New WordXMLTest
    Dim sDocPath, sDataPath, sSaveFile As String
    Dim sNodeName, sNewText As String
    Dim xmlDataDoc As New XmlDocument
    Dim xmlNodes As XmlNodeList
    Dim xmlNode As XmlNode
    Dim oExc As Exception
    Dim i As Integer

    sDocPath = "sample2.xml"
    sDataPath = "NWdata.xml"
    sSaveFile = "OutFile.xml"

    Try
      'load the WordXML into a DOM
      oProcess.LoadFile(sDocPath)

      'load data into DOM 
      xmlDataDoc.Load(sDataPath)

      'iterate through data nodes
      xmlNodes = xmlDataDoc.SelectNodes( _ 
        "/results/customers/*")
      'replace Word doc area with data 
      If Not xmlNodes Is Nothing Then
        For i = 0 To xmlNodes.Count - 1
          xmlNode = xmlNodes(i)
          sNodeName = xmlNode.Name
          sNewText = xmlNode.InnerText

          oProcess.ProcessNodes(sNodeName, sNewText)
        Next
      End If
      'write out the new Doc file. 
      oProcess.save(sSaveFile)

    Catch oExc
      MsgBox(oExc.Message, MsgBoxStyle.Critical, _
             "Error")
    End Try
  End Sub
End Module

Class WordXMLTest
  Public oXMLWordDoc As New XmlDocument
  Public oNSMgr As New _ 
    XmlNamespaceManager(oXMLWordDoc.NameTable)

  Public Sub New()
    'add the schema's namespace to a name space manager
    LoadNS("ns0", _
      "http://schemas.eps-software.com/NWindTest")
    LoadNS("w", _
      "http://schemas.microsoft.com/office/word/2003/wordml")
  End Sub

  Public Sub LoadFile(ByVal sFilePath As String)
    oXMLWordDoc.Load(sFilePath)
  End Sub

  Private Sub LoadNS(ByVal sPrefix As String, ByVal sURI As String)
    oNSMgr.AddNamespace(sPrefix, sURI)
  End Sub

  Public Sub save(ByVal sFileName As String)
    oXMLWordDoc.Save(sFileName)
  End Sub

  Public Sub ProcessNodes(ByVal sNodeName As String, _ 
                          ByVal sNodeValue As String)
    'replace node(s) in document with value 
    Dim oNodeList As XmlNodeList

    'gets nodes that have embedded paragraph marks
    oNodeList = oXMLWordDoc.SelectNodes( _ 
      "//ns0:" + sNodeName + "//w:p", oNSMgr)

    If Not oNodeList Is Nothing Then
      FillNodes(oNodeList, sNodeValue)
    End If
    'gets nodes that do NOT have 
    'embedded paragraph marks 
    oNodeList = oXMLWordDoc.SelectNodes( _ 
      "//ns0:" + sNodeName, oNSMgr)

    If Not oNodeList Is Nothing Then
      FillNodes(oNodeList, sNodeValue)
    End If
  End Sub

  Private Sub FillNodes(ByVal oNodeList As XmlNodeList, ByVal sNodeValue As String)
    Dim i As Integer
    Dim oXMLNode, oInnerNode As XmlNode

    For i = 0 To oNodeList.Count - 1
      oXMLNode = oNodeList(i)
      oInnerNode = oXMLNode.SelectSingleNode("w:r/w:t", oNSMgr)
      If Not oInnerNode Is Nothing Then
        oInnerNode.InnerText = sNodeValue
      End If
    Next
  End Sub
End Class
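
The two XPath queries in ProcessNodes correspond to two ways a custom tag can sit in the marked-up document. The fragment below is an illustrative sketch of those two shapes, not a copy of sample2.xml; the element names come from the schema and the namespace URIs from Listing 3, while the text values are placeholders.

```xml
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:ns0="http://schemas.eps-software.com/NWindTest">
  <w:body>
    <!-- Case 1: the custom tag wraps a whole paragraph.
         Found by //ns0:Address//w:p, then w:r/w:t inside that paragraph. -->
    <ns0:Address>
      <w:p>
        <w:r>
          <w:t>placeholder address</w:t>
        </w:r>
      </w:p>
    </ns0:Address>

    <!-- Case 2: the custom tag sits inside a paragraph and wraps a run.
         Found by //ns0:City, then w:r/w:t directly under the tag. -->
    <w:p>
      <w:r>
        <w:t>City: </w:t>
      </w:r>
      <ns0:City>
        <w:r>
          <w:t>placeholder city</w:t>
        </w:r>
      </ns0:City>
    </w:p>
  </w:body>
</w:wordDocument>
```

In case 1 the second query also finds ns0:Address, but w:r/w:t selects nothing relative to it because the run is nested inside w:p, so the value is written only once.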


About XML documents in Word

Note  XML features, except for saving documents as XML with the Word XML schema, are available only in Microsoft Office Professional Edition 2003 and stand-alone Microsoft Office Word 2003.

Why XML?

Extensible Markup Language (XML) enables you to organize and work with documents and data in ways that were previously impossible or very difficult. By using custom XML schemas, you can now identify and extract specific pieces of business data from ordinary business documents.

For example, an invoice that contains a customer's name and address, or a report that contains last quarter's financial results, is no longer a static document. The information it contains can be passed to a database or reused elsewhere, outside of the document.

The ability to save a Microsoft Word document in standard XML format helps separate its content from the confines of the document. The content becomes available for automated data-mining and repurposing processes. The content can easily be searched and even modified by processes other than Word, such as server-based data processing.

Because Word is capable of representing its documents as XML, automated server-based processes can now generate Word documents on the fly by pulling together data from various sources. Such a document could then easily be updated on a regular basis, eliminating the manual search for relevant data and unnecessary retyping.

Word and XML

Microsoft Word enables you to work with XML documents in two ways:

  • Use the Word XML schema    You can create a document in Word as you normally would and then save it as an XML document. Word uses its own XML schema, WordML, to apply XML tags that store information, such as file properties, and define the structure of the document, such as its paragraphs, headings, and tables. Word also uses XML tags to store formatting and layout information, according to the Word XML schema.
  • Use any XML schema    You can create or open a document in Word, attach any custom XML schema to it, and apply XML tags to the content of the document. When you save this document as an XML document, the XML tags define the structure of the document in terms of the XML schema that is attached to it.

    When you save the document, by default both the Word schema and the custom schema are attached to the document, preserving the data as defined by the custom schema and the rich formatting as defined by the Word XML schema. You also have the option of saving the document as data only, according to the custom schema.

    Whether you use the built-in Word XML schema for a Word document structure or attach your own schema for a structure that is more suitable for your business, any software that can parse XML can read and process the data in a document that you save as an XML document (.xml file).

    For example, if the custom schema is for résumé data, the XML tags in the document will define the structure of the document in terms of name, address, work experience, education, and so on. When you save the document, you have both a richly formatted document that looks professional when printed and a data file that can be processed by any program that can read XML.

You can also store XML data in a document that you save as a Word document (.doc) or template (.dot). However, only Word will be able to read or process the XML.
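
As a rough sketch of the first approach, a one-paragraph document saved with the Word XML schema has approximately the shape below. The namespace URI is the one used in Listing 3; the paragraph text is a placeholder, and a file actually written by Word carries many more elements (document properties, fonts, styles, section settings) that are omitted here.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Processing instruction that associates the file with Word -->
<?mso-application progid="Word.Document"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <!-- w:p is a paragraph, w:r a run of text, w:t the text itself -->
    <w:p>
      <w:r>
        <w:t>Placeholder paragraph text.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
```

Because the whole document is a single XML file, any XML parser can reach the text through the w:p, w:r, w:t path, which is the path Listing 3 relies on.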

XML tagging

When a custom XML schema is attached to a document, the XML Structure task pane provides a list of elements that are defined in the schema. You apply XML tags to the document by selecting document content and then choosing an element from the list. If the schema defines attributes for an element, you can specify these as well in the XML Structure task pane.

Note  You can attach more than one schema to a document. Elements from all attached schemas are available in the list of elements in the XML Structure task pane.

A check box on the pane enables you to see the XML tags inline, in the context of the document.

If the structure of the document violates the rules of the schema, a purple wavy line marks the spot in the document, and the XML Structure task pane reports the violation.

XSL Transformations

Upon opening and saving XML documents, you can apply Extensible Stylesheet Language Transformation (XSLT) files that render the XML data in a particular format. For example, you could have one XSLT that presents data as a specification and another XSLT that presents the same data as a parts list, where quantities and prices are calculated.
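
To make the idea of a data view concrete, here is a minimal XSLT 1.0 sketch written against the customer data in Listing 2. It is a generic stylesheet that any XSLT processor can run, not one shipped with Word; it emits plain text for brevity, whereas a transform intended for display in Word would more likely produce WordML.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- One line per customers record: company, contact name, contact title -->
  <xsl:template match="/results">
    <xsl:for-each select="customers">
      <xsl:value-of select="CompanyName"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="ContactName"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="ContactTitle"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A second stylesheet could select different fields from the same data (for example, only the address elements), which is what allows one XML document to have several views.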

XSLTs applied when opening a document

An XML document may have more than one XSLT associated with it. When this is the case, you must select the XSLT that you want to use to display the document. You do this in the XML Document pane, where the available XSLTs (data views) are listed.

If no XSLT is associated with an XML document, then Word opens it using its default XSLT, or "Data only view."

If the Word XML schema is attached to the document, Word opens the document without applying an XSLT, even if one is associated with the document.

Note  Rather than applying an XSLT manually, you can define solutions that associate XSLTs with certain types of XML documents. You make this association in the Schema Library, which you can access on the XML Schema tab of the Templates and Add-ins dialog box (Tools menu).

XSLTs applied when saving a document

You can apply an XSLT when you save an XML document by selecting the Apply transform check box and browsing to the XSLT file.

Caution  If you apply an XSLT when you save the file, Word discards any data that the XSLT does not use.

Microsoft Office 2003: XML support

Every new version of Microsoft Office draws the attention of a huge number of users. That is no surprise: the suite remains the most popular product in its field. The ordinary user wants to know, first of all, whether it is worth giving up the old version of Office for the new one, and what the latest release has to offer. According to Microsoft, the new Office contains quite a few changes, but most of them are aimed at employees of large companies: people who use Office at work and work in teams.

Improved XML support is the main and most important new feature of Microsoft Office 2003. XML (Extensible Markup Language) makes it possible to work more flexibly with data inside Microsoft Office and removes barriers between formats, systems, and countries. With XML, scattered data can be brought together and handled in a single document. To give a sense of what XML is for, consider an example: two competing companies (say, two advertising agencies) each pitch their terms for placing advertising to a major customer. A manager deals with the prospective client, but to answer all of the client's questions the manager has to stay in constant contact with many people. The manager needs to know, for instance, the current price of advertising space in the publications the client is interested in, the cost of materials for promotional items (T-shirts, business cards, and so on), and the like.

Now suppose the first company relies on standard means of communication. To answer a client's question, the manager looks up in the database the employee responsible for the relevant area and writes an email asking, for example, about the current price of an advertisement on the front page of a weekly. The employee receives the email but does not always reply right away; other urgent matters may get in the way. The manager first has to wait for an email reply, then phone, and, if that fails, turn to other employees who may have the information. All of this takes a great deal of time, and meanwhile the prospective client is waiting...

Now look at the other company, which uses XML to exchange information. All data about current changes in the company's line of business is entered into a database. As soon as prices change, the responsible employee updates the data in the database. The data is stored in XML format, and people with access to it can find the information they need at any moment and use it in their reports, emails, spreadsheets, and so on. The information in the database is updated every day, because doing so is part of each responsible person's duties. In such a company, the manager can find the answer to a client's question in a matter of seconds, without having to track down the responsible employee. In fact, the manager may not even know who is responsible for a given segment of the company's activity.

Built-in XML support in Office 2003 lets users work in the familiar Office environment while creating and saving XML documents, without even being aware that they are working with XML. Users therefore do not need any additional XML training and can work with it using the tools they already know. Although working with XML in Microsoft Office does not require any programming background, in some cases it helps to know the language. For that reason, a few basic XML terms are worth listing here:
• DTD (document type definition). A set of rules that declares element names and attributes. These rules determine how elements can be used together and in what order.
• Element. Any data in an XML document placed between an opening and a closing tag, for example: <title>Welcome to Microsoft Office 2003</title>.
• Style sheet. A collection of formatting rules that control how a document is displayed. Style sheets can be embedded directly in a document or stored separately and applied to the document by reference. Style sheets are usually stored separately.
• XML schema. A document that defines which elements, objects, and content may appear in a document.
• XSL (Extensible Stylesheet Language). The language used to create style sheets.
• XSLT (XSL Transformations). Transformation of the structure of an XML document so that it can be viewed in different ways.
• Using XML for data analysis. In Excel, you can use XML features to work with structured tabular data for calculations and analysis.
• Creating, editing, and managing content.

Recall that XML support was already present in Microsoft Office XP; in particular, users could save files in this format. The new version, however, adds many new features. You can work with XML without knowing anything about the language. With a few simple steps you can attach an XML schema, add XML tags to documents, create Excel spreadsheets, and fill them with data effortlessly.
In Microsoft Word 2003, XML support consists of the ability to view XML files, work with them, and save them. Users can create and attach their own schemas, which can be used with Microsoft Word or on their own. To make sure the user enters correct tags, Word has built-in automatic validation. You can also search a database and import XML data into a document.

Microsoft Word 2003 introduces a new XML task pane. To start working with it, click the Templates And Add-Ins button and select an XML document schema in the dialog box that appears. You can also open this dialog box by choosing Tools > Templates And Add-Ins and switching to the XML Schema tab.
More than one schema can be added to a single document. When schemas are added, Word checks them and reports if they cannot work together correctly. After a schema is selected, Word checks whether it is well formed. If the code contains errors, the program warns you.

The XML task pane lets you view the structure of an XML document and navigate through it. In addition, Word validates the tags in use against the selected schema.
Using the XML structure, you can mark up a document (or individual fragments of it), which greatly simplifies later automated processing. Previously this required meticulous formatting (special styles for each attribute, a fixed order for them, and so on); now users have complete freedom: the new XML object model provides very simple programmatic access to any element.

One of the additions in Microsoft Word 2003 is the ability to view XML tags. When you open an XML document and start working, any tags you insert (or those already in the document) are displayed as brackets around a word, phrase, or paragraph. Tag view is turned on by selecting the Show XML tags check box in the task pane.

Sergey Bondarenko, Marina Dvorakovskaya, blackmore_s_night@yahoo.com

Kompyuternaya Gazeta. This article was published in issue 43, 2003, in the section soft :: OS.
