Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by the Ecma (as ECMA-376), and by the ISO and IEC (as ISO/IEC 29500) in later versions.
Contents
- 1 Is XML same as docx?
- 2 How do I open an XML document in word?
- 3 Can you create a XML document in word?
- 4 What are XML documents?
- 5 How do I convert XML to word?
- 6 What is XML used for?
- 7 What is an XML file and how do I open it?
- 8 What is an Office Open XML document?
- 9 Does Microsoft Office use XML?
- 10 How do I add an XML file to Word?
- 11 How do XML documents work?
- 12 What is XML with example?
- 13 How do you represent an XML document?
- 14 How do I convert XML to text?
- 15 How do I open an XML file in Windows 10?
- 16 How do I edit an XML document?
- 17 What are the benefits of using XML?
- 18 What are the main features of XML?
- 19 Is XML still relevant 2021?
- 20 How do I open a XML file in Chrome?
Is XML same as docx?
No. DOCX is built on XML, but a .docx file is a zipped package of XML parts rather than a single XML file. DOCX was originally developed by Microsoft as an XML-based format to replace the proprietary binary format that uses the .doc file extension. Since Word 2007, DOCX has been the default format for the Save operation.
How do I open an XML document in word?
- Open Windows Explorer and browse to the folder that contains the XML file (in this example, a file named MySampleXML).
- Right-click the file and select Open With, then choose Notepad or Microsoft Office Word from the list of programs available to open the XML file.
Can you create a XML document in word?
Yes. You can open and edit an XML file in Word, in the same way you can an HTML file. You can also open it in an XML editor such as XMetal, or as a plain text file in a text editor such as Notepad.
What are XML documents?
XML documents are strictly text files. In the context of data transport, the phrase “XML document” refers to a file or data stream containing any form of structured data. XML documents contain only markup and content. All of the rules and semantics of the document are defined by the applications that process them.
How do I convert XML to word?
- Open Word.
- Click File.
- Click Save As.
- Click Browse.
- Select Word Document from the “Save as type” drop-down.
- Click Save.
What is XML used for?
The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. It was derived from an older standard format called SGML (ISO 8879), in order to be more suitable for Web use.
What is an XML file and how do I open it?
XML files are encoded in plain text, so you can open them in any text editor and read them clearly. Right-click the XML file and select “Open With.” This will display a list of programs to open the file in. Select “Notepad” (Windows) or “TextEdit” (Mac).
What is an Office Open XML document?
Office Open XML, also known as OpenXML or OOXML, is an XML-based format for office documents, including word processing documents, spreadsheets, presentations, as well as charts, diagrams, shapes, and other graphical material.
Does Microsoft Office use XML?
Starting with the 2007 Microsoft Office system, Microsoft Office uses the XML-based file formats, such as .docx, .xlsx, and .pptx. These formats and file name extensions apply to Microsoft Word, Microsoft Excel, and Microsoft PowerPoint.
How do I add an XML file to Word?
To add an XMLNode control to a document
- In the document in the Visual Studio designer, on the ribbon, click the Developer tab.
- In the XML group, click Schema.
- Click the XML Schema tab.
- Click Add Schema.
- Select an XML schema that contains non-repeating schema elements from the Add Schema dialog box and click Open.
How do XML documents work?
Right-click the XML file you want to open, point to “Open With” on the context menu, and then click the “Notepad” option. Note: We’re using Windows examples here, but the same holds true for other operating systems. Look for a good third-party text editor that is designed to support XML files.
What is XML with example?
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
XML.
Filename extension | .xml |
---|---|
Developed by | World Wide Web Consortium |
Type of format | Markup language |
Extended from | SGML |
How do you represent an XML document?
An XML document consists of three parts, in the order given:
- An XML declaration (which is technically optional, but recommended in most normal cases)
- A document type declaration that refers to a DTD (which is optional, but required if you want validation)
- A body or document instance (which is required)
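The three parts can be seen in a small sample. The sketch below is illustrative only: the `note` document and its DTD are made up, and Python's standard `xml.dom.minidom` is used simply because it exposes the document type declaration. Note that this parser reads the DTD but does not validate against it.

```python
from xml.dom import minidom

# Hypothetical sample: declaration, document type declaration, then the body.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <body>Hello</body>
</note>
"""

doc = minidom.parseString(SAMPLE)

# Part 2: the document type declaration names the expected root element.
doctype_name = doc.doctype.name

# Part 3: the body (document instance) starts at the root element.
root_name = doc.documentElement.tagName
child_names = [
    node.tagName
    for node in doc.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

print(doctype_name, root_name, child_names)
```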
How do I convert XML to text?
How to Convert XML to TXT with Doxillion Document Converter Software
- Download Doxillion Document Converter Software.
- Import XML Files into the Program.
- Choose an Output Folder.
- Set the Output Format.
- Convert XML to TXT.
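If installing a converter is not an option, a few lines of script can do a rough version of the same job. This is a different technique from the Doxillion steps above: a minimal sketch using Python's standard `xml.etree.ElementTree`, with a made-up sample document and a hypothetical helper name, that simply drops the tags and keeps the text.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_source: str) -> str:
    """Return the text content of an XML document, one non-empty chunk per line."""
    root = ET.fromstring(xml_source)
    # itertext() walks the tree in document order and yields only text, no tags.
    pieces = (chunk.strip() for chunk in root.itertext())
    return "\n".join(piece for piece in pieces if piece)


# Hypothetical sample input.
SAMPLE = "<note><to>Reader</to><body>Hello, world</body></note>"

text = xml_to_text(SAMPLE)
print(text)
```

Attribute values are not included in the output; only element text is kept.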
How do I open an XML file in Windows 10?
- Type Default programs in the search bar on Windows 10.
- In the Default Programs window, under Choose the programs that Windows uses by default, click Associate a file type or protocol with a program.
- Select the .xml file type in the Associate a file type or protocol with a program window and click OK.
How do I edit an XML document?
Open the file you wish to edit by double clicking the file name. The file will open and display the existing code. Edit your XML file. Review your editing.
What are the benefits of using XML?
Advantages of XML
- XML uses human, not computer, language. XML is readable and understandable, even by novices, and no more difficult to code than HTML.
- XML is completely compatible with Java™ and 100% portable. Any application that can process XML can use your information, regardless of platform.
- XML is extendable.
What are the main features of XML?
A basic summary of the main features of XML follows:
- Excellent for handling data with a complex structure or atypical data.
- Data described using markup language.
- Text data description.
- Human- and computer-friendly format.
- Handles data in a tree structure having one, and only one, root element.
Is XML still relevant 2021?
XML is used extensively in today’s ‘e’ world: banking services, online retail stores, integrating industrial systems, etc. One can put many different types of information in an XML file and it still remains simple.
How do I open a XML file in Chrome?
Just about every browser can open an XML file. In Chrome, just open a new tab and drag the XML file over. Alternatively, right click on the XML file and hover over “Open with” then click “Chrome”. When you do, the file will open in a new tab.
From Wikipedia, the free encyclopedia
Filename extension | .XML (XML document) |
---|---|
Developed by | Microsoft |
Type of format | Document file format |
Extended from | XML, DOC |
Filename extension | .VDX (XML Drawing),.VSX (XML Stencil),.VTX (XML Template) |
---|---|
Developed by | Microsoft |
Type of format | Diagramming vector graphics |
Extended from | XML, VSD, VSS, VST |
Filename extension | .XML (XML Spreadsheet) |
---|---|
Developed by | Microsoft |
Type of format | Spreadsheet |
Extended from | XML, XLS |
The Microsoft Office XML formats are XML-based document formats (or XML schemas) introduced in versions of Microsoft Office prior to Office 2007. Microsoft Office XP introduced a new XML format for storing Excel spreadsheets and Office 2003 added an XML-based format for Word documents.
These formats were succeeded by Office Open XML (ECMA-376) in Microsoft Office 2007.
File formats
- Microsoft Office Word 2003 XML Format — WordProcessingML or WordML (.XML)
- Microsoft Office Excel 2002 and Excel 2003 XML Format — SpreadsheetML (.XML)
- Microsoft Office Visio 2003 XML Format — DataDiagramingML (.VDX, .VSX, .VTX)
- Microsoft Office InfoPath 2003 XML Format — XML FormTemplate (.XSN) (Compressed XML templates in a Cabinet file)
Limitations and differences with Office Open XML
Besides differences in the schema, there are several other differences between the earlier Office XML schema formats and Office Open XML.
- Whereas the data in Office Open XML documents is stored in multiple parts and compressed in a ZIP file conforming to the Open Packaging Conventions, Microsoft Office XML formats are stored as plain single monolithic XML files (making them quite large, compared to OOXML and the Microsoft Office legacy binary formats). Also, embedded items like pictures are stored as binary encoded blocks within the XML. In case of Office Open XML, the header, footer, comments of a document etc. are all stored separately.
- XML Spreadsheet documents cannot store Visual Basic for Applications macros, auditing tracer arrows, chart and other graphic objects, custom views, drawing object layers, outlining, scenarios, shared workbook information and user-defined function categories.[1] In contrast, the newer Office Open XML formats support full document fidelity.
- Poor backward compatibility with the version of Word/Excel prior to the one in which they were introduced. For example, Word 2002 cannot open Word 2003 XML files unless a third-party converter add-in is installed.[2] Microsoft has released a Word 2003 XML Viewer which allows WordProcessingML files saved by Word 2003 to be viewed as HTML from within Internet Explorer.[3] For Office Open XML, Microsoft provides converters for Office 2003, Office XP and Office 2000.
- Office Open XML formats are also defined for PowerPoint 2007, equation editing (Office MathML), vector drawing, charts and text art (DrawingML).
Word XML format example
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
   xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
   xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
   xmlns:o="urn:schemas-microsoft-com:office:office"
   w:macrosPresent="no"
   w:embeddedObjPresent="no"
   w:ocxPresent="no"
   xml:space="preserve">
  <o:DocumentProperties>
    <o:Title>This is the title</o:Title>
    <o:Author>Darl McBride</o:Author>
    <o:LastAuthor>Bill Gates</o:LastAuthor>
    <o:Revision>1</o:Revision>
    <o:TotalTime>0</o:TotalTime>
    <o:Created>2007-03-15T23:05:00Z</o:Created>
    <o:LastSaved>2007-03-15T23:05:00Z</o:LastSaved>
    <o:Pages>1</o:Pages>
    <o:Words>6</o:Words>
    <o:Characters>40</o:Characters>
    <o:Company>SCO Group, Inc.</o:Company>
    <o:Lines>1</o:Lines>
    <o:Paragraphs>1</o:Paragraphs>
    <o:CharactersWithSpaces>45</o:CharactersWithSpaces>
    <o:Version>11.6359</o:Version>
  </o:DocumentProperties>
  <w:fonts>
    <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman" />
  </w:fonts>
  <w:styles>
    <w:versionOfBuiltInStylenames w:val="4" />
    <w:latentStyles w:defLockedState="off" w:latentStyleCount="156" />
    <w:style w:type="paragraph" w:default="on" w:styleId="Normal">
      <w:name w:val="Normal" />
      <w:rPr>
        <wx:font wx:val="Times New Roman" />
        <w:sz w:val="24" />
        <w:sz-cs w:val="24" />
        <w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA" />
      </w:rPr>
    </w:style>
    <w:style w:type="paragraph" w:styleId="Heading1">
      <w:name w:val="heading 1" />
      <wx:uiName wx:val="Heading 1" />
      <w:basedOn w:val="Normal" />
      <w:next w:val="Normal" />
      <w:rsid w:val="00D93B94" />
      <w:pPr>
        <w:pStyle w:val="Heading1" />
        <w:keepNext />
        <w:spacing w:before="240" w:after="60" />
        <w:outlineLvl w:val="0" />
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Arial" w:h-ansi="Arial" w:cs="Arial" />
        <wx:font wx:val="Arial" />
        <w:b />
        <w:b-cs />
        <w:kern w:val="32" />
        <w:sz w:val="32" />
        <w:sz-cs w:val="32" />
      </w:rPr>
    </w:style>
    <w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont">
      <w:name w:val="Default Paragraph Font" />
      <w:semiHidden />
    </w:style>
    <w:style w:type="table" w:default="on" w:styleId="TableNormal">
      <w:name w:val="Normal Table" />
      <wx:uiName wx:val="Table Normal" />
      <w:semiHidden />
      <w:rPr>
        <wx:font wx:val="Times New Roman" />
      </w:rPr>
      <w:tblPr>
        <w:tblInd w:w="0" w:type="dxa" />
        <w:tblCellMar>
          <w:top w:w="0" w:type="dxa" />
          <w:left w:w="108" w:type="dxa" />
          <w:bottom w:w="0" w:type="dxa" />
          <w:right w:w="108" w:type="dxa" />
        </w:tblCellMar>
      </w:tblPr>
    </w:style>
    <w:style w:type="list" w:default="on" w:styleId="NoList">
      <w:name w:val="No List" />
      <w:semiHidden />
    </w:style>
  </w:styles>
  <w:docPr>
    <w:view w:val="print" />
    <w:zoom w:percent="100" />
    <w:doNotEmbedSystemFonts />
    <w:proofState w:spelling="clean" w:grammar="clean" />
    <w:attachedTemplate w:val="" />
    <w:defaultTabStop w:val="720" />
    <w:punctuationKerning />
    <w:characterSpacingControl w:val="DontCompress" />
    <w:optimizeForBrowser />
    <w:validateAgainstSchema />
    <w:saveInvalidXML w:val="off" />
    <w:ignoreMixedContent w:val="off" />
    <w:alwaysShowPlaceholderText w:val="off" />
    <w:compat>
      <w:breakWrappedTables />
      <w:snapToGridInCell />
      <w:wrapTextWithPunct />
      <w:useAsianBreakRules />
      <w:dontGrowAutofit />
    </w:compat>
  </w:docPr>
  <w:body>
    <wx:sect>
      <w:p>
        <w:r>
          <w:t>This is the first paragraph</w:t>
        </w:r>
      </w:p>
      <wx:sub-section>
        <w:p>
          <w:pPr>
            <w:pStyle w:val="Heading1" />
          </w:pPr>
          <w:r>
            <w:t>This is a heading</w:t>
          </w:r>
        </w:p>
        <w:sectPr>
          <w:pgSz w:w="12240" w:h="15840" />
          <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
          <w:cols w:space="720" />
          <w:docGrid w:line-pitch="360" />
        </w:sectPr>
      </wx:sub-section>
    </wx:sect>
  </w:body>
</w:wordDocument>
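Because a Word 2003 XML file is a single plain XML document, any general-purpose XML parser can read it. The sketch below is an illustration rather than an official API: it uses Python's standard `xml.etree.ElementTree` on a trimmed-down copy of the example above, and the helper name `paragraphs` is hypothetical. It pairs each paragraph's text with its style ID, if one is set.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example document above.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
WX_NS = "http://schemas.microsoft.com/office/word/2003/auxHint"

# Trimmed-down version of the example: only the body is kept.
WORDML = f"""<w:wordDocument xmlns:w="{W_NS}" xmlns:wx="{WX_NS}">
  <w:body>
    <wx:sect>
      <w:p><w:r><w:t>This is the first paragraph</w:t></w:r></w:p>
      <wx:sub-section>
        <w:p>
          <w:pPr><w:pStyle w:val="Heading1" /></w:pPr>
          <w:r><w:t>This is a heading</w:t></w:r>
        </w:p>
      </wx:sub-section>
    </wx:sect>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_source: str) -> list:
    """Return (style_id, text) for every w:p element, in document order."""
    root = ET.fromstring(xml_source)
    prefixes = {"w": W_NS}
    result = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        style = paragraph.find("w:pPr/w:pStyle", prefixes)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        result.append((style_id, text))
    return result


extracted = paragraphs(WORDML)
for style_id, text in extracted:
    print(style_id, text)
```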
Excel XML spreadsheet example
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
   xmlns="urn:schemas-microsoft-com:office:spreadsheet"
   xmlns:x="urn:schemas-microsoft-com:office:excel"
   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
   xmlns:html="https://www.w3.org/TR/html401/">
  <Worksheet ss:Name="CognaLearn+Intedashboard">
    <Table>
      <Column ss:Index="1" ss:AutoFitWidth="0" ss:Width="110"/>
      <Row>
        <Cell><Data ss:Type="String">ID</Data></Cell>
        <Cell><Data ss:Type="String">Project</Data></Cell>
        <Cell><Data ss:Type="String">Reporter</Data></Cell>
        <Cell><Data ss:Type="String">Assigned To</Data></Cell>
        <Cell><Data ss:Type="String">Priority</Data></Cell>
        <Cell><Data ss:Type="String">Severity</Data></Cell>
        <Cell><Data ss:Type="String">Reproducibility</Data></Cell>
        <Cell><Data ss:Type="String">Product Version</Data></Cell>
        <Cell><Data ss:Type="String">Category</Data></Cell>
        <Cell><Data ss:Type="String">Date Submitted</Data></Cell>
        <Cell><Data ss:Type="String">OS</Data></Cell>
        <Cell><Data ss:Type="String">OS Version</Data></Cell>
        <Cell><Data ss:Type="String">Platform</Data></Cell>
        <Cell><Data ss:Type="String">View Status</Data></Cell>
        <Cell><Data ss:Type="String">Updated</Data></Cell>
        <Cell><Data ss:Type="String">Summary</Data></Cell>
        <Cell><Data ss:Type="String">Status</Data></Cell>
        <Cell><Data ss:Type="String">Resolution</Data></Cell>
        <Cell><Data ss:Type="String">Fixed in Version</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="Number">0000033</Data></Cell>
        <Cell><Data ss:Type="String">CognaLearn Intedashboard</Data></Cell>
        <Cell><Data ss:Type="String">janardhana.l</Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
        <Cell><Data ss:Type="String">normal</Data></Cell>
        <Cell><Data ss:Type="String">text</Data></Cell>
        <Cell><Data ss:Type="String">always</Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
        <Cell><Data ss:Type="String">GUI</Data></Cell>
        <Cell><Data ss:Type="String">2016-10-14</Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
        <Cell><Data ss:Type="String">public</Data></Cell>
        <Cell><Data ss:Type="String">2016-10-14</Data></Cell>
        <Cell><Data ss:Type="String">IE8 browser_Modules screen tool tip text is shown twice</Data></Cell>
        <Cell><Data ss:Type="String">new</Data></Cell>
        <Cell><Data ss:Type="String">open</Data></Cell>
        <Cell><Data ss:Type="String"></Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
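The row-and-cell layout of an XML Spreadsheet file maps naturally onto a list of lists. The following is a minimal sketch, not a complete reader: it uses Python's standard `xml.etree.ElementTree` on a shortened two-column version of the example above, the helper name `read_rows` is hypothetical, and it ignores features such as the `ss:Index` attribute that real files may use to skip cells.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the example document above.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"

# Shortened version of the example: two columns instead of nineteen.
SHEET = f"""<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">
  <Worksheet ss:Name="Issues">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">ID</Data></Cell>
        <Cell><Data ss:Type="String">Priority</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="Number">0000033</Data></Cell>
        <Cell><Data ss:Type="String">normal</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def read_rows(xml_source: str) -> list:
    """Return every Row as a list of cell values, converting Number cells to float."""
    root = ET.fromstring(xml_source)
    prefixes = {"ss": SS_NS}
    rows = []
    for row in root.iterfind(".//ss:Row", prefixes):
        values = []
        for data in row.iterfind("ss:Cell/ss:Data", prefixes):
            text = data.text or ""
            if data.get(f"{{{SS_NS}}}Type") == "Number":
                values.append(float(text))
            else:
                values.append(text)
        rows.append(values)
    return rows


rows = read_rows(SHEET)
print(rows)
```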
See also
- List of document markup languages
- Comparison of document markup languages
References
- [1] "Features and limitations of XML Spreadsheet format" (broken link). Archived from the original on 2007-10-09. Retrieved 2007-11-01.
- [2] "Polar WordML add-in" (broken link). Archived from the original on 2009-04-11. Retrieved 2007-11-01.
- [3] Word 2003 XML Viewer
- Overview of Office 2003 Developer Technologies
- Office 2003 XML. ISBN 0-596-00538-5
External links
- MSDN: XML Spreadsheet Reference
- MSDN: Word 2003 XML Reference
- Lawsuit about XML patent
According to Statista, worldwide data creation reached a new high of 79 zettabytes in 2021. It’s expected to continue to increase rapidly, reaching 181 zettabytes by 2025 — or ten times the amount of data produced in 2016. Since the amount of data being produced and shared online is increasing exponentially, we need a way to accommodate this growth.
Cue XML, one of the most popular and efficient ways of storing and moving data online. Understanding this technology is a crucial addition to your website development tool belt.
That’s why we’ll cover the following:
- what an XML file is
- what an XML file is used for
- how to open an XML file
- how to create a simple XML file
What is an XML file?
An XML file contains XML code and ends with the file extension ".xml". It contains tags that define not only how the document should be structured but also how it should be stored and transported over the internet.
Let’s look at a basic example of an XML file.
As you can see, this file consists of plain text and tags. The plain text is shown in black and the tags are shown in green.
Plain text is the actual data being stored. In this example, the XML is storing student names as well as test scores associated with each student.
While plain text represents the data, tags indicate what the data is. Each tag represents a type of data, like "first name," "last name," or "score," and tells the computer what to do with the plain text data inside of it. Tags aren’t supposed to be seen by users, only the software itself.
XML Hierarchy
Each instance of an XML tag is called an element. In an XML file, elements are arranged in a hierarchy, which means that elements can contain other elements.
The topmost element is called the "root" element, and contains all other elements, which are called "child" elements.
In the example above, "studentsList" is the root element. It contains two "student" elements. Each "student" element contains the elements "firstName," "lastName," "scores," etc. The beginning and end of each element are represented by a starting tag (e.g., "<firstName>") and a closing tag (e.g., "</firstName>") respectively.
Also, you’ll often see XML code formatted such that each level of element is indented, as is true in our example. This makes the file easier for humans to read, and does not affect how computers process the code.
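The hierarchy described above can be reconstructed roughly as follows. This is a sketch rather than the original file: the element names "studentsList," "student," "firstName," "lastName," and "scores" come from the description, while the student names, the score values, and the inner "score" tag are made up for illustration. Python's standard `xml.etree.ElementTree` is used here only to walk the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the student example (values are invented).
STUDENTS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<studentsList>
  <student>
    <firstName>Ada</firstName>
    <lastName>Lovelace</lastName>
    <scores>
      <score>92</score>
      <score>88</score>
    </scores>
  </student>
  <student>
    <firstName>Alan</firstName>
    <lastName>Turing</lastName>
    <scores>
      <score>95</score>
      <score>79</score>
    </scores>
  </student>
</studentsList>
"""

root = ET.fromstring(STUDENTS_XML)  # the root element: studentsList

students = []
for student in root.findall("student"):  # child elements of the root
    first = student.findtext("firstName")
    last = student.findtext("lastName")
    # "scores" is itself a parent element with its own children.
    scores = [int(score.text) for score in student.findall("scores/score")]
    students.append((first, last, scores))

for first, last, scores in students:
    print(first, last, scores)
```

The indentation in the string is only for human readers; the parser treats the nested tags the same way with or without it.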
Let’s take a closer look at the purpose and history of this language below.
XML Language
XML, short for "eXtensible Markup Language," was published by the World Wide Web Consortium (W3C) in 1998 to meet the challenges of large-scale electronic publishing. Since then, it has become one of the most widely used formats for sharing structured information among people, computers, and networks.
Since XML can be read and interpreted by people as well as computer software, it is known as human- and machine-readable.
The primary purpose of XML, however, is to store data in a way that can be easily read by and shared between software applications. Since its format is standardized, XML can be shared across systems or platforms, both locally and over the internet, and the recipient will still be able to parse the data.
It’s important to understand that XML doesn’t do anything with the data other than store it, like a database. Another piece of software must be created or used to send, receive, store, or display the data.
At this point, you might be thinking XML sounds a lot like another markup language, the Hypertext Markup Language (HTML). Let’s take a closer look at the differences between these languages below.
XML vs HTML
Both XML and HTML contain text and tags that instruct the software on how to use it. However, while XML tags specify the type of data, HTML tags specify how data is displayed. In short, XML is used to represent and share structured information, whereas HTML is used to display content on web pages.
Besides their purpose, there’s one other key difference between XML and HTML tags.
When programming in HTML, a developer must use tags from the HTML tag library, or a standardized set of tags. While you can do a lot with these tags, there is a limited number available. That means there are only so many ways you can structure content on a web page.
XML does not have this limitation, as there is no preset library of XML tags. Instead, developers can create an unlimited number of custom tags to fit their data needs. This extensive customization is the "X" in XML.
To formalize a set of custom tags, a developer can write a Document Type Definition (DTD), which is the closest thing XML has to a tag library. A DTD is optional. When one is used, it is declared at the top of the file and tells the software which elements and attributes are allowed and how they may be nested. What each tag means, and what to do with it, is decided by the applications that process the file.
For instance, an XML file containing info for a reservation system might have a custom "<res_start>" tag to define a time when a reservation begins. A program written for this format will know what the code "<res_start>7:00 PM PST</res_start>" means, and can use the information within the tag accordingly. This could mean sending this data in a confirmation email or storing it in another database.
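To make the reservation example concrete, here is a minimal sketch of a program that reads the custom tag and reuses its value. Only the "res_start" tag and its content come from the example above; the surrounding "reservation" and "guest" elements, the guest name, and the message wording are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reservation record; only <res_start> comes from the text above.
RESERVATION_XML = """<reservation>
  <guest>Sam Example</guest>
  <res_start>7:00 PM PST</res_start>
</reservation>"""

root = ET.fromstring(RESERVATION_XML)

# The program, not the XML file, decides what the tag's content is used for.
start_time = root.findtext("res_start")
guest = root.findtext("guest")
message = f"Hello {guest}, your reservation begins at {start_time}."

print(message)
```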
To summarize: An XML file is a file used to store data in the form of hierarchical elements. Data stored in XML files can be read by computer programs with the help of custom tags, which indicate the type of element.
Let’s take a look at some use cases for this extensible language below.
What is an XML file used for?
Since XML files are plain text documents, they are easy to create, store, transport, and interpret by computers and humans alike. This is why XML is one of the most commonly used languages on the internet. Many web-based software applications store information and send information to other apps in XML format.
Here are the most common uses of XML today:
Transporting Digital Information
The text-based format of XML files makes them highly portable, and therefore widely used for transferring information between web servers. Certain APIs, notably SOAP APIs and some REST APIs, send information to other applications packaged as XML.
Web Searching
Since XML defines the type of information contained in a document, it’s easier and more effective to search the web with than HTML, for example.
Let’s say you want to search for songs by Taylor Swift. Using HTML, you’d likely get back search results including interviews and articles that mention her songs. Using XML, search results would be restricted to songs only.
Computer Applications
XML files allow computer apps to easily structure and fetch the data that they need. After retrieving data from the file, programs can decide what to do with the data. This could mean storing in another database, using it in the program backend, or displaying it on the screen.
Additionally, some popular file formats are built with XML. Consider the Microsoft Office file extensions .docx (for Word documents), .xlsx (for Excel spreadsheets), and .pptx (for PowerPoint presentations). The "x" at the end of these file extensions stands for XML.
Websites and Web Apps
Websites and web apps can pull content for their pages from XML files. This is a common example of how the markup languages XML and HTML work together.
XML code modules might even appear within an HTML file in order to help display content on the page. This makes XML especially applicable to interactive websites and pages whose content changes dynamically. Depending on the user or screen size, an HTML file can choose to display only certain elements in the XML code, providing visitors with a personalized browsing experience.
How to Open an XML File
Since XML files are text files, you can open them in a few different ways. If you’re occasionally viewing XML files, you can open them directly in your favorite browser. If you’re frequently viewing, editing, and reformatting XML files, use an online XML editor or a text editor on your computer.
In this section, I’ll cover how to open XML files with each of these programs.
How to Open XML Files With a Web Browser
All modern web browsers allow you to read XML files right in the browser window. You can select an XML file from your device and choose to open it with your web browser. Here’s how a file looks in Google Chrome:
While the appearance of the text will differ by browser, you should be able to easily parse the contents of the file, and you might also be able to hide and reveal specific elements.
If there’s an error in the file, your browser will tell you with an error window. Google Chrome will display an error message like the following:
Note that your browser won’t let you edit the file this way. To change the file, you’ll need to use a specialized tool.
How to Open XML Files With an Online XML Editor
You can use a free online text file editor to view your XML files, change their contents, or convert them to other file formats. We recommend Code Beautify’s XML Viewer for this purpose.
In the tool, click Browse to upload a file from your computer. Once uploaded, you can edit the file on the left and view the hierarchy of the XML contents on the right.
Once finished editing, click Save & Share to create a fresh XML file.
Code Beautify also offers many free conversion tools to convert your XML files to other popular data storage formats like JSON and CSV.
How to Open XML Files With a Text Editor
As with any text file, you can open XML files in any text editor. However, common editors like Notepad and Word probably won’t display your XML files with colors or indentation. This makes the files less readable, as seen in the example below.
You’ll want to opt for a specialized text editor that will detect the .xml format and display your files accordingly. For PCs, Notepad++ is a popular option. For Macs, try Xmplify or Eclipse.
Alternatively, you can use a simple text editor and apply indentation to your files with a free online XML formatter.
If any of your systems implement XML files, they will almost certainly write all of these files for you. If you want to practice writing your own basic XML files, you can do so in a text editor. Let’s walk through how to create an XML file below.
How to Create an XML File
- Open your text editor of choice.
- On the first line, write an XML declaration.
- Set your root element below the declaration.
- Add your child elements within the root element.
- Review your file for errors.
- Save your file with the .xml file extension.
- Test your file by opening it in the browser window.
1. Open your text editor of choice.
I’ll use Sublime Text for this demo since it’s free and works on macOS, Linux, and Windows.
2. On the first line, write an XML declaration.
This declaration tells the application reading the file that the language is XML. A typical declaration is <?xml version="1.0" encoding="UTF-8"?>.
3. Set your root element below the declaration.
Every XML file has one root element, which contains all other child elements. The root element is written below the declaration.
In this example file, "<root_element>" is the starting tag for the root element, and "</root_element>" is the closing tag. All other elements will go between these tags.
You can substitute "root_element" in both tags with a name relevant to the information you’re storing.
4. Add your child elements within the root element.
Next, add your child elements between the starting and closing tag of the root element. You can nest a child element within another child element.
Like the root element, each child element needs a starting tag and a closing tag. After adding child tags, your file will look something like this:
Instances of "root_element", "child_element", and "Content" can be swapped with names that make more sense for your file.
5. Review your file for errors.
Time to review. Are there any missing closing tags? Any rogue ampersands? Does the document type declaration appear after the first element in the document? These are just a few possible errors.
Notice that line 5 is highlighted below. That’s because the closing tag of the "child_element_2" is missing a bracket.
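If your editor doesn’t highlight mistakes, a short script can do the review for you. Here’s a minimal sketch using Python’s standard `xml.etree.ElementTree`; the sample file is my own reconstruction of a closing tag with a missing bracket, and the helper name `find_syntax_error` is hypothetical. It only checks that the file is well formed, not that it follows a DTD.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample: the closing tag of child_element_2 lacks its ">".
BROKEN_XML = """<?xml version="1.0" encoding="UTF-8"?>
<root_element>
  <child_element_1>Content</child_element_1>
  <child_element_2>Content</child_element_2
</root_element>
"""

FIXED_XML = BROKEN_XML.replace("</child_element_2\n", "</child_element_2>\n")


def find_syntax_error(xml_source: str):
    """Return a description of the first syntax error, or None if well formed."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError as error:
        line, column = error.position  # where the parser gave up
        return f"line {line}, column {column}: {error}"
    return None


print(find_syntax_error(BROKEN_XML))
print(find_syntax_error(FIXED_XML))
```

The reported position is where the parser noticed the problem, which can be slightly after the actual typo.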
6. Save your file with the .xml file extension.
As said above, an XML file ends with the file extension ".xml". So make sure to save your file with that extension.
7. Test your file by opening it in the browser window.
Finally, test that your file is working by dragging and dropping it into a new browser tab or window.
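The same steps can also be scripted. The sketch below is one possible way to do it, not the only one: it uses Python’s standard `xml.etree.ElementTree` to build the placeholder elements from this walkthrough, writes them to a temporary file named example.xml (a name I picked for the demo), and reads the file back as a quick test. The "nested_child" element is an assumption added to show nesting, and `ET.indent` requires Python 3.9 or later.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Step 3: the root element.
root = ET.Element("root_element")

# Step 4: child elements, one of them nested inside another.
child_1 = ET.SubElement(root, "child_element_1")
child_1.text = "Content"
child_2 = ET.SubElement(root, "child_element_2")
nested = ET.SubElement(child_2, "nested_child")  # hypothetical nested element
nested.text = "Content"

ET.indent(root)  # pretty-print for human readers (Python 3.9+)

with tempfile.TemporaryDirectory() as folder:
    # Steps 2 and 6: write the declaration and save with the .xml extension.
    output_path = Path(folder) / "example.xml"
    ET.ElementTree(root).write(str(output_path), encoding="utf-8", xml_declaration=True)

    # Steps 5 and 7: re-reading the file fails loudly if it is not well formed.
    written = output_path.read_text(encoding="utf-8")
    reloaded = ET.parse(str(output_path)).getroot()

print(written)
```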
Frequently Asked Questions about XML Files
Still have questions about XML files? No problem. Here are a few frequently asked questions about this type of file, along with the answers.
Can I open an XML file in Excel?
Yes. Open Excel and click File > Open. Locate the XML file on your computer and click Open. An XML file will look something like this in Excel:
Can I open an XML file in Word?
Yes. Open Word and click File > Open. Locate the XML file on your computer and click Open. An XML file will look something like this in Word:
How do I convert an XML file into a PDF?
To convert an XML file into a PDF, you can use a free online tool like Convert XML to PDF online. Simply click the Choose File button, select the XML file from your computer, and click Open. Then click the Convert Now button.
The PDF file will be ready to view or download within a few seconds.
How do I recover an XML file?
Whether you’ve accidentally deleted an XML file, your disk drive has been corrupted, or you simply can’t find the file you’re looking for, you can recover an XML file easily with a file recovery software tool like iBeesoft Data Recovery.
Just download and open the tool on your computer, select the file type "Other Files," then select where you want to search on your computer, and click Scan.
You’ll see a list of results. You can sort by .xml, select the file, and click Recover.
For a more in-depth look into this process, check out this step-by-step guide for recovering a lost XML file or repairing a corrupted file with iBeesoft.
How do I comment in XML?
To comment in an XML file, enclose the text between the delimiters <!-- and -->. Here’s an example of a comment:
<!-- declarations for <head> & <body> -->
Note that this is the same syntax for commenting in HTML.
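Comments are meant for people, so most parsers skip them unless asked otherwise. Here’s a small sketch with Python’s standard `xml.etree.ElementTree` that shows both behaviors; the "page" and "title" elements are made up for the demo, and the `insert_comments` option needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing the comment from the example above.
COMMENTED_XML = """<page>
  <!-- declarations for <head> & <body> -->
  <title>Example</title>
</page>"""

# Default behavior: the comment is dropped while parsing.
root = ET.fromstring(COMMENTED_XML)
child_tags = [child.tag for child in root]

# Opt-in behavior: keep comments as nodes in the tree (Python 3.8+).
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(COMMENTED_XML, parser=keeping_parser)
comment_texts = [
    child.text for child in root_with_comments if child.tag is ET.Comment
]

print(child_tags)
print(comment_texts)
```

Notice that the "<" and "&" characters inside the comment don’t cause errors, because the parser doesn’t treat comment text as markup.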
Understanding XML
It might not be as engaging as parallax scrolling or as groundbreaking as machine learning, but XML is one of the most crucial technologies on the web today. You can leave the coding up to developers, but having a solid understanding of XML will give you a better sense of how websites, including your own, deliver content.
Editor’s note: This post was originally published in July 2020 and has been updated for comprehensiveness.
Starting with the 2007 Microsoft Office system, Microsoft Office uses the XML-based file formats, such as .docx, .xlsx, and .pptx. These formats and file name extensions apply to Microsoft Word, Microsoft Excel, and Microsoft PowerPoint. This article discusses key benefits of the format, describes the file name extensions and discusses how you can share Office files with people who are using earlier versions of Office.
In this article
What are the benefits of Open XML Formats?
What are the XML file name extensions?
Can different versions of Office share the same files?
What are the benefits of Open XML Formats?
The Open XML Formats include many benefits — not only for developers and the solutions that they build, but also for individual people and organizations of all sizes:
- Compact files. Files are automatically compressed and can be up to 75 percent smaller in some cases. The Open XML Format uses zip compression technology to store documents, offering potential cost savings as it reduces the disk space required to store files and decreases the bandwidth needed to send files via e-mail, over networks, and across the Internet. When you open a file, it is automatically unzipped. When you save a file, it is automatically zipped again. You do not have to install any special zip utilities to open and close files in Office.
- Improved damaged-file recovery. Files are structured in a modular fashion that keeps different data components in the file separate from each other. This allows files to be opened even if a component within the file (for example, a chart or table) is damaged or corrupted.
- Support for advanced features. Many of the advanced features of Microsoft 365 require the document to be stored in the Open XML format. Things like AutoSave and the Accessibility Checker, for two examples, can only work on files that are stored in the modern Open XML format.
- Better privacy and more control over personal information. Documents can be shared confidentially, because personally identifiable information and business-sensitive information, such as author names, comments, tracked changes, and file paths, can be easily identified and removed by using Document Inspector.
- Better integration and interoperability of business data. Using Open XML Formats as the data interoperability framework for the Office set of products means that documents, worksheets, presentations, and forms can be saved in an XML file format that is freely available for anyone to use and to license, royalty free. Office also supports customer-defined XML Schemas that enhance the existing Office document types. This means that customers can easily unlock information in existing systems and act upon it in familiar Office programs. Information that is created within Office can be easily used by other business applications. All you need to open and edit an Office file is a ZIP utility and an XML editor.
- Easier detection of documents that contain macros. Files that are saved by using the default «x» suffix (such as .docx, .xlsx, and .pptx) cannot contain Visual Basic for Applications (VBA) macros and XLM macros. Only files whose file name extension ends with an «m» (such as .docm, .xlsm, and .pptm) can contain macros.
Before you decide to save the file in a binary format, read Can different versions of Office share the same files?
How do I convert my file from the old binary format to the modern Open XML format?
With the file open in your Office app, click File > Save as (or Save a copy, if the file is stored on OneDrive or SharePoint) and make sure the Save as type is set to the modern format.
This will create a new copy of your file, in the Open XML format.
What are the XML file name extensions?
By default, documents, worksheets, and presentations that you create in Office are saved in XML format with file name extensions that add an «x» or an «m» to the file name extensions that you are already familiar with. The «x» signifies an XML file that has no macros, and the «m» signifies an XML file that does contain macros. For example, when you save a document in Word, the file now uses the .docx file name extension by default, instead of the .doc file name extension.
When you save a file as a template, you see the same kind of change. The template extension used in earlier versions is there, but it now has an «x» or an «m» on the end. If the file contains code or macros, you must save it by using the new macro-enabled XML file format, which adds an «m» for macro to the file extension.
The following tables list all the default file name extensions in Word, Excel, and PowerPoint.
Word
XML file type | Extension
Document | .docx
Macro-enabled document | .docm
Template | .dotx
Macro-enabled template | .dotm
Excel
XML file type | Extension
Workbook | .xlsx
Macro-enabled workbook | .xlsm
Template | .xltx
Macro-enabled template | .xltm
Non-XML binary workbook | .xlsb
Macro-enabled add-in | .xlam
PowerPoint
XML file type | Extension
Presentation | .pptx
Macro-enabled presentation | .pptm
Template | .potx
Macro-enabled template | .potm
Macro-enabled add-in | .ppam
Show | .ppsx
Macro-enabled show | .ppsm
Slide | .sldx
Macro-enabled slide | .sldm
Office theme | .thmx
Can different versions of Office share the same files?
Office lets you save files in the Open XML Formats and in the binary file format of earlier versions of Office and includes compatibility checkers and file converters to allow file-sharing between different versions of Office.
Opening existing files in Office. You can open and work on a file that was created in an earlier version of Office, and then save it in its existing format. Because you might be working on a document with someone who uses an earlier version of Office, Office uses a compatibility checker that verifies that you have not introduced a feature that an earlier version of Office does not support. When you save the file, the compatibility checker reports those features to you and then lets you remove them before continuing with the save.
A file with the .xml file extension is an Extensible Markup Language (XML) file. These are really just plain text files that use custom tags to describe the structure and other features of the document.
XML is a markup language created by the World Wide Web Consortium (W3C) to define a syntax for encoding documents that both humans and machines could read. It does this through the use of tags that define the structure of the document, as well as how the document should be stored and transported.
It’s probably easiest to compare it to another markup language with which you might be familiar—the Hypertext Markup Language (HTML) used to encode web pages. HTML uses a pre-defined set of markup symbols (short codes) that describe the format of content on a web page. For example, the following simple HTML code uses tags to make some words bold and some italic:
This is how you make <b>bold text</b> and this is how you make <i>italic text</i>
The thing that differentiates XML, though, is that it’s extensible. XML doesn’t have a predefined set of tags, as HTML does. Instead, XML allows users to create their own markup symbols to describe content, making an unlimited and self-defining symbol set.
Essentially, HTML is a language that focuses on the presentation of content, while XML is a dedicated data-description language used to store data.
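To make the "define your own tags" idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The recipe document and every tag and attribute name in it are invented for illustration; XML itself attaches no meaning to them, so the reading program decides what they mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: none of these tag names are predefined by XML.
recipe_xml = """
<recipe servings="4">
  <title>Pancakes</title>
  <ingredient amount="2" unit="cups">flour</ingredient>
  <ingredient amount="3">eggs</ingredient>
</recipe>
"""

root = ET.fromstring(recipe_xml)
title = root.findtext("title")
# Collect (name, amount, unit) for each ingredient; a missing attribute is None.
ingredients = [
    (item.text, item.get("amount"), item.get("unit"))
    for item in root.findall("ingredient")
]
print(title, ingredients)
```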
XML is often used as the basis for other document formats—hundreds, in fact. Here are a few you might recognize:
- RSS and ATOM both describe how reader apps handle web feeds.
- Microsoft .NET uses XML for its configuration files.
- Microsoft Office 2007 and later use XML as the basis for document structure. That’s what the “X” means in the .DOCX Word document format, for example, and it’s also used in Excel (XLSX files) and PowerPoint (PPTX files).
So, if you have an XML file, that doesn’t necessarily tell you what app it’s intended for use with. And typically, you won’t need to worry about it, unless you’re the one actually designing the XML files.
How Do I Open One?
There are a few ways you can open an XML file directly. You can open and edit them with any text editor, view them with any web browser, or use a website that lets you view, edit, and even convert them to other formats.
Use a Text Editor If You Work With XML Files Regularly
Since XML files are really just text files, you can open them in any text editor. The thing is, a lot of text editors—like Notepad—just aren’t designed to show XML files with their proper structure. It might be okay for popping an XML file open and taking a quick look to help figure out what it is. But, there are much better tools for working with them.
Right-click the XML file you want to open, point to “Open With” on the context menu, and then click the “Notepad” option.
Note: We’re using Windows examples here, but the same holds true for other operating systems. Look for a good third-party text editor that is designed to support XML files.
The file does open, but as you can see, it loses most of its formatting and crams the whole thing onto just two lines of the document.
So while Notepad might be useful for quickly checking out an XML file, you’re much better off with a more advanced tool like Notepad++, which highlights syntax and formats the file the way it’s intended.
Here is the same XML file opened in Notepad++:
Use a Web Browser to View the Structured Data
If you don’t really need to edit XML files, but just need to view them on occasion, the browser you’re using to read this article is well-suited to the job. And in fact, your default web browser is likely set up as the default viewer for XML files. So, double-clicking an XML file should open it in your browser.
If not, you can right-click the file to find options for opening it with whatever app you want. Just select your web browser from the list of programs. We’re using Chrome in this example.
When the file opens, you should see nicely-structured data. It’s not as pretty as the color-coded view you get with something like Notepad++, but it’s a far sight better than what you get with Notepad.
Use An Online Editor to View, Edit, or Convert XML Files
If you want to edit the occasional XML file and don’t want to download a new text editor, or if you need to convert an XML file to another format, there are a few decent online XML editors available for free. TutorialsPoint.com, XMLGrid.net, and CodeBeautify.org all let you view and edit XML files. After you’ve done your editing, you can download the changed XML file, or even convert it to a different format.
For the example here, we’ll be using CodeBeautify.org. The page is divided into three sections. On the left is the XML file you’re working with. In the middle, you’ll find several options. On the right, you’ll see the results of some of the options you can select. For example, in the image below, our full XML file is on the left and the tree view is showing in the results pane because we clicked the “Tree View” button in the middle.
Here’s a better look at those options. Use the “Browse” button to upload an XML file from your computer or the “Load URL” button to pull XML from an online source.
The “Tree View” button displays your data in a nicely formatted tree structure in the results pane, with all your tags on the left in orange and the attributes to the right of the tags.
The “Beautify” button displays your data in neat, easy-to-read lines in the results pane.
The “Minify” button displays your data using the least amount of white space possible. It will attempt to put every single piece of data on one line. This comes in handy when trying to make the file smaller. It will save some space, but at the cost of being able to read it effectively.
And finally, you can use the “XML to JSON” button to convert the XML to JSON format, the “Export to CSV” button to save your data as a comma-separated values file, or the “Download” button to download any changes you’ve made as a new XML file.
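If you would rather do the same "beautify" and "minify" operations locally, the following is a rough sketch with Python's standard xml.dom.minidom module. It is not how CodeBeautify.org works internally; the helper names and the sample XML are my own assumptions.

```python
from xml.dom import minidom

def beautify(xml_text):
    """Return the document indented with two spaces per nesting level."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")

def _strip_whitespace_nodes(node):
    # Remove text nodes that contain only whitespace (indentation, newlines).
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            _strip_whitespace_nodes(child)

def minify(xml_text):
    """Return the root element on a single line, without layout whitespace."""
    dom = minidom.parseString(xml_text)
    _strip_whitespace_nodes(dom)
    return dom.documentElement.toxml()

raw = "<library><book id='1'><title>Dune</title></book></library>"
pretty = beautify(raw)
compact = minify(pretty)
print(pretty)
print(compact)
```

Note that stripping whitespace is only safe when the whitespace is layout, not content; formats that mark text with xml:space="preserve" (such as DOCX) need more care.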
Q
What is in a Word XML Document (*.xml) file?
A
After you save a Word document as a Word XML Document (*.xml) file,
you can open it in an XML browser to see how the entire Word document
is represented in the XML file.
1. Open «Word_Tutorials.docx» and save it as «Word_Tutorials.xml»
in Word XML Document format.
2. Open «Word_Tutorials.xml» in Firefox. If you collapse all first-level elements,
you will see that a Word XML Document contains a list of «part» elements.
Each «part» element represents a file in the original .docx file.
For example, the main content of the original Word document is represented
by this «part» element:
<pkg:part pkg:name="/word/document.xml"
  pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
  <pkg:xmlData> ... </pkg:xmlData>
</pkg:part>
Another example, the first picture in the original Word document is represented by this «part» element:
<pkg:part pkg:name="/word/media/image1.png"
  pkg:contentType="image/png" pkg:compression="store">
  <pkg:binaryData> ... </pkg:binaryData>
</pkg:part>
Top level relations between the main file and other files are represented by this «part» element:
<pkg:part pkg:name="/_rels/.rels"
  pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
  <pkg:xmlData> ... </pkg:xmlData>
</pkg:part>
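As a rough illustration of the «part» structure described above, the sketch below lists the name and content type of each part in a Word XML Document using Python's standard library. The sample string is a hand-made stand-in, not real Word output, and the function name list_parts is my own.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the pkg: prefix in Word XML Document files.
PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"

def list_parts(flat_xml):
    """Return (name, contentType) for every pkg:part element."""
    root = ET.fromstring(flat_xml)
    return [
        (part.get("{%s}name" % PKG), part.get("{%s}contentType" % PKG))
        for part in root.iter("{%s}part" % PKG)
    ]

# Hypothetical, heavily shortened sample with empty part bodies.
sample = (
    '<pkg:package xmlns:pkg="%s">'
    '<pkg:part pkg:name="/_rels/.rels" '
    'pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">'
    "<pkg:xmlData/></pkg:part>"
    '<pkg:part pkg:name="/word/document.xml" '
    'pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">'
    "<pkg:xmlData/></pkg:part>"
    '<pkg:part pkg:name="/word/media/image1.png" pkg:contentType="image/png">'
    "<pkg:binaryData/></pkg:part>"
    "</pkg:package>"
) % PKG

parts = list_parts(sample)
for name, content_type in parts:
    print(name, content_type)
```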
With approximately one billion people using Microsoft Office, the DOCX format is the most popular de facto standard for exchanging document files between offices. Its closest competitor — the ODT format — is only supported by Open/LibreOffice and some open source products, making it far from standard. The PDF format is not a competitor because PDFs can’t be edited and they don’t contain a full document structure, so they can only take limited local changes like watermarks, signatures, and the like. This is why most business documents are created in the DOCX format; there’s no good alternative to replace it.
While DOCX is a complex format, you may want to parse it manually for simpler tasks such as indexing, converting to TXT and making other small modifications. I’d like to give you enough information on DOCX internals so you don’t have to reference the ECMA specifications, a massive 5,000 page manual.
The best way to understand the format is to create a simple one-word document with MS Word and observe how editing the document changes the underlying XML. You’ll face some cases where the DOCX doesn’t format properly in MS Word and you don’t know why, or come across instances when it’s not evident how to generate the desired formatting. Seeing and understanding exactly what’s going on in the XML will help with that.
I worked for about a year on a collaborative DOCX editor, CollabOffice, and I want to share some of that knowledge with the developer community. In this article I will explain the DOCX file structure, summarising information that is scattered over the internet. This article is an intermediary between the huge, complex ECMA specification and the simple internet tutorials currently available. You can find the files that accompany this article in the toptal-docx project on my github account.
A Simple DOCX file
A DOCX file is a ZIP archive of XML files. If you create a new, empty Microsoft Word document, write a single word ‘Test’ inside and unzip its contents, you will see the following file structure:
Even though we’ve created a simple document, the save process in Microsoft Word has generated default themes, document properties, font tables, and so on, in XML format.
In this simple document, every file inside the DOCX is an XML file, even those with the «.rels» extension. (Documents with images also carry binary files under word/media.)
To start, let us remove the unused stuff and focus on document.xml, which contains the main text elements. When you delete a file, make sure you have deleted all the relationship references to it from the other XML files. Here is a code-diff example on how I’ve cleared dependencies to app.xml and core.xml. If you have any unresolved/missing references, MS Word will consider the file broken.
Here’s the structure of our simplified, minimal DOCX document (and here’s the project on github):
Let’s break it down by file from here, from the top:
_rels/.rels
This defines the reference that tells MS Word where to look for the document contents. In this case, it references word/document.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Target="word/document.xml"/>
</Relationships>
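A reader of the package follows that relationship to locate the main part. Here is a minimal sketch with Python's standard library; the function name main_document_target is my own, and the input is the file shown above.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)

def main_document_target(rels_xml):
    """Return the Target of the officeDocument relationship, or None."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall("{%s}Relationship" % RELS_NS):
        if rel.get("Type") == OFFICE_DOCUMENT:
            return rel.get("Target")
    return None

# Contents of _rels/.rels from the minimal document, as bytes.
ROOT_RELS = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Target="word/document.xml"/>
</Relationships>"""

target = main_document_target(ROOT_RELS)
print(target)
```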
word/_rels/document.xml.rels
This file defines references to resources, such as images, embedded in the document content. Our simple document has no embedded resources, so the relationship tag is empty:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
</Relationships>
[Content_Types].xml
[Content_Types].xml contains information about the types of media inside the document. Since we only have text content, it’s pretty simple:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
document.xml
Finally, here is the main XML with the document’s text content. I have removed some of the namespace declarations for clarity, but you can find the full version of the file in the github project. In that file you’ll find that some of the namespace references in the document are unused, but you shouldn’t delete them because MS Word needs them.
Here’s our simplified example:
<w:document>
<w:body>
<w:p w:rsidR="005F670F" w:rsidRDefault="005F79F5">
<w:r><w:t>Test</w:t></w:r>
</w:p>
<w:sectPr w:rsidR="005F670F">
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720"
w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:body>
</w:document>
The main node <w:document> represents the document itself, <w:body> contains paragraphs, and nested within <w:body> are page dimensions defined by <w:sectPr>.
w:rsidR is an attribute that you can ignore; it’s used by MS Word internals.
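To tie the four parts together, here is a sketch that zips them into a package with Python's standard zipfile module. It is an illustration under assumptions: document.xml is cut down to a single paragraph and declares only the w namespace, and I have not verified that every Word version accepts a file this sparse (the warning above about keeping namespace declarations applies). The function name build_minimal_docx is my own.

```python
import io
import zipfile

HEADER = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

PARTS = {
    "[Content_Types].xml": HEADER
    + '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>",
    "_rels/.rels": HEADER
    + '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>",
    "word/_rels/document.xml.rels": HEADER
    + '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"/>',
    "word/document.xml": HEADER
    + '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Test</w:t></w:r></w:p></w:body>"
    "</w:document>",
}

def build_minimal_docx():
    """Return the bytes of a ZIP package containing the four parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in PARTS.items():
            archive.writestr(name, content)
    return buffer.getvalue()

docx_bytes = build_minimal_docx()
with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
    print(archive.namelist())
```

Writing docx_bytes to a file with a .docx extension gives you something to open and experiment with.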
Let’s take a look at a more complex document with three paragraphs. I have highlighted the XML with the same colors on the screenshot from Microsoft Word, so you can see the correlation:
<w:p w:rsidR="0081206C" w:rsidRDefault="00E10CAE">
  <w:r>
    <w:t xml:space="preserve">This is our example first paragraph. It's default is left aligned, and now I'd like to introduce</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      <w:color w:val="000000"/>
    </w:rPr>
    <w:t>some bold</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      <w:b/>
      <w:color w:val="000000"/>
    </w:rPr>
    <w:t xml:space="preserve"> text</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      <w:color w:val="000000"/>
    </w:rPr>
    <w:t xml:space="preserve">, </w:t>
  </w:r>
  <w:proofErr w:type="gramStart"/>
  <w:r>
    <w:t xml:space="preserve">and also change the</w:t>
  </w:r>
  <w:r w:rsidRPr="00E10CAE">
    <w:rPr>
      <w:rFonts w:ascii="Impact" w:hAnsi="Impact"/>
    </w:rPr>
    <w:t>font style</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="Impact" w:hAnsi="Impact"/>
    </w:rPr>
    <w:t xml:space="preserve"> </w:t>
  </w:r>
  <w:r>
    <w:t>to 'Impact'.</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00E10CAE" w:rsidRDefault="00E10CAE">
  <w:r>
    <w:t>This is new paragraph.</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00E10CAE" w:rsidRPr="00E10CAE" w:rsidRDefault="00E10CAE">
  <w:r>
    <w:t>This is one more paragraph, a bit longer.</w:t>
  </w:r>
</w:p>
Paragraph Structure
A simple document consists of paragraphs, a paragraph consists of runs (a series of text with the same font, color, etc), and runs hold their characters in text elements such as <w:t>. A <w:t> tag may have several characters inside, and there might be a few <w:t> tags in the same run.
Again, we can ignore w:rsidR.
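The paragraph, run, and text nesting maps directly onto a simple text extractor, which covers the indexing and convert-to-TXT use cases mentioned earlier. This is a minimal sketch with Python's standard library; it only looks at runs that are direct children of a paragraph, so text inside hyperlinks or other wrappers is skipped. The function name and sample XML are my own.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def paragraphs_text(document_xml):
    """Return one string per w:p, joining the w:t text of its direct runs."""
    root = ET.fromstring(document_xml)
    result = []
    for paragraph in root.iter("{%s}p" % W):
        pieces = [t.text or "" for t in paragraph.findall("w:r/w:t", NS)]
        result.append("".join(pieces))
    return result

# Hypothetical, shortened document.xml with two paragraphs.
sample = (
    '<w:document xmlns:w="%s"><w:body>'
    "<w:p><w:r><w:t>some bold</w:t></w:r>"
    '<w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> text</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>This is new paragraph.</w:t></w:r></w:p>"
    "</w:body></w:document>"
) % W

lines = paragraphs_text(sample)
print(lines)
```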
Text properties
Basic text properties are font, size, color, style, and so on. There are about 40 tags that specify text appearance. As you can see in our three paragraph example, each run has its own properties inside <w:rPr>, specifying <w:color>, <w:rFonts> and boldness <w:b>.
An important thing to note is that properties make a distinction between the two groups of characters, normal and complex script (Arabic, for instance), and that the properties have a different tag depending on which type of character it’s affecting.
Most normal script property tags have a matching complex script tag with an added “Cs” suffix, specifying that the property is for complex scripts. For example, <w:i> (italic) becomes <w:iCs>, and the bold tag for normal script, <w:b>, becomes <w:bCs> for complex script.
Styles
There’s an entire toolbar in Microsoft Word dedicated to styles: normal, no spacing, heading 1, heading 2, title, and so on. These styles are stored in /word/styles.xml (note: in the first step of our simple example, we removed this XML from the DOCX; make a new DOCX to see it).
Once you have text defined as a style, you will find a reference to this style inside the paragraph properties tag, <w:pPr>. Here’s an example where I’ve defined my text with the style Heading 1:
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/>
</w:pPr>
<w:r>
<w:t>My heading 1</w:t>
</w:r>
</w:p>
and here is the style itself from styles.xml:
<w:style w:type="paragraph" w:styleId="Heading1">
<w:name w:val="heading 1"/>
<w:basedOn w:val="Normal"/>
<w:next w:val="Normal"/>
<w:link w:val="Heading1Char"/>
<w:uiPriority w:val="9"/>
<w:qFormat/>
<w:rsid w:val="002F7F18"/>
<w:pPr>
<w:keepNext/>
<w:keepLines/>
<w:spacing w:before="480" w:after="0"/>
<w:outlineLvl w:val="0"/>
</w:pPr>
<w:rPr>
<w:rFonts w:asciiTheme="majorHAnsi" w:eastAsiaTheme="majorEastAsia" w:hAnsiTheme="majorHAnsi"
w:cstheme="majorBidi"/>
<w:b/>
<w:bCs/>
<w:color w:val="365F91" w:themeColor="accent1" w:themeShade="BF"/>
<w:sz w:val="28"/>
<w:szCs w:val="28"/>
</w:rPr>
</w:style>
The w:style/w:rPr/w:b XPath specifies that the font is bold, and w:style/w:rPr/w:color indicates the font color. <w:basedOn> instructs MS Word to use the “Normal” style for any missing properties.
Property Inheritance
Text properties are inherited. A run has its own properties (w:p/w:r/w:rPr/*), but it also inherits properties from its paragraph (w:p/w:pPr/*), and both can reference style properties from /word/styles.xml.
<w:r>
<w:rPr>
<w:rStyle w:val="DefaultParagraphFont"/>
<w:sz w:val="16"/>
</w:rPr>
<w:tab/>
</w:r>
Paragraphs and runs start with default properties: w:styles/w:docDefaults/w:rPrDefault/* and w:styles/w:docDefaults/w:pPrDefault/*. To get the end result of a character’s properties you should:
- Use default run/paragraph properties
- Append run/paragraph style properties
- Append local run/paragraph properties
- Append result run properties over paragraph properties
When I say “append” B to A, I mean to iterate through all B properties and override all A’s properties, leaving all non-intersecting properties as-is.
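The four steps and the “append” rule can be transcribed almost literally into code. The sketch below models property sets as plain Python dictionaries, which is my own simplification; the property names and values in the example are hypothetical, and real DOCX properties carry more structure than a flat key/value pair.

```python
def append(base, overrides):
    """Append B to A: B's values win, A's other properties stay as-is."""
    merged = dict(base)
    merged.update(overrides)
    return merged

def resolve(defaults, style, local):
    """Steps 1-3: start from defaults, append style, then local properties."""
    return append(append(defaults, style), local)

def effective_properties(paragraph_layers, run_layers):
    """Step 4: append the resolved run properties over the paragraph's."""
    return append(resolve(*paragraph_layers), resolve(*run_layers))

# Hypothetical layers: (defaults, style properties, local properties).
paragraph_layers = ({"sz": 22}, {"sz": 28, "b": True}, {"color": "365F91"})
run_layers = ({"rFonts": "Calibri"}, {}, {"sz": 16})

final = effective_properties(paragraph_layers, run_layers)
print(final)
```

In this example the run's local size replaces the size from the paragraph style, while bold and color pass through untouched because the run never mentions them.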
One more place where default properties may be located is in the <w:style> tag with w:type="paragraph" and w:default="1". Note that characters themselves inside a run never have a default style, so <w:style w:type="character" w:default="1"> doesn’t actually affect any text.
Characters in a run can inherit from its paragraph and both can inherit from styles.xml.
Toggle properties
Some of the properties are “toggle” properties, such as <w:b> (bold) or <w:i> (italic); these properties behave like an XOR operator.
This means if the parent style is bold and a child run is bold, the result will be regular, non-bold text.
You have to do lots of testing and reverse-engineering to handle toggle properties correctly. Take a look at section 17.7.3 of the ECMA-376 Open XML specification for the formal, detailed rules for toggle properties.
Toggle properties are the most complex for a layouter to handle correctly.
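As a toy model of the XOR behaviour described above, and nothing more, the sketch below folds one boolean per level that sets the property. It deliberately ignores the finer rules in section 17.7.3 (for example, how directly applied formatting interacts with styles), so treat it as an aid to intuition rather than a layouter component. The function name is my own.

```python
from functools import reduce
from operator import xor

def effective_toggle(levels):
    """XOR the toggle values set at each level, outermost level first."""
    return reduce(xor, levels, False)

print(effective_toggle([True]))        # bold set at one level only
print(effective_toggle([True, True]))  # parent bold and child bold
```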
Fonts
Fonts follow the same common rules as other text attributes, but font property default values are specified in a separate theme file, referenced in word/_rels/document.xml.rels like this:
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
Based on the above reference, the default font name will be found in word/theme/theme1.xml, inside an <a:theme> tag, in the a:themeElements/a:fontScheme/a:majorFont or a:minorFont tag.
The default font size is 10 unless the w:docDefaults/w:rPrDefault tag is missing, in which case it is size 11.
Text alignment
Text alignment is specified by a <w:jc> tag with four w:val modes available: "left", "center", "right" and "both".
"left" is the default mode; text starts at the left of the paragraph rectangle (usually the page width). (This paragraph is aligned to the left, which is standard.)
"center" mode, predictably, centers all characters inside the page width. (Again, this paragraph exemplifies centered alignment.)
In "right" mode, paragraph text is aligned to the right margin. (Notice how this text is aligned to the right side.)
"both" mode puts extra spacing between words so that lines get wider and occupy the full paragraph width, with the exception of the last line, which is left aligned. (This paragraph is a demonstration of that.)
Images
DOCX supports two sorts of images: inline and floating.
Inline images appear inside a paragraph along with the other characters; <w:drawing> is used instead of <w:t> (text). You can find the image ID with the following XPath syntax:
w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed
The image ID is used to look up the filename in the word/_rels/document.xml.rels file, and it should point to a gif/jpeg file inside the word/media subfolder. (See the github project’s word/_rels/document.xml.rels file, where you can see the image ID.)
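The lookup described above can be sketched with Python's standard library. ElementTree's limited XPath support cannot select an attribute directly, so the code finds the a:blip elements and reads r:embed from each. The sample XML strings are hand-made stand-ins rather than real Word output, and the function name is my own.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "pic": "http://schemas.openxmlformats.org/drawingml/2006/picture",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
BLIP_PATH = ".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip"
EMBED = "{" + NS["r"] + "}embed"

def inline_image_targets(document_xml, rels_xml):
    """Map each inline image's r:embed ID to its Target in the rels file."""
    rels = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).findall("rel:Relationship", NS)
    }
    blips = ET.fromstring(document_xml).findall(BLIP_PATH, NS)
    return [rels.get(blip.get(EMBED)) for blip in blips]

# Hypothetical minimal inputs for demonstration.
DECLS = " ".join('xmlns:%s="%s"' % (p, u) for p, u in NS.items() if p != "rel")
DOCUMENT = (
    "<w:document " + DECLS + "><w:body><w:p><w:r><w:drawing><wp:inline>"
    "<a:graphic><a:graphicData><pic:pic><pic:blipFill>"
    '<a:blip r:embed="rId4"/>'
    "</pic:blipFill></pic:pic></a:graphicData></a:graphic>"
    "</wp:inline></w:drawing></w:r></w:p></w:body></w:document>"
)
RELS = (
    '<Relationships xmlns="' + NS["rel"] + '">'
    '<Relationship Id="rId4" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" '
    'Target="media/image1.png"/>'
    "</Relationships>"
)

targets = inline_image_targets(DOCUMENT, RELS)
print(targets)
```

Targets are relative to the word/ folder, so media/image1.png refers to word/media/image1.png inside the archive.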
Floating images are placed relative to paragraphs with text flowing around them. (Here’s the github project sample document with a floating image.)
Floating images use <wp:anchor> instead of <wp:inline>, so if you delete any text inside <w:p>, be careful with the anchors if you don’t want the images removed.
MS Word’s image options refer to image alignment as «text wrapping mode».
Tables
XML tags for tables are similar to HTML table markup: <w:tbl> is the same as <table>, <w:tr> matches <tr>, etc.
<w:tbl>, the table itself, has table properties <w:tblPr>, and each column’s properties are given by a <w:gridCol> inside <w:tblGrid>. Rows follow one by one as <w:tr> tags, and each row should have the same number of columns as specified in <w:tblGrid>:
<w:tbl>
<w:tblPr>
<w:tblW w:w="5000" w:type="pct" />
</w:tblPr>
<w:tblGrid><w:gridCol/><w:gridCol/></w:tblGrid>
<w:tr>
<w:tc><w:p><w:r><w:t>left</w:t></w:r></w:p></w:tc>
<w:tc><w:p><w:r><w:t>right</w:t></w:r></w:p></w:tc>
</w:tr>
</w:tbl>
Width for table columns can be specified in the <w:tblW> tag, but if you don’t define it MS Word will use its internal algorithms to find the optimal width of columns for the smallest effective table size.
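Because the table markup mirrors HTML so closely, pulling cell text out of it takes only a few lines. The sketch below uses Python's standard library on the two-cell table shown above, with the w namespace declaration added so it parses on its own; the function name table_rows is my own, and nested tables are not handled.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def table_rows(tbl_xml):
    """Return the table as a list of rows, each a list of cell strings."""
    table = ET.fromstring(tbl_xml)
    rows = []
    for row in table.findall("w:tr", NS):
        cells = []
        for cell in row.findall("w:tc", NS):
            # A cell may hold several paragraphs and runs; join all w:t text.
            cells.append("".join(t.text or "" for t in cell.iter("{%s}t" % W)))
        rows.append(cells)
    return rows

sample = (
    '<w:tbl xmlns:w="%s">'
    '<w:tblPr><w:tblW w:w="5000" w:type="pct"/></w:tblPr>'
    "<w:tblGrid><w:gridCol/><w:gridCol/></w:tblGrid>"
    "<w:tr>"
    "<w:tc><w:p><w:r><w:t>left</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>right</w:t></w:r></w:p></w:tc>"
    "</w:tr>"
    "</w:tbl>"
) % W

rows = table_rows(sample)
print(rows)
```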
Units
Many XML attributes inside DOCX specify sizes or distances. While they’re integers inside the XML, they all have different units so some conversion is necessary. The topic is a complicated one, so I’d recommend this article by Lars Corneliussen on units in DOCX files. The table he presents is useful, though with a small misprint: inches should be pt/72, not pt*72.
Here’s a cheat sheet:
COMMON DOCX XML UNIT CONVERSIONS
Unit | 20th of a point | Points | Inches | Centimeters | Font half size | EMU
Conversion | | dxa/20 | pt/72 | in*2.54 | pt/144 | in*914400
Example | 11906 | 595.3 | 8.27… | 21.00086… | 4.135 | 7562088
Tags using this | pgSz/pgMar/w:spacing | | | | w:sz | wp:extent, a:ext
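A few helper functions make these conversions harder to get wrong. This is a sketch with names of my own choosing; it relies on the standard definitions (a dxa is 1/20 of a point, 72 points or 914400 EMU per inch). For w:sz it treats the value as half-points, which is how WordprocessingML defines font size, rather than following the pt/144 column above.

```python
def dxa_to_points(dxa):
    """Twentieths of a point (pgSz, pgMar, w:spacing) to points."""
    return dxa / 20

def points_to_inches(points):
    return points / 72

def inches_to_cm(inches):
    return inches * 2.54

def inches_to_emu(inches):
    """Inches to English Metric Units (wp:extent, a:ext)."""
    return round(inches * 914400)

def half_points_to_points(sz):
    """w:sz font size value (half-points) to points."""
    return sz / 2

# The page width w:w="12240" from the earlier sectPr example.
width_pt = dxa_to_points(12240)
width_in = points_to_inches(width_pt)
print(width_pt, width_in, inches_to_cm(width_in), inches_to_emu(width_in))
print(half_points_to_points(28))  # w:sz w:val="28" from the Heading1 style
```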
Tips for Implementing a Layouter
If you want to convert a DOCX file (to PDF, for instance), draw it on canvas, or count number of pages, you’ll have to implement a layouter. A layouter is an algorithm for calculating character positions from a DOCX file.
This is a complex task if you need 100 percent fidelity rendering. The amount of time needed to implement a good layouter is measured in man-years, but if you only need a simple, limited one, it can be done relatively quickly.
A layouter fills a parent rectangle, which is usually a rectangle of the page. It adds words from a run one by one. When the current line overflows, it starts a new one. If the paragraph is too tall for the parent rectangle, it’s wrapped to the next page.
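The core loop of such a layouter is a greedy line filler. The sketch below is a deliberately naive version: it assumes every character has the same width and every line the same height, which no real font satisfies, and all names and parameters are my own. A real implementation would measure each word with font metrics instead.

```python
def layout_lines(words, line_width, char_width=1.0, space_width=1.0):
    """Greedily pack words into lines no wider than line_width."""
    lines, current, used = [], [], 0.0
    for word in words:
        word_width = len(word) * char_width
        needed = word_width if not current else used + space_width + word_width
        if current and needed > line_width:
            # The word overflows the current line: start a new one.
            lines.append(current)
            current, used = [word], word_width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines

def paginate(lines, lines_per_page):
    """Wrap lines that do not fit the page rectangle onto further pages."""
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]

lines = layout_lines("the quick brown fox jumps".split(), line_width=10)
pages = paginate(lines, lines_per_page=2)
print(lines)
print(pages)
```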
Here are some important things to keep in mind if you decide to implement a layouter:
- The layouter should take care of text alignment and text floating over images
- It should be capable of handling nested objects, such as nested tables
- If you want to provide full support for floating images, you’ll have to implement a layouter with at least two passes: the first pass collects floating images’ positions and the second fills empty space with text characters.
- Be aware of indentations and spacings. Each paragraph has spacing before and after, and these numbers are specified by the w:spacing tag. Vertical spacing is specified by its w:after and w:before attributes. Note that line spacing is specified by w:line, but this is not the size of the line as one may expect. To get the size of the line, take the current font height, multiply by w:line and divide by 12.
- DOCX files contain no information about pagination. You won’t find the number of pages in the document unless you calculate how much space you need for each line to ascertain the number of pages. If you need to find exact coordinates of each character on the page, be sure to take into account all spacings, indentations and sizes.
- If you implement a full-featured DOCX layouter that handles tables, note the special cases when tables span multiple pages. A cell which causes a page overflow also affects other cells.
- Creating an optimal algorithm for calculating a table’s column widths is a challenging math problem, and word processors and layouters usually use some suboptimal implementations. I propose using the algorithm from W3C HTML table documentation as a first approximation. I haven’t found a description of the algorithm used by MS Word, and Microsoft has fine-tuned the algorithm over time so different versions of Word may lay out tables slightly differently.
If something is unclear: reverse-engineer the XML!
When it’s not obvious how this or that XML tag works inside MS Word, there are two main approaches to figuring it out:
- Create the desired content step-by-step. Start with a simple docx file. Save each step to its own file, as in 1.docx, 2.docx, for example. Unzip each of them and use a visual diff tool for folder comparison to see which tags appear after your changes. (For a commercial option, try Araxis Merge, or for a free option, WinMerge.)
- If you generate a DOCX file that MS Word doesn’t like, work backwards. Simplify your XML step by step. At some point you will learn which change MS Word found incorrect.
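If you prefer a script to a visual diff tool, the first approach can be approximated with Python's standard library: read the XML parts straight out of both archives, pretty-print them so each tag sits on its own line, and diff the results. This is a sketch under assumptions; the function names are my own, and sample_docx builds a tiny stand-in package for the demonstration, not a valid DOCX.

```python
import difflib
import io
import zipfile
from xml.dom import minidom

def pretty_parts(docx_bytes):
    """Map each XML part name to its pretty-printed lines."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                dom = minidom.parseString(archive.read(name))
                parts[name] = dom.toprettyxml(indent="  ").splitlines()
    return parts

def diff_docx(old_bytes, new_bytes):
    """Return unified-diff lines for every XML part that differs."""
    old, new = pretty_parts(old_bytes), pretty_parts(new_bytes)
    lines = []
    for name in sorted(set(old) | set(new)):
        lines.extend(difflib.unified_diff(
            old.get(name, []), new.get(name, []),
            fromfile="1/" + name, tofile="2/" + name, lineterm=""))
    return lines

def sample_docx(text):
    """Hypothetical stand-in package holding only word/document.xml."""
    xml = ('<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
           "<w:body><w:p><w:r><w:t>" + text + "</w:t></w:r></w:p></w:body></w:document>")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()

changes = diff_docx(sample_docx("Test"), sample_docx("Tested"))
print("\n".join(changes))
```

With real files you would pass open("1.docx", "rb").read() and open("2.docx", "rb").read() instead of the sample packages.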
DOCX is quite complex, isn’t it?
It is complex, and Microsoft’s license forbids using MS Word on the server side for processing DOCX; this is pretty standard for commercial products. Microsoft has, however, provided the XSLT file to handle most DOCX tags, but it won’t give you 100 percent or even 99 percent fidelity. Processes such as text wrapping over images are not supported, but you will be able to support the majority of documents. (If you don’t need complexity, consider using Markdown as an alternative.)
If you have a sufficient budget (there is no free DOCX rendering engine), you may want to use commercial products such as Aspose or docx4j. The most popular free solution for converting between DOCX and other formats, including PDF, is LibreOffice. Unfortunately, LibreOffice has many small conversion bugs, and since it’s a sophisticated, open-source C++ product, tracking down and fixing fidelity issues is slow and difficult.
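For the LibreOffice route, a conversion can be scripted through its headless command-line mode. The sketch below assumes the `soffice` binary is installed and on your `PATH`; the helper names `build_convert_command` and `convert` are invented for this example, and the timeout is an arbitrary choice.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="pdf", binary="soffice"):
    """Build the LibreOffice command line for a headless conversion."""
    return [binary, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(source)]


def convert(source, out_dir, target="pdf", timeout=120):
    """Run the conversion and return the path where the output is expected.

    Assumes LibreOffice is installed; raises CalledProcessError on failure.
    """
    subprocess.run(build_convert_command(source, out_dir, target),
                   check=True, timeout=timeout)
    # LibreOffice names the output after the input file; a target such as
    # "pdf:writer_pdf_Export" still produces a ".pdf" extension.
    extension = target.split(":")[0]
    return Path(out_dir) / (Path(source).stem + "." + extension)
```

For example, `convert("report.docx", "out")` should leave the result at `out/report.pdf`, subject to the fidelity caveats above.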
Alternatively, if you find DOCX layouting too complicated to implement yourself, you can also convert it to HTML and use a browser to render it. You can also consider one of Toptal’s freelance XML developers.
DOCX Resources for further reading
- ECMA DOCX specification
- OpenXML library for DOCX manipulation from C#. It doesn’t contain information on layouting or rendering code, but offers a class hierarchy matching each possible XML node in DOCX.
- You can always search or ask on Stack Overflow with keywords like docx4j, OpenXML and docx; there are knowledgeable people in the community.
A file with the .xml extension (XML format) is written in Extensible Markup Language. Unclear? That definition stumps me too, and it certainly won’t satisfy a beginner who is just learning to use a computer. In this note I want to explain what the XML format is, how to open it, and what it is for.
In practice, XML is a plain text document whose structure is built from user-defined tags or other descriptions of the document.
Contents
- What is the XML format
- How to open an XML file
- Method 1. The best text editor: Notepad++
- Method 2. Windows Notepad
- Method 3. A browser
- Conclusion
What is the XML format
XML is a markup language designed to be easy to encode and read for both machines and ordinary people. It does this with tags, which define the document’s structure and its parameters.
Compared with HTML, the two solve broadly the same kind of task, but in HTML the tags are fixed, so bold text is always marked as <b></b>, whereas in XML we could mark bold as <Bold></Bold>; in other words, developers define the tags themselves.
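To make the idea of user-defined tags concrete, here is a small sketch that parses an XML snippet with Python’s standard `xml.etree.ElementTree` module. The tag names (`note`, `recipient`, `body`, `bold`) and the helper `tag_names` are invented for this example; any well-formed names would work just as well.

```python
import xml.etree.ElementTree as ET

# A tiny document whose tags were chosen by its author, not by a standard.
document = """<note>
  <recipient>Anna</recipient>
  <body>Meeting at <bold>10:00</bold> tomorrow</body>
</note>"""

root = ET.fromstring(document)


def tag_names(element):
    """List the tag of the element and of every descendant, in document order."""
    return [child.tag for child in element.iter()]


recipient = root.findtext("recipient")      # text inside <recipient>
emphasised = root.find("body/bold").text    # text inside the custom <bold> tag
```

The parser knows nothing about what `bold` means; it only reports the structure, and the program that reads the file decides how to interpret each tag.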
I think you have already gathered that XML does not belong to any one particular program: it is a universal format that anyone can use. For example, it is used to submit personal data to the Pension Fund of the Russian Federation, or simply serves as an intermediate file when transferring data out of a 1C configuration.
The neat thing about XML is that even if you don’t know which program produced a file, a quick glance is enough to read and understand the information it contains.
How to open an XML file
If this is your first encounter with XML, don’t worry: the format is very common and, since we already know it is text, any text editor will open it (just don’t even think of opening it in MS Office Word).
Method 1. The best text editor: Notepad++
If you deal with XML fairly often, you should definitely have Notepad++ installed. This text editor has extensive functionality and syntax highlighting, which is essential when viewing XML. Notepad++ displays XML in the most convenient form, one that an ordinary person can easily take in.
If you have never tried Notepad++, you really should: it is a must-have application on any computer.
Method 2. Windows Notepad
If an XML file needs to be edited as quickly as possible, plain Notepad will do. Right-click the file, choose “Open with…” in the context menu, and pick “Notepad” from the list.
Unfortunately, standard Windows Notepad has no syntax highlighting, which makes the content somewhat harder to read, but it is perfectly adequate for a quick fix.
Method 3. A browser
If you don’t need to edit the file, you can open it read-only in any browser. I use Yandex Browser, but even Internet Explorer, disliked by many, will work. Right-click the file, choose “Open with”, and find your web browser in the list.
Sometimes the browser tries to render the content according to its tags, which is not what we want here; just press Ctrl+U to view the document’s source.
As you can see, the browser presents the contents of an XML file more readably than standard Notepad.
Conclusion
That covers what an XML file is and the various ways to open one. Readers who dig deeper may note that there are a great many online services for working with XML; I see no point in covering them in this note, since my goal was to show how to view and edit such a file (where possible, with the tools built into Windows itself).