Преобразовать excel в xml python

Improve Article

Save Article

Like Article

  • Read
  • Discuss
  • Improve Article

    Save Article

    Like Article

    Python proves to be a powerful language when the requirement is to convert a file from one format to the other. It supports tools that can be employed to easily achieve the functionality. In this article, we’ll find out how we will convert from an Excel file to Extensible terminology (XML) files with Python. 

    Modules needed

    • OpenPyXL helps in interacting with Excel files. It can read and write to .xlsx and .xlsm files and can be installed as:

    pip install openpyxl

    • Yattag is a Python library for generating HTML or XML documents with Python in a very readable way. This Yattag Library is pretty simple and easy to use library. If you are searching for any library in order to more easily generate HTML or XML documents.

    pip install yattag

    Function needed

    • To load the contents of the Excel file load_workbook() method of OpenPyXl is used.
    • To iterate through loaded file and read data Iter_rows() with appropriate attributes is used

    Syntax: Iter_rows(min_col, min_row, max_col, max_row, values_only)

    Parameters:

    • min_col (int) – smallest column value (1-based index)
    • min_row (int) – smallest row value (1-based index)
    • max_col (int) – largest column value (1-based index)
    • Max_row (int) – largest row value (1-based index)
    • values_only (bool) – whether only cell values should be returned
    • The tagtext() method is a helper method that returns a triplet composed of:
      • The Doc instance itself
      • The tag method of the Doc instance
      • The text method of the Doc instance
    • The asis method appends a string to the document without any form of escaping.
    • The tag method will accept any string as a tag name.
    • The indent function takes a string representing an XML or HTML document and returns a well-indented version of this document.

    Database in use: Click here 

    To convert Excel data to XML first, it needs to be read, the given program explains the mechanism for reading data.

    Approach

    • Import module
    • Load Excel file
    • Create sheet object
    • Iterate through rows

    Example

    Python3

    from openpyxl import load_workbook

    wb = load_workbook("demo_database.xlsx")

    ws = wb.worksheets[0]

    for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=6):

        print([cell.value for cell in row])

    Now, Once we are done with Reading data. Let’s Code how to convert Excel to XML format, 

    Approach:

    • Import module
    • Read data
    • Create XML format page
    • Append to file
    • Save file

    Example:

    Python3

    from openpyxl import load_workbook

    from yattag import Doc, indent

    wb = load_workbook("demo_database.xlsx")

    ws = wb.worksheets[0]

    doc, tag, text = Doc().tagtext()

    xml_header = '<?xml version="1.0" encoding="UTF-8"?>'

    doc.asis(xml_header)

    doc.asis(xml_schema)

    with tag('People'):

        for row in ws.iter_rows(min_row=2, max_row=10, min_col=1, max_col=6):

            row = [cell.value for cell in row]

            with tag("Person"):

                with tag("First_Name"):

                    text(row[0])

                with tag("Last_Name"):

                    text(row[1])

                with tag("Gender"):

                    text(row[2])

                with tag("Country"):

                    text(row[3])

                with tag("Age"):

                    text(row[4])

                with tag("Date"):

                    text(row[5])

    result = indent(

        doc.getvalue(),

        indentation='   ',

        indent_text=True

    )

    with open("output.xml", "w") as f:

        f.write(result)

    Output: output.xml

    Like Article

    Save Article

    Python xls2xml converter

    This project contains a Python code for converting all Excell files (xlsx, xlsm, xltx, xltm) to pretified XML format.

    Dependencies:

    This script uses openpyxl library for maniuplating files. For simplicity I included instalation files for Python 2.5 which you can download and simply run setup.py to install it. This has been tested on MacOS Sierra and High Sierra.

    There’s a high chance it will not work on Windows since I didn’t have time to test it. In that case, visit the openpyxl repo and follow install instructions for Windows or Linux machines before you continue.

    Usage:

    The script is amazingly simple to use simply run the index.py file.

    How it works?

    The script will scan for all the supported files within its root folder and convert them all. XML files will be saved into a folder named XML which the script also creates. Every row within excell will be saved as a separete XML (one row for header and one for content). XML will be saved as a pritified output (you can modify the script if you need compressed). Script works for any number of sheets, number of columns and rows. The format of the XML will have look like the following:

    <?xml version='1.0' encoding='utf-8'?>
    <Workbook>
      <Worksheet name="Sheet name">
        <Row>
          <Cell>
            <Data type="String">Column one</Data>
          </Cell>
          <Cell>
            <Data type="String">Column two</Data>
          </Cell>
        </Row>
        <Row>
          <Cell>
            <Data type="unicode"> Content as a string </Data>
          </Cell>
          <Cell>
            <Data type="long"> 12345 </Data>
          </Cell>
        </Row>
      </Worksheet>
    </Workbook>
    

    The name of each XML file will have the following format: 2017-12-12_17.40.23.545001
    and is guaranted to be unique for every export.

    Feature list breakdown:

    • Scans for all supported files
    • Converts regardles of the number of sheets, rows or columns
    • Exports uniqe XML file for every row
    • Provides data type for every cell

    Planed future features and improvements:

    • Create a config file for allowing users to customize script behaviour
    • Support CSV files
    • Make compresed version to be used as a standalone cross platform app

    Contributions:

    Any improvements to the script, bug fixes or feature additions are more than welcome. Fork the code, and submit a pull request, I will do my best to review it in shortest possible time. You can also open an issue if you have any questions or problems.

    Estimated reading time: 3 minutes

    Are you working on a data analytics project where you need to feed your data to a location that is able to process an XML file?

    The ability to get your data into a structured format like XML has many benefits:

    (A) You can transfer the data to a web service for processing.

    (B) Multiple different formats of your raw data can be standardised, enabling quick conversion and processing.

    (C) XML files can be read by multiple different programs, as long as you deliver them in the correct format.

    (D) The receiver of data can easily read the XML file and store it on their database.

    The ability to use this method to read and transfer data is a very powerful way to help a data analyst process large data sets.

    In fact, if you are using cloud-based applications to analyse the information you are storing, this will quickly enable you to deliver the data.

    What packages do I need in Python?

    The first step is to import the following:

    import pandas as pd
    from lxml import etree as et

    Next we want to read in the source data

    In this instance, we are reading an excel file

    raw_data = pd.read_excel(r'Link to where your data is stored including the full file name')

    Now we want to start building the XML structure

    The FIRST STEP is to define the root

    root = et.Element('root')

    The root is the parent of all the data items (tags) contained in the XML file and is needed as part of the structure

    The SECOND STEP is to define the tag names that will store each row of the source data

    for row in raw_data.iterrows(): ==> This is a loop that takes runs through each record and populates for each tag.
        root_tags = et.SubElement(root, 'ExportData') #=== > Root name
    # These are the tag names for each row (SECTION 1)
        Column_heading_1 = et.SubElement(root_tags, 'Name')
        Column_heading_2 = et.SubElement(root_tags, 'Area')
        Column_heading_3 = et.SubElement(root_tags, 'NoPurchases')
        Column_heading_4 = et.SubElement(root_tags, 'Active')
    
    ###These are the values that will be populated for each row above
    # The values inside the [] are the raw file column headings.(SECTION 2)
        Column_heading_1.text = str(row[1]['Name'])
        Column_heading_2.text = str(row[1]['Area'])
        Column_heading_3.text = str(row[1]['No Purchases'])
        Column_heading_4.text = str(row[1]['Active'])

    The raw file looks like this:

    Name Area No Purchases Active
    John Dublin 2 Y
    Mary Galway 3 N
    Joe Limerick 4 N
    Jimmy Kilkenny 55 Y
    Jennifer Belfast 6 N
    Susan Waterford 3 Y
    Jake Cork 1 Y
    Bobby Dundalk 11 N
    Sarah Sligo 9 N
    Cian Ennis 8 Y
    Raw file data that will be imported into the XML file

    The THIRD STEP is to create the XML file and populate it with the data from the source file

    # This Section outputs the data to an xml file
    # Unless you tell it otherwise it saves it to the same folder as the script.
    tree = et.ElementTree(root) ==> The variable tree is to hold all the values of "root"
    et.indent(tree, space="t", level=0) ===> This just formats in a way that the XML is readable
    tree.write('output.xml', encoding="utf-8") ==> The data is saved to an XML file

    The XML output should look like the below

    <root>
    	<ExportData>
    		<Name>John</Name>
    		<Area>Dublin</Area>
    		<NoPurchases>2</NoPurchases>
    		<Active>Y</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Mary</Name>
    		<Area>Galway</Area>
    		<NoPurchases>3</NoPurchases>
    		<Active>N</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Joe</Name>
    		<Area>Limerick</Area>
    		<NoPurchases>4</NoPurchases>
    		<Active>N</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Jimmy</Name>
    		<Area>Kilkenny</Area>
    		<NoPurchases>55</NoPurchases>
    		<Active>Y</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Jennifer</Name>
    		<Area>Belfast</Area>
    		<NoPurchases>6</NoPurchases>
    		<Active>N</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Susan</Name>
    		<Area>Waterford</Area>
    		<NoPurchases>3</NoPurchases>
    		<Active>Y</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Jake</Name>
    		<Area>Cork</Area>
    		<NoPurchases>1</NoPurchases>
    		<Active>Y</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Bobby</Name>
    		<Area>Dundalk</Area>
    		<NoPurchases>11</NoPurchases>
    		<Active>N</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Sarah</Name>
    		<Area>Sligo</Area>
    		<NoPurchases>9</NoPurchases>
    		<Active>N</Active>
    	</ExportData>
    	<ExportData>
    		<Name>Cian</Name>
    		<Area>Ennis</Area>
    		<NoPurchases>8</NoPurchases>
    		<Active>Y</Active>
    	</ExportData>
    </root>

    Additional XML data can be added

    1. Add more rows – All you need to do is add onto the source file and save. When you rerun the logic it will read in the extra information.
    2. Add more columns – All you need to do is go to the second step above add in a tag name to SECTION 1. Seperately you will need to add an additional column with data to the source file, and then add that column name to SECTION 2 as well

    Recipe Objective

    In big data scenarios, converting data from Excel format to XML format using Python can be used to to exchange data between different applications. We process complex XML files into a relational format or a Big Data format such as Parquet/ORC . XML is user-readable and more commonly used in network communication, especially with the rise of REST-based web services. The tools needed to manage big data analytics, often in the form of NoSQL databases, only map over to XML in a partial sense. XML is considered to be at odds with relational schemas, which puts it in a camp with most of NoSQL technology

    System requirements :

    • Install Yattag python module as follows: pip install yattag
    • Install OpenPyXL python module as follows:pip install openpyxl
    • The below codes can be run in Jupyter notebook , or any python console
    • In this example we are going to use the excel employee dataset

    Table of Contents

    • Recipe Objective
      • System requirements :
      • Step 1: Import the modules
      • Step 2: Load the Workbook
      • Step 3: Fill the headers with Yattags
      • Step 4: Then create main Tag
      • Step 5: Now start loop through the sheet
      • Step 6: Adding the Yattag indent

    Step 1: Import the modules


    import openpyxl
    import yattag
    import load_workbook
    import Doc, indent

    Step 2: Load the Workbook

    create a Yattag document using yattag module and load the workbook and the worksheet.
    #load the excel file along with path where exists
    wb = load_workbook("empdata.xlsx")
    ws = wb.worksheets[0]

    # Create Yattag doc, tag and text objects
    doc, tag, text = Doc().tagtext()

    Step 3: Fill the headers with Yattags

    Fill the headers with Yattags asis() method which enables us to input any string as the next line

    xml_header = ''
    xml_schema = ''

    doc.asis(xml_header)
    doc.asis(xml_schema)

    Step 4: Then create main Tag

    create main tag Employees as below:


    with tag('Employees'):

    Step 5: Now start loop through the sheet

    Now start loopthrough the sheet with the iter_rows method. The iter_rows method returns a generator with all the cells. We use a list comprehension to get all the values from the cells.


    for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12):
    row = [cell.value for cell in row]
    with tag("Employee"):
    with tag("FirstName"):
    text(row[0])
    with tag("LastName"):
    text(row[1])
    with tag("Email"):
    text(row[10])
    with tag("Phone"):
    text(row[8])
    with tag("Company"):
    text(row[2])

    Step 6: Adding the Yattag indent

    Next, we are adding the Employees. When finished we indent our result with Yattags indent method.


    result = indent(
    doc.getvalue(),
    indentation = ' ',
    indent_text = True
    )
    print(result)
    with open("employee1.xml", "w") as f:
    f.write(result)

    The complete code looks as follows :


    import openpyxl, yattag
    from openpyxl import load_workbook
    from yattag import Doc, indent

    wb = load_workbook("empdata.xlsx")
    ws = wb.worksheets[0]

    # Create Yattag doc, tag and text objects
    doc, tag, text = Doc().tagtext()

    xml_header = ''
    xml_schema = ''

    doc.asis(xml_header)
    doc.asis(xml_schema)

    with tag('Employees'):
    # Use ws.max_row for all rows
    for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12):
    row = [cell.value for cell in row]
    with tag("Employee"):
    with tag("FirstName"):
    text(row[0])
    with tag("LastName"):
    text(row[1])
    with tag("Email"):
    text(row[10])
    with tag("Phone"):
    text(row[8])
    with tag("Company"):
    text(row[2])

    result = indent(
    doc.getvalue(),
    indentation = ' ',
    indent_text = True
    )
    print(result)

    with open("employee.xml", "w") as f: #give path where you want to create
    f.write(result)

    Output of the above Code as follows :

    If you are a person who is working on a data analytics project, then you might need to feed your data to a specific location. This can be done using the XML file. The main advantages of having your data in XML format are that it can be read by multiple different programs if it is delivered in the correct format, and also that the data receiver can easily read the files and store them in their database. Apart from these, one of the most popular reasons for using the XML file is that it helps to read and transfer the data quickly and helps the data analyst process large datasets. As a result, it becomes necessary to convert the Excel files to XML format. And to convert the Excel file to an XML file, we can simply use the powerful programming language Python.

    Thus, here is a detailed tutorial that explains “How do you convert XLS to XML using Python?” with examples. Scroll down to find out more.

    How to Create an XML file from Excel using Python?

    Converting data from Excel format to XML format using Python can be used in huge data applications to exchange data between different applications. We convert complex XML files to relational or Big Data formats like Parquet/ORC. XML is more widely used in network communication, particularly with the rise of REST-based web services. The tools required to manage big data analytics, which is frequently in the form of NoSQL databases, only partially map to XML. XML is thought to be incompatible with relational schemas, putting it in the same camp as most NoSQL technology.

    Why Python?

    When converting a file from one format to another, Python proves to be a powerful language. It includes tools that can be used to quickly obtain the functionality. In this post, let us see how to use Python to convert an Excel file to Extensible terminology (XML) files.

    Modules Used

    OpenPyXL:

    OpenPyXL is used to interact with Excel files. It can read and write to.xlsx and.xlsm files. Use the below command for Installation:

    Yattag:

    Yattag is a Python library for generating readable HTML or XML documents with Python. This Yattag Library is straightforward and simple to use. If you are looking for a library to help you create HTML or XML documents more effortlessly then this library is helpful.

    Functions Used:

    1)load_workbook(): The load_workbook() method of OpenPyXl is used to load the contents of an Excel file.

    2)Iter_rows(): Iter_rows() with proper attributes is used to iterate through the loaded file and read data.

    • Syntax of Iter_rows():

    Iter_rows(min_col, min_row, max_col, max_row, values_only)

    Parameters

    • min_col: It is the minimum/smallest column value(1-based index)
    • min_row: It is the minimum/smallest row value(1-based index)
    • max_col: It is the maximum/largest column value(1-based index)
    • max_row: It is the maximum/largest row value(1-based index)
    • values_only: This returns a boolean value. It indicates whether only cell values must be returned or not.

    3)tagtext() Function:

    This tagtext() function is a helper method that returns a triplet made up of:
    Doc instance itself, tag method of the Doc instance, text method of the Doc instance.
    4) asis() Function:

    The asis method adds/appends a string to the document without escaping it.

    5)tag() Function:

    The tag function is used to accept any string as a tag name.

    6)indent() Function:

    The indent function accepts a string that represents an XML or HTML document and gives a properly indented version of that document.

    Here we now work on the Excel spreadsheet shown in the image below:

    SampleExcelFile:

    SampleExcelFile

    Method #1: Reading an Excel File and printing all the Rows

    Approach:

    • Import load_workbook function from openpyxl module using the import keyword
    • Pass the excel file path to the load_workbook() function of openpyxl module to load the excel file and store it in a variable
    • Access Sheet1 of the given Excel File and store it in another variable
    • Loop till all the cells of the worksheet by passing the arguments min,max rows and columns.
    • Print all the cell values row-by-row.
    • The Exit of the Program.

    Below is the implementation:

    # Import load_workbook function from openpyxl module using the import keyword
    from openpyxl import load_workbook
    
    # Pass the excel file path to the load_workbook() function of openpyxl 
    # module to load the excel file and store it in a variable
    workbookObj  = load_workbook("SampleExcelFile.xlsx")
    
    # Access Sheet1 of the given Excel File and  store it in another variable
    newWorksheet = workbookObj.worksheets[0]
    
    # Loop till all the cells of the worksheet by passing the arguments min,max rows and columns
    for eachrow in newWorksheet.iter_rows(min_row=1, max_row=7, min_col=1, max_col=3):
        print([cell.value for cell in eachrow])

    Output:

    ['EmployeeId', ' EmployeeName', ' Salary']
    [2121.0, 'Danny', 25000.0]
    [2122.0, 'Sindhu', 50000.0]
    [2123.0, 'Steve', 40000.0]
    [2124.0, 'Rosy', 80000.0]
    [2125.0, 'Alex', 100000.0]
    [2126.0, 'Diya', 60000.0]
    

    Method #2: Converting Excel File to XML Format

    Approach:

    • Import load_workbook from openpyxl module using the import keyword
    • Import Doc, indent functions from yattag module using the import keyword
    • Pass the excel file path to the load_workbook() function of openpyxl module which is the file we want to work with and store it in a variable
    • Get/access the Sheet1 of the above workbookObj and store it in another variable
    • Get the Returning returns a triplet
    • Give the XML file header and store it in a variable
    • Give the XML file schema and store it in a variable
    • Append the above XML file header(String) to the document using the asis() function
    • Append the above XML file schema(String) to document using the asis() function
    • Give some random tag name using the above tag of document.
    • Open some random file name in write mode.
    • Write the above-obtained result into this corresponding file using the write() function.
    • The Exit of the Program.

    Below is the implementation:

    # Import load_workbook from openpyxl module using the import keyword
    from openpyxl import load_workbook
    # Import Doc, indent functions from yattag module using the import keyword
    from yattag import Doc, indent
    
    # Pass the excel file path to the load_workbook() function of openpyxl module 
    # which is the file we want to work with and store it in a variable
    workbookObj  = openpyxl.load_workbook('SampleExcelFile.xlsx')
    # Get/access the Sheet1 of the above workbookObj and store it in another variable
    newWorksheet  = workbookObj.worksheets[0]
    
    # Get the Returning returns a triplet
    document, tag_of_doc, text_of_doc = Doc().tagtext()
    
    # Give the xml file header and store it in a variable
    xmlFileHeader = '<?xml version="1.0" encoding="UTF-8"?>'
    # Give the xml file schema and store it in a variable
    xmlFileSchema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>'
    
    # Append the above xml file header(String) to document using the asis() function
    document.asis(xmlFileHeader)
    # Append the above xml file schema(String) to document using the asis() function
    document.asis(xmlFileSchema)
    
    # Give some random tag name using the above tag of document
    with tag_of_doc('EmployDetails'):
        for row in newWorksheet.iter_rows(min_row=1, max_row=7, min_col=1, max_col=3):
            row = [cell.value for cell in row]
            with tag_of_doc("EmployDetails"):
                with tag_of_doc("EmployeeId"):
                    text_of_doc(row[0])
                with tag_of_doc(" EmployeeName"):
                    text_of_doc(row[1])
                with tag_of_doc("Salary"):
                    text_of_doc(row[2])
    
    # Open some random file name in write mode. 
    with open("ConvertedtXMLFile.xml", "w") as file:
      # Write the above obtained result into this corresponding file 
      # using the write() function
        file.write(document.getvalue())

    Output:

    <?xml version=”1.0″ encoding=”UTF-8″?> <xs:schema xmlns:xs=”http://www.w3.org/2001/XMLSchema”> </xs:schema> <EmployDetails> <EmployDetails> <EmployeeId>EmployeeId</EmployeeId>< EmployeeName> EmployeeName </ EmployeeName> <Salary> Salary</Salary> </EmployDetails> <EmployDetails> <EmployeeId>2121.0</EmployeeId>< EmployeeName>Danny </ EmployeeName> <Salary>25000.0</Salary>undefined</EmployDetails>undefined<EmployDetails> <EmployeeId>2122.0</EmployeeId>< EmployeeName>Sindhuundefined</ EmployeeName>undefined<Salary>50000.0</Salary>undefined</EmployDetails>undefined<EmployDetails>undefined<EmployeeId>2123.0</EmployeeId>< EmployeeName>Steveundefined</ EmployeeName>undefined<Salary>40000.0</Salary>undefined</EmployDetails>undefined<EmployDetails>undefined<EmployeeId>2124.0</EmployeeId>< EmployeeName>Rosyundefined</ EmployeeName>undefined<Salary>80000.0</Salary>undefined</EmployDetails>undefined<EmployDetails>undefined<EmployeeId>2125.0</EmployeeId>< EmployeeName>Alexundefined</ EmployeeName>undefined<Salary>100000.0</Salary>undefined</EmployDetails>undefined<EmployDetails>undefined<EmployeeId>2126.0</EmployeeId>< EmployeeName>Diyaundefined</ EmployeeName>undefined<Salary>60000.0</Salary>undefined</EmployDetails>undefined</EmployDetails>

    Now we have provided the best way to convert the Excel file to an XML file using Python. If you have better methods than the methods provided here, do let us know in the comment box below.

    22.12.2019

    Introduction

    In this post, we will take a look into how we can generate Extensible Markup Language (XML) files from an Excel file with Python. We will be using the Yattag package to generate our XML file and the OpenPyXL package for reading our Excel data.

    Packages

    Yattag

    Yattag is described in its documentation as following:

    Yattag is a Python library for generating HTML or XML in a pythonic way.

    That pretty much sums Yatttag up, I find it as a simple, easy to use library that just works. I had been searching for this kind of library in order to more easily generate different XML files.

    To install Yattag with pip: pip install yattag

    Using Yattag

    Adding a tag with Yattag is as easy as using the With keyword:

    with tag('h1'):
        text('Hello world!')

    Tags are automatically closed. To start using Yattag we need to import Doc from Yattag and create our Doc, tag and text with Doc().tagtext().

    from yattag import Doc
    doc, tag, text = Doc().tagtext()
    with tag('h1'):
        text('Hello world!')
    doc.getvalue()
    
    Output:
    '<h1>Hello world!</h1>'

    OpenPyXL

    OpenPyXL is a library for interacting with Excel 2010 files. OpenPyXL can read and write to .xlsx and .xlsm files.

    To install OpenPyXL with pip: pip install openpyxl

    Using OpenPyXL

    To load an existing workbook in OpenPyXl we need to use the load_workbook method. We also need to select the sheet we are reading the data from. In our example, we are using popular baby names in New York City. You can access the dataset from the link at the bottom of this post.

    I have created a workbook named NY_baby_names.xlsx with one sheet of data, Sheet1. The worksheet has the following headers: Year of Birth, Gender, Child’s First Name, Count, Rank. You can download the Excel file from my website here.

    To access the data with OpenPyXL, do the following:

    from openpyxl import load_workbook
    wb = load_workbook("NY_baby_names.xlsx")
    ws = wb.worksheets[0]
    for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=4):
        print([cell.value for cell in row])
    
    Output: 
    ['Year of Birth', 'Gender', "Child's First Name", 'Count']
    [2011, 'FEMALE', 'GERALDINE', 13]

    First, we load the workbook with load_workbook, and then select the first worksheet. We then iterate through the first two rows with the iter_rows method.

    Generating the XML from Excel

    After the imports, we load the workbook and the worksheet. We then create our Yattag document. We fill the headers with Yattags asis() method. The asis method enables us to input any string as the next line.

    We then create our main tag, Babies. We start looping through our sheet with the iter_rows method. The iter_rows method returns a generator with all the cells. We use a list comprehension to get all the values from the cells.

    Next, we are adding the babies. Notice the use of the With tag and text. When we are finished we indent our result with Yattags indent method.

    Finally, we save our file. The output should look like below. Notice that I only included two babies in the output.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>
    <Babies>
        <Baby>
            <Name>
                GERALDINE
            </Name>
            <Gender>
                FEMALE
            </Gender>
            <year>
                2011
            </year>
            <count>
                13
            </count>
            <rank>
                75
            </rank>
        </Baby>
        <Baby>
            <Name>
                GIA
            </Name>
            <Gender>
                FEMALE
            </Gender>
            <year>
                2011
            </year>
            <count>
                21
            </count>
            <rank>
                67
            </rank>
        </Baby>
    </Babies>

    Complete code

    from openpyxl import load_workbook
    from yattag import Doc, indent
    
    wb = load_workbook("NY_baby_names.xlsx")
    ws = wb.worksheets[0]
    
    # Create Yattag doc, tag and text objects
    doc, tag, text = Doc().tagtext()
    
    xml_header = '<?xml version="1.0" encoding="UTF-8"?>'
    xml_schema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>'
    
    doc.asis(xml_header)
    doc.asis(xml_schema)
    
    with tag('Babies'):
        # Use ws.max_row for all rows
        for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=5):
            row = [cell.value for cell in row]
            with tag("Baby"):
                with tag("Name"):
                    text(row[2])
                with tag("Gender"):
                    text(row[1])
                with tag("year"):
                    text(row[0])
                with tag("count"):
                    text(row[3])
                with tag("rank"):
                    text(row[4])
    
    result = indent(
        doc.getvalue(),
        indentation = '    ',
        indent_text = True
    )
    
    with open("baby_names.xml", "w") as f:
        f.write(result)

    Data sets

    Popular Baby Names dataset: https://catalog.data.gov/dataset/most-popular-baby-names-by-sex-and-mothers-ethnic-group-new-york-city-8c742

    Since you need a single <Portfolio> and <PortfolioPositions> as parent grouping, consider a nested loop by iterating through a list of data frames. Then, within each data frame loop through its rows:

    import xml.etree.ElementTree as ET
    import pandas as pd
    import xml.dom.minidom as md
    
    df = pd.read_excel("Input.xlsx", sheet_name="Sheet1", dtype=object)
    
    # LIST OF DATA FRAME SPLITS
    df_list = [g for i,g in df.groupby(
        ["Name", "BaseCurrency2", "TradingPower", "ValidationProfile", "CommissionProfile"]
    )]
    
    # ROOT LEVEL
    root = ET.Element('trading-data')
    root.set('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance')
    
    # ROOT CHILD LEVEL
    Portfolios = ET.SubElement(root, "Portfolios")
    Defaults = ET.SubElement(Portfolios, "Defaults", BaseCurrency="USD")
    
    # GROUP LEVEL ITERATION
    for df in df_list:
        Portfolio = ET.SubElement(
            Portfolios, 
            "Portfolio", 
            Name = df["Name"][0],
            BaseCurrency = df["BaseCurrency2"][0], 
            TradingPower = str(df["TradingPower"][0]),
            ValidationProfile = df["ValidationProfile"][0], 
            CommissionProfile = df["CommissionProfile"][0]
        )
    
        PortfolioPositions = ET.SubElement(Portfolio, "PortfolioPositions")
    
        # ROW LEVEL ITERATION
        for row in df.itertuples():
            if row.Type == "Cash":
                PortfolioPosition = ET.SubElement(
                    PortfolioPositions, 
                    "PortfolioPosition", 
                    Type = row.Type, 
                    Volume = str(row.Volume)
                )
                Cash = ET.SubElement(
                    PortfolioPosition, 
                    "Cash", 
                    Currency = str(row.Currency)
                )
            else:
                PortfolioPosition = ET.SubElement(
                     PortfolioPositions, 
                     "PortfolioPosition", 
                     Type = row.Type,
                     Volume = str(row.Volume),
                     Invested = str(row.Invested), 
                     BaseInvested = str(row.BaseInvested)
                )
                Instrument = ET.SubElement(
                     PortfolioPosition, 
                     "Instrument", 
                     Ticker = str(row.Ticker),
                     ISIN = str(row.ISIN),
                     Market = str(row.Market),
                     Currency = str(row.Currency2),
                     CFI = str(row.CFI)
                )
    
    # SAVE PRETTY PRINT OUTPUT
    with open("Output.xml", "wb") as f:
        dom = md.parseString(ET.tostring(root))
        f.write(dom.toprettyxml().encode("utf-8"))
    

    My question is that I have assigned one task in that I have to read excel document and store that data into XML file. So I have done one code in python for that. But it giving me error when I am writing an XML file.

    #!/usr/bin/python
    
    import xlrd
    import xml.etree.ElementTree as ET
    
    workbook = xlrd.open_workbook('anuja.xls')
    workbook = xlrd.open_workbook('anuja.xlsx', on_demand = True)
    worksheet = workbook.sheet_by_index(0)
    first_row = [] # Header
    for col in range(worksheet.ncols):
        first_row.append( worksheet.cell_value(0,col) )
    # tronsform the workbook to a list of dictionnaries
    data =[]
    for row in range(1, worksheet.nrows):
        elm = {}
        for col in range(worksheet.ncols):
            elm[first_row[col]]=worksheet.cell_value(row,col)
        data.append(elm)
    
    for set1 in data :
        f = open('data.xml', 'w')
        f.write("<Progress>%s</Progress>" % (set1[0]))
        f.write("<P>%s</P>" % (set1[1]))
        f.write("<Major>%s</Major>" % (set1[2]))
        f.write("<pop>%s</pop>" % (set1[3]))
        f.write("<Key>%s</Key>" % (set1[4]))
        f.write("<Summary>%s</Summary>" % (set1[5]))
    

    Error is

    Traceback (most recent call last):
      File "./read.py", line 23, in <module>
        f.write("<Progress>%s</Progress>" % (set1[0]))
    KeyError: 0
    

    Понравилась статья? Поделить с друзьями:
  • Преобразование цифры в пропись excel
  • Преобразование цифр текст excel
  • Преобразование цифр прописью в excel
  • Преобразование цифр в текст word
  • Преобразование фурье с помощью excel