Good afternoon.
If anyone has run into something similar or can spot an obvious mistake, please tell me what I'm doing wrong.
I'm exporting data to Excel via an XSLT transformation.
First I convert an internal table to XML with the standard ID transformation. Then I convert that XML into Excel-compatible XML with a custom XSLT transformation, ZST_1. Finally I save the result on the workstation as a file with the .XLS extension.
When the file is opened, Excel warns that the file format differs from the one indicated by its extension. If the warning is ignored, the file opens.
If I save the file with the .XML extension instead, Excel does not open it. For some reason the ZST_1 transformation drops the line <?mso-application progid="Excel.Sheet"?>, and that line is what marks the XML as Excel-compatible.
How can I get rid of this warning?
Also, is there a way to tell the transformation to write UTF-16? My transformation writes UTF-8, and in some cases the required text is then not displayed.
The program:
REPORT z_test1.
DATA:
  BEGIN OF z_partner,
    lifnr TYPE lifnr,
    name(25),
  END OF z_partner,
  t_partner LIKE STANDARD TABLE OF z_partner.
CLEAR: t_partner, z_partner.
z_partner-lifnr = '1'.
z_partner-name  = 'номер 1'.
APPEND z_partner TO t_partner.
z_partner-lifnr = '2'.
z_partner-name  = 'номер 2'.
APPEND z_partner TO t_partner.
* Transformation ABAP -> XML
TYPES: z_xml(1024) TYPE x.
DATA: lt_xml     TYPE STANDARD TABLE OF z_xml,
      lt_xml_xls LIKE lt_xml.
CALL TRANSFORMATION id
  SOURCE data_node = t_partner
  RESULT XML lt_xml.
* Transformation XML -> XML (Excel)
CALL TRANSFORMATION zst_1
  SOURCE XML lt_xml[]
  RESULT XML lt_xml_xls.
* Download
* Temporary directory (SAP GUI working directory)
DATA:
  l_filename TYPE string,
  l_dirname  TYPE string.
CALL METHOD cl_gui_frontend_services=>get_sapgui_workdir
  CHANGING
    sapworkdir = l_dirname
  EXCEPTIONS
    OTHERS     = 1.       " a non-zero value so that CHECK can catch failures
CHECK sy-subrc = 0.
* The directory is only filled in once the frontend queue is flushed
cl_gui_cfw=>flush( ).
* Name of the file to download (the directory comes back without a trailing separator)
CLEAR: l_filename.
CONCATENATE l_dirname '\' 'TST' '_' sy-datum '_' sy-uzeit '.xls' INTO l_filename.
* Download the file
CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    filename = l_filename
    filetype = 'BIN'
  CHANGING
    data_tab = lt_xml_xls.
* Open the downloaded Excel XML file
cl_gui_frontend_services=>execute( document = l_filename operation = 'OPEN' ).
The ZST_1 transformation:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sap="http://www.sap.com/sapxsl" xmlns:asx="http://www.sap.com/abapxml" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <?mso-application progid="Excel.Sheet"?>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
      <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
        <Version>12.00</Version>
      </DocumentProperties>
      <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
        <WindowHeight>8580</WindowHeight>
        <WindowWidth>17100</WindowWidth>
        <WindowTopX>360</WindowTopX>
        <WindowTopY>45</WindowTopY>
        <ProtectStructure>False</ProtectStructure>
        <ProtectWindows>False</ProtectWindows>
      </ExcelWorkbook>
      <Styles>
        <Style ss:ID="Default" ss:Name="Normal">
          <Alignment ss:Vertical="Bottom"/>
          <Borders/>
          <Font ss:Color="#000000" ss:FontName="Calibri" ss:Size="11" x:CharSet="204" x:Family="Swiss"/>
          <Interior/>
          <NumberFormat/>
          <Protection/>
        </Style>
        <Style ss:ID="s62">
          <Borders>
            <Border ss:LineStyle="Continuous" ss:Position="Bottom" ss:Weight="1"/>
            <Border ss:LineStyle="Continuous" ss:Position="Left" ss:Weight="1"/>
            <Border ss:LineStyle="Continuous" ss:Position="Right" ss:Weight="1"/>
            <Border ss:LineStyle="Continuous" ss:Position="Top" ss:Weight="1"/>
          </Borders>
        </Style>
      </Styles>
      <Worksheet ss:Name="Лист12345">
        <Table ss:DefaultRowHeight="15" ss:ExpandedColumnCount="4" ss:ExpandedRowCount="20000" x:FullColumns="1" x:FullRows="1">
          <xsl:for-each select="asx:abap/asx:values/DATA_NODE/item">
            <Row>
              <Cell ss:StyleID="s62">
                <Data ss:Type="Number">
                  <xsl:value-of select="LIFNR"/>
                </Data>
              </Cell>
              <Cell ss:StyleID="s62">
                <Data ss:Type="String">
                  <xsl:value-of select="NAME"/>
                </Data>
              </Cell>
              <Cell ss:StyleID="s62">
                <Data ss:Type="String">
                  <!-- running line number; paths inside xsl:for-each are relative to the current item -->
                  <xsl:value-of select="position()"/>
                </Data>
              </Cell>
              <Cell ss:StyleID="s62"/>
            </Row>
          </xsl:for-each>
        </Table>
        <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
          <PageSetup>
            <Header x:Margin="0.3"/>
            <Footer x:Margin="0.3"/>
            <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
          </PageSetup>
          <Selected/>
          <Panes>
            <Pane>
              <Number>3</Number>
            </Pane>
          </Panes>
          <ProtectObjects>False</ProtectObjects>
          <ProtectScenarios>False</ProtectScenarios>
        </WorksheetOptions>
      </Worksheet>
    </Workbook>
  </xsl:template>
</xsl:transform>
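The XSLT 1.0 rules probably explain the disappearing line. Processing instructions written literally inside a stylesheet belong to the stylesheet document itself and are ignored, so they are never copied to the result. The standard way to emit one is <xsl:processing-instruction name="mso-application">progid="Excel.Sheet"</xsl:processing-instruction>, and the Oracle stylesheets further down in this document do exactly that. It is an assumption that SAP's XSLT processor then writes the instruction before the root element, and this is worth verifying. The Python sketch below can check a downloaded file. Python is used only as a neutral checking tool, and the function name has_excel_pi is hypothetical.

```python
from xml.dom import Node, minidom


def has_excel_pi(xml_bytes: bytes) -> bool:
    """Return True if the prolog contains <?mso-application progid="Excel.Sheet"?>."""
    doc = minidom.parseString(xml_bytes)
    # Processing instructions outside the root element are children of the Document node.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            return 'progid="Excel.Sheet"' in node.data
    return False
```

Usage: read the downloaded file in binary mode and pass its bytes to has_excel_pi.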
NovaInfo 2
Published 13 July 2010
Section: Technical sciences
Abstract
Parsing and analysing Excel tables in TestComplete currently causes difficulties when developing test scripts. One possible solution is to analyse Excel tables saved in XML format. MS Office can save spreadsheets as "XML Spreadsheet". This article looks at how tables in this format can be parsed and analysed.
Keywords
XML, TESTCOMPLETE, MS EXCEL
Full text
Parsing and analysing Excel tables in TestComplete currently causes difficulties when developing test scripts. One possible solution is to analyse Excel tables saved in XML format. MS Office can save spreadsheets as "XML Spreadsheet". This article looks at how tables in this format can be parsed and analysed. More detailed information about the format is available on the Microsoft website. The MS XML methods and properties used in this article are described in the article "Parsing and analysing an XML file in TestComplete".
Let's create an Excel table, for example one like this:
When saved in XML format, this table looks like this:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Долганов Алексей Александрович</Author>
<LastAuthor>Долганов Алексей Александрович</LastAuthor>
<Created>2010-07-13T09:35:04Z</Created>
<Version>12.00</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>11820</WindowHeight>
<WindowWidth>15315</WindowWidth>
<WindowTopX>120</WindowTopX>
<WindowTopY>45</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" x:CharSet="204" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<NumberFormat ss:Format="#,##0.00&quot;р.&quot;"/>
</Style>
</Styles>
<Worksheet ss:Name="Лист1">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3" x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Молоко</Data></Cell>
<Cell ss:StyleID="s62"><Data ss:Type="Number">10</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Мясо</Data></Cell>
<Cell ss:StyleID="s62"><Data ss:Type="Number">50</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Яблоки</Data></Cell>
<Cell ss:StyleID="s62"><Data ss:Type="Number">20</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Unsynced/>
<Print>
<ValidPrinterInfo/>
<PaperSizeIndex>9</PaperSizeIndex>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
The XML file is built around the root element Workbook, which represents the workbook. Let's look at its child elements.
- DocumentProperties (document properties). This element contains no table data, so we will not consider it;
- ExcelWorkbook (Excel workbook). This element also contains nothing important for us, so we skip it as well;
- Styles (styles). Contains the table formatting. We cover this element only briefly, on the assumption that the data itself matters more to us than its formatting;
- Worksheet (sheet). There can be several of these elements, one per sheet in the workbook. They can be selected with the XML method getElementsByTagName.
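The following sketch illustrates the same traversal outside TestComplete. It uses Python's standard xml.etree.ElementTree in place of MS XML, which is a substitution and not the article's method. It selects every Worksheet and then walks Table/Row/Cell/Data. The function name read_sheets is hypothetical.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_sheets(xml_text: str) -> dict:
    """Return {sheet name: [[cell text, ...], ...]} for every Worksheet in the workbook."""
    root = ET.fromstring(xml_text)  # the Workbook element
    sheets = {}
    for ws in root.findall("ss:Worksheet", NS):
        name = ws.get(f"{{{SS}}}Name")  # the ss:Name attribute
        rows = []
        for row in ws.findall("ss:Table/ss:Row", NS):
            # Cells without a Data child are skipped, and ss:Index gaps are not honoured.
            rows.append([data.text or "" for data in row.findall("ss:Cell/ss:Data", NS)])
        sheets[name] = rows
    return sheets
```

For the sample workbook above, this returns the sheet name Лист1 mapped to one list of cell texts per row.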
Worksheet
Required attributes:
- ss:Name (sheet name).
Optional attributes:
- ss:Protected (sheet protection);
- ss:RightToLeft (text direction).
Child elements:
- Table (table). Holds the data;
- WorksheetOptions (options). Contains no data and will not be covered.
Table
Required attributes: none
Optional attributes:
- ss:DefaultColumnWidth (default column width), in points (1 pt = 4/3 px at 96 dpi);
- ss:DefaultRowHeight (default row height), in points (1 pt = 4/3 px at 96 dpi);
- ss:ExpandedColumnCount (total number of columns in the table);
- ss:ExpandedRowCount (total number of rows in the table);
- ss:LeftCell (start of the table from the left);
- ss:StyleID (table style). Refers to a Style defined in the Styles element;
- ss:TopCell (start of the table from the top).
Child elements:
- Column (columns);
- Row (rows).
Column
Required attributes: none
Optional attributes:
- c:Caption (caption);
- ss:AutoFitWidth (automatic column width). True if the value is 1;
- ss:Hidden (hidden column flag);
- ss:Index (column index);
- ss:Span (number of additional adjacent columns with the same formatting);
- ss:StyleID (column style);
- ss:Width (column width), in points (1 pt = 4/3 px at 96 dpi).
Let's look more closely at ss:Index and ss:Span. Suppose there are 5 columns:
- width 100pt;
- width 20pt;
- width 20pt;
- width 20pt;
- width 50pt.
In the XML file these columns are described as follows:
<Column ss:Width="100"/>
<Column ss:Span="2" ss:Width="20"/>
<Column ss:Index="5" ss:Width="50"/>
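Here is a minimal Python sketch of how ss:Index and ss:Span expand into individual column widths. It assumes two things: that ss:Span counts the columns after the current one (as the five-column example above implies), and that a Column without ss:Index follows the last column already covered. The names column_widths and pt_to_px are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"


def column_widths(table: ET.Element) -> dict:
    """Map 1-based column numbers to widths in points, expanding ss:Index and ss:Span."""
    widths = {}
    next_index = 1
    for col in table.findall(f"{{{SS}}}Column"):
        index = int(col.get(f"{{{SS}}}Index", next_index))
        span = int(col.get(f"{{{SS}}}Span", 0))  # number of *additional* columns
        width = float(col.get(f"{{{SS}}}Width", 0))
        for n in range(index, index + span + 1):
            widths[n] = width
        # Assumption: a Column without ss:Index follows the last column covered so far.
        next_index = index + span + 1
    return widths


def pt_to_px(pt: float) -> float:
    """Convert points to pixels at 96 dpi (1 pt = 4/3 px)."""
    return pt * 4 / 3
```

Applied to the three Column elements above, this yields widths 100, 20, 20, 20 and 50 for columns 1 to 5.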
Row
Required attributes: none
Optional attributes:
- c:Caption (caption);
- ss:AutoFitHeight (automatic row height). True if the value is 1;
- ss:Height (row height), in points (1 pt = 4/3 px at 96 dpi);
- ss:Hidden (hidden row flag);
- ss:Index (row index);
- ss:Span (number of additional adjacent rows with the same formatting);
- ss:StyleID (row style).
Child elements:
- Cell (cell).
Cell
Required attributes: none. The cell value is stored in the child Data element, for example <Cell><Data ss:Type="String">Молоко</Data></Cell>.
Optional attributes: ss:Index, ss:StyleID, ss:MergeAcross, ss:MergeDown and others.
Child elements:
- Data (the cell value). Its required attribute ss:Type (value type) takes one of: Number; DateTime; Boolean; String; Error. For Error, the possible values are: #NULL!; #DIV/0!; #VALUE!; #REF!; #NAME?; #NUM!; #N/A; #CIRC!
Data may in turn contain rich-text formatting elements:
- B (bold);
- Font (font);
- I (italic);
- S (strikethrough);
- Span (formatted run);
- Sub (subscript);
- Sup (superscript);
- U (underline).
Value of the Data element: the cell value.
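The Python sketch below shows how a reader might turn Data values into typed values. It assumes Boolean values are written as 1/0 and DateTime values use the ISO-like form Excel emits (for example 2012-02-09T12:09:37.000). The function name convert is hypothetical.

```python
from datetime import datetime


def convert(data_type: str, text: str):
    """Convert the text of a Data element according to its ss:Type attribute."""
    if data_type == "Number":
        return float(text)
    if data_type == "Boolean":
        return text.strip() == "1"  # assumption: booleans are stored as 1 / 0
    if data_type == "DateTime":
        # e.g. 2012-02-09T12:09:37.000 (three-digit fractional seconds)
        return datetime.fromisoformat(text)
    # String, Error (#N/A, #DIV/0!, ...) and unknown types are returned unchanged
    return text
```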
Citation:
Долганов, А.А. Parsing and analysing Excel tables with MS XML in TestComplete / А.А. Долганов. Text: electronic // NovaInfo, 2010, no. 2. URL: https://novainfo.ru/article/158 (accessed 14.04.2023).
Here is a guide to creating an Excel file as an XML document.
The first step is the XML declaration, which defines the XML version and the encoding.
<?xml version="1.0" encoding="ISO-8859-1"?>
<?mso-application progid="Excel.Sheet"?>
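The declaration only names the encoding. The bytes must actually be written in that encoding. The Python sketch below uses the hypothetical helpers serialize and first_cell. It shows that the same Cyrillic text survives a round trip through UTF-8 and through UTF-16. ISO-8859-1, as declared above, cannot represent that text at all, which is one possible cause of text not being displayed.

```python
import xml.etree.ElementTree as ET

TEMPLATE = ('<?xml version="1.0" encoding="{enc}"?>\n'
            '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">'
            '<Worksheet><Table><Row><Cell><Data>{text}</Data></Cell></Row></Table></Worksheet>'
            '</Workbook>')


def serialize(text: str, encoding: str) -> bytes:
    """Encode the document in the same encoding its declaration names (text is not escaped)."""
    return TEMPLATE.format(enc=encoding, text=text).encode(encoding)


def first_cell(data: bytes) -> str:
    """Parse the bytes (the parser honours the declaration/BOM) and return the first Data text."""
    ns = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}
    return ET.fromstring(data).find(".//ss:Data", ns).text
```

Python's "UTF-16" codec writes a byte order mark, which lets the parser detect the encoding before it reaches the declaration.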
Next come the root element and the Excel namespace declarations.
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
Under the <Workbook> element there are several elements that are standard for the Excel format.
The first is <DocumentProperties>, which sets document properties such as the author, title, creation date and time, and so on.
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
Its child elements include:
<Author>
<LastAuthor>
<Created>
<Version>
<Title>
<Subject>
<Keywords>
<Category>
<Manager>
The next element is <ExcelWorkbook>. Its format and child elements are shown below.
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>8700</WindowHeight>
<WindowWidth>12315</WindowWidth>
<WindowTopY>120</WindowTopY>
<WindowTopX>60</WindowTopX>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
Next come the styles that control how the data is presented in the spreadsheet. This is similar to CSS.
The container element is <Styles>, and each child is a <Style> element. The children of each <Style> define how the data is formatted.
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="DefaultNumber">
<Alignment ss:Horizontal="Right"/>
</Style>
<Style ss:ID="BoldItalic">
<Font ss:Bold="1" ss:Italic="1"/>
</Style>
<Style ss:ID="SimpleUnderline">
<Font ss:Underline="Single"/>
<Alignment ss:Horizontal="Right"/>
</Style>
<Style ss:ID="BoldAndUnderline">
<Font ss:Bold="1" ss:Underline="Single"/>
<Alignment ss:Horizontal="Left"/>
</Style>
<Style ss:ID="DoubleUnderline">
<Font ss:Underline="Double"/>
<Alignment ss:Horizontal="Right"/>
</Style>
<Style ss:ID="Currency3Decimals">
<NumberFormat ss:Format="&quot;$&quot;#,##0.000"/>
<Alignment ss:Horizontal="Right"/>
</Style>
<Style ss:ID="Header">
<Font ss:FontName="Comic Sans MS" x:Family="Swiss" ss:Size="12"/>
<Alignment ss:Horizontal="Center" ss:Vertical="Center"/>
</Style>
</Styles>
These are just example styles. You can add others depending on how you want your data presented.
The next element is <Worksheet>. It holds the table information, the data and the style of each cell.
<Worksheet ss:Name="Sheet Name">
Inside the <Worksheet> element comes the <Table> element.
<Table ss:ExpandedColumnCount="256" ss:ExpandedRowCount="21" x:FullColumns="1" x:FullRows="1">
We will create a table with 5 columns.
<Column ss:AutoFitWidth="0" ss:Width="10"/>
<Column ss:StyleID="DefaultNumber" ss:AutoFitWidth="0" ss:Width="80"/>
<Column ss:StyleID="BoldItalic" ss:AutoFitWidth="0" ss:Width="80"/>
<Column ss:StyleID="SimpleUnderline" ss:AutoFitWidth="0" ss:Width="90"/>
<Column ss:StyleID="Currency3Decimals" ss:AutoFitWidth="0" ss:Width="100"/>
ss:AutoFitWidth="0" turns off automatic sizing for the column, and ss:Width sets the column width in points.
ss:StyleID is the ID declared in the <Styles> element.
<Row Num="1">
<Cell ss:Index="2" ss:StyleID="Header" ss:MergeAcross="3"><Data ss:Type="String">This is the first row</Data></Cell>
</Row>
ss:Index="2" means that the data is placed in the 2nd column, column B. ss:MergeAcross="3" merges that cell with the 3 cells to its right, so columns B to E form one cell.
<Row Num="3" ss:Index="3">
<Cell ss:Index="2"><Data ss:Type="String">DefaultNumber</Data></Cell>
<Cell><Data ss:Type="String">BoldItalic</Data></Cell>
<Cell><Data ss:Type="String">SimpleUnderline</Data></Cell>
<Cell><Data ss:Type="String">Currency3Decimals</Data></Cell>
</Row>
<Row Num="4" ss:Index="4">
<Cell ss:Index="2"><Data ss:Type="Number">123456</Data></Cell>
<Cell><Data ss:Type="String">Bold and Italic</Data></Cell>
<Cell><Data ss:Type="String">Underline</Data></Cell>
<Cell><Data ss:Type="Number">123456</Data></Cell>
</Row>
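In the rows above, ss:MergeAcross counts the additional cells to the right of the starting cell, and ss:Index moves a cell to an explicit column. The Python sketch below computes the column range each Cell in a Row occupies. It assumes that a Cell without ss:Index starts right after the previous cell's merged area. The names cell_ranges and col_letter are hypothetical.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"


def col_letter(n: int) -> str:
    """1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_ranges(row: ET.Element) -> list:
    """Return the A1-style column range that each Cell of a Row occupies."""
    ranges = []
    next_col = 1
    for cell in row.findall(f"{{{SS}}}Cell"):
        start = int(cell.get(f"{{{SS}}}Index", next_col))
        end = start + int(cell.get(f"{{{SS}}}MergeAcross", 0))  # extra cells to the right
        ranges.append(col_letter(start) if start == end else f"{col_letter(start)}:{col_letter(end)}")
        next_col = end + 1  # assumption: the next Cell without ss:Index follows the merged area
    return ranges
```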
After the rows, close the table with </Table>. The last child of the worksheet is <WorksheetOptions>, which contains settings such as Print, Panes and Selected.
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo/>
<HorizontalResolution>200</HorizontalResolution>
<VerticalResolution>200</VerticalResolution>
<NumberofCopies>0</NumberofCopies>
</Print>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>1</ActiveRow>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
Close the sheet with </Worksheet>. The <Worksheet> element can be repeated if you want more than one sheet in your Excel file.
Finally, close the XML with the </Workbook> closing tag.
</Workbook>
Hope this guide helps with your project.
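Here is a minimal Python sketch that puts the pieces of this guide together. It writes a single-sheet workbook with both the XML declaration and the mso-application instruction. The document is built from string templates, with values escaped via xml.sax.saxutils. It covers only Number and String cells, and the names workbook and cell are hypothetical. Saving the result with an .xml extension rather than .xls matches its actual format.

```python
from xml.sax.saxutils import escape, quoteattr

SS = "urn:schemas-microsoft-com:office:spreadsheet"


def cell(value) -> str:
    """One <Cell> with a typed <Data>; bool is excluded because it subclasses int."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    kind = "Number" if is_number else "String"
    return f'<Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>'


def workbook(sheet_name: str, rows: list) -> bytes:
    """Build a minimal single-sheet SpreadsheetML document, encoded as UTF-8."""
    table = "".join("<Row>" + "".join(cell(v) for v in row) + "</Row>" for row in rows)
    xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<?mso-application progid="Excel.Sheet"?>\n'
           f'<Workbook xmlns="{SS}" xmlns:ss="{SS}">'
           f'<Worksheet ss:Name={quoteattr(sheet_name)}><Table>{table}</Table></Worksheet>'
           '</Workbook>')
    return xml.encode("utf-8")
```

Usage: write the returned bytes to a file opened in binary mode, for example test.xml.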
I get the following warning when opening an XML file that has the .xls extension (I want to use it as .xls):
"The file you are trying to open, '[filename]', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?" (Yes | No | Help)
(Quoted from the MSDN blog article "Excel 2007 Extension Warning On Opening Excel Workbook from a Web Site". An archived copy exists, but the original link is broken.)
How can I solve this?
I use .xls with this source code:
<?xml version="1.0" encoding="utf-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="Export">
<Table>
<Row>
<Cell><Data ss:Type="Number">3</Data></Cell>
<Cell><Data ss:Type="Number">22123497</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
asked Sep 14, 2011 by User5910
As the commenters already mentioned, your example document is definitely not an .xls file (those are binary). Excel rightly complains about that, because a document with the wrong extension might be trying to trick you.
What you should do is save the document with the .xml extension and add the processing instruction for an Office document (in this case SpreadsheetML, as opposed to the original binary, proprietary Excel format):
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
...
This used to work, but I just noticed that with Office 2007 the XML-processing component ("XML Editor") doesn't seem to be installed as the default app for XML files. When XML files were opened, that component used to pass them to the correct application according to the processing instruction. Maybe on your machine it works as intended (otherwise you might have to change this behavior).
So this is basically the same as what the other commenters already said. Still, I hope this helps.
answered Nov 16, 2011 by Andreas J
Office Open XML (OOX) became the default format with the release of Office 2007, but back in the Office 2003 days Microsoft had already developed a format for storing Excel workbooks as XML.
A comprehensive overview is available here:
Dive into SpreadsheetML (Part 1 of 2)
Dive into SpreadsheetML (Part 2 of 2)
In OOX, data and metadata are stored in a multipart archive. By contrast, an Excel workbook in SpreadsheetML 2003 format consists of a single XML instance, so it can easily be managed with Oracle's built-in XML functions and XML DB features.
In this article, I'll focus on how to create and read such files with SQL/XML functions, XSLT and XQuery.
1. Writing a file
The minimum valid structure for an instance looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Employee No</Data></Cell>
        <Cell><Data ss:Type="String">Employee Name</Data></Cell>
        <Cell><Data ss:Type="String">Job</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="Number">7839</Data></Cell>
        <Cell><Data ss:Type="String">KING</Data></Cell>
        <Cell><Data ss:Type="String">PRESIDENT</Data></Cell>
      </Row>
      <!-- More rows -->
    </Table>
  </Worksheet>
</Workbook>
It can be generated this way, with SQL/XML functions:
SELECT XMLConcat(
         XMLPi("mso-application", 'progid="Excel.Sheet"')
       , XMLElement("Workbook",
           XMLAttributes(
             'urn:schemas-microsoft-com:office:spreadsheet' as "xmlns"
           , 'urn:schemas-microsoft-com:office:spreadsheet' as "xmlns:ss"
           )
         , XMLElement("Worksheet", XMLAttributes('Sheet1' as "ss:Name")
           , XMLElement("Table",
               XMLElement("Row",
                 XMLForest(
                   XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Employee No') as "Cell"
                 , XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Employee Name') as "Cell"
                 , XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Job') as "Cell"
                 )
               )
             , XMLAgg(
                 XMLElement("Row",
                   XMLForest(
                     XMLElement("Data", XMLAttributes('Number' as "ss:Type"), e.empno) as "Cell"
                   , XMLElement("Data", XMLAttributes('String' as "ss:Type"), e.ename) as "Cell"
                   , XMLElement("Data", XMLAttributes('String' as "ss:Type"), e.job) as "Cell"
                   )
                 )
                 order by e.empno
               )
             )
           )
         )
       )
FROM scott.emp e
;
Although this query is relatively simple and efficient, it is easy to imagine how cumbersome such queries could become for more complex requirements.
This is where XSLT comes into play. A stylesheet that works on a canonical XML input lets us hide the transformation logic and separate the data layer from the presentation layer.
Below is an example that generates a multi-sheet workbook with some additional Excel-specific formatting: frozen headers, and the tab color set to red for departments whose total salary is higher than 10,000.
Here, I first stored the XSLT stylesheet in the XML DB repository. That's not mandatory, since the stylesheet can also be declared inline in the PL/SQL block. Still, it's good practice to keep stylesheets in the database (in the repository or in an XMLType column of a relational table) if we intend to use them with the internal XSLT processor.
The stylesheet (test.xsl):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:x="urn:schemas-microsoft-com:office:excel"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:processing-instruction name="mso-application">progid="Excel.Sheet"</xsl:processing-instruction>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
              xmlns:x="urn:schemas-microsoft-com:office:excel"
              xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
      <Styles>
        <Style ss:ID="h">
          <Interior ss:Color="#C0C0C0" ss:Pattern="Solid"/>
        </Style>
      </Styles>
      <xsl:apply-templates/>
    </Workbook>
  </xsl:template>
  <xsl:template match="ROWSET">
    <Worksheet ss:Name="{@name}">
      <Table>
        <Row>
          <xsl:for-each select="ROW[1]/*">
            <Cell ss:StyleID="h">
              <Data ss:Type="String">
                <xsl:value-of select="translate(local-name(), '_', ' ')"/>
              </Data>
            </Cell>
          </xsl:for-each>
        </Row>
        <xsl:apply-templates/>
      </Table>
      <x:WorksheetOptions>
        <x:FrozenNoSplit/>
        <x:SplitHorizontal>1</x:SplitHorizontal>
        <x:TopRowBottomPane>1</x:TopRowBottomPane>
        <x:ActivePane>2</x:ActivePane>
        <xsl:if test="@color">
          <x:TabColorIndex><xsl:value-of select="@color"/></x:TabColorIndex>
        </xsl:if>
      </x:WorksheetOptions>
    </Worksheet>
  </xsl:template>
  <xsl:template match="ROW">
    <Row>
      <xsl:apply-templates/>
    </Row>
  </xsl:template>
  <xsl:template match="ROW/*">
    <Cell>
      <Data ss:Type="String">
        <xsl:value-of select="."/>
      </Data>
    </Cell>
  </xsl:template>
</xsl:stylesheet>
The transformation code:
DECLARE
  xmldoc CLOB;
BEGIN
  select xmlserialize(document
           xmltransform(
             xmlelement("ROOT",
               xmlagg(
                 xmlelement("ROWSET",
                   xmlattributes(
                     d.dname as "name"
                   , case when sum(e.sal) > 10000 then '2' end as "color"
                   )
                 , xmlagg(
                     xmlelement("ROW",
                       xmlforest(
                         e.empno as "Employee_No"
                       , e.ename as "Employee_Name"
                       , e.job   as "Job"
                       , e.sal   as "Salary"
                       )
                     )
                     order by e.empno
                   )
                 )
                 order by d.deptno
               )
             )
           , xdburitype('/office/excel/stylesheets/out/test.xsl').getXML()
           )
           as clob
         )
  into xmldoc
  from scott.dept d
       join scott.emp e on e.deptno = d.deptno
  group by d.deptno
         , d.dname
  ;
  dbms_xslprocessor.clob2file(xmldoc, 'TEST_DIR', 'test.xml');
END;
/
The output file:
One of the most used Excel features is the Pivot Table generator. Creating such content is also possible directly from the database, using XSLT.
For instance, here's some "raw" data:
SQL> select employee_id
          , first_name
          , last_name
          , extract(year from hire_date) as hire_year
          , job_id
       from hr.employees
     ;
EMPLOYEE_ID FIRST_NAME           LAST_NAME                  HIRE_YEAR JOB_ID
----------- -------------------- ------------------------- ---------- ----------
        198 Donald               OConnell                        2007 SH_CLERK
        199 Douglas              Grant                           2008 SH_CLERK
        200 Jennifer             Whalen                          2003 AD_ASST
        201 Michael              Hartstein                       2004 MK_MAN
        202 Pat                  Fay                             2005 MK_REP
        203 Susan                Mavris                          2002 HR_REP
        204 Hermann              Baer                            2002 PR_REP
        205 Shelley              Higgins                         2002 AC_MGR
        206 William              Gietz                           2002 AC_ACCOUNT
        100 Steven               King                            2003 AD_PRES
        101 Neena                Kochhar                         2005 AD_VP
        102 Lex                  De Haan                         2001 AD_VP
        103 Alexander            Hunold                          2006 IT_PROG
...
        195 Vance                Jones                           2007 SH_CLERK
        196 Alana                Walsh                           2006 SH_CLERK
        197 Kevin                Feeney                          2006 SH_CLERK
107 rows selected
and we want to display, for a given job, the number of employees hired per year.
In SQL this is called a dynamic pivot. With conventional methods, however, it is not possible to produce such a result set from a single SELECT statement, because the number of columns has to be known at parse time.
The PIVOT XML operator (11g) provides a partial answer to the problem by generating an XMLType containing aggregated "columns" (actually XML "elements"). The same functionality can be simulated in 10g too with XMLAgg and a partitioned outer join.
But with that method, we still have to build the pivot in SQL, in the database.
The approach described below lets Excel do the job for us through its standard pivot table functionality. We just have to generate two tabs: one containing the raw data (hereafter named "DataSource"), and one ("PivotTable") containing the minimum pivot table definition, i.e. no data and no cache.
The stylesheet (pivot.xsl):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:x="urn:schemas-microsoft-com:office:excel"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:param name="filename"/>
  <xsl:template match="/">
    <xsl:processing-instruction name="mso-application">progid="Excel.Sheet"</xsl:processing-instruction>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
              xmlns:x="urn:schemas-microsoft-com:office:excel"
              xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
      <xsl:apply-templates/>
    </Workbook>
  </xsl:template>
  <xsl:template match="ROWSET">
    <Worksheet ss:Name="DataSource">
      <Table>
        <Row>
          <Cell><Data ss:Type="String">EMPLOYEE_ID</Data></Cell>
          <Cell><Data ss:Type="String">FIRST_NAME</Data></Cell>
          <Cell><Data ss:Type="String">LAST_NAME</Data></Cell>
          <Cell><Data ss:Type="String">HIRE_YEAR</Data></Cell>
          <Cell><Data ss:Type="String">JOB_ID</Data></Cell>
        </Row>
        <xsl:apply-templates/>
      </Table>
    </Worksheet>
    <Worksheet ss:Name="PivotTable">
      <Table/>
      <x:PivotTable>
        <x:Name>My Pivot Table</x:Name>
        <x:ImmediateItemsOnDrop/>
        <x:ShowPageMultipleItemLabel/>
        <x:GrandTotalString>Total</x:GrandTotalString>
        <x:Location>R3C1:R4C2</x:Location>
        <x:PivotField>
          <x:Name>EMPLOYEE_ID</x:Name>
          <x:DataType>Integer</x:DataType>
        </x:PivotField>
        <x:PivotField>
          <x:Name>FIRST_NAME</x:Name>
        </x:PivotField>
        <x:PivotField>
          <x:Name>LAST_NAME</x:Name>
        </x:PivotField>
        <x:PivotField>
          <x:Name>HIRE_YEAR</x:Name>
          <x:Orientation>Column</x:Orientation>
          <x:AutoSortOrder>Ascending</x:AutoSortOrder>
          <x:Position>1</x:Position>
          <x:DataType>Integer</x:DataType>
          <x:PivotItem>
            <x:Name/>
          </x:PivotItem>
        </x:PivotField>
        <x:PivotField>
          <x:Name>JOB_ID</x:Name>
          <x:Orientation>Row</x:Orientation>
          <x:AutoSortOrder>Ascending</x:AutoSortOrder>
          <x:Position>1</x:Position>
          <x:PivotItem>
            <x:Name/>
          </x:PivotItem>
        </x:PivotField>
        <x:PivotField>
          <x:DataField/>
          <x:Name>Data</x:Name>
          <x:Orientation>Row</x:Orientation>
          <x:Position>-1</x:Position>
        </x:PivotField>
        <x:PivotField>
          <x:Name>Number of Employees</x:Name>
          <x:ParentField>EMPLOYEE_ID</x:ParentField>
          <x:Orientation>Data</x:Orientation>
          <x:Function>Count</x:Function>
          <x:Position>1</x:Position>
        </x:PivotField>
        <x:PTLineItems>
          <x:PTLineItem>
            <x:Item>0</x:Item>
          </x:PTLineItem>
        </x:PTLineItems>
        <x:PTLineItems>
          <x:Orientation>Column</x:Orientation>
          <x:PTLineItem>
            <x:Item>0</x:Item>
          </x:PTLineItem>
        </x:PTLineItems>
        <x:PTSource>
          <x:RefreshOnFileOpen/>
          <x:ConsolidationReference>
            <x:FileName>[<xsl:value-of select="$filename"/>]DataSource</x:FileName>
            <x:Reference>R1C1:R<xsl:value-of select="count(ROW)+1"/>C5</x:Reference>
          </x:ConsolidationReference>
        </x:PTSource>
      </x:PivotTable>
    </Worksheet>
  </xsl:template>
  <xsl:template match="ROW">
    <Row>
      <Cell><Data ss:Type="Number"><xsl:value-of select="EMPLOYEE_ID"/></Data></Cell>
      <Cell><Data ss:Type="String"><xsl:value-of select="FIRST_NAME"/></Data></Cell>
      <Cell><Data ss:Type="String"><xsl:value-of select="LAST_NAME"/></Data></Cell>
      <Cell><Data ss:Type="Number"><xsl:value-of select="HIRE_YEAR"/></Data></Cell>
      <Cell><Data ss:Type="String"><xsl:value-of select="JOB_ID"/></Data></Cell>
    </Row>
  </xsl:template>
  <xsl:template name="PivotTable">
  </xsl:template>
</xsl:stylesheet>
The transformation code:
DECLARE
  res        clob;
  v_filename varchar2(260) := 'test_pivot.xml';
BEGIN
  select xmlserialize(document
           xmltransform(
             xmlelement("ROWSET",
               xmlagg(
                 xmlelement("ROW",
                   xmlforest(
                     employee_id
                   , first_name
                   , last_name
                   , extract(year from hire_date) as hire_year
                   , job_id
                   )
                 )
               )
             )
           , xdburitype('/office/excel/stylesheets/out/pivot.xsl').getXML()
           , 'filename="'''||v_filename||'''"'
           )
         )
  into res
  from hr.employees
  ;
  dbms_xslprocessor.clob2file(res, 'TEST_DIR', v_filename);
END;
/
The output file:
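The x:Reference in pivot.xsl is written in R1C1 notation and adds one row for the header (count(ROW)+1). The Python sketch below sanity-checks such a range by converting it to A1 notation. For the 107 rows of hr.employees shown above, it gives R1C1:R108C5, which is A1:E108. The helper names source_range, r1c1_to_a1 and col_letter are hypothetical.

```python
import re


def col_letter(n: int) -> str:
    """1 -> A, 5 -> E, 27 -> AA."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def r1c1_to_a1(ref: str) -> str:
    """Convert an absolute R1C1 reference or range such as R1C1:R108C5 to A1 notation."""
    def one(part: str) -> str:
        m = re.fullmatch(r"R(\d+)C(\d+)", part)
        if m is None:
            raise ValueError(f"not an absolute R1C1 reference: {part}")
        return f"{col_letter(int(m.group(2)))}{m.group(1)}"
    return ":".join(one(p) for p in ref.split(":"))


def source_range(row_count: int, col_count: int = 5) -> str:
    """Mirror of the stylesheet's R1C1:R{count(ROW)+1}C5: one header row plus the data rows."""
    return f"R1C1:R{row_count + 1}C{col_count}"
```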
2. Reading a file
I'll divide this section into two parts: querying and optimizing.
a) “One-shot” queries
Let's say we want to read this document (saved in XML 2003 format) as if it were a relational table:
As usual, we’ll use an XMLType table to store the original file and then query from it.
The examples in this article were tested on 11g XE (11.2.0.2), where Binary XML is the default storage:
create table tmp_xml of xmltype;
insert into tmp_xml values(
  xmltype(
    bfilename('XML_DIR','test.xml')
  , nls_charset_id('AL32UTF8')
  )
);
The query involves two XMLTable() functions. The first breaks the document into separate worksheets, and the second extracts each row from them:
SQL> select x1.sheetname
          , x2.id
          , x2.comments
          , x2.dt
       from tmp_xml t
          , xmltable(
              xmlnamespaces( default 'urn:schemas-microsoft-com:office:spreadsheet'
                           , 'urn:schemas-microsoft-com:office:spreadsheet' as "ss" )
            , '/Workbook/Worksheet'
              passing t.object_value
              columns sheetname varchar2(31) path '@ss:Name'
                    , rowset    xmltype      path 'Table/Row'
            ) x1
          , xmltable(
              xmlnamespaces(default 'urn:schemas-microsoft-com:office:spreadsheet')
            , '/Row[position()>1]'
              passing x1.rowset
              columns id       number         path 'Cell[1]/Data'
                    , comments varchar2(2000) path 'Cell[2]/Data'
                    , dt       timestamp      path 'substring-before(Cell[3]/Data,".")'
            ) x2
      where x1.sheetname = 'MyData-1'
     ;
SHEETNAME               ID COMMENTS                                           DT
--------------- ---------- -------------------------------------------------- -------------------------
MyData-1                 1 This is a comment for line #1                      09/02/12 12:09:37,000000
MyData-1                 2 This is a comment for line #2                      10/02/12 12:09:36,000000
MyData-1                 3 This is a comment for line #3                      11/02/12 12:09:36,000000
MyData-1                 4 This is a comment for line #4                      12/02/12 12:09:36,000000
MyData-1                 5 This is a comment for line #5                      13/02/12 12:09:36,000000
MyData-1                 6 This is a comment for line #6                      14/02/12 12:09:36,000000
MyData-1                 7 This is a comment for line #7                      15/02/12 12:09:36,000000
MyData-1                 8 This is a comment for line #8                      16/02/12 12:09:36,000000
MyData-1                 9 This is a comment for line #9                      17/02/12 12:09:36,000000
MyData-1                10 This is a comment for line #10                     18/02/12 12:09:36,000000
MyData-1                11 This is a comment for line #11                     19/02/12 12:09:36,000000
MyData-1                12 This is a comment for line #12                     20/02/12 12:09:36,000000
MyData-1                13 This is a comment for line #13                     21/02/12 12:09:36,000000
MyData-1                14 This is a comment for line #14                     22/02/12 12:09:36,000000
MyData-1                15 This is a comment for line #15                     23/02/12 12:09:36,000000
MyData-1                16 This is a comment for line #16                     24/02/12 12:09:36,000000
16 rows selected
b) Optimized access to the document
If loading these documents into the database is a recurring task then, provided the structure doesn’t change, queries on the data can be optimized by creating a structured XML index on the XMLType table.
With such an index in place there can be a significant overhead at insert time, depending on the size of the document, because each row is shredded into the index’s underlying relational table as the document is stored. That is the trade-off: subsequent queries will be considerably faster.
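One way to picture why the cost moves to insert time: the structured index copies every row of the document into a relational table when the document is stored, so later queries read that table instead of parsing the XML again. The Python class below is a toy model of that idea only. `ShreddedStore`, `insert` and `rows_of` are hypothetical names, values are kept as strings rather than converted to NUMBER/VARCHAR2, and it makes no claim about how Oracle actually maintains an XMLIndex.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


class ShreddedStore:
    """Toy model of an XMLType table plus a structured XML index.

    insert() does the extra work up front: each Row is shredded into a
    relational side table (the role the index table plays in Oracle).
    rows_of() then reads only the side table and never re-parses the XML.
    """

    def __init__(self):
        self.documents = {}   # doc id -> XML text (the XMLType table)
        self.side_table = []  # (doc id, rec_id, description, rec_value)
        self._next_id = 1

    def insert(self, xml_text):
        doc_id = self._next_id
        self._next_id += 1
        self.documents[doc_id] = xml_text
        root = ET.fromstring(xml_text)  # parsing cost paid at insert time
        # Same row path as the index definition: /Workbook/Worksheet/Table/Row
        for row in root.findall("ss:Worksheet/ss:Table/ss:Row", NS):
            cells = row.findall("ss:Cell", NS)
            # Cell[1..3]/Data text; missing cells become None (NULL).
            values = [c.findtext("ss:Data", namespaces=NS) for c in cells[:3]]
            values += [None] * (3 - len(values))
            self.side_table.append((doc_id, *values))
        return doc_id

    def rows_of(self, doc_id):
        # Query time: a plain scan of the side table, no XML parsing.
        return [r[1:] for r in self.side_table if r[0] == doc_id]
```

The larger the document, the more rows `insert()` has to write, which is the insert-time overhead described above.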
Here’s a small test case based on the following document (a 50,000-row worksheet with no header row):
Document properties (I’ll define a virtual column to hold the Title property):
Setup and query plan:
-- Table creation:
create table ext_smldata of xmltype
xmltype store as binary xml
virtual columns (
  title as (
    XMLCast(
      XMLQuery(
        'declare default element namespace "urn:schemas-microsoft-com:office:spreadsheet"; (::)
         declare namespace o = "urn:schemas-microsoft-com:office:office"; (::)
         /Workbook/o:DocumentProperties/o:Title'
        passing object_value
        returning content
      )
      as varchar2(200)
    )
  )
);

-- Index on the "TITLE" virtual column:
create index ext_smldata_title_idx on ext_smldata (title);

-- Structured XML index on the table:
create index ext_smldata_sxi on ext_smldata (object_value)
indextype is xdb.xmlindex
parameters (q'#
XMLTable ext_smldata_xtb
  XMLNamespaces (default 'urn:schemas-microsoft-com:office:spreadsheet')
, '/Workbook/Worksheet/Table/Row'
  COLUMNS rec_id      NUMBER       PATH 'Cell[1]/Data/text()'
        , description VARCHAR2(80) PATH 'Cell[2]/Data/text()'
        , rec_value   VARCHAR2(30) PATH 'Cell[3]/Data/text()'
#');

-- Insert:
insert into ext_smldata values(
  xmltype(
    bfilename('XML_DIR','smldata.xml')
  , nls_charset_id('AL32UTF8')
  )
);
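For reference, the TITLE virtual column evaluates the path `/Workbook/o:DocumentProperties/o:Title`, where `Workbook` is in the default spreadsheet namespace and `DocumentProperties`/`Title` are in the office namespace bound to `o`. A minimal Python sketch of the same lookup follows; `document_title` is a hypothetical helper, and it returns `None` where XMLCast would yield NULL for an empty result.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"  # default element namespace
O = "urn:schemas-microsoft-com:office:office"        # prefix "o" in the XQuery


def document_title(xml_text):
    """Evaluate /Workbook/o:DocumentProperties/o:Title; None plays SQL NULL."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SS}}}Workbook":  # the path must start at /Workbook
        return None
    return root.findtext(f"{{{O}}}DocumentProperties/{{{O}}}Title")
```

Because the value is materialized as a column, it can carry an ordinary B-tree index, which is what lets the query below locate a document by title without touching the XML.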
SQL> set timing on
SQL> set autotrace traceonly
SQL> SELECT x.*
     FROM ext_smldata t
        , XMLTable(
            XMLNamespaces (default 'urn:schemas-microsoft-com:office:spreadsheet')
          , '/Workbook/Worksheet/Table/Row'
            PASSING t.object_value
            COLUMNS rec_id      NUMBER       PATH 'Cell[1]/Data/text()'
                  , description VARCHAR2(80) PATH 'Cell[2]/Data/text()'
                  , rec_value   VARCHAR2(30) PATH 'Cell[3]/Data/text()'
          ) x
     WHERE t.title = 'SampleData1'
     ;

50000 rows selected.

Elapsed: 00:00:01.69

Execution Plan
----------------------------------------------------------
Plan hash value: 3987672269

------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                       | 59520 |  5405K|   174   (2)| 00:00:03 |
|*  1 |  HASH JOIN                   |                       | 59520 |  5405K|   174   (2)| 00:00:03 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EXT_SMLDATA           |     1 |    29 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | EXT_SMLDATA_TITLE_IDX |     1 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | EXT_SMLDATA_XTB       | 59520 |  3720K|   171   (1)| 00:00:03 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."SYS_NC_OID$"="SYS_SXI_0"."OID")
   3 - access("T"."TITLE"='SampleData1')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3962  consistent gets
         77  physical reads
       4796  redo size
    2823321  bytes sent via SQL*Net to client
      37083  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50000  rows processed
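The execution plan shows that the data is retrieved from EXT_SMLDATA_XTB, the relational table underlying the structured XML index, rather than by parsing the stored XML: the document matching the title is located through EXT_SMLDATA_TITLE_IDX (Ids 3 and 2), then hash-joined to the index table on the document's object identifier (SYS_NC_OID$ = OID).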
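The data flow in that plan can be modelled in a few lines: the row source found through the title index is the first child of the HASH JOIN, so it is the build input; the full scan of the index table is the probe input, joined on the object identifier. The Python sketch below models only that flow. The table contents, OID strings and helper names are hypothetical stand-ins, and Oracle's real hash join (memory management, spilling to temp, no guaranteed output order) is far more involved.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash the (small) build input, then stream the probe input through it."""
    buckets = {}
    for b in build_rows:  # build phase
        buckets.setdefault(build_key(b), []).append(b)
    for p in probe_rows:  # probe phase
        for b in buckets.get(probe_key(p), []):
            yield b, p


# Hypothetical contents: two loaded documents and their shredded rows.
EXT_SMLDATA = [
    {"oid": "OID-A", "title": "SampleData1"},
    {"oid": "OID-B", "title": "OtherData"},
]
# Stand-in for EXT_SMLDATA_TITLE_IDX: title -> positions ("rowids") in EXT_SMLDATA.
TITLE_IDX = {"SampleData1": [0], "OtherData": [1]}
# Stand-in for EXT_SMLDATA_XTB: (oid, rec_id, description, rec_value).
EXT_SMLDATA_XTB = [
    ("OID-A", 1, "row one", "v1"),
    ("OID-B", 1, "other row", "w1"),
    ("OID-A", 2, "row two", "v2"),
]


def rows_for_title(title):
    # Ids 3 and 2: index range scan on the title, then access by rowid.
    docs = [EXT_SMLDATA[i] for i in TITLE_IDX.get(title, [])]
    # Id 4 feeds Id 1: full scan of the index table, hash-joined on the OID.
    joined = hash_join(docs, EXT_SMLDATA_XTB, lambda d: d["oid"], lambda r: r[0])
    return [p[1:] for _, p in joined]
```

With a single matching document the build side is one row, so the cost is dominated by the full scan of the index table, which matches the cost split shown in the plan (2 versus 171).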