Parsing an Excel spreadsheet in PHP

First, the problem that made me return for the thousandth time to a question that has been chewed over from every angle: clueless managers, without consulting the programmers, promised the customer that data could be loaded onto the site from xls(x) files.

That would be fine, except that the customer's hosting provider allows 64 MB of memory per script and couldn't care less that the client's Excel files, with no formatting at all, weigh 10-15 MB each. Loading one of them with PHPExcel eats something like 500 MB of memory (on the test server).
The solution under the cut barely reached 5 MB.

Preconditions:
1. There is an Excel document of some 10-20 sheets with data about the products in an online catalog. Each sheet has a header ("name", "price", etc., plus a cartload of extra attributes, about 40 columns in all) followed by the data itself, in a quantity best described as "the Excel scrollbar is one centimeter tall";
2. CSV is not an option. All of the customer's data is already in Excel and he is not going to re-save it... it was promised, and that's that;
3. Spreadsheet_Excel_Writer was rejected for not being universal enough, although a lot of good things have been written about it. I look forward to comments with memory tests;
4. surprisingly, Google offered no universal solutions. Has nobody really run into this problem with PHP on *nix, I wondered.

Solution:
After trying the various approaches Google kindly offered, we decided to read the specification (ehh, my father did teach me...). Seeing the key phrases "based on Open XML" and "uses ZIP compression" there, we quickly called the customer and steered the conversation toward xlsx and nothing else: "Surely you understand! It's the 21st century, after all! Why cling to the old stuff? We need to keep one foot in the future!"

From there the algorithm is: accept the file, unpack it, and take a careful look at what comes out.
A full inventory will have to wait for some spare time; for now the most interesting part is the contents of the xl directory, specifically /xl/worksheets/ and the file /xl/sharedStrings.xml.
The file /xl/workbook.xml holds the description of the sheets, but since collecting sheet names was not part of the task, I'll skip it. It is not hard to figure out if you need it.
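
The "accept and unpack" step might look roughly like the sketch below. It assumes the ZipArchive extension is available on the hosting; unpack_xlsx() and its parameters are names made up for this illustration, not part of the original solution. Only the shared strings and worksheet parts are extracted, since those are the only files used later.

```php
<?php
// Hypothetical helper (not from the original article): unpack only the
// shared strings and worksheet parts of an .xlsx file into $targetDir.
function unpack_xlsx(string $xlsxPath, string $targetDir): array
{
    $zip = new ZipArchive();
    if ($zip->open($xlsxPath) !== true) {
        throw new RuntimeException("Cannot open $xlsxPath as a ZIP archive");
    }

    // Entry names inside the archive have no leading slash.
    $wanted = array();
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === 'xl/sharedStrings.xml'
            || preg_match('#^xl/worksheets/[^/]+\.xml$#', $name)) {
            $wanted[] = $name;
        }
    }

    // extractTo() returns false for an empty list, so check that first.
    if (!$wanted || !$zip->extractTo($targetDir, $wanted)) {
        $zip->close();
        throw new RuntimeException("Nothing could be extracted from $xlsxPath");
    }
    $zip->close();

    return $wanted; // names of the extracted entries
}
```

Called as unpack_xlsx($uploadedFile, PATH . '/upload/xls_data'), where $uploadedFile is a placeholder for the uploaded file's path, it leaves the files where the snippets further down expect them.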

/xl/sharedStrings.xml

...
    <si>
        <t>Наименование</t>
    </si>
    <si>
        <t>Описание</t>
    </si>
    <si>
        <t>Изображение</t>
    </si>
    <si>
        <t>URL</t>
    </si>
    <si>
        <t>!Классификация</t>
    </si>
    <si>
        <t>!Бренд</t>
    </si>
    <si>
        <t>~1ф, 220-240 В, 50 Гц</t>
    </si>
...

and so on in the same vein. These are the text values of the cells in the original document. From all the sheets! For now, let's just collect them into an array.

    $xml = simplexml_load_file(PATH . '/upload/xls_data/xl/sharedStrings.xml');
    $sharedStringsArr = array();
    foreach ($xml->children() as $item) {
        $sharedStringsArr[] = (string)$item->t;
    }
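
A possible alternative, not used in the original code: simplexml_load_file() builds the whole document tree in memory, which may matter with a 64 MB budget. XMLReader can stream the same file instead. The sketch below is a substitution under that assumption; read_shared_strings() is a hypothetical name. It also joins the <t> pieces of strings that are split into formatted runs (<r> elements), which the loop above would read as empty.

```php
<?php
// Hypothetical streaming reader for xl/sharedStrings.xml.
// Returns the strings in file order, so indexes match the <v> values in sheets.
function read_shared_strings(string $path): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $strings = array();
    $current = null; // text collected for the current <si>, null outside of <si>

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->localName === 'si') {
                if ($reader->isEmptyElement) {
                    $strings[] = ''; // <si/> is an empty string
                } else {
                    $current = '';
                }
            } elseif ($reader->localName === 't' && $current !== null
                && !$reader->isEmptyElement) {
                // Plain strings have one <t>; rich text has one <t> per run.
                $current .= $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->localName === 'si') {
            $strings[] = $current;
            $current = null;
        }
    }
    $reader->close();

    return $strings;
}
```

The result can be used in place of $sharedStringsArr.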

/xl/worksheets/
This is a directory of files named like "sheet1.xml" that describe the data of each sheet. In each file we are specifically interested in the contents of <sheetData> and its <row …> children.

...
<sheetData>
...
<row r="1" spans="1:43" ht="48.75" customHeight="1" x14ac:dyDescent="0.2">
            <c r="A1" s="1" t="s">
                <v>0</v>
            </c>
            <c r="B1" s="1" t="s">
                <v>1</v>
            </c>
            <c r="C1" s="2" t="s">
                <v>2</v>
            </c>
            <c r="E2" s="12">
                <v>2</v>
            </c>
            <c r="F2" s="12"/>
           ....
</row>
<row r="2" spans="1:43" ht="13.5" customHeight="1" x14ac:dyDescent="0.2">
...
</sheetData>
...

By comparison and experiment we found that the attribute t="s" on a cell (apparently type=string) indicates that the value is taken from sharedStrings.xml. The cell's value is then a pointer: the index of an element in $sharedStringsArr. If the attribute is absent, the value itself is the cell's value.

Putting it together:

    $dir = PATH . '/upload/xls_data/xl/worksheets';
    $handle = opendir($dir);
    $out = array();
    // compare with false explicitly so that a file named "0" does not end the loop
    while ($handle && ($file = readdir($handle)) !== false) {
        // walk through every file in /xl/worksheets/
        if ($file != "." && $file != ".." && $file != '_rels') {
            $xml = simplexml_load_file($dir . '/' . $file);
            // for each row
            $row = 0;
            foreach ($xml->sheetData->row as $item) {
                $out[$file][$row] = array();
                // for each cell in the row
                $cell = 0;
                foreach ($item as $child) {
                    $attr = $child->attributes();
                    $value = isset($child->v) ? (string)$child->v : false;
                    // only t="s" with a <v> present is an index into the shared strings
                    $isShared = $value !== false && (string)$attr['t'] === 's';
                    $out[$file][$row][$cell] = $isShared ? $sharedStringsArr[(int)$value] : $value;
                    $cell++;
                }
                $row++;
            }
        }
    }
    if ($handle) {
        closedir($handle);
    }
    var_dump($out);

The output is a multidimensional array that you can work with freely, or pour straight into a database; that's up to you.

Finally, I should say that I did not really dig into the xlsx specification; I only solved the task at hand for specific xlsx documents. Formulas and images must be written somewhere, after all (t="i"?). When I run into such a task I will certainly write it up; for now, here is an undemanding algorithm for collecting text data from xlsx. I hope it will be useful, since I did not come across anything like it while searching.

P.S. Only while adding tags did I stumble upon "Working with large Excel files". I should have searched Habr instead of googling; it would have saved a lot of time.

UPD:
It has just turned out that an empty cell can be represented either by a <c> with no <v> child or by the absence of the <c> element itself. The "r" attribute has to be checked.

            <c r="A1" s="1" t="s"/>
            <c r="B1" s="1" t="s">
                <v>1</v>
            </c>
<!-- cell C1 is missing here -->
            <c r="D1" s="2" t="s">
                <v>2</v>
            </c>
            <c r="E1" s="12"/>

I'll fix it when I get the chance.
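
One way the check of the "r" attribute could be done is sketched below. This is not the author's fix; column_index() and read_row() are hypothetical helpers. The idea is to key each cell by the column derived from its reference, so that a skipped <c> leaves a gap instead of shifting the following cells to the left.

```php
<?php
// Convert the column letters of a cell reference ("C1", "AA10")
// into a zero-based column index (A => 0, B => 1, ..., AA => 26).
function column_index(string $cellRef): int
{
    if (!preg_match('/^([A-Z]+)\d+$/', $cellRef, $m)) {
        throw new InvalidArgumentException("Bad cell reference: $cellRef");
    }
    $index = 0;
    foreach (str_split($m[1]) as $letter) {
        $index = $index * 26 + (ord($letter) - ord('A') + 1);
    }
    return $index - 1;
}

// Read one <row> element into an array keyed by column index.
// A <c> without <v> becomes null; a missing <c> leaves no key at all.
function read_row(SimpleXMLElement $row, array $sharedStrings): array
{
    $cells = array();
    foreach ($row->c as $c) {
        $col = column_index((string)$c['r']);
        if (!isset($c->v)) {
            $cells[$col] = null;
            continue;
        }
        $value = (string)$c->v;
        $cells[$col] = ((string)$c['t'] === 's')
            ? $sharedStrings[(int)$value]
            : $value;
    }
    return $cells;
}
```

For the fragment above, columns 0, 1, 3 and 4 are filled (with null for A1 and E1) and there is no key 2 for the missing C1.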

PHP XLS Excel Parser

Probably, the fastest possible and the most efficient parser for XLS excel files for PHP!

Note: this parser works only with older XLS files that were used in Microsoft Excel 95 (BIFF5) and 97-2003 (BIFF8).
It will not work with the newer ones, XLSX!

  1. Requirements
  2. Basic usage
  3. Advanced usage
    • Sheet selection
    • Parsing modes
      1. Array mode
      2. Row-by-row mode
    • Debug mode
    • Temporary files and memory
  4. Additional information
    • Rows and columns numeration
    • Some terms
    • What happens when I open XLS file
  5. Public properties and methods
    • Properties
    • Methods (functions)
      1. General
      2. Memory free-ers
      3. Reading settings (mostly for Row-by-row mode)
      4. Constructor and destructor
  6. Error handling
  7. Security considerations
  8. Performance and memory
  9. More documentation

1. Requirements

At least PHP 5.6 32-bit is required. Untested with PHP versions prior to 5.6.
Works best with PHP 7.x 64-bit (faster, more memory efficient than PHP 5.6).

Also, this parser uses my PHP MSCFB Parser. Grab a copy of MSCFB.php if you don’t have one here: https://github.com/arti9m/PHP-MSCFB-Parser and put it in your PHP include directory or in the same directory where MSXLS.php is. MSCFB is «required-once» inside MSXLS, so there’s no need to include/require it manually.

2. Basic usage

  1. Download MSXLS.php from this repository and put it in your include directory or in the same directory where your script is.
  2. Make sure that MSCFB.php is in your include directory or in your script directory.
  3. Add the following line to the beginning of your PHP script (specify full path to MSXLS.php, if needed):
require_once 'MSXLS.php'; //MSCFB.php is 'required once' inside MSXLS.php
  4. Create an instance of MSXLS (open XLS file):
$excel = new MSXLS('path_to_file.xls');
  5. If no errors occurred up to this point, you are ready to read the cells from your file. There are two ways you can do it: either read all cells at once into a two-dimensional array using Array mode (faster), or read the cells in Row-by-row mode, which is slower, but is more configurable, suitable for database upload and may use much less memory depending on usage scenario.
    In any case, it’s a good idea to check for errors before trying to read anything:
if($excel->error) die($excel->err_msg); //Terminate script execution, show error message.
  6. You can read all cells at once into a two-dimensional array:
$excel->read_everything(); //Read cells into $excel->cells

At this point all your cells data is contained inside $excel->cells array:

var_dump($excel->cells); //Output all parsed cells from XLS file
  7. Or you can read the cells row by row:
$excel->switch_to_row(); //switch to Row-by-row mode

while($row = $excel->read_next_row()){
  //You can process $row however you want here.
  //For example, you can upload a row into a database.
  $rows[] = $row; //For now, just store a parsed row inside $rows array.
}

Note: $excel->cells will be erased when $excel->switch_to_row() is executed, so make sure you save the contents of $excel->cells (if any) to some other variable before switching to Row-by-row mode. If you need to switch back to Array mode, use $excel->switch_to_array() method.

  8. If you need to perform some other memory-intensive tasks in the same script, it is a good idea to free some memory:
$excel->free(); //This is also called in the destructor
unset($excel);

3. Advanced usage

Note: every example in this section assumes that $excel is your MSXLS instance: $excel = new MSXLS('file.xls').


Sheet selection

If there is more than one worksheet in your file, and you want to parse the worksheet that is not the first valid non-empty worksheet, you will have to select your sheet manually. To do this, use $excel->get_valid_sheets() method to get an array with all available selectable worksheets. When the desired worksheet has been found, use its array index or ‘number’ entry as a parameter to $excel->select_sheet($sheet) method. For example:

var_dump($excel->get_valid_sheets()); //outputs selectable sheets info
$excel->select_sheet(1); //select sheet with index 1

Alternatively, if you know sheet name, you can use it with the same method to select sheet:

$excel->select_sheet('your_sheet_name'); //also works

Leave out sheet index/name to select the first available valid sheet:

$excel->select_sheet(); //selects the first valid non-empty sheet in XLS file

You can use $excel->get_active_sheet() method to get information about selected sheet.
Refer to Methods (functions) subsection to get more information about methods mentioned above.

Note: The first valid worksheet is selected automatically when the file is opened or when Parsing mode is changed.


Parsing modes

There are two modes which the parser can work in: Array mode and Row-by-row mode. By default, Array mode is used.

1. Array mode

This mode lets you read all cells at once into $excel->cells array property. It is designed to read all available data as fast as possible when no additional cells processing is needed. This mode is used by default. This mode can be selected with $excel->switch_to_array() method. Data is read with $excel->read_everything() method into $excel->cells array property. Example:

$excel->read_everything(); //Read cells into $excel->cells
var_dump($excel->cells); //Output all parsed cells from XLS file

When $excel->read_everything() is invoked for the first time for your file, a private structure called SST is built which contains all strings for all worksheets. It sits in memory until Parsing mode is changed or re-selected, or $excel->free() is called, or your MSXLS instance is destroyed. Therefore, this is a rather memory-hungry mode if your file has a lot of unique strings. Non-unique strings are stored only once. Also, PHP is usually smart enough not to duplicate those strings in memory when a string is read into $excel->cells array from SST storage, or when you copy $excel->cells to some other variable.

In this mode, empty rows and cells are ignored. Boolean excel cells are parsed as true or false. If excel internally represents a whole number as float (which is often the case), it will be parsed as float type.

$excel->cells is a two-dimensional array. Its first dimension represents rows and its second dimension represents columns, both have zero-based numeration. See Rows and columns numeration for more information.

Note that all empty rows and cells will create ‘holes’ in $excel->cells array, because empty cells are simply skipped. It is advisable to use isset() function to determine whether the cell is empty or not.
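
A minimal sketch of reading from such a sparse array follows. The $cells variable here is a hand-made stand-in for $excel->cells, and cell_or_default() is a hypothetical helper, not part of MSXLS.

```php
<?php
// Stand-in for $excel->cells: row 1 is empty, and so are cells [0][2] and [2][1].
$cells = array(
    0 => array(0 => 'Name', 1 => 'Price'),
    2 => array(0 => 'Widget', 2 => 9.5),
);

// Return the cell value, or $default when the row or the cell was skipped.
function cell_or_default(array $cells, int $row, int $col, $default = null)
{
    return isset($cells[$row][$col]) ? $cells[$row][$col] : $default;
}

var_dump(cell_or_default($cells, 0, 1));        // existing cell
var_dump(cell_or_default($cells, 1, 0, ''));    // whole row is missing
var_dump(cell_or_default($cells, 2, 1, 'n/a')); // hole inside an existing row
```

Checking with isset() avoids the "undefined offset/array key" notices that direct access to a hole would produce.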

Array mode has only one additional setting for parsing: $excel->set_fill_xl_errors($fill, $value), which defines whether or not to process excel cells with error values (such as division by zero). Please refer to Methods (functions) subsection for more information. In short, if $fill is false, error cells are skipped, otherwise they are filled with $value.

2. Row-by-row mode

This mode lets you read the cells row by row. It is designed to let you process each row individually while using as little memory as possible. This mode is selected with $excel->switch_to_row() method. Data is read with $excel->read_next_row() method, which returns a single row as an array of cells.

As the method name implies, row number is advanced automatically, so next time you call $excel->read_next_row(), it will read the next row. This method returns null if there are no more rows to read. You can manually set row number to read with $excel->set_active_row($row_number), where $row_number is a valid zero-based excel row number. You can get the first and the last valid row number with $excel->get_active_sheet() method:

$info = $excel->get_active_sheet(); //get selected sheet info
var_dump($info['first_row']); //displays first valid zero-based row index
var_dump($info['last_row']); //displays last valid zero-based row index
$excel->set_active_row($info['last_row']); //set active row to the last row of the sheet
$row = $excel->read_next_row(); //will read the last row of the sheet

Cell numeration in the returned row is zero-based. See Rows and columns numeration for more information.

When $excel->read_next_row() is invoked for the first time for your file, SST map will be built which is a structure that contains file stream offsets for every unique string in your excel file. It is similar to SST structure in Array mode, but SST contains the strings themselves, while SST map only contains addresses of those strings.

When $excel->read_next_row() is invoked for the first time for selected sheet, Rows map will be built. This structure contains file stream offsets for every excel row for currently selected worksheet.

Both of the structures mentioned above will be destroyed if Parsing mode is changed or re-selected, or if $excel->free() is called, or when your MSXLS instance is destroyed. Additionally, Rows map will be destroyed when $excel->select_sheet() is called, because Rows map is only valid for a selected sheet, unlike SST map, which is relevant for the whole file.

One advantage of Row-by-row mode is that it allows many settings to be changed that affect which cells are processed and how. Please refer to Reading settings part of Methods (functions) subsection for more information.
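
As an illustration of processing rows one at a time, the reading loop can be wrapped in a generator. This is only a sketch: iterate_rows() is a hypothetical helper that is not part of MSXLS, and it relies solely on read_next_row() and last_read_row_number() as they are described in this document.

```php
<?php
// Hypothetical helper, not part of MSXLS: yields rows keyed by their
// zero-based sheet row number. $excel is assumed to be an MSXLS instance
// that has already been switched to Row-by-row mode.
function iterate_rows($excel)
{
    while (true) {
        $row = $excel->read_next_row();
        if ($row === -1) {
            continue; // empty-row marker, see the use_empty_rows() note
        }
        if (!is_array($row)) {
            return; // null means there are no more rows to read
        }
        // Must be called right after read_next_row() to be valid.
        yield $excel->last_read_row_number() => $row;
    }
}
```

It can then be consumed with foreach (iterate_rows($excel) as $number => $row) { ... }, keeping only one row in memory at a time. Unlike a plain while($row = ...) loop, it does not stop early on a row that happens to be an empty array.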


Debug mode

Debug mode enables output (echo) of all error and warning messages. To enable Debug mode, set the 2nd parameter to true in the constructor:

$excel = new MSXLS("path_to_file.xls", true); // Show errors and warnings

It is also possible to show errors from MSCFB helper class. To do this, set the 4th parameter to true in the constructor:

$excel = new MSXLS("path_to_file.xls", true, null, true);

Warning! The name of the PHP function in which an error occurred is displayed alongside the actual message. Do not enable Debug mode in your production code since it may pose a security risk! This warning applies both to MSXLS class and MSCFB class.


Temporary files and memory

If XLS file was saved as a Compound File (which is almost always the case), then MSXLS must use a temporary PHP stream resource to store Workbook stream that is extracted from the Compound File. It is stored either in memory or as a temporary file, depending on data size. By default, data that exceeds 2MiB (PHP’s default value) is stored as a temporary file. XLS file may sometimes be stored as a Workbook stream itself, in which case a temporary file or stream is not needed and not created.

You can control when a temporary file is used instead of memory by specifying the threshold in bytes as the 3rd parameter to the constructor. If Workbook stream size (in bytes) is less than this value, it will be stored in memory.

$excel = new MSXLS("path_to_file.xls", false, 1024); //data with size > 1KiB is stored in a temp file

You can instruct PHP not to use a temporary file (thus always storing Workbook stream in memory) by setting this parameter to zero:

$excel = new MSXLS("path_to_file.xls", false, 0); //temporary data is always stored in memory

Set this parameter to null to use default value:

$excel = new MSXLS("path_to_file.xls", false, null); //default temp file settings

Note: MSCFB helper class may also need to use a temporary stream resource. It will behave the same way as described above, and will also use that 3rd parameter as its memory limiter.

Note: temporary files are automatically managed (created and deleted) by PHP.
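
The behaviour described above matches PHP's built-in php://temp stream, which keeps data in memory until a size limit (2 MiB by default) is exceeded and then spills over to a temporary file. Whether MSXLS uses exactly this wrapper internally is an assumption here. The sketch only demonstrates the PHP mechanism itself, and open_spool() is a hypothetical helper.

```php
<?php
// Open a temporary stream that stays in memory up to $threshold bytes.
// null keeps PHP's default limit (2 MiB).
function open_spool(?int $threshold = null)
{
    $uri = $threshold === null
        ? 'php://temp'
        : 'php://temp/maxmemory:' . $threshold;
    return fopen($uri, 'w+b');
}

$stream = open_spool(1024);             // spill to a temp file after 1 KiB
fwrite($stream, str_repeat('x', 4096)); // 4 KiB exceeds the threshold
rewind($stream);
$data = stream_get_contents($stream);   // reads back the same way either way
fclose($stream);                        // PHP removes the temp file itself
```

The calling code does not need to know where the data ended up; reads and seeks work the same in both cases.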

4. Additional information

Rows and columns numeration

Rows and columns numeration in this parser is zero-based. Excel row numeration is numeric and starts from 1, and column numeration is alphabetical and starts with A. Excel references a single cell by its column letter and row number, for example: A1, B3, C4, F9. If Array mode is used, cells are stored in $cells property, which is a two-dimensional array. The 1st index corresponds to row number, and the 2nd index is the column number. In Row-by-row mode, a single row is returned as an array of cells. If $row contains a row returned by read_next_row() method, Column A is $row[0], column D is $row[3], etc. In this mode, the user can get zero-based row number with last_read_row_number() method. The table below illustrates how the cells are numerated.

|     | A            | B            | C            | D            | E            | F            |
|-----|--------------|--------------|--------------|--------------|--------------|--------------|
| 1   | $cells[0][0] | $cells[0][1] | $cells[0][2] | $cells[0][3] | $cells[0][4] | $cells[0][5] |
| 2   | $cells[1][0] | $cells[1][1] | $cells[1][2] | $cells[1][3] | $cells[1][4] | $cells[1][5] |
| 3   | $cells[2][0] | $cells[2][1] | $cells[2][2] | $cells[2][3] | $cells[2][4] | $cells[2][5] |
| 4   | $cells[3][0] | $cells[3][1] | $cells[3][2] | $cells[3][3] | $cells[3][4] | $cells[3][5] |
| 5   | $cells[4][0] | $cells[4][1] | $cells[4][2] | $cells[4][3] | $cells[4][4] | $cells[4][5] |
| row | $row[0]      | $row[1]      | $row[2]      | $row[3]      | $row[4]      | $row[5]      |
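
To go from the parser's zero-based indexes back to an Excel-style reference, something like the following could be used. cell_reference() is a hypothetical helper, not an MSXLS method.

```php
<?php
// Convert zero-based (row, column) indexes into an Excel reference.
// Columns use bijective base 26: 0 => A, 25 => Z, 26 => AA, ...
function cell_reference(int $row, int $col): string
{
    $letters = '';
    for ($n = $col + 1; $n > 0; $n = intdiv($n - 1, 26)) {
        $letters = chr(ord('A') + ($n - 1) % 26) . $letters;
    }
    return $letters . ($row + 1);
}

echo cell_reference(0, 0), "\n"; // $cells[0][0]
echo cell_reference(2, 1), "\n"; // $cells[2][1]
```

This is handy when reporting a bad value to the person who maintains the spreadsheet, who thinks in terms of A1 and B3 rather than array indexes.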

Some terms

A Compound File, or Microsoft Binary Compound File, is a special file format which is essentially a FAT-like container for other files.

Workbook stream, or just Workbook is a binary bytestream that essentially represents excel BIFF file.

Excel file format is known as BIFF, or Binary Interchange File Format. Several versions exist, which differ in how they store excel data. This parser supports BIFF version 5, or BIFF5, which is the file format used in Excel 95, and BIFF version 8 (BIFF8), which is used in Excel 97-2003 versions. The biggest difference between BIFF5 and BIFF8 is that they store strings differently. In BIFF5, strings are stored inside cells in locale-specific 8-bit codepage (for example, CP1252), while BIFF8 has a special structure called SST (Shared Strings Table), which stores unique strings inside itself in UTF16 little-endian encoding, and a reference to SST entry is stored in a cell.
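
The difference between the two string encodings can be illustrated with mb_convert_encoding(). The sketch below only shows the byte-level conversion to UTF-8; biff5_to_utf8() and biff8_to_utf8() are hypothetical names, not the parser's internal functions, and the mbstring extension is assumed to be loaded.

```php
<?php
// BIFF5: one byte per character in a locale-specific codepage.
function biff5_to_utf8(string $bytes, string $codepage = 'CP1252'): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $codepage);
}

// BIFF8: UTF-16 little-endian, two bytes per character here.
function biff8_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

$latin   = biff5_to_utf8("caf\xE9");        // 0xE9 is "é" in CP1252
$cyrillic = biff5_to_utf8("\xCF", 'CP1251'); // same idea, another codepage
$unicode = biff8_to_utf8("H\x00i\x00");     // "Hi" in UTF-16LE
```

With BIFF5 the result depends on picking the right source codepage, which is why the codepage has to be detected or set by hand; with BIFF8 the source encoding is always the same.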

Workbook stream consists of Workbook Globals substream and one or more Sheet substreams. Workbook Globals contains information about the file such as BIFF5 encoding, encryption, sheets information and much more (we do not actually need much more). Sheet substreams, or Sheets represent actual sheets that are created in Excel. They can be Worksheets, Charts, Visual Basic modules and some more, but only regular Worksheets can be parsed.

Excel keeps track of cells starting with first non-empty row and non-empty column, ending with last non-empty row and non-empty column. All other cells are completely ignored by this parser like they don’t exist at all.

What happens when I open XLS file

Note: during every stage extensive error checking is performed. See Error handling for more info.

When a user opens XLS file, for example by executing $excel = new MSXLS('file.xls'), the first thing that happens is that the script checks whether XLS file is stored as a Compound File (most of the time it is) or as a Workbook stream. If it is a Compound File, the script attempts to extract Workbook stream to a temporary file and use that file in the future for all operations. Otherwise, it will directly use the supplied XLS file. The script never opens the supplied XLS file for writing.

After Workbook stream is accessed, the output encoding is set to mb_internal_encoding() return value. Then get_data() method is executed: the script extracts information such as sheet count, codepage, sheets byte offsets, etc.

After that, either the first non-empty worksheet will be selected and ready for parsing and all other sheets information will be available to the user, or some error will be created (for example, when no non-empty worksheet was found).

By default, Array parsing mode is active.

Attempting to invoke a method that works only in Row-by-row mode while Array mode is active (or vice versa) will create an error, which in most cases disables any further actions.

If no errors occurred, it is now possible to select and setup parsing mode.

After a worksheet is parsed, you can select another worksheet for parsing (if any) with select_sheet() method. When you are finished parsing a file, it is a good idea to free memory manually, especially if something else is going on in your script later on. free() method and unset() function called one after another is the best way to do it.

5. Public properties and methods

Properties

(bool) $debug — whether or not to display error and warning messages. Set as the 2nd parameter to the constructor.

(string) $err_msg — a string that contains all error messages concatenated into one.

(string) $warn_msg — same as above, but for warnings.

(array) $error — array of error codes, empty if no errors occurred.

(array) $warn — array of warning codes, empty if no warnings occurred.

(array) $cells — two-dimensional array which is used as storage for cells parsed in Array mode. Filled when read_everything() is invoked. This property is made public (instead of using a getter) mainly for performance reasons.

Methods (functions)


1. General

get_data() — Checks XLS file for errors and encryption, gathers information such as CODEPAGE for BIFF5, SST location for BIFF8. Gathers information about all sheets in the file. Also executes select_sheet() to select first valid worksheet for parsing. This method is called automatically when XLS file is opened. Invoking it manually makes sense only if BIFF5 codepage was detected incorrectly and you cannot see sheet names (and you really need them). In this case, encoding settings must be configured with set_encodings() after file opening and get_data() should be called manually after it.


get_biff_ver() — returns version of excel file. 5 is BIFF5 (Excel 95 file), 8 is BIFF8 (Excel 97-2003 file).


get_codepage() — returns CODEPAGE string. Relevant only for BIFF5 files, in which strings are encoded using a specific codepage. In BIFF8, all strings are unicode (UTF-16 little endian).


get_sheets() — returns array of structures with sheets information. See the code below.

$excel = new MSXLS('file.xls');
$sheets = $excel->get_sheets(); //$sheets is array of sheet info structures
$sheet = reset($sheets); //$sheet now contains the first element of $sheets array

// Here is complete description of the sheet info structure:
$sheet['error'];         //[Boolean] Whether an error occurred while collecting sheet information
$sheet['err_msg'];       //[String] Error messages, if any
$sheet['name'];          //[String] Sheet name
$sheet['hidden'];        //[Integer] 0: normal, 1: hidden, 2: very hidden (set via excel macro)
$sheet['type'];          //[String] Sheet type: Worksheet, Macro, Chart, VB module or Dialog
$sheet['BOF_offset'];    //[Integer] Sheet byte offset in Workbook stream of XLS file
$sheet['empty'];         //*[Boolean] Whether the worksheet is empty
$sheet['first_row'];     //*[Integer] First non-empty row number of the worksheet
$sheet['last_row'];      //*[Integer] Last non-empty row number of the worksheet
$sheet['first_col'];     //*[Integer] First non-empty column number of the worksheet
$sheet['last_col'];      //*[Integer] Last non-empty column number of the worksheet
$sheet['cells_offset'];  //*[Integer] Byte offset of the 1st cell record in Workbook stream

//Entries marked with * exist only for sheets of "Worksheet" type.

get_valid_sheets() — same as above, but returns only non-empty selectable worksheets. Additional $sheet['number'] entry is present, which is the same number as the index of this sheet in the array returned by get_sheets().


get_active_sheet() — returns currently selected sheet info in the same structure that get_valid_sheets() array consists of.


get_filename() — returns a file name string originally supplied to the constructor.


get_filesize() — returns size of the file supplied to the constructor (in bytes).


get_margins($which = 'all') — returns currently set margins for the selected worksheet. Margins are set automatically when the sheet is selected. Margins can be set manually with set_margins() method. They define what rows and columns are read by read_next_row() method.

$which can be set to ‘first_row’, ‘last_row’, ‘first_col’, or ‘last_col’ string, in which cases a corresponding value will be returned. $which also can be set to ‘all’ or left out, in which case an array of all four margins will be returned. If $which is set to something not mentioned above, false will be returned.


set_encodings($enable = true, $from = null, $to = null, $use_iconv = false) — manually set transcoding parameters for BIFF5 (Excel 95 file). This is usually not needed since the script detects these settings when the file is opened.

$enable parameter enables encoding conversion of BIFF5 strings.

$from is source encoding string, for example ‘CP1252’. Leaving it out or setting it to null resets this parameter to detected internal BIFF5 codepage.

$to is target encoding string, for example ‘UTF-8’. Leaving it out or setting it to null resets this parameter to the value returned by mb_internal_encoding() PHP function.

$use_iconv — If true, iconv() function will be used for conversion. Otherwise, mb_convert_encoding() will be used.


set_output_encoding($enc = null) — sets output encoding which excel strings should be decoded to.
$enc is target encoding string. If parameter set to null or left out, a value returned by mb_internal_encoding() function will be used.

Note: Setting $to parameter in set_encodings() and using set_output_encoding() do the same thing.
set_output_encoding() is provided for simplicity if BIFF8 files are used.


select_sheet($sheet = -1) — Select a worksheet to read data from.

$sheet must be either a sheet number or a sheet name. Use get_valid_sheets() to get those, if needed.
-1 or leaving out the parameter will select the first valid worksheet.


switch_to_row() — switch to Row-by-row parsing mode. Will also execute free(false) and select_sheet().


switch_to_array() — switch to Array parsing mode. Will also execute free(false) and select_sheet().


read_everything() — read all cells from XLS file into $cells property. Works only in Array mode.


read_next_row() — parses next row and returns array of parsed cells. Works only in Row-by-row mode.


2. Memory free-ers

free_stream() — Close Workbook stream, free memory associated with it and delete temporary files.

free_cells() — re-initialize $cells array property (storage for Array mode).

free_sst() — re-initialize SST structure (Shared Strings Table, used by Array mode).

free_rows_map() — re-initialize rows map storage used by Row-by-row mode.

free_sst_maps() — re-initialize SST offsets map and SST lengths storage used by Row-by-row mode.

free_maps() — execute both free_rows_map() and free_sst_maps().

free($stream = true) — free memory by executing all «free»-related methods mentioned above.
free_stream() is called only if $stream parameter evaluates to true.


3. Reading settings (mostly for Row-by-row mode)

set_fill_xl_errors($fill = false, $value = '#DEFAULT!') — setup how cells with excel errors are processed.

If $fill evaluates to true, cells will be parsed as $value. ‘#DEFAULT!’ value is special as it will expand to actual excel error value. For example, if a cell has a number divided by zero, it will be parsed as #DIV/0! string. If $value is set to some other value, error cells will be parsed directly as $value. If $fill evaluates to false, cells with errors will be treated as empty cells.

Note: this is the only setting that also works in Array mode.


set_margins($first_row = null, $last_row = null, $first_col = null, $last_col = null) — sets first row, last row, first column and last column that are parsed. If a parameter is null or left out, the corresponding margin is not changed. If a parameter is -1, the corresponding margin is set to the default value. The default values correspond to the first/last non-empty row/column in a worksheet.


set_active_row($row_number) — set which row to read next.
$row_number is zero-based excel row number and it must not be out of bounds set by set_margins() method.


last_read_row_number() — returns most recently parsed row number.
Valid only if called immediately after read_next_row().


next_row_number() — returns row number that is to be parsed upon next call of read_next_row().
Returns -1 if there are no more rows left to parse.


set_empty_value($value = null) — set $value as the empty value, i.e. the value that empty cells are parsed as.


use_empty_cols($set = false) — whether or not to parse empty columns to empty value.


use_empty_rows($set = false) — whether or not to parse empty rows.

Note: if empty columns parsing is disabled (it is disabled by default), read_next_row() will return -1 when an empty row is encountered. If empty columns parsing is enabled with use_empty_cols(true), it will return array of cells filled with empty value.


set_boolean_values($true = true, $false = false) — set values which excel boolean cells are parsed as. By default, TRUE cells are parsed as PHP true value, FALSE cells are parsed as PHP false value.


set_float_to_int($tf = false) — whether or not to parse excel cells with whole float numbers to integers. Often whole numbers are stored as float internally in XLS file, and by default they are parsed as floats. This setting allows such numbers to be parsed as integer type. Note: cells with numbers internally stored as integers are always parsed as integers.


4. Constructor and destructor

__construct($filename, $debug = false, $mem = null, $debug_MSCFB = false) — open file, extract Workbook stream (or use the file as Workbook stream), execute set_output_encoding() and get_data() methods.

$filename — path to XLS file.

$debug — if evaluates to true, enables Debug mode.

$mem — sets memory limit for temporary memory streams vs temporary files.

$debug_MSCFB — if evaluates to true, enables Debug mode in MSCFB helper class.


__destruct() — execute free() method, thus closing all opened streams, deleting temporary files and erasing big structures.


6. Error handling

Each time an error occurs, the script places an error code into $error array property and appends an error message to $err_msg string property. If an error occurs, it prevents execution of parts of the script that depend on successful execution of the part where the error occurred. Warnings work similarly to errors except they do not prevent execution of other parts of the script, because they always occur in non-critical places. Warnings use $warn property to store warning codes and $warn_msg for warning texts.

If Debug mode is disabled, you should check if $error property evaluates to true, which would mean that $error array is not empty, i.e. has one or multiple error codes as its elements. Error handling example:

$excel = new MSXLS('nofile.xls'); //Try to open non-existing file

if($excel->error){
  var_dump(end($excel->error)); //Will output last error code
  var_dump($excel->err_msg); //Will output all errors texts
  die(); //Terminate script execution
}

if($excel->warn){
  var_dump(end($excel->warn)); //Will output last warning code
  var_dump($excel->warn_msg); //Will output all warnings texts
}

If Debug mode is enabled, errors and warnings are printed (echoed) to standard output automatically.

7. Security considerations

There are extensive error checks in every function that should prevent any potential problems no matter what file is supplied to the constructor. The only potential security risk comes from Debug mode, which prints the name of the function in which an error or a warning has occurred, but even then I do not see how such information can lead to problems with this particular class. It is fairly safe to say that this code can be run in (automated) production of any kind. The same applies to the MSCFB class.

8. Performance and memory

The MSXLS class has been optimized for fast parsing and data extraction, while still performing error checks for safety. It is possible to marginally increase performance by leaving those error checks out, but I would strongly advise against it: if a specially crafted malicious file is supplied, it becomes possible to cause excessive memory consumption or an infinite loop.

The following numbers were obtained on a Windows machine (AMD Phenom II x4 940) running a WAMP server, with a 97.0 MiB test XLS file (96.2 MiB Workbook stream) that consists entirely of unique strings. Default temporary file settings were used.

| Action | Time (PHP 5.6.25 32-bit) | Memory (PHP 5.6.25 32-bit) | Time (PHP 7.0.10 64-bit) | Memory (PHP 7.0.10 64-bit) |
| Open XLS File (create MSXLS instance) | 7.52s | 1.0 MiB | 3.48s | 0.6 MiB |
| Open XLS File and parse in Array mode | 77.77s | 213.2 MiB | 16.41s | 128.8 MiB |
| Open file, parse in Row-by-row mode to variable | 91.08s | 192.2 MiB | 27.20s | 204.3 MiB |
| Open file, parse in Row-by-row mode (don't save) | 54.71s | 82.9 MiB | 21.49s | 82.1 MiB |

Note: Disabling temporary files does not decrease script execution time by any significant margin. In fact, the execution time is increased sometimes.

Note: It took 1.65 seconds and 12.0 MiB of memory to parse a real-life XLS pricelist of 13051 entries in Array mode in PHP 7.0.10. That XLS file was 3.45 MiB in size.
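
Figures like the ones above can be gathered with PHP's built-in timing and memory functions. The sketch below is only an illustration: measure_run() is a hypothetical helper that is not part of MSXLS, and the workload inside the closure is a stand-in for the real parsing call.

```php
<?php
/**
 * Runs $task once and reports wall-clock time and peak memory.
 * measure_run() is a hypothetical helper, not part of MSXLS.
 */
function measure_run(callable $task): array
{
    $start = microtime(true);
    $result = $task();

    return [
        'result'   => $result,
        'seconds'  => microtime(true) - $start,
        // Peak memory allocated by the whole script so far, in MiB.
        'peak_mib' => memory_get_peak_usage(true) / 1048576,
    ];
}

$stats = measure_run(function () {
    // Stand-in workload; replace with the parsing call being measured.
    return array_sum(range(1, 1000));
});

printf("%.3fs, %.1f MiB peak\n", $stats['seconds'], $stats['peak_mib']);
```

Because memory_get_peak_usage() covers the whole script, each scenario is best measured in a separate run.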

9. More documentation

All code in MSXLS.php file is heavily commented, feel free to take a look at it. To understand how XLS file is structured, please refer to MS documentation, or to OpenOffice.org’s Documentation of MS Compound File (also provided as a PDF file in this repository).

Getting data from an xlsx Excel file with PHP

Let's look at how to use PHP to get data from an Excel file in xlsx format.

We will need the PHPExcel library, which has to be downloaded first.

From the downloaded archive we only need the Classes folder. Copy it into the project.

Let's create a function that reads the given file and returns its data as an array:

<?php
function parse_excel_file( $filename ){
	// path to the library relative to the site root
	require_once $_SERVER['DOCUMENT_ROOT'].'/PHPExcel/Classes/PHPExcel.php';
	// detect the file type (xls, xlsx) so that it is processed correctly
	$file_type = PHPExcel_IOFactory::identify( $filename );
	// create a reader object for that type
	$objReader = PHPExcel_IOFactory::createReader( $file_type );
	$objPHPExcel = $objReader->load( $filename ); // load the file data
	$result = $objPHPExcel->getActiveSheet()->toArray(); // export the active sheet to an array

	return $result;
}

Example

Suppose our file date.xlsx is located at site.ru/files/date.xlsx and contains 3 columns (name, key, value) and 5 rows of data. Reading the file with the function we created:

<?php
$res = parse_excel_file($_SERVER['DOCUMENT_ROOT'].'/files/date.xlsx' );
print_r( $res );

We get:

Array
(
   [0] => Array
       (
          [0] => Name 1
          [1] => Key 1
          [2] => Value 1
       )

   [1] => Array
       (
          [0] => Name 2
          [1] => Key 2
          [2] => Value 2
       )
       ...
)
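
Once the data is in a two-dimensional array like this, it can be printed on a page as an HTML table. Below is a minimal sketch; render_rows_as_table() is a hypothetical helper name, and the sample rows stand in for the result of parse_excel_file().

```php
<?php
/**
 * Turns a two-dimensional array of cell values into an HTML table.
 * Hypothetical helper, not part of PHPExcel.
 */
function render_rows_as_table(array $rows): string
{
    $html = "<table>\n";
    foreach ($rows as $row) {
        $cells = '';
        foreach ($row as $cell) {
            // Escape every value: spreadsheet content is untrusted input.
            $cells .= '<td>' . htmlspecialchars((string) $cell, ENT_QUOTES, 'UTF-8') . '</td>';
        }
        $html .= '  <tr>' . $cells . "</tr>\n";
    }

    return $html . "</table>\n";
}

// Sample data in the same shape as the array shown above.
echo render_rows_as_table([
    ['Name 1', 'Key 1', 'Value 1'],
    ['Name 2', 'Key 2', 'Value 2'],
]);
```

Empty cells usually arrive as null and become empty table cells after the string cast.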

Hosting requirements:

  • PHP version 5.2 or higher;
  • the php_zip PHP extension enabled;
  • the php_xml PHP extension enabled;
  • the php_gd2 PHP extension enabled.

Comments

Petr
25 February 2021, 23:17

How do I output a table with all the categories on a page after parsing the Excel file?

Mikhail
20 January 2021, 00:21

Many thanks to the author! The code works!

Contents

  1. Parsing MS Excel files with PHP
  2. Comments
  3. Parsing xlsx in PHP without ready-made libraries
  4. shuchkin/simplexlsx
  5. Implementing fast Excel import in PHP
  6. Which tool to use?
  7. Our pain as developers
  8. And then relief came.
  9. Performance results
  10. arti9m/PHP-XLS-Excel-Parser

Parsing MS Excel files with PHP

At last, another task is solved: reading MS Excel spreadsheets with PHP. Below is a short account of how it was done.

To begin with, after a fairly short search I settled on the PHPExcel library. What is it? In short, it is PHP code that must be placed in the same folder as the files of the site or of the system that reads Excel files. Its entry file, PHPExcel.php, must be included in the relevant PHP script before reading Excel files, using include or require (or, better still, require_once).

I chose to invoke the reader with the line

$objPHPExcel = PHPExcel_IOFactory::load('Excel file name');

which the manual recommends as the universal way to parse an Excel file when the format it will arrive in is not known in advance.

However, I almost immediately hit a serious pitfall. Besides the XLS format produced by older versions of MS Excel, there is also XLSX (introduced with Excel 2007). Unlike the first, the second format requires PHP's ZIP library to be enabled for parsing. Without it, the developer sees something like "Fatal error: Uncaught exception 'Exception' with message 'ZipArchive library is not enabled'".

I had to dig into it. I currently use the Denwer PHP developer package, version 5.2.12. Extensive googling showed that to install the ZIP library you need to download a set of libraries as an exe file and run it to install them. Beforehand, of course, stop Denwer (by running the utility at "Denwer folder/denwer/Stop.exe"): if it is running, the library set will not install. Also, in my case the installer, when searching for the Denwer folder automatically, first found it on the virtual disk created by Denwer and asked for permission to install there, which I naturally refused. The second location it found was the physical disk with Denwer. After the libraries are downloaded and installed, the ones you need have to be activated. In my case it was just one, PHP_ZIP: go to "Denwer folder/usr/local/php5/", open the configuration file php.ini, find the line "extension=php_zip.dll", uncomment it (remove the semicolon at the start of the line) and save the file.
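
Whether the required extensions are actually enabled can be checked from PHP itself before any xlsx file is touched. A small sketch using the built-in extension_loaded() function; missing_extensions() is a hypothetical helper name, and the default list is an assumption that you should adjust to your setup.

```php
<?php
/**
 * Returns the names of the required PHP extensions that are not loaded.
 * Hypothetical helper; the default list is only an assumption.
 */
function missing_extensions(array $required = ['zip', 'xml']): array
{
    return array_values(array_filter($required, function ($name) {
        return !extension_loaded($name);
    }));
}

$missing = missing_extensions();
if ($missing !== []) {
    echo 'Enable these extensions in php.ini first: ' . implode(', ', $missing) . "\n";
} else {
    echo "zip and xml are loaded, xlsx files can be unpacked\n";
}
```

Running this check at the start of an import script gives a readable message instead of a fatal error from the library.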

As a reward for all these manipulations, after starting Denwer and running the script that reads Excel files, I saw an array of data on the screen. But that is another story.

Comments

26.02.2015 22:00, comment by Lekha:

PHP_ZIP is already included in the latest build. The check and the loading are done like this:

$inputFileType = PHPExcel_IOFactory::identify($filepath); // detect the file type
$objReader = PHPExcel_IOFactory::createReader($inputFileType); // create a reader object for the file
$objPHPExcel = $objReader->load($filepath); // load the file data into the object
$array = $objPHPExcel->getActiveSheet()->toArray(); // export the data from the object into an array

20.01.2016 16:41, comment by HighMan:

It would have been better to post a PHP script that reads the file, loads the array and prints the loaded array on the page, and to explain in the notes how to work with XLSX.

Parsing xlsx in PHP without ready-made libraries

First, let me describe the problem that made me return for the thousandth time to a question already chewed over from every side: clueless managers, without consulting the programmers, promised the customer that data could be loaded onto the site from xls(x).

That would be fine, but the customer's hosting provider allows 64 MB of memory per script and could not care less that the client's Excel files, with no formatting at all, weigh 10-15 MB each, and that loading one of them with PHPExcel eats (on the test server) about 500 MB of memory.
The solution below barely reached 5 MB.

Preconditions:
1. There is an Excel document of some 10-20 sheets with data on products in an online catalog. Each sheet has a header ("name", "price" and so on, plus a pile of additional attributes, 40 columns in all) and the data itself, in quantities that shrink Excel's scrollbar to a centimeter;
2. no CSV is allowed. All the customer's data is already in Excel and they are not going to re-save it: it was promised, end of story;
3. Spreadsheet_Excel_Writer was rejected as not universal enough, although a lot of good things are written about it. I look forward to comments on memory tests;
4. surprisingly, Google offered no universal solutions. Has nobody really run into this problem with PHP on *nix, I wondered.

Solution:
After trying the various approaches kindly provided by Google, we decided to read the specifications (ah, my father did teach me...). Seeing the key phrases "based on Open XML" and "uses ZIP compression" there, we quickly called the customer and steered the conversation toward xlsx only: "Well, you understand! It's the 21st century, after all! Why cling to the old! We need to stand with one foot in the future!"

The algorithm is then as follows: accept the file, unpack it and look carefully at the result.
A full inventory will have to wait for a free moment. Right now we are most interested in the contents of the [xl] directory, specifically /xl/worksheets/ and the file /xl/sharedStrings.xml.
The file /xl/workbook.xml holds the description of the sheets, but since collecting sheet names was not part of the task, I will skip it. If needed, it is not hard to figure out.
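
The unpacking step does not have to write anything to disk: PHP's ZipArchive class can read a single entry straight out of the archive. A minimal sketch, assuming the zip extension is enabled; read_xlsx_entry() is a hypothetical helper and 'upload.xlsx' is a placeholder path.

```php
<?php
/**
 * Reads one entry of an xlsx (ZIP) archive into a string.
 * Returns null if the archive cannot be opened or the entry is missing.
 * Hypothetical helper, not taken from the original article.
 */
function read_xlsx_entry(string $xlsxPath, string $entryName): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($xlsxPath) !== true) {
        return null; // not a readable ZIP archive
    }
    // Entry names inside the archive have no leading slash.
    $content = $zip->getFromName($entryName);
    $zip->close();

    return $content === false ? null : $content;
}

// 'upload.xlsx' is a placeholder path.
$xml = read_xlsx_entry('upload.xlsx', 'xl/sharedStrings.xml');
echo $xml === null ? "entry not found\n" : strlen($xml) . " bytes\n";
```

getFromName() loads the whole entry into memory. For very large sheets, a streaming reader such as XMLReader opened on a zip://archive#entry URI may keep memory use lower, though that variant is not shown here.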

/xl/sharedStrings.xml
This file holds the text data of the cells of the source document, one entry after another in the same pattern. From all sheets! For now we simply collect this data into an array.
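
Collecting the shared strings into an array could look roughly like this. It is a sketch under assumptions, not the author's original code: parse_shared_strings() is a hypothetical name, it takes the XML text of sharedStrings.xml as a string, and it relies on the usual layout where each si entry holds either a single t element or several r runs with their own t.

```php
<?php
/**
 * Builds a zero-based list of strings from the XML of sharedStrings.xml.
 * Hypothetical helper; assumes the common <si><t>...</t></si> layout.
 */
function parse_shared_strings(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }

    $strings = [];
    foreach ($doc->si as $si) {
        if (isset($si->t)) {
            $strings[] = (string) $si->t; // plain string
            continue;
        }
        // Rich text: the string is split into formatted runs.
        $text = '';
        foreach ($si->r as $run) {
            $text .= (string) $run->t;
        }
        $strings[] = $text;
    }

    return $strings;
}

$sample = '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        . '<si><t>Name</t></si><si><r><t>Pri</t></r><r><t>ce</t></r></si></sst>';
print_r(parse_shared_strings($sample));
```

SimpleXML keeps the whole file in memory, which is acceptable for the string table in many cases but worth measuring on large workbooks.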

/xl/worksheets/
This directory holds files such as "sheet1.xml" that describe the data of each sheet. In each file we are specifically interested in the contents of <sheetData> and its <row> children.

By comparison and experiment it was established that the attribute t="s" on a cell (apparently type=string) indicates that the value is taken from sharedStrings.xml: the cell's value is then the index of an element in $sharedStringsArr. If the attribute is absent, the cell's value is used as is.

The output is a multidimensional array that you can work with freely or pour straight into a database; that is everyone's own business.
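
The lookup described above can be sketched as follows. This is an illustration under assumptions rather than the author's code: parse_sheet_rows() is a hypothetical name, the sheet XML is passed in as a string, and only the t="s" case is handled specially.

```php
<?php
/**
 * Converts the XML of one worksheet into a two-dimensional array.
 * Cells marked t="s" are resolved through the shared strings list.
 * Hypothetical helper; other cell types are returned as raw text.
 */
function parse_sheet_rows(string $sheetXml, array $sharedStrings): array
{
    $doc = simplexml_load_string($sheetXml);
    if ($doc === false) {
        return [];
    }

    $rows = [];
    foreach ($doc->sheetData->row as $row) {
        $cells = [];
        foreach ($row->c as $c) {
            $value = (string) $c->v; // empty string if <v> is missing
            if ((string) $c['t'] === 's') {
                // The value is an index into the shared strings list.
                $value = $sharedStrings[(int) $value] ?? '';
            }
            $cells[] = $value;
        }
        $rows[] = $cells;
    }

    return $rows;
}

$sheet = '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
       . '<sheetData><row r="1"><c r="A1" t="s"><v>1</v></c><c r="B1"><v>9.5</v></c></row>'
       . '</sheetData></worksheet>';
print_r(parse_sheet_rows($sheet, ['Name', 'Price']));
```

Numbers come back as strings here; cast them as needed before writing to a database.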

Finally, I should say that I did not really study the xlsx specification; I only solved the task at hand for specific xlsx documents. Formulas and images must be written somewhere, after all (t="i"?). When I face such a task I will certainly describe it. For now I present an algorithm, undemanding of system resources, for collecting text data from xlsx. I hope it will be useful, since I did not come across anything like it in my searches.

P.S. Only while adding tags did I stumble upon "Работа с большими файлами экселя" (Working with large Excel files). I should have searched Habr rather than Google: it would have saved a lot of time.

UPD:
It has just turned out that an empty cell can be represented either by a missing <v> inside <c> or by the absence of the <c> element itself. The "r" attribute has to be checked.
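
Since the "r" attribute holds the cell reference (for example "C7"), skipped cells can be restored by converting the column letters to a zero-based index. A sketch with hypothetical helper names column_index_from_ref() and fill_row():

```php
<?php
/**
 * Converts a cell reference such as "C7" to a zero-based column index.
 * Hypothetical helper: A -> 0, Z -> 25, AA -> 26.
 */
function column_index_from_ref(string $cellRef): int
{
    $letters = preg_replace('/[^A-Z]/', '', strtoupper($cellRef));
    if ($letters === '') {
        throw new InvalidArgumentException("No column letters in '$cellRef'");
    }
    $index = 0;
    foreach (str_split($letters) as $ch) {
        $index = $index * 26 + (ord($ch) - ord('A') + 1);
    }

    return $index - 1;
}

/**
 * Takes cells keyed by reference and returns a dense row,
 * with '' in the positions of cells missing from the XML.
 */
function fill_row(array $cellsByRef): array
{
    $row = [];
    foreach ($cellsByRef as $ref => $value) {
        $row[column_index_from_ref($ref)] = $value;
    }
    if ($row === []) {
        return [];
    }
    $dense = array_fill(0, max(array_keys($row)) + 1, '');

    return array_replace($dense, $row);
}

print_r(fill_row(['A1' => 'x', 'C1' => 'z']));
```

With this in place, column positions stay aligned with the header row even when the file omits empty cells.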


shuchkin/simplexlsx


SimpleXLSX class (Official)

Parse and retrieve data from Excel XLSX files. A PHP reader for MS Excel 2007 workbooks. No additional extensions needed (internal unzip + standard SimpleXML parser).

See also:
SimpleXLS old format MS Excel 97 php reader.
SimpleXLSXGen xlsx php writer.

Hey, bro, please ★ the package for my motivation 🙂 and donate for more motivation!

The recommended way to install this library is through Composer. New to Composer?

This will install the latest supported version:

or download PHP 5.5+ class here

XLSX to html table

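if ( $xlsx = SimpleXLSX::parse('book.xlsx') ) { // 'book.xlsx' is a placeholder file name
	echo '<table>';
	foreach ( $xlsx->rows() as $r ) {
		echo '<tr><td>'.implode('</td><td>', $r ).'</td></tr>';
	}
	echo '</table>';
} else {
	echo SimpleXLSX::parseError();
}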

or styled html table

XLSX read huge file, xlsx to csv

XLSX get sheet names and sheet indexes

Using rowsEx() to extract cell info

| Key | Description |
| type | cell type |
| name | cell name (A1, B11) |
| value | cell value (1233, 1233.34, 2022-02-21 00:00:00, String) |
| f | formula |
| s | style index, use $xlsx->cellFormats[ $index ] to get style |
| css | generated cell CSS |
| r | row index |
| hidden | hidden row or column |
| width | width in custom units |
| height | height in points (pt, 1/72 in) |

Get sheet by index

XLSX::parse remote data

XLSX::parse memory data

Rows with header values as keys

Classic OOP style

| Code | Message | Comment |
| 1 | File not found | Where file? UFO? |
| 2 | Unknown archive format | ZIP? |
| 3 | XML-entry parser error | bad XML |
| 4 | XML-entry not found | bad ZIP archive |
| 5 | Entry not found | File not found in ZIP archive |
| 6 | Worksheet not found | Not exists |


Implementing fast Excel import in PHP

Which tool to use?

As the base standard for importing address lists we chose Microsoft Excel. The reasons are simple:

  • it is a standard tool that 100% of computer users know at a basic level. Moreover, in business it is the de facto corporate standard and is used even where the work computers run Mac or Linux.
  • Practically all CRM, CMS, cloud and desktop systems can export to Excel or CSV, which a simple re-save turns into XLS or XLSX.
  • It is also well known that "90% of software errors sit half a meter from the monitor". No offense to ordinary users, but we have to assume the most basic level of skill: tech support should only need to say "Upload an Excel file" rather than explain how to prepare a file in the required format.

The users' problem with importing address lists was solved. But now a problem arises for development itself.

Our pain as developers

Excel is not an open-source project but a proprietary solution. The data format, especially in versions after 2007 (xlsx), is non-trivial. Pechkin is built on PHP, so we started looking for libraries that would let us solve this task. But we ran into the problem that a whole range of libraries cannot read xlsx:

  • php-spreadsheetreader: reads a variety of formats (.xls, .ods and .csv)
  • PHP-ExcelReader (xls only)
  • PHP_Excel_Reader (xls only)
  • PHP_Excel_Reader2 (xls only)
  • XLS File Reader: commercial and xls only
  • SimpleXLSX: according to its description it can read xlsx, but the author refers only to xls
  • PHP Excel Explorer: commercial and xls only

The PHPExcel library caught our attention. We had already used it several years earlier in the SMS24X7.ru SMS mailing service. Petya Sokolov (Petr_Sokolov), our talented developer, wrote a wrapper for this library that fixes a number of its shortcomings and bugs.

The library is certainly interesting and mature. But within a couple of years, as both we and our clients grew, it became impossible for Pechkin to use it because of its catastrophic resource requirements and enormous file parsing times. For example, it is not unusual for address lists of more than 100,000 rows with a complex structure to be uploaded to the service. And what if the file already has 500,000 rows and "weighs" more than 30 MB?

And then relief came.

During our search we came across the commercial library libxl, having seen the results of a "homemade benchmark" on Stackoverflow.

The library is written in C++, and thanks to the excellent object-oriented PHP extension by Ilia Alshanetsky it is easy to learn and integrate (for example, rewriting our current solution from PHPExcel to LibXL took about 3 hours). That is very nice, considering that the extension's developer unfortunately provides no documentation and you have to rely on the Reflection extension.

The installation process is very simple.

Compilation produces the file excel.so in the folder /usr/lib/php5/20090626/. Now it is enough to create the file /etc/php5/conf.d/excel.ini with the contents.

Let's check that the module is installed and restart the web server.

The code is very simple too. Load the file and read the cells you need. For example, like this:

Performance results

The absence of any need for RAM (while loading and reading the file) was a pleasant surprise.

And here is the gain in the speed of loading and reading an Excel file for address lists of various sizes.

These tests were run on xlsx files with N subscribers in a single column of email addresses. Real address lists are even larger and more complex, so the advantage in speed and memory consumption is even more significant.

The library costs $199 for a developer license but, believe me, it is worth it. We definitely recommend it to anyone facing the problem of importing Excel files into their service.


arti9m/PHP-XLS-Excel-Parser


PHP XLS Excel Parser

Probably, the fastest possible and the most efficient parser for XLS excel files for PHP!

Note: this parser works only with older XLS files that were used in Microsoft Excel 95 (BIFF5) and 97-2003 (BIFF8).
It will not work with the newer ones, XLSX!

At least PHP 5.6 32-bit is required. Untested with PHP versions prior to 5.6.
Works best with PHP 7.x 64-bit (faster, more memory efficient than PHP 5.6).

Also, this parser uses my PHP MSCFB Parser. Grab a copy of MSCFB.php if you don’t have one here: https://github.com/arti9m/PHP-MSCFB-Parser and put it in your PHP include directory or in the same directory where MSXLS.php is. MSCFB is «required-once» inside MSXLS, so there’s no need to include/require it manually.

  1. Download MSXLS.php from this repository and put it in your include directory or in the same directory where your script is.
  2. Make sure that MSCFB.php is in your include directory or in your script directory.
  3. Add the following line to the beginning of your PHP script (specify full path to MSXLS.php, if needed):
  4. Create an instance of MSXLS (open XLS file):
  5. If no errors occurred up to this point, you are ready to read the cells from your file. There are two ways you can do it: either read all cells at once into a two-dimensional array using Array mode (faster), or read the cells in Row-by-row mode, which is slower, but is more configurable, suitable for database upload and may use much less memory depending on usage scenario.
    In any case, it's a good idea to check for errors before trying to read anything:
  6. You can read all cells at once into a two-dimensional array:

At this point all your cells data is contained inside the $excel->cells array:

  7. Or you can read the cells row by row:

Note: $excel->cells will be erased when $excel->switch_to_row() is executed, so make sure you save the contents of $excel->cells (if any) to some other variable before switching to Row-by-row mode. If you need to switch back to Array mode, use the $excel->switch_to_array() method.

  8. If you need to perform some other memory-intensive tasks in the same script, it is a good idea to free some memory:

3. Advanced usage

Note: every example in this section assumes that $excel is your MSXLS instance: $excel = new MSXLS('file.xls') .

If there is more than one worksheet in your file, and you want to parse a worksheet that is not the first valid non-empty worksheet, you will have to select your sheet manually. To do this, use the $excel->get_valid_sheets() method to get an array with all available selectable worksheets. When the desired worksheet has been found, use its array index or 'number' entry as a parameter to the $excel->select_sheet($sheet) method. For example:

Alternatively, if you know sheet name, you can use it with the same method to select sheet:

Leave out sheet index/name to select the first available valid sheet:

You can use $excel->get_active_sheet() method to get information about selected sheet.
Refer to Methods (functions) subsection to get more information about methods mentioned above.

Note: The first valid worksheet is selected automatically when the file is opened or when Parsing mode is changed.

There are two modes which the parser can work in: Array mode and Row-by-row mode. By default, Array mode is used.

1. Array mode

This mode lets you read all cells at once into the $excel->cells array property. It is designed to read all available data as fast as possible when no additional cell processing is needed. This mode is used by default. It can be selected with the $excel->switch_to_array() method. Data is read with the $excel->read_everything() method into the $excel->cells array property. Example:

When $excel->read_everything() is invoked for the first time for your file, a private structure called SST is built which contains all strings for all worksheets. It sits in memory until Parsing mode is changed or re-selected, or $excel->free() is called, or your MSXLS instance is destroyed. Therefore, it is a rather memory-hungry mode if your file has a lot of unique strings. Non-unique strings are stored only once. Also, PHP is usually smart enough not to duplicate those strings in memory when a string is read into the $excel->cells array from SST storage, or when you copy $excel->cells to some other variable.

In this mode, empty rows and cells are ignored. Boolean excel cells are parsed as true or false. If excel internally represents a whole number as float (which is often the case), it will be parsed as float type.

$excel->cells is a two-dimensional array. Its first dimension represents rows and its second dimension represents columns; both are numbered from zero. See Rows and columns numeration for more information.

Note that all empty rows and cells will create ‘holes’ in $excel->cells array, because empty cells are simply skipped. It is advisable to use isset() function to determine whether the cell is empty or not.
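
Because of these holes, reading a cell directly can trigger an undefined-index notice. A small sketch of the isset() check; cell_value() is a hypothetical helper and the sample array only imitates the shape of $excel->cells.

```php
<?php
/**
 * Returns the cell at [$row][$col], or $default if the cell was skipped.
 * Hypothetical helper, not part of MSXLS.
 */
function cell_value(array $cells, int $row, int $col, $default = null)
{
    return isset($cells[$row][$col]) ? $cells[$row][$col] : $default;
}

// Imitates $excel->cells: row 1 and column 1 are empty, so they are absent.
$cells = [
    0 => [0 => 'Name', 2 => 'Price'],
    2 => [0 => 'Pen',  2 => 1.5],
];

var_dump(cell_value($cells, 0, 1, '')); // skipped cell in an existing row
var_dump(cell_value($cells, 1, 0, '')); // skipped row
var_dump(cell_value($cells, 2, 2));     // existing cell
```

The same check works on a copy of the array, so the data can be passed around without the parser instance.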

Array mode has only one additional setting for parsing: $excel->set_fill_xl_errors($fill, $value) , which defines whether or not to process excel cells with error values (such as division by zero). Please refer to Methods (functions) subsection for more information. In short, if $fill is false , error cells are skipped, otherwise they are filled with $value .

2. Row-by-row mode

This mode lets you read the cells row by row. It is designed to let you process each row individually while using as little memory as possible. This mode is selected with $excel->switch_to_row() method. Data is read with $excel->read_next_row() method, which returns a single row as an array of cells.

As the method name implies, row number is advanced automatically, so next time you call $excel->read_next_row() , it will read the next row. This method returns null if there are no more rows to read. You can manually set row number to read with $excel->set_active_row($row_number) , where $row_number is a valid zero-based excel row number. You can get the first and the last valid row number with $excel->get_active_sheet() method:

Cell numeration in the returned row is zero-based. See Rows and columns numeration for more information.

When $excel->read_next_row() is invoked for the first time for your file, SST map will be built which is a structure that contains file stream offsets for every unique string in your excel file. It is similar to SST structure in Array mode, but SST contains the strings themselves, while SST map only contains addresses of those strings.

When $excel->read_next_row() is invoked for the first time for selected sheet, Rows map will be built. This structure contains file stream offsets for every excel row for currently selected worksheet.

Both of the structures mentioned above will be destroyed if Parsing mode is changed or re-selected, or if $excel->free() is called, or when your MSXLS instance is destroyed. Additionally, Rows map will be destroyed when $excel->select_sheet() is called, because Rows map is only valid for a selected sheet, unlike SST map, which is relevant for the whole file.

One advantage of Row-by-row mode is that it allows many settings to be changed that affect which cells are processed and how. Please refer to the Reading settings part of the Methods (functions) subsection for more information.
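
A typical Row-by-row loop can be sketched as below. It relies only on behaviour described in this README: read_next_row() returns an array for a row, null when there are no more rows, and -1 for an empty row in some configurations. each_row() itself is a hypothetical helper, and the usage part runs only if the MSXLS class is actually loaded.

```php
<?php
/**
 * Feeds every non-empty row to $handle and returns the number of rows.
 * $excel is expected to behave like an MSXLS instance in Row-by-row mode.
 * Hypothetical helper, not part of MSXLS.
 */
function each_row($excel, callable $handle): int
{
    $count = 0;
    while (($row = $excel->read_next_row()) !== null) {
        if ($row === -1) {
            continue; // empty row reported by the parser
        }
        $handle($row);
        $count++;
    }

    return $count;
}

// Usage sketch; 'file.xls' is a placeholder path.
if (class_exists('MSXLS')) {
    $excel = new MSXLS('file.xls');
    if (!$excel->error) {
        $excel->switch_to_row();
        each_row($excel, function (array $row) {
            // Process one row here, for example insert it into a database.
        });
    }
    $excel->free();
}
```

Handling rows one at a time keeps only a single row in memory in addition to the parser's own maps.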

Debug mode enables output (echo) of all error and warning messages. To enable Debug mode, set the 2nd parameter to true in the constructor:

It is also possible to show errors from MSCFB helper class. To do this, set the 4th parameter to true in the constructor:

Warning! The name of the PHP function in which an error occurred is displayed alongside the actual message. Do not enable Debug mode in your production code since it may pose a security risk! This warning applies both to the MSXLS class and the MSCFB class.

Temporary files and memory

If XLS file was saved as a Compound File (which is almost always the case), then MSXLS must use a temporary PHP stream resource to store Workbook stream that is extracted from the Compound File. It is stored either in memory or as a temporary file, depending on data size. By default, data that exceeds 2MiB (PHP’s default value) is stored as a temporary file. XLS file may sometimes be stored as a Workbook stream itself, in which case a temporary file or stream is not needed and not created.

You can control when a temporary file is used instead of memory by specifying the threshold in bytes as the 3rd parameter to the constructor. If Workbook stream size (in bytes) is less than this value, it will be stored in memory.

You can instruct PHP not to use a temporary file (thus always storing Workbook stream in memory) by setting this parameter to zero:

Set this parameter to null to use default value:

Note: MSCFB helper class may also need to use a temporary stream resource. It will behave the same way as described above, and will also use that 3rd parameter as its memory limiter.

Note: temporary files are automatically managed (created and deleted) by PHP.
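
The behaviour described here matches PHP's built-in php://temp stream, which keeps data in memory up to a limit (2 MiB by default) and then spills to a temporary file. Whether MSXLS uses exactly this wrapper internally is an assumption; the sketch below only demonstrates the PHP feature, and open_temp_stream() is a hypothetical helper.

```php
<?php
/**
 * Opens a temporary read/write stream.
 *   null  -> php://temp with PHP's default in-memory limit
 *   0     -> php://memory, which never uses a temporary file
 *   N > 0 -> php://temp that switches to a file after N bytes
 * Hypothetical helper, not part of MSXLS.
 */
function open_temp_stream(?int $memoryLimit = null)
{
    if ($memoryLimit === null) {
        $uri = 'php://temp';
    } elseif ($memoryLimit === 0) {
        $uri = 'php://memory';
    } else {
        $uri = 'php://temp/maxmemory:' . $memoryLimit;
    }

    return fopen($uri, 'w+b');
}

$stream = open_temp_stream(1024);          // keep at most 1 KiB in memory
fwrite($stream, str_repeat('x', 4096));    // exceeds the limit, spills to a file
rewind($stream);
echo strlen(stream_get_contents($stream)), " bytes read back\n";
fclose($stream);                           // PHP removes the temporary file
```

The calling code reads and writes the stream the same way in all three cases, which is why the threshold can be tuned without other changes.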

4. Additional information

Rows and columns numeration

Rows and columns numeration in this parser is zero-based. Excel row numeration is numeric and starts from 1, and column numeration is alphabetical and starts with A. Excel references a single cell by its column letter and row number, for example: A1, B3, C4, F9. If Array mode is used, cells are stored in $cells property, which is a two-dimensional array. The 1st index corresponds to row number, and the 2nd index is the column number. In Row-by-row mode, a single row is returned as an array of cells. If $row contains a row returned by read_next_row() method, Column A is $row[0] , column D is $row[3] , etc. In this mode, the user can get zero-based row number with last_read_row_number() method. The table below illustrates how the cells are numerated.

|     | A | B | C | D | E | F |
| 1   | $cells[0][0] | $cells[0][1] | $cells[0][2] | $cells[0][3] | $cells[0][4] | $cells[0][5] |
| 2   | $cells[1][0] | $cells[1][1] | $cells[1][2] | $cells[1][3] | $cells[1][4] | $cells[1][5] |
| 3   | $cells[2][0] | $cells[2][1] | $cells[2][2] | $cells[2][3] | $cells[2][4] | $cells[2][5] |
| 4   | $cells[3][0] | $cells[3][1] | $cells[3][2] | $cells[3][3] | $cells[3][4] | $cells[3][5] |
| 5   | $cells[4][0] | $cells[4][1] | $cells[4][2] | $cells[4][3] | $cells[4][4] | $cells[4][5] |
| ... |
| row | $row[0] | $row[1] | $row[2] | $row[3] | $row[4] | $row[5] |
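
When reporting a problem cell back to a user, it helps to turn the parser's zero-based indices into an Excel-style reference. A minimal sketch; excel_ref() is a hypothetical helper that is not part of MSXLS.

```php
<?php
/**
 * Converts zero-based row and column indices to an Excel reference.
 * Hypothetical helper: (0, 0) is A1 and (2, 3) is D3.
 */
function excel_ref(int $row, int $col): string
{
    $letters = '';
    // Column letters form a bijective base-26 number (A..Z, AA..AZ, ...).
    for ($n = $col + 1; $n > 0; $n = intdiv($n - 1, 26)) {
        $letters = chr(ord('A') + ($n - 1) % 26) . $letters;
    }

    return $letters . ($row + 1);
}

echo excel_ref(0, 0), ' ', excel_ref(2, 3), ' ', excel_ref(0, 26), "\n";
```

intdiv() requires PHP 7 or later; on PHP 5.6 the same step can be written with floor().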

A Compound File, or Microsoft Binary Compound File, is a special file format which is essentially a FAT-like container for other files.

Workbook stream, or just Workbook is a binary bytestream that essentially represents excel BIFF file.

The Excel file format is known as BIFF, or Binary Interchange File Format. Several versions exist, which differ in how they store excel data. This parser supports BIFF version 5, or BIFF5, which is the file format used in Excel 95, and BIFF version 8 (BIFF8), which is used in Excel 97-2003. The biggest difference between BIFF5 and BIFF8 is that they store strings differently. In BIFF5, strings are stored inside cells in a locale-specific 8-bit codepage (for example, CP1252), while BIFF8 has a special structure called SST (Shared Strings Table), which stores unique strings in UTF-16 little-endian encoding, and a reference to the SST entry is stored in a cell.
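
The encoding difference can be illustrated with the mbstring extension. This is a simplified sketch that follows the description above, not the parser's actual code: biff_string_to_utf8() is a hypothetical helper, and the default codepage is only an example.

```php
<?php
/**
 * Converts raw string bytes from a BIFF record to UTF-8.
 * BIFF8 strings are treated as UTF-16LE, BIFF5 strings as 8-bit text
 * in the given codepage. Hypothetical helper, not part of MSXLS.
 */
function biff_string_to_utf8(string $raw, bool $isBiff8, string $codepage = 'Windows-1252'): string
{
    $from = $isBiff8 ? 'UTF-16LE' : $codepage;

    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// "PHP" as UTF-16LE bytes, as a BIFF8 file would store it.
echo biff_string_to_utf8("P\x00H\x00P\x00", true), "\n";
// "café" in CP1252, as a BIFF5 file might store it.
echo biff_string_to_utf8("caf\xE9", false), "\n";
```

For BIFF5 the right codepage depends on the locale the file was saved in, so it has to be taken from the file rather than guessed.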

Workbook stream consists of Workbook Globals substream and one or more Sheet substreams. Workbook Globals contains information about the file such as BIFF5 encoding, encryption, sheets information and much more (we do not actually need much more). Sheet substreams, or Sheets represent actual sheets that are created in Excel. They can be Worksheets, Charts, Visual Basic modules and some more, but only regular Worksheets can be parsed.

Excel keeps track of cells starting with first non-empty row and non-empty column, ending with last non-empty row and non-empty column. All other cells are completely ignored by this parser like they don’t exist at all.

What happens when I open XLS file

Note: during every stage extensive error checking is performed. See Error handling for more info.

When a user opens an XLS file, for example by executing $excel = new MSXLS('file.xls') , the first thing that happens is that the script checks whether the XLS file is stored as a Compound File (most of the time it is) or as a Workbook stream. If it is a Compound File, the script attempts to extract the Workbook stream to a temporary file and uses that file for all further operations. Otherwise, it uses the supplied XLS file directly. The script never opens the supplied XLS file for writing.

After the Workbook stream is accessed, the output encoding is set to the value returned by mb_internal_encoding(). Then the get_data() method is executed: the script extracts information such as the sheet count, codepage and sheet byte offsets.

After that, either the first non-empty worksheet is selected and ready for parsing, with information about all other sheets available to the user, or an error is created (for example, when no non-empty worksheet was found).

By default, Array parsing mode is active.

Attempting to invoke a method that works only in Row-by-row mode while Array mode is active (or vice versa) creates an error, which in most cases disables any further actions.

If no errors occurred, it is now possible to select and set up the parsing mode.

After a worksheet is parsed, you can select another worksheet for parsing (if any) with the select_sheet() method. When you are finished parsing a file, it is a good idea to free memory manually, especially if your script goes on to do other work. Calling the free() method and then the unset() function is the best way to do it.

5. Public properties and methods

(bool) $debug — whether or not to display error and warning messages. Set as the 2nd parameter to the constructor.

(string) $err_msg — a string that contains all error messages concatenated into one.

(string) $warn_msg — same as above, but for warnings.

(array) $error — array of error codes, empty if no errors occurred.

(array) $warn — array of warning codes, empty if no warnings occurred.

(array) $cells — two-dimensional array which is used as storage for cells parsed in Array mode. Filled when read_everything() is invoked. This property is made public (instead of using a getter) mainly for performance reasons.

get_data() — Checks XLS file for errors and encryption, gathers information such as CODEPAGE for BIFF5, SST location for BIFF8. Gathers information about all sheets in the file. Also executes select_sheet() to select first valid worksheet for parsing. This method is called automatically when XLS file is opened. Invoking it manually makes sense only if BIFF5 codepage was detected incorrectly and you cannot see sheet names (and you really need them). In this case, encoding settings must be configured with set_encodings() after file opening and get_data() should be called manually after it.

get_biff_ver() — returns version of excel file. 5 is BIFF5 (Excel 95 file), 8 is BIFF8 (Excel 97-2003 file).

get_codepage() — returns CODEPAGE string. Relevant only for BIFF5 files, in which strings are encoded using a specific codepage. In BIFF8, all strings are unicode (UTF-16 little endian).

get_sheets() — returns array of structures with sheets information. See the code below.

get_valid_sheets() — same as above, but returns only non-empty selectable worksheets. Additional $sheet['number'] entry is present, which is the same number as the index of this sheet in the array returned by get_sheets().

get_active_sheet() — returns currently selected sheet info in the same structure that get_valid_sheets() array consists of.

get_filename() — returns a file name string originally supplied to the constructor.

get_filesize() — returns size of the file supplied to the constructor (in bytes).

get_margins($which = 'all') — returns currently set margins for the selected worksheet. Margins are set automatically when the sheet is selected. Margins can be set manually with set_margins() method. They define what rows and columns are read by read_next_row() method.

$which can be set to 'first_row', 'last_row', 'first_col', or 'last_col' string, in which case the corresponding value will be returned. $which also can be set to 'all' or left out, in which case an array of all four margins will be returned. If $which is set to something not mentioned above, false will be returned.

set_encodings($enable = true, $from = null, $to = null, $use_iconv = false) — manually set transcoding parameters for BIFF5 (Excel 95 file). This is usually not needed since the script detects these settings when the file is opened.

$enable parameter enables encoding conversion of BIFF5 strings.

$from is source encoding string, for example 'CP1252'. Leaving it out or setting it to null resets this parameter to detected internal BIFF5 codepage.

$to is target encoding string, for example 'UTF-8'. Leaving it out or setting it to null resets this parameter to the value returned by mb_internal_encoding() PHP function.

$use_iconv — If true, iconv() function will be used for conversion. Otherwise, mb_convert_encoding() will be used.

set_output_encoding($enc = null) — sets output encoding which excel strings should be decoded to.
$enc is target encoding string. If parameter set to null or left out, a value returned by mb_internal_encoding() function will be used.

Note: Setting $to parameter in set_encodings() and using set_output_encoding() do the same thing.
set_output_encoding() is provided for simplicity if BIFF8 files are used.

select_sheet($sheet = -1) — Select a worksheet to read data from.

$sheet must be either a sheet number or a sheet name. Use get_valid_sheets() to get those, if needed.
-1 or leaving out the parameter will select the first valid worksheet.

switch_to_row() — switch to Row-by-row parsing mode. Will also execute free(false) and select_sheet().

switch_to_array() — switch to Array parsing mode. Will also execute free(false) and select_sheet().

read_everything() — read all cells from XLS file into $cells property. Works only in Array mode.

read_next_row() — parses next row and returns array of parsed cells. Works only in Row-by-row mode.

2. Memory free-ers

free_stream() — Close Workbook stream, free memory associated with it and delete temporary files.

free_cells() — re-initialize $cells array property (storage for Array mode).

free_sst() — re-initialize SST structure (Shared Strings Table, used by Array mode).

free_rows_map() — re-initialize rows map storage used by Row-by-row mode.

free_sst_maps() — re-initialize SST offsets map and SST lengths storage used by Row-by-row mode.

free_maps() — execute both free_rows_map() and free_sst_maps().

free($stream = true) — free memory by executing all «free»-related methods mentioned above.
free_stream() is called only if $stream parameter evaluates to true.

3. Reading settings (mostly for Row-by-row mode)

set_fill_xl_errors($fill = false, $value = '#DEFAULT!') — set up how cells with excel errors are processed.

If $fill evaluates to true, cells will be parsed as $value. The '#DEFAULT!' value is special: it expands to the actual excel error value. For example, if a cell has a number divided by zero, it will be parsed as the #DIV/0! string. If $value is set to some other value, error cells will be parsed directly as $value. If $fill evaluates to false, cells with errors will be treated as empty cells.

Note: this is the only setting that also works in Array mode.
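The rules above can be sketched in plain PHP. This is an illustration of the described behaviour only, not the class's own code. The code-to-text table lists the standard BIFF error codes. The function name fill_xl_error, the $empty parameter, and the fallback to the empty value for an unknown code are assumptions made for the example.

```php
<?php
// Standard BIFF error codes and the text Excel displays for them.
const XL_ERROR_TEXT = [
    0x00 => '#NULL!',
    0x07 => '#DIV/0!',
    0x0F => '#VALUE!',
    0x17 => '#REF!',
    0x1D => '#NAME?',
    0x24 => '#NUM!',
    0x2A => '#N/A',
];

/**
 * Hypothetical helper mirroring the documented set_fill_xl_errors() rules.
 *
 * @param int   $code  BIFF error code stored in the cell
 * @param bool  $fill  whether error cells should be filled at all
 * @param mixed $value fill value; '#DEFAULT!' expands to the Excel error text
 * @param mixed $empty value used for empty cells (assumption: null)
 */
function fill_xl_error(int $code, bool $fill, $value = '#DEFAULT!', $empty = null)
{
    if (!$fill) {
        return $empty; // error cells are treated as empty cells
    }
    if ($value === '#DEFAULT!') {
        // Assumption: an unknown code falls back to the empty value.
        return XL_ERROR_TEXT[$code] ?? $empty;
    }
    return $value; // any other fill value is used as-is
}
```

For example, fill_xl_error(0x07, true) gives '#DIV/0!', while fill_xl_error(0x07, true, 'n/a') gives 'n/a'.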

set_margins($first_row = null, $last_row = null, $first_col = null, $last_col = null) — sets first row, last row, first column and last column that are parsed. If a parameter is null or left out, the corresponding margin is not changed. If a parameter is -1, the corresponding margin is set to the default value. The default values correspond to the first/last non-empty row/column in a worksheet.

set_active_row($row_number) — set which row to read next.
$row_number is zero-based excel row number and it must not be out of bounds set by set_margins() method.

last_read_row_number() — returns most recently parsed row number.
Valid only if called immediately after read_next_row().

next_row_number() — returns row number that is to be parsed upon next call of read_next_row().
Returns -1 if there are no more rows left to parse.

set_empty_value($value = null) — set $value as the empty value, that is, the value that empty cells are parsed as.

use_empty_cols($set = false) — whether or not to parse empty columns to empty value.

use_empty_rows($set = false) — whether or not to parse empty rows.

Note: if empty columns parsing is disabled (it is disabled by default), read_next_row() will return -1 when an empty row is encountered. If empty columns parsing is enabled with use_empty_cols(true), it will return array of cells filled with empty value.
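Putting the Row-by-row methods together, a reading loop might look like the sketch below. It uses only methods and properties documented in this section (switch_to_row(), next_row_number(), read_next_row(), $error), but the loop structure itself is an assumption rather than the author's example, and the function name each_xls_row is hypothetical. $excel is assumed to be an already opened MSXLS instance.

```php
<?php
/**
 * Sketch only: iterates over the selected worksheet in Row-by-row mode.
 * $excel is assumed to be an opened MSXLS instance without errors,
 * e.g. $excel = new MSXLS('file.xls');
 * Returns the number of rows passed to $handle_row.
 */
function each_xls_row($excel, callable $handle_row): int
{
    $excel->switch_to_row(); // Array mode is the default, so switch first

    $count = 0;
    $next = $excel->next_row_number();

    // next_row_number() is documented to return -1 when nothing is left.
    while (!$excel->error && $next !== -1) {
        $row = $excel->read_next_row();

        if (is_array($row)) { // skip non-array results such as -1
            $handle_row($row);
            $count++;
        }

        $after = $excel->next_row_number();
        if ($after === $next) {
            break; // defensive: stop if the parser made no progress
        }
        $next = $after;
    }

    return $count;
}
```

Because each row is handed to the callback and then discarded, memory use stays close to the "don't save" case rather than growing with the size of the sheet.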

set_boolean_values($true = true, $false = false) — set values which excel boolean cells are parsed as. By default, TRUE cells are parsed as PHP true value, FALSE cells are parsed as PHP false value.

set_float_to_int($tf = false) — whether or not to parse excel cells with whole float numbers to integers. Whole numbers are often stored as floats internally in an XLS file, and by default they are parsed as floats. This setting allows such numbers to be parsed as integers. Note: cells with numbers internally stored as integers are always parsed as integers.

4. Constructor and destructor

__construct($filename, $debug = false, $mem = null, $debug_MSCFB = false) — open file, extract Workbook stream (or use the file as Workbook stream), execute set_output_encoding() and get_data() methods.

$filename — path to XLS file.

$debug — if evaluates to true, enables Debug mode.

$debug_MSCFB — if evaluates to true, enables Debug mode in MSCFB helper class.

__destruct() — execute free() method, thus closing all opened streams, deleting temporary files and erasing big structures.

6. Error handling

Each time an error occurs, the script places an error code into the $error array property and appends an error message to the $err_msg string property. An error prevents execution of the parts of the script that depend on the part where the error occurred. Warnings work similarly to errors, except that they do not prevent execution of other parts of the script, because they always occur in non-critical places. Warnings use the $warn property to store warning codes and $warn_msg for warning texts.

If Debug mode is disabled, you should check if $error property evaluates to true, which would mean that $error array is not empty, i.e. has one or multiple error codes as its elements. Error handling example:
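A minimal sketch of such a check is shown here. It is a reconstruction based on the $error and $err_msg properties described above, not the original example, and the function name xls_error_report is hypothetical. $excel is assumed to be an MSXLS instance created with Debug mode disabled.

```php
<?php
/**
 * Sketch only: builds a one-line report from the documented $error and
 * $err_msg properties, or returns null when no error was recorded.
 * $excel is assumed to be an MSXLS instance, e.g. new MSXLS('file.xls').
 */
function xls_error_report($excel): ?string
{
    if (!$excel->error) {
        return null; // empty array: nothing went wrong
    }

    return 'XLS parsing failed (codes: '
        . implode(', ', $excel->error)
        . '): ' . $excel->err_msg;
}
```

A caller could then stop early, for instance with if (($report = xls_error_report($excel)) !== null) { error_log($report); return; }.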

If Debug mode is enabled, errors and warnings are printed (echoed) to standard output automatically.

7. Security considerations

There are extensive error checks in every function that should prevent any potential problems no matter what file is supplied to the constructor. The only potential security risk can come from the Debug mode, which prints the name of the function in which an error or a warning has occurred, but even then I do not see how such information can lead to problems with this particular class. It’s pretty safe to say that this code can be run in (automated) production of any kind. The same applies to the MSCFB class.

8. Performance and memory

The MSXLS class has been optimized for fast parsing and data extraction, while still performing error checks for safety. It is possible to marginally increase performance by leaving those error checks out, but I would strongly advise against it, because a specially crafted malicious file could then cause excessive memory use or an infinite loop.

The following numbers were obtained on a Windows machine (AMD Phenom II x4 940), with a 97.0 MiB test XLS file (96.2 MiB Workbook stream) using WAMP server. The XLS file consists entirely of unique strings. Default temporary file settings are used.

| Action | PHP 5.6.25 32-bit: Time | PHP 5.6.25 32-bit: Memory | PHP 7.0.10 64-bit: Time | PHP 7.0.10 64-bit: Memory |
|---|---|---|---|---|
| Open XLS File (create MSXLS instance) | 7.52s | 1.0 MiB | 3.48s | 0.6 MiB |
| Open XLS File and parse in Array mode | 77.77s | 213.2 MiB | 16.41s | 128.8 MiB |
| Open file, parse in Row-by-row mode to variable | 91.08s | 192.2 MiB | 27.20s | 204.3 MiB |
| Open file, parse in Row-by-row mode (don’t save) | 54.71s | 82.9 MiB | 21.49s | 82.1 MiB |

Note: Disabling temporary files does not decrease script execution time by any significant margin. In fact, the execution time is increased sometimes.

Note: It took 1.65 seconds and 12.0 MiB of memory to parse a real-life XLS pricelist of 13051 entries in Array mode in PHP 7.0.10. That XLS file was 3.45 MiB in size.

9. More documentation

All code in MSXLS.php file is heavily commented, feel free to take a look at it. To understand how XLS file is structured, please refer to MS documentation, or to OpenOffice.org’s Documentation of MS Compound File (also provided as a PDF file in this repository).

Source

I recently received a request for a tutorial about using PHP to parse Excel and CSV files. So today’s your lucky day, and by the end of this tutorial, you’ll be an Excel and CSV parsing professional.

When migrating to Church Community Builder (CCB), other Church Management Software providers sometimes give you a backup of your data in an Excel or XML file. Don’t fear if you find yourself sitting confused and puzzled, endlessly staring at that file and wondering how you’re going to migrate your data.

I recently wrote a tutorial about how to parse XML using XPATH and PHP, so that should help those of you handling the XML file migration portion. But for those of you with rows and rows of Excel data, this tutorial is for you.

For starters, there are many PHP libraries available online for parsing Excel files. PHPExcel, php-excel-reader and SimpleXLSX are just a few libraries that give you amazing flexibility for reading and writing Excel files.

I won’t cover these libraries in this tutorial, but do feel free to use them. All are well documented and easy to follow if you have development experience or a programming background. Instead, I’ll show you how to parse your Excel or CSV document using built-in PHP functionality.

Preparing a sample Excel and CSV file

One of the first items we’ll complete for this tutorial is to create an Excel file with a header row. I’ll provide a simple Excel spreadsheet to start with that consists of the following headers: First Name, Last Name, Email and Phone.

Create and save a file named testfile.xlsx with the headers listed above, and enter dummy information as I have below. Now, perform a “Save As” and name the file testfile.csv. That’s it, your data is now ready to be parsed with PHP. Of course, don’t forget to place testfile.csv in your web directory.

[Screenshot: sample spreadsheet with First Name, Last Name, Email and Phone columns]

Let the PHP parsing magic begin…

Now let us get down to business with the heavy lifting of PHP parsing magic using the fopen, fgetcsv and fclose built-in PHP functions. Okay, so let me explain these three magical functions.

The fopen function allows you to open and read files or URLs, while the fgetcsv function reads one line of a CSV file and parses it into an array. And of course, the fclose function closes that which was open.

So, open your favorite text editor, create a variable named $err_upTmpName and assign it the text string 'testfile.csv', the name of the CSV file you created in the last section. Oh yes, and be sure to save the file too, naming it parse-csv.php.

The next variable to set and assign is the iterator variable.  Remember how your file has header values?  We’re setting an iterator variable to bypass the header values.

Typical spreadsheets have header values, but when parsing, you may not want to account for the header values or row.  It would be easy to simply remove the row and not need the iterator variable.

But for the sake of you possibly needing it for a variety of future projects, we’ll set a row iterator variable and assign it 0 (zero).

Next, I’ll create what looks to be a complicated if statement to check whether the file has been opened. Notice that we assign the result of the fopen function to the $handle variable, passing $err_upTmpName as the first argument and the string "r" (read-only) as the second argument.

Notice that the assignment is wrapped in parentheses and compared using the !== operator, also known as the not identical operator. Simply put, if the fopen function returns a file handle (anything not identical to FALSE), then we move inside the if statement to run additional code. If not (else), then we echo to the web browser that the file could not be opened.

Of course, you could get more specific or greater detail when it comes to error handling, but for the sake of this tutorial, I’ll keep it simple.

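<?PHP

/**
 * parse-csv.php
 */

$err_upTmpName = 'testfile.csv';

$row = 0;

if (($handle = fopen($err_upTmpName, "r")) !== FALSE) {

	// The parsing loop goes here (see the next section).

	// Close the file only if it was opened successfully;
	// calling fclose() on FALSE would raise an error.
	fclose($handle);

} else {

	echo 'File could not be opened.';
}

?>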

Parsing CSV using the fgetcsv function

It’s time to focus on parsing each row of your CSV file, testfile.csv. To do so, we’ll use a while statement, which repeats a task for as long as its condition holds and stops as soon as the condition fails.

Create another complex expression using the while statement. This time, we’ll create a data variable and set it equal to the result of the fgetcsv function, passing it the handle variable as the first argument, 0 as the second argument (no limit on line length), and "," as the third argument (the most common CSV delimiter). The while statement runs once for each line of your CSV file, until fgetcsv returns FALSE at the end of the file. I’ll explain…

Again, wrap the expression in parentheses and use the !== operator to make sure the value is not identical to FALSE. Add your opening curly brace, and now you’re ready to skip your header row.

Remember earlier when we set the row variable equal to 0 (zero)? We’ll use an if statement to skip the first row, since it is the header row. Like I said earlier, you could easily remove the header row from the CSV file if you wanted.

Inside the if statement, which runs when row is equal to 0 (zero), increment the row variable. It is now 1, and this ends the first pass of the while statement for the first row of your CSV file. On to the next row.

The next time the while statement executes, it will use the logic found inside of the else of the if else statement.  I’ve commented out one line inside the else statement.  This is to help you grasp what values are associated with each data array variable.

Next I use a simple if statement to check that the first name and last name fields are not empty. If either is empty, nothing is echoed to the web browser; when both are filled in, we echo each of the data array fields (the columns of the CSV file) for that row, with dashes concatenated between the values.

<?PHP

	while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {

		if($row == 0){ 
			$row++; 
		} else {

			// $data[0] = first name; $data[1] = last name; $data[2] = email; $data[3] = phone
			/*********************************************************************************************************************/
			if(!empty($data[0]) && !empty($data[1])) echo $data[0].' - '.$data[1].' - '.$data[2].' - '.$data[3].'<br/>';

		}

	}

?>

Let the PHP magic show begin and end with parsed Excel or  CSV data

And here’s what it’s like when it all comes together.  Below you can see the entire file.  Again, this is a simple tutorial aiming to help you understand how to parse your Excel or CSV data using PHP should you not have the courage to tackle MySQL databases and tables.

Of course, in this example, we echoed data to the web browser, but you could easily set up a MySQL database and insert the parsed Excel or CSV data into the appropriate database table columns.

Well, that’s it!  Load your parse-csv.php file into your web directory with your testfile.csv, open in a web browser and happy parsing…  Do let me know if you have issues, and/or comments.  Stay tuned for our next tutorial.

<?PHP

/**
 * parse-csv.php
 */

$err_upTmpName = 'testfile.csv';

$row = 0;

if (($handle = fopen($err_upTmpName, "r")) !== FALSE) {

	while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {

		if($row == 0){ 
			// Skip the header row.
			$row++; 
		} else {

			// $data[0] = first name; $data[1] = last name; $data[2] = email; $data[3] = phone
			/*********************************************************************************************************************/
			if(!empty($data[0]) && !empty($data[1])) echo $data[0].' - '.$data[1].' - '.$data[2].' - '.$data[3].'<br/>';

		}

	}

	// Close the file only if it was opened successfully.
	fclose($handle);

} else {

	echo 'File could not be opened.';
}

?>
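As a variation on the script above, the header row can be used instead of skipped: each data row then becomes an associative array keyed by column name, which is convenient when the values are later inserted into database columns. This is a sketch, not part of the original tutorial. The function name csv_rows_by_header is hypothetical, and the enclosure and escape arguments are written out explicitly because newer PHP versions warn when the escape default is relied on.

```php
<?php
/**
 * Hypothetical variation on parse-csv.php: returns every data row of a
 * CSV file as an associative array keyed by the header row.
 */
function csv_rows_by_header(string $path): array
{
    $rows = [];

    if (!is_readable($path) || ($handle = fopen($path, 'r')) === false) {
        return $rows; // file could not be opened
    }

    // The first line holds the column names.
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false) {
        fclose($handle);
        return $rows; // empty file
    }

    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($data) !== count($header)) {
            continue; // blank or malformed line
        }
        $rows[] = array_combine($header, $data);
    }

    fclose($handle);
    return $rows;
}
```

With the sample file, each element would look like ['First Name' => ..., 'Last Name' => ..., 'Email' => ..., 'Phone' => ...], so a field can be read as $row['Email'] instead of $data[2].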
