Node.js is an open-source and cross-platform JavaScript runtime environment that can also be used to read from a file and write to a file which can be in txt, ods, xlsx, docx, etc format.
The following example covers how an excel file(.xlsx) file is read from an excel file and then converted into JSON and also to write to it. It can be achieved using a package called xlsx to achieve our goal.
Module Installation: You can install xlsx module using the following command:
npm install xlsx
Note: For the following example, text.xlsx is a dummy data file that has been used.
Filename: test.xlsx
Sheet 1:
Sheet 2:
So the excel file test.xlsx has 2 sheets, one having Student details and another having lecturer details.
Read Operation Filename: read.js
Javascript
const reader = require(
'xlsx'
)
const file = reader.readFile(
'./test.xlsx'
)
let data = []
const sheets = file.SheetNames
for
(let i = 0; i < sheets.length; i++)
{
const temp = reader.utils.sheet_to_json(
file.Sheets[file.SheetNames[i]])
temp.forEach((res) => {
data.push(res)
})
}
console.log(data)
Explanation: First, the npm module is included in the read.js file and then the excel file is read into a workbook i.e constant file in the above program.
The number of files in that particular excel file is available in the SheetNames property of the workbook. It can be accessed as follows:
const sheets = file.SheetNames // Here the value of the sheets will be 2
A for loop is run until the end of the excel file starting from the first page. One of the most important functions used in the code above is the sheet_to_json() function present in the utils module of the xlsx package. It accepts a worksheet object as a parameter and returns an array of JSON objects.
There is a forEach loop which iterates through every JSON object present in the array temp and pushes it into a variable data which would contain all the data in JSON format.
Finally, the data is printed or any other modification can be performed on the array of JSON objects.
Step to run the application:
Run the read.js file using the following command:
node read.js
Output:
Write Operation In the following example, we will convert an array of JSON objects into an excel sheet and append it to the file.
Filename: write.js
Javascript
const reader = require(
'xlsx'
)
const file = reader.readFile(
'./test.xlsx'
)
let student_data = [{
Student:
'Nikhil'
,
Age:22,
Branch:
'ISE'
,
Marks: 70
},
{
Student:
'Amitha'
,
Age:21,
Branch:
'EC'
,
Marks:80
}]
const ws = reader.utils.json_to_sheet(student_data)
reader.utils.book_append_sheet(file,ws,
"Sheet3"
)
reader.writeFile(file,
'./test.xlsx'
)
Explanation: Here we have an array of JSON objects called student_data. We use two main functions in this program i.e json_to_sheet() which accepts an array of objects and converts them into a worksheet and another function is the book_append_sheet() to append the worksheet into the workbook.
Finally, all the changes are written to the test.xlsx file using writeFile() function which takes a workbook and a excel file as input parameter.
Step to run the application:
Run the read.js file using the following command:
node write.js
Output: The final test.xlsx file would look something like this:
Sheet 1:
Sheet 2:
Sheet 3: We can see sheet 3 is appended into the test.xlsx as shown below:
read-excel-file
Read small to medium *.xlsx
files in a browser or Node.js. Parse to JSON with a strict schema.
Demo
Also check out write-excel-file
for writing simple *.xlsx
files.
Install
npm install read-excel-file --save
If you’re not using a bundler then use a standalone version from a CDN.
Use
Browser
<input type="file" id="input" />
import readXlsxFile from 'read-excel-file' // File. const input = document.getElementById('input') input.addEventListener('change', () => { readXlsxFile(input.files[0]).then((rows) => { // `rows` is an array of rows // each row being an array of cells. }) }) // Blob. fetch('https://example.com/spreadsheet.xlsx') .then(response => response.blob()) .then(blob => readXlsxFile(blob)) .then((rows) => { // `rows` is an array of rows // each row being an array of cells. }) // ArrayBuffer. // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer // // Could be obtained from: // * File // * Blob // * Base64 string // readXlsxFile(arrayBuffer).then((rows) => { // `rows` is an array of rows // each row being an array of cells. })
Note: Internet Explorer 11 requires a Promise
polyfill. Example.
Node.js
const readXlsxFile = require('read-excel-file/node') // File path. readXlsxFile('/path/to/file').then((rows) => { // `rows` is an array of rows // each row being an array of cells. }) // Readable Stream. readXlsxFile(fs.createReadStream('/path/to/file')).then((rows) => { // `rows` is an array of rows // each row being an array of cells. }) // Buffer. readXlsxFile(Buffer.from(fs.readFileSync('/path/to/file'))).then((rows) => { // `rows` is an array of rows // each row being an array of cells. })
Web Worker
const worker = new Worker('web-worker.js') worker.onmessage = function(event) { // `event.data` is an array of rows // each row being an array of cells. console.log(event.data) } worker.onerror = function(event) { console.error(event.message) } const input = document.getElementById('input') input.addEventListener('change', () => { worker.postMessage(input.files[0]) })
web-worker.js
import readXlsxFile from 'read-excel-file/web-worker' onmessage = function(event) { readXlsxFile(event.data).then((rows) => { // `rows` is an array of rows // each row being an array of cells. postMessage(rows) }) }
JSON
To read spreadsheet data and then convert it to an array of JSON objects, pass a schema
option when calling readXlsxFile()
. In that case, instead of returning an array of rows of cells, it will return an object of shape { rows, errors }
where rows
is gonna be an array of JSON objects created from the spreadsheet data according to the schema
, and errors
is gonna be an array of errors encountered while converting spreadsheet data to JSON objects.
Each property of a JSON object should be described by an «entry» in the schema
. The key of the entry should be the column’s title in the spreadsheet. The value of the entry should be an object with properties:
property
— The name of the object’s property.required
— (optional) Required properties can be marked asrequired: true
.validate(value)
— (optional) Cell value validation function. Is only called on non-empty cells. If the cell value is invalid, it should throw an error with the error message set to the error code.type
— (optional) The type of the value. Defines how the cell value will be parsed. If notype
is specified then the cell value is returned «as is»: as a string, number, date or boolean. Atype
could be a:- Built-in type:
String
Number
Boolean
Date
- «Utility» type exported from the library:
Integer
Email
URL
- Custom type:
- A function that receives a cell value and returns a parsed value. If the value is invalid, it should throw an error with the error message set to the error code.
- Built-in type:
Sidenote: When converting cell values to object properties, by default, it skips all null
values (skips all empty cells). That’s for simplicity. In some edge cases though, it may be required to keep all null
values for all the empty cells. For example, that’s the case when updating data in an SQL database from an XLSX spreadsheet using Sequelize ORM library that requires a property to explicitly be null
in order to clear it during an UPDATE
operation. To keep all null
values, pass includeNullValues: true
option when calling readXlsxFile()
.
errors
If there were any errors while converting spreadsheet data to JSON objects, the errors
property returned from the function will be a non-empty array. An element of the errors
property contains properties:
error: string
— The error code. Examples:"required"
,"invalid"
.- If a custom
validate()
function is defined and it throws anew Error(message)
then theerror
property will be the same as themessage
value. - If a custom
type()
function is defined and it throws anew Error(message)
then theerror
property will be the same as themessage
value.
- If a custom
reason?: string
— An optional secondary error code providing more details about the error. Currently, it’s only returned for «built-in»type
s. Example:{ error: "invalid", reason: "not_a_number" }
fortype: Number
means that «the cell value is invalid because it’s not a number«.row: number
— The row number in the original file.1
means the first row, etc.column: string
— The column title.value?: any
— The cell value.type?: any
— The schematype
for this column.
An example of using a schema
// An example *.xlsx document: // ----------------------------------------------------------------------------------------- // | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | CONTACT | STATUS | // ----------------------------------------------------------------------------------------- // | 03/24/2018 | 10 | true | Chemistry | (123) 456-7890 | SCHEDULED | // ----------------------------------------------------------------------------------------- const schema = { 'START DATE': { // JSON object property name. prop: 'date', type: Date }, 'NUMBER OF STUDENTS': { prop: 'numberOfStudents', type: Number, required: true }, // Nested object example. // 'COURSE' here is not a real Excel file column name, // it can be any string — it's just for code readability. 'COURSE': { // Nested object path: `row.course` prop: 'course', // Nested object schema: type: { 'IS FREE': { prop: 'isFree', type: Boolean }, 'COURSE TITLE': { prop: 'title', type: String } } }, 'CONTACT': { prop: 'contact', required: true, // A custom `type` can be defined. // A `type` function only gets called for non-empty cells. type: (value) => { const number = parsePhoneNumber(value) if (!number) { throw new Error('invalid') } return number } }, 'STATUS': { prop: 'status', type: String, oneOf: [ 'SCHEDULED', 'STARTED', 'FINISHED' ] } } readXlsxFile(file, { schema }).then(({ rows, errors }) => { // `errors` list items have shape: `{ row, column, error, reason?, value?, type? }`. errors.length === 0 rows === [{ date: new Date(2018, 2, 24), numberOfStudents: 10, course: { isFree: true, title: 'Chemistry' }, contact: '+11234567890', status: 'SCHEDULED' }] })
Tips and Features
Custom type
example.
{ 'COLUMN_TITLE': { // This function will only be called for a non-empty cell. type: (value) => { try { return parseValue(value) } catch (error) { console.error(error) throw new Error('invalid') } } } }
Ignoring empty rows.
By default, it ignores any empty rows. To disable that behavior, pass ignoreEmptyRows: false
option.
readXlsxFile(file, { schema, ignoreEmptyRows: false })
How to fix spreadsheet data before schema
parsing. For example, how to ignore irrelevant rows.
Sometimes, a spreadsheet doesn’t exactly have the structure required by this library’s schema
parsing feature: for example, it may be missing a header row, or contain some purely presentational / irrelevant / «garbage» rows that should be removed. To fix that, one could pass an optional transformData(data)
function that would modify the spreadsheet contents as required.
readXlsxFile(file, { schema, transformData(data) { // Add a missing header row. return [['ID', 'NAME', ...]].concat(data) // Remove irrelevant rows. return data.filter(row => row.filter(column => column !== null).length > 0) } })
The function for converting data to JSON objects using a schema is exported from this library too, if anyone wants it.
import convertToJson from "read-excel-file/schema" // `data` is an array of rows, each row being an array of cells. // `schema` is a "to JSON" convertion schema (see above). const { rows, errors } = convertToJson(data, schema)
A React component for displaying errors that occured during schema parsing/validation.
import { parseExcelDate } from 'read-excel-file' function ParseExcelError({ children }) { const { type, value, error, reason, row, column } = children // Error summary. return ( <div> <code>"{error}"</code> {reason && ' '} {reason && <code>("{reason}")</code>} {' for value '} <code>{stringifyValue(value)}</code> {' in column '} <code>"{column}"</code> {' in row '} <code>{row}</code> {' of spreadsheet'} </div> ) } function stringifyValue(value) { // Wrap strings in quotes. if (typeof value === 'string') { return '"' + value + '"' } return String(value) }
JSON (mapping)
Same as above, but simpler: without any parsing or validation.
Sometimes, a developer might want to use some other (more advanced) solution for schema parsing and validation (like yup
). If a developer passes a map
option instead of a schema
option to readXlsxFile()
, then it would just map each data row to a JSON object without doing any parsing or validation. Cell values will remain «as is»: as a string, number, date or boolean.
// An example *.xlsx document: // ------------------------------------------------------------ // | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | // ------------------------------------------------------------ // | 03/24/2018 | 10 | true | Chemistry | // ------------------------------------------------------------ const map = { 'START DATE': 'date', 'NUMBER OF STUDENTS': 'numberOfStudents', 'COURSE': { 'course': { 'IS FREE': 'isFree', 'COURSE TITLE': 'title' } } } readXlsxFile(file, { map }).then(({ rows }) => { rows === [{ date: new Date(2018, 2, 24), numberOfStudents: 10, course: { isFree: true, title: 'Chemistry' } }] })
Multiple Sheets
By default, it reads the first sheet in the document. If you have multiple sheets in your spreadsheet then pass either a sheet number (starting from 1
) or a sheet name in the options
argument.
readXlsxFile(file, { sheet: 2 }).then((data) => { ... })
readXlsxFile(file, { sheet: 'Sheet1' }).then((data) => { ... })
By default, options.sheet
is 1
.
To get the names of all sheets, use readSheetNames()
function:
readSheetNames(file).then((sheetNames) => { // sheetNames === ['Sheet1', 'Sheet2'] })
Dates
XLSX format originally had no dedicated «date» type, so dates are in almost all cases stored simply as numbers (the count of days since 01/01/1900
) along with a «format» description (like "d mmm yyyy"
) that instructs the spreadsheet viewer software to format the date in the cell using that certain format.
When using readXlsx()
with a schema
parameter, all schema columns having type Date
are automatically parsed as dates. When using readXlsx()
without a schema
parameter, this library attempts to guess whether a cell contains a date or just a number by examining the cell’s «format» — if the «format» is one of the built-in date formats then such cells’ values are automatically parsed as dates. In other cases, when date cells use a non-built-in format (like "mm/dd/yyyy"
), one can pass an explicit dateFormat
parameter to instruct the library to parse numeric cells having such «format» as dates:
readXlsxFile(file, { dateFormat: 'mm/dd/yyyy' })
Trim
By default, it automatically trims all string values. To disable this feature, pass trim: false
option.
readXlsxFile(file, { trim: false })
Transform
Sometimes, a spreadsheet doesn’t exactly have the structure required by this library’s schema
parsing feature: for example, it may be missing a header row, or contain some purely presentational / empty / «garbage» rows that should be removed. To fix that, one could pass an optional transformData(data)
function that would modify the spreadsheet contents as required.
readXlsxFile(file, { schema, transformData(data) { // Add a missing header row. return [['ID', 'NAME', ...]].concat(data) // Remove empty rows. return data.filter(row => row.filter(column => column !== null).length > 0) } })
Limitations
Performance
There have been some reports about performance issues when reading very large *.xlsx
spreadsheets using this library. It’s true that this library’s main point have been usability and convenience, and not performance when handling huge datasets. For example, the time of parsing a file with 2000 rows / 20 columns is about 3 seconds. So, for reading huge datasets, perhaps use something like xlsx
package instead. There’re no comparative benchmarks between the two, so if you’ll be making one, share it in the Issues.
Formulas
Dynamically calculated cells using formulas (SUM
, etc) are not supported.
TypeScript
I’m not a TypeScript expert, so the community has to write the typings (and test those). See example index.d.ts
.
CDN
One can use any npm CDN service, e.g. unpkg.com or jsdelivr.net
<script src="https://unpkg.com/read-excel-file@5.x/bundle/read-excel-file.min.js"></script> <script> var input = document.getElementById('input') input.addEventListener('change', function() { readXlsxFile(input.files[0]).then(function(rows) { // `rows` is an array of rows // each row being an array of cells. }) }) </script>
TypeScript
This library comes with TypeScript «typings». If you happen to find any bugs in those, create an issue.
References
Uses xmldom
for parsing XML.
GitHub
On March 9th, 2020, GitHub, Inc. silently banned my account (erasing all my repos, issues and comments, even in my employer’s private repos) without any notice or explanation. Because of that, all source codes had to be promptly moved to GitLab. The GitHub repo is now only used as a backup (you can star the repo there too), and the primary repo is now the GitLab one. Issues can be reported in any repo.
License
MIT
Useful link
var express = require('express');
var app = express();
var bodyParser = require('body-parser');
var multer = require('multer');
var xlstojson = require("xls-to-json-lc");
var xlsxtojson = require("xlsx-to-json-lc");
app.use(bodyParser.json());
var storage = multer.diskStorage({ //multers disk storage settings
destination: function (req, file, cb) {
cb(null, './uploads/')
},
filename: function (req, file, cb) {
var datetimestamp = Date.now();
cb(null, file.fieldname + '-' + datetimestamp + '.' + file.originalname.split('.')[file.originalname.split('.').length -1])
}
});
var upload = multer({ //multer settings
storage: storage,
fileFilter : function(req, file, callback) { //file filter
if (['xls', 'xlsx'].indexOf(file.originalname.split('.')[file.originalname.split('.').length-1]) === -1) {
return callback(new Error('Wrong extension type'));
}
callback(null, true);
}
}).single('file');
/** API path that will upload the files */
app.post('/upload', function(req, res) {
var exceltojson;
upload(req,res,function(err){
if(err){
res.json({error_code:1,err_desc:err});
return;
}
/** Multer gives us file info in req.file object */
if(!req.file){
res.json({error_code:1,err_desc:"No file passed"});
return;
}
/** Check the extension of the incoming file and
* use the appropriate module
*/
if(req.file.originalname.split('.')[req.file.originalname.split('.').length-1] === 'xlsx'){
exceltojson = xlsxtojson;
} else {
exceltojson = xlstojson;
}
try {
exceltojson({
input: req.file.path,
output: null, //since we don't need output.json
lowerCaseHeaders:true
}, function(err,result){
if(err) {
return res.json({error_code:1,err_desc:err, data: null});
}
res.json({error_code:0,err_desc:null, data: result});
});
} catch (e){
res.json({error_code:1,err_desc:"Corupted excel file"});
}
})
});
app.get('/',function(req,res){
res.sendFile(__dirname + "/index.html");
});
app.listen('3000', function(){
console.log('running on 3000...');
});
This article shows you how to read and extract content from an Excel (.xlsx) file by using Node.js. Without any further ado, let’s get our hands dirty and start coding.
Getting Things Ready
We are going to work with a simple Excel file that contains some information as follows:
You can download the file from the link below to your computer:
https://www.kindacode.com/wp-content/uploads/2021/11/KindaCode.com-Example.xlsx.zip
Writing Code
There are many great libraries that can help us easily read Excel files, such as xlsx (SheetJS), exceljs (ExcelJS), node-xlsx (Node XLSX). In the sample project below, we will use xlsx. It’s super popular and supports TypeScript out-of-the-box.
1. Create a new folder named example (the name doesn’t matter and is totally up to you), then initialize a new Node.js project by running:
npm init
2. In your project directory, create a subfolder called src, then add an empty index.js file. Copy the Excel file you’ve downloaded before into the src folder.
Here’s the file structure:
.
├── node_modules
├── package-lock.json
├── package.json
└── src
├── KindaCode.com Example.xlsx
└── index.js
3. Installing the xlsx package:
npm i xlsx
4. Below are the code snippets for index.js. There are 2 different code snippets. The first one uses the CommonJS syntax (with require), while the second one uses the ES Modules syntax (with import). Choose the one that fits your need.
CommonJS:
const path = require("path");
const xlsx = require("xlsx");
const filePath = path.resolve(__dirname, "Kindacode.com Example.xlsx");
const workbook = xlsx.readFile(filePath);
const sheetNames = workbook.SheetNames;
// Get the data of "Sheet1"
const data = xlsx.utils.sheet_to_json(workbook.Sheets[sheetNames[0]])
/// Do what you need with the received data
data.map(person => {
console.log(`${person.Name} is ${person.Age} years old`);
})
ES Modules:
// import with ES6 Modules syntax
import path from 'path';
import xlsx from 'xlsx';
import { fileURLToPath } from 'url'
import { dirname } from 'path'
const __filename = fileURLToPath(import.meta.url)
const __dirname = dirname(__filename)
const filePath = path.resolve(__dirname, "Kindacode.com Example.xlsx");
const workbook = xlsx.readFile(filePath);
const sheetNames = workbook.SheetNames;
// Get the data of "Sheet1"
const data = xlsx.utils.sheet_to_json(workbook.Sheets[sheetNames[0]])
/// Do what you need with the received data
data.map(person => {
console.log(`${person.Name} is ${person.Age} years old`);
})
5. Test our project:
node src/index.js
Output:
John Doe is 37 years old
Jane Doe is 37 years old
Captain is 72 years old
Voldermort is 89 years old
Conclusion
We’ve written a simple Node.js program to retrieve the data from a sheet of an Excel workbook. If you’d like to explore more new and interesting things about modern Node.js, take a look at the following articles:
- Node.js: How to Ping a Remote Server/ Website
- Best open-source ORM and ODM libraries for Node.js
- 7 Best Open-Source HTTP Request Libraries for Node.js
- Node.js: Executing a Piece of Code after a Delay
- Express + TypeScript: Extending Request and Response objects
You can also check out our Node.js category page for the latest tutorials and examples.
Таблицы Excel популярны в деловом мире как стандарт де-факто, нравится ли нам, как разработчикам, это. Иногда клиенты просят нас загрузить таблицу Excel, и все данные должны храниться в базе данных. XLSX — это пакет Node, который решает эту проблему. В этом посте мы собираемся использовать busboy для обработки данных формы.
Этот пост состоит из двух частей:
- Как разобрать лист Excel в формат JSON.
- Как создать таблицу Excel с использованием данных JSON.
Шаг 1. Установите пакет XLSX с помощью npm или bower.
npm i --save xlsx //or bower install js-xlsx
Шаг 2. Импортируйте Multer или busboy
npm install --save multer
Multer — это промежуточное программное обеспечение node.js для обработки multipart / form-data, которое в основном используется для загрузки файлов. Он написан поверх busboy для максимальной эффективности.
Busboy — это модуль Node.js для анализа входящих данных HTML-форм.
Шаг 2: импортируйте XLSX в index.js
const XLSX = require('xlsx')
Анализ данных Excel
req.busboy.on('file', (fieldname, file, fname) => { if (fieldname === 'file') { const buffers = [] file.on('data', (data) => { buffers.push(data) }) file.on('end', () => { buffer = Buffer.concat(buffers) workbook = XLSX.read(buffer, { type: 'buffer', }) }) } }) req.busboy.on('finish', () => { try { const excelProducts = XLSX.utils.sheet_to_json(workbook.Sheets[workbook.SheetNames[0]], { raw: false, header: 1, dateNF: 'yyyy-mm-dd', blankrows: false, }) } catch (err) { console.log(err) } }) req.pipe(req.busboy)
- Прочтите файл Excel с помощью буфера. Мы используем busboy, потому что этот файл загружен пользователем. Если файл уже загружен, вы можете использовать XLSX.readFile (filename: string, opts ?: ParsingOptions), указав имя файла.
- XLSX.utils.sheet_to_json () используется для чтения данных рабочего листа в массив объекта. Другие параметры передаются для указания различных параметров, таких как использовать необработанные значения (истина) или форматированные строки (ложь), включать пустые строки в вывод, формат даты по умолчанию, если указан заголовок, первая строка считается строкой данных, иначе первая row — это строка заголовка и не считается данными.
Некоторые вспомогательные функции в XLSX.utils
, генерирующие разные виды листов:
XLSX.utils.sheet_to_csv
создает CSVXLSX.utils.sheet_to_txt
генерирует текст в формате UTF16XLSX.utils.sheet_to_html
генерирует HTMLXLSX.utils.sheet_to_json
генерирует массив объектовXLSX.utils.sheet_to_formulae
создает список формул
Создание листа Excel
data = [{ firstName: 'John', lastName: 'Doe' }, { firstName: 'Smith', lastName: 'Peters' }, { firstName: 'Alice', lastName: 'Lee' }] const ws = XLSX.utils.json_to_sheet(data) const wb = XLSX.utils.book_new() XLSX.utils.book_append_sheet(wb, ws, 'Responses') XLSX.writeFile(wb, 'sampleData.export.xlsx')
- json_to_sheet преобразует массив объектов JavaScript в рабочий лист. Существуют и другие методы преобразования данных в рабочие листы, такие как aoa_to_sheet, table_to_sheet. sheet_add_json используется для добавления массива объектов JavaScript в существующий рабочий лист.
- book_new () создает новую книгу на листе.
- book_append_sheet добавляет лист к книге с названием «Ответы».
- XLSX.writeFile (wb, «sampleData.export.xlsx») пытается записать wb в «sampleData.export.xlsx».
Вы также можете указать ширину каждого столбца и объединить ячейки.
Некоторые вспомогательные функции XLSX.utils
для импорта различных данных в листы:
XLSX.utils.aoa_to_sheet
преобразует массив массивов данных JavaScript в рабочий лист.XLSX.utils.json_to_sheet
преобразует массив объектов JavaScript в рабочий лист.XLSX.utils.table_to_sheet
преобразует элемент DOM TABLE в рабочий лист.XLSX.utils.sheet_add_aoa
добавляет на рабочий лист массив массивов данных JavaScript.XLSX.utils.sheet_add_json
добавляет на рабочий лист массив объектов JavaScript.
Заключение:
Используя пакет XLSX, вы можете читать любой файл Excel, а также создавать файл Excel.
Спасибо за внимание.
Больше контента на plainenglish.io