Some Notes on CSV Parsing and Google Spreadsheets in Node.js
I was recently working on a short Node.js project that involved CSV parsing and automation of access to Google Spreadsheets. This post consists of a few notes on what was learned - not all of the tools in the ecosystem are all that they claim to be, sadly.
Miso.Dataset Google Spreadsheet Importers Don't Work
Miso.Dataset is a toolkit for manipulating tabular data. Insofar as the actual manipulation goes, it's pretty useful and can save you a lot of time. The Node.js distribution is a port of a client distribution, however, which means that whether or not non-core functionality actually works in Node.js is a coin-toss. The Google Spreadsheet importer is one such non-functional item; it works by calling one of the JSONP/XML format views of a spreadsheet and parsing that, but the actual HTTP requests are performed by lower level functions that will only work in a browser.
Google Spreadsheet JSONP Views are Buggy
It's possible to bypass the Google Spreadsheet API to obtain JSONP for a single worksheet with one request wherein the URL looks something like this:
https://spreadsheets.google.com/tq?key=%id&sheet=%name&tqx=version:0.6;responseHandler:%fn;reqId:0;out:json
This might seem tempting and easy, since the data is much more compact than the XML feed version. Unfortunately it is very easy to create a spreadsheet that returns invalid, broken JSON via this method. In addition, the data provided here uses a very aggressive algorithm to try to ensure that all columns have a header name; it merges rows until it finds non-empty values in all columns, and then gets stuck that way. Once it has started down the path of merging rows to create a header row, you can't stop it from doing that - the spreadsheet data interpretation is stuck that way, no matter how you edit it further.
Use a Google Spreadsheet Package that Works With the XML Feed
There are a number of small, simple Google Spreadsheet packages available via NPM, such as this one that incorporates authentication for non-public spreadsheets. It reads from the XML feed for a spreadsheet, which is less prone to issues.
Miso.Dataset Parsers Are Fragile
Getting data into Miso.Dataset is a two step process - there is an importer and then a parser. Any data import other than just feeding it an array of objects is risky, however, as the parsers tend to throw errors very readily: if given duplicate header row names, or empty header rows because one row happens to run long, for example. Since obtaining a working dataset from imported data is an asynchronous operation, you can't catch these errors.
// If the CSV-format data has duplicate or empty header rows, then // an error will be thrown here, and there isn't a great deal you can do // about that. var ds = new Miso.Dataset({ parser: Miso.Dataset.Parsers.Delimited, data: csvString, delimiter: "," }); ds.fetch({ error: function (error) {}, success: function () {} });
So use some other, more robust package to transform your CSV data into arrays of objects and then give that to Miso.Dataset:
// Passing in tabular data as an array of objects is much safer and // more robust. var ds = new Miso.Dataset({ data: [ {x: "y"}, {a: "b"} // etc ... ] }); ds.fetch({ error: function (error) {}, success: function () {} });
Use the Node CSV Package
You will save yourself a great deal of pain and suffering by using Node CSV to parse in raw CSV and transform it into a format that can be fed to Miso.Dataset. For example:
var csv = require("csv"); var parser = csv(); var records = []; parser.on("error", function (error) { // Do something. }); parser.on("record", function (row, index) { records.push(row); }); parser.on("end", function () { var ds = new Miso.Dataset({ data: records }); ds.fetch({ error: function (error) {}, success: function () {} }); }); // Tell the parser to return rows as objects keyed by column names. parser.from.options({ columns: true }); // Start parsing. This package will not throw errors for duplicate header rows // or other similar issues that will stop Miso.Dataset in its tracks. parser.from(csvString);