Output JSON from big datasets

05 Sep 2017 in Algorithms, JSON, Software Architecture

The problem: some time ago we were building a RESTful service that returns JSON-encoded data to a client. The resulting dataset was so large that the web server process ran out of memory.

The script was something like:

<?php
header("Content-Type: application/json");
$data = fetchDataFromDatabaseAndTransformIt();
echo json_encode($data);

The proposed solution was to break the data into pieces and echo those pieces one at a time. This way memory consumption stays constant, independent of the size of the data:

<?php
$jsonWriter = new JsonWriter();
$jsonWriter->start();
while (($data = fetchDataFromDatabaseAndTransformItOnePieceAtATime())) {
    $jsonWriter->push($data);
}
$jsonWriter->end();
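fetchDataFromDatabaseAndTransformItOnePieceAtATime is not shown in the post; as a rough sketch of the piece-at-a-time idea, a PHP generator can stand in for it (the function name and the in-memory rows are hypothetical, a real implementation would read from an unbuffered database cursor):

```php
<?php
// Hypothetical stand-in for the piece-at-a-time fetch: a generator
// yields one transformed row at a time, so only one row is in memory.
function fetchPiecesOneAtATime(): Generator {
    $rows = [["id" => 1], ["id" => 2], ["id" => 3]]; // stands in for a DB cursor
    foreach ($rows as $row) {
        yield $row; // transform the row here before yielding
    }
}

foreach (fetchPiecesOneAtATime() as $piece) {
    echo json_encode($piece), "\n"; // prints {"id":1}, {"id":2}, {"id":3}
}
```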

With a JsonWriter class along these lines (note that since the pieces form a JSON array, the delimiters should be brackets, not braces):

class JsonWriter {
    function start() {
        header("Content-Type: application/json");
        echo "[";
    }
    function end() {
        echo "]";
    }
    function push($data) {
        echo json_encode($data) . ",";
    }
}

The only problem was that for an input like ["a", "b", "c"] the output string was invalid JSON: ["a","b","c",]. Trailing commas are not allowed in JSON.
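The trailing comma is enough to make decoders reject the whole payload; for example, PHP's own json_decode() returns NULL for such a string:

```php
<?php
// json_decode() returns NULL when the input is not valid JSON.
var_dump(json_decode('["a","b","c",]')); // NULL: the trailing comma breaks it
var_dump(json_decode('["a","b","c"]'));  // the array ["a", "b", "c"]
```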

So we needed to get rid of the last comma, without using any of the output buffering functions. The final solution was simple: hold the first element back and output it last. Based on that, the JsonWriter class becomes:

class JsonWriter {
    // The first pushed element, held back and emitted last
    private $element = null;
    private $hasElement = false;

    function start() {
        header("Content-Type: application/json");
        echo "[";
    }
    function end() {
        if ($this->hasElement) {
            echo json_encode($this->element);
        }
        echo "]";
    }
    function push($data) {
        if (!$this->hasElement) {
            // A boolean flag instead of empty(), so falsy pieces
            // such as 0, "" or [] are still streamed correctly
            $this->element = $data;
            $this->hasElement = true;
        } else {
            echo json_encode($data) . ",";
        }
    }
}
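To check the end result, here is a quick usage sketch. ArrayJsonWriter is a hypothetical string-collecting variant of the writer, with no header() or echo, so it can run from the CLI:

```php
<?php
// Hypothetical string-collecting variant of the writer, for demonstration:
// it holds the first element back and appends it last, avoiding the comma.
class ArrayJsonWriter {
    private $held = null;
    private $hasHeld = false;
    private $out = "";

    function start() { $this->out .= "["; }
    function push($data) {
        if (!$this->hasHeld) {
            $this->held = $data;      // hold the first element back
            $this->hasHeld = true;
        } else {
            $this->out .= json_encode($data) . ",";
        }
    }
    function end() {
        if ($this->hasHeld) {
            $this->out .= json_encode($this->held); // emit it last, no comma
        }
        $this->out .= "]";
        return $this->out;
    }
}

$w = new ArrayJsonWriter();
$w->start();
foreach (["a", "b", "c"] as $piece) {
    $w->push($piece);
}
echo $w->end(); // ["b","c","a"]: valid JSON, first element moved to the end
```

Note that the first element ends up last in the output. The order changes, which is acceptable for most list payloads; when order matters, the usual alternative is to print the comma before every element except the first.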