Mocking Generators in PHPUnit

01 Nov 2017 in PHP, PHPUnit

To mock a method call that returns a generator, you can use PHPUnit's returnCallback() method.

Suppose a class Foo gets some data from a Provider class, which in turn talks to a database.

<?php
class Foo {
  private $provider;
  function __construct(Provider $provider) {
    $this->provider = $provider;
  }
  function bar() {
    $data = $this->provider->get();
    // ...do something with data...
    return $data;
  }
}

class Provider {
  private $pdo;
  function __construct(PDO $pdo) {
    $this->pdo = $pdo;
  }
  function get() {
    $stmt = $this->pdo->prepare(
      "SELECT something FROM table"
    );
    $stmt->execute();
    while (($row = $stmt->fetch())) {
      yield $row;
    }
  }
}

To unit test the Foo class, one can write the following:

<?php
class FooTest extends \PHPUnit_Framework_TestCase {
  private $sut;
  private $provider;
  function setUp() {
    $this->provider = $this
      ->getMockBuilder(Provider::class)
      ->disableOriginalConstructor()
      ->setMethods(["get"])
      ->getMock();
    $this->sut = new Foo($this->provider);
  }
  function testBar() {
    // TODO: mock Provider::get() method
    $actual = $this->sut->bar();
    // ...assertions to follow...
  }
}

Because Provider::get() returns a Generator (an Iterator) rather than an array, you can't mock it with something like PHPUnit's returnValue(). Instead you can take advantage of PHPUnit's returnCallback():

  // ...snip...
  function testBar() {
    $this->provider
         ->expects($this->any())
         ->method("get")
         ->will($this->returnCallback(function () {
            $data = [
                // mock values
            ];
            foreach ($data as $e) {
                // return a generator
                yield $e;
            }
         }));
  // ...snip...
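The trick works because PHP treats any function whose body contains yield as a generator function: calling it executes none of the body and immediately returns a Generator object. A standalone sketch (outside PHPUnit) of the behavior returnCallback() relies on:

```php
<?php
// A closure containing yield is a generator function: calling it
// runs none of its body and returns a Generator object instead.
$callback = function () {
    foreach (["a", "b"] as $value) {
        yield $value;
    }
};

$result = $callback();                 // none of the closure body has run yet
// $result is a Generator that lazily yields "a" then "b":
$values = iterator_to_array($result);  // ["a", "b"]
```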

Efficient pagination on a table with many records

30 Oct 2017 in MySQL

In this post, the author shows how to paginate efficiently over a table with millions of records.

He starts with the simplest (and wrong) solution of selecting all the data via

SELECT fields FROM table

with all the filtering done in application logic.

Then he shows the classic

SELECT fields FROM table LIMIT X, Y

which degrades as the offset X grows, because MySQL must scan past and throw away all the skipped rows; fetching results near the last pages becomes very slow.

Finally he presents the efficient solution using the id field to filter the results with a query of

SELECT fields FROM table WHERE id > X LIMIT Y

which avoids scanning the table up to the offset altogether. Instead, the WHERE clause lets MySQL use the index on id to seek directly to the starting row.

As you may have guessed, the catch is getting the id offset right. In web applications that present data sequentially this isn't really a problem: each page already knows the last id it displayed.
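To make the bookkeeping concrete, here is a toy PHP sketch of the keyset idea, with an in-memory array standing in for the indexed table (the function name and data are made up for illustration; a real implementation would run the WHERE id > X LIMIT Y query via PDO):

```php
<?php
// Toy model: rows sorted by id, as the index would return them.
$rows = array_map(fn ($i) => ["id" => $i, "val" => "row$i"], range(1, 10));

// In-PHP equivalent of: SELECT ... WHERE id > :lastId ORDER BY id LIMIT :limit
function fetchPage(array $rows, int $lastId, int $limit): array
{
    $after = array_filter($rows, fn ($r) => $r["id"] > $lastId);
    return array_slice(array_values($after), 0, $limit);
}

$page1 = fetchPage($rows, 0, 3);        // ids 1, 2, 3
$lastId = end($page1)["id"];            // remember the last id shown
$page2 = fetchPage($rows, $lastId, 3);  // ids 4, 5, 6
```

Each request carries the last seen id instead of a page number, so fetching page N costs the same as fetching page 1.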

Pretty printing JSON and XML documents

26 Oct 2017 in JSON, Shell, XML

I spend a lot of time on the command line, and here's how I pretty-print JSON documents:

# get file contents via cat, curl, wget, etc
cat <path/to/file.json> | python -mjson.tool

You may also use the jq command-line utility (with the identity filter "." to pass the result through):

cat <path/to/file.json> | jq "." | less

For XML documents I use xmllint:

cat <path/to/file.xml> | xmllint --format -

If you have a schema file you can validate the document using something like:

xmllint --noout --schema <path/to/schema.xsd> <path/to/file.xml>

PHP: Generators and PDO

25 Oct 2017 in PHP

Did you know that you can use generators in PHP to fetch data from a database memory-efficiently?

Consider this code to get something from a database table:

<?php
$stmt = $this->pdo->prepare("SELECT * FROM table");
$stmt->execute();
// large typing version...
$rows = [];
foreach ($stmt as $row) {
    $rows[] = $row;
}
return $rows;
// ...or short typing version
return $stmt->fetchAll();

On large datasets this has a direct impact on the memory footprint, and you are probably going to hit PHP's memory limit.

To avoid this problem, you can use a generator instead:

<?php
// yield may only appear inside a function, so wrap
// the loop in a generator method on the class:
function getRows() {
    $stmt = $this->pdo->prepare("SELECT * FROM table");
    $stmt->execute();
    foreach ($stmt as $row) {
        yield $row;
    }
}

Be warned that calling such a method returns a Generator (an Iterator) instead of an actual array, so you may have to wrap it in iterator_to_array() when it is consumed by functions expecting an array. E.g.:

<?php
// use it like
array_map($callable, iterator_to_array($data));
// instead of
array_map($callable, $data);

Writing XML with PHP

20 Oct 2017 in PHP, XML

In PHP there are many different ways to create XML documents from some data. You may use SimpleXML which is a quick way to construct XML documents:

$o = new \SimpleXMLElement('<object/>');
$o->addAttribute('version', '42');
$o->addAttribute('type', 'object');
$o->addChild('child', 'With a value');
echo $o->asXML();

The above code will output:

<?xml version="1.0"?>
<object version="42" type="object"><child>With a value</child></object>

There is also the DOM extension:

$xml = new \DOMDocument('1.0', 'UTF-8');
$root = $xml->createElement('object');
$root->setAttribute('version', '42');
$root->setAttribute('type', 'object');
$root->appendChild(
    $xml->createElement('child', 'With a value')
);
$xml->appendChild($root);
echo $xml->saveXML();

and XMLWriter:

$writer = new XMLWriter();
$writer->openURI('php://output');
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('object');
$writer->writeAttribute('version', '42');
$writer->writeAttribute('type', 'object');
$writer->writeElement('child', 'With a value');
$writer->endElement();
$writer->endDocument();
$writer->flush();

All of the above require a fair amount of typing and boilerplate to achieve the desired result, and as complexity grows you may find yourself demotivated, especially when working with deeply nested data structures.

A good alternative is sabre/xml which wraps XMLReader and XMLWriter making the creation of XML documents a more pleasant experience:

$service = new Sabre\Xml\Service();
echo $service->write('object', function(Sabre\Xml\Writer $writer) {
    $writer->writeAttribute('version','42');
    $writer->writeAttribute('type','object');
    $writer->writeElement('child', 'With a value');
});

The library also supports serializing objects, either via a special xmlSerialize method on the object or by registering a serializer callback for a specific class.

How not to sort by average rating

17 Oct 2017 in Algorithms

An interesting article about rating based sorting:

PROBLEM: You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of "score" to sort by.

WRONG SOLUTION #1: Score = (Positive ratings) − (Negative ratings)

WRONG SOLUTION #2: Score = Average rating = (Positive ratings) / (Total ratings)

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

with a Ruby implementation of:

require 'statistics2'

def ci_lower_bound(pos, n, confidence)
  if n == 0
    return 0
  end
  z = Statistics2.pnormaldist(1 - (1 - confidence) / 2)
  phat = 1.0 * pos / n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)
end
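For reference, a rough PHP port (mine, not from the article); it hardcodes z ≈ 1.96, the normal quantile for 95% confidence, instead of pulling in a statistics library:

```php
<?php
// Lower bound of the Wilson score confidence interval for a
// Bernoulli parameter. $z defaults to ~1.96 (95% confidence).
function ci_lower_bound(int $pos, int $n, float $z = 1.96): float
{
    if ($n === 0) {
        return 0.0;
    }
    $phat = $pos / $n;  // observed fraction of positive ratings
    return ($phat + $z * $z / (2 * $n)
            - $z * sqrt(($phat * (1 - $phat) + $z * $z / (4 * $n)) / $n))
        / (1 + $z * $z / $n);
}
```

Note how a single positive rating (1 of 1) scores far lower than 100 of 100: the bound rewards sample size, which is exactly why it sorts better than the plain average.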

Whole article at evanmiller.org/how-not-to-sort-by-average-rating.html.

How to ship software that actually works

19 Sep 2017 in Software Development

Building great software, software that actually works, takes a lot of time and a lot of effort; it's hard, very hard.

Thomas Fuchs, author of Zepto.js, shares his checklist for getting software projects done:

Learn how to design things for humans

...design things for humans. I don't mean visual design (though that is part of it), I mean looking at a problem and figuring out how to create human-computer interactions that make people successful at solving the problems without having a hard time...

Stick to a few languages. Master them.

If there's a better solution in another language or environment, only use it if it gives you some really amazing advantage. It's often not worth the extra effort of becoming proficient with yet another tool.

Don't underestimate what it means for a production environment: things have to be provisioned, deployed, security-patched and monitored.

Don't follow the hype

Use what works for you. If you're productive in PHP, by all means, use PHP. Of course, sometimes technologies come along that actually measurably increase productivity or have other huge advantages, but it can't be overstated how few and far between those are—perhaps one or two happen in a decade.

Stick to a style

Just like languages, frameworks and libraries, the way you use a language seems to change with the seasons. One month it's "put it all in closures" and the next month you hear that closures are so passé. Reduce cognitive stress while coding and debugging so you have more time to think about the actual problem you want to solve.

Implement that minimum viable solution

It can't be said often enough: when writing code, don't write anything that the code doesn't absolutely need in order to work. Don't anticipate how you may extend the code in the future. It never turns out that way anyway. Concentrate on code that works, and write tests instead of wasting time on too much abstraction.

Avoid complexity

(...)

Coding > Configuration

Avoid pre-fabricated solutions that only solve your problem the first 80%. You're a programmer, not a configurator.

Never stop learning

Perhaps the best way to stay sharp is to occasionally do side projects, open source and perhaps micro-libraries. Experiment and tinker, so you don't lose the joy of creating things out of nothing.

You can read the whole article at http://mir.aculo.us/2015/08/25/how-to-actually-ship-software-that-actually-works/

Hype Driven Development

18 Sep 2017 in Rants

What kind of Hype Driven Developer are you?

  • Reddit driven
  • Conference driven
  • Loudest guy driven decisions
  • Gem/lib/plugin driven
  • Stack Overflow driven

Please answer after reading the article at blog.daftcode.pl:

Software development teams often make decisions about software architecture or technological stack based on inaccurate opinions, social media, and in general on what is considered to be "hot", rather than solid research and any serious consideration of expected impact on their projects.

Software quality at Microsoft

14 Sep 2017 in Rants

I'm a big fan of The Daily WTF, and today this article popped up:

string isValidArticle(string article)

static StringBuilder vsb = new StringBuilder();
internal static string IsValidUrl(string value)
{
    if (value == null)
    {
        return "\"\"";
    }
    vsb.Length= 0;
    vsb.Append("@\"");
    for (int i=0; i<value.Length; i++)
    {
        if (value[i] == '\"')
            vsb.Append("\"\"");
        else
            vsb.Append(value[i]);
    }
    vsb.Append("\"");
    return vsb.ToString();
}

The code, taken on its own, is just bad. But when placed into context, it gets worse. This isn’t just code. It’s part of .NET's System.Runtime.Remoting package.

The method is named IsValidUrl, but it returns a string. It doesn't do any validation! All it appears to do is take any arbitrary string and return that string wrapped as if it were a valid C# string literal.

This entire file has one key job: generating a class capable of parsing data according to an input WSDL file... by using string concatenation.

The real WTF is the fact that you can embed SOAP links in RTF files and Word will attempt to use them, thus running the WSDL parser against the input data. This is code that’s a little bad, used badly, creating an exploited zero-day.

The full source code is available at referencesource.microsoft.com, and I'm left wondering about the software quality at the time this code was written. Apparently nobody reviewed it.

Output JSON from big datasets

05 Sep 2017 in Algorithms, JSON, Software Architecture

The problem: some time ago we were creating a RESTful service to return some JSON-encoded data to a client. The resulting data was so large that the webserver process ran out of memory.

The script was something like:

<?php
header("Content-Type:application/json");
$data = fetchDataFromDatabaseAndTransformIt();
echo json_encode($data);

The proposed solution was to break the data into pieces and echo those pieces one at a time. This way memory consumption stays constant, independent of the size of the data:

<?php
$jsonWriter = new JsonWriter();
$jsonWriter->start();
while (($data = fetchDataFromDatabaseAndTransformItOnePieceAtATime())) {
    $jsonWriter->push($data);
}
$jsonWriter->end();

With the JsonWriter class something like:

class JsonWriter {
    function start() {
        header("Content-Type:application/json");
        echo "[";
    }
    function end() {
        echo "]";
    }
    function push($data) {
        echo json_encode($data) . ",";
    }
}

The only problem was that for an input like ["a", "b", "c"] the output JSON string was invalid: ["a","b","c",] (note the trailing comma).

So we needed to get rid of the last comma without using any output buffering functions. The final solution was simple: output the first element last. Based on that, the JsonWriter class becomes:

class JsonWriter {
    // hold back the first element and emit it last,
    // so the stream never ends with a trailing comma
    private $first;
    private $hasFirst = false;

    function start() {
        header("Content-Type:application/json");
        echo "[";
    }
    function end() {
        if ($this->hasFirst) {
            echo json_encode($this->first);
        }
        echo "]";
    }
    function push($data) {
        if (!$this->hasFirst) {
            // a boolean flag, because empty() would misfire
            // on falsy first elements like 0, "" or []
            $this->first = $data;
            $this->hasFirst = true;
        } else {
            echo json_encode($data) . ",";
        }
    }
}
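To sanity-check the hold-back-the-first-element trick without capturing echoed output, here is a hypothetical buffer-based variant (the class and method names are mine, not from our service); since the payload is a list, it emits a JSON array with [ and ]:

```php
<?php
// Buffer-based sketch of the same streaming idea: hold back the
// first element and append it last, so no trailing comma appears.
class JsonBufferWriter
{
    private $first;
    private $hasFirst = false;
    private $buf = "";

    function start(): void
    {
        $this->buf = "[";
    }

    function push($data): void
    {
        if (!$this->hasFirst) {
            $this->first = $data;  // held back until end()
            $this->hasFirst = true;
        } else {
            $this->buf .= json_encode($data) . ",";
        }
    }

    function end(): string
    {
        if ($this->hasFirst) {
            $this->buf .= json_encode($this->first);
        }
        return $this->buf . "]";
    }
}

$w = new JsonBufferWriter();
$w->start();
foreach (["a", "b", "c"] as $item) {
    $w->push($item);
}
$json = $w->end();  // '["b","c","a"]' is valid JSON, first element last
```

The elements come out reordered (the held-back first element lands at the end), which is fine as long as the consumer doesn't depend on order.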