Coping with type invariance in PHP

By: Thijs Zumbrink
23-02-2017 17:40

In this article I will discuss one of the problems you may run into when depending on PHP's relatively young type system. This problem is type invariance: the inability to change a method's signature when implementing an interface. The signature must match exactly, including the type hints, which is too restrictive for optimal use. In the second half of the article, I give an example based on code in a real library.

Why a type system?

But before diving into that, let's see why we want a type system at all. Since type usage in PHP is optional, the pragmatic option is to just ignore types when they stand between you and your problem. However, I like a type system for the following reasons:

Most obviously, using type hints clarifies the usage of your method: if it takes a plain $phoneNumber parameter, I don't know what format to pass it in. Is it a string, an integer (please don't do that!) or possibly a PhoneNumber object? With a proper annotation the doubt goes away: by type hinting to PhoneNumber, I should be able to pass any valid instance of that class, and the class will help me construct valid instances. This can also be achieved via docblocks, but those are not enforced by the interpreter.
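A minimal sketch of such a value object follows. The PhoneNumber class, its validation rule (6 to 15 digits with an optional leading plus sign) and the describeRecipient() function are assumptions made up for illustration; they do not come from any library.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; the validation rule is an illustrative assumption.
final class PhoneNumber
{
    /** @var string */
    private $number;

    public function __construct(string $number)
    {
        // Strip common separators, then require 6 to 15 digits with an optional "+".
        $normalized = preg_replace('/[\s\-().]/', '', $number);
        if (!preg_match('/^\+?[0-9]{6,15}$/', $normalized)) {
            throw new InvalidArgumentException("Invalid phone number: $number");
        }
        $this->number = $normalized;
    }

    public function toString(): string
    {
        return $this->number;
    }
}

// Hypothetical consumer: the type hint guarantees an already validated number.
function describeRecipient(PhoneNumber $phoneNumber): string
{
    return 'Sending to ' . $phoneNumber->toString();
}

echo describeRecipient(new PhoneNumber('+31 (0)6 1234-5678')), PHP_EOL;
```

Because validation lives in the constructor, any function that type hints to PhoneNumber can skip its own format checks.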

Furthermore, by using type hinting to "explain" the program, you help the compiler and any static analysis tools to reason about your program. This also means that they can help you prevent mistakes. Rather than waiting for your program to blow up on specific cases at runtime, static analysis lets you know beforehand, without even running the code, whether a call is logically valid. I recommend the phpstan tool. Mistakes will surface earlier and as a result are easier to fix, or can be prevented altogether.

Lastly it can help guide the design of your code. If applying type hints or keeping static analysis happy proves difficult, it may be a smell of bad architecture. The most obvious improvement will usually be the introduction of value objects, such as the PhoneNumber mentioned above. Especially when designing an interface, which will be used by others and should provide a good code contract, putting extra thought into parameter and return types will result in a cleaner design. For example, static analysis can point you to a circular dependency when inspecting the types. When solved, this could make your code significantly easier to reason about.

Having said that, if you're of the "pragmatist school" that likes to get things done without too much nitpicking, the rest of this post will probably irritate you. However if you care about the understandability of your code and are willing to stress the details to get there, read on!

Type variance

Here's a little refresher on, or introduction to, the terms type variance, invariance, covariance and contravariance. If you're already familiar with these terms, feel free to skip this section.

The concept is quite logical. In PHP, as in many other languages, the types of parameters are covariant. This means that you may pass a more specific object than needed. For example, you can pass a Triangle value to a method accepting a Shape, since a Triangle is a specific Shape. (Assuming that Triangle inherits from the Shape class.)

Return types are similarly covariant. A method may return something more specific than it promises. If it promises to return a Shape, it is still correct when actually returning a Triangle.
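Both rules can be sketched in a few lines. The Shape and Triangle classes and the two functions below are hypothetical and exist only to illustrate the text.

```php
<?php
declare(strict_types=1);

// Hypothetical classes used only to illustrate the text.
class Shape
{
    public function name(): string
    {
        return 'shape';
    }
}

class Triangle extends Shape
{
    public function name(): string
    {
        return 'triangle';
    }
}

// Accepts any Shape, so passing the more specific Triangle is fine.
function describe(Shape $shape): string
{
    return 'This is a ' . $shape->name();
}

// Promises a Shape, but hands back the more specific Triangle.
function makeShape(): Shape
{
    return new Triangle();
}

echo describe(new Triangle()), PHP_EOL;
echo get_class(makeShape()), PHP_EOL;
```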

The concept turns around when we talk about overriding methods from a base class, or implementing interface methods. In this case, parameter types are contravariant: your overridden method may accept less specific parameters. So for a base method that needs a Triangle, it's perfectly reasonable to override it with one that accepts a Shape instead. Existing calls that pass a Triangle are still supported, after all.

Return types stay covariant in this case: your overridden method may return something more specific than the base method. It's reasonable to return a Triangle, since existing callers expect a Shape anyway.
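The override rules can be sketched as follows, using hypothetical Shape, Triangle and bounder classes that are not part of the article's project. Note the assumption: these declarations are only accepted by PHP 7.4 and later. Older versions reject them with a fatal error, which is the shortcoming the rest of this article deals with.

```php
<?php
declare(strict_types=1);

// Hypothetical classes; requires PHP 7.4 or later (assumption, see above).
class Shape {}
class Triangle extends Shape {}

class TriangleBounder
{
    // Base method: needs a Triangle, promises some Shape.
    public function bound(Triangle $input): Shape
    {
        return new Shape();
    }
}

class AnyShapeBounder extends TriangleBounder
{
    // Contravariant parameter: accepts the less specific Shape.
    // Covariant return type: promises the more specific Triangle.
    public function bound(Shape $input): Triangle
    {
        return new Triangle();
    }
}

$bounder = new AnyShapeBounder();
echo get_class($bounder->bound(new Triangle())), PHP_EOL; // old callers still work
echo get_class($bounder->bound(new Shape())), PHP_EOL;    // new callers may pass any Shape
```

Code written against TriangleBounder keeps working when handed an AnyShapeBounder: it passes a Triangle, which is a Shape, and receives a Triangle, which is also a Shape.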

These are both examples of type variance. On the other hand, invariance simply means that whatever values you pass, or whatever method you override, it must exactly match the specification.

PHP's type invariance

You may think that PHP already does this well, but you'd be mistaken. For program execution all is fine: passed parameters and returned values are both covariant. However, when overriding base methods or implementing interface methods, all types become invariant.
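As a small sketch of what invariance means in practice, consider the hypothetical ShapeFactory interface below (the names are invented for illustration). The implementation has to copy the declared return type from the interface, even though it only ever returns a Triangle.

```php
<?php
declare(strict_types=1);

// Hypothetical classes used only to illustrate invariance.
class Shape {}
class Triangle extends Shape {}

interface ShapeFactory
{
    public function create(): Shape;
}

class TriangleFactory implements ShapeFactory
{
    // Under invariance the return type is copied from the interface verbatim.
    // Declaring ": Triangle" here is what the PHP versions described in this
    // article reject.
    public function create(): Shape
    {
        // The returned value itself may still be more specific.
        return new Triangle();
    }
}

$shape = (new TriangleFactory())->create();
var_dump($shape instanceof Triangle);
```

The value is a Triangle at runtime, but callers and static analysis tools only see the declared Shape.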

This has consequences for the design of your code. When leveraging the type system to guide program design, expecting reasonable type variance is not too much to ask. In some cases it makes a good design impossible to actually implement since we must exactly copy the method signature prescribed by the interface. Specifically I will focus on return type invariance, since it's the most annoying one.

Fortunately this is not intentional. When the RFC for return type declarations was first written, it allowed for covariant return type hints. However, due to implementation difficulties related to autoloading, this was changed to invariant return types, with a note that it should be improved later. Read an explanation of the issue here.

Invariance obstacle

So what is actually the issue with all this? I stumbled into this when designing an interface between a (turn-based) game and an AI that decides the moves to play. Consider TicTacToe, where the state of the game is stored in an instance of the TicTacToeState class. Applying a move results in a new state, following immutability best practices:

class TicTacToeState
{
    public function makeMove(TicTacToeMove $move): TicTacToeState
    {
        $newState = clone $this;
        // (fill the cell using coordinates in $move...)
        // (proceed the turn order...)
        return $newState;
    }

    public function getBoard(): TicTacToeBoard
    {
        // Example of another method in this class
        // Contents not important
        return $this->board;
    }
}


That seems reasonable, and a runner for the game could look something like this:

$state = new TicTacToeState();

while (...) {
    $coordinates = getInputCoordinates();
    $move = new TicTacToeMove($coordinates);
    $state = $state->makeMove($move);
    $board = $state->getBoard();
    displayBoard($board);
}


The issue arises when creating an interface for an AI. The interface needs to contain (among other things) the makeMove method, so that multiple moves can be evaluated and the best one picked. What if we try this:

interface Move
{

}

interface GameState
{
    public function makeMove(Move $move): GameState;

    // (In reality we need more methods in this interface, such as
    // a method that gives us all possible moves. Otherwise the AI
    // cannot discover its choices. Also a method that evaluates the
    // score. They are left out for simplicity.)
}


When we try this and apply the variance theory, we can see that the structure of the program is incorrect. The problem lies in the $move parameter. Using the arguments of type theory, we may implement this method by accepting a Move parameter or something more general. But what we need is a TicTacToeMove! How else are we going to know the coordinates of the intended cell? This is a fundamental flaw, not even caused by PHP's invariance shortcoming.

Intuitively we can think of it as the AI possibly (although in practice it doesn't happen) wanting to pass other Move instances as well. And passing a BattleshipMove into a TicTacToeState doesn't sound very appealing. And while not a huge problem, if we can eliminate a possible illegal use of our method, I'm all for it.
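To make the flaw concrete, here is a sketch of what an implementation of this first interface is forced to do. The BattleshipMove class and the empty class bodies are assumptions for illustration; the check inside makeMove() is one possible way to guard against the wrong kind of move.

```php
<?php
declare(strict_types=1);

interface Move {}

interface GameState
{
    public function makeMove(Move $move): GameState;
}

// Hypothetical, empty move classes; real ones would carry coordinates.
class TicTacToeMove implements Move {}
class BattleshipMove implements Move {}

class TicTacToeState implements GameState
{
    public function makeMove(Move $move): GameState
    {
        // The signature admits every Move, so a mismatch is only caught at runtime.
        if (!$move instanceof TicTacToeMove) {
            throw new InvalidArgumentException(
                'Expected a TicTacToeMove, got ' . get_class($move)
            );
        }
        return clone $this;
    }
}

$state = new TicTacToeState();
try {
    $state->makeMove(new BattleshipMove());
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

The type system cannot rule out the BattleshipMove call, which is exactly the illegal use we would like to eliminate.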

So we restructure the code and interface a little bit. The idea is to pass the TicTacToeState and Coordinates into the TicTacToeMove constructor and keep them contained in the move. This way, any validly constructed TicTacToeMove only acts on TicTacToeStates. We arrive at the following solution:

interface Move
{
    // Any source GameState data should be provided by the concrete
    // program implementation, e.g. via the constructor.

    public function apply(): GameState;
}

interface GameState
{
    // (Other interface methods)
}

class TicTacToeMove implements Move
{
    /** @var TicTacToeState */
    private $sourceState;

    /** @var Coordinates */
    private $coords;

    public function __construct(
        TicTacToeState $state,
        Coordinates $coords
    ) {
        // Keep the source state and target cell contained in the move
        $this->sourceState = $state;
        $this->coords = $coords;
    }

    public function apply(): TicTacToeState
    {
        $newState = clone $this->sourceState;
        // (fill the cell using $this->coords...)
        // (proceed the turn order in $newState...)
        return $newState;
    }
}


This solves the design issue with the parameter, but it won't run. The problem is that the concrete class specifies TicTacToeState as a return type, but PHP's invariance forces us to specify GameState instead. If we do that the method remains correct, but calling it from the game execution code would go awry:

$state = new TicTacToeState();

while (...) {
    $coordinates = getInputCoordinates();
    $move = new TicTacToeMove($state, $coordinates);
    $state = $move->apply();
    $board = $state->getBoard();
    // ERROR: $state is a GameState, not a TicTacToeState!
    // Therefore getBoard() does not exist.
    displayBoard($board);
}


Granted, it's only an error in static analysis tools with the check level tuned to strict, but if that's actually what we want, it shows that our solution is invalid. And this time it's caused by PHP's return type invariance, not our design.

Solutions

We cannot put getBoard() into the GameState interface, since it is specific to TicTacToe. Furthermore the AI system has nothing to do with boards, it just needs to know about moves and their effects.

We could accept the GameState and cast it again to a TicTacToeState, but that feels wrong for something that could be solved with a proper type system. Also, I'm sure the static analysis wouldn't be too happy about that.
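PHP has no cast operator for object types, so in practice such a "cast" would be an instanceof check. The sketch below shows one way it could look; the asTicTacToeState() helper and the simplified classes (a move without coordinates, a getBoard() that returns a string) are assumptions for illustration.

```php
<?php
declare(strict_types=1);

interface GameState {}

interface Move
{
    public function apply(): GameState;
}

// Simplified stand-in for the article's class; getBoard() returns a string here.
class TicTacToeState implements GameState
{
    public function getBoard(): string
    {
        return '...|...|...';
    }
}

class TicTacToeMove implements Move
{
    /** @var TicTacToeState */
    private $sourceState;

    public function __construct(TicTacToeState $state)
    {
        $this->sourceState = $state;
    }

    // Signature copied from the interface, as invariance demands.
    public function apply(): GameState
    {
        return clone $this->sourceState;
    }
}

// Hypothetical helper: narrows a GameState to a TicTacToeState or fails loudly.
function asTicTacToeState(GameState $state): TicTacToeState
{
    if (!$state instanceof TicTacToeState) {
        throw new LogicException('Expected TicTacToeState, got ' . get_class($state));
    }
    return $state;
}

$move = new TicTacToeMove(new TicTacToeState());
$state = asTicTacToeState($move->apply());
echo $state->getBoard(), PHP_EOL;
```

This moves a check to runtime that the type system could have settled up front.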

We could return the GameState type, but add a docblock explaining that it's actually a TicTacToeState. That would at least make our editor happy, and static analysis might also accept it depending on the tool used.

Another possibility is to have the concrete TicTacToe code call a method that properly returns a TicTacToeState, while the interfaced AI code calls another method that returns a GameState. This costs some method pollution, but ultimately it keeps PHP and static analysis tools happy:

interface Move
{
    // Any source GameState data should be provided by the concrete
    // program implementation, e.g. via the constructor.

    public function applyAbstract(): GameState;
}

interface GameState
{
    // (Other interface methods)
}

class TicTacToeMove implements Move
{
    /** @var TicTacToeState */
    private $sourceState;

    /** @var Coordinates */
    private $coords;

    public function __construct(
        TicTacToeState $state,
        Coordinates $coords
    ) {
        // Keep the source state and target cell contained in the move
        $this->sourceState = $state;
        $this->coords = $coords;
    }

    /**
     * @return TicTacToeState
     * @todo fix return type hint when return type variance is supported
     * for overriding/implementing methods.
     */
    public function applyAbstract(): GameState
    {
        return $this->apply();
    }

    public function apply(): TicTacToeState
    {
        $newState = clone $this->sourceState;
        // (fill the cell using $this->coords...)
        // (proceed the turn order...)
        return $newState;
    }
}
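To show how the two entry points are meant to be used, here is a self-contained sketch. It simplifies the classes above under stated assumptions: the board is a public array of nine cells, a move targets an integer cell index instead of a Coordinates object, every move places an 'X', and expand() is a hypothetical stand-in for the AI code.

```php
<?php
declare(strict_types=1);

interface GameState {}

interface Move
{
    public function applyAbstract(): GameState;
}

// Simplified stand-in: the board is a flat array of nine cells.
class TicTacToeState implements GameState
{
    /** @var string[] */
    public $cells;

    public function __construct()
    {
        $this->cells = array_fill(0, 9, ' ');
    }
}

class TicTacToeMove implements Move
{
    private $sourceState;
    private $cell;

    public function __construct(TicTacToeState $state, int $cell)
    {
        $this->sourceState = $state;
        $this->cell = $cell;
    }

    public function applyAbstract(): GameState
    {
        return $this->apply();
    }

    public function apply(): TicTacToeState
    {
        $newState = clone $this->sourceState;
        $newState->cells[$this->cell] = 'X'; // simplification: always plays 'X'
        return $newState;
    }
}

/**
 * Hypothetical AI helper: it knows only the interfaces.
 * @param Move[] $moves
 * @return GameState[]
 */
function expand(array $moves): array
{
    return array_map(function (Move $move): GameState {
        return $move->applyAbstract();
    }, $moves);
}

$start = new TicTacToeState();
$move = new TicTacToeMove($start, 4);

// Game code: concrete return type, so TicTacToe-specific members are available.
$next = $move->apply();
echo $next->cells[4], PHP_EOL;

// AI code: works with the abstract types only.
$successors = expand([$move]);
echo count($successors), PHP_EOL;
```

The game code never sees a bare GameState, and the AI code never needs to know about TicTacToe.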


Conclusion

We have seen how paying attention to the type system can guide the design of your code. If you are a "type system believer", you will notice that PHP has an unintended shortcoming that might bite you, as illustrated with a real example. (From the minimax project.) This shortcoming will most likely be fixed, but it is not yet known when that will happen.

As we have also seen, invariance problems can have different underlying causes. Some actually point to design flaws, while others are only problematic because of the shortcoming in PHP. This can cause confusion, so think carefully before attributing a problem to one cause or the other.
