Index

This file was automatically generated from http://svn.pugscode.org/pugs/docs/notes/theory.pod on Wed Jun 6 22:16:47 2007 GMT, revision 16639.

Perl 6 Theories


TITLE

Perl 6 Theories - Grand unified object model

SYNOPSIS

    # theory representing a total order
    theory Ordered{::T} {
        multi infix:«<=» (T $x, T $y --> Bool) {...}
        multi infix:«>=» (T $x, T $y --> Bool) { $y <= $x }
        multi infix:«>»  (T $x, T $y --> Bool) { not $x <= $y }
        ...
    }

    # a cons list data type
    union MyList[::T] (
        MyNil,
        MyCons(T $.head, MyList[T] $.tail),
    );
    
    # Lists of ordered things are lexographically ordered
    # (read <= as "if" rather than "less than or equal to")
    model Ordered{MyList[::T]} <= Ordered{::T} {
        multi infix:«<=» (MyNil,  _)     { true }
        multi infix:«<=» (MyCons, MyNil) { false }
        multi infix:«<=» (MyCons ($h1, $t1), MyCons ($h2, $t2)) {
            when $h1 <= $h2 { true }
            when $h2 >  $h1 { false }
            default         { $t1 <= $t2 }
        }

        # the theory automatically fills in <, >, >=, ==, etc.
    }

DESCRIPTION

If Perl 6 is to have a strongly typed dialect but still maintain its dynamicity, it must have a theoretical model for its typing. Strong typing without a model ends up with kludges and special cases (see C++). This document presents a model that is precise and general, but still feels like Perl.

The root of this model is the distinction between interface and implementation. In particular, it holds the idea of a mock object very dearly; that is, if you can create an object that pretends to be of a particular type, then it can be used anywhere that a real object of that type can. In fact, this proposal destroys the concept of a mock object, by saying that any object that conforms to the interface of a type is of that type.

You can never refer to a concrete type explicitly, and you cannot create a concrete type without creating an interface to go along with it.

Before we get to the type theory, there are two pieces of machinery that need to be built: tuples and pattern constructors.

Tuples

The first piece of machinery we need is the tuple. The tuple underlies all of the semantics here, but you never really see it. Perl 6's tuple is much richer than tuples of other languages; in short, it is an object representing a parameter list.

The tuple has three parts: positional, named, and block, any of which can be empty. It is constructed using the comma.

    my $tuple = (1, :foo(2), 3, 4):{ $_ + 1 };

This tuple has three positionals, 1,3,4; one named, foo => 2; and one block, { $_ }. The block parameters are specified as adverbs on the tuple as a whole. If you want a block inside a positional, just put one there. If you want a pair inside a positional, enclose it in parentheses (that is, named elements are lexically determined; in the tuple (1,$x), $x will always be positional, whether or not it contains a pair):

    my $tuple = ({ $_ + 1 }, (foo => 2));  # two positionals

Unlike in most other languages, a single-argument tuple is not the same as a value. To create a single-argument tuple (which ought to be an uncommon thing), use a null comma at the end:

    my $tuple = ("foo",);                  # a tuple with one positional

Tuples, like lists, may be flattened into other tuples:

    my $a = (1,2,3);    # a tuple with positionals 1,2,3
    my $b = (0, [,]($a), 4);  # a tuple with positionals 0,1,2,3,4

As a special case, function application will automatically (lexically) tuplize a single argument:

    foo($x);            # equivalent to foo($x,)

And therefore, you can think of all functions as taking only a single argument: the tuple of the argument list. This will be important later.

Lists and tuples should interact in a reasonable way so that you can pretend that they're the same thing (but in a strongly typed dialect, they are quite different). XXX handwave

Since a tuple is essentially an argument list, a tuple pattern is essentially a parameter list. The meaning of the slurpy scalar *$x has changed. Now it is just like a slurpy list, except that it gathers the rest of the signature into a tuple $x instead of a list. To facilitate the technique that the slurpy scalar was originally introduced for, we can use this:

    sub first (*[ $x, *@xs ]) { $x }

The upshot is that you can easily delegate an arbitrary argument list to another function:

    sub foo (*$args) { bar([,] $args) }   # call bar with whatever args foo
                                       # was called with

Tuples also have a pattern constructor, comma, for use in signatures and bindings. So you can say:

    multi snd (($x, $y)) { $y }

snd takes only one argument (no, I mean that in the usual sense, not the every-function-takes-exactly-one-argument sense), where that argument is a two-tuple, and it extracts the second element. Now, what the heck is a pattern constructor?

Pattern Constructors

Before we get to exactly what a pattern constructor is, let's define what a pattern is. A pattern is a thing that can extract "subinformation" out of an object as it's being bound. It appears in argument lists like so:

    rule parameter { <type> <zone> <parameter_name>  <pattern>? 
                   |        <zone> <parameter_name>? <pattern>? }

There are several patterns with special syntax: square brackets for matching inside arrays, curly brackets for matching out of hashes, bare parentheses for matching a tuple, all numeric and string literals (for matching themselves under ===), perhaps among others. These either require a set of syntactic categories (like pattern:term:<...>) or a straight-up grammar munge.

The patterns that are easy to define are the normal patterns. These are given by the rule:

    rule normal_pattern { <pattern_name> <tuple_pattern>? }

(If the tuple_pattern is omitted, the empty tuple () is assumed)

So you'll have things like:

    multi traverse (Leaf ($v), *&code) { &code($v) }
    multi traverse (Branch ($l, $r), *&code) { 
        traverse($l):&code;
        traverse($r):&code;
    }

Where that first line could have also been either of:

    multi traverse ($leaf Leaf ($v), *&code)   {...}
    multi traverse (Tree _ Leaf ($v), *&code)  {...}

But those don't add anything to the readability in this case.

How does one define a normal pattern constructor? By placing names into the pattern: grammatical category. A normal pattern constructor takes a type and returns a tuple, which is then unified against the given tuple pattern.

For instance, I could define an identity pattern that would just return the one-tuple of the object:

    sub pattern:<Id> ($obj) { ($obj,) }

Or the typed version:

    sub pattern:<Id> (::T $obj --> (::T,)) { ($obj,) }

You could use this pattern to give more than one name to an argument:

    sub mul3 ($x Id ($y Id ($z))) { $x + $y + $z }

Other than that, it's not useful, of course. But that goes to show that if the pattern constructors are typed, they imply those type constraints on the corresponding argument:

    sub pattern:<Leaf> (Tree $t --> (Any,)) {...}

    multi traverse (Tree _ Leaf ($value)) {...}

The Tree in the second declaration is redundant; the Leaf pattern already implies it.

Classes and Roles

There are two ways to create a concrete type, otherwise known as a class. One is with the class declarator, the other is the union declarator (which we'll get to in a bit). A class defined by its instance data (or attributes) and its interface (or methods).

    class Account {
        has $.balance;     # instance data and interface
        has $.name;        # instance data and interface
        has $._password;   # instance data
        
        method withdrawl ($amount, $password) {  # interface
            # ...
        }
    }

This definition does two things: it defines the package Account with a new constructor (which puts together the instance data), and it defines the role Account that contains pretty much the body of the class, specifying its interface with default implementations (when you say has $.balance, you are specifying the interface method balance with a default implementation of object state). Whenever you say Account in type position (after a my or in a signature), you are referring to this role. Whenever you say Account elsewhere, you are referring to the package.

To subclass a class, you say that a new class does the old one:

    class CreditAccount {
        does Account;
        
        method withdrawl ($amount, $password) {
            # try to borrow some money
        }
    }

Of course, you are specializing the role Account, not the class. In fact, the class doesn't exist at all; it is just a notation to define the package, concrete type, and role all at once.

Unions and Factories

Perl 6 supports union types (also called case types or algebraic data types). Union types are to factores what classes are to roles (more on that later). To demonstrate the syntax, let's define a simple binary tree:

    union Tree (
        Leaf($value),
        Branch(Tree $left, Tree $right),
    );

The case types (Leaf and Branch) are defined as a constructor name followed by a tuple pattern; i.e. an argument list. Like class, there is no such thing as the union Tree; it is just a notation to define a bunch of related things:

It defines the packages Tree, Tree::Leaf, and Tree::Branch. It defines (so far empty) roles Tree::Leaf and Tree::Branch, and the factory Tree:

    # automatically defined
    factory Tree {
        generator Leaf ($value) {...}
        generator Branch (Tree $left, Tree $right) {...}
    }

A factory is the dual of a role: it defines a bunch of functions that return Trees (where roles define functions that accept trees). generator is the corresponding dual to method (it returns the thing in question). We'll see why there's all this vocabulary later.

Unions can be parameterized over one or more types:

    union Tree[::T] (
        Leaf(T $value),
        Branch(Tree[T] $left, Tree[T] $right),
    );

And the return type of the constructor functions can be stated explicitly to form a GADT:

    union AST[::] (    # :: is a siglet meaning "a type"
        Const(::T                          --> AST[::T]),
        Cond(AST[Bool], AST[::T], AST[::T] --> AST[::T]),
        Succ(AST[Int]                      --> AST[Int]),
        ...
    );

If the argument names to the constructors are given $.attribute form, then attribute accessors are defined on the type:

    union Tree (
        Leaf($.value),
        Branch(Tree $.left, Tree $.right),
    );
    Leaf(4).value;                         # 4
    Branch(Leaf(2), Leaf(3)).right.value;  # 3
    Leaf(4).left;   # method found but pattern match failed

Unions can be extended:

    union AnnotatedTree does Tree (
        Annotation($.str, Tree $.tree),
    );

When you extend a union, you create a superset (but still a subtype). That is, everything that is a Tree is also an AnnotatedTree (superset), but an AnnotatedTree may be used anywhere a Tree can.

Theories

Roles and factories are special cases of a much more general structure called a theory. A theory declares a general algebraic structure over several types. A good example of a theory that is not a role or a factory is the definition of a ring (roughly an integer-like thing):

    theory Ring{::R} {
        multi infix:<+>   (R, R --> R) {...}
        multi prefix:<->  (R --> R)    {...}
        multi infix:<->   (R $x, R $y --> R) { $x + (-$y) }
        multi infix:<*>   (R, R --> R) {...}
        # only technically required to handle 0 and 1
        multi coerce:<as> (Int --> R)  {...}
    }

This says that in order for a type R to be a ring, it must supply these operations. The operations are necessary but not sufficient to be a ring; you also have to obey some algebraic laws (which are, in general, unverifiable programmatically), for instance, associativity over addition: (a + b) + c == a + (b + c).

To instantiate the theory, that is, declare that some type obeys these laws, you model it:

    model Ring{Int} {
        # define all needed operations
    }

model is really just a shorthand for an anonymous theory that models another:

    theory {
        models Ring{Int};
        # definitions
    }

Saying that a theory models another theory says that if you obey the laws of the former, then you automatically obey the laws of the latter.

How do we actually use theories? We can specify constraints over type variables with them:

    multi pow (::T $x, Int $power --> ::T <= Ring{::T}) {
        my $result = $x;
        for (2..$power) { $result *= $x }
        return $result;
    }

This says that pow works on any first argument, as long as it belongs to a type that models Ring. <= is to be read as "where" or "requires" or "is implied by". It needs a better spelling.

Roles and factories reduce to theories using a topic type. So the role:

    role Foo {
        method bar () {...}
    }

Is equivalent to the theory:

    theory Foo{::T} {
        multi bar (T) {...}
    }

Because the role specifies a topic type (which we named ::T when we converted to a theory), and method defines a multi with its first argument being the topic type. It is an error to define a method when there is no topic type.

Conversely, a factory just reduces to a theory with a topic type. generator defines a multi with its return type being the topic type.

But roles and factories are a bit more special than theories. They have one additional property each:

That is, every subset of something that does a role also does that role, and every superset of something that does a factory also does that factory. This is self-evident when you notice that roles can only have methods and factories can only have generators.

When you specify a type in a type signature, you're really specifying a type variable with a constraint:

    multi foo (Bar $x) {...}

Is equivalent to:

    multi foo (::T $x <= Bar{::T}) {...}

And so a role that returns the type the role defines is still a role, even though it appears to depend on the type contravariantly:

    role Foo {
        method bar (--> Foo) {...}
    }

Is equivalent to:

    theory Foo{::T} {
        multi bar (T --> Foo) {...}
        # which is
        multi bar (T --> ::U <= Foo{U}) {...}
    }

That is, you don't return the same type, you just return one that conforms to the same interface.

Finally, let's look at the full reduction of our Tree union type and try to understand it:

    union Tree (
        Leaf($value),
        Branch(Tree $left, Tree $right),
    );

This turns into:

    theory Leaf{::T} {   # expanded from "role Leaf"
        multi pattern:<Leaf> (T --> (Any,)) {...}
    }

    theory Branch{::T} {
        multi pattern:<Branch> (T --> (::U, ::V) 
                                <= Tree{::U} && Tree{::V}) {...}
    }

    theory Tree{::T} <= Leaf{::T} || Branch{::T} {   # note the or
        multi Leaf ($value --> T) {...}
        multi Branch (::A $left, ::B $right --> T 
                        <= Tree{::A} && Tree{::B}) 
        {...}
    }

Whew, good thing we don't have to write that. As you can see, even though it looks like Branch is behaving covariantly by looking at the union, it isn't. A branch accepts any two things that behave like this Tree, not the tree itself.

Vocabulary

Here is a summary of the object-oriented vocabulary in Perl:

* theory
The big one in the structures department. Specifies a general algebra over one or more types.
* role
A covariant, unary theory, that is, a theory over a single type where the functions defined within only take the type as a parameter, never return it.
* factory
A contravariant, unary theory, that is, a theory over a single type where the functions defined within only return the type, never take it as a parameter.
* class
A notation that simultaneously defines a role, a concrete type, and a package of the given name.
* union
A notation that simultaneously defines a factory together with a concrete type, a role, and a package for each case in the factory.
* model
A notation for a zero-parameter theory that models another theory.
* models
Declares implication over structures. If one structure models another, that means that if a type obeys the former, then it obeys the latter.
* does
A unary shorthand for models. A does B is a shorthand for A{::T} models B{::T}.
* multi
The big one in the functions department. Specifies a multiply-implemented function dispatched based on the involved types.
* method
Goes with roles. Defines a multi that implicitly takes the topic type as its first parameter.
* generator
Goes with factories. Defines a multi that implicitly returns the topic type.

Notes

There's a little problem with the GADT form:

    union AST[::] (
        Const(::T --> AST[::T]),
    );

This convolves to:

    theory AST{::A, ::B} {
        multi Const(::T --> ::U <= AST[::T]{::U}) {...}
    }

Which doesn't even say anything about ::B. It should be:

    theory AST{V} {
        multi Const(::T --> ::U <= V{::T, ::U}) {...}
    }

That is, it is a theory-valued theory. I guess this makes sense, as all types are really theories. But so far type variables can only assume concrete types; that's the cool thing about them. You can never explicitly mention a concrete type, but that's the only thing a variable can be, so you have to go through interfaces.

I suppose theory-valued theories are necessary if we want GADT (and there are probably some other uses as well), but I fear what it will do to the type inferencer. Maybe there is another way.

UPDATE: There is, and here it is. Each case gets its own theory, which lifts the return value from the constructor. Let's consider a two-constructor union:

    union AST[::] (
        Const(::T         --> AST[::T]),
        TestZero(AST[Int] --> AST[Bool]),
    );
    

This convolves to:

    theory AST::Const{::T, ::ASTT} {
        multi Const(T --> ASTT) {...}
    }

    theory AST::TestZero{::ASTBool} {
        multi TestZero(::T --> ASTBool 
                    <= AST{::TInt, ::T} && Int{::TInt}) {...}
    }

    # I'll use => as a shorthand for "implies"; i.e. a => b is
    # equivalent to !a || b
    theory AST{::Param, ::Tree}
            <= AST::Const{::Param, ::Tree}
            && (Bool{::Param} => AST::TestZero{::Tree})
    { }

So it all reduces to complex logic between concrete (non-variable) theories T and type variables ::V in the form T{::V}.