This file was automatically generated from http://svn.pugscode.org/pugs/docs/notes/theory.pod on Wed Jun 6 22:16:47 2007 GMT, revision 16639.
Perl 6 Theories - Grand unified object model
# theory representing a total order
theory Ordered{::T} {
multi infix:«<=» (T $x, T $y --> Bool) {...}
multi infix:«>=» (T $x, T $y --> Bool) { $y <= $x }
multi infix:«>» (T $x, T $y --> Bool) { not $x <= $y }
...
}
# a cons list data type
union MyList[::T] (
MyNil,
MyCons(T $.head, MyList[T] $.tail),
);
# Lists of ordered things are lexographically ordered
# (read <= as "if" rather than "less than or equal to")
model Ordered{MyList[::T]} <= Ordered{::T} {
multi infix:«<=» (MyNil, _) { true }
multi infix:«<=» (MyCons, MyNil) { false }
multi infix:«<=» (MyCons ($h1, $t1), MyCons ($h2, $t2)) {
when $h1 <= $h2 { true }
when $h2 > $h1 { false }
default { $t1 <= $t2 }
}
# the theory automatically fills in <, >, >=, ==, etc.
}
If Perl 6 is to have a strongly typed dialect but still maintain its dynamicity, it must have a theoretical model for its typing. Strong typing without a model ends up with kludges and special cases (see C++). This document presents a model that is precise and general, but still feels like Perl.
The root of this model is the distinction between interface and implementation. In particular, it holds the idea of a mock object very dearly; that is, if you can create an object that pretends to be of a particular type, then it can be used anywhere that a real object of that type can. In fact, this proposal destroys the concept of a mock object, by saying that any object that conforms to the interface of a type is of that type.
You can never refer to a concrete type explicitly, and you cannot create a concrete type without creating an interface to go along with it.
Before we get to the type theory, there are two pieces of machinery that need to be built: tuples and pattern constructors.
The first piece of machinery we need is the tuple. The tuple underlies all of the semantics here, but you never really see it. Perl 6's tuple is much richer than tuples of other languages; in short, it is an object representing a parameter list.
The tuple has three parts: positional, named, and block, any of which can be empty. It is constructed using the comma.
my $tuple = (1, :foo(2), 3, 4):{ $_ + 1 };
This tuple has three positionals, 1,3,4; one named, foo => 2;
and one block, { $_ }. The block parameters are specified as adverbs
on the tuple as a whole. If you want a block inside a positional, just
put one there. If you want a pair inside a positional, enclose it in
parentheses (that is, named elements are lexically determined; in the
tuple (1,$x), $x will always be positional, whether or not it contains a
pair):
my $tuple = ({ $_ + 1 }, (foo => 2)); # two positionals
Unlike in most other languages, a single-argument tuple is not the same as a value. To create a single-argument tuple (which ought to be an uncommon thing), use a null comma at the end:
my $tuple = ("foo",); # a tuple with one positional
Tuples, like lists, may be flattened into other tuples:
my $a = (1,2,3); # a tuple with positionals 1,2,3
my $b = (0, [,]($a), 4); # a tuple with positionals 0,1,2,3,4
As a special case, function application will automatically (lexically) tuplize a single argument:
foo($x); # equivalent to foo($x,)
And therefore, you can think of all functions as taking only a single argument: the tuple of the argument list. This will be important later.
Lists and tuples should interact in a reasonable way so that you can pretend that they're the same thing (but in a strongly typed dialect, they are quite different). XXX handwave
Since a tuple is essentially an argument list, a tuple pattern is
essentially a parameter list. The meaning of the slurpy scalar *$x
has changed. Now it is just like a slurpy list, except that it gathers
the rest of the signature into a tuple $x instead of a list. To
facilitate the technique that the slurpy scalar was originally
introduced for, we can use this:
sub first (*[ $x, *@xs ]) { $x }
The upshot is that you can easily delegate an arbitrary argument list to another function:
sub foo (*$args) { bar([,] $args) } # call bar with whatever args foo
# was called with
Tuples also have a pattern constructor, comma, for use in signatures and bindings. So you can say:
multi snd (($x, $y)) { $y }
snd takes only one argument (no, I mean that in the usual sense, not
the every-function-takes-exactly-one-argument sense), where that
argument is a two-tuple, and it extracts the second element. Now, what
the heck is a pattern constructor?
Before we get to exactly what a pattern constructor is, let's define what a pattern is. A pattern is a thing that can extract "subinformation" out of an object as it's being bound. It appears in argument lists like so:
rule parameter { <type> <zone> <parameter_name> <pattern>?
| <zone> <parameter_name>? <pattern>? }
There are several patterns with special syntax: square brackets for
matching inside arrays, curly brackets for matching out of hashes, bare
parentheses for matching a tuple, all numeric and string literals (for
matching themselves under ===), perhaps among others. These either
require a set of syntactic categories (like pattern:term:<...>)
or a straight-up grammar munge.
The patterns that are easy to define are the normal patterns. These are given by the rule:
rule normal_pattern { <pattern_name> <tuple_pattern>? }
(If the tuple_pattern is omitted, the empty tuple () is assumed)
So you'll have things like:
multi traverse (Leaf ($v), *&code) { &code($v) }
multi traverse (Branch ($l, $r), *&code) {
traverse($l):&code;
traverse($r):&code;
}
Where that first line could have also been either of:
multi traverse ($leaf Leaf ($v), *&code) {...}
multi traverse (Tree _ Leaf ($v), *&code) {...}
But those don't add anything to the readability in this case.
How does one define a normal pattern constructor? By placing names into
the pattern: grammatical category. A normal pattern constructor
takes a type and returns a tuple, which is then unified against the
given tuple pattern.
For instance, I could define an identity pattern that would just return the one-tuple of the object:
sub pattern:<Id> ($obj) { ($obj,) }
Or the typed version:
sub pattern:<Id> (::T $obj --> (::T,)) { ($obj,) }
You could use this pattern to give more than one name to an argument:
sub mul3 ($x Id ($y Id ($z))) { $x + $y + $z }
Other than that, it's not useful, of course. But that goes to show that if the pattern constructors are typed, they imply those type constraints on the corresponding argument:
sub pattern:<Leaf> (Tree $t --> (Any,)) {...}
multi traverse (Tree _ Leaf ($value)) {...}
The Tree in the second declaration is redundant; the Leaf pattern
already implies it.
There are two ways to create a concrete type, otherwise known as a
class. One is with the class declarator, the other is the union
declarator (which we'll get to in a bit). A class defined by its
instance data (or attributes) and its interface (or methods).
class Account {
has $.balance; # instance data and interface
has $.name; # instance data and interface
has $._password; # instance data
method withdrawl ($amount, $password) { # interface
# ...
}
}
This definition does two things: it defines the package Account with
a new constructor (which puts together the instance data), and it
defines the role Account that contains pretty much the body of the
class, specifying its interface with default implementations (when you
say has $.balance, you are specifying the interface method balance
with a default implementation of object state). Whenever you say
Account in type position (after a my or in a signature), you are
referring to this role. Whenever you say Account elsewhere, you are
referring to the package.
To subclass a class, you say that a new class does the old one:
class CreditAccount {
does Account;
method withdrawl ($amount, $password) {
# try to borrow some money
}
}
Of course, you are specializing the role Account, not the class. In
fact, the class doesn't exist at all; it is just a notation to define
the package, concrete type, and role all at once.
Perl 6 supports union types (also called case types or algebraic data types). Union types are to factores what classes are to roles (more on that later). To demonstrate the syntax, let's define a simple binary tree:
union Tree (
Leaf($value),
Branch(Tree $left, Tree $right),
);
The case types (Leaf and Branch) are defined as a constructor
name followed by a tuple pattern; i.e. an argument list. Like
class, there is no such thing as the union Tree; it is just a
notation to define a bunch of related things:
It defines the packages Tree, Tree::Leaf, and Tree::Branch. It
defines (so far empty) roles Tree::Leaf and Tree::Branch, and the
factory Tree:
# automatically defined
factory Tree {
generator Leaf ($value) {...}
generator Branch (Tree $left, Tree $right) {...}
}
A factory is the dual of a role: it defines a bunch of functions that
return Trees (where roles define functions that accept trees).
generator is the corresponding dual to method (it returns the
thing in question). We'll see why there's all this vocabulary later.
Unions can be parameterized over one or more types:
union Tree[::T] (
Leaf(T $value),
Branch(Tree[T] $left, Tree[T] $right),
);
And the return type of the constructor functions can be stated explicitly to form a GADT:
union AST[::] ( # :: is a siglet meaning "a type"
Const(::T --> AST[::T]),
Cond(AST[Bool], AST[::T], AST[::T] --> AST[::T]),
Succ(AST[Int] --> AST[Int]),
...
);
If the argument names to the constructors are given $.attribute form,
then attribute accessors are defined on the type:
union Tree (
Leaf($.value),
Branch(Tree $.left, Tree $.right),
);
Leaf(4).value; # 4
Branch(Leaf(2), Leaf(3)).right.value; # 3
Leaf(4).left; # method found but pattern match failed
Unions can be extended:
union AnnotatedTree does Tree (
Annotation($.str, Tree $.tree),
);
When you extend a union, you create a superset (but still a subtype).
That is, everything that is a Tree is also an AnnotatedTree
(superset), but an AnnotatedTree may be used anywhere a Tree can.
Roles and factories are special cases of a much more general structure called a theory. A theory declares a general algebraic structure over several types. A good example of a theory that is not a role or a factory is the definition of a ring (roughly an integer-like thing):
theory Ring{::R} {
multi infix:<+> (R, R --> R) {...}
multi prefix:<-> (R --> R) {...}
multi infix:<-> (R $x, R $y --> R) { $x + (-$y) }
multi infix:<*> (R, R --> R) {...}
# only technically required to handle 0 and 1
multi coerce:<as> (Int --> R) {...}
}
This says that in order for a type R to be a ring, it must supply these
operations. The operations are necessary but not sufficient to be a
ring; you also have to obey some algebraic laws (which are, in general,
unverifiable programmatically), for instance, associativity over
addition: (a + b) + c == a + (b + c).
To instantiate the theory, that is, declare that some type obeys these laws, you model it:
model Ring{Int} {
# define all needed operations
}
model is really just a shorthand for an anonymous theory that models
another:
theory {
models Ring{Int};
# definitions
}
Saying that a theory models another theory says that if you obey the
laws of the former, then you automatically obey the laws of the latter.
How do we actually use theories? We can specify constraints over type variables with them:
multi pow (::T $x, Int $power --> ::T <= Ring{::T}) {
my $result = $x;
for (2..$power) { $result *= $x }
return $result;
}
This says that pow works on any first argument, as long as it belongs
to a type that models Ring. <= is to be read as "where" or
"requires" or "is implied by". It needs a better spelling.
Roles and factories reduce to theories using a topic type. So the role:
role Foo {
method bar () {...}
}
Is equivalent to the theory:
theory Foo{::T} {
multi bar (T) {...}
}
Because the role specifies a topic type (which we named ::T when we
converted to a theory), and method defines a multi with its first
argument being the topic type. It is an error to define a method
when there is no topic type.
Conversely, a factory just reduces to a theory with a topic type.
generator defines a multi with its return type being the topic
type.
But roles and factories are a bit more special than theories. They have one additional property each:
That is, every subset of something that does a role also does that role, and every superset of something that does a factory also does that factory. This is self-evident when you notice that roles can only have methods and factories can only have generators.
When you specify a type in a type signature, you're really specifying a type variable with a constraint:
multi foo (Bar $x) {...}
Is equivalent to:
multi foo (::T $x <= Bar{::T}) {...}
And so a role that returns the type the role defines is still a role, even though it appears to depend on the type contravariantly:
role Foo {
method bar (--> Foo) {...}
}
Is equivalent to:
theory Foo{::T} {
multi bar (T --> Foo) {...}
# which is
multi bar (T --> ::U <= Foo{U}) {...}
}
That is, you don't return the same type, you just return one that conforms to the same interface.
Finally, let's look at the full reduction of our Tree union type and
try to understand it:
union Tree (
Leaf($value),
Branch(Tree $left, Tree $right),
);
This turns into:
theory Leaf{::T} { # expanded from "role Leaf"
multi pattern:<Leaf> (T --> (Any,)) {...}
}
theory Branch{::T} {
multi pattern:<Branch> (T --> (::U, ::V)
<= Tree{::U} && Tree{::V}) {...}
}
theory Tree{::T} <= Leaf{::T} || Branch{::T} { # note the or
multi Leaf ($value --> T) {...}
multi Branch (::A $left, ::B $right --> T
<= Tree{::A} && Tree{::B})
{...}
}
Whew, good thing we don't have to write that. As you can see, even
though it looks like Branch is behaving covariantly by looking at the
union, it isn't. A branch accepts any two things that behave like this
Tree, not the tree itself.
Here is a summary of the object-oriented vocabulary in Perl:
theory
role
factory
class
union
model
models
does
models. A does B is a shorthand for
A{::T} models B{::T}.
multi
method
generator
There's a little problem with the GADT form:
union AST[::] (
Const(::T --> AST[::T]),
);
This convolves to:
theory AST{::A, ::B} {
multi Const(::T --> ::U <= AST[::T]{::U}) {...}
}
Which doesn't even say anything about ::B. It should be:
theory AST{V} {
multi Const(::T --> ::U <= V{::T, ::U}) {...}
}
That is, it is a theory-valued theory. I guess this makes sense, as all types are really theories. But so far type variables can only assume concrete types; that's the cool thing about them. You can never explicitly mention a concrete type, but that's the only thing a variable can be, so you have to go through interfaces.
I suppose theory-valued theories are necessary if we want GADT (and there are probably some other uses as well), but I fear what it will do to the type inferencer. Maybe there is another way.
UPDATE: There is, and here it is. Each case gets its own theory, which lifts the return value from the constructor. Let's consider a two-constructor union:
union AST[::] (
Const(::T --> AST[::T]),
TestZero(AST[Int] --> AST[Bool]),
);
This convolves to:
theory AST::Const{::T, ::ASTT} {
multi Const(T --> ASTT) {...}
}
theory AST::TestZero{::ASTBool} {
multi TestZero(::T --> ASTBool
<= AST{::TInt, ::T} && Int{::TInt}) {...}
}
# I'll use => as a shorthand for "implies"; i.e. a => b is
# equivalent to !a || b
theory AST{::Param, ::Tree}
<= AST::Const{::Param, ::Tree}
&& (Bool{::Param} => AST::TestZero{::Tree})
{ }
So it all reduces to complex logic between concrete (non-variable)
theories T and type variables ::V in the form T{::V}.