Index

This file was automatically generated from http://svn.pugscode.org/pugs/docs/Perl6/Spec/S07-iterators.pod on Sat Aug 1 14:01:18 2009 GMT, revision 27701.

Synopsis 7: Iterators and Laziness


TITLE

Synopsis 7: Iterators and Laziness

AUTHORS

    Tim Nelson <wayland@wayland.id.au>
    Daniel Ruoso <daniel@ruoso.com>

VERSION

    Created: 27 Nov 2008

    Last Modified: 20 Apr 2009
    Version: 5

Laziness and Eagerness

As we all know, one of the primary virtues of the Perl programmer is laziness. This is also one of the virtues of Perl itself. However, Perl 6 knows better than to succumb to false laziness, and so is eager sometimes, and lazy others.

One thing that Perl understands is the difference between Laziness and Eagerness. When something is Lazy, it says "just give me what you've got; I'll get the rest later", whereas when it's eager, it says "More! More! Give me everything you can get!".

Perl 6 defines 4 levels of laziness:

Strictly Lazy
Does not evaluate anything unless explictly required by the user, including not traversing non-lazy objects. This behavior is generally available only by pragma or by explicit programming with non-lazy primitives.
Mostly Lazy
Try to obtain available items without causing eager evaluation of other lazy objects. However, the implementation is allowed to do batching for efficiency.
Mostly Eager
Obtain all items, but does not try to eagerly evaluate when known to be infinite.
Strictly Eager
Obtain all items, fail in data structures known to be infinite. This behavior is generally available only by pragma or by explicit programming primitives.

It's important to realize that the responsibility of determining the level of laziness/eagerness in each operation is external to each lazy object, the runtime, depending on which operation is being performed, is going to assume the level of laziness and perform the needed operations to apply that level.

The laziness level of some common operations

List Assignment: my @a = @something;

In order to provide p5-like behavior in list assignment, this operation is performed in the Mostly Eager level, meaning that if you do

  my @a = grep { ... }, @b;

the grep will be evaluated as long as @b is not infinite.

  my @a = grep { ... }, 1, 2, 3, 4..*

will give grep an infinite list (even if the first elements are known), therefore it will also be lazy. On the other hand

  my @a = grep { ... }, 1, 2, 3, 4;

will be eagerly evaluated.

Feed operators: my @a <== @something;

The feed operator is strictly lazy, meaning that no operation should be performed before the user requests any element. That's how

  my @a <== grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

is completely lazy, even if 1,2,3 is a fairly small known compact list.

But it's important to notice that eagerness takes precedence over laziness, meaning that

  my @a = grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

Will be eagerly evaluated, but that is still different from

  my @d = 1,2,3;
  my @c = grep { ... }, @d;
  my @b = map { ... }, @c;
  my @a = grep { ... }, @b;

Because in the first, the processing would be made as a flow, avoiding the creation of the intermediary eager lists that the second example creates. On the other hand

  my @d <== 1,2,3;
  my @c <== grep { ... }, @d;
  my @b <== map { ... }, @c;
  my @a = grep { ... }, @b;

provides the same laziness level of the first example.

The Iterator Role

The iterator role represents the lazy access to a list, walking through:

Data structure (list, tree, table, etc)
Feed (map, grep, etc)
Stream (mostly for IO)
In fact, any value that wants to behave like a list

It's important to realize that the iterator of a list can be accessed by the .Iterator() method (but only the runtime will be calling that most of the time), and the implementation of each iterator is private to the list and implementation specific.

This is a minimal API that should allow custom iterator implementations, but this spec should be expanded in the future to provide additional API for batch-aware iterators.

The methods in this role are:

method get {...}

Returns the next item for that iteration. The grouping of elements returned in each iteration is visible if this iterator is being used to build a slice. While building a List, the items will be flattened.

When it runs out of items, it will return Nil.

The Iterator::PushBack Role

This role defines an iterator that knows how to receive values back to be consumed again as if they were never consumed. The iterator is free to refuse values that were not consumed first and in the correct order, since this role is not intended to modify the original data, only to modify the traversal of that data.

method pushback($value) {}

This pushes $value back to the array. If this iterator is proxying immutable data, it is free to refuse a value that is either out of the original order or if it wasn't consumed. In that case, it should fail with UnorderedPushBackException or BadPushBackException.

The Iterator::Unshift Role

This role defines an iterator that can receive new elements in the top of the stack, it works like pushback, but it should be able to store objects that are completely new to this stream. Either by being backed by a mutable list or by providing a local stack. This role implies Iterator::PushBack, where pushback will be handled by unshift.

The purpose of defining two different Roles is to give the user the "expect to fail" or "expect to succeed" semantics.

method unshift($value) {}

This unshifts $value either to the backed list or to a local stack of the iterator.

Auxiliary Implementations

Perl's built-ins require that a number of auxiliary types implement Iterators. These are available for general use, and are instantiated by ending a feed at a scalar, array, or sliced array.

Generic Item Iterator

To create a generic item iterator, end a feed in a scalar:

  my $it <== @a;

You can later do:

  my $item = $it.get;
  my @items = @$it;

(XXX TODO: effect of type constraints of $it -- maybe to control flattening? )

(XXX TODO:

   What's this do?

   my $it <== @a; my $it2 <== $it; my $it3 <== $it; get $it2; get $it3;

   ...allow tag team suckling?

   Is it worth the dwim, or do we just forbid it?

)

Generic Lazy List

To obtain a generic lazy list, end a feed in a Positional.

    my @it <== 1..2000;

The generic lazy list consumes the feed on demand. As the elements of the list are accessed, values that were in each subsequently returned Capture are placed at the end of the Positional. If a Capture with more than one value is returned, each value is stored at its own index. Empty Captures have no effect, and so the information that they ever existed at all is lost.

You can later do things like:

    @c = @it[1..5];
    $s = @it[4];

...and this may run the iterator to find any values which are only lazily present. Values which have already been retrieved from the iterator are kept available unless they are removed, so if you are working with a large sequence you may want to:

    shift(@it);

Reading and writing indices in a generic lazy list iterator will force consumption and caching of values for all indices up to and including the accessed one.

    @it[4] = 2;  # @it[0..3] are cached now, @it[4] was consumed/destroyed

Adding to the beginning of the list just stores values without touching the iterators.

    @it.unshift(<a b c>);  # works as expected

Trying to find the number of elements in any way may force eager consumption of the remainder of the values in the iterator, and trying to access the end of the list certainly will.

    @it.pop; # Eagerness happens.  Maybe bad things, too.

...but you had better be very sure that the list is finite. Attempts to fiddle with the end of an infinite list may result in the interpreter raising quite the exception, in the luckiest cases it may just deliver an Inf, but in some cases you may find yourself spending a very, very long time doing laps around the yard. The exact behavior may be implementation specific.

So don't do that.

If the list is known to be finite, the generic array iterator will revert to a normal array once the feed is exhausted. So it is legal to:

    @it.push; # eagerly exhausts the feed, then turns into to a normal array

...with the same caveats about infinity.

(XXX TODO: effect of type constraints/shape of @it? )

(XXX TODO:

   my @a = (1,2); @a <<== ... promotes an occupied Positional to
   iterator, right? Should probably be explicitly mentioned.
)

Generic Lazy Slice

The generic lazy slice consumes the Captures from an iterator but stores the results as a bi-dimensional list, where the first dimension corresponds to an iteration, and the second contains the values in the Capture returned for that iteration. Empty Captures are stored just like the rest of the iterations.

To obtain a generic lazy slice, end a feed in a sliced Positional.

    my @@it <== map { ... }, 1,2,3;

(XXX TODO:

    @@it <== (1,mysub(),2;1,2,3);
    @@it[0];
    @@it[0;1];

 exactly when does mysub() get called?)

Coroutines

Perl6 does not have a formally defined sub-style coroutine. Doubtless there will be external modules to do so in different flavors. Such a construct, where many calls made to the name of a sub with different parameters, expecting to reach the same sub every time, is not really compatible with multi-method-dispatch. While it may suit some programming styles which only use a subset of the Perl6 language, it cannot be made generally applicable across the Perl6 feature set.

This is not to say you cannot do coroutines in Perl6. In fact, the gather/take construct is a simple coroutine. But if you want to pass values back and forth Lua-style, you have to use a suplimentary object:

     sub my_coro (*@slurp) {
        gather do {
            my Int $initval;
            my Str $secondval;
            my Int $thirdval;
            $initval = shift(@slurp);
            $initval++;
            take $initval;
            $secondval = shift(@slurp);
            take "$initval $secondval";
            $thirdval = shift(@slurp);
            take $thirdval + $initval;
        }
    }
    my @a = (1);
    my $it;
    $it <== my_coro() <== while 1 { shift(@a) };
    say "First result: " ~ get $it;
    @a.push("Hello World");
    say "Second result: " ~ get $it;
    @a.push(500);
    say "Third result: " ~ get $it;

...if you want to pass multiple parameters on each call, you can use a slice slurpy instead, to pass a Capture.

The iterator and array can of course be bundled up to give a more natural feel:

    class my_sub2coro {
        has $!it = Failure;
        has @!data = ();
        has $.coro;
        submethod BUILD {
            $!it <== &($.coro)() <== while 1 { shift(@!data) };
        }
        method corocall($message) {
            @!data.push($message);
            get $!it;
        }
    }
    my $c = my_sub2coro.new(coro => &my_coro);
    say "First result" ~ $c.corocall(1);
    say "Second result" ~ $c.corocall("Hello World");
    say "Third result" ~ $c.corocall(500);

(XXX TODO: As feeds are not implemented yet, above example code not tested)

Additions

Please post errors and feedback to perl6-language. If you are making a general laundry list, please separate messages by topic.