« To the tune of "2000 miles" by the Pretenders | No walls » |

Mon, Nov 26, 2012

I'm going through one of my periodic attempts to learn Haskell.

Being a functional language, it introduces a fair number of new concepts and modifies others. Every now and again, I find my understanding of what's going on *"under the hood"* starting to slip, so I have to stop and refactor from *"it's magic"* to *"it's actually doing this, then that, then the other."*

A helpful way to do this, I find, is to try and implement the same functionality in a language I know rather better. So, perl.

At the moment, my issues are lazy evaluation and currying. As a simple example, imagine a function that multiplies two numbers together:

let multiply x y = x * y multiply 3 4 12

Great. Fine. No problem.

Perl is fairly similar:

sub multiply { my ($x, $y) = @_; return $x * $y; } say multiply 3, 4;

Only, it's not that simple.

foo = multiply 2 3

We know that foo will be 6. Right?

But what about:

bar = multiply 2

You might expect an error. After all, we've called a function that needs two arguments, and only given it one. But we don't get this. Because Haskell functions don't actually take two arguments. They take one. Only one.

They APPEAR to accept multiple arguments, because Haskell does clever things behind your back. Something called currying. It creates a function that takes one argument -the first one you supplied - which then returns a function that *also* takes an argument - the second one you supplied. Then THAT function is the one that returns the answer.

Sounds like a needlessly-complex way of handling something so simple, but it comes with a benefit too.

Because by partially-applying a function by passing it too-few arguments, we don't get an error, we get the interim-function we implicitly generated:

let triple = multiply 3 triple 7 21

And we can then use that function wherever we want. Quite a handy time-saver.

We can implement this in perl in one of two ways. If we want to keep the original syntax, we'd have to do this:

sub multiply { my $x = shift; my $m = sub { $x * shift }; return @_ ? $m->(shift) : $m; } say multiply 3, 4; my $triple = multiply 3; say $triple->(5);

This works, but it leads to inconsistent syntax and it's tedious. A better way would be to use currying at all times, Haskell-style:

my $multiply = sub { my $x = shift; return sub { my $y = shift; return $x * $y; }; }; say $multiply->(3)->(4); my $triple = $multiply->(3); say $triple->(5);

There. Slightly more tedious to write, with all the arrows and brackets, but it's consistent and works nicely. And is slightly nicer to maintain if you start needing to handle more than two arguments.

However, it doesn't yet exhibit laziness. This means that the value should only be calculated at the point where it's actually needed, not at the point where you first ask for it to be available.

For this, we need one more function, and one more set of parentheses:

my $multiply = sub { my $x = shift; return sub { my $y = shift; return sub { return $x * $y; } }; }; say $multiply->(3)->(4)->(); my $triple = $multiply->(3); my $answer = $triple->(5); say $answer->();

And we're done: A lazy curried multiplication function.

Whew!

But the thing that was really baffling me for a while was infinite lists. In Haskell, you can generate a list of numbers easily:

[1..10] [1,2,3,4,5,6,7,8,9,10]

You can also generate an infinite list of numbers, just by leaving out the terminator:

[1..]

But this is a bad idea. For obvious reasons.

But you can safely do something like

take 7 [1..] [1,2,3,4,5,6,7]

because the infinite list is lazily-calculated and when we only want the first seven, we only GET the first seven. Although it's a bit pointless doing it longhand like that. However, you can also do something like:

take 12 [7,14..] 7,14,21,28,35,42,49,56,63,70,77,84

which will give you the answers to the seven times table.

Very nice.

Where I got bogged down was where a function that could be applied to an infinite list didn't appear to cause problems.

For example, you can set up a function that multiplies lists:

let multiply_list x ys = map (*x) ys

And then we can partially-apply it to get a more specific example:

let double_list = multiply_list 2

And then we can run it on a list:

double_list [1..10] [2,4,6,8,10,12,14,16,18,20]

Lovely.

But what do you think happens with this?

take 5 $ double_list [1..]

Thinking iteratively, it seems like this should go horribly wrong: We're applying a function that will apply to every item in a list.. to an infinite list. You'd think that in order to be safe, you'd have to do

double_list $ take 5 [1..]

But instead, they both work the same.

Weird..?

Where I went wrong, I think, was to assume that, just because a function will apply to a whole list, it must try to run through the whole list. But it doesn't, because laziness goes deeper than that. It helps to think of a list as just another function instead.

Because then our infinite list is actually an iterator function that says *"Start at x and go up in increments of y until value reaches z"* where y defaults to 1 and z can validly be infinity.

So then the doubling function doesn't need to process the list, it just generates a new function that transforms the original function: *"Start at x and go up in increments of y until value reaches z; double the value"*

e.g. in perl:

my $list = sub { my ($x,undef,$y,$z) = shift =~ m#(\d+)(,(\d+))?\.\.(\d+)?#; my $inc = $y ? $y - $x : 1; return sub { return if $z && ($x > $z); my $value = $x; $x += $inc; return $value; }; };

(CBA to write it out clearly longhand, if you don't understand closures & regexes just accept that it's magic and move on ;)

This will DTRT with the Haskell-style list definitions, such as "1..10", "1,3..20", "1,5.." etc.

We need something to actually call it usefully, so we'll define a *take* function as well:

my $take = sub { my ($n, $list) = @_; my @values; while ($n--) { my $value = $list->(); push @values, $value if $value; } return @values; };

There: A finite array from an infinite list.

The whole point of this was to make sure that we could safely implement a "double_list" function, so let's quickly do that and see what we get:

my $double_list = sub { my $list = shift; return sub { 2 * $list->(); } }; say join ',', $take->(12, $double_list->($list->('5,10..'))); 10,20,30,40,50,60,70,80,90,100,110,120

Yay! We're successfully taking a limited set of values from an infinite list, even though we're applying functions to it between the thing that limits it to being finite.

By being lazy and using lots of useful functional things such as currying and closures, I've successfully reproduced Haskell functionality. I make no claim that it's anything like how Haskell actually does anything, but Haskell is so incredibly concise that I was simply losing track of what it was actually doing, and thus losing the ability to understand it.

Having understood the concepts well enough to implement them in another language, I now feel happy enough to move on.

Wasn't so bad. Was it..?

Comment from: Steven Haryanto [Visitor] · http://blogs.perl.org/users/steven_haryanto

Thanks for writing this. I've had periodic urges to learn Haskell too, but so far never managed to get very far :)

27/11/12 @ 00:33