« The joy of Just WorksThe perfect compromise »

Fri, Nov 04, 2016

[Icon][Icon]Thinking in functions

• Post categories: Omni, Technology, My Life, Programming

AKA "When you're been reading so much Lisp that it starts to affect your brain"

A while ago I did a talk on functional programming. As my example, I showed how to impliment linked lists using closures instead of the built-in data structures. Recently, as I was reading some Lispy code, it suddenly dawned on me that there was one step further I could have taken that hadn't occurred to me. So I sat and had a play, and sure enough it worked.

So here's the process, for anyone who wants it :) I've put it in Javascript, since it makes it easy for anyone to try via the browser.

Firstly, the specified task: We want a new data structure, "pair", which can hold two values. From this, we want be able to build linked lists. We want all access to be handled via functions: Create a pair with pair(), get one value with head() and the other with tail().

The quick, easy, obvious approach is to use a built-in data structure under the hood. Such as an array. So:

function pair (head, tail) {return [head, tail] }
function head (pair) { return pair[0] }
function tail (pair) { return pair[1] }

This is a full implementation of the desired "pair" functionality. A quick test will bear this out:

> var test_pair = pair("the_head","the_tail")
> test_pair
[ 'the_head', 'the_tail' ]
> head(test_pair)
> tail(test_pair)

All good! Works perfectly. So.. we're done?

Not quite. This obeys the letter of the law, but falls foul of the spirit. Firstly, it's too easy to bypass the abstraction and access the data directly, instead of through our accessor functions. Secondly, it's all too easy for somebody to shunt more data into the array and make our "pair" hold more than the desired two values.

> test_pair[0]
> test_pair[2] = "sneaky misuse of pairs!"
'sneaky misuse of pairs!'
> test_pair
[ 'the_head','the_tail', 'sneaky misuse of pairs!' ]

Not good. We must fix this! Let's switch away from arrays, and go instead to closures:

function pair (head, tail) {
 return function (fn) {
  return fn(head, tail)

function head (pair) {
 return pair(
  function (head,tail) { return head }

function tail (pair) {
 return pair(
  function (h,t) { return t }

If you're not used to closures, this may seem a little abstract. Let's break it down!

pair() now returns a function. That shouldn't be a surprise in this day & age, functions are first-class objects in most languages. Let's call it the p1 function. The p1 function is a very simple function, as all it does accept another function (call it p2) as its argument and call it. Very simple.

The important thing is that p1 still has access to the original arguments pair() was called with, via the magic of closures. So both the head and tail are available to p1, which means when you pass a p2 function into p1, p2 is called with the original head and tail. So now the p2 function can decide what to do with them. In the case of head(), it passes in a function that accepts the head and tail, and returns the head. Tail is identical, except for which value it returns.

Once you get used to closures and functional programming in this style, it all makes perfect sense. I appreciate it can make the eyes glaze a little at first though. Let's test that it works:

> var test_pair = pair("the_head","the_tail")
> test_pair
> head(test_pair)
> tail(test_pair)

So, we have continued to fulfill the specification as stated, and now we have a data structure that can ONLY be used to hold our head and tail values - no sneaky extras, and no way to view the data directly.

Onwards, then! We want to use pairs to implement a linked list. In this implementation, a list of, say, (1,2,3) could be implemented by pair(1,pair(2,pair(3))). The head of a pair contains a value, the tail contains the rest of the list. Let's see how that works:

> var list = pair(1,pair(2,pair(3)))
> list
> head(list)
> head(tail(list))
> head(tail(tail(list)))

Okay, simple enough. But a long list calls for a lot of nested pair() calls! Let's automate that.. we need a list() function! And whilst we're at it, let's write a few list accessors to cut down on all that head-tail-tail stuff.

function list () {
 var list;

 for (i = arguments.length; i > 0; i--) {
  list = pair(arguments[i-1], list)

 return list

function first (l) { return head(l) }
function second (l) { return head(tail(l)) }
function third (l) { return head(tail(tail(l))) }

And make sure it works as expected:

> var lst = list('a','b','c')
> first(lst)
> second(lst)
> third(lst)

So far, so good. Now, a common desire with lists is an ability to apply a function that expects a single value to a list of values instead: Say we have a triple() function that expects a number, but we want to be able to apply it to the list (1,2,3) to get (3,6,9).

This is typically done with a recursive function called map(), which creates and returns a new list, which it generates by applying the desired function to the head value of the original list, and pairing it to the result of calling itself on the tail of the original list.

function undef (x) { if (typeof(x) === 'undefined') { return true } }

function map(fn, list) {
 if ( undef(list) ) {
  // If the list is undefined, we've reached the end - stop recursing
 // We're still going: Call the function on the head, recurse over the tail
 var h = head(list);
 var t = tail(list);
 return pair( fn(h), map(fn, t) )

function triple (x) { return 3 * x }

And let's see if it works:

> var small_list = list(1,2,3)
> var big_list = map(triple, small_list)
> first(big_list)
> second(big_list)
> third(big_list)

Looks good! We are making linked lists and mapping over them to make new lists. Very Lispy!

Now, this is as far as I ever got just from generic reading about closures, functions, and Church numbers. It does the job, and does it using nothing but functions. All very clever.

But one day as I was reading up on Lisp, I suddenly realised there was another way. Instead of map() navigating the list via head() and tail(), I could skip the overhead of those extra function calls. In the original approach, pair() creates p1, and then head() and tail() both call p1, passing it a function to return either the head or the tail. Map uses head() and tail() and is thus implementation-agnostic - the original array-based approach would still work with map as defined.

But if we throw that agnosticism out and write something directly for the function-based approach, then we can avoid the overhead of calling head() and tail(). Instead, we can pass p1 the recursive function that does the mapping. Because p1 will call this function with both the head and tail arguments as parameters, this means the new function gets the values it needs directly instead of through the accessor functions, thus:

function fmap(fn, list) {
 var rec;
 rec = function (h,t) {
  if (undef(t)) { return pair( fn(h) ); }
  else { return pair( fn(h), t(rec) ); }
 return list(rec);

And sure enough:

> var big_list2 = fmap(triple, small_list)
> first(big_list2)
> second(big_list2)
> third(big_list2)

I know, I know. It's evil. It's over-engineered. It makes far too much use of passing around recursive functions. And I'd probably be very upset to find anything like it in production code.

But.. it was fun to work it all out :)

Feedback awaiting moderation

This post has 4 feedbacks awaiting moderation...


[Links][icon] My links

[Icon][Icon]About Me

[Icon][Icon]About this blog

[Icon][Icon]My /. profile

[Icon][Icon]My Wishlist


[FSF Associate Member]

February 2018
Mon Tue Wed Thu Fri Sat Sun
 << <   > >>
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28        


User tools

XML Feeds

eXTReMe Tracker

Valid XHTML 1.0 Transitional

Valid CSS!

[Valid RSS feed]

open source blog