Category Archives: Dr. Stump

Beware of Side Effects

Working on software with a lot of code can be a lot of fun. It can be hard to manage properly, too. Usually teams are formed to build large software projects. This is great because normally a team of developers is more productive than a single developer. However, it is crucial that all members are aware of other members’ contributions and changes to the project. I don’t want a team member to break my code, just as other team members don’t want me to break their code. It’s important for each team member to understand the effects of their code as well as the rest of their team’s code.

A couple of months ago I added tail recursion optimization to Blaise. It worked quite well for a while until a team member changed a piece of code that was completely unrelated to tail recursion and unknowing broke the tail recursion optimization. Sadly, this created more work for both of us. We have to figure out which change caused the tail recursion optimization to break and then we need to figure out how to modify the code we have. Ideally, we would have known potential side effects from changing code beforehand and we would have proactively handled them instead of finding them weeks later.

Is Blaise blazing fast?

Since I introduced tail recursion optimization to Blaise it is natural wonder how fast it is. An optimization better reduce processing time, right? I’m currently writing mergesort in Ocaml and Blaise so that it is tail recursive and takes advantage of mutable objects so lists are manipulated in place. Blaise is somewhat of an object-oriented/functional programming language hybrid and I’m coding to its strengths.

I wrote a few scripts that compile both Ocaml and Blaise code to C and execute the a.out files 1000 times each and compute an average running time. My goal is to get Blaise to run as fast as possible and measure up to Ocaml.

Tail Recursion

Lately, I’ve been adding support for tail recursion optimization in the Blaise compiler. Anytime the last statement inside of the body of a function is a call to itself: it’s a tail recursive call. The execution moves from the bottom of the body of code to the top of the body of code. This works just like a loop. Normally with recursion the computer will push an environment onto stack memory and move into the new recursive function call with a free environment to which it can write local variables. With tail recursion it’s not necessary to push onto stack memory. The code can be transformed into a loop and the Blaise compiler does just that. Check out the example below.

Here is a tail recursive function

let someFunction arg1 arg2 =
    (* body of function *)
    (* The last statement is a recursive call.*)
    someFunction arg1' arg2'
;;

When the example function above is compiled to C we can use ‘goto’ statements to turn the function into a loop.

void * someFunction(void * arg1, void * arg2) {
someFunction_goto:
    /* body of function */
    /*
        The recursive call is turned into variable reassignments and
        a goto statement that takes us to the top of the loop
    */
    arg1 = arg1';
    arg2 = arg2';
    goto someFunction_goto;
}

Next I plan on adding tail recursion support for mutually (tail) recursive functions.

Currying Unknown Functions

Sadly, this topic confused me for awhile. Recently, I found that supporting currying for unknown functions is a fairly easy concept. I didn’t realize how easy of a concept this was until I spoke with a postdoc that has experience developing functional language compilers. 🙂 At first I was perplexed by the idea that a single pass compiler was supposed to support calling a function that may not have been defined in code. That didn’t make sense and it still doesn’t make sense! One should only be able to call a function when it has already been defined!

The only time a function may have already been defined but unknown in certain parts of code is when a function requires another function as a parameter. A generic function that takes a function as an argument doesn’t know which function is going to be passed to it… therefore we have an unknown function, inside the body of function definition.