Introduction to Haskell, part 1
This post is an attempt to introduce the reader to Haskell. The hope is that by the end of the post you will be able to see some of the beauty in a language like this, and while you might not switch to it immediately, some of its nicer properties will have been highlighted.
To start off, you will want to install the Glasgow Haskell Compiler to follow these examples; it can be found here:
http://haskell.org/ghc/
After you've done that, we can begin!
Introduction
First off, to get a feel for the environment, simply start it up and try a few expressions:
[austin@continuum ~]$ ghci
GHCi, version 6.8.1: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude> "hello world"
"hello world"
Prelude> 1 + 245
246
Prelude> 8 * 13 - 42/3
90.0
Prelude> 1222 * 5
6110
Prelude>
Nothing out of the ordinary. This is just a prompt where we can enter expressions and easily see the result.
Now, we need to define a few useful properties of Haskell that might not make sense at first, but hopefully they will soon enough.
Pure
Haskell is a pure language. What this means is that Haskell lacks destructive updates: in many other languages you would write "x = x + 1", but you cannot do this in Haskell. A corollary is that there is no such thing as a 'variable' as thought of in most programming languages, because values can't vary!
If you do this:
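x = 1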
x will forever and ever always be '1'. In this sense, you should not think of x as a 'box' that holds something, where you can take that thing out and put something else back in. Rather, you can think of it as a sticky label or a name: once you stick it there, it stays there and you can't remove the label. You can't just take my name out of nowhere and put it on another person either; it always refers to me.
This is one of the selling points of Haskell: there are no variables!
This may sound incredibly stupid and inconvenient, but it is actually incredibly useful. A few things that come from this fact are:
1. You can never ever modify an existing label like x. Rather, you can only create new labels based on x, such as, say, a new label y:
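y = x + 1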
2. Because of point 1, you never ever have to worry about things like locks and synchronization in the face of concurrency. After all, if you have some global value like that, when two threads access it, by the very principles of the language they cannot modify that value, rather, they can only create new values. In this sense, memory corruption and having to lock resources simply disappears when you deal with multiple threads!
3. If you have two functions, f and g, their computation can happen at any given time, because f and g do not depend on each other: they only depend on their inputs, and their inputs alone.
4. Haskell functions are referentially transparent.
I'll elaborate a little more on 4. When a function f is referentially transparent, we say that for any given input x, f applied to x will always and forever give the same result. More concisely: f(x) === f(x), where 'x' is any valid input. A little more elaboration is probably needed.
In a language like C you could create a function f like this which is referentially opaque (the opposite, meaning f(x) might not equal f(x)):
int x = 5;

int f(int y) {
    x = y + x;  /* destructively updates the global x */
    return x;
}
What happens the first time we run f(1)? Well, naturally, the result is 6. But what happens when we run f(1) again? We get 7! f(x) != f(x)!
Now, this doesn't mean the universe is going to explode. But we do lose a couple of very useful properties by allowing code like this which uses destructive updates (reassigning a variable to another value):
1. It's hard to test. Really hard, actually. In this case the example is trivial to predict, but allowing this in larger and more critical pieces of code can completely throw you off, since you cannot always be sure of what f(x) may return. We call this 'non-determinism': f(x) can do many different things every time we call it, even if we always call it with the same x!
2. Knowing what the global 'x' is at any given point in the program becomes incredibly difficult. If our program crashes and, say, x is some global critical data structure that got corrupted, how do we know where and when it got corrupted? We can insert a lot of printf's and whatnot, but this just hides the problem. We can easily say this makes reasoning about our program much, much more difficult!
To expand further, the same reasoning applies to any function that has a side effect. A side effect is anything that interacts with the world outside of 'f': writing to a database, printing to the screen, reading a file, getting user input. All of these things depend on the outside world and are therefore non-deterministic and side-effecting.
But didn't I say that Haskell doesn't allow interacting with the world outside of f? Well, it doesn't. All Haskell code is pure; it's just that we have things like monads (also called 'warm fuzzy things') to help us interact with the outside world more easily and more concisely.
But didn't I also say that because these things are pure, their computations can happen at any point in the program's lifetime? Well, they can. It's just that we can use some useful constructs to say that 'f depends on g happening first.' In the same way, if you got input from a user and printed it back out, you would in fact want the input to come first, then the printing. Again, we simply use some useful constructs to specify the order in which these things are evaluated.
Notably, the monad that allows us to do these things is in fact called the IO monad. But that's for another time.
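Just as a small preview (a minimal sketch; the syntax will be explained properly later), reading a line from the user and echoing it back might look like this:

main :: IO ()
main = do
  line <- getLine   -- get the user's input first
  putStrLn line     -- then print it back out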
Lazy
Another selling point of Haskell is that it is lazy, while many other languages such as C or Ruby are strict.
What does laziness mean? It means computations only happen when they are needed. Suppose, for example, you have code along these lines:
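f a b = b   -- f ignores its first argument entirely

x = 15 + 2
y = 5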
You would think that by running 'f x y', what would happen is this: you evaluate 'x' to be 17, evaluate 'y' to be 5, and then return 5, because the function ignores its first argument. This is indeed true in a language like C.
The truth is, in Haskell we never evaluate x, that is, the expression (15 + 2)! The value of x is never needed inside of f, and so we simply don't compute it. This also allows us to do things like create infinite lists.
For example, if you run:
Prelude> let x = [1..]
'x' is a list of every number from 1 to infinity. This is possible because we don't really care about how big the list is. All we know is that when we need a value from the list, it will be computed properly.
What does this get us? It allows for some nice abstraction: you may have two functions, one of which 'generates' data and another which 'selects' data. In a lazy language, separating these two functions is a lot easier, because the generator may describe an infinite amount of data while the selector only picks out what's useful. This means we only evaluate up to the point that's needed, rather than evaluating potentially huge amounts of data in the generator before even getting to the selector.
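As a tiny illustration in GHCi, using the infinite list [1..] as the generator and the standard take and filter functions as the selector:

Prelude> take 5 (filter even [1..])
[2,4,6,8,10]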
Types
Haskell as a language puts an incredibly strong emphasis on types. If you've ever programmed in any other language such as C or Java, you probably know what I'm talking about here.
In Haskell, types are taken to the extreme because they are very useful. How useful? They're proofs that your program is 'correct'! Long ago, a couple of very smart guys (one of whom Haskell is named after) found out that types in languages like Haskell and logical propositions are actually isomorphic, or 'a different representation that has the same structure.' What this means is that if your types line up and your program compiles, chances are your program is correct! (This all sounds very magical, and for good reason; if you're interested, I suggest you google the 'Curry-Howard isomorphism'.)
It is very common in Haskell to rely on types to help, for example by embedding the problem into the type system, because if our program compiles, it will probably work as we expect (unless our 'specification' of the program is wrong, of course).
In Haskell, if we have a function f, say something like this:
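f x = x + 1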
We say f has the type of:
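f :: Int -> Int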
Simply, it takes an integer and returns an integer. Likewise, say we have the function add:
add :: Int -> Int -> Int
add x y = x + y
The last type in a type signature is the return value, and the rest are the parameters, so reading only the type signature we can say that add 'takes two Ints and returns one Int.'
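For example, trying add out at the GHCi prompt (redefining it there for convenience):

Prelude> let add x y = x + y
Prelude> add 2 3
5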
Actual code
Now that we've discussed the basic stuff, we can get onto some actual code!
Now, Haskell is known for being very very concise.
For example, let's write a utility that counts the number of lines in a file:
module Main where

main :: IO ()
main = interact (show . length . lines)
That's it! Compile it and run it like so:
[austin@continuum ~]$ ghc wc.hs
[austin@continuum ~]$ cat wc.hs | ./a.out
4[austin@continuum ~]$
Quite nice! In one line we've replicated the Unix utility 'wc -l'. Now, there are a couple of points here:
1. interact is a function with this type:
interact :: (String -> String) -> IO ()
This means interact takes a function as an argument (one which takes a String and returns a String) and gives back an IO action. What interact actually does is read from standard input, pipe it through the function, and print out whatever the function returns. Example:
[austin@continuum ~]$ ./a.out
asdf one two three
it is a beautiful world
when you aren't the only one out there
3[austin@continuum ~]$
2. The function we pass to interact is a 'composed' function. In Haskell, like in math, it is easy to compose functions to create new functions. For example, in math, if you have two functions f and g, you may write this:
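h(x) = f(g(x))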
Or:
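h = f * g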
(In this case, * is the 'compose' operator which you should have seen in basic algebra.)
Likewise, in Haskell, if you have a function f with a type like, say:
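f :: b -> c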
And a function g with a type like:
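g :: a -> b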
You can compose them to create a new function h, like so:
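h :: a -> c
h = f . g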
This is simply a nifty trick called function composition. It's useful and helps us write more compact code.
Now, what does a pipeline like 'show . length . lines' actually do? Well, let's use GHCi to figure it out, this time with the closely related function 'words' (which splits a string into words rather than lines):
Prelude> :t words
words :: String -> [String]
Prelude> words "hello world"
["hello","world"]
Prelude> :t length
length :: [a] -> Int
Prelude> length (words "hello world")
2
Prelude> :t show
show :: (Show a) => a -> String
Prelude> show (length (words "hello world"))
"2"
Prelude>
So, knowing what we know now, the composed function in our program should be fairly easy to read: lines splits the input into a list of lines, length counts them, and show turns that number into a String so interact can print it.
Right now I think I'm going to close off this tutorial until I can write up the next part. This was more of an introduction than anything else; in the upcoming articles, I'll be sure to try and include some more practical examples. :) In the meantime, if you're interested, I highly suggest you toy around with GHCi some more and try a couple of things.