An introduction to Rust implementing a functional list
In this post I'll implement a functional linked list using the Rust programming language. The goal of it is to provide an overview and explanatory view of the language, so I won't be focusing much in the data structure itself. Anyway, I hope you have some fun reading it =D
2020-06-27Introduction
Before diving into any Rust code, we need to learn two fundamental concepts that make Rust unique:
Ownership:
- Each value in Rust has a variable that's called its owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value is dropped.
let n = String::from("hello, world"); // 'n' owns the string 'hello, world'
let m = n; // now 'm' owns the string, what about n?
println!("{}", m); // prints 'hello, world'
println!("{}", n); // **compilation error**
Borrowing:
- At any given time, you can either have one mutable reference or any number of immutable references.
let mut a = String::from("hello, world");
let b = &mut a; // mutable reference to 'a'
let c = &a; // immutable reference to 'a'
println!("{}", c); // prints 'hello, world'
println!("{}", b); // **compilation error**
- References must always be valid (lifetimes).
let b;
{
let a = String::from("hello, world");
b = &a; // b is a reference to a
} // a dies here, so b is invalid
println!("{}", b); // **compilation error**
Hands on coding
With this out of our way, we can start writing our List implementation.
First let's create our project using cargo
, Rust's packager manager, with the command:
cargo init --lib list
This will generate a startup workspace to get the us running. Then open the file src/lib.rs
which is the root of our crate (think of crate
as a package
for now), put in the following code to export our about to be created module:
// src/lib.rs
pub mod list;
Now we can create a list.rs
file in the src
folder and start to write our code.
Let's first define our Node
struct, the semantics of a struct in Rust is pretty similar to C/C++.
#[derive(Debug)] // that's just a macro attribute that provide us default trait implementation (more on this later)
pub struct Node<T> { // we'll be building a generic list
elem: T,
next: ?? // what should we put here?
}
Well, let's try to follow our first-glance intuition and write something like this:
#[derive(Debug)]
pub struct Node<T> {
elem: T,
next: Node<T>
}
When compiling we get:
error[E0072]: recursive type `Node` has infinite size
--> src/main.rs:8:1
|
8 | pub struct Node<T> {
| ^^^^^^^^^^^^^^^^^^ recursive type has infinite size
9 | elem: T,
10 | next: Node<T>,
| ------------- recursive without indirection
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `Node` representable
It looks like we can't have a recursive definition of Node
, that because the compiler must know the size of the struct to allocate the amount of memory needed for it. How to solve it?
References, Lifetimes and Box
The compiler have already told us how! We need indirection. If we use a pointer-like value in the next
field of our struct the compiler will know exactly the amount of memory needed because pointers have a fixed size. So let's hurry up and put a reference in there:
#[derive(Debug)]
pub struct Node<T> {
elem: T,
next: &Node<T> // '&' is the syntax for an immutable reference
}
If we run we get:
error[E0106]: missing lifetime specifier
--> src/main.rs:10:11
|
10 | next: &Node<T>,
| ^ expected named lifetime parameter
|
help: consider introducing a named lifetime parameter
|
8 | pub struct Node<'a, T> {
9 | elem: T,
10 | next: &'a Node<T>,
|
Ok now there's this new concept called lifetime parameter
, let's see what it is.
Remember in the "Borrowing" session when we said that "references must always be valid", well that's accomplished by lifetime parameters. Most of the time the Rust compiler will infer the lifetime parameters (check out the 'Lifetime Elision' session in the Rust Book), however there are some special cases when we need to explicitly tell it about them. Here's how we do it:
// implicit
fn foo(x: &i32) {
}
// explicit
fn bar<'a>(x: &'a i32) { // 'a' is just a name for the parameter
}
In our case, it happens because, thus far, Rust has preferred to err on the side of explicitness, modulo a few areas that were deemed an ergonomic hit without some implicitness. There’s been some talk about eliding lifetimes in more places (check this out).
Then, following the compiler suggestion, let's change our code to:
#[derive(Debug)]
pub struct Node<'a, T> { // that's the syntax to declare a lifetime parameter for a struct
elem: T,
next: &'a Node<'a, T>, // we say that if the 'next' field lives 'a' then our struct must live at least 'a'
}
The code compiles! Great! However, we can make it better. Sometimes things can get too complicated with references and lifetime parameters, a better solution would be using the heap, the simplest way to do this in Rust is with the Box
pointer. From the Rust official documentation:
"Box
So let's use it:
#[derive(Debug)]
pub struct Node<T> {
elem: T,
next: Box<Node<T>>,
}
Done! We now say that our Node
is a recursively defined struct, but with a fixed size. There's just one problem with that: how do we know when we have reached the end or our list? Rust doesn't have a NULL
pointer as we would use in C/C++, but it has something way more sophisticated, the Option
enum. It is defined as:
pub enum Option<T> {
None,
Some(T),
}
In Rust, enums represent one of many possible variants. Many languages support an enum syntax that works like this. Rust lets us go a bit further and add data associated with each of the variants. In our example, we have a generic enum, which means that for some type T, our enum may be a None
variant or a Some
variant carrying a value of type T
.
With data added to variants, this style of enum takes on many of the properties of union types in other languages. The combination of enum and union-like behavior is why this sort of type is often referred to as a “tagged union”. In type theory, it’s referred to as a sum type, and is one of many algebraic types such as product types and quotient types.
Let's use this new type in our Node
struct:
#[derive(Debug)]
pub struct Node<T> {
elem: T,
next: Box<Option<Node<T>>>,
}
Cool huh? We just added the following semantics to our code: if the next
field is the enum variant None
then we've reached the end of our list.
This whole Box<Option<Node<T>>>
is a little annoying to look at though, there's something called type
aliases in Rust, we can use just as we would in our bash terminal.
type Link<T> = Option<Box<Node<T>>>;
#[derive(Debug)]
pub struct Node<T> {
elem: T,
next: Link<T>,
}
Perfect!
Creating an API
Now that we have our basic structure to work with recursive lists we can add some behavior to it.
What about we creating our List
struct now? As we're building a simple linked list we can simply define our list as:
pub struct List<T> {
head: Link<T>,
}
Let's define some basic operations:
impl<T> List<T> {
pub fn new() -> Self {}
pub fn push(&mut self, elem: T) {}
pub fn peek(&self) -> Option<&T> {}
pub fn pop(&mut self) -> Option<T> {}
}
Ok, that's a lot of new code, let's understand each of these lines.
-
impl<T> List<T>
: theimpl
keyword is used to define implementation on types, here we are using a generic implementationimpl<T>
for a genericList<T>
. -
pub fn new() -> Self {}
: defining a static method (note that it doesn't use theself
parameter as the other ones) that returnsSelf
which is the implementing type of ourimpl
block, for example it can beList<i32>
,List<String>
and so on. -
I'll assume the other methods are straightforward, the
self
keyword behaves just as athis
pointer in C/C++.
If we try to compile this code it won't work, that because the Rust compiler expect us to return an Option
value from the last two methods. So let's write the body of our methods:
impl<T> List<T> {
pub fn new() -> Self {
List { head: None } // that's basically our constructor
}
pub fn push(&mut self, elem: T) {
let node = Box::new(Node {
elem,
next: self.head.take(), // the `take` method takes the value out of the option, leaving a `None` in its place
});
self.head = Some(node);
}
pub fn peek(&self) -> Option<&T> { // what happens if we return the value instead of a reference?
self.head.as_ref().map(|node| &node.elem) // the `map` method maps an Option<T> to Option<U> by applying a function to a contained value
}
pub fn pop(&mut self) -> Option<T> {
self.head.take().map(|node| {
self.head = node.next;
node.elem
})
}
}
We've finally used Rust closures for the first time. Closures are fundamentally a pointer to code together with a pointer to a table of free variables, it allows us to treat functions as first-class citizens. In Rust we can define them by the simple syntax | vars | { code }
or we can optionally omit the enclosing curly brackets for a single expression | vars | expr
.
Testing
OK, so our basic implementation is supposed to work. But how do we test it?
Rust let us write testing functions, unit tests go into a tests mod with the #[cfg(test)]
attribute. Test functions are marked with the #[test]
attribute. Let's write some:
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Check empty list behaves right
assert_eq!(list.pop(), None);
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check normal removal
assert_eq!(list.pop(), Some(3));
assert_eq!(list.pop(), Some(2));
// Push some more just to make sure nothing's corrupted
list.push(4);
list.push(5);
// Check normal removal
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), Some(4));
// Check exhaustion
assert_eq!(list.pop(), Some(1));
assert_eq!(list.pop(), None);
}
}
Here everything is pretty straightforward right? There's just this assert_eq!
we still didn't talk about, but think about it as a function that takes two parameters and checks if they're equal;
Cleaning memory and Traits
Do we need to worry about cleaning up our list? Well, no, the Rust ownership system handles it to us automatically. However, we may have some problems if we just let the Rust compiler to handle it.
The problem here is with our recursive definition. When dropping the Box
pointer, first we need to drop it's content (which is recursive) and then deallocating the pointer itself, because of that, we can't have a tail recursive optimization and, therefore, we might blow the stack for really big lists.
To work around it we have to implement an iterative "destructor" for our data structure. This can be accomplished by the Drop
trait available in the std
.
So what is a trait? For now, think about it as an interface
in Java or C++ (or if you're a Haskeller we can trace a better comparison with type classes
). This is how we implement a trait for our list:
impl<T> Drop for List<T> {
fn drop(&mut self) {
let mut cur_link = self.head.take();
while let Some(mut boxed_node) = cur_link {
cur_link = boxed_node.next.take();
// boxed_node goes out of scope and gets dropped here;
}
}
}
Shared Ownership and an Immutable List
What if we want our list to be immutable? Like those used in functional language like Haskell or ML.
Let's look at a simple example:
list1 = A -> B -> C -> D
list2 = tail(list1) = B -> C -> D
list3 = push(list2, X) = X -> B -> C -> D
In this case, the memory should look like this:
list1 -> A ---+
|
v
list2 ------> B -> C -> D
^
|
list3 -> X ---+
This won't work with Box
pointers anymore, in this case the ownership of B is shared. A basic implementation of shared ownership in Rust is done by reference counting, in other words, using the Rc
pointer.
First of all let's rewrite our interface as the methods push
and pop
doesn't make much sense now.
impl<T> List<T> {
pub fn new() -> Self {}
}
pub fn head(list: &List<T>) -> Option<&T> {}
pub fn tail(list: &List<T>) -> List<T> {}
pub fn cons(list: &List<T>, elem: T) -> List<T> {}
Note that we've given up on using self
because we'll get a functional style composing these defined functions. Now let's introduce our Rc
pointer in our Link
type:
use std::rc::Rc;
type Link<T> = Option<Rc<Node<T>>>;
The semantics of the pointer is pretty simple, it keeps an internal counter and every time we call .clone()
it increases the pointer by one. When the counter gets to 0 the value is dropped.
Let's implement our methods:
impl<T> List<T> {
pub fn new() -> Self {
List { head: None }
}
}
pub fn head<T>(list: &List<T>) -> Option<&T> {
list.head.as_ref().map(|node| &node.elem)
}
pub fn tail<T>(list: &List<T>) -> List<T> {
List {
head: list.head.as_ref().and_then(|node| node.next.clone()),
}
}
pub fn cons<T>(list: &List<T>, elem: T) -> List<T> {
List {
head: Some(Rc::new(Node {
elem,
next: list.head.clone(),
})),
}
}
The code is, again, straightforward, so I'll leave you to read it by yourself.
Now let's test it!
#[cfg(test)]
mod test {
use super::*;
#[test]
fn basics() {
let list: List<i32> = List::new();
assert_eq!(head(&list), None);
let list = cons(&cons(&cons(&List::new(), 1), 2), 3);
assert_eq!(head(&list), Some(&3));
let list = tail(&list);
assert_eq!(head(&list), Some(&2));
let list = tail(&list);
assert_eq!(head(&list), Some(&1));
let list = tail(&list);
assert_eq!(head(&list), None);
}
}
Last we have to define our iterative drop:
impl<T> Drop for List<T> {
fn drop(&mut self) {
let mut head = self.head.take();
while let Some(node) = head {
// unwraps if it's unique
match Rc::try_unwrap(node) {
Ok(mut node) => {
head = node.next.take();
}
Err(_) => {
break;
}
}
}
}
}
The only unknown operator here is the match
, it's used in Rust to simulate patter matching just like in functional languages.
Iterator and Append
To finish the tutorial, let's implement two more cool functionalities to our list.
An iterator is a pattern that abstract the logic of transversing in a container, particularly lists. In Rust std library we have a Iterator
trait that helps us implementing the patter, let's take a look.
pub trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
The new kid on the block here is type Item
. This is declaring that every implementation of Iterator has an associated type called Item. In this case, this is the type that it can spit out when you call next
. Here it goes our implementation:
pub struct Iter<'a, T> {
next: Option<&'a Node<T>>,
}
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item> {
self.next.map(|node| {
self.next = node.next.as_ref().map(|node| &**node);
&node.elem
})
}
}
And here are some tests:
#[test]
fn iter_t() {
let list = cons(&cons(&cons(&List::new(), 1), 2), 3);
let mut iter = iter(&list);
assert_eq!(iter.next(), Some(&3));
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&1));
}
Now let's build our last piece of the puzzle, the append function. Its semantics is supposed to be quite simple, given two List<T>
return a new List<T>
which is the result of appending the final of the first list to the beginning of the second. Let's code it:
pub fn append<T>(l1: &List<T>, l2: &List<T>) -> List<T>
where
T: Clone,
{
match head(l1) {
None => List {
head: l2.head.clone(),
},
Some(e) => {
let mut l = tail(&l1);
l = append(&l, l2);
cons(&l, e.clone())
}
}
}
You should be able to understand the whole code by now, except for the ẁhere
clause. What it does is basically set a trait bound to the generic parameter T
, which means that every possible T
must implement a trait called Clone
, in fact, we need to clone some elements in order to append the two lists. If you know Haskell, it's as simple as defining a type class bound, for example: Clone a => List a -> List a -> List a
.
That's how we test this new function:
#[test]
fn append_t() {
let l1 = cons(&cons(&List::new(), 1), 2);
let l2 = cons(&cons(&List::new(), 5), 6);
let list = append(&l1, &l2);
let mut iter = iter(&list);
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&1));
assert_eq!(iter.next(), Some(&6));
assert_eq!(iter.next(), Some(&5));
}
And that's it !!! You did build a functional linked list using an imperative (multi paradigm) language!!! If you've read this until the end, I hope you have enjoyed it. See you in the next post if you're sticking around.
Here's the full code for reference:
use std::rc::Rc;
type Link<T> = Option<Rc<Node<T>>>;
#[derive(Debug)]
pub struct List<T> {
head: Link<T>,
}
#[derive(Debug)]
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None }
}
}
pub fn head<T>(list: &List<T>) -> Option<&T> {
list.head.as_ref().map(|node| &node.elem)
}
pub fn tail<T>(list: &List<T>) -> List<T> {
List {
head: list.head.as_ref().and_then(|node| node.next.clone()),
}
}
pub fn cons<T>(list: &List<T>, elem: T) -> List<T> {
List {
head: Some(Rc::new(Node {
elem,
next: list.head.clone(),
})),
}
}
pub fn append<T>(l1: &List<T>, l2: &List<T>) -> List<T>
where
T: Clone,
{
match head(l1) {
None => List {
head: l2.head.clone(),
},
Some(e) => {
let mut l = tail(&l1);
l = append(&l, l2);
cons(&l, e.clone())
}
}
}
pub fn iter<T>(list: &List<T>) -> Iter<T> {
Iter {
next: list.head.as_ref().map(|node| &**node),
}
}
impl<T> Drop for List<T> {
fn drop(&mut self) {
let mut head = self.head.take();
while let Some(node) = head {
match Rc::try_unwrap(node) {
// unwraps if it's unique
Ok(mut node) => {
head = node.next.take();
}
Err(_) => {
break;
}
}
}
}
}
pub struct Iter<'a, T> {
next: Option<&'a Node<T>>,
}
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item> {
self.next.map(|node| {
self.next = node.next.as_ref().map(|node| &**node);
&node.elem
})
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn basics() {
let list: List<i32> = List::new();
assert_eq!(head(&list), None);
let list = cons(&cons(&cons(&List::new(), 1), 2), 3);
assert_eq!(head(&list), Some(&3));
let list = tail(&list);
assert_eq!(head(&list), Some(&2));
let list = tail(&list);
assert_eq!(head(&list), Some(&1));
let list = tail(&list);
assert_eq!(head(&list), None);
}
#[test]
fn iter_t() {
let list = cons(&cons(&cons(&List::new(), 1), 2), 3);
let mut iter = iter(&list);
assert_eq!(iter.next(), Some(&3));
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&1));
}
#[test]
fn append_t() {
let l1 = cons(&cons(&List::new(), 1), 2);
let l2 = cons(&cons(&List::new(), 5), 6);
let list = append(&l1, &l2);
let mut iter = iter(&list);
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&1));
assert_eq!(iter.next(), Some(&6));
assert_eq!(iter.next(), Some(&5));
}
}