Among the many amazing features of functional programming, or any language that supports lambdas, is the ability to express complex function calls with very concise and... human readable code. And what better use case than the very useful collection pipeline?

But to implement a collection pipeline, we need two main things: a common way for functions to access and manipulate collections and a library of said functions.

In this first part of our two-part series, we will be talking about the common interface that collections have in Clojure and how we can implement methods that take advantage of it. In part two, which is scheduled for the next issue, we'll be looking at the most used functions: Map, Filter, and Reduce.

# Data Collections and the Sequence Abstraction

Depending on the way we want to retrieve data from a data collection, we might choose different internal representations of the data to optimize either the memory or the computation performance.

In Clojure, most data manipulation functions work on all the data collections, no matter what the type of the collection is. The same function works on a vector, a map, a list, or a range of numbers.

Clojure treats all data collections as sequences.

## The Fundamental Idea of Data Manipulation

Think about all the numbers between 0 and 99.

What data structure are you going to choose to represent the numbers? Will you use an array, a vector, a linked list, a set, or another data structure?

The answer depends on the use case. Depending on your use case, you might choose:

- a set if the order of the elements doesn't matter and you need to be able to determine quickly whether a number belongs to the data structure;
- an array if the size is fixed and you need to access elements by their index;
- a vector if the size is dynamic and you need to access elements by their index;
- a linked list if you only need to access the elements in order.

You might even think of a more lazy data structure in which you don't store the elements themselves but only the first element and a function that generates the next element. (In our case, the first element would be 0 and the function that generates the next element would be `increment`

.)

No matter what your choice is, let's give a name to your data structure. We will call the sequence `S`

.

Now, how will you use sequence `S`

to generate a sequence `V`

that is made of the elements of `S`

where each element is multiplied by 10? In other words, how will you generate the sequence made of 0, 10, 20, 30, and so on out of `S`

?

Can you generate the vector `V`

out of `S`

with zero knowledge about the data structure chosen to represent the numbers between 0 and 99?

The answer is yes. The only thing you need is a way to traverse the elements of `S`

in some order. A data structure whose elements can be traversed in some order is called-you guessed it-a *sequence*!

More precisely, we traverse the elements of a seqable collection using the four basic sequence functions:

`first`

retrieves the first element of the sequence.`rest`

returns a sequence of the items after the first element.`cons`

returns a new sequence where an element is prepended to the provided sequence.`empty?`

returns true when the sequence is empty and false otherwise.

The fundamental idea of data manipulation in Clojure is that using only `first`

, `rest`

, `cons`

and `empty?`

we can write lots of data manipulation functions. For instance, we can write a function that receives a sequence `s`

and returns a sequence that is made of the elements of `s`

where each element is multiplied by 10. Here is the code for such a function:

```
(defn inc-all [s]
(if (empty? s)
s
(let [new-first (inc (first s))
new-rest (inc-all (rest s))]
(cons new-first new-rest))))
```

You don't need a deep understanding of the recursive algorithm^{1} implemented by this function. The important thing to notice is that the code doesn't access the internal structure of the sequence, but only uses the four sequence functions: `first`

, `rest`

, `cons`

and `empty?`

. As a consequence, `mult-by-ten`

works with any data structure that implements the four basic sequence functions, including a vector, a list, a set, or any other data structure.

In Clojure, we have dozens of data manipulation functions that operate on every sequence, and this is made possible by the fact that the functions use only those four basic sequence functions. For Clojure developers, this is highly valuable as it allows them to write their own data manipulation functions by composing the functions provided by Clojure, and our functions also operate on every sequence with no special effort. This is the power of true polymorphism.

In this first part of our two-part series, we will explore the most common data manipulation functions to give you an idea of the wide variety of algorithms these simple four functions enable. In part two, we will look at the most significant one: map-reduce.

## Range of Numbers

In Clojure, the simplest way to generate a range of numbers is through the `range`

function.

Let's see `range`

in action:

```
user=> (range 10)
(0 1 2 3 4 5 6 7 8 9)
```

The exact type of the sequence returned by `range`

is not important at all. The only thing that you need to know is that it is a sequence. Therefore it is well handled by the four basic sequence functions: `first`

, `rest`

, `cons`

and `empty?`

. By default, sequences are represented with wrapping rounded parentheses.

Let's see the three basic sequence functions in action on a range of numbers.

The `first`

element of `(range 10)`

is 0:

```
user=> (first (range 10))
0
```

`rest`

returns the sequence without the first element. Therefore when we call `rest`

on `(range 10)`

, we get a sequence that starts with 1 and ends with 9:

```
user=> (rest (range 10))
(1 2 3 4 5 6 7 8 9)
```

Obviously, `(range 10)`

is not empty:

```
user=> (empty? (range 10))
false
```

## Maps as Sequences

`map`

s are associative data collections in the sense that inside a `map`

, each key is associated with a value. Let's look at a `map`

that describes a person:

```
{:first-name "Kelly"
:last-name "Kapowski"}
```

Let's compare this `map`

to a nested `vector`

made of pairs that describes the same person:

```
[[:first-name "Kelly"]
[:last-name "Kapowski"]]
```

Although the information is represented differently by the `map`

and the `vector`

, from a purely logical standpoint, the `map`

and the `vector`

contain the same information. The information is:

- the key
`:first-name`

is associated with the value "Kelly" - the key
`:last-name`

is associated to the value "Kapowski"

Conceptually, we could say that a `map`

is a sequence of key-value pairs. In a sense, a `map`

can be considered a sequence.

In Clojure, we cannot access a `map`

as a `sequence`

; we need to use the `seq`

function to transform it into a `sequence`

of key-value pairs:

```
user=> (seq {:first-name "Kelly"
=> :last-name "Kapowski"})
([:first-name "Kelly"] [:last-name "Kapowski"])
```

Once a `map`

is converted to a `sequence`

, we can use on it all the `sequence`

functions:

```
user=> (first (seq {:first-name "Kelly"
=> :last-name "Kapowski"}))
[:first-name "Kelly"]
user=> (rest (seq {:first-name "Kelly"
=> :last-name "Kapowski"}))
([:last-name "Kapowski"])
user=> (empty? (seq {:first-name "Kelly"
=> :last-name "Kapowski"}))
false
```

Now it's time to unveil the real power of `sequence`

s. Clojure provides a rich set of functions to manipulate `sequence`

s in many different ways: mapping, filtering, sorting, grouping, combining and more. When you get used to those functions, you can build data transformation pipelines with a few lines of code.

## Overview about Mapping

Mapping is when we apply a transformation to each element of a sequence. In Clojure, the transformation is defined as a function that takes one argument. The transformation function operates only on the current element of the sequence. Don't confuse map as a noun, which is a data structure, with map as a verb, which is a function.

Clojure provides a rich set of functions from each family. This abundance of functions makes it fun for the Clojure developers to write code for many data pipeline tasks. Of course, it is fun only after you understand when and how to use those functions.

Let's focus for now on the most common mapping function available in Clojure: map. We will explore filtering and reducing in another article.

The most famous expression that involves the map function is:

```
(map inc [1 2 3])
```

It returns a sequence of the numbers 1, 2, and 3, incremented by 1:

```
user=> (map inc [1 2 3])
(2 3 4)
```

`map`

receives two arguments: a function `func`

and a sequence `s`

. It returns a sequence consisting of the result of applying `func`

to each element of `s`

.

The expression `(map inc [1 2 3])`

returns a sequence where the `inc`

function is applied to each element of `[1 2 3]`

. The result is the sequence `(2 3 4)`

.

It might help to think about the result of `(map inc [1 2 3])`

as a sequence made up of `(inc 1)`

, followed by `(inc 2)`

, followed by `(inc 3)`

. It is important to pay attention to two properties of the sequence returned by `map`

:

- The order of the elements in the sequence returned by
`map`

corresponds to the order of the elements in the original sequence. - The length of the sequence returned by
`map`

is the same as the length of the original sequence.

Because of the power of the sequence abstraction, `map`

is able to handle any type of sequence: vectors, lists, sets, and so on.

```
user=> (map inc (set [1 2 3]))
(2 3 4)
user=> (map inc (list 1 2 3))
(2 3 4)
```

The result is always a sequence!

It means that we are allowed to call `map`

on the sequence returned by `map`

. For example, we can increment again:

```
user=> (map inc (map inc [1 2 3]))
(3 4 5)
```

## Cracking the Code of map

It's quite amazing that one can write the code for the map function and it will work with any sequence.

Let's do it---it's just a few lines of code:

```
(defn my-map [func s]
(if (empty? s)
s
(let [new-first (func (first s))
new-rest (my-map func (rest s))]
(cons new-first new-rest))))
```

Let's check ourselves:

```
user => (my-map inc [1 2 3])
(2 3 4)
```

## Conclusion

In this brief overview of the concept of sequences, we have illustrated a principle that appears often in Clojure: with simplicity comes power.

The concept of sequence is so simple---it's just four basic functions, but the possibilities are endless.

##### Footnotes

1: There is a fun book that explores the many possibilities of these kinds of recursive algorithms. It's called "The Little Schemer" by Daniel P. Friedman and Matthias Felleisen.