To use or not to use the ++ operator in Elixir

You may have already been warned about the risks of using the ++ operator to concatenate lists in Elixir. In fact, a good piece of advice can be found directly in the official documentation for this method:

The complexity of a ++ b is proportional to length(a), so avoid repeatedly appending to lists of arbitrary length, e.g. list ++ [item]. Instead, consider prepending via [item | rest] and then reversing.

For the rest of this article, we are going to dive into an example in an attempt to explain what could be problematic with using ++, while also checking if we can improve on things by refactoring as suggested in the documentation. Spoiler alert: life is not that easy 😅.

The problem

Let’s say we need to implement a function that receives a list of integers and returns a new list that includes each number n times, where n equals is the integer’s actual value. We also expect that the original order of these numbers is preserved in the resulting list. Check out the following examples:

my_func([1]) # it returns [1]
my_func([1,2]) # it returns [1,2,2]
my_func([1,3,2]) # it returns [1,3,3,3,2,2]

Our first naive implementation

A very simple solution to this problem could rely on using Enum.reduce/3 to duplicate each number and concatenate the duplicated values to the accumulated list.

def number_dups_list(numbers) do
  Enum.reduce(numbers, [], fn n, acc ->
    acc ++ List.duplicate(n, n)
  end)
end

Let’s see how it performs by running this function with a large list of integers. Below is our first test (using :timer.tc to quickly measure the time elapsed to run the code):

iex> :timer.tc(fn -> number_dups_list(List.duplicate(1,100_000)) end)
{30785140, .... }

After running it a few times in order to get a few samples, I noted that it takes around 3 seconds (on my machine). That’s pretty slow; there might be room for improvement.

Let’s follow the advice from the documentation

If we look carefully at the execution of our previous test, where the input was a list containing a lot of 1s, we see that our implementation is using ++ repeatedly. In fact, the crucial factor here is that the list to the left of the operator is getting larger and larger with each iteration. At the same time, the list to the right is always [1]. As the documentation states, the overall performance depends on the length of the list to the left of this operator.

Let’s see what happens if we reverse the order (then use Enum.reverse to re-sort).

def number_dups_list(numbers) do
  Enum.reduce(numbers, [], fn n, acc ->
    List.duplicate(n, n) ++ acc
  end)
  |> Enum.reverse
end

Let’s run it again to see if we have enhanced performance:

iex> :timer.tc(fn -> number_dups_list(List.duplicate(1,100_000)) end)
{11570, .... }

On my machine, this executes in approximately 11ms! This is a huge difference! The larger the list is, the more is the difference with the previous approach.

All good so far, but… are we sure that we’re going to see better performance in all cases? Chances are, we’re not out of the woods yet. Let’s run our function again using different input.

iex()> :timer.tc(fn -> number_dups_list([10_000_000, 20_000_000]) end)
{2638101, .... }

You may have already noticed the issue! We’re using a rather short list now, but the numbers are very big numbers. Thus, we’re once again in a position where the list to the left of the ++ operator is always a large list, resulting in slower execution of the function (around 2 seconds).

The funny thing here is that our former implementation works a bit better for this new list (less than 1.5 seconds). It is only at the first stage of the execution where the list to the left of the ++ operator is a short list (an empty list). These results suggest that reversing makes a difference here; the previous implementation (where reversing is not needed) is preferable for this type of input.

At this point, we want to start measuring performance in a more robust way. We’ll use Benchee to handle all the benchmarking work for us. See below for the benchmarking (assume this function is included in the same module and note we are now using different function names for each implementation).

def run_benchmarks do
  inputs = %{
    "long list with low values" => List.duplicate(1,100_000),
    "short list with high values" => [10_000_000, 20_000_000]
  }

  Benchee.run %{
    "Using ++" =>
      fn(list) -> number_dups_list1(list) end,
    "Using ++ and reverse"  =>
      fn(list) -> number_dups_list2(list) end,
  }, time: 20, warmup: 5, inputs: inputs
end

Running this code on my computer gives me the following output:

##### With input long list with low values #####
Name                           ips        average  deviation         median         99th %
Using ++ and reverse        123.53      0.00810 s    ±18.33%      0.00824 s       0.0127 s
Using ++                    0.0366        27.35 s     ±0.00%        27.35 s        27.35 s

Comparison:
Using ++ and reverse        123.53
Using ++                    0.0366 - 3378.87x slower

##### With input short list with high values #####
Name                           ips        average  deviation         median         99th %
Using ++                      0.93         1.07 s    ±41.40%         1.00 s         2.38 s
Using ++ and reverse          0.50         2.01 s    ±29.48%         2.01 s         3.48 s

Comparison:
Using ++                      0.93
Using ++ and reverse          0.50 - 1.88x slower

This confirms what we saw before, specifically that our first implementation is faster with a short list of large numbers. As we mentioned previously, the difference is invoking Enum.reverse in the case of a long list (you can easily verify this by commenting out this part and checking that the benchmarking results are now quite similar, even though the functions don’t produce the exact same results).

What’s the problem with the lists on the left?

A further explanation of this issue would require a blog post of its own 😀.

Let’s just say that the Erlang VM is forced to make a full copy of the list to the left of the ++ when calculating the result of the operation, while this is not necessary for the list to the right. The latter list’s allocated memory can be reused by the resulting list, but the original list to the left needs to be copied to memory. Therefore, the larger the list to the left, the more costly the overall append operation.

You can find more information here.

Attempt at an implementation without using `++`

The official documentation suggests that we refactor the code utilizing ++ to include [item|rest]. Why don’t we give it a try? One thing that is not explicitly stated here is that we probably need to use Enum.reduce/3 to iterate over the elements of the list to the left of the ++ operator in order to achieve the same results.

def number_dups_list3(numbers) do
  Enum.reduce(numbers, [], fn n, acc ->
    Enum.reduce(List.duplicate(n, n), acc, fn n2, acc2 ->
      [ n2 | acc2 ]
    end)
  end)
end

Time to run this new implementation and see what happens…

iex> fn -> number_dups_list3(List.duplicate(1,100_000)) end |> :timer.tc
{16279, .... }

iex> fn -> number_dups_list3([10_000_000, 20_000_000]) end |> :timer.tc
{2515518, .... }

We found that in the first case, where the values in the list are small, the code works well. However, it seems that there isn’t much of a difference when it comes to our second implementation relying on ++ (the one that grows the resulting list by prepending short lists).

Additionally, in the case of the input list, [10_000_000, 20_000_000], things are performing quite slowly. Let’s take a look at the complete report after running the 3 functions with the assistance of Benchee:

##### With input long list with low values #####
Name                                       ips        average  deviation         median         99th %
Using ++ and reverse                    119.85        8.34 ms    ±24.50%        8.22 ms       15.47 ms
Without ++ ([h | t] and reverse)         70.75       14.13 ms    ±21.33%       13.07 ms       26.79 ms
Using ++                                0.0300    33361.65 ms     ±0.00%    33361.65 ms    33361.65 ms

Comparison:
Using ++ and reverse                    119.85
Without ++ ([h | t] and reverse)         70.75 - 1.69x slower
Using ++                                0.0300 - 3998.44x slower

##### With input short list with high values #####
Name                                       ips        average  deviation         median         99th %
Using ++                                  0.92         1.08 s    ±45.19%         1.01 s         2.51 s
Using ++ and reverse                      0.41         2.45 s    ±27.75%         2.29 s         3.72 s
Without ++ ([h | t] and reverse)          0.35         2.85 s    ±39.46%         2.39 s         4.82 s

Comparison:
Using ++                                  0.92
Using ++ and reverse                      0.41 - 2.26x slower
Without ++ ([h | t] and reverse)          0.35 - 2.64x slower

We were unable to beat our former implementations relying on ++ for any of our test cases. Hence, the option suggested by the documentation was not the best fit for us in either of our two situations.

Another implementation idea?

We can brainstorm further to come up with other ideas to determine a solution to our problem. If we stop thinking as much about performance, we can try to find additional ways to reimplement the code, possibly ending up with a more elegant solution.

For example, if we combine Enum.map/2 and Enum.flatten/1, we end up with the following implementation:

def  number_dups_list4(numbers) do
  Enum.map(numbers, &(List.duplicate(&1, &1)))
  |>  List.flatten
end

This is actually quite simple to understand. We’re now creating a list that includes both of the intermediate lists with the duplicated items. In the end, everything is flattened in the final list.

How might it perform? If we try to guess what is going to happen, I would say that there will be a struggle between the need to keep the intermediate lists in memory across all iterations and the fact that there is no need to reverse the resulting list (as with the other solutions).

The final answer is revealed after running our benchmarks:

##### With input long list with low values #####
Name                                       ips        average  deviation         median         99th %
Using ++ and reverse                    120.03        8.33 ms    ±25.82%        8.11 ms       16.30 ms
map / flatten                           113.00        8.85 ms    ±21.77%        8.17 ms       17.72 ms
Without ++ ([h | t] and reverse)         76.24       13.12 ms     ±9.12%       12.93 ms       17.96 ms
Using ++                                0.0324    30821.21 ms     ±0.00%    30821.21 ms    30821.21 ms

Comparison:
Using ++ and reverse                    120.03
map / flatten                           113.00 - 1.06x slower
Without ++ ([h | t] and reverse)         76.24 - 1.57x slower
Using ++                                0.0324 - 3699.55x slower


##### With input short list with high values #####
Name                                       ips        average  deviation         median         99th %
Using ++                                  0.93         1.08 s    ±39.43%         1.03 s         2.38 s
map / flatten                             0.61         1.63 s    ±46.49%         1.53 s         3.90 s
Using ++ and reverse                      0.53         1.89 s    ±34.73%         1.84 s         3.61 s
Without ++ ([h | t] and reverse)          0.37         2.70 s    ±33.23%         2.37 s         4.16 s

Comparison:
Using ++                                  0.93
map / flatten                             0.61 - 1.51x slower
Using ++ and reverse                      0.53 - 1.75x slower
Without ++ ([h | t] and reverse)          0.37 - 2.50x slower

We can see that this solution ranks quite well, taking second place in both situations. This implementation wouldn’t be a bad choice if we expected to have both types of inputs (especially if we discard our first attempt given the bad results where the accumulated list is always to the left of the ++ operator).

However, we would need to test additional cases to make a more informed decision. After all, we are testing two very different and, perhaps, extreme use cases. For instance, we can test our implementations with Enum.to_list(1..1000), or whatever could be considered an average case. I tested with this new input and I found it runs faster with the second implementation, the one with ++ and reverse (while the map and flatten version performs like the [item | rest] version).

Conclusion

Here are a few thoughts about working with the ++ operator in Elixir, especially in situations where the expected load is high and the performance impact could be an issue.

It is important to know the risks of using ++ to append to lists, especially when the list to the left of the operator could have a large number of elements.
If you’re in the situation described above, you may consider using Enum.reduce and [item | rest], though you should verify if these adjustments actually improve the situation. Although I think the advice in the documentation is a good rule of thumb, you shouldn’t blindly follow it. In fact, I would try to use ++ but invert the order of the appended lists (and reverse the result later). Moreover, you can also test different implementations using different language tools, as we did when we tried Enum.map and Enum.flatten, in order to see how they compare with the previous attempts.
Reflect on the shape and size of your expected inputs. Also think about the number of created lists and their length in the intermediate steps of the calculation. The best performing implementation will depend on usage patterns.
Keep in mind that it’s usually hard to predict the resulting performance…unless you are an expert at Elixir/Erlang internals 😀. For the rest of us mere mortals, it is crucial to run benchmarks in each case and make decisions based on that. Every situation is unique!

Lastly, you may want to take a look at the Erlang Efficiency Guide chapter dedicated to lists. It includes some good insights on list operators, including ++, flatten, etc. Elixir is mostly relying on those Erlang functions.

And, because I know you are curious, here are the complete code and benchmarking results mentioned in this post.

Blog article