The last version of R comes with a small update of vector behavior in regard to inserting new value just after the last element. See the entry from the announcement (https://stat.ethz.ch/pipermail/r-announce/2017/000612.html):

```
* Assigning to an element of a vector beyond the current length now
over-allocates by a small fraction. The new vector is marked
internally as growable, and the true length of the new vector is
stored in the truelength field. This makes building up a vector
result by assigning to the next element beyond the current length
more efficient, though pre-allocating is still preferred. The
implementation is subject to change and not intended to be used
in packages at this time.
```

In this post, I’ll try to show how does this work in practice, and what does it mean to the user.

## Functions used in benchmark

To test the new functionality I created three simple functions which add a new element at the end of the vector in three different ways:

- Assign new item using
`[[]]`

operator:

```
fnc_assign <- function(y) {
x <- NULL
for (i in y) {
x[[i]] <- i
}
x
}
```

- Use
`c(...)`

function:

```
fnc_combines <- function(y) {
x <- NULL
for (i in y) {
x <- c(x, i)
}
x
}
```

- Pre-allocate memory (recommended way):

```
fnc_prealloc <- function(y) {
x <- numeric(length(y))
for (i in y) {
x[[i]] <- i
}
x
}
```

Then I created simple bash scripts to download and compile two versions of R. The testing functions were executed in both releases using `microbenchmark`

(the code used to create this analysis is available in GitHub repository - https://github.com/zzawadz/blog-entry-vector-speed).

## Results

### Assign with `[[]]`

operator.

In the first version of the append function, the difference is significant. For \(10^4\) elements, the speed up is by order of magnitude.

Version | Fnc | Median Time | Ratio |
---|---|---|---|

R-3.3.3 | fnc_assign(y) | 62.84 | 8.52 |

R-3.4.0 | fnc_assign(y) | 7.37 | 1.00 |

Even when we compare this result with the pre-allocated version, it is only slightly slower. Good news!

Version | Fnc | Median Time | Ratio |
---|---|---|---|

R-3.4.0 | fnc_assign(y) | 7.37 | 1.07 |

R-3.4.0 | fnc_prealloc(y) | 6.89 | 1.00 |

### Pre-allocation.

No real difference;)

Version | Fnc | Median Time | Ratio |
---|---|---|---|

R-3.3.3 | fnc_prealloc(y) | 6.72 | 1.00 |

R-3.4.0 | fnc_prealloc(y) | 6.89 | 1.03 |

### c() function.

This version does not support the new functionality. It might be a bit confusing because it seems to be quite natural to add the new observation to the end of a vector using `c(...)`

function. However, this causes massive loss of performance, and it is a big mistake! **It’s better to avoid** `c(...)`

**in the** `for`

**loop!**

Version | Fnc | Median Time | Ratio |
---|---|---|---|

R-3.3.3 | fnc_combines(y) | 118.08 | 1.04 |

R-3.4.0 | fnc_combines(y) | 113.33 | 1.00 |

## Impact of vector size.

It is worth to note that in the old version (and in the current one when the `c`

function is used) creating a vector of length \(n\) by adding just one element in the loop has \(O(n^2)\) complexity. Inserting just one element causes reallocating the whole vector in the memory, so in the ith iteration, we need to move \(i-1\) elements. So in the end we get \(1 + 2 + 3 + ... + n - 2 + n - 1\) operations, which is \(O(n^2)\).

In the new version, the reallocation can be done less often so it can be much more linear. See the following graphs for comparison:

And the zoom for the new version:

## Summary

In R 3.4.0 the memory management for vectors is much better organized so resizing of the vector is much faster (\(O(n)\) vs. \(O(n^2)\)). The only problem is that this does not apply to the `x <- c(x, i)`

- this one is always slow. But the best option is to stay with preallocation because it is still the fastest way of dealing with growable vectors:)