The Fibonacci Sequence

Here’s another post inspired by some Python code, this time Stuart Mumford’s post about the Fibonacci sequence.

I won’t be replicating the Python code here: the first version, which dynamically grows the sequence, would be horribly slow in R, and the others don’t apply directly.

Here’s a plain implementation that first initialises the result vector res and then applies the Fibonacci recurrence F[i] = F[i-1] + F[i-2].

fib <- function(n) {
    res <- c(1, 1, numeric(n-2))
    for (i in 3:length(res))
        res[i] <- res[i-1] + res[i-2]
    return(res)
}
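
As a quick sanity check (my own addition; the expected values are just the well-known first ten terms):

fib(10)
## [1]  1  1  2  3  5  8 13 21 34 55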

Let’s now benchmark the function for the first n = 1000 numbers.

n <- 1000
library(microbenchmark)
(tm1 <- microbenchmark(fib(n)))
## Unit: milliseconds
##    expr      min       lq     mean   median       uq      max neval
##  fib(n) 1.188045 1.244372 1.393931 1.258884 1.301569 2.396896   100

About 1.26 milliseconds. Not great, but reasonable. We’ll compare these timings with Python later.

Byte compiling

The first optimisation we can do is to byte compile the fib function using the cmpfun function from the compiler package.

library(compiler)
cmpfib <- cmpfun(fib)
(tm2 <- microbenchmark(cmpfib(n)))
## Unit: microseconds
##       expr     min       lq     mean  median       uq    max neval
##  cmpfib(n) 117.561 119.5005 121.6874 120.523 122.0945 153.51   100

We improve the median timing 10-fold, down to 120.52 microseconds. That’s a neat improvement for very little extra effort (but note that, in my experience, byte compiling will not always give such benefits, if any).
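
For instance, a function whose time is already spent in vectorised primitives gains essentially nothing from compilation, since that time is spent in C code either way. A hypothetical illustration (vsum is my own example, not from the original post):

vsum <- function(x) sum(x * x)    ## already vectorised: the work happens in C
cmpvsum <- cmpfun(vsum)
x <- rnorm(1e5)
microbenchmark(vsum(x), cmpvsum(x))

Note also that recent versions of R byte-compile functions on first use by default, in which case the gap between fib and cmpfib should largely disappear.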

Using Rcpp

The famous Rcpp package is of course the way to go when efficiency is key. The package is nowadays so mature and well documented, and offers such a clean R/C++ interface with elegant syntactic sugar, that the overhead of calling C/C++ from R has become substantially smaller and is certainly worth the extra effort.

library("Rcpp")
cppFunction('NumericVector fibc(int n) {
    NumericVector res(n);
    res[0] = 1;
    res[1] = 1;
    for (int i = 2; i < n; ++i) {
        res[i] = res[i-1] + res[i-2];
    }
    return res;
}')
(tm3 <- microbenchmark(fibc(n)))
## Unit: microseconds
##     expr   min    lq    mean median     uq    max neval
##  fibc(n) 5.007 5.287 6.55657 5.5095 6.5975 36.185   100

A median of 5.51 microseconds; that’s certainly competitive.
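
Incidentally, for anything longer than a one-off snippet one would typically compile the C++ with Rcpp’s sourceCpp() instead, either from a file or, as in this sketch, from a string. Here fibc2 is just a name chosen to avoid clobbering fibc; the result should be identical:

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fibc2(int n) {
    NumericVector res(n);
    res[0] = 1;
    res[1] = 1;
    for (int i = 2; i < n; ++i)
        res[i] = res[i-1] + res[i-2];
    return res;
}')
identical(fibc2(n), fibc(n))
## [1] TRUE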

Summary

Let’s summarise our timings and benchmark the plain R implementation fib, the byte-compiled version cmpfib, and the C++ version fibc.

microbenchmark(fib(n), cmpfib(n), fibc(n))
## Unit: microseconds
##       expr      min        lq       mean   median        uq      max neval cld
##     fib(n) 1157.960 1197.5640 1385.39855 1248.719 1366.8100 4133.164   100   c
##  cmpfib(n)  112.275  114.8165  119.52424  117.345  121.7955  161.000   100  b
##    fibc(n)    5.441    6.2460    7.98231    7.601    8.6850   16.644   100 a

Of course, all this only makes sense if the results are actually identical.

identical(fib(n), fibc(n))
## [1] TRUE
identical(fib(n), cmpfib(n))
## [1] TRUE

Recursion is beautiful

but slow, particularly in R.

(code from the Rcpp gallery)

Here, I’m only computing the 10th Fibonacci number.

fibrec <- function(n) {
    if (n == 0 || n == 1) return(1)
    else return(fibrec(n-1) + fibrec(n-2))
}
cppFunction('int fibrecc(int n) {
    if (n == 0 || n == 1) return 1;
    else return fibrecc(n-1) + fibrecc(n-2);
}')
microbenchmark(fibrec(10), fibrecc(10))
## Unit: microseconds
##         expr     min       lq     mean   median      uq     max neval cld
##   fibrec(10) 223.989 231.2355 235.5794 233.4070 237.154 304.463   100   b
##  fibrecc(10)   1.688   2.1905   3.1256   2.7225   3.241  23.193   100  a
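
The gap is not just R’s function-call overhead: naive recursion recomputes the same subproblems over and over, so the work grows exponentially with n. A small sketch to make this concrete (ncalls is a hypothetical helper of mine, not part of the Rcpp gallery code):

ncalls <- function(n) {
    ## total number of calls fibrec(n) makes, following the same recursion
    if (n == 0 || n == 1) return(1)
    1 + ncalls(n-1) + ncalls(n-2)
}
ncalls(10)
## [1] 177
ncalls(20)
## [1] 21891

Computing the 20th number already takes over a hundred times as many calls as the 10th, which is why the recursive timings degrade so quickly with n.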

Comparing with Python

The Python examples used n = 10000 to run the benchmarks. Let’s run our code with the same input and compare.

n <- 10000
microbenchmark(fib(n), cmpfib(n), fibc(n))
## Unit: microseconds
##       expr       min        lq        mean     median        uq      max neval cld
##     fib(n) 12069.023 12371.270 13444.58376 13140.3095 13390.654 56483.90   100   c
##  cmpfib(n)  1110.513  1124.640  1148.03574  1133.9155  1158.191  1256.67   100  b
##    fibc(n)    40.014    41.879    47.36213    47.5155    51.648    72.05   100 a

The first Python implementation, not reproduced here, timed on my computer in the same ballpark as the byte-compiled R version above.

The implementation using numba, a just-in-time compilation library for Python (which I failed to install locally), made a huge improvement: 45 microseconds, along the lines of our Rcpp implementation. (Stuart claimed that this was “probably too much of a difference, something fishy is probably going on here”; I’m not sure why.)

Anyway, the byte-compiled R implementation comes pretty close to the standard (and other non-JIT) Python implementations. The real difference, however, is in the output. In R, we reach the double-precision limit of 2^1024 at the 1477th Fibonacci number:

res <- fibc(10000)
table(is.infinite(res))
## 
## FALSE  TRUE 
##  1476  8524
2^(.Machine$double.max.exp-1)
## [1] 8.988466e+307
2^.Machine$double.max.exp
## [1] Inf
res[1476:1477]
## [1] 1.306989e+308           Inf
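
That cut-off agrees with a quick back-of-the-envelope estimate (my own addition, not in the original post): Binet’s formula gives F(n) ≈ phi^n / sqrt(5), so solving phi^n / sqrt(5) = 2^1024 for n should land just below 1477.

phi <- (1 + sqrt(5)) / 2
(1024 * log(2) + log(sqrt(5))) / log(phi)    ## roughly 1476.7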

In Python, by contrast, integers have arbitrary precision, so the sequence can be computed exactly as far as one likes.

I don’t know of a comparably simple way to bypass this limitation in base R.

EDIT: Below is a solution by Tim Yates that uses arbitrary-precision integers, via the gmp package, to break the 1477 barrier.

library("gmp")
fibz <- function(n) {
    res <- c(as.bigz(1), as.bigz(1), numeric(n-2))
    for (i in 3:length(res))
        res[i] <- res[i-1] + res[i-2]
    return(res)
}

res <- fibz(1477)
tail(res)
## Big Integer ('bigz') 6 x 1 matrix:
## [,1]
## [1,] 19068715787993103250798191017822508780304711296653447105164617625061892380553158790029933641311895769883301076259336390381478959071042013362207470861439442042207716496877416186367716237827969144420530123057395842992396843429263278602433441912660289224629863753510222293898147458388732831214202620525571685189
## [2,] 30853830266784575100678257781542321845927673771368582171892474736689669436457638053409604601717742060442953667486779831823788109598285206820317254881870159525476238152062172030021830093387677115300762912267211089359316816370446352734136536195111047775110314012593682083284288938439502331878741597580734600418
## [3,] 49922546054777678351476448799364830626232385068022029277057092361751561817010796843439538243029637830326254743746116222205267068669327220182524725743309601567683954648939588216389546331215646259721293035324606932351713659799709631336569978107771336999740177766103904377182436396828235163092944218106306285607
## [4,] 80776376321562253452154706580907152472160058839390611448949567098441231253468434896849142844747379890769208411232896054029055178267612427002841980625179761093160192801001760246411376424603323375022055947591818021711030476170155984070706514302882384774850491778697586460466725335267737494971685815687040886025
## [5,] 130698922376339931803631155380271983098392443907412640726006659460192793070479231740288681087777017721095463154979012276234322246936939647185366706368489362660844147449941348462800922755818969634743348982916424954062744135969865615407276492410653721774590669544801490837649161732095972658064630033793347171632
## [6,] 211475298697902185255785861961179135570552502746803252174956226558634024323947666637137823932524397611864671566211908330263377425204552074188208686993669123754004340250943108709212299180422293009765404930508242975773774612140021599477983006713536106549441161323499077298115887067363710153036315849480388057657

Unfortunately, this comes at a substantial cost in terms of execution time (thanks to Willem Ligtenberg for suggesting to check this) and, to a lesser extent, object size.

n <- 1000
microbenchmark(f1 <- fib(n),
               fz <- fibz(n),
               times = 10L)
## Unit: milliseconds
##           expr         min          lq        mean      median          uq        max neval cld
##   f1 <- fib(n)    1.139019    1.173742    1.194115    1.209615    1.215238    1.22241    10  a
##  fz <- fibz(n) 1071.668161 1072.543708 1081.231261 1077.171210 1081.093749 1121.27528    10   b
print(object.size(f1), units="Kb")
## 7.9 Kb
print(object.size(fz), units="Kb")
## 52.5 Kb
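
Finally, it is worth noting that gmp also ships a dedicated Fibonacci routine, fibnum(), which computes a single term directly in C and so sidesteps the slow bigz loop entirely when only one term is needed (if I read the manual correctly, it uses the 0-based convention fibnum(0) = 0):

fibnum(10)      ## the 10th term, 55, as a bigz
fibnum(1477)    ## a single exact term well past the double-precision ceiling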
