# Introduction

If you have taken a class on data structures and/or algorithms you have probably learned a great deal about Abstract Data Types (ADT). Most people who write code in R though, do not care about having access to many of these ADTs. This is because in most cases they have other data structures that are more useful for data related tasks like data.frames. R is also peculiar because it is a very functional language which means that side effects are quite rare. It also uses copy on modify semantics so if you tried to implement many data structures they would get very expensive. You could hack these data structures together using environments though. A more structured way though is to use functionality new to R version 2.12 from 2012, this added Reference classes (RC) that allow you to add mutation to a class as opposed to the S3 and S4 class systems.

I have used the RC class system a few times but never really dug in and got an in depth understanding of them. I got a little more interested when I saw the R6 package. If you code a lot in any language you will realize there are points in time when you need the machinery to create data structures that can make some problems much easier.

I wanted to validate the performance comparisons in some of the R6 posts where it is stood up next to a few other approaches. To do this I created a Queue class in each system.

library(R6)
library(rbenchmark)
library(jvmr)

Queue_R6 <- R6Class("Queue",
public = list(
data = NA,
initialize = function(data) {
if (!missing(data)) {
self$data <- as.list(data) } else { self$data <- list()
}
},
#
size = function() length(self$data), # push = function(item) self$data[[self$size() + 1]] <- item, # pop = function() { if (self$size() == 0) return(NULL) # stop("queue is empty!")
value <- self$data[[1]] self$data[[1]] <- NULL
value
}
)
)

Queue_RC <- setRefClass(
Class = "Queue",
fields = list(name = "character", data = "list"),
methods = list(
size = function() length(data),
#
push = function(item) data[[size() + 1]] <<- item,
#
pop = function() {
if (size() == 0) stop("queue is empty!")
value <- data[[1]]
data[[1]] <<- NULL
value
},
#
initialize=function(...) {
callSuper(...)
.self
}
)
)

These functions almost look the exact same, only a few minor defferences. We can compare these two implementations to see which performs better. The following benchmark will create an empty queue, then push the numbers from 1 to 100 into the queue and then pop each number back off. Not a very complex task but it should test the aspects of the data structure.

q_comp <- function(str, num = 100) {
eval(parse(text = paste0('x <- ', str)))
for(i in seq(num)) x$push(i) for(i in seq(num)) x$pop()
}

benchmark(replications = rep(100, 3),
q_comp('Queue_RC$new()'), q_comp('Queue_R6$new()'),
columns = c('test', 'elapsed', 'replications', 'relative'))
##                       test elapsed replications relative
## 2 q_comp("Queue_R6$new()") 0.339 100 1.003 ## 4 q_comp("Queue_R6$new()")   0.357          100    1.056
## 6 q_comp("Queue_R6$new()") 0.338 100 1.000 ## 1 q_comp("Queue_RC$new()")   1.044          100    3.089
## 3 q_comp("Queue_RC$new()") 0.986 100 2.917 ## 5 q_comp("Queue_RC$new()")   0.990          100    2.929

So the R6 version is indeed faster, about 3x faster.

# Optimizations

There is more though. The R6 approach provides some other possible methods to improve the speed. These can be realized by turning different options off. The main two are the portable argument and class argument to FALSE.

We can create a function with each of these options and one with both modified. Then we can compare each of them to the base with everything set to TRUE.

benchmark(replications = rep(100, 3),
q_comp('Queue_R6$new()'), q_comp('Queue_R6_class$new()'),
q_comp('Queue_R6_port$new()'), q_comp('Queue_R6_both$new()'),
columns = c('test', 'elapsed', 'replications', 'relative'))
##                              test elapsed replications relative
## 4   q_comp("Queue_R6_both$new()") 0.104 100 1.020 ## 8 q_comp("Queue_R6_both$new()")   0.106          100    1.039
## 12  q_comp("Queue_R6_both$new()") 0.104 100 1.020 ## 2 q_comp("Queue_R6_class$new()")   0.104          100    1.020
## 6  q_comp("Queue_R6_class$new()") 0.102 100 1.000 ## 10 q_comp("Queue_R6_class$new()")   0.102          100    1.000
## 3   q_comp("Queue_R6_port$new()") 0.364 100 3.569 ## 7 q_comp("Queue_R6_port$new()")   0.363          100    3.559
## 11  q_comp("Queue_R6_port$new()") 0.356 100 3.490 ## 1 q_comp("Queue_R6$new()")   0.366          100    3.588
## 5        q_comp("Queue_R6$new()") 0.373 100 3.657 ## 9 q_comp("Queue_R6$new()")   0.353          100    3.461

It seems as long as you have the class argument set to FALSE you will see 3x to 4x improvements in speed. This means it is about an order of magnitude over the RC approach. This seems to be close to on par with the other things I have seen, maybe a bit less. I am sure it will change based on input size and complexity of the problem as well.

# Stricter Comparison

Just out of curiosity I am also interested in seeing how fast this is to code in something known to be faster. The JVM should foot the bill here, and I can run it from inside of R in the same benchmark suite. I bet it will be much faster but not sure by how much.

# Create the interpreter instance.
scala <- scalaInterpreter()

interpret(scala,'
def queue_scala(n: Int) {
val queue = new scala.collection.mutable.Queue[Integer]
for( x <- (1 to n)) queue += x
}
')

queue_scala <- function(n) {
eval(parse(text = paste0('scala["queue_scala(', n, ')"]')))
}

r_queue <- function(n) {
x <- Queue_R6_both$new() for(i in seq(n)) x$push(i)
}

comp <- data.frame(n = numeric(), r = numeric(),
scala = numeric(), ratio = numeric())
for(i in 0:5) {
n <- formatC(10 ^ i, format = "fg")
scalaTime <- system.time(queue_scala(n))[3]
rTime <- system.time(r_queue(10 ^ i))[3]
ratio <- rTime / scalaTime
comp <- rbind(comp, data.frame(10 ^ i, scalaTime, rTime, ratio))
}

comp
##              n scalaTime   rTime        ratio
## elapsed  1e+00     0.150   0.000 0.000000e+00
## elapsed1 1e+01     0.125   0.000 0.000000e+00
## elapsed2 1e+02     0.116   0.000 0.000000e+00
## elapsed3 1e+03     0.130   0.009 6.923077e-02
## elapsed4 1e+04     0.104   0.818 7.865385e+00
## elapsed5 1e+05     0.138 160.043 1.159732e+03

The R version seems to blow up in time here while the Scala version has not budged. I want to try and see how far it can go until it seems an increas in time. I should run it by itself.

comp <- data.frame(n = numeric(), scala = numeric())
for(i in 0:7) {
n <- formatC(10 ^ i, format = "fg")
scalaTime <- system.time(queue_scala(n))[3]
comp <- rbind(comp, data.frame(n, scalaTime))
}

comp
##                 n scalaTime
## elapsed         1     0.095
## elapsed1       10     0.093
## elapsed2      100     0.099
## elapsed3     1000     0.090
## elapsed4    10000     0.095
## elapsed5   100000     0.088
## elapsed6  1000000     0.557
## elapsed7 10000000     9.983

That seems to do the trick. This comparison is not completely fair or rigorous. This is because there are other things happening outside of the data structure itself. I think the explosion actually comes from the for loop in R, which would be fairly easy to learn. More what I was interested in was if it can work in a similar amount of time for moderate tasks. It looks as though you can use the method in R as long as you are dealing with data that is tens of thousands. I bet you can still use it after that but you probably can't use a for loop. I gave the implementation the ability to load vectors as well, so we can bulk load.

system.time(x <- Queue_R6$new(1:1000000)) ## user system elapsed ## 0.604 0.018 0.622 system.time(x <- Queue_R6$new(1:10000000))
##    user  system elapsed
##  10.922   0.224   9.909

Getting rid of the for loop takes out most of all of the load time. Getting things out may be a different story though.

# Conclusion

It is interesting though that the R version is much faster at first then it gets slower than the Scala version. After some thought this is obvious though. It takes some time to call out to the Scala version which is why it does not speed up for the first few iterations, the real computation time is dwarfed by the external call time. Then at a certain point the time is longer for the computation.

This is pretty cool though because there is a pretty easy to use method to create common data structures that are also fast.