Generators and Random Variables

August 11, 2017

I have been using Python more lately, and one thing I find cool that is missing from R is the notion of a generator. In Python, generators are functions that act like iterators. R has iteration too, but it is coupled to the for loop (or while). In R you can loop over things with an index.

for (i in 1:ncol(iris)) print(names(iris)[i])
## [1] "Sepal.Length"
## [1] "Sepal.Width"
## [1] "Petal.Length"
## [1] "Petal.Width"
## [1] "Species"

We can also do this by looping over the thing itself.

for (i in names(iris)) print(i)
## [1] "Sepal.Length"
## [1] "Sepal.Width"
## [1] "Petal.Length"
## [1] "Petal.Width"
## [1] "Species"

Either way, you are tied to the for construct. R has other looping constructs as well, such as the apply family and the map family from the purrr package.

In Python, creating a generator is basically the same as creating a function. You just use yield instead of return: it hands back the current value and pauses the function, and the next request resumes from that point to produce the next value.

def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

print(sum(firstn(1000)))
## 499500
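To see what yield is doing, it can help to step through the generator by hand rather than passing it to sum. Here is a minimal sketch using the built-in next(), with firstn repeated so the block runs on its own. The names gen and exhausted are just for illustration.

```python
def firstn(n):
    """Yield 0, 1, ..., n - 1, one value per request."""
    num = 0
    while num < n:
        yield num   # hand back the current value and pause here
        num += 1    # runs only when the next value is requested

gen = firstn(2)       # nothing in the body has run yet
first = next(gen)     # runs up to the first yield
second = next(gen)    # resumes after the yield, loops, yields again

try:
    next(gen)         # the while loop ends, so the generator is finished
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)   # 0 1 True
```

A for loop or sum() does the same thing behind the scenes and treats StopIteration as the signal to stop.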

Since R is functional, you can have a function create the generator function for you. For the first case, let's do something like the Python example, with a known sequence. Also note that what's below is a pretty quickly thrown together implementation. I would not be surprised if there is already a clean version out there. My intent was not to write production-grade code here.

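firstn <- function(n) {
  # .seq lives in the global environment, not inside firstn
  .seq <<- 1:n
  function() {
    tmp <- .seq[1]        # NA once the sequence is empty
    .seq <<- .seq[-1]     # drop the value just consumed
    if (is.na(tmp)) simpleError('fully consumed') else tmp
  }
}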

We can use it like so!

x <- firstn(2)
x()
## [1] 1
x()
## [1] 2
x()
## <simpleError: fully consumed>

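You could make it cleaner by letting it know how many values it has left, but again this is not a full implementation of anything. (Note that simpleError() only builds the condition object, which is why it prints rather than halting; stop() would actually raise it.)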

In Python there are really two things at work in most uses of generators, the other being iterators. This is like the for usage in R. In Python the generator is called to create the sequence, and if you loop over it you are using it as an iterable, which is part of the duck typing notion. What about infinite generators? If you don’t know the upper bound, which also means you probably are not using it in a for loop, you can’t instantiate the whole sequence up front. Something that is pretty cool in Haskell is the infinite sequence.

take 5 [1,2..]

This is actually pretty similar to what happened in Python as far as the result goes: you still provide the upper bound. However, the two dots make it very different. [1,2..] is an infinite list of numbers, and take just pulls out as many as you ask for. Since Haskell uses lazy evaluation it never calculates the infinite list, just the first 5 elements in this case, which is good because otherwise it might be a bit slow (not sure what the run time is for infinity). We would not do it this way in R, though. We could, however, give up purity and introduce some side effects, that is, add state. If we want to make it generic, we can take arguments that say how the sequence is created.
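Python can get close to the Haskell version as well. This is a rough sketch, assuming the standard library's itertools: count produces an unbounded sequence lazily and islice plays the role of take. The helper name take is a hypothetical one chosen to mirror Haskell, not a built-in.

```python
from itertools import count, islice

def take(n, iterable):
    """Return the first n items of a (possibly infinite) iterable as a list."""
    return list(islice(iterable, n))

naturals = count(1)            # 1, 2, 3, ... produced lazily, never stored
first_five = take(5, naturals)
print(first_five)              # [1, 2, 3, 4, 5]

# The iterator remembers its position, so the next take continues from 6.
next_two = take(2, naturals)
print(next_two)                # [6, 7]
```

Unlike the Haskell list, the Python iterator is stateful: taking from it consumes it, which is closer to the R closures that follow.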

gen_inf <- function(init, op, by) {
  .val <<- init            # state is kept in a hidden global variable
  function() {
    tmp <- op(.val, by)    # apply the operator to the current value
    .val <<- tmp
    tmp
  }
}

So we can give it mathematical operators like so.

x <- gen_inf(0, `+`, 1)
x()
## [1] 1
x()
## [1] 2
x <- gen_inf(1, `*`, 2)
x()
## [1] 2
x()
## [1] 4
x <- gen_inf(10, `/`, 2)
x()
## [1] 5
x()
## [1] 2.5
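For comparison, here is a rough Python counterpart of gen_inf, written as a generator. It assumes the standard operator module as the source of function versions of +, * and /, and uses islice only to pull a couple of values for display. The state here is a local variable of the generator, so nothing leaks into the global scope.

```python
import operator
from itertools import islice

def gen_inf(init, op, by):
    """Yield op(init, by), then op of that result and by, and so on forever."""
    val = init
    while True:
        val = op(val, by)
        yield val

adds = list(islice(gen_inf(0, operator.add, 1), 2))
doubles = list(islice(gen_inf(1, operator.mul, 2), 2))
halves = list(islice(gen_inf(10, operator.truediv, 2), 2))
print(adds, doubles, halves)   # [1, 2] [2, 4] [5.0, 2.5]
```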

In gen_inf I gave the mutated variable a leading dot so it is hidden, which makes weird bugs from interactions with other variables less likely. We can also make it a little cleaner and remove the side effects from the global scope by using a closure, so that we are no longer changing state in the global environment.

Borrowing some code from the very interesting Win Vector Blog and modifying it a bit:

gen1 <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}

x <- gen1()
x()
## [1] 1
x()
## [1] 2

So now the state changes inside the closure. We can also borrow from an answer to this Stack Overflow Question and do the assignment in one go.

gen2 <- (function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
})()

gen2()
## [1] 1
gen2()
## [1] 2

This has another use: when more complicated ways of calculating subsequent numbers are needed, passing the operation as an argument is more difficult (maybe you can be clever and give it an anonymous function, but who knows). This took the Python approach and created an explicit generator for the type of operation we needed, as opposed to a generator factory (which seems too meta).
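To make the trade-off concrete, here is a small Python sketch of both routes for a sequence whose next value depends on more than one number. Fibonacci is my choice of example, and iterate is a hypothetical helper name, not a standard library function.

```python
from itertools import islice

def fib():
    """Explicit generator: the state is a pair, not a single number."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def iterate(step, state):
    """Generic version: yield state, step(state), step(step(state)), ..."""
    while True:
        yield state
        state = step(state)

explicit = list(islice(fib(), 8))

# The "clever" route: pack the state into a tuple and pass a lambda.
pairs = iterate(lambda s: (s[1], s[0] + s[1]), (0, 1))
generic = [a for a, _ in islice(pairs, 8)]

print(explicit)   # [0, 1, 1, 2, 3, 5, 8, 13]
print(generic)    # [0, 1, 1, 2, 3, 5, 8, 13]
```

Both give the same numbers, but the explicit generator reads more plainly, while the generic one pushes the bookkeeping onto the caller.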

What if we don’t even want to iterate? We may just want a draw from a random number generator, which is already a generator, but the implementation is somewhat hidden.

gen_rand <- function(rand, ...) {
  function() {
    rand(1, ...)
  }
}
x <- gen_rand(runif)
x()
## [1] 0.2716148
x()
## [1] 0.6871177
x()
## [1] 0.6364619
x <- gen_rand(runif, 15, 20)
x()
## [1] 18.19208
x()
## [1] 15.53267
x()
## [1] 19.62515
x <- gen_rand(rpois, 10)
x()
## [1] 11
x()
## [1] 15
x()
## [1] 14
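The same wrapper is easy to sketch in Python. This version assumes the standard random module; as far as I know it has no Poisson sampler, so gauss stands in for rpois here. The seed is there only so that reruns are repeatable.

```python
import random

def gen_rand(draw, *args):
    """Return a zero-argument function; each call makes one fresh draw."""
    def generator():
        return draw(*args)
    return generator

rng = random.Random(2017)           # seeded only so reruns are repeatable
x = gen_rand(rng.uniform, 15, 20)   # like gen_rand(runif, 15, 20)
y = gen_rand(rng.gauss, 0, 1)       # standard normal draws

print(x(), x(), y())
```

Python's draw functions return a single value, so there is no need for the leading 1 that the R version passes to rand.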

We can couple this with something you have probably never used in R to do something pretty cool.

makeActiveBinding('poisson', gen_rand(rpois, 10), env = globalenv())

poisson
## [1] 11
poisson
## [1] 11
poisson
## [1] 8

You could use this to write programs that are stochastic, or that have some notion of probabilistic programming. Not the full Bayesian inference that usually comes with it, but you do have variables that are random draws from a distribution. They are not really variables, though: the variable is a function, or really a function that returns a function to mimic a generator, with some syntactic sugar added by the binding so that the call needs no parentheses and looks like a variable (that’s a mouthful).
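As far as I know, Python has no direct counterpart to active bindings for bare variable names. The closest thing I can sketch is an attribute backed by a descriptor, so that reading it triggers a draw. The names RandomVariable and Model are hypothetical, made up for this illustration.

```python
import random

class RandomVariable:
    """Descriptor that makes a fresh draw every time the attribute is read."""
    def __init__(self, draw):
        self._draw = draw

    def __get__(self, obj, objtype=None):
        return self._draw()

rng = random.Random(10)   # seeded only so reruns are repeatable

class Model:
    # No parentheses at the point of use, much like the active binding.
    unif = RandomVariable(lambda: rng.uniform(15, 20))

a = Model.unif   # one draw
b = Model.unif   # a different draw
print(a, b)
```

It is the same trick as before: what looks like a variable is really a function call hiding behind attribute access.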