Simplifying Things with Neo4j

Kenny Darrell

July 4, 2014


Last time I walked through the Social Balance problem on networks of terrorist organizations. One interesting thing was that once all of the data was collected, the code became very awkward. Writing algorithms and traversals on graphs in R is not very natural. There is a really nice solution to this problem, though. A few posts back I wrote about creating some wrappers around Neo4j to use from R. There is now a package, RNeo4j, that wraps a much larger set of functionality in one cohesive interface. You can find more info at the package's GitHub repo. This lets us get rid of the code that traversed the graph for triangles and evaluated the social balance rules on them. We only have to worry about getting the data into a Neo4j database.

Let's start with the data that was used last time, in its cleanest state. This means that we have only used R for scraping and cleaning the data. That code was all pretty standard.

You can download the data up to this point from the link below.

Cleaned Data

options(stringsAsFactors = FALSE)


First we need to make sure we have an instance of Neo4j up and running. Without going into too many details, you need to download Neo4j; instructions for doing this can be found here. Once it is installed, you can use the commands below on a Mac to start it.

cd neo4j-community-2.0.3
bin/neo4j start

If you use a browser and navigate to http://localhost:7474/ you can validate that it is indeed up and running.

Next we need to load the RNeo4j package. To get this set up, there are instructions here. Once it is loaded we have to create a connection to the database instance.

library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")


There are a few steps of preparation needed, but these are minor. We need to insert the data into the database. This is done using the functionality provided by the RNeo4j package.

# Create the unique set of all organizations
nodes <- unique(rbind(data.frame(id = net$from, name = net$fromn),
                      data.frame(id = net$to,   name = net$ton)))

# These will hold pointers to the node in the database when we create edges.
neoNodes <- list()

# Loop through all nodes and insert into db, retaining pointer.
for (i in seq_len(nrow(nodes))) {
  neoNodes[[i]] = createNode(graph, label = "Terror", name = nodes$id[i],
                             aka = nodes$name[i])
}

# Now we add each edge to the db as a relationship.
for (i in seq_len(nrow(net))) {
  # Get the row index in the node table for each end of this edge.
  from <- which(nodes$id == net$from[i])
  to <- which(nodes$id == net$to[i])
  # Add to db.
  createRel(neoNodes[[from]], net$conn[i], neoNodes[[to]], how = net$type[i])
}
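
The node table and the index lookup can be checked without a database. The sketch below uses a made-up three-edge list with the same column names as net (the ids and group names are invented for illustration) and uses match() in place of the per-row which() call, which gives the same row indices as long as each id appears once in the node table.

```r
# Made-up edge list; only the column names mirror `net` from the post.
net_toy <- data.frame(
  from  = c(1, 1, 3),
  fromn = c("Group A", "Group A", "Group C"),
  to    = c(2, 3, 2),
  ton   = c("Group B", "Group C", "Group B"),
  stringsAsFactors = FALSE
)

# Stack both endpoints and keep one row per organization.
nodes_toy <- unique(rbind(
  data.frame(id = net_toy$from, name = net_toy$fromn),
  data.frame(id = net_toy$to,   name = net_toy$ton)
))

# Row index in the node table for each end of every edge.
from_idx <- match(net_toy$from, nodes_toy$id)
to_idx   <- match(net_toy$to,   nodes_toy$id)

print(nodes_toy)
print(cbind(from_idx, to_idx))
```

These indices are what would be used to pick the stored node pointers out of neoNodes when creating each relationship.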


Now that all of the data is loaded we can use Cypher, Neo4j's query language, to do all of the heavy lifting. This basically boils down to writing a query for the data that we want. We can do both steps in one: finding triangles and checking their adherence to social balance.

q <- 'MATCH (a)-[r:hates]->(b)<-[s:likes]-(c), (a)-[t:hates]->(c) 
      return a.aka, b.aka, c.aka;'
tri <- cypher(graph, query = q)

triples <- rbind(data.frame(Source = tri[, 1], Target = tri[, 2]),
           data.frame(Source = tri[, 1], Target = tri[, 3]),
           data.frame(Source = tri[, 2], Target = tri[, 3]))

d3plot(triples, 700, 600)
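
To make the shape of that pattern concrete, here is a rough in-memory imitation of it on a made-up edge list, using base R merge(). The group letters and the edges_toy table are invented for illustration, and this is only an approximation of what the database does, not how Cypher resolves the query.

```r
# Made-up signed edges; `conn` holds the relationship type as in the post.
edges_toy <- data.frame(
  from = c("A", "C", "A", "D"),
  to   = c("B", "B", "C", "B"),
  conn = c("hates", "likes", "hates", "likes"),
  stringsAsFactors = FALSE
)

hates <- edges_toy[edges_toy$conn == "hates", c("from", "to")]
likes <- edges_toy[edges_toy$conn == "likes", c("from", "to")]

ab <- setNames(hates, c("a", "b"))                      # (a)-[:hates]->(b)
cb <- setNames(likes[, c("to", "from")], c("b", "c"))   # (b)<-[:likes]-(c)
ac <- setNames(hates, c("a", "c"))                      # (a)-[:hates]->(c)

# Join the three edge sets on their shared endpoints.
tri_toy <- merge(merge(ab, cb, by = "b"), ac, by = c("a", "c"))
tri_toy <- tri_toy[, c("a", "b", "c")]
print(tri_toy)
```

Only the A, B, C combination survives: D likes B as well, but A has no hates edge to D, so that candidate drops out at the second join.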


The complexity and awkwardness of this code has decreased substantially. Loading the data into Neo4j added some complexity, but that was a one-time cost. When everything happened within R, doing another type of traversal meant recoding both the structure and the traversal algorithm. Now you can just write another query. These queries are also very interesting because they are completely declarative: you specify the structure or shape of what you want, and the database takes care of how to resolve it. I am looking forward to exploring this more, as it seems to give you a lot of power to do a lot of things in a rather straightforward manner.
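
As a small illustration of how cheap another query is, the sketch below builds the same triangle pattern for any combination of relationship types. The helper triangle_query() is hypothetical (it is not part of RNeo4j), and it assumes the relationship types are the likes and hates values used above.

```r
# Hypothetical helper: fill the three relationship types into the
# triangle pattern used earlier in the post.
triangle_query <- function(ab, cb, ac) {
  sprintf(paste("MATCH (a)-[r:%s]->(b)<-[s:%s]-(c), (a)-[t:%s]->(c)",
                "RETURN a.aka, b.aka, c.aka;"),
          ab, cb, ac)
}

q_original <- triangle_query("hates", "likes", "hates")  # same pattern as before
q_all_hate <- triangle_query("hates", "hates", "hates")  # three hostile edges

cat(q_all_hate, "\n")

# With the database running, the new query would be sent the same way:
# tri2 <- cypher(graph, query = q_all_hate)
```

Nothing about the stored data or the loading code changes; only the query string does.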