### ERGM: Creating Large Fully-Connected Network objects in Statnet

### TL;DR

Use the following approach:

```
network <- network(matrix(1, n, n), directed=FALSE)
```

(At least with Statnet 1.7 on 64-bit R 2.13.)

### In more depth

Statnet seems optimized for sparsely connected graphs. This is not too surprising, since many of the "real" graphs I deal with have a density around *d* = .0002 or thereabouts, and even some fairly large small-world graphs, like IMDB, only have a density of ~.18. However, there's one special case where I need a fully connected graph: the input to an `edgecov()` term in an ERGM. This graph has to have *all* possible edges, not just the observed edges, and so its density is 1.0.
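To put those density figures in context, here is a minimal base-R sketch of how density works out for an undirected graph without self-loops. The helper names `possible_edges` and `graph_density` are my own, made up for illustration; they are not part of Statnet.

```r
# Hypothetical helpers (not part of statnet), base R only.

# Number of possible undirected edges among n nodes, excluding self-loops.
possible_edges <- function(n) n * (n - 1) / 2

# Density = number of edges present / number of possible edges.
graph_density <- function(n_edges, n) n_edges / possible_edges(n)

n <- 200
full_edges <- possible_edges(n)             # 19900 edges in a complete graph
full_density <- graph_density(full_edges, n)  # 1: every possible edge is present
print(full_density)
```

So a fully connected 200-node graph already carries 19,900 edges, and that count grows roughly with the square of the node count.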

One of the challenges is how to create and initialize these very large networks. The creation step would often take a very long time, and it wasn't clear I was using the best approach. There are at least two methods that seemed plausible: use `matrix` to initialize an adjacency matrix, or use `network.initialize` and then add all of the edges afterwards. It was not clear up front which one would be faster. So, I did a quick experiment: I ran each method 50 times on graphs of various sizes, and compared the results.

```
# Method 1: network(matrix())
startTime <- proc.time()
for (i in 1:50) n <- network(matrix(1, 200, 200), directed=FALSE)
proc.time() - startTime

# Method 2: network.initialize() and then assignment
startTime <- proc.time()
for (i in 1:50) {
  n <- network.initialize(200, directed=FALSE)
  n[,] <- 1
}
proc.time() - startTime
```

The results were pretty clear (times in seconds for the 50 runs, unless noted otherwise):

| Method | 200x200 | 500x500 |
|--------|---------|---------|
| #1 | 12.32 | 361.91 |
| #2 | 42.42 | 8+ hours |

I used `proc.time()` for the timing. There are suggestions that this is not super-accurate, but the difference is so stark that I think even 1s resolution is more than enough. Also: I've discovered that 32-bit R is a really bad environment for working with even "medium-sized" graphs (500 nodes or so), much less "large" graphs. The extra address space afforded by the 64-bit version of R avoids a lot of out-of-memory conditions.
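For what it's worth, the same kind of measurement can also be written with base R's `system.time()`, which wraps the start/stop bookkeeping. This is only a sketch: `time_reps` is a hypothetical helper name of my own, the sizes and repetition counts are deliberately small so it finishes quickly, and the comparison of the two methods only runs if the `network` package is installed.

```r
# Hypothetical helper: time `reps` calls of a zero-argument function `f`
# and return the elapsed (wall-clock) seconds.
time_reps <- function(f, reps = 50) {
  timing <- system.time(for (i in seq_len(reps)) f())
  timing[["elapsed"]]
}

# Stand-in workload in base R, so the sketch runs without statnet installed.
elapsed <- time_reps(function() matrix(1, 200, 200), reps = 50)

# If the network package is available, compare the two methods directly.
# Sizes are kept small here (assumption: 50 nodes, 5 reps) to keep it fast.
if (requireNamespace("network", quietly = TRUE)) {
  m1 <- time_reps(function() {
    network::network(matrix(1, 50, 50), directed = FALSE)
  }, reps = 5)
  m2 <- time_reps(function() {
    g <- network::network.initialize(50, directed = FALSE)
    g[, ] <- 1
  }, reps = 5)
  print(c(matrix_init = m1, initialize_then_assign = m2))
}
```

Passing each method as a function keeps the timing loop identical for both, so any difference comes from the method itself rather than from how the loop was written.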
