A Scalable Generative Graph Model with Community Structure
About
Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdss-Renyi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only O(d_max) storage where d_max is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for a modeling a real-world web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned for user specifications.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Synthetic Network Generation | DBLP | GFDL113.84 | 29 | |
| Synthetic Network Generation | YouTube | GFDL1 Score2.24 | 29 | |
| Synthetic Network Generation | PolBlogs | GFDL1 Score2.09 | 29 | |
| Synthetic Network Generation | Yelp | GFDL1 Link Prediction Accuracy0.73 | 25 | |
| Synthetic Network Generation | YouTube | Eigenvalue MMD6.42 | 14 | |
| Synthetic Network Generation | DBLP | Transitivity6.09 | 11 | |
| Synthetic Network Generation | PolBlogs | Triangle Similarity (MMD)0.88 | 11 | |
| Synthetic Network Generation | DBLP | Link Prediction Accuracy85 | 10 | |
| Synthetic Network Generation | YouTube | Link Prediction Accuracy Ratio91 | 10 | |
| Synthetic Network Generation | Yelp friendship network | Triangle Ratio (10^-4)2.27 | 9 |