Name | #nodes | #edges | #labels | Type | URL |
---|---|---|---|---|---|
Youtube | 1,138,499 | 2,990,443 | 47 | undirected | [raw] [preprocessed] |
TWeibo | 2,320,895 | 50,655,143 | 100 | directed | [raw] [preprocessed] |
Orkut | 3,072,441 | 117,185,084 | 100 | undirected | [raw] [preprocessed] |
In-2004 | 1,382,908 | 16,539,643 | - | directed | [raw] [preprocessed] |
DBLP | 5,425,963 | 17,298,032 | - | undirected | [raw] [preprocessed] |
Pokec | 1,632,803 | 30,622,564 | - | directed | [raw] [preprocessed] |
LiveJournal | 4,847,571 | 68,475,391 | - | directed | [raw] [preprocessed] |
IT-2004 | 41,291,594 | 1,135,718,909 | - | directed | [raw] [preprocessed] |
| | 41,652,230 | 1,468,365,182 | - | directed | [raw] [preprocessed] |
Friendster | 65,608,366 | 1,806,067,135 | - | undirected | [raw] [preprocessed] |
UK-2007 | 105,896,555 | 3,738,733,648 | - | directed | [raw] [preprocessed] |
UK-union | 133,633,040 | 5,475,109,924 | - | directed | [raw] [preprocessed] |
ClueWeb12 | 978,408,098 | 42,574,107,469 | - | directed | [raw] |
ClueWeb09 | 1,684,868,322 | 7,939,635,651 | - | directed | [raw] [preprocessed] |
Please cite our papers if you publish results based on our preprocessed datasets.
@article{yang13homogeneous,
title={Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank},
author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Bhowmick, Sourav S},
journal={Proceedings of the VLDB Endowment},
volume={13},
number={5}
}
@article{shi13realtime,
title={Realtime Index-Free Single Source SimRank Processing on Web-Scale Graphs},
author={Shi, Jieming and Jin, Tianyuan and Yang, Renchi and Xiao, Xiaokui and Yang, Yin},
journal={Proceedings of the VLDB Endowment},
volume={13},
number={7}
}
Name | Type | #nodes | #edges | #attributes | #labels | URL |
---|---|---|---|---|---|---|
Wiki | directed | 2405 | 17981 | 4973 | 19 | [raw] [preprocessed] |
Cora | directed | 2708 | 5429 | 1433 | 7 | [raw] [preprocessed] |
Citeseer | directed | 3312 | 4660 | 3703 | 6 | [raw] [preprocessed] |
Pubmed | directed | 19717 | 44338 | 500 | 3 | [raw] [preprocessed] |
BlogCatalog | undirected | 5196 | 343486 | 8189 | 6 | [raw] [preprocessed] |
PPI | undirected | 56944 | 818716 | 50 | 121 | [raw] [preprocessed] |
| | undirected | 232965 | 11606919 | 300 | 41 | [raw] [preprocessed] |
Flickr | undirected | 7575 | 479476 | 12047 | 9 | [raw] [preprocessed] |
| | undirected | 4039 | 88234 | 1283 | 193 | [raw] [preprocessed] |
| | directed | 81306 | 1768149 | 216839 | 4065 | [raw] [preprocessed] |
Google+ | directed | 107614 | 13673453 | 15907 | 468 | [raw] [preprocessed] |
TWeibo | directed | 2320895 | 50655143 | 1657 | 8 | [raw] [preprocessed] |
MAG | directed | 59249719 | 978147253 | 2000 | 100 | [raw] [preprocessed] |
MAG-SC | directed | 10541560 | 265219994 | 2784240 | 8 | [raw] [preprocessed] |
Tips: node attributes in our preprocessed datasets are stored either as an “attrs.pkl” file, serialized with the cPickle package in Python 2.7, or as an “attrs.npz” file. Either file can be loaded as a sparse attribute matrix with the following code:
import cPickle as pickle  # Python 2.7 only
with open("attrs.pkl", "rb") as f:  # pickle files must be opened in binary mode
    features = pickle.load(f)
or
from scipy import sparse
features = sparse.load_npz("attrs.npz")
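The cPickle module does not exist in Python 3, so the first snippet only runs under Python 2.7. The following is a minimal sketch of a Python 3 loader, not part of the released datasets: the helper name `load_attrs` is hypothetical, and it assumes that “attrs.pkl” holds a SciPy sparse matrix (or a dense array that can be converted to one). Passing `encoding="latin1"` is the usual way to read pickles written by Python 2.7 that contain NumPy data. As with any pickle file, load it only if you trust its origin.

```python
import pickle

from scipy import sparse


def load_attrs(path):
    """Load a node-attribute matrix from an attrs.npz or attrs.pkl file.

    Hypothetical helper: assumes the pickle stores a SciPy sparse matrix
    or a dense array; other payloads would need their own handling.
    """
    if path.endswith(".npz"):
        return sparse.load_npz(path)
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 decode pickles written by Python 2.7
        features = pickle.load(f, encoding="latin1")
    # Normalize to CSR so both file types return the same matrix type
    return sparse.csr_matrix(features)
```

For example, `features = load_attrs("attrs.pkl")` returns a matrix whose `features.shape` should be (#nodes, #attributes) if the assumption about the file contents holds.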
Please cite our paper if you publish results based on our preprocessed datasets.
@article{yang2020scaling,
title={Scaling Attributed Network Embedding to Massive Graphs},
author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Liu, Juncheng and Bhowmick, Sourav S},
journal={Proceedings of the VLDB Endowment},
volume={14},
number={1},
pages={37--49},
year={2021},
publisher={VLDB Endowment}
}