Dataset: BlogCatalog

BlogCatalog Data Set
Abstract: BlogCatalog is a social blog directory that manages bloggers and their blogs.

Number of Nodes: 88,784

Number of Edges: 4,186,390

Missing Values?



Nitin Agarwal+, Xufei Wang*, Huan Liu*

+ Department of Information Science, University of Arkansas at Little Rock.

* School of Computing, Informatics and Decision Systems Engineering, Arizona State University.

Data Set Information:

Two files are included:

1. nodes.csv
-- the list of all users. It serves as a dictionary of the users in this data set, is useful for fast reference, and contains every node id used in the data set.

2. edges.csv
-- the friendship network among the bloggers. Each friendship between two bloggers is represented as an edge. Here is an example:

1,2

This means that the blogger with id "1" is a friend of the blogger with id "2".
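
The two files can be read with a few lines of Python. The following is a minimal sketch, not part of the original distribution. It assumes that neither file has a header row, that nodes.csv holds one node id per line, that edges.csv holds two comma-separated node ids per line, and that friendship is symmetric. The function names load_nodes, load_edges, and build_adjacency are hypothetical and chosen only for illustration.

```python
import csv
from collections import defaultdict


def load_nodes(path):
    """Return the node ids listed in a nodes.csv-style file.

    Assumption: one id per line, no header row.
    """
    with open(path, newline="") as f:
        return [row[0].strip() for row in csv.reader(f) if row]


def load_edges(path):
    """Return (id, id) pairs from an edges.csv-style file.

    Assumption: two comma-separated ids per line, no header row.
    """
    with open(path, newline="") as f:
        return [
            (row[0].strip(), row[1].strip())
            for row in csv.reader(f)
            if len(row) >= 2
        ]


def build_adjacency(edges):
    """Map each blogger id to the set of that blogger's friends.

    Assumption: friendship is symmetric, so each edge is stored
    in both directions.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency
```

For example, calling build_adjacency(load_edges("edges.csv")) would give a dictionary in which the entry for "1" contains "2" whenever the file has the line shown above. Ids are kept as strings so that no assumption is made about their numeric range.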

Attribute Information:

This data set was crawled in July 2009 from BlogCatalog, a social blog directory website. It contains the crawled friendship network. For easier understanding, all contents are organized in CSV format.

-- Basic statistics

Number of bloggers: 88,784

Number of friendship pairs: 4,186,390
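
These figures can be checked against the files once the ids and edges are loaded. The sketch below is one possible way to do so, under the assumptions that friendship is undirected and that a pair listed in both directions should count once. The function name basic_stats and the keys of the returned dictionary are hypothetical.

```python
def basic_stats(node_ids, edges):
    """Summarize a friendship network given node ids and (id, id) pairs.

    Assumptions: edges are undirected, a pair listed as both (u, v)
    and (v, u) counts once, and self-loops are ignored.
    """
    nodes = set(node_ids)
    # frozenset makes (u, v) and (v, u) compare equal.
    pairs = {frozenset((u, v)) for u, v in edges if u != v}
    endpoints = {n for pair in pairs for n in pair}
    return {
        "num_nodes": len(nodes),
        "num_pairs": len(pairs),
        # Each undirected pair contributes to the degree of two nodes.
        "avg_degree": 2 * len(pairs) / len(nodes) if nodes else 0.0,
        # Edge endpoints that do not appear in the node list.
        "unlisted_endpoints": len(endpoints - nodes),
    }
```

If the 4,186,390 friendship pairs reported above are each counted once, the average degree would be 2 × 4,186,390 / 88,784, or roughly 94.3 friends per blogger. The description does not say whether pairs are counted once or in both directions, so this value should be treated as an estimate until verified against edges.csv.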

Relevant Papers:

Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network", 3rd International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2 - 9, May 17-20, 2009. San Jose, California.

Citation Request:

