DBLP + Citations

DBLP Papers + Citation Relationship

Do you need the citation relationships between DBLP papers? Here they are.
Each node is a paper from DBLP, further associated with its abstract and its citation relationships, as released by Arnetminer.org.


The data set is intended for research purposes only.

It consists of all papers from DBLP. In addition to the information from DBLP, we augment each paper with its abstract (where one can be found on the Web) and with the citation relationships between papers. We will continue to release the latest version.

Version 1: DBLP-Citation-network:  1,632,442 papers and 2,327,450 citation relationships (2010-10-22).

Version 2: DBLP-Citation-network:  2,084,055 papers and 2,244,018 citation relationships (2010-10-22).

Data set                            #Papers     #Citation relationships   Comment
Version 1: DBLP-Citation-network    1,632,442   2,327,450                 2011-01-01
Version 2: DBLP-Citation-network    2,084,055   2,244,018                 2013-09-29


Data Description

For V1

The first version of the DBLP + Citation data set is organized into 1,632,442 blocks, one per paper.

Within each block, every line starts with a prefix that identifies which attribute of the paper it holds:

#*      --- Paper title
#@      --- Authors
#t      --- Year
#c      --- Publication venue
#index  --- Index id of this paper
#%      --- Id of a reference of this paper (one line per reference, so there may be multiple lines)
#!      --- Abstract

The following is an example:

#*Information geometry of U-Boost and Bregman divergence
#@Noboru Murata,Takashi Takenouchi,Takafumi Kanamori,Shinto Eguchi
#cNeural Computation
#!We aim at an extension of AdaBoost to U-Boost, in the paradigm to build a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry extended to the space of the finite measures over a label set. We propose two versions of U-Boost learning algorithms by taking account of whether the domain is restricted to the space of probability functions. In the sequential step, we observe that the two adjacent and the initial classifiers are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm as seen in the expectation-maximization algorithm. Statistical discussions for consistency and robustness elucidate the properties of the U-Boost methods based on a stochastic assumption for training data.
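The block format above is simple enough to read with a few lines of code. The following is a minimal Python sketch, not an official reader: the helper names `parse_v1_block` and `parse_v1_text` are hypothetical, and it assumes that blocks are separated by blank lines and that `#@` holds a comma-separated author list, as in the example.

```python
from typing import Dict, List

# Prefix -> field name, following the first-version list above.
V1_FIELDS = [
    ("#index", "index"),
    ("#*", "title"),
    ("#@", "authors"),
    ("#t", "year"),
    ("#c", "venue"),
    ("#%", "references"),
    ("#!", "abstract"),
]


def parse_v1_block(lines: List[str]) -> Dict[str, object]:
    """Turn the lines of one paper block into a dict (hypothetical helper)."""
    references: List[str] = []
    paper: Dict[str, object] = {"references": references}
    for line in lines:
        for prefix, field in V1_FIELDS:
            if not line.startswith(prefix):
                continue
            value = line[len(prefix):].strip()
            if field == "references":
                references.append(value)  # one '#%' line per reference
            elif field == "authors":
                paper[field] = [name.strip() for name in value.split(",") if name.strip()]
            else:
                paper[field] = value
            break
    return paper


def parse_v1_text(text: str) -> List[Dict[str, object]]:
    """Split the text into blocks on blank lines (assumed separator) and parse each."""
    papers: List[Dict[str, object]] = []
    block: List[str] = []
    for line in text.splitlines() + [""]:  # sentinel blank line flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            papers.append(parse_v1_block(block))
            block = []
    return papers


# The example block above, with the abstract truncated to its first sentence.
sample = """#*Information geometry of U-Boost and Bregman divergence
#@Noboru Murata,Takashi Takenouchi,Takafumi Kanamori,Shinto Eguchi
#cNeural Computation
#!We aim at an extension of AdaBoost to U-Boost, in the paradigm to build a stronger classification machine from a set of weak learning machines.
"""
paper = parse_v1_text(sample)[0]
print(paper["venue"])         # Neural Computation
print(len(paper["authors"]))  # 4
```

Reading the file line by line keeps memory use proportional to one block; for the full 1.6 million papers one would likely turn `parse_v1_text` into a generator over an open file instead of a string.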

The data set can be downloaded here: [DBLP + Citation Version 1]


For V2

The second version uses the same one-block-per-paper layout (2,084,055 blocks) but a different set of prefixes:

#*         --- Paper title
#@         --- Authors
#year      --- Year
#conf      --- Publication venue
#citation  --- Number of citations (both -1 and 0 mean none)
#index     --- Index id of this paper
#arnetid   --- Paper id (pid) in the Arnetminer database
#%         --- Id of a reference of this paper (one line per reference, so there may be multiple lines)
#!         --- Abstract
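Because the two versions use different prefixes for the same attributes (`#t`/`#year`, `#c`/`#conf`), one might want a single reader for both. The sketch below (hypothetical names `FIELDS`, `parse_block`, `normalize_citation`) tries the longest prefix first; otherwise the V1 prefix `#c` would also match `#conf...` and `#citation...` lines. It maps the `#citation` values -1 and 0 to `None`, as described above; treating other non-positive or unparseable values the same way is an assumption.

```python
from typing import Dict, List, Optional

# Prefix -> field name for both versions. V1 uses "#t"/"#c"; V2 uses
# "#year"/"#conf" and adds "#citation" and "#arnetid".
FIELDS = {
    "#*": "title",
    "#@": "authors",
    "#t": "year",
    "#year": "year",
    "#c": "venue",
    "#conf": "venue",
    "#citation": "citation",
    "#index": "index",
    "#arnetid": "arnetid",
    "#%": "references",
    "#!": "abstract",
}
# Longest prefix first, so "#c" cannot swallow "#conf..." or "#citation..." lines.
PREFIXES = sorted(FIELDS, key=len, reverse=True)


def normalize_citation(raw: str) -> Optional[int]:
    """Return the '#citation' count as an int, or None when it means 'none'."""
    try:
        count = int(raw)
    except ValueError:
        return None  # assumption: unparseable values are treated as missing
    return count if count > 0 else None  # -1 and 0 (any non-positive value) -> None


def parse_block(lines: List[str]) -> Dict[str, object]:
    """Parse one block written with either version's prefixes (hypothetical helper)."""
    references: List[str] = []
    paper: Dict[str, object] = {"references": references}
    for line in lines:
        for prefix in PREFIXES:
            if line.startswith(prefix):
                field, value = FIELDS[prefix], line[len(prefix):].strip()
                if field == "references":
                    references.append(value)
                elif field == "citation":
                    paper[field] = normalize_citation(value)
                else:
                    paper[field] = value
                break
    return paper


# V2-style block; the "#citation-1" line is a made-up value for illustration.
v2 = parse_block([
    "#*Spatial Data Structures.",
    "#@Hanan Samet",
    "#confModern Database Systems",
    "#citation-1",
])
v1 = parse_block(["#cNeural Computation"])
print(v2["venue"], v2["citation"])  # Modern Database Systems None
print(v1["venue"])                  # Neural Computation
```

If you know which version a file is, using only that version's prefixes is simpler and avoids any residual ambiguity, such as a V1 venue name that happens to begin with "onf".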

The following is an example:

#*Spatial Data Structures.
#@Hanan Samet
#confModern Database Systems
#!An overview is presented of the use of spatial data structures in spatial databases. The focus is on hierarchical data structures, including a number of variants of quadtrees, which sort the data with respect to the space occupied by it. Such techniques are known as spatial indexing methods. Hierarchical data structures are based on the principle of recursive decomposition. They are attractive because they are compact and depending on the nature of the data they save space as well as time and also facilitate operations such as search. Examples are given of the use of these data structures in the representation of different data types such as regions, points, rectangles, lines, and volumes.
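The citation network itself comes from the `#index` and `#%` lines. The following sketch (hypothetical function `citation_edges`) turns text in either version's format into a list of (citing, cited) pairs, then counts how often each paper is cited. It assumes that the ids on `#%` lines refer to the `#index` ids of other blocks and that blocks are separated by blank lines; the three-paper sample is made up for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def citation_edges(text: str) -> List[Tuple[str, str]]:
    """Collect (citing index id, cited index id) pairs from block-format text.

    Only '#index' and '#%' lines are read; blocks are assumed to be separated
    by blank lines, and the order of lines inside a block does not matter.
    """
    edges: List[Tuple[str, str]] = []
    index: Optional[str] = None
    refs: List[str] = []
    for line in text.splitlines() + [""]:  # sentinel blank line flushes the last block
        if line.startswith("#index"):
            index = line[len("#index"):].strip()
        elif line.startswith("#%"):
            ref = line[2:].strip()
            if ref:
                refs.append(ref)
        elif not line.strip():
            # End of a block: emit its edges if it had an index id.
            if index is not None:
                edges.extend((index, ref) for ref in refs)
            index, refs = None, []
    return edges


# Hypothetical three-paper file: paper 1 cites 2 and 3, paper 2 cites 3.
sample = """#*Paper A
#index1
#%2
#%3

#*Paper B
#index2
#%3

#*Paper C
#index3
"""
edges = citation_edges(sample)
print(edges)  # [('1', '2'), ('1', '3'), ('2', '3')]

# In-degree: how many times each paper is cited within this file.
cited_counts = Counter(cited for _, cited in edges)
print(cited_counts.most_common(1))  # [('3', 2)]
```

An edge list like this can be loaded directly into most graph libraries; references whose ids never appear as an `#index` in the file simply show up as nodes with no block of their own.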


If you use this data set for research, please cite one of the following papers:

Created by Jie Tang. Last updated on October 20, 2010.