This data set is intended for research purposes only.
It consists of all papers from DBLP. In addition to the information from DBLP, we augment each paper with its abstract (where one can be found on the Web) and with the citation relationships between papers. We will continue to release updated versions.
Version 1: DBLP-Citation-network: 1,632,442 papers and 2,327,450 citation relationships (2010-10-22).
| Data set                         | #paper    | #Citation Relationship | Comment    |
| Version 1: DBLP-Citation-network | 1,632,442 | 2,327,450              | 2010-10-22 |
The first version of the DBLP + Citation data set is organized into 1,632,442 blocks, one block per paper. [Download]
Within each block, each line starts with a specific prefix that indicates an attribute of the paper. More specifically:
#* --- paper title
#@ --- authors
#t --- year
#c --- publication venue
#index --- index id of this paper
#% --- the id of a reference of this paper (there may be multiple such lines, one per reference)
#! --- abstract
The following is an example:
#*Information geometry of U-Boost and Bregman divergence
#@Noboru Murata,Takashi Takenouchi,Takafumi Kanamori,Shinto Eguchi
#!We aim at an extension of AdaBoost to U-Boost, in the paradigm to build a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry extended to the space of the finite measures over a label set. We propose two versions of U-Boost learning algorithms by taking account of whether the domain is restricted to the space of probability functions. In the sequential step, we observe that the two adjacent and the initial classifiers are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm as seen in the expectation-maximization algorithm. Statistical discussions for consistency and robustness elucidate the properties of the U-Boost methods based on a stochastic assumption for training data.
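As a rough illustration, the block format described above can be parsed with a short script. This is a minimal sketch: the assumption that blank lines separate blocks, and the function names (`parse_line`, `parse_blocks`), are illustrative and not part of the official format description.

```python
# Map each line prefix from the legend above to an attribute name.
PREFIXES = {
    "#*": "title",
    "#@": "authors",
    "#t": "year",
    "#c": "venue",
    "#index": "index",
    "#%": "references",  # may occur on multiple lines per paper
    "#!": "abstract",
}

def parse_line(line, paper):
    """Attach one attribute line to the current paper dict."""
    # Test "#index" first so it is not shadowed by shorter prefixes.
    for prefix in ("#index", "#*", "#@", "#t", "#c", "#%", "#!"):
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            key = PREFIXES[prefix]
            if key == "references":
                paper.setdefault("references", []).append(value)
            elif key == "authors":
                paper["authors"] = value.split(",")
            else:
                paper[key] = value
            return

def parse_blocks(lines):
    """Yield one dict per paper block, assuming blank lines separate blocks."""
    paper = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if paper:
                yield paper
                paper = {}
        else:
            parse_line(line, paper)
    if paper:  # final block may not be followed by a blank line
        yield paper
```

With records parsed this way, the citation relationships are simply the pairs (paper `index`, each id in its `references` list).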
The data set can be downloaded here: [DBLP + Citation Version 1: here]
If you use this data set for research, please cite one of the following papers: