Resources (Statistics and Datasets for Research Purpose)

Arnetminer Resources (Statistics and Datasets for Research Purpose)

  • Arnetminer has been in operation on the internet since 2006. We have collected 548,504 researcher profiles (extracted using an approach based on Conditional Random Fields, CRF), 2,858,504 publications, 5,042 conferences, 32,215,473 paper-paper citation relationships, 47,443,857 coauthor relationships, and 14,720,130 paper-published-at relationships from online databases including DBLP, ACM Digital Library, Citeseer, and others. The extracted and integrated data is stored in an academic network base. Based on this academic network, we provide services such as expertise search, Bole search, citation tracing analysis, topical graph search, and a topic browser. The system has received a large number of visits from more than 180 countries. User feedback and system logs indicate that users find the system helpful for finding and sharing information in the academic community.

  • On this page, we list some interesting results, problem specifications, datasets, tools, and code:


  • Conference rank in different years and by different algorithms

Expert List Ordered by H-index

  • We calculate each expert's H-index from the data in Arnetminer and provide a list of experts sorted by that value.
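
  • As an illustration only, the standard H-index definition (the largest h such that an author has h papers with at least h citations each) and the sorting step can be sketched as follows. The function names, the input layout, and the tie-breaking by name are assumptions made for this example, not Arnetminer's actual code.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h


def rank_experts(citations_by_expert):
    """Sort experts by H-index, highest first.

    `citations_by_expert` is a hypothetical mapping from an expert's name
    to a list of per-paper citation counts. Ties are broken by name, which
    is an assumption of this sketch.
    """
    scored = [(name, h_index(counts)) for name, counts in citations_by_expert.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

    For example, an author whose papers have 10, 8, 5, 4, and 3 citations has an H-index of 4: four papers have at least 4 citations each, but there are not five papers with at least 5.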

Researcher Profiling (Researcher Profile Extraction)

  • We are developing extraction tools in ArnetMiner, a researcher social network system. The tools extract researcher profiles from Web pages and output the extracted information into a researcher database. (See our related papers [ICDM'07] [KDD'08].)

Social Influence Analysis in Large-scale Network

  • In large social networks, nodes (users, entities) are influenced by others for various reasons. For example, colleagues have a strong influence on one's work, while friends have a strong influence on one's daily life. How can we differentiate social influences from different angles (topics)? How can we quantify the strength of those influences? How can we estimate the model on real large networks? In this work, we focus on measuring the strength of social influence quantitatively. (See our related paper [KDD'09].)

Link Semantic Analysis on the Web

  • This work studies how to quantify link semantics. Specifically, an ideal output of link semantics analysis provides users with the following information: (1) the multiple topics discussed in each page; (2) the semantics of a link between two pages; and (3) the influential strength of each link. With such an analysis, a user could easily trace the origins of an idea or technique, analyze the evolution and impact of a topic, filter pages by certain categories of links, and zoom in and out of the linkage tracing graph according to the degree of influence.

Expert Finding and Association Search

  • The data set is organized for expert finding and association search. For expert finding, we chose 13 of the most frequently queried keywords in ArnetMiner.org and created 13 people lists as the ground truth. Details about how we created the data set are described here; see also our KDD2008 and ICDM2008 papers.

Conference Rank

  • We provide a new conference rank feature.
  • We have developed 3 algorithms for ranking conferences.