Arnetminer Introduction

Arnetminer Flyer [EN] [Chinese]

Arnetminer: search and mining of academic social networks

-- Offering comprehensive search and mining services for the academic community


Arnetminer aims to provide comprehensive search and mining services for researcher social networks. In this system, we focus on: (1) creating a semantic profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., bibliographic data and researcher profiles) from multiple sources; (3) accurately searching the resulting heterogeneous network; and (4) analyzing and discovering interesting patterns in the researcher social network. The main search and analysis functions in Arnetminer are:

  • Profile search: given a researcher name (e.g., Jie Tang), the system returns the semantic profile created for that researcher using information extraction techniques. The profile page contains the following extracted and integrated information: contact information, photo, citation statistics, academic achievement evaluation, (temporal) research interests, educational history, personal social graph, research funding (currently US and China only), and publication records (with citation information; papers are automatically assigned to several different domains).
  • Expert finding: given a query (e.g., data mining), the system returns experts on this topic. It also suggests the top conferences and the top-ranked papers on the topic. Two ranking algorithms are available, VSM and ACT: the former is similar to the conventional language model, and the latter is based on our Author-Conference-Topic (ACT) model. Users can also provide feedback on the search results.
  • Conference analysis: given a conference name (e.g., KDD), the system returns the most active researchers at this conference and its top-ranked papers.
  • Course search: given a query (e.g., data mining), the system tells you who is teaching courses relevant to the query.
  • Associate search: given two researcher names, the system returns the association path between them. The function is based on the well-known "six degrees of separation" theory.
  • Sub-graph search: given a query (e.g., data mining), the system first identifies the topics relevant to the query (e.g., five topics: "Data mining", "XML Data", "Data Mining / Query Processing", "Web Data / Database design", and "Web Mining"), and then displays the most important sub-graph discovered for each relevant topic, together with a summary of that sub-graph.
  • Topic browser: based on our Author-Conference-Topic (ACT) model, we automatically discover 200 hot topics from the publications and automatically assign each topic a label that represents its meaning. For each topic, the browser also presents the most active researchers, the most relevant conferences/papers, and the discovered evolution trend of the topic.
  • Academic ranks: we define 8 measures to evaluate a researcher's achievements: "H-index", "Citation", "Uptrend", "Activity", "Longevity", "Diversity", "Sociability", and "New Star". For each measure, we output a ranking list for each domain. For example, one can search for who has the highest citation count in the "data mining" domain.
  • User management: users can register to: (1) modify the extracted profile information; (2) provide feedback on the search results; (3) follow researchers in Arnetminer; (4) create an Arnetminer page (which can be used to advertise conferences/workshops or to recruit students).

Arnetminer has been in operation on the Internet for more than three years. Currently, the academic network includes more than 6,000 conferences, 3,200,000 publications, and 700,000 researcher profiles. The system attracts users from more than 200 countries and logs more than 200,000 accesses per day. The top five countries users come from are the United States, China, Germany, India, and the United Kingdom.
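To make the "H-index" measure concrete: a researcher has index h if h of their papers have at least h citations each. The sketch below computes it from a list of per-paper citation counts and then ranks researchers within one domain. It is a minimal illustration, not Arnetminer's actual implementation; the function names `h_index` and `rank_by_h_index`, the input layout (a dict from researcher name to citation counts), and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations."""
    h = 0
    # Walk papers from most to least cited; the paper at 1-based position
    # `rank` lets h grow only while it has at least `rank` citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def rank_by_h_index(citations_by_researcher: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Hypothetical domain ranking: sort by h-index (descending), then by name."""
    scores = [(name, h_index(counts)) for name, counts in citations_by_researcher.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy, made-up citation counts for three placeholder researchers.
    domain = {
        "Researcher A": [10, 8, 5, 4, 3],
        "Researcher B": [3, 0, 6, 1, 5],
        "Researcher C": [25, 1],
    }
    print(rank_by_h_index(domain))
    # [('Researcher A', 4), ('Researcher B', 3), ('Researcher C', 1)]
```

Note that "Researcher C" has the single most-cited paper but an h-index of only 1: the H-index rewards sustained impact across many papers, which is one reason a separate "Citation" measure is also useful.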
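The "Associate search" function returns an association path between two researchers. One plausible way to find such a path, assumed here for illustration rather than taken from Arnetminer, is a breadth-first search (BFS) over a co-authorship graph, which returns a path with the fewest hops. The helpers `build_coauthor_graph` and `association_path`, and the placeholder author lists, are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set


def build_coauthor_graph(papers: Sequence[Sequence[str]]) -> Dict[str, Set[str]]:
    """Undirected graph: two researchers are linked if they co-authored a paper."""
    graph: Dict[str, Set[str]] = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    return graph


def association_path(graph: Dict[str, Set[str]], source: str, target: str) -> Optional[List[str]]:
    """Shortest co-authorship path from source to target via BFS, or None."""
    if source not in graph or target not in graph:
        return None
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Follow parent pointers back to the source, then reverse.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = parent[step]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for deterministic output
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


if __name__ == "__main__":
    # Toy author lists; "A".."E" are placeholders, not real researchers.
    papers = [["A", "B"], ["B", "C"], ["C", "D"], ["A", "E"]]
    graph = build_coauthor_graph(papers)
    print(association_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

Because BFS visits researchers in order of hop distance, the first time it reaches the target it has found a shortest chain, which fits the "six degrees of separation" intuition that most pairs are linked by a short path. A production system might instead weight edges (e.g., by number of joint papers); that variant is not shown here.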

Talks Given

The system has been demonstrated at KDD'07, KDD'08, WWW'08, and ISWC'07. We have been invited to present the system at MSRA, Google China, IBM T.J. Watson, Nokia, UIUC, CUHK, HKUST, Peking University, and ICT CAS. Here is a presentation I gave at UIUC and CUHK.

Arnetminer History

  • 2006/5, V0.1 Perl-based CGI version

Profile extraction, person/paper/conf. search

  • 2006/8, V1.0 Java (Demo @ ASWC)

Rewrite of the above functions

  • 2007/7, V2.0 (Demo @ KDD, ISWC)

New: survey search, research interest, association search

  • 2008/4, V3.0 (Demo @ WWW)

Query understanding, new search GUI, log analysis

  • 2008/11, V4.0 (Demo @ KDD, ICDM)

Graph search, topic mining, NSFC/NSF

  • 2009/4, V5.0 (Demo @ KDD)

Bole/course search, profile editing, open resources, #citation

  • 2009/12, V6.0

Academic statistics, user feedback, refined ranking

Team Members

Thanks to all members involved in this project! Arnetminer was initiated, and its first version developed, by Jie Tang in 2006.

The core team members include:


  Jie Tang, Principal Investigator and project leader, Associate Professor, Department of Computer Science and Technology, Tsinghua University. Dr. Tang continues to lead the Arnetminer project, which aims to offer the best services for searching and mining academic social networks!

  Bo Gao, Software Engineer. Currently he is in charge of the development and maintenance of the system. 

  Jing Zhang, Master (graduated). She is one of the first two members of our team and joined in 2006. She helped develop a number of important functions in Arnetminer, including expert finding [7,9], the user interface, query understanding, and user management. She currently works at IBM CDL.


With noted contribution from:

  Mingcai Hong, Master (graduated). Mingcai is one of the first two members. He was the main developer of the second version of Arnetminer. He currently works at a fund management company.

  Limin Yao, Master (graduated). Limin worked on the researcher profiling problem [9, 11] and implemented the method for extracting user profiles from the Web using conditional random fields (CRFs). She is currently a Ph.D. student at UMass, working with Andrew McCallum.

  Duo Zhang, Master (graduated). Duo proposed a name disambiguation method [9, 12] based on Hidden Markov Random Fields (HMRFs). He is now a Ph.D. student at UIUC, working with Chengxiang Zhai.

  Feng Wang, Master (graduated). Feng developed the topic browser based on the output of the Author-Conference-Topic (ACT) model [6]. He is now working at

  Zi Yang, Master student. Zi implemented the Bole search (finding the best supervisor). He also helped develop the sub-graph search function.

  Yize Li, Bachelor (graduated). Yize worked on temporal expertise search. She is now a Ph.D. student at UCSC.

  Liu Liu, Bachelor (graduated). Liu worked on academic suggestion. She studied various topic-model-based methods for recommending papers, paper reviewers, and relevant conferences/journals. She is now a Master's student at CMU.

  Chi Wang, Bachelor (graduated). Chi developed a new approach that automatically discovers advisor-advisee relationships between researchers [2]. He also worked on the social influence analysis problem [4]. The discovered relationships can be very helpful for Bole search and community discovery. Chi is currently a Ph.D. student at UIUC, working with Jiawei Han.

  Chenhao Tan, Bachelor. Chenhao helped develop the new version of expert finding. He is also working on the social action tracking problem [1]. He will graduate in the summer of 2010. 

  Zhe Wang, Bachelor. Zhe implemented the academic ranking function, including the eight measures for evaluating researchers' academic achievements. He also helped create the personal statistics page.

  Zhanpeng Fang, Bachelor. Zhanpeng helped develop the new version of organization search. He also helped develop the conference analysis page.

Other contributors:

Xiaocao Zhou, who developed course search.

Quan Lin, who developed user feedback.

Bo Wang, who helped develop the Bole search.


Special thanks to collaborators

Many people have contributed to Arnetminer and given constructive suggestions. We would like to thank (the list may be incomplete):

Keke Cai

Chris Ding

Ying Ding

Wei Fan

A.C.M. Fong

Jiawei Han

Ruoming Jin

Irwin King

Juanzi Li

Bangyong Liang

Yan Liu

Qiong Luo

Rui Ma

Zhong Su

Jimeng Sun

Haixun Wang

Qiang Yang

Jeffrey Xu Yu

Philip S. Yu

Lei Zhang

Li Zhang

Representative Publications

Social influence analysis: Topic-based social influence analysis in large-scale networks

  1. Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social Action Tracking via Noise Tolerant Time-varying Factor Graphs. In Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2010).
  2. Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. In Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2010).
  3. Zi Yang, Jie Tang, and Juanzi Li. Social Community Analysis via Factor Graph Model. IEEE Intelligent Systems (to appear).
  4. Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. In Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2009). pp. 807-816.
  5. Jie Tang, Jing Zhang, Jeffrey Xu Yu, Zi Yang, Keke Cai, Rui Ma, Li Zhang, and Zhong Su. Topic Distributions over Links on Web. In Proceedings of 2009 IEEE International Conference on Data Mining (ICDM'2009). pp. 1055-1060.

Social network ranking: Topic-based heterogeneous ranking

  1. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008). pp. 990-998.
  2. Jie Tang, Ruoming Jin, and Jing Zhang. A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search. In Proceedings of 2008 IEEE International Conference on Data Mining (ICDM'2008). pp. 1055-1060.
  3. Bo Wang, Jie Tang, Wei Fan, Songcan Chen, Zi Yang, and Yanzhu Liu. Heterogeneous Cross Domain Ranking in Latent Space. In Proceedings of the Eighteenth Conference on Information and Knowledge Management (CIKM'2009). pp. 987-996.

Social network extraction: Acquiring knowledge from social Web

  1. Jie Tang, Limin Yao, Duo Zhang, and Jing Zhang. A Combination Approach to Web User Profiling. ACM Transactions on Knowledge Discovery from Data (TKDD) (to appear).
  2. Xin Xin, Juanzi Li, Jie Tang, and Qiong Luo. Academic Conference Homepage Understanding Using Constrained Hierarchical Conditional Random Fields. In Proceedings of the Seventeenth Conference on Information and Knowledge Management (CIKM'2008). pp. 1301-1310.
  3. Jie Tang, Duo Zhang, and Limin Yao. Social Network Extraction of Academic Researchers. In Proceedings of 2007 IEEE International Conference on Data Mining (ICDM'2007). pp. 292-301. 
  4. Duo Zhang, Jie Tang, Juanzi Li, and Kehong Wang. A Constraint-Based Probabilistic Framework for Name Disambiguation. In Proceedings of the Sixteenth Conference on Information and Knowledge Management (CIKM'2007). pp. 1019-1022.
  5. Jie Tang, Mingcai Hong, Juanzi Li, and Bangyong Liang. Tree-Structured Conditional Random Fields for Semantic Annotation. In Proceedings of the 5th International Semantic Web Conference (ISWC'2006). pp. 640-653.

Previous/Current Sponsors

The project is (or was) partially funded by the National High-tech R&D Program (863 Program), Chinese Young Faculty Research Funding, an NSFC Funded Project, IBM China Research Lab, and the Minnesota/China Collaborative Research Program.

If you are interested in sponsoring, please contact Jie Tang.



Created by Jie Tang. Last updated on Mar. 11, 2010.