LDMTA2010: Large-scale Data Mining: Theory and Applications (LDMTA 2010)

http://arnetminer.org/LDMTA2010

The 2nd Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2010)

in conjunction with SIGKDD 2010, July 25-28, 2010, Washington, DC, USA.


**Award** (May 15, 2010) We are proud to announce that this year our workshop will provide travel awards to support researchers attending KDD-2010 (see the Travel Award section below).

The following are the student travel fellowship awardees (thanks to NSF and IBM):

Xintian Yang, Ohio State University
Mario Navas, University of Houston
U Kang, Carnegie Mellon Univ.

**New** The program schedule is available (see the Program section below).

Theme and Topics

Driven by the rapid growth of data of many kinds, demand is increasing for scalable machine learning and data mining algorithms in applications such as medical informatics, telecommunications, social network analysis, and e-commerce. We are interested in investigating the scalability and efficiency of existing machine learning and data mining algorithms from both theoretical and experimental perspectives. We seek papers on the following topics:

  • Methodologies or systems on large-scale data mining;
  • Scalable data mining algorithms and systems over multiple (heterogeneous) data sources;
  • Scalable data preprocessing and cleaning techniques;
  • Scalable mining systems in finance, sciences, retail, e-commerce;
  • Emerging applications of large-scale data mining, such as climate modeling, medical informatics;
  • Empirical study of data mining algorithms and applications;
  • Parallel data mining methods and applications;
  • Web mining and social search applications.

Invited Speakers

David A. Ferrucci, Department Group Manager, IBM T.J. Watson Research Center

Talk Title: Build Watson: An Overview of DeepQA for The Jeopardy! Challenge

Abstract: Computer systems that can directly and accurately answer people's questions over a broad domain of human knowledge have been envisioned by scientists and writers since the advent of computers themselves. Open domain question answering holds tremendous promise for facilitating informed decision making over vast volumes of natural language content. Applications in business intelligence, healthcare, customer support, enterprise knowledge management, social computing, science and government would all benefit from deep language processing. The DeepQA project (www.ibm.com/deepqa) is aimed at exploring how advancing and integrating Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), massively parallel computation, and Knowledge Representation and Reasoning (KR&R) can greatly advance open-domain automatic Question Answering. An exciting proof-point in this challenge is to develop a computer system that can successfully compete against top human players at the Jeopardy! quiz show (www.jeopardy.com). Attaining champion-level performance at Jeopardy! requires a computer system to rapidly and accurately answer rich open-domain questions, and to predict its own performance on any given category/question. The system must deliver high degrees of precision and confidence over a very broad range of knowledge and natural language content with a 3-second response time. To do this, DeepQA evidences and evaluates many competing hypotheses. A key to success is automatically learning and combining accurate confidences across an array of complex algorithms and over different dimensions of evidence. Accurate confidences are needed to know when to “buzz in” against your competitors and how much to bet. High precision and accurate confidence computations are just as critical for providing real value in business settings where helping users focus on the right content sooner and with greater confidence can make all the difference.
The need for speed and high precision demands a massively parallel computing platform capable of generating, evaluating and combining thousands of hypotheses and their associated evidence. In this talk I will introduce the audience to the Jeopardy! Challenge and describe our technical approach and our progress on this grand-challenge problem.

Speaker Bio: Dr. David Ferrucci is a Research Staff Member at IBM's T.J. Watson Research Center, where he leads the Semantic Analysis and Integration department. His research focuses on technologies for discovering knowledge in natural language and for leveraging the results in a variety of intelligent search and knowledge management solutions. He has been the Principal Investigator (PI) on several government-funded research programs on automatic question answering, intelligent systems and scalable text analytics. His team consists of 25 researchers and software engineers specializing in the areas of Natural Language Processing (NLP), Software Architecture, Information Retrieval, Machine Learning and Knowledge Representation and Reasoning (KR&R). Dr. Ferrucci, as chief architect, led the UIMA project at IBM and chaired the UIMA standards committee at OASIS. UIMA is a software framework and open standard used by industry and academia for integrating, deploying and scaling advanced text and multi-modal analytics. The UIMA framework is deployed in IBM products and has been contributed to Apache open-source to facilitate broader adoption and development. UIMA helped lay the foundation for doing large-scale, collaborative unstructured information research. In 2007, Dr. Ferrucci took on the Jeopardy! Challenge, tasked with creating a computer system that can rival human champions at the game of Jeopardy!. As the PI for the exploratory research project dubbed DeepQA, he focused on advancing automatic, open-domain question answering using massively parallel evidence-based hypothesis generation and evaluation. He explored its feasibility, won support for it, and has set and driven the technical agenda for the Jeopardy! Challenge. He engaged top university researchers in the field to help explore better ways to openly and collaboratively accelerate research at a workshop on the open advancement of Question Answering.
By building on UIMA, on key university collaborations and by taking bold research, engineering and management steps, he led his team to integrate and advance many search, NLP and semantic technologies to deliver results that have out-performed all expectations and have demonstrated world-class performance at a task previously thought insurmountable with the current state-of-the-art. Watson, the computer system built by Ferrucci's team, is now competing with top Jeopardy! players. Next steps are to demonstrate how DeepQA can help make dramatic advances for intelligent decision support in areas including medicine, government and law. Dr. Ferrucci graduated from Manhattan College with a BS in Biology and from Rensselaer Polytechnic Institute in 1994 with a PhD in Computer Science specializing in knowledge representation and reasoning. He has published in the areas of AI, KR&R, NLP and automatic question answering.

Robert Grossman, Professor, Laboratory for Advanced Computing, University of Illinois at Chicago (UIC)

Talk Title: My Other Computer is a Data Center: The Sector Perspective on Big Data

Abstract: The publication of a series of Google technical reports describing the Google File System, MapReduce and BigTable, together with the development of the Hadoop system and related projects that provide an open source implementation of this technology, changed the way that scientists compute with big data. Today, developing systems to manage and analyze big data is an active area of research.

In this talk, we give an overview of some of this work.  The Sector/Sphere system is one of the systems being developed for working with big data.  It consists of a parallel programming framework called Sphere that enables arbitrary user defined functions to be executed over the data managed by the Sector distributed storage system.  In this talk, we describe Sector and Sphere and some of the lessons that we have learned over the past several years using Hadoop, Sector/Sphere and related systems when working with big data.

Speaker Bio: Dr. Robert Grossman is the Managing Partner of Open Data Group. Open Data helps companies develop and improve their analytic strategies and provides outsourced analytic services so that companies can increase revenues, decrease costs, and improve business processes. Dr. Grossman is also the Director of the Laboratory for Advanced Computing at the University of Illinois at Chicago. The Laboratory for Advanced Computing performs research in data intensive computing and develops open source computing infrastructure, such as the UDT protocol for high performance data transport and the Sector system for cloud computing.

Important Dates

  • Submission deadline (extended): May 10th, 2010
  • Original submission deadline: May 4th, 2010
  • Review period: approximately 3 weeks
  • Notification date: May 25th, 2010
  • Final version submission date: May 28th, 2010


Please prepare your paper as a PDF file of no more than 10 pages, using the ACM camera-ready template: http://www.acm.org/sigs/pubs/proceed/template.html.

All papers must be submitted in Adobe Portable Document Format (PDF). Please ensure that any special fonts used are included in the submitted documents. Please submit your paper through the workshop's online submission link.


If you cannot submit there, please send your paper to us by email at <liuya@us.ibm.com>.

Workshop Co-Chairs

  • Chid Apte, IBM TJ Watson, apte@us.ibm.com
  • Yan Liu, IBM TJ Watson, liuya@us.ibm.com
  • Jimeng Sun, IBM TJ Watson, jimeng@us.ibm.com 
  • Jie Tang, Tsinghua University, jietang@tsinghua.edu.cn 

Steering Committee

  • Christos Faloutsos, Carnegie Mellon University
  • Robert Grossman, University of Illinois at Chicago
  • Jiawei Han, University of Illinois at Urbana-Champaign

Program Committee

  • Petros Drineas, RPI
  • Daniel Dunlavy, Sandia National Laboratories
  • Amol Ghoting, IBM T. J. Watson Research
  • Tamara Kolda, Sandia National Labs
  • Aditya Menon, University of California, San Diego
  • Yu-Ru Lin, Arizona State University
  • Spiros Papadimitriou, IBM Research
  • Hanghang Tong, Carnegie Mellon University
  • Charalampos Tsourakakis, Carnegie Mellon University
  • Fei Wang, Cornell University
  • Xifeng Yan, University of California, Santa Barbara
  • Elad Yom-Tov, IBM Research, Haifa Lab
  • Jeffrey Yu, Chinese University of Hong Kong

Contact us

Yan Liu, IBM TJ Watson Research Center, liuya@us.ibm.com, 1-914-945-2128

Travel Award

The application procedure: 

Applicants should submit the following items by June 25. Notifications will be sent to applicants by early July.

1) A one-page cover letter summarizing their qualifications for this award.

2) The applicant's CV.

3) A photocopy of their student ID or other proof of full-time student status, such as a department letter.


Selection criteria for travel awards: 

The workshop organizing committee will grant travel awards to applicants in the following order of priority:

- Student authors, with priority given to those whose papers received the highest scores.

If there is funding left, we will give it to non-author students, as long as they have at least one paper in a major data mining or database conference (KDD, PKDD, ICDM, PAKDD, SDM, SIGMOD, VLDB, ICDE, ICML, NIPS), with priority given to the following:

- Female and minority students 

- Students who helped organize the workshop as student reviewers

- Students who have published on topics related to large-scale data mining


Funding coverage: 

The award aims to cover the travel costs of participating in the workshop, including transportation, hotel, registration, and meals. The funding will be divided equally among all travel award recipients who are current US students. Note that, depending on funding availability, the travel award may not cover the full cost of a student's travel.

Program

9:00-9:10am Opening
9:10-10:10am Keynote 1
10:10-10:30am Break
10:30am-12:30pm Technical Presentation
12:10-2:00pm Lunch
2:00-3:00pm Keynote 2
3:00-3:30pm Break
3:30-4:20pm Technical Presentation
4:20-4:30pm Break
4:30-5:30pm Panel 
5:30-5:40pm Concluding remarks 


