Recursive data mining for role identification in electronic communications.

International Journal of Hybrid Intelligent Systems (2010)

Abstract
We present a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for a certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Because of this recursive, level-by-level nature, we call the approach Recursive Data Mining (RDM). We demonstrate a novel application of RDM to role identification in electronic communications. We use a hybrid approach in which the RDM-discovered patterns serve as features for building efficient classifiers. Since we want to recognize a group of authors communicating in a specific role within an Internet community, the challenge is to recognize the possibly different roles an author takes in different communication communities. Moreover, each individual exchange in electronic communications is typically short, which makes standard text mining approaches less efficient than in other applications. An example of such a problem is recognizing roles in a collection of emails from an organization in which middle-level managers communicate with both superiors and subordinates. To validate our approach, we use the Enron dataset, which is such a collection. The results show that a classifier that uses the dominant patterns discovered by Recursive Data Mining performs well in role identification.
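The hierarchical idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes exact matching only (the approximate matching described above is omitted), and it substitutes a simple pair-merging scheme, in which the most frequent adjacent token pair at each level is replaced by a new symbol so that the next level can build on it. The names `mine_level`, `rewrite`, `recursive_mine`, `min_support`, and `max_levels` are hypothetical and chosen for illustration.

```python
from collections import Counter


def mine_level(seq, min_support):
    """Return the most frequent adjacent pair in seq, or None if it is too rare."""
    counts = Counter(zip(seq, seq[1:]))  # counts adjacent pairs, overlaps included
    if not counts:
        return None
    # Deterministic tie-break: highest count first, then textual order of the pair.
    pair, freq = min(counts.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return pair if freq >= min_support else None


def rewrite(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair (left to right) with symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def recursive_mine(tokens, min_support=2, max_levels=5):
    """Mine one dominant pattern per level; higher levels may contain lower-level symbols."""
    seq = list(tokens)
    patterns = {}
    for level in range(1, max_levels + 1):
        pair = mine_level(seq, min_support)
        if pair is None:
            break  # nothing frequent enough is left at this level
        symbol = f"P{level}"
        patterns[symbol] = pair
        seq = rewrite(seq, pair, symbol)
    return patterns, seq


tokens = "please send the report please send the budget".split()
patterns, compressed = recursive_mine(tokens)
print(patterns)
print(compressed)
```

In a hybrid setup like the one the abstract describes, the counts of each discovered symbol in a message could then be used as a feature vector for a standard classifier; that step is left out here because the abstract does not specify it.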
Keywords
Data mining, feature extraction or construction, text classification