Frequent Substructure-Based Approaches for Classifying Chemical Compounds

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining (2005)

Abstract
Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases of the drug development process. These techniques are used to solve a number of classification problems, such as predicting whether a chemical compound has the desired biological activity, determining whether it is toxic or nontoxic, and filtering out drug-like compounds from large compound libraries. This paper presents a substructure-based classification algorithm that decouples the substructure discovery process from classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set. The advantage of this approach is that all relevant substructures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 7 percent to 35 percent.
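The decoupled pipeline described above (discover frequent substructures first, then select discriminating ones as classifier features) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it assumes molecules are given as labeled graphs, and it swaps in a simple enumeration of short labeled paths for a real frequent subgraph miner, so it covers only topological (not geometric) substructures. All names (`path_features`, `mine_frequent`, `select_discriminating`, `to_vector`), the toy molecules, and the support-difference score are hypothetical choices made for illustration.

```python
from collections import Counter

def path_features(atoms, bonds, max_edges=2):
    """Set of canonical labeled paths (1..max_edges bonds) in one molecule.
    Stand-in for a real subgraph miner (assumption)."""
    adj = {a: [] for a in atoms}
    for u, v, b in bonds:
        adj[u].append((v, b))
        adj[v].append((u, b))
    found = set()

    def extend(node, visited, labels):
        # labels alternates atom, bond, atom, ...; record paths with >= 1 bond
        if len(labels) > 1:
            found.add(min(tuple(labels), tuple(reversed(labels))))
        if (len(labels) - 1) // 2 == max_edges:
            return
        for nxt, b in adj[node]:
            if nxt not in visited:
                extend(nxt, visited | {nxt}, labels + [b, atoms[nxt]])

    for start in atoms:
        extend(start, {start}, [atoms[start]])
    return found

def mine_frequent(molecules, min_support):
    """Step 1: keep substructures present in at least min_support of molecules."""
    counts = Counter()
    for atoms, bonds in molecules:
        counts.update(path_features(atoms, bonds))
    n = len(molecules)
    return sorted(f for f, c in counts.items() if c / n >= min_support)

def select_discriminating(molecules, labels, features, top_k):
    """Step 2: rank features by the gap in support between the two classes."""
    pos = [m for m, y in zip(molecules, labels) if y == 1]
    neg = [m for m, y in zip(molecules, labels) if y == 0]

    def support(f, group):
        return sum(f in path_features(*m) for m in group) / max(len(group), 1)

    ranked = sorted(features,
                    key=lambda f: abs(support(f, pos) - support(f, neg)),
                    reverse=True)
    return ranked[:top_k]

def to_vector(molecule, features):
    """Step 3: binary presence vector that any classifier could consume."""
    present = path_features(*molecule)
    return [1 if f in present else 0 for f in features]

# Toy data (hypothetical): (atoms, bonds) pairs with a class label each.
molecules = [
    ({0: "C", 1: "O"}, [(0, 1, "=")]),
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "="), (1, 2, "-")]),
    ({0: "C", 1: "N"}, [(0, 1, "-")]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1, "-"), (1, 2, "-")]),
]
labels = [1, 1, 0, 0]

frequent = mine_frequent(molecules, min_support=0.5)
selected = select_discriminating(molecules, labels, frequent, top_k=2)
vectors = [to_vector(m, selected) for m in molecules]
```

Because mining and selection are separate functions, the selection step sees every frequent substructure before choosing, which mirrors the advantage the abstract claims for decoupling. The resulting vectors could then be passed to a classifier of choice.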
Keywords
different classification problem, frequent subgraph discovery algorithm, substructure-based classification algorithm, efficient frequent subgraph discovery, classifying chemical compounds, substructure discovery process, classification problem, chemical compound, computational technique, computational scalability, classification model construction, frequent substructure-based approaches, biochemistry, svm, feature extraction, scaling factor, computations, indexing terms, data mining, biological activity, virtual screening, classification, data bases, feature selection, computational geometry, graphs, drug development, graph theory, algorithms