Automated data classification

user-5fe1a78c4c775e6ec07359f9(2016)

Abstract
A system and method for data classification are presented. A plurality of training tokens are identified by at least one server communicatively coupled to a network. Each training token includes a token retrieved from a content source and a classification of the token. For each training token in the plurality of training tokens, a plurality of n-gram sequences are identified, a plurality of features for the plurality of n-gram sequences are generated, and first training data is generated using the token retrieved from the content source, the plurality of features, and the classification of the token. A first classifier is trained with the first training data, and the first classifier is stored into a storage system in communication with the at least one server.
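The pipeline in the abstract (training tokens, n-gram sequences, features, training data, trained classifier, storage) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method claimed in the source. The abstract does not say what kind of n-grams, features, or classifier are used, so the sketch assumes character n-grams, n-gram counts as features, a small multinomial naive Bayes classifier, and a JSON file standing in for the storage system. All function names and the sample tokens are hypothetical.

```python
import json
import math
from collections import Counter, defaultdict


def ngram_sequences(token, n_values=(2, 3)):
    """Return character n-grams of a token (assumed n-gram type).

    The token is padded with '^' and '$' so prefixes and suffixes
    yield distinct n-grams.
    """
    padded = f"^{token}$"
    return [padded[i:i + n]
            for n in n_values
            for i in range(len(padded) - n + 1)]


def features(token):
    """Assumed feature set: how often each n-gram occurs in the token."""
    return Counter(ngram_sequences(token))


def build_training_data(training_tokens):
    """Combine each token, its features, and its classification."""
    return [(token, features(token), label)
            for token, label in training_tokens]


def train(training_data):
    """Fit a multinomial naive Bayes model (an assumed classifier)."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for _token, feats, label in training_data:
        label_counts[label] += 1
        feature_counts[label].update(feats)
        vocab.update(feats)
    return {
        "labels": dict(label_counts),
        "features": {lbl: dict(c) for lbl, c in feature_counts.items()},
        "vocab_size": len(vocab),
    }


def classify(model, token):
    """Return the label with the highest add-one-smoothed log score."""
    total = sum(model["labels"].values())
    token_feats = features(token)
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        counts = model["features"][label]
        denom = sum(counts.values()) + model["vocab_size"]
        score = math.log(count / total)
        for gram, k in token_feats.items():
            score += k * math.log((counts.get(gram, 0) + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def store(model, path):
    """Persist the trained model; a JSON file stands in for the storage system."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)


# Hypothetical training tokens: (token from a content source, classification).
TRAINING_TOKENS = [
    ("alice@example.com", "email"),
    ("bob@example.org", "email"),
    ("555-1234", "phone"),
    ("555-9876", "phone"),
]

model = train(build_training_data(TRAINING_TOKENS))
```

With the model trained, `classify(model, "carol@example.net")` scores an unseen token against each class, and `store(model, "classifier.json")` writes it out for later use. Character n-grams are chosen here only because they capture surface patterns such as `@` or digit runs. Word-level n-grams over surrounding text would fit the abstract equally well.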
Keywords
Data classification, Security token, Classifier (linguistics), Pattern recognition, Computer data storage, Computer science, Artificial intelligence, Automated data, Training set