Approximate matching of persistent LExicon using search-engines for classifying Mobile app traffic

Gyan Ranjan,Alok Tongaonkar,Ruben Torres

IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications（2016）

引用 24|浏览84

暂无评分

摘要

We present AMPLES, Approximate Matching of Persistent LExicon using Search-Engines, to address the Mobile-Application-Identification (MApId) problem in network traffic at a per-flow granularity. We transform MApId into an information-retrieval problem where lexical similarity of short-text-documents is used as a metric for classification tasks. Specifically, a network-flow, observed at an intercept-point, is treated as a semi-structured-text-document and modified into a flow-query. This query is then run against a corpus of documents pre-indexed in a search-engine. Each index-document represents an application, and consists of distinguishable identifiers from the metadata-file and URL-strings found in the application's executable-archive. The search-engine acts as a kernel function, generating a score distribution vis-'a-vis the index-documents, to determine a match. This extends the scope of MApId to fuzzy-classification mapping a flow to a family of apps when the score distribution is spread-out. Through experiments over an emulator-generated test-dataset (400 K applications and 13.5 million flows), we obtain over 80% flow coverage and about 85% application coverage with low false-positives (4%) and nearly no false-negatives. We also validate our methodology over a real network trace. Most importantly, our methodology is platform agnostic, and subsumes previous studies, most of which focus solely on the application coverage.

查看译文

关键词

approximate matching of persistent lexicon using search-engines,AMPLES,mobile-application-identification,MApId classification,information retrieval,network flow,semistructured text document,flow query,metadata file,URL string,fuzzy classification mapping

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要