Optimizing capacity-heterogeneous unstructured P2P networks for random-walk traffic

Peer-to-Peer Computing(2009)

引用 3|浏览22
暂无评分
摘要
Existing algorithms for utilizing high-capacity nodes in heterogeneous P2P systems (e.g., (4), (13), (20)) often re- quire unrealistically large node degree and high mainte- nance overhead in P2P networks with highly diverse node capacities and high churn. In this paper, we propose an unstructured P2P system that addresses these issues. We first prove that the overall throughput of search queries in a heterogeneous network is maximized if and only if traf- fic load through each node is proportional to its capacity. We then propose a system that achieves this traffic distribu- tion by biasing search walks using the Metropolis-Hastings algorithm (5), (12) without requiring any special underly- ing topology. We finish the paper by comparing our method with Gia (4), where we find in simulation that the former outperforms the latter under all studied conditions, two novel saturation metrics introduced in this paper, and such end-to-end parameters as query success rate, latency, and query-hits. Recent measurement studies (e.g., (15)) show that P2P overlays contain nodes of varying capacity, which is a gen- eral term describing the ability of each user to router traffic, store object replicas, and answer queries. If a P2P system is not capacity-aware, slow nodes may become overwhelmed by messages even when many high-capacity peers in the network are still under-utilized. Therefore, node hetero- geneity plays a key role in the design and future success of unstructured P2P systems. In this paper, we propose a set of metrics for measuring the throughput of heterogeneous P2P networks under random-walk traffic, create algorithms for maximizing search and replication ability of peers, and evaluate their performance in simulations. We start with ba- sic definitions and assumptions. 1.1 Assumptions In this paper, we assume a general unstructured P2P network that utilizes random walks (as opposed to flood- ing (6), (22) or some hybrid methods (3), (23)) for finding neighbors, searching for content, and replicating existing file pointers. The reason for using random walks is their ability to achieve arbitrary stationary distributions (see be- low) across existing nodes in the system and clear bounds on overhead. The capacity Ci of user i determines its ability to process incoming messages, where excess traffic is back- logged at each user in some infinite queue. Also assume that joining users, as well as peers seeking new neighbors, perform so-called build walks with TTL kb ‚ 2 and select peers at the end of these walks as their neighbors. Note that the transition probability of these random walks influences the topology of the resulting overlay. Once the graph is built, search walks are then used for discovering the desired content. Nodes looking for certain files start several random walks with TTL ks ‚ 2 and ex- amine the content of each peer through which these walks pass. This query is called successful if any of the search walks started by a node passes through at least one user that shares the desired file or knows which peer does. The number of found users holding the file is called query-hits, which determines the download's ability to be parallelized and its robustness against failure. To increase query success rates and achieve redundancy, nodes replicate pointers to their shared files to other peers, as long as the number of replicas stored at each node i does not exceed its capacity Ci. When publishing their content into the system, nodes start a replication walk with TTL kr ‚ 2 and select one or more nodes along the walk to hold pointers to all of their content. After replication, any query for this content can also be answered by one of its replicas thereby improving the query success rate of the P2P system. A well-designed P2P system should possess build, search, and replication components that guarantee a high network-wide throughput under load and achieve good per- formance using such end-to-end parameters as query suc- cess rates, query-hits, and latency to find the first result. Next, we discuss the challenges involved in designing these three components and then outline our proposed system to address these issues. 1.2 Challenges
更多
查看译文
关键词
peer-to-peer computing,query processing,search problems,telecommunication traffic,Metropolis-Hastings algorithm,capacity-heterogeneous unstructured P2P network,end-to-end parameters,heterogeneous P2P systems,heterogeneous network,high maintenance overhead,high-capacity nodes,large node degree,query success rate,random-walk traffic,saturation metrics,search queries,search walks,traffic distribution,traffic load,unstructured P2P system
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要