Efficient Neighbor Selection in Self-Reorganizing P2P NetworksRobert Beverly, Mike Afergan & Karen SollinsIntroductionSelf-reorganization and community formation in Peer-to-Peer (P2P) overlays offers the potential to address diverse outstanding research problems including scalability, incentives and lookup efficiency [1,2]. The premise of nodes reorganizing by adding and removing peer connections to find a desirable set of neighbors is intuitively appealing. However, such distributed reorganization often requires prohibitively expensive or intractable communication between a large number of nodes. In this work, we investigate machine learning techniques to address two specific problems nodes face during reorganization: i) minimizing induced load and ii) finding nodes most likely to answer future queries. In determining the network topology, nodes effectively make a per-neighbor classification decision: to connect or not to connect. Minimizing the induced load on other nodes maps well to a feature selection problem. Our initial analysis, based on a 72-hour trace of actual live Gnutella data, examines two questions: what machine learning models are most applicable to our problem and given each, how one should perform feature selection. We also investigate how the number and complexion of the training samples affects success rates. We show, using live Gnutella traces, the ability to find a small number of queries with high discrimination power and build an effective classification model using these features. Approach:Our previous research investigated self-reorganizing networks where selfish, utility-maximizing nodes connect only to those nodes for which the benefit (derived from obtaining files) exceeds the cost (experienced when satisfying queries). This presents a decision problem. Figure 1. shows node k with potential connections to i and j. k derives benefit from connecting to i but not in connecting to j. Further, the query load from i is low while the load from j is high. Thus, k will prefer to connect to i.
Since a link is formed only if both nodes benefit from maintaining that link, this presents a paradox. The only way to learn about another node is to issue queries, but issuing queries makes a node less desirable. Further, during this learning period, nodes may have undesirable links. We formulate this problem as one of feature selection. Feature selection is traditionally used to reduce the computational complexity of performing classification. We however use feature selection in a novel way. In particular, we are attempting to use feature selection to minimize the number of queries, which equates to network traffic and induced query load. As depicted in Figure 2, we view each potential Gnutella query as a feature whose value is the answer (yes or no) to the query. For example, "Dave Matthews" is a feature which is yes for node i iff node i has Dave Matthews files. As such, answering the question, "How should node k efficiently query potential neighbors?" reduces to a problem of feature selection. We seek to select a small number of queries such that we can still efficiently determine to whom the node should connect.
We use the utility function in conjunction with our data set to train our classifiers and then to evaluate test decisions from the classifier. ResultsThe major findings of our work thus far are:
Consequently, we hope this work serves as a step forward for research in the space of practical self-reorganization systems and algorithms. References[1] K. Sripanidkulchai, B. Maggs and H. Zhang. Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems. In Proceedings of IEEE INFOCOM, 2003. [2] Y. Chawathe, N. Lanham, S. Ratnasamy, S. Shenker and L. Breslau. Making Gnutella-like P2P systems scalable. In Proceedings of ACM SIGCOMM, 2003. |
||||
|