A novel self-learning architecture for p2p traffic classification in high speed networks
The popularity of a new generation of smart peer-to-peer applications has resulted in several new challenges for accurately classifying network traffic. In this paper, we propose a novel two-stage p2p traffic classifier, called Self-Learning Traffic Classifier (SLTC), that can accurately identify p2p traffic in high speed networks. The first stage classifies p2p traffic from the rest of the network traffic, and the second stage automatically extracts application payload signatures to accurately identify the p2p application that generated the p2p flow. For the first stage, we propose a fast, light-weight algorithm called Time Correlation Metric (TCM), that exploits the temporal correlation of flows to clearly separate peer-to-peer (p2p) traffic from the rest of the traffic. Using real network traces from tier-1 ISPs that are located in different continents, we show that the detection rate of TCM is consistently above 95% while always keeping the false positives at 0%. For the second stage, we use the LASER signature extraction algorithm  to accurately identify signatures of several known and unknown p2p protocols with very small false positive rate (<1%). Using our prototype on tier-1 ISP traces, we demonstrate that SLTC automatically learns signatures for more than 95% of both known and unknown traffic within 3 min.