Scope
Anomaly and outlier detection in computer networks is a large topic [1][2]. We need to scope it for our purposes along several dimensions, as the Tor network is an overlay network on top of the Internet, that is, on top of computer networks. As such, it comes with its own constraints and requirements, which prevent us from applying standard anomaly detection methods tailored to computer networks one-to-one to the Tor network.
-
The Tor network is made up of thousands of relays run by volunteers around the world, yet it would be short-sighted to look only at the state of individual relays in order to determine the health of the network. We might also need to take into account, for example, Tor client behavior, interactions between relays in a circuit, and directory authority health in order to decide whether we are confronted with (potentially dangerous) anomalies.
-
We are concerned with the live Tor network. Anomaly detection in self-built Tor networks (which Cui et al. [3] focused on when investigating how to measure anonymity in the Tor network) is out of scope.
-
We focus on particular areas of data collection and metrics under two assumptions: not all possible areas are equally important, and potential anomalies often surface in seemingly unrelated data sets. Thus, covering important data (sub-)sets should give us a good general indicator of potential anomalies in the network. Once we've picked the areas of interest, we'll summarize the state of knowledge and research, with recommendations on which algorithms to pick for detecting anomalies in each area. Where areas important to us are less well studied in the literature, and thus no algorithms or code are available, we might be able to recommend some based on our experience and the existing implementations in our infrastructure. Even though those solutions have not been vetted by independent researchers, having them available is strictly better than having no anomaly detection at all.
-
Data collection and metrics at Tor are crucial for a number of purposes. We've outlined what they mean in our anomaly detection context. It's important to keep in mind that data collection can get corrupted at any point in time, resulting in measurement gaps and potentially misleading data, which in turn might influence anomaly detection by producing both false negatives and false positives.
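To illustrate why measurement gaps matter, here is a minimal sketch of a gap-aware outlier check. It is not one of the algorithms recommended by this document: it uses a generic rolling median/MAD (modified z-score) detector purely as a stand-in. The function name `flag_outliers`, its parameters, the label names, and the sample series are all hypothetical and chosen for illustration only. Missing samples are represented as `None` and are labeled explicitly instead of being scored.

```python
from statistics import median
from typing import List, Optional, Sequence


def flag_outliers(
    series: Sequence[Optional[float]],
    window: int = 7,         # hypothetical: number of preceding samples to consider
    threshold: float = 3.5,  # hypothetical: modified z-score cut-off
    min_valid: int = 4,      # hypothetical: minimum observed samples needed to judge
) -> List[str]:
    """Label each sample as 'gap', 'unknown', 'ok', or 'anomaly'."""
    labels: List[str] = []
    for i, value in enumerate(series):
        if value is None:
            # A missing measurement is reported as a gap, never scored.
            labels.append("gap")
            continue
        # Baseline: only observed values from the preceding window.
        history = [v for v in series[max(0, i - window):i] if v is not None]
        if len(history) < min_valid:
            # Too little data to decide; do not guess either way.
            labels.append("unknown")
            continue
        med = median(history)
        mad = median([abs(v - med) for v in history])
        if mad == 0:
            # Constant baseline: any deviation counts as anomalous.
            labels.append("ok" if value == med else "anomaly")
            continue
        score = 0.6745 * abs(value - med) / mad  # modified z-score
        labels.append("anomaly" if score > threshold else "ok")
    return labels


# Hypothetical daily metric with a two-day collection outage.
observed = [100, 102, 99, 101, None, None, 100, 103, 40, 101]
gap_aware = flag_outliers(observed)

# Naive handling: a collection outage recorded as zero.
zero_filled = [0 if v is None else v for v in observed]
naive = flag_outliers(zero_filled)
```

In this sketch the gap-aware run labels the two missing days as `gap` and still flags the drop to 40, whereas the zero-filled run flags the first outage day itself as an anomaly, which is the kind of false positive described above. How gaps should actually be represented depends on the data source in question.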
-
[1] Hodge, V.J. and Austin, J.: A survey of outlier detection methodologies. In: Artificial Intelligence Review, 22 (2), 2004, pp. 85-126.
-
[2] Al-Haj Baddar, S., Merlo, A. and Migliardi, M.: Anomaly Detection in Computer Networks: A State-of-the-Art Review. In: Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 5 (4), December 2014, pp. 29-64.
-
[3] Cui, J., Huang, C., Meng, H. et al.: Tor network anonymity evaluation based on node anonymity. In: Cybersecurity 6:55, 2023.