User anomalies
Going back to the graphs shown in the baselines section, how can we detect anomalies in them?
For censorship detection, Danezis proposed [1] in 2011 that the number of users in a jurisdiction over time is considered normal if it follows the trends seen in other jurisdictions. To account for weekly patterns, the model compares each day with the same day 7 days earlier, and the 50 largest jurisdictions are used to build typical ratios of connections. Because most of these jurisdictions are countries without reported mass censorship, they help define what a "normal" connection pattern looks like.
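The ratio idea can be illustrated with a minimal sketch. It assumes a simplified model: for a given day, take the ratio of users to the count one week earlier in each of the 50 largest jurisdictions, summarize those ratios by their mean and standard deviation, and flag a jurisdiction whose count falls outside its expected range. The names `classify`, `weekly_ratios` and the threshold `Z_THRESHOLD` are hypothetical. This is not the Tor Metrics implementation, and it leaves out refinements the original model may include.

```python
import statistics

# Simplified sketch of a ratio-based detector; not the Tor Metrics implementation.
WEEK = 7              # compare each day with the same weekday one week earlier
REFERENCE_SIZE = 50   # number of largest jurisdictions used to model "normal" change
Z_THRESHOLD = 3.0     # hypothetical width of the accepted range, in standard deviations


def weekly_ratios(history, day):
    """Ratio of users on `day` to users one week earlier, per jurisdiction.

    `history` maps a jurisdiction code to a list of daily user counts.
    Jurisdictions with zero users a week earlier are skipped.
    """
    ratios = {}
    for code, series in history.items():
        before = series[day - WEEK]
        if before > 0:
            ratios[code] = series[day] / before
    return ratios


def classify(history, day, code):
    """Return "down", "up" or None for jurisdiction `code` on `day`."""
    # Reference set: the largest jurisdictions by usage one week earlier.
    largest = sorted(history, key=lambda c: history[c][day - WEEK],
                     reverse=True)[:REFERENCE_SIZE]
    values = list(weekly_ratios({c: history[c] for c in largest}, day).values())
    mu = statistics.mean(values)      # typical week-over-week change
    sigma = statistics.stdev(values)  # spread of that change
    expected = history[code][day - WEEK]
    low = expected * (mu - Z_THRESHOLD * sigma)
    high = expected * (mu + Z_THRESHOLD * sigma)
    actual = history[code][day]
    if actual < low:
        return "down"
    if actual > high:
        return "up"
    return None
```

Because the reference ratios come from many jurisdictions at once, a change that affects all of them (for example a network-wide outage) shifts the mean ratio for everyone. In this sketch, such a change is therefore not flagged for any single jurisdiction.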
Issues with that model were identified [2] early on. For example, the 7-day delta makes the algorithm trigger after a censorship event had already been reported, and a considerable number of false positives remained visible. However, an improved version of Danezis' algorithm addressing these and other shortcomings was never fully developed.
Wright et al. published [3] an improved approach to anomaly detection in Tor usage. Rather than detecting individual events, it focuses on time periods. To do so, Wright et al. apply Principal Component Analysis (PCA) [4] over a 180-day window. They ignore countries whose usage never rises above 100 users (to avoid the high variance in such data). They also identify and remove seasonality effects with the Seasonal and Trend decomposition using Loess (STL) method [5].
Although it also focuses on detecting censorship periods, the algorithm by Wright et al. is intended to detect general user anomalies, too.
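The following sketch roughly illustrates how PCA over a 180-day window can expose unusual days. It applies a common PCA "subspace" scheme:

- build a days × countries matrix;
- treat the top principal components as the "normal" pattern;
- score each day by the residual left after projecting onto them.

This scheme is an assumption for illustration and not necessarily the exact procedure in Wright et al.'s paper. It also skips the STL step; the `STL` class in statsmodels (`statsmodels.tsa.seasonal`) is one way to remove weekly seasonality per country beforehand. The function name `pca_anomaly_scores` and the parameter `N_COMPONENTS` are hypothetical.

```python
import numpy as np

# Sketch of a PCA subspace detector; not necessarily Wright et al.'s exact method.
WINDOW_DAYS = 180   # length of the analysis window, as in the text
MIN_USERS = 100     # countries whose usage never rises above this are ignored
N_COMPONENTS = 1    # hypothetical size of the "normal" subspace; tune per dataset


def pca_anomaly_scores(counts, n_components=N_COMPONENTS, min_users=MIN_USERS):
    """Score each day in the window by how poorly the top principal
    components explain it.

    counts: dict mapping country code -> list of daily user counts
            (all lists the same length, oldest day first).
    Returns (kept, scores): the countries that passed the user filter and
    one residual score per day in the window (higher = more unusual).
    """
    # Drop low-usage countries: their counts vary too much to be useful.
    kept = [c for c, s in counts.items() if max(s[-WINDOW_DAYS:]) > min_users]
    # Matrix with one row per day and one column per country.
    X = np.array([counts[c][-WINDOW_DAYS:] for c in kept], dtype=float).T
    # Center and standardize each country so large countries do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal directions in country space.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    P = vt[:n_components].T
    # Residual: what remains after projecting onto the "normal" subspace.
    residual = Z - Z @ P @ P.T
    return kept, (residual ** 2).sum(axis=1)
```

Keeping the number of components small matters. If it is too large, a sharp anomaly can become a principal component itself and then no longer shows up in the residual.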
Implementation status
We currently run Danezis' algorithm in our metrics infrastructure and publish the results on our website [6].
At the same time, we are working on replacing Danezis' algorithm with the approach proposed by Wright et al. Past anomaly and censorship reports are available in the infolabe-anomalies archives [7].
[1] Danezis, George: An anomaly-based censorship-detection system for Tor, 2011. https://research.torproject.org/techreports/detector-2011-09-09.pdf
[2] Discussion on the tor-dev mailing list, May 2013: https://archive.torproject.org/websites/lists.torproject.org/pipermail/tor-dev/2013-May/004805.html and https://archive.torproject.org/websites/lists.torproject.org/pipermail/tor-dev/2013-May/004832.html
[3] Wright et al.: https://censorbib.nymity.ch/pdf/Wright2018a.pdf
[4] Principal component analysis (Wikipedia): https://en.wikipedia.org/wiki/Principal_component_analysis
[5] R. B. Cleveland et al.: STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics 6 (1990), 3–73.
[6] https://metrics.torproject.org/userstats-censorship-events.html
[7] infolabe-anomalies archive (Internet Archive copy): https://web.archive.org/web/20240226184836/http://lists.infolabe.net/archives/infolabe-anomalies/