Someone recently asked me if I knew how to find where Tor-hidden services were really hosted. I identified a few possible methods for finding the origin servers, but none of them worked universally – or even in most situations. Eventually, I did find one way to definitively locate an origin server. However, that method is not trivial – and is still just theoretical.
First, I found the following entry on Tor’s webpage: “If your computer isn’t online all the time, your hidden service won’t be either. This leaks information to an observant adversary.” The following idea then came to mind: Let’s say you have a small army of bots (probably a dozen or so are necessary for the sake of redundancy; basically, the more bots you use, the better) connected to Tor. You’d then need to feed something – like the Internet Health Report – into a central database that the de-cloaking bots can monitor.
Because the Internet can be flaky and regularly has minor outages – sometimes related to routing, and sometimes related to a simple lack of power it’s easy (if you have time) to determine if an outage is the cause of a problem, even on robust cloud infrastructures. Furthermore, some companies (e.g., Keynote) already specialize in tracking outages for you.
De-cloaking begins with a few of your robots doing regular polling to make sure your service remains online. This polling is essential for performing tests. When you do discover an outage on the Internet, you should immediately have your robots from Tor nodes around the world attempt to contact the server in question. If just a few of the bots are blocked, it’s likely that they are either just transiting the “broken” network or that the bot is itself on this “broken” network.
However, if none of your bots can reach the service in question, there’s a good chance that you’ve found the part of the Internet that’s currently broken. One caveat is that if all of the Introducer nodes lie beyond the path of the disruption it may give a false positive, but this is unlikely unless the outage is extremely close to where the polling robots are, or the outage is extremely large. So false positives are a real possibility, although not enough of a deterrent to make this attack un-viable.
This same “contact the server in question” technique can reveal other additional granular/smaller breakages by monitoring for outages within a specific network, then monitoring down to the data center, and possibly even down to the subnet. At the subnet level you’re monitoring a small enough set of machines that one could at least theoretically cause selective minor outages (even a few seconds could do the trick) by using a wide variety of denial-of-service attacks to find the one machine that, when attacked, also makes your bots unable to access the site at exactly the same time the site you are monitoring becomes unresponsive.
Alternatively, if the IP range is small enough, a government agency could simply watch the wire for Tor traffic. That method, however, is painstaking and requires physical interception, and may require a lot of traffic analysis. However, this method could work.
Theoretically, you also could speed up the de-cloaking by looking at the date stamp in the HTTP response of the hidden service. If that service is listening on port 80, you could simply check the dates and then ignore the ones that fail to match the correct time zone/clock skew. Then, unless the problem is deliberate tampering, you’d almost certainly and much more quickly know what’s causing the outages. That is, unless the hidden services are within a VM that fails to use NTP (network time protocol), while the parent does use NTP, or unless both dates were set by hand.
Overall, using a time-stamp to improve de-cloaking is risky, because it could also be a ‘red herring’ – a tricky method used by a hidden Tor service administrator to hide the service further. A similar technique has been discussed before using clock skews of each Tor node and validating that it matches the Tor hidden service to find the origin server. But using clock skews or time-stamps assumes that the hidden service is not within a VM on the host machine, which is a real possibility; therefore, this may not always work.
The concept of a Tor hidden service using multiple machines with the same Tor private key to create a “load balancing” effect to thwart this de-cloaking attack has two issues. The first is that apparently in practice the failover effect can take hours, not seconds. The second is that depending on how the data is mirrored between the two hidden services, it may be extremely easy to tell which server you are communicating with. If something like rsync is used in favor of NFS to mirror content, the inodes on disc and timestamps will be different, leading to different eTag fingerprints and different Last-Modified time stamps, which can be discerned simply by looking at the HTTP headers.
Admittedly, what I’m describing here is just a theoretical attack. A large part of this attack is simply passive recon tied in with some generic polling techniques. However, that is a minor barrier for determined adversaries. This is an attack method that could make it significantly more difficult to perfectly hide a Tor-hidden service from a sophisticated adversary using today’s technology without significant forethought or planning.
Therefore, it is probably unwise without taking additional precautions to run a Tor-hidden service that relies entirely on IP anonymity for safety.