A taxonomy of hosting options, for personal data security

People often talk about “self-hosting” as a singular concept when in reality it’s a spectrum of options with very different trade-offs. In this post I propose to set out a more useful classification and describe some of the pros and cons of each. The scenario I have in mind is where you want to keep some data private to yourself, maybe across multiple devices, and perhaps share that data with specific people. Concretely: cloud file storage, email; that sort of thing.

In my mind the spectrum looks like this:

  1. Big Cloud Providers
  2. Small Operators
  3. Self-Managed Server
  4. Server with Physical Custody
  5. Self-Managed VPN
  6. LAN or Peer-To-Peer
  7. Non-Digital

Big Cloud Providers

This is the maximum convenience option: the iCloud, the M365, the Google. The experience is polished, your data syncs everywhere, and good desktop and mobile software is available. Your security is actively guarded by the best in the business. You’re riding alongside a lot of individuals and organisations who are more powerful than you. If something goes wrong you’ll have loud voices on your side.

On the flip side, you put 100% trust in your host and the legal framework under which they operate. If laws compel them to reveal your data or provide a backdoored update to your device, then they will do it. The Five Eyes agencies know that this is where most of the goodies are so they will put maximum effort into accessing that data (cf. PRISM; more about that at the end of this post). For this option you should be confident that the government isn’t interested in snooping on your digital life, or that if it does snoop it won’t find any reason to bother you, now or in the future.

Small Operators

Instead of a household name you might pin your hopes on a plucky underdog like Migadu or Fastmail for email, or one of the commercial Nextcloud providers. The most tangible benefit is flexibility: they offer power-user features that the big shots don’t feel are worth the effort, or their support involves actual humans, or they enable you to use a completely open-source software stack if that’s your thing. A lesser benefit is that they are a smaller target. If you are a surveillance apparatus or a criminal actor then it’s more lucrative to go after the bigger operators if you can.

The main downside to the small operator is that they have fewer resources: they may not have the same QA or depth of security monitoring as the big players, and loose cannons among their staff may have more agency to snoop on your data. They don’t have the cash reserves to respond to sustained legal pressure, for example if the government makes a request that they believe is unlawful. If the operator relies on one individual or a small number of people then they may also be vulnerable to illegal actions such as hacking of their workstations, or threats and coercion.

Unless you benefit from specific things that you can’t get from the big players, like customisability or the ability to use particular software, then I find this category a hard sell. Some folks value provider diversity, but for today’s analysis I’m taking a selfish view of your own security.

Self-Managed Server

In this scenario you are running your own VPS or dedicated server located in a datacentre where you don’t have access. This is similar to a Small Operator but with additional obscurity benefits. Because your server is a relatively unique snowflake it is less easy to automate data extraction and it’s relatively unlikely that you would be swept up in passive surveillance.

The main downside is that you are now responsible for system administration. You have to secure your OS and your services, install software updates, and protect yourself against attacks from the open internet. Let’s be real: 99% of people using VPSes wouldn’t have the faintest clue if they were popped by a clever adversary using a 0-day. You don’t need to be a high-profile target: using tools like Shodan, an attacker can find every IP address hosting a particular version of a service and target them all with scripted attacks.

The stakes are high if this is the email server that receives your password resets, or the Nextcloud server with your medical records and the notes containing your innermost thoughts. Meanwhile you’re still completely owned by the people who administer the facility. Hopefully they have good policies and practices.

Server with Physical Custody

Now we move to servers which are physically located in your home or business but available to the world. This is a powerful combination. Provided your system administration is up to scratch you can have high levels of security, confidence that nobody is casually snooping through your private data, and the convenience of data sync on your laptop and mobile while you’re out and about.

As with the VPS option you are responsible for network-layer security and security updates. You must also perform secure off-site backups. You may now need a static IP, which often comes with a monetary cost and also an anonymity cost: unless you have a multihomed router your general browsing/torrenting/whatever is now coming from a long-lived IP, making tracking and advertising trivial. It’s likely that your hosted services reveal your real-life identity, so every website you visit can now match that IP to the services and figure out who you are if they want to.

If you are going for this option I strongly suggest getting an additional IP/subnet for your server if your ISP allows it, or otherwise proxying your (encrypted) server traffic via a VPS/VPN on a static IP.

Self-Managed VPN

If you’re worried that you can’t stay ahead of attacks from the open internet then you might put your services within a VPN. This could involve third-party infrastructure like ZeroTier, Tailscale or Syncthing relays, or you might host your own WireGuard or OpenVPN server. This is really very secure: an adversary must gain access to both the VPN and the service to compromise your data. If you get sloppy with your updates then it’s probably no big deal, since nobody outside the VPN can reach those ports in the first place.

This is starting to get seriously inconvenient. Every device that wants to connect to one of these services needs to be on the VPN, probably all the time. You might be tempted to take a shortcut and use unencrypted HTTP for your services but now you have the risk of being spoofed on a hostile network when your VPN is disconnected and sending your bearer token to an attacker in the clear. Credentials or certificates for the VPN must be managed, and what’s your plan for noticing if someone managed to steal some of those creds?
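To make the bearer-token risk concrete, here is a minimal sketch of a client-side guard that refuses to attach a token to anything other than an HTTPS URL. The function name and the example hostname are my own inventions for illustration, not part of any particular tool. It also only covers half the problem: it stops the token going out in cleartext, but you still rely on normal TLS certificate verification to know you’re talking to the real server.

```python
from urllib.parse import urlsplit


def auth_headers(url: str, token: str) -> dict:
    """Build an Authorization header, but only for HTTPS URLs.

    Hypothetical helper: raises ValueError instead of letting a bearer
    token travel over an unencrypted connection.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        # On a hostile network with the VPN down, plain HTTP would hand
        # the token to whoever answers for this hostname.
        raise ValueError(f"refusing to send bearer token over {scheme or 'unknown'} scheme")
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Hostname is a placeholder for a service inside your VPN.
    print(auth_headers("https://files.example.internal/api", "s3cret"))
```

Failing closed like this means a misconfigured client breaks loudly instead of quietly leaking credentials.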

LAN or Peer-To-Peer

Perhaps you feel any sort of internet exposure is too risky, including a VPN. You’ve decided that some degree of physical proximity is a hard requirement for your data to be accessed. This really does sidestep a lot of threats. Syncthing is a great example of this, as are the extensive USB sync capabilities that you have between macOS and iOS devices. If someone wants your data they’re either going to need a warrant for a physical search, or to come and use a vulnerability within WiFi range.

Of course, the downside is convenience. Unless your device is given the opportunity to sync you may be working with stale data. If you have multiple devices or people moving in and out of range then there will be conflicting edits that must be resolved. Much less software supports this mode of operation.
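One common way for sync software to tell a stale copy from a genuine conflict is a version vector: each device keeps a counter of its own edits, and two copies conflict when each has seen an edit the other hasn’t. The sketch below is only an illustration of that idea under my own naming (the device IDs and the compare function are hypothetical), not a description of how any particular tool implements it.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors mapping device ID -> edit counter.

    Returns "a_newer", "b_newer", "equal", or "conflict".
    A device missing from a vector is treated as counter 0.
    """
    devices = set(a) | set(b)
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        # Both sides edited without seeing the other's change.
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


if __name__ == "__main__":
    # The phone synced once, then both devices edited while apart.
    laptop_copy = {"laptop": 2, "phone": 1}
    phone_copy = {"laptop": 1, "phone": 2}
    print(compare(laptop_copy, phone_copy))
```

Detecting the conflict is the easy part; deciding which edit wins (or keeping both copies) is still left to you or your software.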

Non-Digital

We shouldn’t neglect meatspace. Much of my communication is with people I regularly see in person. It’s only thanks to a quirky fossil-fuel-driven orgy of industrial innovation that when I’m talking to someone in the next suburb, those messages are being relayed via a datacentre on the other side of the planet. Communication face-to-face or via paper or USB stick avoids basically all of the threats discussed so far and it is practical in some cases.

Physical communication is not without its threats of course. Tradecraft has been around forever but modern things to look out for include 5G phones providing high-accuracy tracking of your location and the increasing and consolidated use of security cameras to follow your activities when you leave your home.

Conclusion

You have a range of ways to store and share your data, particularly under the banner of “self-hosting”, and the threats associated with each method can be quite different. You should carefully consider your own risk profile and choose accordingly. It is easy to find polemics in favour of a particular approach but you should weigh it up against your own circumstances.

A side note on PRISM

Nearly ten years later, I’ve been reviewing the reports about PRISM from Edward Snowden’s leaks, along with the statements made by tech companies at the time. The implication of the reporting is that Yahoo, Microsoft, Google, Apple, and so on were coerced into providing their users’ data in a way that it could be queried or observed directly by the NSA. These companies all denied knowledge of the programme. The conspiracy theory goes that they have to say that because they can’t legally reveal the programme’s existence.

After thinking about this some more I’ve concluded that these organisations were probably telling the truth. There is a relatively simple explanation for what happened, discussed in one of WaPo’s articles: NSA was tapping the private fibre-optic networks of these big tech companies and sucking out all the data they could without the targets’ knowledge. This makes perfect sense—it’s a huge operational risk to have civilians in these companies aware of what’s going on, even if they’re legally obligated to keep it secret. NSA’s presentation shows “dates when PRISM collection began for each provider”, which is entirely consistent with that data being collected without their consent.

From a report in the Guardian:

Mike Hearn, who says he worked for two years on the networks that replicate Google data between its different computing centres, says that “GCHQ [the British surveillance centre] turns out to be even worse than the NSA [the US National Security Agency]”. He added that he joined an American colleague, Brandon Downey, “in issuing a giant fuck you to the people [at the NSA and GCHQ] who made these slides”

Google is understood to be working on “forward encryption” for its private network so that communications even over its private leased lines would be unintelligible to anyone without the “keys” to decrypt it.

Since then, large tech companies have begun publishing transparency reports that show how many requests they have had to field from the government to reveal customer data or metadata. The number of requests made to Microsoft, for example, does not look unreasonably high to me, given that there will always be a certain amount of crime backed by evidence, credible suicide risks, and other emergencies.

This doesn’t detract in any way from Snowden’s disclosures—I think the NSA was power-crazy to do that kind of thing, not that they care what I think as a foreign citizen—but there is a persistent meme that the big tech companies all willingly signed up for this, which I don’t think is true. I said above that “big cloud” will follow the law. They will. But I retain a certain amount of optimism that if anything of this scale is asked of them then they will put up a fight, or some individuals within the company will.