Metadata is constantly generated during online activities like browsing websites, using social media, visiting cryptocurrency exchanges, and conducting transactions. It’s a vital part of online privacy but one which is often overlooked by users, developers and service providers alike. Metadata privacy is a particular issue in web3.
Decentralization is a worthy goal, but the reality is that privacy issues become even more pressing in decentralized systems. In web2, Google and Facebook can see all your data and metadata (bad), but in web3 potentially anyone can see it (even worse)! Here’s why it’s crucial to protect your metadata, regardless of whether you have something to hide.
What exactly is metadata?
Most of us know we need to be careful about sharing personal data, but we often assume that the process of transmitting data online is secure thanks to modern security practices like end-to-end encryption. However, this is a common and significant misconception.
Encryption protects the actual content of the data being transmitted — your messages or your bank details, for example — but it does nothing to safeguard the accompanying metadata, which is the information about the data being sent. While this might not seem important, this metadata contains many sensitive details about you, such as:
- Identity: Information about who you are.
- Communication partners: Details about who you are interacting with.
- Timing: Timestamps that show when the interactions occur.
- Geolocation: Location data that reveals where both parties are located.
- Data volume: Insights into the amount of data being exchanged.
While the importance of protecting who you are and who you talk with is fairly obvious, the others may seem irrelevant, especially details like data size. But together they paint a picture of your habits in potentially alarming ways. Your location can tell people where you live and whether you’re home right now. Data size and timing can be used to identify particular devices, especially wearables and Internet of Things (IoT) devices. Do you want everyone to know that you’ve got a particular home security system protecting your home? Or a pacemaker protecting your heart? What about your family members, who may not know as much about online security? All of this matters.
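To make this concrete, here is a minimal sketch of what a network observer could record about a single end-to-end encrypted message. The `ObservedPacket` type, its field names, and the sample values are all hypothetical and chosen only to mirror the list above. They do not describe any real protocol or tool.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ObservedPacket:
    """Hypothetical record of one encrypted message as seen on the network."""
    sender_ip: str       # identity hint
    receiver_ip: str     # communication partner
    timestamp: float     # timing (seconds since the Unix epoch)
    sender_region: str   # coarse geolocation, e.g. derived from the IP
    size_bytes: int      # data volume
    ciphertext: bytes    # the only part that encryption protects


def observer_view(packet: ObservedPacket) -> dict:
    """Return every field that is readable without the decryption key."""
    view = asdict(packet)
    view.pop("ciphertext")  # the content is opaque; everything else is not
    return view


# Sample values are made up (documentation-range IP addresses).
packet = ObservedPacket(
    sender_ip="203.0.113.7",
    receiver_ip="198.51.100.20",
    timestamp=1_700_000_000.0,
    sender_region="Zurich, CH",
    size_bytes=512,
    ciphertext=b"\x8f\x1a...",
)
print(observer_view(packet))
```

Encryption hides one field out of six here. Everything the observer still sees is metadata.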
Why is it crucial to protect your metadata?
Individually, metadata may not seem sensitive, but each online interaction generates numerous pieces of publicly accessible or easily discoverable metadata. Accumulating enough metadata allows someone to create a comprehensive profile of your online and offline activities.
Armed with this profile, hackers can attempt to access your accounts or conduct spear-phishing attacks. Governments and companies can link you with information you thought was anonymous and potentially discriminate against you or block access to services entirely.
What if you have nothing to hide?
You know you have nothing to hide, but companies and governments don’t. And they’re notoriously paranoid. In fact, having access to the metadata but not the data (thanks to encryption) can be the worst of all worlds because it leaves people free to speculate about what the encrypted data hides.
Imagine an insurance company can see (from metadata) that you’re using a blood pressure app but not (thanks to encryption) the data that shows your blood pressure is great. They might assume that it’s bad because people who use such apps are more likely to be those with issues. The problem isn’t necessarily that the metadata reveals reality: it’s that it paints an outline of a profile that people are free to fill in as pessimistically as they like.
“That still doesn’t sound like it applies to me”…
Here’s a crypto example. If you’re like most people, you don’t just have one crypto wallet: you have multiple wallets that hold tokens for different purposes. Some are for interacting with exchanges, others for cold storage, and maybe some for NFTs or degen trading.
Now anyone who has been in crypto for a while knows that all on-chain information is public. If you want to keep those accounts anonymous, you need to make sure your exchange account (where you’ve done full KYC) doesn’t interact with your degen account. You might think that’s easy enough, and that you just have to make sure you never send tokens between them. Unfortunately, this assumption is incorrect.
Every crypto interaction creates metadata: every time you log into your wallet, or your exchange, or an NFT marketplace, you’re creating timestamps and leaking your geolocation data, IP address and more. By gathering all this data, the service providers that make crypto work can link together your addresses, even if they’ve never interacted on the blockchain at all.
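As a rough illustration of how little effort this linking takes, the sketch below groups wallet addresses by the IP address they were seen from. The log format, the wallet labels, and the `link_wallets_by_ip` function are hypothetical. A real provider would likely combine more signals (timing, browser details, and so on), but the principle is the same.

```python
from collections import defaultdict

# Hypothetical request log as a service provider might keep it:
# (ip_address, wallet_address, unix_timestamp)
request_log = [
    ("203.0.113.7", "0xKYC_EXCHANGE_WALLET", 1_700_000_000),
    ("203.0.113.7", "0xDEGEN_WALLET", 1_700_000_042),
    ("198.51.100.9", "0xSOMEONE_ELSE", 1_700_000_050),
    ("203.0.113.7", "0xCOLD_STORAGE", 1_700_003_600),
]


def link_wallets_by_ip(log):
    """Group wallet addresses that were seen from the same IP address."""
    wallets_by_ip = defaultdict(set)
    for ip, wallet, _timestamp in log:
        wallets_by_ip[ip].add(wallet)
    # Only IPs tied to more than one wallet reveal a link.
    return {ip: sorted(w) for ip, w in wallets_by_ip.items() if len(w) > 1}


linked = link_wallets_by_ip(request_log)
print(linked)
# {'203.0.113.7': ['0xCOLD_STORAGE', '0xDEGEN_WALLET', '0xKYC_EXCHANGE_WALLET']}
```

None of these three wallets ever transacted with each other on-chain in this example, yet a few lines of code tie them to the same person.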
Can we stop creating metadata or make it private?
Unfortunately, preventing metadata creation or ensuring complete privacy isn’t simple. Metadata plays a vital role in how the Internet works, originating from a time when its vast growth and malicious potential weren’t anticipated. The internet works because computers can communicate with each other, and this communication requires identifiers like IP addresses and timestamps. Imagine trying to run a delivery network without postal addresses: it would be impossible.
But while it’s impossible to shut off the metadata faucet, that doesn’t mean things can’t be better. Many companies and services, including ISPs, telecom companies, DNS providers, and CDNs like Cloudflare, routinely collect and store your metadata without explicit consent.
This problem is exacerbated by the business models that drive the so-called web2, the internet dominated by huge, centralized entities such as Google, Facebook, and Amazon. Moving to decentralized, crypto-based approaches, known as web3, provides a potential way out, but it’s not a magic bullet. In fact, today’s web3 is as bad as web2 for metadata privacy, if not worse.
Why is web3 not better than web2?
In the era of web2, data is akin to gold, driving business models and incentivizing companies to gather as much data as possible. While the web3 movement has emerged in response to the flaws of the current Internet, many crypto services still raise significant privacy concerns similar to those prevalent in web2. In particular, lax handling of IP addresses creates a wide array of privacy leaks.
Consider the following examples of metadata leaks that occur when connecting to crypto services:
- MetaMask linkability: When you connect to MetaMask, a single call (eth_call) reveals the balances of all your addresses. Moreover, the RPC provider gains knowledge of your IP address.
- NFT Front-running: Bidding on an NFT reveals your IP address, wallet address, the specific NFT you intend to purchase, and your bid amount.
- DEX MEV: Using a DEX like Uniswap requires multiple RPC requests, sending data about your balances, token pairs for swapping, price details, and slippage to the RPC provider before broadcasting your transaction.
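To see why an RPC provider learns so much, it helps to look at what a balance lookup looks like on the wire. The sketch below builds, but does not send, a JSON-RPC batch of `eth_getBalance` requests. This is a simplified stand-in for illustration: the wallet example above mentions `eth_call`, and real wallets may structure their queries differently. The addresses are placeholders and `balance_request` is a hypothetical helper.

```python
import json


def balance_request(address: str, request_id: int) -> dict:
    """Build a standard Ethereum JSON-RPC eth_getBalance request body."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    }


# Placeholder addresses, not real accounts.
my_addresses = [
    "0x" + "11" * 20,
    "0x" + "22" * 20,
]

# A JSON-RPC batch: one HTTP request carrying a query for every address.
batch = [balance_request(addr, i) for i, addr in enumerate(my_addresses, start=1)]
print(json.dumps(batch, indent=2))
```

Whoever receives this request sees every address in it side by side, and the HTTP connection that carries it arrives from your IP address. That combination is enough to associate the addresses with each other and with you.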
This problem can be solved, but both web3 services and users must prioritize privacy. The ongoing collection of metadata demands heightened awareness and protective measures in crypto services. As users become more aware of this issue, web3 creators will need to safeguard metadata privacy in order to stay competitive.
What can you do to ensure metadata privacy?
Unfortunately, there’s currently no foolproof way to keep your metadata secure and ensure your privacy. Tools like VPNs and basic online security practices like frequently changing your passwords can help, but ultimately it is up to technology builders to recognize the extent of the issue and fix it.
One way to help is to be vocal about the desire for privacy. There are many great privacy technologies being developed, but developers often feel little incentive to integrate them. Even though most people say they care about privacy and web3 values, platforms won’t undertake the hard work to make their services fully private unless they know that it’s a dealbreaker for users. We all need to work together to ensure that web3 can live up to its potential. That means placing privacy on the same footing as other values like decentralization and freedom.
About the author
Sebastian Bürgel builds technical solutions that empower the individual. As founder of the private data exchange infrastructure HOPR, he contributes to establishing full stack privacy for web3. He also co-founded two other technology startups: Validity Labs (blockchain education and services) and Sonect (fintech). Sebastian holds a PhD in microtechnology from the Swiss Federal Institute of Technology, ETH Zurich.