Peter Cai

@PeterCxy

Some random guy out there. en_US / zh_CN


https://sn.angry.im/@PeterCxy

Wireguard with Network Namespace + BitTorrent / Shadowsocks / ...

Background

I have long been running a BT/PT download box on one of my dedicated servers. The reason is that I have extremely poor uplink on my home broadband, and running any kind of P2P software simply kills the network. However, putting that software on a server without any protection is a bad idea -- it will happily announce your server IP everywhere, and, * cough *, some nasty things may happen to you, even for downloading some pretty innocent files. I need at least some kind of protection to avoid leaking the real IP to the torrent world. Using a SOCKS5 proxy alone is not the best idea either: anything in the BT protocol, for example DHT, can easily leak the IP address if the BT client itself is not isolated in a way that prevents it from seeing the real IP.

The same applies to my personal proxy service. Residing in China, there is basically no way to connect directly to VPN services abroad, even those that are not blocked -- ISPs here throttle UDP traffic aggressively, and TCP VPNs are unbearably slow and easily interrupted by RST packets. Normally we use self-hosted encrypted proxies instead of VPNs to bypass this, usually hosted on cheap VPSes such as Vultr. However, this way it is easy to leak the proxy IP (the VPS IP) to the services I access, because they can simply record the mapping between the source IP and the account holder. What I need is still another layer of protection: the outbound IP should be different from the server's own.

Unfortunately, enabling a VPN on a server is not as easy as doing it on your own computer. You can't simply set the default route, because doing so breaks access to the server through its main IP, and you will be left locked out -- lonely and helpless, outside of the server. Moreover, enabling the VPN alone is not enough at all, since the public IP is still assigned to the primary network device, and it is fairly simple to fetch that address (and many programs will actually do this, announcing every possible IP to the public). Full network isolation is needed, but I did not want to introduce a complete container like Docker, because that seemed just way too excessive.

Network Namespace

Luckily, Linux has this implemented for us. The ip-netns(8) tool manages a cool feature of the Linux kernel, Network Namespaces, which is exactly what we need here. Full container implementations also leverage this feature to virtualize their network environment, but we are only using the network part here, which is much more lightweight than full container virtualization.

A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.

So, all we have to do is find some way to put the VPN tunnel device in a network namespace, and set the default route only inside that namespace. Nothing but the VPN device and that single default route will be visible inside the namespace, which is safe enough for most software not designed to intentionally escape from namespaces.
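
To get a feel for just how empty a fresh namespace is, here is a quick sanity check (the namespace name "demo" is arbitrary):

ip netns add demo
ip netns exec demo ip link    # only a loopback device, still DOWN
ip netns exec demo ip route   # prints nothing: no routes at all
ip netns del demo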

The Legacy of OpenVPN

Previously I was a user of ProtonVPN, which was a great VPN for my purpose (except that it has no IPv6 support at all; I was expecting VPNs to implement IPv6 NAT...). Since it used OpenVPN as its main VPN software, I used to make use of OpenVPN's up and down scripts to enable the VPN in network namespaces.

Since OpenVPN is a pretty old and widely-adopted protocol, there are plenty of guides on how to achieve this with OpenVPN. What I used was a script found online that moves the TUN interface into a network namespace managed by the script upon finalizing the connection. The script is pretty mature, and works just fine.

However, ProtonVPN started breaking down these days. I have no idea why, but since some random day, ProtonVPN started to become null-routed randomly. I am sure it is not blocked by the ISP, because I only run it on my VPS outside of China, and I really cannot see routes to its IPs in my BGP sessions elsewhere. It just seems to be down without reason. Besides, OpenVPN is much too bloated and sometimes causes problems of its own. Since Linus Torvalds has said that Wireguard should be merged into the mainline Linux kernel soon, I started to look for an alternative solution based on Wireguard.

Attempt: Wireguard + wg-quick

After some searching, I found a pretty good Wireguard VPN provider with both IPv4 and IPv6 NAT support. Wireguard is pretty easy to configure, since the provider will often give you something like this:

[Interface]
PrivateKey = blahblah
Address = 192.168.x.x/24, fe80::xxx/64
DNS = x.x.x.x

[Peer]
PublicKey = blahblah
AllowedIPs = 0.0.0.0/0,::0/0
Endpoint = x.x.x.x:xxxx

which is normally placed in /etc/wireguard/wireguard-config-name.conf. Such a configuration is meant for the tool wg-quick(8). However, this tool doesn't seem to support network namespaces out of the box. I made a naïve attempt like the one below:

ip netns add vpn
ip netns exec vpn ip link add dev wireguard-vpn type wireguard
ip netns exec vpn wg-quick up my-config-name

...and of course, it failed. Wireguard also obeys network namespace rules while establishing its underlying sockets, and that is why this failed -- you can't connect to any VPN from a newly-created network namespace that has no route at all. Resolving this by bridging the host network into the namespace didn't seem appealing to me, since it would be complex to configure and would still potentially leak the real IP.

The Real Solution

After some Google-fu, I found an official document of Wireguard that described an interesting property of the Wireguard driver: it "remembers" the network namespace where it was created.

it remembers the namespace in which it was created. "I was created in namespace A." Later, WireGuard can be moved to new namespaces ("I'm moving to namespace B."), but it will still remember that it originated in namespace A.

WireGuard uses a UDP socket for actually sending and receiving encrypted packets. This socket always lives in namespace A – the original birthplace namespace.

This is exactly what we were looking for! If Wireguard can send its underlying UDP packets in a different namespace than the one holding the Wireguard device, we can have a completely "clean" network namespace with only Wireguard as the default route, while Wireguard itself connects via the original host network!

All we have to do now is: create the Wireguard interface, apply the configuration, move the interface into a newly-created network namespace, then set the IPs, routes, etc. We can no longer use wg-quick for this, because that tool is meant for quick configuration and will set up the routes for us in the main namespace (according to AllowedIPs). We have to use a weaker version of it, wg setconf, instead. Note that we have to comment out the DNS and Address lines in the provided configuration if present, because wg setconf does not support setting DNS servers or IP addresses.
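
For example, something like the following one-liner would comment out the two offending lines in place (the config name is hypothetical; double-check the result afterwards):

sed -i 's/^Address/#Address/; s/^DNS/#DNS/' /etc/wireguard/my-config-name.conf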

I tried a simple script following the above procedure:

#!/bin/bash
CONFIG_NAME="$1"
DEV_NAME="wg-$CONFIG_NAME"

# Create the namespace (named after the configuration file) and bring up loopback
ip netns add "$CONFIG_NAME"
ip netns exec "$CONFIG_NAME" ip link set lo up
# Create the Wireguard device in the main namespace, so that its underlying
# UDP socket is opened here, then apply the configuration
ip link add dev "$DEV_NAME" type wireguard
wg setconf "$DEV_NAME" "/etc/wireguard/$CONFIG_NAME.conf"
# Finally, move the device into the namespace and bring it up
ip link set "$DEV_NAME" netns "$CONFIG_NAME" up

Note that I have set the name of the namespace to be the same as the configuration file name. Run it with ./script.sh wireguard-config-name, and it successfully sets up the namespace with the Wireguard device inside. However, the IP addresses were not set, because we did not use wg-quick and commented out the Address line in the configuration. At this point, I could have simply hard-coded the addresses in the script, but that did not sound like an elegant solution.

Instead, I did a not-so-elegant-but-better-than-nothing hack, which makes use of the commented-out Address line: simply parse the line (ignoring the #) and extract the addresses from there!

# Extract the address list from the commented-out Address line
addrs=$(grep -oP "#Address = \K(.*)" "/etc/wireguard/$CONFIG_NAME.conf")
# Split on commas and spaces, then assign each address to the device
IFS=", "; for addr in $addrs; do
  if [[ $addr = *":"* ]]; then
    # IPv6
    ip netns exec "$CONFIG_NAME" ip -6 addr add "$addr" dev "$DEV_NAME"
  else
    # IPv4
    ip netns exec "$CONFIG_NAME" ip addr add "$addr" dev "$DEV_NAME"
  fi
done

Adding this to the previous script, we now have the IPs properly assigned to the Wireguard device. We could pretty much do the same with the routes, by extracting them from AllowedIPs, but I decided it was better to just set the default routes for both IPv4 and IPv6:

ip netns exec $CONFIG_NAME ip route add default dev $DEV_NAME
ip netns exec $CONFIG_NAME ip -6 route add default dev $DEV_NAME
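
At this point it is worth sanity-checking the isolation. A rough sketch (the config name is hypothetical, and ifconfig.co is just one of many what-is-my-IP services; note that DNS inside the namespace can be supplied via /etc/netns/my-config-name/resolv.conf, which ip-netns bind-mounts over /etc/resolv.conf automatically):

ip netns exec my-config-name ip addr    # should show only lo and wg-my-config-name
ip netns exec my-config-name ip route   # should show only the default route via the tunnel
ip netns exec my-config-name curl https://ifconfig.co   # should print the VPN IP, not the server's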

Now we are done with the script that sets the interface up. Tearing it down is much simpler, since deleting the namespace also destroys the virtual devices inside it:

#!/bin/bash
CONFIG_NAME="$1"

ip netns del $CONFIG_NAME

Running Systemd Services inside the Namespace

At this point, we can use ip netns exec to run programs inside the network namespace. However, I would like to run systemd services inside it. To fully leverage the abilities of systemd, I decided to first write a service to manage the Wireguard interface in the network namespace. Assuming that the up and down scripts described above are placed at /path/to/wg-up.sh and /path/to/wg-down.sh, I wrote a service named wg-netns@.service:

[Unit]
Description=Execute Wireguard in a network namespace
Wants=network-online.target
After=network-online.target

[Service]
User=root
Type=oneshot
RemainAfterExit=true
ExecStart=/path/to/wg-up.sh %i
ExecStop=/path/to/wg-down.sh %i

[Install]
WantedBy=multi-user.target

Then enable it with systemctl enable wg-netns@wireguard-config-name. Now we can use systemctl edit some-service to put some-service into the namespace by writing

[Unit]
Requires=wg-netns@wireguard-config-name.service
After=wg-netns@wireguard-config-name.service

[Service]
User=
User=root
ExecStart=
ExecStart=/usr/bin/ip netns exec wireguard-config-name /path/to/the/program

in the editor provided by systemctl edit. Note that this configuration is very generic, and you may need to consult the original service file for the complete command to put in place of /path/to/the/program. Also, with such a configuration you are running the program as root, which can be a security concern and could make some programs behave abnormally. You may need to add sudo -u blah before the actual command (after ip netns exec wireguard-config-name) to switch to the proper user for your program.
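
For example, the final ExecStart for a BT client might end up looking something like this (the user name and program path are purely hypothetical):

ExecStart=/usr/bin/ip netns exec wireguard-config-name sudo -u debian-transmission /usr/bin/transmission-daemon -f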

Now you can enable the service as normal. Services configured like this will only start when wg-netns@wireguard-config-name is started, and will restart or stop when wg-netns@wireguard-config-name is restarted or stopped.

One More Thing: Exposing Ports within the Namespace

All the configuration above is perfectly fine as long as no service running in the namespace needs to be accessible from the outside. But for the BT client and the Shadowsocks server, we must at least be able to reach their listening TCP ports in order to control / use them while retaining the isolation. My solution was to set up a separate veth interface and assign the namespace a separate internal IP address without NAT, so that I can access the ports via the internal IP, or forward them to the outside, without letting the services themselves break the isolation.

This step is much simpler. We just create a pair of veth devices, put one of them into the namespace, then assign an IP address to each end.

ip link add dev "$CONFIG_NAME"0 type veth peer name "$CONFIG_NAME"1
ip link set "$CONFIG_NAME"0 up
ip link set "$CONFIG_NAME"1 netns $CONFIG_NAME up
ip addr add $PRIVATE_ADDRESS_HOST dev "$CONFIG_NAME"0
ip netns exec $CONFIG_NAME ip addr add $PRIVATE_ADDRESS_CLIENT dev "$CONFIG_NAME"1

...where PRIVATE_ADDRESS_HOST is the internal address to be assigned to the host end and PRIVATE_ADDRESS_CLIENT is the address to be assigned to the namespace end. These are normally private addresses with an explicit prefix length, like 192.168.123.1/24. In the actual script, I wrote something like:

# Load per-config extra variables (see the ext/ file below)
source "${BASH_SOURCE%/*}/ext/$CONFIG_NAME.conf"
if $PRIVATE_VETH_ENABLED; then
  ip link add dev "$CONFIG_NAME"0 type veth peer name "$CONFIG_NAME"1
  ip link set "$CONFIG_NAME"0 up
  ip link set "$CONFIG_NAME"1 netns $CONFIG_NAME up
  ip addr add $PRIVATE_ADDRESS_HOST dev "$CONFIG_NAME"0
  ip netns exec $CONFIG_NAME ip addr add $PRIVATE_ADDRESS_CLIENT dev "$CONFIG_NAME"1
fi

...so that you can have an ext/wireguard-config-name.conf (relative to the location of the up script, corresponding to /etc/wireguard/wireguard-config-name.conf) with additional variables about the internal IPs, which are not related to Wireguard itself:

#!/bin/bash
PRIVATE_VETH_ENABLED=true
PRIVATE_ADDRESS_HOST="192.168.123.1/24"
PRIVATE_ADDRESS_CLIENT="192.168.123.2/24"

Correspondingly, the down script has to tear the veth pair down:

source "${BASH_SOURCE%/*}/ext/$CONFIG_NAME.conf"

if $PRIVATE_VETH_ENABLED; then
  # Deleting one end of a veth pair automatically removes its peer as well
  ip netns exec $CONFIG_NAME ip link del dev "$CONFIG_NAME"1
fi

You can then set up port forwarding or anything else to this internal IP.
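
As a rough sketch (assuming the addresses above; the forwarded port 9091 is hypothetical), a DNAT rule on the host can expose a TCP port from the namespace, with an extra SNAT so that replies go back over the veth instead of the VPN default route:

# enable forwarding on the host
sysctl -w net.ipv4.ip_forward=1
# forward host port 9091 to the service inside the namespace
iptables -t nat -A PREROUTING -p tcp --dport 9091 -j DNAT --to-destination 192.168.123.2:9091
# rewrite the source so replies return via the veth, not the VPN default route
iptables -t nat -A POSTROUTING -d 192.168.123.2 -p tcp --dport 9091 -j SNAT --to-source 192.168.123.1
# allow the forwarded traffic through the FORWARD chain
iptables -A FORWARD -d 192.168.123.2 -p tcp --dport 9091 -j ACCEPT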

Now you have a complete working setup of Wireguard inside network namespace.

Source code

I have uploaded the source code of my completed setup to https://git.angry.im/PeterCxy/wg-netns.

Troubleshooting a mysterious Mastodon bug: the Accept-Encoding header and federation

The story

As you may all know, I am the administrator of a Mastodon instance, https://sn.angry.im. One thing that is really fun about this job (and every SysAdmin job) is that you run into different problems from time to time -- sometimes without having done anything, sometimes after an upgrade.

Last week, Mastodon v2.4.0 came out, and I, along with my friend who administers https://cap.moe, decided to upgrade to the new release as quickly as possible. Since there was nothing breaking in the new version, it didn't take long before we both finished executing a few Docker commands and restarted into the new version. As usual, we each posted something to ensure that everything worked fine after the upgrade, and this is where things started to break.

We first noticed that I could not see anyone from cap.moe on my home timeline, while he could see everyone from my instance on his. We thought this was a subscription problem, so we both ran the resubscription task in the administration panel of our Mastodon instances. That did not fix anything. We then tried mentioning each other in a toot to find out whether it was a timeline logic error, but it was not: he could still see me, but I couldn't see anyone on his instance.

One interesting thing: since some other instances, for example pawoo.net, could see both of our instances' posts, I could simply retoot one of his toots on pawoo and receive it on my instance within several seconds. I didn't know what this meant, but it was really something interesting.

Since other mysterious bugs had happened before and just magically fixed themselves after a while, I decided it was a good idea to leave this one alone and see if things went back to normal. Now it is a week after the initial upgrade, nothing has changed throughout the entire week, and I can't bear a Mastodon timeline without the jokes from the fakeDonaldTrump account on cap.moe to fill my spare time. I finally decided to troubleshoot this "bug".

Attempts

My first idea was that it could be caused by some errors in the task queue or something in the database, both of which could be reset by applying an instance block and removing it after everything about the instance was cleared from mine -- at least, that was what I believed. This, obviously, was not the case. After removing the instance block, everything was still as it was before. Mastodon provides no support for really removing users anyway, at least not from the database. As the admin of cap.moe put it:

This is completely suicide attack.

If you are an administrator, NEVER attempt anything that works like a suicide attack, because it solves nothing and only adds complexity.

The only option left was to dump all the traffic and see what was going wrong with the requests. As I already knew, the ActivityPub protocol, which Mastodon relies on, uses active pushes rather than passive pulls to distribute messages. Thus, it could be something on my side that prevented the pushes from succeeding. I decided to capture all the traffic with tcpdump and inspect it using Wireshark.

Since all the traffic of my Mastodon instance is HTTPS-encrypted behind a reverse proxy, I could only dump the traffic between Nginx and the upstream, then feed all of it into Wireshark to filter by HTTP headers. This was a pain, but I eventually did it and figured something out from the traffic: my instance was replying with 401 Unauthorized to the pushes from cap.moe.
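
Concretely, the capture could look something like this (a sketch: it assumes the Mastodon web upstream listens on 127.0.0.1:3000, which may differ per setup):

tcpdump -i lo -w mastodon.pcap 'tcp port 3000'

The resulting mastodon.pcap can then be opened in Wireshark and filtered with an expression like http.response.code == 401.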

A little inspection of the source code indicated that this error is linked to signature verification. Each ActivityPub message needs to be signed with an Actor's private key, and the signature is verified using the corresponding public key. I assumed this could only be caused by database errors -- my database must have stored a different public key from the original one, either through an error in the database upgrade or some random cosmic radiation. I checked the public key with

account = Account.find(id_on_cap_moe)
account.public_key

in the Ruby console of Mastodon. I also asked the admin of cap.moe to run the same command with the corresponding id on his own instance, and we compared the output. Unfortunately, the keys were exactly the same -- this couldn't be the problem either.

The solution

With all the attempts above failed, I decided that I should compare the request of a successful delivery with a failed one. I tooted something on pawoo and then something on cap.moe, keeping tcpdump running the whole time. After this, I fed the dumps to Wireshark as usual and followed the individual HTTP streams. The Signature header drew my attention.

This is the header in the failed request:

Signature: keyId="https://cap.moe/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date accept-encoding digest content-type",signature="ZC4c0wxPRn+RVYTeAaPjEgA3PDW/jHQ3CdUSn3u+mH2HUxsiQV3TV0dObzC4Z9VGOmY0ZE0cbQ9KiketDxPAq99InDnDjJ49aUT6/L0gSXJQlpM4SGGT8VyipkFm/dzoxbJ8jiT9WjcrXwD1/sJV4IvuA0LJs96mRkuexykguSu2PefvS7PTw5ufAxGTWn3YmtvkMeYLBi5V7LUz3xcONe2iqcSO6hKZ77puTvvWJZgfeNxMyoRXyrcrKUSUZhgfR8z7rwPgxvcoigfiL/SH0xrKyBIdO6HjjjuMsTOSa4xRsrGgopowpAx19ya83YiTRdvkO720u3Dy3ZsWifoRCw=="

And this is the one in the successful request from pawoo:

Signature: keyId="https://pawoo.net/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date digest content-type",signature="Esf8TAlrYId7XhP7AKlRdGTz+tWXT+/ehYCrCLKCgx3UWPxnzNBssawr7oG5xPuB1QU/TLw6M09Rp9pd+0+F20GaEVUE2UTLNwKDizDbEj2XmK7RjEE4ys3Md1b8E+d4YbTVnUWqi0WnufUNTrjLCdyPCPHn3fqJ5Bv9/W4aUDF+nFbJAZr2n1cmu6Nb28nhS1PQAz7AzzsZy/Du+R6S3x91OjRMIa7Xt1EgLWH6/TEchUsxiP78QKZIbzIlEca+BhWCQiQ2qjO+VtwNDDypqh9HheNn23iuy4xm6hKwjHiVVkfekbEK47fNRXH5fakhmHmN7Zl813lrotkIGbDrdA=="

Notice that the headers field in the failed signature indicates that the accept-encoding header was also signed, while it is absent from the successful request.

Now I knew what was wrong: I was erasing the Accept-Encoding header in my Nginx reverse proxy configuration! This was due to my use of sub_filter -- I needed to insert something into the HTML of Mastodon, and I was too lazy to modify the source code and rebuild the Docker image myself. Since sub_filter cannot operate on compressed responses, the Accept-Encoding header has to be cleared so that the upstream never sends gzipped content.

The solution now seemed easy. Originally, my Nginx configuration included:

proxy_set_header Accept-Encoding "";

Since I still want to use sub_filter for HTML pages, I changed it to:

set $my_encoding $http_accept_encoding;
if ($http_content_type != "application/activity+json") {
  set $my_encoding "";
}
proxy_set_header Accept-Encoding $my_encoding;

This erases the Accept-Encoding header except when the content type is application/activity+json, the type Mastodon instances use to communicate with each other.

After saving and reloading the Nginx configuration, everything worked fine again.

The cause and more questions

After asking the maintainer of Mastodon, @Gargron@mastodon.social, I figured out where this problem was introduced:

https://github.com/tootsuite/mastodon/pull/7425/commits/4de98db0312de2a45d8f08d6f6611ebc64eed8b1

This pull request added direct support for gzip compression in Mastodon, thus bringing the Accept-Encoding header into the signature. My erasure of this header, obviously, broke the signature check and caused all of this.

However, these questions are still not answered after all of these:

  1. Why am I only losing federation with some 2.4.0 instances, but not all? The change seems to be enabled by default, with no way to disable it.
  2. What's the point of including this header in the signature?

I couldn't find the answers on my own, and I decided not to keep digging, because nothing is wrong now.

And that's it, the process of troubleshooting a mysterious bug.

"Blocklists"

There just really can't be any idea worse than blocklists.

As a Mastodon instance administrator, I've seen the growth and popularization of Mastodon as a decentralized social medium, especially after the recent Facebook data leakage scandal. This couldn't be a better phenomenon for those of us who have always hoped that people would one day wake up from the dream that large entities, such as governments and companies, would ever protect their freedom and / or privacy. However, as the number of users and administrators of Mastodon increases, unexpected things also happen, because some users simply followed others onto Mastodon without knowing what they were actually doing. One of these things is the emergence of Mastodon blocklists.

I saw such a blocklist for the first time in a Mastodon post, which was published as an article on Telegraph [1]. To be honest, it was really disturbing at first sight, because I was not expecting this to happen so soon on Mastodon -- I had been talking about the possibility of such things happening on Mastodon with my friend just that morning. Not surprisingly, this blocklist is, just like every other blocklist I've seen, full of personal prejudice and unjustified / unclear criteria. What's more disturbing is that people are actually requesting that Mastodon introduce auto-subscription to these blocklists [2], with unattended scripts downloading and applying every line published by some unknown and possibly prejudiced guy.

To make it clear, I am personally totally fine with the idea of domain blocks / account blocks, which have been present in Mastodon for a long time. These are essential tools for some Mastodon instances to stay legal, because instances have different values and different applicable laws, and to maintain federation, these differences must be respected. What I am entirely against is brainlessly taking some random guy's blocklist and applying it blindly to your own instance, believing that the list completely corresponds to your own values, and thinking that you have avoided a lot of the extra work of blocking SPAM / Child Porn / ... instances and accounts.

Once people got the power of "control", they're making there own place where they escape from before, there is nothing new under the sun.

This was the response from my friend @AstroProfundis on this issue.

Truly, there is nothing new under the sun. It was not long ago that an activist on Twitter was blocked by a popular blocklist that everyone just blindly follows [3]. People are fleeing Twitter and Facebook because of their overwhelmingly centralized power, and now people are again building their own centralized kingdoms using blocklists, pretending that every instance is still independent even when they all use the same list of blocked users and domains. Well, unless you call them federal laws.

What are we hoping for from a federated social medium in the first place? Think about it. To me, it's the ability to scatter users across different instances with diverse values and views of the world. It's the possibility that if several instances are compromised, or act against what their users want, those users can simply switch to other instances and still get the same happy life as before. It's also the opportunity for every minority group to have its voice conveyed throughout the entire Fediverse. Sure, instances can each have their own blocking rules, but these will never affect the Fediverse as a whole, and, as I personally believe, there will never be a consensus so wide that most of the instances would block a particular group of people. Our lovely, well-crafted blocklists will completely ruin all of this.

I've run my own e-mail server before -- e-mail being a federated protocol with an idea similar to Mastodon's -- and what I discovered is that blocklists essentially prevent you from doing so if you want your e-mails to be delivered properly to most of the e-mail hosts. These lists, by trusting popular IPs and distrusting unpopular ones, essentially favor gigantic hosts that own the resources to run complex machine-learning-based filtering algorithms on their outgoing e-mails. (Or even filter the outgoing e-mails by hand? Huh.) Moreover, once blocked, the process of disputing and getting unblocked is overwhelmingly hard for any individual e-mail host to get through. Yes, there are multiple lists following seemingly different standards. Yes, there are ways to get yourself unblocked provided that proper justification is given. Do these make any difference? No. Even North Korea says that its people can dispute its jurisdictional decisions -- despite the fact that this would never work.

I really hope there will someday be a study on how well these blocklists reflect their criteria as written on paper, without much prejudice. Since there has been none, I can only conclude from my personal experience that such blocklists tend to become prejudiced as they grow. This includes a blockbot that appeared recently in the Chinese Telegram community, which blocked a bunch of innocent people merely for having ideas that conflicted with the maintainer's. Our lovely followers of this bot, without knowing anything, then blocked these people from every group under their control.

Blocking is a destructive operation. It should be the last resort after communication has failed, not something to be automated and blindly followed. If the maintainers of blocklists called them Hatelists, I would be completely fine with them, since by doing so they would be actively informing people that the lists include personal opinions and are not something to subscribe to without further thought. As long as they are still called Blocklists, I will say a big, big "NO" to them.

Dear Mastodon administrators: unless you share the same values as the maintainers of a blocklist now, forever, and for all the foreseeable future, please think twice before you follow someone else in blocking a domain or a user. Do not ruin the Fediverse with your own hands.

Because I really don't know what the next Mastodon Fediverse to flee to would be.

References

  1. Blockchain Blocklist Advisory
  2. PR #7059: Domain blocking as rake task
  3. When do Twitter block lists start infringing on free speech?