Peter Cai

@PeterCxy

Some random guy out there. en_US / zh_CN

https://sn.angry.im/@PeterCxy

Troubleshooting a mysterious Mastodon bug: the Accept-Encoding header and federation

May 28, 2018

The story

As you may know, I am the administrator of a Mastodon instance, https://sn.angry.im. One really fun thing about this job (and every sysadmin job) is that you run into different problems from time to time, sometimes without doing anything and sometimes after an upgrade.

Last week, Mastodon v2.4.0 was released and I, along with my friend, the admin of https://cap.moe, decided to upgrade as quickly as possible. Since there was nothing breaking in the new version, it didn't take long before we had both executed a few Docker commands and restarted into the new version. As usual after an upgrade, we tried to post something to make sure everything worked, and this is where things started to break.

We first noticed that I could not see anyone from cap.moe on my home timeline, while he could see everyone from my instance on his. We thought this was a subscription problem, so we both ran a resubscription task in the administrator panel of our instances. That did not fix anything. We then tried mentioning each other in a toot to find out whether it was a timeline logic error, but it was not. Still, he could see me but I could not see anyone on his instance.

One interesting thing was that, since some other instances, for example pawoo.net, could see posts from both of our instances, I could simply retoot one of his toots on pawoo and receive it on my instance within several seconds. I didn't know what this meant, but it was really something 面白い (interesting).

Since other mysterious bugs had happened before and magically fixed themselves after a while, I decided it was a good idea to leave this one alone and see if things went back to normal. It is now a week after the upgrade, nothing has changed, and I can't bear a Mastodon timeline without the jokes from the fakeDonaldTrump account on cap.moe to fill my spare time anymore. I finally decided to troubleshoot this "bug".

Attempts

My first idea was that it could be caused by errors in the task queue or in the database, both of which, I believed, could be reset by applying an instance block and removing it once everything from that instance had been cleared from mine. This, obviously, was not the case. After removing the instance block, everything was still just as before. Mastodon provides no way of really removing users anyway, at least not from the database. As the admin of cap.moe said:

This is completely suicide attack.

If you are an administrator, NEVER attempt anything that works like a suicide attack, because it solves nothing and only adds complexity.

The only option left was to dump all the traffic and see what was going wrong with the requests. As I already knew, the ActivityPub protocol, which Mastodon relies on, uses active pushes rather than passive pulls to distribute messages. Thus, it could be something on my side that prevented the push from succeeding. I decided to capture all the traffic with tcpdump and inspect it in Wireshark.

Since all the traffic to my Mastodon instance is HTTPS-encrypted up to the Nginx reverse proxy, I could only dump the plaintext traffic between Nginx and the upstream, then feed it all into Wireshark and filter by HTTP headers. This was a pain, but I eventually did it and learned something from the traffic: my instance was replying with 401 Unauthorized to the pushes from cap.moe.

A little inspection of the source code indicated that this error is linked to signature verification. Each ActivityPub request needs to be signed with an Actor's private key, and the receiver verifies the signature using the matching public key. I assumed that a failure here could only be caused by database errors -- my database must have stored a public key different from the original one, either through an error in a database upgrade or some random cosmic radiation. I checked the public key by running

account = Account.find(id_on_cap_moe)
account.public_key

in the Ruby console of Mastodon. I also asked the admin of cap.moe to run the same command with the id on his own instance, and then we compared the public keys in the output. Unfortunately, they were exactly the same -- this couldn't be the problem either.
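Comparing two long PEM blocks by eye is error-prone. The following is a minimal sketch of one way to make that comparison easier, assuming both admins can paste the key text into a plain Ruby session: hash each key and compare the short digests instead. The helper name key_fingerprint and the throwaway keys generated here are purely illustrative and are not part of Mastodon.

```ruby
require "digest"
require "openssl"

# Hypothetical helper: reduce a PEM-encoded key to a short hex digest.
# All whitespace is stripped first, so differences in line wrapping or
# trailing newlines introduced by copy and paste do not matter.
def key_fingerprint(pem)
  Digest::SHA256.hexdigest(pem.gsub(/\s+/, ""))
end

# Throwaway keys standing in for the two account.public_key outputs.
pem_a      = OpenSSL::PKey::RSA.new(2048).public_key.to_pem
pem_a_copy = pem_a.strip + "\n\n"   # same key, different trailing whitespace
pem_b      = OpenSSL::PKey::RSA.new(2048).public_key.to_pem

same_key      = key_fingerprint(pem_a) == key_fingerprint(pem_a_copy)
different_key = key_fingerprint(pem_a) != key_fingerprint(pem_b)

puts "copy matches: #{same_key}"
puts "other key differs: #{different_key}"
```

If the two digests match, the stored keys are identical and the cause has to be somewhere else, which is where this investigation ended up.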

The solution

With all the attempts above having failed, I decided to compare the request of a successful delivery with that of a failed one. I tooted something on pawoo and then tooted something on cap.moe while keeping tcpdump running. After this, I fed the captures to Wireshark as usual and followed the individual HTTP streams. The Signature header drew my attention.

This is the header in the failed request:

Signature: keyId="https://cap.moe/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date accept-encoding digest content-type",signature="ZC4c0wxPRn+RVYTeAaPjEgA3PDW/jHQ3CdUSn3u+mH2HUxsiQV3TV0dObzC4Z9VGOmY0ZE0cbQ9KiketDxPAq99InDnDjJ49aUT6/L0gSXJQlpM4SGGT8VyipkFm/dzoxbJ8jiT9WjcrXwD1/sJV4IvuA0LJs96mRkuexykguSu2PefvS7PTw5ufAxGTWn3YmtvkMeYLBi5V7LUz3xcONe2iqcSO6hKZ77puTvvWJZgfeNxMyoRXyrcrKUSUZhgfR8z7rwPgxvcoigfiL/SH0xrKyBIdO6HjjjuMsTOSa4xRsrGgopowpAx19ya83YiTRdvkO720u3Dy3ZsWifoRCw=="

And in the successful request from pawoo:

Signature: keyId="https://pawoo.net/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date digest content-type",signature="Esf8TAlrYId7XhP7AKlRdGTz+tWXT+/ehYCrCLKCgx3UWPxnzNBssawr7oG5xPuB1QU/TLw6M09Rp9pd+0+F20GaEVUE2UTLNwKDizDbEj2XmK7RjEE4ys3Md1b8E+d4YbTVnUWqi0WnufUNTrjLCdyPCPHn3fqJ5Bv9/W4aUDF+nFbJAZr2n1cmu6Nb28nhS1PQAz7AzzsZy/Du+R6S3x91OjRMIa7Xt1EgLWH6/TEchUsxiP78QKZIbzIlEca+BhWCQiQ2qjO+VtwNDDypqh9HheNn23iuy4xm6hKwjHiVVkfekbEK47fNRXH5fakhmHmN7Zl813lrotkIGbDrdA=="

Notice that the headers field of the failed signature lists accept-encoding among the signed headers, while it is absent from the list in the successful request.
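The difference is easy to miss in two long lines of text. As a rough sketch of how the comparison could be automated, the snippet below parses the key="value" parameters of each Signature header and subtracts one list of signed headers from the other. The function name parse_signature_header is an illustrative helper, and the signature values are elided as "..." for brevity; only the keyId, algorithm and headers fields are copied from the captures above.

```ruby
# Hypothetical helper: turn the key="value" pairs of a Signature header
# into a Hash such as {"keyId" => "...", "headers" => "..."}.
def parse_signature_header(value)
  value.scan(/(\w+)="([^"]*)"/).to_h
end

# Header values from the two captures, with the signature itself elided.
failed = 'keyId="https://cap.moe/users/PeterCxy#main-key",algorithm="rsa-sha256",' \
         'headers="(request-target) user-agent host date accept-encoding digest content-type",' \
         'signature="..."'
succeeded = 'keyId="https://pawoo.net/users/PeterCxy#main-key",algorithm="rsa-sha256",' \
            'headers="(request-target) user-agent host date digest content-type",' \
            'signature="..."'

failed_headers    = parse_signature_header(failed)["headers"].split
succeeded_headers = parse_signature_header(succeeded)["headers"].split

# Headers signed only in the failed request.
only_in_failed = failed_headers - succeeded_headers
puts only_in_failed.inspect
```

The only header signed by cap.moe but not by pawoo is accept-encoding.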

Now I knew what was wrong with the Mastodon stuff: I was erasing the Accept-Encoding header in my Nginx reverse proxy configuration! This was due to my use of sub_filter, which cannot rewrite compressed responses, so the upstream had to be told not to compress anything. I needed sub_filter to insert something into the HTML of Mastodon, since I was too lazy to modify the source code and re-build the Docker image myself.

The solution seemed easy now. Originally, my Nginx configuration included

proxy_set_header Accept-Encoding "";

Since I do still want to use sub_filter for HTML pages, I changed it to

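# Pass the client's Accept-Encoding through by default...
set $my_encoding $http_accept_encoding;
# ...but blank it for anything that is not an ActivityPub delivery,
# so that sub_filter keeps working on HTML pages.
if ($http_content_type != "application/activity+json") {
  set $my_encoding "";
}
proxy_set_header Accept-Encoding $my_encoding;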

This erases the Accept-Encoding header except when the content type of the request is application/activity+json, which is what Mastodon nodes use to communicate with each other.

After saving and reloading the Nginx configuration, everything works fine again.
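To make the behaviour of that rule easier to reason about, here is a small Ruby model of it. This is only a sketch: the function upstream_accept_encoding is a made-up name, and it merely mirrors the exact string comparison that Nginx performs with != in an if block. One consequence worth noting is that a Content-Type carrying extra parameters would not match and would have its Accept-Encoding blanked as well.

```ruby
# Hypothetical Ruby model of the Nginx rule above; not Nginx or Mastodon code.
ACTIVITY_JSON = "application/activity+json"

# Returns the Accept-Encoding value that would be forwarded to the upstream.
# An empty string means the header is erased.
def upstream_accept_encoding(content_type, accept_encoding)
  content_type == ACTIVITY_JSON ? accept_encoding.to_s : ""
end

# A delivery from another Mastodon node keeps its header.
federation = upstream_accept_encoding("application/activity+json", "gzip")
# A browser loading an HTML page sends no matching Content-Type.
browser = upstream_accept_encoding(nil, "gzip, deflate, br")
# Exact comparison: a value with parameters does not match either.
with_param = upstream_accept_encoding("application/activity+json; charset=utf-8", "gzip")

puts federation.inspect
puts browser.inspect
puts with_param.inspect
```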

The cause and more questions

After asking the maintainer of Mastodon, @Gargron@mastodon.social, I found out where this problem was introduced:

https://github.com/tootsuite/mastodon/pull/7425/commits/4de98db0312de2a45d8f08d6f6611ebc64eed8b1

This pull request added direct support for gzip compression in Mastodon, thus bringing the Accept-Encoding header into the signature. My erasure of this header, obviously, broke the signature check and made all of this happen.
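Why a missing header is fatal can be shown with a short, self-contained sketch. It assumes the usual HTTP Signatures scheme, in which the sender signs a string made of one "name: value" line per listed header, and the receiver rebuilds that string from the request it actually got. The key, header values and request target below are invented for the demonstration, and a header dropped by the proxy is modeled as an empty value; none of this is Mastodon's actual code.

```ruby
require "openssl"

# Build the string to sign: one "name: value" line per signed header,
# in the order given by the headers field of the Signature header.
def signing_string(header_names, headers, request_target)
  header_names.map do |name|
    value = name == "(request-target)" ? request_target : headers.fetch(name, "")
    "#{name}: #{value}"
  end.join("\n")
end

key    = OpenSSL::PKey::RSA.new(2048)          # throwaway demo key
signed = %w[(request-target) host date accept-encoding]
target = "post /inbox"                         # illustrative request target

# Headers as the sending instance saw them when it signed the request.
sent = {
  "host" => "sn.angry.im",
  "date" => "Mon, 28 May 2018 00:00:00 GMT",
  "accept-encoding" => "gzip"
}
signature = key.sign(OpenSSL::Digest.new("SHA256"), signing_string(signed, sent, target))

# Headers as the upstream sees them after the proxy dropped Accept-Encoding.
received = sent.reject { |name, _| name == "accept-encoding" }

verify = lambda do |headers|
  key.public_key.verify(OpenSSL::Digest.new("SHA256"), signature,
                        signing_string(signed, headers, target))
end

intact_ok   = verify.call(sent)
stripped_ok = verify.call(received)
puts "intact request verifies: #{intact_ok}"
puts "stripped request verifies: #{stripped_ok}"
```

The signature only matches the untouched request. Once any signed header is altered or dropped in transit, the receiver reconstructs a different string and has to reject the request, which is consistent with the 401 Unauthorized seen in the capture.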

However, some questions remain unanswered after all of this:

  1. Why am I only losing federation with some 2.4.0 instances but not all? The change in the pull request seems to be enabled by default, and there appears to be no way to disable it.
  2. What's the point of including this header in the signature?

I couldn't find the answers on my own, and I decided not to keep looking because nothing is wrong now.

And that's it, the process of troubleshooting a mysterious bug.

"Blocklists"

April 22, 2018

There just really can't be any idea worse than blocklists.

As a Mastodon instance administrator, I've seen the growth and popularization of Mastodon as a decentralized social medium, especially after the recent data leak at Facebook. This couldn't be better news for us, since we have always hoped that people would one day wake up from the dream that large entities, such as governments and companies, would ever protect their freedom and / or privacy. However, as the number of users and administrators of Mastodon increases, unexpected things also happen, because some users just followed others to Mastodon without knowing what they were actually doing. One of these is the emergence of Mastodon blocklists.

I first saw such a blocklist in a Mastodon post; the list itself was published as an article on Telegraph [1]. To be honest, it was really disturbing to me at first sight, because I was not expecting this to happen so soon on Mastodon -- I had just been talking with my friend that morning about the possibility of such things happening on Mastodon. Not surprisingly, this blocklist is, just like every other blocklist I've seen, full of personal prejudice and unjustified / unclear criteria. What's more disturbing is that people are actually requesting that Mastodon introduce auto-subscription to these blocklists [2], with unattended scripts that download and apply every line in blocklists published by some unknown and maybe prejudiced guy.

To make it clear, I am personally totally fine with the idea of domain blocks / account blocks, which have been present in Mastodon for a long time. These are essential tools for some Mastodon instances to stay legal, because instances have different values and different applicable laws. To maintain federation, these differences must be respected. What I am entirely against is brainlessly taking some random guy's blocklist and applying it blindly to your own instance, believing that the list completely corresponds to your own values, and thinking that you have saved yourself a lot of extra work blocking SPAM / Child Porn / ... instances and accounts.

Once people got the power of "control", they're making there own place where they escape from before, there is nothing new under the sun.

This was the response from my friend @AstroProfundis on this issue.

Truly, there is nothing new under the sun. It has not been long since an activist on Twitter was blocked by a popular blocklist that everyone just blindly followed [3]. People are fleeing Twitter and Facebook because of their overwhelmingly centralized power, and now they are again building their own centralized kingdoms using blocklists, pretending that every instance is still independent even when they all use the same list of blocked users and domains. Well, unless you call them "federal laws".

What were we hoping for from a federated social medium in the first place? Think about it. To me, it's the ability to scatter users across different instances with diverse values and views of the world. It's the possibility that if several instances are compromised or act against what users want, those users can simply switch to others and still enjoy the same happy life as before. It's also the opportunity for every minority group to have its voice conveyed through the entire Fediverse. Sure, instances can each have their own rules of blocking, but these will never affect the Fediverse as a whole, and, as I personally believe, there will never be a consensus so wide that most instances will block a particular group of people. And our lovely, well-crafted blocklists will completely ruin all of this.

I've set up my own e-mail server before, e-mail being a federated protocol with an idea similar to Mastodon's, and what I discovered is that, with the blocklists in place, one is essentially prevented from doing so if he / she wants e-mails to be delivered properly to most e-mail hosts. These lists, by trusting popular IPs and distrusting unpopular ones, essentially favor gigantic hosts that own the resources to run complex, fancy machine-learning based filtering algorithms on their outgoing e-mails. (Or even filter the outgoing e-mails by hand? Huh.) Moreover, once blocked, the process of disputing and unblocking is overwhelmingly hard and complex for any individual e-mail host to get through. Yes, there are multiple lists following seemingly different standards. Yes, there are ways you could get yourself unblocked provided that proper justification is given. Will these make any difference? No. Even North Korea says that its people can dispute its judicial decisions -- despite the fact that this would never work.

I really hope that there will be some study on how well these blocklists reflect the criteria they state on paper, without much prejudice. Since there has been none, I can only conclude from my personal experience that such blocklists tend to become prejudiced as they grow. This also includes a blockbot that recently appeared in the Chinese community of Telegram users, which blocked a bunch of innocent people just because their ideas were in conflict with the maintainer's. Our lovely followers of this bot, without knowing anything, blocked these people from every group they controlled.

Blocking is a destructive operation. It should be the last resort following a failure to communicate, rather than something to be automated and blindly followed. If the maintainers of blocklists called them Hatelists, I would be completely fine with them, since by doing so they would be actively informing people that the list includes personal opinions and is not something to subscribe to without further thought. As long as they are still called Blocklists, I will say a big, big "NO" to them.

Dear Mastodon administrators, please always remember that, unless you share the same values as the maintainers of blocklists now, forever and for all the foreseeable future, you should think twice before you follow someone in blocking a domain or a user. Do not ruin the Fediverse with your own hands.

Because I really don't know where we could go after the Mastodon Fediverse.

References

  1. Blockchain Blocklist Advisory
  2. PR #7059: Domain blocking as rake task
  3. When do Twitter block lists start infringing on free speech?