Hosting Mastodon in Kubernetes

Posted by Thomas Gideon on 7 January 2023

One reason I was eager to use Kubernetes with my recent self-hosting upgrade was how much faster it would make experimenting with new services. Unlike my older docker-compose based setup, there is an ecosystem of pre-packaged resources I can add to a cluster with minimal configuration. For any custom configuration not covered by a package, jsonnet and kubecfg help me abstract over the often repetitive Kubernetes manifest syntax. The first new service I wanted to try out in my cluster was Mastodon.

I use helm with my cluster. helm is a package manager for Kubernetes. You can install an application or service from a package, called a chart, and customize it as needed with a bit of yaml. You can even install the same chart more than once, giving each instance its own name and configuration. This is similar to npm for Node, Docker Hub for Docker images, etc. My hosting provider, Digital Ocean, uses helm to support installing third party resources that make their managed Kubernetes even simpler to use.
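
For example, installing the same chart twice under different release names looks something like this; the repository URL and chart name here are placeholders, not a real package.

# add a chart repository, then install the same chart under two release
# names, each with its own values file (all names here are placeholders)
helm repo add example https://charts.example.com
helm install first-instance example/some-app -f first-values.yaml
helm install second-instance example/some-app -f second-values.yaml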

I first installed the recommended nginx ingress controller and cert-manager from helm charts Digital Ocean maintains. Without some kind of ingress, you can deploy a service to a cluster but there is no way for a user to reach it. For testing and development, kubectl, the command line tool for Kubernetes, supports tunneling and port forwarding. Once your application is ready to go, you will need ingress. Commonly, the ingress controller is backed by an actual load balancer; that is how Digital Ocean works. To configure most kinds of ingress, you add rules that map hosts and paths to services in your cluster.
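
For example, before ingress is in place, you can reach a cluster-private service with a port forward; the service name and ports here are illustrative.

# forward local port 8080 to port 80 on a cluster-private service
kubectl port-forward service/my-service 8080:80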

I abstracted over my ingress with a bit of jsonnet that looks like this.

// these are the templates for common k8s manifests available for use with
// kubecfg
local kube = import './manifests/kube.libsonnet';

// a convenience to remove some repetition--this function outputs a port
// mapping for a specific service in the cluster
local http(service, port) = {
  paths: [
    {
      path: '/',
      pathType: 'Prefix',
      backend: {
        service: {
          name: service,
          port: {
            number: port,
          },
        },
      },
    },
  ],
};

// a comprehension to map the provided hosts, setting up the needed port
// mapping
local hosts(service, hosts, port=80) = [
  {
    host: host,
    http: http(service, port),
  }
  for host in hosts
];

// this is the actual output template for this jsonnet recipe, a manifest for
// an ingress resource, deploy with kubecfg update ingress.jsonnet
{
  ingress: kube.Ingress('ingress') {
    metadata+: {
      annotations: {
        // this associates our LE certificate manager with our controller so we
        // have valid TLS
        'cert-manager.io/issuer': 'letsencrypt-nginx',
      },
    },
    spec+: {
      // this stanza works with the certificate manager to request one for the
      // listed hosts
      tls: [
        {
          hosts: [
            'thecommandline.net',
            'thecommandline.social',
            'www.thecommandline.net',
          ],
          secretName: 'letsencrypt-nginx',
        },
      ],
      // there are other kinds of ingress with other providers
      ingressClassName: 'nginx',
      // the mapping from virtual host names through to cluster private
      // services and ports
      rules:
        // this is for the static site, itself an nginx base container image with
        // all the generated files added in
        hosts('the-command-line', ['thecommandline.net', 'www.thecommandline.net']) +
        // this is the mapping for Mastodon; my full config goes on to concatenate
        // entries for all the rest of my static sites and other web services
        hosts('thecommandline-social-mastodon-web', ['thecommandline.social'], 3000),
    },
  },
}

thecommandline-social-mastodon-web is the cluster-private name of the Mastodon instance that supports https://thecommandline.social. I installed Mastodon into my cluster using this chart from Codechem. I picked it after reading the sample values.yaml, which I found to be well organized and thoroughly documented. You can spin up additional containers for things like indexing/search and media caching, or you can configure Mastodon to use existing, external hosts. I did the latter for the database, using a managed one I created in my Digital Ocean account.
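
As a sketch of what pointing at an external database looks like: disable the bundled one and supply the managed host. The key names below are my guesses; the chart's sample values.yaml documents the real ones.

# key names here are illustrative -- check the chart's sample values.yaml;
# the repo alias is whatever you named it with helm repo add
cat > mastodon-values.yaml <<'EOF'
postgresql:
  enabled: false            # skip the bundled database container
externalDatabase:
  host: my-db.example.com   # the managed Digital Ocean database
  port: 25060
EOF
helm install thecommandline-social codechem/mastodon -f mastodon-values.yaml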

I installed a second instance of the same Codechem Mastodon chart for another instance I run. I tried, unsuccessfully after several attempts, to create separate Elasticsearch and Redis services in my cluster to share among my Mastodon instances. Everything started, but I saw cached media leak across instances. I suspect some more specific configuration, probably different logins to the external services or per-instance prefixes, will be required. I bookmarked that as a project for another weekend.
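
If I do pick it back up, the prefix idea maps to Mastodon's REDIS_NAMESPACE environment variable, which scopes every key an instance writes. How to thread extra environment variables through this chart is an assumption on my part.

# untested sketch: give each instance its own Redis key prefix; the
# extraEnvVars key is a guess at this chart's mechanism for extra environment
helm upgrade thecommandline-social codechem/mastodon \
  --reuse-values \
  --set 'mastodon.extraEnvVars[0].name=REDIS_NAMESPACE' \
  --set 'mastodon.extraEnvVars[0].value=tcls'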

I used a Kubernetes secret for my VAPID key pair and the other secrets needed to set up Mastodon's API. The example values.yaml includes the commands to generate these correctly. I keep those manifests encrypted in my source control. Another task I need to look into is adding an encryption provider to my cluster so the secrets are also encrypted at rest. Limited ingress and a secured control plane help protect them once they are deployed, but that isn't ideal.
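
Roughly, the values come from Mastodon's own rake tasks, and kubectl loads them into a secret; the secret and key names below are illustrative placeholders.

# generate the values with Mastodon's rake tasks:
#   bundle exec rake secret                                # SECRET_KEY_BASE, OTP_SECRET
#   bundle exec rake mastodon:webpush:generate_vapid_key   # VAPID key pair
# then load them into a secret; names here are placeholders
kubectl create secret generic mastodon-secrets \
  --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \
  --from-literal=OTP_SECRET="$OTP_SECRET" \
  --from-literal=VAPID_PRIVATE_KEY="$VAPID_PRIVATE_KEY" \
  --from-literal=VAPID_PUBLIC_KEY="$VAPID_PUBLIC_KEY"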

One additional piece of advice I received and will pay forward: use S3 storage for media, not the file system. Persistent volume claims, the resources in Kubernetes that represent mounted disk storage, are very finicky, especially around security and permissions. Digital Ocean and other providers offer S3 compatible services that will work fine.
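
Mastodon's media storage is driven by plain environment variables; pointed at Digital Ocean Spaces, the relevant settings look roughly like this, with the bucket and endpoint as examples.

# Mastodon's S3 settings, here aimed at Digital Ocean Spaces; the bucket
# name and region are examples
S3_ENABLED=true
S3_BUCKET=my-mastodon-media
S3_ENDPOINT=https://nyc3.digitaloceanspaces.com
AWS_ACCESS_KEY_ID=...      # a Spaces access key
AWS_SECRET_ACCESS_KEY=...  # and its secret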

I did not get SMTP working. I didn't really try, since neither of my instances needs it. One is single user; the other is only my wife and me. Digital Ocean lacks an email sending service, which would otherwise be my preference.

If you are going to allow others to sign up, you'll probably want to configure email. The admin interface does let you directly confirm a user instead of relying on the confirmation email. That is how I got my wife set up on the instance we share. That might work for you if you aren't expecting a lot of sign-ups. You do not need email for the admin user; needing it was a limitation of either older versions or some other packages. Again, the example values.yaml included instructions on how to reset the admin password from the command line.
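
Both jobs can also be done from a shell with tootctl inside the running web container; the deployment and account names here are illustrative, and the bin/ path assumes the official container image.

# reset the admin password without email
kubectl exec -it deploy/thecommandline-social-mastodon-web -- \
  bin/tootctl accounts modify admin --reset-password
# confirm a new user directly, skipping the confirmation email
kubectl exec -it deploy/thecommandline-social-mastodon-web -- \
  bin/tootctl accounts modify alice --confirm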

To get everything running well, I did have to size up the nodes in my cluster a bit. Mastodon with search and cache enabled spins up over a dozen containers per install; search alone requires three. I upgraded my node pool to two sets: a fixed pair of larger servers and an auto scaling group of one to three of the smallest servers available. My cluster scheduled most of the Mastodon related pods to the larger nodes. My static websites are served mostly out of the small, auto scaling group.
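
Checking where the scheduler actually placed things is a one-liner.

# the NODE column shows which pool each pod landed on
kubectl get pods -o wide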

My next stumbling block was the database. I initially added new schemas within the existing database I used with my RSS aggregator. Once I had both instances up, I started noticing some slowness, then 500 errors from the web UI. I realized the database was maxed out. I split the RSS service off to its own external database and doubled the resources for the remaining database shared by both Mastodon instances. That database supports about fifty connections now, which seems to be enough. I still may split the databases so I can scale each based on each instance's needs as they grow.
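
If you suspect the same problem, the connection count is one query away on any Postgres; the connection string here is elided.

# compare current connections to the plan's limit, about fifty in my case
psql "$DATABASE_URL" -c 'SELECT count(*) FROM pg_stat_activity;'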

One other area I want to spend some more effort on is monitoring. I run an instance that I pay masto.host for, and their setup lets me easily access a dashboard for Sidekiq. When Mastodon bogs down, knowing what is going on with the queues points to more targeted fixes within the cluster. I would like to add a metrics server to my cluster so I can try to monitor with Grafana or something like it. For now, I have some resource alerts on the cluster itself.
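
I have not set this up yet, but a common route to that kind of visibility is the community kube-prometheus-stack chart, which bundles Prometheus and Grafana.

# untried here: installs Prometheus, Grafana, and the exporters they need
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack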

If you want a quick start to getting Mastodon running on Kubernetes, I hope these notes help. Let me know in the Fediverse if you have any questions, suggestions, or corrections.