(Also note that the considerations in this post apply equally to applications written in other languages and/or frameworks.)
The most common type of deployment, which we refer to as “classic”, consists of copying the application code onto a bare-metal server or VM, then running it via the OS’s service management system. The steps involved typically look like this on a Linux-based server:
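For readers unfamiliar with this setup, here is a rough sketch of what those steps might look like when run by hand on the server (the app name, paths and systemd unit names are hypothetical):

```
# On the server, as the deploy user (names and paths are illustrative):
cd /var/www/myapp
git pull origin main                              # fetch the new application code
bundle install                                    # install gem dependencies
RAILS_ENV=production bin/rails db:migrate         # run pending database migrations
RAILS_ENV=production bin/rails assets:precompile  # compile front-end assets
sudo systemctl restart myapp-web myapp-worker     # restart the application processes
```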
These steps are usually automated thanks to tools like Capistrano or Mina, with the help of Foreman or Procsd to manage systemd services for each component of the application (web server, background job processor, message queue listener, etc…)
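As an illustration of that last part, Foreman can export the processes declared in the application’s Procfile as systemd units; a minimal sketch, where the app name, user and generated target name are assumptions and vary with the Foreman version:

```
# Export each Procfile entry (web, worker, listener…) as a systemd unit
bundle exec foreman export systemd ./systemd -a myapp -u deploy

# Install the generated units and (re)start them
sudo cp ./systemd/* /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now myapp.target   # exact target/unit names depend on the exporter
```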
This server needs to have the correct version of Ruby installed, as well as any dependencies such as the MySQL client library, OpenSSL, etc…
Once deployed, the main concern of running a production application is maintaining reliability and performance. The most straightforward way to address both of these is to run multiple instances of the application across several servers, with a load balancer set up in front to spread incoming requests between them.
Both Capistrano (natively) and Mina (via plugin) accordingly allow deploying to multiple servers at once. Since this can extend the time it takes to complete all deployment steps, they also provide locking mechanisms to prevent conflicts.
This “classic” deployment is fast and straightforward, and it got us a long way during the first few years of building Streem, using Mina tasks triggered by engineers directly from their machines. However, it does have a number of limitations that resulted in growing friction as our systems and team expanded.
Having a set of specific servers onto which the application is deployed means the OS, runtime and dependencies on each of them need to be kept in sync, to avoid any “drift” that could cause different behaviour on one server but not the others.
Even with the help of tools such as Ansible and a team effort to keep all our applications running on the same latest Ruby version, this can become a burden with a non-trivial number of servers.
Additionally, with the application codebase becoming larger over time, stopping the previous version and starting the new one under load can take several seconds, during which multiple servers, or even processes on the same server, might run different versions of the application. The load balancer has limited visibility into the state of the rollout when deciding where to route incoming requests, as it relies on a simple health check every few seconds.
Making sure all code changes are backward-compatible and configuring a web server like Puma with phased restarts can help mitigate this, but we still experienced dropped requests every now and then, as well as gem loading issues due to memory being shared between processes.
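For context, a phased restart replaces Puma’s worker processes one at a time instead of stopping the whole server, and is triggered by signalling the master process. A sketch, with a hypothetical pid file path (phased restarts require Puma’s cluster mode and cannot be combined with preload_app!):

```
# Replace workers one by one; the master keeps accepting connections throughout
kill -USR1 "$(cat /var/www/myapp/tmp/pids/puma.pid)"

# Equivalent via pumactl, pointing at the same pid file
bundle exec pumactl -P /var/www/myapp/tmp/pids/puma.pid phased-restart
```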
Another issue is the difficulty of sizing servers to balance performance and stability against cost, given the difference in resources used by each application component at different times.
For example, the load on our background job processor Sidekiq is highly variable as content is ingested in irregular batches, while requests to web server instances from users increase steadily during working hours, and our message queue listeners are always busy but consume very little CPU.
Setting up a dedicated VM for each component makes maintenance and deployment very tedious, while squeezing everything onto one big VM puts all components at risk when something goes wrong (a misbehaving process, a broken dependency update…). Auto-scaling VMs is possible, but it is difficult to set up and limited in its ability to respond to rapid change.
Last but not least, one of the key elements of our goal to move fast is the ability for engineers to easily deploy previews of any work in progress that product managers and stakeholders can test. This requires keeping even more servers ready for engineers to deploy to without conflict, which is another drain on maintenance time and cost.
Thankfully the last decade has seen new approaches to solve these challenges mature and become more widely available to engineering teams.
First, the advent of software containers, where the OS, runtime libraries and application code are all bundled into one image, solves the consistency challenge by making sure the exact same code and dependencies run across any number of servers.
Docker is the most popular tool for this. The Dockerfile that defines the image configuration is itself versioned alongside the application to make sure every change is recorded and reviewed.
Images are stored in a registry from which they can be pulled by deployment servers, which use them to start containers running the application processes. They can also be used for development to ensure consistency across engineers’ machines.
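As a rough sketch of that lifecycle (the registry path and tag are hypothetical, and we assume the image’s default command starts the web server):

```
# Build the image from the Dockerfile at the root of the repository and push it
docker build -t gcr.io/my-project/myapp:abc1234 .
docker push gcr.io/my-project/myapp:abc1234

# On a deployment server or an engineer's machine: pull the image and run a container
docker pull gcr.io/my-project/myapp:abc1234
docker run --rm -p 3000:3000 gcr.io/my-project/myapp:abc1234
```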
Deploying containers to run and expose multiple isolated processes, with appropriate resources and dynamic scaling in response to usage, is the responsibility of an orchestrator.
Kubernetes has become the de facto standard for orchestration thanks to the support of all major cloud providers. We use it via the fantastic managed Google Kubernetes Engine (GKE). Kubernetes intelligently schedules containers depending on the capacity of available VMs, and can also automatically provision or delete VMs as application processes are resized or temporary deployments come and go.
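For illustration only (cluster name, zone and sizes are hypothetical), a GKE cluster with node autoscaling enabled can be created along these lines:

```
# Create a cluster whose node pool grows and shrinks with the workloads scheduled on it
gcloud container clusters create my-cluster \
  --zone australia-southeast1-a \
  --num-nodes 3 \
  --enable-autoscaling --min-nodes 1 --max-nodes 10
```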
Processes for each component of our backend application are started from the same Docker image, with fine-grained memory and CPU resources and smart autoscaling behaviour, such as scaling Sidekiq workers by the number of jobs in their queues thanks to kube-sidekiq-autoscale.
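The same ideas can be expressed directly with kubectl; the commands below are only illustrative of resource requests/limits and autoscaling in general (our real values live in the deployment manifests, and Sidekiq scaling is driven by queue depth via kube-sidekiq-autoscale rather than CPU):

```
# Give a deployment explicit CPU/memory requests and limits
kubectl set resources deployment/sidekiq \
  --requests=cpu=250m,memory=512Mi --limits=cpu=1,memory=1Gi

# CPU-based horizontal autoscaling for the web deployment
kubectl autoscale deployment/web --min=2 --max=10 --cpu-percent=70
```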
Advanced load balancing capabilities make it easy to run multiple replicas of the same process in parallel for different purposes (web, mobile, admin…) to provide extra flexibility and visibility, or to run different code branches and environments for previewing changes.
Example of backend component processes running on GKE
Whenever a change is merged into our code repository, Cloud Build prepares a new Docker image and pushes it to Container Registry.
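Conceptually, the trigger does the equivalent of the following (project and image names are hypothetical):

```
# Build the Docker image with Cloud Build and push it to Container Registry
gcloud builds submit --tag "gcr.io/my-project/myapp:$(git rev-parse --short HEAD)" .
```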
As we appreciate the flexibility of manual deployments, we’ve stuck with our old friend Mina and extended it by building the mina-kubernetes plugin, which wraps the krane gem (itself calling the official Kubernetes CLI, kubectl, under the hood).
Krane lets us define Kubernetes resources using .yml.erb templates that receive variables such as the location of our Docker repository and which Docker image to use, as well as values for the Rails environment and the Rails credentials key. It then deploys them in a controlled manner to a given namespace on the destination Kubernetes cluster.
mina-kubernetes provides simple tasks such as mina kubernetes:deploy that pushes all the resources to a given Kubernetes cluster. It also makes it easy to deploy the image from a given branch onto a dynamically created namespace for preview testing.
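A deploy from an engineer’s machine then looks roughly like this; the second command is a sketch of the kind of krane invocation the plugin wraps (namespace, context and paths are hypothetical):

```
# Deploy the resources and current image to the target cluster/namespace
bundle exec mina kubernetes:deploy

# Roughly what krane runs under the hood:
bundle exec krane deploy myapp-production gke_my-project_australia-southeast1-a_my-cluster \
  --filenames config/deploy/production \
  --bindings=image_repository=gcr.io/my-project/myapp,rails_env=production
```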
Throughout the years we’ve also considered other solutions such as Heroku, AWS Elastic Beanstalk and GCP App Engine, but found the trade-offs between cost, flexibility, complexity and proprietary lock-in to be in favour of Kubernetes.
We’ve come across multiple claims that Kubernetes is a very complex system that is overkill for startups, but given that most of the complexity is abstracted away by Google Kubernetes Engine, we’ve been able to leverage many of its benefits with comparatively less work than the classic deployment approach required.
While it’s not perfect and always a work in progress (one notable improvement would be to fully automate the deployment pipeline, with no human intervention required after a code merge), we find this deployment process convenient and flexible, which is what matters most to us as we deploy code changes multiple times a day to the different applications that compose Streem’s backend.
Interested in joining the Streem team? Check out our Careers page.