https://governmentasaplatform.blog.gov.uk/2016/11/08/lessons-migrating-trade-tariff/

Guest post: Lessons learned migrating the Trade Tariff service

Diagram: the infrastructure and hosting tasks that service teams have to manage without a Platform as a Service

This is the second guest post from Matthew Ford, the Technical Director of Bit Zesty, on the migration of the GOV.UK Trade Tariff service to the Platform as a Service for government (PaaS). 

Bit Zesty is a Digital Marketplace supplier that runs the Trade Tariff service on behalf of HMRC and GDS.

In my first post I spoke about the Trade Tariff, which was the first service to migrate to the PaaS. During the migration we captured lessons learned so we could share them with our GDS colleagues and any other organisations migrating to the PaaS.

Some of these lessons are listed below. It’s worth noting they assume a high degree of technical knowledge.

Prevent downtime by using the blue-green deployment pattern

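By default, a Cloud Foundry deployment can cause downtime for your application. To avoid this, you need to implement the blue-green deployment pattern, in which the new version is started alongside the old one and traffic is switched over only once the new version is running.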

We used a third-party plugin, Autopilot, which worked well. However, because the old and new versions run side by side during a deployment, Autopilot temporarily doubles the memory (RAM) your application uses, so make sure your resource quota allows for this.

We hit the total memory limits for our application and needed to ask the PaaS team to increase the quota.
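
As a rough sketch of the arithmetic, you can estimate the headroom you need before deploying. The instance count and per-instance memory below are hypothetical figures, not the Trade Tariff's actual settings, and the app name `tariff-backend` in the closing comment is assumed.

```shell
# Hypothetical figures; substitute the values from your own manifest.
instances=4        # web instances (assumed)
memory_mb=1024     # memory per instance, in MB (assumed)

# Memory used in steady state.
steady_mb=$((instances * memory_mb))

# Autopilot starts the new version alongside the old one, so both
# sets of instances count against the quota during a deploy.
peak_mb=$((steady_mb * 2))

echo "steady state: ${steady_mb}MB, peak during deploy: ${peak_mb}MB"

# With the Autopilot plugin installed, the deploy itself is run as:
#   cf zero-downtime-push tariff-backend -f manifest.yml
```

If the peak figure is above your organisation's memory quota, the deployment is likely to fail partway through, which is the situation we ran into.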

Disable process binding for background jobs

We spotted our background processes (Sidekiq) were crashing. This is because Cloud Foundry expects all processes to bind to a port to serve web traffic (or it will stop the process). Only the web servers need to serve traffic - our background processing jobs don’t need to do this. We fixed this by disabling the health check for the background processes: cf set-health-check <app name> none.
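
One way to set up such a worker from scratch is sketched below. The function name, the app name and the Sidekiq start command in the usage line are assumptions for illustration, not taken from our deployment scripts.

```shell
# Push a background worker that never binds to a port.
# The app name and start command are supplied by the caller.
push_worker() {
  app="$1"
  start_cmd="$2"

  # Upload the app without a route and without starting it yet,
  # so the default port health check never runs against it.
  cf push "$app" -c "$start_cmd" --no-route --no-start || return 1

  # Disable the port health check, then start the worker.
  cf set-health-check "$app" none || return 1
  cf start "$app"
}

# Usage (hypothetical app name and start command):
#   push_worker tariff-backend-worker "bundle exec sidekiq"
```

Newer cf CLI releases call this health check type `process`; check `cf set-health-check --help` for the version you have installed.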

Don’t be tempted to run your database migrations at startup

It might be tempting to run your database migrations before your application starts. It’s a simple way to ensure migrations have run, and Cloud Foundry recommends it if you migrate frequently, but we advise against it.

Like most container runtimes, Cloud Foundry can only run one process at a time, so running your database migration first delays the point at which your application boots. Because Cloud Foundry sends traffic to the instances straight away, some traffic will be dropped on each deployment, particularly if your migrations take a long time to run.

To avoid this problem, set up a duplicate copy of your application that doesn’t serve web traffic but shares the same database connection as the main application. The copy runs the database migration command as its start command and never binds to a port, so you’ll need to disable its health check, for example:

  • cf push tariff-backend-migrations -c "rake db:migrate"

  • cf set-health-check tariff-backend-migrations none

You can now push code updates to both these applications and run the migrations in the duplicate copy.

To prevent downtime, you’ll still need to ensure your database migrations are compatible with both the old and new versions of the application code.
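
Putting the pieces together, the order of operations for a release might look like the sketch below. It assumes the migrations app was created earlier with its health check disabled, as in the commands above, and that the Autopilot plugin is installed. The function name and `manifest.yml` are assumptions, not the contents of our actual scripts.

```shell
# Deploy one release: migrate first, then replace the web app.
# Both app names are supplied by the caller; manifest.yml is an assumed file name.
deploy_release() {
  web_app="$1"
  migrations_app="$2"

  # Pushing the duplicate copy restarts it, and its start command
  # runs the migrations against the shared database.
  cf push "$migrations_app" -c "rake db:migrate" --no-route || return 1

  # Only then replace the web app, using Autopilot to avoid downtime.
  cf zero-downtime-push "$web_app" -f manifest.yml
}

# Usage (hypothetical web app name):
#   deploy_release tariff-backend tariff-backend-migrations
```

Note that this sketch does not wait for the migrations to finish before pushing the web app. Checking `cf logs tariff-backend-migrations --recent` between the two steps is one way to confirm they completed.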

Importing your data may be tricky

The database is in a secure environment with no direct access, so getting data into it can be tricky. The Trade Tariff database is very large, which slowed down the importing process.

We first uploaded a copy of the database to a cloud file store - Amazon S3. We then downloaded it into an interactive shell running in Cloud Foundry using the 18F cf-ssh tool. We could then import the database via the temporary shell process.

You may be able to import the database with SSH and local port forwarding; however, at the time this didn’t work for us.
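
Where SSH access is enabled on the platform, the port-forwarding approach generally takes the shape sketched below. The function name and every value in the usage line are hypothetical; the real database host and port come from your service credentials (for example via `cf env`).

```shell
# Open a local port forward to a bound database through an app container.
# All four values are supplied by the caller.
open_db_tunnel() {
  app="$1"
  db_host="$2"
  db_port="$3"
  local_port="$4"

  # -L forwards local_port on this machine to db_host:db_port
  #    as seen from inside the app container.
  # -N keeps the session open without running a remote command.
  cf ssh -N -L "${local_port}:${db_host}:${db_port}" "$app"
}

# Usage (hypothetical values):
#   open_db_tunnel tariff-backend db.example.internal 5432 6543
```

While the tunnel is open, your usual database import tool can be pointed at `localhost` on the local port you chose.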

Plan for production level traffic and test scaling out

For applications that are already live, it’s a good idea to load test your deployment with levels of traffic similar to those in production.

The Trade Tariff sits behind the GOV.UK router so we could mirror live production traffic to the new platform before we launched.

You can also do this using a tool like gor for services that aren’t behind the GOV.UK router.
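
A minimal sketch of mirroring with gor is shown below, assuming gor is installed on the machine that receives the live traffic. The function name, port and target URL are hypothetical.

```shell
# Replay live HTTP traffic against a second deployment.
# The listen port and target URL are supplied by the caller.
mirror_traffic() {
  listen_port="$1"
  target_url="$2"

  # Capture HTTP traffic arriving on listen_port and replay it
  # against target_url, leaving the live responses untouched.
  gor --input-raw ":${listen_port}" --output-http "$target_url"
}

# Usage (hypothetical target):
#   mirror_traffic 80 "https://tariff-staging.example.com"
```

Capturing raw traffic usually requires elevated privileges on the host, so this would typically be run by a user who has them.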

After the launch, the service was hit by a number of unexpected traffic spikes at double our peak traffic levels. Normally this might be an issue, but Cloud Foundry made it simple for us to scale up the web servers to meet the demand. Scaling up the database instance took a bit longer, and we needed help from the PaaS team to change our database.

Allowing government digital teams and external agencies to scale databases themselves is on the PaaS team’s backlog.
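
For the web servers, scaling out comes down to a single cf command. The sketch below assumes, as a simplification, that capacity grows linearly with the number of instances. The function name and the figures in the usage line are hypothetical.

```shell
# Scale the web app out in proportion to a traffic spike.
# Assumes capacity grows linearly with instance count (a simplification).
scale_for_spike() {
  app="$1"
  current_instances="$2"
  traffic_factor="$3"    # for example, 2 for double the usual peak

  target_instances=$((current_instances * traffic_factor))

  # -i sets the number of running instances of the app.
  cf scale "$app" -i "$target_instances"
}

# Usage (hypothetical figures):
#   scale_for_spike tariff-backend 4 2
```

Remember that the extra instances count against the same memory quota as a blue-green deployment, so check there is room for both.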

Wrapping up

In summary, the migration went smoothly. We had a few bumps along the way, but they were quite easy to resolve and the PaaS team provided great support.

If you’re interested in replicating our continuous deployment pipeline (with database migrations and blue-green deployment on the PaaS), take a look at our deployment scripts for the PaaS. These are open source and can be found in the Trade Tariff GitHub repository.

We look forward to transitioning other services to the PaaS in the future - this will free us up to spend more time focusing on improving services and less time managing hosting.


1 comment

  1. Graham Bleach

    After listening to feedback from Matt and others, we've now enabled use of Cloud Foundry's built-in SSH support, so the instructions for sshing to services should now work: https://docs.cloudfoundry.org/devguide/deploy-apps/ssh-services.html

    It will soon be possible to run database migrations as one-off tasks in an existing app on the PaaS, instead of making a whole new app. There isn't a CF CLI release that supports this yet, but the impatient/brave can try it out using "cf curl" https://gist.github.com/bleach/25fb546421f826a48eb1230f33eacc69
