Indraneel's blog

Feeble attempts at grokking the incomprehensible.

Service deployment script - part 2: Deployment scenarios

I laid down some deployment jargon in Part 1 and identified a canonical environment. Let us now consider how we would run the various deployment phases for such an environment.

Allocation: Fresh deployment/scale out/disaster recovery scenarios

This involves allocating 5 machines, ensuring that 2 of them have web server prerequisites, 2 have middle tier prerequisites (depending on business requirements), while the last has SQL prerequisites. You may also have to install other prerequisites like certificates for the reserved domain, and confirm that the networking requirements, load balancer configuration, etc. are set up. How you do that depends on the level of automation provided by your hosting service provider. In some cases (e.g. AWS/Azure), this step is automatically executed during the actual service deployment.
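To make this concrete, here is a minimal sketch in Python. The topology table, the prerequisite names, and the provision_machine helper are all hypothetical stand-ins for whatever your hosting provider's API actually exposes; the point is only that the desired environment is described as data and the allocation loop is driven from it.

```python
# Hypothetical description of the canonical environment from Part 1:
# role -> (machine count, prerequisites to install on each machine).
TOPOLOGY = {
    "web":    (2, ["web_server", "tls_certificate"]),
    "middle": (2, ["app_runtime"]),
    "sql":    (1, ["sql_engine"]),
}

def provision_machine(role, prerequisites):
    """Stand-in for a call to your hosting provider's allocation API."""
    print(f"allocating a '{role}' box with {prerequisites}")
    return {"role": role, "prerequisites": prerequisites}

def allocate_environment(topology):
    machines = []
    for role, (count, prerequisites) in topology.items():
        for _ in range(count):
            machines.append(provision_machine(role, prerequisites))
    # Networking and load balancer setup would follow here, again via
    # provider-specific calls that this sketch does not model.
    return machines

if __name__ == "__main__":
    allocate_environment(TOPOLOGY)
```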

Code deployment: Common to all scenarios

This involves installing the code on the web and middle tier boxes, applying the SQL schema + data on the database, running all the configuration steps, and starting the services. The exact steps depend on the code, its dependencies, the operating system, the SQL vendor, etc., but the process follows some common rules.
You must be able to partition your service so that a subset of machines can be taken offline and patched without affecting the machines that are online.
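As a rough illustration of the per-machine recipe, here is a sketch. Every function body is a placeholder for your actual tooling, and the batching/partitioning concerns are ignored here; they are covered in the next section.

```python
# Placeholder steps; each would shell out to your actual tooling.
def stop_service(machine):
    print(f"{machine}: stopping service")

def install_build(machine, build):
    print(f"{machine}: installing build {build}")

def apply_configuration(machine, config):
    print(f"{machine}: applying configuration {config}")

def start_service(machine):
    print(f"{machine}: starting service")

def apply_schema_and_data(sql_machine, schema_script):
    print(f"{sql_machine}: applying {schema_script}")

def deploy_to_machine(machine, build, config):
    """The common recipe: stop, install, configure, start."""
    stop_service(machine)
    install_build(machine, build)
    apply_configuration(machine, config)
    start_service(machine)

if __name__ == "__main__":
    apply_schema_and_data("sql-1", "schema_v2.sql")
    for box in ["web-1", "web-2", "mid-1", "mid-2"]:
        deploy_to_machine(box, build="1.2.0", config="prod.json")
```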

Service partitioning approaches

  • For front end boxes, this may be as simple as taking some of them off the load balancer (see the sketch after this list).
  • For middle tier boxes, things often get more complicated. You may have these on a load balancer, or you could be using some form of clustering software, using the database itself for leader election. Regardless, the point is to drain some of those machines and make them available for patching.
  • Perhaps the most complicated is the data layer, which can't be split up without incurring risks of having to reconcile divergent data copies. A common choice here is to follow conventions that ensure the databases can stay up and accept requests while being patched.
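Here is what partitioning could look like for the front end boxes from the first bullet. The load balancer calls and the patch step are hypothetical placeholders; the idea is simply that machines are patched in batches, so the remaining machines keep serving traffic.

```python
def remove_from_load_balancer(machine):
    print(f"{machine}: draining from load balancer")      # placeholder

def add_to_load_balancer(machine):
    print(f"{machine}: back in load balancer rotation")   # placeholder

def deploy_to_machine(machine):
    print(f"{machine}: patching")                         # placeholder

def rolling_deploy(machines, batch_size=1):
    """Patch in batches so some machines always stay online."""
    for i in range(0, len(machines), batch_size):
        batch = machines[i:i + batch_size]
        for m in batch:
            remove_from_load_balancer(m)
        for m in batch:
            deploy_to_machine(m)
        for m in batch:
            add_to_load_balancer(m)

if __name__ == "__main__":
    rolling_deploy(["web-1", "web-2"], batch_size=1)
```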

Service Monitoring deployment: Common to all scenarios

You just deployed new code, which may need new ways of monitoring it. While you could potentially get away with taking downtime for your monitoring infrastructure, it is a best practice to:

  • Never incur downtime on your monitoring infrastructure, on the principle that your customers are everywhere.
  • Keep your monitoring running while you are doing the code deployment step.

The steps for running this deployment are essentially the same as those listed in the code deployment step. You are merely using a different set of builds/configuration/secrets. In fact, if you can do so, plan on using the same deployment infrastructure for this as well.
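If the code deployment step is already wrapped in a reusable function, deploying monitoring can be the same call with a different artifact set. A rough sketch, where deploy_to_machine and the artifact names are hypothetical:

```python
def deploy_to_machine(machine, build, config, secrets):
    """Placeholder for the same stop/install/configure/start recipe
    used in the code deployment step."""
    print(f"{machine}: deploying {build} with {config} and {secrets}")

# The service build and the monitoring agent build go through the
# same pipeline; only the artifacts differ.
SERVICE_ARTIFACTS    = ("service-1.2.0",  "service.json",  "service-secrets")
MONITORING_ARTIFACTS = ("monagent-0.9.3", "monagent.json", "monagent-secrets")

if __name__ == "__main__":
    for box in ["web-1", "web-2", "mid-1", "mid-2", "sql-1"]:
        deploy_to_machine(box, *MONITORING_ARTIFACTS)
```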

Validation: Common to all scenarios

Depending on your situation, this can be a combination of manual validation of new/old/primary scenarios and analysis of results from service monitoring. The interesting question in validation is almost always what you do if validation reports errors that need fixing. I expect to cover this in more detail in a later post.
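A small sketch of what the automated slice of validation could look like, assuming the service exposes some health check and the monitoring system can report a recent error rate; both are placeholders here.

```python
def check_health(machine):
    """Placeholder: in practice, hit the service's health check."""
    return True

def recent_error_rate():
    """Placeholder: in practice, query your monitoring system."""
    return 0.002

def validate(machines, max_error_rate=0.01):
    failures = [m for m in machines if not check_health(m)]
    error_rate = recent_error_rate()
    ok = not failures and error_rate <= max_error_rate
    print(f"healthy={not failures}, error_rate={error_rate}, pass={ok}")
    return ok

if __name__ == "__main__":
    if not validate(["web-1", "web-2", "mid-1", "mid-2", "sql-1"]):
        # What to do on failure (roll back, roll forward, page someone)
        # is the interesting part, deferred to a later post.
        raise SystemExit("validation failed")
```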

Infrequent scenario 1 - Data restore: Disaster Recovery scenario

This involves taking backups of your data, restoring them to a separate datastore, and then pointing your front end and middle tier boxes to the new datastore. As such, this is a process that almost always involves some data loss (anything written after the last backup) and some downtime.
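A sketch of the restore sequence, with every function a placeholder for vendor-specific tooling:

```python
def restore_backup_to(new_datastore, backup):
    print(f"restoring {backup} onto {new_datastore}")            # placeholder

def repoint_datastore(machine, new_datastore):
    print(f"{machine}: connection string -> {new_datastore}")    # placeholder

def restart_service(machine):
    print(f"{machine}: restarting")                              # placeholder

def data_restore(app_machines, new_datastore, backup):
    """Restore the backup, then repoint and restart the app tiers.
    Anything written after the backup was taken is lost, and the
    service is down while the repoint/restart happens."""
    restore_backup_to(new_datastore, backup)
    for m in app_machines:
        repoint_datastore(m, new_datastore)
        restart_service(m)

if __name__ == "__main__":
    data_restore(["web-1", "web-2", "mid-1", "mid-2"],
                 new_datastore="sql-2",
                 backup="nightly.bak")
```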

Infrequent scenario 2 - Compute restore: Disaster Recovery and Machine replacement scenario

This involves replacing your existing front end/middle tier/datastore machines with new replacement machines and applying new code to them. In the case of machine replacement, this should ideally not cause downtime. It may be a downtime step when it occurs as part of disaster recovery.
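A sketch of replacing a single machine, again with placeholder functions. For the machine replacement case, the swap can happen behind the load balancer, so ideally nobody notices.

```python
def provision_machine(role):
    print(f"allocating replacement '{role}' box")                  # placeholder
    return f"{role}-new"

def deploy_to_machine(machine):
    print(f"{machine}: installing code and configuration")         # placeholder

def swap_in(new_machine, old_machine):
    print(f"load balancer: {old_machine} out, {new_machine} in")   # placeholder

def decommission(machine):
    print(f"{machine}: released back to the provider")             # placeholder

def replace_machine(old_machine, role):
    """Allocate, deploy, swap, and only then release the old box."""
    new_machine = provision_machine(role)
    deploy_to_machine(new_machine)
    swap_in(new_machine, old_machine)
    decommission(old_machine)

if __name__ == "__main__":
    replace_machine("web-1", role="web")
```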