How Blue/Green Deployment Strategy Works

The Blue/Green deployment strategy is used for zero-downtime deployments. Its main benefit is that even if something goes wrong during deployment, downtime and risk stay low, because traffic can be switched between the blue and the green servers.

One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue/green deployment approach does this by ensuring you have two production environments, as identical as possible. At any time one of them, let’s say blue for the example, is live. As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment – the blue one is now idle. – Martin Fowler

To review some of the load balancer concepts please visit this page.

Steps to see how it works:

For our example, let’s consider a load balancer with four servers.

1. Split the server pool into two pools, the blue one and the green one. Both pools are available for the moment.

[Figure: deployment strategy, step 1]

2. Make the green pool unavailable so that only the blue pool serves traffic. Wait for connection draining on the green pool to finish before continuing.

[Figure: deployment strategy, step 2]

3. Update the green pool with the new software, then test the green pool after the update.

[Figure: deployment strategy, step 3]

[Figure: deployment strategy, step 3.2]

4. Switch pools: the green pool becomes available in the load balancer, and the blue pool becomes unavailable once its connections have drained.

[Figure: deployment strategy, step 4]

At this point you can wait for a while to observe how the new software behaves in production before continuing the deployment. We recommend checking logs and monitoring charts.

If any issue comes up that causes the system to fail, you can roll back by simply switching pools again: the green pool with the new software becomes unavailable and the blue pool with the old software becomes available. To complete the rollback, the green pool is updated with the old version of the software.

5. If everything is OK with the green pool, install the new software on the blue pool as well.

[Figure: deployment strategy, step 5]

6. Reconnect the blue pool to the load balancer.

[Figure: deployment strategy, step 6]

After this step, both pools run the new software. If errors occur and the system fails, the rollback is itself carried out as another Blue/Green deployment, this time of the old version.
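The steps above, including the rollback path, can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the Pool and LoadBalancer classes, the blue_green_deploy function and the two check callbacks are hypothetical names, not the API of any real load balancer, and connection draining is reduced to a log entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pool:
    name: str
    servers: list[str]
    version: str
    available: bool = True


@dataclass
class LoadBalancer:
    blue: Pool
    green: Pool
    log: list[str] = field(default_factory=list)

    def set_available(self, pool: Pool, available: bool) -> None:
        # A real load balancer would drain open connections before
        # removing a pool; here we only record the state change.
        pool.available = available
        state = "available" if available else "drained and unavailable"
        self.log.append(f"{pool.name} pool is {state}")

    def live_versions(self) -> set[str]:
        return {p.version for p in (self.blue, self.green) if p.available}


def blue_green_deploy(
    lb: LoadBalancer,
    new_version: str,
    smoke_test: Callable[[Pool], bool],
    healthy_in_production: Callable[[Pool], bool],
) -> bool:
    """Return True if the new version ends up on both pools."""
    old_version = lb.blue.version

    # Step 2: take green out of rotation, blue carries all traffic.
    lb.set_available(lb.green, False)

    # Step 3: update green and test it while it receives no traffic.
    lb.green.version = new_version
    if not smoke_test(lb.green):
        # Assumption: a failed test restores green and returns it to rotation.
        lb.green.version = old_version
        lb.set_available(lb.green, True)
        return False

    # Step 4: green goes live first, then blue is drained and removed,
    # so both pools are briefly available at the same time.
    lb.set_available(lb.green, True)
    lb.set_available(lb.blue, False)

    if not healthy_in_production(lb.green):
        # Rollback: switch pools again, then put the old software on green.
        lb.set_available(lb.blue, True)
        lb.set_available(lb.green, False)
        lb.green.version = old_version
        # Assumption: green rejoins the rotation once it runs the old version.
        lb.set_available(lb.green, True)
        return False

    # Step 5: install the new software on blue.
    lb.blue.version = new_version
    # Step 6: reconnect blue to the load balancer.
    lb.set_available(lb.blue, True)
    return True
```

With four servers, a call might look like blue_green_deploy(LoadBalancer(Pool("blue", ["s1", "s2"], "v1"), Pool("green", ["s3", "s4"], "v1")), "v2", run_tests, check_monitoring), where run_tests and check_monitoring stand in for whatever test suite and monitoring checks you actually use.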

When you should use this method:

The Blue/Green deployment strategy works if the infrastructure can serve all traffic with half of its capacity during deployment.

If the infrastructure cannot handle all traffic with only half of the servers, this method cannot be used.
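A rough way to check this condition before planning a deploy is sketched below. The function name, the per-server throughput figure and the headroom parameter are assumptions made for illustration; real capacity planning should rely on your own measurements.

```python
def can_use_blue_green(
    peak_rps: float,
    per_server_rps: float,
    total_servers: int,
    headroom: float = 0.2,
) -> bool:
    """Check whether half of the servers can carry peak traffic.

    headroom is the fraction of capacity kept in reserve (an assumption,
    not part of the strategy itself).
    """
    serving = total_servers // 2  # only one pool serves during deployment
    usable_capacity = serving * per_server_rps * (1 - headroom)
    return usable_capacity >= peak_rps
```

For example, with four servers that each sustain about 500 requests per second and 20% headroom, one pool offers roughly 800 requests per second of usable capacity, so a peak of 700 fits while a peak of 900 does not.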

Cons:

During deployment, all traffic is handled by half of the capacity.
Before the green servers replace the blue servers, the blue servers must finish any outstanding transactions/requests before they are removed from the load balancer, otherwise downtime would occur. This implies that for a short period the blue and green servers are available at the same time.
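The draining behaviour described above can be sketched as follows: a server stops accepting new requests and waits until its in-flight requests have finished, or until a timeout expires. The Server class and its method names are hypothetical; real load balancers implement draining themselves and expose it through their own configuration.

```python
import threading


class Server:
    def __init__(self, name: str) -> None:
        self.name = name
        self.accepting = True
        self._in_flight = 0
        self._idle = threading.Condition()

    def begin_request(self) -> bool:
        """Register a new request; refused once draining has started."""
        with self._idle:
            if not self.accepting:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        """Mark one request as finished and wake up a waiting drain."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop accepting requests and wait for in-flight ones to finish.

        Returns True if the server became idle within the timeout.
        """
        with self._idle:
            self.accepting = False
            return self._idle.wait_for(
                lambda: self._in_flight == 0, timeout=timeout
            )
```

Only when drain returns True is it safe to remove the server from the load balancer without cutting off a request; what to do on a timeout (wait longer or force removal) is a policy decision left open here.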

Pros:

Rollback is easy: switch pools from green to blue while the blue servers still run the old software, or run another blue/green deploy if all servers were already updated.
The green servers can be treated as a staging environment, so we can use them to run security tests, load tests, and integration tests before making them available in the load balancer.
Since the green servers are not receiving any traffic, any service on them can be restarted without causing downtime.

Observations

The database can be changed by the green servers once they run the new version of the software, in which case rolling back to the old software version may fail. To solve this problem, use a single database for all servers (blue and green) and make the structure and data changes needed by the green servers backward compatible, so that the old software keeps working. Once we are sure that the new system works as expected after the deploy, we can remove whatever database support was kept for the rollback.
There may be cases where the old software cannot work with the changed database at all unless the software itself is modified slightly. In that case, you can make a preliminary deploy of the old software, modified so that it still works in case of rollback. After that the main deploy can start, because all systems are now backward compatible.
The database issue is an instance of a more general one: it applies to any common resource used by both the new and the old version of the software, and the solution looks the same. Make software changes with the rollback in mind, and add as many preliminary deploy steps as your system needs in order to keep working in case of rollback.
If a resource shared between the blue and green servers becomes unavailable during the deploy (for example during a database migration or a service restart), you can create a maintenance page and configure the system to redirect all incoming requests from the blue servers to that page for the duration.
A common issue is handling cache keys whose values are written by both the new and the old servers. If the same key holds values with different structures and is read and written by both types of instances, the system will fail, because each instance knows how to process only one structure. One solution is to version the cache key, so that the new servers do not conflict with the old servers.
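A minimal sketch of versioned cache keys follows. The cache_key helper, the key layout and the two value structures are assumptions for illustration, and a plain dictionary stands in for the shared cache.

```python
def cache_key(schema_version: str, entity: str, entity_id: int) -> str:
    """Build a cache key that includes the version of the stored structure."""
    return f"{schema_version}:{entity}:{entity_id}"


# A plain dict stands in for the cache shared by blue and green servers.
shared_cache: dict[str, object] = {}

# Old servers (hypothetical structure "v1") store the full name as a string.
shared_cache[cache_key("v1", "user", 42)] = "Ada Lovelace"

# New servers (hypothetical structure "v2") store a dictionary instead.
shared_cache[cache_key("v2", "user", 42)] = {"first": "Ada", "last": "Lovelace"}

# Each version reads only its own key, so neither sees the other's structure.
old_value = shared_cache[cache_key("v1", "user", 42)]
new_value = shared_cache[cache_key("v2", "user", 42)]
```

The trade-off is that the two versions no longer share cached entries, so the cache may hold duplicates until the old keys expire.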