Launching Products Reliably (SRE)

Harinder Singh
2 min read · Mar 13, 2022

--

How do you launch products? How do you do so repeatedly and reliably?

Let’s look at how one of the big tech companies (Google, in this case) does it.

A number of things should be in order for a successful launch, including (but not limited to):

  1. Builds
  2. Configurations
  3. Monitoring and logging stack setup to track the right metrics
  4. Rollout strategy (canary, blue-green, etc.); the implementation depends on your tooling (e.g., a feature-flag framework in Google’s case)
  5. Rollback strategy (should be tested beforehand)
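To make the feature-flag idea a bit more concrete, here is a minimal sketch of a percentage-based rollout. It is not Google's framework; the function name `in_rollout`, the flag name, and the bucketing scheme are all assumptions chosen for illustration. Each user is hashed into a stable bucket, so raising the percentage only adds users, and setting it to 0 acts as a rollback.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Return True if this user is inside the first `percent` of buckets.

    Hypothetical helper: the names and the 10,000-bucket scheme are
    illustrative assumptions, not part of any real feature-flag framework.
    """
    # Hash the flag and user together so every flag buckets users independently.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a stable bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=5 enables buckets 0..499; percent=0 disables the flag for everyone.
    return bucket < percent * 100


# Usage: gate the new code path, then ramp 1 -> 5 -> 50 -> 100 as metrics stay healthy.
if in_rollout("user-42", "new-checkout", percent=5):
    pass  # serve the new feature
else:
    pass  # serve the existing behavior
```

Because the bucket depends only on the flag and the user, the same user sees the same behavior on every request, which keeps canary metrics comparable between ramp steps.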

Getting the obvious out of the way, let’s proceed.

Now, do you have a launch checklist? (Pssst: You should).

Why do you need one? Because a checklist reduces the chance of failure and keeps launches consistent.

Ex:

Could a user potentially abuse the service?

Action Item: Implement rate limiting and quotas.
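One common way to implement that action item is a token bucket. The sketch below is an assumption about how you might do it, not something prescribed by the checklist: the class name `TokenBucket` and its parameters are made up for illustration. Each caller gets `capacity` tokens that refill at `rate` tokens per second, and a request is rejected when the bucket is empty.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative sketch only)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, handy for tests
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False to reject the request."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: one bucket per user gives a simple per-user quota.
buckets = {}


def handle_request(user_id: str) -> int:
    bucket = buckets.setdefault(user_id, TokenBucket(rate=5.0, capacity=10.0))
    return 200 if bucket.allow() else 429  # 429 Too Many Requests
```

In a real service you would likely keep the buckets in shared storage and guard them with a lock; this version only shows the accounting.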

Each checklist item should be concise, practical, and actionable enough for the developer(s).

Launch checklists can be defined for multiple areas of a product.

Architecture

Ex: Have you figured out the dependencies/path correctly? Are they provisioned correctly? Have they been tested and reviewed?

Integration

Ex: Is the monitoring stack set up and up to date (including for any new features)?

Capacity Planning

Ex: Are the outreach teams (marketing, blogging, PR, etc.) up to date with the launch? How much traffic is expected, and do you have the resources to handle it?
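The "do you have the resources" question usually comes down to simple arithmetic. Here is a rough sketch; the function `servers_needed`, the 30% headroom, and the single spare machine are all assumptions picked for illustration, and the traffic numbers are made up rather than taken from any real launch.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float,
                   headroom: float = 0.3, spare: int = 1) -> int:
    """Estimate how many servers a launch needs (illustrative sketch).

    headroom: fraction of each server's capacity deliberately left unused.
    spare:    extra servers kept so that losing one does not cause overload.
    """
    if qps_per_server <= 0 or not 0 <= headroom < 1:
        raise ValueError("qps_per_server must be > 0 and headroom in [0, 1)")
    # Only plan to use part of each server's measured capacity.
    usable_qps = qps_per_server * (1 - headroom)
    return math.ceil(peak_qps / usable_qps) + spare


# Hypothetical numbers: marketing expects a 5,000 QPS peak, and load tests
# show one server handles about 400 QPS.
print(servers_needed(peak_qps=5000, qps_per_server=400))  # 19
```

With these made-up inputs, each server is planned at 280 QPS, so 18 servers carry the peak and one more covers a failure.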

Failure Modes

Ex: What if one of the components goes down? Is there a single point of failure?

Client Behavior

Ex: How does a misbehaving user or a DoS attack affect your service?

Processes (Manual and Automated)

Ex: What if a cluster or a DC is compromised? How can your service handle the load? How can you move it to a new cluster/DC?

Development Process

Ex: Are you using version control (not only for code but also for configurations)?

External Dependencies

Ex: Do any third parties or partners depend on your service? Are they aware of the updates and how these would affect their service(s)?

Rollout Planning

Ex: Canary, blue-green, etc.
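For a canary rollout, the plan should also say what decides whether to continue. A minimal sketch of such a gate, assuming you compare error rates between the canary and the existing (baseline) fleet; the function name `canary_decision` and every threshold here are hypothetical values you would tune for your own service.

```python
def canary_decision(baseline_errors: int, baseline_requests: int,
                    canary_errors: int, canary_requests: int,
                    max_ratio: float = 1.5, floor: float = 0.001,
                    min_requests: int = 1000) -> str:
    """Return "wait", "promote", or "rollback" (illustrative thresholds)."""
    # Too little canary traffic to judge yet.
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # Tolerate up to max_ratio times the baseline error rate, with a small
    # absolute floor so a zero-error baseline does not fail on a single error.
    allowed = max(baseline_rate * max_ratio, floor)
    return "promote" if canary_rate <= allowed else "rollback"


# Usage with made-up counters pulled from monitoring:
decision = canary_decision(baseline_errors=10, baseline_requests=10_000,
                           canary_errors=10, canary_requests=2_000)
# The canary fails 0.5% of requests against a 0.1% baseline, so this is a rollback.
```

A production gate would typically look at more signals (latency, saturation) and use a proper statistical comparison; the point is that the promote/rollback rule is written down before the launch.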

Google has a dedicated role for this: Launch Coordination Engineers, experienced SREs who help a development team work through these challenges and deliver/deploy quickly.

You can read more about this topic at https://sre.google/sre-book/reliable-product-launches/.
