Launching Products Reliably (SRE)

Harinder Singh
Mar 13, 2022

How do you launch products? How do you do that repeatedly and reliably?

Let’s look at how one of the big tech companies (Google, in this case) does it.

A number of things should be in place for a successful launch, including (but not limited to):

  1. Builds
  2. Configurations
  3. Monitoring and logging stack set up to track the right metrics
  4. Rollout strategy (canary, blue-green, etc.), which depends on how you implement it (e.g., Google uses a feature flags framework)
  5. Rollback strategy (tested beforehand)
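To make the feature-flag idea concrete, here is a minimal sketch of one common technique, a percentage-based flag. This is not Google's framework or its API; the flag name `new_checkout` and the function `in_rollout` are hypothetical. Each user ID is hashed into one of 100 buckets, so the same user stays on the same side as the percentage ramps up, and setting the percentage back to 0 acts as an instant rollback.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if user_id falls in the first `percent` of 100 buckets for this flag."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash flag name + user ID so bucketing is stable per user and independent per flag.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in [0, 99]
    return bucket < percent


if __name__ == "__main__":
    # Hypothetical ramp for a hypothetical flag; percent=0 would disable it for everyone.
    users = [f"user-{i}" for i in range(1000)]
    for pct in (0, 1, 10, 50, 100):
        enabled = sum(in_rollout("new_checkout", u, pct) for u in users)
        print(f"{pct:>3}% rollout -> {enabled} of {len(users)} users enabled")
```

Mixing `flag_name` into the hash keeps different launches independent, so the same small slice of users is not the first to see every new feature.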

With the obvious out of the way, let’s proceed.

Now, do you have a launch checklist? (Pssst: You should).

Why do you need one? Because a checklist reduces the chance of failure and keeps launches consistent.

Ex:

Could a user potentially abuse the service?

Action Item: Implement rate limiting and quotas.
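For the rate-limiting action item, one common technique is a token bucket: each client gets a bucket holding up to `capacity` tokens that refills at `rate` tokens per second, and each request spends one token. This is a minimal single-process sketch; the class names and limits are hypothetical, and a real service would usually enforce limits in a shared store or at the load balancer so every replica sees the same counts.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an idle client can burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the caller should reject the request


class PerClientLimiter:
    """Keeps one TokenBucket per client ID, i.e. a simple per-client quota."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client_id].allow()


# Hypothetical limit: bursts of 10 requests, 5 requests/second sustained, per client.
limiter = PerClientLimiter(capacity=10, rate=5)
print(limiter.allow("client-42"))
```

Rejected requests are typically answered with HTTP 429 (Too Many Requests). The injectable `clock` argument is there only so the behavior can be tested without sleeping.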

Each item should be concise, practical, and actionable for the developer(s).
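One way to keep items actionable is to pair every question with an action item and report whatever is still open before launch. The structure below is a hypothetical sketch: the `ChecklistItem` fields and the second example item are illustrative, and only the first item comes from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    area: str          # e.g. "Client Behavior"
    question: str      # what the reviewer asks
    action_item: str   # what the developer does if the answer is "not yet"
    done: bool = False


def open_items(checklist: List[ChecklistItem]) -> List[ChecklistItem]:
    """Return the items that still block the launch."""
    return [item for item in checklist if not item.done]


checklist = [
    ChecklistItem("Client Behavior", "Could a user potentially abuse the service?",
                  "Implement rate limiting and quotas."),
    ChecklistItem("Development Process", "Are configurations under version control?",
                  "Move configurations into the repository.", done=True),
]

for item in open_items(checklist):
    print(f"[{item.area}] {item.question} -> {item.action_item}")
```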

Launch checklist items can be defined for multiple areas of a product.

Architecture

Ex: Have you correctly identified the dependencies and the request path? Are they provisioned correctly? Have they been tested and reviewed?

Integration

Ex: Is the monitoring stack set up and up to date (in case of new features)?

Capacity Planning

Ex: Are the outreach teams (marketing, blogging, PR, etc.) up to date on the launch? How much traffic is expected, and do you have the resources to handle it?
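The traffic question usually comes down to simple arithmetic. The sketch below assumes you already have an expected peak QPS (informed, for example, by the outreach plan) and a per-instance throughput measured in load tests. The 30% headroom and the two spare instances are illustrative assumptions, not figures from the SRE book.

```python
import math


def instances_needed(peak_qps: float, qps_per_instance: float,
                     headroom: float = 0.3, redundancy: int = 2) -> int:
    """Instances to provision for `peak_qps`.

    headroom:   extra capacity for unexpected spikes, as a fraction (0.3 = 30%).
    redundancy: spare instances so losing that many still leaves enough capacity.
    """
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    base = math.ceil(peak_qps * (1 + headroom) / qps_per_instance)
    return base + redundancy


# Hypothetical launch: 12,000 QPS expected at peak, 500 QPS per instance from load tests.
print(instances_needed(12_000, 500))
```

For these numbers the result is 34 instances: 32 to carry the peak with 30% headroom, plus 2 spares so that losing an instance (or draining one during a rollout) does not push the rest over capacity.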

Failure Modes

Ex: What if one of the components goes down? Is there a single point of failure?
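One way to remove a single point of failure is to run redundant replicas and let callers fail over between them. The sketch below is hypothetical: each replica is modeled as a zero-argument callable that raises `ConnectionError` when it is down. In production this logic usually lives in a load balancer or client library with health checking rather than in application code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:  # only connection failures count as "replica down"
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```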

Client Behavior

Ex: How does a misbehaving user or a DoS attack affect your service?

Processes (Manual and Automated)

Ex: What if a cluster or a data center (DC) is compromised? Can the rest of your service handle the load? How would you move it to a new cluster/DC?

Development Process

Ex: Are you using version control (not only for code but also for configurations)?

External Dependencies

Ex: Do any third parties or partners depend on your service? Are they aware of the updates and how these would affect their service(s)?

Rollout Planning

Ex: Canary, blue-green, etc.
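A canary only helps if something decides whether to continue or roll back. Assuming you already collect request and error counts for both the canary and the baseline (the current production version), a minimal version of that decision could look like this; the 1.5x ratio, the 0.1% error-rate floor, and the 1,000-request minimum are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Stats, canary: Stats,
                   max_ratio: float = 1.5, min_requests: int = 1000) -> str:
    """Return "wait", "rollback", or "promote" for the canary."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic yet to judge
    # Floor the allowed rate so a near-zero baseline does not make any single error fatal.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"
```

Real canary analysis often compares several signals (latency, saturation, business metrics) rather than a single error-rate ratio, but this is the simplest gate that ties the rollout plan to the tested rollback strategy.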

Google has a dedicated role for this, Launch Coordination Engineers: experienced SREs who help development teams work through these challenges and deliver/deploy quickly.

You can read more about this topic at https://sre.google/sre-book/reliable-product-launches/.
