What is an SRE Product Manager?

António Araújo
Go To Market Lead
Rely.io
April 28, 2022
4 min read

Background

Site Reliability Engineering (SRE) and related areas such as Observability, Platform Engineering, and DevOps have typically operated without Product Managers. I believe that’s because IT Operations was seen solely as a cost center and not as a source of competitive advantage.

With the rise of technology giants such as Google, Amazon or Facebook, other companies started adopting similar SRE practices to improve the efficiency, security, development speed, and reliability of large-scale systems. Everyone is trying to move at the same speed as big tech and nimble startups. Bets on SRE or DevOps are now seen as investments with positive returns, rather than sunk costs.

There’s little to no literature coming from Google describing how Product Managers can be part of an SRE team. Although much has been written about SRE team lifecycles and their different topologies, there hasn’t been much about bringing non-engineers into this function. I think that’s going to change soon.

Why do SRE teams need Product Managers?

There’s an increasing number of product owners and program managers in SRE and Platform teams because they have to:

  • Build products for users (other engineering teams)
  • Prioritize which reliability investments have the highest impact on customers
  • Create the long-term reliability strategy for a company
  • Make build vs. buy decisions
  • Liaise with several functional groups, including teams outside of engineering 
  • Define reliability targets and report on performance from the perspective of customers' expectations
  • Manage relationships with new and existing vendors

An SRE’s plate is already full, so the tasks listed above are arguably stealing time from reliability-ensuring activities. A few weeks ago, not thinking I’d be writing this blog today, I ran a poll on r/sre asking “How do you spend most of your time?”

The results were not that surprising. They validated that a large proportion of SREs do build and/or manage developer tooling, meaning that they must care for users. Those who commented also mentioned that a portion of their time is spent answering questions, doing admin tasks, and sitting in vendor meetings.

“We expect our Technical Product Managers in the platform tribe to have a tight working relationship with the product engineering, infrastructure and security teams as they’re usually the key stakeholders and consumers of the products that our platform teams are building.”

From “Life in the Wise Platform team as a Technical Product Manager”, by Laura Woo

What do SRE Product Managers do?

Product Managers supporting SRE and Platform teams are asked to bring traditional product management techniques, such as user research, roadmap prioritization, and stakeholder alignment, into the reliability world. According to several job descriptions I’ve analyzed, their responsibilities often include:

  • Partnering with engineering and product leads to build product roadmaps for SRE
  • Creating a long-term strategy for observability and tooling investments, including managing vendor relationships
  • Implementing and maintaining Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs)
  • Creating profiles of users (software engineers) and ensuring SRE’s products address their needs
  • Championing reliability ownership across non-SRE teams and enabling them to account for and track the reliability of the services they’re responsible for
  • Owning the vision and strategy for: incident management, disaster recovery, performance testing, chaos engineering, etc.

Note: Responsibilities will vary from one organization to another, as well as job titles — SRE Product Lead, Technical Program Manager, SRE Product Owner, etc.
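
To make the SLI/SLO responsibility above more concrete, here is a minimal sketch of an availability SLI checked against an SLO, along with the remaining error budget. The `SLO` class, the function names, the "checkout" service and the request counts are all hypothetical and only for illustration; real setups usually pull these numbers from a monitoring system.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """A hypothetical availability objective for one service."""
    service: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def error_budget_remaining(slo: SLO, sli: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo.target < 1.0:
        raise ValueError("SLO target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo.target  # the error budget
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Illustrative numbers only, not real data.
checkout = SLO(service="checkout", target=0.999)
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
print(f"SLI: {sli:.4%}, SLO met: {sli >= checkout.target}")
print(f"Error budget remaining: {error_budget_remaining(checkout, sli):.0%}")
```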

Below is a visual example of how a Product Manager might be part of an SRE team and some of their responsibilities. Don’t take the SRE’s work areas as an absolute truth: I know there are many missing, and some of these are always shared responsibilities across the team!

Example of an SRE group and some of their responsibilities

Given SRE’s principle of applying software to manage and automate IT, the function has successfully taken on many areas of responsibility. And it has been able to do so with fewer people than would normally be needed to move at the same speed reliably. That means complexity has increased drastically, and now there’s a need for a focused strategy, planning and management function within SRE.

I believe that we will start seeing more and more product managers step into this area or, most likely, more engineers formally take on a technical product management role within reliability. My second hypothesis is that the SLO methodology will become the product manager’s best friend because it will allow them to:

  • Agree with non-engineering functions on the reliability goals needed to meet or exceed customer expectations
  • Communicate about reliability performance with SLIs/SLOs as a standardized language
  • Prioritize the roadmap according to historical SLO performance
  • Design better alerting and incident management strategies with burn rate alerting
  • Enable teams to own reliability of their services with out-of-the-box service SLIs
  • Monitor data-driven KPIs/OKRs, allowing for weighted, justified and fast decision making
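
Two of the points above, burn rate alerting and prioritizing by historical SLO performance, can be sketched in a few lines. This is a rough illustration under stated assumptions, not how Rely.io or any specific tool implements it: the function names, services and error rates are hypothetical, and the 14.4 threshold is the commonly cited value for a 30-day SLO window (a burn rate that would spend about 2% of the error budget in one hour).

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    return error_rate / (1.0 - slo_target)


def should_page(long_window_error_rate: float,
                short_window_error_rate: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Multi-window check: page only if both windows burn faster than the threshold.

    The long window (e.g. 1 hour) shows the problem is significant; the short
    window (e.g. 5 minutes) shows it is still happening.
    """
    return (burn_rate(long_window_error_rate, slo_target) >= threshold
            and burn_rate(short_window_error_rate, slo_target) >= threshold)


# Hypothetical history: (service, SLO target, error rate over the last 30 days).
history = [
    ("login", 0.999, 0.0002),
    ("checkout", 0.999, 0.0020),
    ("search", 0.99, 0.0050),
]

# Rank services by how hard they burned their budget, worst first, as one
# possible input to roadmap prioritization.
ranked = sorted(history, key=lambda s: burn_rate(s[2], s[1]), reverse=True)
for name, target, error_rate in ranked:
    print(f"{name}: burn rate {burn_rate(error_rate, target):.1f}")

print("Page on-call:", should_page(0.02, 0.02, slo_target=0.999))
```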

More on the above, with demos of Rely.io, in a future blog post coming soon!
