Introduction

The reliability of your systems, services, and products is crucial to your success and the success of your organization. Those of us who work in Operations, IT and IT management, DevOps, or SRE, or as developers responsible for creating software, all know this.

Let's say you've invested heavily in a product or service. You've hired the best people, added fabulous features, and trained a sales force to sell it. But it isn't reliable. If it isn't available when your customers need to use it, all of your effort has been wasted. As Lenny Fenster, the CTO for the Microsoft Enterprise Services group, says: "all the bells and whistles in the world aren't worth so much if the bells cannot be rung nor the whistles blown."

The good news is that modern operations practices like site reliability engineering and DevOps can help. Maybe you're reading this learning path because you've had a problem with reliability in the past. Or maybe you're concerned that if you don't get it right now, it could become a problem. In either case, the Improve your Reliability learning path can help you get started on your reliability challenges.

One of the things that makes working on reliability in an organization difficult is that there are so many different ways to approach the subject. It can be hard to know where to begin and in what order to proceed. This module provides a map that can help you decide what to work on and in what order. The other modules in the Improve your Reliability learning path expand on each of the topics in this map.

Learning objectives

By the end of this module, you'll be able to:

  • Express why reliability is crucial to your success.
  • Describe modern operations practices that offer tools you can use to work on your reliability challenges.
  • Explain the Dickerson hierarchy of reliability and the map it provides for approaching reliability challenges.