About David K. Rensin
David K. Rensin (1972-) is an American technology entrepreneur, computer scientist, and best-selling author based in Silicon Valley (just south of San Francisco, CA). He writes on a wide range of technical topics from advanced data management to best practices for building distributed systems and companies.
Drawing on his experience founding and leading companies, helping to take them public, and having them acquired, Mr. Rensin writes from the perspective of a practitioner who has seen nearly all of the good, bad, and ugly that exists in the technology space.
(If you meet him, ask him about the time he told Steve Jobs that the original iPhone was "destined to fail" -- a conversation which went as well as you might imagine.)
He is currently SVP of Engineering at pendo.io.
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today—and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
- How to run reliable services in environments you don’t completely control—like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE—including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
Site reliability engineering (SRE) is more relevant than ever. Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE. You'll get actionable advice on several topics, including how to adopt SRE, why SLOs matter, when you need to upgrade your incident response, and how monitoring and observability differ.
Editors Jaime Woo and Emil Stolarsky, co-founders of Incident Labs, have collected 97 concise and useful tips from across the industry, including trusted best practices and new approaches to knotty problems. You'll grow and refine your SRE skills through sound advice and thought-provoking questions that drive the direction of the field.
Some of the 97 things you should know:
- "Test Your Disaster Plan"--Tanya Reilly
- "Integrating Empathy into SRE Tools"--Daniella Niyonkuru
- "The Best Advice I Can Give to Teams"--Nicole Forsgren
- "Where to SRE"--Fatema Boxwala
- "Facing That First Page"--Andrew Louis
- "I Have an Error Budget, Now What?"--Alex Hidalgo
- "Get Your Work Recognized: Write a Brag Document"--Julia Evans and Karla Burnett
Although service-level objectives (SLOs) continue to grow in importance, there’s a distinct lack of information about how to implement them. Practical advice that does exist usually assumes that your team already has the infrastructure, tooling, and culture in place. In this book, recognized SLO expert Alex Hidalgo explains how to build an SLO culture from the ground up.
Ideal as a primer and daily reference for anyone creating both the culture and tooling necessary for SLO-based approaches to reliability, this guide provides detailed analysis of advanced SLO and service-level indicator (SLI) techniques. Armed with mathematical models and statistical knowledge to help you get the most out of an SLO-based approach, you’ll learn how to build systems capable of measuring meaningful SLIs with buy-in across all departments of your organization.
- Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
- Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Build supportive tooling and resources required for an SLO-based approach
- Use SLO data to present meaningful reports to leadership and your users
Run your entire corporate IT infrastructure in a cloud environment that you control completely—and do it inexpensively and securely with help from this hands-on book. All you need to get started is basic IT experience.
You’ll learn how to use Amazon Web Services (AWS) to build a private Windows domain, complete with Active Directory, enterprise email, instant messaging, IP telephony, automated management, and other services. By the end of the book, you’ll have a fully functioning IT infrastructure you can operate for less than $300 per month.
- Learn about Virtual Private Cloud (VPC) and other AWS tools you’ll use
- Create a Windows domain and set up a DNS management system
- Install Active Directory and a Windows Primary Domain Controller
- Use Microsoft Exchange to set up an enterprise email service
- Import existing Windows Server-based virtual machines into your VPC
- Set up an enterprise-class chat/IM service, using the XMPP protocol
- Install and configure a VoIP PBX telephony system with Asterisk and FreePBX
- Keep your network running smoothly with automated backup and restore, intrusion detection, and fault alerting