About Niall Richard Murphy
- Currently head of Azure SRE at Microsoft - Dublin, Ireland
- Twitter http://twitter.com/niallm
- Photos at http://www.edge-cases.photos
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today—and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
- How to run reliable services in environments you don’t completely control—like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE—including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?
In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.
This book is divided into four sections:
- Introduction—Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles—Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices—Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
- Management—Explore Google's best practices for training, communication, and meetings that your organization can use
What once seemed nearly impossible has turned into reality. The number of available Internet addresses is now nearly exhausted, due mostly to the explosion of commercial websites and entries from an expanding number of countries. This growing shortage has effectively put the Internet community, and some of its most brilliant engineers, on alert for the last decade.
Their solution was to create IPv6, a new Internet standard which will ultimately replace the current and antiquated IPv4. As the new backbone of the Internet, this new protocol would fix the most difficult problems that the Internet faces today: scalability and management. And even though IPv6's implementation has met with some resistance over the past few years, all signs are now pointing to its gradual worldwide adoption in the very near future. Sooner or later, all network administrators will need to understand IPv6, and now is a good time to get started.
IPv6 Network Administration offers administrators the complete inside info on IPv6. This book reveals the many benefits as well as the potential downsides of this next-generation protocol. It also shows readers exactly how to set up and administer an IPv6 network.
A must-have for network administrators everywhere, IPv6 Network Administration delivers an even-handed approach to what will be the most fundamental change to the Internet since its inception. Some of the other IPv6 assets that are covered include:
- integrated auto-configuration
- quality of service (QoS)
- enhanced mobility
- end-to-end security