DevOps has been a major cultural force in IT for the past ten years. But a gap remains between what companies expect to get out of DevOps and the day-to-day realities of working as a systems engineer on a IT team.
Systems engineers often spend too much time putting out fires and manually building, configuring, and maintaining cloud infrastructure. A recent survey found that it takes more than a month to deliver new infrastructure for 33 percent of companies, and more than half had no access to self-service infrastructure. The result is that systems engineers burn out quickly, developers are frustrated, and new projects are delayed. Add to the mix a constantly shifting regulatory landscape and dozens of new platforms and tools to support, and chances are that your operations team is pretty overwhelmed.
So how should we build and manage new cloud systems? As systems engineers, which principles should we live by?
Below, the engineering team at Logicworks has tried to encapsulate all our cloud management best practices into a Cloud Operations Manifesto — a guide for any system engineer on how to operate in the cloud. Obviously, our manifesto is strongly influenced by the Agile Manfesto and core DevOps principles. You can download a PDF of the Manifesto here. Think we’ve left something out? We’d love to hear about it in comments.
A Cloud Operations Manifesto
1. End-user experience is our responsibility.
Uptime and high performance should be constantly measured and optimized.
2. Infrastructure is ephemeral, immutable, and utilitarian.
Don’t fix it, replace it.
3. Infrastructure is software.
It’s programmable, modular, and ideally deployed as part of the application. Everything can change quickly, is tested frequently, and is ready to scale.
4. Make data useful.
Log everything centrally, ingest logs into a BI tool, and get data that benefits the business.
5. Let automation do the boring stuff.
Patching, backups, key rotation etc. should be automated.
6. Security first.
Think about security in the beginning of the architecture process, not as a last-minute add-on.
7. Understand the difference between security and compliance.
Security reduces operational risk. Compliance reduces business risk. You need both.
8. Anticipate failure.
Expect everything to fail, plan for failure, and deliberately break things to test failover.
9. Have a Runbook.
No matter how sophisticated your technology, you still need a response plan for issues, especially after-hours.
10. Don’t be afraid to fail quickly.
Not every cloud technology is going to be right for you, but you won’t know unless you stay current and test.
Download a PDF of the Cloud Operations Manifesto
Logicworks is a leading provider of cloud migration and managed services for AWS and Azure. To learn more about us, visit our website or contact us.
2 Comments
Omar Assem
August 11, 2019
On item 2, “immutable” is correct? You said that infrastructure is ephemeral, ok. But I don’t understand how it can be also immutable.
Sorry if it is a basic question, but I really didn’t understand it.
Regards.
Lindsay Syhakhom
August 12, 2019
Hi Omar – That’s a good question, and something we get asked a lot. We go into more detail on immutable infrastructure here: https://www.logicworks.com/blog/2016/07/immutable-infrastructure-in-aws-devops/
In essence, it’s immutable because rather than “fixing” whatever’s broken, you tear it down and rebuild it. So in the case of an AWS EC2 instance, for example, you never actually change the instance directly; if you need to modify its configurations, you change whatever template/machine image/script you used to build the instance, and then use that automation process to build it again.
Hope that helps!