Building a cloud operations team is not a new concept. But what if you need to build an operations team from the ground up for applications that run on Google Cloud Platform?
Over my eight years at Rackspace, I’ve had the opportunity to build operations teams that have supported thousands of customers. When I had the opportunity to build a new operations team focused on Google Cloud Platform managed services, I knew I needed to take a slightly different approach.
Google currently offers eight applications with more than a billion users each. Out of necessity, it created Site Reliability Engineering to solve performance and availability problems. Rackspace has thousands of customers who rely on our Fanatical Support team to keep their applications available to end users. The goal is the same for both organizations, but the approach is quite different.
Naturally, I wanted to borrow elements from both to create the new operations team. Here are the three key considerations we used to create scalable processes, adopt the latest tools, and build a team that handles incidents effectively.
Hire for a great fit
Building an effective operations team starts with the people you hire. These engineers need a balanced mix of skills and experience. Without that balance, the operations team will struggle to be effective during ongoing emergencies.
Modern cloud engineers and Google Cloud Platform operations engineers need versatile skills, such as problem solving, broad technical knowledge (beyond GCP), conflict resolution, and critical listening. It is also important to understand the team’s weaknesses and focus development on those areas. Tools and tactics change frequently, so the operations team should keep investing in individual development over time.
Build and apply tools to enable scale
Google Cloud Platform provides a robust infrastructure that makes rapid application innovation and growth possible. Scaling the operations team’s capacity at the same pace is not practical given human factors (work-life balance, burnout, performance, growth). By investing in the right tools or building them, the operations team can minimize manual work and maximize efficiency.
Monitoring of application and infrastructure performance to understand application health. You can use tools like Stackdriver to trigger alerts, kick off automation that scales ahead of demand, or remediate application problems.
Configuration management to apply standards across the entire environment. The challenge is striking a balance between scalability and complexity.
Infrastructure as code to simplify deployment and enforce policies. Google Deployment Manager provides a native interface to the GCP resource APIs through YAML and Python.
CI/CD pipelines to streamline operational activities from commit to deploy. Jenkins and Spinnaker are tools that visualize the application development and deployment lifecycle and provide more control.
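To make the infrastructure-as-code item concrete, here is a minimal sketch of a Deployment Manager template written in Python. It is an illustration under stated assumptions, not a template from our team: the property names (zone, machineType), the managed-by label, the default machine type, and the Debian image family are all hypothetical choices. Deployment Manager expands a Python template by calling its generate_config function with a context object that carries env and properties.

```python
"""Sketch of a Deployment Manager template for one standardized VM.

All property names, labels, and defaults below are illustrative assumptions.
"""
import json
from types import SimpleNamespace

COMPUTE_URL = "https://www.googleapis.com/compute/v1"


def generate_config(context):
    """Entry point that Deployment Manager calls to expand the template."""
    project = context.env["project"]
    zone = context.properties["zone"]
    # Assumed default; a real standard would pin an approved machine type.
    machine_type = context.properties.get("machineType", "e2-small")
    project_url = "{}/projects/{}".format(COMPUTE_URL, project)

    instance = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}/zones/{}/machineTypes/{}".format(
                project_url, zone, machine_type),
            # Hypothetical label used to mark resources owned by operations.
            "labels": {"managed-by": "ops-team"},
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    # Assumed image family, chosen only for illustration.
                    "sourceImage": COMPUTE_URL
                    + "/projects/debian-cloud/global/images/family/debian-12",
                },
            }],
            "networkInterfaces": [
                {"network": project_url + "/global/networks/default"},
            ],
        },
    }
    return {"resources": [instance]}


if __name__ == "__main__":
    # Local stand-in for the context object Deployment Manager passes in.
    fake_context = SimpleNamespace(
        env={"name": "demo", "project": "my-project"},
        properties={"zone": "us-central1-a"},
    )
    print(json.dumps(generate_config(fake_context), indent=2))
```

Running the file locally prints the expanded resource list, which is a convenient way to review a template before handing it to Deployment Manager. Keeping defaults such as the label inside the template is one way to have every deployment inherit the same standard.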
Define and deploy standards
Google Cloud Platform’s powerful services are complex. By setting standards and enforcing their use, operations teams can provide important building blocks for development and product teams to build on. These standards account for complex network configurations, ensure appropriate version control in dynamic application deployment workflows, and define identity and access management policies. This foundation allows the operations team to respond effectively within defined service level objectives.
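As a rough illustration of what enforcing a standard can look like in practice, the sketch below checks a resource description against two rules before it is deployed. The rules themselves (a lowercase naming pattern and the required owner and environment labels) are hypothetical examples, not standards described in this article, and the resource is assumed to be a plain dictionary.

```python
import re

# Hypothetical team standards; the names and rules are illustrative only.
REQUIRED_LABELS = {"owner", "environment"}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,61}[a-z0-9]$")


def find_violations(resource):
    """Return a list of human-readable standard violations for one resource."""
    violations = []

    # Rule 1: names must be lowercase letters, digits, and hyphens.
    name = resource.get("name", "")
    if not NAME_PATTERN.match(name):
        violations.append("name %r does not match the naming standard" % name)

    # Rule 2: every resource must carry the required labels.
    labels = resource.get("labels", {})
    missing = sorted(REQUIRED_LABELS - set(labels))
    if missing:
        violations.append("missing required labels: " + ", ".join(missing))

    return violations


if __name__ == "__main__":
    compliant = {
        "name": "web-frontend-01",
        "labels": {"owner": "ops", "environment": "prod"},
    }
    drifted = {"name": "Web_Frontend", "labels": {"owner": "ops"}}
    print(find_violations(compliant))
    print(find_violations(drifted))
```

A check like this could run as a step in a CI/CD pipeline so that non-compliant changes are flagged before they reach production, although where to enforce a standard is a design choice each team has to make.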
Highly skilled people using the right tools on a solid foundation of policies and standards are a recipe for success.
Does this seem like a daunting task? Need help strengthening your operations team? Visit Rackspace to learn about our support for the world’s leading clouds, including Google Cloud Platform.