How IONOS got to release 3x faster and 3x more often, with failure rate reduced by 30% and at 70% of the cost
Engagement Outline
- Scope: 40 ICs in 5 teams at 2 locations
- Time period: 2 years, 3 days per week, 2 coaches
- Tech: Java (Spring, Payara, Quarkus), GitHub Actions, ArgoCD, Kubernetes, Docker, Grafana LGTM Stack
Services
- Technical Coaching
- Ensemble Programming
- Continuous Delivery & Kubernetes workshops
IONOS is the hosting and cloud partner of choice for small and medium-sized businesses. As the largest hosting company in Europe, IONOS manages more than 8 million customer contracts and hosts more than 12 million domains in its own regional data centres around the globe. Following its IPO, IONOS has set ambitious goals, with a projected annual EBITA growth of 10%. All aboard the hyperscaler train!
Sitting just slightly above the data centers, Lars Gentsch and his teams work on the provisioning engine, the system that provisions every resource a customer asks for. With 13 years in the making and the organizational agile transformation in full swing, Lars brought in crafted. to accelerate their value delivery using state-of-the-art methodologies.
The Challenge
The heart of every cloud operator is their provisioning engine, the piece of software that ties together on-demand scalability for the customer with bare metal servers in data centers all over the world. This engine needs to be fast and reliable, so pupils at German public schools can access their cloud storage with the school bell on Monday, and eCommerce companies can sit back and focus on their core business during Christmas season.
Naturally, the teams working on the heart of IONOS are the perfect candidates to be the lighthouse teams for their agile transformation. While the organization introduced Scrum, we were brought in by Lars to help the teams on the ground, guiding them on agility and equipping them with the technical expertise they need to develop software in the 21st century.
Our mission statement was aligned with the DORA metrics and the methods and habits that predict well-performing teams: reducing lead time, increasing deployment frequency, and minimizing both change failure rate and mean time to recovery.
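For readers who want to track the same four metrics, here is a minimal sketch of how they could be derived from a list of deployment records. It is an illustration, not the tooling used in this engagement: the `Deployment` record, its field names and the `dora_metrics` function are hypothetical, and lead time is assumed to run from commit to production deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics over an observation period."""
    if not deployments:
        raise ValueError("need at least one deployment")

    # Lead time: assumed here to be commit-to-production time per change.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]

    # Failed changes, and the recovery time of those that were restored.
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at
                  for d in failures if d.restored_at is not None]

    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Fed with records exported from a deployment pipeline, a function like this gives a team a baseline to compare against after each improvement. The median is used for lead time so that a single slow change does not dominate the picture.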
As with every successful organization that has been in business for more than a decade, the organization was faced with growth in all directions: The engine had to keep up with load doubling every year, while the business kept expanding to offer more products that needed to be implemented and maintained.
The teams were grappling with outages that had become increasingly complex, making it challenging to pinpoint singular root causes. In this dynamic and fast-paced environment, the growing demand posed the risk of surpassing the teams' capacity to keep up.
Bringing in Chris & Raimo was among the best decisions we've made at Provisioning. Right at the start of their engagement, we were facing severe stability issues that they not only helped us work through, but also made sure that we're following up with impactful changes to handle any future problems better.
A strong start, and they continued to deliver from there. With their help, we're now faster and more stable than ever, thanks to them introducing zero-downtime deployments, horizontal scalability and trunk-based development.
They provided the much needed technical expertise and operative knowledge that completed our Agile Transformation. With them, we moved from having Scrum meetings, to truly understanding what agility means and how technical skills, autonomy and trust are what make agile teams work well and sustainably at the end of the day.
I'm looking forward to working with Chris & Raimo again in the future.
Our Achievements
3x faster, 3x more often, 7x smaller, and cheaper!
- Deployment Automation
- Continuous Deployment
- Zero-Downtime Deployments
- Compliance
Our focus on the practices of Continuous Delivery paid off big time: Together with the teams, we rigorously automated their deployment process and brought it down to 44 minutes, a third of what it used to be in 2020! Later, introducing zero-downtime deployments meant we could deploy more often, so we did just that:
The teams have deployed their engine over a hundred times in 2023, three times the deployments they did back in 2020.
The numbers speak for themselves: The cycle time for any given change is down to only three days, having started at eight days, and the median batch size went down by a factor of seven. The reduced risk and faster turnaround result in effective cost savings of around 30%, with less time and fewer staff members having to be involved in any deployment.
And for the can't-do-that-here's: We didn't ditch any of their enterprise policies along the way, but adhered to them rigorously and even improved the data quality in their auditing system accidentally. Sorry, not sorry!
Sustainability, with the teams at the center
- Extreme Programming
- Test-Driven Development
- Test Automation
- Continuous Delivery
- Kaizen
We consider Extreme Programming and Continuous Delivery to be the only way to sustainably develop software: A deployment is safer the smaller it is, and automated tests give you that certain peace of mind, especially during an emergency rollout on a Sunday. Together, we've automated their deployment process until every dev in the team was able to roll out a change themselves, and we've made it safer than ever by including automated tests along the way.
To encourage this kind of localized decision making and continuous improvement, we introduced and shaped the roles of team- and tech-leads. Together with those leads, we were able to move larger mountains:
We've worked with them to expand their on-duty rotation from individuals to whole teams, so that improvements could be made right as problems arise (Kaizen), and knowledge about the platform is shared not just among a few potential lottery winners, but every team member equally.
We've also made sure the teams have all the capabilities they need at hand by bringing in infrastructure- and database-experts from other departments, including them in planning and development processes early on.
When I joined IONOS 18 months ago, I became one of the first two team leads at Provisioning Platform, not least due to Chris and Raimo pushing for the introduction of that role.
Right from the start, they helped us define the role, its responsibilities and most importantly the extent of our authority so we can be effective in our position and support our team members best. Especially in the first few months of my tenure, the spaces they created for us were pivotal to the success of our role.
It was remarkable just how well they understood the organization and the levers we had to turn to change our ways of working for the better. For example, on their initiative, we moved from second-level support handled by individual developers to support handled by whole teams, which vastly improved how quickly we resolve incidents and how we share knowledge among the teams.
With the regular workshops and trainings they ran with the teams, I was confident to focus on my people-responsibilities, knowing that technical skill development was taken care of by them.
Leaving a lasting impression
- Remote Collaboration
- Ensemble Programming
- Open Space Technology
The biggest achievements any consultant may dream of are the practices that stick, long after their engagement has ended.
At the height of the pandemic, we introduced Miro to the teams, giving them a fantastic tool for remote collaboration and showing them the ropes of how to use it effectively. With Miro, they had an easy time adapting our facilitation techniques to their needs and taking initiative themselves. Miro is now an essential part of most of their initiatives and meetings, with an incredible wealth of knowledge on all those Post-Its.
To encourage cross-team collaboration, we've introduced the Open Space format to their Tech Community Of Practice, giving them a whole day with all the people they need, to work on their skills and on cross-cutting improvements to their system.
As with all our workshops and trainings, we've rigorously worked with Ensemble Programming techniques that have found their way into the toolbox of the teams.
Chris and Raimo played a pivotal role in elevating our coding practices. Through their guidance, we not only learned how to effectively implement Test-Driven Development (TDD) but also witnessed a remarkable improvement in our code quality.
Their commitment to rejuvenating our development process extended beyond TDD though: Chris and Raimo successfully introduced regular coding katas and mob programming sessions, turning the maintenance of an old monolith into an enjoyable and engaging experience.
The CI/CD workshop led by Chris and Raimo was a highlight for our team. Not only was it a fun and challenging experience, but its long-term impact is evident in the adoption of hexagonal architecture and the gradual refactoring of our application towards a microservices architecture. This workshop marked a turning point in our development journey.
Their expertise, guidance, and infectious enthusiasm have left an indelible mark on our team. Without a doubt, the decision to bring them on board was the best our management could have made.