Study reveals current incident management processes and tools increasingly insufficient for driving innovation and maintaining uptime
Enterprise digital transformation budgets continue to increase despite a recession, yet developers find it challenging to innovate, and standard incident management tools and processes hinder digital service resilience, according to xMatters’ State of Automation in Incident Management report. Digital service resilience is the ability to recover quickly from incidents such as outages and interruptions, adapt, and learn from them to prevent future technology and customer-impacting issues. The report also analyzed varying degrees of incident management readiness within organizations to place each one on an Incident Management Spectrum. The research found that, across the spectrum, only the most advanced organizations have identified the keys to success across both business and incident management functions.
“Through a series of research reports over the past year, we studied the growing challenges faced by those tasked with the delivery and maintenance of digital services. Customer-impacting issues continue to be a roadblock to innovation as today’s digital, fast-moving environment requires technology teams to spend more time supporting operations,” said Troy McAlpin, CEO at xMatters. “However, there is an opportunity for technology professionals to evolve incident management approaches through incident response automation, collaboration and constant learning in order to achieve customer delight and further innovation.”
Pandemic Forces Digital Transformation
Spending on digital transformation has increased steadily since the November 2019 Incident Management in the Age of Customer-Centricity study. Twenty percent of companies with 1,001 to 5,000 employees are now budgeting more than USD 10 million for digital transformation initiatives, compared with 9.3% in November 2019. The COVID-19 pandemic accelerated this focus. Findings from the April 2020 Impact of COVID-19 on Digital Transformation survey showed that more than half of consumers experienced a rise in application performance issues, prompting many companies to speed up digital transformation in order to deliver accessible digital experiences for customers and employees.
Customer-impacting Issues are a Roadblock to Innovation
The State of Automation in Incident Management research found that the proportion of technology professionals affected by customer-impacting issues while building out services has risen by almost ten percentage points to 84.3%, compared with the November 2019 Incident Management in the Age of Customer-Centricity study. Overall, the findings point to a marked need across industries to improve customer experiences and to strengthen organizational commitment to innovation.
A majority of respondents (72.3%), spanning development, SRE, IT operations and management roles, reported that their team spends at least half of its time resolving incidents rather than innovating. Of these respondents, over a quarter (27.3%) said their team spends at least 80% of its time resolving incidents.
Opportunity for Advancement in the Incident Management Spectrum
To assess the efficacy of incident management in organizations, the State of Automation in Incident Management report analyzed the components of a comprehensive incident management practice (e.g., team structure, tools) and how organizations detect, resolve and learn from incidents.
Survey responses were then scored to place each organization on the Incident Management Spectrum according to its approach to incident management. The spectrum comprises four categories: ad hoc, where there is no formal incident management practice; traditional, an approach driven by service desk tickets and ITIL processes; modern, where individual teams detect and resolve service-based issues; and adaptive, a scalable, service-centric model that uses as much automation as possible. The research found that almost all respondents take either a traditional (40.1%) or a modern (58.6%) approach to incident management.
“Traditional teams spend much of their time on firefighting and completing non-value-added tasks compared to innovation, while modern teams, who have allocated more budget toward digital transformation, spend equal amounts of time resolving incidents and building out features,” continued McAlpin. “Most technology organizations want to spend more time building differentiated features and new services instead of frequently dealing with incidents. Organizations must shift their approaches toward the modern and adaptive categories of the Incident Management Spectrum, which will enable teams to automate more components of the incident management lifecycle. The result: more time back in order to release products and put new innovations into the market while ensuring products are as reliable as possible for customers.”
Automation, Collaboration and Constant Learning Are Key to Superior Customer Experience
While most technology professionals reported having implemented team-oriented incident management processes, there is room for improvement in many aspects of day-to-day practice. Nearly half of technology professionals (43.4%) rely on less sophisticated methods, such as alerting; emailing and paging; conference bridges; or manual setup and outreach, to engage team members, stakeholders and customers during an incident. Most organizations with a traditional approach to incident management depend on service desks and process-heavy workflows, whereas modern organizations use dedicated incident management tools for incident response and management. Moreover, as companies come to see reliable digital services as an indicator of customer success, there is an opportunity to automate the postmortem process. Asked about the top benefits of using artificial intelligence or machine learning for incident management, respondents cited using data from previous, related incidents to inform post-incident reporting (36%) and aggregating data to detect anomalies early (28.9%).
The State of Automation in Incident Management findings are based on a survey of more than 300 DevOps, SRE, ITOps and business leaders at organizations that deliver digital services, ranging in size from midsize businesses to enterprises with more than 5,000 employees.