Cloud, Big Data, and the Internet of Things

Automic Blog




Five Steps to Avoid Outages That Will Put You in the News
By Ralf Paschen

United Airlines flights nationwide were grounded early Wednesday morning, creating delays and long lines at airport terminals. The New York Stock Exchange (NYSE) also suspended trading on Wednesday and stayed down for nearly four hours. Minutes later, The Wall Street Journal's (WSJ) homepage experienced an outage, possibly under intense traffic as a result of the NYSE news.

Root causes were quickly identified and published - "a router degraded network connectivity for various applications" and "the root cause was determined to be a configuration issue."

It reminds me of the maxim, ‘when the tide goes out, you can see who is bathing naked’. The simple fact is that failures such as these frequently point to more deep-seated problems with ageing IT infrastructures. Such problems are rarely made public because of the need to protect brand image, customer loyalty and shareholder value.

However, these things happen and will continue to happen - a European federal agency, for example, recently experienced a three-day outage owing to a network security problem, and stated that certain switches were being replaced at the time. The reality was that the organization was rolling out an application and had no chance to roll back in a timely manner.

Don't put the blame on complexity
Modern computing systems are stunningly complicated; however, it is time to stop blaming complexity for failures. The vast majority of outages are related to change in some way: usually a software update that took place shortly before the failure.

Business automation enables businesses to cope with this complexity, providing complete end-to-end visibility of these critical business processes, eliminating steps that slow execution time and removing human error.

So let's take a look at the top five reasons for failed deployments and deployment-related outages in large enterprises - and how business automation can solve them.

1. Lack of centralized packaging and tracking
Traditionally, applications are developed by disconnected teams, each producing dozens of new artefacts and artefact versions, including executables, configuration files and database schemas. Often there is interdependency between artefact versions from one team to another. This interdependency is only tested when a new version of the whole system is tested. By the time an application is rolled out to the operations environment, the list of artefacts and their interdependencies can grow significantly. Automation packages and transports the correct artefacts, in the correct versions, to the exact targets, thereby eliminating this source of failure.
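To make the idea concrete, here is a minimal Python sketch of a central package manifest that checks artefact interdependencies before anything is transported. The `Artefact` class, the `validate_package` function and the example artefact names are hypothetical, invented for illustration; they are not taken from any particular automation product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artefact:
    """One entry in a release package (hypothetical structure)."""
    name: str
    version: str
    target: str           # tier or host the artefact is deployed to
    requires: tuple = ()  # (name, version) pairs this artefact depends on


def validate_package(artefacts):
    """Return a list of dependency problems found in a release package."""
    packaged = {a.name: a.version for a in artefacts}
    problems = []
    for artefact in artefacts:
        for dep_name, dep_version in artefact.requires:
            found = packaged.get(dep_name)
            if found is None:
                problems.append(
                    f"{artefact.name} {artefact.version} requires "
                    f"{dep_name} {dep_version}, which is not in the package"
                )
            elif found != dep_version:
                problems.append(
                    f"{artefact.name} {artefact.version} requires "
                    f"{dep_name} {dep_version}, but {found} is packaged"
                )
    return problems
```

An empty result means every declared dependency is present in the expected version; anything else is a reason to stop before the rollout starts rather than after it fails.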

2. Process failures
Deploying application changes means following guidebooks in an exact order. As the number of environments, servers and application tiers grows, so does the length of the instructions. The complexity of the process is a major source of errors and failures that can cause a service provider to black out.

In response, administrators attempt to automate the process with scripts and configuration tools that they have available, but the slightest mistake is often fatal. Here again, a standardized, fully automated and well-tested deployment capability can prevent such errors occurring.

3. Lack of a generic deployment model
Every enterprise application passes through a lifecycle of development, functional testing, integration and load testing, and rollout. With each stage, the application is put on a different set of servers. However, these environments differ, and often grow more complex, in terms of hardware, operating systems and even the application infrastructure stack. The potential for error, and therefore for an outage, prevails.

Business automation takes control of the subtle changes in the execution environment between stages and artefact transferring, introducing a generic deployment model for more reliable development and roll-out.
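One way to picture a generic deployment model is a single ordered list of steps that is reused unchanged in every stage, with only the environment settings varying. The Python sketch below assumes hypothetical step names, host names and a `plan` function, all made up for illustration.

```python
# The same ordered steps are used for every stage of the lifecycle.
STEPS = ("stop_service", "copy_artefacts", "migrate_schema", "start_service")

# Only these settings differ between stages (hypothetical example values).
ENVIRONMENTS = {
    "test": {"hosts": ["test-01"]},
    "production": {"hosts": ["prod-01", "prod-02"]},
}


def plan(environment):
    """Expand the generic step list into (host, step) pairs for one stage."""
    hosts = ENVIRONMENTS[environment]["hosts"]
    return [(host, step) for host in hosts for step in STEPS]
```

Because the step order lives in one place, what was exercised in the test stage is the same sequence that later runs in production.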

4. Lack of a snapshot validation stage
Every application update creates a new baseline of configuration, in terms of artefact versions, as well as values that are changed. Unfortunately, there is no practical way for administrators to review configuration states as part of any new update.

Emergency patches are a weak point as they tend to ‘disappear’ with new versions and sometimes cause outages that can take a long time to resolve. Business automation introduces a baseline snapshot at the beginning and end of each new update, which prevents many potentially serious issues from occurring in the first place.
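As a rough illustration of snapshot validation, the Python sketch below compares a baseline taken before an update with one taken after it. The `diff_snapshots` function and the flat name-to-value snapshot format are assumptions made for this example, not a description of any specific tool.

```python
def diff_snapshots(before, after):
    """Compare two configuration snapshots (dicts of item name -> value)."""
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

An emergency patch that silently vanishes with a new version would show up under `removed`, so it can be reviewed before the update is declared complete.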

5. Inability to automatically roll back in time
As we have seen, software complexity is not the root cause of outages. It is the deployment and release processes that are at the heart of the problem. The lack of packaging, reusable workflows, generic deployment models and the ability to validate application snapshots and automatically roll back in the middle of any deployment process is the single most important cause of failures in enterprises.
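Automatic rollback in the middle of a deployment can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `deploy_with_rollback` function and the convention that each step is a hypothetical (name, apply, undo) triple are invented for this example.

```python
def deploy_with_rollback(steps):
    """Run deployment steps in order; undo completed steps if one fails.

    steps is a list of (name, apply, undo) triples, where apply and undo
    are callables taking no arguments (a hypothetical convention).
    Returns (True, None) on success or (False, name_of_failed_step).
    """
    completed = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception:
            # Undo the steps that did complete, most recent first.
            for _, completed_undo in reversed(completed):
                completed_undo()
            return False, name
        completed.append((name, undo))
    return True, None
```

Only steps that finished are undone, in reverse order; the step that failed is assumed to have left nothing behind, which a real deployment process would need to verify.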

Software-related outages are an everyday occurrence, as ageing IT infrastructure struggles to cope with the demands of the always-on, always-connected world. Business automation has the answer.

What do you think? I would be interested to hear your views.


More Stories By Automic Blog

Automic, a leader in business automation, helps enterprises drive competitive advantage by automating their IT factory - from on-premise to the Cloud, Big Data and the Internet of Things.

With offices across North America, Europe and Asia-Pacific, Automic powers over 2,600 customers including Bosch, PSA, BT, Carphone Warehouse, Deutsche Post, Societe Generale, TUI and Swisscom. The company is privately held by EQT. More information can be found at www.automic.com.