Daily Bulletin

Air traffic control failure shows we need a better approach to programming

  • Written by: The Conversation
Image: The higher they are, the further they have to fall. Ramil Sagum, CC BY

The causes of the National Air Traffic Services (NATS) flight control centre system failure in December 2014 that affected 65,000 passengers directly and up to 230,000 indirectly have been revealed in a recently published report.

The final report from the UK Civil Aviation Authority’s Independent Inquiry Panel set up after the incident examines the cause of and response to the outage at the Swanwick control centre in Hampshire, one of two sites controlling UK airspace (the other is at Prestwick in Scotland). Safety is key, said the report. I agree. And safety was not compromised in any way. Bravo!

“Independent” is a relative term: after all, the panel includes Joseph Sultana, director of Eurocontrol’s Network Management, and NATS’s operations chief Martin Rolfe, as well as UK Civil Aviation Authority board member and director of safety and airspace regulation Mark Swan, all of whom have skin in the game. (Full disclosure: a panel member, Professor John McDermid, is a valued colleague of many years.)

For a thorough analysis, however, it’s essential to involve people who know the systems intimately. Anyone who has dealt with software knows that the fastest way to find a fault in a computer program is often to ask the programmer who wrote the code. And the NATS analysis and recovery involved the programmers too: the Lockheed Martin engineers who built the system in the 1990s. This is one of two factors behind the “rapid fault detection and system restoration” during the incident on December 12.

The report investigates three things: the system outage, including its cause and how the system was restored; NATS' operational response to the outage; and what the incident says about how well the findings and recommendations following the last major incident, a year earlier, had been implemented. I look only at the first here, but arguably the other two are more important in the end.

Cause and effect

In the NATS control system, real-time traffic data is fed into controller workstations by a system component called the System Flight Server (SFS). The SFS architecture is what is called “hot back-up”. There are two identical components (called “channels”) computing the same data at the same time. Only one is “live” in the running system. If this channel falls over, then the identical back-up becomes the live channel, so the first can be restored to operation while offline.

This works quite well to cope with hardware failures, but is no protection against faults in the system logic, as that logic is running identically on both channels. If a certain input causes the first channel to fall over, then it will cause the second to fall over in exactly the same way. This is what happened in December.
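The difference between the two failure modes can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model of the real SFS: the class names `Channel` and `HotBackupPair`, the doubling "computation" and the rule that any input above 10 triggers the fault are all hypothetical stand-ins.

```python
class ChannelCrash(Exception):
    """Raised when a channel's software falls over."""


class Channel:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def process(self, value: int) -> int:
        if not self.alive:
            raise ChannelCrash(f"channel {self.name} is down")
        # Hypothetical latent fault: the same logic runs on every channel,
        # so the same input breaks every channel.
        if value > 10:
            self.alive = False
            raise ChannelCrash(f"channel {self.name} fell over on input {value}")
        return value * 2


class HotBackupPair:
    def __init__(self) -> None:
        self.live = Channel("A")
        self.standby = Channel("B")

    def hardware_failure(self) -> None:
        """Simulate the live channel's hardware dying; the standby takes over."""
        self.live.alive = False
        self.live, self.standby = self.standby, self.live

    def process(self, value: int) -> int:
        try:
            return self.live.process(value)
        except ChannelCrash:
            # Fail over and replay the same input on the other channel.
            self.live, self.standby = self.standby, self.live
            return self.live.process(value)
```

After `hardware_failure()`, a call such as `process(4)` is still answered, now by channel B. A call to `process(11)` on a fresh pair takes down channel A, fails over, and then takes down channel B in exactly the same way, so the exception reaches the caller with no channel left alive.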

The report describes a “latent software fault” in code written in the 1990s. Workstations in active use by controllers and supervisors, either for control or observation, are called Atomic Functions (AF). Their number should be limited by the SFS software to a maximum of 193, but in fact the limit was set to 151, and the SFS fell over when it reached 153.
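To show the general class of fault, here is a minimal sketch in which the size of a table and the limit checked before writing to it are set independently. It is an illustration only, not the NATS code: `AtomicFunctionTable`, `capacity` and `admit_limit` are hypothetical names, and the sketch makes no attempt to reproduce why the real system failed at 153 rather than at some other count.

```python
SPECIFIED_MAX = 193        # the limit that should have applied, per the article
CONFIGURED_CAPACITY = 151  # the limit actually set, per the article


class AtomicFunctionTable:
    """Hypothetical fixed-size table of active workstations."""

    def __init__(self, capacity: int, admit_limit: int) -> None:
        self._slots = [None] * capacity
        self._count = 0
        self._admit_limit = admit_limit

    def register(self, workstation_id: int) -> bool:
        # Graceful path: refuse the request once the admission limit is reached.
        if self._count >= self._admit_limit:
            return False
        # If admit_limit exceeds the real capacity, this write raises
        # IndexError before the guard above can ever refuse anything.
        self._slots[self._count] = workstation_id
        self._count += 1
        return True


def fill(table: AtomicFunctionTable, n: int) -> int:
    """Register n workstations and return how many were accepted."""
    return sum(1 for i in range(n) if table.register(i))
```

With `AtomicFunctionTable(capacity=CONFIGURED_CAPACITY, admit_limit=SPECIFIED_MAX)`, the first 151 registrations succeed and the next one raises `IndexError` instead of being refused. When both parameters are 193, the 194th request is simply turned away. A boundary test that pushes the count up to and past the limit exposes the mismatch immediately.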

Deja vu

My first thought is that we’ve heard this before. As far back as 1997-98, evidence given to the House of Commons Select Committee on Environment, Transport and Regional Affairs reported that the NATS system, then under development, was having trouble scaling from 30 to 100 active workstations. But this recent event was much simpler than that – it’s the kind of fault you see often in first-year university programming classes and which students are trained to avoid through inspection and testing.

There are technical methods known as static analysis to avoid such faults – and static analysis of the 1990s was well able to detect them. But such thorough analysis may have been seen as an impossible task: it was reported in 1995 that the system exhibited 21,000 faults, of which 95% had been eliminated by 1997 (hurray!) – leaving 1,050 which hadn’t been (boo!). Not counting, of course, the fault which triggered the December outage. (I wonder how many more are lurking?)

How could an error not tolerated in undergraduate-level programming homework enter software developed by professionals over a decade at a cost approaching a billion pounds?

Changing methods

Practice has changed since the 1990s. Static analysis of code in critical systems is now regarded as necessary. So-called Correct by Construction (CbyC) techniques, in which the intended behaviour of the software is defined in a specification and the code is then developed through a process of refinement that demonstrably avoids common sources of error, have proved their worth. NATS nowadays successfully uses key systems developed along CbyC principles, such as iFACTS.

But change comes only gradually, and old habits are hard to leave behind. For example, Apple’s “goto fail” bug, which surfaced in 2014 in many of its systems, rendered void an internet security function essential for trust online: validating website authentication certificates. Yet it was caused by a single duplicated line of code, essentially a programming typo, that could and should have been caught by the most rudimentary static analysis.
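How rudimentary can such a check be? The sketch below is a toy, line-based heuristic written for this illustration, not a real analyser: it flags any statement that directly follows an unconditional `goto`. The C fragment in `SNIPPET` follows the widely published excerpt of the faulty code, shortened here. The function name `find_unreachable` and its rules are assumptions of this sketch, and production tools work on the parsed program rather than on raw text.

```python
# Shortened excerpt modelled on the published "goto fail" fragment.
SNIPPET = """\
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
"""


def find_unreachable(source: str) -> list[int]:
    """Return 1-based line numbers of statements that follow an unconditional goto."""
    lines = [line.strip() for line in source.splitlines()]
    flagged = []
    for i in range(1, len(lines)):
        prev, cur = lines[i - 1], lines[i]
        if not prev.startswith("goto "):
            continue
        # A goto is conditional only when it is the sole body of an unbraced `if`.
        guarded = (
            i >= 2
            and lines[i - 2].startswith("if ")
            and not lines[i - 2].endswith("{")
        )
        # Labels, closing braces and blank lines are legitimate after a goto.
        harmless = cur == "" or cur.endswith(":") or cur.startswith("}")
        if not guarded and not harmless:
            flagged.append(i + 1)
    return flagged


print(find_unreachable(SNIPPET))  # [4]
```

The second `goto fail;` is not governed by the `if` above it, so it always executes, and the check on line 4 can never be reached. Even this crude scan reports that line as dead code.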

Unlike the public enquiry and report undertaken by NATS, Apple has said little about either how the problem came about or the lessons learned – and the same goes for the developers of many other software packages that lie at the heart of the global computerised economy.

Peter Bernard Ladkin presented evidence to the UK House of Commons Transportation Sub-committee on the development of the Swanwick system in 1997 and 1998. His tech-transfer company Causalis Limited received consulting payments from BT Systems, as well as from Serco for due-diligence analysis of the Swanwick system, for their bids during the privatisation of NATS near the turn of the millennium.

Read more http://theconversation.com/air-traffic-control-failure-shows-we-need-a-better-approach-to-programming-42496
