Three Mile Island
The Three Mile Island accident took place on March 28, 1979, and is still considered the worst nuclear accident in American history. One of the reactors suffered a partial meltdown, but luckily the containment systems were not breached. As with the Chernobyl incident, the main cause of the accident was human error.
The failure of one of the valves controlling coolant flow reduced the amount of cooling water entering the reactor, which caused the core temperature to rise. The computerized control systems engaged and scrammed the reactor - that is, the fission process was halted - stopping the nuclear chain reactions from occurring. Because of residual heat in the reactor and energy from decaying fission products in the fuel rods, the temperature continued to rise.
The ECCS (Emergency Core Cooling System) turned on automatically and should have provided enough extra coolant to make up for the stuck valve. Unfortunately, the operator manually turned it off too early, believing that there was enough coolant in the core. The core’s temperature kept rising, so a relief valve on the reactor’s pressurizer was opened automatically to release the steam buildup. This valve did not close properly. The indicator light in the control room was covered by a maintenance tag, so it was not visible to the operator. A small amount of radioactive gas and water was vented to the environment around the reactor, contaminating it.
Once again, the accident was caused by operator error. If the operator had left the computerized systems in control of the incident, the accident would probably not have occurred. Operators should have the power to override the computer systems, but only if those systems actually fail to operate.
Interface Designs
I will briefly look at user interface issues for critical systems, i.e. systems in which failures are either costly or dangerous, posing a threat to human health.
Humans are by no means perfect, so the systems they use should be designed in such a way that they do not require perfect decisions. These systems should be error tolerant and must be able to make some time-critical decisions themselves using logic.
Accidents are the unanticipated interactions of events. Therefore, control programs cannot be designed to automatically handle all accidents. In my opinion, the solution to human error is not to replace the operator but to increase the operator’s options for monitoring themselves and recovering from their errors. To help avoid accidents, humans should be provided with an on-the-spot diagnosis of the problem along with error tolerant corrective actions.
Human operators should continuously monitor the system and try to detect failures of its components. Such failures are significant because they can affect the accuracy of the control system’s algorithms. A good example of such a failure is the NORAD moon attack. A faulty sensor gave incorrect information to the system and missiles were nearly launched. One general decided that the sensor was faulty and stopped the missiles from firing.
Starting unsafe operations should be difficult, while stopping such actions should be easy. For example, a missile should have to be armed before it can be fired. The fire button should not be touch sensitive and should require a complicated code to be entered. This would help to prevent an accidental firing.
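The principle above can be illustrated with a small state machine. This is only a sketch of the idea, not a description of any real weapons system: the Launcher class, its state names and the arming code are all hypothetical. Firing needs two deliberate, coded steps, while aborting needs no code at all.

```python
from enum import Enum, auto


class State(Enum):
    SAFE = auto()
    ARMED = auto()
    FIRED = auto()


class Launcher:
    """Hypothetical launcher: hard to start, easy to stop."""

    def __init__(self, arming_code: str) -> None:
        self._arming_code = arming_code
        self.state = State.SAFE

    def arm(self, code: str) -> bool:
        # Arming only works from SAFE and only with the correct code.
        if self.state is State.SAFE and code == self._arming_code:
            self.state = State.ARMED
            return True
        return False

    def fire(self, code: str) -> bool:
        # Firing needs a prior arm step and the code entered again.
        if self.state is State.ARMED and code == self._arming_code:
            self.state = State.FIRED
            return True
        return False

    def abort(self) -> None:
        # Stopping needs no code and works from any state short of FIRED.
        if self.state is not State.FIRED:
            self.state = State.SAFE
```

A single accidental press of the fire control does nothing here, because the launcher is still in the SAFE state, whereas one call to abort() always returns an armed launcher to SAFE.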
The interface should augment human abilities and not replace them. The operator’s task should not be oversimplified. The interface should provide facilities for operators to experiment and to update their mental models of the system. The operator should also have alternative sources of critical information in case the display fails.
Software
The question ‘Why are computers used in so many risky situations?’ still remains unanswered. I will attempt to give some reasons for their use.
Computers usually cost less than analog or electromechanical devices, and they provide greater reliability than their predecessors. Software is easily changed, and as software reliability increases, safety will also increase. Safety can also be increased by using tried and tested, reusable code. Computers allow for finer control, and computerized systems allow operators to work further away from hazardous areas. Computers can also provide better information to operators, as long as they have been programmed correctly. The problem with software is that it can be very complex and can produce unexpected interactions.
Accident Causes
There are a number of reasons why accidents occur. The main reason is operator overconfidence and complacency within a working area. Many people rely heavily on redundancy. This should not be the case. Warning signs are often ignored, either because resources are insufficient or because the short term gain of an action is weighed more heavily than its long term risk.
Airbus A320
The system used by the Airbus A320 shows that reliance on computer technology is not always a good thing. Three computers control all of the flight surfaces and the engines of the aircraft. If the computers do not reach a unanimous decision, they make a majority decision as to the appropriate course of action. If the computers disagree with the action of the pilot, then the computer decision is accepted. To minimize the possibility of software error, all three computers were programmed by different teams of designers.
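The majority decision described above can be sketched in a few lines. This is a generic illustration of majority voting between redundant channels, not the actual Airbus logic; the function name majority_vote and the command strings are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value that more than half of the channels agree on.

    Returns None when there is no strict majority, so the caller has
    to handle the disagreement explicitly.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None
```

With three channels reporting "climb", "climb" and "hold", the function returns "climb". If all three disagree it returns None, which is the case in which a human operator would need to step in.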
In June 1988, an Airbus A320 crashed at an air show while making a low level pass over spectators. The pilot reported that his request to gain altitude had been overruled by the onboard computer.
An Indian Airlines plane on a routine flight crashed in perfect weather conditions as it was landing. An independent investigator found that the Airbus A320 did not obey the pilot’s commands in the last few seconds before impact.
As I have already suggested, humans should be able to take overall control of a process in an emergency situation. The idea of a pilot not being able to take control of his aircraft is ludicrous, especially when a computer is given complete control. Flying is a risky business and should not be left purely to computers.
Automatic Trains
In Helsinki, the underground trains are run entirely by computers – there are no drivers. The trains are controlled in such a way as to make use of regenerative braking. As one train slows down, its energy is used to make another train accelerate. The trains are positioned to prevent heat loss when they pass a platform. The platform doors are positioned with the train doors and can only open when a train is in position. This prevents people from standing on the track.
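The platform door rule can be written as a simple interlock. The sketch below only illustrates the idea and is not taken from any real train control system: the TrainStatus fields and the 0.3 metre alignment tolerance are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainStatus:
    speed_m_per_s: float            # current speed of the train
    offset_from_stop_mark_m: float  # distance from the ideal stopping point


def doors_may_open(train: Optional[TrainStatus],
                   tolerance_m: float = 0.3) -> bool:
    """Allow the platform doors to open only for a stopped, aligned train."""
    if train is None:
        # No train at the platform: the doors stay shut.
        return False
    stopped = train.speed_m_per_s == 0.0
    aligned = abs(train.offset_from_stop_mark_m) <= tolerance_m
    return stopped and aligned
```

The default answer is "closed": the doors open only when every condition is met, so a missing train or a missing reading keeps people off the track.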
The underground Helsinki trains have an excellent safety record and are always on time. This is a very efficient system. Repetitive tasks are out of the hands of humans. It follows that such a small system is going to be safe and efficient when computer controlled. The system has been operating for a number of years, and so far not one accident has occurred in the day to day running of the trains. That is an impressive statistic.
Coal Mining
Coal mining has been totally transformed by computers and computerized machinery. Years ago, mines used to be death traps for the miners. Young children were sent down dark, dingy mines to obtain coal. Luckily, society has advanced socially as well as technically, and risky jobs such as these are now carried out by computerized machinery. This is much faster, more efficient, and is a lot safer.
The Twentymile Longwall coal mine is situated in America and is the world’s most productive underground mine in terms of output per year. Its record output has been 7.7Mt of power station fuel. This was achieved in 1999. Since the mine was taken over 5 years ago, £21 million has been invested in it.
The mine uses two automated development units, each consisting of a Joy 12CM12 continuous miner, two Fletcher roofbolters, Joy shuttle cars and a Stamler feeder breaker. The coal transport system consists of a conveyor belt capable of transferring 5,000 tonnes of coal per hour at a speed of 4 metres a second. The shearer can cut 900mm of coal from the face on each pass. The maximum daily output (24hrs) has been 46,340 tonnes of coal.
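The figures above can be related to each other with some simple arithmetic. The calculation below uses only the numbers quoted in the text; the variable names are my own, and it assumes the record day's output was spread evenly over the 24 hours.

```python
BELT_CAPACITY_T_PER_H = 5_000    # conveyor capacity, tonnes per hour
BELT_SPEED_M_PER_S = 4           # conveyor speed, metres per second
RECORD_DAILY_OUTPUT_T = 46_340   # maximum output in 24 hours, tonnes

# Average hourly output on the record day.
average_t_per_h = RECORD_DAILY_OUTPUT_T / 24

# Fraction of the belt's capacity that this average represents.
utilisation = average_t_per_h / BELT_CAPACITY_T_PER_H

# Coal carried on each metre of belt when it runs at full capacity.
load_kg_per_m = BELT_CAPACITY_T_PER_H * 1000 / 3600 / BELT_SPEED_M_PER_S

print(f"Average output: {average_t_per_h:.1f} t/h")
print(f"Belt utilisation: {utilisation:.1%}")
print(f"Full-capacity belt load: {load_kg_per_m:.1f} kg/m")
```

On these numbers the record day averaged roughly 1,930 tonnes per hour, which is under 40% of what the belt can carry, so the conveyor was not the limiting factor.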
The shearer can mine such a large amount of coal because the mine operates 24 hours a day. Machines do not need a rest or break. A well designed machine can be very efficient. All of the processes in the mine are computerized. This includes cleaning the coal to remove its natural impurities, sorting it into size groups, and drying it to remove moisture. Because these stages are computerized, customers receive coal of a consistently high quality.
By computerizing the coal mining industry, the risk to human workers has been greatly reduced. This is an industry that has benefited greatly from computerization.
Personal Views
The question of who should be responsible for detecting an impending accident and initiating shutdown procedures should be considered. Is it better for a human or a machine to make the judgment? My opinion is that if a system is so risky that a machine must initiate the shutdown, perhaps the system should not be built at all. Machinery is replaceable, but human life is not.
References
Chernobyl Nuclear Accident
Three Mile Island Nuclear Accident
Interface Design
Coal Mining
Automatic Trains and Airbus A320
A Level Computing, 3rd Edition, By P.M. Heathcote
Understanding Computer Science for Advanced Level, 4th Edition, By Ray Bradley