- After a successful lift-off, the Ariane 5 launcher lost control after about 37 seconds. The Ariane 5 software reused code from the Ariane 4, but the flight path of Ariane 5 was different, and its faster engines triggered a bug in an arithmetic routine inside the rocket's flight computer. The internal SRI software exception was raised during a data conversion from a 64-bit floating-point value to a 16-bit signed integer, which resulted in an Operand Error. The faster engines of the Ariane 5 made the 64-bit numbers larger than they had been in the Ariane 4, triggering an arithmetic overflow that crashed the flight computer. The backup computer of Flight 501 crashed first, followed 0.05 seconds later by the main computer. With both computers down, the main rocket processor overpowered the rocket engines and caused the rocket to disintegrate. [5]
The investigation showed that not all conversions were protected, because a maximum workload target of 80% had been set for the SRI computer. To determine the vulnerability of unprotected code, an analysis was performed on every operation that could give rise to an exception, including an Operand Error. In particular, the conversion of floating-point values to integers was analyzed, and operations involving seven variables were found to be at risk of causing an Operand Error. This led to protection being added to four of the variables, evidence of which appears in the Ada code. The other three variables, however, were left unprotected. No direct reference to the justification for this decision was found in the source code.
The other three variables, including the one denoting horizontal bias (BH), were left unprotected because further reasoning indicated that they were either physically limited or had a large margin of safety. In the case of the variable BH, this reasoning turned out to be wrong. It is important to note that the decision to protect some of the variables but not others was taken jointly by project partners at several contractual levels.
There is no evidence that any trajectory data were used to analyze the behavior of the unprotected variables. It is also important to note that the Ariane 5 trajectory data were not included in the SRI requirements and specification.
Although the source of the Operand Error has been identified, this in itself would not have caused the mission to fail. The specification of the exception-handling mechanism also contributed to the failure. For any kind of exception, the system specification stated that the failure should be indicated on the databus, the failure context should be stored in EEPROM memory, and finally the SRI processor should be shut down. [1]
- The launcher started to disintegrate because of high aerodynamic loads caused by an angle of attack of more than 20 degrees. These loads made the boosters separate from the main stage, which in turn triggered the launcher's self-destruct system.
- The flight path of Ariane 4 was different from that of Ariane 5. After Ariane 5 lifts off, the alignment function continues to run for about 40 seconds of flight. This time sequence is based on an Ariane 4 requirement and is not required for Ariane 5. [1]
- Although the inertial reference system cannot be tested directly by a "black box" approach without an actual flight test, the flight could have been simulated by feeding the appropriate inputs to its processor. This would have found the fault, but no such test was run. Instead, the SRI was replaced by a simulator that supplied the correct values to the main onboard computer during pre-flight testing.
- In addition, the system requirements for the Ariane 5 were different from those of the Ariane 4, despite their overall similarity. The requirements changed, but this was not taken into account when the code was reused, and the assumptions attached to that code were never reviewed to confirm that they still held. [17]
In my opinion, then, the main contributor to the failure was a programming error that led to an Operand Error. A data conversion from a 64-bit floating-point value to a 16-bit signed integer caused the SRI computer to stop. The software failed, and both this system and the backup system shut down. The launcher then veered off its flight path, broke up and disintegrated.
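The conversion problem described above can be illustrated with a minimal Python sketch. It is not the Ada flight code: the function names and the behavior chosen for out-of-range input (clamping and flagging) are assumptions made for illustration. The only facts taken from the text are the 64-bit floating-point source and the 16-bit signed integer target.

```python
import math

INT16_MIN = -2**15      # -32768, smallest 16-bit signed integer
INT16_MAX = 2**15 - 1   #  32767, largest 16-bit signed integer


def convert_trapping(value: float) -> int:
    """Model of an unprotected conversion: out-of-range input raises an error."""
    result = int(value)  # truncates toward zero; raises for NaN or infinity
    if result < INT16_MIN or result > INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def convert_protected(value: float) -> tuple[int, bool]:
    """Model of a protected conversion: check the range first, never raise.

    Returns the converted value and a flag that is False when the input
    could not be represented and a substitute value was returned instead.
    """
    if math.isnan(value):
        return 0, False
    # Truncation toward zero keeps anything strictly between
    # INT16_MIN - 1 and INT16_MAX + 1 inside the 16-bit range.
    if value >= INT16_MAX + 1:
        return INT16_MAX, False
    if value <= INT16_MIN - 1:
        return INT16_MIN, False
    return int(value), True
```

Under these assumptions, `convert_protected(40000.0)` returns `(32767, False)`, whereas `convert_trapping(40000.0)` raises `OverflowError`. Whether clamping is an acceptable substitute depends on how the value is used downstream, which is a design decision the sketch does not settle.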
3. Consideration of standards
Standards are a powerful tool to support innovation and increase productivity for organizations of all sizes. [6] Standards also exist for software design. In the circumstances described, standards make positive contributions but also have shortcomings.
The positive contributions are:
- Standards build an international consensus on terminology, which makes technology transfer easier and safer. They play an important part in advancing new technologies. [7]
- Standards are a respected badge of quality. If a supplier conforms to standards, the customer is reassured.
- Standards are powerful marketing tools. [6]
- Standards are a widely understood form of environmental policy.
The shortcomings of standards are:
- Effective standards require frequent changes in response to rapidly changing circumstances. In practice, however, the law often does not keep up with the pace of change.
- Penalties for violating standards are often too low, and enforcement has been reduced.
- Financial costs may be high.
- There may also be political costs if the standards are rigorous and businesses are affected. [8]
4. Risk & hazard analysis review
“A hazard is a situation in which there is actual or potential danger to people or to the environment.”
To improve the safety of a system, perhaps the most important step is to determine the ways in which it can cause harm. Once these hazards have been found, their importance can be assessed. If they are significant, appropriate steps can be taken to eliminate them or mitigate their effects.
Hazard analysis plays an important part in the development of any embedded computer system. Its results affect not only the system design but also the development method. Hazard analysis must therefore be carried out at the beginning of the project, and its results have a great influence on all aspects of the project. In fact, hazard analysis is concerned not only with the characteristics of the system but also with the details of its design. Therefore, even when an initial analysis indicates that the system is safe, hazard analysis should continue throughout the development process. [9]
“Risk is a combination of the frequency or probability of a specified hazardous event, and its consequence.” [10]
Risk analysis is a good general yardstick for judging the effectiveness of a security design. Because about 50 percent of security issues come from design weaknesses, risk analysis at the design stage plays an important role in a software security program. The risk analysis process is continuous and applies at a number of different levels: it identifies system-level vulnerabilities, assigns probability and impact, and determines reasonable mitigation strategies. [11]
The terms risk analysis and hazard analysis are sometimes used interchangeably, but there is one important difference. Hazard analysis covers only the identification of hazards and the assessment of hazard level. Risk analysis adds the identification and assessment of environmental conditions, along with exposure or duration. Hazard analysis is therefore a subset of risk analysis. [12]
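The quoted definition of risk as a combination of probability and consequence can be made concrete with a small sketch. Multiplying the two is one common way to combine them, not the only one, and the hazards, probabilities and severity scores below are invented purely for illustration. None of them come from the Ariane 5 analysis.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hazard:
    name: str
    probability: float  # assumed likelihood per mission, 0.0 to 1.0
    severity: int       # assumed consequence score, 1 (minor) to 5 (catastrophic)

    @property
    def risk(self) -> float:
        # One simple combination rule: probability multiplied by severity.
        return self.probability * self.severity


def rank_by_risk(hazards: Iterable[Hazard]) -> List[Hazard]:
    """Return the hazards ordered from highest to lowest risk score."""
    return sorted(hazards, key=lambda h: h.risk, reverse=True)


# Hypothetical hazard register, for illustration only.
HAZARDS = [
    Hazard("ground display glitch", probability=0.15, severity=1),
    Hazard("sensor value out of range", probability=0.05, severity=5),
    Hazard("telemetry dropout", probability=0.10, severity=2),
]
```

With these made-up figures, `rank_by_risk(HAZARDS)` places the rare but catastrophic hazard first, which is the kind of prioritization the analysis is meant to support.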
5. What should have been done?
This part analyzes what could have been done more successfully overall and what could have helped avoid the failure.
1) On the testing side, the failure of Ariane 5 Flight 501 shows that validation, verification and testing were not sufficient. [13] One remedy is to set up a team that prepares the procedure for qualifying software, proposes strict rules for confirming such qualification, and ensures that the specification, verification and testing of software are of consistently high quality in the Ariane 5 programme. Including external RAMS experts should be considered. A test facility should also be prepared that includes as much real equipment as is technically feasible, so that realistic input data can be injected and complete, closed-loop system testing performed. Complete simulations must take place before any mission, and high test coverage has to be obtained. [14]
2) On the programming side, the software had been reused from the Ariane 4 launch vehicle. An incorrectly handled software exception resulted from a data conversion from a 64-bit floating-point value to a 16-bit signed integer, which overflowed. Ariane 5 reused the Ariane 4 code even though its flight path was different. The technique for keeping code and its justifications consistent should be improved. Better programming practice would have prevented this failure. [13]
3) On the design side, there were also problems that should be addressed. Because the system design specification accounted only for random hardware failures, the exception-handling mechanism could not recover from a software error. The correctly functioning processor of the inertial reference system was shut down, and the backup processor was shut down in the same way. Preventing software exceptions from halting hardware units that were functioning correctly would have avoided the failure. [13] No sensor, such as the inertial reference system, should be allowed to stop sending best-effort data. [14]
4) Review existing equipment and expand the scope of testing wherever it is considered necessary. Include external participants when reviewing specifications, code and justification documents. Make sure that these reviews consider the substance of the arguments, rather than merely checking that verifications have been made. All flight software, including embedded software, should be reviewed. Identify all implicit assumptions made by the code and its justification documents about the values of quantities provided by the equipment. [14]
5) Establish an improved and systematic two-way flow of information:
- Up from the equipment to the system: nominal and failure modes of behavior.
- Down from the system to the equipment: in-flight use of items of equipment. [15]
A more stringent design could have prevented such a programming error. Better requirements analysis and specification would have localized the need for more design verification. Improved project management could have provided a more effective organizational process and recognized the influence of changing requirements on system functions and behavior.
Generally speaking, improved programming and design techniques, as well as better project management, may help prevent such failures from occurring. However, Anthony Finkelstein and John Dowell indicated that for large software systems, the complexity of the issues and solutions is such that the real reason for failure is usually systemic. [13]
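The recommendation that a sensor should keep sending best-effort data rather than shut down can be sketched as follows. This is a hypothetical illustration, not the SRI design: the class, the field names and the choice of repeating the last good value are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reading:
    value: int
    valid: bool  # False marks best-effort data the consumer should treat with care


def strict_int16(raw: float) -> int:
    """Conversion that raises when the result leaves the 16-bit signed range."""
    result = int(raw)  # raises ValueError for NaN, OverflowError for infinity
    if not -32768 <= result <= 32767:
        raise OverflowError("value exceeds 16-bit signed range")
    return result


class BestEffortSensor:
    """Wraps a computation so that an exception degrades output instead of halting."""

    def __init__(self, compute: Callable[[float], int], initial: int = 0):
        self._compute = compute
        self._last_good = initial
        self.fault_count = 0

    def read(self, raw: float) -> Reading:
        try:
            value = self._compute(raw)
        except (OverflowError, ValueError):
            # Record the fault and keep reporting: repeat the last good
            # value, flagged as invalid, rather than stopping the unit.
            self.fault_count += 1
            return Reading(self._last_good, valid=False)
        self._last_good = value
        return Reading(value, valid=True)
```

A consumer of `BestEffortSensor(strict_int16).read(...)` always receives a `Reading` and can decide how to treat flagged values. Repeating the last good value is only one possible fallback, and a real system would need to justify that choice against its own hazard analysis.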
6. What lessons have (or should have) been learned
The Ariane 5 Flight 501 failure was a direct result of a software failure. However, it was a symptom of a more general system validation failure. Many lessons can be learned from this case:
- Test not only what the system should do but also what it should not do. [16]
- Do not run software in critical systems unless it is actually necessary. [16] For example, the alignment function of the inertial reference system should be switched off immediately after lift-off. No software function should run during flight unless it is needed. [14]
- Hazard and risk analysis play an important role in system design. They must be carried out at the beginning of the project and should continue throughout the development process.
- Where possible, use the real equipment rather than a simulation.
- Reviews should include external participants and should examine all the assumptions made in the code. During the development process, the design and code of all software should be reviewed for problems.
- The designers of Ariane 5 made a critical and elementary error. [16]
- When code is reused from another system, make sure that any existing rules or assumptions are still valid. It should be possible to trace from requirements to code. Reused code still needs to be tested. It may contain fewer errors if it has worked before, but a changed context of reuse may expose new or old errors. [17]
- The failure of a single component should be isolated: it must not cause the entire system to fail. This is an important principle of system design. [16]
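Two of the lessons above, testing what the system should not do and re-validating the assumptions behind reused code, can be combined in one hedged sketch. The signal names, limits and mission profiles are made-up values and the function name is hypothetical. The point is only that documented assumptions can be written down as data and checked against a new operating profile before reuse.

```python
from typing import Dict, List

# Hypothetical limits that the reused code was originally justified against.
ASSUMED_LIMITS: Dict[str, float] = {"signal_a": 32767.0, "signal_b": 500.0}


def check_reuse(profile: Dict[str, List[float]],
                limits: Dict[str, float]) -> Dict[str, float]:
    """Return, per signal, the worst sample that breaks its documented limit.

    An empty result means every sample stays inside the assumed envelope.
    A signal with no documented limit is treated as an error, because an
    unstated assumption cannot be re-validated.
    """
    broken: Dict[str, float] = {}
    for name, samples in profile.items():
        if name not in limits:
            raise KeyError(f"no documented assumption for {name!r}")
        worst = max(samples, key=abs, default=0.0)
        if abs(worst) > limits[name]:
            broken[name] = worst
    return broken


# Invented profiles: the old mission fits the envelope, the new one does not.
OLD_PROFILE = {"signal_a": [1200.0, 8000.0, 21000.0], "signal_b": [10.0, -40.0]}
NEW_PROFILE = {"signal_a": [1500.0, 19000.0, 41000.0], "signal_b": [10.0, -40.0]}
```

Here `check_reuse(OLD_PROFILE, ASSUMED_LIMITS)` returns an empty dictionary, while the same call with `NEW_PROFILE` reports `signal_a`. A check of this kind is a negative test: it passes only when the input that must not occur is shown not to occur.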
7. References
[1] Ariane 5 Flight 501 Failure, Report by the Inquiry Board [Accessed on 18 March 2010]
[2] “Ariane5” [Accessed on 18 March 2010]
[3] “Ariane5” [Accessed on 18 March 2010]
[4] Ariane 5 Flight 501 Failure, Report by the Inquiry Board [Accessed on 12 April 2010]
[5] “History's Worst Software Bugs” [Accessed on 12 April 2010]
[6] “What are the benefits of standards?” [Accessed on 16 April 2010]
[7] “Discover ISO” [Accessed on 16 April 2010]
[8] “Advantages and Disadvantages of Standard” [Accessed on 16 April 2010]
[9] Neil Storey, “Safety Critical Computer Systems”, Published 1996, Chapter 3 - Hazard analysis [Accessed on 22 April 2010]
[10] Neil Storey, “Safety Critical Computer Systems”, Published 1996, Chapter 4 - Risk analysis [Accessed on 22 April 2010]
[11] “Risk Analysis in Software Design” [Accessed on 22 April 2010]
[12] Nancy Leveson, “Safeware”, Published 1995, Chapter 9 - Terminology [Accessed on 22 April 2010]
[13] Nuseibeh, Bashar, “Ariane 5: Who Dunnit?” [Accessed on 23 April 2010]
[14] Ariane 5 Flight 501 Failure, Report by the Inquiry Board [Accessed on 23 and 24 March 2010]
[15] [Accessed on 23 April 2010]
[16] “The Ariane 5 Launcher Failure” [Accessed on 24 April 2010]
[17] “Software Engineering: Ariane 5” [Accessed on 26 April 2010]