Software Assurance Measurement – Establishing a Confidence that Security is Sufficient


Posted: November 2, 2017 | By: Dr. Carol C. Woody, Dr. Robert J. Ellison

Measuring the software assurance of a product as it functions within a specific system context involves assembling carefully chosen metrics that demonstrate a range of behaviors, establishing confidence that the product functions as intended and is free of vulnerabilities. The first challenge is to establish that the requirements define the appropriate security behavior and that the design addresses these security concerns. The second challenge is to establish that the completed product, as built, fully satisfies the specifications. Measures to provide assurance must, therefore, address requirements, design, construction, and test. We know that software is never defect free. According to Jones and Bonsignour, the average defect level in the U.S. is 0.75 defects per function point, or 6,000 per million lines of code (MLOC) for a high-level language (1). Very good levels would be 600 to 1,000 defects per MLOC, and exceptional levels would be below 600 defects per MLOC. Thus, software cannot always function perfectly as intended. Additionally, our research indicates that 5% of defects should be categorized as vulnerabilities, so we cannot establish that software is completely free of them. However, we can collect sufficient measures to establish reasonable confidence that security is sufficient. Security measures are not absolutes, but we can gather information indicating that security has been appropriately addressed in requirements, design, construction, and test to establish confidence that security is sufficient.
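The arithmetic behind these figures can be sketched as follows. The ratio of lines of code to function points is not stated in the article; roughly 125 logical source lines per function point for a high-level language is assumed here to make the numbers line up.

```python
# Back-of-the-envelope defect-density arithmetic from the figures above.
# ASSUMPTION: ~125 logical source lines per function point for a
# high-level language (this ratio is not stated in the article).

defects_per_fp = 0.75      # U.S. average (Jones and Bonsignour)
loc_per_fp = 125           # assumed lines of code per function point
MLOC = 1_000_000

defects_per_mloc = defects_per_fp / loc_per_fp * MLOC
print(defects_per_mloc)    # 6000.0 -- matches the 6,000/MLOC figure

# If ~5% of defects are vulnerabilities, an average codebase carries:
vulns_per_mloc = defects_per_mloc * 0.05
print(vulns_per_mloc)      # 300.0 vulnerabilities per MLOC
```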



There is always uncertainty about a software system’s behavior. Rather than performing exactly the same steps repeatedly, most software components function within a highly complex networked and interconnected system of systems that changes constantly. A measure of the design and implementation is the confidence we have that the delivered system will behave as specified. Determining that level of confidence is an objective of software assurance, which is defined by the Committee on National Security Systems (2) as

Software Assurance: Implementing software with a level of confidence that the software functions as intended and is free of vulnerabilities, either intentionally or unintentionally designed or inserted as part of the software, throughout the lifecycle.

At the start of development, we have only a general knowledge of the operational and security risks that might arise, as well as of the security behavior desired when the system is deployed, so we have a limited basis for establishing confidence in the behavior of the delivered system. Over the development lifecycle, as the details of the software incrementally take shape, we need to incrementally increase our confidence level to eventually confirm that the system will achieve the desired level of software assurance. For example, an objective of the Department of Defense (DoD) milestone reviews is to identify issues that represent risks that could adversely affect our confidence that the deployed system will function as intended.

One practice that can be performed is to analyze the software using a series of vulnerability analysis tools and remove detected vulnerabilities from the code. We know from experience that the tools will only show a small portion of the existing vulnerabilities, so how do we measure the improved confidence?

A comparison of software and hardware reliability provides some insight into challenges for managing software assurance. For hardware reliability, we can use statistical measures, such as the mean time between failures (MTBF), since hardware failures are often associated with wear and other errors that are frequently eliminated over time. The lack of hardware failures increases our confidence in a device’s reliability.

As noted in the 2005 Department of Defense Guide for Achieving Reliability, Availability, and Maintainability (RAM), a lack of software defects is not necessarily a predictor of improved software reliability. The software defect exists when the software is deployed; the failure results from the occurrence of an unexpected operating condition. The DoD RAM guide cited too little reliability engineering as a key reason for reliability failures. This lack of reliability engineering was exhibited by

  • failure to design in reliability early in the development process
  • reliance on predictions (use of reliability defect models) instead of conducting engineering design analysis

The same reasoning applies to software assurance. We need to engineer software assurance into the design of a software system. We have to go beyond just identifying defects and vulnerabilities towards the end of the lifecycle and evaluate how the engineering decisions made during requirements and design affect the injection or removal of security defects. For example, the Common Weakness Enumeration (CWE) is a list of over 900 software weaknesses that have resulted in software vulnerabilities exploited by attackers. Many of these weaknesses can be associated with poor acquisition or development practices.

Define the Software Assurance Target

All good engineering and acquisition starts with defined requirements, and software assurance is no different. We must define the specific software assurance goal for the system. From that goal we can identify ways in which the engineering and acquisition will ensure, through policy, practices, verification, and validation, that the goal is addressed.

If the system we are delivering is a plane, then our stated software assurance goal might be “mission-critical and flight-critical applications executing on the plane or used to interact with the plane from ground stations will have low cybersecurity risk.”

To establish that we are meeting this goal, a range of evidence can be collected from the following: milestone reviews, engineering design reviews, architecture evaluations, component acquisition reviews, code inspections, code analysis and testing, and certification and accreditation. Is what we have always been doing sufficient or do we need to expand current practice? We can use the Software Assurance Framework (SAF), a baseline of good software assurance practice for government engineers assembled by the SEI, to confirm completeness and identify gaps in current practices (3).

Software Assurance Framework (SAF)

The SAF defines important cybersecurity practices for the following four categories: process management, project management, engineering, and support. Each category comprises multiple areas of cybersecurity practice. In the SAF, a set of cybersecurity practices is defined for each area and relevant acquisition and engineering artifacts are documented for each of these cybersecurity practices. An evaluator can look for evidence that a cybersecurity practice has been implemented by examining the artifacts related to that practice. In the next section of this article, we will show how measurements can be linked to these cybersecurity practices. While the same approach can be applied to all four practice categories, we will focus the remainder of our efforts in this article on engineering.

Improving Assurance

As noted in the introduction, we have to go beyond just identifying defects and vulnerabilities towards the end of the lifecycle and evaluate how the engineering decisions made during design and requirements affect the injection or removal of security defects.

Justifying Sufficient Cybersecurity Using Measurement

Measurement is a mechanism for understanding and controlling software processes and products and the relationships between them. It helps in making intelligent decisions that lead to improvement over time and is essential for acquisition and development management.

A formal engineering review requires more than a description of a design. Security depends on identifying and mitigating potential faults, and a design review should verify that faults associated with important operational risks have been identified and mitigated by specific design features. We need to document the rationale behind system design decisions and provide evidence that supports that rationale. Such documentation is called an assurance case. It does not imply any kind of guarantee or certification. It is simply a way to document rationale behind system design decisions. Metrics can provide evidence that justifies the assurance associated with design decisions.

Assurance case (9): a documented body of evidence that provides a convincing and valid argument that a specified set of critical claims about a system’s properties are adequately justified for a given application in a given environment.

If something is important, it warrants figuring out a way to measure it. Effective measurements require planning to determine what to measure and what the measures reveal. Tracking results aids understanding of whether efforts are achieving intended outcomes.

Software assurance metrics have to evaluate the engineering practices as well as the security of the product. Answers to the questions shown in Table 1 provide evidence that the engineering applied improved security.

Table 1: Engineering Questions


Was applicable engineering analysis incorporated in the development practices?

When multiple practices are available, have realistic trade-offs been made between the effort associated with applying a technique and the improved quality or security that is achieved (i.e., the efficiency and effectiveness of the techniques relative to the type of defect)?

How well was the engineering done?

Results applied: Was engineering analysis effectively incorporated into lifecycle development?

It is essential to link a measure to a development practice. Consider product measures such as the number of defects per million lines of code (MLOC) or the output of static analysis of the source code: how should a program respond if the number of defects per MLOC appears to be too high, or if static analysis reports a significant number of warnings or errors?

We use the Goal/Question/Metric (GQM) paradigm to establish a link between mission goals and engineering practices. The GQM approach was developed in the 1980s as a structuring mechanism and is a well-recognized and widely used metrics approach. For example, a top-level goal shared among all efforts to build security into software development is to identify and mitigate the ways that a software component could be compromised. Such a goal cuts across all phases of the acquisition and development lifecycles. As noted in the introduction, measures that provide assurance must address requirements, design, construction, and test. We can identify supporting sub-goals associated with these primary acquisition/development lifecycle activities, which may be repeated at various phases across the lifecycle.

  • Requirements: Manage requirements for software security risks.
  • Architecture through design: Incorporate security controls and mitigations to reduce security risks for design of all software components.
  • Implementation: Minimize the number of vulnerabilities inserted during coding.
  • Testing, validation, and verification: Test, validate, and verify software security risk mitigations.

For each of these engineering sub-goals we will explore relevant practices, outputs, and metrics that could be used to establish, collect, and verify evidence. Since each project is different in scope, schedule, and target assurance, actual implemented choices will need to vary.
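The GQM decomposition of the top-level goal into the sub-goals above can be captured in a simple data structure. A minimal sketch follows; the specific questions and metrics shown are illustrative placeholders drawn from the discussion in this article, not entries prescribed by the GQM paradigm.

```python
# Illustrative GQM (Goal/Question/Metric) decomposition for the
# engineering sub-goals listed above. Questions and metrics are
# example entries only, not a prescribed set.

gqm = {
    "goal": ("Identify and mitigate the ways a software component "
             "could be compromised"),
    "subgoals": {
        "Requirements": {
            "question": "Does requirement management sufficiently "
                        "incorporate security analysis?",
            "metrics": ["number of TBD/TBA security requirements",
                        "% of security risks with a mapped requirement"],
        },
        "Architecture through design": {
            "question": "Are controls and mitigations mapped to "
                        "identified design weaknesses?",
            "metrics": ["% of prioritized weaknesses with a mitigation"],
        },
        "Implementation": {
            "question": "Are coding guidelines enforced by analysis?",
            "metrics": ["static analysis findings per MLOC"],
        },
        "Testing, validation, and verification": {
            "question": "Are security risk mitigations covered by tests?",
            "metrics": ["% of mitigations exercised by tests"],
        },
    },
}

for phase, entry in gqm["subgoals"].items():
    print(f"{phase}: {entry['question']}")
```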

Sub-Goal: Requirement management sufficiently incorporates security analysis.

We should consider whether we can demonstrate sufficient assurance for a requirement as we write it. For example, consider requirements that address adverse security events for an unmanned aerial vehicle (UAV). Could a non-authorized actor take control of that device, or could communications with that device be disrupted? The requirement that a UAV only acts on unmodified commands received from the ground station addresses the first event, and using security controls such as encryption, we can demonstrate full assurance for that requirement. We do not have engineering techniques to guarantee continued operation during a wireless network denial of service (DoS), but we can demonstrate assurance for a requirement that specifies the actions a UAV should take to protect itself during a DoS attack (4).

Practice: Security risk assessment. Conduct an engineering-based security risk analysis that includes the attack surface (those aspects of the system that are exposed to an external agent) and abuse/misuse cases (potential weaknesses associated with the attack surface that could lead to a compromise). Following Microsoft’s naming convention in (5), this activity is often referred to as threat modeling.

Outputs: Output specificity depends on the lifecycle phase. An initial risk assessment might only note that the planned use of a commercial database manager raises a specific vulnerability risk that should be addressed during detailed design, whereas the risk assessment associated with that detailed design should recommend specific mitigations to the development team. Testing plans should cover high-priority weaknesses and proposed mitigations.

GQM outputs represent what an acquisition wants to learn. Examples of useful outputs appear in Table 2.

Table 2: Outputs

recommended reductions in the attack surface to simplify development and reduce security risks

prioritized list of software security risks

prioritized list of design weaknesses

prioritized list of controls/mitigations

mapping of controls/mitigations to design weaknesses

prioritized list of issues to be addressed in testing, validation, and verification

We need to evaluate the output of the engineering practices. The outputs of a security risk assessment are very dependent on the experience of the participants as well as on constraints imposed by cost and schedule. Missing likely security faults or performing poor mitigation analysis increases operational risks and future expenses. The rework effort to correct requirement and design problems in later phases can be as high as 300 to 1,000 times the cost of in-phase correction, and undiscovered errors likely remain after that rework (10).

Practice: Conduct reviews (e.g., peer reviews, inspections, and independent reviews) of software security requirements.

Output: Issues raised in internal reviews

The analysis of the differences arising in the outputs should answer the questions shown in Table 3.

Table 3: Technical Review Analysis

What has not been done: number, difficulty, and criticality of “to be determined” (TBD) and “to be added” (TBA) items for software security requirements

Where there are essential inconsistencies in the analysis and/or mitigation recommendations: number/percentage, difficulty, and criticality of the differences

Where insufficient information exists for a proper security risk analysis. Examples include emerging technologies and/or functionality where there is limited history of security exploits and mitigation

The Heartbleed vulnerability is an example of a design flaw (6). The assert function accepts two parameters, a string S and an integer N, and returns a substring of S of length N. For example, assert(“that”, 3) returns “tha”. The vulnerability occurs with calls where N is greater than the length of S: assert(“that”, 500) returns a string starting with “that” followed by 496 bytes of memory data stored adjacent to the string “that”. Such calls would enable an attacker to view what should be inaccessible memory locations. The input data specification, that the value of N must be less than or equal to the length of the string, was never verified. This is CWE-135, one of the SANS Top 25 vulnerabilities, and it should have been discovered during design reviews.

Practices that support answers to the engineering questions in Table 1 (examples shown in Table 1a below) should provide sufficient evidence to justify the claim that the Heartbleed vulnerability has been eliminated.

Table 1a: Practices/Outputs for Evidence Supporting Table 1 Questions

Practice: Threat modeling. Output: Software risk analysis identifies input data risks, with input verification as the mitigation.

Practice: Design. Output: Input data verification is a design requirement.

Practice: Software inspections. Output: Inspections confirm that the implementation verifies input data.

Practice: Testing plans include invalid input data. Output: Test results show the mitigation is effective for supplied inputs.
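The missing input verification behind a Heartbleed-style flaw, and the mitigation that the practices above would produce as a design requirement, can be sketched as follows. The function and its names are illustrative; this is not OpenSSL code.

```python
# Sketch of a Heartbleed-style flaw and its mitigation. The function
# name and signature are illustrative, not OpenSSL's actual code.

def heartbeat_response(payload: bytes, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes of the supplied payload."""
    # Mitigation: verify the input data specification (N <= len(S))
    # before trusting the attacker-supplied length field.
    if claimed_len > len(payload):
        raise ValueError("claimed length exceeds actual payload length")
    return payload[:claimed_len]

print(heartbeat_response(b"that", 3))   # b'tha' -- valid request
try:
    heartbeat_response(b"that", 500)    # would have leaked adjacent memory
except ValueError as e:
    print("rejected:", e)
```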

Sub-goal (Architecture through design): Incorporate security controls and mitigations to reduce security risks for design of all software components.

This sub-goal describes the security objective for the architectural and design phases of a development lifecycle. The outputs of this phase of development are shown in Table 4.

Practice: Security risk assessment as applied to the architecture and design.

Table 4: Architectural and Design Outputs

prioritized list of design weaknesses

prioritized list of controls/mitigations

mapping of controls/mitigations to design weaknesses

The assurance question is whether an acquisition should accept a developer’s claim that these outputs provide sufficient security.

Potential security defects can arise with any of the decisions for how software is structured, how the software components interact, and how those components are integrated. An identification of such weaknesses has to consider both the security and functional architectures and designs. The security architecture provides security controls such as authentication, authorization, auditing, and data encryption, and the functional/system architecture describes the structure of the software that provides the desired functionality.

The functional rather than the security architecture is the more likely source of security defects. The security architecture is typically designed as an integrated unit with well-defined external interfaces. The functional architecture is increasingly likely to include commercial software with vague or unknown assurance characteristics. Commercial product upgrades can significantly change assurance requirements, interfaces, and functionality. Systems are rarely self-contained and by design accept input from independently developed and managed external systems. Such diversity increases the difficulty of identifying and prioritizing security risks and increases the importance of independent reviews.

Identifying and mitigating architectural and design weaknesses depends on a developer not only using good engineering practices and strong tools but also on that developer’s understanding of the attack patterns that could be used to exploit design, architecture, and features incorporated in the proposed system.

SQL (Structured Query Language) injections provide a good instance of where attack-pattern analysis is required to mitigate exploits and how proactive application of the resulting knowledge can reduce the opportunities for design vulnerabilities. The choice of a mitigation depends on the nature of the required queries. Mitigation for open-ended queries likely requires the use of a vetted query library. Where we have a set of well-specified queries, such as an account-balance query, a mitigation using a parameterized query can be effective. Such a mitigation would verify that the value constructed for the variable customer_name meets the database specification for a name.
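A parameterized query for the account-balance case can be sketched as follows, using Python's built-in sqlite3 module. The table and column names are illustrative; the point is that attacker-supplied text is bound as a value and cannot alter the SQL statement structure.

```python
# Parameterized-query mitigation for SQL injection. Table and column
# names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer_name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

def account_balance(customer_name: str) -> list:
    # The ? placeholder binds customer_name as data, so an injection
    # attempt such as "alice' OR '1'='1" is treated as a literal name.
    cur = conn.execute(
        "SELECT balance FROM accounts WHERE customer_name = ?",
        (customer_name,))
    return cur.fetchall()

print(account_balance("alice"))             # [(100.0,)]
print(account_balance("alice' OR '1'='1"))  # [] -- injection fails
```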

Evidence: Many of the secure design metrics shown in Table 5 assess how well attack knowledge was incorporated in the architectural and design activities. For a security assessment to be effective, the results have to be documented and disseminated across a development team. Outputs from the developer’s design activities should include reports on security risk analysis, mitigation analysis and selection, design inspections, and implementation and testing guidance. An acquisition should be able to find the evidence determining assurance from that collection of development documentation.

The question for an acquirer is whether the assembled evidence demonstrates that an assurance case for a claim that SQL injections have been sufficiently mitigated could be constructed at the completion of development.

Table 5a: Practices/Outputs for Evidence Supporting Table 5 Secure Design Measures

Practice: Threat modeling. Output: Software risk analysis identifies input data risks, with input verification as the mitigation.

Practice: Mitigation selection. Output: Vetted library and parameterized queries are widely used and effective mitigations.

Practice: Coding guidance is provided for the chosen mitigation. Outputs: Vetted library: a software scan is proposed to verify that the library has been correctly installed and used (7). Parameterized queries: guidance to verify usage is provided for source code inspections.

Practice: Testing. Output: Testing plans incorporate dynamic testing for SQL injections.

Metrics: Some measures for incorporating security into the architecture and design practices are shown in Table 5.

Table 5: Secure Design Measures

Has appropriate security experience been incorporated into design development and reviews?

Have security risks associated with security assumptions been analyzed and mitigated?

Has attack knowledge identified the threats that the software is likely to face and thereby determined which architectural and design features to avoid or to specifically incorporate?

Have mitigations for the incorporated architectural and design features been guided by attack knowledge?

Have security functionality and mitigations been incorporated into the application development guidance?

Has guidance been provided for coding and testing?

Sub-Goal: Minimize the number of vulnerabilities inserted during coding.

Practices: Differences in efficiency and effectiveness encourage the use of a combination of inspections, static analysis, and testing to remove defects from existing code (8). Testing is covered in the following section. The effectiveness of inspections is very dependent on the experience of the participants, while the effectiveness of static analysis depends on the quality of the rules that are applied and the interpretation of the output.

The choice of a practice can depend on the type of vulnerability. Design flaws should be identified during design reviews and inspections, not by static analysis applied to the implementation. Static analysis tools can be more effective than inspections for identifying potential weaknesses in data flows that involve multiple software components at integration.


Table 6: Coding Outputs

prioritized list of coding weaknesses associated with programming languages used and application domain

static analysis results associated with identified coding weaknesses

mitigation of identified coding weaknesses

code inspection results

Improving security during design concentrates on proactively analyzing compromises as the design is created rather than after its completion. A developer can be similarly proactive with coding by creating and enforcing guidelines that eliminate a number of coding vulnerabilities. For example, buffer overflows too often occur with a subset of the text string processing functions in the C programming language. Coding guidelines can prohibit the use of that subset of functions, and C compilers can enforce those guidelines by rejecting code that uses those functions.
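A guideline of this kind can be sketched as a simple scanner. In practice a project would rely on compiler options or a static analysis tool for enforcement; this illustrative checker just flags calls to a hypothetical banned-function list in C source text.

```python
# Minimal sketch of enforcing a coding guideline that bans unsafe C
# string functions. The BANNED list is an illustrative example of such
# a guideline; real enforcement would use compiler or tool support.
import re

BANNED = ("strcpy", "strcat", "sprintf", "gets")  # common overflow sources

def check_banned_calls(c_source: str) -> list:
    """Return (line_number, function_name) pairs for banned calls."""
    findings = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for fn in BANNED:
            # \b avoids false positives such as snprintf for sprintf.
            if re.search(rf"\b{fn}\s*\(", line):
                findings.append((lineno, fn))
    return findings

code = 'char buf[8];\nstrcpy(buf, user_input);\nsnprintf(buf, 8, "%s", s);\n'
print(check_banned_calls(code))   # [(2, 'strcpy')]
```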

Metrics: The effectiveness of static analysis is also affected by the trade-offs made by the tool designer regarding the time required for the analysis, the expert help required to support the tool’s analysis, and the completeness of the analysis. Most static analysis tools use heuristics to identify likely vulnerabilities and to allow completion of their analysis within useful times. Thus static analysis is almost always incomplete, and rather than flagging flaws automatically, the output of such tools frequently serves as an aid that helps an analyst zero in on security-relevant portions of code and find flaws more efficiently.

Table 7: Static Analysis Measures

If coding guidelines are used, has static analysis that enforces those guidelines been universally applied?

Are the tools used and the rules applied appropriate for the coding risks identified during the security risk analysis?

If formal inspections and in-depth static analysis have been applied to only a subset of the components, does that coverage include the code that manages the critical risks identified by security risk analysis?

Have the cross-component data flows with security risks been subject to static and dynamic analysis?

Sub-Goal: Test, validate, and verify software security risk mitigations.

Practice: In some instances, testing is the only time that dynamic analysis is applied. Some kinds of security problems are easier to detect dynamically than statically, especially problems with remote data and control flow. Testing can also supplement static analysis in confirming that the developers did not overlook insecure programming practices.

Outputs: Output includes testing plans, outputs of tests, and analysis of the results. For example, distinguishing true vulnerabilities from those that cannot be exploited can be easier when dynamic analysis and static analysis are combined.

Security requirements can be positive in terms of what a system should do or negative in terms of what it should not do. The requirements for authentication, authorization, encryption, and logging are positive requirements and can be verified by creating the conditions in which those requirements are intended to hold true and confirming from the test results that the software meets them. A negative requirement states that something should never occur. To apply the standard testing approach to negative requirements, one would need to create every possible set of conditions, which is infeasible. Risk-based tests target the weaknesses and mitigations identified by the security risk analysis and can confirm the level of confidence we should have that the security risks have been sufficiently mitigated. Testing metrics are shown in Table 8.

Table 8: Testing Measures

What percentage of software security requirements are covered by testing?

What percentage of the security risk mitigations are covered by testing?

Does security testing include attack patterns that have been used to compromise systems with similar designs, functionality, and attack surfaces? Tests should be developed based on threats, vulnerabilities, and assumptions uncovered by the security analysis. For example, tests could be developed to validate specific design assumptions or the interfaces with external software components.

Have dynamic security analysis and security testing been applied to mitigations?

Has test coverage increased in risky areas identified by the analysis? For example, a specific component, data flow, or functionality may be more exposed to untrusted inputs, or the component may be highly complex, warranting extra attention.
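The contrast between verifying a positive requirement and risk-based testing of a negative requirement can be sketched as follows. The validate_length function is a hypothetical stand-in for a component that consumes an attacker-controllable length field; the hostile values are the kind a security risk analysis would identify.

```python
# Risk-based testing sketch: target weaknesses identified by the risk
# analysis rather than enumerating every input. validate_length is an
# illustrative stand-in for a component consuming a length field.

def validate_length(payload: bytes, claimed_len: int) -> bool:
    """Negative requirement: never accept a length beyond the payload."""
    return 0 <= claimed_len <= len(payload)

# Positive requirement: valid inputs are accepted.
assert validate_length(b"abcd", 4)

# Risk-based tests for the negative requirement: boundary and hostile
# values flagged by the risk analysis, not exhaustive enumeration.
for hostile in (5, 2**16, -1):
    assert not validate_length(b"abcd", hostile)

print("risk-based tests passed")
```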

Assembling the Software Assurance Case

Using the Software Assurance Framework (SAF), which provides a structure of best practices, we have created a line of sight from the software assurance goal to potential metrics that would provide evidence of how the goal is addressed through good software engineering practices. Each organization will need to select a starting set of evidence in which there is justification to invest time and effort, and these choices may vary by the type of technology product to be acquired or developed, since concerns for software assurance vary with usage.

At each technical review throughout the acquisition and development lifecycle, activity progress, outputs, and metrics should be reviewed and evaluated to confirm that progress is being made to address the sufficiency for software assurance. Each review should consider the security aspects of the solution as well as the functional capabilities of the target outcome. For each engineering review, the following result should be supported by the gathered evidence:

  • Initial Technical Review (ITR). Assess the capability needs (including security) of the materiel solution approach.
  • Alternative Systems Review (ASR). Ensure that solutions will be cost effective, affordable, operationally effective, and can be developed in a timely manner at an acceptable level of software security risk.
  • System Requirements Review (SRR). Ensure that all system requirements (including security) are defined and testable, and consistent with cost, schedule, risk (including software security risk), technology readiness, and other system constraints.
  • Preliminary Design Review (PDR). Evaluate progress and technical adequacy of the selected design approach.
  • Critical Design Review (CDR). Determine that detailed designs satisfy the design requirements (including software security) established in the specification and establish the interface relationships.


  1. Jones, Capers and Bonsignour, Olivier. The Economics of Software Quality. Addison-Wesley Professional, 2011.
  2. Committee on National Security Systems. National Information Assurance (IA) Glossary (CNSS Instruction No. 4009). Fort George G. Meade, MD, 2010.
  3. Alberts, Christopher J. and Woody, Carol C. Prototype Software Assurance Framework (SAF): Introduction and Overview (CMU/SEI-2017-TN-001). Software Engineering Institute, Carnegie Mellon University, 2017.
  4. Whalen, Michael W., Cofer, Darren, and Gacek, Andrew. Requirements and Architectures for Secure Vehicles. IEEE Software, Vol. 33, No. 4, 2016.
  5. Howard, Michael and Lipner, Steve. The Security Development Lifecycle. Microsoft Press, 2006.
  6. Carvalho, Marco, et al. Heartbleed 101. IEEE Security & Privacy, Vol. 12, No. 4, July-August 2014, pp. 63-67.
  7. Consortium for IT Software Quality. CISQ Specifications for Automated Quality Characteristic Measures. 2012.
  8. Jones, Capers. Software Quality in 2012: A Survey of the State of the Art. Namcook Analytics LLC, 2012.
  9. Kelly, Tim P. Arguing Safety: A Systematic Approach to Safety Case Management. DPhil Thesis, Department of Computer Science, University of York, 1999.
  10. Davis, Noopur and Mullaney, Julia. The Team Software Process (TSP) in Practice: A Summary of Recent Results (CMU/SEI-2003-TR-014). Software Engineering Institute, Carnegie Mellon University, 2003.
