Developmental Test, Evaluation and Assessment

Developmental Test & Evaluation of Artificial Intelligence Enabled Systems

Developmental Testing and Evaluation of Artificial Intelligence

The Developmental Testing and Evaluation (T&E) of Artificial Intelligence-enabled systems (AIES) poses significant challenges for the Department of Defense. This technology requires assessment methodologies and tools that are still emerging, while AI models and applications continue to evolve in more expansive ways for the Soldier.

The central difficulty in testing Artificial Intelligence (AI) is that comprehensive testing is no longer practical for many AI components and AI-enabled systems. Machine learning (ML) systems often exhibit chaotic changes in output in response to small input variations. Comprehensive testing of a system with a large state space is feasible only when the performance envelope can be described, allowing for interpolation between test points and extrapolation outside the tested region. Unfortunately, the absence of an underlying theory for ML systems makes this difficult. These systems are also designed to evolve continuously, so a fixed configuration for testing is a thing of the past. Under these conditions, traditional test design and strategy may be insufficient. This situation calls for an iterative approach to testing and assessment that continues even after fielding. Additionally, ML models are usually based on large, complex data sets, which themselves require careful evaluation.
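The sensitivity to small input variations described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for an ML component, and `flip_rate` is a hypothetical helper, not a DTE&A method or tool. The sketch estimates how often a small random perturbation of a test point changes the model's output, which hints at where interpolation between test points is likely to be unreliable.

```python
import random


def toy_model(x):
    """Hypothetical stand-in for an ML component: a two-input threshold classifier."""
    return 1 if (x[0] * 3.0 - x[1] * 2.0) > 0.5 else 0


def flip_rate(model, point, epsilon, trials=200, seed=0):
    """Fraction of small random perturbations of `point` that change the model output."""
    rng = random.Random(seed)  # fixed seed so the probe is repeatable
    baseline = model(point)
    flips = 0
    for _ in range(trials):
        # Perturb every input coordinate by at most +/- epsilon.
        perturbed = [v + rng.uniform(-epsilon, epsilon) for v in point]
        if model(perturbed) != baseline:
            flips += 1
    return flips / trials


# Illustrative test points; the middle one sits on the decision boundary.
test_points = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
for p in test_points:
    print(p, flip_rate(toy_model, p, epsilon=0.05))
```

A flip rate near zero suggests the neighborhood of that test point behaves consistently, while a high flip rate marks a region where results at one test point say little about nearby conditions. A real AIES would need domain-appropriate perturbations rather than uniform noise.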

This shift in testing dynamics impacts the relationship between 'test' (measurements) and 'evaluation' (assessments). While comprehensive testing and fixed configurations are unrealistic, the need for evaluation remains crucial. This requires a broader approach to ensure independent T&E evidence is gathered at the most effective points in the development process to provide the greatest insight. These approaches will require the T&E professional to interpret the applicability of results beyond the specific test conditions.

To address these challenges, DTE&A is exploring and developing new methods in collaboration with the DoD, academia, and the industrial base to advance practices for T&E of AIES. These include increased reliance on Modeling and Simulation (M&S), collaborating with contractors to streamline testing processes while protecting intellectual property, engaging with requirements communities to ensure feasibility, directly involving users in design considerations, and working closely with system designers to balance testing requirements with cost and schedule considerations.

These AIES advancements must be pursued while balancing risk and fiscal responsibility. Expanding testing scope may lead to higher costs and longer schedules, so it is critical to research and mature DT&E practices in collaboration with DoD stakeholders to ensure that timely and effective T&E approaches are applied.

Policy, Guidance and Emerging Guidance/Best Practices

DTE&A is collaborating with stakeholders, within and outside the DoD, to develop policy and guidance that support the T&E of AIES.

Policy

DTE&A has begun to address the challenge of formalizing policy for DT&E of AI. This structure is necessary to provide the Service-level T&E organizations with clear and attributable expectations, ensuring the most critical processes and resources can be put in place to enable effective T&E of AIES. Formally approved DTE&A policy is not yet available; in the meantime, please see the guidebook and emerging guidance.

Guidance: Developmental Test and Evaluation (DT&E) Of AI Guidebook

DTE&A has now published a guidebook that offers focused guidance and recommended practices for early and developmental test and evaluation (DT&E) of AI applications and AI-enabled systems. This guidebook builds upon the general T&E guidance provided by the Test & Evaluation Enterprise Guidebook, with a specific focus on the implications of AI models on test strategy, planning, preparation, execution, analysis, and reporting.

Emerging Guidance/Best Practices

An Assurance Case Framework for Trustworthy AI and Autonomy

There is widespread interest and concern within the DoD regarding the test, evaluation, verification, and validation (TEV&V) of military systems with autonomous capabilities. For such systems to be fielded and used, senior decision makers must be sufficiently confident in the systems’ trustworthiness (e.g., safety, security, reliability, effectiveness) to authorize deployment. It isn’t enough that the system be trustworthy, or even that the developers know that it is trustworthy. They must be able to demonstrate to certifying authorities and military leaders that the system is sufficiently trustworthy. Employing forces must also understand any operational limits needed to ensure dependability, such as restrictions on geographic locations, weather conditions, or other environmental factors.

To support these decisions, developers and testers will need to produce effective assurance cases. An assurance case is a structured argument that a system is sufficiently trustworthy to permit fielding in a specific range of operational contexts. Existing standards and regulatory bodies already require explicit assurance cases for complex systems with regard to safety, cybersecurity, and reliability.

DTE&A has funded researchers to develop a framework for structuring and executing assurance cases for systems with autonomous capabilities, and to understand the implications of this framework for TEV&V.

The framework developed includes specification of necessary assurance arguments, identification of key evidence needed to support those arguments, corresponding measurements to produce that evidence, implied instrumentation needs, and resulting test infrastructure requirements.
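As a rough illustration of how the elements listed above relate to one another, the following sketch models an assurance case as a tree of claims, each backed by evidence that names its measurement and instrumentation. The class names, fields, and example claims are all hypothetical and are not taken from the DTE&A-funded framework; the sketch only shows one way such a structure could be represented and queried.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    description: str
    measurement: str      # what must be measured to produce this evidence
    instrumentation: str  # what is needed to take that measurement
    collected: bool = False


@dataclass
class Claim:
    statement: str
    evidence: List[Evidence] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        """True when all evidence is collected and all subclaims are supported.

        A claim with neither evidence nor subclaims counts as unsupported.
        """
        if not self.evidence and not self.subclaims:
            return False
        return (all(e.collected for e in self.evidence)
                and all(c.is_supported() for c in self.subclaims))

    def open_measurements(self) -> List[str]:
        """Measurements still needed anywhere in this claim's subtree."""
        items = [e.measurement for e in self.evidence if not e.collected]
        for c in self.subclaims:
            items.extend(c.open_measurements())
        return items


# Hypothetical example case; claims and measurements are illustrative only.
obstacle = Evidence("Obstacle avoidance holds in rain",
                    measurement="miss distance per run",
                    instrumentation="range tracking sensors")
uptime = Evidence("Autonomy software runs without restart",
                  measurement="mean time between failures",
                  instrumentation="onboard event logger",
                  collected=True)
case = Claim("System is sufficiently trustworthy in the stated operational context",
             subclaims=[Claim("System is safe", evidence=[obstacle]),
                        Claim("System is reliable", evidence=[uptime])])

print(case.is_supported())
print(case.open_measurements())
```

Walking the tree for uncollected evidence yields the outstanding measurements, which in turn point to the instrumentation and test infrastructure still required.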

SEPTAR I

The Department of Defense (DoD) is making sizable investments in Artificial Intelligence (AI) Research and Development (R&D) and acquiring AI through programs of record. Ensuring proper process execution enables these investments to be realized, especially the processes that ensure effective evaluation of the intended AI-enabled system (AIES). SEPTAR (Systems Engineering Processes to Test AI Right) presents benefits and best practices for proactive planning for Test and Evaluation (T&E) activities for AIES. By following these best practices, AIESs are more likely to be delivered on time, to meet budgetary goals, and to perform effectively to meet mission expectations.

Three major themes are targeted:

Broadening the T&E continuum
Defining data needs for AIES up front
Evaluating the Systems Engineering Life Cycle (SELC) to inform AIES trustworthiness

Future State of T&E of AIES

While of great potential benefit, Artificial Intelligence (AI) presents new challenges and exacerbates some existing ones for the Department of Defense (DoD) Test and Evaluation (T&E) community. T&E professionals will need to work to ensure that the complex and variable nature of AI-enabled systems (AIES) can be sufficiently characterized by the boundaries of acceptable performance. To help the DoD understand and prepare for the challenges of T&E of AIES, we convened a group of AI adoption, AI development, and policy experts to develop a future vision of T&E with respect to AI. The result is a vision of T&E that incorporates the unique requirements for AIES, encompassing policy changes, user engagement approaches, measures and metrics, data, infrastructure, and cybersecurity. This future vision can be achieved by focusing efforts across DoD, academia, Federally Funded Research and Development Centers (FFRDCs), and industry on providing processes, policy/standards, tools, data, and infrastructure. Thus, we can assure a more feasible future for the T&E of AIES.