Test and evaluation – what is it and why do I care?

INSIGHT BY BELCAN

September 11, 2018 3:41 pm

This content is provided by Belcan

Test and Evaluation (T&E), what’s the big deal? I get in my car in the morning and it works. I pick up my cell phone and it connects almost every time. I mean, if the engineers are doing their job right, the systems work, right? In theory, sure. But as Yogi Berra said, “In theory there is no difference between theory and practice. In practice there is.” (https://www.brainyquote.com/quotes/yogi_berra_141506) As engineers, we follow design guidelines, best practices, and government and commercial standards when we design a system, whether it is for commercial, military, or other government agency use. Nevertheless, that doesn’t mean it will work in every situation. We see this every day: product recalls, computers that freeze and need rebooting (or, even worse, show the Blue Screen of Death!), news stories about autonomous vehicles involved in deadly accidents, and the list goes on.

Great, so we can’t design for every instance. Big deal, just use the system where it works, right? So where does the system work, and where does it not? That is where T&E comes in! When we train to use a system, the old adage in the Department of Defense (DoD) is “train as we fight.” That means we use systems, whether they are weapon systems or commercial products, in the environment we expect them to operate in. When we test a system, we push it to the edge of where we need it to operate. Say we design a computer server to operate in a controlled environment at 68 degrees Fahrenheit. We expect that the environment will actually vary between 55 and 80 degrees Fahrenheit, and in extreme situations we need the server to operate between 30 and 110 degrees Fahrenheit. So when we test the server, we want to see how far past those limits it can go and still operate. Can it operate at zero degrees Fahrenheit? At 150 degrees Fahrenheit? In extreme humidity? How long can it operate in these extremes? If a catastrophic event knocks out our environmental systems, how long can we continue to operate before we have to shut the system down? For a commercial enterprise, a shutdown like that can cost millions of dollars.
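To make the server example concrete, here is a minimal sketch of how a tester might lay out those temperature bands and pick chamber set points. The band limits (55 to 80, 30 to 110, and 0 to 150 degrees Fahrenheit) come from the example above; the names `Envelope`, `classify`, and `test_points`, along with the step size, are made up for illustration and do not come from any real test plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """A named temperature band in degrees Fahrenheit (limits inclusive)."""
    name: str
    low_f: float
    high_f: float

    def contains(self, temp_f: float) -> bool:
        return self.low_f <= temp_f <= self.high_f


# Bands from the server example in the text, ordered narrowest first.
# The band names are illustrative labels, not official terms.
ENVELOPES = [
    Envelope("expected", 55, 80),
    Envelope("extreme", 30, 110),
    Envelope("test-to-the-edge", 0, 150),
]


def classify(temp_f: float) -> str:
    """Return the name of the narrowest band that contains temp_f."""
    for env in ENVELOPES:
        if env.contains(temp_f):
            return env.name
    return "outside test range"


def test_points(step_f: float = 10):
    """Yield set points spanning the widest band, from low to high."""
    widest = ENVELOPES[-1]
    temp = widest.low_f
    while temp <= widest.high_f:
        yield temp
        temp += step_f


if __name__ == "__main__":
    # Walk the chamber from 0 F to 150 F and label each set point.
    for t in test_points(step_f=25):
        print(f"{t:>5} F -> {classify(t)}")
```

With these bands, 68 degrees lands in the expected band, 100 degrees in the extreme band, and 140 degrees only in the test band. A real test would also track humidity and time at each set point, which this sketch leaves out.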

This same logic goes for your home products. Remember the hoverboard problems? The battery design could overheat, causing the hoverboard to catch fire and burn down the house. While I don’t know what level of testing was conducted, that appears to be a case where rigorous testing should have caught the potential problem. Your vehicle goes through rigorous testing, too. We have all seen the crash-test dummy commercials, where every part of your car is tested to make sure it operates not only under normal conditions but also in extreme ones, such as an accident or wet roads.

So what about government systems? There is a huge liability if government-operated systems fail. The body scanners at the airport underwent a thorough test program within the Department of Homeland Security (DHS) before they were deployed. And even afterwards, adjustments had to be made to make sure the scanners didn’t disclose too much information (who would have thought they would show the details under someone’s clothes?). Our public safety systems have to operate reliably all the time. Having the system fail is not an option.

Our military systems likely undergo the most rigorous test process, as they should! These systems are used by our soldiers, sailors, airmen, marines and guardsmen on the front line every day. Given the importance of their mission, they need the most reliable systems that we can provide them. So how do we do that? And what is the difference between commercial testing, government testing, and DoD testing? All are great questions!

[So for T&E professionals, I ask for a little leeway here. This is really intended as an entry-level document. Describing in detail the T&E processes would take a book, not a paper. I am covering this at a high level, so I am simplifying the process a lot!!]

Let’s start with DoD testing. The DoD uses a very formalized, rigorous T&E process and has a plethora of organizations involved in it, each with a different role. Testing occurs throughout a system’s development. It begins with experimentation and technology demonstrations, such as those sponsored and conducted by the Defense Advanced Research Projects Agency (DARPA). While not as formal as other T&E events, these early tests help shape the technologies that will ultimately become military capabilities.

Once a technology has matured to the point of entering the formal system acquisition process, it also begins the formal T&E process. That process starts early in the system’s development with the creation of a Test and Evaluation Master Plan (TEMP). The TEMP is reviewed and approved by senior DoD officials both within the Services and the Office of the Secretary of Defense (OSD).

The first formal testing is Developmental T&E. It starts early in the acquisition lifecycle (though some will argue not early enough) and evaluates the performance of a system relative to its specification. While usually conducted with the system developer, it is done with government oversight and should include actual operators. At the end of Developmental Test, the government should know how well the system performs against the design requirements.

But that is not enough. Just because a system operates the way it was designed to doesn’t mean it will be effective in the operational environment. That is where Operational T&E comes into play. Operational T&E determines how effectively the weapon system performs its intended mission, which may be different from the design specification. For example, a specification may talk about how well a communications system should operate. But if the specification fails to address that the system must be carried by a soldier with a full pack, and the system weighs too much, then it may meet its specification but not be operationally effective. (That is just a made-up example. I have real-world ones that would be inappropriate to share.)

Operational T&E is a very formalized process at the DoD. Depending on the system, the results can be approved within the Service or may have to go to the Director of Operational Test and Evaluation (DOT&E). Congress is often interested in the results because they can have significant implications for both budgets and defense operations. Within Operational T&E, there are sub-elements, such as Live-Fire T&E, which subjects weapon systems to a “live-fire” environment to evaluate both lethality and vulnerability. DOT&E also oversees Joint T&E, which is non-acquisition T&E to solve operational warfighter issues using Operational T&E methodologies. Joint T&E creates Concepts of Operation and Tactics, Techniques, and Procedures that allow the military to use current capabilities to address new operational challenges. DOT&E also conducts assessments of operational suitability, which I call the “ilities”: reliability, availability, maintainability, supportability, and compatibility. If the system can perform its mission and survive (Operational Effectiveness) and can operate in its environment (Operational Suitability), then it “passes” its Operational Test.

Of course, I have talked a lot about testing, but the “E” in T&E is just as important. DoD conducts an evaluation of the test results. It is a formal process that involves comparing the test results with the performance metrics. It is not just a “pass or fail” assessment. If the system works, how well does it work? Is it an improvement over current systems? Evaluating the performance is critical to determining whether a system is ready for procurement and fielding.
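The idea that evaluation is more than “pass or fail” can be sketched in a few lines. This is only an illustration, not the DoD’s actual evaluation method. It assumes each performance metric has a minimum acceptable value (`threshold`), a desired value (`objective`), and a value for the currently fielded system (`baseline`). The metric names and numbers are hypothetical and echo the made-up radio example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One performance metric; all values here are hypothetical."""
    name: str
    threshold: float   # minimum acceptable value (assumed concept)
    objective: float   # desired value (assumed concept)
    baseline: float    # what the current system achieves
    higher_is_better: bool = True


def evaluate(metric: Metric, measured: float) -> dict:
    """Compare a measured result with threshold, objective, and baseline."""
    # Flip the sign for "lower is better" metrics such as weight,
    # so the same comparisons work in both directions.
    sign = 1 if metric.higher_is_better else -1
    return {
        "metric": metric.name,
        "measured": measured,
        "meets_threshold": sign * measured >= sign * metric.threshold,
        "meets_objective": sign * measured >= sign * metric.objective,
        "improves_on_baseline": sign * measured > sign * metric.baseline,
    }


if __name__ == "__main__":
    # Hypothetical radio: good range, but too heavy to carry.
    link_range = Metric("link_range_km", threshold=10, objective=15, baseline=8)
    weight = Metric("carried_weight_lb", threshold=12, objective=8,
                    baseline=14, higher_is_better=False)

    for metric, measured in [(link_range, 12), (weight, 13)]:
        print(evaluate(metric, measured))
```

In this made-up case the radio meets its range threshold and beats the old system on both counts, yet it still misses the weight threshold. A single pass or fail would hide that kind of result.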

Figure: DoD Test and Evaluation Organizations (Source: www.dau.mil)

Non-DoD government organizations use a wide range of test approaches depending on their specific areas of concern. DHS, which encompasses a number of public safety organizations, uses approaches that are similar to the DoD’s, though tailored to its mission set. Within DHS, the Office of Test and Evaluation (OTE) oversees the conduct of T&E on DHS’ major acquisition programs to ensure they are reliable, interoperable and effective. DHS tests everything, from Coast Guard cutters, to the nuclear detection capabilities employed by the Domestic Nuclear Detection Office (DNDO), to the databases used by US Immigration and Customs Enforcement (ICE). The process begins with a TEMP and moves through both Developmental and Operational T&E, very similar to DoD. This is not surprising, since much of the DHS T&E leadership also has DoD T&E experience. They leveraged what they knew worked and adjusted it for their environment. As an example, while the DNDO uses Developmental and Operational T&E like the DoD, it also has a Rapid Test program for fielded systems that works much like the Joint T&E process: a “tiger team” rapidly develops a solution to a problem and then conducts a quick test to verify the problem is resolved.

The Department of Energy has a varied T&E program. In many cases, such as testing our nuclear arsenal, it relies on modeling and simulation (M&S). The Department of Transportation is responsible for everything from roadways to our National Airspace System (NAS), and its test infrastructure is just as varied. Common to all of these organizations is a formalized process to both test and assess (evaluate) whether the systems they provide to the public will operate safely and deliver the services they were intended to provide.

All of this leads us to commercial systems, which is the stuff we use every day. Commercial testing can come in a wide variety of forms. Some organizations use a very formal T&E process akin to the government’s. Others employ user groups or beta testers. Think of the challenge of testing the Windows Operating System (OS). Microsoft claims that more than 400 million devices are running Windows 10 in 192 countries. Because the Windows OS runs on different platforms with different application software on different networks and security protocols, there are as many different configurations of Windows 10 as there are devices running Windows 10! I once heard a Microsoft tester talk at a conference where he described how Microsoft employs a structured software development process and conducts extensive internal testing, but is still very reliant on beta testers because there is no way to test every possible configuration of Windows. Boeing’s commercial aircraft groups embed the test engineers with the developers. It provides a novel approach to making sure that they instill quality in every step of the process. Given the importance of aircraft safety, and the liability of failure, they are motivated to get it right! Other organizations use independent testers, such as Underwriters Laboratories (UL), to conduct assessments of their products and give consumers a level of confidence in them. There are as many commercial models for conducting testing as there are markets for commercial products. They range from very formal (like commercial aircraft and automobiles) to minimal. It all depends on the product and the market.
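The Windows example above comes down to arithmetic: the number of configurations is the product of the number of choices on every axis. A small sketch makes that visible. The axes and options below are invented for illustration, and real ones are far more numerous.

```python
from itertools import product
from math import prod

# Hypothetical configuration axes, invented for illustration only.
AXES = {
    "hardware": ["laptop", "desktop", "tablet"],
    "application_suite": ["office", "engineering", "gaming", "none"],
    "network": ["wired", "wifi", "cellular"],
    "security_profile": ["home", "enterprise"],
}


def count_configurations(axes: dict) -> int:
    """Multiply the number of options on each axis."""
    return prod(len(options) for options in axes.values())


def all_configurations(axes: dict):
    """Yield every combination as a dict of axis name to chosen option."""
    names = list(axes)
    for combo in product(*axes.values()):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print(count_configurations(AXES))       # 3 * 4 * 3 * 2 = 72
    print(next(all_configurations(AXES)))   # the first combination
```

Even this toy case with four small axes gives 72 combinations. Each new axis or option multiplies the total, which is why exhaustive internal testing stops being practical and beta testers become so valuable.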

So there you have it. A high-level overview of T&E. It touches every part of your day, from the chair you sit on, to the food you eat, to your computer and your phone. It is an unseen and forgotten part of our everyday existence. However, it is critical to ensuring that what we use, and what those we love use, is safe and reliable. When the “next big thing” comes out, remember a T&E professional probably used it first!