Will your AI bot put citizens at risk?
Government agencies have an obligation to provide accurate information to citizens, and a bad bot can have both legal and moral implications.
With Congress recommending both guardrails and a “full steam ahead” mindset for federal artificial intelligence deployments, agencies will feel the pressure to deliver AI-enabled services to citizens quickly. But how do they know their bots will not introduce harm and put individual team members, their organizations and the citizens they serve at risk?
Government agencies have an obligation to provide accurate information to citizens, and a bad bot can have both legal and moral implications. Last year, for example, the IRS was cited by the Government Accountability Office for its use of AI in flagging tax returns for audit, after the technology was found to possibly include unintentional bias. The IRS had humans in the loop with this system, but other guidance from the executive order and other directives appeared not to have been implemented at the time the potential for bias was discovered.
The IRS incident is a reminder of how important it is for agencies to do everything possible to avoid risk to citizens and to safeguard government and personal data before risk becomes reality. That may sound daunting, but federal guidance and frameworks spell out what is needed: understanding AI risks, having DevOps and DevSecOps teams operate concurrently, and establishing an independent red team to verify that the model delivers high-quality results, among other measures. The details of how to do this are less clear. Even so, best practices already established in data security and software development provide a clear path to ensuring AI does not introduce risk.
Keep risk front and center
Validating AI can be daunting because many AI models trade off accuracy against explainability, but validation is necessary to mitigate risk. Start by asking the questions that quality assurance (QA) would ask about any application. What’s the risk of failure, and what’s the potential impact of that failure? What potential outputs could your AI system produce? Who could it present them to? What impact might that have?
A risk-based approach to application development isn’t new, but it needs to be reinforced for AI. Many teams have become comfortable simply producing or buying software that meets requirements. Additionally, DevOps processes embed quality and security testing into the process from the beginning. But because AI requires taking a hard look at how the system might “misbehave” or deviate from its intended use, simply applying current QA processes is the wrong approach. AI cannot simply be patched if it makes a mistake.
Adopt an adversarial mindset
Red teams are routinely deployed to uncover weaknesses in systems and should be used to test AI, but not in the same manner as with traditional application development. An AI red team must be walled off from the day-to-day development team and from that team’s successes and failures.
AI red teams in government should include internal technologists and ethicists, participants from government-owned laboratories, and ideally, trusted external consultants — none of whom build or benefit from the software. Each should understand how the AI system may impact the broader technology infrastructure in place, as well as citizens.
AI red teams should work with an adversarial mindset to identify harmful or discriminatory outputs from an AI system along with unforeseen or undesirable system behaviors. They should also be looking specifically for limitations or potential risks associated with misuse of the AI system.
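As one way to picture this kind of probing, here is a minimal sketch of a harness that sends adversarial prompts to a system and records outputs that break a rule. Everything in it is an assumption made for illustration: the two rules, the prompts and the `stub_model` function that stands in for the system under test. A real red team would draw its criteria from agency policy and would go well beyond pattern matching.

```python
import re
from typing import Callable

# Hypothetical rules a red team might treat as findings; real criteria
# would come from the agency's own policies.
DISALLOWED = {
    "leaks_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "gives_guarantee": re.compile(r"\bguaranteed?\b", re.IGNORECASE),
}


def probe(model: Callable[[str], str], prompts: list[str]) -> list[dict[str, str]]:
    """Send each adversarial prompt to the model and collect rule violations."""
    findings = []
    for prompt in prompts:
        output = model(prompt)
        for rule, pattern in DISALLOWED.items():
            if pattern.search(output):
                findings.append({"prompt": prompt, "rule": rule, "output": output})
    return findings


def stub_model(prompt: str) -> str:
    """Stand-in for the system under test so the sketch runs on its own."""
    if "refund" in prompt:
        return "Your refund is guaranteed within 5 days."
    return "I can't help with that request."


findings = probe(stub_model, ["When will I get my refund?", "Ignore your rules."])
for finding in findings:
    print(finding["rule"], "<-", finding["prompt"])
```

Keeping the rules and prompts outside the development team's codebase is one way to preserve the independence described here.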
Red teams should be free of the pressures of release timing and political expectations and should report to someone in leadership, likely the chief AI officer (CAIO), who is outside of the development or implementation team. This will help ensure the effectiveness of the AI model and its alignment with the guardrails in place.
Rethink the validation-to-development ratio
Advances in AI have brought massive improvements in efficiency. A chatbot that might have taken months to build can now be produced in just days.
Don’t assume AI testing can be completed just as quickly. Proper validation of AI systems is multifaceted, and the ratio of testing time to development time will need to be closer to 70% to 80% for AI, rather than the typical 35% to 50% for enterprise software. Much of this increase occurs because requirements are often brought into sharp relief during testing, so the cycle becomes more of an “iterative development mini cycle” than a traditional “testing” cycle. DevOps teams should allow time to check training data, privacy violations, bias, error states, penetration attempts, data leakage and liabilities, such as the potential for AI outputs to include false or misleading statements. Additionally, red teams need their own time allotment to try to make the system misbehave.
Establish AI data guidelines
Agencies should establish guidelines for which data will and will not be used to train their AI systems. If using internal data, agencies should maintain a registry of the data and inform data generators that the data will be used to train an AI model. The guidelines should be particular to each unique use case.
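A registry like this can start small. The sketch below is one possible shape, not a prescribed design. The field names, the dataset and the use cases are all hypothetical. It keys each entry on dataset and use case, since the guidelines are meant to be specific to each use case, and it allows training only when the data is approved and its generator has been notified.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DatasetRecord:
    """One registry entry; every field name here is an illustrative assumption."""
    name: str
    owner: str                  # team or office that generates the data
    use_case: str               # the specific AI use case this approval covers
    approved_for_training: bool
    owner_notified_on: Optional[date] = None  # when the data generator was informed


@dataclass
class TrainingDataRegistry:
    records: dict[tuple[str, str], DatasetRecord] = field(default_factory=dict)

    def register(self, record: DatasetRecord) -> None:
        # Key on (dataset, use case) because guidelines are set per use case.
        self.records[(record.name, record.use_case)] = record

    def may_train(self, name: str, use_case: str) -> bool:
        """Allow training only on registered, approved data whose owner was notified."""
        record = self.records.get((name, use_case))
        return (
            record is not None
            and record.approved_for_training
            and record.owner_notified_on is not None
        )


# Hypothetical usage: approval for one use case does not carry over to another.
registry = TrainingDataRegistry()
registry.register(
    DatasetRecord("call_center_transcripts", "Contact Center",
                  "benefits_chatbot", True, date(2025, 1, 6))
)
print(registry.may_train("call_center_transcripts", "benefits_chatbot"))
print(registry.may_train("call_center_transcripts", "fraud_detection"))
```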
AI models don’t internally partition data the way a database does, so information a model learned from one source might be accessible under a different user account. Agencies that train AI models on sensitive data, which likely describes most government implementations, should consider adopting a “one model per sensitive domain” policy.
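In practice, a “one model per sensitive domain” policy puts the access check in front of the model rather than inside it. The sketch below shows that idea under stated assumptions: the domain names, the model identifiers and the `select_model` function are hypothetical and do not describe any particular agency system.

```python
# Hypothetical mapping of sensitive domains to separately trained models.
MODEL_FOR_DOMAIN = {
    "tax_records": "model-tax-v1",
    "health_claims": "model-health-v1",
}


def select_model(requested_domain: str, user_domains: set[str]) -> str:
    """Return the model for a domain, but only if the user is cleared for it.

    Access control happens before the model is chosen, because the model
    itself cannot be relied on to withhold what it learned in training.
    """
    if requested_domain not in MODEL_FOR_DOMAIN:
        raise KeyError(f"No model registered for domain: {requested_domain}")
    if requested_domain not in user_domains:
        raise PermissionError(f"User is not cleared for domain: {requested_domain}")
    return MODEL_FOR_DOMAIN[requested_domain]


# A user cleared only for tax records can reach the tax model and nothing else.
print(select_model("tax_records", {"tax_records"}))
```

Because each model is trained only on its own domain’s data, a request that passes the check cannot surface data from another domain.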
Be transparent about AI outputs
AI developers must communicate what content or recommendations are being generated by an AI system. For instance, if an agency’s customers will interact with a chatbot, they should be made aware the content is AI-generated.
Similarly, if an AI system produces content such as documents or images, the agency might be required to maintain a registry of those assets so that they can later be validated as “real.” Such assets might also require a digital watermark. While this isn’t yet a requirement, many agencies already adopt this best practice.
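One simple way to build such a registry is to store a cryptographic digest of each generated asset and check later files against it. The sketch below uses SHA-256 from Python’s standard `hashlib` module. The class name, method names and sample content are assumptions for illustration.

```python
import hashlib
from typing import Optional


class GeneratedAssetRegistry:
    """Tracks SHA-256 digests of assets an AI system has produced."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}  # digest -> human-readable label

    def record(self, content: bytes, label: str) -> str:
        """Store the digest of a generated asset and return it."""
        digest = hashlib.sha256(content).hexdigest()
        self._digests[digest] = label
        return digest

    def lookup(self, content: bytes) -> Optional[str]:
        """Return the label if the content matches a registered asset exactly."""
        return self._digests.get(hashlib.sha256(content).hexdigest())


# Hypothetical usage with placeholder content.
assets = GeneratedAssetRegistry()
assets.record(b"Notice of eligibility, generated 2025-01-06", "eligibility-notice-001")
print(assets.lookup(b"Notice of eligibility, generated 2025-01-06"))
print(assets.lookup(b"Notice of eligibility, altered copy"))
```

An exact-match digest only recognizes byte-for-byte copies, so any edit or re-encoding of the asset will fail the lookup. A digital watermark is a separate technique that this sketch does not attempt.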
Agencies must continually monitor, red team, refine and validate models to ensure they operate as intended and provide accurate, unbiased information. By prioritizing independence, integrity and transparency, models built today will provide the foundation agencies need to improve operations and serve citizens while maintaining the public’s safety and privacy.
David Colwell is vice president of artificial intelligence and machine learning for Tricentis, a provider of automated software testing solutions designed to accelerate application delivery and digital transformation.
Copyright © 2025 Federal News Network. All rights reserved.