
Required Evaluations

March 27, 2024
Earned Trust through AI System Assurance

Developing regulatory requirements for independent evaluations, where warranted, provides a check on false claims and risky AI, and incentivizes stronger evaluation systems.239 This view is captured in a recent civil society report expressing commonly held suspicion of “any regulatory regime that hinges on voluntary compliance or otherwise outsources key aspects of the process to industry.”240 One suggestion commenters made was that government should require internal impact assessments, rather than independent audits, for high-risk AI systems.241 Some commenters recommended mandatory audits242 and/or “red-teaming”243 in the particular context of foundation models that they fear may exhibit “dangerous capabilities.”

We acknowledge the arguments against audit requirements in general244 and especially if imposed without reference to risk.245 The arguments against required evaluations include the dearth of standards and the costs imposed, especially on smaller businesses.246 According to one commenter, the cost drivers are “technical expertise,” “legal and standards expertise,” “deployment and social context expertise,” “data creation and annotation,” and “computational resources.”247

The costs of mandatory audits can be managed. Commenters recommended the following cost de-escalators, which are captured in other parts of this Report:

  • Create a modular governance system for AI, with a risk assessment standards board, to deduplicate costs for developing audit standards;248
  • Standardize “structured transparency” such that auditors may only ask specific questions rather than obtaining all the underlying data;249
  • Build on internal accountability requirements;250 and
  • Provide industry association or governmental compliance assistance.251

239 AFL-CIO Comment at 5 (voluntary evaluations insufficient); Farley Comment at 19 (“[M]arket incentives likely tilt towards incentivizing lax audits if there is any market effect at all,” and, therefore, “government has a role to play in bolstering auditors’ independence and ensuring adequate audits.”); Protofect Comment at 8 (“[T]here are few incentives for companies to conduct external audits unless required by law or demanded by their clients or partners.”).

240 Accountable Tech, AI Now, and EPIC, supra note 50, at 4. See also CAP Comment at 9 (citing Microsoft, Empowering responsible AI practices) (existing “sparse patchwork of voluntary measures proposed and implemented by industry” is not sufficient). But see OpenAI Comment at 2 (At least “on issues such as pre-deployment testing, content provenance, and trust and safety,” voluntary commitments should suffice.).

241 See, e.g., BSA | The Software Alliance Comment at 2 (advocating mandatory impact assessments for both developers and deployers).

242 GovAI Comment at 9 (recommending requiring “developers of foundation models to conduct third-party model and governance audits, before and after deploying such models”).

243 Anthropic Comment at 10; ARC Comment at 6 (“It could be important for legislators, regulators, etc. to require measurement of potential dangerous capabilities before training and/or deployment of models that are much more capable than the current state of the art.”); Shevlane, supra note 228, at 7 (“Industry standards or regulation could require a minimum duration for pre-deployment evaluation of frontier models, including the length of time that external researchers and auditors have access.”).

244 See, e.g., HRPA Comment at 7-8 (There should be no third-party assessments or audits required at this time in the employment context, because “[m]ature, auditable, and accepted standards to evaluate bias and fairness of AI systems do not yet exist …” and might be overly burdensome, deepen mistrust in such systems, and potentially violate IP rights); AI Audit Comment at 2 (policy focus should be on internal assessments rather than bureaucratic checklists); Business Roundtable Comment at 12 (Government should let the industry engage in self-assessments and should not impose uniform requirements for third party assessments); Developers Alliance Comment at 12 (“AI accountability measures should be voluntary, and risk should be self-assessed”); Blue Cross Blue Shield Association Comment at 3 (“[T]hird-party audits are immature as a mechanism to detect or mitigate adverse bias”); James Madison Institute at 6; TechNet Comments at 3 (TechNet members believe that it is premature to mandate independent third-party auditing of artificial intelligence systems).

245 See, e.g., Salesforce Comment at 5-6; SIFMA Comment at 4.

246 See, e.g., U.S. Chamber Technology Engagement Center Comment at 10 (estimating audit costs at “hundreds of thousands of dollars”). But see Certification Working Group (CWG) Comment at 19 (audit costs are modest relative to overall development costs, and small compared to the technology’s impact); Protofect Comment at 9 (costs vary widely depending on company size, data complexity, and the importance of AI to the product; having tiers of auditing can reduce costs).

247 HuggingFace Comment at 12.

248 See Riley and Ness Comment at 14.

249 See, e.g., OpenMined Comment at 4. See also GovAI Comment at 9 (recommending government fund “research and development of structured transparency tools”).

250 See, e.g., Centre for Information Policy Leadership Comment at 31.

251 See Georgetown University Center for Security and Emerging Technology Comment at 15.