
Required Evaluations

March 27, 2024
Earned Trust through AI System Assurance

Developing regulatory requirements for independent evaluations, where warranted, provides a check on false claims and risky AI, and incentivizes stronger evaluation systems.239 This view is captured in a recent civil society report expressing commonly held suspicion of “any regulatory regime that hinges on voluntary compliance or otherwise outsources key aspects of the process to industry.”240 One suggestion commenters made was that government should require internal impact assessments, rather than independent audits, for high-risk AI systems.241 Some commenters recommended mandatory audits242 and/or “red-teaming”243 in the particular context of foundation models that they fear may exhibit “dangerous capabilities.”

We acknowledge the arguments against audit requirements in general244 and especially if imposed without reference to risk.245 The arguments against required evaluations include the dearth of standards and the costs imposed, especially on smaller businesses.246 According to one commenter, the cost drivers are “technical expertise,” “legal and standards expertise,” “deployment and social context expertise,” “data creation and annotation,” and “computational resources.”247

The costs of mandatory audits can be managed. Commenters recommended the following cost de-escalators, which are captured in other parts of this Report:

  • Create a modular governance system for AI, with a risk assessment standards board, to deduplicate costs for developing audit standards;248
  • Standardize “structured transparency” such that auditors may only ask specific questions rather than obtaining all the underlying data;249
  • Build on internal accountability requirements;250 and
  • Provide industry association or governmental compliance assistance.251

239 AFL-CIO Comment at 5 (voluntary evaluations insufficient); Farley Comment at 19 (“[M]arket incentives likely tilt towards incentivizing lax audits if there is any market effect at all,” and, therefore, “government has a role to play in bolstering auditors’ independence and ensuring adequate audits.”); Protofect Comment at 8 (“[T]here are few incentives for companies to conduct external audits unless required by law or demanded by their clients or partners.”).

240 Accountable Tech, AI Now, and EPIC, supra note 50, at 4. See also CAP Comment at 9 (citing Microsoft, Empowering responsible AI practices) (existing “sparse patchwork of voluntary measures proposed and implemented by industry” is not sufficient). But see OpenAI Comment at 2 (At least “on issues such as pre-deployment testing, content provenance, and trust and safety,” voluntary commitments should suffice.).

241 See, e.g., BSA | The Software Alliance Comment at 2 (advocating mandatory impact assessments for both developers and deployers).

242 GovAI Comment at 9 (recommending requiring “developers of foundation models to conduct third-party model and governance audits, before and after deploying such models”).

243 Anthropic Comment at 10; ARC Comment at 6 (“It could be important for legislators, regulators, etc. to require measurement of potential dangerous capabilities before training and/or deployment of models that are much more capable than the current state of the art.”); Shevlane, supra note 228, at 7 (“Industry standards or regulation could require a minimum duration for pre-deployment evaluation of frontier models, including the length of time that external researchers and auditors have access.”).

244 See, e.g., HRPA Comment at 7-8 (There should be no third-party assessments or audits required at this time in the employment context, because “[m]ature, auditable, and accepted standards to evaluate bias and fairness of AI systems do not yet exist …” and might be overly burdensome, deepen mistrust in such systems, and potentially violate IP rights); AI Audit Comment at 2 (policy focus should be on internal assessments rather than bureaucratic checklists); Business Roundtable Comment at 12 (Government should let the industry engage in self-assessments and should not impose uniform requirements for third party assessments); Developers Alliance Comment at 12 (“AI accountability measures should be voluntary, and risk should be self-assessed”); Blue Cross Blue Shield Association Comment at 3 (“[T]hird-party audits are immature as a mechanism to detect or mitigate adverse bias”); James Madison Institute at 6; TechNet Comments at 3 (TechNet members believe that it is premature to mandate independent third-party auditing of artificial intelligence systems).

245 See, e.g., Salesforce Comment at 5-6; SIFMA Comment at 4.

246 See, e.g., U.S. Chamber Technology Engagement Center Comment at 10 (estimating audit costs at “hundreds of thousands of dollars”). But see Certification Working Group (CWG) Comment at 19 (audit costs are modest relative to overall development costs, and small compared to the technology’s impact); Protofect Comment at 9 (costs vary widely depending on company size, data complexity, and the importance of AI to the product; having tiers of auditing can reduce costs).

247 HuggingFace Comment at 12.

248 See Riley and Ness Comment at 14.

249 See, e.g., OpenMined Comment at 4. See also GovAI Comment at 9 (recommending government fund “research and development of structured transparency tools”).

250 See, e.g., Centre for Information Policy Leadership Comment at 31.

251 See Georgetown University Center for Security and Emerging Technology Comment at 15.