    November 12, 2024 · updated May 9, 2026 · 6 min read

    Healthcare-AI procurement is its own skill. The Texas AG settlement proved why.

    by Thomas Jankowski, aided by AI
    Marketed claims versus operational reality · TJ x AI

    The September 2024 Texas Attorney General settlement with a healthcare-AI vendor was, for the procurement-class reading the case carefully, the first publicly visible enforcement action to name the gap between vendor-marketed AI accuracy claims and operational accuracy in production. The settlement imposed no substantial penalty relative to the company's revenue, but it did require the company to substantially modify its public claims and to disclose the actual performance metrics underlying its product. The procurement-class read is that the gap between marketed and actual performance was large enough, and persistent enough, to warrant state-level enforcement intervention. The healthcare-IT procurement committees that had approved the vendor for clinical deployment had evidently not done the kind of due diligence the settlement implied was required.

    This is a case-study piece. The intent is to walk through the settlement at the level of public observation, name the structural procurement-process gap it surfaced, and work through three pattern examples of what healthcare-AI procurement looks like when the procurement-class actually does the work the settlement implied was missing.

    What the settlement surfaced

    The settlement, made public in late 2024, covered a healthcare-AI vendor whose product was deployed in clinical environments with marketing claims about accuracy that were not, on the AG's reading, supported by the operational performance data the vendor's own infrastructure had captured. The vendor had been sold into multiple major U.S. health systems through standard healthcare-IT procurement processes, and those processes had evidently relied on the vendor's accuracy claims without independent verification.

    The structural problem the settlement surfaced is that healthcare-IT procurement committees were built for a different generation of technology. The committees evaluate a candidate vendor against criteria like security posture, integration complexity, operational reliability, support availability, and contract structure. These are appropriate criteria for traditional healthcare-IT (an EHR module, an interoperability tool, a billing-and-coding system) where the product's behavior is largely deterministic and the vendor's operational claims are testable through standard QA processes.

    Healthcare-AI products are not deterministic in the same way. The accuracy claims depend on the model's behavior on the specific clinical input distribution the deploying health system encounters, which may differ from the input distribution the vendor used to validate the claims. The operational behavior in production can drift from the validation behavior over time as the input distribution shifts, the underlying model is updated, or the use-case scope expands. The traditional healthcare-IT procurement committee does not typically know to ask the questions that would surface these issues.

    The Texas AG settlement was the visible signal that this structural gap had produced a real and enforceable problem. The procurement-class reading the settlement should treat it as a generalized warning about the existing committee structure, not as a one-off vendor problem.

    Why healthcare-AI procurement is structurally different

    Healthcare-AI procurement requires evaluation criteria that the existing committees do not typically apply. The procurement-class needs to evaluate the model's accuracy on the specific clinical input distribution the deploying health system will encounter, not on the vendor's validation cohort. The procurement-class needs to evaluate the operational drift posture the vendor commits to, including how often the model is updated, what testing happens before each update, and what validation is done with the deploying health system before the update is rolled out. The procurement-class needs to evaluate the failure-mode handling, including what happens when the model is uncertain, when it is wrong, and when it is operating outside its validated input distribution.

    These criteria require domain expertise the existing healthcare-IT procurement committees usually do not have at the table. The clinical informatics leader, the chief medical information officer, the data-science-and-evaluation team, the legal-and-compliance class with healthcare-AI specific knowledge: these are the relevant participants for healthcare-AI procurement. Most committees as assembled in 2024 did not include all of them.

    Three pattern examples of doing the procurement well

    The first pattern is the parallel-validation pattern. The procurement-class commits to running an independent validation of the vendor's accuracy claims on the deploying health system's own data, before the production deployment, with the vendor cooperating on the validation methodology. The validation produces site-specific accuracy metrics that are independent of the vendor's marketing claims. The procurement decision is made against the validation results. Health systems running this pattern typically have a data-science-and-evaluation function with the capacity to run the validation, and the procurement timeline is six to twelve months longer than a traditional healthcare-IT procurement to accommodate the validation work. The pattern catches the marketed-vs-operational accuracy gap before deployment, not after.
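    As a concrete sketch of what the parallel-validation step computes, the fragment below bootstraps a site-specific accuracy estimate with a confidence interval from the health system's own adjudicated labels; the procurement question is then whether the vendor's marketed figure is consistent with that interval. All names here, and the 0.95 claimed figure, are hypothetical illustrations, not values from the actual case.

    ```python
    import random

    def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
        """Site-specific accuracy with a percentile-bootstrap confidence interval.

        y_true / y_pred are parallel lists of binary labels drawn from the
        deploying health system's own clinical data, not the vendor's
        validation cohort.
        """
        rng = random.Random(seed)  # fixed seed so the validation is reproducible
        n = len(y_true)
        point = sum(t == p for t, p in zip(y_true, y_pred)) / n
        stats = []
        for _ in range(n_boot):
            idx = [rng.randrange(n) for _ in range(n)]       # resample with replacement
            stats.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
        stats.sort()
        lo = stats[int((alpha / 2) * n_boot)]
        hi = stats[int((1 - alpha / 2) * n_boot) - 1]
        return point, (lo, hi)

    # Hypothetical marketed claim: the procurement decision keys off whether
    # this figure is plausible given the site-specific interval.
    VENDOR_CLAIMED_ACCURACY = 0.95
    ```

    In practice the same machinery would be run per clinical subgroup and per use case, since a single pooled accuracy number is exactly the kind of figure the settlement showed can mislead.
    
    
    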

    The second pattern is the staged-deployment-with-monitoring pattern. The procurement-class deploys the vendor in a limited clinical scope first (a single department, a single use case, a small patient cohort) with monitoring infrastructure that tracks the model's actual operational behavior in production. The deployment scope expands incrementally as the monitoring data validates the model's performance at each stage. The pattern catches drift, scope-creep, and unexpected failure modes before they affect a broader patient population. The infrastructure cost is meaningful. The risk reduction is substantial.
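    One minimal way to picture the monitoring piece, assuming each model output is eventually adjudicated by a clinician as correct or not: a rolling-window accuracy check against the floor validated at the previous stage. The class and thresholds below are illustrative, not a production monitoring stack.

    ```python
    from collections import deque

    class AccuracyDriftMonitor:
        """Rolling-window check of in-production accuracy against a floor.

        Each time a model output is adjudicated, record whether it was
        correct; an alert fires when rolling accuracy drops below the
        threshold validated at the previous deployment stage.
        """

        def __init__(self, window=500, floor=0.90, min_samples=100):
            self.outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
            self.floor = floor
            self.min_samples = min_samples

        def record(self, correct: bool) -> bool:
            """Record one adjudicated outcome; return True if a drift alert fires."""
            self.outcomes.append(bool(correct))
            if len(self.outcomes) < self.min_samples:
                return False  # not enough adjudicated cases to judge yet
            return (sum(self.outcomes) / len(self.outcomes)) < self.floor
    ```

    The design choice worth noting is the `min_samples` guard: a staged deployment generates few adjudicated cases early on, and alerting on a handful of outcomes would produce noise rather than a drift signal.
    
    
    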

    The third pattern is the contract-structure pattern. The procurement-class negotiates contract terms that include performance-based clauses tied to the operational accuracy metrics, with explicit language about what happens when the model's accuracy drifts below the committed threshold. The contract typically includes a vendor-side commitment to performance reporting, an independent-audit provision, and a remediation-and-termination clause. The pattern shifts the risk-allocation between the health system and the vendor in a way that the standard healthcare-IT contract does not. Vendors who refuse to negotiate on the performance clauses self-select out of the procurement; vendors who accept them tend to be the ones with stronger operational performance.
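    The contract mechanics can be made concrete with a toy encoding of the performance clause: a committed accuracy floor, a shortfall band inside which the vendor must remediate, and termination rights beyond it. The thresholds, field names, and remedy tiers below are hypothetical, not language from the settlement or any actual contract.

    ```python
    from dataclasses import dataclass
    from enum import Enum, auto

    class Remedy(Enum):
        NONE = auto()
        REMEDIATION = auto()   # vendor must cure within a contract-defined window
        TERMINATION = auto()   # health system may exit without penalty

    @dataclass(frozen=True)
    class PerformanceClause:
        """Hypothetical encoding of performance-based contract terms."""
        committed_accuracy: float   # floor the vendor warrants in production
        remediation_band: float     # shortfall tolerated before termination rights attach
        reporting_days: int         # vendor performance-reporting cadence

        def evaluate(self, observed_accuracy: float) -> Remedy:
            shortfall = self.committed_accuracy - observed_accuracy
            if shortfall <= 0:
                return Remedy.NONE
            if shortfall <= self.remediation_band:
                return Remedy.REMEDIATION
            return Remedy.TERMINATION
    ```

    The point of writing it down this mechanically is that every input to `evaluate` has to come from somewhere: the observed accuracy presupposes the monitoring infrastructure of the second pattern, which is why the patterns compose rather than substitute.
    
    
    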

    The three patterns are not mutually exclusive, and the strongest healthcare-AI procurements run all three together. The procurement timeline is longer, the staffing requirement is heavier, and the resulting deployment is substantially more reliable. The cost-benefit calculation, given the kind of enforcement risk the Texas AG settlement made visible, favors running the heavier procurement.

    What the procurement-class should take from this

    Healthcare-AI procurement is its own skill. The healthcare-IT procurement committees that worked for traditional vendor categories do not, as currently constituted, do the work that healthcare-AI procurement requires. The Texas AG settlement was the first publicly visible signal that this gap has consequences. There will be more enforcement actions in the next 24-36 months. The procurement-class that reads the settlement carefully and modifies the procurement process accordingly will catch the marketed-vs-operational accuracy gap before it produces an enforcement-class problem at their own institution. The procurement-class that does not modify the process will face the same structural problem the Texas case surfaced, with the next AG action being the visible signal at their own health system.

    The skill is learnable. The patterns are documented. The vendors who deliver on their accuracy claims will welcome the heavier procurement; the vendors who do not will object to it. Both signals are useful. The procurement-class running the heavier process gets better information and produces better outcomes. The settlement made the case for that process more visible than the procurement literature had managed to make it before. The case is now made.

    —TJ