Soft Launch

Sean McGregor | Jesse Hostetler | February 15, 2024

Safety assessments are time-consuming and expensive activities, sometimes costing millions of dollars. Conversely, new products with profound safety impacts can be produced within 7 minutes and for less than a single dollar. Keeping up with the pace of digital safety requires safety assessments (e.g., audits, red teams, T&E programs) that can be applied as quickly as products are developed. To meet this challenge, we began developing the Dyff platform in 2022 as a technological bridge between the safety assessment, standards, and startup communities. It has been nearly two years of balancing the requirements of assessors, tech companies, and the public, whose safety turns on whether assessors and tech companies structure their relationship in a scientifically rigorous and economically sound manner.

Now we are taking the first step in releasing our findings by publishing the infrastructure associated with our assessment program. Looking to make a difference at the intersection of computing and social impact, we named the system “Dyff.” This is a soft opening to support our close collaborators in an open setting, and it coincides with our publication presented at AAAI:

“AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards”

As we continue to explore the Dyff approach detailed in the research paper, we will move toward the platform’s grand opening, which will be structured around the pre-release red teaming of a forthcoming large language model. At that point we will switch from working with “friends and family” [1] to a broader set of collaborators.

Acknowledgements

  • Audit report: We developed the proof-of-concept evaluation authority working closely with Arihant Chadda and Andrea Brennen at IQT Labs. Their previous work with Ricardo Calix, J.J. Ben-Joseph, and Ryan Ashley moved Dyff from abstract design thinking to solving the real-world problems of assessors. The resulting programmatic audit report running on Dyff can be found here.

  • Research Code: A tremendous amount of work has gone into maturing the Dyff codebase: making it easy to run on local dev machines, easy to deploy to multiple clouds, programmatically testable, and well documented, an effort that goes above and beyond proof-of-concept robustness. Contributors to this effort include Brett Weir, Natalie Poulin, and Emily Wright.

  • Evaluators: Additionally, forthcoming evaluations from Nick Judd, Md. Rafiqul Rabin, Homa Hosseinmardi, Austin Kozlowski, and [1] have greatly influenced new features that have landed since the original proof-of-concept work.

[1] This can be you! For more information, please email [email protected].