Open science infrastructure for long-lived AI safety assessments
Create rich AI assessments using standard Python data science tools like Jupyter and PyArrow. Develop your assessments locally, then upload the final version to run against any AI system hosted on Dyff.
from dyff.client import Client

# Create an API client
client = Client()

# Register a dataset from local Arrow data, then upload its files
dataset = client.datasets.create_arrow_dataset("/my/local/data")
client.datasets.upload_arrow_dataset(dataset, "/my/local/data")

# Package a local Jupyter project as a module, then upload it
jupyter_notebook = client.modules.create_package("/my/jupyter/proj")
client.modules.upload_package(jupyter_notebook, "/my/jupyter/proj")
Safety assessment results are meaningless if the system under test has been trained on the test data. Dyff protects test data and selectively exposes safety assessment results so developers can't game the test.
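One simple way to picture "selective exposure" is a results view that reports aggregate scores while withholding the test items and per-item outcomes, so a developer cannot recover the test set from what they are shown. This is a minimal sketch under that assumption only; `ItemResult` and `public_summary` are hypothetical names, and this is not how Dyff itself decides what to reveal.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    """Hypothetical per-item result; `prompt` is the protected test item."""
    item_id: str
    prompt: str
    correct: bool


def public_summary(results: list) -> dict:
    """Expose only aggregate numbers; never prompts or per-item outcomes."""
    n = len(results)
    passed = sum(r.correct for r in results)
    return {"num_items": n, "pass_rate": passed / n if n else 0.0}


if __name__ == "__main__":
    results = [
        ItemResult("1", "q1", True),
        ItemResult("2", "q2", False),
        ItemResult("3", "q3", True),
        ItemResult("4", "q4", True),
    ]
    print(public_summary(results))  # {'num_items': 4, 'pass_rate': 0.75}
```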
Dyff assessments are long-lived. The platform is uncompromising on reproducibility: it stores every parameter and result from every test run, and it protects test data so that the data remains valid over time.
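To illustrate what storing every parameter and result buys you, here is a minimal sketch of an append-only run log. It assumes a run is fully described by a JSON-serializable parameter dict and fingerprints that dict with SHA-256 over canonical (key-sorted) JSON, so reruns with the same settings can be found and compared. `fingerprint`, `RunRecord`, and `RunLog` are hypothetical names; this is not Dyff's storage schema.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any


def fingerprint(parameters: dict[str, Any]) -> str:
    """SHA-256 of the parameters serialized as canonical JSON."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(parameters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RunRecord:
    """One test run: the full parameter set and the results it produced."""
    parameters: dict[str, Any]
    results: dict[str, Any]

    @property
    def key(self) -> str:
        return fingerprint(self.parameters)


class RunLog:
    """Hypothetical append-only log: records are added, never edited or removed."""

    def __init__(self) -> None:
        self._records: list[RunRecord] = []

    def append(self, parameters: dict[str, Any], results: dict[str, Any]) -> RunRecord:
        # Copy via a JSON round-trip so later changes to the caller's
        # dicts cannot alter what was recorded.
        record = RunRecord(json.loads(json.dumps(parameters)), json.loads(json.dumps(results)))
        self._records.append(record)
        return record

    def runs_with(self, parameters: dict[str, Any]) -> list[RunRecord]:
        key = fingerprint(parameters)
        return [r for r in self._records if r.key == key]

    def is_reproducible(self, parameters: dict[str, Any]) -> bool:
        """True if every recorded run with these parameters gave identical results."""
        runs = self.runs_with(parameters)
        return len(runs) > 0 and all(r.results == runs[0].results for r in runs)


if __name__ == "__main__":
    log = RunLog()
    params = {"model": "example-model", "temperature": 0.0, "seed": 7}
    log.append(params, {"accuracy": 0.82})
    # Same parameters in a different key order map to the same fingerprint.
    log.append({"seed": 7, "temperature": 0.0, "model": "example-model"}, {"accuracy": 0.82})
    print(len(log.runs_with(params)))  # 2
    print(log.is_reproducible(params))  # True
```

The model name and scores in the usage example are made-up placeholders. Keying on a content hash rather than a sequence number means two runs with identical settings are grouped automatically, which is what makes a reproducibility check a simple comparison.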
Dyff demonstrates a path to an economically sustainable evaluation ecosystem by providing a platform where assessors can develop, publish, and eventually market their assessments.
Install the Dyff Python package:
python3 -m pip install dyff