Dyff | AI Audit Platform

What you can do with Dyff

Create assessments

Create rich AI assessments using standard Python data science tools like Jupyter and PyArrow. Develop your assessments locally, then upload the final version to run against any AI system hosted on Dyff.

from dyff.client import Client
client = Client()

dataset = client.datasets.create_arrow_dataset("/my/local/data")
client.datasets.upload_arrow_dataset(dataset, "/my/local/data")

jupyter_notebook = client.modules.create_package("/my/jupyter/proj")
client.modules.upload_package(jupyter_notebook, "/my/jupyter/proj")

Run assessments

Dyff orchestrates the computational resources for scalable assessment pipelines using Kubernetes. You can run the AI system-under-test on Dyff, too, enabling completely private assessments where data never leaves the platform.

from dyff.client import Client
client = Client()
systems_to_assess = client.inferenceservices.query(
    account="public", labels={"task": "text-completion"}
)
for system in systems_to_assess:
    ev = client.evaluations.create(
        {"dataset": dataset.id, "inferenceService": system.id, ...}
    )
    sc = client.safetycases.create(
        {"method": "my-ipynb", "inputs": {"completions": ev.id}, ...}
    )

Publish results

Easily publish the HTML-rendered output of your assessment notebooks hosted on the Dyff Web app. Assessment code and input data always stays private.

from dyff.client import Client
client = Client()

safetycase = client.safetycases.query(
    method="my-ipynb", inferenceService="system-under-test"
)[0]

client.safetycases.publish(safetycase.id, "public")

import webbrowser
webbrowser.open(f"https://app.dyff.io/reports/{safetycase.id}")

Why Dyff?

Assessment integrity

Safety assessment results are meaningless if the system under test has been trained on the test data. Dyff protects test data and selectively exposes safety assessment results so developers can't game the test.

Assessment lifetime

Dyff assessments are long-lived. Dyff is uncompromising on reproducibility, stores every parameter and result from every test run, and protects test data to preserve its validity over time.

Assessor viability

Dyff demonstrates a path to an economically sustainable evaluation ecosystem by providing a platform where assessors can develop, publish, and eventually market their assessments.

What's the Dyfference?