This seems like a solution looking for a problem. Can't you just share your model's hash when releasing it? This is exactly what happens when someone like Mistral shares a magnet link to their model. It's just a hash.
That’s exactly what this is:
> Finally, the statement itself contains subjects, which are a list of (file path, digest) pairs; a predicate type set to https://model_signing/signature/v1.0; and a dictionary of predicates. The idea is to use the predicates to store (and therefore sign) model card information in the future. The verification part reads the Sigstore bundle file and first verifies that the signature is valid, then recomputes the model's file hashes to compare against the signed ones.
It’s important to remember that these models tend to be released as multiple files, so a single hash is insufficient (unless you do a hash of hashes).
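A minimal sketch of that idea, assuming nothing about the real model_signing library: walk the model directory, stream-hash each file, and bind the per-file digests into a single statement. The field names follow the in-toto statement shape described in the quote but are illustrative, not the library's actual output:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of one file, streamed so multi-GB weight files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_statement(model_dir: str) -> dict:
    """Illustrative in-toto-style statement: one (path, digest) subject per file."""
    subjects = [
        {"name": str(p.relative_to(model_dir)),
         "digest": {"sha256": file_digest(p)}}
        for p in sorted(Path(model_dir).rglob("*")) if p.is_file()
    ]
    return {
        "subject": subjects,
        "predicateType": "https://model_signing/signature/v1.0",
        "predicate": {},  # room for model card info later, per the quote above
    }
```

Signing the serialized statement then covers every file at once, which is the hash-of-hashes point: one signature, many files.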
Signing models is a start, but not enough.
We need remote models hosted in enclaves, with remote attestation and end-to-end encryption for inference. Then you can prove client-side that an output from a model was private and delivered directly, without tampering by advertisers, censors, or propagandists.
Source: I have a relationship with OpenSSF but am not directly involved. I'm involved in a "competing" standard.
As other commenters pointed out, this is "just" a signature. However, in the absence of standardised checks, it's a useful intermediate way of addressing the integrity issue around the ML supply chain; FWIW, it helps today.
Eventually, you want to move to more complete solutions with more elaborate checks, e.g. provenance of the data that went into the model, or attested training. C2PA is trying to cover this.
Inference-time attestation (which some other commenters are pointing out) -- how can I verify that a response Y actually came from model F on my data X, i.e. Y = F(X)? -- is a strongly related but orthogonal problem.
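To see why it's orthogonal: with only a signed model, the best you can do is re-run the model yourself, which presumes you already hold a verified copy of F and that inference is deterministic. A toy sketch (the helper name is hypothetical, not from any project mentioned in this thread):

```python
import torch

def naive_inference_check(model: torch.nn.Module,
                          x: torch.Tensor,
                          claimed_y: torch.Tensor,
                          atol: float = 1e-5) -> bool:
    """Recompute Y = F(X) locally and compare against the claimed output.
    This proves nothing about what a remote server actually ran -- that gap
    is what attestation- or proof-based approaches aim to close."""
    model.eval()
    with torch.no_grad():
        y = model(x)
    return torch.allclose(y, claimed_y, atol=atol)
```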
This lets you verify the signature on the model. It won’t help you tell that a decision came from that model. If you want to verify the inference that a model makes, check out https://github.com/zkonduit/ezkl (our project).
Personally (I know practically nothing about signing, lol) I'm wondering how many actual users are going to use this. I kind of wonder if it's gonna end up being kind of like a hash. Or is this going to be integrated into model software?
This is amazing to see from Sigstore! Looking forward to more ML-specific features in the coming months!
Also looking forward to reading through the SLSA for ML PoC and seeing how it evolves. I was planning to use Witness for model training but wasn't sure how it would work for such a long and intensive process.
Is this a problem today?
Could be... let's say you deploy a version of a model that was trained by bad actors to give wrong outputs; without the hashing technique, you'd have no way to verify it.
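Concretely, the check can be as simple as recomputing digests for the deployed files and comparing them against a published manifest. A hedged sketch, assuming the manifest (relative path -> expected hex digest) has itself already been obtained from a verified signature:

```python
import hashlib
from pathlib import Path

def verify_deployment(model_dir: str, published: dict[str, str]) -> bool:
    """Compare the SHA-256 of each deployed file against the published manifest.
    (For large weight files, stream the hash as in the earlier sketch.)"""
    for rel_path, expected in published.items():
        digest = hashlib.sha256(Path(model_dir, rel_path).read_bytes()).hexdigest()
        if digest != expected:
            return False  # tampered, corrupted, or swapped file
    return True
```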