Editorial Summary:
This article contains my notes from my initial reading about privacy-preserving ML for a hackathon project some two years ago.

Security is about ensuring only authorised people have access to something, while privacy has more to do with who is authorised. Most ML models are trained on large, publicly available, unencrypted datasets. Such data is accessible to almost anybody, so it is neither secure nor private; at the very least, the people training the model have eyes-on access to the data. Yet in many domains that can benefit from ML, privacy is of utmost importance. ML is, at its core, a way of pattern matching. Privacy in ML is not completely unheard of; it is not the most lucrative area of research at the moment, but it is starting to grow. It is not yet in a place to compete with the state of the art, but some great progress has been made. It allows model trainers to develop models while keeping the privacy of the data owners in mind.

Remote execution allows a model to be trained without the data owners explicitly handing over their data: instead of you sharing your data, the model owner shares the model with the various data owners. This is alright and expected, because extracting information from data is the whole point of an ML model. The real question is: is your privacy preserved in this exchange? The answer is no. Even if the data stays securely with you and the eyes of the model owner never see it, the model owner can in some rare cases deduce aspects of the data, even if the data owner is anonymous. In federated learning, results are uploaded to the cloud whenever they are available; the model owner reaches out to data owners when it needs them, rather than listening for updates all the time. The above-mentioned con can be overcome by using differential privacy in conjunction with remote execution. In differential privacy, in intuitive terms, we create an upper bound on the statistical uniqueness of the dataset, also known as the privacy budget.
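To make the privacy-budget idea concrete, here is a minimal sketch of the Laplace mechanism, the standard way of adding calibrated noise to a query result. This is my own illustration, not code from the article; `dp_count` and the other names are hypothetical, and `epsilon` plays the role of the privacy budget.

```python
import math
import random

def laplace_noise(scale):
    # Sample a Laplace(0, scale) variate via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, epsilon):
    """Differentially private counting query.

    A count changes by at most 1 when one person's record is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon
    yields epsilon-differential privacy for this query.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

A small `epsilon` (strong privacy, tight budget) means the noise can dwarf the true count; a large `epsilon` gives a nearly exact answer. That trade-off is exactly what the privacy budget captures: the model owner can never be sure whether an observed pattern is real or just noise.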
The model owner can never be 100% sure, in cases such as the one mentioned above, that User X has a dog of the newly learnt species, because the result could just be noise.

Encryption adds a further layer of protection. Data can be securely distributed across multiple parties, and no decryption is required for any step in the process; individual parties cannot decrypt the data on their own. For additional security, the model owners can also encrypt their models. You cannot do all kinds of computations in an encrypted state, but most of the basic ones are possible.

Privacy-preserving ML is an exciting area of research and is still in its early phase. Obviously there are more caveats and intricacies involved here, and this is merely a simplified overview of the sub-topics mentioned. If you want a follow-up on any of them, or want to explore any other concept in this space, please feel free to reach out to me.
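As a rough illustration of how several parties can compute on data that none of them can read alone, here is a minimal additive secret-sharing sketch. This assumes the article's "distribute data across multiple parties" refers to an SMPC-style scheme; the modulus choice and all function names are my own hypothetical picks.

```python
import random

Q = 2**61 - 1  # large prime modulus for the share arithmetic (illustrative choice)

def share(secret, n=3):
    """Split an integer into n additive shares mod Q.

    Any n-1 shares are uniformly random on their own, so no individual
    party can recover ("decrypt") the secret without the others.
    """
    shares = [random.randrange(Q) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    # Only the combination of all shares reveals the secret.
    return sum(shares) % Q

def add_shared(shares_a, shares_b):
    # Each party adds its own shares locally: the sum of two secrets is
    # computed without any party ever seeing a plaintext value.
    return [(a + b) % Q for a, b in zip(shares_a, shares_b)]
```

Addition (and, with more machinery, multiplication) works directly on the shares, which is why "most of the basic" computations are possible in this protected state while arbitrary computation remains expensive.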
Key Highlights:
- This article contains my notes from my initial reading about privacy-preserving ML for a hackathon project some two years ago.
- Security is about ensuring only authorised people have access to something, while privacy has more to do with who is authorised.
- ML is, at its core, a way of pattern matching.
- In many domains that can benefit from ML, privacy is of utmost importance.
- Remote execution allows the model owner to train a model without the data owners explicitly handing over the data.
- The data never leaves the participants’ devices, and the ML models of the world can still train on it.
- Differential privacy is used to create an upper bound on the statistical uniqueness of the dataset.
- It is not always practical to be the owner of your own data.
- Secure multi-party computation allows you to securely distribute data across multiple parties.
- Privacy-preserving ML is an exciting area of research and is in its early phase.
- The individual parties cannot decrypt data on their own.
- For additional security, the model owners can also encrypt their models.
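The remote-execution and federated-learning highlights above can be sketched as federated averaging: each client computes a model update on its own private data, and the server only ever sees and averages weights, never the raw data. Below is a toy one-dimensional linear-regression version (my own illustration with hypothetical names, not the article's code).

```python
def local_update(w, data, lr=0.1):
    # One gradient step on a client's private data for the model y = w * x.
    # A real system would run many SGD steps on-device before reporting back.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(client_weights):
    # The server aggregates weights only; raw data never leaves the devices.
    return sum(client_weights) / len(client_weights)

# Three clients, each holding one private (x, y) sample of y = 2x.
clients = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
global_w = 0.0
for _ in range(50):
    updates = [local_update(global_w, data) for data in clients]
    global_w = federated_average(updates)
```

Here the shared weight converges toward the true slope even though no client ever reveals its sample, which is the appeal noted above; the residual risk is that the weights themselves can still leak information, which is why differential privacy is layered on top.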
The editorial is based on content sourced from medium.com.