This recent article by Yves-Alexandre de Montjoye, Florimond Houssiau, Andrea Gadotti and Florent Guepin of the Computational Privacy Group lists eight questions that we should ask to assess privacy in contact tracing applications.
“When it comes to contact tracing, it is required to go beyond simple reassurances that phone numbers aren’t recorded, that everything is encrypted, that pseudonyms are changing, or that the app is based on consent. Indeed, a large range of techniques exist to circumvent those protections. For instance, scores have been developed to re-identify individuals in location or graph datasets, session fingerprinting could be used to link a pseudonymous app user to an authenticated web visitor, and node-based intrusions would allow to track users.”
SafeTrace is an API which connects to a privacy-preserving storage and private computation service. Learn about SafeTrace by reading our announcement blog post and executive summary. You can read more and follow our progress in the GitHub repository.
1. How do you limit the personal data gathered by the authority?
SafeTrace API uses Trusted Execution Environment technology (Intel SGX). SafeTrace API enables applications (such as a web application or a mobile application) to submit encrypted user location data and infection status, run analytics and receive results, without ever revealing plaintext data to anyone, including the SafeTrace server operator.
This analysis can return two types of output:
- A “local view,” which is an individual report showing users where and when they’ve overlapped with individuals who have since tested positive
- A “global view” heat map that can help us better understand and curtail the spread of COVID-19
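To make the “local view” concrete, here is a minimal sketch of an overlap check of the kind that could run inside the enclave. It is not SafeTrace's actual code: the `Ping` type, the `local_view` function, and the default radius and time window are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0

@dataclass(frozen=True)
class Ping:
    """One location sample (hypothetical record layout)."""
    lat: float      # degrees
    lon: float      # degrees
    timestamp: int  # Unix seconds

def haversine_m(a: Ping, b: Ping) -> float:
    """Great-circle distance between two pings, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def local_view(user_pings, positive_pings, radius_m=10.0, window_s=900):
    """Return the user's pings that were close, in both space and time,
    to a ping from someone who has since tested positive.

    radius_m and window_s are assumed thresholds, not SafeTrace defaults.
    """
    return [
        u for u in user_pings
        if any(
            abs(u.timestamp - p.timestamp) <= window_s
            and haversine_m(u, p) <= radius_m
            for p in positive_pings
        )
    ]
```

The report contains only the user's own places and times, never the other person's record, which is what lets the result be returned without identifying anyone.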
2. How do you protect the anonymity of every user?
When data is submitted to SafeTrace, it is encrypted inside the user’s device and transmitted to SafeTrace in this encrypted form. When computations are run, the data is decrypted and used only within the secure enclave. When a computation is complete, the data used in that computation is “sealed,” which means that it is re-encrypted prior to storage in such a way that it can only be decrypted once again within the secure enclave.
The results can be shared in two ways:
- Individual users: SafeTrace API encrypts outputs so that only the intended user can decrypt and view them. This is achieved with a Diffie-Hellman key exchange between the user and SafeTrace API, which gives both sides a shared encryption key.
- Globally shared data: Global view analysis groups m users into n clusters, where n << m, and shares the outcome with a mapping API. Because this clustering runs inside SafeTrace API and only the clustered outcome is shared, it protects the anonymity of each user.
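As an illustration of the global view, the sketch below reduces m points to n cluster centroids with counts, so that only the n summaries leave the function. The algorithm is not specified in this post; k-means is an assumption here, and `assign` and `cluster_summary` are hypothetical names. Coordinates are treated as planar for simplicity.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups

def cluster_summary(points, n, iterations=20):
    """Run a basic k-means over (x, y) points and return [(centroid, count), ...].

    Only these n summaries are returned; the m individual points are not.
    """
    if not 0 < n <= len(points):
        raise ValueError("n must be between 1 and the number of points")
    # Deterministic initialisation: n points evenly spaced through the input.
    step = len(points) / n
    centroids = [points[int(i * step)] for i in range(n)]
    for _ in range(iterations):
        groups = assign(points, centroids)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    groups = assign(points, centroids)
    return [(c, len(g)) for c, g in zip(centroids, groups)]
```

A cluster with very few members can still point to an individual, so a real deployment would likely also need a minimum cluster size; that safeguard is not shown here.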
3. Does your system reveal to the authority the identity of users who are at risk?
No. The operator of SafeTrace API (the authority) has no access to user data in plaintext.
4. Could your system be used by users to learn who is infected or at risk, even in their social circle?
SafeTrace API notifies users if they have been in proximity to another user who has tested positive (a diagnosed patient). This notification does not give users the identity of others who are infected or at risk. A user might infer the identity of a diagnosed individual in their social circle from memory of where they were and whom they met. However, it is not possible to mount a systematic attack to de-anonymize diagnosed patients.
5. Does your system allow users to learn any personal information about other users?
No. Within the “local view,” SafeTrace only informs users that contact with an infected person has occurred at a given location and time.
6. Could external parties exploit your system to track users or infer whether they are infected?
No. The only information that can be inferred from SafeTrace API is the potential identity of diagnosed patients in your social circle. Even then, because SafeTrace uses GPS data (no device-to-device communication), it’s impossible to identify someone with 100% confidence: there are usually other people present within a 10 ft radius at any given time.
7. Do you put in place additional measures to protect the personal data of infected and at-risk users?
We are looking into how differential privacy can complement our TEE-based approach.
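For readers unfamiliar with the idea, one standard differential privacy technique is the Laplace mechanism, which adds calibrated noise to aggregate counts before they are published. The sketch below applies it to per-cell heat-map counts. It is an illustration only, not a description of what SafeTrace does or will do: `noisy_counts`, the cell names, and the sensitivity of 1 (each user contributes to at most one cell) are assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to each count.

    counts: dict mapping a cell identifier to its true count.
    epsilon: privacy budget; smaller values mean more noise.
    sensitivity: assumed maximum change one user can cause in any count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}
```

The noisy values can be negative or fractional; whether to round or clip them before display is a separate design decision.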
8. How can we verify that the system does what it says?
Intel SGX provides a service called “attestation.” This process affirms certain things about the enclave and the code that has been deployed inside it, for example what version of SGX is being run and what code will be run over data submitted to the enclave. If any of those elements is modified, for example if the code is changed, the quote used in the attestation process changes as well. The modified enclave will no longer be able to unseal data that had already been submitted, and new clients will see the change in the attestation when they connect to the enclave. This is a necessary protection to ensure that the enclave isn’t maliciously modified to run code that the user did not approve.
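The toy model below shows why a code change is visible to clients and why it breaks unsealing. It is a deliberate simplification and not the SGX API: the measurement is modelled as a plain SHA-256 hash of the code, the sealing key as an HMAC of that measurement under a device secret, and all function names are hypothetical. Real attestation also involves a signed quote whose signature the client must verify, which is omitted here.

```python
import hashlib
import hmac

def measure(enclave_code: bytes) -> str:
    """Toy stand-in for an enclave measurement: a hash of the code."""
    return hashlib.sha256(enclave_code).hexdigest()

def client_accepts(reported: str, expected: str) -> bool:
    """Client-side check: the reported measurement must equal the one the
    client expects, for example from audited, published source code."""
    return hmac.compare_digest(reported, expected)

def sealing_key(device_secret: bytes, enclave_code: bytes) -> bytes:
    """Toy sealing key bound to the measurement: different code yields a
    different key, so data sealed by the old code cannot be unsealed."""
    return hmac.new(device_secret, measure(enclave_code).encode(), hashlib.sha256).digest()
```

For example, a client that pinned the measurement of the audited build would reject an enclave running even a one-byte variation of it, and that variation would derive a sealing key that does not open previously stored data.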
These questions focus on privacy aspects; however, ensuring security is also crucial. This means, for example, supervising the integrity and authenticity of the crowdsourced data, evaluating how mobile malware could affect the app’s behavior, or assessing the resilience of the authority’s servers against intrusions.
As our team continues to make progress developing a minimum viable SafeTrace product, we are carefully thinking about each of the above questions, along with many other insights from app developers, healthcare professionals, government officials, potential users, and others. The goal is to build the most useful platform to support all kinds of people around the world, especially those in vulnerable populations. Together, we can build solutions that protect our universal right to privacy as well as public health.