New research unveils privacy risks from combinations of demographics.
Health care providers face increasing pressure to make individual-level data readily available for research and policy making. But Canadians are more likely to allow the sharing of their personal data if they believe that their privacy is protected. A new report by Dr. Khaled El Emam, the Canada Research Chair in Electronic Health Information at the University of Ottawa and the Children's Hospital of Eastern Ontario Research Institute, suggests that Canadians can be uniquely identified from their date of birth, postal code, and gender. This means that if this triad of data exists in any database, even one with no names or other identifying information, it would be possible to determine the identity of those individuals.
The report is now available in the journal BMC Medical Informatics and Decision Making.
"Most people tend to think twice before reporting their year of birth [to protect their privacy] but this report forces us all to think about the combination or the totality of data we share," said Dr. El Emam. "It calls out the urgency for more precise and quantitative approaches to measure the different ways in which individuals can be re-identified in databases - and for the general population to think about all of the pieces of personal information which in combination can erode their anonymity."
The research study used a sizable Montreal-based population. The provincial health insurance claims database of Quebec holds demographic information on all citizens who have health insurance. Because the insurance is publicly financed, the database effectively captures the whole population. For the purpose of the investigation, only date of birth, gender and full postal code data were obtained.
The proportion of individuals who are unique on postal code alone is significant. When the full date of birth is used together with the full postal code, approximately 97% of the population is unique with only one year of data. When the full date of birth and a multi-year residential trail are considered, almost 100% of the population is unique.
Reducing the granularity of the postal code to 1 character, together with the full date of birth, does reduce the proportion of unique individuals considerably.
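To make the idea of "uniqueness" concrete, the short Python sketch below shows one way such a proportion could be computed. It is an illustration only, not the method used in the report: the function name proportion_unique, the record layout and the five toy records are all hypothetical, and the postal codes and dates are made up for the example.

```python
from collections import Counter


def proportion_unique(records, key):
    """Return the fraction of records whose key value is shared with no other record."""
    if not records:
        return 0.0
    # Count how many records fall into each combination of attributes.
    counts = Counter(key(r) for r in records)
    # A record is "unique" if it is the only one with its combination.
    unique = sum(1 for r in records if counts[key(r)] == 1)
    return unique / len(records)


# Hypothetical toy data: (date_of_birth, postal_code, gender).
records = [
    ("1980-03-14", "K1H 8L1", "F"),
    ("1980-03-14", "K1H 8L1", "M"),
    ("1980-03-14", "K1N 6N5", "F"),
    ("1975-11-02", "K1N 6N5", "M"),
    ("1975-11-02", "K1N 6N5", "M"),
]

# Full date of birth, full postal code and gender.
full = proportion_unique(records, key=lambda r: (r[0], r[1], r[2]))

# Full date of birth with the postal code cut down to its first character.
coarse = proportion_unique(records, key=lambda r: (r[0], r[1][:1]))

print(f"full triad: {full:.0%} unique")
print(f"date of birth + 1-character postal code: {coarse:.0%} unique")
```

On this toy data, three of the five records are unique on the full triad, while none are unique once the postal code is reduced to one character. Real figures depend entirely on the population being measured.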
"The findings are important because they offer yet another onion skin to peel back in the overarching dialogue about individual privacy rights. We need to continuously evaluate these risks to privacy, and put in place measures to protect anonymity, whether technological or policy-based. Failure to do so will result in a public unwilling for their health data to be used for secondary purposes, such as health research," said Dr. El Emam. "Take for example, if only a three character postal code is combined with the full date of birth, close to 80% of the population is unique - or easily identifiable. I suspect this will surprise most people."