July 7, 2009
Social Security Numbers Predicted Using Public Information
Carnegie Mellon University researchers have found that the nine-digit Social Security number, which individuals are urged to guard closely, can be predicted using personal information available in public records or on online social networks.
"It's good that we found it before the bad guys," Alessandro Acquisti, associate professor of information technology and public policy at Carnegie Mellon University in Pittsburgh, said of the method for predicting the numbers.
In Tuesday's edition of Proceedings of the National Academy of Sciences, Acquisti and his colleague Ralph Gross report that they were able to make surprisingly accurate predictions of most or all nine digits of Social Security numbers (SSNs) using data available in public records along with information, such as birthdates, commonly found on social networking sites such as MySpace and Facebook.
For people born after 1988, when the government began issuing numbers at birth, the researchers were able to identify the first five digits of the Social Security number for 44 percent of them. With fewer than 1,000 tries, they accurately predicted the complete nine-digit number for 8.5 percent of those people.
"If you can successfully identify all nine digits of an SSN in fewer than 10, 100 or even 1,000 attempts, that Social Security number is no more secure than a three-digit PIN," the authors noted.
The authors concluded that the Social Security Administration (SSA) could reduce this vulnerability by assigning numbers under a randomized scheme, but that ultimately an alternative means of authenticating identities must be used.
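The authors' comparison between a predictable SSN and a three-digit PIN can be checked with a little arithmetic. The Python sketch below is illustrative only and does not come from the paper: it assumes a naive search space of 10^9 nine-digit combinations (ignoring number ranges that are never issued) and uses hypothetical helper names, `equivalent_pin_digits` and `reduction_factor`, to show what a given attempt budget corresponds to.

```python
import math

# Assumption: treat every nine-digit string as a possible SSN.
# Real SSNs exclude some ranges, so this overstates the naive search space.
FULL_SSN_SPACE = 10 ** 9


def equivalent_pin_digits(max_attempts: int) -> float:
    """Length of a decimal PIN whose whole search space equals max_attempts."""
    return math.log10(max_attempts)


def reduction_factor(max_attempts: int) -> float:
    """How many times smaller the attempt budget is than the naive SSN space."""
    return FULL_SSN_SPACE / max_attempts


# The three attempt budgets mentioned in the authors' remark.
for attempts in (10, 100, 1000):
    digits = equivalent_pin_digits(attempts)
    factor = reduction_factor(attempts)
    print(f"{attempts} attempts: like a {digits:.0f}-digit PIN, "
          f"{factor:,.0f} times smaller than the naive space")
```

Under these assumptions, a budget of 1,000 attempts corresponds to a three-digit PIN, a search space one million times smaller than the naive nine-digit one.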
Social Security spokesman Mark Lassiter says that this finding should not worry the public "because there is no foolproof method for predicting a person's Social Security number."
"The suggestion that Mr. Acquisti has cracked a code for predicting an SSN is a dramatic exaggeration," Lassiter claimed in an email.
He said the agency was already developing a system, to be put in place next year, that will assign SSNs randomly, but he insisted the change has nothing to do with the report.
The researchers say that they left sensitive details of the prediction strategy out of the report so that it would not give criminals a how-to guide for identifying the numbers.
Identity theft has worried Americans for years and cost them nearly $50 billion in 2007 alone. If the method of predicting the numbers were to fall into the wrong hands, the risk of identity theft would increase greatly, according to Acquisti.
When Social Security was devised in the 1930s, no one could have foreseen the number being so commonly used for passwords and other forms of authentication. The Social Security Administration has been warning educational, financial and health care institutions for years not to use the numbers as personal identifiers.
"In a world of wired consumers, it is possible to combine information from multiple sources to infer data that is more personal and sensitive than any single piece of original information alone," Acquisti said, urging people not to make too much information public on social networking sites.
Acquisti, who researches the economics of privacy, said he was curious to find what he could learn about people just by using the information provided on these sites. He called it "a great experiment in self-revelation."
He said people were often willing to list their date of birth and hometown, which is information used in issuing SSNs.
The researchers also consulted the SSA's "Death Master File," which lists the numbers of the deceased. The file was made public to help prevent fraudulent use of the Social Security numbers of people who had died. However, the data listed for people born between 1973 and 2003 gave the researchers statistical patterns in the numbers issued, which helped them identify SSNs for the living.
"I was surprised by the accuracy of certain predictions," Acquisti said.
He explained that the system can give a range of possibilities for the last four digits, which makes it easier for a computer to test the possibilities until the correct number is found for an individual.
He went on to say that "attackers can exploit various public and private-sector online services, such as online 'instant' credit approval sites, to test subsets of variations to verify which number corresponds to an individual with a given birth date."
It was already well known that the numbers have a geographic component, and previous studies have used those patterns, along with other data, to estimate when and where a specific number may have been issued.
"Our work focuses on the inverse, harder, and much more consequential inference: it shows that it is possible to exploit the presumptive time and location of SSN issuance to estimate, quite reliably, unknown SSNs," Acquisti said.