In the space of seven weeks, Cambridge Analytica went from being the poster child for smart, data-driven electioneering to a pariah. In early May, the firm that once promised to “find your voters and move them to action” announced it was starting bankruptcy proceedings, as the “siege of media coverage has driven away virtually all of the Company's customers and suppliers” (bit.ly/2r1wwj1).
The beginning of the end came with an interview in the Observer on 17 March, when “whistleblower” Christopher Wylie alleged that Cambridge Analytica “exploited Facebook to harvest millions of people's profiles” and used that data to target voters with personalised political adverts (bit.ly/2ragdUb).
Data reportedly came via an app created in 2013 by academic psychologist Aleksandr Kogan. At the time, Facebook allowed app developers to collect data not only about app users but also about their Facebook friends, according to Facebook CEO Mark Zuckerberg (bit.ly/2f8a5G3). In written evidence to a UK parliamentary committee, Kogan confirmed that his app collected data from friends of users if their privacy settings allowed access to that information, including name, birth date, location and gender (bit.ly/23bvrfS).
Seven days after the Wylie interview, investigators from the UK Information Commissioner's Office (ICO) descended on Cambridge Analytica's London premises (bit.ly/2r1rqUz). Facebook too was subjected to scrutiny when Zuckerberg was called to testify before Congress.
In announcing its winding down, Cambridge Analytica said it had “unwavering confidence that its employees have acted ethically and lawfully”, while the ICO said it would “continue its civil and criminal investigations”.
But what do statisticians and data scientists make of the story and its ramifications? Andrew Garrett, who chairs the Royal Statistical Society's Data Science Section (DSS), told Significance: “What is scary is the harvesting of information on other people from Facebook accounts. That some form of consent could allow access to data on your friends (not just yourself) seems counter to individual consent principles.”
DSS secretary Leone Wardman said this highlights the need for debate around the meaning of “consent” in the digital age. “We are now faced with long consent or privacy statements, but we don't understand who reads them before clicking through or whether people understand the ways in which their data might end up being used,” said Wardman. “There is therefore a tension between legal consent – having ticked the box – and moral or informed consent – what users actually feel comfortable with.”
This idea of “moral consent” links to wider discussions about ethics in data science. Julia Lane, professor at New York University’s Wagner School and Center for Urban Science and Progress, believes there is “a major gap in ethics training across the board”. She said: “All scientists working on federally-funded research projects with data on human subjects have annual training in confidentiality protection; that should be extended to ethics training.”
A report published in May by the National Academies of Sciences, Engineering and Medicine (NAS) suggests: “Data science ethics might be codified in an ‘oath’ similar to the Hippocratic Oath taken by physicians.” The proposed form of this oath includes statements such as: “I will respect the privacy of my data subjects” and “I will remember that my data are not just numbers without meaning or context, but represent real people and situations, and that my work may lead to unintended societal consequences” (nas.edu/EnvisioningDS).
Are micro-targeted ads effective?
One question asked in the wake of the Facebook–Cambridge Analytica scandal is: just how worried should we be about the effectiveness of personalised – or micro-targeted – political ads? The short answer is “very”, says Martin Goodson, chief scientist at Evolution AI. He points to a 2012 article in Nature, which reported: “About 340 000 extra people turned out to vote in the 2010 US congressional elections because of a single election-day Facebook message” (go.nature.com/2rlEGsC). Goodson says: “Now imagine you only target certain groups, trying to reduce their desire to vote. This is voter suppression.”
Lane, who was a contributor to the NAS report, said: “The work of Helen Nissenbaum” – e.g. bit.ly/2a7nuVx – “makes it clear that it's impossible to have personal privacy and control at scale, so it is critical that the uses to which data will be put are ethical – and that the ethical guidelines are clearly understood.”
Whether privacy is impossible is certainly up for debate. But clearly there are challenges in our hyper-networked world. For example, researchers at Imperial College London's Computational Privacy Group believe that individual privacy is to some extent contingent on how diligent a person's friends and colleagues are about their own privacy.
Writing on the group's blog (bit.ly/2r3z4cm), assistant professor Yves-Alexandre de Montjoye and colleagues argue: “In our modern networked societies, privacy is a shared responsibility. … Our results show that the privacy risks incurred by the (poor) privacy settings of the people around us are much more important than previously believed”. That is one clear takeaway from the Cambridge Analytica story.
ASA calls decision “regrettable and lacking in scientific rigor”, while state AGs sue to block question
US Commerce Secretary Wilbur Ross has ordered the inclusion of a citizenship question on the 2020 Census form, despite the concerns of experts – including those within the US Census Bureau – who warned that response rates might suffer as a result.
In an eight-page memo (bit.ly/2HSaFwE) dated 26 March, Ross argued that the question is necessary to provide “complete and accurate data” to the Department of Justice (DOJ). The DOJ, in turn, said it needs citizenship data to produce “a reliable calculation of the citizen voting-age population” in order to protect against “racial discrimination in voting” (bit.ly/2m8eWNE).
Statisticians and law-makers wrote to Ross earlier this year, urging him not to add a citizenship question to the census for fear that it might lower response rates among immigrants and ethnic minority communities.
Ross acknowledged these concerns in his memo, writing: “A significantly lower response rate by non-citizens could reduce the accuracy of the decennial census and increase costs for non-response follow up operations.
“However,” he continued, “neither the Census Bureau nor the concerned stakeholders could document that the response rate would in fact decline materially.”
Ross argued that non-response rates for the citizenship question on the American Community Survey (from 2013 to 2016) were in a similar range to non-response rates for other questions. However, for the citizenship question specifically, non-response rates for non-Hispanic whites (6–6.3%) were about half what they were for non-Hispanic blacks (12–12.6%) and Hispanics (11.6–12.3%).
In a statement, the American Statistical Association (ASA) described Ross's decision as “regrettable and lacking in scientific rigor” (bit.ly/2HQLWUq). It said: “While it is true we have little experience and testing on this specific change [the addition of a citizenship question], Secretary Ross ignores the expert opinion of the broad scientific community involved with survey and questionnaire research, which includes government, industry, and academic scientists. Given the innumerable and powerful ways census data help improve societal and economic conditions, great care should be exercised to avoid any situation that may further increase the risk of an undercount.”

Meanwhile, a coalition of state attorneys general, cities and mayors has filed a lawsuit seeking to block the addition of the citizenship question (bit.ly/2HNecHu).
Meanwhile, in Puerto Rico…
Soon after our April issue was published, Puerto Rico senators joined their colleagues in the House of Representatives in approving plans to dismantle the independent Puerto Rico Institute of Statistics (bit.ly/2jLBAuD). Organisations including the American Statistical Association and Royal Statistical Society have since written to Puerto Rico's governor to urge a rethink (bit.ly/2jLrQ3w).
The ASA's letter ends by saying: “If reversal of the decision is possible, we strongly advise it.”
UK peers reject pre-election ban on poll results, but warn that oversight must improve
The House of Lords, the upper house of the UK Parliament, has called for greater oversight of public opinion surveys amid concern about the inaccuracy of pre-election polls.
Lord Lipsey, chairman of the Select Committee on Political Polling and Digital Media, urged the British Polling Council (BPC) to play “a more proactive role in how it regulates polling and influences the reporting of polls”.
Recommendations include having the BPC issue guidance on methodological best practice, providing an advisory service for the review of poll design, and developing a programme of training opportunities for journalists on how to read and interpret polling data (bit.ly/2rfeKaD).
The Lords investigation follows the perceived failure of the polls in the 2015 and 2017 UK general elections and the 2016 “Brexit” referendum. Its report notes: “For each of those events, albeit to varying degrees, the polls ‘called it wrong’”.
Referring to “compelling evidence that polls influence the narrative around elections”, Lord Lipsey said: “The polling industry needs to get its house in order. Otherwise the case for banning polling in the run-up to elections – one we for now reject – will become stronger.”