From The Press To Cambridge Analytica

    The History of Computing 10/28/2020

Welcome to the history of computing podcast. Today we’re going to talk about the use of big data in elections. But first, let’s start with a disclaimer. I believe the problems outlined in this episode are apolitical. Given the chance, I believe most politicians (or marketers), regardless of party, would have jumped at what is outlined in this episode, just as most marketers are more than happy to buy data without knowing its underlying source. No offense to the parties, but marketing is marketing, in politics just as in companies. Data will be used to gain an advantage in the market. Understanding the impacts of our decisions and the values of others is an ongoing area of growth for all of us, even when we have quotas for sales qualified leads to deliver.

Now let’s talk about data sovereignty. Someone pays for everything. The bigger and more lucrative the business, the more has to be paid to keep alive the organizations formed to support an innovation. If you aren’t paying for a good or service, then you yourself are the commodity. In social media, this takes the form of a company making its money from data about you and from the ads you see. The only other viable business model is to charge for the service, like a LinkedIn Premium account as opposed to the free ones used by us proletariat.

Our devices can see so much about us. They know our financial transactions, where we go, what we buy, what content we consume, and apparently what our opinions and triggers are. Sometimes, that data can be harnessed to show us ads. Ads about things to buy. Ads about apps to install. Ads about elections.

My crazy uncle Billy sends me routine invitations to take personality quizzes. No thanks. Never done one. Why?

I worked on one of the first dozen Facebook apps. A simple rock, paper, scissors game. At the time, it didn’t at all seem weird to me as a developer that there was an API endpoint to get a list of friends from within my app. It’s how we had a player challenge other players in a game. It didn’t seem weird that I could also get a list of their friends. And it didn’t seem weird that I could get a lot of personal data on people through that app. I mean, I had to display their names and photos when they played a game, right? I just wanted to build a screen to invite friends to play. I had to show a photo so you could see who you were playing. And to make the game more responsive, I needed to store the data in my own SQL tables. It didn’t seem weird then. I guess it didn’t seem weird until it did.

What made it weird was the introduction of highly targeted analytics and retargeting. I have paid for these services. I have benefited from these services in my professional life and to some degree I have helped develop some. I’ve watched the rise of large data warehouses. I’ve helped buy phone numbers and other personally identifiable information of humans and managed teams of sellers to email and call those humans. Ad targeting, drip campaigns, lead scoring, and providing very specific messages based on attributes you know about a person are all a part of the modern sales and marketing machine at any successful company. 

And at some point, it went from being crazy how much information we had about people to being, well, just a part of doing business. Former Cambridge Analytica CEO Alexander Nix once said, “From Mad Men in the day to Math Men today.” From Don Draper to Betty’s next husband Henry (a politician), there have always been informal ties between advertising, marketing, and politics. Likewise, one of the founders of SCL, the parent company of Cambridge Analytica, had ties to royalty, having dated one royal and gone to school with others in political power.

But there have also always been formal ties. Publick Occurrences Both Forreign and Domestick was the first colonial newspaper in America and was formally suppressed after its first edition in 1690. But the Boston News-Letter was formally subsidized in 1704. Media and propaganda. Most newspapers in the US were straight up sponsoring or sponsored by a political platform until the 1830s. To some degree, that began with Ben Franklin’s big brother James Franklin and the New England Courant in the early 1700s. Ben Franklin would later create partnerships for content distribution throughout the colonies, spreading his brand of moral virtue. And the papers were stoking the colonies into revolution. After the revolution, Hamilton instigated American Minerva, the first daily paper in New York, to be a Federalist paper. Of course, the Jeffersonian Republicans called him an “incurable lunatic.” And yet they still guaranteed us the freedom of the press.

And that freedom grew into investigative reporting, especially during the Progressive Era, from the tail end of the 19th century up until the start of the Roaring Twenties. While Teddy Roosevelt would call them Muckrakers, their tradition extends from Nellie Bly and Fremont Older to Seymour Hersh, Jonathan Kwitny, and even Woodward and Bernstein. They led to stock reform and civic reforms, uncovered corruption, exposed crime in labor unions, laid bare monopolistic behaviors, improved sanitation, and forced us to confront racial injustices. They have been independent of party affiliation and yet constantly accused over the last hundred years of being against whoever is in power at the time.

Their journalism extended to radio and then to television. I think the founders would be proud of how journalism evolved, and also unsurprised by some of the ways it has devolved. But let’s get back to the idea that someone is always paying. People can subscribe to a newspaper, but advertising is a huge source of revenue. With radio and television broadcast free over the airwaves, advertising became the only thing paying for content, and the ensuing decades became the golden age of that industry. And politicians bought ads. If there is zero chance a politician can win a state, why bother buying ads in that state? That’s a form of targeting with a pretty simple set of data.

In Mad Men, Don is sent to pitch the Nixon campaign. There has always been a connection between disruptive new mediums and politics. Offices have been won by politicians able to gain access to early printing presses to spread their messages to the masses, by those connected to print media who could get articles and advertising placed, by great orators at the advent of radio, and by good-looking, charismatic politicians first able to harness television, especially in the ad-exec era depicted in Mad Men that saw the Nixon campaigns of the 1960s. The platforms to advertise become ubiquitous, they get abused, and then they get regulated. After television came news networks specifically meant to prop up an agenda, although unable to be directly owned by a party. None are “fake news” per se, but once any of them is abused, they can all be cast in doubt, most especially by the abuser.

The Internet was no different. The Obama campaign was really the first to leverage social media and great data analytics to orchestrate what can be considered the first big data campaign. And after that campaign carried him to a first term, the opposition made great strides in countering it. Progress is often followed by laggards who seek to subvert the innovations of an era. And they often hire the teams who helped with previous implementations.

Obama had a chief data scientist, Rayid Ghani, and a chief analytics officer. They put apps in the hands of canvassers and mined data from Facebook friend networks to try to persuade voters. They scored voters and figured out how to influence votes for certain segments. That was supplemented by thousands of interviews and thousands of hours building algorithms. By 2012 they were pretty confident they knew which Americans made up the nearly 70 million who had put him in the White House. That gave the Obama campaign the confidence to spend $52 million on online ads, against Romney’s $26 million, to bring home the win. And through all that, the Democratic National Committee ended up with information on 180 million voters.

That campaign would prove the hypothesis that big data could win big elections. Then came the 2016 election. Donald Trump came from behind, out of a crowded field of potential Republican nominees, to not only secure the Republican nomination for president but then to win the election. He won the electoral college while losing the popular vote. Something similar happened in 1824, when John Quincy Adams lost the popular vote to Andrew Jackson but won the presidency after the House of Representatives settled the election, since no candidate had won a majority in the electoral college. Rutherford B. Hayes defeated Samuel Tilden in the electoral college in 1876 but lost the popular vote. It happened again when Grover Cleveland lost to Benjamin Harrison in 1888, in 2000 when Bush beat Gore, and again when Trump beat Hillary Clinton. And he solidly defeated her in the electoral college, 304 votes to her 227.

Every time it happens, there seems to be plenty of rhetoric about changing the process. But keep in mind the framers built the system for a reason: to give the constituents of every state a minimum amount of power to elect officials that represent them. Every state gets at least three electors: two for its senators, plus one for each of its members of the House of Representatives. States choose how their electors are instructed to vote. Every state except Maine and Nebraska has all of its electors vote for a single ticket, the one that won the state’s popular vote; Maine and Nebraska award some of their electors by congressional district. Once all the electors cast their votes, Congress counts the votes and the winner of the election is declared.
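To make the winner-take-all arithmetic concrete, here’s a minimal Python sketch. The three states, their House seat counts, and their vote totals are made up for illustration, the Maine and Nebraska district method isn’t modeled, and a tie within a state simply goes to whichever candidate is listed first. It shows how narrowly carrying a couple of small states can outweigh losing one big state by a landslide.

```python
from collections import Counter

# Hypothetical states: (name, House seats, popular votes per candidate).
# All names and numbers are invented for illustration.
STATES = [
    ("State A", 1, {"Red": 60_000, "Blue": 40_000}),
    ("State B", 1, {"Red": 55_000, "Blue": 45_000}),
    ("State C", 3, {"Red": 100_000, "Blue": 300_000}),
]


def electors(house_seats: int) -> int:
    """Two electors for the state's senators plus one per House member."""
    return 2 + house_seats


def tally(states):
    """Sum national popular votes and allocate electors winner-take-all.

    Ignores the Maine/Nebraska congressional-district method; a tie inside
    a state goes to whichever candidate appears first in its vote dict.
    """
    popular = Counter()
    electoral = Counter()
    for _name, house_seats, votes in states:
        popular.update(votes)  # add this state's votes to the national total
        state_winner = max(votes, key=votes.get)
        electoral[state_winner] += electors(house_seats)
    return popular, electoral


if __name__ == "__main__":
    popular, electoral = tally(STATES)
    print("Popular vote:", dict(popular))
    print("Electoral vote:", dict(electoral))
```

Run as written, Blue wins the popular vote 385,000 to 215,000, while Red wins the electoral vote 6 to 5.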

So how did he come from behind? One easy place to blame is data. I mean, we can blame data for putting Obama into the White House, or we can accept a message of hope and change that resonated with the people. Just as we can blame data for Trump or accept a message that government wasn’t effective for the people. Since this is a podcast on technology, let’s focus on data for a bit. And more specifically let’s look at the source of one trove of data used for micro-targeting, because data is a central strategy for most companies today. And it was a central part of the past four elections. 

We see the ads on our phones, so we know that companies have this kind of data about us. Machine learning had been on the rise for decades. But a little company called SCL was started in 1990 as the Behavioural Dynamics Institute by a British ad man named Nigel Oakes after he left Saatchi & Saatchi. It gets dangerous when someone like him makes this kind of comparison: “We use the same techniques as Aristotle and Hitler.”

Behavioural Dynamics studied how to change mass behavior through strategic communication, which US Assistant Secretary of Defense for Public Affairs Robert Hastings described in 2008 as the “synchronization of images, actions, and words to achieve a desired effect.” Sounds a lot like state-conducted advertising to me. And sure, reminiscent of Nazi tactics. You might also think of it as propaganda, or “psy ops” in the Vietnam era. And they were involved in elections in the developing world, in places like Ukraine, Italy, South Africa, Albania, Taiwan, Thailand, Indonesia, Kenya, Nigeria, even India. And of course in the UK. Or at least on behalf of the UK and, whether directly or indirectly, the US.

After Obama won his second term, SCL started Cambridge Analytica to go after American elections. They began to assemble a similar big data warehouse. They hired people like Brittany Kaiser, who’d volunteered for Obama and would become director of business development.

Ted Cruz used them in 2016, but it was the Trump campaign that was really able to harness their intelligence. Their principal investor was Robert Mercer, former co-CEO of the huge hedge fund Renaissance Technologies. He’d gotten his start at IBM Research working on statistical machine translation and was recruited in the ’90s to apply data modeling and computing resources to financial analysis. This helped Renaissance earn nearly 40% per year on investments. An American success story. He was also key in the Brexit vote, donating analytics to Nigel Farage, and was an early supporter of Breitbart News.

Cambridge Analytica would get involved in 44 races in the 2014 midterm elections. By 2016, Project Alamo was spending a million bucks a day on Facebook advertising. In the documentary The Great Hack, they claim this was to harvest fear. Cambridge Analytica allowed the Trump campaign to get really specific with targeting, so specific that they claimed to have 5,000 pieces of data per person.

Enter whistleblower Christopher Wylie, who claims over a quarter million people took a quiz called “This is Your Digital Life,” which exposed the data of around 50 million users. That data was moved off Facebook servers and stored in a warehouse where it could be analyzed and merged with other data sources without the consent of the people who took the quiz or the people in their friend networks. Dirty tactics.

Alexander Nix admitted to using bribery stings and prostitutes to influence politicians. So it should come as no surprise that they stole information on well over 50 million Facebook users in the US alone. And of course, they then lied about it when being investigated by the UK for Russian interference and fake news in the lead-up to the Brexit referendum. Investigations go on.

After investigations started piling up, some details started to emerge. This is Your Digital Life was written by Dr Spectre. It gets better. That’s actually Aleksandr Kogan, working for Cambridge Analytica. He had received research funding from the University of St Petersburg and was then lecturing in the Psychology department at the University of Cambridge. It would be easy to jump to the conclusion that he was working for the Russkies, but here’s the thing: he also got research funding from Canada, China, the UK, and the US. He claimed he didn’t know what the app would be used for. That’s crap. When I got a list of friends and friends’ friends that I could spider through, I parsed the data and displayed it on a screen as a pick list. He piped it out to a data warehouse. When you do that, you know exactly what’s happening with it.
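For a sense of why spidering through friends and friends’ friends reaches so many people, here’s a minimal Python sketch of a breadth-first search with a hop limit. The tiny friend graph is hypothetical and lives in memory; it isn’t Facebook’s Graph API, and the old friend permissions aren’t modeled here. Using the numbers above, around 50 million exposed users from a quarter million quiz-takers works out to something on the order of 200 people per person who actually took the quiz.

```python
from collections import deque

# Hypothetical, in-memory friend graph: user -> set of friends.
FRIENDS = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "erin", "frank"},
    "carol": {"alice", "grace"},
    "dave": {"alice", "heidi", "ivan"},
    "erin": {"bob", "judy"},
}


def reachable_within(graph, seeds, max_hops):
    """Return every user within max_hops friendship edges of any seed user.

    Breadth-first search: each user is recorded the first time it is seen,
    and users already max_hops away are not expanded further.
    """
    seen = set(seeds)
    frontier = deque((user, 0) for user in seeds)
    while frontier:
        user, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop limit reached; don't look at this user's friends
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, hops + 1))
    return seen


if __name__ == "__main__":
    installers = {"alice"}  # the only user who "installed the app"
    for hops in range(4):
        print(hops, sorted(reachable_within(FRIENDS, installers, hops)))
```

With a single installer, the reachable set grows from 1 user at zero hops to 4, then 9, then 10 as the hop limit rises. In a real social graph, where people have hundreds of friends, that growth is far steeper.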

So the election comes and goes. Trump wins. And people start asking questions, as they do when one party wins the popular vote and not the electoral college. Some people misunderstand the system, thinking a candidate can carry a state by winning gerrymandered districts, without realizing that most states simply award their electors to whoever wins the statewide popular vote. Other muckrakers from around the world start looking into Brexit and US elections and asking questions.

Enter Paul-Olivier Dehaye. While an assistant professor at the University of Zurich, he was working on Coursera. He started asking questions about the data collection. Word spread slowly but surely. Then enter American professor David Carroll, who sued Cambridge Analytica to see what data they had on him. Dehaye contributed to his Subject Access Request, and suddenly the connections between Cambridge Analytica and Brexit started to surface, as did the connection between Cambridge Analytica and the Trump campaign, including photos of the team working with key members of the campaign. And ultimately, the checks that were cut. ’Cause there’s always a money trail.

I’ve heard people claim that there was no interference in the 2016 elections, in Brexit, or in other elections. Now, if you think the American taxpayer didn’t contribute to some of the antics by Cambridge Analytica before they turned their attention to the US, I think we’re all kidding ourselves. And there was Russian meddling in US elections and illegally obtained materials were used, whether that’s emails on servers then leaked to WikiLeaks or stolen Facebook data troves. Those same tactics were used in Brexit. And here’s the thing, it’s been this way for a long, long time - it’s just so much more powerful today than ever before. And given how fast data can travel, every time it happens, unless done in a walled garden, the truth will come to light. 

Cambridge Analytica kinda’ shut down in 2018 after all of this came to light. What do I mean by kinda? Well, former employees set up a company called Emerdata Limited, which then bought the SCL companies. Why? There were contracts and data. They brought on the founder of Blackwater, Mercer’s daughter Rebekah, and others to serve on the board of directors, and she was suddenly the “First Lady of the Alt-Right.” Whether or not Emerdata got all of the company, they got some of the scraped data from 87 million users. No company with the revenues they had goes away quietly or immediately.

Robert Mercer donated the fourth largest amount in the 2016 presidential race. He was also the one who supposedly introduced Trump to Steve Bannon. In the fallout of the scandals, if you want to call them that, Mercer stepped down from Renaissance and sold his shares of Breitbart to his daughters. Today, he’s a benefactor of the Make America Number 1 Super PAC and remains one of the top donors to conservative causes.

After leaving Cambridge Analytica, Nix was under investigation for a few years. He settled with the Federal Trade Commission, agreeing to delete illegally obtained data, and settled with the UK Secretary of State, accepting that he had offered unethical services and agreeing not to act as a director of another company for at least 7 years.

Brittany Kaiser fled to Thailand and is now a proponent of banning political advertising on Facebook and of people being able to own their own data.

Facebook paid a $5 billion fine for data privacy violations and has overhauled its APIs and privacy options. It’s better but not great. I feel like they’re doing as well as they can, and they’ve been accused of tampering with feeds by conservative and liberal media outlets alike. To me, if they all hate you, you’re probably either doing a lot right or basically screwing all of it up. I wouldn’t be surprised to see fines continue piling up.

Kogan left the University of Cambridge in 2018. He founded Philometrics, a firm applying big data and AI to surveys. Their website isn’t up as of the recording of this episode. His Tumblr seems to be full of talk about acne and trying to buy cheat codes for video games these days. 

Many, including Kogan, have claimed that micro-targeting (or psychographic modeling techniques) against large, enhanced data sets isn’t effective. If you search for wedding rings and I show you ads for wedding rings, then maybe you’ll buy my wedding rings. If I see you bought a wedding ring, I can start showing you ads for wedding photographers and bourbon instead. Hey dummy, advertising works. Disinformation works. Analyzing, forecasting, and modeling with machine learning works. Sure, some of it is snake oil. But early adopters made billions off it. Problem is, like that perfect gambling system, you wouldn’t tell people about something if it meant losing your edge. Sell a book about how to weaponize a secret and you’re probably selling snake oil.
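The wedding ring example is simple enough to sketch as code. This minimal Python version uses a hand-written rule table, which is an assumption for illustration: the profile fields, rule format, and ad names are all hypothetical, and real ad platforms score ads with learned models rather than a two-line lookup. Purchase rules are listed first, so once you’ve bought the ring you stop seeing ring ads and start seeing photographers and bourbon.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical behavioral signals an ad system might hold about a person."""
    searches: set[str] = field(default_factory=set)
    purchases: set[str] = field(default_factory=set)


# Hypothetical rules, checked in order: (signal kind, item, ads to show).
RULES = [
    ("purchase", "wedding ring", ["wedding photographer", "bourbon"]),
    ("search", "wedding ring", ["wedding ring"]),
]


def pick_ads(profile: Profile) -> list[str]:
    """Return the ads for the first rule whose signal appears in the profile."""
    for kind, item, ads in RULES:
        signals = profile.purchases if kind == "purchase" else profile.searches
        if item in signals:
            return list(ads)  # copy so callers can't mutate the rule table
    return ["generic ad"]  # no matching signal: untargeted fallback


if __name__ == "__main__":
    print(pick_ads(Profile(searches={"wedding ring"})))
    print(pick_ads(Profile(searches={"wedding ring"}, purchases={"wedding ring"})))
    print(pick_ads(Profile()))
```

Ordering the rules is the whole trick here: the more specific signal (a purchase) wins over the weaker one (a search).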

As for regulatory reactions, can you say GDPR and all of the other privacy regulations that have come about since? Much as Sarbanes-Oxley introduced regulatory controls for corporate auditing and transparency, we regulated the crap out of privacy. And by regulated I mean a bunch of people who didn’t understand the way data is stored and disseminated over APIs made policy to govern it. But that’s another episode waiting to happen. Suffice it to say the lasting impacts on the history of computing are both the regulations on privacy and the changes to identity providers and other API endpoints, where we needed to lock down entitlements to access various pieces of information due to rampant abuses.

So here’s the key question in all of this: did the data help Obama and Trump win their elections? It might have moved a few points here and there. But it was death by a thousand cuts. Missteps by the other campaigns, political tides, segments of the American population desperately looking for change and feeling left behind while other segments got all the attention, foreign intervention, voting machine tampering, not having a cohesive opposition party, and so many other aspects of those elections also played a part. And as the Hari Seldon-esque George Friedman called it in his book, it’s just The Storm Before the Calm.

So whether or not the data helped the Trump campaign, the next question is whether using the Cambridge Analytica data was wrong. This is murky. The data was illegally obtained. The Trump campaign was playing catch-up with the maturity of the data held by the opposition. But the campaign can claim they didn’t know the data was illegally obtained. It is illegal to employ foreigners in political campaigns, and Bannon was warned about that. So was then-CEO Nix. But they were looking to instigate a culture war, according to Christopher Wylie, who helped found Cambridge Analytica. And look around: did they?

Getting data models to the point where their predictions are confident enough to be weaponizable takes years. Machine learning projects are very complicated, very challenging, and very expensive. And they are being used by every political campaign now, insofar as the law allows. To be honest, though, troll farms of cheap labor are cheaper and faster, which is why three more troll farms got taken down just a month before the recording of this episode. But AI doesn’t do pillow talk, so eventually it will displace even the troll farm worker, if only ’cause the muckrakers can’t interview the AI.

So where does this leave us today? Nearly every time I open Facebook, I see an ad to vote for Biden or an ad to vote for Trump. The US Director of National Intelligence recently claimed the Russians and Iranians were interfering with US elections. To do their part, Facebook will ban political ads indefinitely after the polls close on Nov. 3. They and Twitter are taking proactive steps to stop disinformation on their networks, including disinformation from actual politicians. And Twitter has outright banned political ads.

People don’t usually want regulations. But just as political ads in print, on the radio, and on television are regulated, they will need to be regulated online as well. As will the use of big data. The difference is the rich metadata collected in micro-targeting, the expansive comment areas, and the anonymity of those commenters. But I trust that a bunch of people who’ve never written a line of code in their lives will do a solid job handing down those regulations. Actually, the FEC probably never built a radio, so maybe they will.

So as the election season comes to a close, think about this. Any data from large brokers about you is fair game. What you’re seeing on Facebook, and even the ads you see on popular websites, are being shaped by that data. Without it, you’ll see ads for things you don’t want, like the Golden Girls Season 4 boxed set, because you already have it. But with it, you’ll get crazy uncle Billy at the top of your feed talking about how the earth is flat. Leave it or delete it, but at least ask for a copy of it so you know what’s out there. You might be surprised, delighted, or even a little disgusted by that site uncle Billy was looking at that one night you went to bed early.

But don’t, don’t, don’t think that any of this should impact your vote. Conservative, green, liberal, progressive, communist, social democrat, or whatever you subscribe to. In whatever elections in your country or state or province or municipality. Go vote. Don’t be intimidated. Don’t let fear stand in the way of your civic duty. Don’t block your friends with contrary opinions. If nothing else, listen to them. They need to be heard. Even if uncle Billy just can’t be convinced the world is round. I mean, he’s been to the beach. He’s been on an airplane. He has GPS on his phone… And that site. Gross.

Thank you for tuning in to this episode of the history of computing podcast. We are so, so, so lucky to have you. Have a great day. 

(OldComputerPods) ©Sean Haas, 2020