Wired magazine had it right with its headline “Trump Wants All Your Voter Data. What Could Go Wrong?” As the article notes, “The private ballot is tradition in the United States. Now, President Trump’s voter fraud commission wants to collect every American’s voting history and make it available to the public—all in the name of ‘election integrity.’”
Forty-four states have already said, for a variety of reasons, that they will not comply. Other sites can discuss the merits of the request or of the states’ rejections. Here we want to examine the implications of big data and why arbitrary collection of this information may not be a great idea.
For this particular request, states were asked to provide quite a bit of information about voters and their voting history. This included details like a voter’s political party affiliation, address, voting history, felony history, and the last four digits of the voter’s Social Security number. Some of this is disturbing in its own right, such as the last four digits of the Social Security number. This information is often used in conjunction with other details, like an email address, to set up accounts on websites, and these are often financial sites.
The information requested is unlikely to provide all such details, but here is the rub. One aspect of big data manipulation and analysis is the ability to relate and combine databases. Big data analysis is being augmented with machine learning techniques that can help increase these associations. The companies that use such related information to give you access to your accounts and services would be more than a bit annoyed, as would you, if it became easy to impersonate you online.
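To make the idea of relating and combining databases concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two datasets, their field names, the sample people, and the helper names `link_key` and `link_records` are all hypothetical, and the matching rule (normalized name plus postal code) is far cruder than what real record linkage or machine learning would use.

```python
from collections import defaultdict

# Hypothetical records; every name and value here is made up for illustration.
voter_roll = [
    {"name": "Pat Example", "zip": "00001", "ssn_last4": "1234", "party": "Independent"},
    {"name": "Sam Sample", "zip": "00002", "ssn_last4": "5678", "party": "Unaffiliated"},
]
retail_accounts = [
    {"email": "pat@example.com", "full_name": "pat  example", "postal_code": "00001"},
    {"email": "lee@example.com", "full_name": "Lee Other", "postal_code": "00003"},
]


def link_key(name, postal_code):
    """Normalize the fields both datasets share into one comparable key."""
    return (" ".join(name.lower().split()), postal_code.strip())


def link_records(voters, accounts):
    """Join the two datasets on the normalized (name, postal code) key."""
    # Index one dataset by key so each lookup is a dictionary access.
    index = defaultdict(list)
    for account in accounts:
        index[link_key(account["full_name"], account["postal_code"])].append(account)

    linked = []
    for voter in voters:
        for account in index.get(link_key(voter["name"], voter["zip"]), []):
            # The merged record now holds fields neither source had alone.
            linked.append({**voter, **account})
    return linked


linked_profiles = link_records(voter_roll, retail_accounts)
```

In this toy case the one matching record ends up with an email address sitting next to party affiliation and the last four digits of a Social Security number, which is exactly the kind of combined profile that neither database contained by itself.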
Big databases like this can be useful and problematic by themselves, but that is no longer the only way large databases are being utilized. Information acquired by sites from Google and Facebook to Amazon is commonly used to analyze users’ preferences to do everything from making sales suggestions to providing their customers with this information. Cross-referencing this information with that obtained by the voter fraud commission is simply an extension of what is already being done in the name of big data.
Of course, this type of combination and analysis assumes that someone has access to two or more databases of this magnitude. Keeping such information private and contained is a good idea, but we know that major breaches of such databases occur on a regular basis.
The trend now is toward smartphone apps with millions of users and countless Internet of Things (IoT) devices that are generating trillions of bits of information. Cars are starting to generate more data, much of it in real time, and smart speakers are adding voice data to the mix. This data can be, and is being, correlated with the people involved.
The question for developers is what types of information their applications are generating, how that information is being used, and what protections, if any, are applied to the data and applications involved. Most of these decisions will be made at the corporate level, but an understanding of security, big data, and related topics is often the developer’s area of expertise.
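As one illustration of what a protection at the application level might look like, the sketch below stores a keyed hash in place of a raw identifier and drops fields the application does not need. This is only an assumption about one possible approach, not a recommendation from any particular standard. The event fields and the function names `pseudonymize` and `minimize_event` are hypothetical, and a real system would also need key management, access control, and a review of which fields are truly required.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC) of its normalized form."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def minimize_event(event: dict, key: bytes) -> dict:
    """Keep only the fields the application needs and pseudonymize the user."""
    return {
        "user": pseudonymize(event["email"], key),
        "event_type": event["event_type"],
    }


# Hypothetical key; in practice it would come from a secrets store, not be
# generated on every run.
secret_key = secrets.token_bytes(32)

# Hypothetical raw event from a device or app; all values are made up.
raw_event = {
    "email": "Pat@Example.com",
    "event_type": "speaker_query",
    "transcript": "made-up voice transcript",
    "ssn_last4": "1234",
}

stored_event = minimize_event(raw_event, secret_key)
```

Because the hash is keyed, someone who obtains only the stored events cannot simply hash a list of known email addresses to re-identify users, and the fields that were never stored cannot be cross-referenced with another database at all.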
Developers and entrepreneurs are rushing to incorporate everything from smart speaker technology to arrays of wireless IoT devices into their products. As a developer, you may not see data security and data use as your primary concern, but ignoring these issues is not a good idea.