From recognizing speech to identifying unusual stars, new discoveries often begin with comparison of data streams to find connections and spot outliers. But simply feeding raw data into a data-analysis algorithm is unlikely to produce meaningful results, say the authors of a new Cornell study.
That’s because most data comparison algorithms today have one major weakness: somewhere, they rely on a human expert to specify what aspects of the data are relevant for comparison, and what aspects aren’t.
But these experts can’t keep up with the growing amounts and complexities of big data.
So the Cornell computing researchers have come up with a new principle they call “data smashing” for estimating the similarities between streams of arbitrary data without human intervention, and even without access to the data sources.
Tag Archives | Big Data
Could you see any stalwart of the mainstream media in America using the medium of an online comic to address the tensions that so-called Big Data present? Upstart Al Jazeera America commissioned cartoonist Josh Neufeld and reporter Michael Keller to create a graphic novella that you can read here online; you’ll also find download links for iBooks, ePub and PDF versions. This is the first page:
Okay, so sue me. I like some pop music, especially the electronic stuff. Can’t listen to Godspeed You! Black Emperor every day. Anyway, Big Data reminds me enough of the eighties synth pop I enjoyed that I found the video on YouTube. Glad I did. It’s a great little sweet-nothing of a “viral” video encapsulating a sour pill of a poke at marketers’ attempts to engineer “viral” media to promote their projects (Yeah! Headbash your way through life! Hashtag it!); that the video utilizes all of the same techniques through this narrative frame is, I’m sure, an intentional irony. Meta, meta, meta.
Via the Guardian, Tom Chatfield on gazing at reality through computers’ data-crunching models:
When Facebook asks me what I “like”, it’s making the convenient assumption that I feel one of two ways about everything in the world – indifferent or affectionate. When it aggregates the results of mine and a billion other responses, marvellous insights emerge. But these remain based on a model of preference that might kindly be called moronic.
Similarly, every measurement embodies a series of choices: what to include, what to exclude. If a computer could learn to recognise images of cats with absolute accuracy, would that mean it knew what a cat was? Not unless you redefined cats as silent, immobile, odourless sequences of information describing two-dimensional images. If a computer could learn to identify you with absolute accuracy via surreptitiously scraped data from your social media presence, phone calls and banking activities, would that mean it knew what it means to be you?
The move will allow insurers to more efficiently serve the public and pharmaceutical companies to better target their life-saving new drugs…because surely those are the only reasons why those industries would pay to access vast troves of personal medical data. The Guardian reports:
Drug and insurance companies will from later this year be able to buy information on patients – including mental health conditions and diseases such as cancer, as well as smoking and drinking habits – once a single English database of medical data covering the entire population (harvested from GP and hospital records) has been created.
Privacy experts warn there will be no way for the public to work out who has their medical records or to what use their data will be put. The extracted information will contain NHS numbers, date of birth, postcode, ethnicity and gender.
Once live, organisations such as universities – but also insurers and drug companies – will be able to apply to gain access to the database, called care.data.
Will democracy give way to algocracy? Via the Institute for Ethics and Emerging Technologies, John Danaher writes:
In brief, modern technology has made it possible for pretty much all of our movements, particularly those we make “online”, to be monitored, tracked, processed, and leveraged. We can do some of this leveraging ourselves, by tracking our behavior to improve our diets, increase our productivity and so forth. But, of course, governments and corporations can also take advantage of these data-tracking and processing technologies.
Data-mining [could create] a system of algorithmic regulation, one in which our decisions are “nudged” in particular directions by powerful data-processing algorithms. This is worrisome because the rational basis of these algorithms will not be transparent:
Thanks to smartphones or Google Glass, we can now be pinged whenever we are about to do something stupid, unhealthy or unsound. We wouldn’t necessarily need to know why the action would be wrong: the system’s algorithms do the moral calculus on their own.
The Next Web writes that you will soon be “empowered” by having every mundane aspect of your life mined for data:
Are you only as good as the company you keep? Before you accept that next friend request, consider what that person says about you, what that association might eventually cost, or be worth – even in the financial sense.
Where you live, who you friend on Facebook, the frequency you shop at Trader Joe’s, how much you spend – all of this information will be picked up, shared, and analyzed amongst the various connected devices and services you use.
This wealth of data will also be applicable to your financial decisions. “Who you are” as a consumer will no longer be based solely on your purchases, investments or credit file, but will also consider your daily routines, such as browsing the Internet, where you shop, and more.
Technology and new services are now making it possible to incorporate entirely new, more relevant data into a credit profile — data that is mostly consumer controlled or contributed and generated by simply gathering and delivering your lifestyle data.
For all you Monsanto watchers, here’s where the corporation we all love to hate is looking to expand its reach, via Salon via AlterNet:
Imagine cows fed and milked entirely by robots. Or tomatoes that send an e-mail when they need more water. Or a farm where all the decisions about where to plant seeds, spray fertilizer and steer tractors are made by software on servers on the other side of the sea.
This is what more and more of our agriculture may come to look like in the years ahead, as farming meets Big Data. There’s no shortage of farmers and industry gurus who think this kind of “smart” farming could bring many benefits. Pushing these tools onto fields, the idea goes, will boost our ability to control this fiendishly unpredictable activity and help farmers increase yields even while using fewer resources.
The big question is who exactly will end up owning all this data, and who gets to determine how it is used.
Honestly, I think what will most spur members of Congress to action on this issue is that databases of Americans with erectile dysfunction are among those being sold. Forbes writes:
In a congressional hearing this week, Pam Dixon, executive director of the World Privacy Forum, revealed disturbing lists that she has found for sale from data brokers you’ve likely never heard of, including a “Rape Sufferers List” from a company called MEDbase 200, which sells lists about the medical industry.
The list, which was taken down yesterday after an inquiry from the Wall Street Journal, is still cached, as are some other disturbing lists such as “erectile dysfunction sufferers,” “alcoholism sufferers” and “AIDS/HIV sufferers.” All the lists promised 1,000 names for the low price of $79:
“Select from families affected by over 500 different ailments, or who are consumers of over 200 different Rx medications. Lists can be further selected on the basis of lifestyle, ethnicity, geo, gender, and much more.”
A snippet from a Foreign Policy piece on NSA chief Keith Alexander reveals the logic at play in our surveillance state:
“He said at one point that a lot of things aren’t clearly legal, but that doesn’t make them illegal,” says a former military intelligence officer who served under Alexander at INSCOM.
When he ran INSCOM, Alexander was fond of building charts that showed how a suspected terrorist was connected to a much broader network of people via his communications or the contacts in his phone or email account.
“He had all these diagrams showing how this guy was connected to that guy and to that guy,” says a former NSA official who heard Alexander give briefings on the floor of the Information Dominance Center. “Some of my colleagues and I were skeptical. Later, we had a chance to review the information. It turns out that all [that] those guys were connected to were pizza shops.”