Tag Archives | Data Mining

Anomalies, Prisons, and Geophysics: How Governments Use Data and How to Stop Them

via chycho

A common definition of an anomaly is “a deviation from the common rule, type, arrangement, or form.” This definition, however, can be simplified by stating that an anomaly is a deviation from specific parameters. The defining characteristic of an anomaly is that it can only exist in a comparative setting, implying that it can only be detected within a certain data set. Once a data set is obtained, parameters can be specified to filter out so-called anomalies for evaluation. Depending on the type of data collected, these parameters can be specified to be anything occurring in any combination. If there is no data set, then there are no anomalies.
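As a rough illustration of that idea, here is a minimal Python sketch. The names `Bounds` and `find_anomalies`, the sample readings, and the choice of a simple numeric range as the "parameters" are all assumptions made for the example; nothing here comes from the original article. It only shows that what counts as an anomaly depends entirely on the data set and on the parameters chosen.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Hypothetical 'parameters': an accepted numeric range."""
    low: float
    high: float


def find_anomalies(values, bounds):
    """Return the values that fall outside the specified parameters."""
    return [v for v in values if not (bounds.low <= v <= bounds.high)]


# Made-up sample data, purely for illustration.
readings = [9.8, 10.1, 10.0, 14.7, 9.9, 3.2]

# Narrow parameters flag two readings as anomalies.
print(find_anomalies(readings, Bounds(low=9.0, high=11.0)))

# Wider parameters flag nothing in the same data set.
print(find_anomalies(readings, Bounds(low=0.0, high=20.0)))

# With no data set, there is nothing to compare, so no anomalies.
print(find_anomalies([], Bounds(low=9.0, high=11.0)))
```

The same readings produce different anomalies under different bounds, and an empty data set never produces any.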

A prison can be defined as “a place of seeming confinement.” It is a place to incarcerate people who have lawfully or unlawfully stepped outside the parameters set in their society. This implies that inmates are anomalies within a community.…

Predicting Trends Beforehand

In this age of constant advertising and brand placement, trending topics on Twitter have become a great free way for advertisers to get their message in front of more potential customers. The only problem has been that no one could predict what would become a trending topic, at least until now. A professor at M.I.T., in conjunction with one of his students, has developed an algorithm that they claim is 95% accurate in predicting trending topics, sometimes as much as four to five hours before they trend.

Picture: Unmadindu (CC)

Via M.I.T.

At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student, Stanislav Nikolov, will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list — and sometimes as much as four or five hours before.

Datamining Hip-Hop’s History

Duncan Geere reports for Wired:

An artist named Tahir Hemphill wants to datamine 30 years of hip-hop lyrics to provide a searchable index of the genre’s lexicon.

The project analyzes the lyrics of over 40,000 songs for metaphors, similes, cultural references, phrases, memes and socio-political ideas. For each, it registers a date and a geographical location. Hemphill has raised more than $8,000 in funding for the project on Kickstarter, from 349 people.

The idea is so that important questions can be answered, like who was the first to mention “haters,”…
