Linguistics Identifies Anonymous Users

Chaos pattern in 1DCVCN left-influential rule=147 gI = 0.14

via SCMagazine

Being anonymous online may become more challenging for those who wish to stay unknown. A new data mining technique is being developed to reveal people's identities from their writing style.

Imagine social networks that require real names being used as a baseline for delving into the deep, dark alleys of the internet. It appears there may be ways to add white noise to a writing style, though, if one is that concerned about being revealed.

Up to 80 percent of certain anonymous underground forum users can be identified using linguistics, researchers say. The techniques compare user posts to track them across forums and could even unveil authors of thesis papers or blogs who had taken to underground networks.

“If our dataset contains 100 users we can at least identify 80 of them,” researcher Sadia Afroz told an audience at the 29C3 Chaos Communication Congress in Germany.

“Function words are very specific to the writer. Even if you are writing a thesis, you’ll probably use the same function words in chat messages.”

“Even if your text is not clean, your writing style can give you away.”

The analysis techniques could also reveal botnet owners and malware tool authors, and provide insight into the size and scope of underground markets, making the research appealing to law enforcement.
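To make the function-word idea concrete, here is a minimal sketch of how a writing sample might be matched to a known author. It is not the researchers' method or the Jstylo API: the word list, the function names (`profile`, `cosine`, `best_match`) and the choice of cosine similarity are all assumptions made for illustration, and real stylometric tools use far larger feature sets.

```python
from collections import Counter
import math
import re

# Tiny illustrative list (an assumption); real tools use hundreds of function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "but", "with", "for", "on", "as", "if"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(unknown_text, known_texts):
    """Return the known author whose function-word profile is closest."""
    target = profile(unknown_text)
    return max(known_texts,
               key=lambda author: cosine(target, profile(known_texts[author])))
```

Given a dictionary mapping author names to text they are known to have written, `best_match` returns the name whose profile is closest to the anonymous sample. With samples this short the result means very little, which is why the researchers needed thousands of words per author.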

To achieve their results, the researchers used techniques including stylometric analysis, the authorship attribution framework Jstylo, and Latent Dirichlet allocation, which can distinguish a conversation on stolen credit cards from one on exploit-writing, and similarly help identify interesting people.

The analysis was applied across millions of posts from tens of thousands of users of a series of multilingual underground websites.

It found up to 300 distinct discussion topics in the forums, with some of the most popular being carding, encryption services, password cracking and blackhat search engine optimisation tools.

While successful, the work faces a series of challenges. Analysis could only be performed on users with a minimum of 5000 words of text (this research used the “gold standard” of 6500 words), which culled the list of potential targets from tens of thousands to mere hundreds.
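The culling step described above amounts to a simple filter on how much text each user has written. The sketch below is an assumption about how such a filter could look, not the researchers' code; `eligible_users` and the shape of `posts_by_user` (a dictionary mapping user names to lists of post texts) are hypothetical, and words are counted by naive whitespace splitting.

```python
MIN_WORDS = 5000  # minimum cited in the article; the study itself used 6500


def eligible_users(posts_by_user, min_words=MIN_WORDS):
    """Keep only users whose combined posts reach min_words words."""
    return {
        user: posts
        for user, posts in posts_by_user.items()
        if sum(len(post.split()) for post in posts) >= min_words
    }
```

Calling `eligible_users(posts_by_user, min_words=6500)` would apply the stricter threshold the researchers used.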



  • alizardx

    Once this tech is known to be in actual deployment, ways to game it can be found to either improve anonymity or persuade analytic systems that the user’s voice is somebody else’s – apps for mobile and Linux/OSX/pre-8 Windows desktops.

    • Calypso_1

      The link under Jstylo references countermeasures.

      These things always remind me of the Fremen in Dune, walking the sand w/out rhythm, imitating the shifting of sand & wind to avoid the sandworms.

      The survival dance of crypsis between cycles of predation & prey is one of the most magical displays of nodal points in the environment. These mechanisms portend some far greater permutation of reality than science yet frames within the knowable schema.

  • BuzzCoastin

    > Up to 80 percent of certain anonymous underground forum users can be identified using linguistics

    it’s not clairvoyant
    it can only identify a style of expression
    not the persona’s identity
    that’s what IP addresses, snitches & smart phones are for

    • echar

      Excellent point Buzz! They don’t like to have to work, and don’t mind putting the squeeze on someone.

  • Sir Legendhead

    I’d like to see this concept used in a work of fiction. Just imagine a criminal mastermind like Jigsaw, using this technique against the authorities by subtly shifting his own style.

  • Jason LeClair

    Linguistics got the Unabomber caught. His manifesto and letters/papers had the same very statistically uncommon errors. I guess the lesson is to make your posts on the bomb making site you hang out on one-syllable mini posts. You should be able to be in the 20 percent they can’t identify if you get your friends to proofread that ransom email.

    • Aram Jahn

      Kaczynski was caught because his brother’s wife noted the style in the Wa.Post/NYT matched the letters Ted sent his brother from Montana. No literary forensics cracked that case.

  • DeepCough

    It’s time to go back to the old-fashioned way of declaring revolutionary or anarchistic intent: graffiti.

    • Hadrian999

      I await the return of the pamphlet

      • echar

        One could cut and paste to create meme-bombs, much like the cut and paste from magazines of way back when.

  • Hadrian999

    If they really want to know my opinion on MMA, video games, television, and pornography, go ahead, government, spend money tracking

    • echar

      Meanwhile the uber rich are snorting cocaine off the bare buttocks of video game playing, MMA watching, porn stars.

      • Hadrian999

        you’ve had that dream too huh?

  • alizardx

    “Aieee! A thesaurus!”

    • echar

      Perhaps multiple thesauruses to be extra crafty?