Web services for predictive text analytics

April 11, 2016

Textgain's security web services

Textgain is developing new web services for very specific problems, including identifying hate and depression on social media. Recently, we have made notable progress in identifying hate speech, specifically Islamic State (IS/ISIS/ISIL/Daesh) tweets.
In a February 2016 announcement, Twitter spoke out against the use of its microblogging platform to promote terrorism. The company reports having suspended over 125,000 profiles for threatening or promoting terrorist acts, primarily related to Islamic State, using manual review and proprietary anti-spam technology.

Twitter remarks: ‘As many experts and other companies have noted, there is no “magic algorithm” for identifying terrorist content on the internet, so global online platforms are forced to make challenging judgement calls based on very limited information and guidance.’

Twitter's mission is challenging. For every subversive profile suspended, a new profile appears. Profiles that have not yet been suspended then broadcast the existence of the new profile, and so on, in an endless cat-and-mouse game.

In response to several terrorist attacks over the past year, Textgain has developed a proof-of-concept that automatically identifies hate speech. We have collected a large number of subversive tweets that promote hate and terrorism. In parallel, we have collected non-incendiary tweets from journalists, experts, religious leaders, Muslimas, etc., that report on the same topic. Using machine learning and text analytics techniques, we then compared both datasets to identify which features (words, word combinations, ...) correlate strongly with hate speech. For example, out-of-place words such as kafir (infidel, كافر) and dawla (roughly: temporary state, دولة) combined with words such as dog or rage may raise an alarm.
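To make the idea of comparing two datasets concrete, here is a minimal sketch in Python. It is not Textgain's actual method, which is not described in detail here. It assumes a simple technique, a smoothed log ratio of relative word frequencies, and uses a handful of invented placeholder sentences in place of real tweets. The function name log_ratio_scores and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would need far more care.
    return text.lower().split()


def log_ratio_scores(flagged, neutral, smoothing=1.0):
    """Score each word by how much more frequent it is in `flagged`
    than in `neutral`, using add-one style smoothing (an assumption)."""
    flagged_counts = Counter(w for t in flagged for w in tokenize(t))
    neutral_counts = Counter(w for t in neutral for w in tokenize(t))
    vocab = set(flagged_counts) | set(neutral_counts)
    flagged_total = sum(flagged_counts.values()) + smoothing * len(vocab)
    neutral_total = sum(neutral_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_flagged = (flagged_counts[word] + smoothing) / flagged_total
        p_neutral = (neutral_counts[word] + smoothing) / neutral_total
        # Positive score: word leans towards the flagged set.
        scores[word] = math.log(p_flagged / p_neutral)
    return scores


# Invented placeholder sentences, not real data.
flagged = [
    "the kafir dog will feel our rage",
    "dawla rage against the kafir",
]
neutral = [
    "experts discuss the attack in the news",
    "community leaders condemn the attack",
]

scores = log_ratio_scores(flagged, neutral)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{word:10s} {score:+.2f}")
```

Words that appear only in the flagged set (kafir, rage) receive positive scores, words that appear only in the neutral set (attack) receive negative scores, and a word common to both (the) ends up near zero. A production system would combine many such features in a trained classifier rather than rely on single words.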

Instead of using a fixed notion of what hate speech looks like, our machine fits itself to the available data as the rhetoric evolves. It is over 80% accurate in lab conditions. However, in the real world, automatically identifying one inflammatory tweet among a million others is very difficult. We must use such tools with caution, since they produce false positives and false negatives alike. But Textgain believes that our technology can be valuable in supporting manual review of subversive text.
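The gap between lab accuracy and real-world usefulness can be illustrated with a little arithmetic. The sketch below uses assumed numbers, not measured ones: it reads "80% accurate" as 80% sensitivity and 80% specificity, and takes "one in a million" literally as the share of inflammatory tweets. Neither figure is a reported property of our system in the field.

```python
def precision(sensitivity, specificity, prevalence):
    """Share of flagged items that are truly positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative assumptions only.
p = precision(sensitivity=0.8, specificity=0.8, prevalence=1e-6)
print(f"Share of flagged tweets that are truly inflammatory: {p:.2e}")
```

Under these assumptions, only about 4 in every million flagged tweets would be truly inflammatory, because the false positives from the huge pool of harmless tweets swamp the few true positives. This is why such a tool is better suited to prioritizing content for human reviewers than to acting on its own.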

We will discuss and freely share our technology with platforms under distress such as Twitter and with known security agencies. Ask us about it!

Figure: Network visualization of (some) profiles involved in hate speech.

 