Guest Post: A Plan For Abusiveness

The following is a guest post by Jeff Revesz and Elena Haliczer, co-founders of Adaptive Semantics. Adaptive Semantics specializes in sentiment analysis, in particular using machine learning to help automate comment moderation. They’ve been quite successful at the Huffington Post, which is also an investor. Intrigued by their approach, I reached out to them to solicit this post. I encourage you to respond publicly in the comment thread, or to contact them personally (first name @

Seven years ago, Paul Graham famously stated:

“I think it’s possible to stop spam, and that content-based filters are the way to do it.”

Well, seven years of innovation and research have brought about some great advances in the field of text classification, so perhaps it’s time to raise the stakes a little. In short, we think it’s possible to stop abusiveness in user-generated content, and that content-based filters are the way to do it.

The Problem with UGC

Publishers these days are in a tight spot with user-generated content (UGC). The promise of UGC in terms of engagement and overall stickiness is hard to pass up, but along with the benefits come some headaches as well. Comment spam is less of an issue than it once was, thanks to services such as Akismet, but the problem of trolling and outright abuse is as bad as it ever was. Any publisher venturing into UGC is stuck with the question of how to keep comments in line with their editorial standards while at the same time avoiding accusations of censorship. The solution employed thus far has mainly been a combination of keyword filters and human moderators. Unfortunately for publishers, there are serious problems with both of those, so let’s look at that more closely.

The main problem with human moderators is the cost involved. They’re expensive, hard to outsource, and they don’t scale. The average human has a maximum capacity of about 250 comments per hour, which is a generous estimate. At minimum wage this works out to about $0.03 per comment, which seems reasonable until you consider that a typical online publisher like the Huffington Post receives about 2 million comments per month site-wide. Add in overhead costs like hiring, training, auditing, etc and it quickly starts to get out of control. On top of this is the issue of moderator bias. Is it possible that your Democratic moderator is simply deleting every post that disagrees with President Obama, regardless of content?

To mitigate the costs involved, many publishers add in a layer of non-human filtering, such as a keyword list. While this may seem like a good idea at first, all it really does is offer you the worst of both worlds. Now you have an expensive, non-scalable solution that also gives bad results. Keyword lists can be easily beaten by the simplest obfuscation, such breaking up bad words or simply replacing a letter with a symbol. In addition, it is impossible for keyword filters to catch anything but the crudest type of abusiveness. A great example is the recent Facebook poll “Should Obama Be Killed?” which would likely pass right through a keyword filter but is quite obviously abusive content.

The Solution:  Sentiment Classifiers

The idea of using a machine-learning classifier to identify text-based semantics is not a new one. Vladimir Vapnik introduced the original theory of support vector machines (SVMs) in 1995, and in 1998 Thorsten Joachims argued that the algorithm was perfectly suited for textual data. Finally, in 2002 Lillian Lee and colleagues showed that not only are SVMs well suited for identifying sentiment, but they also dominate keyword-based filters consistently. When applied to the problem of comment moderation, SVMs can mimic human moderation decisions with an accuracy of about 85%. That raises the question, is 85% good enough? How can we push the accuracy higher?

We have some proprietary answers to that question over at Adaptive Semantics, but a less controversial one arises from a well-documented property of classifier output known as the hyperplane distance (labeled vk in the diagram below).


If the separating hyperplane can be viewed as the dividing line between abusive and non-abusive content, the hyperplane distance of any individual test comment can be interpreted as the classifier’s confidence in its own answer. If a test comment turns out to be very far from the dividing line, we can say that it lies deeply in “abusive space” (or in “non-abusive space” depending on the polarity). Now let’s imagine an SVM pre-filter that only makes auto-publish or auto-delete decisions on comments that have a large hyperplane distance, and sends all other comments to the human moderator staff. Such a classifier would have a guaranteed accuracy above 85%, and would progressively reduce the reliance on human moderators as it is re-trained over time. Even a conservatively tuned model can reduce the human moderation load by about 50% while keeping comment quality roughly the same. That’s a pretty good start.

In addition to high accuracy, a content-based classifier does not have the same limitation of a keyword filter in terms of vocabulary. Since the classifier is trained by feeding it thousands of real-world examples, it will learn to identify all of the typical types of obfuscation such as broken words, netspeak, slang, euphemisms, etc. And since the entire content of the comment is used as an input, the classifier will implicitly take account of context. So the comment “Should Obama Be Killed?” would likely be flagged for deletion, but a comment like “A defeat on healthcare may kill Obama’s chances at re-election.” would be left alone.

So is the abusiveness problem licked? Not quite yet, but the use of linear classifiers would be a huge step in the right direction. You could imagine further advances such as aggregating comment scores by user to quickly identify trolls, and maybe even using those scores as input for another classifier. Or how about training more classifiers to identify quality submissions and pick out the experts in your community? The possibilities are definitely exciting, and they raise another question: why are publishers not using these techniques? That one we don’t have a good answer for, so we founded a company in response.

By Daniel Tunkelang

High-Class Consultant.

7 replies on “Guest Post: A Plan For Abusiveness”

This is a very cool idea – does your system take the word order into account? For example “We should kill the health bill because I don’t like Obama” is not a very nice thing to say, but one can shuffle those words around to form a comment which is absolutely toxic (exercise left to the reader). Would these two comments be treated identically?


“does your system take the word order into account?”

There are various methods for creating the actual document vectors from your original comment text. Some methods account for word order and some do not. Currently we are using unstemmed unigrams and bigrams, so “Obama” “kill” and “bill” would show up as features, but also “we should” “should kill” “kill the” etc. So if you rearranged the words to form a violent threat, then the bigram “kill Obama” would definitely be recognized as its own feature, and that would contribute information to the final score.


Comments are closed.