https://data.blog.gov.uk/2012/06/14/on-the-issue-of-spam-in-data-gov-uk/

On the issue of spam in data.gov.uk

As many of our users will have noticed, there has recently been an increase in the amount of spam posted as comments or forum topics across the site. We are well on top of the situation and are working not only to stop it from occurring again but also to clean up those entries. As many will know, this is not an uncommon problem for sites where commenting is allowed.

Although we have reCAPTCHA in place, which the logs show has stopped a fair amount of potential spam, it can only stop spam generated by software, not by people. After a long analysis of the spam on our site, we have a strong feeling that human intervention is also at play. This, of course, is not new. Spam companies are increasingly hiring cheap labour: the spambot captures the CAPTCHA image and sends it to 'workers' who provide the correct interpretation, opening the way for the spambot to post freely. In many instances there is not even an automated bot, just sheer manual labour, which can put out an amazing amount of content on forums and comment sections. Combine the two and you have a monumental situation.

It would be impossible to moderate every single post to a site (ours or any other site with a decent volume) without dedicating several full-time staff just to sift through them, so this presents a logistical problem (let alone the inconvenience caused to you, the users). It is well known that reCAPTCHA, one of the most popular CAPTCHA implementations and owned by Google, has been cracked in the past; until Google patched it, this allowed an unprecedented wave of spam postings in forums worldwide. We don't know if new ways of bypassing it have been found, but surely it is a matter of time. Given the combination of human intervention and spambots, the effort and attention required simply to keep our forums and messages clean and free from spam is considerable in terms of manpower, and we need to find a sustainable balance.

We are taking a look at some extra measures, such as using services like Akismet or Mollom and increasing our moderation time. Of course, key questions will have to be answered internally. If automated services do not manage to curtail the spam to a level that is manageable for us to moderate, what is the cost of staffing a full-time team just to moderate? And will keeping the forums and comments free from spam end up costing as much as it costs to run the site and develop it further?

For now, we are temporarily freezing posting on the site so we can clean up. Some legitimate posts may suffer collateral damage in the process; we aim to bring them back from archives, but we may not be able to do so for all of them. We also know that users will understand that losing a couple of historical conversations may be a small price to pay to bring the comments and forums back to a nice clean state, and that they will support us through this and come back to restart any lost conversation with further gusto, especially as the new site, to be launched very, very soon, will hopefully provide plenty of satisfaction and give our family of users much to talk about.

This issue only inconveniences those wishing to comment or to participate in the forums. It has no bearing on the data catalogue or on the integrity of the information held within it.

I will keep you updated as we work on solutions on the spam front. In the meantime, I can only say that, although this is a common thing to happen to sites nowadays, I am sorry for any inconvenience it may have caused to your use of the site.

PS: please DO report any post you think is spam; it will make it easier for us to exterminate it. Also note that I have disabled comments on this post for obvious reasons.
