📅 Posted 2018-12-20
Recently, I wrote an article about Analytics as Code, which was my way of attempting to apply “Infrastructure-as-code” to something new. A work colleague alerted me to the fact that when he Googled the term, someone else had already written on the topic! In fact, a few people have. But one site, which ranked first, really stood out.
It wasn’t my website, but it was my article!
The website in question is Saowen.com. A quick check around and I discovered that Saowen.com is some sort of blog aggregation thing - except rather than simply aggregating the titles and summaries of blogs like a feed site, it takes ENTIRE blogs: images and all.
After a bit of poking around, I found that 5 of my articles had been lifted and republished in full on Saowen.com, complete with a multitude of ads, trackers and comment modules. Disgusting! Saowen.com appears to have repurposed over 6 million blogs. I also found that 4 of another colleague’s blogs had been hoovered up by the same site.
I had 2 plans of attack:
- Raise a copyright “DMCA” claim with Google
- Send a cease and desist notice to the website in question
For larger organisations this would be submitted through a legal team (or at least with proper legal representation) and there are many warnings as you go through the process with Google about making false claims and so forth. But I forged ahead myself because I was pretty confident those words and images were mine! The form I used was here
My argument had 3 key points:
- To prove I own nickmchardy.com
- To prove the content is owned by nickmchardy.com
- The website saowen.com admits that the author is nickmchardy.com
1. To prove I own nickmchardy.com
This step is probably optional for most, but as my email address doesn’t match my domain name, I figured this was a good place to start. Google actually responded pretty quickly to my request, asking for proof of website ownership. So I invented my own way to prove this: by adding a TXT record to the domain with the case number in it. Simple and effective. This was accepted by the Google support team.
DNS TXT Record for nickmchardy.com
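For anyone wanting to try the same trick, here is a minimal sketch of what such a record could look like in zone-file form. The `dmca-case=` label, the TTL and the case number are all invented for illustration; the exact wording of the real record isn't reproduced here, and your DNS provider may want the value entered through a web form instead.

```python
def build_txt_record(domain: str, case_number: str, ttl: int = 3600) -> str:
    """Return a zone-file style TXT record that carries a support case number."""
    # Assumption: the "dmca-case=" label is made up for this example.
    value = f"dmca-case={case_number}"
    # A single TXT character-string can hold at most 255 bytes.
    if len(value.encode("utf-8")) > 255:
        raise ValueError("TXT value is too long for a single string")
    return f'{domain}. {ttl} IN TXT "{value}"'


# Hypothetical case number, purely for demonstration.
print(build_txt_record("nickmchardy.com", "0-0000000000000"))
```

Once the record has propagated, anyone (including a support team) can look it up with an ordinary DNS query, which is what makes it a reasonable proof of control over the domain.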
2. To prove the content is owned by nickmchardy.com
It was amusing to try and work out how to prove that I actually wrote the articles. Sure, I could easily say I wrote them, but who would believe me?
There are 5 articles in question and I decided to go through each article individually to prove it, but for the sake of time I’ll just talk about this one:
Analytics as Code
- This article contains the text “nickmchardy.com” near the header
- This article hot links images hosted at nickmchardy.com
- The body copy of the article is a complete copy of https://nickmchardy.com/2018/10/analytics-as-code.html
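The "complete copy" claim can also be checked mechanically rather than by eye. Below is a rough sketch using Python's standard `difflib`, assuming the body text of both pages has already been extracted into plain strings (the function names here are my own, not part of any tool used in the original claim).

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case the text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def similarity(original: str, suspect: str) -> float:
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    matcher = SequenceMatcher(
        None, normalise(original), normalise(suspect), autojunk=False
    )
    return matcher.ratio()


# Invented sample strings; real input would be the two article bodies.
score = similarity("Analytics as  Code\nis a thing", "analytics as code is a thing")
print(f"similarity: {score:.2f}")
```

A ratio very close to 1.0 across a long article is hard to explain as coincidence, though it is only supporting evidence and not proof of who wrote the text first.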
Some of the markup contained things like:
<meta name="og:image" content="https://nickmchardy.com/images/the-humble-minidisc/md-2.jpg" />
<meta content="https://nickmchardy.com/images/the-humble-minidisc/md-2.jpg" property="og:image:secure_url" />
<meta name="og:image" content="https://nickmchardy.com/images/koi-pond-architecture.png" />
<meta content="https://nickmchardy.com/images/koi-pond-architecture.png" property="og:image:secure_url" />
<span class="author vcard">
  <a class="url" href="/source/site/nickmchardy_com">
    <strong class="fn" itemprop="author">nickmchardy.com</strong></a></span>
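Spotting hot-linked images like these is easy to automate. The following is a small sketch using only Python's standard library; it assumes you have already saved the copied page's HTML into a string, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OgImageCollector(HTMLParser):
    """Collect the content of <meta> tags whose name/property starts with og:image."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("name") or attr.get("property") or ""
        if key.startswith("og:image") and attr.get("content"):
            self.image_urls.append(attr["content"])


def hotlinked_from(html: str, domain: str) -> list:
    """Return the unique og:image URLs in `html` that are hosted on `domain`."""
    collector = OgImageCollector()
    collector.feed(html)
    matches = [u for u in collector.image_urls if urlparse(u).hostname == domain]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(matches))
```

Calling `hotlinked_from(page_html, "nickmchardy.com")` on the copied page would list every image still being served from my own domain, which is handy evidence to paste into a claim.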
I provided Google my evidence using annotated screenshots of page source code:
The remainder of the articles contain pretty similar evidence that they were blatantly copied from me and I’m quite sure it was done using a script:
https://hk.saowen.com/a/7f7e94200da506b71d6dff2cff674000c93bca0deffef799a10be082f2240892
https://hk.saowen.com/a/c7370e660a4312532b197574b4f072efbb4813fa01554ea35c672e455256c37b
https://hk.saowen.com/a/e058fefebfa8577ebe581b893da2e21f6374ee949b7b5226435e40f4eb128356
https://hk.saowen.com/a/ee7c7f2eff7c10147ae51fc4c4b39ace749dffd5830e7398d6314b808aa21b4c
3. The website saowen.com admits that the author is nickmchardy.com
This is the most amusing part! Saowen.com totally admits I wrote all 5 articles with the following markup found in the source:
<strong class="fn" itemprop="author">nickmchardy.com</strong>
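If you have a handful of copied pages to go through, pulling out that author attribution can be scripted. This is a minimal sketch with Python's built-in `html.parser`, assuming the page source is available as a string; the class and helper names are my own invention, and it only handles simple, well-nested markup like the snippet above.

```python
from html.parser import HTMLParser


class AuthorExtractor(HTMLParser):
    """Collect the text inside any element marked itemprop="author"."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while inside an itemprop="author" element
        self._buffer = []
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested tag inside the author element.
            # Limitation: unclosed void tags such as <br> would confuse this count.
            self._depth += 1
        elif dict(attrs).get("itemprop") == "author":
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.authors.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_authors(html: str) -> list:
    """Return the author strings found in `html`, in document order."""
    parser = AuthorExtractor()
    parser.feed(html)
    return parser.authors


snippet = '<strong class="fn" itemprop="author">nickmchardy.com</strong>'
print(extract_authors(snippet))
```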
Google accepted my request, and the index and search results now reflect this. The whole process took about 3 days from first lodging my copyright claim, which seems like a reasonable timeframe to me.
DMCA notice when searching for “Analytics as Code”
It’s funny being on the other side of the fence for once: normally I would be seeing these notices on Google search results for other people’s content when searching for a TV show or movie. The complaint is also publicly listed in the Lumen Database if you want to have a read.
I know that this is a fairly impractical process to follow: the URLs could easily be changed at Saowen.com, more articles could be republished and it’s completely out of my control. Continuing to log more DMCA requests just wastes everyone’s time.
So why bother?
I have doubts about how often people search for tech things and come across my blog (I know it’s not that popular), but I wanted to test out this process just to see what it’s like. I was also pretty disappointed that Saowen.com ranks higher than my site for content which is clearly mine, so my domain must not have as much ‘weight’ in the eyes of Google.
In writing this article, I’m hoping that the automated scripts at Saowen crawl the article and publish it for maximum irony. A bit like this.
So far, there hasn’t been a response to my Cease and Desist request, but I’ll keep waiting. Failing that, I could lodge an abuse complaint with the hosting provider.
Fingers crossed I wrangle more Google search referral traffic now!