My site content got copied

...and I am thinking about changing my CC-BY-NC-SA license to a more restrictive one.

I just noticed that a blog blatantly copies my posts without attribution, violating even a (relatively) permissive CC-BY-NC-SA license. Copying online content without proper attribution has become common recently, but I didn't expect it to happen to my blog in such a regular, almost matter-of-fact way.

I found most of my recent posts copied word for word, even with the formatting preserved (one or two are missing, which points to manual work). The only "attribution" I found is the text at the bottom of the post, "via Zero Brane http://notebook.kulchenko.com/zerobrane/live-coding-in-lua-bret-victor-style", which is not even a link back to my original post. One of the posts even includes changes that I made only last night.

Let's see which provisions of the license covering this content are violated (I'm not a lawyer):

  • keep intact any copyright notices for the work: None are provided. All my posts are marked with "Some rights reserved © 2004-12 Paul Kulchenko", but the copyright notice is not included in any of the posts.
  • provide...the name of the Original Author (or pseudonym, if applicable) if supplied: None is provided.
  • provide ... the title of the Work if supplied: The title of the work is used, but it is not referenced in any way.
  • provide ... the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work: The URI is provided as plain text; I'm not sure whether the lack of a link violates the license.
  • keep intact all notices that refer to this License: None are present.
  • may not exercise any of the rights granted to You in Section 3 above in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation: If there were ads on the pages, I'd say it would be a clear violation of the license. But what if a site gains a better placement in search engines by using someone else's content under a non-commercial license and then uses that placement for monetary gain? What if my original content gets penalized by search engines because it now looks like a scraper site?

This is how the post looks on the copied page (http://sunxiunan.blogspot.com/2012/06/live-coding-in-lua-with-zerobrane.html):

And this is the bottom part with "attribution":

So, why do I want to go with a more restrictive license? Let's say the copy includes a proper notice along the lines of:

This is a copy of the original work by Paul Kulchenko licensed under CC-BY-NC-SA; Some rights reserved © 2004-12 Paul Kulchenko.

Let's also say that the notice is at the bottom of the page in a 3-point font. This doesn't seem to violate the license at all, as the placement of the attribution credit is left to the licensee, but it violates the spirit of my decision to share under the CC-BY-NC-SA license.

I'm all for sharing, and all my publicly released software is licensed under the permissive MIT and Artistic licenses, but I don't like the idea of someone simply copying my blog posts instead of (my idealistic spirit talking) using them to add something of their own and make it better for everyone.


6 Comments

Do you think changing the license of your content is going to stop these people from ripping off your content again in the future? They've already shown that they do not have regard for your licensing.

It's the rise of automatic blogs (autoblogging).

There are potentially methods to prevent a copier from conveniently scraping your blog for content. That may be the most effective response as opposed to relying on policy or license based approaches.

@Brian, I agree, this is not likely. My main reason for the change was that I realized that even with proper attribution I don't want people simply copying my content. I want them to use the content for a good purpose and add something of their own. It is something I did not give much thought to when I was picking this license in the first place (or rather, site scraping wasn't so popular at that time).

@bob, In this case it seems to be manual, rather than automatic, but I'd like to hear more about these methods you mention.

From what I've heard, some scrapers slightly modify the content they steal in order to trick Google's keyword analysis into believing it's original work.

Most scrapers seem to be moron kids who hang around "black hat" forums, trying to prove to each other that they can make money by selling ads on other people's content. Since they generally haven't got a clue, they tend to use services like Blogger/Wordpress for hosting, which means they're at the mercy of their service provider if you send a DMCA.

I suggest you find a template DMCA notice and fire it off every time one of these little parasites pops up.

@Craig, I already did and received a response this morning: "We have received your attached DMCA complaint, however we need additional information from you in order to continue investigating. For each URL you've provided, please identify the exact content that you claim infringes upon your copyright." I responded and am now waiting for their answer.

About

I am Paul Kulchenko.
I live in Kirkland, WA with my wife and three kids.
I do consulting as a software developer.
I study robotics and artificial intelligence.
I write books and open-source software.
I teach introductory computer science.
I develop a slick Lua IDE and debugger.