Copyright RDF

The little CC license that could, or when technology is all busted up

Recovered from the Wayback Machine.

Phil Ringnalda points to the new Yahoo Creative Commons search engine and notices that because the engine is relying purely on links to CC licenses to pull out content that is supposedly licensed as CC, there is going to be a lot of confusion related to what is, or is not, CC licensed.

An issue with CC has always been how to attach CC license information in such a way that automated processes could work with it. The solution has been to use RDF/XML embedded within HTML comments to indicate what is licensed on the page. However, this is kludgy and doesn’t validate within XHTML, so people are dropping it and just including the link to the specific license. Moreover, even when they do include the RDF/XML, they do so in such a way that it looks like everything in the page is under the specific license–HTML, writing, CSS, photos, whatever.

In other words, they take the rich possibilities inherent in using RDF, and dumb them down until they’re equivalent to the link.
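To make the problem concrete, here is a minimal sketch of what license detection by link alone amounts to. It is not Yahoo's actual implementation, which isn't public; the class name, function name, and sample page are hypothetical, and the only assumption about CC is that license URLs begin with `http://creativecommons.org/licenses/`.

```python
from html.parser import HTMLParser

# Assumption: CC license pages live under this URL prefix.
CC_PREFIX = "http://creativecommons.org/licenses/"


class LicenseLinkFinder(HTMLParser):
    """Collects every link to a CC license found anywhere in a page."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(CC_PREFIX):
            self.licenses.append(href)


def naive_license_guess(html):
    """Return the license links in a page, with no idea what they apply to."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    return finder.licenses


# Hypothetical page: an unlicensed essay, someone else's photo,
# and a sidebar widget that happens to link to a CC license.
page = """
<html><body>
<p>My essay, all rights reserved.</p>
<img src="someone-elses-photo.jpg" alt="a photo the page owner does not own">
<p>Sidebar: <a href="http://creativecommons.org/licenses/by-nc/2.0/">Some rights reserved</a></p>
</body></html>
"""

print(naive_license_guess(page))
```

The function finds the license link, but nothing in its result says whether the license covers the essay, the photo, the sidebar, or all three. That missing scope is exactly the information the link-only approach throws away.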

Phil then pointed out that Yahoo releasing this search that just looks for links to the license in a document, and doing so without any legal disclaimers, warnings, or asides, is about the same as somebody accidentally putting a GPL license on the next version of Windows. In other words: it’s a really dumb move:

But if I was the Yahoo! lawyer who vetted their Creative Commons search, and let it loose without any disclaimer that “Yahoo! makes no assertion about what, if any, content in these results is actually offered under a Creative Commons license” I’d be hanging my head in shame.

To make matters worse, in the associated FAQ for the new search is the following:

This search engine helps you quickly find those authors and the work they have marked as free to use with only “some rights reserved.” If you respect the rights they have reserved (which will be clearly marked, as you’ll see) then you can use the work without having to contact them and ask. In some cases, you may even find work in the public domain — that is, free for any use with “no rights reserved.”

Yup. I think this is a case for the new Corante legal weblog.

I tried the search with my weblog’s name, and found one interesting result: the bbintroducingtagback tagback in Technorati. It seems that Technorati has linked to one of the CC licenses that allows non-commercial use. But used in the way it is, it implies that all the material in the page is licensed this way. Wait a second, though: that’s my photo in the page, pulled in from Technorati via flickr. I don’t license my work as CC–it’s still too damn vague a licensing, usually applied badly (as we’re seeing now).

(Marius, what do you think about that? And this picture is still too cute for words.)

Phil calls this accidental, by-link-association form of CC licensing viral, and viral it is, indeed: through bad implementations of a vague license, I may, by allowing my photo to be copied (while holding all rights), have lost rights to that photo by implication and effect. At a minimum, the record of who holds the copyright on the photo has been lost as it filters through both the Technorati tag and the search engine results.

I’ve been in a discussion about the CC license and the issue of how to record more specific information with Mike Linksvayer (who is on the staff at CC) at Practical RDF. I brought up the issue of lack of precision in the licensing and Mike mentioned that one approach CC is looking at is to use, again, the ‘rel’ attribute as a way of marking metadata. But this can only go so far; it’s really not much more than just linking to the license and assuming this implies usage.

(And, frankly, our use of ‘rel’ is becoming a bit of a stretch–we’re trying to stuff all the meaning in the internet in one little bitty attribute.)

The approach I’m using for complex metadata (which is what CC is) in Wordform is to generate a separate RDF/XML feed that explicitly states which element is licensed, which isn’t within a page, and exactly how the licensed element can be used (among other metadata). I link to this page through a LINK element in the header, as many of you do with auto-discovery of feeds right now. However, Mike’s response to this was:

A separate RDF file is a nonstarter for CC. After selecting a license a user gets a block of HTML to put in their web page. That block happens to include RDF (unfortunately embedded in comments). Users don’t have to know or think about metadata. If we need to explain to them that you need to create a separate file, link to it in the head of the document, and by the way the separate file needs to contain an explicit URI in rdf:about … forget about it.

But if we don’t explain to people how all this works, and provide a way for folks to be more precise, problems like the Yahoo CC search and the Technorati tag page are going to continue. By ‘protecting’ people from the technology, we are, in effect, doing more to harm them than help them.

What we should be doing is providing the tools to allow people to use rich metadata, richly; not make assumptions that “people can’t deal with it” and then dumb it down accordingly. We should be helping people understand how to use something like the CC license wisely and effectively–using clear, non-technical language to explain how all the bits work–not depend on technology to somehow ‘guess’ what a person wants and act accordingly.

Because as we’ve seen, technology almost invariably guesses wrong.
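For anyone curious what the separate-file approach described earlier in this post might look like, here is a rough sketch. It is not the Wordform code. The example.com URIs, the function name, and the `rel="meta"` link are illustrative assumptions, and the `cc:` namespace shown is the one used in CC's RDF metadata of this period as best I recall it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://web.resource.org/cc/"          # assumed CC metadata namespace
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("cc", CC)
ET.register_namespace("dc", DC)


def build_license_feed(items):
    """Build RDF/XML stating, per element, what is licensed and what is not.

    items is a list of (uri, license_uri_or_None, rights_statement) tuples.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for uri, license_uri, rights in items:
        work = ET.SubElement(root, f"{{{CC}}}Work", {f"{{{RDF}}}about": uri})
        ET.SubElement(work, f"{{{DC}}}rights").text = rights
        # Only elements that really are licensed get a cc:license statement.
        if license_uri:
            ET.SubElement(work, f"{{{CC}}}license",
                          {f"{{{RDF}}}resource": license_uri})
    return ET.tostring(root, encoding="unicode")


# Hypothetical page: the writing is CC licensed, the photo is not.
items = [
    ("http://example.com/post/1#text",
     "http://creativecommons.org/licenses/by-nc/2.0/", "Some rights reserved"),
    ("http://example.com/post/1#photo", None, "All rights reserved"),
]
feed = build_license_feed(items)
print(feed)

# The page would then point at the file from its header, much like feed
# auto-discovery (the rel and type values here are an assumed convention):
print('<link rel="meta" type="application/rdf+xml" '
      'href="http://example.com/post/1.rdf" />')
```

The point of the sketch is the second entry: a tool reading this file can see that the photo carries no license at all, which a bare link on the page can never express.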

RDF Semantics

Dumbing down of America

A recent spate of postings at Planet RDF revolves around a two-day session on SPARQL that’s coming up in Europe. It was while reading through these that something I had noticed recently became more apparent: that most of the semantic web effort, or the effort that’s involved with RDF, is happening in Europe (with some side trips into Canada when the weather is good).

In the United States, on the other hand, most of the discussion is about folksonomies. We are a nation filled with people raising excited fingers from both coasts to point at delicious, flickr, Technorati, and Wikipedia; matched with solemn assurances that these new ‘bottom up’ systems are going to kick the butt of ‘formal’ ontologies.

Leaving aside whether one would want a doctor who learned biology the ‘folksonomic’ way, is there a geographical split to the direction of study for the semantic web? Are ‘folksonomies’ becoming the fast food of semantics–the McDonald’s of taxonomies? If so, then are we in the US going to end up with obese vocabularies, barely able to clasp the belt of understanding around their middles?

And I want to know why events like these never happen in St. Louis. Is it a European/Canadian plot to slowly dumb down America until they can quietly invade us one day, and we don’t even know it until a tag appears in Technorati labeled “AllYourMetaBelongToUs”?

All I can say is I didn’t vote for him!

As for having a meeting here, we have beer, too. Good beer. In fact, Budweiser is located her…

As for having a meeting here, we have wine, too. Good wine. Stomped by only the finest squirrel and beaver.

Social Media Weblogging

WordPress and the hidden articles

Recovered from the Wayback Machine.

An interesting story appeared today about the WordPress site, and several thousand articles that could be found in a

Disclaimer. I’m hesitant to even write about this, knowing the web’s fondness for angry mob justice, but I feel like it’s an important issue that needs to be addressed. My one request: please be calm and rational. WordPress is a great project, and Matt is a good guy. Think before piling on the hatemail and flames.

The Problem. WordPress is a very popular open-source blogging software package, with a great official website maintained by Matt Mullenweg, its founding developer. I discovered last week that since early February, he’s been quietly hosting almost 120,000 articles on their website. These articles are designed specifically to game the Google Adwords program, written by a third-party about high-cost advertising keywords like asbestos, mesothelioma, insurance, debt consolidation, diabetes, and mortgages. (Update: Google is actively removing every article from their results. You can still view about 25,000 results on Yahoo. Or try this search tool, which searches multiple Google datacenters.)

(Several links within the original material.)

From comments left, it would seem that the content with the links to the articles is hidden within the WordPress main page, therefore passing on the high Google rank the site gets to the articles, themselves, while still not providing a visible indication of this on the site page.

<div style="text-indent: -9000px; overflow: hidden;">
<p>Sponsored <a href="/articles/articles.xml">Articles</a> on <a href="/articles/credit.htm">Credit</a>, <a href="/articles/health-care.htm">Health</a>, <a href="/articles/insurance.htm">Insurance</a>, <a href="/articles/home-business.htm">Home Business</a>, <a href="/articles/home-buying.htm">Home Buying</a> and <a href="/articles/web-hosting.htm">Web Hosting</a></p>
</div>
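The trick in that markup is the inline style: a large negative `text-indent` pushes the links off screen for readers while leaving them in the page for crawlers. As a rough sketch of how one might flag links hidden this way, assuming only inline styles matter and that tags are properly closed (the pattern list, class, and function names are hypothetical, and this says nothing about how any search engine actually does it):

```python
import re
from html.parser import HTMLParser

# Inline-style patterns that hide content from readers (illustrative, not exhaustive).
HIDING = re.compile(
    r"text-indent\s*:\s*-\d{3,}|display\s*:\s*none|visibility\s*:\s*hidden",
    re.I,
)

# Elements that never get a closing tag, so they are kept off the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HiddenLinkFinder(HTMLParser):
    """Records links that sit inside an element styled to be invisible."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one bool per open element: hidden or not
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = bool(self.stack and self.stack[-1])
        hidden = inherited or bool(HIDING.search(attrs.get("style") or ""))
        if tag == "a" and hidden and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links


# A trimmed version of the markup above, plus one ordinary visible link.
sample = (
    '<div style="text-indent: -9000px; overflow: hidden;">'
    '<p>Sponsored <a href="/articles/articles.xml">Articles</a> on '
    '<a href="/articles/credit.htm">Credit</a></p></div>'
    '<p><a href="/download/">Download</a></p>'
)
print(find_hidden_links(sample))
```

A real checker would also have to follow external stylesheets and cope with sloppy markup; this sketch only shows why the links were easy to find once someone looked at the page source.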


Since the words used in the pages are high ‘rate’ words within the Google AdSense program, we can assume this could be lucrative to the company that provided the articles. According to Matt’s response in a thread at the WP support forum, WordPress itself received a set fee for hosting the articles.

How much? Well, enough to hire the first employee of WordPress, Inc.

I am not one of those who believes that the only decent open source project is one where the people do the work only as a labor of love. I don’t think there’s anything wrong with people making money from their art. But of course, I would say this, as I try to put together an online store with goods featuring my photos, as well as still trying to find buyers for my books and/or articles–and after I had added, and pulled, Google Ads.

It’s all very good to say, “We should do this because we love to do it”. But it’s hard to be motivated to write and create when one is worried about what the next month holds. Noble to say, “Well, I would deliver pizza if needs be, to keep my art free of contamination.” Tell me, though: how many of you have delivered pizza? Want to try it at 50?

Still, I can also see that there’s been a dimming of the joy of this medium, as more and more people turn to these pages as a way to make a buck. What did Jonathon Delacour write, in a nice twist on Talleyrand?

Those who did not blog in the years before the revolution cannot know what the sweetness of blogging was.

Very sweet, indeed. Sweet and impossible–a castle made of spun sugar.

But to return to the story, this is about WordPress and what amounts to actions that could be considered scamming Google.

Google is now removing all of the articles from its databases, but one could say that the company was hoist on its own petard (following along with English usage that Talleyrand would appreciate) with this action–its own pagerank was used against the company. Perhaps if it wasn’t so easy to be gamed, events like this wouldn’t occur.

Still, this is using weblogs to play the system, and not really different than what the comment spammers do, though at least this isn’t in our space.

I learned about the WordPress article through Stavros who wrote:

I challenge you to think about the creative output of artists and artisans whose work has touched you. Think of your favorite books, your favorite paintings. That piece of handmade furniture or that gloriously handtooled little application. The music you listen to or the writers-on-the-web you read because they get into your heart and fill you with the ineffable, simple joy of being alive and having a mind. I wonder how many of them would have done their work whether or not they eventually got paid for it. My guess is ‘most’.

I’m not saying that people shouldn’t be paid. Hell, if I could get paid for making the things I make because there’s something inside me that impels me to do it, I’d be thrilled. It’d be a dream come true, by crikey. But I do it, regardless. And so do you, probably, if you’re reading this.

For some reason I’m reminded of Michelangelo and the Sistine Chapel. Michelangelo didn’t like to paint; he preferred sculpture. He didn’t even want to do the work, and only did so after pressure from the Pope. And then there was the fee.

There’s art, and then there’s art.

Bottom line is: do you like WordPress? Do you like using WordPress? Can you still get it for free? Is it still GPL? Then perhaps that’s what should be focused on, and however or whatever Matt does with the WordPress page is between him and Google; because what matters is the code, not the purity of actions peripheral to the code, or its release.

I am also reminded of the story of the Roman general returning in triumphal parade through the city after a great victory; and the man who stood behind him in the chariot, holding the victory wreath made of leaves over his head. “Thou art mortal”, he would whisper, over and over again into the general’s ear, as a reminder that no matter how great the triumph, how beloved of the people, the general is, after all, only human.


WordPress, Inc.’s first employee on this issue.