Copyright RDF

The little CC license that could, or when technology is all busted up

Recovered from the Wayback Machine.

Phil Ringnalda points to the new Yahoo Creative Commons search engine and notices that because the engine is relying purely on links to CC licenses to pull out content that is supposedly licensed as CC, there is going to be a lot of confusion related to what is, or is not, CC licensed.

An issue with CC has always been how to attach CC license information in such a way that automated processes could work with it. The solution has been to use RDF/XML embedded within HTML comments to indicate what is licensed on the page. However, this is kludgy and doesn’t validate within XHTML, so people are dropping it and just including the link to the specific license. Moreover, even if they include the RDF/XML, they do so in such a way that it looks like everything in the page is under the specific license: HTML, writing, CSS, photos, whatever.

In other words, they take the rich possibilities inherent in using RDF and dumb them down until the result is equivalent to the link.

Phil then pointed out that Yahoo releasing this search that just looks for links to the license in a document, and doing so without any legal disclaimers, warnings, or asides, is about the same as somebody accidentally putting a GPL license on the next version of Windows. In other words, it’s a really dumb move:

But if I was the Yahoo! lawyer who vetted their Creative Commons search, and let it loose without any disclaimer that “Yahoo! makes no assertion about what, if any, content in these results is actually offered under a Creative Commons license” I’d be hanging my head in shame.

To make matters worse, in the associated FAQ for the new search is the following:

This search engine helps you quickly find those authors and the work they have marked as free to use with only “some rights reserved.” If you respect the rights they have reserved (which will be clearly marked, as you’ll see) then you can use the work without having to contact them and ask. In some cases, you may even find work in the public domain — that is, free for any use with “no rights reserved.”

Yup. I think this is a case for the new Corante legal weblog.

I tried the search with my weblog’s name, and found one interesting result: the bbintroducingtagback tagback in Technorati. It seems that Technorati has linked to one of the CC licenses that allows non-commercial use. But used in the way it is, it implies that all the material in the page is licensed this way. Wait a second, though: that’s my photo in the page, pulled in from Technorati via flickr. I don’t license my work as CC–it’s still too damn vague a licensing, usually applied badly (as we’re seeing now).

(Marius, what do you think about that? And this picture is still too cute for words.)

Phil calls this accidental, by-link-association form of CC licensing viral, and viral it is, indeed; through bad implementations of a vague license, I may, by allowing my photo to be copied (while holding all rights), have lost rights to that photo by implication and effect. At a minimum, who holds the copyright on the photo has been lost when it filters through both the Technorati tag and the search engine results.

I’ve been in a discussion about the CC license and the issue of how to record more specific information with Mike Linksvayer (who is on the staff at CC) at Practical RDF. I brought up the issue of lack of precision in the licensing and Mike mentioned that one approach CC is looking at is to use, again, the ‘rel’ attribute as a way of marking metadata. But this can only go so far; it’s really not much more than just linking to the license and assuming this implies usage.

(And, frankly, our use of ‘rel’ is becoming a bit of a stretch–we’re trying to stuff all the meaning in the internet in one little bitty attribute.)

The approach I’m using for complex metadata (which is what CC is) in Wordform is to generate a separate RDF/XML feed that explicitly states which elements within a page are licensed, which aren’t, and exactly how each licensed element can be used (among other metadata). I link to this feed through a LINK element in the header, as many of you do with auto-discovery of feeds right now. However, Mike’s response to this was:

A separate RDF file is a nonstarter for CC. After selecting a license a user gets a block of HTML to put in their web page. That block happens to include RDF (unfortunately embedded in comments). Users don’t have to know or think about metadata. If we need to explain to them that you need to create a separate file, link to it in the head of the document, and by the way the separate file needs to contain an explicit URI in rdf:about … forget about it.

But if we don’t explain to people how all this works, and provide a way for folks to be more precise, problems like the Yahoo CC search and the Technorati tag page are going to continue. By ‘protecting’ people from the technology, we are, in effect, doing more to harm them than help them.

What we should be doing is providing the tools to allow people to use rich metadata, richly; not make assumptions that “people can’t deal with it” and then dumb it down accordingly. We should be helping people understand how to use something like the CC license wisely and effectively–using clear, non-technical language to explain how all the bits work–not depend on technology to somehow ‘guess’ what a person wants and act accordingly.

Because as we’ve seen, technology almost invariably guesses wrong.

RDF Semantics

Dumbing down of America

A recent spate of postings at Planet RDF revolves around a two-day session on SPARQL that’s coming up in Europe. It was while reading through these that something I had noticed recently became more apparent: most of the semantic web effort, or the effort that’s involved with RDF, is happening in Europe (with some side trips into Canada when the weather is good).

In the United States, on the other hand, most of the discussion is about folksonomies. We are a nation filled with people raising excited fingers from both coasts to point at delicious, flickr, Technorati, and Wikipedia; matched with solemn assurances that these new ‘bottom up’ systems are going to kick the butt of ‘formal’ ontologies.

Leaving aside whether one would want a doctor who learned biology the ‘folksonomic’ way, is there a geographical split to the direction of study for the semantic web? Are ‘folksonomies’ becoming the fast food of semantics, the McDonald’s of taxonomies? If so, then are we in the US going to end up with obese vocabularies, barely able to clasp the belt of understanding around their middles?

And I want to know why events like these never happen in St. Louis. Is it a European/Canadian plot to slowly dumb down America until they can quietly invade us one day, and we don’t even know it until a tag appears in Technorati labeled “AllYourMetaBelongToUs”?

All I can say is I didn’t vote for him!

As for having a meeting here, we have beer, too. Good beer. In fact, Budweiser is located her…

As for having a meeting here, we have wine, too. Good wine. Stomped by only the finest squirrel and beaver.


Danny Ayers: Man of steel

Aside from being a terrific dad to several cats and a cute dog, including my god-daughter Sparql, as well as a good writer, a patient advocate of RDF, and an artist, Danny Ayers is also a very good-hearted man.

It’s an honor to know him. I haven’t met him, but it’s still an honor to know him.

Thanks, Danny. For dropping some positive words when this tired old writer needed them.


Update: Yahoo search

I had made an assumption that Yahoo Search was using the RDF/XML embedded with the CC license information to build its search results; Mike Linksvayer, though, was kind enough to clarify in comments that the company is using the CC license links, only, to capture this information.

This is disappointing, as I feel that there is more about the CC licensed objects that Yahoo could provide and doesn’t because it’s only after the links. That’s about the same as running a mine for rubies and tossing aside the diamonds you find.

Mike also mentioned the use of RDF-A to bypass problems with embedded RDF/XML. Trying to define yet another new syntax when there’s an option already available doesn’t make sense. The RDF/XML Syntax document stresses the use of <LINK> for linking to a separate RDF/XML document with whatever metadata is defined for the resource. This is a good approach, and I’m not sure why folks are resistant to this. It’s not as if the extra documents will take up a lot of space; for dynamic systems, such as many of the ones we’re using today for weblogging, commerce, and so on, the document can be generated on demand.
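To give a feel for how little a consumer would need in order to follow such a LINK, here is a minimal Python sketch that scans a page for LINK elements pointing at RDF/XML. It is only an illustration under stated assumptions: the page, the example.org URLs, and the names RdfLinkFinder and find_rdf_links are all made up, and the code keys on the application/rdf+xml media type rather than on any particular rel value.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that point at RDF/XML."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "link":
            return
        attr = dict(attrs)
        media_type = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        # Assumption: the metadata document is identified by its media type.
        if media_type == "application/rdf+xml" and href:
            self.rdf_links.append(href)


def find_rdf_links(html_text):
    """Return the URLs of all RDF/XML documents linked from the page."""
    finder = RdfLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.rdf_links


# Hypothetical page; the example.org URL is invented for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>A weblog post</title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    <link rel="alternate" type="application/rdf+xml"
          href="http://example.org/metadata/post-42.rdf" />
  </head>
  <body><p>Words and a photo.</p></body>
</html>
"""

if __name__ == "__main__":
    print(find_rdf_links(SAMPLE_PAGE))
```

A bot that finds such a URL can then fetch the separate document and read statements about individual items in the page, rather than guessing from a bare license link.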

A scenario for use with CC could be that when the CC license is generated, the person is told to create a file and copy in the generated RDF/XML, and then to take the generated LINK and add it to the header of the page. If they also want to add an icon and a link to the license in human-readable format, they copy that link and put it into the page.

Is this that much more complicated for the people? Yes and No.

No in that people who host their own sites could probably do this without much problem, especially if tools start providing ways of editing pages on the site. However, for hosted sites, this is a problem, and will continue to be a limitation of these types of sites. Now, a smart hosted site will be one that eventually gets that they need to provide some mechanism to allow for this type of activity. But until then, yes, this is a limitation.

But CC could solve this for the hosted sites by hosting the license files themselves and giving the person the link to the file to put into their document. Even with a weblogging tool, you could do this just by embedding a tag in the header that supplies the individual file name as the name of the metadata file.

Eventually, we need ways of merging data for many uses into these pages. One way would be to provide the RDF/XML document URI to these tools, and the tools would then read in the existing RDF/XML and add the additional statements. Another would be for tools to provide a way of reading in a block of RDF/XML, pulling out the individual statements, and then merging them into those that already exist.

There’s code everywhere to do this type of data merging, and best of all: it’s RDF/XML, which means you don’t have to worry about namespaces and collision.
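As a rough illustration of why this kind of merging is cheap, here is a small Python sketch that treats each statement as a (subject, predicate, object) tuple and merges graphs with a plain set union. It assumes the RDF/XML has already been parsed into statements by some RDF toolkit, and it ignores blank nodes, which a real merge has to keep apart. The example.org URIs, the sample statements, and the helper names are hypothetical; the two predicate URIs are the CC and Dublin Core terms as I understand them.

```python
# Predicates are full URIs, so terms from different vocabularies stay distinct.
CC_LICENSE = "http://web.resource.org/cc/license"
DC_RIGHTS = "http://purl.org/dc/elements/1.1/rights"

# Hypothetical resources within one page: the writing and a photo.
POST_TEXT = "http://example.org/weblog/post-42#text"
POST_PHOTO = "http://example.org/photos/squirrel.jpg"
BY_NC = "http://creativecommons.org/licenses/by-nc/2.0/"


def merge_statements(*graphs):
    """Merge graphs of (subject, predicate, object) tuples by set union.

    Duplicate statements collapse automatically. Blank nodes are not
    handled in this sketch.
    """
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def objects_for(graph, subject, predicate):
    """Return the sorted objects of every statement matching the pattern."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Statements a license generator might supply (illustrative only).
license_graph = {
    (POST_TEXT, CC_LICENSE, BY_NC),
}

# Statements a photo tool might supply, including one repeated statement.
photo_graph = {
    (POST_PHOTO, DC_RIGHTS, "All rights reserved"),
    (POST_TEXT, CC_LICENSE, BY_NC),
}

merged = merge_statements(license_graph, photo_graph)

if __name__ == "__main__":
    for statement in sorted(merged):
        print(statement)
```

After the merge, asking for the license on the photo returns nothing while the text still carries its CC license, which is exactly the per-item precision that a bare link to the license can't express.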

All we would need, then, are nice search bots that grab this and pull all this info into a nicely consumable spot, with an API that returns individual data query results, or RDF/XML.

Yahoo! Yahoo! *knock knock knock* Opportunity’s knocking. Don’t blow it.

Technology Weblogging


Just a quick FYI on how this is going:

I need to integrate fulltext in the application. This allows people to view a single page in a multi-page posting.

I’m still trying to get the RDF meta-data component finished, using RAP (RDF API for PHP). Some troubles with data updates.

Still hunting down SQL statements that have been embedded in the process files, and isolating them in the backend.

A few other odds and ends. I had thought about not worrying about multi-blog support, but I think I will add this in, after all. I think all in all it would be easier to add it in from the beginning than to try and incorporate it after the tool’s been used.

Lots of work. Most of it fun. Really like the metadata thing, and consider the discussion about datablogging timely, too. It’s not going to be that polished, though, because the metadata functionality will be an add-on, whereby people provide a vocabulary and the functionality enables it for each post. But I agree with Danny: this is the perfect use for RDF/XML.