Categories
Semantics

Funky to go

Joi Ito, presumably in response to this news, wrote the following about a possible Microsoft strategy with regard to Google, searching, and metadata:

Google likes scraping html, mixing it with their secret sauce and creating the all-mighty page ranking. Anything that detracts value from this rocket science or makes things complicated for Google or easy for other people is probably a bad thing for Google.

I have a feeling they (Microsoft) will embrace a lot of the open standards that we are creating in the blog space now, but that they will add their usual garbage afterwards in the name spaces and metadata so that at the end of the day it all turns funky and Microsoft.

That’s a good read. The power behind Google is that the company owns the algorithms used to find data in the featureless mess of HTML that exists today. The more sophisticated the data storage, the less important the algorithms, and the less edge that Google has. Microsoft, by controlling the origination of much of this data, can build in the missing knowledge about the data and basically undercut the ground on which the House of Google is built.

I also agree with Joi — Microsoft will then make it proprietary by their own funkiness. For those who think RDF is bad, try working with MS generated XML. If you’ve ever seen what the company can do to HTML generated from Word, I rest my case.

I don’t agree with Dave Winer, who wrote today:

Speaking of people who could be friends who are full of shit — today Joi Ito sings a well-sung but false song about Microsoft screwing with nascent standards. Joi, in RSS-land, MS is playing fair and square, so far (and so are AOL and Yahoo, btw). The people who are pissing in the soup are people you don’t have the guts to criticize. You’re in their blogroll, they’re in yours. Dig deeper dear Joi, really disassemble the lunacy of our little world, and do what you can to unravel it. Then, when and if Microsoft screws with us, you’ll have some credibility. Right now you haven’t got a leg to stand on.

I presume from all that blogroll talk that Dave means Six Apart, or the Pie/Echo/Atom effort.

I have no doubt that Microsoft won’t muck with RSS at this time. Why should it? In the overall scheme of things, it’s only a syndication format, nothing more. But I wouldn’t be surprised if Microsoft is working on creating its own metadata and ontology XML vocabulary and data model, one that it will share with others, of course, putting it at the center of knowledge-based query in the years to come.

Original and comments archived at Wayback Machine

Categories
RDF

Jena Week: final examples

Recovered from the Wayback Machine.

I converted the rest of the Jena1 examples to Jena2 without any major reworking being necessary. At the end of this post is a zipped file of all of the Java source that you can download and use to start your own explorations.

In Example 7, reading in an RDF/XML file is no different with Jena2 than it was with Jena1. The examples don’t change (other than class structure and creating the memory model using the factory object), except that iterating through statements using StmtIterator exhibits better Java behavior now. StmtIterator now returns an Object when you use next rather than a Statement, and you’ll need to cast it. Or you can use the new nextStatement function call:

// next statement in queue
Statement stmt = iter.nextStatement();

Examples 8 and 9 work against graphs/models persisted to a relational database. I used MySQL on Windows 2000 to test the examples, but you can also use Oracle or other supported database systems.

There have been some significant changes, for the better, with using a relational data model for persistence in Jena2. For instance, you don’t have to specify a storage type or preformat the storage as you did with Jena1; this is all automated in the new classes, by passing in the type of database when you create the model. Another change is that you don’t use the RDBModel.create method, which is deprecated – you can use a new class, ModelMaker, or you can call the static function, createModel.

In the following, the driver is loaded, a connection is established and a model created. Once created, a serialized RDF/XML graph is read.

// load driver class
Class.forName("com.mysql.jdbc.Driver").newInstance();

// establish connection - replace with your own conn info
Connection con = DriverManager.getConnection(sConn, sUser, sPass);
IDBConnection dbcon = new DBConnection(con, "MySQL");

ModelRDB model = ModelRDB.createModel(dbcon, "one");
model.read(sUri);

Check the documentation for Jena2, and in the DB directory, you’ll find a discussion on migrating Jena1 databases to Jena2, as well as a good description of these changes, and example code.

One interesting aspect of storing Jena models within a relational database, especially as it relates to previous postings, is whether Jena2 maintains a reference to the actual source of the external data. From what I can see, it does not.

Within Jena, you create a graph (termed a model in Jena) and can load many serialized sub-graphs (individual RDF/XML files) into it, or add statements directly using the API. I did so, loading in several FOAF files from people I know. I then looked through the data to see if the source of the sub-graph, the actual file, was maintained and couldn’t find anything.

Once the sub-graphs are merged into one graph, all blank nodes are resolved, and any traces of their separate natures are not maintained. This is proper behavior for RDF according to the RDF specifications, and this merging of sub-graphs is one of the strengths of the RDF model.
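To make the merge behavior concrete, here is a minimal sketch in plain Java. It is not Jena's API: the class name GraphMergeSketch, the per-file renaming suffix, and the sample FOAF-like data are all assumptions made up for illustration. A graph is treated as a set of subject/predicate/object triples, blank node labels are renamed per file as each sub-graph is loaded so that nodes from different files stay distinct, and the merged set keeps no record of which file a triple came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Jena): a graph is a set of (subject, predicate, object) triples.
class GraphMergeSketch {
    private final Set<List<String>> triples = new LinkedHashSet<>();
    private int filesLoaded = 0;

    // Load one sub-graph. Blank node labels (terms starting with "_:") only mean
    // something inside their own file, so each gets a per-file suffix.
    void load(List<List<String>> subGraph) {
        filesLoaded++;
        for (List<String> t : subGraph) {
            List<String> renamed = new ArrayList<>();
            for (String term : t) {
                renamed.add(term.startsWith("_:") ? term + "_f" + filesLoaded : term);
            }
            // Set semantics: a statement present in two files is stored once,
            // and nothing records which file it came from.
            triples.add(renamed);
        }
    }

    Set<List<String>> triples() {
        return triples;
    }

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    public static void main(String[] args) {
        GraphMergeSketch g = new GraphMergeSketch();
        // Two made-up "files" that both use the label _:a and share one statement.
        g.load(Arrays.asList(
            triple("_:a", "foaf:name", "Alice"),
            triple("ex:doc", "dc:title", "Shared")));
        g.load(Arrays.asList(
            triple("_:a", "foaf:name", "Bob"),
            triple("ex:doc", "dc:title", "Shared")));
        for (List<String> t : g.triples()) {
            System.out.println(t);
        }
    }
}
```

In this sketch the two `_:a` nodes remain separate people after the merge, the shared statement collapses into one, and the source file can only be recovered if it is recorded somewhere outside the triples themselves.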

Now, you can perform a little deductive work and find that a person’s properties have a specific blank node identifier, which is then the object of a “knows” predicate, which is then the property of another blank node identifier of another person with a given name – a bit of a fun challenge with RDQL – but there’s nothing I can see that identifies the source of the actual RDF/XML file, itself.
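The deduction described above amounts to a small join over the merged statements. The sketch below is plain Java rather than RDQL or Jena code, and the class name KnowsSketch, the blank node labels, and the sample names are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each statement is a String[] of {subject, predicate, object}.
class KnowsSketch {
    // Return the names of everyone who "knows" a person with the given name.
    static List<String> whoKnows(List<String[]> triples, String friendName) {
        List<String> result = new ArrayList<>();
        for (String[] nameStmt : triples) {
            // Step 1: find the node whose foaf:name is friendName.
            if (!nameStmt[1].equals("foaf:name") || !nameStmt[2].equals(friendName)) {
                continue;
            }
            String friendNode = nameStmt[0];
            for (String[] knowsStmt : triples) {
                // Step 2: find statements where that node is the object of foaf:knows.
                if (!knowsStmt[1].equals("foaf:knows") || !knowsStmt[2].equals(friendNode)) {
                    continue;
                }
                String knowerNode = knowsStmt[0];
                // Step 3: look up the name attached to the node that does the knowing.
                for (String[] t : triples) {
                    if (t[0].equals(knowerNode) && t[1].equals("foaf:name")) {
                        result.add(t[2]);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {"_:b1", "foaf:name", "Alice"},
            new String[] {"_:b1", "foaf:knows", "_:b2"},
            new String[] {"_:b2", "foaf:name", "Bob"});
        System.out.println(whoKnows(triples, "Bob")); // prints [Alice]
    }
}
```

The join recovers who knows whom, but nothing in the statements says which RDF/XML file either person's data was read from.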

It could be among the information stored in the BLOB that makes up the graph’s name, but the only way to know this for sure is to examine the contents of the BLOB itself, or to find a specific class that allows us to interrogate this data. However, that breaks into new territory not covered in the book examples, and best left for further essays.

Download file

Categories
Diversity RDF

Accept

My roommate surprised me with a wonderful gift tonight, a movie I’ve been trying to find on DVD for a long time. The movie is “Mr. Baseball” with Tom Selleck. It wasn’t a popular movie, and I doubt you’ve heard of it. It’s also not especially ‘artsy’ but I still love it. *

I wish I could say that I identify with the lovely Aya Takanashi, but to no avail. Her gentle refined sense of acceptance sounds wonderfully peaceful, and is exceedingly elegant, but I never have been one to just roll with the punches. I’m not particularly elegant, either.

No, I tend to identify more strongly with Jack, Mr. Baseball. It’s not as if I chew tobacco, maintain a rigid inflexibility, have a hairy chest, and am rude to people in their own land, the defining symbols of the protagonist; it’s more a matter of having a strong sense of self, a streak of stubbornness and defensiveness, and not always to the good.

When I say strong sense of self, this doesn’t mean that I’m not a team player; I can be. My problem, as it was Jack’s, is that I tend to play on the wrong teams. And then I’m too stubborn to admit it.

I watched this movie tonight as I thought about some of the discussions I got into this week. Especially the discussions about RDF. This has not necessarily been a great week for my book on RDF/XML because it’s caught up in the very real wars between the XML ‘view source’ people, and those who support RDF and RDF/XML.

I spent two years working on the Practical RDF book, all the time maintaining one firm decision: it was not going to be a book for the Semantic Web adherents; it was going to be a book for just plain folks. For people like me. I lost some respect from the theoreticians with this approach. Not all, but some. I can name you about 20 long-time RDF adherents who could have done a better job covering the theory behind RDF and the Semantic Web.

My book also tends to fall between the cracks – too RDF/XML for some, not enough RDF for others. And the title doesn’t help: who ever heard of combining ‘Practical’ and ‘RDF’? The title itself generates laughter, sometimes with me, sometimes not.

However, I knew that if the Semantic Web is ever going to become real, it’s going to come about because of the same people who created today’s web, and this book is written for those people. Look in a mirror – that’s who created today’s web, and that’s where the Semantic Web of the future is coming from.

Lately, I’ve been spending considerable time with the Alpha Geeks, the P/E/A revolutionaries, and the XML view source people, and there’s just no return for me in this. Smart, dedicated, and too damn stubborn themselves, they’re good people and they make good team members. But they’re not my team. I’m not an Alpha Geek or a P/E/A revolutionary. I’m definitely not an XML view source person.

The technology is important to me, but it’s not a religion. If I support RDF/XML it’s because I want us to move on and do something with it. What’s more important to me is not that I win wars for RDF/XML; it’s that the technology is accessible and understandable to everyone, not just the Geeks.

I once thought that the disconnect between me and the Alpha Geeks was because they were primarily men, and I was a woman; sometimes the only woman. I realized today that I was wrong – in most cases gender has nothing to do with it. The disconnect is because we come from such different backgrounds, and our focus, interests, and talents are different.

Oh, there are a few pricks who get threatened by any woman smarter than a gerbil. You can recognize them: anytime a woman disagrees with them, she’s either being “hormonal” or “hysterical”. And as we’ve discussed, time and time again, men and women play together differently. But for most of the Alpha Geeks, gender really isn’t the issue. Passion, interest, and focus are, and in these we differ. The differences left me feeling like odd man out, making me defensive, but too damn stubborn to just get out, to realize I need to let go.

It was an epiphany for me, let me tell you. Kick in the pants time.

So, I’m making some changes, starting with closing down the Practical RDF weblog. I’m re-focusing on the …For Poets weblogs. They may not be for everyone, too poetic or wordy by far for many of my Alpha Geek friends and others, but I like them.

*And the noodle dinner scene cracks me up, every time.

Categories
RDF

Blindsiding and forward thinking

Recovered from the Wayback Machine.

When Sam Ruby invited me into an IRC chat last week to talk about creating a compliant RDF/XML syntax for Pie/Echo/Atom, I wasn’t aware that there was an agenda behind this effort. It wasn’t until Sam published the full model at his weblog and discussed how RDF needs to meet XML half way, and that there need to be changes to the RDF/XML syntax for it to be more ‘palatable’, that I became aware that there were a lot of behind-the-scenes conversations occurring about RDF/XML and Pie/Echo/Atom.

I was disappointed in this effort, not least because all it did last week was stir up the usual anti-RDF crowd, the same crowd who give very little in the way of specifics about exactly what they dislike. (Sam did give three specifics, and all three were comprehensively addressed in his comments.)

Being an RDF and RDF/XML supporter among the anti-RDF crowd feels like being in the middle of a bobbing head factory after an earthquake.

I thought the issue was over and done with, though, and I was turning my attention to trying to repair the damage to RDF/XML, when I read with surprise today this article by Mark Pilgrim at xml.com. I discovered it when I went to Mark’s site and saw Everything considered harmful.

If I was disappointed in last week’s debacle, I wasn’t made any happier that O’Reilly and xml.com published Mark’s article when they declined my suggestion of an article discussing some of the politics surrounding RDF/XML, a suggestion I made over a month ago. Nor that my own effort last week was, frankly, used against what I support, without my being given fair time for defense except in this weblog. Thank goodness for weblogs, I suppose.

In case you’re not a techie and are curious as to what the technical term for all of this is, it’s called ‘being blindsided’.

Mark discusses four issues of RDF: model, syntax, semantic web, and tools. He stated that people may be communicating at cross-purposes because they’re talking about different issues when they talk about ‘RDF’. Some may like the model, but not the syntax. Others may like the tools, but think the semantic web is a pipedream.

I can agree with Mark’s assessment of the four issues. His opinion about using RDF for Pie/Echo/Atom is: the model forced the people to examine specific aspects of the syntax, which is good; the syntax sucks; the tools that work with the syntax are good (he mentions a Python one, of course, RDFLib); while the semantic web is a pipedream.

I can’t disagree with Mark’s opinion — it is his opinion. But I can disagree with his arguments.

Unfortunately, the example used to demonstrate how difficult the RDF/XML syntax is was one created under a specific requirement: create a full, proper, formal syntax using all nuances of the RDF/XML syntax. Because of this, it effectively brings in the most complex aspects of RDF/XML, rather than keeping the model simpler, and more digestible by the XML folks. I wrote an email to Sam Ruby afterwards and said the model could be made simpler, more practical.

Returning to Mark Pilgrim’s article, he said that because the syntax is too complicated for the XML people to manually read and write, this makes RDF/XML untenable for Pie/Echo/Atom. He further states that we RDF/XML supporters can use XSLT to transform the syntax easily, and that when we have RDF support installed, we also have XSLT support installed.

Mark is in error.

First of all, an important objective of RDF/XML is dissemination of data. It’s not the processing, it’s the data that’s critical; the more data is organized into RDF/XML, or better yet full OWL, the more data is directly accessible by automated tools and bots from machines other than where the XML resides. In addition, data formatted as RDF/XML can be merged automatically, without any prior knowledge of the vocabulary, which is another critical factor.

If a person who has the Pie/Echo/Atom syndication feed does not run an XSLT transform on the data to create RDF/XML versions of the data, the web bots then will need to do this, and this makes most data gathering techniques prohibitive.

The power behind RDF/XML is to have people generate data files that can be accessed by RDF tools, as Technorati is doing with our FOAF RDF/XML files at this very moment. Most webloggers I know don’t have RDF or XSLT capability or experience. They don’t have XML capability or experience, either, if it comes to that.

Second, what makes Mark think that people who work with RDF/XML are familiar with XSLT or have this installed automatically with our ‘tools’? I’ve worked with RDF/XML in six different programming languages, but I don’t work with XSLT — I think XSLT is the ugliest damn thing in the world to work with. I don’t have XSLT support installed.

Now, tools can be modified to add support to generate the RDF/XML format explicitly. I can create an MT template to do this, and I’m sure other tools can also do this. This helps RDF/XML users, but unfortunately, this doesn’t help Pie/Echo/Atom, or the Pie/Echo/Atom syndication users.

With Pie/Echo/Atom being vanilla XML, it becomes extremely difficult to add new extensions to the data. These have to become basically Pie/Echo/Atom extensions, which means they are single-purpose only, and there isn’t necessarily common agreement about what these extensions mean; the group has to work through each. Secondly, there is no automated way to add support for existing RDF/XML vocabularies that are getting widespread support, such as FOAF and others that are most likely going to be hitting the streets shortly. What has to happen is that FOAF and other RDF/XML vocabularies need to be transformed to XML, added into Pie/Echo/Atom, and then transformed back to RDF/XML.

Now, exactly what about this is ‘easier’?

In addition, I can access an RDF/XML file using a web bot or any kind of application, can pull out the statements, can store them, can query them all without making one change to code to add the new vocabulary. Not one change in code. The same ease of extensibility does not apply to vanilla XML. Period.
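As a rough illustration of that point, the plain-Java sketch below groups a subject's statements by predicate without naming any vocabulary in the code. It is not Jena code; the class name GenericStatementsSketch and the `ex:favoriteFood` property are hypothetical, invented only to stand in for a vocabulary the code has never seen before.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each statement is a String[] of {subject, predicate, object}.
class GenericStatementsSketch {
    // Collect every property of a subject, keyed by predicate.
    // No predicate or vocabulary is hard-coded anywhere in this method.
    static Map<String, List<String>> propertiesOf(List<String[]> statements, String subject) {
        Map<String, List<String>> props = new TreeMap<>();
        for (String[] s : statements) {
            if (s[0].equals(subject)) {
                props.computeIfAbsent(s[1], k -> new ArrayList<>()).add(s[2]);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        List<String[]> statements = Arrays.asList(
            new String[] {"_:me", "foaf:name", "Alice"},
            // A made-up vocabulary added later; the method above is unchanged.
            new String[] {"_:me", "ex:favoriteFood", "jello"});
        System.out.println(propertiesOf(statements, "_:me"));
    }
}
```

The same method stores and retrieves the `foaf:name` statement and the made-up `ex:favoriteFood` statement alike, which is the sense in which a new vocabulary needs no code change on the statement-handling side.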

I asked a question, and no one at Pie/Echo/Atom answered it: Can you use regular XML tools on a Pie/Echo/Atom RDF/XML feed? Without transforms? If the answer is yes, then which format is the more exclusionary?

Regardless, I am disappointed at how this whole thing played out. I thought the RDF/XML community acted with integrity, first supporting the notion that Pie/Echo/Atom would not be RDF/XML and we would have to use transforms (though we knew this would cause problems later for Pie/Echo/Atom and others); secondly, when the issue was brought up again, supporting, in good faith, the creation of the proper RDF/XML model with hopes of further discussion, only to have our effort and our hopes tossed aside.

As for the semantic web: as Kendall Clark stated today, with the candidate release of OWL, we’re on our way. We were only waiting on the specs to get to recommendation state because there was too much flux previously. Now, we have a stable platform on which to work. In some ways, I’m glad we don’t have to worry about Pie/Echo/Atom and it can go down its little dead-end alley of being ‘pure’ XML, because we all have other things to do.

As for the semantic web, and being a dreamer — guilty as charged. I can see the ’semantic web’ in my mind, and it’s so real to me, I can reach out and touch it. If this is a pipedream, then leave the room, because I’m going to smoke it. And I’m going to help make it real.

I could use so many examples from history of dreamers who made things work in the face of those who said they couldn’t be done: proving that the sun doesn’t circle the earth, that we can fly, that we can walk on the moon and hopefully soon, on Mars. That time and light are affected by gravity, and that we can actually demonstrate it. That we can discover a planet in another solar system. That we can talk to someone in England directly on the phone. That some day, we’d be able to sit at a computer, reading this.

That something like a transparent food could be created in all flavors, none of them natural. So put that on your scales, and see if you don’t get jello.

update

One other thing that I thought of after I wrote this article originally. Mark mentioned that through the process of creating the RDF/XML syntax, the RDF model generated questions about the order and the nature of two sets of data — the entries and the contributors. He thought this was a very helpful exercise.

However, Mark neglected to mention how it was the nature of RDF/XML’s semantically flavored syntax that drove this out. He also neglected to mention how vanilla XML was going to be used to enforce these set membership rules. So, how does one enforce, or at a minimum document, collection or container semantics using plain vanilla XML?

Categories
RDF Writing

Practical RDF-Generating giggles around the world

Practical RDF has now been immortalized in a comic strip.

Achewood

Hey! I love black licorice!