Categories
Web

Wikipedia and nofollow

Recovered from the Wayback Machine.

That bastard Google wrap-around nofollow rears its ugly little head again, this time with Wikipedia. Jimmy Wales, chief Pedian, has issued a proclamation that Wikipedia outgoing links will now be labeled with ‘nofollow’, as a measure to prevent link spam.

seomoz.org seems to think this is a good thing:

What will be interesting to watch is how it really affects Wikipedia’s spam problem. From my perspective, there may be slightly less of an incentive for spammers to hit Wikipedia pages in the short term, but no less value to serious marketers seeking to boost traffic and authority by creating relevant Wikipedia links.

Philipp Jenson is far less sanguine, writing:

What happens as a consequence, in my opinion, is that Wikipedia gets valuable backlinks from all over the web, in huge quantity, and of huge importance – normal links, not “nofollow” links; this is what makes Wikipedia rank so well – but as of now, they’re not giving any of this back. The problem of Wikipedia link spam is real, but the solution to this spam problem may introduce an even bigger problem: Wikipedia has become a website that takes from the communities but doesn’t give back, skewing web etiquette as well as tools that work on this etiquette (like search engines, which analyze the web’s link structure). That’s why I find Wikipedia’s move very disappointing.

Nick Carr agrees, writing:

Although the no-follow move is certainly understandable from a spam-fighting perspective, it turns Wikipedia into something of a black hole on the Net. It sucks up vast quantities of link energy but never releases any.

Seth Finkelstein notices something else: WIKIPEDIA IS NOT AN ANARCHY! THERE IS SOMEBODY IN CHARGE!

The rel=”nofollow” is a web extension I despise, and nothing in the time since it was first released–primarily because of weblog comment spam–has caused me to change my mind. As soon as we saw it, we knew the potential existed for misuse, and people have lived down to my expectations since: using it to ‘punish’ web sites or people by withholding search engine ranking.

Even when we feel justified in its use, so as to withhold link juice to a ‘bad’ site (such as the one recently Google bombed that had misleading facts about Martin Luther King) we’re breaking the web, as we know it. There should be no ‘good’ or ‘bad’ to an item showing up on a search list: if one site is talked about and linked more than another, regardless of the crap it contains, it’s a more topically relevant site. Not authoritative, not ‘good’, not ‘bad’, not definitive: topically relevant.

(Of course, if it is higher ranked because of Google bombing of its own, that’s a different story, but that’s not always the case.)

To return to the issue of Wikipedia and search engine ranking, personally I think one solution to this conundrum would be to remove Wikipedia from the search results. Two reasons for this:

First, Wikipedia is ubiquitous. If you’ve been on the web for even a few months, you know about it, and chances are when you’re searching on a topic, you know to go directly to Wikipedia to see what it has. If you’ve been on the web long enough, you also know that you have to be skeptical of the data, because you can’t trust the veracity of the material found on Wikipedia. I imagine that schools also provide their own “Thou shalt not quote Wikipedia” for budding young essayists.

Reason one leads to reason number two: for those folks new to this search thing, ending up on Wikipedia could give them the impression that they’ve ended up at a top-down, authority-driven site, and they may put more trust in the data than they should. After all, if they’re not that familiar with search engines, they certainly aren’t familiar with a wiki.

Instead of in-page search result entries, Google, Yahoo, MSN, any search engine should just provide a sidebar link to the relevant Wikipedia entry, with a note and a disclaimer about Wikipedia being a user-driven data source, and how one should not accept that this site has the definitive answer on any topic. Perhaps a link to a “What is Wikipedia?” FAQ would be a good idea.

Once sidebarred, don’t include Wikipedia in any search mechanism, period. Don’t ‘read’ its pages for links; and discard any links to its pages.

Wikipedia is now one of those rare sources on the web that has a golden door. In other words, it doesn’t need an entry point through a search engine for people to ‘discover’ it. If anything, its appearance in search engine results is a distraction. It would be like Google linking to Yahoo’s search result of a term, or Yahoo linking to Google’s: yeah, we all know they’re there but show me something new or different.

More importantly, Wikipedia needs to have a “Search Engine General’s” warning sticker attached to it before a person clicks that link. If it continues to dominate search results, we may eventually get to the point where all knowledge flows from one source, and everyone, including the Wikipedia folks, knows that this is bad.

This also solves the problem of Wikipedia being a black hole, as well as the giving and taking of page rank: just remove it completely from the equation, and the issue is moot.

I think Wikipedia is the first non-search engine internet source to truly not need search engines to be discovered. As such, a little sidebar entry for the newbies, properly annotated with a quiet little “there be dragons here” warning, would eliminate the spam problem, while not adding to a heightened sense of distrust of Wikipedia actions.

One other thing worth noting is seomoz.org’s note about a link in Wikipedia enhancing one’s authority: again, putting a relevant link to Wikipedia into the search engine sidebars, with a link to a “What is Wikipedia?” FAQ page as well as the dragon warning, will help to ‘lighten’ some of the authority attached to having a link in Wikipedia. Regardless, I defer to Philipp’s assertion that Wikipedia is self-healing: if a link really isn’t all that authoritative, it will be expunged.

Categories
Web

Article pulled from Google’s database?

Post wasn’t pulled, just not propagated across all the data centers. Did I happen to mention I haven’t had a good night’s sleep for the last few days? Disregard the paranoia.

However, there is a silver lining. Thanks to Seth for pointing out this Google data center tool. Put in the search term, and then switch among the data centers.

Categories
JavaScript RDF

To JSON or not to JSON

Recovered from the Wayback Machine.

Dare Obasanjo may be out of some Ajax developers’ spheres…actually, *I’m probably out of most Ajax developers’ spheres…but just in case you haven’t seen his recent JSON/XML posts, I would highly recommend them:

The GMail Security Flaw and Canary Values, which provides some sound advice for those happily exposing all their vulnerable applications to GET requests with little consideration of security. I felt, though, that the GMail example was way overblown for the consternation it caused.

JSON vs. XML: Browser security models. This gets into the cross-domain issue, which helped increase JSON’s popularity. Before you jump in with “But, but…” let me finish the list.

JSON vs. XML: Browser Programming Model on JSON being an easier programming model. Before you jump in with “But, but,…” let me finish the list.

XML has too many Architect Astronauts. Yeah, if you didn’t recognize a Joel Spolskyism in that title, you’re not reading enough Joel Spolsky.

In the comments associated with this last post, a note was made to the effect that the cross-domain solution that helped make JSON more popular doesn’t require JSON. All it requires is wrapping the returned data in a call to the given callback function name. You could use any number of parameters in any number of formats, including XML, as long as it’s framed correctly as a function parameter list.
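To make that concrete, here’s a minimal sketch of the script-tag trick with no JSON in sight. The service URL, the callback name, and the XML payload are all invented for illustration:

    // Hypothetical sketch: the cross-domain script-tag trick without JSON.
    // The service URL, callback name, and XML payload are made up.

    function handleData(payload) {
      // The argument is whatever the service chose to send back --
      // here, a string of XML we parse ourselves (DOMParser assumed;
      // IE would need its ActiveX equivalent).
      var doc = new DOMParser().parseFromString(payload, "application/xml");
      alert(doc.getElementsByTagName("item")[0].firstChild.nodeValue);
    }

    var script = document.createElement("script");
    script.src = "http://example.com/service?callback=handleData&format=xml";
    document.getElementsByTagName("head")[0].appendChild(script);

    // The service would respond with a script body along the lines of:
    //   handleData("<items><item>first</item></items>");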

As for the security issues, JSON has little to do with those, either. Again, if you’re providing a solution where people can call your services from external domains, you’d better make sure you’re not giving away vital information (and that your server can handle the load, and that you ensure some nasty bit of text can’t get through and cause havoc).

I’ve seen this multiple places, so apologies if you’ve said this and I’m not quoting you directly, but one thing JSON provides is more efficient data access than many browsers’ XML parsers provide. Even then, unless you’re making a lot of calls, with a lot of data, and for a lot of people, most applications could use either JSON or XML without any impact on the user or the server (there’s a quick comparison sketch a little further down). I, personally, have not found the XML difficult to process, and if I wanted really easy data returns, I’d use formatted HTML–which is another format that can be used.

You could also use Turtle, the newly favored RDF format.

You could use comma separated values.

You could use any of these formats with either the cross-domain solution, or using XMLHttpRequest. Yes, really, really.
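Just to make that comparison concrete, here’s a rough sketch of pulling the same value out of a JSON return and an XML return; the response shapes are invented, and neither path is exactly hard work:

    // Invented response shapes: the same value dug out of JSON and of XML.

    // JSON: the response text evaluates to a plain object
    var jsonText = '{ "post": { "title": "To JSON or not to JSON" } }';
    var data = eval('(' + jsonText + ')');   // or JSON.parse, where available
    alert(data.post.title);

    // XML: the same value pulled from a parsed document (DOMParser assumed)
    var xmlText = '<post><title>To JSON or not to JSON</title></post>';
    var xmlDoc = new DOMParser().parseFromString(xmlText, "application/xml");
    alert(xmlDoc.getElementsByTagName("title")[0].firstChild.nodeValue);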

As was commented at Dare’s, the cross-domain issue is not dependent on JSON. HOWEVER, and this one is worthy of capitals: most people ASSUME that JSON is used, so if you’re not returning JSON, you’d better make sure to emphasize that a person can a) choose the return format (which is a superior option), and/or b) make sure people are aware that you’re not using JSON by default with callback functions.

As for using JSON for all web service requests, give us a break, mate. Here’s a story:

When the new bankruptcy laws were put into effect in 2005, Congress looked around for some standard from which to derive ‘reasonable’ living costs for people who have to take the new means test. Rather than bring in experts and ask for advice, their eyes landed on the “standards of living expenses” the IRS uses to determine who can pay what on their income tax.

The thing is, the IRS considers payment to itself to be about as important as buying food, and more important than paying a doctor. The IRS also did not expect that its standards would be used by any other agency, including Congress, to define a means test for bankruptcy. The IRS was very unhappy when it discovered the use.

In other words, just because it ‘works’ in one context doesn’t mean it works well in all contexts: something that works for one type of application shouldn’t be used for all types of applications. Yes, ECMAScript provides data typing information, but that’s not a reason to use JSON in place of XML. Repeat after me: JavaScript/ECMAScript is loosely typed. I’m not sure I’d want to model a data exchange with ‘built-in typing’ based on a loosely typed system.
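For what it’s worth, here’s a small sketch (the values are invented) of what I mean by loosely typed; whatever typing the wire format carries tends to evaporate the moment the page actually uses the value:

    // The same variable cheerfully changes type as it's used.
    var value = "42";        // a string off the wire
    value = value * 1;       // silently a number now
    value = value > 40;      // and now a boolean

    // A value typed as a number in the JSON becomes a string the moment
    // the page glues it into markup or a message.
    var count = 42;
    var label = "Total: " + count;
    alert(typeof value + " / " + label);   // "boolean / Total: 42"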

Consumers of JSON or XML (or comma separated values for that matter) can treat the data they receive in any way they want, including parsing it as a different data type than what the originator intended. Yes, JSON brings a basic data typing, and enforces a particular encoding, but for most applications, we munge the returned data to ensure it fits within our intended environment, anyway.

What’s more important to consider is: aren’t we getting a little old to continually toss out ‘old reliables’ just because a new kid comes along? I look at the people involved in this discussion and I’m forced to ask: is this a guy thing? Toss out the minivan and buy the red Ferrari? Toss out the ‘old’ wife for a woman younger than your favorite shirt? Toss out old data formats? Are the tools one uses synonymous with the tools we have?

Snarky joking aside, and channeling Joel Spolsky, who was spot on in his writing: just because a new tech is sexy for its ‘newness’ doesn’t mean that it has to be used as a template for all that we do.

The biggest hurdle RDF has faced is its implementation in XML. It’s taken me a long time to be willing to budge on only using RDF/XML, primarily because we have such a wealth of tools to work with XML, and one can keep one’s RDF/XML cruft-free and still meaningful and workable with these same tools. More importantly, RDF/XML is the ‘formal’ serialization technique, and there are advantages to knowing what you’re going to get when working with any number of RDF APIs. However, I have to face the inevitable in that people reject RDF because of RDF/XML. If accepting Turtle is the way to get acceptance of RDF, then I must. I’d rather take another shot at cleaning up RDF/XML, but I don’t see this happening, so I must bow to the inevitable (though I only use RDF/XML for my own work).

We lose a lot, though, going with Turtle. We lose the tools, the understanding, the validators, the peripheral technologies, and so on. This is a significant loss, and I’m sometimes unsure if the RDF community really understands what they’re doing by embracing yet another serialization format for yet another data model.

Now we’re doing the same with JSON. JSON works in its particular niche, and does really well in that niche. It’s OK if we use JSON, no one is going to think we’re only a web developer and not a real programmer if we use JSON. We don’t have to make it bigger than it is. If we do, we’re just going to screw it up, and then it won’t work well even within that niche.

Flickr and other web services let us pick the format of the returned data. Frankly, applications that can serve multiple formats should do just that, and let people pick which they use. That way, everyone is happy.
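A rough sketch of what “let people pick” looks like from the client side; the service URL and the ‘format’ parameter are made up for illustration, since the point is simply that the caller decides what comes back:

    // Hypothetical client of a service that serves several formats.
    function fetchData(format, onDone) {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "http://example.com/api/photos?format=" + format, true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4) return;
        if (format === "xml") {
          onDone(xhr.responseXML);    // parsed document, if served as XML
        } else {
          onDone(xhr.responseText);   // JSON, CSV, whatever was asked for
        }
      };
      xhr.send(null);
    }

    fetchData("xml", function (doc) { /* walk the DOM */ });
    fetchData("json", function (text) { /* eval or parse the text */ });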

Ajaxian: Next up: CSV vs. Fixed Width Documents. *snork*

*most likely having something to do with my sense of humor and ill-timed levity.

Categories
JavaScript Writing

Learning JavaScript errata

Recovered from the Wayback Machine.

If wishes were horses, book authors would have a herd. All too often you see the ‘oops’ and such only after the book is in print. In my case, I’ve worked with JavaScript for so long (since the very beginning) I brought along a couple of bad habits that made it into the book.

One new errata that is going in for the book is the following:

Several examples in the book use document.write, but with an XHTML DOCTYPE. The document.write and document.writeln functions do not work with XHTML when the page is served with the application/xhtml+xml MIME type. The examples in the book work with the most common browsers because the examples have an .htm extension. These pages are served up with an HTML MIME type regardless of which DOCTYPE was used; therefore, the use of document.write or innerHTML does not fail. When the page is loaded with an XHTML MIME type, though, the examples will fail.

The examples will work in the most common browsers, and to ensure they continue to do so, you can change the DOCTYPE to an HTML one, though you’ll need to modify automatic closings such as that on the meta tag (remove the ending ‘/’) if you want them to validate.

The author is apologetic for not explaining this in Chapter 1. The alternative is to use the DOM to create new page elements and append to the document, but since this wasn’t covered until later in the book, document.write was used instead.

Typically, you’ll want to use the DOM, just because this ensures the examples work fully with XHTML, as well as HTML. To see this demonstrated more fully, the author is working on modified examples using DOM calls and ensuring the examples work as XHTML. As soon as these are finished, they will be posted and a note added to this errata page.

In this, the DOCTYPE is XHTML but the page is served up as HTML. As Anne van Kesteren succinctly puts it, it doesn’t matter what DOCTYPE you use if the page is served up as text/html. And yes, I am using document.write and innerHTML; bad me.

I don’t necessarily share in the universal condemnation of document.write or innerHTML, especially when you’re learning. I have 98 examples in the book, and a simple document.write sure saves on book space rather than having to use the DOM to get document, and use create element and create text node and append and yada yada. What I should have done, though, is create a library and make my own version of ‘write’ that is XHTML friendly, and used this. Note, though, that in the book I don’t cover the DOM until chapter 11, so the only alternatives I had were document.write or an alert, and the latter doesn’t work if you’re using focus and blur events.
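For the curious, here’s a minimal sketch of the kind of XHTML-friendly ‘write’ I have in mind; the function name and the target id are my own invention:

    // A minimal, invented xwrite(): builds nodes through the DOM rather
    // than writing raw markup, so it behaves under application/xhtml+xml.
    function xwrite(text, targetId) {
      var target = document.getElementById(targetId);
      if (!target) return;
      var para = document.createElementNS
        ? document.createElementNS("http://www.w3.org/1999/xhtml", "p")
        : document.createElement("p");
      para.appendChild(document.createTextNode(text));
      target.appendChild(para);
    }

    // in place of document.write("result is " + result):
    //   xwrite("result is " + result, "output");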

However, in pages where I used document.write, I should have used an HTML DOCTYPE, and also made mention of document.write and its incompatibility with XHTML. I should also have covered this in more detail in chapter 1. I should have also covered quirks mode in more detail in chapter 1.

As for innerHTML, now that one is open for debate. There are bunches of Ajax developers who will only give up their innerHTML if you pry it from their cold, dead browser. BUT, it’s also not XHTML friendly, though it is the handiest darn thing (and again, one could create a library alternative).

The reason why these are a problem is they both allow us to add XHTML formatted data directly to the document, but without going through the XHTML validation process. When one serves valid XHTML, one doesn’t want one’s page developer putting crufty XML or HTML into one’s perfectly lovely XML formatted document. Gives one heartburn, causes one to tear hair out, does odd things to one’s browser and so on.

For now, I asked O’Reilly to put in this errata. Next, I’m going through all of the examples and updating them to be more forward looking and using the DOM, only. These will be provided as secondary downloads, because comparing the two–the original example and the modified–is a learning experience in itself.

The use of document.write and innerHTML is incidental to most examples. I only used such to print some result out or demonstrate some other feature of JavaScript. Still, if I’m going to stress best practices, I blew it with both of these. All I can say is I think it is a good book regardless, these errors aren’t that common or that essential, and mea culpa. Twenty lashes with Firebug.

Here’s a discussion on the problem and a code workaround. Note that both Google Maps and Google’s AdSense use document.write, so I’m in good company–the use of these really is ubiquitous, but NOT a best practice, so no excuse for me.

Categories
JavaScript Writing

New Year

Recovered from the Wayback Machine.

I just posted a note at Mad Techie Woman about an error in the Learning JavaScript book I could kick myself for. It has to do with quirks mode, and the fact that browsers interpret an XHTML document as HTML when served with an .htm extension, and the fact that I used document.write as a way of doing a quick print out of a JS example. Sure it works now, but it’s a bad habit to get into and I shouldn’t have done so.

Some people might think it’s easy to write on JavaScript because it’s not a ‘real’ language, but it’s actually the opposite: an extremely difficult language to write about because the environment that surrounds it is in a constant state of flux. People use old browsers, new browsers, old markup, new markup, old styling, new CSS, and so on. When you’re writing on C++ or Java, there is a minimum level of compliance that must be met or the damn language just doesn’t work.

Not with JavaScript, oh no, siree. You can do almost anything and it will work with most browsers, as they labor mightily to ensure applications, old and new, work. Rightfully so, but as in the case of my “document.write”, you can get in a habit of using an easy and simple (and ubiquitous) piece of code, and it is not Right, or Best Practice, and when you realize what you’ve done (after the book is published, of course), you kick yourself ten ways to Sunday for Being So Stupid.

It’s lovely being an author, with its constant reminders of how dumb you are. Or being a tech, with the constant reminders of how much smarter that 18-year-old is than you. It’s only followed by being a photographer whose work is received with silence, or a weblogger subject to so many variations of sneers and disdain that presenting ourselves as such should be immediate grounds for a “poor dear”, at least. What is a weblog? It’s toilet paper. It’s the toilet paper we use to blot our blood, our tears, the sweat of our work, the grime of our living, the dirt of our dishing, and to fold into pretty swans when all is well (though don’t try to float these swans, because they’re not meant to last).

You have to question the mental health of anyone who not only does any one of these acts, but does all four. And doesn’t get paid for two, and isn’t especially rolling in dough for the third and fourth (that 18 year old, you know).

That is the life of a writer of tech books, which is why most people only do one and then run, screaming. Why don’t I just place my head in the path of a truck, rather than go through the pain in dribbles and dabs? I’d make the news, then (“It sounded just like a ripe watermelon…”).

But this is about the New Year, not being an author or the pain of a thousand words we want to take back. I started to write a New Year post and got as far as the following:

I’ve been trying to come up with a last post for the end of the year, something positive and hopeful, but it seems like I keep putting up stories of anger or sadness.

I think that rather than being well informed via the internet, the constant stream of news batters at you until you eventually either give into the despair, or become completely indifferent. Years ago, when we got our news from the newspaper or the evening news, we had a chance to discuss the news, with friends, co-workers, family, before getting fed the next burst. Now, our friends are just as likely to be the source of the news as Fox or CBS, and we don’t talk as much as we broadcast at each other. It’s overwhelming.

I realized that rather than sounding optimistic or upbeat, I come across as hanging lower than a slug’s belly, and feeling about as oogie. Oogie. Yes, that is a word. It’s a beautiful word. I am an author and I can decree what is a word, and what is a beautiful word, and make it so in print. I’ve done so frequently.

Where was I? Slug’s belly. No one has an ‘excuse’ for being low, but I have traipsed legitimate steps down into the murky waters of Feeling Kinda Shitty, so I feel vindicated for my lowness, if not necessarily excused for same (“…people being excused as in, ‘there are people in the world who are tortured, imprisoned, forced to work at Google, who is now fashionably evil’, yada yada…”)

I worry and fret over every error in “Learning JavaScript”, have become obsessive with trying to ensure absolutely no error in “Adding Ajax”–a task that’s as impossible as it is imperative, because the Ajaxian world is not a tolerant world–and I’m facing a possible court battle against this country’s largest arbitration forum, without any legal help other than advice from Smart People (most of whom can’t recommend I take the action I’m planning because though it may end up helping lots of people in the end, it could end up costing me everything if the gamble fails).

Some people want a free computer from Microsoft; I’d settle for a good night’s sleep. Oh, and perky breasts again, which is as likely as getting anything truly free from Microsoft, or any other Big Corporation. (Free as in, ‘no strings attached’, which could, in an odd way, be used to describe my breasts).

Yet, yet, for all this doom and gloom (and “My God, Shelley, why do these things happen to you? Don’t you realize that weblogging is reserved for good times, marketing, or bleeding The Right Way; flowing gushily and with exquisite pain–not your tawdry drip drip, drip after inexorable drip: cut the vein, put yourself out of misery”)…I digress, yet for all this doom and gloom I wake, I eat, I pet my kitty, I walk in the sun, I correct the errors, I kick at the box I find myself in, I sneer as I’m sneered at in turn–I continue, because that’s what people do, you know; we continue. We’re all bleeding, and we trail metaphorical gore behind us, but we continue. To quote the good folks of Firefly, “That makes us mighty”.

So, from one of the Mighty to the rest of the Mighty, Happy New Year.