Recovered from the Wayback Machine.
I just published the final three parts of the Weblog Link Series, on permalinking and archives:
Part 1 – The Impermanence of Permalinks
Part 2 – Re-weaving the Broken Web
For those of you who want to subscribe to one RSS file that consists of updates for all Burningbird Network weblogs (sans the Wayward Webloggers, which is separate), you can use the Burningbird RSS file.
I generate this file once every hour, pulling together recent entries from all of the weblogs. It features the ten most recent postings across the Network. Note that it is in RSS 1.0 only.
Additionally, the Burningbird Network home page lists excerpts from all recent entries.
Recovered from the Wayback Machine.
Each of the For Poets sites, of which this weblog is a member, has a different photo representing the topic of the site as part of the design of the main page. The photos are compatible with each other and complementary to the page; more importantly, though, they provide a clue, sometimes a very subtle clue, about my view of the topic.
The Linux for Poets site shows a photo of ravens, and I quote the Raven, “Nevermore”, as a sly dig at never more using Windows. The Semantic Web photo with its foggy outlines and confusing lines was perfect for the topic, as semantics is something we all think we understand; pity that none of our understandings agree. This theme is only enhanced by the related poem, with its references to three mountains and three islands and returning to our origins, as if Edna St. Vincent Millay was a member of the RDF working group.
The photo for this weblog, Weblogging for Poets, is a lighthouse on an island, a natural choice when paired with its poem, the words that …no man is an island. Isn’t that the nature of Weblogging, that uniqueness we all extol? No weblogger is an island? Do we not beat each other and our own chests daily with the accouterments of our connectivity?
The first three were easy, but the Internet For Poets weblog refused to identify itself for the longest time. I found a seaside picture I took in Astoria, Oregon that I connected fiercely with, strongly enough for me to use the photo. However, unlike the other photos from the other weblogs, I didn’t understand why I wanted to use it. Not at first.
The photo is of a tall rock surrounded by an incoming tide, the persistent movement of the waves highlighted by the white froth of their action; the rock is standing strong against the waves, the sight made even more poignant because we know that ultimately the ocean will win, and someday this rock will be gone. The impermanence of the rock is its beauty.
It wasn’t until I was searching for a poem to go with the photo, and found Henry Wadsworth Longfellow’s The Tide Rises, The Tide Falls, that I understood why I picked this particular photo:
The tide rises, the tide falls,
The twilight darkens, the curlew calls;
Along the sea-sands damp and brown
The traveller hastens toward the town,
And the tide rises, the tide falls.

Darkness settles on roofs and walls,
But the sea, the sea in darkness calls;
The little waves, with their soft, white hands,
Efface the footprints in the sands,
And the tide rises, the tide falls.

The morning breaks; the steeds in their stalls
Stamp and neigh, as the hostler calls;
The day returns, but nevermore
Returns the traveller to the shore,
And the tide rises, the tide falls.
The traveler passes through, leaving their footprints in the sand, a mark of their passage. But the mark doesn’t persist, it doesn’t last past the next tide; and in the morning, life goes on. It is the impermanence of the footprint, the impermanence of our lives, that is the beauty. Somewhere along the way, we began to see the Internet as a persistent entity, a medium for preserving our words and photos and graphics and web sites for an eternity, equivalent to the bugs and other detritus embedded in amber. We’ve codified this as an ethic to live by, and then extended this sense of permanence to the hypertext link that points to our contributions. Rather than providing a unique address for an item, at a specific point in time, we create a permalink, and we surround this innocuous Internet address with rules and requirements. When the contribution is removed or moved, we refer to the link as broken, annotating simple mechanics with evilness, and we put up pages that say, 404 Page Not found — or some clever variation thereof. We treat the missing resource as a gap in the web, a point of damage. Rebecca Blood wrote on this, saying:
Changing or deleting entries destroys the integrity of the network. The Web is designed to be connected; indeed, the weblog permalink is an invitation for others to link.
But the Internet is not a static thing, consisting of captured moments layered one on the other, each with its own proper place. It is a fluidic creature that, like the waters of the oceans, adapts constantly; when resources are created it flows around them and when resources are removed, it fills the gaps left. It is this change that is the strength of the Internet, its integrity, not the static sense of permanence. Consider the simple day-to-day act of searching for resources about a specific topic, let’s say information on Iraq. The results I get today are different than the results I would get a year ago, and different than the results I’ll get in a year’s time. The Internet is a tide in a constant state of turmoil — web resources as starfish and shells, bits of wood and polished glass, thrown like flotsam onto the beaches and just as quickly reclaimed by the next wave. There’s the beauty of it. Rather than one resource statically being the definitive source for all information on Iraq — a resource potentially controlled by one group — the Internet, the beautiful, mutable Internet, thrashes about and brings up the debris closest to the surface and lays it out at your feet. Contrary to popular myth, the Internet is not forever. Like the footsteps in the sand from the poem earlier, the tides move in and quickly remove all reference to you once you leave the medium. If you want to keep your mark online, you have to work at it. You have to maintain your pages, your domain, the easy accessibility of your writing. More than that, though, you have to keep it alive. Or not.
I hesitated to take the discussion from the wider Internet down to our specific uses of it for weblogging. It seems as if we are constantly setting ourselves apart from it, setting rules where there are none. Before weblogging I had no hesitation in pulling a resource that had aged beyond use; and I was philosophical when a resource I linked was pulled. It was the nature of the beast. Now, though, it’s personal. Whether we keep our archives online or not, the Internet survives, integrity intact. Writing published online does not equate to writing living forever unless what we write is so profound that it can live beyond us. My goal is to someday write something that is capable of living in the next moment after I’m gone. I hope I’ll know it when I see it someday.
I know that it wasn’t represented in a story I wrote about the government surrounding the Bay Bridge with razor wire not long after the World Trade Center destruction. This posting was located on my Manila site, and was also one of my most popular pages for a while, after Dave Winer linked to it. The page is now gone: the link at Scripting News leads to Userland’s version of a 404 error, and careful checking of Google shows that no trace of the page remains. Was this a loss? After all, the story and the writing had meaning at the time, but not since. At one time the story was being viewed by hundreds of people, and now it’s as if it never existed.
The old story did give a unique perspective of the event, and how one person, myself, felt about seeing their beloved landmark wrapped in razor wire. That one perspective was a bit of history that’s lost, and we could say that history is diminished by it being gone. We could, but history isn’t so diminished. The story disappeared without a trace so easily because it lived in the moment, it added to the moment, but overall, it added little to the story of a country impacted by an event it found traumatic. When weighed against all the other stories since, and the stories still happening, it was judged by the Internet, the ruthless, dispassionate Internet, to have little merit and fell below the churn, settling quietly on the bottom to live in obscurity. And no loss.
I am philosophical about the loss of the story. It wasn’t my type of story, the writing wasn’t particularly adept, the photos I still have and can re-post if I wish, and the events are past, the people have moved on. Persisting the event, and the appropriate permalinks, adds no value. Not for me, not for Dave Winer, and definitely not the Internet.
But look at another page that I keep hanging around because we never throw out archives. I never bothered to import these posts into Movable Type when I moved from Blogger. Now when I look at it, I see some discussions that require no background and some writing that’s okay, some not. However, much of it is references that require context; you literally had to be there to even understand what I’m talking about. I could go through this page and eliminate 50% of it and there would be no loss. These old posts are, occasionally, accessed as a result of some strange search request, but my words add no value to the search, and the searching adds no value to me.
If I removed these posts, no one would notice. No one. There would be no destruction of integrity for the Internet and certainly none for the other webloggers, because if these old, old pages have settled gently into obscurity, then other webloggers’ pages that may or may not have linked to the page have also, though I know this may make many wince.
Of course, this all presupposes that the pages continue to lie in a state of somnolence; that they aren’t yanked back into new debates. I’m not quite sure whether it was irony or serendipity that as I began the task of writing this essay (out with the old, on with the new) the old is pinged back to life.
Although I found out about the accountability (Winer Watch) controversy long after it had concluded, I was triply interested since:
- I’ve been the subject of one of Dave Winer’s (deleted) inflammatory posts;
- I invented the term Doing a Dave (“substantially editing or removing content after having posted it to the web”); and
- I once believed that changes to weblog entries should be clearly identified (using the <edit></edit> and <edit/> syntax suggested by Burningbird).
Personally, I think my original idea, the use of the edit tag that seemed so clever then as I was caught up in the mechanics of weblogging and sought to add rigidity to an open medium, is appalling. One might as well write lines of code as words, for all the readability left by the foolish pseudo-markup added to the writing. Oddly enough, though, I wouldn’t remove this one post because I rather liked what I said about being a technical anarchist and may use this again in other writing. Of course, one only has to look at the post to see the humor: writing about technical anarchy while demonstrating the finer uses of “Weblogging for the anal”. Returning to the issue of accountability, if I had pulled this post, would I have committed harm to Jonathon Delacour, in his search for annotation for his new essay? Dorothea Salo wrote the following, pertinent to this issue I think:
Especially public writing conceived as part of a conversation rather than a mere broadcast. That to my mind amounts to editing your fellow human beings’ memories, a highly ethically suspect action under most circumstances, I should think.
We edit each other’s memories all the time. Two old friends get together and they talk about old times and one says, “Hey remember when…” and the other goes, “That’s not what I remember…”. Memory of a shared conversation is a negotiation, a give and take, and by the time all parties are finished, the memory isn’t exactly as it happened, but is no less real. That’s how conversations work; we are not heads of state to have every word in every exchange recorded, permanently.
Hypothetically, if the old post that Jonathon had linked had been pulled, would what he wrote be diminished? No. I might be, because the post used my codification of ‘weblog writing edits’, and he wouldn’t be able to find it because it would be pulled. Realistically though, if you read either of us then, you know what we talked about. If you didn’t, do you care now? Unlikely. (Unless the issue is of ‘proof’ that I’ve changed in regards to weblog editing. Proof. You don’t have to seek proof in the archives: ask, and I’ll gladly tell you. “I changed my mind.” If I continue writing and you continue reading me, you’ll see me change my mind on other things, too.)
Dorothea also made a point of saying that she disagrees with removing old content that makes us look bad. She sees it as a pretense that the old ugliness didn’t occur:
If you did something wrong, apologize and do better henceforth. If you didn’t, stick to your guns. Just please don’t try to pretend it never happened; that only adds to the fault (if any). I flatly refuse to help anyone create a public fictive perfection. I despise such fictions, refusing to so much as attempt one for myself. Why should I help anyone else do it?
This one puzzles me. Not the apology; we should apologize when we wrong another. This isn’t weblogging ethics, this is common decency. No, what puzzles me about this statement is keeping these old disputes alive. With the world the way it is, I have difficulty understanding why we would want to persist past ugliness here within our personal writing. If we were politicians or professional journalists, judges, or other keepers of the public morality, I could understand this: those who condemn must themselves be open to condemnation. But most of us are just plain folk, with little power beyond an opinion and a vote.
If people believe that we have acted ugly in the past, and that the ugliness is part of our makeup, then it will surface again in the future. It won’t go away if it’s an inherent part of us. And if it isn’t? Then what’s the purpose of keeping the archive of it? To beat us about the head with it in the future? (Weblogging writing may disappear, but weblogging mistakes are like spicy food: they never go completely away, but continue to resurface long after the original event.)
When I decided to write this essay and bring up the issue of eliminating archives, it wasn’t as a way of excusing past “bad” behavior. Our behaviors are reflected in how others interact with us. I only have to look around in other people’s posts or comments to see my past behavior (‘bad’ being both subjective and relative) returned again, and again, viewed through other eyes. At times there is a gentle glee in bringing up what a person said in the past; an invitation to reminisce and to play. Other times, the words are brought up as part of a new debate, not to punish but to reflect. Many times, though, there is a vindictive quality to it: schoolyard taunts of “You said! You said! You said!” Rather than encouraging us to grow, it forces us, and our behavior, into amber. Rather than accountability, we have accusability.
There is nothing inherently honorable or noble in this, and rather than deplore weeding out past disputes of this nature, we should demand it of each other. Posterity has no need to be cluttered with our petty quarrels.
In comments in a post related to this issue came wise words from wise people about the value of those past bits of our lives persisted in our archives:
“If you are thinking of discarding your archive, I don’t think you should, although I don’t really have a good argument why I think that.” Will
“Part of the value of a journal, which is still my favorite form of blog, is that it allows you to see change, where you’ve been and where you are.” Loren
“I’ve just been reading Siddhartha (thanks to whiskeyriver for the introduction to a wonderful book) and become very aware that what I was and did a year ago, five years ago, twenty five years ago – and what I will do in twenty five years time – are as much (or as little) a part of me as the person that sits here now bashing keys.” Andy
“Come to think of it, everything you write, can and will be used against you.” Fishrush, being funny. We hope.
“Do I make judgements based on what I read in people’s blogs? Of course I do. I also qualify those judgements by recognizing that I’m working from a very limited set of information, and I’m willing to adjust my opinions as new information becomes available.” Rev Matt
“Seriously, Shelley … dumping the archives of a weblog, in my view, would gut the authority of the voice of the blogger. It is over time, in history (through the layers of those archives) that we acquire color and resonance, or the distinct features of our distinct voices in these conversations.” Maria
“Each of these experiences has its own value–either as a testament to my growth, or as a reaffirmation of my essence. I need an ongoing dose of both. I would never dump my recorded gropings, no matter how stupid some of them may seem in the light of present-day.” Tom
“I agree with Loren, that the archives are just another part of the whole that is you, and should be preserved (zits and all). Imagine what a full resource of written history people will have in the future. Your writing helps to enrich that resource. Keep it online.” wKen
“To lose that two years of writing at the ‘bottle, even the crappy bits, would devastate me. My archives are my *life* over the past couple of years, in the same way as some of the bits of journalistic scribbling in those dusty old boxes are the only record I have of my life on the road, bar scars and liver damage, and in 10 or 20 years I hope I can go back and reread them with yet more beers, and remember, and feel like I had lived a life worth writing about.” Chris aka Stavros the Wonder Chicken
“I thought later that another reason to keep Old Stuff around (even when it may be emotionally painful to read or just plain painful because we hate our old writing) is that there’s always someone who can identify with it. Just because we’re not in that place anymore doesn’t mean someone else may not garner some insight.” Michelle
I look at my archives now and I think about eliminating much of it, not because I’m trying to ‘hide’ what I am as a person; I am happy with who I am. Nor is it because I’m trying to rewrite history. No, the reason is that much of my old writing in these pages was me writing for the weblogging medium, rather than me writing as myself. I look at that old archive page and I shake my head and I think to myself, what was I doing? This wasn’t the person I was then. The words were from a persona that formed with the sobriquet, “Burningbird”. This type of writing is what you see when you shine a spotlight on a deer by the side of the road: frozen, devoid of shadows, the grim and the glory, words as glazed as the deer’s eyes, and just as paralyzed. I would like to prune my archives until they’re of a state that a person could enter my pages at any one point and see me. Past me, current me, future me, good, bad, or indifferent me, doesn’t matter as long as it’s me. Not me the weblogger. Not the looking glass self created by the people I know but have never met. Me, in whatever guise I might use. Dorothea’s perfect fictive, but one that is imperfect and real.
I’ve been talking about archives in the past, and I doubt there is anyone that sees much harm to others in removing old archives from public access (while agreeing with people, that it wouldn’t be a bad idea to preserve these offline somewhere). Consider that removing old material may be just as much a part of our growth, just as much of a story about ourselves, as leaving it online. However, removing old material differs from editing or removing fresh writing; writing that is still a part of the churn. How do we handle edits and deletions when we’re newly linked or commented? What’s the right thing to do? It depends. It depends on the circumstances. It depends on the people. It depends on the events. It depends on the time of day and the moon’s state. It depends on how you feel going forward. It depends on how you feel about the past. It depends on how you feel at this very moment. It depends on the price you’ve paid. It depends on the price you’re willing to pay. It depends on who you see in the mirror. It depends on what you see in the mirror. It depends on what you know is right. It depends on what you hope is right. It depends on a set of factors, all of which combined make each instance unique. How do you handle deleting and editing recent writing? It depends.
The first three essays in this series dealt with the technology of permalinks and the mechanics of redirection and ways of preserving, or not preserving, archives. However, within the hypertext links and the 404 pages, the versions and the editing markup, exist real people. Imperfect and wonderfully flawed. I no longer assume that because this is weblogging that a link that’s here today will be there tomorrow. If I want to quote a person, I do so within my writing, and if the person changes and decides to remove what they wrote, it has little impact on my writing because I’ve preserved what I’ve wanted to make my point. If someone says, “But there’s no proof now that what the person said was actually said”, I’ll agree with this and say, yes, you’re right. There is no proof. However, I’m also more cautious about what I quote and what I respond to. I look at the writing I’m thinking of quoting and I think about the person and I ask myself whether my addition adds value. I question my motives for quoting them, and I also question whether they might, at some point, regret what they have written. I am not here to hold others accountable, at least, not those who have no control over my life. I am not my brother’s keeper. We have put so much of the responsibility of connectivity on the person connected to that we’ve forgotten that there is someone at the other end of the link, and they, too, have responsibilities.
Recovered from the Wayback Machine.
I commend the effort on the part of the Pie/Echo/Atom people to come up with interoperable syndication formats, export/import formats, and a common weblogging API, but there is one other thing I wish they would address: a common permalink structure.
The single biggest cause of hesitation about moving from one tool to another is that, unless both tools have a compatible archive structure, you must use extraordinary means to maintain archives and archive connectivity.
However, I realize that a common permalink structure isn’t supportable, especially when you consider that several weblogging tools use dynamic page generation, through PHP or CGI or JSP, rather than static pages, such as those supported in Movable Type and Blogger.
At a minimum, though, weblogging tools should support the following:
Others?
Recovered from the Wayback Machine.
If you change your weblogging environment, such as move to a different tool, different archive structure, or even different server and domain, and others have linked to your posts, you’re going to be leaving broken links behind. What can you do? Actually, quite a bit, depending on how important the continuous linkage is, and what change you’ve made.
(Note: All web server discussions in this essay are based on the assumption that you’re using Apache at either your old or new file location. If not, you’ll need to check the server documentation for instructions in the use of these techniques.)
First, though, a look at the toolbox you’ll be using.
Contrary to popular opinion, a missing page isn’t a flaw within the internet. I like Edd Dumbill’s view of the 404 error, given in his essay in defense of RDF:
The oft-cited innovation of the web is the 404 error: the ability for a page not to be there, and the system still work.
The 404 status is a legitimate web page status, and the web is designed to work with it. It is a tool rather than a point of damage, and should be treated as such.
Other simple tools are the hypertext link and the web-page based META tag, which can be modified to become a redirect to a new location. The advantage to these is that you don’t need direct file or web server configuration access in your old environment, something you’re not likely to have with hosted solutions such as Blogger, Bloghorn, or TypePad.
However, some tools require the appropriate environment to work, including the aforementioned file access. For instance, web servers have the ability to include redirection within their configuration — something I won’t talk about here because most webloggers have no access to their web server’s configuration files. Most, though, have all the control they need as long as they can host a file — one file — at the old file location. Unfortunately, this is usually as inaccessible as the web server configuration with hosted environments.
If you’re moving pages within a server, or you have the ability to host files in your old location, then you can consider the use of the .htaccess file for most of your redirection needs. The .htaccess file (and note that the beginning period (‘.’) is required) is a text file that contains instructions to the web server that override the server’s normal web page service behavior. The instructions can be as simple as sending 404 errors to a specific document, or as complicated as preventing what is known as ‘hot links’: stopping people from pulling your photos directly into their web pages, which increases your bandwidth use.
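As a minimal sketch of the simple end of that range, an .htaccess file that sends 404 errors to a specific document could look like the following. The path /missing.html is a made-up placeholder, and this assumes your host allows the ErrorDocument directive to be overridden in .htaccess files.

```apache
# Serve a custom page whenever a requested file is not found.
# /missing.html is a hypothetical path on the same site; use your own.
ErrorDocument 404 /missing.html
```

Using a local path, rather than a full URL, lets the server keep returning the 404 status along with the custom page.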
Of course, if you do have access to web server configuration, you’re better off using this rather than .htaccess; once you turn on .htaccess support, the web server will then look for these files in every directory, and every page request within a directory also causes the .htaccess document in that directory to be loaded. If your .htaccess file is large, and documents in the directory are requested frequently, the performance hit can be significant.
Most webloggers share home space with several other people, and direct web server configuration file access is usually discouraged. Additionally, it’s rare that our archived pages are accessed so frequently that the use of .htaccess is an issue. Still, you’ll always want to do what you can to keep the .htaccess files small.
Even within the .htaccess file, there are multiple ways to accomplish the same task, usually depending on whether your host has a certain extension, mod_rewrite, compiled into your web server.
If neither the web-page nor the web-server configuration solutions appeal, or for more sophisticated file redirection, you can use code such as JavaScript or PHP or Perl to redirect people or to handle missing pages.
Rather than write a tutorial of all these techniques, I’ll demonstrate uses of each in different circumstances. At a minimum, this will give you some idea of what your options are.
If you’re maintaining the same filenames and archive structure, but changing domains, you have two solutions depending on what your environment supports.
If you don’t have the ability to create an .htaccess file on your old host, then you can use the HTML META tag, with its http-equiv attribute set to refresh, to redirect people to the new location.
Mike Golby recently moved to the Wayward Weblogger co-op from Blogspot hosting, and he could use this approach with his old archives, which are maintained both at Blogspot and at his new location. For instance, this post at http://pagecount.blogspot.com/2002_12_01_pagecount_archive.html#90127536 could be redirected to the new location, here by adding the appropriate META tag to each of the old archive files. Since this is a template based system, he would need to add a bit of code to the template and then regenerate the archives at the old site to generate the META tag:
<meta http-equiv="refresh" content="5;url=http://pagecount.burningbird.net/pagecount/2002_12_01_pagecount_archive.html">
When the page with this at the top is loaded into a browser, it automatically redirects the person to the new location after 5 seconds. The time difference is necessary, because if you didn’t have it, the back button wouldn’t work. You’ve all been out to sites that refresh to a different page, but then wouldn’t allow you to back button out of it. You don’t like it, your readers won’t either.
(I would show you the exact template tag and format to use, but I can’t find any that work. Blogger’s tags are very limited in their use, as well as not being especially well documented. Both items are covered in detail in part 3 of this series. If someone has working template code for this, please add it to the comments or send me a link.)
This won’t help with specific items, but at least you can get redirects at the file level. You should only use the META tag if you have no other option: it requires that the page be downloaded, and it doesn’t give your readers a smooth transition. A better approach would be to use a server-side technique, such as the .htaccess file.
If you have the ability to create a file in your old environment, you can create an .htaccess file in the top-level directory and use pattern or URL matching to redirect requests to the new location.
For instance, in part 1 of this series I mentioned John Robb’s weblog and its disappearance from its old location on the Userland servers. However, recent accesses show that the old server location has been modified to redirect requests from the old location to the new. Now, when I access an old link, such as http://jrobb.userland.com/2001/11/05.html, I’m redirected to the new location at http://jrobb.mindplex.org/2001/11/05.html.
To redirect all directories at the old site to the new one, the .htaccess file would contain the following line:
Redirect permanent / http://jrobb.mindplex.org/
When a page is accessed at the old site, this directive tells Apache to send the request to the new URL, with the rest of the request path kept as is, which means that http://jrobb.userland.com/2001/11/05.html gets redirected to http://jrobb.mindplex.org/2001/11/05.html.
As long as the .htaccess file is located on the old server, and John maintains the same archive structure, the page requests will be redirected to the correct location. But what happens if John changes the archive structure, such as moving to a different tool? Well, the options then depend on exactly what you’re starting with, and where you’re going.
Changing archive structure is either a very simple process, or virtually impossible to deal with. All of it depends on whether a pattern can be established between the old and the new locations.
As an example of a very simple change, when I first started with Movable Type I used to have my archive files sent to /archives, and my individual pages all had a .php extension. When one of the items was slashdotted, it came to my attention that putting a .php extension on a page, when I’m not really using PHP for anything specific, isn’t a good idea. When it comes to the CPU, waste not, want not.
I ended up regenerating my individual archive items to a new location, /fires, and also changed all their extensions to .htm. However, I’m still getting hits at the old location. What’s to do?
Again, .htaccess comes to the rescue. At my original host, mod_rewrite was not compiled into Apache, so I couldn’t use this feature, but I could use the RedirectMatch regular expression handler for .htaccess. What this does is look for a pattern in the request and literally match this with a pattern in the redirected page.
Easier just to demonstrate. The line added to .htaccess is:
RedirectMatch permanent ^/archives/(.*)\.php$ /fires/$1.htm
This line instructs the web server to redirect all requests for any PHP file located in the archives subdirectory to the same-named file (that $1 parameter) in the fires directory, but with an .htm extension. Since the move is permanent, a 301 status is returned to the requesting browser or web bot, telling it that the move is permanent and to adjust accordingly.
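If the regular expression is hard to picture, the same pattern can be tried out in Python before it goes anywhere near a live server. This is a rough sketch: the function name redirect_match and the file name 000123 are made up for illustration, and Python’s regular expressions happen to agree with Apache’s for a pattern this simple.

```python
import re
from typing import Optional

# Same pattern as the RedirectMatch line: capture everything between
# "/archives/" and the ".php" extension.
ARCHIVE_PATTERN = re.compile(r"^/archives/(.*)\.php$")


def redirect_match(path: str) -> Optional[str]:
    """Return the redirect target for an old archive path, or None if no match."""
    match = ARCHIVE_PATTERN.match(path)
    if match is None:
        return None
    # match.group(1) plays the role of $1 in the directive
    return f"/fires/{match.group(1)}.htm"


print(redirect_match("/archives/000123.php"))   # a path the rule applies to
print(redirect_match("/archives/000123.html"))  # not a .php file, so no redirect
```

Requests outside /archives, or without the .php extension, fall through untouched, which is exactly what you want from a rule like this.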
When I moved to the Wayward Weblogger co-op, I made sure the web server was compiled with mod_rewrite, and then converted the .htaccess file to:
RewriteEngine On
RewriteCond %{REQUEST_URI} \.php$
RewriteRule ^(.*)\.php$ http://weblog.burningbird.net/fires/$1.htm [R=301,L]
Why use mod_rewrite over Redirect? Performance improvements, plus mod_rewrite lets you do more sophisticated manipulations than other .htaccess directives such as Redirect, including that aforementioned hot-link prevention to keep people from linking directly to your bandwidth-expensive photos.
What happens, though, if the pattern matching can’t be done? Well, in that case what you can do is find a way to generate, or hand write, individual redirect statements for each page.
Liz Lawley was faced with this when she not only moved from her university address to her own domain, but also changed from the MT default numeric file names to a format that has considerable popularity: archive subdirectories by name, with each file given the same name as the entry title.
There was no regular expression matching that could handle this, so what she did, with advice from Mark Pilgrim, was to create an MT template to run on the new server that looked like the following:
<MTEntries lastn="999999">
Redirect permanent /archives/<$MTEntryID$>.html <$MTEntryLink$>
</MTEntries>
Once the .htaccess file was generated, Liz then moved it to her old server.
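If your tool has no template system, the same list of one-per-entry redirects can be generated by a small script. Here is a minimal Python sketch, assuming you can get at each entry’s old ID and its new link some other way; the function name build_redirects, the IDs, and the example.com addresses are all made up for illustration.

```python
from typing import Iterable, Tuple


def build_redirects(entries: Iterable[Tuple[str, str]]) -> str:
    """Build one 'Redirect permanent' line per (old entry ID, new link) pair."""
    lines = [
        f"Redirect permanent /archives/{entry_id}.html {new_link}"
        for entry_id, new_link in entries
    ]
    return "\n".join(lines) + "\n"


# Hypothetical entries: old numeric ID paired with the new title-based URL
sample_entries = [
    ("000101", "http://example.com/archives/weblogs/first_post.html"),
    ("000102", "http://example.com/archives/teaching/second_post.html"),
]

print(build_redirects(sample_entries), end="")
```

The output is plain text that can be pasted into the .htaccess file on the old server.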
I wince at the size of the .htaccess file, since the web server has to read it every time a page is accessed. However, knowing that access to weblog archive pages drops off exponentially as soon as an item rolls off the front page, the performance shouldn’t be a problem.
What if you’re like me and started with a weblogging environment that allows no export? Or if you decide not to port your archive files, or to remove them? What are your options then?
If your tool doesn’t support any form of export, the only way you’ll be able to port these is manually. At this point, you may want to consider leaving some of your writing behind, or be in for a long, tedious effort of copy and paste. There are no other options with this type of tool, which is why I don’t recommend its use by anyone for any reason.
For existing posts that you want to remove, one option is to remove the content of the file, but leave the file itself and then add a message that the content has been removed. You might direct the person to the front page, or mention why the content was removed, and so on.
If you want to remove the file itself, or you don’t port it, you can create what is known as an error document and use the ErrorDocument directive in your .htaccess file:
ErrorDocument 404 /missing.php
This .htaccess file is located in the top-level directory, and so applies to all directories underneath it, unless overridden by local .htaccess files. Whenever a file is accessed that doesn’t exist, the person is sent to a file, in this case, missing.php. My Missing page has a form for searching, as well as a listing of recent entries across the network. Others have other useful information, but the point is, give your readers somewhere to go when they access a page that no longer exists.
Though the directive shows that I’m accessing the missing.php file directly with the ErrorDocument directive, in actuality, I’m accessing an application that does more than just redirect to a specific page. Note that this solution is only for those programmatically inclined.
Though not for everyone, and not for every circumstance, you can use programming languages to manage missing or moved files.
I am finishing up an application I use to generate RDF/XML files that track the movement of files on my server. Considering that I’ve had web sites for several years, I’m still getting requests for very old code that I pulled because it was obsolete long ago. In addition, I’ve also moved files around to different locations, renamed them, and so on.
When I want to move or remove a file, I log into PostCon and pull the file’s RDF file up (found by inputting the current file name). I then annotate the current event — move, remove, or add for a new resource — and provide the location of the resource once the event occurs. PostCon generates an updated RDF file with the new history. More than that, though, it also annotates an event file — an RDF file of events that’s hosted on the server and that is used to kick off server side processing every hour.
For instance, a move event will result in the file being copied to the new location. Since I’m running Linux, though, it does a bit more: it creates what is called a soft link at the old file location that points to the new location, using the Unix command “ln -s newlocation oldlocation”.
When the file is accessed at the old location, the link causes the file to be served from its new location, transparently to the client.
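The move-then-link step can be sketched in Python as well. This is not PostCon’s code, just an illustration under a couple of assumptions: the file is moved rather than copied (a link can’t be created while the old file is still sitting there), and the function name move_and_link is made up. On Apache, the server also has to be allowed to follow symbolic links for the old URL to keep working.

```python
import os
import tempfile


def move_and_link(old_path: str, new_path: str) -> None:
    """Move a file, then leave a soft link at the old path pointing to the new one."""
    os.replace(old_path, new_path)   # move the file to its new location
    os.symlink(new_path, old_path)   # same effect as: ln -s new_path old_path


# Try it out in a throwaway directory so nothing real is touched
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.htm")
    new_file = os.path.join(root, "new.htm")
    with open(old_file, "w") as handle:
        handle.write("same content, new home")

    move_and_link(old_file, new_file)

    # Reading through the old path now follows the link to the new file
    with open(old_file) as handle:
        print(os.path.islink(old_file), handle.read())
```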
A removal is a bit more complex because there are reasons for the removal, and these are added to the RDF file. In server-side processing, the file is removed, which is simple, but in addition, the removed file URL is added to a super fast database on the server that handles all 404 errors. This database looks for a file URL when it gets a request for a file that no longer exists and either redirects the request to a newer resource that supersedes the old file, or it loads the RDF file for the removed file and prints out some useful information about why the file is removed. You can see this in action using the URL /samples/scripting/TYPEOF.HTM. Note that the look is an old one, and isn’t too pretty. What’s important is the information pulled from the RDF file giving more details about what happened to the file.
PostCon is an open source application with freely available source code, and its launch will coincide with an upcoming O’Reilly Network article. It’s built with a combination of PHP and Perl, and uses MySQL.
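The decision the 404 handler makes can be boiled down to a small lookup. The Python sketch below is not PostCon’s code: a plain dictionary stands in for the database, and the names RemovedFile and handle_missing, the paths, and the reasons are all invented for illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class RemovedFile(NamedTuple):
    reason: str                        # why the file was pulled
    replacement: Optional[str] = None  # newer resource that supersedes it, if any


def handle_missing(path: str, removed: Dict[str, RemovedFile]) -> Tuple[int, str]:
    """Return (status, redirect location or page text) for a missing path."""
    record = removed.get(path)
    if record is None:
        # Nothing known about this path: fall back to the generic Missing page
        return 404, "Page not found. Try the search form or the recent entries."
    if record.replacement is not None:
        # A newer resource supersedes the old file, so send the reader there
        return 301, record.replacement
    # Otherwise explain what happened to the file
    return 404, f"{path} was removed: {record.reason}"


# Hypothetical records standing in for the removed-file database
removed_files = {
    "/samples/old_script.htm": RemovedFile("obsolete code", "/samples/new_script.htm"),
    "/samples/retired.htm": RemovedFile("pulled because it was obsolete"),
}

for requested in ("/samples/old_script.htm", "/samples/retired.htm", "/nowhere.htm"):
    print(requested, handle_missing(requested, removed_files))
```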
A programmatic solution isn’t for everyone. And sometimes the best solution for handling broken links is…do nothing.
Doing nothing is a highly underrated solution when faced with moving from one weblogging tool or environment to another. When Mike Golby moved, he left his old archives at blogspot and created new archives at the new location. In the old site, he listed a link to his new location. Simple, easy, and uncomplicated.
By the time that his old archive pages are purged from blogspot, references to his old location will probably be few and far between.
If you’re changing tools, but on the same server, again, just consider not doing anything. Requests for old pages go to the old pages, new requests to new and life goes on. The only time that this becomes a problem is if you’re using comments and there are incompatible comment systems between the two. In this case, then, you might consider turning comments off with your old files.
Consider this: I get on average about 10 comments per week on older archived entries, those more than a couple of weeks old. I get fewer than a couple a month on anything older.
Not doing anything can be the simplest, easiest, most efficient, least heartbreaking, least limiting approach you can use for dealing with old and new archives. It can be the one that gets you going in your new home/tool faster, which means that you’re out writing rather than fussing with the tool.
And isn’t that why we’re here?
If you have other broken link, archive redirect, or porting solutions, please leave a comment, ping this entry with your solution, or send me an email and I’ll include it here in this section.