Categories
Web

Putting Hotlinks on Ice

Recovered from the Wayback Machine.

Hotlinks — what a perfect word for the practice of directly linking to a photograph or other high-bandwidth item on someone else’s server. Hot with its implication of hot goods and thieves passing in the cybernight. The proper term is “direct linking,” and while more technically accurate, it lacks panache. Hotlinking is a particularly warm subject for me because of my extensive use of photography with my writing.

I’m not really sure what led me to start posting photographs with my essays and other writing. Probably the same impulse that leads me to mix poetry with technology, a combination that led Don Parks to write “They are rather verbose and poetic…” of my Permalinks for Poets essays. Well, get comfortable with your favorite drink, because we’re about to embark on another poetic, verbose adventure into the mysteries of technology. Most fortunately for you, this one’s a murder mystery, because we’re going to put hotlinks on ice.

This is a photograph of me

It was taken some time ago.
At first it seems to be
a smeared
print: blurred lines and grey flecks
blended with the paper;

then, as you scan
it, you see in the left-hand corner
a thing that is like a branch: part of a tree
(balsam or spruce) emerging
and, to the right, halfway up
what ought to be a gentle
slope, a small frame house.

In the background there is a lake,
and beyond that, some low hills.

(The photograph was taken
the day after I drowned.

I am in the lake, in the center
of the picture, just under the surface.

It is difficult to say where
precisely, or to say
how large or small I am:
the effect of water
on light is a distortion

but if you look long enough,
eventually
you will be able to see me.)

Margaret Atwood

 

Hotlinking is the practice of embedding a photograph or other multimedia file directly in a web page while linking to the resource on someone else’s server. The bandwidth bandit gets the benefit of the photograph, but the owner of the photograph has to pay for the bandwidth. If enough photographs or movies or songs are hotlinked, the bandwidth use adds up.

Recently I noticed that several photographs from FOAF, Flocking, and the Semantics of Starlings were being accessed from various other weblogs, including Adam Curry’s weblog. The reason this was happening is that some folks copied part of the essay, including the links to the photographs. The photograph accesses started appearing from one weblog, then another, then another.

The problem was then compounded when each of these sites published RSS that included all their content rather than excerpts — including these same direct links to the photographs. In fact, it was through RSS that photographs appeared in Adam Curry’s online aggregator — along with several very interesting pornography photos.

I’ve had photographs hotlinked in the past and haven’t taken any steps to prevent it because the bandwidth use wasn’t excessive. In addition, some people who are weblogging within a hosted environment don’t have a physical location for photographs, and I’ve hesitated about ‘cutting them off’. Besides, I was flattered when people posted my photographs, being a pushover when it comes to my pics.

However, with this last incident, I knew that not only was my bandwidth being consumed by external links, but those who share space and other resources on the weblogging co-op I’m a part of were also losing bandwidth through our shared line. Time to close the door on the links.

To restrict access to images, I’ll need to add some conditions to my existing .htaccess file. If you’ve not worked with .htaccess before, it’s a text file that provides special instructions to the web server for the files in its directory and any sub-directories. In this particular case, the restrictions I’ll add depend on a special module, mod_rewrite, being compiled into your server’s installation of Apache. You’ll need to check with your ISP to see if it’s installed.

(If you have IIS, you’ll use ISAPI filters, instead. See the IIS documentation for specifics.)

Restrictions for image access are made to the top-level .htaccess file shared by all my sites. By putting the restrictions into the top-level file, they’ll be applied to all sub-directories unless specifically overridden.

Three mod_rewrite instructions are used within the .htaccess file:

RewriteEngine On — turns on the rewrite engine
RewriteCond — specifies a condition determining if a rewrite rule is implemented
RewriteRule — the rewrite rule

When the web server reads the .htaccess file and sees these directives, three things happen: the rewrite engine is turned on, the rewrite conditions are tested against the incoming request, and, if they all match, the rewrite rule is applied.

The rewrite conditions and rules make use of regular expressions to determine if an incoming request matches a specific pattern. I don’t want to get into regular expressions in this essay, but know that regular expressions are basically pattern matching, using special characters to form part of the pattern. The examples later make use of the following regular expression characters, each listed with its specific behavior:

! used to specify non-matching patterns
^ start of line anchor
$ end of line anchor
. match any single character
? zero or one of the preceding text
* zero or more of the preceding text
\char Escape character — treat char as text, not special character
(chars) grouping of text

There are other characters, but these are the only ones I’m using — the mod_rewrite Apache documentation describes the entire set.

Within .htaccess I add a line to turn on the rewrite engine, and add my first condition — match an HTTP request from any domain that is not part of the burningbird.net domain:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]

The condition checks the HTTP referrer (HTTP_REFERER) to see if it matches the pattern: in this case, any referrer that does not come from the burningbird.net domain, that is, anything other than paths.burningbird.net, rdf.burningbird.net, www.burningbird.net, or burningbird.net directly. The qualifier at the end of the line, [NC], tells the rewrite engine to disregard case.
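To make the pattern concrete, here is a small sketch of my own, using Python’s re module as a stand-in for Apache’s regular expression engine (the syntax is close enough for these patterns). Note that I’ve escaped the dot in burningbird\.net here; the .htaccess line leaves it unescaped, where it matches any character — more permissive, but it still works.

```python
import re

# Stand-in sketch: Python's re module approximates Apache's regex engine
# for these patterns.  re.IGNORECASE plays the role of the [NC] qualifier.
pattern = re.compile(r'^http://(.*\.)?burningbird\.net/.*$', re.IGNORECASE)

# Referrers from the burningbird.net domain match the pattern:
print(bool(pattern.match('http://www.burningbird.net/some/page')))   # True
print(bool(pattern.match('http://burningbird.net/weblog/')))         # True

# An external referrer does not, so the negated ('!') condition fires:
print(bool(pattern.match('http://example.com/hotlinking-page')))     # False
```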

I’m looking for domains other than my own because I want to apply the rules to the external domains — let my own pass through unchecked. Since I have more than one domain, though, I need to add a line for each domain and modify the file accordingly:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]

Once all of the conditions are added to .htaccess, when the web server accesses a file within my directories, the conditions are combined, adding up to a pattern match for any domain other than a variation on burningbird.net, forpoets.org, yasd.com, and dynamicearth.com.

One last case needs to be allowed through, unchecked: I need to allow access to the images when the referrer has been stripped, such as with local access or access through a proxy. To do this, I add a line matching a blank referrer. The file then becomes:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]
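Since all of the stacked conditions must hold at once before the rule that follows can fire, the combined effect can be simulated. The sketch below is my own illustration, not Apache itself; the patterns mirror the conditions above, with the dots escaped:

```python
import re

# My own simulation of how stacked RewriteCond lines combine: every
# negated ('!') condition must hold before the RewriteRule fires.
ALLOWED_PATTERNS = [
    r'^$',                                    # blank (stripped) referrer
    r'^http://(.*\.)?burningbird\.net/.*$',
    r'^http://(.*\.)?forpoets\.org/.*$',
    r'^http://(www\.)?yasd\.com/.*$',
    r'^http://(www\.)?dynamicearth\.com/.*$',
]

def rule_fires(referrer):
    """True when no allowed pattern matches, i.e. the request is blocked."""
    return all(not re.match(p, referrer, re.IGNORECASE)
               for p in ALLOWED_PATTERNS)

print(rule_fires(''))                                  # False: allowed
print(rule_fires('http://rdf.burningbird.net/page'))   # False: allowed
print(rule_fires('http://example.com/stolen-page'))    # True: blocked
```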

Once I have the rewrite conditions set, time for the rule. This is where all of this can get interesting, depending on how clever you are, or how devious.

In my .htaccess file, when a referrer from a domain other than one of my own accesses one of my photos, I forbid the request. The rule I use is:

RewriteRule \.(gif|jpg|png)$ - [F]

What this rule says is that any request for a GIF, JPG, or PNG file, coming from a domain that doesn’t match the conditions set earlier, is “rewritten” to the ‘-’ character, which leaves the URL unchanged. The [F] qualifier at the end of the line then tells the server to return a 403 Forbidden response instead of the file.

Depending on the browser accessing the web page that contains the hotlinked photo, the page will show either a missing-image symbol or the name of the image file in place of the image.

Now, my approach just prohibits others from hotlinking to my images. Other people will redirect the image request to another image — perhaps one saying something along the lines of “Excuse me, but you’ve borrowed my bandwidth, and I want it back.” In actuality, people can be particularly clever, and downright mean, with the image redirection.

If this is the approach you want, then you would use a line similar to:

RewriteRule \.(gif|jpg|png)$ http://burningbird.net/baddoodoo.jpg [R,L]

In this case, the image request is redirected to another image, baddoodoo.jpg, and a redirect status is returned (the ‘R’). The ‘L’ qualifier states that this is the last rewrite rule to apply, to prevent an infinite loop from occurring (accessing that redirected image triggers the rule, which accesses that image, which triggers… you get the idea). Don’t forget to terminate the rule with the ‘L’ qualifier or you’ll quickly see how your web server deals with runaway processes.
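One caveat worth adding here, which is my own note rather than part of the original rule: the browser’s follow-up request for baddoodoo.jpg typically carries the same external page as its referrer, so it can match the conditions all over again. A common safeguard is to exempt the substitute image itself with an extra condition, along these lines:

```apache
# Sketch: exempt the substitute image from the rule, so the redirected
# request for it is not itself redirected again.
RewriteCond %{REQUEST_URI} !baddoodoo\.jpg$
RewriteRule \.(gif|jpg|png)$ http://burningbird.net/baddoodoo.jpg [R,L]
```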

(Does anyone smell smoke?)

It’s up to you if you want to forbid the image access, or redirect to another file — note, though, that you shouldn’t assume that people who are hotlinking are doing so maliciously. Most do so because they don’t know there’s anything wrong with it. Including most webloggers.

My complete code is:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]

RewriteRule \.(gif|jpg|png)$ - [F]

 

update

Some browsers strip the trailing slash from a request, which can cause access problems, as noted in the comments at Burningbird. I’ve modified the .htaccess file to the following to allow for this:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com(/.*)?$ [NC]
RewriteRule \.(gif|jpg|png)$ - [F]

I tested to ensure this worked using the curl utility, which allows me to access a file and pass in a referrer:

curl -ehttp://burningbird.net http://weblog.burningbird.net/mm/blank.gif

The .htaccess file should now work with browsers that strip the trailing slash.

With the rule now in place, if someone tries to link directly to the following photograph, all they’ll get is a broken link.

boats5.jpg

You can see the effect of me putting hotlinks on ice by accessing this site with the domain mirrorself.com. This is one of my domains, but I haven’t added it to the .htaccess file — yet. If you access the main burningbird weblog page through mirrorself.com, using http://www.mirrorself.com/weblog/, you’ll be able to see the broken link results. Look quickly, though, I’ll be adding mirrorself.com in the next week.

When I mentioned writing this essay, Steve Himmer commented that the rules he added to .htaccess didn’t stop Googlebot and other bots from accessing images. To restrict image access by web bots such as Googlebot, you’ll want to use another file: robots.txt.

My photos are usually placed in a sub-directory called /photos, directly under my main site. To prevent well-behaved web bots such as Googlebot from accessing any photo, add the following to your robots.txt file, located in the top-level directory:

User-agent: *
Disallow: /photos/
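As a quick sanity check, Python’s standard-library robots.txt parser can confirm what those two lines tell a well-behaved crawler. This is a sketch of my own, and the URLs are just illustrative:

```python
from urllib import robotparser

# Parse the same two robots.txt lines shown above and check what a
# well-behaved crawler would conclude from them.
rp = robotparser.RobotFileParser()
rp.parse(['User-agent: *', 'Disallow: /photos/'])

# Anything under /photos/ is off-limits to every user agent...
print(rp.can_fetch('Googlebot', 'http://burningbird.net/photos/boats5.jpg'))  # False

# ...while the rest of the site remains crawlable.
print(rp.can_fetch('Googlebot', 'http://burningbird.net/weblog/'))            # True
```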

This will work with Googlebot and other well-behaved bots. However, if you’re getting misbehaving ones, then the next step is banning access from specific IPs — but that’s an essay for another day, because it’s past my bedtime, and… to sleep, perchance to dream.

bwboats.jpg

Categories
Web

Weblogging for poets-series published

Recovered from the Wayback Machine.

I just published the final three parts of the Weblog Link Series, on permalinking and archives:

Part 1 – The Impermanence of Permalinks

Part 2 – Re-weaving the Broken Web

Part 3 – Architectural Changes for Friendly Permalinking

Part 4 – Sweeping out the webs

Categories
Technology Weblogging

Bye Bye Wiki Necho or Pie

Recovered from the Wayback Machine.

I am finishing the Permalink essays as you read this (well, depending on when you read this, I may be finished), though about to take a break because the words are running away with me. I’m glad I waited on Part 4 until today because the essay is writing itself – I’m just there to move the keys on demand.

In the meantime, Christian from Radio Blogistan, better known as xian, wrote Blogistan Pie to the tune of American Pie – a song I dislike, and much improved by xian’s effort.

Verse 6:

I met this person, Burningbird,
And I asked her for an ontologically meaningful word
But she just smiled and made a sound

I went down to the Scripting News
Where some years before I’d seen the clues
But the server said the file wasn’t found

In LiveJournals children screamed,
The lovers cried and the poets dreamed
But not a word was trusted
The permalinks were busted

And the three men I admire most,
Phil Wolff, Mark Pilgrim, and Steve Yost
Kept editing their final post
The day the blogging died
And they were singin’…

(Thanks to Sam for pointing it out.)

Categories
Technology

NotWiki

Recovered from the Wayback Machine.

Liz wrote a great note on the recent and growing pushback against the use of the Wiki for Pie/Echo/Atom, based in part on a discussion at Phil’s and a posting over at Sam’s.

Liz’s summary hits all the points:

I’m not yet at the point where I see wikis as adding sufficient value to any process I’m involved with to justify the installation, configuration, and learning curve for users necessary to add another tool to my social software arsenal. Like Phil, I continue to be troubled by the inherent ahistoricism built into the wiki environment; like Shelley I find the lack of social cues to tell me if I’m treading on someone’s toes by changing content to be inhibiting; like Dare, I find that large-scale active wikis are often too chaotic and disorganized, making it difficult for me to find what I’m looking for.

I had concerns about the wiki in the beginning because I wanted to get non-techs involved. Yes, the techs will have to build the tools, but tools are only as good as the people who use them, and I wanted others to have a voice. And face it – Wiki has a real geek feel to it that’s not necessarily inviting to the non-geeks.

Still, I participated originally, focusing in my area of expertise – the data model. It seems as if I had just started and then turned around and realized the work had zoomed right past while I wasn’t looking.

Okay, so I tried again, taking a snapshot and writing about the effort in a nutshell, and I figured I’d help contribute to the effort by doing this once a week or so – until the next week, when I realized that there had been so much work, so much activity, that a snapshot wasn’t feasible. Or if it was, I wasn’t the person to provide it. The parade had passed by.

Wikis are a fascinating device, and I admire Sam wanting to get input from the world at large by using a wiki. He actually didn’t have much choice: he’d been warned what would happen if the Big Blog Tools met behind closed doors and just threw specs over a tall, tall wall.

But there’s got to be a happy medium between total control, personal ownership, and closed doors on the one hand; and the digital food fight and free-for-all that is the Wiki on the other.

Wikis favor the aggressive, the obsessive, and the compulsive: aggressive to edit or delete others’ work; obsessive to keep up with the changes; and compulsive to keep pick, pick, picking at the pages, until there are dozens of dinky little edits every day, and thousands of dinky little offshoot pages. And name choices like “BarbWire”.

(BarbWire. Good God. Let’s get pipes and hose and find the original Echo trademark holders and give them an offer they can’t refuse to let the trademark go.)

But Wikis also favor enormous amounts of collaboration among a pretty disparate crew, which is why there’s also all sorts of feeds being tested, and APIs being explored, and a data model that everyone feels pretty darn good about. So one can also say that Wikis favor the motivated, the dedicated, and the determined.

What we need now is a hold moment. We need to put this effort into Pause, and to look around at the devastation and figure out what to keep and what to move aside; and to document the effort, and its history, for the folks who have pulled away from the Wiki because of the atmosphere. We need to do this for the techs and non-techs alike, because I’m pretty sure some technical decisions were made that are not going to make a lot of current webloggers happy, if I’ve read some of the copy at the wiki correctly.

We need to record what’s been accomplished in a non-perishable (i.e. not editable), human manner. No Internet standard specification format. Words. Real ones. We then need to give people a chance to comment on this work, but not in the Wiki. Or not only in the wiki. Document the material in one spot – a weblog. After all, this is about weblogging – doesn’t it make sense that we start moving this into the weblogging world again? Not bunches of weblogs, with bits and pieces.

One weblog. Limited author access.

We need to get more people involved than a small core group, and if this means using different mediums of communication and even – perish the thought – slowing down a bit, then slow down. Mediums that have history, so those late to the party aren’t left out in the cold. This means not wiki, not IRC.

We also need another stated commitment from the stakeholders in all of this, the aforementioned Big Blog Tool makers, that they are still supporting this effort’s output. A lot’s happened between then and now.

Most of all, we need to ungeek Pie/Echo/Atom – start channeling this effort into a more controlled environment, with open communication, yes, but less movement, and more deliberation. I’m not saying give one person control, but we need to start identifying those with the most to gain and lose by this effort, those who are most impacted, and we need to start pulling them into a consortium. A weblogging consortium.

(Now, where have I heard that before?)

But here’s the kicker – include the non-tech webloggers, too. You know, the people that don’t get excited because Python 2.3 was released?

Sam mentioned in a new post that I hadn’t contributed much in the last month because I was too busy. Because of this, he said the medium wouldn’t have mattered in my overall contribution. But that’s not the story, Sam.

My lack of recent contribution wasn’t that I was too busy for the Wiki effort; it was because the Wiki effort was too busy for me.

P.S. A new name suggestion for Pie/Echo/Atom – let’s just call it Pie/Echo/Atom.

Categories
Burningbird Technology Weblogging

Two down, three to go

Recovered from the Wayback Machine.

I’ve installed two weblogs in the For Poets site:

Linux for Poets – maintained by the freebie pMachine installation.

Internet for Poets – maintained by WordPress, an open source weblogging tool.

Both support comments and trackbacks, and both weblogs feature the look and feel straight out of the box.

I couldn’t install Blojsom, based on RDF and Jena, because it requires a Java servlet container and I don’t want to install Tomcat. In addition, bBlog was a little too beta, and Bloxsom a little too simplified, especially since I’m reviewing tools for non-techs. However, I may change my mind and go with Bloxsom.

To look for other weblogging tools to use, I spent some time randomly clicking on weblogs in weblogs.com. Interesting results:

  • There are a lot of people using Movable Type. A lot. And many of the MT sites look similar – I could tell an MT site as soon as it opened, without looking for the MT banner. Regardless, I can see why Six Apart got VC funding – there are a lot of people that use MT.
  • Still lots of Blogger and BloggerPro sites – but nowhere near the number of MT users.
  • Light grey text and a slightly darker grey background is not elegant – it’s unreadable
  • Please don’t show pictures of your rash
  • Is that legal?
  • Where are the Radio weblogs?
  • No AOL or LiveJournal – they don’t ping weblogs.com?
  • Some people are just plain tacky, especially in what they allow advertised at their weblogs. Dirt cheap ammo? Now guess what type of weblog I found this one on.
  • Is this photo for real? Looks retouched. Still kinda cool.
  • Ve are Movable Type and all your weblogs belonga us!
  • Larry using Bloxsom – I think I’ll give this another shot. At least it’s not MT.
  • What the world needs more of – diagonal weblogs
  • Why do people stick these things all over their weblogs? Weblog after weblog with very little text, but lots of empty space and little buttons and tiny people and graphics and hearts and flowers and quizzes and mood indicators and other things that are anything but writing. It’s as if their weblogs are only wire frames on which to poke bits of string and tinsel and colored ribbon. Do they weblog only as a placeholder? A way of saying, “I stake this space?”
  • Oh, there’s a LiveJournal.
  • Great come back for a complaint on style – cement canoes
  • 0xDECAFBAD also uses Bloxsom – okay, I’m convinced. Dorothea, he’s quoting your weblog.
  • Hey! Bloomington, Indiana library won’t install porn filters. Good on you Bloomington.
  • There was a war of the weblogging tools, and the squirrels won
  • There’s Tinderbox – nope, not at 145.00
  • Ohmigod! Pink! With little sprinkly, glittery things all over. I’ll take the grey on grey
  • Finally! here’s a Radio weblog! It’s called “Blogging Alone”. No shit.
  • What is Blogstreet? Am I on the list? No? Then who cares. I found this at the Agonist – wasn’t this the weblog that was accused of plagiarism? Yup, that’s what it takes to be a top weblog.
  • That’s a great name for a weblog Opinions you should have
  • I’m dying to know what this weblog is talking about – but scroll down – isn’t the flower photo nice?
  • From Ozark Rambler:

     

    For those I haven’t spoken to lately, “herself” is doing very well and “on the mend” following her surgery a month ago. Your prayers and support during the past month has been very much appreciated.

    She’s not quite up to doing any “plowing or mowing” yet, but then, those of you who know her realize that she wasn’t to excited about participating in those activities anyway. come to think of it, they don’t excite me much either.

    Thank you, Ozark Rambler, for your simple tales of berrying in chigger-filled woods, for your sharing, your humor, your interesting political views and Orwellian quotes, and for reminding me that there is more to weblogging than Echo/Atom/RSS and fights between silly boys.

A productive exercise, one I recommend people do weekly. I didn’t find all the weblogging tools I needed, but I found something more important: perspective.