Categories
Web

Putting Hotlinks on Ice

Recovered from the Wayback Machine.

Hotlinks — what a perfect word for the practice of directly linking to a photograph or other high bandwidth item on someone else’s server. Hot with its implication of hot goods and thieves passing in the cybernight. The proper term is “direct linking”, and while more technically accurate, the latter term lacks panache. Hotlinking is a particularly warm subject for me because of my extensive use of photography with my writing.

I’m not really sure what led me to start posting photographs with my essays and other writing. Probably the same impulse that leads me to mix poetry with technology, a combination leading Don Parks to write “They are rather verbose and poetic…” of my Permalinks for Poets essays. Well, get comfortable with your favorite drink, because we’re about to embark on another poetic, verbose adventure into the mysteries of technology. Most fortunate for you, this one’s a murder mystery because we’re going to put hotlinks on ice.

This is a photograph of me

It was taken some time ago.
At first it seems to be
a smeared
print: blurred lines and grey flecks
blended with the paper;

then, as you scan
it, you see in the left-hand corner
a thing that is like a branch: part of a tree
(balsam or spruce) emerging
and, to the right, halfway up
what ought to be a gentle
slope, a small frame house.

In the background there is a lake,
and beyond that, some low hills.

(The photograph was taken
the day after I drowned.

I am in the lake, in the center
of the picture, just under the surface.

It is difficult to say where
precisely, or to say
how large or small I am:
the effect of water
on light is a distortion

but if you look long enough,
eventually
you will be able to see me.)

Margaret Atwood

 

Hotlinking is the practice of adding a photograph or other multimedia directly in a web page, but linked to the resource on someone else’s server. The bandwidth bandit gets the benefit of the photograph, but the owner of the photograph has to pay for the bandwidth. If enough photographs or movies or songs are hotlinked, the bandwidth use adds up.

Recently I noticed that several photographs from FOAF, Flocking, and the Semantics of Starlings were being accessed from various other weblogs, including Adam Curry’s weblog. The reason this was happening is that some folks copied part of the essay, including the links to the photographs. The photograph accesses started appearing from one weblog, then another, then another.

The problem was then compounded when each of these sites published RSS that included all their content rather than excerpts — including these same direct links to the photographs. In fact, it was through RSS that photographs appeared in Adam Curry’s online aggregator — along with several very interesting pornography photos.

I’ve had photographs hotlinked in the past and haven’t taken any steps to prevent it because the bandwidth use wasn’t excessive. In addition, some people who are weblogging within a hosted environment don’t have a physical location for photographs, and I’ve hesitated about ‘cutting them off’. Besides, I was flattered when people posted my photographs, being a pushover when it comes to my pics.

However, with this last incident, I knew that not only was my bandwidth being consumed from external links, those who share space and other resources on the weblogging co-op I’m a part of are also losing bandwidth through our shared line. Time to close the door on the links.

To restrict access to images, I’ll need to add some conditions to my existing .htaccess file. If you’ve not worked with .htaccess before, it’s a text file located in your directory that provides special instructions to the web server for files in your directories. In this particular case, the restrictions I’ll add will be dependent on a special module, mod_rewrite, being compiled into your server’s installation of Apache. You’ll need to check with your ISP to see if you have it installed.

(If you have IIS, you’ll use ISAPI filters, instead. See the IIS documentation for specifics.)

Restrictions for image access are made to the top-level .htaccess file shared by all my sites. By putting the restrictions into the top-level file, they’ll be applied to all sub-directories unless specifically overridden.

Three mod_rewrite instructions are used within the .htaccess file:

RewriteEngine On — turns on the rewrite engine
RewriteCond — specifies a condition determining if a rewrite rule is implemented
RewriteRule — the rewrite rule

When the web server reads the .htaccess file and sees these directives, three things happen: the rewrite engine is turned on, the rewrite conditions are tested against the incoming request, and, if all of the conditions hold, the rewrite rule is applied.

The rewrite conditions and rules make use of regular expressions to determine if an incoming request matches a specific pattern. I don’t want to get into regular expressions in this essay, but know that regular expressions are basically pattern matching, using special characters to form part of the pattern. The examples later make use of the following regular expression characters, each listed with its specific behavior:

! used to specify non-matching patterns
^ start of line anchor
$ end of line anchor
. match any single character
? zero or one of preceding text
* zero or more of the preceding text
\char Escape character — treat char as text, not special character
(chars) grouping of text
| either the text before the bar or the text after it

There are other characters, but these are the only ones I’m using — the mod_rewrite Apache documentation describes the entire set.
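For anyone who wants to poke at these characters before touching a live server, here is a minimal sketch using Python's re module. It is a stand-in and not Apache itself: Python's syntax is close to, but not identical with, the regular expressions mod_rewrite uses, and the leading ! is a mod_rewrite convention rather than a regex character, so the sketch models it with a small helper. The sample strings and the helper name are made up for illustration.

```python
import re

# Illustrative only: Python's `re` stands in for the regex engine Apache uses.

# ^ and $ anchor the pattern to the start and end of the string.
assert re.search(r"^abc$", "abc")
assert not re.search(r"^abc$", "xabc")

# . matches any single character; \. matches only a literal dot.
assert re.search(r"^a.c$", "abc")
assert re.search(r"^a\.c$", "a.c")
assert not re.search(r"^a\.c$", "abc")

# ? makes the preceding text optional (zero or one occurrence).
assert re.search(r"^colou?r$", "color")
assert re.search(r"^colou?r$", "colour")

# * allows zero or more occurrences of the preceding text.
assert re.search(r"^ab*c$", "ac")
assert re.search(r"^ab*c$", "abbbc")

# (chars) groups text so that ? or * applies to the whole group.
assert re.search(r"^(www\.)?example\.com$", "example.com")
assert re.search(r"^(www\.)?example\.com$", "www.example.com")

# | inside a group matches either alternative (the image rule relies on this).
assert re.search(r"\.(gif|jpg)$", "photo.jpg")


def negated_match(pattern: str, text: str) -> bool:
    """Model mod_rewrite's leading '!': true when the pattern does NOT match."""
    return re.search(pattern, text) is None
```

The helper mirrors how a condition written as !pattern behaves: it holds only when the pattern fails to match.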

Within .htaccess I add a line to turn on the rewrite engine, and add my first condition, which matches an HTTP request from any domain that is not part of the burningbird.net domain:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]

The condition checks the HTTP referrer (HTTP_REFERER) to see if it matches the pattern, in this case anything that is not from burningbird.net. This includes domains other than paths.burningbird.net, rdf.burningbird.net, www.burningbird.net, and burningbird.net directly. The qualifier at the end of the line, [NC], tells the rewrite engine to disregard case.

I’m looking for domains other than my own because I want to apply the rules to the external domains — let my own pass through unchecked. Since I have more than one domain, though, I need to add a line for each domain and modify the file accordingly:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]

Once all of the conditions are added to .htaccess, when the web server accesses a file within my directories, the conditions are combined, adding up to a pattern match for any domain other than a variation on burningbird.net, forpoets.org, yasd.com, and dynamicearth.com.

One last case needs to be allowed through unchecked: I need to allow access to the images when the referrer has been stripped, such as local access or access through a proxy. To do this, I add a line that matches an empty referrer, with no domain or pattern. The file then becomes:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]
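To see how the conditions combine, here is a rough Python model of the RewriteCond lines above. It is a sketch under assumptions, not a reimplementation of mod_rewrite: the function name is mine, Python's re module stands in for Apache's regex engine, and the [NC] flag is approximated with re.IGNORECASE. Because each condition is negated and the conditions are joined with an implicit AND, the rule fires only when the referrer matches none of the allowed patterns.

```python
import re

# Patterns copied from the RewriteCond lines above, without the leading '!'.
ALLOWED_REFERER_PATTERNS = [
    r"^$",  # blank referrer
    r"^http://(.*\.)?burningbird.net/.*$",
    r"^http://(.*\.)?forpoets.org/.*$",
    r"^http://(www\.)?yasd.com/.*$",
    r"^http://(www\.)?dynamicearth.com/.*$",
]


def conditions_met(referer: str) -> bool:
    """Return True when every negated condition holds, meaning the referrer
    matches none of the allowed patterns and the rewrite rule would apply."""
    return all(
        re.search(pattern, referer, re.IGNORECASE) is None
        for pattern in ALLOWED_REFERER_PATTERNS
    )
```

Calling conditions_met with a referrer from another site returns True, while an empty referrer or a page on one of my own domains returns False and passes through untouched.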

Once I have the rewrite conditions set, time for the rule. This is where all of this can get interesting, depending on how clever you are, or how devious.

In my .htaccess file, when a referrer from a domain other than one of my own accesses one of my photos, I forbid the request. The rule I use is:

RewriteRule \.(gif|jpg|png)$ - [F]

What this rule says is that any request for a JPG, GIF, or PNG file, coming from a referrer that satisfies all of the conditions set earlier, is left as it is: the ‘-’ (a plain hyphen) means no substitution is made. Instead, the [F] qualifier at the end of the line tells the server to refuse the request, so the browser gets back a 403 Forbidden status rather than the file.

Depending on the browser accessing the web page that contains the hotlinked photo, rather than the image, the page will either show a missing image symbol, or the name of the image file will be printed out.
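It can help to check which file names the rule pattern actually catches. The sketch below uses Python's re module as a stand-in for Apache's matcher, with made-up paths and a function name of my own. As far as I know a RewriteRule pattern is case-sensitive unless [NC] is added, so the sketch matches case-sensitively too.

```python
import re

# The pattern from the RewriteRule above. No [NC] flag is set on the rule,
# so this sketch matches case-sensitively.
IMAGE_RULE = re.compile(r"\.(gif|jpg|png)$")


def rule_matches(path: str) -> bool:
    """Return True when the requested path ends in .gif, .jpg, or .png."""
    return IMAGE_RULE.search(path) is not None
```

Under these assumptions a file named boats5.JPG or boats5.jpeg would slip past the rule, so it is worth adding extensions, or the [NC] qualifier, if your images are named that way.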

Now, my approach just prohibits others from hotlinking to my images. Other people will redirect the image request to another image — perhaps one saying something along the lines of “Excuse me, but you’ve borrowed my bandwidth, and I want it back.” In actuality, people can be particularly clever, and downright mean, with the image redirection.

If this is the approach you want, then you would use a line similar to:

RewriteRule \.(gif|jpg|png)$ http://burningbird.net/baddoodoo.jpg [R,L]

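In this case, the image request is redirected to another image, baddoodoo.jpg, and a redirect status is returned (the ‘R’). The ‘L’ qualifier states that this is the last rewrite rule to apply to the request. Be aware that the replacement image is itself a JPG: the browser’s follow-up request for it typically carries the same foreign referrer, triggers the rule, and is redirected again (accessing that redirected image, triggering the rule, that accesses that image, that triggers…you get the idea). The ‘L’ qualifier by itself doesn’t break that loop, so exclude the replacement image from the rule, for example with an extra RewriteCond on %{REQUEST_URI}, or you’ll quickly see how browsers and your web server deal with runaway redirects.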

(Does anyone smell smoke?)

It’s up to you if you want to forbid the image access, or redirect to another file. Note, though, that you shouldn’t assume that people who are hotlinking are doing so maliciously. Most do so because they don’t know there’s anything wrong with it. Including most webloggers.

My complete code is:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com/.*$ [NC]

RewriteRule \.(gif|jpg|png)$ - [F]

 

update

Some browsers strip the trailing slash from a request, and can cause access problems, as noted in comments in Burningbird. I’ve modified the .htaccess file to the following to allow for this:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?burningbird.net(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?forpoets.org(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yasd.com(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?dynamicearth.com(/.*)?$ [NC]
RewriteRule \.(gif|jpg|png)$ - [F]
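The difference between the original and updated patterns is easy to miss, so here is a small Python comparison. It is only a sketch: Python's re module stands in for Apache's regex engine, [NC] is approximated with re.IGNORECASE, and the function name is mine.

```python
import re

# Original and updated burningbird.net patterns, without the leading '!'.
ORIGINAL = r"^http://(.*\.)?burningbird.net/.*$"
UPDATED = r"^http://(.*\.)?burningbird.net(/.*)?$"


def is_own_referer(pattern: str, referer: str) -> bool:
    """Return True when the referrer matches the given pattern, ignoring case."""
    return re.search(pattern, referer, re.IGNORECASE) is not None
```

With the original pattern a referrer of http://burningbird.net, with no trailing slash, is not recognized as one of my own; with the updated pattern it is, because the slash and everything after it are now optional.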

I tested to ensure this worked using the curl utility, which allows me to access a file and pass in a referrer:

curl -ehttp://burningbird.net http://weblog.burningbird.net/mm/blank.gif

The .htaccess file should now work with browsers that strip the trailing slash.
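If curl isn't handy, the same kind of check can be scripted. The following is a minimal sketch using Python's standard urllib, with function names of my own invention: it builds a request that carries a Referer header and reports the status code that comes back. The example call is left commented out because it needs network access and a server that is still there.

```python
import urllib.error
import urllib.request


def build_request(url: str, referer: str) -> urllib.request.Request:
    """Build a GET request that carries the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


def fetch_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (403 if forbidden)."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Example call (needs network access, so it is left commented out):
# fetch_status(build_request("http://weblog.burningbird.net/mm/blank.gif",
#                            "http://burningbird.net"))
```

Trying one of your own domains and then a foreign one as the referrer should show the image being served in the first case and refused in the second.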

With the rule now in place, if someone tries to link directly to the following photograph, all they’ll get is a broken link.

boats5.jpg

You can see the effect of me putting hotlinks on ice by accessing this site with the domain mirrorself.com. This is one of my domains, but I haven’t added it to the .htaccess file — yet. If you access the main burningbird weblog page through mirrorself.com, using http://www.mirrorself.com/weblog/, you’ll be able to see the broken link results. Look quickly, though, I’ll be adding mirrorself.com in the next week.

When I mentioned writing this essay, Steve Himmer made a comment that the rules he added to .htaccess didn’t stop Googlebots and other bots from accessing images. To restrict access of images from webbots such as googlebot, you’ll want to use another file, robots.txt.

My photos are usually placed in a sub-directory called /photos, directly under my main site. To prevent well behaving webbots such as Google from accessing any photo, add the following to your robots.txt file, located in the top-level directory:

User-agent: *
Disallow: /photos/
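A quick way to sanity-check these two lines is Python's standard urllib.robotparser, which reads robots.txt rules the way a well behaved bot is expected to. This is a sketch only: the example.com URLs and the function name are placeholders, and it says nothing about how any particular bot actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photos/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True when the robots.txt rules allow this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Asking about a URL under /photos/ returns False for any user agent, while pages elsewhere on the site remain fetchable.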

This will work with Googlebot, which is a well behaved bot; it will also work with other well behaved bots. However, if you’re getting misbehaving ones, then the next step is banning access from specific IPs — but that’s an essay for another day, because it’s past my bedtime, and …to sleep, perchance to dream.

bwboats.jpg

Categories
RDF

Tax? Or Precision?

Recovered from the Wayback Machine.

I read the comments about “RDF tax” and how we must “prove” RDF’s worth, yet when I look at so many plain XML feeds, all I can see is the improvement that could be added because of the precision of using RDF/XML. Not all XML feeds, but any that are purposed beyond being generated and consumed by one product.

For instance, in an IRC conversation yesterday about the Pie/Echo/Atom syndication feed, a question arose about how to interpret the ordering and grouping of the contributors and entries within the plain vanilla XML feed. Yesterday’s exercise was focused on doing a straight port of functionality, which meant, according to the participants, that both the entries and contributors needed to be placed within an RDF Seq container. Why? Because order is implied in XML. Or part of the specs, I’m not sure which.

A discussion then commenced that just because XML enforces a grouping and a sequencing on child elements by some form of default, doesn’t mean we have to constrain the same elements within RDF/XML; we have other options built directly in the syntax. We could use repeating properties, which means there is no implied grouping. We could use a Seq, which means there is a grouping, and a sequence to the elements. Or we could have used a List, which also includes a nil factor, meaning that these elements are in a group, and there are no other members for this group. There’s a lot less ‘slop’ built into RDF/XML than there is in just plain XML by default.
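To make the three options concrete, here is a rough Python sketch that writes out the triples each choice would produce for a feed's contributors. Everything in the ex: vocabulary is hypothetical and the blank node labels are arbitrary; only the rdf: terms (rdf:Seq, rdf:_1, rdf:first, rdf:rest, rdf:nil) come from the RDF specifications. It is meant as an illustration of the model, not as anything the Pie/Echo/Atom group produced.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/feed#"  # hypothetical vocabulary, for illustration only


def repeated_properties(subject, items):
    """One triple per item: no grouping and no order is implied."""
    return [(subject, EX + "contributor", item) for item in items]


def seq_container(subject, items, node="_:seq"):
    """An rdf:Seq: the items are grouped and numbered, but the group is open."""
    triples = [
        (subject, EX + "contributors", node),
        (node, RDF + "type", RDF + "Seq"),
    ]
    for position, item in enumerate(items, start=1):
        triples.append((node, f"{RDF}_{position}", item))
    return triples


def closed_list(subject, items):
    """An RDF collection: rdf:first/rdf:rest links terminated by rdf:nil,
    so the group is ordered and explicitly has no further members."""
    if not items:
        return [(subject, EX + "contributors", RDF + "nil")]
    nodes = [f"_:list{index}" for index in range(1, len(items) + 1)]
    triples = [(subject, EX + "contributors", nodes[0])]
    for index, item in enumerate(items):
        following = nodes[index + 1] if index + 1 < len(items) else RDF + "nil"
        triples.append((nodes[index], RDF + "first", item))
        triples.append((nodes[index], RDF + "rest", following))
    return triples
```

The closing rdf:nil in the last function is the "nil factor": it is the one form of the three that states nothing else belongs to the group.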

That “RDF/XML” tax, as it’s called, opened up a door to a conversation that forced the members of the Pie/Echo/Atom group to look at implicit behavior and XML and decide if this is the type of behavior they want to enforce; or whether this is the type of behavior imposed by the tree-structured nature of XML’s hierarchy. As Sam wrote:

(T)his effort provided an alternate insight into this data, which surfaced a number of questions I never pondered before. For example: is the order of contributors significant? This needs to be answered and documented.

Of course, we can use XML DTDs and XML Schema and a host of other XML ancillary specifications to provide the same precision as we have within the RDF/XML model and syntax; but it doesn’t strike me that the vanilla XML is all that ‘readable’ at this point. In fact, seems to me that you’d have to spend at least as much time reading and working with the XML specifications to do XML ‘properly’ as you would with RDF/XML.

In other words, there’s a ‘tax’ to XML, too, but it can be ignored if you don’t care whether your feed is imprecise. In fact, if my memory serves me, one of the reasons why people had trouble with RSS 2.0 is they felt there were ambiguities in it – that constraints were too loosely defined and it was too easy to generate a ‘sloppy’ XML feed. One of the reasons for Pie/Echo/Atom was to create a tight, well-defined feed that also allowed room for future growth. This type of rigor implies either that you use DTDs or XML Schema to ensure similar behavior. Or you use RDF/XML.

Folks ask me to prove the worth of RDF/XML. I think at this point I’ll turn this one around – I’ll ask the “RDF Tax” folks to prove to me that vanilla XML provides the same precision in both meaning and implemented behavior as RDF/XML – and still be ‘readable’.

Categories
RDF

I am not the church and RDF is not the earth

The discussion continues on using RDF/XML for the new Pie/Echo/Atom syndication feed, in Sam’s comments and in the email list. I even had a very fun time in the echo IRC yesterday, though I’m not a particularly adept IRC person.

(I did find out about the use of /me, and went crazy using it as a result.)

I’m glad these conversations are happening now. I would like to work with the Pie/Echo/Atom folks as much as possible promoting the idea of using RDF/XML for the syndication feed, but not at the expense of hiding what this means for the feed in the long run. I do have interests in showing how using RDF/XML can be helpful, beneficial, and not that complicated; but I have no interest in sneaking it in through the backdoor.

In the first chapter of Practical RDF, I wrote:

 

RDF is a wonderful technology, and I’ll be at the front in its parade of fans. However, I don’t consider it a replacement for other technologies, and I don’t consider its use appropriate in all circumstances. Just because data is on the Web, or accessed via the Web, doesn’t mean it has to be organized with RDF. Forcing RDF into uses that don’t realize its potential will only result in a general push back against RDF in its entirety—including push back in uses in which RDF positively shines.

RDF and RDF/XML aren’t for every person or every project. The most I can do is gently work with those reluctant in its use, suggest it where appropriate, demonstrate it here and elsewhere, and be philosophical if its use is rejected.

The editor for Practical RDF is my friend Simon St. Laurent, a person whom I admire and greatly respect. He was the perfect editor not only because he’s adept and skilled and a great writer in his own right; but also because he is not an obsessive fan of RDF. Neither one of us wanted Practical RDF to be a ‘fan book’. Both of us realize the problems associated with the perception of the specification, and more specifically the constraints of the markup.

Simon recently wrote a rant, as he styled it, on RDF/XML. I link to it here not to chastise or disagree, but because I found it to be well written and concise in where the pushback against RDF is arising.

I think the reason why I don’t have as much problem with RDF/XML as others is because I’ve been working with RDF/XML about as long as I’ve been working with plain XML. To me, there is no problem with the syntax because I’m so comfortable with it, pure and simple. I need to be reminded that others are less so, and I’m grateful when they write their reasons, clearly and bluntly.

Categories
Connecting Photography

Fight or flight

Recovered from the Wayback Machine.

The summer heat and lack of rains lowered the Meramec to the point where I could scramble down its banks tonight and walk along the river bed. The hill leading down was steep and rough and a year ago I wouldn’t have tried it, but days of walking, always on the lookout for a new angle for a photograph, have increased my agility.

Among the rough stones small frogs, no bigger than a beetle or a dime, were hopping away from me as fast as they could, some jumping into the river to avoid me – becoming a real treat for the surprisingly large fish along the edge. I felt bad that my shadow was triggering their instinctive flight response, but I imagine that the known terror was less frightening than the unknown. Can’t fight instincts – animals react to threats either by running, or by turning and standing to fight. Flight or fight is the name of the game.

riveredge.jpg

I tried to take a picture of one of the tiny frogs, but it didn’t come out well. No loss, though, because it was fascinating just to see them, to explore what would normally be under water. It’s experiences like this that make me so glad that my photography has forced me into situations that I would normally have avoided. What adventures I’ve had and what beauty I’ve seen because of this insane desire to find the perfect angle for the perfect shot.

I started taking photographs seriously in January 1991 when I purchased my first Nikon 8008, as an incentive to quit smoking. I’d smoked for years and had developed a cough that was getting progressively worse. When I woke up one morning and coughed so hard I spit up blood, we knew something was seriously wrong. After I had lung X-Rays, the doctor quietly told me that the results weren’t definitive, but there was some evidence in the film that could indicate emphysema, especially in light of the other symptoms. I would need to have more detailed tests, but one thing was certain – I would have to quit smoking.

The nicotine patch was fairly new then and she prescribed a series of them for me, but I knew that I was going to have to fight the addiction on my own if I was going to be successful at quitting where I hadn’t been before. To give myself something to occupy my time, and hands, I bought the camera.

The doctor warned me that my cough wouldn’t go away quickly, and regardless of what they found, it would probably be years before I’d stop having problems. Still, I managed to quit smoking with only minimal damage to those nearest and dearest to me. In addition to the photography, I also started walking and then hiking to help deal with the to-be-expected weight gain that comes with giving up cigarettes.

Odd thing is, my condition improved drastically. Within three days, I was no longer coughing hard enough to see stars. Within a month I could breathe in and not fall down on the ground coughing. By the summer, I was going for days, weeks even, without coughing once, especially as we cleaned all traces of the cigarette smoke from the house. The doctor was more than pleased – she was stunned by the rapid improvement. And puzzled. The additional tests did show some lung deterioration, but not enough to generate the original coughing. This paired with my rapid recovery ended up being a bit of a medical mystery.

More tests and discussions with other doctors and the final finding was that I had developed a severe allergic reaction to cigarette smoke. Allergic to cigarettes and smoking – can you believe it? Consider being allergic to ragweed or cat dander and then waking up every morning and breathing from a bag full of it. That’s what I was doing.

I traded all of that for a few extra pounds, and a love of hiking and photography I have to this day.

riverlowerlight.jpg

Returning to the topic of this essay, this fight or flight. Earlier the frogs reacted in fright and escaped me only to become dinner; but I could have just as easily been a predator bird and the fish in the river replete except the odds weren’t in the tiny frogs’ favor. Earlier still, I fought for life, as we all do when faced with a challenge to our seeming immortality, but in my case the odds were in my favor. In both situations, instinct took over, guiding us into fight or flight depending on the challenge and the prize. The rest of the time, though, we’re on our own.

I have never successfully figured out when I should fight the good fight and when I should walk away. One time I’ll stay to fight to the bitter end, all dignity and umbrage, only to have others come up to me afterwards and ask me what was I thinking? Why the hell didn’t I just walk away? Why did I rise to the bait?

Other times I beat what I consider to be a dignified retreat from the battles only to be faced with scorn from those who see my walking away to be nothing more than throwing my hands up in the air, and giving up.

moreriveredge.jpg

Earlier this week, in comments over at another weblog I got into a discussion about how one deals with aggressive people. Not just aggressive people – people that can be abusive, people that can be ‘acerbic’, yes that’s the word. Normally, I’d link to the post and the comments and re-print significant quotes from both; however, I’ve done this in the past with topics similar to this, and doing so brings others, willing or no, into this conversation and the focus becomes these people and the relationships between these people, when that’s not what this is all about. With respect, this is about knowing when to fight and when to walk away.

It’s a deep part of my nature not to back down from a fight, and I’ve written before of this failing or strength, depending on your view. I also have a temper, though this is something I’ve learned over the years and wasn’t born with.

(I once worked with another woman, years ago, who said I was great to work with, but needed to learn to be more aggressive. If I gave you her name, would you send her flowers or stones?)

Getting into a fight, a nasty one not a good, challenging debate, can leave you tired and discouraged and there have been times when I have walked away, sometimes with grace, sometimes less so. In these situations, I congratulate myself on not ‘stooping’ to the protagonist’s level, only to be chastised for not standing my ground. Or worse – rising to another’s bait and rather than respond with dignity I respond with anger and storm out, and as a consequence, lose respect.

I’ve thought long about the discussion I was a part of, earlier this week, and one thing that I realized from it is that flight is not an option for me – not in life, not with my beliefs, political and otherwise, and not in my field. Most of the people I associate with in one manner or another are people who don’t suffer gladly those who walk away at the first sign of aggression, no matter how unjustified the aggression and how ugly its manifestation. More importantly, these people are also not of a mind to come to my aid in a battle of my own joining, because aside from a few of us, we’re on our own in these things.

That latter has been the toughest for me because of my expectations of a friend coming to my defense; the loyal friend I can send in as my Champion to do to dirt the knave who would besmirch and sully my good name. What a rude awakening to find out that my friends either think I should take care of my own battles, as if I’m a capable, intelligent, and responsible adult; or they disagree with my joining the fight in the first place. I have, at times, found myself wishing for a sycophant or two to call my own in trying times, but I dare say this is counter-productive to my emotional growth.

The frog, the shadow, and the fish in the river. I should write another parable using this cast of characters, but for now, another photo as I continue my contemplations.

halo2.jpg

Categories
RDF

Bray and Symbols and Grounding

Tim Bray on the namespace fooflah that’s been happening:

Right now, in the context of the Pie/Echo/Atom/whatever project, people assert that crystallizing the meaning of embedded namespaces is the key to interoperability, the central problem, and so on. Huh? When someone proposes markup from another namespace for inclusion in a syndication feed, there are three possible outcomes:

Nobody pays attention and it isn’t much adopted.

It gets widely adopted, with semantics along the lines originally proposed.

It gets widely adopted, with some semantic drift away from the original proposal becoming evident in the implementations. (Note that this has already happened with some RSS 2.0 markup).

Oddly enough, this is exactly what will happen with proposed tags and attributes that aren’t in a different namespace.

I agree with Tim when he summarizes his essay with “…we shouldn’t try to kid ourselves that meaning is inherent in those pointy brackets, and we really shouldn’t pretend that namespaces make a damn bit of difference”. There is no ‘meaning’ behind markup, there is no ‘meaning’ behind namespaces.

But, there are behavioral assumptions associated with both – behavior that can be programmed into both producers and consumers of the markup. In the recent discussions about namespaces, as per Jon Udell, the programmatic behavior and assumptions might be getting a bit blurred about the meaning of it all, but within the Pie/Echo/Atom world, the discussion of namespaces is concrete: what signals a change, what works, what doesn’t, and what should be ignored.

For me, namespaces say:

I mark things that belong in a specific schema. This schema isn’t an extension to diddly squat – it can live on its own, thank you. If it didn’t, we’d have these pseudo-schemas floating around because the originators of Important Schema didn’t take the time to do their job right in the beginning. The opposite of analysis paralysis is … broken bits of schema floating around, desperately holding on to Big Brother, hoping to be acknowledged as part of the family and not the bastard add-on that crept in after darkness.

If you see a name like one of mine somewhere else that has a different namespace, this means that the two things aren’t the same. How they differ is up to the organic side of this relationship to figure out. I personally don’t care. Because all I do, is mark things.

Come to think of it, there is a lot of ‘meaning’ in my understanding of namespaces, isn’t there?
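The "marking" that namespaces do can be seen in a few lines of Python, using the standard xml.etree.ElementTree parser. This is just a sketch: both schema URIs are invented for the example, and the function name is mine.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies that both define an element named "title".
DOC = """\
<entry xmlns:a="http://example.org/schema-a#"
       xmlns:b="http://example.org/schema-b#">
  <a:title>First</a:title>
  <b:title>Second</b:title>
</entry>
"""


def qualified_names(xml_text: str) -> list:
    """Return the namespace-qualified tag of each child of the root element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]
```

The parser reports the two title elements under different qualified names, which is all the namespace promises: the two things aren't the same. What the difference means is left to whoever consumes the markup.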