Day: February 22, 2003

Getting hammered

Post author By Shelley Powers
Post date February 22, 2003

Someone used a fake email address from my ‘yasd.com’ domain to send a huge spam mailing, and I’m getting hammered with email rejections and mail delivery system failures.

If I find the little creep that did this, I’m going to take out that virus code someone embedded in my comments and use it to fry his machine.

Don’t send email directly to me until I give the all clear. All clear. Mail addressed to the bogus email address is now being directed to the email blackhole.

Update

Email is originating from spedia.com servers.

Update Two

Nope, they were victims, too. This could be an email virus. If you get an email from ‘zhujil@yasd.com’, delete it immediately.

Update Three

I’ve forwarded all email to the email blackhole, so I’m no longer getting all the responses, though my server is still getting hammered (less than if the mail were delivered, though). Question: who is the email spam/virus expert in the audience? I want to find out where these things originated. I’ve kept all the emails, with their headers.
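For anyone digging through saved headers like these, here is a minimal Python sketch using the standard `email` package that lists a message’s `Received` headers in the order the hops happened. Each mail server prepends its own `Received` line, so reading them bottom-up traces the path back toward the origin. Only the hops added by servers you trust are reliable; a spammer can forge everything earlier, just as the `From` address was forged here. The sample message, hostnames, and IP addresses are made up for illustration.

```python
from email import message_from_string


def relay_chain(raw_message):
    """Return a message's Received headers, oldest hop first.

    Each mail server prepends its own Received header, so the last one in
    the raw message is the earliest hop (and the easiest one to forge).
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Collapse folded continuation lines into single spaces for readability.
    return [" ".join(h.split()) for h in reversed(received)]


# Hypothetical sample; hostnames and IPs use reserved example ranges.
SAMPLE = (
    "Received: from relay.example.net (relay.example.net [192.0.2.20])\n"
    "\tby mail.example.com; Sat, 22 Feb 2003 04:10:00 -0800\n"
    "Received: from unknown (HELO sender) ([192.0.2.99])\n"
    "\tby relay.example.net; Sat, 22 Feb 2003 04:09:58 -0800\n"
    "From: zhujil@yasd.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

for hop in relay_chain(SAMPLE):
    print(hop)
```

The first line printed is the hop closest to the claimed origin; compare its IP address against the hops your own provider added before trusting it.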

Last Update

The email forwarding didn’t take at first, but it’s finally working. In all, over 1,000 emails arrived in a very short time. Many of the rejections came from automated virus-scanning systems, so I know the email did contain a virus.

I’m going back to bed.

Weblogging

Morphing URLs

Post author By Shelley Powers
Post date February 22, 2003

Recovered from the Wayback Machine.

I signed up at Blogrolling.com to manage my blogroll, and you can see the results on this page. Scroll down and you’ll see the ten most recently active webloggers in my virtual neighborhood. Click the “more…” link and you’ll go to my Blogroll page.

I’m using the blogrolling.com feed a couple of different ways. On this page I’m using the raw PHP feed, because it’s simple to process. However, I modified the feed code to display only the ten most recent updates. I’d rather create another list and limit it to the ten most recent (a feature at blogrolling.com), but that’s only for paying customers, and money is in short supply at the moment. So I tweaked the code on my own.
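The trimming itself is simple. Here is a minimal Python sketch, assuming each blogroll entry carries a name, a URL, and a last-updated timestamp; the field names are hypothetical, since the structure of the actual PHP feed isn’t shown here.

```python
from datetime import datetime


def most_recent(entries, limit=10):
    """Return at most `limit` blogroll entries, most recently updated first."""
    return sorted(entries, key=lambda e: e["updated"], reverse=True)[:limit]


# Hypothetical entries standing in for the feed's data.
entries = [
    {"name": f"weblog {n}", "url": f"http://weblog{n}.example.com",
     "updated": datetime(2003, 2, 22, n)}
    for n in range(12)
]

for e in most_recent(entries):
    print(e["updated"].strftime("%H:%M"), e["name"])
```

If the feed already arrives newest-first, a plain slice of the first ten entries is enough; sorting just makes the sketch independent of feed order.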

On the Blogroll page, I’m accessing the feed as RSS and then using the PHP XML classes to process the data. By doing so, I can access the individual elements of the feed, such as the URL of each weblog, which I then use with my new Talkback feature.
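The same idea in Python: a minimal sketch that parses an RSS feed with the standard `xml.etree.ElementTree` module and pulls out each item’s title and link, the link being the weblog URL a feature like Talkback would query on. It assumes plain RSS `<item>` elements with `<title>` and `<link>` children; the sample feed is made up.

```python
import xml.etree.ElementTree as ET


def blogroll_links(rss_text):
    """Extract (title, link) pairs from the <item> elements of an RSS feed."""
    root = ET.fromstring(rss_text)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items without a URL; there's nothing to look up
            pairs.append((title, link))
    return pairs


# Hypothetical feed; the layout assumes plain RSS 0.91/2.0 items.
FEED = """<rss version="0.91"><channel><title>Blogroll</title>
<item><title>Burningbird</title><link>http://weblog.burningbird.net/</link></item>
<item><title>My Weblog</title><link>http://myweblog.com</link></item>
</channel></rss>"""

for title, link in blogroll_links(FEED):
    print(title, link)
```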

(I’m thinking of accessing the RSS feed on this page and caching it locally for the Blogroll page to use, which would lower the number of hits against the blogrolling.com site. We’ll see.)
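A local cache like that can be as simple as a file plus an age check. Here is a minimal Python sketch, assuming a hypothetical `fetch` callable that downloads the feed text and a one-hour freshness window; both are illustrative choices, not blogrolling.com requirements.

```python
import os
import time


def cached_feed(cache_path, fetch, max_age=3600):
    """Return feed text from cache_path if it is younger than max_age seconds;
    otherwise call fetch(), save the result to cache_path, and return it."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and fetch
    text = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Both pages would call `cached_feed` with the same path, so only the first request in each window actually reaches the remote site.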

Blogrolling.com makes use of changes.xml from weblogs.com to check for recently updated weblogs, a feature I incorporated into my blogroll. I really appreciate this, because it lets me see who’s updated without having to use an RSS aggregator, something I’m not fond of.

The problem, though, is that we’re inconsistent in how we format URLs. For instance, a person might update weblogs.com as “http://www.myweblog.com/”, but a blogrolling.com customer adds them as “http://myweblog.com”. These are two different URLs, syntactically, even though they point to the same weblog. Unfortunately, then, when that person updates their weblog, they don’t float to the top of my blogroll.
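To see why the match fails, here is a minimal Python sketch that reads a changes.xml-style file and compares its URLs against a blogroll by exact string. The `<weblogUpdates>`/`<weblog url="…">` layout is an assumption about the weblogs.com format, and the sample data is made up from the example above.

```python
import xml.etree.ElementTree as ET


def updated_urls(changes_xml):
    """Return the set of weblog URLs listed in a changes.xml-style document."""
    root = ET.fromstring(changes_xml)
    return {w.get("url") for w in root.iter("weblog") if w.get("url")}


# Assumed format; the element and attribute names may not match the real file.
CHANGES = """<weblogUpdates version="1" count="1">
  <weblog name="My Weblog" url="http://www.myweblog.com/" when="30"/>
</weblogUpdates>"""

blogroll = ["http://myweblog.com", "http://weblog.burningbird.net"]

recent = updated_urls(CHANGES)
# Exact string comparison misses the match: the ping used "www." and a
# trailing slash, while the blogroll entry used neither.
bubbled = [url for url in blogroll if url in recent]
print(bubbled)
```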

The problem is that we all have different understandings of how a URL works, and what we need to use in a URL, and what not. Time for URL 101, I think.

First, the ‘www’ that is so common in most URLs today. Originally, the ‘www’ part of a URL was the hostname of the server on which the website lived. The complete name, ‘www.myweblog.com’, then translated (via a DNS lookup) into a specific IP address, and thus a specific server.

Things have changed quite a bit, and we now have something called virtual hosting. Among other things, virtual hosting lets you create a sub-directory, such as (web server basepath)/weblog, and have the web server map weblog.domainname.com to that sub-directory. For instance, I have the following sub-directories, each paired with its mapped subdomain:

basepath/weblog – weblog.burningbird.net
basepath/rdf – rdf.burningbird.net
basepath/articles – articles.burningbird.net
basepath/www – www.burningbird.net
and so on

The last one in the listing shows www.burningbird.net, but I don’t have to use “http://www.burningbird.net” to get to my top-level web site; I can use “http://burningbird.net”. The reason is that, within my web server configuration files, the URLs “http://burningbird.net” and “http://www.burningbird.net” map to the exact same sub-directory, the one named ‘www’. You’ll find that with most modern web installations, “http://www.domainname.com” and “http://domainname.com” map to the same sub-directory on the server (something you can easily check through your browser).
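Conceptually, the server looks at the hostname the browser asked for and picks a directory. Here is a minimal Python sketch of that lookup, modeled on the listing above; the table and the `/basepath` root are illustrative. In Apache this is usually expressed with `ServerName` and `ServerAlias` inside `<VirtualHost>` blocks rather than in code.

```python
# Hypothetical host-to-directory table modeled on the listing above.
VHOSTS = {
    "weblog.burningbird.net": "weblog",
    "rdf.burningbird.net": "rdf",
    "articles.burningbird.net": "articles",
    "www.burningbird.net": "www",
    "burningbird.net": "www",  # bare domain aliased to the same directory
}


def document_root(host_header, basepath="/basepath"):
    """Map the requested hostname to the directory that serves it."""
    host = host_header.split(":")[0].lower()  # drop any :port, ignore case
    subdir = VHOSTS.get(host)
    if subdir is None:
        return None  # unknown host; a real server falls back to a default site
    return f"{basepath}/{subdir}"


print(document_root("burningbird.net"))
print(document_root("www.burningbird.net"))
```

The two `print` calls show the same directory, which is all “www is optional” really means: someone added the bare name as an alias.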

Just think: all that time you’ve spent typing in ‘www’, when you could have saved the keystrokes. Why, you probably could have saved enough time to go and buy a Krispy Kreme.

(Note, though, that this mapping isn’t consistent, and you may actually get errors if you omit the ‘www’. Don’t you love individualism in web access?)

So the ‘www’ isn’t mandatory. Neither is the trailing forward slash (‘/’) that you’ll see some people put at the end of the URL.

In olden times, when you used the trailing slash at the end of the URL, the browser knew that you were accessing a directory, not a file, and you saved the browser a second trip to the server to determine this. Today, though, all modern browsers treat “http://yourdomain.com” and “http://yourdomain.com/” as the same, and you get no performance benefit from the slash. If your weblog is off a sub-directory, such as “http://yourdomain.com/somedirectory/”, you will usually still get a performance benefit from the trailing slash, because without it the server typically answers the first request with a redirect to the slashed URL.
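To make that second trip concrete, here is a minimal Python sketch of the decision a server makes, loosely modeled on Apache’s mod_dir `DirectorySlash` behavior (other servers differ in details). For the site root there’s no difference at all, because HTTP clients always send at least “/” as the request path.

```python
def handle(path, directories):
    """Return (status, location) for a request path.

    `directories` holds directory paths without a trailing slash.
    """
    if path == "":
        path = "/"  # HTTP clients send "/" when the URL has no path at all
    if path != "/" and not path.endswith("/") and path in directories:
        # Directory requested without its slash: the server redirects,
        # costing the browser one extra round trip.
        return 301, path + "/"
    return 200, None  # serve the file or directory index (404s not modeled)


DIRS = {"/somedirectory"}
print(handle("", DIRS))
print(handle("/somedirectory", DIRS))
print(handle("/somedirectory/", DIRS))
```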

Still, the trailing slash is one more difference in our URLs. At this point we have the following variations, all pointing to the same web page:

http://www.yourdomain.com
http://www.yourdomain.com/
http://yourdomain.com
http://yourdomain.com/

But there’s yet another variation: explicitly specifying a file.

For most of us, our weblogs are located in a page named ‘index.someextension’. It could be ‘index.html’ or ‘index.htm’ or ‘index.php’ and so on, but it is the index file, which is the default file to load when a directory is specified without a file name (this differs slightly based on web server and configuration).

To load my weblog, you can access “http://weblog.burningbird.net”, and you’ll get “http://weblog.burningbird.net/index.php”, because my web server is configured to look for files in the following order:

index.html
index.htm
index.php
and so on

As long as I don’t accidentally include an ‘index.html’ file in my directory, you’ll get the index.php page instead.
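In Apache, that search order comes from the `DirectoryIndex` directive. Here is a minimal Python sketch of the lookup, using the order listed above, that shows why a stray ‘index.html’ would win over ‘index.php’.

```python
# Search order taken from the list above; real servers read it from config.
DIRECTORY_INDEX = ["index.html", "index.htm", "index.php"]


def pick_index(files_in_dir, order=DIRECTORY_INDEX):
    """Return the first default file present in the directory, or None."""
    for name in order:
        if name in files_in_dir:
            return name
    return None  # no index file: the server may list the directory or refuse


print(pick_index({"index.php", "style.css"}))
print(pick_index({"index.php", "index.html"}))
```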

By not specifying the file name, I can change the type of file, from index.html to index.php, and you don’t have to change your links to me, because you’re only specifying the directory, not the file itself. In fact, if a person is using the default ‘index’ file name, you shouldn’t include it in your blogroll link to them, because the link will break if they switch to a new file format.

However, that gives us yet more variations of the URL:

http://www.yourdomain.com
http://www.yourdomain.com/
http://www.yourdomain.com/index.html
http://yourdomain.com
http://yourdomain.com/
http://yourdomain.com/index.html
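One way a tool could cope is to compare URLs by a normalized key rather than as raw strings. Here is a minimal Python sketch that collapses all six variations above into one key by dropping a leading ‘www.’, a default index file name, and any trailing slash. The set of default file names is an assumption, and since some servers don’t serve the same site with and without ‘www’, the key is only a comparison aid, not a URL to link to.

```python
from urllib.parse import urlsplit

DEFAULT_FILES = {"index.html", "index.htm", "index.php"}  # assumed defaults


def url_key(url):
    """Reduce a weblog URL to a comparison key (scheme and query ignored)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_FILES:
        path = path[: -len(last)]  # drop the default file, keep its directory
    return host + path.rstrip("/")


variants = [
    "http://www.yourdomain.com",
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/index.html",
    "http://yourdomain.com",
    "http://yourdomain.com/",
    "http://yourdomain.com/index.html",
]
print({url_key(u) for u in variants})
```

A blogroll tool could store `url_key` alongside each entry and match pings against it, so “http://www.myweblog.com/” and “http://myweblog.com” would finally meet.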

All in all, our use of URLs is about as distinct as we are, and I’m amazed that the bubble-up feature of blogrolling.com works at all.

To work around these challenges, I added people to my blogrolling.com list when they showed up on weblogs.com, using the URL format they used in their pings. In addition, I checked each person’s permalinks to see whether they used ‘http://www.domainname.com’ or ‘http://domainname.com’, and so on. It became a treasure hunt in a way, but the golden egg in this hunt is a URL that correctly bubbles up when the person updates.

BUT…

This has left my Talkback feature in a difficult state. The reason is that the URL you use to ping weblogs.com, usually generated by your weblogging tool, isn’t the same URL you used in my comments. So you might bubble up to the top of my blogroll, but querying Talkback with the blogrolling.com-supplied URL shows no comments.

Pain in the butt.

What we need is consistency. Perhaps we need a URL cleanup day, to clean up the URLs we use in our blogrolls. And a common guideline for URL usage, such as the following:

Use ‘www’ only if you need to; you don’t need it unless your page doesn’t resolve without it.
Use the default ‘index.extension’ filename for your weblog main page.
If the default filename is used, don’t include it in the blogroll link. Including it forces the weblogger to use redirection if they want to change to a different page format.
Use the same URL in your comments that you use when pinging weblogs.com or blo.gs. In fact, be consistent with your weblog URL regardless of where you use it.

I’m not going to say it…

Post author By Shelley Powers
Post date February 22, 2003

Unfortunately, I came down with the same flu that’s hit so many others. This and a nasty snowstorm that just blew in are conspiring to keep me from my much-needed walks and explorations, dammit. So I might as well work on the edits for Practical RDF, and some tech tweaks around here.

One tweak: I upgraded to Movable Type 2.62. I had to re-apply my Trackback re-build customization and make one small change to the search template, but other than that, the upgrade went without a problem.

I haven’t started playing with any of the new features yet, but I did notice the button to add Creative Commons licenses to your weblog. However, before you touch that button, you might want to read about the experiences of an MT user that Phil documented. It seems someone decided to play around with the license, only to find out you can’t remove it from the page. There is no off switch. Currently, the only way to remove CC licenses in MT is to make a change in the database and, it sounds like, in the templates.

Big ouch, there. It sounds like Ben is working on the problem, but I wouldn’t push any buttons until the fix is made.

However, I do believe there is a misunderstanding in how the license is being interpreted. From what some of the legal beagles hereabouts have said, if you apply a CC license to a work and then withdraw it, the withdrawal doesn’t affect people who have already used your work. However, the CC license no longer applies to new uses of the work. At least, this is my understanding of what others have said.

(I wish I could remember who said this and where. This is one of those times when we need to be able to track a thread regardless of whether trackback or other technologies are used.)

Additionally, and legal people, correct me if I’m wrong, in Phil’s comments the person who had mistakenly applied the license stated:

It would have only granted an “irrevocable license” on any new material published while the CCL was still displayed. (The content published previously remains protected by its original copyright since it predates the CCL and cannot be covered by the CCL legal agreement.) Since no new content was published under the Creative Commons License while it was briefly displayed on the site, the license’s addition to the page and later removal is mute.

Sorry, but from what I hear, the CC license applies to whatever it’s attached to, regardless of the date of the material. Unless you specifically note that the license is only effective on material dated after such and such a date, or only on the design of the site, or only on the writing, or only on the images, that license applies to everything. And if you mix CC licenses and copyright on the same page, from what I’ve been told, the person can pick which license they use your material under. So long, copyright.

Dammit, call me cranky from being sick and missing out on my walk, but the point I’ve been making is that the license is too damn confusing for people who aren’t legal experts. I think adding CC license support to Movable Type is a mistake, pure and simple. This is a case of technology and law mixing to the detriment of both. It adds new toys, but toys that can bite you on the butt.