Weblogging

I recently installed Planet software in order to provide one common feed from all my different web sites. I like separating out my different interests into different web sites. However, for those who are interested in keeping up with the various writings, having to subscribe to multiple feeds can be irritating. Enter Planet Planet.

Planet is a feed aggregator: an application that aggregates several feeds into one, and then provides a display of the result. The application uses Mark Pilgrim’s Python feed parser to parse any valid RSS 2.0, RSS 1.0, or Atom feed. It then combines the data into one coordinated whole, which is cached and published as both feeds and an XHTML web page. To simplify the page generation, the software uses Tomas Styblo’s templating engine, which takes tags added into a template page and transforms them into generated XHTML output.
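To make the idea of "combining feeds into one coordinated whole" concrete, here is a minimal sketch of the merge step. It is not Planet’s actual code: the merge_feeds function, the entry fields, and the sample data are all made up for illustration, and the entries are assumed to have been parsed already.

```python
from datetime import datetime


def merge_feeds(feeds, items_per_page=20):
    """Combine entries from several feeds into one newest-first list."""
    combined = []
    for feed_name, entries in feeds.items():
        for entry in entries:
            # Tag each entry with the feed it came from, as an aggregator page would.
            combined.append({"feed": feed_name, **entry})
    combined.sort(key=lambda e: e["date"], reverse=True)
    return combined[:items_per_page]


# Hypothetical, already-parsed entries; a real run would get these from the feed parser.
feeds = {
    "Just Shelley": [{"title": "Post A", "date": datetime(2006, 8, 1)}],
    "The Bb Gun": [
        {"title": "Post B", "date": datetime(2006, 8, 3)},
        {"title": "Post C", "date": datetime(2006, 7, 30)},
    ],
}

for item in merge_feeds(feeds):
    print(item["date"].date(), item["feed"], item["title"])
```

The point is only that entries from every feed end up in one list, sorted by date and trimmed to a page size.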

Though Planet Planet is written in Python, you don’t need to know any Python to use the application. All changes to the application occur through the templates, the stylesheet, and the configuration file, config.ini. You do need to have Python 2.2 on your server, but most hosting companies pre-install Python.

To start, download the Planet Planet software to your desktop and unzip it to a folder. Contained in the zipped file are several files and folders. The ones of interest right now are the INSTALL file and examples folder.

Within the examples folder, there are several folders and files. Two of the folders, basic and fancy, each provide a configuration file and an index page template: one plain, the other more elaborate. Until I create my own template, I’m using fancy, which provides a nice output page. I copied the index.html.tmpl and config.ini files from the fancy subdirectory to the main examples folder, copied the images folder and stylesheet to the output directory (discussed later), and deleted the fancy and basic folders. The resulting list of files and subdirectories in examples is now:

config.ini (configuration file)
index.html.tmpl (XHTML template file)
atom.xml.tmpl (template file for the atom feed)
foafroll.xml.tmpl (template file for FOAF roll — Friend of a Friend blogroll)
opml.xml.tmpl (template file for OPML list)
rss10.xml.tmpl (template file for RSS 1.0 feed)
rss20.xml.tmpl (template file for RSS 2.0 feed)
cache (folder)
output (folder)

You’ll be editing the configuration file, but first upload the application to your server; you can then edit the configuration file and upload it again. The Planet software doesn’t have to be accessible from the web: it’s run from a command (discussed later). As such, it can be installed in the top level of your site. For instance, most folks have a site structure that looks like one of the following:

/home/yourname/public_html/website

/home/yourname/www/website

The website portion (www or public_html) is accessible from the web; the home directory (/home/yourname) is not. The Planet software can be installed anywhere, but I preferred to put it in my home directory (/home/yourname).

I used my FTP application (Cyberduck, my favorite) to upload the entire Planet folder to my home directory. Since I installed Planet where it’s not web accessible, I deleted the output folder in examples. Instead, I’m having my generated Planet files put into a new subdirectory I created, planet, which is located in my home directory, among the web sites:

/home/yourname/www/planet

Most hosting companies allow creation of subdomains, and I created mine as planet.shelleypowers.com. However, you can also just access the site as a subdirectory, http://yourdomain.com/planet–makes no difference to the software.

Once I had determined where the generated files would be created, and how they would be accessed from the web (such as http://planet.yourdomain.com), the next step was to adjust the config.ini file. It can be edited using your favorite text editor (such as Notepad in Windows or TextEdit on the Mac). I’ll cover each section in turn.

The top section looks something like the following:

# Planet configuration file
#
# This illustrates some of Planet’s fancier features with example.

# Every planet needs a [Planet] section
[Planet]
# name: Your planet’s name
# link: Link to the main page
# owner_name: Your name
# owner_email: Your e-mail address
name =
link =
owner_name =
owner_email =

Next to each field, type in the value. For my site, it’s as follows:

name = Planet Powers
link = http://planet.shelleypowers.com/
owner_name = Shelley Powers
owner_email = shelleyp@burningbird.net

The name is displayed as the title of the generated output. The link is the URL for the generated output. The owner name and email are self-explanatory.
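If you want to double-check that the four values were filled in before uploading, Python’s standard configparser module can read this style of file. The following is a rough sketch, not part of Planet: the read_planet_section helper and the SAMPLE text are mine, and the sample simply repeats the values shown above.

```python
import configparser

SAMPLE = """\
[Planet]
name = Planet Powers
link = http://planet.shelleypowers.com/
owner_name = Shelley Powers
owner_email = shelleyp@burningbird.net
"""


def read_planet_section(text):
    """Return the [Planet] settings as a plain dictionary."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["Planet"])


settings = read_planet_section(SAMPLE)
required = ("name", "link", "owner_name", "owner_email")
missing = [key for key in required if not settings.get(key)]
print(settings["name"], settings["link"], "missing:", missing)
```

To check a real file, read its contents into a string and pass that in place of SAMPLE.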

The next section looks like the following:

# cache_directory: Where cached feeds are stored
# new_feed_items: Number of items to take from new feeds
# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
# feed_timeout: number of seconds to wait for any given feed
cache_directory = /home/yourname/planet/examples/cache
new_feed_items = 2
log_level = DEBUG
feed_timeout = 20

The cache directory is where the cached feeds are stored. Give it the full path to the cache folder within your installation: in my case, my home directory, followed by planet, then examples/cache. The new_feed_items setting is the number of items to take from each newly added feed. You can raise this from 2 to as many as you prefer.

The log_level determines what is written to the Planet software log. Currently this is set to DEBUG; leave it as is. The feed_timeout is set to 20 seconds. This is how long the application waits when trying to read any one feed before timing out. I’d suggest you leave this alone, too, for now. If you find that you’re missing out on a feed, you might want to adjust this value.

The next section lists out the template files, each one separated by a space. You can use as few or as many templates as you want. In my case, I wanted the main web page (index.html.tmpl) as well as the three feeds. I wasn’t interested in the FOAF roll, or the OPML file, so I deleted them from the line.

I also had to adjust the path location for each file. This isn’t the URL; this is the actual file location, such as:

/home/yourname/planet/examples/index.html.tmpl

My template_files setting at this point is:

template_files = /home/yourname/planet/examples/index.html.tmpl /home/yourname/planet/examples/atom.xml.tmpl /home/yourname/planet/examples/rss20.xml.tmpl /home/yourname/planet/examples/rss10.xml.tmpl
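Typing that long line by hand invites typos, so here is a small sketch that builds it from a base directory and a list of template names. The template_files_line helper is hypothetical; the paths are the same placeholder paths used above.

```python
import posixpath


def template_files_line(base_dir, templates):
    """Build the space-separated template_files setting from file names."""
    paths = [posixpath.join(base_dir, name) for name in templates]
    return "template_files = " + " ".join(paths)


print(template_files_line(
    "/home/yourname/planet/examples",
    ["index.html.tmpl", "atom.xml.tmpl", "rss20.xml.tmpl", "rss10.xml.tmpl"],
))
```

Since the entries are separated by spaces, this only works when none of the paths contain spaces.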

The next section contains several settings, including the output directory. You’ll want to provide the full file path for the output directory, such as /home/yourname/www/planet/. This is where the generated files will be put, not the Planet software installation.

You can also define how many items you want per page (default is 20), the date format, the type of encoding, and the locale. Unless you have a specific reason to alter the date formatting or encoding, I’d leave them as is. You can set the locale to whatever is appropriate for you, such as fr_FR.UTF-8 if the result will be in French. Otherwise, you can leave it as the default, which is ‘C’, a Python default value, using the server’s locale setting.
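To see how a date format and a locale work together, here is a small standalone sketch using Python’s locale module and strftime. The format_item_date helper and the format string are my own illustrative assumptions, not values taken from Planet’s configuration.

```python
import locale
from datetime import datetime


def format_item_date(when, date_format, locale_name="C"):
    """Format a date using the month and day names of the given locale."""
    # Switch only the time-related locale category, which is all strftime needs here.
    locale.setlocale(locale.LC_TIME, locale_name)
    return when.strftime(date_format)


print(format_item_date(datetime(2006, 8, 7, 14, 30), "%B %d, %Y %I:%M %p"))
```

With the ‘C’ locale the month name comes out in English. Passing a name such as fr_FR.UTF-8 only works if that locale is installed on the server; otherwise setlocale raises locale.Error.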

Following is a section where you can provide template specific settings if you want to use different settings for each template. For instance, if you want your RSS 2.0 feed to be in French, you could define a locale setting specific to this template. Right now, though, we’re using the default templates and settings, so we’ll leave this section blank.

The next section’s settings are self-explanatory: how many days to display, and whether to disable a feed that’s been inactive for a set number of days. The section after provides a default width and height setting for photos (or other images) if you want to associate an image with each feed. To see an example of Planet using face images, check out Planet Gnome, which also incorporates a nice use of CSS.

The last section lists the feeds. For each, you provide the feed URL, and optionally, the feed name, face image (or icon image), and custom width and height for the image if it differs from the default. There’s more that can be defined in this section, but I’ll cover that in a later posting. For now, in my installation, I have the following:

[http://words.einsteinslock.com/feed/atom/]
name = Just Shelley
# pick up the default facewidth and faceheight

[http://bbgun.burningbird.net/feed/atom/]
name = The Bb Gun

[http://scriptteaser.com/feed/atom/]
name = ScriptTeaser

Contained in square brackets is the feed URL, followed by the feed name.
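Because each feed is just a section whose header is the feed URL, the list is easy to read back with Python’s configparser. This is a hedged sketch for checking your own file, not how Planet itself processes feeds: the list_feeds helper is hypothetical, and treating every section that starts with http:// or https:// as a feed is my assumption.

```python
import configparser

SAMPLE = """\
[Planet]
name = Planet Powers

[http://words.einsteinslock.com/feed/atom/]
name = Just Shelley

[http://bbgun.burningbird.net/feed/atom/]
name = The Bb Gun

[http://scriptteaser.com/feed/atom/]
name = ScriptTeaser
"""


def list_feeds(text):
    """Return (url, name) pairs for every section that looks like a feed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [
        # Fall back to the URL when a feed section has no name setting.
        (url, parser[url].get("name", url))
        for url in parser.sections()
        if url.startswith(("http://", "https://"))
    ]


for url, name in list_feeds(SAMPLE):
    print(name, "<-", url)
```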

Once the config.ini file is edited, it’s uploaded to the directory, usually the same location where the template files are located. Then it’s just a matter of running the application.

If you have command line access to your server, through SSH or via a web application, run the application as follows:

python /home/yourname/planet/planet.py /home/yourname/planet/examples/config.ini

Specify the full pathname of both the application and configuration file, to ensure both are found. If all goes well, you should see the generated pages in your output directory. Otherwise, you’ll see an error in the output. This is usually fairly easy to debug: most of the errors are the application not being able to find the template or configuration file, or not being able to generate the output to the output directory. In other words: path information.
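Since most problems come down to paths, a quick check before running can save some head scratching. The sketch below reads a configuration and reports any template, cache, or output path that doesn’t exist. The find_path_problems helper is hypothetical, and I’m assuming the output directory setting is named output_dir and sits in the [Planet] section alongside the others; adjust the names to match your file.

```python
import configparser
import os


def find_path_problems(config_text):
    """Return a list of human-readable problems with the configured paths."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    planet = parser["Planet"]
    problems = []
    # template_files is a space-separated list of full file paths.
    for template in planet.get("template_files", "").split():
        if not os.path.isfile(template):
            problems.append("missing template: " + template)
    # output_dir is an assumed setting name; cache_directory appears in the file above.
    for key in ("cache_directory", "output_dir"):
        directory = planet.get(key, "")
        if not os.path.isdir(directory):
            problems.append(key + " is not a directory: " + directory)
        elif not os.access(directory, os.W_OK):
            problems.append(key + " is not writable: " + directory)
    return problems


sample = """\
[Planet]
template_files = /home/yourname/planet/examples/index.html.tmpl
cache_directory = /home/yourname/planet/examples/cache
output_dir = /home/yourname/www/planet
"""
for problem in find_path_problems(sample):
    print(problem)
```

An empty list means every path was found; anything printed points at the setting to fix.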

Whether you don’t have command line access, or you do and the application runs fine, it’s time to set up the application to run on a regular basis. Most hosts provide an option to add cron entries. A cron ‘job’ is one that runs at a regularly scheduled time. If you use cPanel to manage your site, use the following steps:

Access the cron setup page.

There are two ‘styles’ of cron setups; select Standard.

In the page that opens, enter an email address where the Planet software output is to be sent. Once the application is running properly, you’ll most likely want to remove this.

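Next add the command line to be run. It will look something like the following (if planet.py isn’t marked as executable on your server, put python in front, as in the command line example earlier):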

/home/yourname/planet/planet.py /home/yourname/planet/examples/config.ini

Then pick the times when the application is run. If you’re only just testing the application, have the application run every five minutes, so you can check out the output after making edits. Otherwise, set how often the cron job will run by setting the minute, hour, and day of week when the application is run. My own is set up to run at the top of the hour, every hour, every day of the week. Unless you have a great number of feeds, this should be sufficient.
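If your host has you edit the crontab directly rather than through a form, the same schedule is written as five time fields followed by the command. Here is a minimal sketch that builds such a line. The crontab_entry helper is hypothetical, and the */5 step syntax is an assumption about your host’s cron, though most common cron implementations accept it.

```python
def crontab_entry(command, minute="0", hour="*", day_of_month="*",
                  month="*", day_of_week="*"):
    """Build one crontab line: five time fields followed by the command."""
    return " ".join([minute, hour, day_of_month, month, day_of_week, command])


command = ("python /home/yourname/planet/planet.py "
           "/home/yourname/planet/examples/config.ini")

print(crontab_entry(command))                # top of the hour, every hour, every day
print(crontab_entry(command, minute="*/5"))  # every five minutes, for testing
```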

At this point, you should have a working Planet installation, using default settings, templates, and stylesheet. All of this, without once touching Python code.

You can use the software to aggregate the feeds from your sites, as I do. Or you can use the software to aggregate feeds from other sites. A popular use of Planet is to aggregate sites based on topics–here’s your chance to aggregate your favorite food, literature, political, tech, whatever sites. You can even create multiple aggregations, as I do for my feeds and my comments. One installation of Planet can process any number of aggregations–all you need is separate config.ini files.
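Running several aggregations from one installation amounts to running the same script once per configuration file. The sketch below just builds the command lines. The planet_commands helper and the second configuration path (for a comments aggregation) are hypothetical.

```python
def planet_commands(planet_script, config_files, python="python"):
    """Return one command (as an argument list) per configuration file."""
    return [[python, planet_script, config] for config in config_files]


commands = planet_commands(
    "/home/yourname/planet/planet.py",
    [
        "/home/yourname/planet/examples/config.ini",
        "/home/yourname/planet/comments/config.ini",  # hypothetical second aggregation
    ],
)

for command in commands:
    print(" ".join(command))
    # To actually run one on the server: subprocess.run(command, check=True)
```

Each configuration file would need its own cache and output directories so the aggregations don’t overwrite each other.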

To recap:

Download the Planet software and unzip.

Copy the files to an installation directory. This location does not need to be accessible via the web.

Create a location for the output files. This output directory must be accessible via the web. Convention has it that the directory is named ‘planet’ but you can use whatever you want.

Determine whether you’ll use the plain or fancy template and copy the index.html.tmpl file and config.ini to the same location as the other template files. If using the fancy template, copy the images subdirectory and stylesheet to your newly created output directory.

Edit the config.ini file providing, at a minimum, Planet name and URL, owner name and email, actual path location of the template files and output directory, and a list of feeds.

Copy the config.ini file to the installation directory.

Run the application, either from the command line or as a cron or other scheduled job. You may have to get your hosting company’s help with this.

I have to agree with the Herd on this one: AOL screwed the pooch by releasing its actual search results. Even if the data is ‘anonymized’ (is that a new Web 2.0 word?), it shows a betrayal of confidentiality that’s going to end up costing the company big time. AOL is trying to paint itself as a newer, hipper company with its recent weblogger hirings, as well as monetizing (another new Web 2.0 word?) of linkers. This at a time when its customer service will eventually become a verb in modern dictionaries: to AOL a customer (i.e. argue with a customer, abuse the customer, not let them make legitimate changes in their account, and now, betray the customer’s confidentiality.)

There are ways to provide search term data that don’t rely on exposing actual search terms, many of which include names, phone numbers, and addresses (and other associated information in unrelated searches that could prove embarrassing). This is pure hype; attention grabbing stuff. Bad juju, may your CDs burn in hell, AOL behavior.

This site is where I first read the story (via Seth), and this is the site that seems to have broken the story. I wanted to give credit where it’s due, since certain A listing sites seem to think this is unnecessary.

(I’m pointing to Search Engine Watch’s post on the topic because of the last sentence in the post, related to a person’s reaction to the news: Want more wow. Though I don’t think we’re discussing the same thing, I want more wow, too.)

update

AOL has officially apologized.

Image of little boy, looking guilty as hell, one hand in grubby jeans, the other hand’s thumb tucked into the urchin’s mouth. He kicks his foot in the dirt, looks down, cheeks turned red, and mumbles, “I’m sowwy”. He then holds his arms up, wanting to be picked up and held, and told, “That’s OK, we still love you. Just don’t set fire to the cat again.”