First of all: isn’t there anything in any of the syndication feed specs so that, when a syndicated item returns a 404 or the like, some indication is made? Shouldn’t there be?
In the meantime, I’ve been putting some thought into what I can do with S3. If you’ve been living under a rock (or conversely, are non-tech and go out to the park and stuff on the weekend), S3 is a very cheap mass storage system that Amazon is providing. You pay a few bucks a month and get bunches of space and bandwidth. The only thing is, you have to store data using web services–it’s not a regular hosting system.
I thought this would be a perfect place to put my RDF files. You can’t store database data at S3, which limits the types of data you can store. But I don’t store my RDF data in a database. Each model is stored as a separate file, which would be simple to move to the storage. Only thing is, I have plenty of space between the two servers I have now: my shared system for the weblog and my development server.
I could put my pictures on S3, but it took time for me to find a way to pull all of these back from Flickr AND modify the URLs in my posts. I’m not of a mind to do the URL thing again.
I could store my gmail email on S3, but I deleted the account. Actually, I’ve deleted most of my centralized accounts.
That space demands media files. Only problem is, I’m not a real media person–outside the pics. I don’t think I’m going to get heavily into podcasts. I don’t have a video camera.
As for storing my personal computer data at S3, I have a DVD burner; I have blank discs.
The more I think on it, the more I think S3 would be a good spot for RDF data. Not just the RDF that helps run my site–RDF I download, or RDF I scrape from other sites, or RDF I pick up here and there. Then, when I need the data, since the models are stored as separate files, it would be easy to access the data, and update it if necessary.
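A minimal sketch of the one-file-per-model idea, in Python. Everything named here is an assumption for illustration: `ModelStore` is an in-memory stand-in for a bucket (real S3 access would go through Amazon's web service API), and the `rdf/<name>.rdf` key layout is hypothetical.

```python
# Hypothetical stand-in for an S3 bucket: a dict mapping object keys to bytes.
# A real bucket would be reached through Amazon's web service API instead.
class ModelStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def keys(self):
        return sorted(self._objects)


def model_key(name):
    """Map a model name to its own object key (one file per model)."""
    return f"rdf/{name}.rdf"


def save_model(store, name, rdf_xml):
    """Store or overwrite a single model without touching the others."""
    store.put(model_key(name), rdf_xml.encode("utf-8"))


def load_model(store, name):
    """Fetch one model's RDF/XML as text."""
    return store.get(model_key(name)).decode("utf-8")
```

Because each model lives under its own key, updating one is just a matter of writing that one object again; nothing else in the store has to be read or rewritten.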
This doesn’t work with the microformat stuff, as this type of metadata is stored directly in the pages. RDF, on the other hand, can be associated with our web pages or other files, but stored in an external location.
The key is not to provide public access to the data on S3. I don’t control the domain name, I am unaware of how one can assign a domain name for an individual piece of storage, and there is no guarantee the data will live there forever. It’s hard enough preventing 404 errors when I do host the files, much less when I don’t.
Instead, I’ll pull the data from S3 to my server, and then serve it directly from my domains. If I then decide to move the files, I just pull the data and put it somewhere else.
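One way to picture the arrangement, as a rough Python sketch rather than a real server: the public URL stays on the site's own domain, and the storage behind it is a pluggable `fetch` function. The names `make_handler`, `s3_like`, and `local_copy` are hypothetical, and plain dicts stand in for both the private S3 storage and whatever replaces it.

```python
def make_handler(fetch, prefix="/rdf/"):
    """Build a handler that serves models from the site's own URL space.

    `fetch` is any callable that takes a storage key and returns bytes,
    raising KeyError when the object is missing.
    """
    def handle(path):
        if not path.startswith(prefix):
            return 404, b""
        key = path[len(prefix):]
        try:
            return 200, fetch(key)
        except KeyError:
            # Missing in storage: the 404 is answered by the site itself.
            return 404, b""
    return handle


# Two interchangeable (hypothetical) backends: private remote storage,
# and wherever the files end up after a move.
s3_like = {"site.rdf": b"<rdf:RDF/>"}
local_copy = dict(s3_like)  # "moving" the files is just copying them

handler = make_handler(s3_like.__getitem__)
moved_handler = make_handler(local_copy.__getitem__)
```

Readers only ever see paths like `/rdf/site.rdf` on the site's domain, so swapping `s3_like` for `local_copy` changes nothing about the published URLs.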
As for security and confidentiality of data–heck, people have been bitching about how unreadable RDF/XML is for years. Now when they say it, I can smile, tell them it’s a perk.