Categories
Technology Weblogging

Web Server 101 for Ghost

update

I disabled the Ghost weblog because I don’t want to maintain two different weblogging software applications, and Drupal is my primary weblogging software.

The configuration listed here was sound and reliable for well over six months.

---

Recently I installed Ghost on my server. Ghost is a Node.js weblogging tool that you can host yourself or have another organization host for you.

It’s not a complicated application to install. There’s even a How to Install Ghost web site, with handy instructions for installing Ghost in various environments.

Take my environment, which is Ubuntu Linux hosted on a Linode VPS. You could order a brand new VPS, but chances are your server is already set up, and you’re just looking to install Ghost.

You have to install Node.js first, of course. Then you download the Ghost source code, and install the application, using the following steps:

mkdir -p /var/www/
cd /var/www/
wget -O ghost.zip https://ghost.org/zip/ghost-latest.zip
unzip -d ghost ghost.zip
cd ghost
npm install --production
cp config.example.js config.js

Next you edit the configuration file, changing the default IP and port values to ones that allow people to access your new Ghost weblog:

host: '[your Linode public IP]',
port: '80'

Lastly, start that puppy up:

npm start --production

 

Now, you’re all set, good to go, blasting full ahead on all cylinders…

…right for that wall that is about to hit you, hard, in the face.

A reality check on that environment

Let’s back this sweet scenario up to, oh, step 1: your environment.

True, you could be installing Ghost in a brand new environment with nothing else installed. Chances are, though, you’re installing Ghost in an existing system that is already running a web server. Chances are also good the web server you’re running is Apache, which is running on port 80, the same as your new Ghost installation.

So what happens when you try to run Ghost on the same port you’re using with Apache?

If you try to start the application with npm, you’ll get an almost immediate error response signaling that an unhandled error event has occurred, and to contact the Ghost author. If you try to start the Ghost application via another tool, such as forever, you may not see an immediate problem at the command line, but if you access the Ghost site via a browser, you’ll get a 503 “Service Temporarily Unavailable” error.

You’re getting an error because you can’t run both web services on the same port. You then have two options to run your Ghost weblog. Option 1 is to shut down Apache or install Ghost on a machine where Apache is not installed, and then you can run Ghost at port 80. The second option is to find a way to get Apache (or other web server) and Ghost to co-exist.
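If you want to confirm that something already holds port 80 before you start Ghost, a small check along these lines can help. This is only a sketch: port_is_listening is a helper name I made up, and it assumes the usual column layout of `ss -ltn` output (state in the first column, local address and port in the fourth).

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds if
# some socket is listening on the given TCP port.
port_is_listening() {
    awk -v port="$1" '
        # Column 1 is the socket state, column 4 the local address:port.
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage, on the server itself:
#   ss -ltn | port_is_listening 80 && echo "port 80 is already taken"
```

On a machine where Apache is running, the usage line should report that port 80 is taken.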

Let’s take a look at that first option: running Ghost in a stand alone environment at port 80.

Running Ghost in a Stand Alone Environment

If you don’t have Apache installed, and no other web service is running at port 80, you could follow the steps I gave earlier, and you might expect Ghost to start. You might, but you could be in for a surprise.

If you try to start Ghost with a line such as the following:

npm start --production

You’ll most likely get an error. If you try to start Ghost up directly, using Node, you’ll find that the error is EACCES, “permission denied”. The reason why is that port numbers below 1024 are what is known as privileged ports, requiring root permission. When you tried to start up Ghost as a normal user, you lacked the permissions to start the service.
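The privileged port rule can be written down as a tiny check. This is a simplified sketch, not how the kernel decides: can_bind_port is a made-up helper, the second argument (a user ID) exists only so the rule can be tried out without being root, and Linux capabilities and sysctl settings that can relax the rule are ignored.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a user with the given UID could
# traditionally bind the given port. Ports below 1024 need root (UID 0).
can_bind_port() {
    port=$1
    uid=${2:-$(id -u)}   # defaults to the current user
    [ "$port" -ge 1024 ] || [ "$uid" -eq 0 ]
}

# Usage:
#   can_bind_port 80   || echo "port 80 needs root"
#   can_bind_port 2368 && echo "port 2368 is fine for a normal user"
```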

You might be tempted to use sudo to start the Ghost application:

sudo npm start --production

And yes, Ghost should be alive and well and running on port 80. But this command just triggered a visceral response from security minded web developers, because Ghost (which is currently in beta development and unlikely to be hardened) is now running with root privileges. Not even Apache, hardy veteran of the web wars that it is, responds to web requests with processes running with root privileges.

Behind the scenes, when you start Apache as root, the parent process spawns child processes that handle web requests. These child processes run with the privileges of a non-root user, typically www-data. If you’re running a Linux server, you can issue the following command and see that one Apache process is running as root, while other child processes are running as the Apache user (www-data).

ps -ef | grep apache

Other web servers, such as Nginx, operate under the same principle.
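To make the root parent and non-root children easier to see, you can count the Apache processes per user. A rough sketch, assuming the processes are named apache2 (as they are on Ubuntu) and that the input comes from `ps -eo user,comm`; apache_owners is a name I made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -eo user,comm` output on stdin and prints
# one "user count" line for each user that owns apache2 processes.
apache_owners() {
    awk '$2 ~ /^apache2/ { count[$1]++ }
         END { for (user in count) print user, count[user] }' | sort
}

# Usage, on the server itself:
#   ps -eo user,comm | apache_owners
```

You would expect a single process owned by root, and the rest owned by the Apache user.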

So, what do you do if you want to run Ghost at port 80?

Well, you can start Ghost up on port 2368 (the default port) and then go where most folks don’t dare to go by tweaking iptables to redirect port 80 to port 2368, or something similar. Bottom line, though, is you just don’t run Ghost as a stand alone web server/application, period.
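For completeness, this is roughly what the iptables redirect would look like. It's a sketch of one common form of the rule, not something I'm running: redirect_rule is a made-up helper that only prints the command, so you can look it over before deciding whether to run it with root privileges.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a NAT rule that redirects
# incoming TCP traffic from one port to another on the same machine.
redirect_rule() {
    from_port=${1:-80}     # port visitors connect to
    to_port=${2:-2368}     # port Ghost listens on (its default)
    echo "iptables -t nat -A PREROUTING -p tcp --dport $from_port -j REDIRECT --to-ports $to_port"
}

# Print the rule for review.
redirect_rule
```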

In an issue thread at GitHub, the architects behind Ghost have stated that they see Ghost as an application, not a web server. As such, they don’t recommend that people run it directly, as a web server. Instead, they recommend using Nginx or Apache as the web server, and then reverse proxy to Ghost. Both Nginx and Apache are hardened from a security perspective, and both do a better job providing other services the Ghost folks won’t be providing.

The Ghost folk recommend using Nginx, and you can use Nginx as reverse proxy for both Apache and Ghost if you’re hosting Apache applications. For myself, I don’t need the extra level of complexity or the performance boost that Nginx can provide for static files, so I decided to go with Apache as reverse proxy. I’ll cover the steps I took, but first, let’s return to that running instance of Ghost.

Running Ghost Forever

My Ghost weblog is currently running at Shelley’s Toy Box. Shelley’s Toy Box is my hack and trash domain where I’ll try new things. I thought about getting a separate VPS for it, but can’t really afford it at this time, so I’m making do with a virtual host for now.

Ghost is running at port 2368, the default. Yes, this means you could also access the Ghost weblog at http://burningbird.net:2368, and completely bypass Apache doing so, but we’ll get to that in a little bit. For now, I have started Ghost using forever:

forever start -l forever.log -o out.log -e err.log index.js

If Ghost crashes and burns, forever should start it back up. However, if the server, itself, gets re-booted, Ghost isn’t restarted. Luckily, since I’m running Ubuntu, I can use Ubuntu’s Upstart to re-start Ghost with forever.

I created a simple file named ghost.conf in /etc/init, with the following text:

# /etc/init/ghost.conf
description "Ghost"

start on (local-filesystems)
stop on shutdown

setuid your-userid
setgid your-grpid

script
    export HOME="path-to-ghost"
    cd path-to-ghost
    exec /usr/local/bin/forever -a -l path-to-logfiles/forever.log --sourceDir path-to-ghost index.js

end script

Now when my system re-boots, the Ghost weblog restarts. And it restarts as my non-privileged user thanks to the setuid and setgid.

Using Apache as reverse proxy

Apache has a good description of forward and reverse proxies, but simply stated, a reverse proxy allows web users to access both Apache and Ghost on our servers, seemingly on the same port 80.

I’ve not used Apache as a reverse proxy before, so this is new territory for me. After going through variations that left my server crawling on its knees, I found a routine that seems to work well, or at least, not work badly.

First, I had to enable both mod_proxy and mod_proxy_http:

sudo a2enmod proxy
sudo a2enmod proxy_http

I’m not turning on forward proxying, so didn’t uncomment any of the lines within the proxy.conf file in Apache’s mods-available subdirectory.
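If you want to double check that both modules are active after enabling them, you can look for them in the module listing. A small sketch, assuming the listing comes from `apache2ctl -M`, where the modules show up as proxy_module and proxy_http_module; modules_loaded is a helper name I made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `apache2ctl -M` output on stdin and succeeds
# only if every module named on the command line appears in it.
modules_loaded() {
    listing=$(cat)
    for module in "$@"; do
        # Each listing line looks like " proxy_module (shared)".
        printf '%s\n' "$listing" | grep -q "^ *${module} " || return 1
    done
}

# Usage, on the server itself (after reloading Apache):
#   apache2ctl -M | modules_loaded proxy_module proxy_http_module \
#       && echo "proxy modules are loaded"
```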

I already have several virtual hosts configured, so it was a simple matter of creating another for Shelley’s Toy Box. In the settings, I turn ProxyRequests off, just because I’m paranoid (it shouldn’t be on), and then add my ProxyPass and ProxyPassReverse settings:

<VirtualHost ipaddress:80>
    ServerAdmin shelleyp@burningbird.net
    ServerName shelleystoybox.com

    ErrorLog path-to-logs/error.log
    CustomLog path-to-logs/access.log combined

    ProxyRequests off

    <Location />
            ProxyPass http://ipaddress:2368/
            ProxyPassReverse http://ipaddress:2368/
    </Location>
</VirtualHost>

I specify my ip address for the virtual host, because I have used two IP addresses in the same server in the past, and may again in the future.

Once I enable the site and reload Apache, I’m good to go.

a2ensite shelleystoybox.com
service apache2 reload

The only issue left is the fact that people can also access the Ghost weblog directly using other domains and the 2368 port (i.e. http://burningbird.net:2368). Since I’m running Ubuntu and am using iptables, I add the following rule:

iptables -A INPUT -i eth0 -p tcp --dport 2368 -j DROP

This prevents direct access of port 2368, while still allowing Apache to proxy requests to the port. I maintain it between boots using iptables-persistent. However, it is modifying iptables, so you have to balance modifying iptables against people accessing the site via another site domain and direct access of the port.
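After a reboot, it’s worth confirming that the rule is still in place. A minimal sketch, assuming the rules are listed with `iptables -S INPUT`, which prints each rule in the same form used to add it; port_is_dropped is a helper name I made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `iptables -S INPUT` output on stdin and
# succeeds if there is a DROP rule for the given destination port.
port_is_dropped() {
    grep -Eq -- "^-A INPUT .*--dport $1 .*-j DROP\$"
}

# Usage, on the server itself:
#   sudo iptables -S INPUT | port_is_dropped 2368 \
#       && echo "direct access to port 2368 is blocked"
```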

Ghost as Weblog

Ghost is an important addition to the Node.js community because weblogging software appeals to a broad range of interests and experience levels. The latter is particularly important because we’re now seeing with Ghost the issues people face when running a Node.js web application, particularly in a world where Apache is so ubiquitous. Ghost is a learning experience, and not just for the Ghost developers or users.

Having said that, at this time I don’t recommend Ghost for people who are only looking for a weblogging tool. Ghost is still in the early stages of development and lacks much of the basic functionality we’ve come to associate with weblogging software. However, if you’re interested in Node.js development and are looking to get in on the ground floor of a weblogging application (before all the complex bells and whistles are added), Ghost can be a fun and educational alternative to Drupal or WordPress.

Whatever you do, though, don’t run Ghost with root privileges. It’s no fun to get your butt bitten off.

Categories
Documents Legal, Laws, and Regs

Don’t Mess with one of the E-Discovery Triumvirate

I dabble more than a little in the legal world, but that’s OK, because the legal world dabbles quite heavily in the world of technology. Nowadays, metadata is the smoking gun in court, and e-discovery is the ballistics test that uncovers it.

The concept of e-discovery, or electronic discovery, is simple: it is the discovery, identification, and production of electronically stored information (ESI). However, the execution can be involved, complex, and frequently contentious.

Take for example something seemingly simple and benign: the keyword search. If you and I want to find out about something online, we open up Google or Bing and type in some words, such as “e-discovery keyword search”. We typically get back a ton of links, in order of relevancy. We pick and choose from among the links to find what we need. Rarely do we have to go beyond the first few pages to get the information or resources we’re looking for.

In a legal case, though, what keywords are used can trigger a conference between parties, and even hearings with the judge. If there’s too much material produced, both parties may want to refine the keywords; too little material produced, and the parties may question what keywords were used, or whether the use of keywords is even useful.

In a white paper titled Where Angels Fear to Tread: The Problems of Keyword Search in E-Discovery (pdf), the author notes:

The heavy reliance on keyword search in e-discovery places an enormous burden on today’s legal teams. Inconsistencies in language, inefficiencies in search techniques and software user interfaces, which conceal more than reveal, place the attorney in a difficult position: determining what is relevant in a compressed timeline using obsolete tools and tactics. These outdated tools are a key factor behind the spiraling costs and risks associated with e-discovery.

There’s an entire science devoted to keyword searches within the legal community. As for other metadata, oh my goodness, let’s not even get started.

The use of e-discovery was an important component of the Ringling Brothers/animal welfare group Endangered Species Act case (now titled “AWI et al v. Feld Entertainment”). It has continued as an important component of the fees allocation process for this same case.

In a decision that is both unusual and controversial, the judge in the case, Judge Emmet Sullivan, decided that the animal welfare groups should pay attorney fees to Feld Entertainment for the 9+ year court case. After many months, Feld’s lawyers submitted their fee request in a set of filings spanning thousands of pages. (See my copy of the case history, starting with docket number 635.) Not only is the $25 million (and change) fee request large, it’s also been provided in an unhelpful format: PDF documents with manual redactions, and color coding (example).

The animal welfare groups asked for something a little more useful:

The Fee Petition, which spans at least four-and-a-half four-inch binders, includes nearly two thousand pages of time records and invoices as well as numerous other Excel spreadsheets and tables. The time records and invoices, accounting tens of thousands of attorney and staff hours, are so voluminous that FEI’s paid experts were unwilling to review them. Plaintiffs, unfortunately, do not have the luxury of limiting their review of the time records and invoices to a determination that the “time entries provide level of detail . . . that is typical of appropriate block billing practice,” as Mr. Millian did, see D.I. 664 at 18, or to review only a supposedly “representative sample of litigation activities” limited to three brief periods of time, as Mr. Cohen did, see D.I. 663 at 11-12. Rather, Plaintiffs and their experts must scrutinize all of the hours that Feld now seeks to pass on to them.

As Feld’s experts make clear, and as Plaintiffs’ counsel explained to counsel for Feld, this is not a task that can be accomplished by reading the PDF versions of spreadsheets and invoices that Feld included in the Fee petition. It can only be accomplished via computer assisted analysis of the underlying time records using a program such as Microsoft Excel, which will allow Plaintiffs’ counsel and/or experts to (i) sort the data, (ii) perform complex searches within the data, and (iii) mathematically compare time entries across (for example) timekeepers, law firms, and parties to the litigation.

There is no commercially available computer program that can take a PDF of an Excel spreadsheet, much less a PDF of actual invoices, and generate a functioning spreadsheet containing the underlying data. Accordingly, the only way Plaintiffs could independently recreate the time records of Feld’s counsel would be to manually reenter tens of thousands of rows of numbers and text, a process that would take even highly-experienced data entry personnel hundreds to thousands of hours. It would be patently unfair to require Plaintiffs to undertake such an effort to recreate data that Feld’s counsel already have at their fingertips. Moreover, because an analysis of Feld’s billed time is one of the first steps needed to craft Plaintiffs’ response to the Fee Petition, requiring Plaintiffs to replicate Feld’s time records would inject months of needless delay into the fee application process, in addition to creating needless, and substantial, additional expense.

Feld’s lawyer’s response begins with:

Plaintiffs’ second request is for FEI to re-create all of the time entries for Fulbright (JS Ex. 31 and 32), Covington (EG Ex. 1), and Troutman Sanders (“Troutman”) (CA Ex. 2) in sortable Excel spreadsheets because Plaintiffs say they want to “sort the data” and “perform complex searches.” Mot. at 6-8. These requests should be denied because: (1) the documents do not exist in sortable Excel format, (2) Excel format would not protect FEI’s privilege redactions that Plaintiffs cannot and do not challenge; (3) Excel format would not reflect the color-coding of the exhibits; and (4) FEI is not obligated to undertake the time, effort, and expense of creating new documents, to Plaintiffs’ specifications. It is not necessary for Plaintiffs’ response to the Fee Petition, and if they want to have such charts, they can create them themselves. JS Ex. 32, EG Ex. 1, and CA Ex. 2. These exhibits contain the time entries that were sent as part of invoices to FEI, and were produced to Plaintiffs in .pdf files, which is the same format in which they were sent to the client (or in some cases, the invoices were sent to the client in paper, in which case FEI provided a .pdf to Plaintiffs). The invoices do not, nor have they ever, existed in a sortable Excel format – a fact that FEI’s counsel represented to Plaintiffs. While the .pdf files are not sortable, however, they are word-searchable, as any Adobe document is. But as Plaintiffs themselves argue, there “is no commercially available computer program that can take …. a PDF of actual invoices, and generate a functioning spreadsheet containing the underlying data.” Mot. at 7. So Plaintiffs demand the creation of a document that does not exist, which is a requirement that is non-existent even within normal Rule 26 discovery on the merits of a case, let alone once the case has concluded and is in the final phase of assessing legal fees for frivolous and vexatious litigation.

The legal document goes on for several more pages, with the lawyers expressing increasing umbrage at the animal welfare groups’ request.

If the sheer volume of words and the level of outrage were any influence, a judge might be moved to side with Feld’s lawyer, John Simpson, from Norton Rose Fulbright. But the judge handling the fee allocation, Magistrate Judge John Facciola, isn’t just any judge. He’s one of three judges respectfully known as the e-discovery triumvirate—three men known far and wide for their expertise related to e-discovery.

And Judge Facciola was just a tad skeptical about Feld’s lawyers’ lamentations:

To that end, I will hold a one day evidentiary hearing, at which I expect knowledgeable representatives, such as billing database managers, from 1) Fulbright, 2) Covington, and 3) Troutman Sanders to be prepared to demonstrate the billing software used during their representation of FEI in the instant action. I also expect the representatives to be prepared to testify to the following issues:

1. Explain and demonstrate live (e.g. not in a PowerPoint presentation but in the actual database) how, within their particular software program(s), an individual timekeeper makes an entry; what is recorded in that entry; how that entry is saved; who reviews that entry; how that entry is edited or altered for privileges or in an exercise of billing discretion; how that altered entry is saved; and finally, in what format the final bill is sent to the client.

2. Explain why that data saved within their particular software program(s) is NOT, through the use of commercially available software, capable of being converted into a sortable Excel-compatible delimited value spreadsheet format such as comma-separated value (CSV).

3. Explain why, if there exists data that was only saved in a .PDF format, it is NOT, through the use of commercially available software, capable of being converted into a sortable Excel-compatible delimited value spreadsheet format such as comma-separated value (CSV).

A noticeably subdued response indicated that the entries in Excel spreadsheet format would be forthcoming.

Categories
Books JavaScript

JavaScript, not a ‘real language’

Simon St. Laurent and I have been discussing that exciting upcoming conference, DHTMLConf.

Party like golden sparkles following the mouse cursor is cool again!

If you’re going to JSFest, how can you not go to DHTMLConf? This is a conference celebrating a time when all of the technologies we take so seriously now were fun!

Simon is going, and is taking along a copy of his old Dynamic HTML book he managed to find. That made me dig around online for anything on my old books, including my Dynamic HTML book (1998), as well as my earlier (1996) JavaScript How-To.

I had to laugh when I saw the marketing blurb attached to the JavaScript How-To book at Amazon:

JavaScript is the ultimate in web eye-candy. It is not a real programming language such as Java, and it isn’t really essential for web site development, but it sure is a lot of fun for tinkerers.