Categories
Technology Weblogging

The Survival Guide to LAMP: MySQL and Saving the Pig

In the last two weeks, two WordPress weblogs have been suspended or moved to interim servers because of performance issues. In both cases, the ISPs that hosted the sites (different companies) included in their emails snapshots of the MySQL processes that caused the problems.

I worked with one of the sites offline, but the owner of the second site, Ampersand, posted the copy of the email he received at the WordPress support site. I grabbed a sample of it, as follows:

| 2073 | theenn2_amptoons | localhost | theenn2_MT | Query |
4 | Copying to tmp table | SELECT alas_posts.*,
MAX(comment_date) AS max_comment_date FROM alas_comments,
alas_posts WHERE alas |
| 2078 | theenn2_amptoons | localhost | theenn2_MT | Query |
4 | Copying to tmp table | SELECT alas_posts.*,
MAX(comment_date) AS max_comment_date FROM alas_comments,
alas_posts WHERE alas |

Plain as dirt what the problem is, eh?

Both of the weblogs are WordPress, but the SQL that generated the performance hit differed. With one, it was the latest comment plug-in; with the other, it was the SQL to support a category listing. In addition, this isn’t specific to WordPress: it could occur with the PHP-based version of Movable Type, ExpressionEngine, or any other MySQL-based tool that accesses the database dynamically.

Ostensibly, there is something wrong with these two sites. However, they’re only representative of what we’ll most likely see more of in the future. Our weblogging tools are becoming increasingly sophisticated, the data richer and more complex, the functionality modular and extensible by every person with a text editor and a yen–all packaged up in one reusable, standard, one-size-fits-all package. To make it all even more interesting, these applications are being installed on systems where you can get all you can eat for $5.88 a month, which means that a lot of sites need to be hosted on each server in order for the company to break even.

Things are bound to start breaking. But hey, it’s all fun, until you get that email that says your site is a pig, and it’s just been sent to the butcher.

Please! Won’t someone save the pig!

Now that we’ve established that the world is out to get our weblogs, let’s focus back on these problems, and in particular, the information that the ISPs sent.

In both cases, the messages contained the phrase “Copying to tmp table” and then what looks like a standard SQL query. If you look for this phrase in any of the WordPress code, you won’t find it. It’s a MySQL thread state that shows up only when you run SHOW PROCESSLIST within MySQL, or access the same from phpMyAdmin.
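If you want to pull one of these snapshots apart, here is a minimal Python sketch that splits a row from the email into the eight columns SHOW PROCESSLIST reports (Id, User, Host, db, Command, Time, State, Info). The function name parse_processlist_row is my own invention, and the sketch assumes the row arrives pipe-delimited and line-wrapped the way the sample above did.

```python
# Column order used by MySQL's SHOW PROCESSLIST output.
PROCESSLIST_COLUMNS = ("Id", "User", "Host", "db", "Command", "Time", "State", "Info")


def parse_processlist_row(row):
    """Split one pipe-delimited SHOW PROCESSLIST row into named fields.

    Hypothetical helper: assumes the row was copied from an email, so it
    may be wrapped across several lines.
    """
    # Collapse the line wrapping added by the mail client into single spaces.
    flat = " ".join(row.split())
    # Drop the outer pipes, then split into at most 8 cells so that a "|"
    # inside the (truncated) SQL text stays part of the Info column.
    cells = [cell.strip() for cell in flat.strip("|").split("|", 7)]
    return dict(zip(PROCESSLIST_COLUMNS, cells))


sample = """| 2073 | theenn2_amptoons | localhost | theenn2_MT | Query |
4 | Copying to tmp table | SELECT alas_posts.*,
MAX(comment_date) AS max_comment_date FROM alas_comments,
alas_posts WHERE alas |"""

fields = parse_processlist_row(sample)
print(fields["State"], "for", fields["Time"], "seconds")
```

The two fields worth reading first are State (what MySQL is doing) and Time (how many seconds the thread has been in that state). Info holds the query itself, cut off by the processlist display.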

Now, depending on your ISP’s tech person and how proficient they are with database optimization, you may be told that “Copying to tmp table” is a ‘bad’ thing and shouldn’t happen, and therefore something is wrong and the code is crappy. Well, this isn’t true.

MySQL optimizes queries before they’re executed, to get the maximum amount of data in finished format with the minimum amount of processing and time. Part of the optimization can be to build a temporary table to hold an intermediate set of data before finishing the query. In addition, if the data is sorted on one column but grouped on another, or on a column in a different table, MySQL uses a temporary table.

There is a statement, EXPLAIN, that provides information about how MySQL will execute a query. Developers use this to fine-tune the SQL so that ‘expensive’ operations are avoided. If you have access to phpMyAdmin and run a query, the option to run EXPLAIN is provided with the results. Still, you can only tweak a query so much, and sometimes even the optimal SQL results in MySQL creating a temporary table.
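To get a feel for what a query plan tells you, here is a small sketch in Python. Note the big assumption: it uses SQLite’s EXPLAIN QUERY PLAN (SQLite ships with Python) as a stand-in, not MySQL’s EXPLAIN, so the wording of the plan is not what phpMyAdmin would show you. The post2cat table and the index name are hypothetical, cut-down stand-ins for a post-to-category link table. The idea carries over, though: the plan tells you when a GROUP BY needs a temporary structure, and when an index lets the database skip it.

```python
import sqlite3


def plan_steps(conn, sql):
    """Return the human-readable steps of SQLite's plan for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the description of that step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table; not the real WordPress schema.
conn.execute("CREATE TABLE post2cat (post_id INTEGER, category_id INTEGER)")

query = "SELECT category_id, COUNT(*) FROM post2cat GROUP BY category_id"

# Without an index, the grouping has to be done in a temporary structure.
before = plan_steps(conn, query)

# With an index on the grouped column, rows can be read already in order.
conn.execute("CREATE INDEX idx_post2cat_category ON post2cat (category_id)")
after = plan_steps(conn, query)

needs_temp_before = any("TEMP B-TREE" in step for step in before)
needs_temp_after = any("TEMP B-TREE" in step for step in after)

print("no index:  ", before)
print("with index:", after)
```

In MySQL, the equivalent hint is the “Using temporary” note in the Extra column of the EXPLAIN output.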

Where the use of a temporary table is always bad is when MySQL doesn’t have enough memory to hold all of the contents of the temporary table; it then needs to copy the contents to disk. Anytime MySQL has to go to the disk rather than memory, performance takes a hit. This shows up in the processes as:

“Copying to tmp table on disk”

However, what showed up with both of the weblogs that had problems is:

“Copying to tmp table”

An open question at MySQL asks whether this state means the tmp table is in memory or on disk, but from the impact on the servers, we can probably assume it’s on disk.

If it is, then the problem could be that the tmp space allocated isn’t enough, and MySQL is having to write to the disk frequently. Or it could be, depending on your type of MySQL, that the maximum space allocated for memory is less than the maximum size allocated for the tmp space. Or it could be other variations of settings at the database level.

Or it could be a badly written plugin, or too many plugins, or a cheap host that doesn’t allocate enough space for the sites hosted, or a less than optimal SQL query, or even a trackback attack.
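The database-level settings mentioned above can be made a little more concrete. MySQL keeps an internal temporary table in memory only up to the smaller of two server variables, tmp_table_size and max_heap_table_size; past that, the table is converted to an on-disk one. Here is a rough Python sketch of that rule. The function names and all the byte values are made up for illustration, and the real size of a temporary table depends on the data, so treat this as arithmetic, not as a diagnostic tool.

```python
MB = 1024 * 1024


def in_memory_tmp_limit(tmp_table_size, max_heap_table_size):
    """Effective ceiling for an in-memory internal temporary table.

    MySQL uses whichever of the two settings is smaller.
    """
    return min(tmp_table_size, max_heap_table_size)


def spills_to_disk(estimated_bytes, tmp_table_size, max_heap_table_size):
    """True if a temporary table of this size would not fit in memory."""
    return estimated_bytes > in_memory_tmp_limit(tmp_table_size, max_heap_table_size)


# Hypothetical host configuration: a generous tmp_table_size is undercut
# by a smaller max_heap_table_size, so the real ceiling is 16 MB.
limit = in_memory_tmp_limit(tmp_table_size=32 * MB, max_heap_table_size=16 * MB)
print(limit // MB, "MB usable for in-memory temporary tables")

# A hypothetical 20 MB intermediate result would end up on disk.
print(spills_to_disk(20 * MB, tmp_table_size=32 * MB, max_heap_table_size=16 * MB))
```

This is why raising only one of the two settings may change nothing: the smaller value still wins.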

Get used to sleeping with the pig

Ampersand’s site, Alas, a Blog, gets between 1000 and 2000 visitors a day. It’s a popular site and gets lots of comments, and spam, of course, so several plug-ins were installed to help contain it. Evidently, it was one of these plug-ins that started to have problems, because the query is not part of the WordPress code. In particular, if you look for max_comment_date in WordPress, you don’t find it.

However, with the second weblog, the query is within list_cats, a built-in WordPress function, and looks like the following:

SELECT cat_ID, cat_name, category_nicename, category_description, COUNT( wp_post2cat.post_id ) AS cat_count
FROM wp_categories
INNER JOIN wp_post2cat ON ( cat_ID = category_id )
INNER JOIN wp_posts ON ( ID = post_id )
WHERE post_status = 'publish'
AND post_date_gmt < '2005-03-18 20:48:04'
AND cat_ID <> 1
GROUP BY category_id
LIMIT 0 , 30

While Ampersand’s weblog problem was being discussed in the WordPress forum, a third person also had the same problem — most likely in a completely different area.

Bacon

With the growing number of ‘cheap’ hosting sites, and an increasing use of sophisticated PHP applications and MySQL queries, not to mention the extensibility of the tools, we’ll see more of this problem–especially as we demand more and more from our tools. Think about it: how many plugins are you using with your WordPress installation?

So what can you do? For starters, you might want to look at that ‘good’ deal you get from your host. Not all inexpensive hosts cut costs and have less than optimal installations, but you’re less likely to have a host that will patiently work through problems with you if you’re only there for the $5.88 special.

You could also trim the fat by dropping plugins that you really don’t need, and make sure that whatever tools or applications or plugins you use are fully baked, i.e. have been through a healthy bug fix period.

If a problem does occur, make sure to file a report with the developers of whatever tool you’re using, providing all the information the host provides. The SQL used in the tool may not be optimum, and being informed of problems provides necessary feedback.

Hopefully this hasn’t been more info than you want or need (“too much sharing!”). At the least, if this situation comes up, you’re not going to be as intimidated when your host sends you an email that tells you …your site is a pig, fix it, or we’re kicking you off.

Categories
Technology Weblogging

Another update

I was asked to help a group weblog with its new look a week or so ago and it ended up being more work than I expected. However, I finally finished that work today — really, really finished– and have started my final descent for the first release of this product.

I found a couple of things I need to change. First of all, WordPress creates page titles the first time you save a post. However, I think the page title should match the post title up until you publish a post. But there’s a difference between a title being generated and one manually entered, so if I go with this approach, I’ll need to add a flag to the database.

However, I’m tired of numbered posts appearing because I forget to title a post when I save a draft, and then forget to change the post SLUG before publishing.

There’s also some more cleanup to do in Admin, to separate all of the SQL from the processing.

And I’ve finally taken my first shot at integrating the RDF API as an extension to support the metadata effort. That one isn’t going as well as I would like, but it is optional, and if need be, I’ll leave it off the first release.

Oh, I do love my interface, and my comment management, but magic quotes got screwed up again.

All in all though, it’s movin’ along.

Categories
Technology

MT free servers

I’m in the middle of preparing my annual “Burningbird’s Bash of Etech” presentation. It will have the usual: laughs, tears, and passionate outrage and sad reflection in equal measure–not to mention, intense inspection of photographs with an accompanying “is the one with a shaved head, tattoos, bulgy chest, and eye liner a boy or a girl?” discussion. You know. The usual.

First, though, thanks to Rogi, I found out my host has created its first “Movable Type” free server, named Circe. (Hee, good name.) The thread to discuss this can be found here.

I’ve been watching the processes the last week or so, on the server and in my logs, and what I’m seeing is a lot of hits for trackbacks. And I mean a lot. Many are to Movable Type weblogs but the WordPress weblogs are getting their share, too. Now, because of whatever spam protection you have, these may not be showing up in your pages; but the pings are creating a significant strain on the database, which in turn strains the CPU and the disk I/O.

For instance, if I had trackback enabled, even with spam protection in place, each trackback request would still generate, at a minimum, four requests to the database and over thirty function calls. This isn’t that big a deal until you multiply that several times a minute, and across many, many weblogs. Then repeat this daily, once in the morning and once in the afternoon, because that’s how often it’s happening.
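To put rough numbers on that multiplication, here is a back-of-the-envelope sketch in Python. Only the two per-request figures (four database requests, thirty function calls) come from what I measured; the pings per minute, the number of weblogs on the server, and the length of each burst are invented for illustration.

```python
# Floors from the post: each trackback request costs at least this much.
QUERIES_PER_PING = 4
FUNCTION_CALLS_PER_PING = 30


def burst_load(pings_per_minute, weblogs, burst_minutes, bursts_per_day):
    """Estimate the minimum server-wide load from trackback bursts.

    All four arguments are hypothetical inputs, not measurements.
    """
    pings_per_day = pings_per_minute * weblogs * burst_minutes * bursts_per_day
    return {
        "queries_per_minute": QUERIES_PER_PING * pings_per_minute * weblogs,
        "queries_per_day": QUERIES_PER_PING * pings_per_day,
        "function_calls_per_day": FUNCTION_CALLS_PER_PING * pings_per_day,
    }


# Hypothetical shared server: 200 weblogs, each hit 5 times a minute,
# in two half-hour bursts a day.
load = burst_load(pings_per_minute=5, weblogs=200, burst_minutes=30, bursts_per_day=2)
for name, value in load.items():
    print(name, value)
```

With those made-up inputs, the server fields 4,000 extra queries a minute during a burst and 240,000 a day, none of which produces anything a reader ever sees.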

Of course, since I’ve pulled every aspect of trackback from Wordform, the most that happens is that the web server returns a “404” whenever one of my pages is accessed with “/trackback” appended.

I have no doubts that if this much activity is happening with WordPress, which is relatively stripped down as these types of applications go, much more is happening with the increasingly complex comment spam management in Movable Type. In addition, as Annette details in the HM thread, CGI applications such as MT spawn a new thread for each trackback request. I can say that the most limited resource on a server is these threads of execution.

I most likely will ask that my site be switched to Circe, as soon as I can. And to be honest, I’m feeling pretty damn smug for deciding to yank trackback out of Wordform, right about now. For you MT folks — if you’re not running the PHP version by now, you should be. And you also need to start pressing Six Apart into providing a PHP-based comment and trackback management system.

Or switch to new software.

Or continue dealing with problems.

Your choice.

Categories
Diversity Technology

Number 9 Number 9 Number 9

Recovered from the Wayback Machine.

I wanted to welcome you all to this, the third annual Burningbird Bash of ETech! This year’s show promises to be the best ever, especially considering that O’Reilly has, after all these years, finally broken the 10% rule for percentage of presenters that are female!

Yes, indeedy, this year’s female participation is a whopping nine percent (9% or 0.09)! Nine percent! Why, I bet there’s more Windows users in the audience than women presenters!

I want to take a moment now to send out congratulations to Tim and the gang and say, “Job well done! You finally found the solution to the 10% problem!”

Okay, so this introduction to what has become my annual report on the lack of women at O’Reilly’s Emerging Technology conference is a little over the top. Every year, though, I and others say the same thing and nothing changes, except that this year things are worse instead of better. Once this fact hit me in the face, like a too-dead squid thrown by a fish monger, I had to scramble to find a bigger soap box and louder sound system. I even thought about hiring a troupe of clowns to entertain the kiddies while we talk, but my heart just wasn’t in it because all I could see is that 9% rather than the 11% or, gosh, even 15% I had hoped to see.

So much of my discussion lately, though, has been on diversity that I almost decided to forgo this writing. However, where much of the previous discussion has been about diversity in weblogging, this is about diversity in technology–specifically, the lack of female representation at many of the technology events. Still, too much of anything is like eating a cake that’s 90% frosting: no matter how good it is, you’re going to get sick of it before you’re through.

I finally decided to go ahead anyway on this one writing, primarily because there are a few things different about the discussion this year. Now, I don’t know if the differences add to the discussion or to the noise, but since I like discussion and noise, here goes.

I submittaled a paper with a female origination

Whatever the representation of women at ETech, I can say I did my part. Unlike the conferences in the past, I submitted a proposal to ETech this year. Previously when I pointed out the lack of women presenters at the conference, one or more people would come back at me with, “Well, did you submit a proposal?” Now I can say, “Yes, I did.” It wasn’t accepted, but what’s more important is that I did try; I tried to be part of the solution. So, neener, neener, eat your wiener.

I found out in a post at David Weinberger’s that only 5% of the proposals were submitted by women. If we compare percentages, then, a larger percentage of women’s proposals were accepted than men’s. Now, leave aside how many men were invited to speak without proposals or invited to submit proposals, and the fact that some of the committee members who decided on the proposals also spoke at the conference: whatever led to the event, a drop in women presenters this year is not a positive direction.

Danah Boyd also submitted a proposal this year, which, like mine, was also rejected. She wrote:

I was actually part of the 5% who applied to etech, only my application was rejected because it wasn’t emerging.

I don’t know if my submittal/submission/proposal was ‘emerging’ or not. It talked about semantic web and achieving critical mass with schemas, so there were all sorts of geeky terms present. But there was also poetry and words to the effect about bringing the semantic web to the butcher, the baker, and the candlestick maker. On reflection, now, this could have been a mistake.

I have a feeling, though, that the proposal lost out as soon as the selection committee saw the title: I, Poet. Compared to “How to geek out your car”, poetry and semantics probably seemed less than interesting. But, as Ms. Boyd points out, there is interesting and then there’s interesting.

After a conversation last night, i wanted to clarify a few things. In conferences like SXSW and Etech, there’s no clear delineation of what is an acceptable topic or not (as opposed to say CHI). I mean – what is interactive or emerging? Additionally, the review panel consists of a very small number of people (all of who are pretty much guaranteed a slot). At CHI, there are hundreds and hundreds of blind reviewers. At SXSW and Etech, the metric is “interesting” – this is where we get ourselves into trouble. Interesting to whom? To the un-diverse review committee?

It wasn’t until I saw a comment in David Weinberger’s post on this issue that I knew who the committee was: Andrew “Bunnie” Huang, Clay Shirky, Cory Doctorow, Brian Jepson, and Marc Hedlund. No, “I, Poet”, a discussion on semantics and poetry, wouldn’t have rung any bells here.

But would it have done so if the planning and selection committee were a little more diverse? Say, having one woman on the committee? Or perhaps some faces that weren’t so familiar? Would Danah Boyd’s proposal have been accepted if the committee were more diverse or had stronger ties to the social software industry? Hard to say, and that’s part of the problem, and the concern.

What makes this issue of diversity an even more pertinent one is that across the country, an event was held that did result in a much greater diversity than ETech: SxSW.

It burns so good

In David’s post, an interesting discussion about the lack of women at Etech arose in the comments, and it is worth a read. Several of the comments were written by Liz Lawley, who also wrote about choosing to attend SxSW instead of Etech at Many-to-Many.

That’s another factor that sets this year’s discussion apart from previous years: the overlap between Etech and SxSW. More importantly, where Etech achieved only 9% participation, SxSW achieved significantly higher numbers.

Nancy White, who did such a great job liveblogging the sessions she attended, also tried to keep a head count of women in each. From what she and others have written, women made up anywhere from 25 to 50% of the participants at SxSW. That’s roughly three to more than five times ETech’s 9%.

This level of participation at SxSW is important in relation to that of ETech for a couple of reasons. First, it shows that women are interested in participating in conferences related to their profession. Secondly, it also shows they’re willing to take the time and cover the expense.

One response that’s been raised time and again about the lack of women at O’Reilly conferences is that women are less interested in attending conferences, or lack the financial means and/or time to do so. With SxSW and ETech happening virtually at the same time, we can compare the two, side by side, and see that this isn’t necessarily true. Women have the interest, and are willing to commit the time and resources to follow through on that interest. In fact, one reason there could have been a drop in women’s participation at ETech is because so many chose SxSW, instead. And the question then becomes: why?

Looking at both conferences more closely, there are some major logistical differences between the two. One is cost: SxSW’s fee is peanuts compared to O’Reilly’s normally quite expensive conference fees. The second is location: the central part of the country, even if it is south-central (well, if we must, south-by-southwest), is more accessible to more people than the California coast; cheaper to visit, too.

A third difference is when the conference was held. SxSW was over a weekend, while ETech was held during the work week. For women, who are usually the prime caregivers for children, it might be easier to arrange care on a weekend than a weekday.

However, I think the major difference was the players. Both conferences had names, though SxSW had more human-interaction and design names than ETech, which focuses more on ‘to the metal’ geeks. But there is more of an intimacy surrounding the players at SxSW than there is at ETech. Frankly, when I looked through the lists of Big Names at both get togethers, the SxSW Names were all people who struck me as being more approachable.

In fact, I think the same could be said of the entire SxSW conference — it encouraged participation, even from the audience. Lively discussions in the hallway aside, O’Reilly’s ETech conference is fairly passive. People sit in rows and listen to a speaker. People go to birds-of-a-feather sessions for interactivity, but these are an aside to the whole experience. Even the entertainment has an orchestrated aspect to the whole thing. Bluntly, ETech is very formal, very superior alpha-geek, somewhat passive, and even rather intimidating.

SxSW, on the other hand, is formed of beloved chaos, tenderly nurtured in a solar vat consisting of an odd mix of creative anarchism and social responsibility. I don’t know whether one appeals to women more than the other, but I know that if I had my choice, after reading the reports from both conferences, I’d rather go to SxSW than ETech. And I consider myself a ‘to the metal’ geek.

That’s a key point, too, in this discussion. If there are many conferences and people can choose between them, why should we care if conferences such as ETech have only about 10% participation by women, as compared to ones like SxSW? After all, these events are open, and nothing is stopping people from participating.

Shake that networking booty

We are living in a time when outsourcing IT companies are charging 3.00 US an hour for labor, and there are fewer and fewer IT jobs every year. It’s becoming tough to be a tech. No, change that: it is tough to be a tech.

One way to keep ahead in the tech industry now is through networking and contacts, and attending conferences is a big part of this. If I were to coldly and dispassionately sit down and choose between SxSW and ETech from this perspective, I would pick ETech. After all, it had folks from Yahoo, Google, IBM, Amazon, Nokia, and movers and shakers from most major IT companies. It’s also a closer match for my skills and experience.

So, then you’re saying: Okay, so what’s stopping you and other women from attending?

One major reason is no one wants to be the freak in the crowd; or worse, invisible. If you’ve never been the only woman in a room full of men (or the only black in a room full of white people), you may not understand how intimidating and uncomfortable you can be made to feel. Especially in technology, where women’s visibility is usually compromised anyway.

It’s hard to network if you just aren’t seen. There was another comment in David’s weblog post that I think highlights this. In it, the commenter, Jo, wrote:

I spoke at etech the year previously. Meeting clay shirky after my talk, he made a couple of comments to the effect that “your guy” should look into something, “your guy” might find something interesting. I was too quietly stunned and post-talk-drunk to frame a better reply than “er, i do write my own software, you know.” As a female with a gender-neutral name, i am often assumed to be male by conference organisers, people online, etc. I’m quite used to being the only woman at BOFs, at user group meetings, etc. It’s hard to even notice it any more; it’s just the way i grew up as a geek. I always assumed it would slowly change. But if that’s the case, it’s not reflected on the public platform.

Every year when i see the Etech highlighted speakers’ list with speaker photos, i scroll down disconsolately for the inevitable token non-male face. The 9% don’t get much of a look-in.

O’Reilly’s organisers *are* in a position to “counteract the prevailing cultural forces” in Fred Brooks’ wonderful phrase. How much backlash, of a New-Labour-Women-Only-Parliamentary-Shortlist flavour, would that provoke from those who had been cut out by a defacto quota?

From the recent discussion on women in weblogging, we can answer Jo’s last question about quotas, openness, and what is the chance of folks being provoked into a backlash with a simple answer: a lot. When the status quo suits one group over another, we can’t expect the former to willingly give it up in the interests of fairness.

People are resistant to change. People are even more resistant to change unless they see the problem impacting on them personally. People are especially resistant to change when the change means they have to give something up. Pigs refusing to leave a particularly fine pool of mud come to mind.

If ETech is a success for O’Reilly, what is the impetus for the company to change? If the type of sessions and the opportunity to network are a success for the majority of its participants, and the majority of technologists in this country are still white and male, as reflected at ETech, what is the impetus to change? If even among women, some don’t see this situation as an issue, or only do so from a personal perspective, where is the force that could generate the impetus to change?

Why change? Because Etech will be better.

Where have all the semanticians gone, long time passing…

One thing I noticed about both SxSW and ETech is both conferences featured much on XHTML attributes and tags as the wave of the future in the semantic web, but very little representation from what has been the semantic web community for many years. Some–many?– might say this was a positive aspect to both conferences, but is it really so?

In the long run, I don’t think so. I’ve noticed that more of the activity and work related to the semantic web, outside of folksonomies that is, is happening in Europe or Canada rather than the United States. Is it that our country is so caught up in gizmos and gadgets and mini-macs and iPods and cheap and easy solutions and memes of the minute that we no longer want to take the time and energy to understand the more in-depth and complex, and perhaps less flamboyant, aspects of technology? Are we becoming a nation of fad technologists?

In addition to technology diving into the shallow end of the pool, if you read down the list of presenters, you see, repeated again and again, the same company names: Yahoo, Google, Microsoft, Apple, IBM, Amazon, BBS, Wired, Nokia. All are well-known technology or media companies, many of which are indulging in some fascinating advances in technology. Yet there seem to be fewer and fewer representatives from smaller companies and independents, to the point where I’m surprised that O’Reilly is even bothering with submissions outside of a picked group of organizations.

Technology as it is used in specific well-known companies can be interesting, but the problem with it is that much of the time what each company is doing is unique to that company, and the information can’t be extrapolated to other uses. Not everyone has the same system requirements as Google. Not everyone needs to own a dam.

And frankly, who is to say that how each company uses the technology is the best use of that technology? If you have enormous resources and funding, you can afford to spread out. Smaller companies may need to come up with innovative ways of doing the same thing for less. Yet if an employee of Google is presenting on web services and Jane Blow, in off the street, is doing the same, who is going to be picked? This isn’t always in the best interests of the audience. Thanks to Google’s fame, we know how it uses web services; I kind of want to hear what Jane Blow has to say. Maybe she has a new twist, and a new idea.

It’s not just the same companies that keep showing up; it’s the same people, too. I read in a weblog from one attendee (and my apologies for no permalink; I had read several and forgot where I saw this one) that when he arrived at the conference he looked around and noticed that it didn’t look all that much different than the year before, or than other events he’d attended in the last year. Same faces, same folks, same groups, and similar topics, the only change being the ‘it’ topic for the year, such as tags and gizmos this year (thanks to folksonomies and O’Reilly’s new gizmo magazine, Make).

Okay fine. Big companies, less depth, familiar faces. But what does all of this have to do with lack of women presenting?

Well, it all comes back to the lack of diversity.

O’Reilly tends to pick from a non-diverse pool of people when planning ETech, and this is reflected not only in the lack of female participation, but also in the fact that the conference is beginning to resemble an annual meeting of a club more than a conference celebrating innovation. The sessions might be interesting or even entertaining, but they don’t necessarily challenge the attendee. How can they? So many of the attendees are no different than the people presenting.

This lack of challenge, and of the epiphanies and excitement that can result from it, shows in so many of the weblog entries about the conference. The sessions were interesting, the people enjoyed them, but no one came away jumping up and down with enthusiasm. Well, except for the Ruby on Rails photo. (For the best take on folksonomies at ETech, also see Sam’s wonderfully ironic posting.)

This, then, forms the impetus for O’Reilly to look more closely at how it manages its conferences, and to begin to diversify the community that both presents and attends: not just because it’s the ‘right’ thing to do; not just because it’s the ‘fair’ thing to do; but because it’s the smart thing to do.

Unless, that is, O’Reilly wants ETech 2007 to look like ETech 2006 to look like ETech 2005 to look like…

number 9, number 9, number 9, …

Categories
Programming Languages

PHP and the corporate seal of approval

I did want to point something out before I forgot: IBM is putting its corporate blessing on PHP with, among other things, a new PHP Weblog site. I was reminded of it today by a link in Dave Winer’s weblog, a link that has since disappeared.

(Hey, did I happen to mention Mark Pilgrim is back?)

Anyway, folks on the coast may not realize how significant this move on IBM’s part is. Companies in the Midwest and away from the coast tend to be conservative with technology, and usually want it vetted by one of four companies: IBM, Microsoft, Oracle, or Sun. Because of this, open source technologies and languages such as PHP rarely see the light of day in the development projects here. This, in turn, means people like me can search Monster for jobs using PHP and not find one, not one, in two months of looking.

It’s going to take a while, but I believe we’ll start to see a crack in the .NET/J2EE domination hereabouts. This is good for the developers, and also good for the companies.

It’s funny but as time has gone by, IBM has become more like the Microsoft of decades past; while Microsoft, with the appointment of Ray Ozzie as CTO, has become the IBM of decades past. I guess we’ll wait and see if the Microsoft employees have to start wearing blue suits.