Saturday, May 31, 2008

Putting Ruby into Words

Since I started learning Ruby and reading some of the community's blogs and books, I've had this sense. I'm the first to admit it's a poorly defined sense, but deep down I felt something was wrong with the community. Thankfully, there are Debian Developers out there with the same feelings who have a better way with words. Too lazy to read the link? No problem, here's the critical bit:
What's troubled me for some time about the post-Rails Ruby community is that it has a distinct bent away from its Free Software roots. I understand Matz actually used to use (not sure about today) Debian Unstable, and Ruby traditionally displayed its roots quite strongly, with a Perl heritage and a community consisting largely of hardcore *NIX people. With the advent of Rails, the move has been towards things like TextMate and OSX. Software like Gems (no relation to Gemstone) fits in fine with one of these systems, but not so well with modern Free Software systems, and I think it's symptomatic of the change. Given this propensity in the Ruby community, and given the numbers Gemstone is posting, I'd be surprised if lots of Rubyists don't move that way as soon as it's available.
I couldn't agree more! When I first learned that the preferred editor for Rails development is an OS X-only commercial app, I was literally speechless.

There are other examples of this divergence from the Free Software world. Take, for example, Rails' recent decision to abandon Trac, a reliable ticketing system used by a whole host of large FOSS projects. Rails now uses Lighthouse, itself a Rails application, which is decidedly closed source. If this sort of behavior continues, I think you'll see a spike in useful stuff coming out of commercial shops, followed by a slow decline as the ecosystem of free Ruby code begins to shrink and eventually die off. At which point you've got a free language whose community and ecosystem are more about commercial interests than free software.

Tuesday, May 27, 2008

Voters' Intent Vindicated

Time for a political posting, boys and girls... I actually have a second political post in the works, but I'm still waiting for all the facts to trickle in on that one.

Today I was excited to read in the Seattle Times that Washington State's top-two primary system is going to produce as many as a dozen single-party races in the general election. Huzzah! This is great news for the many districts that would otherwise face a very boring general election. First, here's a little history.
  • Washington used to be the home of the Blanket Primary where voters could vote for whichever candidate they wanted in each race regardless of party alignment... so, they could vote in the Democratic Primary for governor, while voting in the Republican Primary for their local legislative races. Everyone was happy.
  • In the late 90s (I think), California adopts a similar system, which is then challenged in court by the state parties. The suit goes all the way to the U.S. Supreme Court, which rules that the Blanket Primary violates the First Amendment right of association... in this case, the parties' right of association.
  • Washington State parties, realizing an opportunity to gain more control over their own nomination process, launch a similar lawsuit, which inevitably leads to the invalidation of Washington's long-practiced Blanket Primary.
  • In 2004 Washington State voters adopt I-872, an initiative that institutes a "top-two" primary, where the top two vote getters in an essentially non-partisan primary advance to the general election... meaning in liberal Seattle, two Democrats could appear on the general election ballot, and in conservative Eastern Washington, two Republicans could appear on the general election ballot.
  • Washington State parties again sue, and win, in Federal District Court and the 9th Circuit Court of Appeals, blocking the new rules from going into effect.
  • The U.S. Supreme Court overturns the lower court's ruling, reinstating the top-two system... this is something I had totally missed, as I guess I'm not as plugged into the Washington State political machine as I once was.
  • Now, in 2008, Washington State will have its first top-two primary vote!
So why am I, an avowed Democrat, excited about the prospects of a top-two primary system?! First, let's address the sole remaining challenge to the top-two system: that voters have a right to vote for their chosen party in the general election. I don't see how anyone has the right to vote for someone specific... I didn't have the right to vote for Bill Clinton for President in 2000, nor did I have the right to vote for Barack Obama in 2004. I get to vote for whoever appears on the ballot, as determined by fair and open rules. Anyone can run in the primary and try to get on the final ballot, so I don't see that as a valid criticism.

On the positive side, for the first time in a very long time, there will be actual general election challenges in what would otherwise be considered "safe" seats. Take, for example, Frank Chopp of the 43rd District (my old district). He's a good man, and Speaker of the House, and I was always happy to vote for him. But if he were to go off the deep end, there would be nothing I could do about it, because as Speaker of the House he would dominate any primary challenge by local Democrats attempting to replace him. With a top-two system, though, come the general election a centrist Democratic challenger has a legitimate chance against an entrenched force, because conservatives, who would normally rally around a doomed Republican challenger, now have the opportunity to vote for the centrist Democrat in the general election. If a majority of the voters back Chopp, then clearly he didn't go off the deep end after all; but under the previous closed primary system, voters would have the dubious choice between an "off the deep end" Democrat and whatever crazy Republican had decided to mount a quixotic challenge in one of the bluest districts in the state.

Good luck to those candidates who find themselves in a one-party race come the general; I know it won't be easy... but it's for the best when you consider the alternative we see in places like Chicago. Don't get me wrong, parties are good, but they are not an absolute good.

Monday, May 26, 2008

NGINX... why?!

Anyone who has any relationship with Rails development has, at this point, heard of Nginx. The point of Nginx is to replace Apache, the definitive global webserver, which Rails devs feel is simply too slow for their lightning-fast development framework. It's not the first time the Rails community has snubbed Apache, nor will it be the last. Those Rails devs are simply fickle folks.

So, fine, let the Rails devs frolic with their uberfast webserver... what about the rest of us mere mortals? Is Nginx a good route for you? Let me say here and now, the answer to that question is almost always a strong, resilient, and durable no. The reasons for the rejection are many, so let's start with the funny ones first and proceed to the more technical ones.

First, it behaves in inexplicable ways in different browsers. Check out this screenshot of Penny Arcade loaded in Firefox (on the top) and Konqueror (on the bottom) at the same time.

This happened with multiple reloads (cache disabled)... it always worked in Firefox and always "failed" in Konqueror. Oh, and that "Bad Gateway" message is something you should get used to if you are thinking about deploying Nginx, because it's an all too common sight (more on that later).

Second, the primary documentation is in Russian. Yes, Русский. From what I can gather, the primary developers are Russian, which is great... yay global open source development! But a webserver is a complicated beast, hence the great forests that are clear-cut each year to produce the necessary library of books on Apache and Microsoft's Internet Information Server. Let me be clear: when I say primary, I do mean to imply there is secondary documentation. But it's secondary documentation in the same way that a warning label will list sixteen life-threatening things you could do in English, followed by a single warning in Spanish that translates to "Danger."

Third, nginx does not support .htaccess files. Anyone who spends much time building custom websites knows the power of these magic little files that alter the way Apache treats a particular folder. Securing a folder with basic authentication takes two simple lines and a password file. Nginx takes a different approach, where "different" means stop bugging us to add .htaccess support. Instead, every directive, for every folder, regardless of its scope, must go into a master configuration file. You can split the conf file into many smaller files, but they are all loaded when the server starts and given global effect. The common approach is to split each hosted domain into its own conf file... but that only helps keep things organized, because at the end of the day, every conf file has global implications.
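For comparison, here's a rough sketch of what that same basic-auth protection looks like in nginx's central config (the hostname, paths, and password file location are made up for illustration, but auth_basic and auth_basic_user_file are the actual directives):
server {
    listen       80;
    server_name  example.com;
    location /private/ {
        auth_basic            "Restricted Area";
        auth_basic_user_file  /etc/nginx/htpasswd;  # created elsewhere, see below
    }
}
Note that none of this lives in the protected folder itself; to change it you are editing, and reloading, the server-wide configuration.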

Third and a half, nginx requires you to have Apache's support tools lying around to do stuff. This really isn't worth a whole new point, because everyone already has Apache lying around... but let's say you wanted to create a password file for basic authentication. There is no nginx utility to generate those handy hash values; you have to use htpasswd, available from your Apache distribution.
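If you're curious, the incantation is something like this (the file path and username are just examples):
htpasswd -c /etc/nginx/htpasswd someuser
New password:
Re-type new password:
Adding password for user someuser
The -c flag creates the file; drop it when adding more users so you don't clobber the existing entries.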

Fourth, Nginx doesn't actually do anything beyond serve static HTML and binary assets... which is to say, it doesn't run PHP or Perl or any of the other P's you might find in the LAMP stack. What it does is take requests and proxy them to other servers that do know how to execute that code. This is great in the Rails world, which long ago decided that Rails should be its own little server that you submit requests to and get responses back from. Even under Apache, the standard approach is to run Rails as a cluster of Mongrel servers that Apache talks to via a proxy connection. In the world of PHP and Perl, this approach is somewhat counter-intuitive. Apache's mod_php loads a PHP interpreter into Apache, allowing Apache to do all the heavy lifting for you... ditto with mod_perl. Even Ruby has a mod_ruby (although it's still immature). With nginx, everything is its own standalone server.
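To make that concrete, here's roughly what the nginx side of a Mongrel-cluster setup looks like (the ports and names are hypothetical):
upstream mongrel_cluster {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
server {
    listen       80;
    server_name  example.com;
    location / {
        proxy_pass  http://mongrel_cluster;  # nginx never runs the code, it just forwards the request
    }
}
Every dynamic request gets shipped off to one of those Mongrel processes; nginx itself only ever hands back whatever they return.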

So, what if your PHP project needs to know something about the webserver (like the root folder, or a basic auth username)? Well, you need to know that ahead of time and set up the proxy (defined in that global conf file I mentioned in #3) to pass those variables to your application server, otherwise they won't be around for you to use. Better yet, what if the backend server is down? Nginx will greet you with a handy "Bad Gateway" message and no further information. Good luck debugging the underlying server, since it only knows how to talk in HTTP requests... perhaps you can code your own debugger with LWP.
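If you go down this road, the sort of thing you end up writing is a pile of proxy_set_header lines; a sketch (the X-Auth-User header name is my own invention, pick whatever your app expects):
location / {
    proxy_set_header  Host             $host;
    proxy_set_header  X-Real-IP        $remote_addr;
    proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
    proxy_set_header  X-Auth-User      $remote_user;  # the basic auth username, if the app needs it
    proxy_pass        http://127.0.0.1:9000;
}
Anything you forget to pass along here simply never reaches the application server.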

Finally, I am left with the question: why? The ostensible reason is that it's faster and can therefore handle more requests. Even if we accept that as true (*grumble, grumble*), it only achieves that speed by passing the buck to other servers. When you find a non-responsive site, it's not because the static assets like images and HTML are being served slowly... it's because the dynamic content generated by PHP/Perl/Python/Ruby/whatever, and the underlying database from which the data is drawn, cannot keep up. Nginx suffers that same failing... while requiring just as many resources, because now you have to run a separate server for each of the languages you want to code in.

If you are developing Rails, then by all means, enjoy this flavor of the month until some new exciting technology comes along and all the little Ruby lemmings go marching off in a new direction. For everyone else writing applications that are meant to stand the test of time, stay with Apache; it hasn't let us down yet.

Tuesday, May 13, 2008

Working with SPF Records

Today I finally sat down and learned enough about SPF records to actually get one deployed on a site I'm setting up. What's an SPF record, you're wondering? Perhaps you are too lazy to click one of my provided links. No problem, here is a description anyone can understand.

So, email is more like a normal letter than you might expect--not surprising, since most systems are modeled after existing systems--and includes things like a sending address and a return address. In the world of email, these are the "To:" header and the "From:" header, respectively. So, if I were to send you an email, the top of it would look something like:
To: "John Doe" <jdoe@example.com>
From: "Sean Kellogg" <skellogg@probonogeek.org>
Subject: A message
...
Thus, you would know the email was from me and treat it appropriately.

Trouble is, just like a return address on an envelope, there is no way to be certain the return address is accurate. I could stamp all of my envelopes with 1600 Pennsylvania Ave. and it would still get delivered (well... maybe, not sure how USPS would respond to that particular address). Point is, I could send the following email to you just as easily as the one above.
To: "John Doe" <jdoe@example.com>
From: "Bill Gates <bill.gates@microsoft.com>
Subject: A message
...
and the mail system would happily send it off to your mailbox. So, you've got one class of people who are trying to steal your identity. The other class of folks are those more interested in masking their own. This group is known as spammers. I would guess nearly all spam today is sent using forged headers, such that the From: header is set to either a non-existent email address or some poor unsuspecting bystander.

Enter SPF records, which are a mechanism for validating the From headers. Basically, as a domain manager, you declare the set of machines that are authorized to send mail on behalf of the domain. Then, mail service providers are responsible for checking that declaration to ensure that the originating server is one of the authorized senders. In the microsoft.com example, the receiving mailhost would look up all the servers allowed to send email on behalf of microsoft.com, realize my server is not one of them, and reject the email.
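In practice the declaration is just a TXT record in your DNS zone. A minimal sketch (the domain and IP address are made up):
example.com.   IN   TXT   "v=spf1 a mx ip4:192.0.2.25 -all"
Read it as: the host at the domain's A record may send, the domain's MX servers may send, the box at 192.0.2.25 may send, and everyone else (-all) should be rejected.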

The only part left is to figure out how to write SPF records. Turns out it's not as hard as I expected, once you know how. I recommend the following wizard as a great starting point for defining your SPF records. All you need to do is specify the domain you are managing, and then list the various servers you want to authorize to send on your behalf.

Of course, that last part is easier said than done in some cases. The domain I was doing this for uses Google Apps for mail delivery, and lord only knows how many different servers are involved in the Gmail setup. Thankfully, the SPF folks were prepared for that! There is an "include" directive in the SPF spec that allows you to say, "in addition to these settings, include the settings from this other SPF record." Then you just point at the Gmail SPF record and you're set.
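For a Google Apps domain the result ends up looking something like this, using my own domain as the example (the include hostname is whatever Google's Apps documentation tells you to use; as I understand it, that's aspmx.googlemail.com):
probonogeek.org.   IN   TXT   "v=spf1 mx include:aspmx.googlemail.com ~all"
The ~all at the end is a "softfail," a gentler setting that asks receivers to be suspicious rather than reject outright, which is handy while you're still making sure the record covers everything.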

I'll be honest though, I'm not certain about this whole SPF system. For example, I use the washingtonpost.com article sender to send stories to friends and colleagues. Those emails are generated by washingtonpost.com servers and set the From: header to my address. Except, if the recipient's host is set to enforce SPF records, it's going to get the email and say, whoa, washingtonpost.com is not authorized to send for probonogeek.org! Not sure how this problem gets resolved, but there needs to be a way for address holders to authorize third-party sites to send email on their behalf on a one-at-a-time basis. Any bright ideas out there?

Sunday, May 11, 2008

My Parts Per Million

My company did a website for a water testing device manufacturer (I realize, not our usual political fare... not every client can be running for President). The client was so pleased with the new site they sent us a nice gift basket and a few of their products. One quick USPS shipment later and I am the proud owner of an HMDigital TDS-4.

So, I drew myself a glass of tap water (I don't filter my water, but Sarah does... I'll test hers once she gets a new filter) and gave it a go. My Santa Cruz tap water measures in at 216 TDS PPM. That's right, 216 parts per million of Total Dissolved Solids. The back of the package says the EPA's Maximum Contaminant Level of TDS for human consumption is 500 ppm. I'm not sure if 216 is good, great, acceptable, or below average; I just know it's not the maximum contaminant level... huzzah?!

Great thing is, this device is portable, so I can start taking it to restaurants and providing reviews on water quality. It's a whole new world of eating-out metrics. Oh sure, we could go there, but the TDS was a little high last time...

Tuesday, May 06, 2008

AJAX File Upload: The Cake is a Lie

For a long time I have been smitten with the idea of AJAX. By now everyone has experienced AJAX, even if they don't know it. AJAX powers web 2.0 sites like Flickr and Gmail. Allowing the user to interact with a website without a page refresh is a strangely liberating technology... finally my applications have state! But the true holy grail of AJAX lies with the mysterious mechanism of file uploading. No doubt you've done this before, in a non-AJAX fashion. While filling out some innocuous HTML form you are presented with a seemingly innocent file selection dialog box, perhaps selecting the latest photo of your kitty to send along with the other information. This basic file uploading capability is made possible by creating a special HTML form, like so:
<form action='upload.cgi' method='post' enctype='multipart/form-data'>
...HTML form fields go here...
<input type='file' name='my_picture'>
...maybe some more HTML form fields go here...
<input type='submit' value='Down the Tubes!'>
</form>
That enctype business there tells your browser to send a special sort of HTTP request that can contain binary data. Generally, requests just send text, but by enabling binary data transmission we can send photos, MP3s, PDFs, anything within the size limits of the protocol. Trouble is, AJAX requests are built such that you cannot change the enctype to multipart/form-data! Even with the cross-browser prowess of Prototype (my preferred JavaScript framework), there is just no way to change the nature of the HTTP request. It's either text or bust.
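To see the limitation concretely, here's roughly what a Prototype-style AJAX submission of the form above would look like (I'm pretending the form has id='upload_form'; the original doesn't):
new Ajax.Request('upload.cgi', {
  method: 'post',
  parameters: Form.serialize('upload_form'),  // text fields only...
  onComplete: function(transport) {
    // ...my_picture itself never leaves the browser
  }
});
The text fields all arrive at upload.cgi just fine, but the file input contributes nothing more than a filename string.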

So, how do internet giants like Flickr and Facebook do it? What is the secret ingredient? A little googling reveals an answer that is satisfactory, yet unsatisfying. Allow me to explain. To start, we need to redefine our objectives... since we can't "use AJAX to upload a file," our objective needs to be "make it appear like we are using AJAX to upload a file." When we say "use AJAX," what we really mean is communicate with the server without a page reload. But we must remember the earlier lesson: you can only upload a file using a multipart/form-data form. Put another way, we have to call submit() on that form... there is no other path to the promised land.

HTML forms are a tricky thing. Left to their own devices, when you call submit(), the entire page reloads. So that's out. But we can set a target for the form, such that calling submit() causes the form to load in the target window. Setting target='_new' will create an entirely new window where the form will be processed. This is sort of cool, in that the underlying window remains unchanged. But we certainly don't want new windows popping up all the time. Yuck.

We could set the target to an embedded iframe in the main window itself. This is a lot closer, because there is no messy popup business. But now you've got this iframe reloading, which isn't exactly the seamless experience we are shooting for. The final piece of the puzzle, then, is to hide the iframe with a style='display: none;' attribute.

So, now our form from above looks like this.
<form action='upload.cgi' method='post' target='empty_iframe' 
enctype='multipart/form-data'>
...HTML form fields go here...
<input type='file' name='my_picture'>
...maybe some more HTML form fields go here...
<input type='submit' value='Down the Tubes!'>
</form>
<iframe src='about:blank' name='empty_iframe'
style='display: none;'></iframe>
Now, when you hit the submit button the form sends the data, including the file, off to the server, and the response comes back to the invisible iframe. To the user, nothing seems to have changed. You can add a little pre-process magic with JavaScript, like hiding the form, but what if you want to do post-process magic? With a traditional AJAX request you could get an XML payload back, or JavaScript if you use a framework like Prototype. Turns out you can do something similar with the iframe trick: you can call methods on the parent window from within the iframe by sending back JavaScript inside a <script> tag.

Your output to the iframe will look something like this:
<script type='text/javascript'>
window.top.window.someFunction();
</script>
You can call as many functions from within the script tags as you like; just remember that the iframe has no sense of the variables available in the parent window, which can complicate things. But a little forethought goes a long way toward making the magic happen. You can also do cool things like insert and remove the iframe on the fly so that it only exists during the form processing bit.
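Here's a quick sketch of that on-the-fly version in plain JavaScript (all the names are my own invention, and older versions of IE are famously picky about setting the name of a dynamically created iframe, so test accordingly):
function submitThroughHiddenIframe(form) {
  var iframe = document.createElement('iframe');
  iframe.name = 'upload_target';      // the form will aim at this name
  iframe.style.display = 'none';
  iframe.onload = function() {
    // by now the server's <script> payload has run; clean up
    document.body.removeChild(iframe);
  };
  document.body.appendChild(iframe);
  form.target = 'upload_target';
  form.submit();
}
Call it from the form's onsubmit handler (returning false so the normal submit doesn't fire) and you get the same invisible round trip described above.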

Now that you know how it works, it should be obvious how this is all a lie... a horrible, horrible lie. There isn't anything the least bit AJAX-y about it. In fact, if you accept this as a valid method of asynchronous server communication, then you can pretty much never use the XMLHttpRequest object ever again... just communicate via hidden iframes! I realize that file uploading is a serious security concern (we don't want malicious coders to be able to upload files from your hard drive without your knowledge), and I know that AJAX presents its own security concerns... but there has got to be a better way. I hope that future revisions to the XMLHttpRequest object provide a way to send multipart/form-data requests so we can ditch this awful, messy hack.