Sherlock Holmes and the case of the broken CGI Script
This month, I'd like to address a question I get five or six times every day: "I downloaded a script and it won't work." Rather than answering it directly, I am going to tell you a story. A story about me. A mystery.

It is a story about how I debug scripts on the zillions of different, intractable, curmudgeonous systems that exist on the web once I have already tried all the common debugging practices.

It is a story about how I find the culprit bug when I am not exactly sure how the operating system works, which web browsers are trying to run the script, what funky directives the local system administrator has applied to the server, or any number of other big, hairy, ugly question marks that stand between me and a programming-free weekend.

It is a mystery which, as most mysteries do, begins with Sir Arthur Conan Doyle.

Let's see what Doyle has to say about debugging.

Well, this may not seem like a discussion of software debugging, but it really is. What Doyle is trying to say is that all software and hardware bugs WANT to be caught. In fact, they want to be caught so badly that they carefully lay clues for you as to their whereabouts. Perhaps Doyle meant to say something like the following:

As a debugger, it's your job to listen to those clues, put them together into a theory that can be tested, and test the theory against the software package. In every case, you will bat yourself on the brow and say, "Doh! Of course, how simple!" Because, when all is said and done, computers are pretty simple creatures, and when they break, they break for pretty simple reasons.

The Virtue of Nothingness
Benjamin Hoff once told this interesting little story about Taoism, and I suppose I might pass it along to you.

Benjamin added, "An empty sort of mind is valuable for finding Perls and Tails and things because it can see what's in front of it. An Overstuffed mind is unable to." (Well, he actually spelled it "Pearls", but we know what he meant!)

What does this have to do with CGI debugging, you ask? Well, it has everything to do with CGI debugging. CGI debugging is not a skill. It is not a thing you learn in school. It is not something that is particularly aided by FAQs, or books, or system administrators, or discussion boards.

CGI debugging is a state of mind.

If I have spent more than an hour on a problem, I stop. Very few problems take more than an hour to solve, so if I've been sitting there for an hour, I can be fairly sure that the problem is not the bug, but me.

At this point, I turn off the monitor, light a candle and some incense (which I always keep a store of in my middle desk drawer), turn on some music, and lie on the floor trying to isolate the individual instruments in each song.

Note: For debugging, I recommend "Technotronic: The Best of Trance", anything by the Cocteau Twins or Enya, "Wish You Were Here" from Pink Floyd, or "Kiss" from the Cure.

I might even go out and walk around the block if it is warm and sunny...there happens to be a good climbing tree outside my work.

About 20 minutes later, I should be ready to get back to work, having achieved several crucial things.

Newtonian Methodology and the Nitty Gritty of Debugging
Upon returning from the void, the first thing I do is set aside the program and start by coding something really, really small.

You see, debugging is a Newtonian exercise. In a Newtonian universe, the best thing you can do is break everything up into the smallest pieces you can, because the whole is the sum of its parts: when you find the faulty part, you find the problem. (1)

Starting with Hello World
So, you should start by creating the most minimal CGI program you can, so that you can determine what special traits your local execution environment has that might cause a more complex program to fall apart.

Try this little script, which you might call hello_world.cgi:

#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "Hello World";

Okay, now set the permissions on this little script so that it is readable and executable by the web server. Typically, you will use the following command on a UNIX server (2):

chmod 755 hello_world.cgi
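If the script still refuses to run, it is worth confirming that the chmod actually took effect. The Perl sketch below, run from the command line in the same directory, prints a file's permission bits in the octal form chmod uses. The default file name hello_world.cgi is just the example name from this article, and the 0755 comparison simply mirrors the command above; ls -l shows the same information in a different notation.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return a file's permission bits as an octal string such as "0755",
# or undef if the file cannot be stat'ed.
sub permission_string {
    my ($file) = @_;
    my @info = stat $file or return undef;
    return sprintf "%04o", $info[2] & 07777;
}

# Assumed default: the example script name used in this article.
my $file = shift(@ARGV) // 'hello_world.cgi';
my $mode = permission_string($file);
if (!defined $mode) {
    print "Cannot stat $file: $!\n";
} else {
    print "$file has mode $mode";
    print " (expected 0755)" if $mode ne '0755';
    print "\n";
}
```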

Next, run the "hello world" script from your browser. You will probably need to access it with a URL something like the following:

http://www.yourdomain.com/cgi-bin/hello_world.cgi

Does it work? If not...

At this level, you can be pretty sure that if the "hello world" script did not run, it was because of one of the three reasons above. After all, there is not much that can be wrong with three lines of code! That is the reason that we are starting so small. We can get our teeth around this!

So I will assume that you've gotten this far and we can go on.

Figuring Out Where You Are
The next thing I do is try to get my little script to talk to library files, since most likely I will be using cgi-lib.pl to interpret incoming form data. So one way or another, our CGI script will need to be able to talk to other files. To do that, I grab cgi-lib.pl [typically for Perl 4--ed.] from the web and place it in a subdirectory called Library. The subdirectory should be readable and executable by the web server, and cgi-lib.pl should be readable by the web server.

So now I have something like this:

cgi-bin/                    (readable and executable by the web server)
    |___ hello_world.cgi    (readable and executable by the web server)
    |___ Library/           (readable and executable by the web server)
            |___ cgi-lib.pl (readable by the web server)

Now, let's use the "require" command to pull cgi-lib.pl into our "hello_world.cgi" program.

#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "Hello World";
require ("./Library/cgi-lib.pl");

Getting pretty complex here pretty quickly, eh? That is okay, because we can be fairly confident that cgi-lib.pl is not the source of our bugs: it is used everywhere and has been for years now. So that means all we really did was:

So what could go wrong with that?
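A require that works from the command line but fails from the web is often a working-directory problem: a path beginning with ./ such as ./Library/cgi-lib.pl is resolved against the current directory, and the web server may not run your script from inside cgi-bin. A throwaway diagnostic like the sketch below can show you where you are. It uses the core Cwd and FindBin modules, assumes the Library layout shown above, and only reports whether the library is readable; it does not load it.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use FindBin;

# Build the library path relative to the directory the script lives in,
# rather than relative to whatever the current working directory is.
sub library_path {
    my ($script_dir) = @_;
    return "$script_dir/Library/cgi-lib.pl";
}

print "Content-type: text/plain\n\n";
my $lib = library_path($FindBin::Bin);
print "Working directory: ", getcwd(), "\n";
print "Script directory:  $FindBin::Bin\n";
print "Library path:      $lib\n";
print "Library readable:  ", ((-r $lib) ? "yes" : "no"), "\n";
```

If the two directories differ, one option is to write the require as require "$FindBin::Bin/Library/cgi-lib.pl"; so that the path no longer depends on the working directory.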

What the Script Sees
Once I have successfully loaded cgi-lib.pl, I use it to make sure that my CGI script is actually getting the information it is supposed to get from the browser.

So I will add the usual lines to my little CGI script.


#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "Hello World<P>";


require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);


foreach $incoming_form_variable (keys(%form_data))
{
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
}

So what did I add to my script? I simply added a small foreach loop that goes through each of the incoming form variables stored in the %form_data associative array (created by the ReadParse subroutine of cgi-lib.pl) and prints out the name and value of each form variable.
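To make it more concrete what ReadParse hands you, here is a simplified stand-in that decodes a URL-encoded string into a hash. This is not cgi-lib.pl's actual code, just a sketch under the assumption of a GET request whose data arrives in the QUERY_STRING environment variable; it ignores POST data and repeated field names.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Turn "fname=selena&lname=sol" into (fname => 'selena', lname => 'sol').
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /&/, $query // '') {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

my %form_data = parse_query($ENV{QUERY_STRING});
print "$_ = $form_data{$_}\n" for sort keys %form_data;
```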

However, there is one piece missing. I need to actually send some form data to my script. Of course, I don't actually have a form frontend to my script, so I will pass the script form data via a URL-encoded string like:

http://www.yourdomain.com/cgi-bin/hello_world.cgi?fname=selena&lname=sol

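When I do so, the result should look something like the following (the two form variables may appear in either order, since Perl does not keep hash keys in any particular order):

Hello World
fname = selena
lname = sol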

This little foreach loop is an invaluable tool when you want to check what the script thinks its variables are. While debugging, you can always drop in this loop to zip through the current variables and check their values. It may be that 1) you have accidentally overwritten a variable, 2) the script has lost some values for variables you thought it had, or 3) the script never received variables that it needs.

Oh, and one more thing: you can also get a listing of the current environment variables by adding the following foreach loop:

foreach $environment_variable (keys %ENV)
{
print "$environment_variable = $ENV{$environment_variable}<BR>";
}
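If you find yourself pasting these loops in often, you might keep a small helper around instead. In the sketch below, debug_dump and html_escape are hypothetical names, not part of cgi-lib.pl. The helper sorts the keys so the listing comes out in the same order every time, and escapes HTML special characters so that a stray < in a value cannot swallow the rest of the listing in the browser.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Escape the characters that would otherwise be interpreted as HTML.
sub html_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Return one "name = value<BR>" line per hash entry, sorted by name.
sub debug_dump {
    my ($hash) = @_;
    return join '',
        map { html_escape($_) . ' = ' . html_escape($hash->{$_}) . "<BR>\n" }
        sort keys %$hash;
}

print "Content-type: text/html\n\n";
print debug_dump(\%ENV);
```

For the form data, the equivalent call after ReadParse has run would be print debug_dump(\%form_data);.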

Advanced Error Hunting
So what happens if I introduce logical errors to the script while I am debugging? Worse yet, what if there are 1000 lines of code and I am not sure where the error is, because I was coding sloppily, jumping back and forth between sections without constantly checking what I had done?

Well, this is actually pretty common and there are quite a few ways to go about finding the error depending on your taste.

Command Line Tactics
The first and most common way to find where a script is failing is to run it from the command line, because the command line will give you much more information than the web browser will.

Perl makes it very easy to check the syntax of your CGI script by offering a special syntax-check switch, -c. To check the syntax of your CGI script, simply type the following from the command line:

perl -c scriptname.cgi

The -c switch compiles your CGI script and reports any syntax errors without running the main body of the code. (Be aware that BEGIN blocks and use statements are still executed during compilation.) A listing of all of Perl's command-line switches can be displayed by typing

perl -h
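If you have a whole directory of scripts to check, a small wrapper can run the same -c check over each file and summarize the results. This is only a sketch: it shells out to the perl binary that is running it ($^X), and the OK/FAIL report format is made up for illustration.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Run "perl -c" on one file with the same perl binary ($^X) that runs
# this script. Returns (ok_flag, compiler_output).
sub syntax_check {
    my ($file) = @_;
    my $output = qx{"$^X" -c "$file" 2>&1};
    return ($? == 0, $output);
}

# Usage: perl check_all.pl *.cgi   (file name of this wrapper is hypothetical)
for my $file (@ARGV) {
    my ($ok, $output) = syntax_check($file);
    print $ok ? "OK    $file\n" : "FAIL  $file\n$output\n";
}
```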

Of course, if running the code has no effects other than producing output, you can also just run the script itself:

perl scriptname.cgi

Perl will attempt to execute your CGI script and will output errors if there are any. A typical error message that you might see looks like:

Here is another one you'll see a lot:

As you can see, Perl sends back a good deal of useful information about your problem. Typically, it will do its best to analyze what the problem was as well as give you a line number so that you can look into the problem yourself.
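Run straight from the command line, a CGI script sees no form data, because it is the web server that normally sets variables such as REQUEST_METHOD and QUERY_STRING. One hedged way around this is to set those variables yourself before running the script, as in the sketch below. The run_as_get name is hypothetical, the default query string is the one used earlier in this article, and it is worth checking that your copy of cgi-lib.pl reads GET data from QUERY_STRING as expected.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Run a CGI script roughly the way a web server would for a GET request:
# set the CGI environment variables, then capture everything it prints.
sub run_as_get {
    my ($script, $query) = @_;
    local $ENV{REQUEST_METHOD} = 'GET';
    local $ENV{QUERY_STRING}   = $query;
    my $output = qx{"$^X" "$script" 2>&1};
    return $output;
}

# Usage: perl run_as_get.pl hello_world.cgi 'fname=selena&lname=sol'
if (@ARGV) {
    my ($script, $query) = @ARGV;
    print run_as_get($script, $query // 'fname=selena&lname=sol');
}
```

Because the output is captured exactly as the script printed it, you also see the header lines the browser would have received.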

Well, as you may have guessed, there are quite a few commonly made syntax errors that will plague your command line executions. In fact, in "Teach Yourself CGI Programming in Perl", Eric Herrmann lists the following common suspects:

; (semicolon)
Each statement in your Perl program must end with a semicolon. Unfortunately, the error message you get may not give you much of a hint. You'll usually get something to the effect of "syntax error at 1.pl line 6, near print", and it is up to you to track down the error. The missing semicolon is often at the end of the line before the one Perl reports.

{} (braces)
Braces are used to delimit sections of the program (such as the bodies of if statements and while or for loops). The most common problem is leaving off the closing brace that corresponds to an opening brace.

() (parentheses)
Every now and then, you will forget a parenthesis in an if statement, so beware.

"" (quotation marks)
Perl allows quoted strings to span multiple lines. This means that if you leave off a closing quote, the rest of your entire program might be considered part of the quoted string. Also, beware of quotes inside of quotes, such as print "She said "hello""; How can Perl know which quote is meant to be printed and which quote is meant to end the string?

@ (at sign)
The @ character is used to name arrays in Perl. Thus, if you are going to print an @ character, such as in an email address, you must make sure to "escape" it using a backslash, as in print "selena\@eff.org";
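The quotation-mark and at-sign suspects are easy to trip over, so here are a few correctly quoted forms side by side. The strings are the examples from the list of suspects; the backslash escape, qq{} and single quotes are all standard Perl quoting.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Quotes inside a double-quoted string: escape them with a backslash...
my $escaped = "She said \"hello\"";
# ...or choose a different delimiter with qq{}, which still interpolates.
my $with_qq = qq{She said "hello"};

# An @ in a double-quoted string starts an array name unless escaped...
my $email_escaped = "selena\@eff.org";
# ...while single-quoted strings do not interpolate at all.
my $email_single = 'selena@eff.org';

print "Content-type: text/html\n\n";
print "$escaped<BR>\n$with_qq<BR>\n$email_escaped<BR>\n$email_single<BR>\n";
```

Under use strict, leaving the @ unescaped in the double-quoted version is not just a silent bug: Perl refuses to compile it because @eff is an undeclared array.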

Log File Analysis
Another useful debugging tool, assuming your system administrator has given you access to it, is the error log of the web server you are using. This text file lists the errors that have occurred while the web server has been processing requests from the web. Each time your CGI script produces an error, the web server adds a log entry.

If your sysadmin does not allow access to the log file, you may ask her to email you a version of the log file containing only the errors related to your work. She can create such a version with the grep command, and it should not be difficult.

On the other hand, if you do have access to the log file, it can usually be found in the "logs" directory under the main web server root.

For example, on NCSA servers, it is typically found somewhere like

/usr/local/bin/httpd/logs
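If you do have access to the log, grep is all you really need to pull out your own entries, but the same filter is easy to write in Perl. In this sketch, the log path (the directory above plus the conventional error_log file name) and the script name are assumptions; substitute your server's actual error log and your own script.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return only the log lines that mention the given script name.
sub matching_lines {
    my ($fh, $script) = @_;
    my $pattern = quotemeta $script;
    return grep { /$pattern/ } <$fh>;
}

# Assumed defaults: adjust both for your own server and script.
my $log    = $ARGV[0] // '/usr/local/bin/httpd/logs/error_log';
my $script = $ARGV[1] // 'hello_world.cgi';

if (open my $fh, '<', $log) {
    print matching_lines($fh, $script);
    close $fh;
} else {
    print "Cannot open $log: $!\n";
}
```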

Dressing up as a Web Browser
In "Teach Yourself CGI Programming in Perl", Eric Herrmann outlines a method which you can use to test your CGI scripts using TELNET. I recommend reading the section if you have the chance. In the meantime, here is a quick explanation...

If you are able to use the TELNET program to contact your web server, you can view the output of your CGI script by pretending to be a web browser. This makes it easy to see exactly what is being sent to the web browser.

The first step is to contact the web server using the telnet command, giving the port number as a separate argument:

telnet www.yourdomain.com 80

Typically, web servers listen on port 80, so for most of you, port 80 is the one to contact.

Once you have established a connection with the HTTP server, type a GET request using the following syntax, then press Return twice, since an HTTP request ends with a blank line:

GET /cgi-bin/testscript.cgi HTTP/1.0

This command tells the server to send you the requested document, which in this case is the output of a CGI script.

After your GET request, the web server will execute your CGI script and send back the results, headers and all. The exact headers vary from server to server, but the exchange will look something like the following:

eff.org:~$ telnet www.mydomain.com 80
Trying 190.2.3.120...
Connected to www.mydomain.com.
Escape character is '^]'.
GET /cgi-bin/test.cgi HTTP/1.0

HTTP/1.0 200 OK
Content-type: text/html

<HTML>Hello World</HTML>

Connection closed by foreign host.
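If telnet is not available, a few lines of Perl can play the same role. This sketch uses the core IO::Socket::INET module to send a GET request and print whatever the server sends back, headers included. The host and path come from the command line (www.yourdomain.com and /cgi-bin/hello_world.cgi are placeholders), and unlike the telnet example it also sends a Host header, which is an addition of this sketch.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# An HTTP/1.0 request: the request line, a Host header, and the blank
# line that ends the request. HTTP/1.0 does not require Host, but servers
# hosting several domains on one address use it to pick the right site.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Usage: perl fake_browser.pl www.yourdomain.com /cgi-bin/hello_world.cgi
if (@ARGV == 2) {
    my ($host, $path) = @ARGV;
    my $sock = IO::Socket::INET->new(PeerAddr => $host, PeerPort => 80, Proto => 'tcp')
        or die "Cannot connect to $host: $@\n";
    print $sock build_request($host, $path);
    # An HTTP/1.0 server closes the connection after responding, so read to EOF.
    print while <$sock>;
    close $sock;
}
```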

Using print "Content-type: text/html\n\ntest";exit;
However, I will note one more method you can use to find where an error is when it is not a syntax error but an HTTP error. An HTTP error causes the favorite "Document contains no data" error, which the command line and error logs won't necessarily help with: the script will run fine from the command line, but it won't run from the web.

Look at the hello world script with a couple of minor changes:


#!/usr/local/bin/perl
require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);


foreach $incoming_form_variable (keys(%form_data))
{
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
}


print "Content-type: text/html\n\n";
print "Hello World<P>";

When you run this script, you will get a "Document contains no data" error, because you have sent text to the browser (the variable names and values) BEFORE you have sent the magic HTTP header line "Content-type: text/html\n\n". But how would you find out that this is the problem?
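Once you know the cause, the fix itself is only a matter of ordering: send the header before anything else. The sketch below shows that ordering. To keep it runnable without cgi-lib.pl, a hard-coded sample hash stands in for the %form_data that ReadParse would fill in, and the hypothetical render_page routine builds the page as a string so you can check that the header comes first. The harder part, finding the cause in the first place, is what the rest of this section is about.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build the whole response with the HTTP header FIRST, then the body.
sub render_page {
    my (%form_data) = @_;
    my $page = "Content-type: text/html\n\n";
    $page .= "Hello World<P>";
    $page .= "$_ = $form_data{$_}\n<BR>" for sort keys %form_data;
    return $page;
}

# Sample data standing in for what ReadParse(*form_data) would produce.
my %form_data = (fname => 'selena', lname => 'sol');
print render_page(%form_data);
```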

The solution is to use the line print "Content-type: text/html\n\ntest";exit; to walk through your routine one step at a time and discover at which point the problem begins. Let's try it.


#!/usr/local/bin/perl
print "Content-type: text/html\n\ntest";exit;
require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);


foreach $incoming_form_variable (keys(%form_data))
{
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
}


print "Content-type: text/html\n\n";
print "Hello World<P>";

That is going to work just fine. The web browser will display "test", and we will know that the error is not being caused by the first line of the script. (Notice that because we use the exit function, Perl stops executing the script, so we will not get any of the other output.)

Next, let's move the testing line down...


#!/usr/local/bin/perl
require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);
print "Content-type: text/html\n\ntest";exit;


foreach $incoming_form_variable (keys(%form_data))
{
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
}


print "Content-type: text/html\n\n";
print "Hello World<P>";

That is going to work just fine too! I'm getting bold there, jumping two lines at a time, but when you actually use this method, feel free to jump over entire routines if you are sure they are not the cause of the bug. Just don't jump too many at once. Okay, now let's drop the line into the foreach loop.


#!/usr/local/bin/perl
require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);


foreach $incoming_form_variable (keys(%form_data))
{
print "Content-type: text/html\n\ntest";exit;
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
}


print "Content-type: text/html\n\n";
print "Hello World<P>";

Okay, that works too (remember to pass some variables as URL-encoded data, as shown above).

Finally, we move the line to the end of the foreach loop, and we get the "Document contains no data" problem!


#!/usr/local/bin/perl
require ("./Library/cgi-lib.pl");
&ReadParse(*form_data);


foreach $incoming_form_variable (keys(%form_data))
{
print "$incoming_form_variable = $form_data{$incoming_form_variable}\n<BR>";
print "Content-type: text/html\n\ntest";exit;
}


print "Content-type: text/html\n\n";
print "Hello World<P>";

That is it, we just discovered where the bug was. We can bonk ourselves on the head and say, "Of course, the HTTP header MUST be the first thing printed to the browser!"

In Conclusion
Well, that's all, folks. If you are comfortable with the debugging tools outlined here and you are ready to get your mindset in gear, then you should have no worries. Think of CGI debugging as fun. In fact, to get practice, try going to a CGI discussion forum and helping people solve their problems. You will not only hone your own skills, but also make the CGI community a happier group to be a part of. Good luck.

Footnotes