eXtropia: the open web technology company

Introduction to UNIX for Web Developers
The File System  
Okay, yesterday we covered quite a bit of theory about the operating system, shells, and the file system. Today we are going to start getting our hands dirty and apply some of that theory to real-world commands that you can use.

Specifically, we will learn how to move around the file system; create, modify, and delete files and directories; and understand how permissions work.

What exactly is a file system? Well, perhaps the first question we should ask is: what is a file?

By now, most of us will be familiar with the concept of files. In fact, we use them every day. We save them, delete them, copy them, move them, etc. It is almost as if files were real things on our computer.

Well, in actuality, a file is only an abstract concept (a data structure). It is a metaphor used to describe a theoretical grouping of dispersed bits within a computer's memory.

In actuality, files do not exist.

From the perspective of the computer, there are only flittering, ethereal bits floating around in its memory. True, these bits do have references to each other, but there is no actual "file thingy" sitting in the computer somewhere.

A file is actually a pretend creature that we have imagined to help us deal with what is otherwise a very ethereal thing.

Of course, never underestimate the power of make-believe. The file metaphor is extremely useful because it helps people do work. It helps us organize things and feel comfortable in the worldless world of bits and bytes.

Files within the UNIX file system can be identified by their path and name. As such, filename standards and conventions have developed to make browsing the file system more efficient. Some of the standards you should be familiar with are listed below:

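  • A good file name will consist of alphanumeric characters and/or punctuation characters other than the slash, which is reserved as the separator between directories in a path. However, it is best to stick with a standard alphanumeric name followed by a dot extension, such as myfile.txt. Although they are valid, try to avoid characters such as !, ?, #, *, &, |, <, >, ', ", \, ^, (), {}, [], ;, :, +, and =, because many of them have special meaning to the shell and must be quoted or escaped every time you type the name.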

  • Most modern UNIX systems allow filenames of up to 255 characters, but you should try to strike a balance between the information contained in a file name and the convenience of typing the name in. For example, a name like "web_site_log_file_for_january.log" might be renamed "jan98_www.log". As a side note, you probably would not want to name the file "www_jan98.log" if you will have other files such as "www_feb98.log" and "www_mar98.log". Here's why. If you put the month first, you can more easily use wildcards to find the file: "ls ja*" will find "jan98_www.log" with a pattern of only three characters, whereas, if the "www" portion came first, you would need to type out "ls www_ja*" to get the same result.

  • Since UNIX is case sensitive, capitalization within a filename matters. Thus, myfile.txt and Myfile.txt are two different files. Many admins use all capital letters or an initial cap for directories and all lowercase letters for files. When this is done, directories and files are visibly distinct in directory listings using just "ls". Where the listing is sorted in plain ASCII order, which puts capital letters before lowercase ones, the directories will all be grouped together at the beginning. One problem with this convention is that URLs can confuse visitors, who may not know about case sensitivity and may type in a bad URL such as "http://www.yourdomain.com/news" instead of "http://www.yourdomain.com/News" with a capital "N".

  • Deciding upon good names and a good naming standard will save you time in the long run. For example, it is better to name a log file Jan1998.log rather than log12.txt. Remember that it may be two years before you return to the directory and you will need to recall what the contents of the file are. Further, other people may need to know what the files contain as well.
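The wildcard and case-sensitivity points above can be tried out safely in a scratch directory. The following is only a minimal sketch: it assumes the widely available "mktemp" utility and a case-sensitive file system (the usual situation on UNIX), and the names "demo_dir", "start_dir", "january_match", and "txt_count" are made up for this illustration.

```shell
# Work in a throwaway directory so no real files are touched.
start_dir=$(pwd)
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Three monthly logs named with the month first, as suggested above.
touch jan98_www.log feb98_www.log mar98_www.log

# The short pattern ja* is enough to pick out the January log.
january_match=$(ls ja*)
echo "ja* matched: $january_match"

# Case matters: on a case-sensitive file system these are two files.
touch myfile.txt Myfile.txt
txt_count=$(ls [Mm]yfile.txt | wc -l)
echo "files matching [Mm]yfile.txt: $txt_count"

# Clean up the scratch directory.
cd "$start_dir" || exit 1
rm -rf "$demo_dir"
```

If the logs had been named with "www" first, the equivalent pattern would have been the longer "www_ja*".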

And so another crucial task performed by the operating system is the provision of a file system to fully elaborate the file metaphor.

A file system is also an abstract data structure used to store information. However, a file system stores information about files (a metaphor to hold metaphors).

Typically, the file system uses the file cabinet metaphor to describe how files are stored within the bowels of the computer. Individual files are grouped into file folders called directories. Directories may contain files or sub-directories. Sub-directories can contain more files or more sub-directories and so on, and so on.
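To make the file cabinet metaphor a little more concrete, here is a small sketch that builds a directory containing a sub-directory containing another sub-directory, and then lists the result. The directory and file names ("projects", "website", "images", and so on) are invented for the example, and it assumes the "mktemp" utility is available.

```shell
# Create a scratch "cabinet" to hold the example hierarchy.
tree_root=$(mktemp -d) || exit 1

# mkdir -p creates every missing directory along the way.
mkdir -p "$tree_root/projects/website/images"

# Put one file in a directory and one in its sub-directory.
touch "$tree_root/projects/website/index.html"
touch "$tree_root/projects/website/images/logo.gif"

# Walk the hierarchy and print every directory and file in it.
(cd "$tree_root" && find . | sort)

# Count the directories (including the scratch root) and the files.
dir_count=$(find "$tree_root" -type d | wc -l)
file_count=$(find "$tree_root" -type f | wc -l)
echo "directories: $dir_count, files: $file_count"

# Clean up.
rm -rf "$tree_root"
```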

Like most modern operating systems, UNIX defines an inverted tree file system emanating from a single "root" directory. This is shown below.

[File System]

The structure of the UNIX file system, however, has some general rules that you can use to navigate through it. Specifically, it defines a set of generic directories that hold a predictable set of files. Each system administrator may add to or delete from these standard directories, but it is a good bet that you will see the following hierarchy on freshly installed UNIX systems.

[File System]

Let's take a look at a few of the more important ones from the perspective of a web technician.

The "bin" Directory
This directory contains binary files and executables that are considered basic to the use of the system. We are going to discuss many of these commands, such as "ls", "mv", and "grep", tomorrow. For the most part, you do not need to access this directory yourself. However, it is a good idea to know where it is located.

Since these executables are considered basic, generally, all users are granted permission to read or execute the files in "bin". We will talk more about permissions later.
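One way to see this for yourself is to ask the shell where it finds "ls" and then look at that file's permissions. This is just a sketch: the exact location differs between systems (it may be "/bin/ls" or somewhere else), so the example looks it up with "command -v" rather than assuming a path, and the variable names "ls_path" and "access" are made up here.

```shell
# Ask the shell which executable it would run for "ls".
ls_path=$(command -v ls)
echo "ls lives at: $ls_path"

# Show the long listing, which includes the permission bits.
ls -l "$ls_path"

# Check whether the current user may read and execute the file.
if [ -r "$ls_path" ] && [ -x "$ls_path" ]; then
    access="read and execute"
else
    access="restricted"
fi
echo "current user access: $access"
```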

The "lib" Directory
This directory contains libraries for any installed compilers. If you are writing CGI scripts in C, you may be interested in making sure the libraries you need can be found here.

The "tmp" Directory
This directory is used to store temporary files created by users or applications. You are free to use this directory, but make sure you do not store anything that cannot be deleted since this directory is often wiped out when the system is rebooted.
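As a rough sketch of how a script might use this directory, the example below creates a uniquely named scratch file, uses it, and removes it again rather than relying on a reboot to clean up. It assumes the commonly available "mktemp" utility, and the "webdev" prefix and the variable names are invented for the illustration.

```shell
# Create a uniquely named scratch file under /tmp
# (or under $TMPDIR, if the administrator has set one).
scratch_file=$(mktemp "${TMPDIR:-/tmp}/webdev.XXXXXX") || exit 1
echo "scratch file: $scratch_file"

# Use it for data that can safely be thrown away.
echo "intermediate results" > "$scratch_file"
contents=$(cat "$scratch_file")
echo "read back: $contents"

# Remove it when finished.
rm -f "$scratch_file"
```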

The "usr" Directory
The usr directory is where you will probably spend most of your time. It contains files relevant to users, whereas the rest of the directories we have discussed are more oriented towards system issues.

For example, if you are working with an HTTPD web server, you are likely to find it in "/usr/local/etc/httpd". Similarly, most home pages will be based in the "/usr/home" directory. Finally, supporting applications such as the perl interpreter will be found in "/usr/bin".

Hold on, what does "/usr/local/etc/httpd" mean? Well, "/usr/local/etc/httpd" is called a "path" and is essentially the address of some file or directory. Let's take a closer look at paths.
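As a small first look, the standard "dirname" and "basename" utilities can take a path apart into the directory that contains an item and the name of the item itself. They work purely on the text of the path, so this sketch runs even if "/usr/local/etc/httpd" does not exist on your system; the variable names are made up for the example.

```shell
# The path from the text, treated simply as a string.
path="/usr/local/etc/httpd"

# Everything up to the last slash: the containing directory.
parent=$(dirname "$path")

# Everything after the last slash: the item's own name.
name=$(basename "$path")

echo "parent directory: $parent"
echo "name:             $name"
```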
