| Short description | Write a web server. |
|---|---|
| Due date | Thursday, April 2, 2009 at 11:59 p.m. |
| Value towards final grade | 10% |
You are going to write a minimal web server. This web server will not strictly conform to the HTTP RFC (that would be too much work), but will implement enough of it to communicate with modern web browsers.
We'll be discussing HTTP in class, but it's worth describing here if you want to get a jump on it. HTTP is an application-level protocol. Its primary purpose is to transfer "Hypertext" (HTML) from a server to a client, though in actuality it is used to transfer all sorts of files, not just HTML. In its simplest form, an HTTP server reads a static file from disk and transfers that static file over the network. Fully implemented HTTP servers do a lot more (such as generating content dynamically), but your web server will deal only with serving static content.
A simple HTTP connection consists of the client (browser) connecting to the server, making a request for a file, the server serving that file to the client, then closing the connection. Fully implemented HTTP servers can be more complex (for instance allowing for connections to be "kept alive"), but your web server will not be dealing with any of that nonsense.
You aren't really required to know HTML for this assignment (since, strictly speaking, HTTP isn't required to deal with HTML), but it may be handy if you want pretty looking testing data/error messages :). Here is a bare bones HTML file:
<html>
<head>
<title>Your file was not found!</title>
</head>
<body>
<h1>404 file not found</h1>
<p>The file you requested was not found. You may want to check if you're an idiot.</p>
</body>
</html>
The above is an example of a string you may wish to hardcode into your server to handle requests for files which don't exist. An HTML file consists of an opening <html> tag, content, then a closing </html> tag. Inside, the file is broken into two parts: the head (which, for a simple file, has nothing but the title) and the body (which is where all the content of the document is). If you want to know more about HTML, you can ask me, but you don't really need to know much about it for this assignment.
HTTP (Hypertext Transfer Protocol, used for the WWW) properly runs on port 80. For security reasons, non-privileged users (that means you!) are prohibited from providing servers on "standard" ports (port numbers less than 1024). To keep you guys from stepping on each other's toes, please use the port assignments used for assignment 3.
Note that the port number should not be hardcoded into your server. Your server should take a command-line argument specifying which port to use. These port numbers are what you should use during testing.
| Surname | Port range |
|---|---|
| Antrobus | 6000–6009 |
| Beaton | 6010–6019 |
| Blay | 6020–6029 |
| Bouteiller | 6030–6039 |
| Brookfield | 6040–6049 |
| Cruise | 6050–6059 |
| Davison | 6060–6069 |
| De Angelis | 6070–6079 |
| Dizazzo | 6080–6089 |
| E | 6090–6099 |
| Eggleton | 6100–6109 |
| Eidt | 6110–6119 |
| Eineke | 6120–6129 |
| Ewer | 6130–6139 |
| Favaro | 6140–6149 |
| Ferguson | 6150–6159 |
| Fernihough | 6160–6169 |
| Foster | 6170–6179 |
| Galloway | 6180–6189 |
| Gao | 6190–6199 |
| Holik | 6200–6209 |
| Kalawon | 6210–6219 |
| Kapp | 6220–6229 |
| Lepp | 6230–6239 |
| Litvinov | 6240–6249 |
| Loh | 6250–6259 |
| Lu | 6260–6269 |
| Maloney-Chumney | 6270–6279 |
| Niu | 6280–6289 |
| Rao | 6290–6299 |
| Rumas | 6300–6309 |
| Sham | 6310–6319 |
| Simpson | 6320–6329 |
| Sinnamon | 6330–6339 |
| Sisco | 6340–6349 |
| Smith | 6350–6359 |
| Tsang | 6360–6369 |
| Tsotsos | 6370–6379 |
| Wong | 6380–6389 |
| Zhang | 6390–6399 |
Your server must take in one command-line argument, a number representing which port to listen on. It is not required to produce any output to the screen, but it can (note: it would be extremely useful for your server to print stuff to stdout during development).
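Reading the port from the command line might look something like the sketch below. The function name parse_port is made up for illustration (it is not part of the assignment); it uses strtol so that junk like "80x" is rejected instead of being silently accepted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command-line argument to a port number.
 * Returns the port (1-65535) on success, or -1 if the argument is not a
 * valid port. */
int parse_port(const char *arg)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);

    /* Reject empty strings, trailing junk, and overflow. */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;

    /* Reject values outside the valid TCP port range. */
    if (value < 1 || value > 65535)
        return -1;

    return (int)value;
}
```

In main you would call this on argv[1] (after checking that argc is 2) and print a usage message and exit if it returns -1.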
Your server is only required to accept one connection at a time (i.e., you don't have to use poll anymore, booya!). It should accept a connection from a web browser, handle that connection, close that connection, then go to the top of its infinite loop to handle more connections.
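The accept/handle/close loop described above could be sketched roughly as follows. The names make_listener, serve_forever, and handle_client are placeholders of my own choosing, and the backlog of 5 and the use of SO_REUSEADDR are assumptions rather than requirements of the assignment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the listening descriptor, or -1 on error. */
int make_listener(int port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Assumption: allow quick restarts of the server on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One connection at a time: accept, handle, close, repeat.
 * handle_client stands in for your own request-handling function. */
void serve_forever(int listener, void (*handle_client)(int client))
{
    for (;;) {
        int client = accept(listener, NULL, NULL);
        if (client < 0)
            continue;   /* accept failed (e.g. interrupted); try again */
        handle_client(client);
        close(client);
    }
}
```

Because each connection is fully handled and closed before the next accept, there is no need for poll or any other multiplexing here.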
Upon accepting a connection, the server can immediately accept a request from the web browser (web browsers are designed to make a request immediately upon connecting). Your web server is only required to respond to GET requests (requests for files). The request will look like:
GET /path/to/a/file HTTP/1.1
User-agent: Mozilla/blah blah
Accepts: blah blah
Your web server need only look at the first line, since that's all it needs to care about. Every line in HTTP is terminated with a carriage return followed by a newline ("\r\n" in C). You should read in the first line (so you know which file you're going to give the browser); you should then read (and discard) until you see "\r\n\r\n". "\r\n\r\n" (a blank line) signifies the end of the request headers, which for a GET request is the end of the message.
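As a rough illustration of picking the path out of the first line and spotting the blank line, here is a sketch. Both function names are hypothetical, and it assumes you have already collected the request bytes into a NUL-terminated buffer.

```c
#include <string.h>

/* Hypothetical helper: copy the path from a request line such as
 * "GET /index.html HTTP/1.1\r\n" into path. Returns 0 on success, or -1
 * if the line is not a GET request or the path does not fit. */
int parse_get_path(const char *request, char *path, size_t path_size)
{
    const char *start;
    size_t len;

    if (strncmp(request, "GET ", 4) != 0)
        return -1;

    start = request + 4;

    /* The path runs up to the next space; stop at "\r\n" so we never
     * wander past the first line. */
    len = strcspn(start, " \r\n");
    if (len == 0 || start[len] != ' ' || len >= path_size)
        return -1;

    memcpy(path, start, len);
    path[len] = '\0';
    return 0;
}

/* Returns 1 once the buffered request contains the blank line that ends
 * the headers, 0 otherwise. buf must be NUL-terminated. */
int headers_complete(const char *buf)
{
    return strstr(buf, "\r\n\r\n") != NULL;
}
```

A request may arrive in several pieces, so you would typically keep reading into the buffer until headers_complete returns 1 (or the buffer fills up).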
After getting a GET command, your server should try to open path/to/a/file on the local filesystem. Note that the path must be interpreted relative to your web server's working directory, and thus you should strip off the leading / character. (Note: stripping off leading characters is extremely efficient in C if you use pointer arithmetic. If filename is pointing to "/path/to/a/file", then filename + 1 is pointing to "path/to/a/file"!)
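The pointer-arithmetic trick can be wrapped in a tiny function, sketched here under the hypothetical name strip_leading_slash. It checks that the first character really is a / before skipping it.

```c
/* Hypothetical helper: skip a single leading '/' so the path is
 * interpreted relative to the server's working directory. No copy is
 * made; the result points into the original string. */
const char *strip_leading_slash(const char *filename)
{
    return (filename[0] == '/') ? filename + 1 : filename;
}
```

Note that a request for "/" leaves you with an empty string, and open on an empty path fails with ENOENT, so that case naturally falls into your "file not found" handling.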
If you can successfully open the file requested, you must send the client "HTTP/1.1 200 OK\r\n\r\n" (a header indicating success, followed by a blank line) followed by the contents of the file. You should then close the socket (and the file).
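Sending the header and then the file contents might be sketched like this. The names send_all and copy_fd are made up for illustration, and the 4096-byte buffer is an arbitrary choice. The point of send_all is that write may send fewer bytes than you asked for, so you have to loop.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes to fd, retrying on short
 * writes. Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything readable from in_fd to out_fd.
 * Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;
        }
        if (send_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
    return 0;
}
```

With these, the success path is roughly: send_all the "HTTP/1.1 200 OK\r\n\r\n" header to the socket, copy_fd from the file to the socket, then close both.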
If you cannot successfully open the file requested, you must check the value of errno to determine if it was due to the file not existing or due to insufficient permissions. If the file does not exist, you must send "HTTP/1.1 404 File not found\r\n\r\n" followed by an (optionally HTML-formatted) short human-readable message (to be displayed in the browser) indicating the file was not found. If the web server has insufficient permissions to read the file, you must send "HTTP/1.1 403 Permission denied\r\n\r\n" followed by a short human-readable message indicating permission was denied. You must then close the socket to the web browser.
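One way to turn errno into the right status line is sketched below. The function name status_for_errno is hypothetical, and the default case is my own assumption, since the assignment only specifies the "does not exist" and "permission denied" cases.

```c
#include <errno.h>

/* Hypothetical helper: choose a status line from the errno left by a
 * failed open(). The strings are the ones given in the assignment. */
const char *status_for_errno(int err)
{
    switch (err) {
    case ENOENT:
        return "HTTP/1.1 404 File not found\r\n\r\n";
    case EACCES:
        return "HTTP/1.1 403 Permission denied\r\n\r\n";
    default:
        /* Assumption: the assignment does not say what to do for other
         * errors; this sketch treats them as "not found". */
        return "HTTP/1.1 404 File not found\r\n\r\n";
    }
}
```

Save errno into a local variable immediately after open fails, since later library calls (printf, for example) may overwrite it.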
To test correct operation of your server, you can fire up a web browser (such as Firefox). Supposing that your web server is running on port 9001 and you wish to request the file index.html, you would type the address http://obelix.gaul.csd.uwo.ca:9001/index.html into your web browser.
Some people have been having problems with Firefox on GAUL not displaying webpages. If your web server seems to be producing correct output through telnet, but not when you try it on Firefox, you will likely need to include a Content-length in your header. Properly you should be doing this anyway, but I figured it was a needless complication and most browsers cleanly handle a missing Content-length.
A Content-length follows on the line after your status line. For instance, if the user requests a file which is 213 bytes long, your header would be HTTP/1.1 200 OK\r\nContent-length: 213\r\n\r\n followed by the content.
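Building that header with snprintf might look like the following sketch. The function name format_ok_header is hypothetical, and it takes the length as a long, so you would cast the file size to long when calling it.

```c
#include <stdio.h>

/* Hypothetical helper: format a success header that includes the body
 * length. Returns the header length in bytes, or -1 if it does not fit
 * in buf. */
int format_ok_header(char *buf, size_t buf_size, long content_length)
{
    int n = snprintf(buf, buf_size,
                     "HTTP/1.1 200 OK\r\nContent-length: %ld\r\n\r\n",
                     content_length);

    /* snprintf returns the length it wanted to write; if that is not
     * smaller than the buffer, the output was truncated. */
    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return n;
}
```

For the 213-byte file above, this produces exactly the header shown, and the return value tells you how many bytes of buf to send.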
For determining the size of a file, I highly recommend using fstat(2), as per the following code:
/* needs <fcntl.h>, <sys/stat.h>, and <unistd.h> */
fd = open(filename, O_RDONLY);
if (fd < 0) { ... }
else {
    struct stat fdstat;
    if (fstat(fd, &fdstat) < 0) { ... deal with error ... }
    ... the size of the file is now in fdstat.st_size ...
    ...
    close(fd);
}
First of all, you probably won't get 20% bonus unless you really go nuts with it :). Getting 10% bonus is pretty doable, though.
If you want bonus marks, mention why you think you deserve it in your README.TXT. You can do whatever you like (that you think deserves bonus marks), but here are a few ideas off the top of my head:
All files that are necessary for determining the proper working of your program should be submitted electronically. They should be put in a directory called asn4 and can then be submitted on GAUL via the command:
submit cs3357 asn4
You should submit:
In an unsealed 9" by 12" envelope, submit, in the CS3357 locker: