Distributed Systems Lab 2003
Hypertext Transfer Protocol (HTTP) Tutorial

HTTP (Hypertext Transfer Protocol) is an application-level protocol over TCP (Transmission Control Protocol) for distributed, collaborative hypermedia information systems. It is a generic, stateless protocol. Though WWW browsers support a variety of protocols, e.g., FTP, NNTP, SMTP, etc., HTTP is the protocol most frequently used with Web browsers.

Several HTTP versions exist. Version 1.0 (HTTP/1.0 in RFC1945) and version 1.1 (HTTP/1.1 in RFC2616) are the most common versions today. In the distributed systems lab, we focus on a subset of HTTP that is available in both versions 1.0 and 1.1.

HTTP is a simple request/reply (RR) protocol over TCP. The standard procedure for an HTTP request is as follows (for simplicity, no proxies, firewalls, etc. are considered here): the client opens a TCP connection to the server and sends its request, the server returns its reply, and the connection is closed.
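A minimal sketch of one such request/reply exchange, assuming Python and its standard socket module: connect, send the request, read the reply until the server closes the connection. The function names build_request, exchange and fetch are illustrative and not part of the lab material.

```python
import socket


def build_request(path, host=None, version="HTTP/1.0"):
    """Return the bytes of a minimal GET request.

    Every line ends with CRLF, and an empty line (a second CRLF)
    terminates the header section.
    """
    lines = [f"GET {path} {version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def exchange(conn, request):
    """Send the request on a connected socket and read the whole reply."""
    conn.sendall(request)
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # the peer closed the connection: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch(host, port, path, timeout=5.0):
    """Open a TCP connection, perform one exchange, close the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return exchange(conn, build_request(path))
```

Calling fetch with a server name, port 80 and the path "/" performs one HTTP/1.0 exchange, provided the host is reachable. Reading until the peer closes the connection only works because the server ends each reply by closing; a client for persistent connections would have to rely on Content-Length instead.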
This interaction scheme can be easily tested with standard tools:

    telnet www.dslab.tuwien.ac.at 80
    Trying 128.131.172.54...
    Connected to www.dslab.tuwien.ac.at.
    Escape character is '^]'.
    GET / HTTP/1.0

    HTTP/1.1 200 OK
    Date: Mon, 16 Dec 2002 07:31:16 GMT
    Server: Apache/1.3.12 (Unix) tomcat/1.0
    Last-Modified: Son, 15 Dec 2002 14:57:00 GMT
    ETag: "fc00-2b03-3c34713c"
    Accept-Ranges: bytes
    Content-Length: 11011
    Connection: close
    Content-Type: text/html

    <html>
    <head>
    <title> Distributed System Lab </title>
    <link href=".//includes/text.css" rel="stylesheet" type="text/css" />
    </head>
    <body bgcolor="#12658E">
    <table border="0" cellpadding="0" cellspacing="0" width="100%">
    <tr>
    <td align="center"> <img alt="Our Logo" src="./images/logo.gif" /> </td>
    <td> </td>
    <td align="left" colspan="2"> <br /> <font color="white"> <h1> Distributed Systems Lab 2003 </h1> </font> <hr /> </td>
    </tr>
    ...
    ...
    </body>
    </html>

A browser such as Netscape uses the protocol in the same way. After each request the browser parses the reply and checks whether it needs additional requests for embedded elements like pictures, Java code, etc. These elements, if any, are then retrieved via new HTTP requests.

HTTP Drawbacks and Problems

The above example shows a serious drawback of HTTP/1.0: the one-to-one mapping of requests to elements is inefficient when a reply contains many embedded elements. Each element requires a separate request, which means opening a new TCP connection, sending the request, receiving the reply, and closing the connection again.

If, for example, an average HTML page with 10 images and a Java applet is requested, this results in at least 12 requests, i.e., 12 TCP connections, even though a single connection could be reused if the embedded elements reside on the same server. Additionally, if the Java applet is not stored in an archive file (.jar), each Java class of the applet requires a separate request.

A second drawback of HTTP stems from its request/reply nature: if session-oriented services like databases are gatewayed to the Web, they conflict with HTTP's request/reply scheme, which has no session concept. HTTP does not maintain state or context information; each HTTP request is completely stand-alone and separate from all other requests. Nevertheless, such interaction patterns need to be supported. This has to be done outside the HTTP protocol by the application programmer, which may be rather complicated.

HTTP/1.1 tries to overcome some of these drawbacks by introducing new and improved features. The most important improvements are:

Table 1. HTTP 1.1 improvements.

HTTP Request

The following table defines, in a BNF-style form, the subset of HTTP requests your HTTP component has to understand (exp1 | exp2 means either exp1 or exp2, [ exp ] denotes an optional exp, and ( exp )* denotes any sequence of exp, including the empty one).

    request         = request-line <CRLF>
                      (general-header|request-header|entity-header)*
                      <CRLF>
                      [entity-body]

    request-line    = GET <Space> absolute-url <Space> (HTTP/1.0|HTTP/1.1)

    absolute-url    = // the path of an absolute URL

    general-header  = // can be ignored; some non-empty lines of information,
                      // where each line is terminated by a single <CRLF>

    request-header  = // contains the 'Host' parameter in an HTTP 1.1 request.
                      // e.g., 'Host: www.dslab.tuwien.ac.at'
                      // all other entries can be ignored;
                      // some non-empty lines of information,
                      // where each line is terminated by a single <CRLF>

    entity-header   = // can be ignored; some non-empty lines of information,
                      // where each line is terminated by a single <CRLF>

Hint: Note that the HTTP request header section has to end with two CRLF!

The HTTP server answers a GET request with a response (as described below) that contains information about the requested document and the requested document itself.

HTTP Response

When a request is received at the server, the HTTP component of lab 4 has to check its correctness, process it, and return the reply to the client. The following table defines the subset of HTTP responses the lab 4 HTTP component must support.

    response        = status-line <CRLF>
                      general-header
                      response-header
                      entity-header
                      <CRLF>
                      [entity-body]

    status-line     = HTTP/1.1 <Space> status-code+reason-phrase

    status-code+reason-phrase = 200 <Space> OK |
                                400 <Space> Bad Request |
                                404 <Space> Not Found |
                                500 <Space> Internal Server Error |
                                501 <Space> Not Implemented

    general-header  = Date: <Space> date <CRLF>
                      Connection: <Space> close <CRLF>

    response-header = Server: <Space> vendor-string <CRLF>

    entity-header   = Content-Length: <Space> integer-greater-or-equal-0 <CRLF>
                      Content-Type: <Space> text/html <CRLF>
                      Last-Modified: <Space> date <CRLF>
                      [ Cache-Control: <Space> no-cache <CRLF> ]  // only for dynamic pages
                      [ Expires: <Space> date <CRLF> ]            // only for dynamic pages

    entity-body     = // the contents of the document requested by the client

    date            = // date format according to RFC822 and RFC1123

    vendor-string   = // server identification
                      // (freely definable by the server implementor)

Hint: Note that the entity-body starts after the first blank line of the response (i.e., there must be two CRLF ahead!).

The order of the general-header, response-header and entity-header expressions (lines) is not important. The required lines simply have to be present in the response.

The status codes in the HTTP response have the following meanings:

Table 2. HTTP status codes and their meaning.

The response headers give some information about the server, the connection, and the entity being returned by the server. Other headers give some information about the document itself. This allows the client to identify the document's size a priori and display the estimated transfer time, or, more importantly, to determine whether the document is cacheable:

Table 3. HTTP response headers.

If you are not sure whether your HTTP server shows the correct behavior, a look at the FAQ page might help.

Examples

The following examples should help you to understand HTTP in greater detail. Whenever you have any doubts about HTTP, either check RFC2616 or send a test request to an HTTP server (e.g., www.dslab.tuwien.ac.at) and have a look at that server's response.

A typical HTTP/1.0 example has already been presented in one of the previous sections. The following shows a typical HTTP/1.1 example (note: in the previous HTTP/1.0 example no Host line such as Host: www.infosys.tuwien.ac.at was given):

    user@host:~% telnet www.infosys.tuwien.ac.at 80
    Trying 128.131.172.91...
    Connected to www.infosys.tuwien.ac.at.
    Escape character is '^]'.
    GET /Teaching/Courses/RN.html HTTP/1.1
    Host: www.infosys.tuwien.ac.at

    HTTP/1.1 200 OK
    Date: Mon, 16 Dec 2002 07:59:02 GMT
    Server: Apache/1.3.14 (Unix) tomcat/1.0 PHP/4.0.3pl1 mod_ssl/2.7.1 OpenSSL/0.9.4
    Last-Modified: Tue, 23 Oct 2001 09:54:46 GMT
    ETag: "67987-2095-3bd53e66"
    Accept-Ranges: bytes
    Content-Length: 8341
    Content-Type: text/html

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    ...
    </HTML>

The example above shows the Host request header, which is mandatory in every HTTP/1.1 request. If it is not present, the server must reply with the status code "400 Bad Request". This is necessary for multi-homed HTTP servers (servers that have been assigned multiple names and should act differently depending on the URL's hostname).

Your HTTP component has to accept HTTP/1.0 and HTTP/1.1 requests. If the Host line is missing in an HTTP/1.1 request, your server should respond with a "400 Bad Request". For HTTP/1.0 requests, the Host line need not be checked.

The following shows a typical HTTP error (the user requests a document that is not available on the HTTP server):

    user@host:~% telnet www.infosys.tuwien.ac.at 80
    Trying 128.131.172.91...
    Connected to www.infosys.tuwien.ac.at.
    Escape character is '^]'.
    GET /unlikelytoexist HTTP/1.1
    Host: www.infosys.tuwien.ac.at

    HTTP/1.1 404 Not Found
    Date: Mon, 16 Dec 2002 08:01:14 GMT
    Server: Apache/1.3.14 (Unix) tomcat/1.0 PHP/4.0.3pl1 mod_ssl/2.7.1 OpenSSL/0.9.4
    Last-Modified: Wed, 05 Sep 2001 11:28:58 GMT
    ETag: "2222c2-282-3b960c7a"
    Accept-Ranges: bytes
    Content-Length: 642
    Content-Type: text/html

    <html>
    ... some error document goes here ...
    </html>

The following example shows a typical request, including all the request headers, sent by a Netscape Web browser requesting /log from an HTTP server:

    GET /log HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.03 [en] (X11; I; HP-UX B.10.20 9000/777)
    Host: w0:4711
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
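The request and response rules of this tutorial can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's reference solution: the names check_request and build_response and the vendor string are made up, answering non-GET methods with 501 is an assumption, and Last-Modified is filled with the current time where a real server would use the document's modification time.

```python
from email.utils import formatdate

REASONS = {200: "OK", 400: "Bad Request", 404: "Not Found",
           500: "Internal Server Error", 501: "Not Implemented"}


def check_request(raw):
    """Return (status, path) for the header section of a raw request.

    A status of 200 only means the request is well-formed; whether the
    document exists (404) is decided later by the caller.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[2] not in ("HTTP/1.0", "HTTP/1.1"):
        return 400, None
    method, path, version = parts
    if method != "GET":
        return 501, None  # assumption: unsupported methods map to 501
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:  # a header line without a colon is malformed
            return 400, None
        headers[name.strip().lower()] = value.strip()
    # The Host header is mandatory for HTTP/1.1 and unchecked for HTTP/1.0.
    if version == "HTTP/1.1" and "host" not in headers:
        return 400, None
    return 200, path


def build_response(status, body=b"", vendor="dslab-sketch/0.1"):
    """Assemble a response carrying the required header lines."""
    now = formatdate(usegmt=True)  # RFC 1123 style date in GMT
    head = [
        f"HTTP/1.1 {status} {REASONS[status]}",
        f"Date: {now}",
        "Connection: close",
        f"Server: {vendor}",  # hypothetical vendor string
        f"Content-Length: {len(body)}",
        "Content-Type: text/html",
        f"Last-Modified: {now}",  # placeholder, see the note above
    ]
    # The blank line (two consecutive CRLF) separates headers and body.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body
```

With these helpers, a request such as the /unlikelytoexist example passes check_request, and the caller would then answer with build_response(404, ...) once the document lookup fails.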
Last update on: 2003-03-13
© 2001 Distributed Systems Group