Distributed Systems Lab 2003
A World-Wide Web (WWW) Tutorial for Beginners

If you have never heard of the World-Wide Web and do not know what the abbreviations HTML, WWW, or HTTP mean, then you should read this introductory text. Otherwise, just scan through it briefly to make sure that you know all the topics discussed here.

The World-Wide Web was developed in 1989 at CERN (European Laboratory for Particle Physics) as a project of Tim Berners-Lee and was intended to serve as an information system for physicists. It was originally developed to allow information sharing within internationally dispersed teams and the dissemination of information by support groups.

The original WWW consisted of documents and links. Most documents on the World-Wide Web are written in HTML (Hypertext Markup Language). These documents consist of many elements such as text, pictures, and links (also called hyperlinks) to other documents or to places within those documents (hypertext). To "follow" a link, the user simply clicks on it, and the user's browser loads the corresponding document at the requested position.

Figure 1. The WWW's concept of documents and links.

This simple but efficient concept became very popular and developed into the most successful Internet service. In the following sections we present the WWW's basic building blocks: documents, URLs (Uniform Resource Locators), and HTML. You need to understand these concepts for the implementation of the HTTP component in lab 4.

Please note that this tutorial only gives a short introduction to the WWW and its concepts. You can get comprehensive information about the WWW and all its protocols, standards, and concepts from the World Wide Web Consortium (W3C).

Documents

Today's WWW still follows the basic concept of documents and links but has considerably extended the notion of what a document can be. WWW documents can be categorized in several respects:
Table 1. Different categories of documents on the WWW.

URLs (Uniform Resource Locators)

A Uniform Resource Locator (URL) addresses a document on the WWW. Simply put, its syntax is:

    protocol://host[:port]/[path][#anchor]

protocol denotes the protocol used to access the document, e.g., http. host specifies the host computer the document resides on, and port defines the port on the host where the server listens for requests. If no port is specified, the "well-known" port of the protocol (e.g., ftp, telnet, or http) is used. HTTP's well-known port is 80. /path denotes the path to the document in the file system of the server, starting at some virtual root for the server. Its syntax adheres to standard UNIX path syntax. anchor denotes a label inside a document, which allows the browser to load the document and jump to a certain position inside it.

If the referenced document is not static but dynamically generated, parameters can be encoded within the URL. This is usually the case when clients submit an HTML form that uses GET as its submission method (see the HTTP tutorial for details). In this case a URL has the following syntax:

    protocol://host[:port]/[path][?parameters]

The only difference from the first URL variant is that the path (which in the dynamic case usually denotes a program or script on the server) may be followed by a question mark and a parameter string. The parameter string consists of key/value pairs. The keys are separated from the values by '=', and the key/value pairs are separated from each other by '&'. Spaces are encoded as '+' so that a URL can be read as a single token. To allow special characters within a key or value, a character may be escaped by '%' followed by a two-digit hex value representing the character's ASCII code. A simple example of a parameter string is:

    mode=dinner&query=pizza+champagner

Hint: The order of the parameters is not fixed! Thus you must not expect the parameters in a certain order.
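As a rough illustration of the URL syntax and parameter encoding described above, the following Python sketch splits an absolute URL into its parts and decodes a parameter string. It is not part of the lab material: the helper names split_url and parse_parameters and the host www.example.com are made up for this example, and the ftp and telnet port numbers are the standard assignments rather than values given in this tutorial.

```python
from urllib.parse import urlsplit, unquote_plus

# Well-known ports. Only HTTP's port 80 is stated in the tutorial text;
# 21 (ftp) and 23 (telnet) are the standard assignments, added as an assumption.
WELL_KNOWN_PORTS = {"http": 80, "ftp": 21, "telnet": 23}


def parse_parameters(parameter_string):
    """Decode 'key=value&key=value' into a dict (hypothetical helper)."""
    parameters = {}
    for pair in parameter_string.split("&"):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        key, _, value = pair.partition("=")
        # unquote_plus turns '+' into a space and '%XX' into the character
        parameters[unquote_plus(key)] = unquote_plus(value)
    return parameters


def split_url(url):
    """Split an absolute URL into the parts named in the syntax above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # fall back to the protocol's well-known port if none is given
        "port": parts.port or WELL_KNOWN_PORTS.get(parts.scheme),
        "path": parts.path,
        "parameters": parse_parameters(parts.query),
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    # www.example.com and /cgi-bin/search are placeholders, not lab servers.
    info = split_url(
        "http://www.example.com/cgi-bin/search?mode=dinner&query=pizza+champagner"
    )
    print(info["host"], info["port"], info["path"])
    print(info["parameters"]["query"])  # looked up by key, not by position
```

Because the decoded parameters end up in a dictionary, values are looked up by key, so the code does not depend on the order in which the parameters arrive. If a key occurs more than once, this sketch keeps only the last value.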
The URLs given so far are called absolute URLs because they denote documents with an absolute name, i.e., they include the protocol, the host, and the path. But URLs can also be relative to the currently visited URL. For example, #ParamURL denotes a label, i.e., a position inside a document, and www.gif is a relative URL pointing to a file located in the same directory on the host as the current document. If only a directory name but no path to an actual HTML file is given in the URL, this may be treated in two ways:
Some example URLs:

    http://www.infosys.tuwien.ac.at/melange/
    http://www.search.me./DoSearch.pl?author=coulouris
    http://nswt.tuwien.ac.at/htdocs/ftpsearch.html
    http://ftpsearch.ntnu.no/cgi-bin/search?query=JDK
    http://bibopac.at:8180/s?fieldtagf1=au&bcode=VERBUND /

Other types of URLs exist but are not of interest for lab 4. We will only use a subset of the parameterized URLs as defined in the relevant Requests for Comments (RFCs). If you are interested in detailed information, check the following RFCs:

Remark: An RFC defines a quasi-standard, e.g., FTP, HTTP, SMTP, MIME, etc. It is first published as a draft so that interested persons can discuss and comment on it. The currently available RFCs can be found at http://www.ietf.org/rfc/ .

Hypertext Markup Language (HTML)

The Hypertext Markup Language (HTML) is a description language for content on the Web. At the time of writing, the latest versions are HTML 4.01 and XHTML 1.0, respectively. HTML is defined using SGML (Standard Generalized Markup Language), a system for defining markup languages. Markup means that the content of a document is enhanced with special embedded directives that are not displayed by the browser but give structural information on parts of the document or on the document itself.

Originally, HTML tried to completely separate structure from layout. Thus it was possible to display HTML with a wide variety of browsers and output devices. After becoming popular, HTML was extended with many layout commands that people considered essential for advertising and similar purposes. This dramatically reduced HTML's portability.

Within HTML, spaces and line breaks are not significant, i.e., they are not mirrored in the formatted result. One space has the same effect as any number of spaces and line breaks. Some browser implementations, however, do not follow this rule exactly. Hence the existence and/or number of spaces can sometimes be significant.

HTML has a large set of tags and elements.
For this lab we only need a limited set. The following figure shows the general structure of an HTML document.

    <html>
      <head> ... header information ... </head>
      <body> ... content information ... </body>
    </html>

Besides the standard tags for document structuring, you will also need to know about the tags that define HTML forms. The form tag defines the form. Between the form's opening and closing tags you may include input tags that allow the user to enter different kinds of data. The form element has two attributes of interest here, action and method. The action attribute defines the script to be executed on the server. The method attribute defines how the request will be submitted to the server. For the purpose of the lab, it is sufficient to use the GET method (see the HTTP tutorial for details), which encodes the form's data within the URL (see the URL section above for details). Here is a sample form element:

    <form action="/cgi-bin/doIt" method="GET">
      ... form contents ...
    </form>

input elements define fields that allow the user to enter data. Again, attributes determine the appearance and functionality of the input fields. The type attribute defines the type of the input field. The most important types are text fields and the submit button used to send the request. The name attribute specifies the variable name under which the input in this field will be available to the script on the Web server. The value attribute specifies a default value for text fields.

    <form action="/cgi-bin/doIt" method="GET">
      Enter your name: <input name="name" type="text" value="Hugo Maier" />
      Enter your email: <input name="email" type="text" value="yourmail@email.com" />
      <input type="submit" />
    </form>

This is only some rudimentary information on HTML. Comprehensive introductions and reference sites are available on the Web. For instance, consult some of the following tutorials for more information:
Last update on: 2003-03-13
© 2001 Distributed Systems Group