Maintained by: Elisa E. Beshero-Bondar (ebb8 at pitt.edu) Creative Commons License

[Image: The headquarters of the W3C at the MIT campus in Cambridge, Massachusetts.]

In our course, at the point we expect you’ll be reading this, you have been writing and planning XML documents. By now you’re accustomed to writing well-formed XML code, as well as writing rules to constrain what elements, attributes, and content you want an XML file to contain. So far, your writing of code has followed syntax rules for well-formedness, leaving the naming and content of your elements and attributes up to you, to control with your own schema rules. We now turn to orient you to HTML (or Hyper Text Markup Language), which has its origins in the concept of “hypertexts” as linkable (or “hyperlinked”) documents, and developed into markup controlling the presentation of pages to be networked and shared on the World Wide Web (W3).

In writing HTML, we work within a standardized set of element and attribute names designed to be read by web browsers. XHTML and the other forms of HTML that we’ll tell you about all rely on standards formulated by the World Wide Web Consortium (abbreviated W3C), an organization founded in 1994 to develop open, platform-independent standards for coding documents and best practices for sharing and displaying them in web browsers around the world. A web browser is software designed to retrieve and display documents and other resources on the World Wide Web, accessed through the internet. A web browser (like Chrome, Firefox, or Safari) is considered “standards-compliant” when it supports the coding approved by the W3C for the creation of web pages, their styling, their linking to other sites, their representation of metadata, and their dynamic features to be customized by site visitors. Those curious about the history of HTML and the origins of the World Wide Web (and the much earlier origins of the Internet) can read more at LivingInternet.com (a wonderfully extensive resource), or in this concise and witty walk-through: “Internet History: HTML Code Evolution 1.0 to 5.0.”

This guide orients you to XHTML (or eXtensible Hyper Text Markup Language), which is at the time of this writing the most strictly defined content model for hypertexts, requiring XML syntax using the hierarchical, nested elements and the start and end tags that you are familiar with from writing XML. XHTML 1.0 Strict is the form of HTML used by the W3C on their site pages, and it has served as the long-term recommendation for precise code designed to be interoperable with other XML data formats, such as SVG (Scalable Vector Graphics, which you’ll later be learning to draw and code). Interoperability is a term referring to a technology’s capacity to communicate effectively with a different kind of technology. HTML and the World Wide Web were first developed in the early 1990s in an attempt to make various information retrieval structures speak to each other in an interoperable way, and XHTML, due to its strict syntax control, effectively maximizes the interoperability of HTML, which is especially important for those of us developing XML-based projects with a public face on the World Wide Web.

Basic Requirements of XHTML

Valid XHTML in Strict 1.0 syntax requires the following structure, beginning with an XML declaration and a <!DOCTYPE> declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>A Title Here</title>
    </head>
    <body><p>Page content goes here.</p></body>
</html>

The <!DOCTYPE> declaration tells the web browser what version of XHTML (or HTML) you’re working with. In the example, we’ve indicated the doc type for XHTML 1.0 Strict, though there are others. <oXygen/> inserts this declaration for us when we choose it from the “Framework Templates” list: To open a new XHTML 1.0 Strict file, look in <oXygen/> under File → New → Framework Templates → (scroll to XHTML in the alphabetized list) → XHTML 1.0 Strict.

You’ll recognize our root element, which is always <html> and must always indicate an XML namespace as an attribute: the xmlns="http://www.w3.org/1999/xhtml" part. This identifies your root <html> element, and everything inside it, as belonging to the XHTML vocabulary standardized by the W3C. Note: <oXygen/> comes with up-to-date W3C schemas to validate that the code you’re writing is good, strict XHTML, but if you weren’t working in <oXygen/>, or if you’re just curious about whether a site you find on the web is valid, you can always use the W3C Markup Validation Service.

The head element is always required and must contain a title element, but note that this does not display on the browser page (though it often appears in the tabs above the browser window). The part of your web page that displays in the browser is coded within the body element.

You really don’t need many elements to build a website, so we are introducing a simple selection that we find ourselves always using in our pages (including the course pages you’re reading). For more, we recommend the w3schools site as a useful ready reference where you can look up HTML elements, see how they display in browsers, and learn how to style them using Cascading Style Sheets (CSS) code, which we’ll be covering, too, in our Introduction to CSS.

To get a view of how most of the elements we’re discussing fit together on a web page, try selecting “View Page Source” (by right-clicking in your browser window, or locating it in your browser menu options).

Block-Level Body Elements

Block-level elements are the major structural components of an XHTML page, and usually we do not nest these inside each other: They are discrete “blocks” formatted for distinct display on a page. Each of these elements opens on its own line and closes before the next block-level element opens. In XHTML 1.0 Strict, text and in-line elements may not sit directly inside the body element; they must be wrapped in block-level elements, which include headings, paragraphs, lists, and tables.


Heading elements are for title and section headings throughout your page. HTML defines six levels of headings, with the idea that the first level is usually the largest and strongest, and the others get smaller and smaller. The heading tags are: <h1>, <h2>, <h3>, <h4>, <h5>, and <h6>. The idea is to use these in order, so that you typically only use <h1> once to give the title of the whole page, and then use <h2> for major sections, <h3> for subsections, and so on. Have a look at this visual example from w3schools to see how these six elements typically display in a browser.
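
As a rough sketch of heading levels used in order, the fragment below (meant to sit inside the body element) outlines a page with one title, two major sections, and one subsection. The heading text and paragraph content are invented purely for illustration.

```html
<!-- Hypothetical page outline: all heading and paragraph text is made up. -->
<h1>My Digital Edition</h1>
<p>A short introduction to the whole page.</p>

<h2>About the Project</h2>
<p>Text of the first major section.</p>

<h3>Editorial Method</h3>
<p>Text of a subsection that belongs to “About the Project”.</p>

<h2>The Texts</h2>
<p>Text of the second major section.</p>
```

Notice that the headings do not wrap their sections: each heading closes before the paragraphs that follow it, and the level number alone signals the hierarchy.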


Body paragraphs are simply coded with <p> elements. See the w3schools visual example of code and browser display.


Lists are made with two elements, one nested inside the other: A list needs a “wrapper” (or container) element that indicates it’s a list and what kind of list (and that’s the block level part): The wrapper element is either <ol> for an “ordered” (or numbered) list, or <ul> for an “unordered” or bulleted list. (Whether your list is numbered or bulleted depends entirely on the wrapper element.) Inside that single wrapper we have multiple “items” coded with <li> elements (for “list item”). Here’s a sample of coding for an unordered list, followed by an example of its visual display in the browser:
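
The sample can be sketched as follows, assuming the same three fruit items that appear in the ordered list further down this page (the item text is taken from there; the indentation is just a readability choice):

```html
<!-- An unordered (bulleted) list: one ul wrapper, several li items inside. -->
<ul>
    <li>apples</li>
    <li>oranges</li>
    <li>bananas</li>
</ul>
```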


Here’s the browser display of the coding above:

  • apples
  • oranges
  • bananas

If I change my wrapper element from <ul> to <ol>, I generate an ordered (numbered) list:

  1. apples
  2. oranges
  3. bananas
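
A sketch of markup that would produce a numbered list like this one. Only the wrapper element differs from an unordered list; the li items stay exactly the same, and the browser supplies the numbers:

```html
<!-- An ordered (numbered) list: the numbers are generated by the browser,
     so they are not typed into the li elements. -->
<ol>
    <li>apples</li>
    <li>oranges</li>
    <li>bananas</li>
</ol>
```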

(You’re probably noticing how much extra spacing I have in my unordered list: That’s because I have styled my unordered lists with CSS code to be “padded” with extra space on my course pages. Here’s a visual example from w3schools so you can see a default (unstyled) browser view. Try editing the code on the w3schools page to turn the unordered list into an ordered list!)


Tables are a little more complicated: These are made with three nested elements:

  1. <table>: This is the “wrapper” block-level element on the outside that defines the table. We can place an attribute called "border" on table to outline it and each of its internal cells: <table border="1">
  2. <tr>: Table row elements which define each horizontal row of a table.
  3. <td> or <th>: Inside each table row are individual table cells (called “td” for “table data”). You can designate “heading” cells, which are styled with a little more emphasis, using <th> (or “table header” cell).

This may seem odd, but there isn't an element for wrapping columns in a table. Instead, columns are created by stacking the individual td cells inside their tr rows. If a table row has five cells inside, you have a table with five columns.

Here is a sample of code for an HTML table, outlined with a border, and containing three rows and three columns. In the first row, we've designated the table cells to be th cells for headings, followed by a couple of rows containing ordinary td cells.

<table border="1">
    <tr>
        <th>Row 1, left column (heading cell)</th>
        <th>Row 1, middle column (heading cell)</th>
        <th>Row 1, right column (heading cell)</th>
    </tr>
    <tr>
        <td>Row 2, left column</td>
        <td>Row 2, middle column</td>
        <td>Row 2, right column</td>
    </tr>
    <tr>
        <td>Row 3, left column</td>
        <td>Row 3, middle column</td>
        <td>Row 3, right column</td>
    </tr>
</table>

Here's a visual display of the table coding above:

| Row 1, left column (heading cell) | Row 1, middle column (heading cell) | Row 1, right column (heading cell) |
| Row 2, left column                | Row 2, middle column                | Row 2, right column                |
| Row 3, left column                | Row 3, middle column                | Row 3, right column                |

In-line Body Elements

In-line elements are used inside the block-level elements, to set apart certain passages with emphasis, or to link out to other pages, or display an image or render a multimedia file in the browser. We use the following in-line elements most frequently in our work:
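
The list of in-line elements is not reproduced here. As a hedged sketch, assuming a typical selection that matches the uses just described (em and strong for emphasis, a for links, img for images), in-line elements might be combined inside a paragraph like this. The image file name is hypothetical.

```html
<!-- In-line elements live inside a block-level element (here, a paragraph).
     "w3c-headquarters.jpg" is a made-up file name for illustration. -->
<p>This sentence has <em>emphasis</em> and <strong>strong emphasis</strong>,
   a link out to the <a href="http://www.w3.org/">W3C</a>, and an image:
   <img src="w3c-headquarters.jpg" alt="W3C headquarters in Cambridge, Mass." /></p>
```

Because XHTML follows XML syntax, an element with no content, such as img, is written as a self-closing tag, and XHTML 1.0 Strict requires both its src and alt attributes.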

Generic Block and Inline Body Elements: (div and span)

You should know about two more extremely versatile HTML elements. These are used to block off portions of your document to format in a precise way, such as to create boxes sitting side by side on your page (as we did above to set a text next to an image). These are extremely useful for styling with CSS, as you'll be learning shortly. Here are the generic elements that we frequently use:
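
A minimal sketch of the two generic elements named in the heading above: div marks off a block-level portion of the page, and span marks off an in-line stretch of text. The class values ("sidebar", "keyword") are hypothetical names chosen for illustration; they do nothing on their own until CSS rules refer to them.

```html
<!-- div: a generic block-level container for other block-level elements. -->
<div class="sidebar">
    <h2>A Boxed Section</h2>
    <!-- span: a generic in-line container inside a paragraph. -->
    <p>This paragraph sits inside the box, and
       <span class="keyword">this phrase</span> can be styled on its own.</p>
</div>
```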

Editing and Uploading Your XHTML Pages: Working with Server Space

You can edit XHTML pages and save them, together with their associated image, stylesheet, and other files, in a single directory to be viewed in a web browser. That directory could be on your own personal computer: while you are first developing pages, you might simply save files locally and view them in a web browser while you're drafting, even disconnected from the internet (though of course any content you have linked to at remote locations on the World Wide Web will not load if you are not online). Typically we create web pages to have a public-facing presence on the World Wide Web, or at least to have a community presence on an intranet (shared within a firewall while a site is under development).

Making our pages available for others to access requires uploading them to a web server using File Transfer Protocol (FTP), a standard rule system for exchanging files between computers over a network. Various security measures have been developed to guard web servers from invasive hacking attacks, so many web servers require Secure File Transfer Protocol (SFTP, formally the SSH File Transfer Protocol) and nearly always require registration and authentication with a username and password. SFTP can be accessed from the command line, though more frequently people use one of several freely available SFTP software clients with a GUI (Graphical User Interface) that stores your site connection information and, on connection, shows the files on your computer and the files on the remote server, making it easy to upload and download. (We've posted information on several SFTP clients on our course syllabus.)

When XHTML pages are uploaded to a web server, they are given a specific URL, or Uniform Resource Locator, otherwise known as a website address. It usually begins with http:// followed by a distinct locator for the web server and your files on it, as in http://www.pitt.edu/~ebb8/DHDS/Fall2014-DHSyll.html.

By convention, the first page you place in a particular website folder is designated your index.html page. You don't have to have an index.html, but if you do, the address of your site's main page can be abbreviated to the name of the site directory holding the web files, like this one for my personal Pitt homepage: http://www.pitt.edu/~ebb8/ . By default, when given that address, the web server returns the index.html file I have placed in that space, and if it doesn't find one it typically responds with an error (or, depending on its configuration, a listing of the directory's files). (The site address leads to exactly the same place as http://www.pitt.edu/~ebb8/index.html .)

If you are a student enrolled in our course at Pitt-Greensburg, we have shown you (or are about to show you) how to access our class’s Sandbox server space with our sremote (firewall) service together with SFTP (instructions posted on Courseweb Announcements) so that you can access your personal folder to upload files through an SFTP client and then view those files in a web browser. As with most colleges and universities, students, faculty, and staff across the Pitt system have access to public-facing personal web space, which we encourage you to learn about and set up following Pitt's posted instructions. Enrolled students in our course will use our intranet Sandbox server space for XHTML-related homework exercises and project website development.