Maintained by: Elisa E. Beshero-Bondar (ebb8 at pitt.edu) Creative Commons License

What's an XML database and how does it work?

XQuery is one of the XML family of languages, and it builds on what you've learned of XPath. We use XQuery to work with databases of stored and indexed XML files. XML databases work by storing XML files and building "persistent indexes" for them. This indexing makes it quick and efficient to search for elements, attribute values, and the other things we've been finding with XPath expressions, across whole collections of files: the database consults the index it has built for each file, so the computer doesn't have to read through the entire file. In effect, the index stores the tree structure of the XML in the database and makes it available for quick retrieval through XQuery.

How to Access our eXist XML database:

For our class and the Pacific project, we're working with a particular XML database called eXist. It's installed on our private sandbox server, which is accessible only to members of our class group through a secure login. To work on homework exercises and on project development, here's how to access eXist:

eXist is available right now in two "places", in the same way you access our Digital Sandbox for developing your project websites:

  1. In our campus computer labs, without an sremote login: http://x1.digital.greensburg.pitt.edu:8080/exist/apps/dashboard/index.html
  2. On your OWN computers with a wireless or wired connection. (Please test this ASAP following the instructions below to make sure it works for you.)

To begin, you need to use the secure login, "Network Connect"--our private passageway to eXist. If you're enrolled in this class, you've been granted access. Here's what to do:

  1. Open Internet Explorer or Firefox, and go to http://sremote.pitt.edu . (Network Connect won't be fully functional in other browsers.) If you can't get in through Explorer, try the Firefox browser.
  2. With "Network Connect" selected, log in with your Pitt userid and password (the same that you use for e-mail and for your Pitt web space)
  3. After you log in, you'll see a link to "Firewall-GBG-DH-DIGITAL-SANDBOX-NetworkConnect." Click on this to bring up a new screen with the words "Network Connect" again--this time with a "Start" button. Click on the Start Button, and wait until Network Connect seems to have run.
  4. Once you've done this, you may open ANY web browser (or a new tab in this browser) and go to this link to our eXist XML database:
    http://x1.digital.greensburg.pitt.edu:8080/exist/apps/dashboard/index.html
  5. Here, you may log in locally with:
    • id: your Pitt userid
    • password: dhclass

    (After you log in for the first time, change the password to whatever you want.)
  6. Open eXide to work on XQuery.

 

How the database is organized:

eXist is set up like a file directory with a hierarchical order, a little like a massive XML file with subfiles! So there's a single root directory called "db", with subfolders containing folders (or collections), which may in turn contain their own subfolders (more collections), and finally files. I've installed a copy of our Georg Forster XML file here, in a collection called "voyages," inside a directory called "pacific", so its address is /db/pacific/voyages/ForsterGeorgComplete.xml .

As we work on project development, you may find that you want to upload your own collection of XML files into eXist, and we'll walk you through how to do that. This is different from uploading files to publish on your web space, which makes them publicly viewable but doesn't build index files or let you collect, extract, and remix your coding using XQuery.

XQuery for a Single Document vs. a Collection:

XQuery uses XPath expressions to find its way through its index of files. It can work on one file, using the doc() function, or on a whole collection, using the collection() function.

Actually, both doc() and collection() are XPath functions, and once you've designated the document or collection, you add more XPath to it: you can write path expressions, use predicates and functions, and walk up and down the axes. Querying a whole collection this way works if the files in it are coded (at least structurally) in the same or similar ways.
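Here is a minimal sketch of the difference between the two functions. It assumes you are logged in to our eXist database and that the Forster file and the "voyages" collection sit at the addresses given earlier on this page; nothing is assumed about the elements inside the files, so the query only asks for root element names and file addresses.

```xquery
(: doc() points at exactly one file :)
let $forster := doc('/db/pacific/voyages/ForsterGeorgComplete.xml')
(: collection() points at every file stored in a collection :)
let $voyages := collection('/db/pacific/voyages')
return (
  (: the name of the root element of the single file :)
  name($forster/*),
  (: how many documents the collection holds :)
  count($voyages),
  (: the address of each document in the collection :)
  for $d in $voyages
  return base-uri($d)
)
```

Once either function has given you a starting point, anything you add after it (such as /* above) is ordinary XPath.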

The TEI and XQuery: Declaring the TEI Namespace

Speaking of coding in the same or similar ways, we need to introduce you to the Text Encoding Initiative, or TEI. This is a language of XML with designated rules and tag sets for coding digital versions of literary, linguistic, historical, and cultural texts, and it represents an international standard for coding work consistently for long-term, sustainable archives. TEI is also a community (of which I'm a member), and people serve on its ruling Council to make judgment calls on best practices and coding guidelines. We use TEI to build digital archives that can "talk to" each other around the world, and follow recognizable, standard patterns. We *could* make up our own XML tag sets, but when coding cultural resources, it's a Good Idea to make your work accessible, so it's easy for others to access and, say, load into databases to run XQuery for analyzing it, or studying it, or connecting it with other comparable texts in other archives! We'll talk more about TEI structure and coding, and give you some experience with it. (To read more, here's the TEI's home site.) For now, you can quickly tell if one of our files is coded in TEI from its root element: <TEI> .

XQuery needs something called a namespace declaration when we query TEI files. Every element in a TEI file belongs to the TEI namespace, which marks it as following the TEI's rules for tags and their organization. Unless we declare that namespace, our path expressions look for elements that belong to no namespace at all, and they come back empty. Similarly, html has a namespace of its own to say that certain rules govern the relationship of its tags, their organization, etc. When we query our TEI files, we'll need to include the following namespace declaration as the first statement of our XQuery: declare default element namespace "http://www.tei-c.org/ns/1.0";

Here are examples of some XQuery expressions on collections of TEI files:

declare default element namespace "http://www.tei-c.org/ns/1.0";
collection('/db/pacific/literary')//titleStmt/title

declare default element namespace "http://www.tei-c.org/ns/1.0";
collection('/db/pacific/literary')/distinct-values(.//body//persName)
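To see why the declaration matters, here is a small self-contained sketch that does not need the database. The tiny TEI fragment and its title are made up for illustration. The sketch binds the TEI namespace to a prefix (tei:) instead of declaring it as the default, only so that both kinds of path can sit side by side in one query.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: a made-up TEI fragment; xmlns puts every element in the TEI namespace :)
let $sample :=
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
      <fileDesc>
        <titleStmt><title>A made-up title</title></titleStmt>
      </fileDesc>
    </teiHeader>
  </TEI>
return (
  (: no namespace on the name: finds nothing, so this counts 0 :)
  count($sample//title),
  (: TEI namespace on the name: finds the title, so this counts 1 :)
  count($sample//tei:title)
)
```

Declaring the TEI namespace as the default element namespace, as in the examples above, makes plain names like title behave the way tei:title does here.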

"FLWOR" Expressions in XQuery

"Flower" or FLWOR expressions are a powerful tool in XQuery, letting us work in more complex ways with querying and remixing information in files and collections--sometimes both in the same expression! Here's a primer on FLWOR (or really, LFWOR!):

"Let": establishes variables, each of which may hold a single value or a sequence of multiple values (single or multiple)

"For": establishes a range variable that moves step by step, one value at a time, through a sequence of values, often one defined by a Let statement. (single ONLY)

"Where" (optional): filtering; analogous to predicates
"Order by" (optional): alphabetize, etc. Always appears after "Where"

"Return": generates output

A really, really simple little FLWOR:

let $hamlet := doc('/db/shakespeare/plays/hamlet.xml')
return $hamlet
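Here is a small sketch that uses all five parts at once. It runs on a made-up list of words rather than on a file, so you can try it in eXide without worrying about what is in the database.

```xquery
(: Let: $words holds a whole sequence of values :)
let $words := ('pear', 'fig', 'banana', 'kiwi')
(: For: $w holds one word at a time :)
for $w in $words
(: Where: keep only the words longer than three letters :)
where string-length($w) gt 3
(: Order by: alphabetical by default :)
order by $w
(: Return: runs once for each word that survives the filter :)
return upper-case($w)
(: returns BANANA, KIWI, PEAR :)
```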

Examples of two related FLWOR Expressions, to demonstrate the use of "For" and "Where"

1. (no "For" statement here):
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $cook := doc('/db/pacific/voyages/cookVoy2Pnum.xml')
let $p := $cook//p[geo]
let $geo := $cook//p/geo
let $countlat := count ($geo[@select="lat"])
let $countlon := count ($geo[@select="lon"])
where $countlat gt $countlon
return $p

2. Using a "For" statement:

declare default element namespace "http://www.tei-c.org/ns/1.0";
let $cook := doc('/db/pacific/voyages/cookVoy2Pnum.xml')
let $P := $cook//p[geo]
let $geo := $cook//p/geo
let $countlat := count ($geo[@select="lat"])
let $countlon := count ($geo[@select="lon"])
for $p in $P

(: to show the use of a range variable with for, try commenting out the for line above, and changing the return expression to give $P/@n :)
where $countlat gt $countlon
return string-join(('paragraph',$p/@n),':')


(: Note use of the string-join function, with its separator. Also notice which parts of it take the single-quotes ' ', and which parts don't! The single quotes, ' ' , allow you to indicate you want some literal text to be returned here. Without them, the computer thinks you're referring to an XPath expression. :)
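The difference between a "Let" variable and a "For" range variable may be easier to see on a made-up sequence of numbers. This is only a sketch, separate from the Cook examples above.

```xquery
(: let binds the whole sequence to $nums in one go :)
let $nums := (1, 2, 3)
return (
  (: $nums is all three values at once, so the count is 3 :)
  count($nums),
  (: $n is one value at a time, so this return runs three times :)
  for $n in $nums
  return $n * 10
)
(: returns 3, 10, 20, 30 :)
```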

The "O" in the FLWOR: Order:
The "Order by" statement in the FLWOR is optional, but when you use it, it must come after the Where statement (if there is one) and immediately before the "Return." By default, results are sorted in ascending order, which for strings means alphabetical order, so, for example:

order by $a

organizes results in alphabetical order, sorted by whatever is indicated in the variable $a.
There are more complex ways to set up an order to organize results. You can order by descending (to get reverse alphabetical order): order by $a descending
Or you can order a set of results according to their numerical position or count, in ascending or descending order.
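As a sketch of a more complex ordering, the query below sorts a made-up list of names by their length, longest first, and then alphabetically among names of the same length. The two sort keys are separated by a comma.

```xquery
let $names := ('Hamlet', 'Ophelia', 'Horatio', 'Laertes', 'Gertrude')
for $n in $names
(: first key: length, longest first; second key: alphabetical :)
order by string-length($n) descending, $n
return concat($n, ' (', string-length($n), ')')
(: returns Gertrude (8), Horatio (7), Laertes (7), Ophelia (7), Hamlet (6) :)
```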

Building New HTML or XML with XQuery: Using Curly Braces: { }

To add HTML or XML markup to the XQuery output, add the elements where needed to produce conformant code. However, these elements are passive: XQuery copies them into the output as they are, without executing anything inside them. So we use curly-braces { } to enclose any XPath or XQuery statements that we want to execute, to separate them from the HTML or XML markup elements. Inside html elements, when we need to do some calculation or refer to a variable we defined in XQuery, we use the curly-braces again. We'll work on some examples in class. Here's one example that may be helpful as a reference point, showing how to make an html page with a table of two columns, listing two related sets of results side by side. The resulting html file is coded to display a table of the distinct characters (<SPEAKER>) in Hamlet from our Shakespeare collection, next to a count of their speeches (<SPEECH>) in the play. Note the position of the curly-braces in the example:

<html>
<head><title>Title</title></head>
<body>
<table>
{
let $hamlet := doc('/db/shakespeare/plays/hamlet.xml')
let $speeches := $hamlet//SPEECH
let $speakers := $hamlet//SPEAKER

let $distinctsp := distinct-values($speakers)
for $sp in $distinctsp
let $count := count($speeches[SPEAKER = $sp])
order by $count descending
return

<tr>
<td>{$sp}</td>
<td>{$count}</td>
</tr>
}
</table>
</body>
</html>
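To see what the curly-braces change, here is a small sketch with a made-up variable and no database at all. The same variable name appears three times, and only the occurrences inside curly-braces are executed.

```xquery
let $sp := 'HAMLET'
return
  <ul>
    <li>$sp</li>
    <li>{$sp}</li>
    <li>{string-length($sp)} letters</li>
  </ul>
```

The first list item comes out as the literal text $sp, the second as HAMLET, and the third as 6 letters.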

Working with Strings:

Example using XPath functions:
USEFUL LIST OF FUNCTIONS: see part III on Strings here:
http://dh.obdurodon.org/functions.html

MODEL: Generating a list of characters in Shakespeare plays in alphabetical order:
http://dh.obdurodon.org/shakespeare-characters.html

let $hamlet := doc('/db/shakespeare/plays/hamlet.xml')
let $speakers := distinct-values($hamlet//SPEAKER)
for $speaker in $speakers
let $speakerLength := string-length($speaker)
where ends-with($speaker,'O')
(:order by string-length($speaker):) (:commenting out! :)
order by $speakerLength

(:return $speaker:) (:commenting out! :)
return concat ($speaker, ' has ', $speakerLength , ' characters')
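Here is a quick sketch of a few of the string functions in action on a made-up value, so you can check what each one returns before using it inside a FLWOR. All of them are standard XPath functions.

```xquery
let $name := 'HORATIO'
return (
  string-length($name),                      (: 7 :)
  ends-with($name, 'O'),                     (: true :)
  contains($name, 'RAT'),                    (: true :)
  substring($name, 1, 3),                    (: HOR :)
  lower-case($name),                         (: horatio :)
  string-join(('HORATIO', 'OSRIC'), ', ')    (: HORATIO, OSRIC :)
)
```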

Example inserting HTML formatting into results: NOTE use of curly braces { } :

<html>
<head><title>Title</title></head>
<body> {
let $hamlet := doc('/db/shakespeare/plays/hamlet.xml')
let $speakers := distinct-values($hamlet//SPEAKER)
for $speaker at $pos in $speakers (: sets position value $pos :)
let $speakerLength := string-length($speaker)
where ends-with($speaker,'O')
order by $speakerLength
return <p>{concat ($speaker, '#', $pos, ' has ', $speakerLength , ' characters')}</p>
}

</body>
</html>
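One thing to notice about "at $pos": the position is the item's place in the original sequence, and it is set before "order by" rearranges the results. A small sketch on a made-up list shows this.

```xquery
let $names := ('POLONIUS', 'HAMLET', 'CLAUDIUS')
(: $pos records where each name sits in $names: 1, 2, 3 :)
for $name at $pos in $names
(: sorting changes the output order, but not the recorded positions :)
order by $name
return <p>{concat($name, ' was item #', $pos)}</p>
```

The results come out in alphabetical order (CLAUDIUS, HAMLET, POLONIUS), but each keeps its original position number: 3, 2, and 1.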

Links to Some Excellent XQuery Resources: