There are four directories for the COCONUT corpus. The subdirectory raw contains the dialogues as they were collected, including information about the state of the graphics display. The subdirectory units contains the same dialogues with the graphical information removed and the turns broken up into utterance units (see the COCONUT-DRI manual for the definition of utterance units we used). The subdirectory annot1 contains a subset of the dialogues from the units subdirectory that were annotated with the COCONUT-DRI coding scheme, and annot3 contains the dialogues that were annotated for solution size. The instructions for annotating solution size are included in the annot3 subdirectory.
The corpus directories are divided into 2 collections because the data was collected during two different timeframes about 1 year apart. The second collection was made simply because we needed more dialogues to analyze. The interface used in the second collection differs slightly from the one used in the first. The participants were allowed to manipulate the graphics on their interfaces at any time during the dialogue. The graphical information in the raw dialogues consists of snapshots of the screens taken just before and after each turn.
The other two directories, inventory and instructions, give additional information needed to interpret the dialogues. This is the information that was given to the players. Each dialogue file in the corpus points to the appropriate inventory files.
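As a rough illustration of the layout described above, the following Python sketch checks which of the named directories are present under a local copy of the corpus. The directory names come from the description; that they all sit side by side under a single root, and the helper names check_layout and missing_dirs, are assumptions made only for this example. The split into the 2 collections is not modeled.

```python
from pathlib import Path

# Directory names taken from the description above. That they all sit
# directly under one root directory is an assumption of this sketch.
EXPECTED_DIRS = ("raw", "units", "annot1", "annot3", "inventory", "instructions")


def check_layout(root):
    """Map each expected directory name to True if it exists under root."""
    root = Path(root)
    return {name: (root / name).is_dir() for name in EXPECTED_DIRS}


def missing_dirs(root):
    """List the expected directory names that are not present under root."""
    return [name for name, present in check_layout(root).items() if not present]
```

Calling missing_dirs on the root of a local copy returns an empty list when all six directories are found, which can serve as a quick sanity check after downloading the material.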
A subset of the COCONUT dialogues has also been annotated with Pam Jordan's coding scheme for NPs and discourse entity relations (see the annotation manual). We have not yet made this data publicly available because it is still being analyzed. If you would like access to it before it is made public, please contact pjordan@pitt.edu and describe your plans for using the data. She will consider giving you access if it does not conflict with her current work.
See the COCONUT project webpage for additional background on the COCONUT corpus and project.
To access the material described above, go here.
12/20/2000