The COCONUT Corpus

The COCONUT Corpus was collected and annotated for the COCONUT project by
the University of Pittsburgh Intelligent Systems Program.

There are seven directories for the COCONUT corpus.

  • The subdirectory raw contains the dialogues as they were collected and includes information about the state of the graphics display. These subdirectories are divided into two collections because the data was collected during two different timeframes about one year apart. The second collection was gathered simply because we needed more dialogues to analyze. The interface used in the second collection differs slightly from the one used in the first. The participants were allowed to manipulate the graphics on their interfaces at any time during the dialogue. The graphical information in the raw dialogues consists of snapshots of the screens just before and after each turn.
  • The subdirectory units contains the same dialogues with the graphical information removed and the turns broken up into utterance units (see the COCONUT-DRI manual for the definition of utterance units we used).
  • The subdirectory annot1 contains a subset of the dialogues from the units subdirectory that were annotated with the COCONUT-DRI coding scheme.
  • The subdirectory annot2 contains a subset of the COCONUT dialogues that have also been annotated with Pam Jordan's coding scheme for NPs and discourse entity relations (see the annotation manual).
  • Subdirectory annot3 contains the dialogues that have been annotated for the solution size. The instructions for annotating the solution size are included in the annot3 subdirectory.
  • The other two directories, inventory and instructions, give additional information needed to interpret the dialogues. This is the information that was given to the players. Each dialogue file in the corpus points to the appropriate inventory files.
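
    The layout described above can be checked with a short script. This is only a minimal sketch: the seven directory names are taken from the list above, but the function name missing_dirs and the idea that all seven sit directly under a single corpus root are assumptions made for illustration, not something the corpus distribution is known to guarantee.

```python
from pathlib import Path

# The seven directory names listed in the description above.
EXPECTED_DIRS = [
    "raw",
    "units",
    "annot1",
    "annot2",
    "annot3",
    "inventory",
    "instructions",
]


def missing_dirs(corpus_root):
    """Return the expected directory names that are absent under corpus_root.

    Assumption: all seven directories sit directly under one root folder.
    """
    root = Path(corpus_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

    Calling missing_dirs with the folder where the corpus was unpacked returns an empty list when every expected directory is present.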

    See the COCONUT project webpage for additional background on the COCONUT corpus and project.

    To access the material described above go here.