The COCONUT Corpus
The COCONUT Corpus was collected and annotated for the COCONUT
project at the University of Pittsburgh.
There are seven directories for the COCONUT corpus.
raw contains the dialogues as they were collected and includes information
about the state of the graphics display. These subdirectories are divided into two collections because the data
was collected during two different timeframes about 1 year apart. The second
collection was made simply because we needed more dialogues to analyze. The
interface used in the second collection differs slightly from the first
one. The participants were allowed to manipulate the graphics on their
interfaces at any time during the dialogue. The graphical information
in the raw dialogues records snapshots of the screens just before and
Subdirectory units contains the
same dialogues with the graphical information removed and turns broken up
into utterance units (see the COCONUT-DRI
manual for the definition of utterance units we used).
annot1 contains a subset of the dialogues from the units subdirectory
that were annotated with the COCONUT-DRI coding scheme.
The subdirectory annot2 contains a subset of the COCONUT dialogues that have also been annotated with Pam
Jordan's coding scheme for NPs and discourse entity relations (see
Subdirectory annot3 contains
dialogues that have been annotated for the solution size. The guidelines
for annotating the solution size are included in the annot3 directory.
The other two directories, inventory and instructions, give
information needed to interpret the dialogues. This is the information
that was given to the players. Each dialogue file in the corpus points
to the appropriate inventory files.
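
As a quick sanity check after obtaining the corpus, the directory layout described above can be verified with a short script. This is only a minimal sketch, assuming the seven directories sit directly under a single corpus root. The root path "coconut" and the function name missing_directories are hypothetical and are not part of the corpus distribution.

```python
from pathlib import Path

# Directory names as listed in this description of the corpus.
# "units" follows the name used where that directory is introduced.
EXPECTED_DIRECTORIES = (
    "raw",
    "units",
    "annot1",
    "annot2",
    "annot3",
    "inventory",
    "instructions",
)


def missing_directories(corpus_root):
    """Return the expected directory names that are absent under corpus_root."""
    root = Path(corpus_root)
    return [name for name in EXPECTED_DIRECTORIES if not (root / name).is_dir()]


if __name__ == "__main__":
    # "coconut" is a hypothetical location for the unpacked corpus.
    missing = missing_directories("coconut")
    print("missing:", ", ".join(missing) if missing else "none")
```

The check looks only at the top-level directories. It makes no assumption about file names or formats inside them, since those are not described here.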
See the COCONUT
project webpage for additional background on the COCONUT corpus and
To access the material described above go here.