The COCONUT Corpus
The COCONUT Corpus was collected and annotated for the COCONUT project
by the University of Pittsburgh Intelligent Systems Program.
There are seven directories for the COCONUT corpus.
The subdirectory raw contains the dialogues as they were collected and
includes information about the state of the graphics display. These
subdirectories are divided into two collections because the data was
collected during two different timeframes about one year apart. The
second collection was made simply because we needed more dialogues to
analyze. The interface used in the second collection differs slightly
from the first one. The participants were allowed to manipulate the
graphics on their interfaces at any time during the dialogue. The
graphical information in the raw dialogues consists of snapshots of the
screens just before and after each turn.
Subdirectory units contains the same dialogues with the graphic
information removed and turns broken up into utterance units (see the
COCONUT-DRI manual for the definition of utterance units we used).
The subdirectory annot1 contains a subset of the dialogues from the
units subdirectory that were annotated with the COCONUT-DRI coding
scheme.
The subdirectory annot2 contains a subset of the COCONUT dialogues that
have also been annotated with Pam Jordan's coding scheme for NPs and
discourse entity relations (see the annotation manual).
Subdirectory annot3 contains the dialogues that have been annotated for
the solution size. The instructions for annotating the solution size
are included in the annot3 subdirectory.
The other two directories, inventory and instructions, give additional
information needed to interpret the dialogues. This is the information
that was given to the players. Each dialogue file in the corpus points
to the appropriate inventory files.
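As a quick sanity check after unpacking the corpus, the seven directories
described above can be verified with a short script. This is only a
sketch: it assumes the directories sit directly under a single root and
are named on disk exactly as they are in this description, which the
distribution itself may not guarantee. The function name missing_dirs is
hypothetical.

```python
from pathlib import Path

# Directory names taken from the description above. That they appear on
# disk with exactly these names, directly under one root directory, is
# an assumption and should be checked against the actual distribution.
EXPECTED_DIRS = (
    "raw",
    "units",
    "annot1",
    "annot2",
    "annot3",
    "inventory",
    "instructions",
)


def missing_dirs(corpus_root):
    """Return the expected directory names that are not found under corpus_root."""
    root = Path(corpus_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling missing_dirs with the path to the unpacked corpus returns an
empty list when all seven directories are present, and otherwise the
names that could not be found.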
See the COCONUT project webpage for additional background on the
COCONUT corpus and project.
To access the material described above go here.
1/06/2011