In our NIPS 2008 paper, "A Bayesian framework for cross-situational word learning," we annotated two files from the CHILDES corpus (following Yu & Ballard, 2007) to create a corpus for use with our model.
Source files of objects and words, aligned by line number (line i of objects.txt lists the objects for the utterance on line i of words.txt)
objects.txt
words.txt
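Because the two files are aligned only by line number, reading them amounts to zipping their lines together. The Python below is a minimal sketch of that idea, not code from the paper. It assumes UTF-8 (or plain ASCII) text and whitespace-separated tokens on each line. The function names pair_lines and load_aligned_corpus are hypothetical.

```python
from typing import List, Sequence, Tuple

# One utterance: (objects present, words spoken). Hypothetical type alias.
Utterance = Tuple[List[str], List[str]]


def pair_lines(object_lines: Sequence[str], word_lines: Sequence[str]) -> List[Utterance]:
    """Pair line i of the objects file with line i of the words file.

    Assumes tokens on each line are separated by whitespace; a blank line
    becomes an empty list rather than being dropped, so alignment is kept.
    """
    if len(object_lines) != len(word_lines):
        raise ValueError(
            f"line counts differ: {len(object_lines)} object lines "
            f"vs {len(word_lines)} word lines"
        )
    return [(objs.split(), words.split()) for objs, words in zip(object_lines, word_lines)]


def load_aligned_corpus(objects_path: str = "objects.txt",
                        words_path: str = "words.txt") -> List[Utterance]:
    """Read both files (assumed UTF-8) and return one (objects, words) pair per line."""
    with open(objects_path, encoding="utf-8") as f_obj, \
         open(words_path, encoding="utf-8") as f_words:
        return pair_lines(f_obj.read().splitlines(), f_words.read().splitlines())
```

The length check fails loudly rather than silently truncating. A single missing line would shift every later utterance against the wrong set of objects.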
Pseudo-CHAT annotated version of the files
The %ref tier lists the objects that are present at the time of the utterance. The %soc tier uses seven tags, each paired with the %ref object it refers to: ih (infant's hands), im (infant's mouth), ie (infant's eyes), it (infant touching otherwise), ch (caregiver's hands), ce (caregiver's eyes), and ct (caregiver touching otherwise).
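The seven %soc tags map naturally onto a lookup table. The sketch below encodes that table and shows one way to check that every tagged object also appears on the %ref tier. The token layout it parses ("tag:object", separated by whitespace) is an ASSUMPTION made for illustration and may not match the actual me03.txt and di06.txt files. The name parse_soc_tier is hypothetical.

```python
from typing import Collection, List, Tuple

# The seven %soc tags, as described above.
SOC_TAGS = {
    "ih": "infant's hands",
    "im": "infant's mouth",
    "ie": "infant's eyes",
    "it": "infant touching otherwise",
    "ch": "caregiver's hands",
    "ce": "caregiver's eyes",
    "ct": "caregiver touching otherwise",
}


def parse_soc_tier(tier: str, ref_objects: Collection[str]) -> List[Tuple[str, str, str]]:
    """Parse a %soc tier into (tag, tag description, object) triples.

    ASSUMED format: whitespace-separated "tag:object" tokens, e.g. "ch:bear ie:bear".
    Raises ValueError for unknown tags or objects missing from the %ref tier.
    """
    cues = []
    for token in tier.split():
        tag, sep, obj = token.partition(":")
        if not sep or tag not in SOC_TAGS:
            raise ValueError(f"unrecognized social-cue token: {token!r}")
        if obj not in ref_objects:
            raise ValueError(f"{obj!r} is not listed on the %ref tier")
        cues.append((tag, SOC_TAGS[tag], obj))
    return cues
```

For example, under the assumed format, `parse_soc_tier("ch:bear ie:bear", {"bear", "ring"})` yields one caregiver's-hands cue and one infant's-eyes cue, both on the bear.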
me03.txt
di06.txt
Gold standard (the human-created lexicon used in our NIPS paper)
baby = baby
bear = bear
bigbird = bird
bigbirds = bird
bird = bird
book = book
books = book
bunny = bunny
bunnyrabbit = bunny
cow = cow
cows = cow
moocow = cow
moocows = cow
duck = duck
duckie = duck
eyes = eyes
hand = hand
hat = hat
kitty = kitty
kittycat = kitty
kittycats = kitty
lamb = lamb
lambie = lamb
mirror = mirror
pig = pig
piggie = pig
piggies = pig
rattle = rattle
ring = ring
rings = ring
sheep = sheep
oink = pig
bunnies = bunny
meow = kitty
birdie = duck
bird = duck
hiphop = bunny
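Each gold-standard line has the form "word = object". Some words map to more than one object (for instance, "bird" appears both as "bird" and as "duck"), so the lexicon is best stored as a set of (word, object) pairs rather than a dictionary. The Python below sketches that representation. It also shows one common way to score a learned lexicon against the gold standard, using precision, recall, and F-score over pairs. We offer this scoring only as an illustration and do not claim it is exactly the procedure used in the paper. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (word, object)


def parse_lexicon(lines: Iterable[str]) -> Set[Pair]:
    """Parse 'word = object' lines into a set of (word, object) pairs.

    Lines without '=' (blank lines, headers) are skipped. A set is used because
    the gold standard is many-to-many: one word may name several objects.
    """
    pairs = set()
    for line in lines:
        if "=" not in line:
            continue
        word, _, obj = line.partition("=")
        pairs.add((word.strip(), obj.strip()))
    return pairs


def score_lexicon(learned: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of learned pairs against gold pairs."""
    hits = len(learned & gold)
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(gold) if gold else 0.0
    # When hits > 0, precision + recall > 0, so the division is safe.
    f_score = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_score
```

As a small worked case, take a gold set of four pairs (bird = bird, bird = duck, cow = cow, moocow = cow) and a learned set {(bird, bird), (cow, cow), (the, ball)}. Two of the three learned pairs are correct, so precision is 2/3. Two of the four gold pairs are found, so recall is 1/2. The F-score is 2(2/3)(1/2)/(2/3 + 1/2) = 4/7.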