What is New Brown Corpus?


a richly grounded dataset inspired by child language acquisition

  • simultaneous visual, spatiotemporal, and audio data
  • more like child-directed speech than other datasets
  • recorded in VR, which allows semi-realistic motion and interaction
  • 6 kitchen configurations across 2 distinct visual styles
  • relatively small: 18k words recorded across 108 sessions
  • unlabeled: no ground truth labels or segmentation

Available for download on GitHub.

Spatiotemporal data

Spatial parameters are recorded every frame for the head, hands, and each object in the scene.

parameter  type   description
pos        xyz    absolute Cartesian position of object center
rot        xyzw   absolute quaternion rotation of object
vel        xyz    absolute velocity of object center
relPos     xyz    position relative to head
relRot     xyzw   rotation relative to head
relVel     xyz    velocity in the head's frame of reference
bound      xyz    distance from object center to edge of bounding box
inView     bool   whether object is in the participant's field of view
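
As a rough illustration of how the head-relative parameters relate to the absolute ones, the following Python sketch derives a relative position and rotation from absolute values. It rests on several assumptions: quaternions are unit length and stored in (x, y, z, w) order as the table suggests, "relative to head" means expressed in the head's rotated frame (the corpus might instead only subtract the head position), and the `ObjectFrame` class and helper functions are hypothetical names, not part of the dataset's tooling.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w), assumed unit length


@dataclass
class ObjectFrame:
    """Hypothetical container for one object's absolute values in one frame."""
    pos: Vec3
    rot: Quat
    vel: Vec3
    bound: Vec3
    in_view: bool


def quat_conjugate(q: Quat) -> Quat:
    """Inverse of a unit quaternion."""
    x, y, z, w = q
    return (-x, -y, -z, w)


def quat_multiply(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b in (x, y, z, w) order."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q (computes q * v * q^-1)."""
    x, y, z, _ = quat_multiply(quat_multiply(q, (v[0], v[1], v[2], 0.0)),
                               quat_conjugate(q))
    return (x, y, z)


def relative_position(head_pos: Vec3, head_rot: Quat, obj_pos: Vec3) -> Vec3:
    """Offset from head to object, expressed in the head's frame (assumed convention)."""
    offset = tuple(o - h for o, h in zip(obj_pos, head_pos))
    return rotate(quat_conjugate(head_rot), offset)


def relative_rotation(head_rot: Quat, obj_rot: Quat) -> Quat:
    """Object rotation with the head's rotation removed (assumed convention)."""
    return quat_multiply(quat_conjugate(head_rot), obj_rot)


# Made-up example: head turned 90 degrees about the y axis, object 1 unit along world x.
half = 0.5 ** 0.5
head_rot = (0.0, half, 0.0, half)
apple = ObjectFrame(pos=(1.0, 1.6, 0.0), rot=head_rot, vel=(0.0, 0.0, 0.0),
                    bound=(0.05, 0.05, 0.05), in_view=True)
rel_pos = relative_position((0.0, 1.6, 0.0), head_rot, apple.pos)
rel_rot = relative_rotation(head_rot, apple.rot)
```

If the recorded `relPos` values do not match what such a computation gives, the corpus most likely uses a different convention, and the stored values should be treated as authoritative.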

[Figure: example of y-position data (height) when picking up an apple]
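
A height trace of this kind could be scanned for the moment an object leaves its resting surface. The sketch below is one simple way to do that with a threshold over a resting baseline. The function name, the threshold, the number of resting frames, and the sample values are all made up for illustration (including the assumption that heights are in meters), and none of them come from the corpus.

```python
from typing import Optional, Sequence


def first_lift_frame(heights: Sequence[float],
                     rest_frames: int = 5,
                     threshold: float = 0.05) -> Optional[int]:
    """Return the index of the first frame whose height exceeds the resting
    baseline by more than `threshold`, or None if the object never rises."""
    if len(heights) <= rest_frames:
        return None
    # Baseline: mean height over the first few frames, assumed to be at rest.
    baseline = sum(heights[:rest_frames]) / rest_frames
    for i in range(rest_frames, len(heights)):
        if heights[i] - baseline > threshold:
            return i
    return None


# Illustrative values only: an object resting at 0.90, then lifted.
apple_y = [0.90, 0.90, 0.90, 0.90, 0.90, 0.91, 0.93, 1.02, 1.15, 1.20]
lift = first_lift_frame(apple_y)
still = first_lift_frame([0.90] * 10)
```

Because the corpus is unlabeled, a heuristic like this gives only a candidate event boundary, not a ground-truth annotation.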

Image data

Images at each timestep are available for download, or can be viewed as videos below.