We investigated whether there are central capacity limitations on consolidating information in visual short-term memory (VSTM). Subjects performed a visual memory task (deciding whether two displays were the same or different) and a speeded tone pitch discrimination task. When the tasks were performed concurrently, interference was observed in both tasks, suggesting that VSTM consolidation requires central resources. The cost of consolidating information in VSTM (as indexed by tone task performance) did not change as the number of to-be-consolidated items increased, in contrast to previous work that used verbal materials (Jolicœur & Dell'Acqua, 1998). Results from Experiments 2 and 3 suggest that this apparent difference reflected a contribution from verbal coding in previous studies. Experiments 1 and 4 provided evidence that the capacity limitations had a central locus and were not merely due to preparing for a secondary task.