We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files, by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch data processing environments such as Map-Reduce systems, some of which handle tens of thousands of processing requests daily, over a shared set of files.