Typical signal-based approaches to extract musical descriptions from audio only have limited precision. A possible explanation is that they do not exploit context, which provides important cues in human cognitive processing of music: e.g. electric guitar is unlikely in 1930s music, children choirs rarely perform heavy metal, etc. We propose an architecture to train a large set of binary classifiers simultaneously, formany differentmusicalmetadata (genre, instrument, mood, etc.), in such a way that correlation between metadata is used to reinforce each individual classifier. The system is iterative: it uses classification decisions it made on some classification problems as new features for new, harder problems; and hybrid: it uses a signal classifier based on timbre similarity to bootstrap symbolic inference with decision trees. While further work is needed, the approach seems to outperform signal-only algorithms by 5% precision on average, and sometimes up to 15% for traditionally difficult problems such as cultural and subjective categories.