On sampling and modeling complex systems
The study of complex systems must often contend with the fact that only a few relevant variables are accessible for modeling and sampling. In addition, empirical data often lie deep in the undersampling regime. We discuss the consequences of this using generic information-theoretic and statistical-mechanics arguments. Our arguments suggest that models can be predictive only when the number of relevant variables is below a critical threshold. Within our framework, the undersampling regime can be distinguished from the regime in which the sample becomes informative about the system. In the undersampling regime, typical frequency-size distributions exhibit power-law behavior. The most probable frequency distribution coincides with Zipf's law, which emerges at the crossover between the undersampling regime and the regime in which the sample contains enough statistics to allow inference about the behavior of the system.