<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Sun, 27 Jul 2008 07:20:33 BST</pubDate>


	<title>CiteULike: neteler's cart</title>
	<description>CiteULike: neteler's cart</description>


	<link>http://www.citeulike.org/user/neteler/tag/cart</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/1069581"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/585800"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/666326"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/812826"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/812772"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/783245"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/783089"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/782726"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/626306"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/773826"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/773823"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/773821"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/165116"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/770032"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/484090"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/478811"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/neteler/article/482122"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/neteler/article/1069581">
    <title>Bias in random forest variable importance measures: Illustrations, sources and a solution</title>
    <link>http://www.citeulike.org/user/neteler/article/1069581</link>
    <description>&lt;i&gt;BMC Bioinformatics, Vol. 8 (25 January 2007), pp. 1-25.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. Results: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand. Conclusion: We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. 
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore the suggested method can be applied straightforwardly by scientists in bioinformatics research.</description>
    <dc:title>Bias in random forest variable importance measures: Illustrations, sources and a solution</dc:title>

    <dc:creator>Carolin Strobl</dc:creator>
    <dc:creator>Anne-Laure Boulesteix</dc:creator>
    <dc:creator>Achim Zeileis</dc:creator>
    <dc:creator>Torsten Hothorn</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-8-25</dc:identifier>
    <dc:source>BMC Bioinformatics, Vol. 8 (25 January 2007), pp. 1-25.</dc:source>
    <dc:date>2007-01-26T18:39:59-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>BMC Bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>8</prism:volume>
    <prism:startingPage>1</prism:startingPage>
    <prism:endingPage>25</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>randomforest</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/585800">
    <title>Novel methods improve prediction of species distributions from occurrence data</title>
    <link>http://www.citeulike.org/user/neteler/article/585800</link>
    <description>&lt;i&gt;Ecography, Vol. 29, No. 2. (April 2006), pp. 129-151.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.</description>
    <dc:title>Novel methods improve prediction of species distributions from occurrence data</dc:title>

    <dc:creator>Jane Elith</dc:creator>
    <dc:creator>Catherine Graham</dc:creator>
    <dc:creator>Robert Anderson</dc:creator>
    <dc:creator>Miroslav Dudík</dc:creator>
    <dc:creator>Simon Ferrier</dc:creator>
    <dc:creator>Antoine Guisan</dc:creator>
    <dc:creator>Robert Hijmans</dc:creator>
    <dc:creator>Falk Huettmann</dc:creator>
    <dc:creator>John Leathwick</dc:creator>
    <dc:creator>Anthony Lehmann</dc:creator>
    <dc:creator>Jin Li</dc:creator>
    <dc:creator>Lucia Lohmann</dc:creator>
    <dc:creator>Bette Loiselle</dc:creator>
    <dc:creator>Glenn Manion</dc:creator>
    <dc:creator>Craig Moritz</dc:creator>
    <dc:creator>Miguel Nakamura</dc:creator>
    <dc:creator>Yoshinori Nakazawa</dc:creator>
    <dc:creator>Jacob Overton</dc:creator>
    <dc:creator>Townsend Peterson</dc:creator>
    <dc:creator>Steven Phillips</dc:creator>
    <dc:creator>Karen Richardson</dc:creator>
    <dc:creator>Ricardo Scachetti-Pereira</dc:creator>
    <dc:creator>Robert Schapire</dc:creator>
    <dc:creator>Jorge Soberón</dc:creator>
    <dc:creator>Stephen Williams</dc:creator>
    <dc:creator>Mary Wisz</dc:creator>
    <dc:creator>Niklaus Zimmermann</dc:creator>
    <dc:identifier>doi:10.1111/j.2006.0906-7590.04596.x</dc:identifier>
    <dc:source>Ecography, Vol. 29, No. 2. (April 2006), pp. 129-151.</dc:source>
    <dc:date>2006-04-13T15:45:22-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Ecography</prism:publicationName>
    <prism:issn>0906-7590</prism:issn>
    <prism:volume>29</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>129</prism:startingPage>
    <prism:endingPage>151</prism:endingPage>
    <prism:publisher>Blackwell Publishing</prism:publisher>
    <prism:category>algorithms</prism:category>
    <prism:category>analysis</prism:category>
    <prism:category>biology</prism:category>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>distribution_model</prism:category>
    <prism:category>geospatial</prism:category>
    <prism:category>geostatistics</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>modeling</prism:category>
    <prism:category>prediction-error</prism:category>
    <prism:category>presence-absence-models</prism:category>
    <prism:category>presence-only</prism:category>
    <prism:category>presence-only-models</prism:category>
    <prism:category>r_stats</prism:category>
    <prism:category>vegetation</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/666326">
    <title>Induction of Decision Trees</title>
    <link>http://www.citeulike.org/user/neteler/article/666326</link>
    <description>&lt;i&gt;Mach. Learn., Vol. 1, No. 1. (March 1986), pp. 81-106.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.</description>
    <dc:title>Induction of Decision Trees</dc:title>

    <dc:creator>JR Quinlan</dc:creator>
    <dc:identifier>doi:10.1023/A:1022643204877</dc:identifier>
    <dc:source>Mach. Learn., Vol. 1, No. 1. (March 1986), pp. 81-106.</dc:source>
    <dc:date>2006-05-23T16:15:17-00:00</dc:date>
    <prism:publicationYear>1986</prism:publicationYear>
    <prism:publicationName>Mach. Learn.</prism:publicationName>
    <prism:issn>0885-6125</prism:issn>
    <prism:volume>1</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>81</prism:startingPage>
    <prism:endingPage>106</prism:endingPage>
    <prism:publisher>Kluwer Academic Publishers</prism:publisher>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>machine-learning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/812826">
    <title>Tree-based methods</title>
    <link>http://www.citeulike.org/user/neteler/article/812826</link>
    <description>&lt;i&gt;(August 1999), pp. 89-106.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;... see Fielding1999_machine_learning ...</description>
    <dc:title>Tree-based methods</dc:title>

    <dc:creator>John Bell</dc:creator>
    <dc:source>(August 1999), pp. 89-106.</dc:source>
    <dc:date>2006-08-22T15:44:24-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:startingPage>89</prism:startingPage>
    <prism:endingPage>106</prism:endingPage>
    <prism:publisher>Springer</prism:publisher>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>machine-learning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/812772">
    <title>Machine Learning Methods for Ecological Applications</title>
    <link>http://www.citeulike.org/user/neteler/article/812772</link>
    <description>&lt;i&gt;(31 August 1999), pp. 1-36.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The last 25 years have seen a tremendous growth in the application of statistical and modelling techniques to ecological problems. This expansion has been accelerated by the increasing availability of software, books and computing power. However, the suitability of some of these approaches to data analysis, in a relatively knowledge-poor discipline such as ecology, can be questioned on grounds of appropriateness and robustness. One reason for these concerns is that many ecological problems are at best poorly defined and most lack algorithmic solutions. Machine learning methods offer the potential for a different approach to these difficult problems. One definition of machine learning is that it is concerned with inducing knowledge from data, where the data could be patterns in a game of chess or patterns in the species composition of natural communities. Unfortunately ecologists have little experience of these relatively recent and novel approaches to understanding data. This is a problem that is made more complex because there is no simple taxonomy of machine learning methods and there are relatively few examples in the mainstream ecological literature to encourage exploration. This is the first text aimed at introducing machine learning methods to a readership of professional ecologists. All but one of the chapters have been written by ecologists and biologists who highlight the application of a particular method to a particular class of problem. Examples include the identification of species, optimal mate choice, predicting species distributions and modelling landscape features. A group of experienced machine learning workers, who have become interested in environmental problems, have written a chapter that demonstrates how machine learning methods can be used to discover equations that describe the dynamic behaviour of ecological systems. 
The final chapter reviews `real learning', offering the potential for greater dialogue between the biological and machine learning communities.</description>
    <dc:title>Machine Learning Methods for Ecological Applications</dc:title>

    <dc:creator>Alan Fielding</dc:creator>
    <dc:source>(31 August 1999), pp. 1-36.</dc:source>
    <dc:date>2006-08-22T14:32:38-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:startingPage>1</prism:startingPage>
    <prism:endingPage>36</prism:endingPage>
    <prism:publisher>Springer</prism:publisher>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>machine-learning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/783245">
    <title>Comparison of statistical methods commonly used in predictive modelling</title>
    <link>http://www.citeulike.org/user/neteler/article/783245</link>
    <description>&lt;i&gt;Journal of Vegetation Science, Vol. 15, No. 2. (2004), pp. 285-292.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Logistic Multiple Regression, Principal Component Regression and Classification and Regression Tree Analysis (CART), commonly used in ecological modelling using GIS, are compared with a relatively new statistical technique, Multivariate Adaptive Regression Splines (MARS), to test their accuracy, reliability, implementation within GIS and ease of use. All were applied to the same two data sets, covering a wide range of conditions common in predictive modelling, namely geographical range, scale, nature of the predictors and sampling method. We ran two series of analyses to verify if model validation by an independent data set was required or cross-validation on a learning data set sufficed. Results show that validation by independent data sets is needed. Model accuracy was evaluated using the area under Receiver Operating Characteristics curve (AUC). This measure was used because it summarizes performance across all possible thresholds, and is independent of balance between classes. MARS and Regression Tree Analysis achieved the best prediction success, although the CART model was difficult to use for cartographic purposes due to the high model complexity.</description>
    <dc:title>Comparison of statistical methods commonly used in predictive modelling</dc:title>

    <dc:creator>J Muñoz</dc:creator>
    <dc:creator>Á Felicísimo</dc:creator>
    <dc:identifier>doi:10.1658/1100-9233(2004)015[0285:COSMCU]2.0.CO;2</dc:identifier>
    <dc:source>Journal of Vegetation Science, Vol. 15, No. 2. (2004), pp. 285-292.</dc:source>
    <dc:date>2006-08-02T16:24:50-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Journal of Vegetation Science</prism:publicationName>
    <prism:volume>15</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>285</prism:startingPage>
    <prism:endingPage>292</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>modeling</prism:category>
    <prism:category>roc</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/783089">
    <title>Classification tree methods for analysis of mesoscale distribution of Ixodes ricinus (Acari:Ixodidae) in Trentino, Italian Alps.</title>
    <link>http://www.citeulike.org/user/neteler/article/783089</link>
    <description>&lt;i&gt;Journal of Medical Entomology, Vol. 33, No. 6. (November 1996), pp. 888-893.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Cases of Lyme disease and tick-borne encephalitis were recognized recently in the Province of Trento, Italian Alps. Assessment of areas of potential risk for these tick-borne diseases is carried out by a model based on classification and regression trees (CART), using both discrete and continuous variables. Data on Ixodes ricinus (L.) occurrence resulted from extensive sampling carried out by standard methods in 99 sites over an area of approximately 2,700 km2 in the Province of Trento. A series of environmental parameters were recorded from each site and population densities of roe deer, Capreolus capreolus (L.), were considered. The CART model discriminates 2 variables that appear to have the greatest effect on the mesoscale occurrence of ticks: altitude and geological substratum, with a drastic decrease of tick frequency above an altitude of approximately 1,100 m and on volcanic substrata. The model is effective in identifying the mesoscale areas at greater potential risk, with a relatively low sampling effort.</description>
    <dc:title>Classification tree methods for analysis of mesoscale distribution of Ixodes ricinus (Acari:Ixodidae) in Trentino, Italian Alps.</dc:title>

    <dc:creator>S Merler</dc:creator>
    <dc:creator>C Furlanello</dc:creator>
    <dc:creator>C Chemini</dc:creator>
    <dc:creator>G Nicolini</dc:creator>
    <dc:source>Journal of Medical Entomology, Vol. 33, No. 6. (November 1996), pp. 888-893.</dc:source>
    <dc:date>2006-08-02T15:27:28-00:00</dc:date>
    <prism:publicationYear>1996</prism:publicationYear>
    <prism:publicationName>Journal of Medical Entomology</prism:publicationName>
    <prism:issn>0022-2585</prism:issn>
    <prism:volume>33</prism:volume>
    <prism:number>6</prism:number>
    <prism:startingPage>888</prism:startingPage>
    <prism:endingPage>893</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>disease</prism:category>
    <prism:category>ixodes</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>modeling</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/782726">
    <title>Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction</title>
    <link>http://www.citeulike.org/user/neteler/article/782726</link>
    <description>&lt;i&gt;Ecosystems, Vol. 9, No. 2. (March 2006), pp. 181-199.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models—Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)—for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service’s Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. 
MARS was adequate for predicting current distributions but unacceptable for future climate. We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.</description>
    <dc:title>Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction</dc:title>

    <dc:creator>Anantha Prasad</dc:creator>
    <dc:creator>Louis Iverson</dc:creator>
    <dc:creator>Andy Liaw</dc:creator>
    <dc:identifier>doi:10.1007/s10021-005-0054-1</dc:identifier>
    <dc:source>Ecosystems, Vol. 9, No. 2. (March 2006), pp. 181-199.</dc:source>
    <dc:date>2006-08-02T13:14:50-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Ecosystems</prism:publicationName>
    <prism:volume>9</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>181</prism:startingPage>
    <prism:endingPage>199</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>epidemiology</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>randomforest</prism:category>
    <prism:category>risk</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/626306">
    <title>Modelling distribution and abundance with presence-only data</title>
    <link>http://www.citeulike.org/user/neteler/article/626306</link>
    <description>&lt;i&gt;Journal of Applied Ecology, Vol. 43, No. 3. (June 2006), pp. 405-412.&lt;/i&gt;</description>
    <dc:title>Modelling distribution and abundance with presence-only data</dc:title>

    <dc:creator>Jennie Pearce</dc:creator>
    <dc:creator>Mark Boyce</dc:creator>
    <dc:identifier>doi:10.1111/j.1365-2664.2005.01112.x</dc:identifier>
    <dc:source>Journal of Applied Ecology, Vol. 43, No. 3. (June 2006), pp. 405-412.</dc:source>
    <dc:date>2006-05-13T23:59:05-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Journal of Applied Ecology</prism:publicationName>
    <prism:issn>0021-8901</prism:issn>
    <prism:volume>43</prism:volume>
    <prism:number>3</prism:number>
    <prism:startingPage>405</prism:startingPage>
    <prism:endingPage>412</prism:endingPage>
    <prism:publisher>Blackwell Publishing</prism:publisher>
    <prism:category>biology</prism:category>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>modeling</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/773826">
    <title>Classification and Regression Trees: A Powerful Yet Simple Technique for Ecological Data Analysis</title>
    <link>http://www.citeulike.org/user/neteler/article/773826</link>
    <description>&lt;i&gt;Ecology, Vol. 81, No. 11. (November 2000), pp. 3178-3192.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Classification and regression trees are ideally suited for the analysis of complex ecological data. For such data, we require flexible and robust analytical methods, which can deal with nonlinear relationships, high-order interactions, and missing values. Despite such difficulties, the methods should be simple to understand and give easily interpretable results. Trees explain variation of a single response variable by repeatedly splitting the data into more homogeneous groups, using combinations of explanatory variables that may be categorical and/or numeric. Each group is characterized by a typical value of the response variable, the number of observations in the group, and the values of the explanatory variables that define it. The tree is represented graphically, and this aids exploration and understanding. Trees can be used for interactive exploration and for description and prediction of patterns and processes. Advantages of trees include: (1) the flexibility to handle a broad range of response types, including numeric, categorical, ratings, and survival data; (2) invariance to monotonic transformations of the explanatory variables; (3) ease and robustness of construction; (4) ease of interpretation; and (5) the ability to handle missing values in both response and explanatory variables. Thus, trees complement or represent an alternative to many traditional statistical techniques, including multiple regression, analysis of variance, logistic regression, log-linear models, linear discriminant analysis, and survival models. We use classification and regression trees to analyze survey data from the Australian central Great Barrier Reef, comprising abundances of soft coral taxa (Cnidaria: Octocorallia) and physical and spatial environmental information. 
Regression tree analyses showed that dense aggregations, typically formed by three taxa, were restricted to distinct habitat types, each of which was defined by combinations of 3-4 environmental variables. The habitat definitions were consistent with known experimental findings on the nutrition of these taxa. When used separately, physical and spatial variables were similarly strong predictors of abundances and lost little in comparison with their joint use. The spatial variables are thus effective surrogates for the physical variables in this extensive reef complex, where information on the physical environment is often not available. Finally, we compare the use of regression trees and linear models for the analysis of these data and show how linear models fail to find patterns uncovered by the trees.</description>
    <dc:title>Classification and Regression Trees: A Powerful Yet Simple Technique for Ecological Data Analysis</dc:title>

    <dc:creator>Glenn De'ath</dc:creator>
    <dc:creator>Katharina Fabricius</dc:creator>
    <dc:source>Ecology, Vol. 81, No. 11. (November 2000), pp. 3178-3192.</dc:source>
    <dc:date>2006-07-25T22:48:05-00:00</dc:date>
    <prism:publicationYear>2000</prism:publicationYear>
    <prism:publicationName>Ecology</prism:publicationName>
    <prism:volume>81</prism:volume>
    <prism:number>11</prism:number>
    <prism:startingPage>3178</prism:startingPage>
    <prism:endingPage>3192</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>modeling</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/773823">
    <title>Classification Tree Methods for Analysis of Mesoscale Distribution of Ixodes ricinus (Acari: Ixodidae) in Trentino, Italian Alps</title>
    <link>http://www.citeulike.org/user/neteler/article/773823</link>
    <description>&lt;i&gt;Journal of Medical Entomology, Vol. 33, No. 6. (November 1996), pp. 888-893.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Cases of Lyme disease and tick-borne encephalitis were recently recognized in the Province of Trento, Italian Alps. Assessment of areas of potential risk for these tick-borne diseases is carried out by a model based on CART (Classification and Regression Trees), using both discrete and continuous variables. Data on &lt;em&gt;Ixodes ricinus&lt;/em&gt; (L.) occurrence resulted from samplings carried out by standard methods in 99 sites over an area of 2,700 km2 in the Province of Trento. A series of environmental parameters were recorded from each site and population densities of roe deer, &lt;em&gt;Capreolus capreolus&lt;/em&gt; (L.), were considered. The CART model discriminates two variables which appear to have the greatest effect on the mesoscale occurrence of ticks: altitude and geological substratum, with a drastic decrease of tick frequency above 1,100 m a.s.l. or on volcanic substrata. The model is effective in identifying the mesoscale areas at greater potential risk, with a relatively low sampling effort.</description>
    <dc:title>Classification Tree Methods for Analysis of Mesoscale Distribution of Ixodes ricinus (Acari: Ixodidae) in Trentino, Italian Alps</dc:title>

    <dc:creator>S Merler</dc:creator>
    <dc:creator>C Furlanello</dc:creator>
    <dc:creator>C Chemini</dc:creator>
    <dc:creator>G Nicolini</dc:creator>
    <dc:source>Journal of Medical Entomology, Vol. 33, No. 6. (November 1996), pp. 888-893.</dc:source>
    <dc:date>2006-07-25T22:42:28-00:00</dc:date>
    <prism:publicationYear>1996</prism:publicationYear>
    <prism:publicationName>Journal of Medical Entomology</prism:publicationName>
    <prism:volume>33</prism:volume>
    <prism:number>6</prism:number>
    <prism:startingPage>888</prism:startingPage>
    <prism:endingPage>893</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>habitat</prism:category>
    <prism:category>ixodes</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>ticks</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/773821">
    <title>Classification and Regression Trees</title>
    <link>http://www.citeulike.org/user/neteler/article/773821</link>
    <description>&lt;i&gt;(1984)&lt;/i&gt;</description>
    <dc:title>Classification and Regression Trees</dc:title>

    <dc:creator>L Breiman</dc:creator>
    <dc:creator>J Friedman</dc:creator>
    <dc:creator>R Olshen</dc:creator>
    <dc:creator>C Stone</dc:creator>
    <dc:source>(1984)</dc:source>
    <dc:date>2006-07-25T22:39:26-00:00</dc:date>
    <prism:publicationYear>1984</prism:publicationYear>
    <prism:publisher>Wadsworth and Brooks</prism:publisher>
    <prism:category>cart</prism:category>
    <prism:category>machine-learning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/165116">
    <title>Random Forests</title>
    <link>http://www.citeulike.org/user/neteler/article/165116</link>
    <description>&lt;i&gt;Machine Learning, Vol. 45, No. 1. (2001), pp. 5-32.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of ...</description>
    <dc:title>Random Forests</dc:title>

    <dc:creator>Leo Breiman</dc:creator>
    <dc:identifier>doi:10.1023/A:1010933404324</dc:identifier>
    <dc:source>Machine Learning, Vol. 45, No. 1. (2001), pp. 5-32.</dc:source>
    <dc:date>2005-04-19T18:57:17-00:00</dc:date>
    <prism:publicationYear>2001</prism:publicationYear>
    <prism:publicationName>Machine Learning</prism:publicationName>
    <prism:volume>45</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>5</prism:startingPage>
    <prism:endingPage>32</prism:endingPage>
    <prism:publisher>Kluwer Academic Publishers, Boston</prism:publisher>
    <prism:category>boosting</prism:category>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>ensemble</prism:category>
    <prism:category>machine-learning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/770032">
    <title>GIS, geostatistics, metadata banking, and tree-based models for data analysis and mapping in environmental monitoring and epidemiology</title>
    <link>http://www.citeulike.org/user/neteler/article/770032</link>
    <description>&lt;i&gt;International Journal of Medical Microbiology, Vol. 296, No. Supplement 1. (22 May 2006), pp. 23-36.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Using environmental monitoring as an example, some applications of geographic information systems (GIS), geostatistics, metadata banking, and Classification and Regression Trees (CART) are presented. These tools are recommended for mapping statistically estimated hot spots of vectors and pathogens. GIS were introduced as tools for spatially modelling the real world. The modelling can be done by mapping objects according to the spatial information content of data. Additionally, this can be supported by geostatistical and multivariate statistical modelling. This is demonstrated by the example of modelling marine habitats of benthic communities and of terrestrial ecoregions. Such ecoregionalisations may be used to predict phenomena based on the statistical relation between measurements of a phenomenon of interest, such as the incidence of medically relevant species, and correlated characteristics of the ecoregions. The combination of meteorological data and data on plant phenology can enhance the spatial resolution of the information on climate change. To this end, meteorological and phenological data have to be correlated. To enable this, both data sets, which come from disparate monitoring networks, have to be spatially connected by means of geostatistical estimation. This is demonstrated by the example of transforming site-specific data on plant phenology into surface data. The analysis allows for spatial comparison of the phenology during the two periods 1961-1990 and 1991-2002 covering the whole of Germany. The changes in both plant phenology and air temperature proved to be statistically significant. Thus, they can be combined by a GIS overlay technique to enhance the spatial resolution of the information on climate change and used to predict vector incidences at the regional scale.
The localisation of such risk hot spots can be done by geometrically merging surface data on promoting factors. This is demonstrated by the example of the transfer of heavy metals through soils. The predicted hot spots of heavy metal transfer can be validated empirically by measurement data which can be retrieved through a metadata base linked with a geographic information system. A corresponding strategy for the detection of vector hot spots in medical epidemiology is recommended. Data on incidences and habitats of the Anophelinae in the marsh regions of Lower Saxony (Germany) were used to calculate a habitat model by CART, which together with climate data and data on ecoregions can be further used for the prediction of habitats of medically relevant vector species. In the future, this approach should be supported by an internet-based information system consisting of three components: metadata questionnaire, metadata base, and GIS to link metadata, surface data, and measurement data on incidences and habitats of medically relevant species and related data on climate, phenology, and ecoregional characteristic conditions.</description>
    <dc:title>GIS, geostatistics, metadata banking, and tree-based models for data analysis and mapping in environmental monitoring and epidemiology</dc:title>

    <dc:creator>Winfried Schröder</dc:creator>
    <dc:identifier>doi:10.1016/j.ijmm.2006.02.015</dc:identifier>
    <dc:source>International Journal of Medical Microbiology, Vol. 296, No. Supplement 1. (22 May 2006), pp. 23-36.</dc:source>
    <dc:date>2006-07-23T12:02:55-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>International Journal of Medical Microbiology</prism:publicationName>
    <prism:volume>296</prism:volume>
    <prism:number>Supplement 1</prism:number>
    <prism:startingPage>23</prism:startingPage>
    <prism:endingPage>36</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>geostatistics</prism:category>
    <prism:category>gis</prism:category>
    <prism:category>habitat</prism:category>
    <prism:category>risk</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/484090">
    <title>Flächenhafte Schätzung mit Classification and Regression Trees und robuste Gütebestimmung ökologischer Parameter in einem kleinen Einzugsgebiet</title>
    <link>http://www.citeulike.org/user/neteler/article/484090</link>
    <description>&lt;i&gt;(2002)&lt;/i&gt;</description>
    <dc:title>Flächenhafte Schätzung mit Classification and Regression Trees und robuste Gütebestimmung ökologischer Parameter in einem kleinen Einzugsgebiet</dc:title>

    <dc:creator>MP Schillinger</dc:creator>
    <dc:source>(2002)</dc:source>
    <dc:date>2006-01-28T17:01:38-00:00</dc:date>
    <prism:publicationYear>2002</prism:publicationYear>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>jacknife</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/478811">
    <title>Random Forests</title>
    <link>http://www.citeulike.org/user/neteler/article/478811</link>
    <description>&lt;i&gt;Machine Learning, Vol. 45 (2001), pp. 5-32.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund &amp; R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.</description>
    <dc:title>Random Forests</dc:title>

    <dc:creator>L Breiman</dc:creator>
    <dc:source>Machine Learning, Vol. 45 (2001), pp. 5-32.</dc:source>
    <dc:date>2006-01-24T15:46:04-00:00</dc:date>
    <prism:publicationYear>2001</prism:publicationYear>
    <prism:publicationName>Machine Learning</prism:publicationName>
    <prism:volume>45</prism:volume>
    <prism:startingPage>5</prism:startingPage>
    <prism:endingPage>32</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>classification</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>randomforest</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/neteler/article/482122">
    <title>Habitat suitability modelling for red deer (Cervus elaphus L.) in South-central Slovenia with classification trees</title>
    <link>http://www.citeulike.org/user/neteler/article/482122</link>
    <description>&lt;i&gt;Ecological Modelling, Vol. 138, No. 1-3. (15 March 2001), pp. 321-330.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We study and assess the potential habitats of a population of red deer in South-central Slovenia. Using existing data on the deer population spatial distribution and size, as well as data on the landscape and ecological properties (GIS) of the area inhabited by this population, we develop a habitat suitability model by automated data analysis using machine learning of classification trees. We assume that the recorded observations of deer approximate the actual spatial distribution of the deer population reasonably well. The habitat suitability models for individual animals have the form of classification trees. The induced trees are interpreted by domain experts and a generic model is proposed. The generic habitat suitability models can help determine potential unoccupied habitats for the red deer population and develop guidelines for managing the development of the red deer population and its influence on the environment.</description>
    <dc:title>Habitat suitability modelling for red deer (Cervus elaphus L.) in South-central Slovenia with classification trees</dc:title>

    <dc:creator>Marko Debeljak</dc:creator>
    <dc:creator>Saso Dzeroski</dc:creator>
    <dc:creator>Klemen Jerina</dc:creator>
    <dc:creator>Andrej Kobler</dc:creator>
    <dc:creator>Miha Adamic</dc:creator>
    <dc:identifier>doi:10.1016/S0304-3800(00)00411-7</dc:identifier>
    <dc:source>Ecological Modelling, Vol. 138, No. 1-3. (15 March 2001), pp. 321-330.</dc:source>
    <dc:date>2006-01-26T23:19:27-00:00</dc:date>
    <prism:publicationYear>2001</prism:publicationYear>
    <prism:publicationName>Ecological Modelling</prism:publicationName>
    <prism:volume>138</prism:volume>
    <prism:number>1-3</prism:number>
    <prism:startingPage>321</prism:startingPage>
    <prism:endingPage>330</prism:endingPage>
    <prism:category>cart</prism:category>
    <prism:category>ecology</prism:category>
    <prism:category>habitat</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>red_deer</prism:category>
</item>



</rdf:RDF>

