| |
Abstract
We present a method, fastIBD, for finding tracts of identity by descent (IBD) between pairs of individuals. FastIBD can be applied to thousands of samples across genome-wide SNP data and is significantly more powerful for finding short tracts of IBD than existing methods for finding IBD tracts in such data. We show that fastIBD can detect facets of population structure that are not revealed by ...
|
| |
In Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics (June 2006)
Abstract
We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered ...
Note (first note only)
- i=1...n tests
- d_i = 1 means "reject i-th test"
- D = sum_i d_i is the number of rejections (i.e. the number of discoveries)
- r_i = 0 means "truth of i-th test is H0"
- FDR = (sum_i (1-r_i) d_i) / D
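A minimal sketch of the quantities in this note, using the notation d_i, r_i and D above. It pairs the standard Benjamini-Hochberg step-up rule (the baseline the abstract refers to, not the paper's posterior-based variant) with the realized FDR. Setting the FDR to 0 when D = 0 is an assumed convention, and the p-values and truth vector are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha):
    """Return d with d[i] = True when test i is rejected (Benjamini-Hochberg step-up)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with i * alpha / n.
    passes = p[order] <= alpha * np.arange(1, n + 1) / n
    d = np.zeros(n, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max() + 1  # reject the k smallest p-values
        d[order[:k]] = True
    return d

def realized_fdr(d, r):
    """Realized FDR = sum_i (1 - r_i) d_i / D, set to 0 when D = 0 (assumed convention)."""
    d = np.asarray(d, dtype=int)
    r = np.asarray(r, dtype=int)
    D = d.sum()
    return 0.0 if D == 0 else float(((1 - r) * d).sum() / D)

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
d = benjamini_hochberg(pvals, alpha=0.05)
truth = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # hypothetical: the first test is actually null
print(d.astype(int), realized_fdr(d, truth))
```

The realized FDR is a random quantity that depends on the unknown r_i; frequentist FDR control bounds its expectation, which is why a Bayesian loss on the realized value can lead to a different rule.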
|
| |
Abstract
Previous study of the time to a common ancestor of all present-day individuals has focused on models in which each individual has just one parent in the previous generation. For example, "mitochondrial Eve" is the most recent common ancestor (MRCA) when ancestry is defined only through maternal lines. In the standard Wright-Fisher model with population size n, the expected number of generations to the MRCA is about 2n, and the standard deviation of this time is also of order n. Here ...
Note (first note only)
- In (discrete-time) Wright-Fisher model, with population size n, the expected number of generations to the MRCA is about 2n, and the standard deviation of this time is also of order n.
- e.g. human mtDNA indicates an MRCA (dubbed "Eve") who lived 100,000 to 200,000 years ago
- In a two-parent analog of the Wright-Fisher model, the number of generations to the MRCA is close to log2(n) (the ratio converges to 1 as n -> inf). E.g. log2(10^6) ≈ 20.
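A small simulation sketch contrasting the two models in this note. Assumptions: a haploid one-parent Wright-Fisher population of constant size n, and, for the two-parent model, each individual drawing its two parents independently and uniformly from the previous generation (so they may coincide). The two-parent function tracks, for every ancestor, the set of present-day descendants as a Python integer bitset and stops at the first generation where someone descends to everyone.

```python
import math
import random
import statistics

def one_parent_mrca_time(n, rng):
    """Generations back until all n lineages coalesce (haploid Wright-Fisher)."""
    lineages, t = n, 0
    while lineages > 1:
        t += 1
        # Each surviving lineage picks a parent uniformly; lineages merge when parents coincide.
        lineages = len({rng.randrange(n) for _ in range(lineages)})
    return t

def two_parent_mrca_time(n, rng):
    """Generations back until someone is a genealogical ancestor of all n present-day individuals."""
    everyone = (1 << n) - 1
    desc = [1 << j for j in range(n)]  # bitset of present-day descendants per individual
    t = 0
    while True:
        t += 1
        parents = [0] * n
        for child in range(n):
            for _ in range(2):  # two parents drawn independently, possibly the same one
                parents[rng.randrange(n)] |= desc[child]
        desc = parents
        if any(s == everyone for s in desc):
            return t

rng = random.Random(1)
n = 50
times = [one_parent_mrca_time(n, rng) for _ in range(200)]
print(statistics.mean(times), 2 * n)                # mean is of the order of 2n
print(two_parent_mrca_time(1024, rng), math.log2(1024))
```

The one-parent times are widely spread (standard deviation of order n), while the two-parent times stay within a few generations of log2(n) even for moderate n.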
|
| |
Abstract
If a common ancestor of all living humans is defined as an individual who is a genealogical ancestor of all present-day people, the most recent common ancestor (MRCA) for a randomly mating population would have lived in the very recent past. However, the random mating model ignores essential aspects of population substructure, such as the tendency of individuals to choose mates from the same social ...
Note (first note only)
- see the 1999 paper for the mathematical arguments
- main message: "substantial forms of population subdivision can still be compatible with very recent common ancestors"
|
| |
Abstract
The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past ...
Note (first note only)
- key notion: distinction between genealogical ancestor and genetic ancestor
- IBD block: a contiguous segment of genome inherited (on at least one chromosome) from a shared common ancestor without intervening recombination
- with this definition, everyone is IBD everywhere, but mostly on very short, old segments; thus focus
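A crude illustration of why long shared segments carry the signal: the sketch below finds maximal runs where two phased haplotypes carry identical alleles and keeps those longer than a threshold. This is identity by state used as a rough proxy for IBD, not the method of this paper or fastIBD; the allele coding, the position unit (e.g. cM) and min_length are illustrative assumptions.

```python
def shared_segments(h1, h2, positions, min_length):
    """Return (start, end) positions of maximal runs where h1 and h2 agree,
    keeping only runs spanning at least min_length (same unit as positions)."""
    segments = []
    start = None
    # Loop one step past the end so the final run is closed inside the loop.
    for i in range(len(h1) + 1):
        match = i < len(h1) and h1[i] == h2[i]
        if match and start is None:
            start = i
        elif not match and start is not None:
            end = i - 1
            if positions[end] - positions[start] >= min_length:
                segments.append((positions[start], positions[end]))
            start = None
    return segments

h1 = [0, 1, 1, 0, 1, 0, 0, 1]
h2 = [0, 1, 1, 0, 0, 0, 0, 1]
positions = [0, 10, 20, 30, 40, 50, 60, 70]  # hypothetical map positions
print(shared_segments(h1, h2, positions, min_length=25))
```

Short runs like the second one arise constantly by chance or from very old ancestors, which is why analyses of recent ancestry keep only long blocks.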
|
| |
(May 1997)
posted to philosophy politics statistics
by timflutre
on 2013-05-15 21:49:03
Abstract
The following text is divided in four chapters. Each of the chapters has its own and unique place in the argumentative structure of this thesis. Chapter 1 is the engine of the thesis: it provides the necessary background information and develops the notions and concepts that will be of importance in this dissertation. In order to make the argument lift off, two wings are needed. Chapter 2 is the morally normative wing, whereas Chapter 3 constitutes the technically normative wing of ...
Note (first note only)
- A coincidence is an a posteriori recognition of a synchronic occurrence of two independent --- or partly conflicting --- events, whereas a chance or probability is a matter of a priori anticipation.
- Economic theory, in general, is at best a set of tautologies that can function as a descriptive model, but never as a set of prescriptive rules for distributive justice.
|
| |
Abstract
Two clichés of science journalism have now played out around the ENCODE project. ENCODE’s publicity first presented a misleading “all the textbooks are wrong” narrative about noncoding human DNA. Now several critiques of ENCODE’s narrative have been published, and one was so vitriolic that it fueled “undignified academic squabble” stories that focused on tone more than substance. Neither story line does justice to our actual understanding of genomes, to ENCODE’s results, or to the role of big science in biology. ...
Note (first note only)
- There are three categories of big science: the big experiment, the map, and the leading wedge.
- A big experiment is driven by a single question or hypothesis test, but requires a large scale community investment.
- A map is a data resource — comprehensive, complete, closed ended — to be used by multiple groups, over a long time, for multiple purposes.
- A leading wedge is a
|
| |
(13 Apr 2006)
Abstract
In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a, Metcalfe, 2005). Using the citation statistics from the NASA-Smithsonian Astrophysics Data System (ADS; Kurtz et al., 1993, 2000), we confirm the findings from other studies, we examine the average citation rate to e-printed papers ...
|
| |
Abstract
Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues ...
|
| |
(11 August 2008)
Abstract
Even bad code can function. But if code isn't clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Noted software expert Robert C. Martin presents a revolutionary paradigm with _**Clean Code: A Handbook of Agile Software Craftsmanship**_. Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code "on ...
|
| |
In Bayesian Statistics 7 (2003), pp. 453-462
Abstract
We present an efficient procedure for estimating the marginal likelihood of probabilistic models with latent variables or incomplete data. This method constructs and optimises a lower bound on the marginal likelihood using variational calculus, resulting in an iterative algorithm which generalises the EM algorithm by maintaining posterior distributions over both latent variables and parameters. We define the family of conjugate-exponential models—which includes finite mixtures of exponential family models, factor analysis, hidden Markov models, linear state-space models, and other models of interest—for ...
|
| |
Abstract
Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate FNR subject to a constraint on the false discovery rate FDR. It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, ...
Note (first note only)
- FDR procedures developed under the independence assumption, even when valid, may suffer a substantial loss of efficiency when the dependence structure is highly informative.
- our procedure is built on a new test statistic, the local index of significance (LIS)
- For independent tests, when determining the level of significance of a hypothesis, a p-value approach considers each hypothesis separately, whereas an Lfdr approach considers the m hypotheses simultaneously by incorporating
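An oracle-style sketch of the LIS idea, assuming a two-state hidden Markov model with known parameters: N(0,1) statistics under the null state, N(mu1,1) under the non-null state, initial probabilities init and transition matrix trans. LIS_i is computed as the posterior probability that hypothesis i is null, via a scaled forward-backward pass; the step-up rule then rejects the k hypotheses with the smallest LIS, where k is the largest j whose running mean of sorted LIS values stays below alpha. The data-driven version, which estimates the HMM parameters, is not shown.

```python
import numpy as np
from scipy.stats import norm

def hmm_lis(x, init, trans, mu1):
    """LIS_i = P(state_i = null | all x) for a 2-state HMM with N(0,1) and N(mu1,1) emissions."""
    x = np.asarray(x, dtype=float)
    init, trans = np.asarray(init, dtype=float), np.asarray(trans, dtype=float)
    m = x.size
    f = np.column_stack([norm.pdf(x, 0.0, 1.0), norm.pdf(x, mu1, 1.0)])
    fwd, bwd, scale = np.zeros((m, 2)), np.ones((m, 2)), np.zeros(m)
    fwd[0] = init * f[0]
    scale[0] = fwd[0].sum()
    fwd[0] /= scale[0]
    for t in range(1, m):  # scaled forward pass
        fwd[t] = (fwd[t - 1] @ trans) * f[t]
        scale[t] = fwd[t].sum()
        fwd[t] /= scale[t]
    for t in range(m - 2, -1, -1):  # scaled backward pass
        bwd[t] = trans @ (f[t + 1] * bwd[t + 1]) / scale[t + 1]
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]

def lis_step_up(lis, alpha):
    """Reject the k smallest LIS, k = max{j : mean of the j smallest LIS <= alpha}."""
    lis = np.asarray(lis, dtype=float)
    order = np.argsort(lis)
    running_mean = np.cumsum(lis[order]) / np.arange(1, lis.size + 1)
    reject = np.zeros(lis.size, dtype=bool)
    passing = np.nonzero(running_mean <= alpha)[0]
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject

x = [0, 0, 0, 5, 5, 5, 0, 0]
lis = hmm_lis(x, [0.9, 0.1], [[0.95, 0.05], [0.2, 0.8]], mu1=3.0)
print(np.round(lis, 4), lis_step_up(lis, 0.05))
```

Unlike a p-value, each LIS_i borrows strength from neighbouring statistics through the hidden chain.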
|
| |
posted to multiple_testing statistics
by timflutre
on 2013-05-02 04:05:12
Abstract
We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p-value-based, are inefficient, and propose an adaptive procedure based on the z values. The z-value-based adaptive procedure asymptotically attains the performance of the z value oracle procedure and is more efficient than ...
Note (first note only)
- false discovery rate: FDR = E(N_10/R | R>0) x Pr(R>0)
- positive FDR: pFDR = E(N_10/R | R>0)
- marginal FDR: mFDR = E(N_10) / E(R)
- pFDR and mFDR are equivalent when test statistics come from a mixture of the null and nonnull distributions (Storey 2003)
- mFDR equals FDR plus a remainder of order m^-0.5 (Genovese and Wasserman 2002)
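A Monte Carlo sketch of the three definitions above, with N_10 the number of false rejections and R the total number of rejections. The setting is a hypothetical two-group mixture (null z ~ N(0,1) with probability pi0, non-null z ~ N(mu,1)) and a fixed rule "reject when z > threshold"; all parameter values are illustrative.

```python
import numpy as np

def fdr_measures(m, pi0, mu, threshold, n_rep, seed=0):
    """Monte Carlo estimates of FDR, pFDR and mFDR for the rule 'reject if z > threshold'."""
    rng = np.random.default_rng(seed)
    n10 = np.zeros(n_rep)  # false discoveries per replicate
    r = np.zeros(n_rep)    # total discoveries per replicate
    for b in range(n_rep):
        is_null = rng.random(m) < pi0
        z = rng.normal(np.where(is_null, 0.0, mu), 1.0)
        rejected = z > threshold
        r[b] = rejected.sum()
        n10[b] = (rejected & is_null).sum()
    positive = r > 0
    # pFDR = E(N_10/R | R > 0); FDR = pFDR x Pr(R > 0); mFDR = E(N_10) / E(R).
    pfdr = (n10[positive] / r[positive]).mean() if positive.any() else float("nan")
    fdr = pfdr * positive.mean() if positive.any() else 0.0
    mfdr = n10.mean() / r.mean() if r.mean() > 0 else float("nan")
    return fdr, pfdr, mfdr

fdr, pfdr, mfdr = fdr_measures(m=1000, pi0=0.8, mu=3.0, threshold=2.5, n_rep=200)
print(fdr, pfdr, mfdr)
```

With m = 1000 tests the three measures nearly coincide here; they separate when m is small or when Pr(R > 0) is well below 1.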
|
| |
(08 March 2011)
posted to book statistics
by timflutre
on 2013-04-25 05:07:31
|
| |
posted to philosophy
by timflutre
on 2013-04-23 04:05:15
Note (first note only)
An Absurd Reasoning
Absurdity and Suicide
There is but one truly serious philosophical problem, and that is suicide. Judging whether life is or is not worth living amounts to answering the fundamental question of philosophy.
To begin to think is to begin to be undermined.
Living, naturally, is never easy. You continue making the gestures that existence commands for many reasons, the first of which is habit. Dying voluntarily implies that you have recognized, even instinctively, the ridiculous character of that habit, the absence
|
| |
Abstract
With incomplete lineage sorting (ILS), the genealogy of closely related species differs along their genomes. The amount of ILS depends on population parameters such as the ancestral effective population sizes and the recombination rate, but also on the number of generations between speciation events. We use a hidden Markov model parameterized according to coalescent theory to infer the genealogy along a four-species genome alignment of closely related species and estimate population parameters. We analyze a basic, panmictic demographic model and study ...
|
| |
Abstract
In “Antifragile” [1], Taleb provides a fresh perspective on how one may gain from disorder. In this short note, we formalize and unify in a single premium (a schematic view of) the concavity/convexity and conflation effects described by Taleb. We show that this premium relies on a generalization of a well-known class of distortion measures of information geometry, namely Bregman divergences. We exemplify some properties of this premium, and discuss them in the light of “Antifragile” [1]. ...
Note (first note only)
- Taleb: things that gain from disorder/uncertainty/variability are antifragile
- we spend too much time and resources trying to predict (unpredictable) outcomes -> what matters is not the uncertain outcome x but how it affects us, i.e. our response function f(x)
- conflation error: confusing x for f(x), or E[x] for f(E[x])
- convexity bias: if f is convex, we gain from disorder: E[f(x)] - f(E[x]) >= 0 (Jensen's inequality), strictly positive when f is strictly convex and x is not constant
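A tiny numeric check of the convexity bias for a discrete x; the response functions and distributions are arbitrary examples. For f(x) = x^2 the gap E[f(x)] - f(E[x]) is exactly Var(x), a concave f gives a negative gap (harm from disorder), and a linear f gives no gap.

```python
def jensen_gap(f, values, probs):
    """E[f(X)] - f(E[X]) for a discrete X taking `values` with probabilities `probs`."""
    mean_x = sum(p * x for x, p in zip(values, probs))
    mean_fx = sum(p * f(x) for x, p in zip(values, probs))
    return mean_fx - f(mean_x)

print(jensen_gap(lambda x: x**2, [-1, 1], [0.5, 0.5]))       # convex: gain from variability
print(jensen_gap(lambda x: -(x**2), [-1, 1], [0.5, 0.5]))    # concave: loss from variability
print(jensen_gap(lambda x: 3 * x + 1, [0, 2, 4], [0.2, 0.5, 0.3]))  # linear: no gap (up to rounding)
```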
|
| |
Abstract
The probability of disease development in a defined time period is described by a logistic regression model. A model for the regression variable, given disease status, is induced and is applied to case-control data. It is shown that the odds ratio estimators and their asymptotic variance matrices may be obtained by applying the original logistic regression model to the case-control study as if the data had been obtained in a prospective study. This result gives a flexible and convenient method of ...
Note (first note only)
- a prospective study follows a cohort over time, and at the end we can write p(Disease | z), but this is hard to do for a rare disease in humans because it requires a very large sample size
- a retrospective study samples cases and controls and we write p(z | Disease): although the full prospective model can't be estimated from case-control data alone, the odds ratio can
- they specify a prospective model which
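A simulation sketch of the main result: fitting the prospective logistic model to case-control data as if it were prospective recovers the log odds ratio (the slope), while the intercept is shifted by the log of the ratio of the case and control sampling fractions. The population size, coefficients and 1:1 control sampling below are hypothetical choices, and the logistic fit is a plain Newton-Raphson loop rather than any particular package.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
N = 200_000
z = rng.normal(size=N)
disease = rng.random(N) < 1 / (1 + np.exp(-(-4.0 + 0.7 * z)))  # prospective model, rare disease
cases = np.nonzero(disease)[0]                                    # keep every case
controls = rng.choice(np.nonzero(~disease)[0], size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
X = np.column_stack([np.ones(idx.size), z[idx]])
y = disease[idx].astype(float)
intercept, slope = fit_logistic(X, y)
# Expected intercept shift: log(case sampling fraction / control sampling fraction).
shift = np.log(1.0 / (controls.size / (N - cases.size)))
print(slope, intercept, -4.0 + shift)
```

Only the intercept absorbs the retrospective sampling design, which is why the odds ratio, but not the absolute risk, is estimable from case-control data alone.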
|
| |
posted to algorithm statistics
by timflutre
on 2013-04-12 16:06:32
Abstract
The expectation-maximization (EM) algorithm is a popular approach for obtaining maximum likelihood estimates in incomplete data problems because of its simplicity and stability (e.g. monotonic increase of likelihood). However, in many applications the stability of EM is attained at the expense of slow, linear convergence. We have developed a new class of iterative schemes, called squared iterative methods (SQUAREM), to accelerate EM, without compromising on simplicity and stability. SQUAREM generally achieves superlinear convergence in problems with a large fraction of ...
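A simplified sketch of squared extrapolation around an EM map F. Each cycle takes two EM steps, forms r = F(theta) - theta and v = F(F(theta)) - 2F(theta) + theta, uses the step length alpha = -||r||/||v|| capped at -1, and finishes with one stabilizing EM step. The fallback when the extrapolated point leaves the parameter space is a simplistic safeguard of this sketch, not the paper's own scheme. The toy EM estimates the weight of the N(1.5,1) component of a two-component mixture whose densities are assumed known.

```python
import numpy as np
from scipy.stats import norm

def squarem(F, theta, valid, tol=1e-10, max_iter=500):
    """Accelerate the fixed-point map F with one squared-extrapolation step per cycle."""
    theta = np.asarray(theta, dtype=float)
    n_evals = 0
    for _ in range(max_iter):
        theta1 = F(theta)
        theta2 = F(theta1)
        n_evals += 2
        r = theta1 - theta
        v = theta2 - theta1 - r
        if np.linalg.norm(v) == 0.0:
            return theta2, n_evals
        # Step length alpha = -||r|| / ||v||, capped at -1 (alpha = -1 reproduces theta2).
        alpha = min(-np.linalg.norm(r) / np.linalg.norm(v), -1.0)
        proposal = theta - 2 * alpha * r + alpha**2 * v
        if not valid(proposal):
            proposal = theta2  # simplistic safeguard: fall back to two plain EM steps
        new_theta = F(proposal)  # stabilizing EM step
        n_evals += 1
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta, n_evals
        theta = new_theta
    return theta, n_evals

# Toy EM: weight of the N(1.5, 1) component in a mixture with known component densities.
rng = np.random.default_rng(0)
n = 2000
x = np.where(rng.random(n) < 0.3, rng.normal(1.5, 1.0, n), rng.normal(0.0, 1.0, n))
f0, f1 = norm.pdf(x, 0.0, 1.0), norm.pdf(x, 1.5, 1.0)

def em_update(theta):
    w = theta[0] * f1 / (theta[0] * f1 + (1 - theta[0]) * f0)  # E-step responsibilities
    return np.array([w.mean()])                                  # M-step

def plain_em(theta, tol=1e-10, max_iter=100_000):
    n_evals = 0
    for _ in range(max_iter):
        new_theta = em_update(theta)
        n_evals += 1
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta, n_evals
        theta = new_theta
    return theta, n_evals

def in_unit_interval(t):
    return bool(np.all((t > 0) & (t < 1)))

pi_em, evals_em = plain_em(np.array([0.5]))
pi_sq, evals_sq = squarem(em_update, np.array([0.5]), in_unit_interval)
print(pi_em, evals_em, pi_sq, evals_sq)
```

Both runs reach the same fixed point; the accelerated run needs far fewer evaluations of the EM map because the extrapolation cancels the linear part of the error.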
|
| |
(26 December 2001)
Abstract
"C. R. Rao would be found in almost any statistician's list of five outstanding workers in the world of Mathematical Statistics today. His book represents a comprehensive account of the main body of results that comprise modern statistical theory." -W. G. Cochran
"[C. R. Rao is] one of the pioneers who laid the foundations of statistics which grew from ad hoc origins into a firmly grounded mathematical science." -B. ...
Note (first note only)
1 Algebra of vectors and matrices
2 Probability theory, tools and techniques
3 Continuous probability models
4 The theory of least-squares and analysis of variance
5 Criteria and methods of estimation
6 Large sample theory and methods
7 Theory of statistical inference
8 Multivariate analysis
|
| |

(01 August 1989)
Abstract
The success of the first edition of Generalized Linear Models led to the updated Second Edition, which continues to provide a definitive, unified treatment of methods for the analysis of diverse types of data. Today, it remains popular for its clarity, richness of content and direct relevance to agricultural, biological, health, engineering, and other applications. The authors focus on examining the way a response variable depends on a combination of explanatory variables, treatment, and classification variables. They give particular emphasis to the ...
Note (first note only)
1 Introduction
2 An outline of generalized linear models
- exponential family distribution: exp[(yθ - b(θ)) / a(φ) + c(y,φ)] where θ is the canonical parameter and φ is the dispersion parameter (φ = σ² for the normal; φ = 1/ν, with ν the precision)
- in fact the exponential family is a weighted family of distributions with exp(θy) as weight function for a base density f0 and, as a consequence, the function b is the cumulant generating function of f0
- calculate the first and second derivatives of log-likelihood w.r.t.
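The derivative calculation can be written out for one observation, taking the log-likelihood in the form above with a last term c(y, φ) that does not depend on θ. Setting the expected score to zero gives the mean, and the information identity gives the variance:

```latex
\begin{aligned}
\ell(\theta) &= \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \\
\frac{\partial \ell}{\partial \theta} &= \frac{y - b'(\theta)}{a(\phi)},
\qquad
\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)} \\
\mathbb{E}\!\left[\frac{\partial \ell}{\partial \theta}\right] = 0
&\;\Longrightarrow\; \mathbb{E}[Y] = b'(\theta) \\
\mathbb{E}\!\left[\left(\frac{\partial \ell}{\partial \theta}\right)^{2}\right]
= -\,\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial \theta^2}\right]
&\;\Longrightarrow\; \frac{\operatorname{Var}(Y)}{a(\phi)^2} = \frac{b''(\theta)}{a(\phi)}
\;\Longrightarrow\; \operatorname{Var}(Y) = b''(\theta)\, a(\phi)
\end{aligned}
```

For example, the Poisson with a(φ) = 1 and b(θ) = exp(θ) gives E[Y] = Var(Y) = exp(θ).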
|
| |
|
| |
posted to imputation jc
by timflutre
on 2013-04-01 17:36:28
Abstract
Recently-developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas, in practice, it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large; or because only summary data are collected, as in DNA pooling experiments. In this ...
Note (first note only)
- y is a vector of allele frequencies at P SNPs, assumed to follow MVN(mu,Sigma) conditionally on panel data M
- mu and Sigma are parametrized in terms of the panel as well as something like mutation rate and recombination rate
- y can be partitioned into typed and untyped, and thus the distribution of untyped is available conditionally on typed and M, but it requires inverting Sigma
- every entry of
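The conditioning step in this note is the standard multivariate normal formula: E[y_u | y_t] = mu_u + Sigma_ut Sigma_tt^{-1} (y_t - mu_t) and Cov[y_u | y_t] = Sigma_uu - Sigma_ut Sigma_tt^{-1} Sigma_tu. A minimal sketch follows; the mu and Sigma passed in are placeholders, since the paper's parameterization of them from the panel M is not reproduced here. Solving a linear system with Sigma_tt avoids forming its inverse explicitly.

```python
import numpy as np

def impute_untyped(y_typed, mu, sigma, typed_idx, untyped_idx):
    """Conditional mean and covariance of untyped allele frequencies given the typed ones."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_tt = sigma[np.ix_(typed_idx, typed_idx)]
    s_ut = sigma[np.ix_(untyped_idx, typed_idx)]
    s_uu = sigma[np.ix_(untyped_idx, untyped_idx)]
    k = np.linalg.solve(s_tt, s_ut.T).T  # = Sigma_ut Sigma_tt^{-1}
    cond_mean = mu[untyped_idx] + k @ (np.asarray(y_typed, dtype=float) - mu[typed_idx])
    cond_cov = s_uu - k @ s_ut.T
    return cond_mean, cond_cov

# Hypothetical 2-SNP example: SNP 0 typed, SNP 1 untyped.
mean_u, cov_u = impute_untyped([1.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [0], [1])
print(mean_u, cov_u)
```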
|
| |
(8 November 2010)
posted to linear_algebra statistics
by timflutre
on 2013-03-30 16:41:29
Abstract
This handout details all the mathematical formulas behind the multivariate linear regression (estimators, covariance, tests). ...
|
| |
posted to a3 meta_analysis statistics
by timflutre
on 2013-03-22 18:06:29
Abstract
Global expression analyses using microarray technologies are becoming more common in genomic research, therefore, new statistical challenges associated with combining information from multiple studies must be addressed. In this paper we will describe our proposal for an adaptively weighted (AW) statistic to combine multiple genomic studies for detecting differentially expressed genes. We will also present our results from comparisons of our proposed AW statistic to Fisher’s equally weighted (EW), Tippett’s minimum p-value (minP) and Pearson’s (PR) statistics. Due to the absence ...
Note (first note only)
- meta-analysis of K studies, e.g. gene expression between controls and cases
- global null hypothesis for each gene g: θ_g1 = ... = θ_gK = 0 where θ_gk represents the effect of gene g in study k
- 2 possible alternative hypotheses: HA -> all θ_gk != 0 ; or HB -> at least one θ_gk != 0
- the set of significant genes under HB may represent experimental and
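A sketch of the combination statistics named in the abstract, for one gene with K independent study p-values. Fisher's (EW) statistic -2 sum log p_k is chi-square with 2K degrees of freedom under the global null; Tippett's minP p-value is 1 - (1 - min p)^K. For the AW statistic, the sketch computes the Fisher p-value for every non-zero binary weight vector and keeps the smallest, together with the weights, which flag the studies driving the signal. This minimum is not itself a valid p-value; the paper's calibration of it is not shown.

```python
import math
from itertools import product
from scipy.stats import chi2

def fisher_p(pvals):
    """Fisher's equally weighted combination."""
    stat = -2 * sum(math.log(p) for p in pvals)
    return chi2.sf(stat, 2 * len(pvals))

def tippett_p(pvals):
    """Tippett's minimum p-value combination."""
    return 1 - (1 - min(pvals)) ** len(pvals)

def aw_min_p(pvals):
    """Smallest weighted-Fisher p-value over non-zero weights w in {0,1}^K, and the best w."""
    best_p, best_w = 1.0, None
    for w in product([0, 1], repeat=len(pvals)):
        if not any(w):
            continue
        stat = -2 * sum(wk * math.log(p) for wk, p in zip(w, pvals))
        p_w = chi2.sf(stat, 2 * sum(w))
        if p_w < best_p:
            best_p, best_w = p_w, w
    return best_p, best_w

p = [0.01, 0.5, 0.9]  # hypothetical p-values for one gene in K = 3 studies
print(fisher_p(p), tippett_p(p), aw_min_p(p))
```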
|
| |
Abstract
Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while ...
Note (first note only)
- matrix X with m individuals in row and n sorted SNPs in columns: fill the missing values (individual entries, or full rows/columns)
- for a given region (set of SNPs), we expect a small number of haplotypes, thus we want to find a low-rank matrix approximating X well
- for a chosen rank, find the imputed matrix Z that minimizes the Frobenius norm between X and Z (over the observed entries)
- in order to
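A generic sketch of rank-constrained matrix completion by iterated truncated SVD ("hard-impute" style): fill missing cells, project onto rank r, copy the observed entries back, repeat. This is not necessarily the algorithm in MENDEL-IMPUTE; real genotype data would also need the rank chosen per region and the imputed values rounded to valid genotypes, which the sketch omits. The test matrix is a synthetic exact rank-2 matrix with about 10% of entries masked.

```python
import numpy as np

def rank_constrained_complete(X, observed, rank, n_iter=1000):
    """Fill the unobserved entries of X with a rank-`rank` approximation (hard-impute style)."""
    col_means = np.nanmean(np.where(observed, X, np.nan), axis=0)
    Z = np.where(observed, X, col_means)  # start: column means in missing cells
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Z = np.where(observed, X, low_rank)  # keep observed values, update missing ones
    return Z

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # synthetic rank-2 "truth"
observed = rng.random(M.shape) > 0.1
X = np.where(observed, M, np.nan)
Z = rank_constrained_complete(X, observed, rank=2)
print(np.max(np.abs(Z - M)))
```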
|
| |
by Samsiddhi Bhattacharjee, Preetha Rajaraman, Kevin B. Jacobs, William A. Wheeler, Beatrice S. Melin, Patricia Hartge, GliomaScan Consortium, Meredith Yeager, Charles C. Chung, Stephen J. Chanock, Nilanjan Chatterjee
posted to a3 gwas meta_analysis
by timflutre
on 2013-03-17 19:39:20
Abstract
Pooling genome-wide association studies (GWASs) increases power but also poses methodological challenges because studies are often heterogeneous. For example, combining GWASs of related but distinct traits can provide promising directions for the discovery of loci with small but common pleiotropic effects. Classical approaches for meta-analysis or pooled analysis, however, might not be suitable for such analysis because individual variants are likely to be associated with ...
Note (first note only)
- they specifically speak of GWAS (i.e. not eQTL mapping)
- for each SNP, run a linear regression in each study, get estimates of effect size and std error, and get the Z score
- compute a Z score over all subsets of studies (with appropriate weights) and assess evidence against the global null using the max of these Z scores
- they can handle correlations due to same individuals in different
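A sketch of the subset search described in this note, for one SNP with independent studies. Each study gives a Z score; for every non-empty subset S the sketch combines them as sum_{k in S} sqrt(n_k) Z_k / sqrt(sum_{k in S} n_k) and keeps the maximum. The sqrt(sample size) weights are an assumption standing in for the "appropriate weights", the search is one-sided, and calibrating the maximum (which is not N(0,1) under the global null) and the correction for shared individuals are not shown.

```python
import math
from itertools import combinations

def subset_max_z(z, n):
    """Max over non-empty study subsets S of sum_{k in S} sqrt(n_k) z_k / sqrt(sum_{k in S} n_k)."""
    best_z, best_subset = -math.inf, None
    K = len(z)
    for size in range(1, K + 1):
        for subset in combinations(range(K), size):
            num = sum(math.sqrt(n[k]) * z[k] for k in subset)
            z_s = num / math.sqrt(sum(n[k] for k in subset))
            if z_s > best_z:
                best_z, best_subset = z_s, subset
    return best_z, best_subset

print(subset_max_z([3.0, 0.0, 0.0], [100, 100, 100]))   # one study carries the signal
print(subset_max_z([2.0, 2.0, -1.0], [100, 100, 100]))  # two studies share an effect
```

The selected subset is informative in itself: it suggests which traits or studies share the association.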
|
| |
Abstract
Most approaches for analyzing ChIP-Seq data are focused on inferring exact protein binding sites from a single library. However, frequently multiple ChIP-Seq libraries derived from differing cell lines or tissue types from the same individual may be available. In such a situation, a separate analysis for each tissue or cell line may be inefficient. Here, we describe a novel method to analyze such data that ...
Note (first note only)
Single library
- Y_i is log(ChIP-seq reads) around gene i -> does it depend on log(gene length), X_i1, and GC content at promoter, X_i2?
- Y_i = (1-Z_i)(b_0 + b_1X_i1 + b_2X_i1^2) + Z_i(b'_0 + b'_1X_i1 + b'_2X_i1^2) + b_3X_i2 + e_i
- pi_i = P(Z_i=1) -> probability that gene i is methylated ; pi = sum_i pi_i
- fit model with EM algorithm
Multiple libraries
- jointly analyze all K libraries to
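An EM sketch for the single-library model above: a two-component mixture of quadratic regressions in X_i1 with a shared X_i2 term. Assumptions of this sketch: Gaussian errors with one common variance, and a single mixing proportion pi shared by all genes. The M-step stacks two copies of the data, weighted by the responsibilities, so that the shared coefficient b_3 is estimated in one weighted least-squares fit. The data are simulated with made-up coefficients.

```python
import numpy as np
from scipy.stats import norm

def stacked_design(x1, x2):
    """Columns: [b0, b1, b2] for component 0, [b'0, b'1, b'2] for component 1, shared b3."""
    q = np.column_stack([np.ones_like(x1), x1, x1**2])
    zeros = np.zeros_like(q)
    return np.vstack([np.column_stack([q, zeros, x2]),   # rows for Z_i = 0
                      np.column_stack([zeros, q, x2])])  # rows for Z_i = 1

def em_mixture_regression(y, x1, x2, n_iter=200):
    n = y.size
    D = stacked_design(x1, x2)
    yy = np.concatenate([y, y])
    # Initialize responsibilities from the sign of ordinary least-squares residuals.
    A = np.column_stack([np.ones(n), x1, x1**2, x2])
    w = (y - A @ np.linalg.lstsq(A, y, rcond=None)[0] > 0).astype(float)
    for _ in range(n_iter):
        # M-step: weighted least squares on the stacked data, then variance and mixing weight.
        weights = np.concatenate([1 - w, w])
        sw = np.sqrt(weights)
        coef = np.linalg.lstsq(D * sw[:, None], yy * sw, rcond=None)[0]
        fitted = D @ coef
        sigma = np.sqrt(np.sum(weights * (yy - fitted) ** 2) / n)
        pi = w.mean()
        # E-step: posterior probability that gene i belongs to component 1.
        l0 = (1 - pi) * norm.pdf(y, fitted[:n], sigma)
        l1 = pi * norm.pdf(y, fitted[n:], sigma)
        w = l1 / (l0 + l1)
    return coef, sigma**2, w.mean()

rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
z = rng.random(n) < 0.3
y = np.where(z, 3 + 1.0 * x1 - 0.1 * x1**2, 0.5 * x1 + 0.2 * x1**2) + 0.7 * x2 + rng.normal(0, 0.5, n)
coef, sigma2, pi_hat = em_mixture_regression(y, x1, x2)
print(np.round(coef, 2), round(sigma2, 3), round(pi_hat, 3))
```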
|
| |

(2002)
Abstract
Generalized linear models provide a unified theoretical and conceptual framework for many of the most commonly used statistical methods. In the ten years since publication of the first edition of this bestselling text, great strides have been made in the development of new methods and in software for generalized linear models and other closely related models.Thoroughly revised and updated, An Introduction to Generalized Linear Models, Second Edition continues to initiate intermediate students of statistics, and the many other disciplines that use ...
Note (first note only)
3 Exponential Family and Generalized Linear Models
- f(y; θ) = s(y)t(θ)exp[a(y)b(θ)] = exp[a(y)b(θ) + c(θ) + d(y)]
- link function, canonical form, natural parameter b(θ), nuisance parameters (parameters other than θ), overdispersion, score statistic (first derivative of the loglik: its expectation is 0, and its variance, called the information, equals minus the expected second derivative of the loglik)
4 Estimation
- iterative reweighted least squares (IRLS)
9 Count Data, Poisson Regression and Log-Linear Models
11 Clustered and Longitudinal Data
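A minimal IRLS sketch for Poisson regression with the canonical log link, tying the estimation chapter to the count-data chapter. With mu = exp(eta), the working response is z = eta + (y - mu)/mu and the working weights are mu; each iteration is a weighted least-squares solve. The starting value from log(y + 0.5) and the simulated data are illustrative choices.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with log link fitted by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]  # rough starting value
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        W = mu                   # working weights for the canonical log link
        new_beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))
beta_hat = poisson_irls(X, y)
print(beta_hat)
```

At convergence the score equations X^T (y - mu) = 0 hold; for an intercept-only model the fitted mean is simply the sample mean.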
|
| |
by Jennifer K. Lowe, Julian B. Maller, Itsik Pe'er, Benjamin M. Neale, Jacqueline Salit, Eimear E. Kenny, Jessica L. Shea, Ralph Burkhardt, J. Gustav Smith, Weizhen Ji, Martha Noel, Jia N. Foo, Maude L. Blundell, Vita Skilling, Laura Garcia, Marcia L. Sullivan, Heather E. Lee, Anna Labek, Hope Ferdowsian, Steven B. Auerbach, Richard P. Lifton, Christopher Newton-Cheh, Jan L. Breslow, Markus Stoffel, Mark J. Daly, David M. Altshuler, Jeffrey M. Friedman
Abstract
It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitates discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency ...
Note (first note only)
- a 2009 paper that had problems with high genetic relatedness and a large number of markers; it would be interesting to redo the analysis with a linear mixed model (LMM)
|
| |
posted to ethics miscellanee politics
by timflutre
on 2013-02-28 19:32:13
Abstract
Values intersect with science in three primary ways. First, there are values, particularly epistemic values, which guide scientific research itself. Second, the scientific enterprise is always embedded in some particular culture and values enter science through its individual practitioners, whether consciously or not. Finally, values emerge from science, both as a product and process, and may be redistributed more broadly in the culture or society. Also, scientific discoveries may pose new social challenges about values, though the values themselves may be ...
|
| |
(22 April 2009)
posted to algorithm statistics tutorial
by timflutre
on 2013-02-27 22:31:22
Abstract
Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it ...
Note (first note only)
- data: n samples, each of m measurement types -> gather into matrix X which is m x n
- thus initially, the data are in the natural basis (basis of the observations) but we want to find a new basis allowing us to discard useless and redundant measurements
- we want a matrix P such that PX = Y, i.e. the rows of P are the new basis vectors for the columns of X
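A sketch of the change of basis described here, keeping the note's layout (X is m x n: m measurement types in rows, n samples in columns). After centring each row, the rows of P are taken as the eigenvectors of the m x m covariance matrix, sorted by decreasing eigenvalue, so Y = PX has a diagonal covariance with decreasing variances. The correlated test data are made up.

```python
import numpy as np

def pca_basis(X):
    """X is m x n (measurement types in rows). Returns P (rows = principal directions) and Y = P Xc."""
    Xc = X - X.mean(axis=1, keepdims=True)  # centre each measurement type
    C = Xc @ Xc.T / (X.shape[1] - 1)        # m x m covariance of the measurements
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order].T                 # rows of P = new orthonormal basis
    return P, P @ Xc

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 500))
X[1] += 2 * X[0]                            # make two measurement types redundant
P, Y = pca_basis(X)
CY = Y @ Y.T / (X.shape[1] - 1)
print(np.round(CY, 3))
```

A diagonal CY means the new measurements are uncorrelated, and the small trailing variances identify directions that can be discarded.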
|
| |
(2009)
posted to book programming
by timflutre
on 2013-02-21 04:15:05
Abstract
With the new C++ Standard and Technical Report 2 (TR2), multi-threading is coming to C++ in a big way. TR2 will provide higher-level synchronization facilities that allow for a much greater level of abstraction, and make programming multi-threaded applications simpler and safer. As a guide and reference to the new concurrency features in the upcoming C++ Standard and TR2, this book is invaluable for existing programmers familiar with writing multi-threaded code in C++ using platform-specific APIs, or in other languages, as well ...
|
| |
(2013)
posted to agriculture law
by timflutre
on 2013-02-14 00:14:01
|
| |
|
| |
Abstract
This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient from ordinary linear regression. The population value is the correlation between the response and its conditional expectation given the predictors, and the sample value is the correlation between the observed response and the model predicted value. We compare four estimators of the measure in terms of bias, mean squared error and behaviour in the presence of ...
Note (first note only)
- 2 random variables: univariate response Y and (possibly multivariate) predictor X
- N samples (X_i,Y_i) -> MLE of E(Y|X): \hat{Y}
- if single predictor: cor(Y, E(Y|X)) = |beta| \sqrt{Var(X) / Var(Y)}
- the proportional relationship between beta and cor(Y, E(Y|X)) for univariate X does not hold for an arbitrary GLM, although one can show (B. Zheng, unpublished dissertation, 1997) that an approximate relationship of this type exists when beta is close
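A quick simulation check of the single-predictor identity in the linear model E(Y|X) = alpha + beta X, where cor(Y, E(Y|X)) = |beta| sqrt(Var(X)/Var(Y)). The coefficients and noise levels are arbitrary; the population value uses Var(Y) = beta^2 Var(X) + Var(noise).

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 100_000, 1.0, 0.8
x = rng.normal(0.0, 1.5, n)
y = alpha + beta * x + rng.normal(0.0, 2.0, n)  # so E(Y|X) = alpha + beta X

cond_mean = alpha + beta * x                    # E(Y|X), known because we simulated it
empirical = np.corrcoef(y, cond_mean)[0, 1]
formula = abs(beta) * np.sqrt(np.var(x) / np.var(y))
population = abs(beta) * 1.5 / np.sqrt(beta**2 * 1.5**2 + 2.0**2)
print(empirical, formula, population)
```

For a GLM with a nonlinear link, E(Y|X) is no longer linear in X, which is why this exact proportionality to beta breaks down.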
|
| |
by Marie-Agnès Dillies, Andrea Rau, Julie Aubert, Christelle Hennequet-Antier, Marine Jeanmougin, Nicolas Servant, Céline Keime, Guillemette Marot, David Castel, Jordi Estelle, Gregory Guernec, Bernd Jagla, Luc Jouneau, Denis Laloë, Caroline Le Gall, Brigitte Schaëffer, Stéphane Le Crom, Mickaël Guedj, Florence Jaffrézic
Abstract
During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods ...
Note (first note only)
- in their most difficult case (different library size and few high-count genes), only DESeq and TMM (from edgeR) are able to control the false-positive rate while also maintaining the power to detect differentially expressed genes
- the authors restricted themselves to evaluating differential expression, thus they looked at neither GC content nor gene length
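A sketch of the median-of-ratios size factors that DESeq uses for normalization (TMM from edgeR is not shown). Each gene's geometric mean across samples serves as a pseudo-reference; a sample's size factor is the median of its count-to-reference ratios, computed here on the log scale. Genes with a zero count in any sample are dropped because their geometric mean is zero. The count matrix is a toy example.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """counts: genes x samples matrix of raw read counts."""
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)  # genes with a zero count have geometric mean 0
    log_counts = np.log(counts[keep])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)  # per-gene pseudo-reference
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

counts = np.array([[10, 20], [30, 60], [5, 10], [0, 7]])  # sample 2 sequenced twice as deeply
sf = median_of_ratios_size_factors(counts)
normalized = counts / sf
print(sf, normalized)
```

Using a median makes the factors robust to a few very highly expressed or differentially expressed genes, which matters in the "few high-count genes" scenario mentioned above.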
|
| |
Abstract
Estimation of narrow-sense heritability, h2, from genome-wide SNPs genotyped in unrelated individuals has recently attracted interest and offers several advantages over traditional pedigree-based methods. With the use of this approach, it has been estimated that over half the heritability of human height can be attributed to the ∼300,000 SNPs on a genome-wide genotyping array. In comparison, only 5%–10% can be explained by SNPs reaching genome-wide significance. We investigated via simulation the validity of several key assumptions underpinning the mixed-model analysis used ...
|
| |
(November 2012)
Note (first note only)
- liberalism: protecting property and civil liberties, promoting individual autonomy and tolerance, securing a free press, ruling through limited government and universal law, and preserving a commitment to equal opportunity and meritocracy
- hackers: computer aficionados driven by an inquisitive passion for tinkering and learning technical systems, and frequently committed to an ethical version of information freedom
- hackers challenge one strain of liberal jurisprudence, intellectual property, by drawing on and reformulating ideals
|
| |
Abstract
With the development of new sequencing techniques, the number of sequenced plant genomes is increasing. However, accurate annotation of these sequences remains a major challenge, in particular with regard to transposable elements (TEs). The aim of this chapter is to provide a roadmap for researchers involved in genome projects to address this issue. We list several widely used tools for each step of the TE annotation process, from the identification of TE families to the annotation of TE copies. We assess ...
|
| |
(01 April 2003)
Abstract
Consistently lauded for its lively, readable prose, this revised and updated edition of A People's History of the United States turns traditional textbook history on its head. Howard Zinn infuses the often-submerged voices of blacks, women, American Indians, war resisters, and poor laborers of all nationalities into this thorough narrative that spans American history from Christopher Columbus's arrival to an afterword on the Clinton presidency. Addressing his trademark reversals of perspective, Zinn--a teacher, historian, ...
|
| |
Abstract
Expression quantitative trait loci (eQTL) studies have established convincing relationships between genetic variants and gene expression. Most of these studies focused on the mean of gene expression level, but not the variance of gene expression level (i.e., gene expression variability). In the present study, we systematically explore genome-wide association between genetic variants and gene expression variability in humans. We adapt the double generalized linear model ...
Note (first note only)
- y_i = \mu + \beta_1 g_i + \beta_2 x_i + \epsilon_i with \epsilon_i ~ N(0, \sigma^2 exp(\theta g_i))
- x_i indicates pop structure, \beta_1 is the effect of the genotype on the mean and \theta is the effect of the genotype on the variance
- get 2 p-values, 1 for \beta_1 == 0 and 1 for \theta == 0, see Verbyla & Smyth
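A maximum-likelihood sketch of the model in this note, y_i = mu + beta_1 g_i + beta_2 x_i + eps_i with eps_i ~ N(0, sigma^2 exp(theta g_i)). The test of theta = 0 shown here is a likelihood-ratio test, a choice of this sketch rather than the Verbyla and Smyth procedure the note cites; a test for beta_1 can be built the same way. Genotypes, covariate and effect sizes are simulated and hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_lik(params, y, g, x, theta=None):
    """Negative log-likelihood of y_i ~ N(mu + b1 g_i + b2 x_i, s2 * exp(theta g_i))."""
    if theta is None:
        mu, b1, b2, log_s2, theta = params
    else:
        mu, b1, b2, log_s2 = params
    resid = y - (mu + b1 * g + b2 * x)
    log_var = log_s2 + theta * g
    return 0.5 * np.sum(np.log(2 * np.pi) + log_var + resid**2 / np.exp(log_var))

def variance_effect_lrt(y, g, x):
    """Likelihood-ratio test of theta = 0; returns full-model estimates and the p-value."""
    start = np.array([y.mean(), 0.0, 0.0, np.log(y.var())])
    null = minimize(neg_log_lik, start, args=(y, g, x, 0.0), method="BFGS")
    full = minimize(neg_log_lik, np.append(null.x, 0.0), args=(y, g, x), method="BFGS")
    lrt = max(2.0 * (null.fun - full.fun), 0.0)
    return full.x, chi2.sf(lrt, df=1)

rng = np.random.default_rng(0)
n = 3000
g = rng.binomial(2, 0.3, n).astype(float)  # genotype coded 0/1/2
x = rng.normal(size=n)                     # stand-in for a population-structure covariate
y = 1.0 + 0.5 * g + 0.3 * x + rng.normal(size=n) * np.sqrt(np.exp(0.8 * g))
est, p_theta = variance_effect_lrt(y, g, x)
print(np.round(est, 3), p_theta)
```

The returned estimates are ordered (mu, beta_1, beta_2, log sigma^2, theta).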
|
| |
Note (first note only)
- mutational load: the human genome seems too large, given the observed human mutation rate. If the entire human genome were functional (in the sense of being under selective pressure), we would have too many deleterious mutations per generation.
- there are many examples of related species in the same genus that have haploid genome sizes that differ by three- to eightfold
- The C-value paradox is mostly (though not entirely) explained by different
|
| |
(11 January 1974)
Abstract
Nearly 30 years ago, John Horton Conway introduced a new way to construct numbers. Donald E. Knuth, in appreciation of this revolutionary system, took a week off from work on The Art of Computer Programming to write an introduction to Conway's method. Never content with the ordinary, Knuth wrote this introduction as a work of fiction, a novelette. If not a steamy romance, the book nonetheless shows how a young couple turned on to pure mathematics and found total happiness. ...
Note (first note only)
a copy is freely browsable on Scribd
|
| |
Abstract
Generalized linear mixed models (GLMMs) continue to grow in popularity due to their ability to directly acknowledge multiple levels of dependency and model different data types. For small sample sizes especially, likelihood-based inference can be unreliable with variance components being particularly difficult to estimate. A Bayesian approach is appealing but has been hampered by the lack of a fast implementation, and the difficulty in specifying prior distributions with variance components again being particularly problematic. Here, we briefly review previous approaches to ...
|
| |
Statistica Sinica, Vol. 19 (2009), pp. 631-649
posted to jc statistics
by timflutre
on 2013-01-21 15:43:55
Abstract
A Wishart model is proposed for random distance matrices in which the components are correlated gamma random variables, all having the same degrees of freedom. The marginal likelihood is obtained in closed form. Its use is illustrated by multidimensional scaling, by rooted tree models for response covariances in social survey work, and unrooted trees for ancestral relationships in genetic applications. ...
|
| |
Abstract
Circuit theory has seen extensive recent use in the field of ecology, where it is often applied to study functional connectivity. The landscape is typically represented by a network of nodes and resistors, with the resistance between nodes a function of landscape characteristics. The effective distance between two locations on a landscape is represented by the resistance distance between the nodes in the network. Circuit theory has been applied to many other scientific fields for exploratory analyses, but parametric models for ...
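A sketch of the resistance distance mentioned in this abstract, for a connected network. Each edge carries a conductance (1 / resistance), L is the weighted graph Laplacian, and with L+ its Moore-Penrose pseudoinverse the effective resistance is R_ij = L+_ii + L+_jj - 2 L+_ij. How conductances are derived from landscape characteristics is model-specific and not shown.

```python
import numpy as np

def resistance_distance(conductance):
    """Effective resistance between all node pairs of a connected resistor network.
    conductance[i, j] = 1 / resistance of the resistor joining i and j (0 if none)."""
    G = np.asarray(conductance, dtype=float)
    L = np.diag(G.sum(axis=1)) - G  # weighted graph Laplacian
    Lp = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # three nodes in series, unit resistors
triangle = np.ones((3, 3)) - np.eye(3)    # two parallel routes between any pair
print(resistance_distance(path))
print(resistance_distance(triangle))
```

Resistors in series add (end-to-end distance 2 on the path), while extra parallel routes lower the effective distance (2/3 on the triangle), which is why resistance distance reflects the number of available connections as well as their length.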
|
| |
Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 65, No. 3. (1 August 2003), pp. 679-700, doi:10.1111/1467-9868.00409
posted to a3s algorithm bayesian mcmc
by timflutre
on 2012-12-17 21:48:30
Abstract
Summary. Reversible jump methods are the most commonly used Markov chain Monte Carlo tool for exploring variable dimension statistical models. Recently, however, an alternative approach based on birth-and-death processes has been proposed by Stephens for mixtures of distributions. We show that the birth-and-death setting can be generalized to include other types of continuous time jumps like split-and-combine moves in the spirit of Richardson and Green. We illustrate these extensions both for mixtures of distributions and for hidden Markov models. We demonstrate ...
|
| |
Note (first note only)
- Wikipedia's most radical originality probably lies less in participatory writing than in this pooling of monitoring and sanctioning procedures, which allows the community to watch over itself.
|
| |
Abstract
Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that ...
|