timflutre's library 1402 articles

 
 

A fast, powerful method for detecting identity by descent.

American journal of human genetics, Vol. 88, No. 2. (11 February 2011), pp. 173-182, doi:10.1016/j.ajhg.2011.01.010

Abstract

We present a method, fastIBD, for finding tracts of identity by descent (IBD) between pairs of individuals. FastIBD can be applied to thousands of samples across genome-wide SNP data and is significantly more powerful for finding short tracts of IBD than existing methods for finding IBD tracts in such data. We show that fastIBD can detect facets of population structure that are not revealed by ...

 

FDR and Bayesian Multiple Comparisons Rules

In Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics (June 2006)
posted to bayesian multiple_testing statistics by timflutre on 2013-05-17 22:58:33 **

Abstract

We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered ...

Note (first note only)

  • i=1...n tests
  • d_i = 1 means "reject i-th test"
  • D = sum_i d_i is the number of rejections (i.e. the number of discoveries)
  • r_i = 0 means "truth of i-th test is H0"
  • FDR = (sum_i (1-r_i) d_i) / D
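
The notation above can be turned into a small computation. This is only a sketch: the function name `realized_fdr`, the example vectors, and the convention of returning 0 when nothing is rejected are assumptions for illustration, not taken from the paper.

```python
def realized_fdr(d, r):
    """Realized FDR for decisions d (1 = reject) and truths r (0 = H0 is true)."""
    D = sum(d)  # number of rejections (discoveries)
    if D == 0:
        return 0.0  # assumed convention: no discoveries means no false discoveries
    false_discoveries = sum((1 - r_i) * d_i for r_i, d_i in zip(r, d))
    return false_discoveries / D


# Hypothetical example: 5 tests, 3 rejections, one of which is a true null.
d = [1, 1, 0, 1, 0]
r = [1, 0, 0, 1, 1]
fdr = realized_fdr(d, r)
```

Here the second test is rejected although its null is true, so one of the three discoveries is false.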
 

Recent common ancestors of all present-day individuals

Advances in Applied Probability, Vol. 31, No. 4. (December 1999), pp. 1002-1026, doi:10.1239/aap/1029955256
posted to coalescence population_genetics by timflutre on 2013-05-17 20:40:05 read

Abstract

Previous study of the time to a common ancestor of all present-day individuals has focused on models in which each individual has just one parent in the previous generation. For example, 'mitochondrial Eve' is the most recent common ancestor (MRCA) when ancestry is defined only through maternal lines. In the standard Wright-Fisher model with population size n, the expected number of generations to the MRCA is about 2n, and the standard deviation of this time is also of order n. Here ...

Note (first note only)

  • In the (discrete-time) Wright-Fisher model with population size n, the expected number of generations to the MRCA is about 2n, and the standard deviation of this time is also of order n.
  • e.g. human mtDNA indicates an MRCA (dubbed "Eve") who lived 100,000 to 200,000 years ago
  • In a two-parent analog of the Wright-Fisher model, the number of generations to the MRCA concentrates tightly around log2(n) as n->inf, e.g. log2(10^6) ≈ 20.
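
To get a feel for the gap between the two regimes, here is a minimal sketch comparing the two leading-order approximations from the notes. The function names are hypothetical, and the formulas are only the asymptotic orders quoted above, not exact expectations.

```python
import math


def one_parent_mrca_generations(n):
    """Leading-order expected time to the MRCA in the one-parent Wright-Fisher model."""
    return 2 * n


def two_parent_mrca_generations(n):
    """Value around which the time to the MRCA concentrates in the two-parent analog."""
    return math.log2(n)


n = 10**6
one_parent = one_parent_mrca_generations(n)  # about two million generations
two_parent = two_parent_mrca_generations(n)  # about twenty generations
```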
 

Modelling the recent common ancestry of all living humans.

Nature, Vol. 431, No. 7008. (30 September 2004), pp. 562-566, doi:10.1038/nature02842
posted to coalescence human population_genetics by timflutre on 2013-05-17 20:38:21 read

Abstract

If a common ancestor of all living humans is defined as an individual who is a genealogical ancestor of all present-day people, the most recent common ancestor (MRCA) for a randomly mating population would have lived in the very recent past. However, the random mating model ignores essential aspects of population substructure, such as the tendency of individuals to choose mates from the same social ...

Note (first note only)

  • see the 1999 paper "Recent common ancestors of all present-day individuals" for the mathematical arguments
  • main message: "substantial forms of population subdivision can still be compatible with very recent common ancestors"
 

The Geography of Recent Genetic Ancestry across Europe

PLoS Biol, Vol. 11, No. 5. (7 May 2013), e1001555, doi:10.1371/journal.pbio.1001555

Abstract

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past ...

Note (first note only)

  • key notion: distinction between genealogical ancestor and genetic ancestor
  • IBD block: a contiguous segment of genome inherited (on at least one chromosome) from a shared common ancestor without intervening recombination
  • with this definition, everyone is IBD everywhere, but mostly on very short, old segments; thus focus
 

The ethics of chance

(May 1997)
posted to philosophy politics statistics by timflutre on 2013-05-15 21:49:03 **

Abstract

The following text is divided in four chapters. Each of the chapters has its own and unique place in the argumentative structure of this thesis. Chapter 1 is the engine of the thesis: it provides the necessary background information and develops the notions and concepts that will be of importance in this dissertation. In order to make the argument lift off, two wings are needed. Chapter 2 is the morally normative wing, whereas Chapter 3 constitutes the technically normative wing of ...

Note (first note only)

  • A coincidence is an a posteriori recognition of a synchronic occurrence of two independent --- or partly conflicting --- events, whereas a chance or probability is a matter of a priori anticipation.
  • Economic theory, in general, is at best a set of tautologies that can function as a descriptive model, but never as a set of prescriptive rules for distributive justice.
 

The ENCODE project: Missteps overshadowing a success

Current Biology, Vol. 23, No. 7. (April 2013), pp. R259-R261, doi:10.1016/j.cub.2013.03.023
posted to genomics review by timflutre on 2013-05-11 19:47:35 read

Abstract

Two clichés of science journalism have now played out around the ENCODE project. ENCODE’s publicity first presented a misleading “all the textbooks are wrong” narrative about noncoding human DNA. Now several critiques of ENCODE’s narrative have been published, and one was so vitriolic that it fueled “undignified academic squabble” stories that focused on tone more than substance. Neither story line does justice to our actual understanding of genomes, to ENCODE’s results, or to the role of big science in biology. ...

Note (first note only)

  • There are three categories of big science: the big experiment, the map, and the leading wedge.
  • A big experiment is driven by a single question or hypothesis test, but requires a large scale community investment.
  • A map is a data resource — comprehensive, complete, closed ended — to be used by multiple groups, over a long time, for multiple purposes.
  • A leading wedge is a
 

Effect of E-printing on Citation Rates in Astronomy and Physics

(13 Apr 2006)
posted to no-tag by timflutre on 2013-05-11 18:52:39 **

Abstract

In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a, Metcalfe, 2005). Using the citation statistics from the NASA-Smithsonian Astrophysics Data System (ADS; Kurtz et al., 1993, 2000), we confirm the findings from other studies, we examine the average citation rate to e-printed papers ...

 

A Statistical Framework for Joint eQTL Analysis in Multiple Tissues

PLoS Genetics, Vol. 9, No. 5. (9 May 2013), e1003486, doi:10.1371/journal.pgen.1003486
posted to bayesian gene_expression quantitative_genetics statistics by timflutre on 2013-05-11 18:18:14 read

Abstract

Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues ...

 

Clean Code: A Handbook of Agile Software Craftsmanship

(11 August 2008)
posted to programming by timflutre on 2013-05-08 21:31:32 **

Abstract

Even bad code can function. But if code isn't clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship. Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code “on ...

 

The Variational Bayesian EM Algorithm for Incomplete Data: with Application to Scoring Graphical Model Structures

In Bayesian Statistics 7 (2003), pp. 453-462
posted to statistics variational_inference by timflutre on 2013-05-07 17:58:14 read

Abstract

We present an efficient procedure for estimating the marginal likelihood of probabilistic models with latent variables or incomplete data. This method constructs and optimises a lower bound on the marginal likelihood using variational calculus, resulting in an iterative algorithm which generalises the EM algorithm by maintaining posterior distributions over both latent variables and parameters. We define the family of conjugate-exponential models—which includes finite mixtures of exponential family models, factor analysis, hidden Markov models, linear state-space models, and other models of interest—for ...

 

Large-scale multiple testing under dependence

Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 71, No. 2. (1 April 2009), pp. 393-424, doi:10.1111/j.1467-9868.2008.00694.x
posted to multiple_testing statistics by timflutre on 2013-05-02 15:55:08 read

Abstract

Summary.  The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate FNR subject to a constraint on the false discovery rate FDR. It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, ...

Note (first note only)

  • The FDR procedures that are developed under the independence assumption, even if valid, may suffer from substantial loss of efficiency when the dependence structure is highly informative.
  • our procedure is built on a new test statistic, the local index of significance (LIS)
  • For independent tests, when determining the level of significance of a hypothesis, a p-value approach considers each hypothesis separately, whereas an Lfdr approach considers the m hypotheses simultaneously by incorporating
 

Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control

Journal of the American Statistical Association, Vol. 102, No. 479. (1 September 2007), pp. 901-912, doi:10.1198/016214507000000545
posted to multiple_testing statistics by timflutre on 2013-05-02 04:05:12 read

Abstract

We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p value-based, are inefficient, and propose an adaptive procedure based on the z values. The z value-based adaptive procedure asymptotically attains the performance of the z value oracle procedure and is more efficient than ...

Note (first note only)

  • false discovery rate: FDR = E(N_10/R | R>0) × Pr(R>0)
  • positive FDR: pFDR = E(N_10/R | R>0)
  • marginal FDR: mFDR = E(N_10) / E(R)
  • pFDR and mFDR are equivalent when test statistics come from a mixture of the null and nonnull distributions (Storey 2003)
  • mFDR = FDR + O(m^-0.5), i.e. the two differ by a remainder of order m^-0.5 (Genovese and Wasserman 2002)
  • false
 

Negative Binomial Regression

(08 March 2011)
posted to book statistics by timflutre on 2013-04-25 05:07:31 **
 

Le mythe de Sisyphe

posted to philosophy by timflutre on 2013-04-23 04:05:15 read

Note (first note only)

An absurd reasoning

Absurdity and suicide

There is only one truly serious philosophical problem: suicide. Judging whether life is or is not worth living amounts to answering the fundamental question of philosophy.

To begin to think is to begin to be undermined.

Living, naturally, is never easy. One keeps making the gestures that existence commands, for many reasons, the first of which is habit. Dying voluntarily implies that one has recognized, even instinctively, the ridiculous character of that habit, the absence

 

Ancestral Population Genomics: The Coalescent Hidden Markov Model Approach

Genetics, Vol. 183, No. 1. (01 September 2009), pp. 259-274, doi:10.1534/genetics.109.103010
posted to coalescence hmm jc population_genetics by timflutre on 2013-04-22 00:33:48 **

Abstract

With incomplete lineage sorting (ILS), the genealogy of closely related species differs along their genomes. The amount of ILS depends on population parameters such as the ancestral effective population sizes and the recombination rate, but also on the number of generations between speciation events. We use a hidden Markov model parameterized according to coalescent theory to infer the genealogy along a four-species genome alignment of closely related species and estimate population parameters. We analyze a basic, panmictic demographic model and study ...

 

Convexity and Conflation Biases as Bregman Divergences: A note

posted to evolvability robustness statistics by timflutre on 2013-04-19 03:00:12 **

Abstract

In “Antifragile” [1], Taleb provides a fresh perspective on how one may gain from disorder. In this short note, we formalize and unify in a single premium (a schematic view of) the concavity/convexity and conflation effects described by Taleb. We show that this premium relies on a generalization of a well-known class of distortion measures of information geometry, namely Bregman divergences. We exemplify some properties of this premium, and discuss them in the light of “Antifragile” [1]. ...

Note (first note only)

  • Taleb: things that gain from disorder/uncertainty/variability are antifragile
  • we spend too much time and resources trying to predict (unpredictable) outcomes -> what matters is not the uncertain outcome x but how it affects us, i.e. our response function f(x)
  • conflation error: confusing x for f(x), or E[x] for f(E[x])
  • convexity bias: if f is convex, we gain from disorder: E[f(x)] - f(E[x]) > 0
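
The convexity bias in the last bullet is Jensen's inequality, and it can be checked numerically on a small sample. The helper name `convexity_bias` and the sample values are assumptions chosen for illustration; expectations are replaced by sample means.

```python
import math


def convexity_bias(f, xs):
    """Sample version of E[f(x)] - f(E[x])."""
    mean_x = sum(xs) / len(xs)
    mean_fx = sum(f(x) for x in xs) / len(xs)
    return mean_fx - f(mean_x)


xs = [1.0, 2.0, 3.0, 4.0]

# Convex response: variability in x is a gain on average.
gain_convex = convexity_bias(lambda x: x ** 2, xs)

# Concave response: the same variability is a loss on average.
gain_concave = convexity_bias(math.sqrt, xs)
```

For f(x) = x^2 the bias equals the (population) variance of the sample, which is why it grows with disorder.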
 

Logistic disease incidence models and case-control studies

Biometrika, Vol. 66, No. 3. (01 December 1979), pp. 403-411, doi:10.1093/biomet/66.3.403

Abstract

The probability of disease development in a defined time period is described by a logistic regression model. A model for the regression variable, given disease status, is induced and is applied to case-control data. It is shown that the odds ratio estimators and their asymptotic variance matrices may be obtained by applying the original logistic regression model to the case-control study as if the data had been obtained in a prospective study. This result gives a flexible and convenient method of ...

Note (first note only)

  • a prospective study follows a cohort over time, and at the end we can write p(Disease | z), but this is hard to do for rare diseases in humans because it requires a large sample size
  • a retrospective study samples cases and controls and we write p(z | Disease): although the full prospective model can't be estimated from case-control data alone, the odds ratio can
  • they specify a prospective model which
 

Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm

Scandinavian Journal of Statistics, Vol. 35, No. 2. (June 2008), pp. 335-353, doi:10.1111/j.1467-9469.2007.00585.x
posted to algorithm statistics by timflutre on 2013-04-12 16:06:32 **

Abstract

The expectation-maximization (EM) algorithm is a popular approach for obtaining maximum likelihood estimates in incomplete data problems because of its simplicity and stability (e.g. monotonic increase of likelihood). However, in many applications the stability of EM is attained at the expense of slow, linear convergence. We have developed a new class of iterative schemes, called squared iterative methods (SQUAREM), to accelerate EM, without compromising on simplicity and stability. SQUAREM generally achieves superlinear convergence in problems with a large fraction of ...

 

Linear Statistical Inference and its Applications

(26 December 2001)
posted to book statistics by timflutre on 2013-04-09 16:17:21 **

Abstract

"C. R. Rao would be found in almost any statistician's list of five outstanding workers in the world of Mathematical Statistics today. His book represents a comprehensive account of the main body of results that comprise modern statistical theory." -W. G. Cochran "[C. R. Rao is] one of the pioneers who laid the foundations of statistics which grew from ad hoc origins into a firmly grounded mathematical science." -B. ...

Note (first note only)

1 Algebra of vectors and matrices

2 Probability theory, tools and techniques

3 Continuous probability models

4 The theory of least-squares and analysis of variance

5 Criteria and methods of estimation

6 Large sample theory and methods

7 Theory of statistical inference

8 Multivariate analysis

 

Generalized Linear Models

(01 August 1989)
posted to book statistics by timflutre on 2013-04-09 14:58:15 **

Abstract

The success of the first edition of Generalized Linear Models led to the updated Second Edition, which continues to provide a definitive, unified treatment of methods for the analysis of diverse types of data. Today, it remains popular for its clarity, richness of content and direct relevance to agricultural, biological, health, engineering, and other applications. The authors focus on examining the way a response variable depends on a combination of explanatory variables, treatment, and classification variables. They give particular emphasis to the ...

Note (first note only)

1 Introduction

2 An outline of generalized linear models

  • exponential family distribution: exp[(yθ - b(θ)) / a(φ) + c(y,φ)] where θ is the canonical parameter and φ=σ^2 is the dispersion (=1/ν, with ν the precision)
  • in fact the exponential family is a weighted family of distributions with exp(θy) as weight function for the base density f0 and, as a consequence, function b is the cumulant generating function of f0
  • calculate the first and second derivatives of log-likelihood w.r.t.
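
The last bullet is cut off. Assuming the derivatives are taken with respect to the canonical parameter θ, and that c does not depend on θ, the standard calculation is sketched here; it yields the mean and variance of Y in terms of b.

```latex
\begin{align*}
\ell(\theta; y) &= \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi), \\
\frac{\partial \ell}{\partial \theta} &= \frac{y - b'(\theta)}{a(\phi)}, \qquad
\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)}, \\
E\!\left[\frac{\partial \ell}{\partial \theta}\right] = 0
  &\;\Rightarrow\; E(Y) = b'(\theta), \\
E\!\left[\frac{\partial^2 \ell}{\partial \theta^2}\right]
  + E\!\left[\left(\frac{\partial \ell}{\partial \theta}\right)^{2}\right] = 0
  &\;\Rightarrow\; \mathrm{Var}(Y) = b''(\theta)\, a(\phi).
\end{align*}
```

The two identities on the left of the last two lines are the usual regularity results for the score, which hold for this family.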
 

Applied statistics : principles and examples

(1981)
posted to book statistics by timflutre on 2013-04-09 14:49:54 **
 

Using linear predictors to impute allele frequencies from summary or pooled genotype data

The Annals of Applied Statistics, Vol. 4, No. 3. (September 2010), pp. 1158-1182, doi:10.1214/10-aoas338
posted to imputation jc by timflutre on 2013-04-01 17:36:28 read

Abstract

Recently-developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas, in practice, it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large; or because only summary data are collected, as in DNA pooling experiments. In this ...

Note (first note only)

  • y is a vector of allele frequencies at P SNPs, assumed to follow MVN(mu,Sigma) conditionally on panel data M
  • mu and Sigma are parametrized in terms of the panel as well as something like mutation rate and recombination rate
  • y can be partitioned into typed and untyped, and thus the distribution of untyped is available conditionally on typed and M, but it requires inverting Sigma
  • every entry of
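
The conditioning step in the third bullet is the usual multivariate normal formula, E[y_u | y_t] = mu_u + Sigma_ut Sigma_tt^{-1} (y_t - mu_t). The sketch below illustrates only that formula: the values of mu and Sigma are made up, the helper names are hypothetical, and nothing here reproduces how the paper builds mu and Sigma from the panel. Note that only the typed block Sigma_tt has to be solved against.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def conditional_mean(mu, Sigma, typed, untyped, y_typed):
    """Mean of the untyped entries given the typed ones under MVN(mu, Sigma)."""
    S_tt = [[Sigma[i][j] for j in typed] for i in typed]
    resid = [y_typed[k] - mu[i] for k, i in enumerate(typed)]
    w = solve(S_tt, resid)  # Sigma_tt^{-1} (y_t - mu_t)
    return [mu[u] + sum(Sigma[u][t] * w[k] for k, t in enumerate(typed))
            for u in untyped]


# Hypothetical 3-SNP example: SNPs 0 and 2 are typed, SNP 1 is untyped.
mu = [0.2, 0.5, 0.3]
Sigma = [[0.02, 0.01, 0.01],
         [0.01, 0.02, 0.01],
         [0.01, 0.01, 0.02]]
imputed = conditional_mean(mu, Sigma, typed=[0, 2], untyped=[1], y_typed=[0.23, 0.3])
```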
 

Multivariate linear models

(8 November 2010)
posted to linear_algebra statistics by timflutre on 2013-03-30 16:41:29 read

Abstract

This handout details all the mathematical formulas behind the multivariate linear regression (estimators, covariance, tests). ...

 

An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies

The Annals of Applied Statistics, Vol. 5, No. 2A. (June 2011), pp. 994-1019, doi:10.1214/10-aoas393
posted to a3 meta_analysis statistics by timflutre on 2013-03-22 18:06:29 read

Abstract

Global expression analyses using microarray technologies are becoming more common in genomic research, therefore, new statistical challenges associated with combining information from multiple studies must be addressed. In this paper we will describe our proposal for an adaptively weighted (AW) statistic to combine multiple genomic studies for detecting differentially expressed genes. We will also present our results from comparisons of our proposed AW statistic to Fisher’s equally weighted (EW), Tippett’s minimum p-value (minP) and Pearson’s (PR) statistics. Due to the absence ...

Note (first note only)

  • meta-analysis of K studies, e.g. gene expression between controls and cases
  • global null hypothesis for each gene g: θ_g1 = ... = θ_gK = 0 where θ_gk represents the gene effect of gene g and study k
  • 2 possible alternative hypotheses: HA -> all θ_gk != 0 ; or HB -> at least one θ_gk != 0
  • the set of significant genes under HB may represent experimental and
 

Genotype imputation via matrix completion

Genome Research, Vol. 23, No. 3. (01 March 2013), pp. 509-518, doi:10.1101/gr.145821.112
posted to imputation jc linear_algebra matrix_factorization by timflutre on 2013-03-18 14:32:19 read

Abstract

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while ...

Note (first note only)

  • matrix X with m individuals in row and n sorted SNPs in columns: fill the missing values (individual entries, or full rows/columns)
  • for a given region (set of SNPs), we expect a small number of haplotypes, thus we want to find a low-rank matrix approximating X well
  • find the low-rank imputed matrix Z that minimizes the Frobenius norm of the difference between X and Z
  • in order to
 

A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.

American journal of human genetics, Vol. 90, No. 5. (4 May 2012), pp. 821-835, doi:10.1016/j.ajhg.2012.03.015
posted to a3 gwas meta_analysis by timflutre on 2013-03-17 19:39:20 read

Abstract

Pooling genome-wide association studies (GWASs) increases power but also poses methodological challenges because studies are often heterogeneous. For example, combining GWASs of related but distinct traits can provide promising directions for the discovery of loci with small but common pleiotropic effects. Classical approaches for meta-analysis or pooled analysis, however, might not be suitable for such analysis because individual variants are likely to be associated with ...

Note (first note only)

  • they specifically speak of GWAS (i.e. not eQTL mapping)
  • for each SNP, run a linear regression in each study, get estimates of effect size and std error, and get the Z score
  • compute a Z score over all subsets of studies (with appropriate weights) and assess evidence against the global null using the max of these Z scores
  • they can handle correlations due to same individuals in different
 

A new approach for the joint analysis of multiple ChIP-seq libraries with application to histone modification.

Statistical applications in genetics and molecular biology, Vol. 11, No. 3. (2012), doi:10.1515/1544-6115.1660
posted to a3 chromatin statistics by timflutre on 2013-03-17 18:11:48 read

Abstract

Most approaches for analyzing ChIP-Seq data are focused on inferring exact protein binding sites from a single library. However, frequently multiple ChIP-Seq libraries derived from differing cell lines or tissue types from the same individual may be available. In such a situation, a separate analysis for each tissue or cell line may be inefficient. Here, we describe a novel method to analyze such data that ...

Note (first note only)

Single library

  • Y_i is log(ChIP-seq reads) around gene i -> does it depend on log(gene length), X_i1, and GC content at promoter, X_i2?
  • Y_i = (1-Z_i)(b_0 + b_1X_i1 + b_2X_i1^2) + Z_i(b'_0 + b'_1X_i1 + b'_2X_i1^2) + b_3X_i2 + e_i
  • pi_i = P(Z_i=1) -> probability that gene i is methylated ; pi = sum_i pi_i
  • fit model with EM algorithm

Multiple libraries

  • jointly analyze all K libraries to
 

An Introduction to Generalized Linear Models

(2002)

Abstract

Generalized linear models provide a unified theoretical and conceptual framework for many of the most commonly used statistical methods. In the ten years since publication of the first edition of this bestselling text, great strides have been made in the development of new methods and in software for generalized linear models and other closely related models. Thoroughly revised and updated, An Introduction to Generalized Linear Models, Second Edition continues to initiate intermediate students of statistics, and the many other disciplines that use ...

Note (first note only)

3 Exponential Family and Generalized Linear Models

  • f(y; θ) = s(y)t(θ)exp[a(y)b(θ)] = exp[a(y)b(θ) + c(θ) + d(y)]
  • link function, canonical form, natural parameter b(θ), nuisance parameters (other than θ), overdispersion, score statistic (first derivative of loglik: its expectation is 0, its variance is called the information and also equals minus the expected second derivative of loglik)

4 Estimation

  • iterative reweighted least squares (IRLS)
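
As a concrete instance of IRLS, here is a minimal sketch for Poisson regression with a log link and a single predictor. The function name, the starting values, the fixed iteration count and the count data are all assumptions for illustration; this is not code from the book.

```python
import math


def irls_poisson(x, y, n_iter=25):
    """Fit log E[y] = b0 + b1 * x by iteratively reweighted least squares."""
    n = len(y)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]   # linear predictor
        mu = [math.exp(e) for e in eta]    # fitted means
        # Working response and weights; for the log link the weight is mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on (1, x) via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1


# Hypothetical counts that grow with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 2, 5, 8, 11]
b0_hat, b1_hat = irls_poisson(x, y)
```

Each pass is an ordinary weighted regression, which is why the method needs nothing more than a least-squares solver.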

9 Count Data, Poisson Regression and Log-Linear Models

11 Clustered and Longitudinal Data

 

Genome-Wide Association Studies in an Isolated Founder Population from the Pacific Island of Kosrae

PLoS Genet, Vol. 5, No. 2. (6 February 2009), e1000365, doi:10.1371/journal.pgen.1000365

Abstract

It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitates discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency ...

Note (first note only)

  • paper from 2009 that struggles with high genetic relatedness and a high number of markers; it would be interesting to redo the analysis with a linear mixed model (LMM)
 

Values in science: an introduction

posted to ethics miscellanee politics by timflutre on 2013-02-28 19:32:13 read

Abstract

Values intersect with science in three primary ways. First, there are values, particularly epistemic values, which guide scientific research itself. Second, the scientific enterprise is always embedded in some particular culture and values enter science through its individual practitioners, whether consciously or not. Finally, values emerge from science, both as a product and process, and may be redistributed more broadly in the culture or society. Also, scientific discoveries may pose new social challenges about values, though the values themselves may be ...

 

A tutorial on Principal Component Analysis

(22 April 2009)
posted to algorithm statistics tutorial by timflutre on 2013-02-27 22:31:22 read

Abstract

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it ...

Note (first note only)

  • data: n samples, each of m measurement types -> gather into matrix X which is m x n
  • thus initially, the data are in the natural basis (basis of the observations) but we want to find a new basis allowing us to discard useless and redundant measurements
  • we want a new matrix P such that PX=Y, i.e. the rows of P are the new basis for the columns of X
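
For m = 2 measurement types the change of basis can be written in closed form, which makes PX=Y easy to check by hand. This is a sketch under assumptions: the function name `pca_2d` and the data are made up, and the data are centered first (the note does not say so explicitly). The rotation angle is the one that removes the covariance between the two rows of Y.

```python
import math


def pca_2d(X):
    """X is 2 x n (rows = measurement types, columns = samples).

    Returns P (2 x 2, rows = new basis vectors) and Y = P Xc, with Xc the centered data.
    """
    n = len(X[0])
    means = [sum(row) / n for row in X]
    Xc = [[v - m for v in row] for row, m in zip(X, means)]
    cxx = sum(v * v for v in Xc[0]) / (n - 1)
    cyy = sum(v * v for v in Xc[1]) / (n - 1)
    cxy = sum(a * b for a, b in zip(Xc[0], Xc[1])) / (n - 1)
    # Angle of the direction of maximal variance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    P = [[c, s], [-s, c]]
    Y = [[P[i][0] * Xc[0][k] + P[i][1] * Xc[1][k] for k in range(n)]
         for i in range(2)]
    return P, Y


# Hypothetical data: two strongly correlated measurement types, five samples.
X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [2.1, 3.9, 6.2, 7.8, 10.1]]
P, Y = pca_2d(X)
```

In the new basis the two rows of Y are uncorrelated, and the first row carries most of the variance, so the second could be discarded as redundant.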
 

C++ concurrency in action: practical multithreading

(2009)
posted to book programming by timflutre on 2013-02-21 04:15:05 **

Abstract

With the new C++ Standard and Technical Report 2 (TR2), multi-threading is coming to C++ in a big way. TR2 will provide higher-level synchronization facilities that allow for a much greater level of abstraction, and make programming multi-threaded applications simpler and safer. As a guide and reference to the new concurrency features in the upcoming C++ Standard and TR2, this book is invaluable for existing programmers familiar with writing multi-threaded code in C++ using platform-specific APIs, or in other languages, as well ...

 

Seed giants versus US farmers

(2013)
edited by Debbie Barker
posted to agriculture law by timflutre on 2013-02-14 00:14:01 **
 

The GEM mapper: fast, accurate and versatile alignment by filtration

  [CiTO]
Nat Meth, Vol. 9, No. 12. (28 December 2012), pp. 1185-1188, doi:10.1038/nmeth.2221
posted to algorithm alignment bioinformatics by timflutre  on 2013-02-13 15:45:16 ** along with 12 people and 1 group babakap cantalapiedra cicca dakelley jbhiatt JCmoure jeanmonlong muratsincan NGS_Array_References pickw siarheimanakov Vinz Ciccarelli Lab
 

Summarizing the predictive power of a generalized linear model

  [CiTO]
Statistics in Medicine, Vol. 19, No. 13. (2000), pp. 1771-1781, doi:10.1002/1097-0258(20000715)19:13<1771::aid-sim485>3.0.co;2-p
posted to statistics by timflutre  on 2013-02-03 01:46:56 read along with 4 people austin987 LVIS-BRD slack---line Zephyrus

Abstract

This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient from ordinary linear regression. The population value is the correlation between the response and its conditional expectation given the predictors, and the sample value is the correlation between the observed response and the model predicted value. We compare four estimators of the measure in terms of bias, mean squared error and behaviour in the presence of ...

Note (first note only)

  • 2 random variables: univariate response Y and (possibly multivariate) predictor X
  • N samples (X_i,Y_i) -> MLE of E(Y|X): \hat{Y}
  • if single predictor: cor(Y, E(Y|X)) = |beta| \sqrt{Var(X) / Var(Y)}
  • the proportional relationship between beta and cor(Y, E(Y|X)) for univariate X does not hold for an arbitrary GLM, although one can show (B. Zheng, unpublished dissertation, 1997) that an approximate relationship of this type exists when beta is close
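
The single-predictor identity in the note above can be checked numerically. The sketch below assumes an ordinary least-squares fit (identity link) and uses made-up data. It verifies the sample analogue of the identity, with the fitted values standing in for E(Y|X). The helper names (`mean`, `cov`, `cor`) are illustrative and do not come from the paper.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def cor(a, b):
    """Sample Pearson correlation."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

# Hypothetical data with a decreasing trend, so the fitted slope is negative.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 3.9, 3.2, 1.8, 1.1, -0.2]

# Ordinary least squares with one predictor.
beta = cov(x, y) / cov(x, x)
alpha = mean(y) - beta * mean(x)
y_hat = [alpha + beta * xi for xi in x]

lhs = cor(y, y_hat)                                  # cor(Y, fitted values)
rhs = abs(beta) * math.sqrt(cov(x, x) / cov(y, y))   # |beta| sqrt(Var(X) / Var(Y))
```

Even though `beta` is negative, `lhs` is positive and matches `rhs` up to rounding, which is why the absolute value appears in the formula.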
 

A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

  [CiTO]
Briefings in Bioinformatics (17 September 2012), doi:10.1093/bib/bbs046
posted to bioinformatics gene_expression rnaseq by timflutre  on 2013-02-01 21:50:12 read along with 21 people and 2 groups astoddard bioinfo_bz cswarth daveGerrard druvus dswan fuadgwadry Gig77 golharam heathervincent jwfoley keysoonpals LuciaPu nailest NGS_Array_References nunofonseca rpiro sjcockell sotacam tonamswish Zephyrus Journal picks Translational interest

Abstract

During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods ...

Note (first note only)

  • in their most difficult case (different library size and few high-count genes), only DESeq and TMM (from edgeR) are able to control the false-positive rate while also maintaining the power to detect differentially expressed genes
  • the authors restricted themselves to evaluating differential expression, thus they looked at neither GC content nor gene length
 

Improved Heritability Estimation from Genome-wide SNPs

  [CiTO]
The American Journal of Human Genetics, Vol. 91, No. 6. (7 December 2012), pp. 1011-1021, doi:10.1016/j.ajhg.2012.10.010
posted to heritability jc mixed_model by timflutre  on 2013-02-01 15:33:07 ** along with 2 people and 1 group djkt nailest PollardWall

Abstract

Estimation of narrow-sense heritability, h2, from genome-wide SNPs genotyped in unrelated individuals has recently attracted interest and offers several advantages over traditional pedigree-based methods. With the use of this approach, it has been estimated that over half the heritability of human height can be attributed to the ∼300,000 SNPs on a genome-wide genotyping array. In comparison, only 5%–10% can be explained by SNPs reaching genome-wide significance. We investigated via simulation the validity of several key assumptions underpinning the mixed-model analysis used ...

 

Coding Freedom: The Ethics and Aesthetics of Hacking

  [CiTO]
(November 2012)

Note (first note only)

  • liberalism: protecting property and civil liberties, promoting individual autonomy and tolerance, securing a free press, ruling through limited government and universal law, and preserving a commitment to equal opportunity and meritocracy
  • hackers: computer aficionados driven by an inquisitive passion for tinkering and learning technical systems, and frequently committed to an ethical version of information freedom
  • hackers challenge one strain of liberal jurisprudence, intellectual property, by drawing on and reformulating ideals
 

Transposable element annotation in completely sequenced eukaryote genomes

  [CiTO]
In Plant Transposable Elements: Impact on Genome Structure and Function, Vol. 24 (2012), pp. 17-39, doi:10.1007/978-3-642-31842-9_2

Abstract

With the development of new sequencing techniques, the number of sequenced plant genomes is increasing. However, accurate annotation of these sequences remains a major challenge, in particular with regard to transposable elements (TEs). The aim of this chapter is to provide a roadmap for researchers involved in genome projects to address this issue. We list several widely used tools for each step of the TE annotation process, from the identification of TE families to the annotation of TE copies. We assess ...

 

A People's History of the United States: 1492-Present

  [CiTO]
(01 April 2003)
posted to book history sociology by timflutre on 2013-01-31 02:21:00 ** along with 2 people pietrosperoni tystl

Abstract

Consistently lauded for its lively, readable prose, this revised and updated edition of A People's History of the United States turns traditional textbook history on its head. Howard Zinn infuses the often-submerged voices of blacks, women, American Indians, war resisters, and poor laborers of all nationalities into this thorough narrative that spans American history from Christopher Columbus's arrival to an afterword on the Clinton presidency. Addressing his trademark reversals of perspective, Zinn--a teacher, historian, ...

 

Genetic variants contribute to gene expression variability in humans.

  [CiTO]
Genetics, Vol. 193, No. 1. (01 January 2013), pp. 95-108, doi:10.1534/genetics.112.146779

Abstract

Expression quantitative trait loci (eQTL) studies have established convincing relationships between genetic variants and gene expression. Most of these studies focused on the mean of gene expression level, but not the variance of gene expression level (i.e., gene expression variability). In the present study, we systematically explore genome-wide association between genetic variants and gene expression variability in humans. We adapt the double generalized linear model ...

Note (first note only)

  • double GLM:
  • y_i = \mu + \beta_1 g_i + \beta_2 x_i + \epsilon_i with \epsilon_i ~ N(0, \sigma^2 exp(\theta g_i))
  • x_i indicates pop structure, \beta_1 is the effect of the genotype on the mean and \theta is the effect of the genotype on the variance
  • get 2 p-values, 1 for \beta_1 == 0 and 1 for \theta == 0, see Verbyla & Smyth
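
To make the model in the note above concrete, here is a small sketch that simulates from it and evaluates its Gaussian log-likelihood. It is not the authors' implementation: the function names (`simulate`, `log_lik`), the genotype coding as 0/1/2, the binary population indicator, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def simulate(n, mu, beta1, beta2, sigma2, theta, seed=1):
    """Draw (g, x, y) triples from the model written in the note."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.choice([0, 1, 2])   # genotype as allele count (assumption)
        x = rng.choice([0, 1])      # population-structure indicator (assumption)
        sd = math.sqrt(sigma2 * math.exp(theta * g))
        y = mu + beta1 * g + beta2 * x + rng.gauss(0.0, sd)
        data.append((g, x, y))
    return data

def log_lik(data, mu, beta1, beta2, sigma2, theta):
    """Log-likelihood with mean mu + beta1*g + beta2*x and variance sigma2*exp(theta*g)."""
    ll = 0.0
    for g, x, y in data:
        var = sigma2 * math.exp(theta * g)
        resid = y - (mu + beta1 * g + beta2 * x)
        ll += -0.5 * (math.log(2 * math.pi * var) + resid * resid / var)
    return ll

data = simulate(3000, mu=1.0, beta1=0.5, beta2=0.2, sigma2=1.0, theta=0.8)

# Same mean parameters, with and without a genotype effect on the variance.
ll_full = log_lik(data, 1.0, 0.5, 0.2, 1.0, 0.8)
ll_null = log_lik(data, 1.0, 0.5, 0.2, 1.0, 0.0)
```

With data generated under theta = 0.8, `ll_full` exceeds `ll_null` by a wide margin. A real analysis would maximise the likelihood under each hypothesis before comparing them, which this sketch does not attempt.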
 

The C-value paradox, junk DNA and ENCODE

  [CiTO]
Current Biology, Vol. 22, No. 21. (6 November 2012), pp. R898-R899, doi:10.1016/j.cub.2012.10.002
posted to review transposable_element by timflutre  on 2013-01-26 15:19:46 read along with 4 people and 1 group dullhunk gwallau nailest pedrobeltrao Journal picks

Note (first note only)

  • mutational load: the human genome seems too large, given the observed human mutation rate. If the entire human genome were functional (in the sense of being under selective pressure), we would have too many deleterious mutations per generation.
  • there are many examples of related species in the same genus that have haploid genome sizes that differ by three- to eightfold
  • The C-value paradox is mostly (though not entirely) explained by different
 

Surreal Numbers

  [CiTO]
(11 January 1974)
posted to book mathematics by timflutre on 2013-01-24 04:28:22 ** along with 2 people Benja NitinCR

Abstract

Nearly 30 years ago, John Horton Conway introduced a new way to construct numbers. Donald E. Knuth, in appreciation of this revolutionary system, took a week off from work on The Art of Computer Programming to write an introduction to Conway's method. Never content with the ordinary, Knuth wrote this introduction as a work of fiction, a novelette. If not a steamy romance, the book nonetheless shows how a young couple turned on to pure mathematics and found total happiness. ...

Note (first note only)

a copy is freely browsable on scribd

 

Bayesian inference for generalized linear mixed models

  [CiTO]
Biostatistics, Vol. 11, No. 3. (01 July 2010), pp. 397-412, doi:10.1093/biostatistics/kxp053

Abstract

Generalized linear mixed models (GLMMs) continue to grow in popularity due to their ability to directly acknowledge multiple levels of dependency and model different data types. For small sample sizes especially, likelihood-based inference can be unreliable with variance components being particularly difficult to estimate. A Bayesian approach is appealing but has been hampered by the lack of a fast implementation, and the difficulty in specifying prior distributions with variance components again being particularly problematic. Here, we briefly review previous approaches to ...

 

Marginal likelihood for distance matrices

  [CiTO]
Statistica Sinica, Vol. 19 (2009), pp. 631-649
posted to jc statistics by timflutre on 2013-01-21 15:43:55 **

Abstract

A Wishart model is proposed for random distance matrices in which the components are correlated gamma random variables, all having the same degrees of freedom. The marginal likelihood is obtained in closed form. Its use is illustrated by multidimensional scaling, by rooted tree models for response covariances in social survey work, and unrooted trees for ancestral relationships in genetic applications. ...

 

Circuit Theory and Model-Based Inference for Landscape Connectivity

  [CiTO]
Journal of the American Statistical Association, Vol. 108, No. 501. (12 September 2012), pp. 22-33, doi:10.1080/01621459.2012.724647
posted to ecology jc population_genetics by timflutre on 2013-01-21 15:37:23 ** along with 1 person nsm120

Abstract

Circuit theory has seen extensive recent use in the field of ecology, where it is often applied to study functional connectivity. The landscape is typically represented by a network of nodes and resistors, with the resistance between nodes a function of landscape characteristics. The effective distance between two locations on a landscape is represented by the resistance distance between the nodes in the network. Circuit theory has been applied to many other scientific fields for exploratory analyses, but parametric models for ...

 

Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers

  [CiTO]
Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 65, No. 3. (1 August 2003), pp. 679-700, doi:10.1111/1467-9868.00409
posted to a3s algorithm bayesian mcmc by timflutre on 2012-12-17 21:48:30 **

Abstract

Summary. Reversible jump methods are the most commonly used Markov chain Monte Carlo tool for exploring variable dimension statistical models. Recently, however, an alternative approach based on birth-and-death processes has been proposed by Stephens for mixtures of distributions. We show that the birth-and-death setting can be generalized to include other types of continuous time jumps like split-and-combine moves in the spirit of Richardson and Green. We illustrate these extensions both for mixtures of distributions and for hidden Markov models. We demonstrate ...

 

La vigilance participative. Une interprétation de la gouvernance de Wikipédia

  [CiTO]
Réseaux, Vol. 154, No. 2. (2009), 51, doi:10.3917/res.154.0051
posted to politics by timflutre  on 2012-12-16 21:56:42 ** along with 1 group ScientificRedCards

Note (first note only)

  • The most radical originality of Wikipédia probably lies less in participatory writing than in the pooling of monitoring and sanctioning procedures, which allows the community to watch over itself.
 

Universally sloppy parameter sensitivities in systems biology models.

  [CiTO]
PLoS Computational Biology, Vol. 3, No. 10. (1 October 2007), pp. 1871-1878, doi:10.1371/journal.pcbi.0030189

Abstract

Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that ...

Note: You may cite this page as: http://www.citeulike.org/user/timflutre/
