| |
In In Proceedings 8th Workshop on Algorithm Engineering and Experiments ALENEX’06, Vol. 6 (2006), pp. 86-94.
Abstract
To cover the edges of a graph with a minimum number of cliques is an NP-complete problem with many applications. The state-of-the-art solving algorithm is a polynomial-time heuristic from the 1970’s. We present an improvement of this heuristic. Our main contribution, however, is the development of efficient and effective polynomial-time data reduction rules that, combined with a search tree algorithm, allow for exact problem solutions in competitive time. This is confirmed by experiments with real-world and synthetic data. Moreover, we prove ...
|
| |
Neural Networks, Vol. 20, No. 1. (January 2007), pp. 109-128.
Abstract
Fuzzy ARTMAP neural networks have been proven to be good classifiers on a variety of classification problems. However, the time that Fuzzy ARTMAP takes to converge to a solution increases rapidly as the number of patterns used for training is increased. In this paper we examine the time Fuzzy ARTMAP takes to converge to a solution and we propose a coarse grain parallelization technique, based on a pipeline approach, to speed-up the training process. In particular, we have parallelized Fuzzy ARTMAP ...
|
| |
In KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (2008), pp. 213-220.
Abstract
The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how ...
|
| |
In Church's Thesis After 70 Years (28 February 2007), pp. 518-544.
|
| |
In Handbook of Combinatorial Optimization, Vol. 4 (1999), pp. 1-74.
|
| |
Nature, Vol. 460, No. 7257. (26 July 2009), pp. 894-898.
Abstract
The breadth of genomic diversity found among organisms in nature allows populations to adapt to diverse environments1, 2. However, genomic diversity is difficult to generate in the laboratory and new phenotypes do not easily arise on practical timescales3. Although in vitro and directed evolution methods4, 5, 6, 7, 8, 9 have created genetic variants with usefully altered phenotypes, these methods are limited to laborious and serial manipulation of single genes and are not used for parallel and continuous directed evolution of ...
|
| |
In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (2009), pp. 737-746.
Abstract
Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multi-level algorithm for graph clustering using flows that delivers significant improvements in both quality and speed. The graph is first successively coarsened to a manageable size, and a small number of iterations of flow simulation is performed on the coarse graph. ...
|
| |
In In 11th Europ. Symp. Algorithms, Vol. 2832 (2003), pp. 568-579.
Abstract
Abstract. A promising approach to graph clustering is based on the intuitive notion of intra-cluster density vs. inter-cluster sparsity. While both formalizations and algorithms focusing on particular aspects of this rather vague concept have been proposed no conclusive argument on their appropriateness has been given. As a first step towards understanding the consequences of particular conceptions, we conducted an experimental evaluation of graph clustering approaches. By combining proven techniques from graph partitioning and geometric clustering, we also introduce a new approach ...
|
| |
Science, Vol. 301, No. 5640. (19 September 2003), pp. 1734-1736.
Abstract
Erythrocytic mechanisms involved in malarial infection are poorly understood. We have found that signaling via the erythrocyte beta2-adrenergic receptor and heterotrimeric guanine nucleotide-binding protein (Galphas) regulated the entry of the human malaria parasite Plasmodium falciparum. Agonists that stimulate cyclic adenosine 3',5'-monophosphate production led to an increase in malarial infection that could be blocked by specific receptor antagonists. Moreover, peptides designed to inhibit Galphas protein function reduced parasitemia in P. falciparum cultures in vitro, and beta-antagonists reduced parasitemia of P. berghei infections ...
|
| |
PLoS pathogens, Vol. 4, No. 6. (June 2008)
by Christiaan van Ooij, Pamela Tamez, Souvik Bhattacharjee, et al.N. Luisa Hiller, Travis Harrison, Konstantinos Liolios, Taco Kooij, Jai Ramesar, Bharath Balu, John Adams, Andrew P. Waters, Andy Waters, Chris J. Janse, Chris Janse, Kasturi Haldar
Abstract
The malaria agent Plasmodium falciparum is predicted to export a "secretome" of several hundred proteins to remodel the host erythrocyte. Prediction of protein export is based on the presence of an ER-type signal sequence and a downstream Host-Targeting (HT) motif (which is similar to, but distinct from, the closely related Plasmodium Export Element [PEXEL]). Previous attempts to determine the entire secretome, using either the HT-motif or the PEXEL, have yielded large sets of proteins, which have not been comprehensively tested. We ...
|
| |
Nucl. Acids Res., Vol. 28, No. 1. (1 January 2000), pp. 235-242.
Abstract
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource. 10.1093/nar/28.1.235 ...
|
| |
Machine Learning, Vol. 20, No. 3. (1 September 1995), pp. 273-297.
Abstract
Thesupport-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to ...
|
| |
Bioinformatics (Oxford, England), Vol. 24, No. 19. (1 October 2008), pp. 2265-2266.
Abstract
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) ...
|
| |
Nature Protocols, Vol. 2, No. 10. (27 September 2007), pp. 2366-2382.
by Melissa S. Cline, Michael Smoot, Ethan Cerami, et al.Allan Kuchinsky, Nerius Landys, Chris Workman, Rowan Christmas, Iliana Avila-Campilo, Michael Creech, Benjamin Gross, Kristina Hanspers, Ruth Isserlin, Ryan Kelley, Sarah Killcoyne, Samad Lotia, Steven Maere, John Morris, Keiichiro Ono, Vuk Pavlovic, Alexander R. Pico, Aditya Vailaya, Peng-Liang Wang, Annette Adler, Bruce R. Conklin, Leroy Hood, Martin Kuiper, Chris Sander, Ilya Schmulevich, Benno Schwikowski, Guy J. Warner, Trey Ideker, Gary D. Bader
Abstract
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative ...
|
| |
Genome Research, Vol. 13, No. 11. (November 2003), pp. 2498-2504.
Abstract
10.1101/gr.1239303 Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression ...
|
| |
Journal of Medicinal Chemistry, Vol. 47, No. 8. (1 April 2004), pp. 1879-1881.
Abstract
PMID: 15055986 Docking of the 5CITEP inhibitor to snapshots of a 2 ns HIV-1 integrase MD trajectory indicated a previously uncharacterized trench adjacent to the active site that intermittently opens. Further docking studies of novel ligands with the potential to bind to both regions showed greater selective affinity when able to bind to the trench. Our ranking of ligands is open to experimental testing, and our approach suggests a new target for HIV-1 therapeutics. ...
|
| |
Int. J. Bioinformatics Res. Appl., Vol. 4, No. 1. (2008), pp. 11-27.
Abstract
Identifying functional modules is believed to reveal most cellular processes. There have been many computational approaches to investigate the underlying biological structures (Bader and Hogue, 2003; Dhillon et al., 2005; Krogan et al., 2006; Ramadan et al., 2005; Xiong et al., 2005; Zhang et al., 2004). A spectral clustering method plays a critical role identifying functional modules in a yeast protein-protein network (Ramadan et al., 2005). We present an unweighted-graph version of a multilevel spectral algorithm which more accurately identifies protein ...
|
| |
Proceedings of the 23rd International Conference on Machine Learning (2006)
Abstract
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the l1 regularizer and the data structure called DFS code tree, the ...
|
| |
IEEE Transactions on Evolutionary Computation, Vol. 9 (2005)
Abstract
Abstract—Estimation of distribution algorithms sample new solutions (offspring) from a probability model which characterizes the distribution of promising solutions in the search space at each generation. The location information of solutions found so far (i.e., the actual positions of these solutions in the search space) is not directly used for generating offspring in most existing estimation of distribution algorithms. This paper introduces a new operator, called guided mutation. Guided mutation generates offspring through combination of global statistical information and the location ...
|
| |
The Journal of chemical physics, Vol. 128, No. 14. (14 April 2008)
Abstract
We propose a novel normal mode multiple time stepping Langevin dynamics integrator called NML. The aim is to approximate the kinetics or thermodynamics of a biomolecule by a reduced model based on a normal mode decomposition of the dynamical space. Our basis set uses the eigenvectors of a mass reweighted Hessian matrix calculated with a biomolecular force field. This particular choice has the advantage of an ordering according to the eigenvalues, which have a physical meaning of being the square of ...
|
| |
Nucl. Acids Res., Vol. 36, No. suppl_1. (11 January 2008), pp. D281-288.
by Robert D. Finn, John Tate, Jaina Mistry, et al.Penny C. Coggill, Stephen J. Sammut, Hans-Rudolf Hotz, Goran Ceric, Kristoffer Forslund, Sean R. Eddy, Erik L. L. Sonnhammer, Alex Bateman
Abstract
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden ...
|
| |
Nucleic acids research, Vol. 37, No. Database issue. (1 January 2009), pp. D169-174.
Abstract
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information that is essential for modern biological research. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute, the Protein Information Resource and the Swiss Institute of Bioinformatics. The core activities include manual curation of protein sequences assisted by computational analysis, sequence archiving, a user-friendly UniProt website and the provision of additional ...
|
| |
The Journal of clinical investigation, Vol. 109, No. 11. (June 2002), pp. 1445-1452.
Abstract
Human thyrotropin (TSH), luteotropin (LH), follitropin (FSH), and chorionic gonadotropin are members of the heterodimeric glycoprotein hormone family. The common alpha subunit forms noncovalent heterodimers with different beta subunits. Two novel human glycoprotein hormonelike genes, alpha2 (A2) and beta5 (B5), recently have been identified. Using a yeast two-hybrid assay, the two subunits were found as potential heterodimerization partners. Immunological analyses confirmed the heterodimerization of A2 and B5 in transfected cells and their colocalization in the anterior pituitary. Recombinant A2/B5 heterodimeric glycoproteins, ...
|
| |
Science, Vol. 306, No. 5703. (10 December 2004), pp. 1934-1937.
Abstract
Malaria parasites secrete proteins across the vacuolar membrane into the erythrocyte, inducing modifications linked to disease and parasite survival. We identified an 11-amino acid signal required for the secretion of proteins from the Plasmodium falciparum vacuole to the human erythrocyte. Bioinformatics predicted a secretome of >320 proteins and conservation of the signal across parasite species. Functional studies indicated the predictive value of the signal and its role in targeting virulence proteins to the erythrocyte and implicated its recognition by a receptor/transporter. ...
|
| |
International Journal of Computers, Communications & Control (2009), pp. 104-117.
Abstract
Fuzzy ARTMAP with Relevance factor (FAMR) is a Fuzzy ARTMAP (FAM)neural architecture with the following property: Each training pairhas a relevance factor assigned to it, proportional to theimportance of that pair during the learning phase. Using a relevancefactor adds more flexibility to the training phase, allowing rankingof sample pairs according to the confidence we have in theinformation source or in the pattern itself. We introduce a novel FAMR architecture: FAMR with Feature Weighting(FAMRFW). In the first stage, the training data features areweighted. ...
|
| |
IEEE/ACM Trans. Comput. Biol. Bioinformatics, Vol. 2, No. 2. (2005), pp. 119-130.
Abstract
Protein-protein interactions play a number of central roles in many cellular functions, including DNA replication, transcription and translation, signal transduction, and metabolic pathways. A recent increase in the number of protein-protein interactions has made predicting unknown protein-protein interactions important for the understanding of living cells. However, the protein-protein interactions experimentally obtained so far are often incomplete and contradictory and, consequently, existing computational prediction methods have integrated evidence (latent knowledge of proteins) from different and more reliable sources. Analyzing the relationships between ...
|
| |
In EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology (2009), pp. 472-480.
Abstract
Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. ...
|
| |
In ICANNGA '07: Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part II (2007), pp. 399-406.
Abstract
An efficient branch and bound algorithm for matching protein structures has been developed. The compared protein structures are represented as graphs and a product graph of these graphs is calculated. The resulting product graph is then the input to our algorithm. A maximum clique in the product graph corresponds to the maximum common substructure in the original graphs. Our algorithm, which gives an approximate solution to the maximum clique problem, is compared with exact algorithms commonly used in bioinformatics for protein ...
|
| |
In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (2005), pp. 629-634.
Abstract
Graph clustering (also called graph partitioning) --- clustering the nodes of a graph --- is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k -means objective function. In ...
Note (first note only)
Multilevel graph partitioning algorithm outperforms other partitioning algorithms.
|
| |
In In Proceedings of the ICML (2002), pp. 315-322.
Abstract
The application of kernel-based learning algorithms has, so far, largely been confined to realvalued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels, based on the heat equation, called ...
|
| |
In BMEI '08: Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics (2008), pp. 83-86.
Abstract
With the increasing availability of complete genome sequences, there are more excellent opportunity for further research and development of tools for functional studies in proteomics. Phylogenetic profile is a basic non-homology method in functional proteomics. This paper describes the theory of phylogenetic profile, focuses on the improvement of K-mean clustering algorithm in analyzing the protein phylogenetic profile; also compare the method with the traditional K-mean clustering algorithm. The experimental results show that the improved K-mean clustering algorithm is fast and effective; ...
|
| |
Evolutionary Computation, 2007. CEC 2007. IEEE Congress on In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on (2007), pp. 320-325.
Abstract
This paper proposes a novel hybrid GA/SVM method that can predict the interactions between proteins intermediated by the protein-domain relations. Firstly, we represented a protein by the domains contained inside, which can consider the effects of domain duplication. To simulate the combination of different domains, a transformation of the domain composition was taken subsequently. Further, a genetic algorithm was used to seek the optimized transformation, which had been adopted as the input vector of a predictor constructed using support vector machines ...
Note (first note only)
References to techniques for PPI prediction
|
| |
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on In Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on, Vol. 2 (2006), pp. 1676-1681.
Abstract
The classification of protein sequences into families is an important tool in the annotation of structural and functional properties to newly discovered proteins. We present a classification system using pattern recognition techniques to create a numerical vector representation of a protein sequence and then classify the sequence into a number of given families. We introduce the use of fuzzy ARTMAP classifiers and show that coupled with the genetic algorithm based feature subset selection, the system is able to classify protein sequences ...
Note (first note only)
SCOP/ASTRAL sequence classification
Comparison to other techniques (GLM, RBF, MLP, k-Nearest Neighbor)
References
* Review on sequence alignment techniques
* Review on alignment-free sequence comparison
* discussion on difference between local and global features
|
| |
Neural Networks, IEEE Transactions on In Neural Networks, IEEE Transactions on, Vol. 3, No. 5. (06 August 2002), pp. 698-713.
Abstract
A neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors, which may represent fuzzy or crisp sets of features. The architecture, called fuzzy ARTMAP, achieves a synthesis of fuzzy logic and adaptive resonance theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, resonance, and learning. Four classes of simulation illustrated fuzzy ARTMAP performance ...
|
| |
Neural Networks, Vol. 4, No. 6. (1991), pp. 759-771.
Abstract
A Fuzzy Adaptive Resonance Theory ( ART ) model capable of rapid stable learning of recognition categories in response to arbitrary sequences of analog or binary input patterns is described. Fuzzy ART incorporates computations from fuzzy set theory into the ART 1 neural network, which learns to categorize only binary input patterns. The generalization to learning both analog and binary input patterns is achieved by replacing appearances of the intersection operator (∩) in ART 1 ...
|
| |
Neural Networks, Vol. 4, No. 5. (1991), pp. 565-588.
Abstract
This article introduces a new neural network architecture, called ARTMAP, that autonomously learns to classify arbitrarily many, arbitrarily ordered vectors into recognition categories based on predictive success. This supervised learning system is built up from a pair of Adaptive Resonance Theory modules (ART a and ART b ) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns. During training trials, the ART a module receives a stream [ a ( ...
|
| |
Neural Networks, Vol. 18, No. 8. (October 2005), pp. 1093-1110.
Abstract
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, ...
|
| |
Int. J. Pattern Recognition Artif. Intell, Vol. 2004 (2004), pp. 299-313.
Abstract
Two distance measures for attributed graphs are presented that are based on the maximal similarity common subgraph of two graphs. They are generalizations of two existing distance measures based on the maximal common subgraph. The new measures are superior to the well-known measures based on elementary edit transformations in that no particular edit operations (together with their costs) need to be defined. Moreover, they can deal not only with structural distortions, but also with perturbations of attributes. It is shown that ...
|