| |
posted to cdk--usespackage--fingerprint
by egonw
on 2013-01-03 13:47:44
Abstract
Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based ...
|
| |
posted to cdk--usespackage--fingerprint
by egonw
on 2013-01-03 12:09:12
Abstract
HERG potassium channels have a critical role in the normal electrical activity of the heart. The blockade of hERG channels in heart cells can result in a potentially fatal disorder called long QT syndrome. HERG channels can be blocked by compounds with diverse structures belonging to several drug classes. Presented herein are generative (Generative Topographic Maps) and discriminative (Support Vector Machines) classification models to categorize the compounds in silico into active and inactive classes by using different types of descriptors. The ...
|
| |
Ind. Eng. Chem. Res. In Industrial & Engineering Chemistry Research, Vol. 51, No. 44. (9 October 2012), pp. 14337-14343, doi:10.1021/ie3021895
Abstract
In this work, we apply generative topographic maps as a universal approach for data visualization and structure–property modeling of melting points (mp), which is one of the most important physical properties for the design and application of ionic liquids (ILs) as green solvents. Data visualization is part of a more general concept of chemography, which is a relatively new field dealing with visualization of chemical data, representation of chemical space, and navigation in this space. This field has received much attention ...
|
| |
Abstract
Chagas disease chemotherapy, currently based on only two drugs, nifurtimox and benznidazole, is far from satisfactory and therefore the development of new antichagasic compounds remains an important goal. On the basis of antichagasic properties previously described for some 1,2-disubstituted 5-nitroindazolin-3-ones (21, 33) and in order to initiate the optimization of activity of this kind of compounds, we have prepared a series of related analogs (22-32, 34-38, 58 and 59) and tested in vitro these products against epimastigote forms of Trypanosoma cruzi. ...
|
| |
posted to cdk--usespackage--fingerprint
by egonw
on 2012-11-03 12:00:45
Abstract
Versatile event-based approaches for the definition of novel information theory-based indices (IFIs) are presented. An event in this context is the criterion followed in the “discovery” of molecular substructures, which in turn serve as basis for the construction of the generalized incidence and relations frequency matrices, Q and F, respectively. From the resultant F, Shannon's, mutual, conditional and joint entropy-based IFIs are computed. In previous reports, an event named connected subgraphs was presented. The present study is an extension of this ...
|
| |
Abstract
Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats ...
|
| |
Rapid Communications in Mass Spectrometry, Vol. 26, No. 19. (15 October 2012), pp. 2275-2286
posted to cdk--usespackage--fingerprint
by egonw
on 2012-09-04 06:54:32
Abstract
Metabolite identification plays a crucial role in the interpretation of metabolomics research results. Due to its sensitivity and widespread implementation, a favourite analytical method used in metabolomics is electrospray mass spectrometry. In this paper, we demonstrate our results in attempting to incorporate the potentials of multistage mass spectrometry into the metabolite identification routine. New software tools were developed and implemented which facilitate the analysis of multistage mass spectra and allow for efficient removal of spectral artefacts. The pre-processed fragmentation patterns are ...
|
| |
Anal. Chem. In Analytical Chemistry, Vol. 84, No. 7. (5 March 2012), pp. 3417-3426, doi:10.1021/ac300304u
posted to cdk--usespackage--fingerprint
by egonw
on 2012-08-15 07:52:40
Abstract
Mass spectrometry allows sensitive, automated, and high-throughput analysis of small molecules. In principle, tandem mass spectrometry allows us to identify “unknown” small molecules not in any database, but the automated interpretation of such data is in its infancy. Fragmentation trees have recently been introduced for the automated analysis of the fragmentation patterns of small molecules. We present a method for the automated comparison of such fragmentation patterns, based on aligning the compounds’ fragmentation trees. We cluster compounds based solely on their ...
|
| |
J. Chem. Inf. Model. In Journal of Chemical Information and Modeling, Vol. 52, No. 8. (8 August 2012), pp. 2181-2191, doi:10.1021/ci300047k
Abstract
The notion of activity cliffs is an intuitive approach to characterizing structural features that play a key role in modulating biological activity of a molecule. A variety of methods have been described to quantitatively characterize activity cliffs, such as SALI and SARI. However, these methods are primarily retrospective in nature, highlighting cliffs that are already present in the data set. The current study focuses on employing a pairwise characterization of a data set to train a model to predict whether a ...
|
| |
Abstract
Inferring drug–drug interactions (DDIs) is an essential step in drug development and drug administration. Most computational inference methods focus on modeling drug pharmacokinetics, aiming at interactions that result from a common metabolizing enzyme (CYP). Here, we introduce a novel prediction method, INDI (INferring Drug Interactions), allowing the inference of both pharmacokinetic, CYP-related DDIs (along with their associated CYPs) and pharmacodynamic, non-CYP associated ones. On cross validation, it obtains high specificity and sensitivity levels (AUC (area under the receiver-operating characteristic curve) = 0.93). In ...
|
| |
Journal of Computer-Aided Molecular Design, Vol. 26, No. 9. (14 July 2012), pp. 995-1003, doi:10.1007/s10822-012-9587-5
Abstract
New approaches are needed that can help decrease the unsustainable failure in small-molecule drug discovery. Ligand Efficiency Indices (LEI) are making a great impact on early-stage compound selection and prioritization. Given a target-ligand database with chemical structures and associated biological affinities/activities for a target, the AtlasCBS server generates two-dimensional, dynamical representations of its contents in terms of LEI. These variables allow an effective decoupling of the chemical (angular) and biological (radial) components. BindingDB, PDBBind and ChEMBL databases are currently implemented. Proprietary ...
|
| |
posted to cdk--usespackage--fingerprint
by egonw
on 2012-04-29 07:25:35
Abstract
The identification of interactions between drugs and proteins plays key roles in understanding mechanisms underlying drug actions and can lead to new drug design strategies. Here, we present a novel statistical approach, namely PDTD (Predicting Drug Targets with Domains), to predict potential target proteins of new drugs based on derived interactions between drugs and protein domains. The known target proteins of those drugs that have similar therapeutic effects allow us to infer interactions between drugs and protein domains which in turn ...
|
| |
Wiley Interdisciplinary Reviews: Computational Molecular Science (2012), pp. n/a-n/a, doi:10.1002/wcms.1087
posted to cdk--usespackage--fingerprint
by egonw
on 2012-04-28 10:41:33
|
| |
Abstract
BACKGROUND: Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides a strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines showed to have a convincing performance on large-scale data sets. The goal of this study is to present a heat ...
|
| |
Abstract
Combinatorial therapy is a promising strategy for combating complex disorders due to improved efficacy and reduced side effects. However, screening new drug combinations exhaustively is impractical considering all possible combinations between drugs. Here, we present a novel computational approach to predict drug combinations by integrating molecular and pharmacological data. Specifically, drugs are represented by a set of their properties, such as their targets or indications. By integrating several of these features, we show that feature patterns enriched in approved drug combinations ...
|
| |
Abstract
Representations of chemical datasets in spreadsheet format are important for ready data assimilation and manipulation. In addition to the normal spreadsheet facilities, chemical spreadsheets need to have visualisable chemical structures and data searchable by chemical as well as textual queries. Many such chemical spreadsheet tools are available, some operating in the familiar Microsoft Excel environment. However, within this group, the performance of Excel is often ...
|
| |
Journal of computational biology : a journal of computational molecular cell biology, Vol. 19, No. 2. (February 2012), pp. 163-174, doi:10.1089/cmb.2011.0264
Abstract
Elucidating signaling pathways is a fundamental step in understanding cellular processes and developing new therapeutic strategies. Here we introduce a method for the large-scale elucidation of signaling pathways involved in cellular response to drugs. Combining drug targets, drug response expression profiles, and the human physical interaction network, we infer 99 human drug response pathways and study their properties. Based on the newly inferred pathways, we ...
|
| |
Abstract
In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure–activity/-property relationship (QSAR/QSPR) correlate structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods ...
|
| |
Abstract
Cytochrome P450 inhibitory promiscuity of a drug has potential effects on the occurrence of clinical drug–drug interactions. Understanding how a molecular property is related to the P450 inhibitory promiscuity could help to avoid such adverse effects. In this study, an entropy-based index was defined to quantify the P450 inhibitory promiscuity of a compound based on a comprehensive data set, containing more than 11,500 drug-like compounds with inhibition against five major P450 isoforms, 1A2, 2C9, 2C19, 2D6, and 3A4. The results indicated ...
|
| |
Abstract
Inferring potential drug indications, for either novel or approved drugs, is a key step in drug development. Previous computational methods in this domain have focused on either drug repositioning or matching drug and disease gene expression profiles. Here, we present a novel method for the large-scale prediction of drug indications (PREDICT) that can handle both approved drugs and novel molecules. Our method is based on the observation that similar drugs are indicated for similar diseases, and utilizes multiple drug–drug and disease–disease ...
|
| |
J. Chem. Inf. Model. In Journal of Chemical Information and Modeling, Vol. 51, No. 8. (20 July 2011), pp. 1840-1847, doi:10.1021/ci200242c
Abstract
Chemical liabilities, such as adverse effects and toxicity, have a major impact on today’s drug discovery process. In silico prediction of chemical liabilities is an important approach which can reduce costs and animal testing by complementing or replacing in vitro and in vivo liability models. There is a lack of integrated, extensible decision support systems for chemical liability assessment which run quickly and have easily interpretable results. Here we present a method which integrates similarity searches, structural alerts, and QSAR models ...
|
| |
In Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries (2011), pp. 325-334, doi:10.1145/1998076.1998137
Abstract
Nowadays, the information access is conducted almost exclusively using the Web. Simple keyword based Web search engines, e.g. Google or Yahoo!, offer suitable retrieval and ranking features. In contrast, for highly specialized domains, represented by digital libraries, these features are insufficient. Considering the domain of chemistry, where searching for relevant literature is essentially centered on chemical entities. Beside commercial information providers such as Chemical Abstract Service (CAS) numerous groups are working on building free chemical search engines to overcome the expensive ...
|
| |
Abstract
The AMBIT web services package is one of the several existing independent implementations of the OpenTox Application Programming Interface and is built according to the principles of the Representational State Transfer (REST) architecture. The Open Source Predictive Toxicology Framework, developed by the partners in the EC FP7 OpenTox project, aims at providing a unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology ii) ...
|
| |
Abstract
The synthetic feasibility of any compound library used for virtual screening is critical to the drug discovery process. TIN, a recursive acronym for ‘TIN Is Not commercial’, is a virtual combinatorial database enumeration of diversity-orientated multicomponent syntheses (MCR). Using a ‘one-pot’ synthetic technique, 12 unique small molecule scaffolds were developed, predominantly styrylisoxazoles and bis-acetylenic ketones, with extensive derivatization potential. Importantly, the scaffolds were accessible in a single operation from commercially available sources containing R-groups which were then linked combinatorially. This resulted ...
|
| |
J. Chem. Inf. Model. In Journal of Chemical Information and Modeling, Vol. 51, No. 3. (7 March 2011), pp. 521-531, doi:10.1021/ci100399j
Abstract
Advanced high-throughput screening (HTS) technologies generate great amounts of bioactivity data, and this data needs to be analyzed and interpreted with attention to understand how these small molecules affect biological systems. As such, there is an increasing demand to develop and adapt cheminformatics algorithms and tools in order to predict molecular and pharmacological properties on the basis of these large data sets. In this manuscript, we report a novel machine-learning-based ligand classification algorithm, named Ligand Classifier of Adaptively Boosting Ensemble Decision ...
|
| |
Journal of Chemical Information and Modeling, Vol. 51, No. 3. (28 March 2011), pp. 670-679, doi:10.1021/ci100410h
posted to cdk--usespackage--fingerprint
by egonw
on 2011-02-01 07:20:09
Abstract
The goal of this paper is to present and describe a novel 2D- and 3D-QSAR (quantitative structure−activity relationship) binary classification data set for the inhibition of c-Jun N-terminal kinase-3 with previously unpublished activities for a diverse set of compounds. JNK3 is an important pharmaceutical target because it is involved in many neurological disorders. Accordingly, the development of JNK3 inhibitors has gained increasing interest. 2D and 3D versions of the data set were used, consisting of 313 (70 actives) and 249 (60 ...
|
| |
Abstract
Introduction: PaDEL-Descriptor is software for calculating molecular descriptors and fingerprints. The software currently calculates 797 descriptors (663 1D, 2D descriptors, and 134 3D descriptors) and 10 types of fingerprints. These descriptors and fingerprints are calculated mainly using The Chemistry Development Kit. Some additional descriptors and fingerprints were added, which include atom type electrotopological state descriptors, McGowan volume, molecular linear free energy relation descriptors, ring counts, count of chemical substructures identified by Laggner, and binary fingerprints and count of chemical ...
|
| |
Abstract
Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors ...
|