  • 2005 Journal Article
    [["dc.bibliographiccitation.firstpage","3568"],["dc.bibliographiccitation.issue","17"],["dc.bibliographiccitation.journal","Bioinformatics"],["dc.bibliographiccitation.lastpage","3569"],["dc.bibliographiccitation.volume","21"],["dc.contributor.author","Tech, Maike"],["dc.contributor.author","Pfeifer, N."],["dc.contributor.author","Morgenstern, Burkhard"],["dc.contributor.author","Meinicke, Peter"],["dc.date.accessioned","2018-11-07T10:55:56Z"],["dc.date.available","2018-11-07T10:55:56Z"],["dc.date.issued","2005"],["dc.description.abstract","We provide the tool 'TICO' (Translation Initiation site COrrection) for improving the results of conventional gene finders for prokaryotic genomes with regard to exact localization of the translation initiation site (TIS). At the current state TICO provides an interface for direct post processing of the predictions obtained from the widely used program GLIMMER. Our program is based on a clustering algorithm for completely unsupervised scoring of potential TIS locations."],["dc.identifier.doi","10.1093/bioinformatics/bti563"],["dc.identifier.isi","000231472500017"],["dc.identifier.pmid","15994191"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/49897"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Oxford Univ Press"],["dc.relation.issn","1367-4803"],["dc.title","TICO: a tool for improving predictions of prokaryotic translation initiation sites"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dspace.entity.type","Publication"]]
  • 2006 Journal Article
    [["dc.bibliographiccitation.artnumber","121"],["dc.bibliographiccitation.journal","BMC Bioinformatics"],["dc.bibliographiccitation.volume","7"],["dc.contributor.author","Tech, Maike"],["dc.contributor.author","Meinicke, Peter"],["dc.date.accessioned","2018-11-07T10:06:29Z"],["dc.date.available","2018-11-07T10:06:29Z"],["dc.date.issued","2006"],["dc.description.abstract","Background: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. Results: We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. 
Conclusion: On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool >> TICO << (TIs COrrector) which is publicly available from our web site."],["dc.identifier.doi","10.1186/1471-2105-7-121"],["dc.identifier.isi","000236674600001"],["dc.identifier.pmid","16526950"],["dc.identifier.purl","https://resolver.sub.uni-goettingen.de/purl?gs-1/4403"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/39102"],["dc.notes.intern","Merged from goescholar"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Biomed Central Ltd"],["dc.relation.issn","1471-2105"],["dc.rights","Goescholar"],["dc.rights.uri","https://goescholar.uni-goettingen.de/licenses"],["dc.title","An unsupervised classification scheme for improving predictions of prokaryotic TIS"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dc.type.version","published_version"],["dspace.entity.type","Publication"]]
  • 2009 Journal Article
    [["dc.bibliographiccitation.firstpage","W101"],["dc.bibliographiccitation.journal","Nucleic Acids Research"],["dc.bibliographiccitation.lastpage","W105"],["dc.bibliographiccitation.volume","37"],["dc.contributor.author","Hoff, Katharina J."],["dc.contributor.author","Lingner, Thomas"],["dc.contributor.author","Meinicke, Peter"],["dc.contributor.author","Tech, Maike"],["dc.date.accessioned","2018-11-07T08:28:27Z"],["dc.date.available","2018-11-07T08:28:27Z"],["dc.date.issued","2009"],["dc.description.abstract","Metagenomic sequencing projects yield numerous sequencing reads of a diverse range of uncultivated and mostly yet unknown microorganisms. In many cases, these sequencing reads cannot be assembled into longer contigs. Thus, gene prediction tools that were originally developed for whole-genome analysis are not suitable for processing metagenomes. Orphelia is a program for predicting genes in short DNA sequences that is available through a web server application (http://orphelia.gobics.de). Orphelia utilizes prediction models that were created with machine learning techniques on the basis of a wide range of annotated genomes. In contrast to other methods for metagenomic gene prediction, Orphelia has fragment length-specific prediction models for the two most popular sequencing techniques in metagenomics, chain termination sequencing and pyrosequencing. 
These models ensure highly specific gene predictions."],["dc.identifier.doi","10.1093/nar/gkp327"],["dc.identifier.isi","000267889100019"],["dc.identifier.pmid","19429689"],["dc.identifier.purl","https://resolver.sub.uni-goettingen.de/purl?gs-1/5949"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/16421"],["dc.notes.intern","Merged from goescholar"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Oxford Univ Press"],["dc.relation.issn","0305-1048"],["dc.rights","Goescholar"],["dc.rights.uri","https://goescholar.uni-goettingen.de/licenses"],["dc.title","Orphelia: predicting genes in metagenomic sequencing reads"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dc.type.version","published_version"],["dspace.entity.type","Publication"]]
  • 2004 Journal Article
    [["dc.bibliographiccitation.artnumber","169"],["dc.bibliographiccitation.journal","BMC Bioinformatics"],["dc.bibliographiccitation.volume","5"],["dc.contributor.author","Meinicke, Peter"],["dc.contributor.author","Tech, Maike"],["dc.contributor.author","Morgenstern, Burkhard"],["dc.contributor.author","Merkl, R."],["dc.date.accessioned","2018-11-07T10:44:36Z"],["dc.date.available","2018-11-07T10:44:36Z"],["dc.date.issued","2004"],["dc.description.abstract","Background: Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals. Results: We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On one hand this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. 
Thereby importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream a start codon. The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon. Conclusions: We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. 
By means of these two parameters oligo kernels can easily be adapted to different biological problems."],["dc.identifier.doi","10.1186/1471-2105-5-169"],["dc.identifier.isi","000226616700003"],["dc.identifier.pmid","15511290"],["dc.identifier.purl","https://resolver.sub.uni-goettingen.de/purl?gs-1/4439"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/47306"],["dc.notes.intern","Merged from goescholar"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Biomed Central Ltd"],["dc.relation.issn","1471-2105"],["dc.rights","Goescholar"],["dc.rights.uri","https://goescholar.uni-goettingen.de/licenses"],["dc.title","Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dc.type.version","published_version"],["dspace.entity.type","Publication"]]
  • 2008 Journal Article
    [["dc.bibliographiccitation.artnumber","217"],["dc.bibliographiccitation.journal","BMC Bioinformatics"],["dc.bibliographiccitation.volume","9"],["dc.contributor.author","Hoff, Katharina J."],["dc.contributor.author","Tech, Maike"],["dc.contributor.author","Lingner, Thomas"],["dc.contributor.author","Daniel, Rolf"],["dc.contributor.author","Morgenstern, Burkhard"],["dc.contributor.author","Meinicke, Peter"],["dc.date.accessioned","2018-11-07T11:15:57Z"],["dc.date.available","2018-11-07T11:15:57Z"],["dc.date.issued","2008"],["dc.description.abstract","Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. Short length, for example, Sanger sequencing yields on average 700 bp fragments, and unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. 
In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided ( see Availability and requirements section)."],["dc.identifier.doi","10.1186/1471-2105-9-217"],["dc.identifier.isi","000256421900002"],["dc.identifier.pmid","18442389"],["dc.identifier.purl","https://resolver.sub.uni-goettingen.de/purl?gs-1/8429"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/54482"],["dc.notes.intern","Merged from goescholar"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Biomed Central Ltd"],["dc.relation.issn","1471-2105"],["dc.rights","Goescholar"],["dc.rights.uri","https://goescholar.uni-goettingen.de/licenses"],["dc.title","Gene prediction in metagenomic fragments: A large scale machine learning approach"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dc.type.version","published_version"],["dspace.entity.type","Publication"]]
  • 2006 Journal Article
    [["dc.bibliographiccitation.firstpage","W588"],["dc.bibliographiccitation.journal","Nucleic Acids Research"],["dc.bibliographiccitation.lastpage","W590"],["dc.bibliographiccitation.volume","34"],["dc.contributor.author","Tech, Maike"],["dc.contributor.author","Morgenstern, Burkhard"],["dc.contributor.author","Meinicke, Peter"],["dc.date.accessioned","2018-11-07T09:39:01Z"],["dc.date.available","2018-11-07T09:39:01Z"],["dc.date.issued","2006"],["dc.description.abstract","Exact localization of the translation initiation sites (TIS) in prokaryotic genomes is difficult to achieve using conventional gene finders. We recently introduced the program TICO for postprocessing TIS predictions based on a completely unsupervised learning algorithm. The program can be utilized through our web interface at http://tico.gobics.de/ and it is also freely available as a commandline version for Linux and Windows. The latest version of our program provides a tool for visualization of the resulting TIS model. 
Although the underlying method is not based on any specific assumptions about characteristic sequence features of prokaryotic TIS the prediction rates of our tool are competitive on experimentally verified test data."],["dc.identifier.doi","10.1093/nar/gkl313"],["dc.identifier.isi","000245650200117"],["dc.identifier.pmid","16845076"],["dc.identifier.purl","https://resolver.sub.uni-goettingen.de/purl?goescholar/4132"],["dc.identifier.uri","https://resolver.sub.uni-goettingen.de/purl?gro-2/33191"],["dc.notes.intern","Merged from goescholar"],["dc.notes.status","zu prüfen"],["dc.notes.submitter","Najko"],["dc.publisher","Oxford Univ Press"],["dc.relation.issn","0305-1048"],["dc.rights","Goescholar"],["dc.rights.uri","https://goescholar.uni-goettingen.de/licenses"],["dc.title","TICO: a tool for postprocessing the predictions of prokaryotic translation initiation sites"],["dc.type","journal_article"],["dc.type.internalPublication","yes"],["dc.type.peerReviewed","yes"],["dc.type.status","published"],["dc.type.version","published_version"],["dspace.entity.type","Publication"]]