
Data compression facilitates genome assembly

Efficient de novo assembly of large genomes using compressed data structures.

As a concrete example of space saving, a … GB human FASTA file weighs only … GB after LEON compression.

For the RNA-seq and metagenomic datasets, the bifurcation and un-anchored reads components represent the major part of the compressed DNA stream size. This is due to the heterogeneous sequence abundances in these kinds of datasets. In such cases, sequencing errors cannot be identified solely from k-mer abundances, and the solidity threshold is less effective at simplifying the graph.
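A minimal sketch of why a single solidity threshold struggles with uneven abundances, assuming a toy forward-strand k-mer counter (real tools count canonical k-mers). The reads, `k = 4` and `threshold = 3` are hypothetical values chosen for illustration: an error repeated in a very highly expressed transcript passes the threshold, while a genuine k-mer from a rare species is discarded.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, threshold):
    """Keep only k-mers whose abundance reaches the solidity threshold."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= threshold}


# Hypothetical toy data: a highly expressed transcript read 30 times,
# 3 copies carrying the same error (last base T instead of A),
# and a low-abundance species seen only twice.
reads = ["ACGTTGCA"] * 30 + ["ACGTTGCT"] * 3 + ["GGCCAATT"] * 2
solid = solid_kmers(reads, k=4, threshold=3)
print("TGCT" in solid)  # True: the erroneous k-mer survives the threshold
print("GGCC" in solid)  # False: the genuine low-abundance k-mer is discarded
```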


For instance, in the case of RNA-seq, highly transcribed genes are likely to generate parts of the de Bruijn graph with a high number of branchings, the majority of them corresponding to sequencing errors. Conversely, in the metagenomic dataset, numerous species have a low abundance in the sample and their genomes are not represented in the de Bruijn graph, resulting in a high number of un-anchored reads. Among the tested datasets, three correspond to the same target species, E. coli. In Fig. …
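To make "branchings" concrete, here is a small sketch that treats a set of solid k-mers as an implicit de Bruijn graph and reports the nodes with more than one successor. The k-mer set is hypothetical: one true path plus a single erroneous k-mer, which is enough to create a bifurcation.

```python
def successors(kmer, kmers):
    """Out-neighbours of a node in a de Bruijn graph given as a set of k-mers."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmers]


def branching_nodes(kmers):
    """Nodes with more than one successor, i.e. bifurcations."""
    return sorted(km for km in kmers if len(successors(km, kmers)) > 1)


# Hypothetical solid k-mers (k = 4): one true path plus one erroneous k-mer.
kmers = {"ACGT", "CGTT", "GTTG", "TTGC", "TGCA", "TGCT"}
print(branching_nodes(kmers))     # ['TTGC']
print(successors("TTGC", kmers))  # ['TGCA', 'TGCT']
```

Each additional erroneous k-mer that passes the threshold can add another such branch, which is why highly transcribed genes inflate this component.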

The Ion Torrent dataset has the lowest compression ratio, mainly because of the bifurcation and sequencing-error components. This is explained by its sequencing errors being mostly insertions and deletions, which are not well handled by the current algorithm: contrary to a substitution error, an insertion or deletion implies that the rest of the read will be encoded as …. Consequently, for the same amount of DNA, there are fewer reads and therefore fewer anchors to be encoded.
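The cost gap between substitutions and indels can be illustrated with a gapless, position-by-position comparison between a read and the sequence spelled by the graph. This is an assumed simplification standing in for LEON's actual graph walk; the sequence below is hypothetical. A substitution disagrees at one position, whereas an insertion shifts every downstream base.

```python
def mismatches(read, path):
    """Positions where read and path differ, compared base by base without gaps."""
    return [i for i, (r, p) in enumerate(zip(read, path)) if r != p]


path = "ACGTTGCAGGTACCTA"          # hypothetical sequence spelled by the graph walk
sub = path[:5] + "A" + path[6:]    # substitution at position 5 (G -> A)
ins = path[:5] + "A" + path[5:-1]  # insertion of A at position 5, trimmed to the same length

print(mismatches(sub, path))  # [5]
print(mismatches(ins, path))  # downstream bases are shifted, so most later positions disagree
```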

This explains the large difference in the relative contribution of the anchor address component. Note that the DNA compression ratios are roughly similar between both protocols, but this is due to a higher number of sequencing errors in this particular MiSeq dataset. Since sequencing technologies are evolving to produce longer reads with fewer sequencing errors, this suggests that LEON's compression ratio should keep pace with these technological developments.
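The link between read length and the anchor address component is simple arithmetic: for a fixed amount of DNA, longer reads mean fewer reads and hence fewer addresses to store. The sketch below assumes one fixed-width address per read and a fixed dictionary size; both are simplifications (the dictionary size and the entropy coder used in practice are not modelled), and all numbers are hypothetical.

```python
import math


def anchor_address_bits(total_bases, read_length, n_anchors):
    """Bits spent on anchor addresses, assuming one fixed-width address per read."""
    n_reads = total_bases // read_length
    bits_per_address = max(1, math.ceil(math.log2(n_anchors)))
    return n_reads * bits_per_address


total = 10**9  # hypothetical: 1 Gbp of sequence
for length in (100, 250):  # hypothetical read lengths
    bits = anchor_address_bits(total, length, n_anchors=2**20)
    print(length, bits / total)  # anchor-address bits per base
```

Under these assumptions the per-base address cost falls in proportion to read length, which is the direction of the effect described above.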

Lastly, because of the anchor selection procedure, the initial read order may theoretically impact compression results. This can probably be explained by the large number of reads that could not be mapped to the assembled contigs, either because the contigs were incomplete or too fragmented. However, it loses important read pairing information and thus cannot be directly compared to lossless methods.
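Why read order can matter is easiest to see with a greedy anchor dictionary. The sketch below is a hypothetical, simplified selection rule, not LEON's exact procedure: a read reuses an anchor already in the dictionary if it contains one, otherwise its first solid k-mer becomes a new anchor, and reads with no solid k-mer stay un-anchored. Swapping two reads changes the dictionary size.

```python
def assign_anchors(reads, k, solid):
    """Simplified, hypothetical anchor selection.

    Each read reuses the first of its k-mers already in the anchor
    dictionary; otherwise its first solid k-mer becomes a new anchor.
    Reads with no usable k-mer are left un-anchored (address None).
    """
    dictionary = []  # anchor k-mers; list index = anchor address
    address_of = {}  # k-mer -> anchor address
    addresses = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        anchor = next((km for km in kmers if km in address_of), None)
        if anchor is None:
            anchor = next((km for km in kmers if km in solid), None)
            if anchor is None:
                addresses.append(None)  # un-anchored read
                continue
            address_of[anchor] = len(dictionary)
            dictionary.append(anchor)
        addresses.append(address_of[anchor])
    return dictionary, addresses


a, b = "AACCGGTT", "CCGGTTAA"
solid = {r[i:i + 4] for r in (a, b) for i in range(len(r) - 3)}
print(assign_anchors([a, b], 4, solid)[0])  # ['AACC', 'CCGG']
print(assign_anchors([b, a], 4, solid)[0])  # ['CCGG']
```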

Leon: Read compression

Moreover, it only compresses the DNA sequence part and completely discards headers and quality scores. It seems that keeping read pairing information without degrading the compression ratio is not a simple task for read-reordering methods. To be on par with LEON's lossy quality-score compression, other tools were also run in a lossy compression mode when available (see command lines in Additional file 1: Table ST2).
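For readers unfamiliar with lossy quality-score compression, one common generic technique is binning: Phred scores are mapped to a few representative values so the quality stream has fewer distinct symbols and compresses better. This sketch is not LEON's lossy scheme, which the text does not describe; the bin edges and representative values are hypothetical, and Phred+33 encoding is assumed.

```python
import bisect

# Hypothetical binning scheme on the Phred scale (not LEON's method).
# Scores below 10 fall in bin 0, bin i covers [BIN_EDGES[i-1], BIN_EDGES[i]),
# and scores >= 40 fall in the last bin.
BIN_EDGES = [10, 20, 25, 30, 35, 40]
BIN_VALUES = [6, 15, 22, 27, 33, 37, 40]  # one representative score per bin


def bin_quality(q):
    """Map a Phred score to the representative value of its bin."""
    return BIN_VALUES[bisect.bisect_right(BIN_EDGES, q)]


def bin_quality_string(qual, offset=33):
    """Apply binning to a FASTQ quality string (Phred+33 by default)."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)


print(bin_quality_string("I5+"))  # I70
```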


LEON achieves much higher compression of quality scores than other tools. Additional comparisons on other types of datasets are shown in Fig. …


Nucleic Acids Res. Published online Aug … Daniel C. Jones, Walter L. Ruzzo, Xinxia Peng and Michael G. Katze.

Algorithms for Next-Generation Sequencing Data: Techniques, Approaches, and …

This chapter focuses on de novo NGS data compression, which remains a very challenging issue. Here, no reference genome is considered.


This chapter will not discuss this type of compression any further, as it is dedicated to de novo compression, i.e., compression without a reference.
