2020-10-08 Journal article
A decade of de novo transcriptome assembly: Are we there yet?
Hölzer, Martin
A decade ago, de novo transcriptome assembly evolved as a versatile and powerful
approach to make evolutionary assumptions, analyse gene expression, and annotate
novel transcripts, in particular for non-model organisms lacking an appropriate
reference genome. Various tools have been developed to generate a transcriptome
assembly, and even more computational methods depend on the results of these
tools for further downstream analyses. In this issue of Molecular Ecology Resources,
Freedman et al. (Mol Ecol Resourc 2020) present a comprehensive analysis of errors
in de novo transcriptome assemblies across public data sets and different assembly
methods. They focus on two implicit assumptions that are often violated: First, the
assembly presents an unbiased view of the transcriptome. Second, the expression
estimates derived from the assembly are reasonable, albeit noisy, approximations of
the relative frequency of expressed transcripts. They show that appropriate filtering
can reduce this bias but can also lead to the loss of a reasonable number of highly
expressed transcripts. Thus, to partly alleviate the noise in expression estimates, they
propose a new normalization method called length-rescaled CPM. Remarkably, the
authors found considerable distortions at the nucleotide level, which lead to an
underestimation of diversity in transcriptome assemblies. The study by Freedman et al.
(Mol Ecol Resourc 2020) clearly shows that we have not yet reached “high-quality” in
the field of transcriptome assembly. Above all, it helps researchers be aware of these
problems and filter and interpret their transcriptome assembly data appropriately
and with caution.