Genome and Metagenome Data Analysis for Education

  • Sergey Vladimirovich Kazakov ITMO, St. Petersburg, Russia
  • Anatoly Abramovich Shalyto ITMO, St. Petersburg, Russia
Keywords: bioinformatics, DNA, genome, metagenome, DNA sequencing, de novo genome assembly, comparative metagenome analysis, personal computer

Abstract

This paper addresses two problems in the analysis of genome and metagenome sequencing data: de novo genome assembly (assembly of a previously unknown genome) and comparative metagenome analysis, which arises when studying microorganisms in soil, the sea, the human gut, and similar environments. Although these problems are primarily of interest to researchers in biology, they are also essential teaching material for medical students, biologists, and bioinformaticians, and for the further training of specialists in these areas. We survey methods for de novo genome assembly and comparative metagenome analysis, examine whether such approaches can be used in education, and propose novel approaches to both problems. The proposed solutions have already been used to teach students at Peter the Great St. Petersburg Polytechnic University. We also present experimental results comparing the proposed methods with known ones.

Author Biographies

Sergey Vladimirovich Kazakov, ITMO, St. Petersburg, Russia

Sergey V. Kazakov: Postgraduate student, Computer Technologies Department, ITMO University.

Anatoly Abramovich Shalyto, ITMO, St. Petersburg, Russia

Anatoly A. Shalyto: Doctor of Science, Professor, Head of Programming Technologies Department, ITMO University.

Published
2016-06-30
How to Cite
Kazakov, S. V., & Shalyto, A. A. (2016). Genome and Metagenome Data Analysis for Education. Computer Tools in Education, (3), 5-15. Retrieved from http://cte.eltech.ru/ojs/index.php/kio/article/view/1397
Section
Computer science