De Novo based sequence assembly of next generation sequence data without chimeras: improved annotation, gene expression profiles and haplotype reconstruction.

The vast quantities of read data generated by current sequencing platforms, hereafter referred to by the coined phrase "next generation sequencers" (NGS), have led to previously unattainable insight into biology. Generally, prior to usage, reads must be corrected for error and then either mapped to a reference, or assembled into contig sequences representing transcripts or chromosomes. Although many early difficulties have been overcome, accurately reconstructing diversity within complex datasets, such as those from transcriptomes harbouring large amounts of isoform variation or from rapidly evolving viral populations, has remained elusive. Reference based approaches are limited to where a reference exists. De novo based ones can lead to chimeric contigs that, despite having sequence similarity to transcripts, do not maintain relationships between co-evolving sites, recombination breakpoints or gene expression profiles. Chimeras reduce the power of NGS to dissect evolutionary dynamics. They are also in danger of misleading future studies as, being routinely placed into public databases, they reduce the quality of future annotations and reference datasets. A solution to the problem of chimeric sequence assembly and analysis will be developed within this proposal. Previously, members of this team developed algorithms for assembling non-chimeric contigs from small datasets containing long reads as well as for reliably annotating such contigs. Initially we will adapt our assembly algorithm to accommodate large datasets and a wide range of read lengths. It will then be used to explore and develop feature rich tools, within an integrated framework, for the assembly, annotation and analysis of data derived from sources harbouring complex variation. 
The framework will be applied primarily to a number of key areas of research, including (i) economically important fish populations such as sardine and the farmed fish species Dicentrarchus labrax and Sparus aurata; (ii) soil metagenomic studies with a focus on radioactive contamination of soil and water within deactivated uranium mines in Portugal; (iii) other environmental studies, including the role of transcriptomics in localized environmental adaptation; and (iv) parasitic nematodes, medical venomics and rapidly evolving viruses such as HIV-1. Additionally, our host institute is in the late stages of setting up an NGS genomics and bioinformatics facility with 3.1 million euros from the European Commission Seventh Framework Programme (grant no. 286431). As such, directly available to this project are (i) maintained servers (and other hardware) capable of the computational tasks required, (ii) two Illumina NGS platforms and (iii) a vast increase in bioinformatics expertise. Funding the cutting-edge research proposed in this project will place our host institute at the forefront of bioinformatics research both within Portugal and internationally.

Team
Principal Investigator
John Archer
Position: Principal Researcher
Group: COMPBIO

Researchers
Antonio Muñoz Mérida
Position: Principal Researcher

Arie van der Meijden
Position: Post-Doc Researcher
Group: AP

Pedro Tarroso
Position: Post-Doc Researcher
Groups: BEPE, BIODESERTS

Raquel Xavier
Position: Auxiliary Researcher
Group: AquaGenPhy

Stephen Joseph Sabatino
Position: Research Associate
Group: EVOLGEN

Zbyszek Boratynski
Position: Assistant Researcher
Groups: BIODESERTS, FBIO

State: Ongoing
Proponent Institution: CIBIO-InBIO
Funded by: FCT
Dates: 2017 (Duration: 3 years)
Reference: PTDC/BIA-EVL/29115/2017