Bioinformatics and Genomics Technologies in the Analysis of Human Genome Diversity and Disease
The human genome contains ~30,000 genes that together encode mankind’s ‘blueprint for life’. However, the detailed content of ‘the genome’ differs considerably between individuals. Many millions of single nucleotide polymorphisms (SNPs) distinguish one person’s DNA from that of another, along with thousands of regions of structural variability and copy number variation (CNV), ranging from very small to very large, that affect genes and their intervening sequences. All this variability contributes to making each person unique, both in their appearance and behaviour and in the diseases they suffer and how they react to drugs.
Large, intensive projects are now underway to search for significant ‘associations’ between naturally occurring DNA variation (SNPs and/or CNV) and a range of diseases and biomedical traits. These studies are generating valuable datasets which, due to their size and complexity, need to be skilfully managed to ensure they are appropriately and ethically disseminated, properly quality controlled, integrated, and safely archived.
In our laboratory, both by our own efforts and as part of international consortia, we are:
- innovating and improving methods for the experimental study of DNA variation in relation to common disease, and
- devising next-generation approaches towards web-based data management of complex information emerging from the global study of DNA variation in disease contexts.
Ongoing wet-lab projects include:
• Transferring the Dynamic Allele-Specific Hybridisation (DASH) method onto microarrays, as a robust and generic solution for use in DNA diagnostics. This system tracks DNA melting in real time, thereby detecting all sequence variants in all sequence contexts under standardised run conditions
• Devising technologies (MegaPlex-PCR and novel forms of physical capture on beads and arrays) that will enable target genome regions to be enriched many billion- to trillion-fold, to feed advanced DNA sequencing platforms
• Developing the ‘Datta-Array’ concept, which aims to enable 1,000,000 parallel reactions in ~10 pL volumes, for high-throughput scanning and molecular counting studies
• Using the above technologies, plus others, to decipher a complex region of structural variability in the human genome, which has a dramatic impact (~3-fold risk differential) upon rheumatoid arthritis and potentially many other diseases
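As an illustration of the melting-based detection mentioned for DASH above, the following is a minimal sketch of how a melting temperature (Tm) can be read from a melt curve by locating the peak of the negative derivative of fluorescence with respect to temperature. It is not the laboratory's analysis pipeline: the function names, the sigmoid curve model, and the Tm values (66 °C for a matched probe-target duplex, 58 °C for a mismatched one) are hypothetical and chosen only for the example.

```python
import math

def sigmoid_melt_curve(temps, tm, width=1.5):
    """Simulate fluorescence that is high while the duplex is intact
    and falls off around the melting temperature tm (hypothetical model)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def estimate_tm(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest,
    using central differences over the interior points."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (
            temps[i + 1] - temps[i - 1]
        )
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Temperature ramp from 40 to 80 degrees C in 0.5 degree steps (assumed values).
temps = [40.0 + 0.5 * i for i in range(81)]

# Illustrative curves: a perfectly matched duplex melts at a higher
# temperature than one containing a single-base mismatch.
matched = sigmoid_melt_curve(temps, tm=66.0)
mismatched = sigmoid_melt_curve(temps, tm=58.0)

tm_matched = estimate_tm(temps, matched)
tm_mismatched = estimate_tm(temps, mismatched)
print(f"matched Tm ~ {tm_matched} C, mismatched Tm ~ {tm_mismatched} C")
```

In this sketch the difference between the two estimated Tm values is what would distinguish the two alleles; real melt data would need smoothing and noise handling before the derivative is taken.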
Ongoing bioinformatics projects include:
• Leading the ‘GEN2PHEN’ project, a large-scale international project that aims to unify human and model organism genetic variation databases towards increasingly holistic views of Genotype-To-Phenotype (G2P) data
[funded by the European Commission within the Seventh Framework Programme (FP7)]
• Providing HGVbaseG2P, the Human Genome Variation database of G2P data, which aims to provide a centralised compilation of summary-level findings from genetic association studies, both large and small
• Creating ‘TraceSpace’, a web interface that provides graphical display and filtering options, enabling researchers to explore extensive catalogues of human genome sequence trace files in depth. This allows structural variation to be deciphered in a powerful new way, with unprecedented sensitivity, without recourse to wet-lab experimentation