The Cancer Genomics Linkage Application will enable the integration and re-use of the cancer genomics data available from public repositories such as the International Cancer Genome Consortium (ICGC). This will be accomplished through the capability being developed by the “Early Activity” of the Genomics Virtual Laboratory (GVL-EA). It will enable researchers, such as Professors Andrew Biankin, John Mattick (Garvan Institute of Medical Research) or Sean Grimmond (Queensland Centre for Medical Genomics), to access genomic datasets of international importance and to integrate them with their own clinical and genomic datasets in order to explore, discover and validate the key genomic abnormalities that cause cancer. The product will also provide a mechanism for these researchers to publish their analyses and make them available for re-use by the community.
The product aims to let biologists and clinicians easily integrate their own research data with datasets from multiple data sources. Integrating the datasets into a common location, and enabling access and mining using best-practice workflow tools, will help Australian cancer researchers to accelerate their discovery processes and to be internationally competitive. Although this project has a particular focus on pancreatic cancer research as carried out by the Australian Pancreatic Cancer Genome Initiative (APGI), the application can also support the wider cancer research community.
Download the application from here.
Our target users for this project are researchers in the Australian genomics community, in particular biologists and clinicians of the Australian Pancreatic Cancer Genome Initiative.
The challenges faced by these researchers are:
- The manual process required to integrate their research data with other datasets
- The lack of standardised analytical processes
- The delay in moving from analysis to publication-ready results
This project builds on the environment provided by the GVL-EA. The GVL-EA gives biologists and clinicians a ready-to-use analysis platform, bypassing the time-consuming and complex process of setting up the underlying computational infrastructure. The "Cancer Genomics Linkage Application" will extend this environment by developing tools to manage data synchronisation, providing integration tools specific to APGI research, and supplying exemplar workflows built with the Galaxy workflow system.
This application aims to make it easier for users to integrate their data with other data sources. It will provide mechanisms to publish data sources and user workflows to Research Data Australia (RDA) in order to make them easier to find and re-use by the Australian research community.
The Cancer Genomics Linkage Application will enable the in-depth interrogation of cancer genomic datasets and allow their comparison with other genomic datasets by giving research biologists and clinicians direct access to them through the Genomics Virtual Laboratory Early Activity (GVL-EA).
This application will focus on the research being carried out by the Australian Pancreatic Cancer Genome Initiative (APGI) and aims to:
- Provide local access to a collection of selected public data sources, e.g. the ICGC open access data and the 1000 Genomes data (pilot 2 trios alignment)
- Enable researchers to transform and integrate these datasets along with user uploaded data via the Galaxy workflow system as part of the GVL-EA
- Provide tool wrappers for somatic mutation analysis
- Provide exemplar workflows using Galaxy that demonstrate how to integrate the tools and datasets
- Enable APGI researchers to share their workflows, making them available for re-use and to obtain a persistent identifier for publication
- Enable automatic generation of compliant RIF-CS for publication to Research Data Australia
- Assist researchers in accelerating their discovery process, reducing the time to publication
- Make these integration and analysis workflows available through the GVL-EA to the APGI researchers and the cancer research community more broadly
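The integration aim above (combining user-uploaded data with a local copy of a public data source) can be pictured with a small sketch. This is not the project's implementation, which runs as Galaxy workflows within the GVL-EA; it is a plain Python illustration that assumes both datasets can be reduced to records keyed by a gene symbol. The function name `integrate_by_gene`, the field names and all records are hypothetical placeholders.

```python
from collections import defaultdict


def integrate_by_gene(user_records, public_records):
    """Attach to each user record the public records that share its gene symbol."""
    # Index the public records by gene so each lookup is a dictionary access.
    public_by_gene = defaultdict(list)
    for record in public_records:
        public_by_gene[record["gene"]].append(record)

    integrated = []
    for record in user_records:
        matches = public_by_gene.get(record["gene"], [])
        # Copy the user record and add the linked public entries alongside it.
        integrated.append({**record, "public_matches": matches})
    return integrated


# Placeholder records, invented purely for illustration (not real data).
user_records = [
    {"sample": "user_sample_1", "gene": "GENE_A", "variant": "variant_1"},
    {"sample": "user_sample_1", "gene": "GENE_C", "variant": "variant_2"},
]
public_records = [
    {"donor": "public_donor_1", "gene": "GENE_A", "variant": "variant_1"},
    {"donor": "public_donor_2", "gene": "GENE_A", "variant": "variant_3"},
    {"donor": "public_donor_3", "gene": "GENE_B", "variant": "variant_4"},
]

integrated = integrate_by_gene(user_records, public_records)
for row in integrated:
    # Report how many public records were linked to each user record.
    print(row["gene"], len(row["public_matches"]))
```

User records with no counterpart in the public data are kept with an empty match list, so nothing the researcher uploaded is silently dropped during integration.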
Development Team
Dr Dominique Gorse (QFAB) – Project Manager
Dominique Gorse is leading the development of QFAB’s platform for integrated and accessible bioinformatics, which is designed to support large multi-institution research projects and to provide advanced bioinformatics solutions to the biotechnology, pharmaceutical, clinical and research communities.
Dominique has over 19 years' experience in software development, information management, data mining and data modelling applied to life science. Over the years, he has developed expertise in using Agile project management methodologies for the efficient delivery of quality projects and their alignment with business objectives. He has a strong record of growing technology companies such as Synt:em (France, 1996-2001) and Bio-Layer (Australia, 2001-2007).
Dr Xin-Yi Chua (QFAB) – Scrum Master and Project Lead
Xin-Yi Chua is a Senior Bioinformatician at the Queensland Facility for Advanced Bioinformatics (QFAB).
She received her doctorate at the Queensland University of Technology for her work on applying machine learning approaches to improve the inference of transcriptional regulatory interactions in bacteria using a comparative genomics approach. From this work she has also developed a keen interest in visual analytics and in methods that capture information from large-scale genomic data to facilitate intuitive and rapid comprehension. This is motivated by the need to reduce the now apparent gap between data generation and data analysis: the ability to automate repeatable discovery processes and quickly highlight regions of interest for further verification will help researchers make the most of their efforts.
Pierre-Alain Chaumeil (QFAB) – Software Developer
Pierre-Alain Chaumeil has a background in Bioinformatics, having completed a Bachelor of Science majoring in Biology of Organisms and a Master's degree in Bioinformatics. He has programming skills in several languages, including Java, Perl, Eiffel, C/C++ and Python.
Pierre-Alain was involved in the development of the QFAB Systems Biology Platform to provide Australian researchers in R&D organisations and industry with direct, scalable access to the most internationally comprehensive, expert-curated and integrated genomic, proteomic and metabolic datasets available, and to industry-standard tools for data integration, analysis, visualization and electronic collaboration.
Anne Kunert (QFAB) – Software Developer
Anne Kunert has a background in Computer Sciences, having completed a Diploma Degree in Computer Science and Arts at the Technical University of Dortmund (Germany). Anne joined the QFAB team as a software developer in early 2012. She has experience in a diverse range of interdisciplinary IT projects, particularly in software development for life science research applications.
Product Owners
Dr Mark Cowley
Representative of Cancer Bioinformatics at Garvan Institute of Medical Research
Mark Cowley is a Senior Research Officer in the Cancer Bioinformatics group at the Garvan Institute of Medical Research, headed by Dr Jianmin Wu. The group aims to apply computational and statistical methods to model biological and clinical questions related to cancer biology and translational research. Currently the group is working on integrative analysis of multidimensional "-omics" datasets, generated by deep sequencing of the cancer genome, transcriptome, epigenome, and proteome, as well as in vitro and in vivo functional screens, with the aim of identifying candidate driver mutations and pathway aberrations in pancreatic cancer.
John Pearson
Representative of the Queensland Centre for Medical Genomics
John Pearson is the Senior Bioinformatics Manager at the Queensland Centre for Medical Genomics (QCMG). One of his current research interests is next generation sequencing using Life Technologies SOLiD, Illumina HiSeq and Ion Torrent PGM.
Dr Cas Simons
Representative of the Institute for Molecular Bioscience
Cas Simons is a Senior Bioinformatician at the Institute for Molecular Bioscience (IMB). He has a demonstrated ability to utilise both genomic and transcriptomic data from a wide variety of sources to investigate the complex transcriptional outputs of eukaryote genomes and deliver high impact research outcomes. His research background includes using the power of next generation sequencing to understand the transcriptional output of the cell. One highlight of his work was the discovery of a novel class of small RNAs that are tightly associated with mammalian splice sites.
Dr Jeff Christiansen
Representative of the Australian National Data Service
Jeff Christiansen is a Senior Business Analyst at the Australian National Data Service (ANDS), primarily working on biology-themed projects. His background is in both biological research and data management, primarily in the areas of gene expression and imaging.