Although cancers may look the same under the microscope, they can differ considerably at the genomic level. By identifying these genetic differences, researchers can start to understand which treatments work for particular patients and administer those first. Understanding which cancers don’t respond to treatment is also vital, as this allows researchers to focus on the molecular mechanisms responsible and to design new drugs that attack the particular mechanisms that make the cancer lethal.
In this context, the primary goal of the International Cancer Genome Consortium (ICGC) is to generate comprehensive catalogues of genomic abnormalities in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. The Australian component of that Consortium, the Australian Pancreatic Cancer Genome Initiative (APGI), is delivering the genomic data associated with pancreatic tumour samples. For this, the APGI uses the best clinical material available, with well-characterised and accurately annotated clinico-pathological, treatment and outcome data acquired prospectively. However, discovering variations and similarities within the genomes of a given cancer collection, and comparing these with other datasets, remains a significant challenge for research biologists and clinicians. The effective re-use of datasets of international importance is limited by the ability of research biologists and clinicians to access and use computational and data infrastructure. A researcher performing such an analysis is currently expected to be conversant, at a minimum, with programming, command-line scripting, data management, high-performance computing, network-based communications and visualisation. Additionally, they must ensure that the steps in the analysis are recorded in sufficient detail for the results to be reproduced at least in-house, and ideally they would document the analysis in such a way as to allow publication of the method(s).
The Cancer Genomics Linkage Application, funded by ANDS, enables the re-use and integration of data available from public repositories, such as the ICGC variant database or the DrugBank database of drugs and drug targets, by leveraging the Genomics Virtual Lab capability on the research cloud. Researchers such as Professor Andrew Biankin and colleagues from the Garvan Institute of Medical Research are now able to access genomic datasets of international importance and to integrate them with their own clinical and genomic datasets in order to explore, discover and validate key genomic abnormalities that cause cancer, using user-friendly computational workflows. The project further provides the mechanism for such researchers to publish their analyses and make them available for re-use by the community.
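As a rough illustration of the kind of linkage described above, the following minimal Python sketch joins a list of variants to a list of drug targets by gene symbol. It is not the application's actual workflow: the record layouts, the field names (`sample`, `gene`, `change`, `drug`, `target_gene`), the placeholder values and the function `link_variants_to_drugs` are all hypothetical, and real ICGC and DrugBank exports use different and much richer schemas.

```python
from collections import defaultdict

# Hypothetical, simplified records with placeholder values only.
# Real ICGC variant data and DrugBank target data look different.
variants = [
    {"sample": "sample_1", "gene": "GENE_A", "change": "variant_1"},
    {"sample": "sample_1", "gene": "GENE_B", "change": "variant_2"},
    {"sample": "sample_2", "gene": "GENE_C", "change": "variant_3"},
]
drug_targets = [
    {"drug": "drug_x", "target_gene": "GENE_A"},
    {"drug": "drug_y", "target_gene": "GENE_A"},
    {"drug": "drug_z", "target_gene": "GENE_C"},
]


def link_variants_to_drugs(variants, drug_targets):
    """Return one record per (variant, drug) pair that shares a gene symbol."""
    # Index the drugs by the gene they target so each variant is a single lookup.
    drugs_by_gene = defaultdict(list)
    for row in drug_targets:
        drugs_by_gene[row["target_gene"]].append(row["drug"])

    linked = []
    for variant in variants:
        # Variants in genes with no known drug target are simply skipped.
        for drug in drugs_by_gene.get(variant["gene"], []):
            linked.append({**variant, "drug": drug})
    return linked


linked = link_variants_to_drugs(variants, drug_targets)
for row in linked:
    print(row["sample"], row["gene"], row["change"], row["drug"])
```

In this sketch the join key is the gene symbol alone; a real analysis would likely need to reconcile gene identifiers between sources and take the type of variant into account before drawing any conclusion about a drug.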
The project enables the in-depth interrogation of cancer genomic datasets and allows them to be compared with other genomic datasets by giving research biologists and clinicians direct access to them through the Genomics Virtual Lab (GVL). The key benefits to the Australian genomics community gained from this project are:
- Integration of analysis tools, public and private datasets, and visualisation platforms: streamlining research and reducing time from experiment to publication
- Enhanced collaboration between researchers and across the community through shared datasets, workflows and customised toolsets
- Reproducibility and research provenance: workflow engines record all aspects of an experiment, giving confidence in repeatability and allowing workflows to be published along with the resulting data. The application enables researchers to mint a Digital Object Identifier (DOI) for their workflows.