Analysis of high-throughput sequencing data requires access to specialised computational infrastructure in terms of hardware, software and data. For example, the computational process for identifying and analysing genetic differences in the sequencing data generated by the International Cancer Genome Consortium (ICGC) involves hundreds of steps and requires many computational tools, some of which exist in multiple versions. Access to genomic datasets of international importance, and the ability to integrate them with the researcher's own clinical and genomic datasets, are also critical in order to explore, discover and validate the key genomic abnormalities that cause cancer. Such analyses have so far been the domain of bioinformaticians, but there is a critical need to put these tools in the hands of research biologists and clinicians.
The project enables the in-depth interrogation of cancer genomic data and its comparison with other genomic data by giving research biologists and clinicians direct access to these data through the Genomics Virtual Lab (GVL) of NeCTAR. It addresses the combined challenges faced by these researchers:
- Integrated access to multiple data sources
- Availability of standardised analytical processes
- Ability to analyse easily and publish the results more quickly
The project is composed of four interrelated components as depicted below in Figure 1:
- Data: a set of tools has been developed to manage data from public resources and synchronise them for use from within the GVL environment.
- Tools: a set of command-line analysis tools commonly used by cancer researchers has been deployed within the GVL environment, so that they can be run without using the command line.
- Workflows: a set of workflows common to cancer research has been developed using the Galaxy workflow system within the GVL environment. These workflows are the engine that transforms and integrates multiple tools and datasets to accelerate cancer research.
- Publication: a mechanism has been developed to automatically extract the required information from Galaxy and generate a collection description in the RIF-CS standard. A persistent identifier (DOI) can also be assigned to workflows.
Figure 1: Component Overview
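To make the Publication component more concrete, the following is a minimal sketch of how a collection description in RIF-CS might be built from workflow metadata. It is not the project's actual mechanism: the function name `workflow_to_rifcs`, the shape of the `workflow` dictionary, the choice of RIF-CS fields and the example values (including the DOI and URL) are all assumptions made for illustration, and the metadata is supplied by hand rather than extracted from Galaxy.

```python
import xml.etree.ElementTree as ET

# Namespace of the RIF-CS registryObjects schema.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"


def workflow_to_rifcs(workflow, group, originating_source):
    """Build a small RIF-CS collection record from a metadata dictionary.

    `workflow` is a hypothetical dict with the keys "key", "name",
    "annotation" and, optionally, "doi". It is an illustrative stand-in,
    not a structure returned by Galaxy.
    """
    ET.register_namespace("", RIFCS_NS)

    def q(tag):
        # Qualify a tag name with the RIF-CS namespace.
        return "{%s}%s" % (RIFCS_NS, tag)

    root = ET.Element(q("registryObjects"))
    obj = ET.SubElement(root, q("registryObject"), {"group": group})
    ET.SubElement(obj, q("key")).text = workflow["key"]
    ET.SubElement(obj, q("originatingSource")).text = originating_source

    # Assumption: the workflow is described as a generic collection.
    coll = ET.SubElement(obj, q("collection"), {"type": "collection"})
    if workflow.get("doi"):
        ident = ET.SubElement(coll, q("identifier"), {"type": "doi"})
        ident.text = workflow["doi"]
    name = ET.SubElement(coll, q("name"), {"type": "primary"})
    ET.SubElement(name, q("namePart")).text = workflow["name"]
    desc = ET.SubElement(coll, q("description"), {"type": "brief"})
    desc.text = workflow["annotation"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; none of these come from the project.
    example = {
        "key": "example-workflow-1",
        "name": "Example variant calling workflow",
        "annotation": "Illustrative workflow description.",
        "doi": "10.0000/example",
    }
    print(workflow_to_rifcs(example, "Example Group", "https://galaxy.example.org"))
```

A real record would normally carry more fields (for example related parties, subjects and access information), and should be validated against the RIF-CS schema before being submitted to a registry.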