Analysis of high-throughput sequencing data requires access to
specialised computational infrastructure in terms of hardware, software and
data. For example, the computational process for identifying and analysing
genetic differences from the sequencing data generated by the International
Cancer Genome Consortium (ICGC) involves hundreds of
steps and requires many computational tools, some of which have multiple
versions. Access to genomic datasets of international importance and the
ability to integrate them with the researcher's own clinical and genomic datasets
are also critical for exploring, discovering and validating the key genomic
abnormalities that cause cancer. Such analyses have so far been the preserve of bioinformaticians,
but there is a critical need to put these tools in the hands of research
biologists and clinicians.
The project enables in-depth interrogation of cancer genomic
data, and comparison with other genomic data, by giving
research biologists and clinicians direct access to these data through the
NeCTAR Genomics Virtual Lab (GVL). It addresses the combined challenges faced by these
researchers by providing:
- Integrated access to multiple data sources
- Availability of standardised analytical processes
- Ability to analyse data easily and publish results more quickly
The project is composed of four interrelated components, as
depicted below in Figure 1:
- Data: a set of tools has been developed to manage data from public resources and synchronise them for use from within the GVL environment.
- Tools: a set of command-line analysis tools commonly used by cancer researchers has been deployed within the GVL environment, so that they can be used without the command line.
- Workflows: a set of workflows common to cancer research has been developed using the Galaxy workflow system within the GVL environment. These workflows are the engine that transforms and integrates multiple tools and datasets to accelerate cancer research.
- Publication: a mechanism has been developed to automatically extract required information from Galaxy in order to generate a collection description using the RIF-CS standard. A persistent identifier (DOI) can be assigned to workflows.
Figure 1: Component Overview
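The Publication component described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `workflow_to_rifcs`, the keys of the input dictionary, the group and source values and the placeholder DOI are all assumptions. Only a simplified subset of RIF-CS elements is emitted, so a real record would still need to be validated against the RIF-CS schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by RIF-CS registry objects.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"


def workflow_to_rifcs(workflow, group, source):
    """Build a simplified RIF-CS collection record from workflow metadata.

    `workflow` is a plain dict with the hypothetical keys
    "id", "name", "annotation" and, optionally, "doi".
    """
    ET.register_namespace("", RIFCS_NS)

    def tag(name):
        # Qualify an element name with the RIF-CS namespace.
        return f"{{{RIFCS_NS}}}{name}"

    root = ET.Element(tag("registryObjects"))
    obj = ET.SubElement(root, tag("registryObject"), {"group": group})
    ET.SubElement(obj, tag("key")).text = workflow["id"]
    ET.SubElement(obj, tag("originatingSource")).text = source

    coll = ET.SubElement(obj, tag("collection"), {"type": "collection"})
    if workflow.get("doi"):
        # Record the persistent identifier when one has been assigned.
        doi = ET.SubElement(coll, tag("identifier"), {"type": "doi"})
        doi.text = workflow["doi"]
    name = ET.SubElement(coll, tag("name"), {"type": "primary"})
    ET.SubElement(name, tag("namePart")).text = workflow["name"]
    desc = ET.SubElement(coll, tag("description"), {"type": "brief"})
    desc.text = workflow["annotation"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical metadata, standing in for values extracted from Galaxy.
    example = {
        "id": "example-workflow-1",
        "name": "Example variant calling workflow",
        "annotation": "Illustrative workflow description.",
        "doi": "10.0000/example",  # placeholder, not a real DOI
    }
    print(workflow_to_rifcs(example, "Example Group", "https://galaxy.example.org"))
```

Taking a plain dictionary as input keeps the sketch independent of any particular Galaxy client; in practice the metadata would be read from the Galaxy instance before being passed in.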