Seven Bridges Genomics supported a leading U.S. academic research lab’s project to share bioinformatics pipelines and datasets with researchers around the world. When HugeSeq connected 20 tools on the Seven Bridges Platform, internally developed pipelines moved into the cloud and raw data became more widely available.
In March 2012, Hugo Lam introduced High Throughput Genome Sequencing (HugeSeq), a pipeline of software tools. It could detect and annotate a broad range of genetic variants by connecting 20 tools into three phases of analysis: mapping, sorting, and reduction. Although the pipeline offered new benefits and was profiled in Nature Biotechnology, its sheer size and complexity limited its use.
Individual researchers also faced obstacles in setting up the right computational environment. Using HugeSeq could require rewriting a scheduling system to fit new infrastructure, and could strain that infrastructure, including memory capacity and processors. Many researchers lacked the capacity to deploy it, so Seven Bridges Genomics brought HugeSeq to the cloud, making it more accessible and scalable.
Hosting HugeSeq in the cloud enabled broader use by scientists with different computing skills and diverse backgrounds. The pipeline is standardized and open to a wider audience regardless of coding knowledge, so users no longer need to install dependencies or configure their own servers. On the Seven Bridges Platform, HugeSeq is fully assembled and ready to analyze user data. Powered by virtually limitless computing resources, it handles large input files, many samples, and simultaneous analyses.
On the Seven Bridges Platform, HugeSeq is accessible in a visual, web-based format. All parameters that can be changed on the command line may be accessed and modified via pull-down menus and other visual cues that remain standard across the Platform. These parameters are automatically saved so that, even years from now, researchers can go back to their analyses and access the exact pipelines they have run, with the same versions of tools and parameters. Not only will they see details of their pipeline, but they can re-run the exact pipeline version, even if the individual software tools have been upgraded.
All experiments are fully reproducible using the precise settings and systems deployed in the very first instance – regardless of when or where they were done.
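The idea of saving a run together with its exact tool versions and parameters can be sketched in a few lines of Python. This is a minimal illustration only, not the Seven Bridges Platform's implementation or API: the names `ToolStep`, `RunRecord`, `snapshot`, and `load`, as well as the tool names and version numbers, are hypothetical and chosen for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolStep:
    """One tool in a pipeline, pinned to the version used at run time."""
    name: str
    version: str
    params: tuple  # sorted (key, value) pairs, so ordering is stable


@dataclass(frozen=True)
class RunRecord:
    """Immutable snapshot of a pipeline run (hypothetical structure)."""
    pipeline: str
    steps: tuple

    def to_json(self) -> str:
        # sort_keys gives a deterministic serialization of the record
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Same tools, versions, and parameters always yield the same hash
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def snapshot(pipeline, steps):
    """Build a RunRecord from (name, version, params_dict) triples."""
    return RunRecord(
        pipeline,
        tuple(ToolStep(n, v, tuple(sorted(p.items()))) for n, v, p in steps),
    )


def load(text):
    """Restore a saved RunRecord so the same run can be repeated later."""
    data = json.loads(text)
    return RunRecord(
        data["pipeline"],
        tuple(
            ToolStep(s["name"], s["version"], tuple(tuple(kv) for kv in s["params"]))
            for s in data["steps"]
        ),
    )


if __name__ == "__main__":
    # Illustrative tool names and versions, not HugeSeq's actual components
    original = snapshot(
        "demo-pipeline",
        [("aligner", "1.0.0", {"threads": 4}), ("caller", "0.9.2", {"min_qual": 30})],
    )
    saved = original.to_json()

    # Later, even if "aligner" has been upgraded, the saved record still
    # names the version that was actually used.
    restored = load(saved)
    print(restored == original)
    print(restored.steps[0].version)
```

In this sketch, a re-run would read the tool versions from the restored record rather than from whatever is currently installed, which is what lets an old analysis be repeated after the tools themselves have been upgraded.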
Researchers from six Stanford labs tested HugeSeq and other pipelines on the Seven Bridges Platform for various organisms and analyses:
- Single cell RNA-Seq
- Differential expression
- Quality control
Even more conveniently, researchers can upload sequenced data directly from their core facility to the cloud, without needing to handle the data themselves.
Alternatively, a bioinformatician can set up pipelines so that researchers can run analyses themselves without constant tweaking and setup.