2022 Blogs, Blog, Featured

Relevance Lab has been collaborating with AWS Partnership teams over the past year to create a Genomics Cloud that enables Next Generation Sequencing (NGS) on demand. This is one of the dominant use cases for scientific research in the cloud, driven by healthcare and life sciences groups exploring ways to make NGS better, faster and cheaper so that researchers can focus on science and not complex infrastructure.

Relevance Lab (RL) offers RLCatalyst Research Gateway, a product that facilitates scientific research with easier access to big compute infrastructure, large data sets, powerful analytics tools and a secure research environment, along with the ability to drive self-service research with tight cost and budget controls.

To make NGS processing more frictionless, new functionality being added to RLCatalyst Research Gateway lets researchers use “Sequencing as Service” by choosing their preferred pipeline processing engine. The choices cover open source platforms like Nextflow and Cromwell as well as commercially available engines from Illumina DRAGEN and NVIDIA Parabricks. The top use cases for AWS Genomics in the Cloud implemented by this product are given below. As an out-of-the-box solution, the product saves customers significant cost and effort.

Top Use Cases

Data Transfer and Storage
The high volume of genomics data requires efficient data transfer from sequencers and storing raw data for further quality checks and mapping in a cost-effective manner. AWS enables researchers to manage large-scale data that has outpaced the capacity of on-premises infrastructure. By transferring data to the AWS cloud, organizations can take advantage of high-throughput data ingestion, cost-effective storage options, secure access, and efficient searching to propel genomics research forward.

Genomic Workflow Automation for Secondary Analysis
Genomics organizations can speed up performing secondary analyses and running reproducible and scalable workflows while minimizing IT overhead using open source solutions (Cromwell and Nextflow) or partner (NVIDIA and DRAGEN) solutions. AWS offers services for scalable, cost-effective data analysis and simplified orchestration for running and automating parallelizable workflows.

Data Aggregation
With growing samples of data and variant analysis needs on output data, there is a need to create a Genomic Data Lake for research and interpretation of results that are the foundation of precision medicine. AWS enables organizations to harmonize multi-omics datasets and govern robust data access controls and permissions across a global infrastructure to maintain data integrity as research involves more collaborators and stakeholders. AWS simplifies the ability to store, query, and analyze genomics data, and link with clinical information.

Tertiary Analysis with Interpretation and Deep Learning
Precision medicine based on genomic sequencing and pattern analysis requires integrated datasets and knowledge bases, large computational power, big data analytics, and machine learning at scale. Historically, this work could take weeks or months, delaying time to insights. AWS accelerates the analysis of big genomics data by leveraging machine learning and high-performance computing. With AWS, researchers have access to greater computing efficiencies at scale, reproducible data processing, data integration capabilities to pull in multi-modal datasets, and public data for clinical annotation, all within a compliance-ready environment.

Open Data Sets
As more life science researchers move to the cloud and develop cloud-native workflows, they bring reference datasets with them, often in their own personal buckets, leading to duplication, silos, and poor version documentation of commonly used datasets. The AWS Open Data Program (ODP) helps democratize data access by making it readily available in Amazon S3, providing the research community with a single documented source of truth. This increases study reproducibility, stimulates community collaboration, and reduces data duplication. The ODP also covers the cost of Amazon S3 storage, egress, and cross-region transfer for accepted datasets.

Cost Optimization
Usage of large-scale compute resources and large data sets for multiple job analyses can be a resource-intensive task with significant cost impacts that need proper capacity planning, tracking, and optimization. Researchers utilize massive genomics datasets that require large-scale storage options and powerful computational processing, which can be cost-prohibitive. AWS presents cost-saving opportunities for genomics researchers across the data lifecycle—from storage to interpretation. AWS infrastructure and data services enable organizations to save time, money and devote more resources to science.

Concept of Sequencing as Service
The concept of “Sequencing as Service” on Cloud is explained below.


Key Building Blocks for “Sequencing as Service” Architecture
The solution for easy use of genomics sequencing in the cloud includes the following key components, which let researchers, scientists, developers, and analysts run their experiments efficiently without deep expertise in backend computing.

Genomics Pipeline Processing Engine
The research community uses popular open-source tools like Nextflow and Cromwell to process large data sets on HPC systems, with these tools managing the orchestration layer.

Nextflow is a bioinformatics workflow manager that enables the development of portable and reproducible workflows. It supports deploying workflows on a variety of execution platforms, including local, HPC schedulers, AWS Batch, Google Cloud Life Sciences, and Kubernetes.

Cromwell is a workflow execution engine that simplifies the orchestration of computing tasks needed for genomics analysis. Cromwell enables genomics researchers, scientists, developers, and analysts to efficiently run their experiments without the need for deep expertise in backend computing capabilities.

Many organizations also use commercial tools like Illumina DRAGEN and NVIDIA Parabricks for similar solutions. These are more optimized for reducing processing time but come at a price.

Open Source Repositories for Common Genomics Workflows
The solution needs to allow researchers to leverage work done by different communities and tools to reuse existing available workflows and containers easily. Researchers can leverage any of the existing pipelines & containers or can also create their own implementations by leveraging existing standards.

GATK4 is a Genome Analysis Toolkit for Variant Discovery in high-throughput sequencing data. Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

BioContainers – A community-driven project to create and manage bioinformatics software containers.

Dockstore – A free and open-source platform for sharing reusable and scalable analytical tools and workflows. It’s developed by the Cancer Genome Collaboratory and used by the GA4GH.

nf-core Pipelines – A community effort to collect a curated set of analysis pipelines built using Nextflow.

Workflow Description Language (WDL) is a way to specify data processing workflows with a human-readable and writable syntax.

AWS Batch for High-Performance Computing
AWS has many services that can be used for genomics. In this solution, the core architecture is built on AWS Batch, a managed service that runs on top of other AWS services such as Amazon EC2 and Amazon Elastic Container Service (ECS). Security is enforced with roles in AWS Identity and Access Management (IAM), a service that helps you control who is authenticated (signed in) and authorized (has permissions) to use AWS resources.
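To make the AWS Batch building block more concrete, here is a minimal sketch of how a job request for the AWS Batch SubmitJob API could be assembled in Python. The queue name, job definition name, command, and bucket name are hypothetical placeholders, and this is not a description of how RLCatalyst Research Gateway itself submits work.

```python
def build_batch_job_request(job_name, job_queue, job_definition, command, environment=None):
    """Return keyword arguments shaped for the AWS Batch SubmitJob API."""
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {"command": list(command)},
    }
    if environment:
        # Batch expects environment variables as a list of name/value pairs.
        request["containerOverrides"]["environment"] = [
            {"name": key, "value": str(value)}
            for key, value in sorted(environment.items())
        ]
    return request


# All names below are hypothetical and exist only for illustration.
request = build_batch_job_request(
    job_name="qc-sample-001",
    job_queue="genomics-queue",
    job_definition="qc-job-definition",
    command=["fastqc", "sample_R1.fastq.gz"],
    environment={"INPUT_BUCKET": "my-sample-bucket"},
)
print(request)

# With boto3 installed and AWS credentials configured, the request could be
# sent with: boto3.client("batch").submit_job(**request)
```

Keeping the request construction separate from the API call makes it easy to inspect or log what would be submitted before any compute is launched.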

Large Data Sets Storage and Access to Open Data Sets
AWS cloud is leveraged to deal with the needs of large data sets for storage, processing, and analytics using the following key products.

Amazon S3 for high-throughput data ingestion, cost-effective storage options, secure access, and efficient searching.

AWS DataSync, a secure online service that automates and accelerates moving data between on-premises and AWS storage services.

AWS Open Data program, which houses openly available datasets, including 40+ open life sciences data repositories.

Outputs Analysis and Monitoring Tools
Genomic data analysis needs access to common tools such as the following, which are integrated into the solution.

MultiQC Reports – MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

IGV (Integrative Genomics Viewer) – A high-performance, easy-to-use, interactive tool for the visual exploration of genomic data.

RStudio for Genomics – R is one of the most widely used and powerful programming languages in bioinformatics. R especially shines where a variety of statistical tools are required (e.g. RNA-Seq, population genomics, etc.) and in the generation of publication-quality graphs and figures.

Genomics Data Lake
An AWS data lake is used to create a Genomics Data Lake for tertiary processing. Once secondary analysis generates outputs, typically in Variant Call Format (VCF), that data needs to move into a Genomics Data Lake for tertiary processing. Using standard AWS tools and solution frameworks, a Genomics Data Lake is implemented and integrated with the end-to-end sequencing processing pipeline.

The Variant Call Format (VCF) specification is used in bioinformatics for storing gene sequence variations, typically in a compressed text file. According to the VCF specification, a VCF file has meta-information lines, a header line, and data lines. Compressed VCF files are indexed for fast data retrieval (random access) of variants from a range of positions.

VCF files, though popular in bioinformatics, are a mixed file type that includes a metadata header and a more structured, table-like body. Converting VCF files into the Parquet format works well in distributed contexts like a data lake.
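The three-part VCF layout described above can be illustrated with a short Python sketch. It splits VCF text into meta-information lines, the header line, and data lines, then pivots the data into a column-oriented layout, which is the shape a columnar format like Parquet stores. The sample records are invented for illustration, and the sketch ignores details of the full specification such as compression, indexing, and per-sample genotype columns.

```python
def parse_vcf(text):
    """Split VCF text into meta-information lines, header columns, and data rows."""
    meta, columns, rows = [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):
            # Meta-information lines start with a double hash.
            meta.append(line[2:])
        elif line.startswith("#"):
            # The single header line names the tab-separated columns.
            columns = line[1:].split("\t")
        else:
            rows.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, rows


def to_columns(columns, rows):
    """Pivot row dictionaries into one list of values per column."""
    return {name: [row[name] for row in rows] for name in columns}


# Invented sample content, not taken from a real dataset.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "##source=illustrative-example",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=20",
    "chr1\t67890\t.\tC\tT\t99\tPASS\tDP=35",
])

meta, columns, rows = parse_vcf(SAMPLE_VCF)
table = to_columns(columns, rows)
print(table["POS"])  # ['12345', '67890']
```

Values stay as strings here. A real conversion would cast fields such as POS and QUAL to numeric types before writing, and a library such as pyarrow could then turn the column dictionary into a Parquet file.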

Cost Analysis of Workflows
One of the biggest concerns for users running genomic pipelines in the cloud is control over budget and cost. RLCatalyst Research Gateway addresses this by tracking spending at a granular level across projects, researchers, and workflow runs, and by allowing spend to be optimized with techniques such as Spot Instances and on-demand computing. Built-in guardrails provide appropriate controls and corrective actions. Users can run sequencing workflows in their own AWS accounts, which allows for transparent control and visibility.
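As a rough illustration of granular cost tracking with a budget guardrail, the sketch below totals hypothetical workflow run costs by project and by researcher, then flags any project that has reached a chosen fraction of its budget. The record fields, amounts, budgets, and the 80% threshold are all assumptions made for the example and do not reflect the product's actual data model.

```python
from collections import defaultdict

# Hypothetical cost records; field names and amounts are invented.
RUNS = [
    {"project": "cancer-genomics", "researcher": "alice", "run": "sarek-001", "cost": 12.50},
    {"project": "cancer-genomics", "researcher": "bob", "run": "sarek-002", "cost": 30.00},
    {"project": "rare-disease", "researcher": "alice", "run": "rnaseq-001", "cost": 8.25},
]


def total_by(records, key):
    """Sum the cost of all records that share the same value for `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


def over_threshold(totals, budgets, threshold=0.8):
    """Return names whose spend has reached `threshold` of their budget."""
    return sorted(
        name
        for name, spent in totals.items()
        if name in budgets and spent >= threshold * budgets[name]
    )


project_totals = total_by(RUNS, "project")
researcher_totals = total_by(RUNS, "researcher")
alerts = over_threshold(project_totals, {"cancer-genomics": 50.0, "rare-disease": 100.0})

print(project_totals)     # {'cancer-genomics': 42.5, 'rare-disease': 8.25}
print(researcher_totals)  # {'alice': 20.75, 'bob': 30.0}
print(alerts)             # ['cancer-genomics']
```

The same `total_by` helper works for any grouping key, so per-run or per-workspace views need no extra code.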

A typical researcher flow for using RLCatalyst Research Gateway for “Sequencing as Service” is explained in the workflow below.


Common Use Case Demonstration – Sarek
While the solution allows any public pipeline built with Workflow Description Language (WDL), Common Workflow Language (CWL), and Nextflow specifications, for this Blog, we have chosen the following popular sample.


Steps on how to use RLCatalyst Research Gateway for the Use Case

1. From the available products tab, provision an S3 product to create a bucket to hold your sample data. Once the bucket is created, use the “Explore” action to view the bucket contents. Use the “Add File” and “Add Folders” buttons to upload your input data to the bucket. From the “Product Details” tab, copy the name of the bucket created.


2. Provision a Nextflow-Advanced product in the Research Gateway. Select the nf-core/sarek pipeline in the PipelineName field by searching for “sarek”.


Use the bucket-name copied in step 1 as the InputDataLocation. Choose a Key Pair that allows you to connect to the head node or create a new one.

3. Once provisioning is complete, use the “SSH to Server” button to connect to the head node. Change directory to the sarek folder (which has the clone of the git repository selected in the pipeline name). You can now run the pipeline using the “nextflow run main.nf -profile test,docker,batch” command.


4. Use Monitor Pipeline to monitor the progress of the job. This will launch the Nextflow Tower URL in a separate browser tab.


5. View the output files using the “View Outputs” button. Download the files by clicking on the links.


6. View Project Costs


7. View Researcher Costs


8. View Workspace Costs
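For scripted runs, the command from step 3 can be assembled programmatically. The sketch below is one way to do this in Python. The helper name is hypothetical, and the defaults simply mirror the script name and profiles quoted in step 3.

```python
import shlex


def nextflow_run_command(script="main.nf", profiles=("test", "docker", "batch")):
    """Build the argument list for a `nextflow run` invocation."""
    # Nextflow accepts multiple profiles as a single comma-separated value.
    return ["nextflow", "run", script, "-profile", ",".join(profiles)]


command = nextflow_run_command()
print(shlex.join(command))  # nextflow run main.nf -profile test,docker,batch

# On a head node with Nextflow installed, the list could be executed from the
# pipeline folder with: subprocess.run(command, check=True, cwd="sarek")
```

Building the command as a list rather than a single string avoids shell quoting problems when a script path or profile name contains unusual characters.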


Summary
To make large-scale genomic sequencing in the cloud easier for institutions, principal investigators, and researchers, we provide the fundamental building blocks for “Sequencing as Service”. The integrated product covers access to large data sets, support for popular pipeline engines, access to open-source pipelines and containers, AWS HPC environments, analytics tools, and cost tracking. Together these take away the pain of managing infrastructure, data, security, and costs so that researchers can focus on science.

To learn more about how you can start genomic sequencing in the AWS cloud in 30 minutes using our solution at https://research.rlcatalyst.com, feel free to contact marketing@relevancelab.com.





2022 Blogs, Blog, Feature Blog, Featured

Relevance Lab has been an AWS partner for almost a decade now. The primary transition in 2021 was moving from a pure consulting partner to a niche technology partner of AWS based on the strengths of two new ISV Product launches with RLCatalyst Research Gateway and RLCatalyst AppInsights.


  • RLCatalyst Research Gateway drives scientific research in AWS cloud, especially for Genomic Research and Analytics
  • RLCatalyst AppInsights is built on AWS Service Catalog AppRegistry and helps achieve an “Application-Centric” view for cloud assets, costs, health, and security to achieve Governance360

Customers have been demanding a “Solutions” approach from their partners that combine the strength of Products (own + third party) and Services to provide a unique business solution that removes friction and helps deliver key value. This is only possible by unifying the strength of Products + Services to create platform-based offerings delivered with a unique playbook for driving digital transformation.

The top five trends we observed over the last year regarding customer needs for cloud adoption are the following:


  • Cloud Adoption Acceleration
    • “Cloud Only” adoption to accelerate momentum of transitioning all internal systems, applications, and services to IaaS, PaaS, and SaaS solutions with an automation-first approach
  • DevOps Automation Led Operations
    • Critical focus on AIOps to ensure digital business operations are proactively managed with best practices on operations with Site Reliability Engineering (SRE) and DevOps, leveraging ServiceNow platform
  • Frictionless Digital Workflows and Business Interactions
    • End-to-end business process integration with applications across self-developed products, PaaS platforms, and third-party SaaS solutions covering Shopify, Adobe Experience Manager, Demandware, Oracle Fusion, SOA/API Gateways, etc.
  • Cloud Data Lakes and Actionable Intelligence
    • Focus on agile business analytics with use of cloud-based data platforms leveraging Snowflake, Databricks, Azure Data Factory, AWS Data Lakes, etc., and integration with AI/ML tools with Sagemaker and RStudio
  • Security, Compliance, and Cost Management with focus on Governance360
    • Critical focus on security, governance, and cost optimization with a proactive model driven by a strong automation foundation

In this blog, we will primarily cover the strategic AWS partnership achievements of our products and solutions leveraged to help our customers use cloud “The Right Way”. The business benefits of this automation and platform led approach helped some of the key customers achieve significant outcomes, as explained below:


  • Sped up product delivery cycles by 3x by leveraging an Agile + DevOps approach for Product Engineering and Application Migrations
  • Cut cloud spending by 30% with better capacity utilization and effective cloud cost tracking at a granular level of business units, applications, customer usage patterns, and transaction cost optimization
  • Achieved 70% handling of inbound tickets by smart BOTs through Automated Service Management, using our product and RPA tools to create an Automation Factory
  • Proactive security and vulnerability management that reduced the cost of compliance and reduced outages by 30%
  • Focus on effective data management and analytics, with more real-time insight into business transactions and actionable intelligence, leading to savings in excess of $300K annually for large supply chain use cases

Leveraging AWS cloud is a foundational enabler for all Relevance Lab products and solutions. The diagram below shows a high-level overview of our AWS Ecosystem coverage.


The journey snapshot of the last 12 months is captured in the diagram below.


Relevance Lab and AWS Journey Highlights
To recap our key progress for this year, we are presenting a quick brief of the last 12 months in reverse chronological order.


  • December
    • Solid partnership with AWS APJ teams for go-to-market in the region for scientific research with RLCatalyst Research Gateway. There is a strong endorsement from AWS business teams and Solution Architects on RL solutions being a relevant offering for regional needs
    • Launch of Cloud Academy to train a new batch of people based on a platform-led model for the ability to rapidly create a large and competent workforce for cloud opportunities
    • CoE (Center of Excellence) teams pursuing new use cases in High-Performance Computing (a large and growing ecosystem) and AppStream-based training labs for education customers
    • AppInsights product on ServiceNow emerging as a brand new product conceptualized and launched in 2021 with joint efforts with AWS Control Services group
  • November
    • Relevance Lab’s focus on addressing the Digital Transformation jigsaw puzzle with RLCatalyst and SPECTRA Platforms
    • ServiceOne and RLCatalyst Intelligent Automation Reference Architecture
    • SPECTRA Reference Architecture for agile analytics applications
    • Relevance Lab Hyperautomation approach to business optimization
    • Relevance Lab Service Maturity Model
  • October
    • Taking AWS cloud & ServiceNow solutions to multiple new prospects interested in understanding our offering across cloud management, automation, DevOps, and AIOps managed services
    • Showcasing RLCatalyst Research Gateway solutions to multiple public sector institutions, non-profit research centers, and health care providers
  • September
    • Key tracks of RL Cloud CoE covering Cloud Management, Automation, DevOps and AIOps shared
    • Summary of 10 year journey for RL Company and Product lifecycle shared
    • Launch of MyResearchCloud, an easy way to enable small and mid-sized customers to use the RLCatalyst Research Gateway SaaS product using “Bring Your Own Account”
    • RLCatalyst AppInsights launched on ServiceNow Store
  • August
    • RLCatalyst Platform, Solutions and Products consolidated offering for automation published
    • Automation-First approach for Plan-Build-Run of cloud adoption detailed
    • Maturity model for BOTs design published
    • RLCatalyst Genomics Pipeline work with Nextflow started
  • June-July
    • Joint efforts on the co-development of open-source solutions for scientific research in the cloud, with emphasis on the Health Informatics and Genomic processing space using RStudio
    • RLCatalyst Research Gateway solution has been reviewed and approved for ISV Path – a special program exclusively meant for the Independent Software Vendor (ISV) capabilities
    • Listing of Relevance Lab products and professional services on AWS Marketplace, which includes RLCatalyst Research Gateway SaaS and Governance360 solution built on AWS Control Tower
    • Selection of the AppInsights ServiceNow solution as the partner-built solution for AWS AppRegistry
    • Partnership with a specialist HIPAA governance solution provider for integrations into our Governance360 solution
    • Collaboration with the AWS teams driving the recently announced AWS Management and Governance Lens (part of the AWS Well-Architected Framework prescribed offering)
  • May
    • RLCatalyst Research Gateway “test drive” by the first prospect with useful inputs to make the onboarding process much simpler and frictionless. Expectation to go from “No-Cloud” to “Full-cloud” experience for scientific researchers in less than 15 min (Uber-style)
    • Relevance Lab enters the elite AWS Service Delivery Program for niche partners for AWS Service Catalog
    • Relevance Lab SmartView, a new concept for a dynamic application CMDB built on AWS AppRegistry, getting significant appreciation and visibility from AWS Management and Governance teams
    • Ongoing co-development and collaboration with AWS Service Workbench groups to scale up RStudio on AWS Cloud with shared AWS ALB (Application Load Balancer) architecture
  • April
    • RLCatalyst Research Gateway common use cases implementation
    • “Automation-First” model for Cloud adoption elaborated
    • Common use cases for Cloud migration with focus on Application Migration
    • Original concept of SmartView Solution (later renamed AppInsights) for Application CMDB created
  • March
    • “Automation-First” approach to use AWS cloud “The Right Way” detailed
    • Common use cases for scientific research published
    • AWS ISV Partner Path program adoption initiated
  • February
    • Research@Scale Architecture Blueprint created for an integrated offering combining strengths of Relevance Lab product, solutions and services
    • Conceptualizing Governance360 Solution built with AWS Control Tower customization framework
    • Started evaluation of AWS Service Workbench with BioInformatics Blueprint, RStudio, Sagemaker
    • ServiceOne Transition Blueprint created
  • January
    • Relevance Lab RLCatalyst Research Gateway product positioning in market with focus on “Blue Ocean Strategy” shared to create a niche offering
    • RLCatalyst Research Gateway launched as a SaaS product on AWS Marketplace
    • ServiceOne team worked on the Compliance as a Code Framework involving AWS Control Tower

RLCatalyst – Our Platform through 2021


ServiceOne: Our Cloud CoE Stories in 2021


Partnership Journey through Blogs, Webinar & Videos in 2021

It was a busy year at Relevance Lab. We published a number of blogs covering our solutions powered by our partnership with AWS. The following is a collection of blogs published on our website throughout the year:

Product Related



List of product knowledge videos on our YouTube channel.

Solutions and Consulting Related



We have also published blogs listed below on APN blog network in collaboration with AWS teams.



In addition, we successfully conducted a webinar with AWS and Dash Solutions. You can watch the recording here and download the presentation pdf here.

Summary
With the start of 2022, we are very bullish about leveraging our AWS cloud products and solutions to drive Frictionless Business for our customers. There are 100,000+ AWS Partners in the ecosystem worldwide, but Relevance Lab has created a unique differentiator and positioning by leveraging the power of our IP products as a key technology provider. This complements our deep services competencies, and together they are leading to tremendous momentum on new customer solutions.

Customers are continuing to face challenges with their business and supply chains in the pandemic era, and new business models are emerging that demand a new level of Agility + Automation. At Relevance Lab, we are constantly enhancing our offerings to help our customers navigate the Digital Transformation Puzzle and provide a unique value proposition with our global workforce across regions, critical investments in our IP platforms, and constant efforts on building deep competencies across cloud, data, and digital platforms.

To learn more about our cloud products, services and solutions, feel free to contact us at marketing@relevancelab.com.


