Your address will show here +12 34 56 78
2022 Blogs, Blog, Feature Blog, Featured

Modern scientific research depends heavily on processing massive amounts of data which requires elastic, scalable, easy-to-use, and cost-effective computing resources. AWS Cloud provides such resources, but researchers still find it hard to navigate the AWS console. RLCatalyst Research Gateway simplifies access to HPC clusters using a self-service portal that takes care of all the nuts and bolts to provision an elastic cluster based on AWS ParallelCluster 3.0 within minutes. Researchers can leverage this for their scientific computing.

Relevance Lab has been collaborating with AWS Partnership teams over the last one year to simplify access to High Performance Computing across different fields like Genomics Analysis, Computational Fluid Dynamics, Molecular Biology, Earth Sciences, etc.

There is a growing need from customers to adopt the High Performance Computing capabilities in the public cloud. However this throws in key challenges related to right architecture, workload migration and cost management. Working closely with AWS HPC groups we have been enabling adoption of AWS HPC solutions with early adopters in Genomics and Fluid Dynamics with Higher Education and Healthcare customers. The primary ask is for a self-service Portal for planning, deploying and managing HPC workloads with security, cost management and automation. The figure below shows the key building blocks of HPC Architecture part of our solution.

AWS ParallelCluster 3.0
AWS ParallelCluster is an open source cluster management tool written using Python and is available via the standard python package index (PyPI). Version 3.0 also provides support for APIs and Research Gateway leverages this to integrate with the AWS Cloud to set up and use the HPC cluster for complex computational tasks. AWS ParallelCluster supports two different orchestrators, AWS Batch and Slurm, which cover a vast majority of the requirements in the field. ParallelCluster brings many benefits including easy scalability, manageability of clusters, and seamless migration to the cloud from on-premise HPC workloads.

FSx for Lustre
Amazon FSx for Lustre provides fully managed shared storage with the scalability and performance of the popular Lustre file system. This storage can be accessed with very low (sub-millisecond) latencies by the worker nodes in the HPC cluster and provides very high throughput.

NICE DCV is a high performance remote display protocol used to deliver remote desktops and application streaming from resources in the cloud to any device. Users can leverage this for their visualization requirements.

Research Gateway Provides a Self-Service Portal for AWS PCluster 3.0 Launch with Automatic Cost Tracking
Using RLCatalyst Research Gateway, research teams are organized into projects with their own catalog of self-service workspaces that researchers can provision easily with minimum knowledge of AWS cloud setup. The standard catalog, included with RLCatalyst Research Gateway, now has a new item called PCluster which a Principal Investigator can add to the project catalog to make it available to their team. This product is based on AWS ParallelCluster 3.0 which is a command line tool that advanced users can work with. Research Gateway has wrapped this tool with an intuitive user interface.

To see how you can set up an HPC cluster within minutes, check this video.

The figure below shows a standard catalog inside Research Gateway for users to provision PCluster and FSx for Lustre with ease.

Setting Up a Shared Cluster for Use in the Project
The PCluster product on Research Gateway offers a lot of flexibility. While researchers can set up and use their own clusters, sometimes there is a need to use a shared cluster across collaborators within the same project. Towards this goal, we have also brought in a feature that allows a user to “share” the cluster with the entire project team. The other users can then connect to the same cluster and submit jobs. For example a Principal Investigator might set up the cluster and share it with the researchers in the project to use for their computations.

Large Datasets Storage and Access to Open Datasets
AWS cloud is leveraged to deal with the needs of large datasets for storage, processing, and analytics using the following key products.

Amazon S3 for high-throughput data ingestion, cost-effective storage options, secure access, and efficient searching.

AWS Datasync for secure, online service that automates and accelerates moving data between on-premises and AWS storage services.

AWS Open Datasets program houses openly available, with 200+ open data repositories.

Cost Analysis of Jobs
Research Gateway injects cost allocation tags into the ParallelCluster so that all resources created are tagged and the cost of the scalable cluster can easily be monitored from the Research Gateway UI.

AWS Cloud provides services like AWS ParallelCluster and FSx for Lustre that can help users with High Performance Computing for their scientific computing needs. Research Gateway makes it easy to provision these services with a 1-Click, self-service model and provides cost and governance to help manage your budget.

To know more about how you can start your HPC needs in the AWS cloud in 30 minutes using our solution at, feel free to contact

Build Your Own Supercomputers in AWS Cloud with Ease – Research Gateway Allows Cost, Governance and Self-service with HPC and Quantum Computing
Leveraging AWS HPC for Accelerating Scientific Research on Cloud
Accelerating Genomics and High Performance Computing on AWS with Relevance Lab Research Gateway Solution


2022 Blogs, Blog, Feature Blog, Featured

As digital adoption grows, so do user expectations for always-on and reliable business services. Any downtime or service degradation can have serious impacts on the reputation of the company and its business with brutal reviews and poor customer satisfaction. The classic pursuit of DevOps helps businesses deliver new digital experiences faster, while Site Reliability Engineering (SRE) ensures the promises of better services actually stay consistent beyond the launch. Relevance Lab is helping customers take DevOps maturity to the next level with successful SRE adoptions and on the path to AIOps implementations.

Relevance Lab has been working with 50+ customers on the adoption of cloud, DevOps, and automation over the last decade. In the last few years, the interest in AIOps and SRE has grown, especially among large and complex hybrid enterprises. These companies have speeded up the journey to cloud adoption and techniques of DevOps + Automation across the products lifecycle. At the same time, there is confusion among these enterprises regarding taking their maturity to the next level of AIOps adoption with SRE.

Working closely with our existing customers and with best practices built as part of the journey, we present a framework for SRE adoption of large and complex enterprises by leveraging a unique approach from Relevance Lab. It’s built on the concept of RLCatalyst as a platform for SRE adoption that is faster, cheaper, and more consistent.

The basic questions we have heard from our customers looking at adopting SRE are the following:

  • We want to adopt the SRE best practices similar to Google, but our context of business, applications, and infrastructure is very diverse and needs to consider the legacy systems.
  • A number of applications in our organization are business applications that are very different from digital applications but need to be part of the overall SRE maturity.
  • The cloud adoption for our enterprise is a multi-year program, so we need a model that helps adopt SRE in an iterative manner.
  • The CIO landscape for global enterprises covers different continents, regions, business units (BU), countries, and products in a diverse play covering 200+ applications, and SRE needs to be a framework that is prescriptive but flexible for adoption.
  • The organizational structure for large enterprises is complex, with different specialized teams and specialist vendors helping manage operations across Infrastructure, Applications Support, and Service Delivery that was built for an era of on-premise systems but is not agile.
  • Different groups have tried a movement toward SRE adoption but lack a consistent blueprint and partner who can advise, build, and transform.
  • The reflection of lack of SRE presents on a daily basis with a long time for critical incident handling, issues tossing between groups, repetitive poor outcomes, and excessing focus on process compliance without end-user impacts.

The basic concepts covered in the blog are the following and are intended to act as a handbook for new enterprises in the adoption of the SRE framework:

  1. What is the definition and the scope of SRE for an enterprise?
  2. Is there a framework that can be used to adopt SRE for a large and complex hybrid enterprise?
  3. How can Relevance Lab help in the adoption and maturity of SRE for a new enterprise?
  4. What is unique about Relevance Lab solutions leveraging a combination of Platform + Services?

What is SRE?
SRE refers to Site Reliability Engineering. It is responsible for all the critical business services. It ensures that end customers can rely on IT for their mission-critical business services. Site Reliability Engineers (SREs) ensure the availability of these services, building the tools and automation to monitor and enable this availability. A successful SRE implementation also requires the right organizational structure along with tools and technologies.

SRE Building Blocks/Hierarchy of Reliability
Relevance Lab’s SRE Framework consists of 5 building blocks, as shown in the following image.

As shown above, the SRE building block consists of an Initial Assessment, Monitoring and Alerting Optimization, Incident handling with self-heal capability, Incident Prevention, and an end-to-end SRE dashboard.

RL SRE Framework
Relevance Lab’s SRE framework provides a unique approach of Platform + Competencies for multiple global enterprises. RL’s SRE adoption presents a unique way of solving the problems related to critical business applications availability, performance, and capacity optimization. The primary focus is on ensuring critical business services are available while all issues are proactively addressed. SRE also needs to ensure an automation-led operations model delivers better performance, quality, and reliability.

Our methodology for SRE Implementation consists of the following:

  • The initial step for any application group or family is to understand the current state of maturity. This is done by assessment checklist, and the outcome of this would decide if the application would qualify for SRE implementation. In case the application doesn’t qualify for SRE implementation, the next step would be to fix the basic requirements that need to be in place for effective SRE implementation. The same would be reassessed post putting the basic check in place.
  • Based on the assessment activity and the gaps identified, we will recommend the steps that need to be in place for an effective SRE model. The outcome of the assessment would translate into an Implementation Plan. Below are the 5 Steps required to implement SRE for an Organization:
    • Level 1: Monitoring – Focuses on 4 Golden Signals, Service Level Agreements (SLA) and Service Level Objectives (SLO)/Service Level Indicator (SLI), and Error Budgets
    • Level 2: Incident Response – Alert Management, On-Call Management, RACI, Escalation Chart, Operations Handbook.
    • Level 3: Post-Incident Review – Postmortem of Incidents, Prevention based on Root Cause Analysis
    • Level 4: Release and Capacity Management – Version Control System, Deployment using CI/CD, QA/Prod Environments, Pressure Test
    • Level 5: Reliability Platform Engineering – end-to-end SRE dashboard

Our Uniqueness
Relevance Lab’s SRE framework for any cloud or hybrid organization goes through Enablement Phase and Maturity Phase. In each phase, there are platform-related activities and application-related activities. Every application goes through a Phase 1 Enablement journey to reach stabilization and then move towards Phase 2 Maturity.

Phase 1 – Enablement is a basic SRE model that helps enterprise reach a basic level of SRE implementation, and this covers the first 3 stages of the Relevance Lab SRE Framework. This will include the implementation of new tools, processes, and platforms. At the end of this phase, a clear definition of the golden signals, SLI/SLOs against SLA, and Error Budgets are defined, monitored, and tracked. The refined runbooks and operating guides help in the proactive identification of Incidents and faster recovery due to on-call management. Activities like Post Incident Review, Pressure Tests, Load testing, etc., help stabilize the application and the infrastructure. As part of this phase, an SRE 1.0 dashboard is available as an output to monitor the SRE metrics.

Phase 2 – Maturity is an advanced SRE model which covers the last two stages of the Relevance Lab SRE Framework. It emphasizes on automation-first approach for an end-to-end lifecycle management, and includes advanced release management, auto-remediations for Incident management, security, and capacity management. This will be an ongoing maturity phase to bring in additional applications and BUs under the scope of the SRE model. The output of this phase will be an automated SRE 2.0 dashboard, which will have intelligence-based actionable insights & prevention.

Relevance Lab (RL) has worked with multiple large companies on the “Right Way” to adopt Cloud and SRE maturity models. We realize that each large enterprise has a different context-culture-constraint model covering organization structures, team skills/maturity, technology, and processes. Hence the right model for any organization will have to be created as a collaborative model, where RL will act as an advisor to Plan, Build and Run the SRE model based on the framework (RLCatalyst) they have created.

For more information on RL’s SRE framework and maturity model or for its implementation, feel free to contact


2022 Blogs, Blog, Feature Blog, Featured

Our goal, at Relevance Lab (RL), is to make scientific research in the cloud ridiculously simple for researchers and principal investigators. Cloud is driving major advancements in both Healthcare and Higher Education sectors. Rapidly being adopted by various organizations across these sectors in both commercial and public sector segments, research on the cloud is improving day-to-day lives with drug discoveries, healthcare breakthroughs, innovation of sustainable solutions, development of smart and safe cities, etc.

Powering these innovations, AWS cloud provides an infrastructure with more accessible and useful research-specific products that speed time to insights. Customers get more secure and frictionless collaboration capabilities across large datasets. However, setting up and getting started with complex research workloads can be time-taking. Researchers often look for simple and efficient ways to run their workloads.

RL addresses this issue with Research Gateway, a self-service cloud portal that allows customers to run secure and scalable research on the AWS cloud without any heavy-lifting of set-ups. In this blog, we will explore different use cases that simplify their workloads and accelerate their outcomes with Research Gateway. We will also elaborate on two specific use cases from the healthcare and higher education sector for the adoption of Research Gateway Software as a Service (SaaS) model.

Who Needs Scientific Research in the Cloud?
The entire scientific community is trying to speed up research for better human lives. While scientists want to focus on “science” and not “infrastructure”, it is not always easy to have a collaborative, secure, self-service, cost-effective, and on-demand research environment. While most customers have traditionally used on-premise infrastructure for research, there is always a key constraint on scaling up with limited resources. Following are some common challenges we have heard our customers say:

  • We have tremendous growth of data for research and are not able to manage with existing on-premise storage.
  • Our ability to start new research programs despite securing grants is severely limited by a lack of scale with existing setups.
  • We have tried the cloud but especially with High Performance Computing (HPC) systems are not confident about total spends and budget controls to adopt the cloud.
  • We have ordered additional servers, but for months, we have been waiting for the hardware to be delivered.
  • We can easily try new cloud accounts but bringing together Large Datasets, Big Compute, Analytics Tools, and Orchestration workflows is a complex effort.
  • We have built on-premise systems for research with Slurm, Singularity Containers, Cromwell/Netflow, custom pipelines and do not have the bandwidth to migrate to the cloud with updated tools and architecture.
  • We want to provide researchers the ability to have their ephemeral research tools and environments with budget controls but do not know how to leverage the cloud.
  • We are scaling up online classrooms and training labs for a large set of students but do not know how to build secure and cost-effective self-service environments like on-premise training labs.
  • We are requiring a data portal for sharing research data across multiple institutions with the right governance and controls on the cloud.
  • We need an ability to run Genomics Secondary Analysis for multiple domains like Bacterial research and Precision Medicines at scale with cost-effective per sample runs without worrying about tools, infrastructure, software, and ongoing support.

Keeping the above common needs in perspective, Research Gateway is solving the problems for the following key customer segments:

  • Education Universities
  • Healthcare Providers
    • Hospitals and Academic Medical Centers for Genomics Research
  • Drug Discovery Companies
  • Not-for-Profit Companies
    • Primarily across health, education, and policy research
  • Public Sector Companies
    • Looking into Food Safety, National Supercomputing centers, etc.

The primary solutions these customers are seeking from Research Gateway have been mentioned below:

  1. Analytics Workbench with tools like RStudio and Sagemaker
  2. Bioinformatics Containers and Tools from the standard catalog and bring your own tools
  3. Genomics Secondary Analysis in Cloud with 1-Click models using open source orchestration engines like Nextflow, Cromwell and specialized tools like DRAGEN, Parabricks, and Sentieon
  4. Virtual Training Labs in Cloud
  5. High Performance Computing Infrastructure with specialized tools and large datasets
  6. Research and Collaboration Portal
  7. Training and Learning Quantum Computing

The figure below shows the customer segments and their top use cases.

How Research Gateway is Powering Frictionless Outcomes?
Research Gateway allows researchers to conduct just-in-time research with 1-click access to research-specific products, provision pipelines in a few steps, and take control of the budget. This helps in the acceleration of discoveries and enables a modern study environment with projects and virtual classrooms.

Case Study 1: Accelerating Virtual Cloud Labs for the Bioinformatics Department of Singapore-based Higher Education University
During interaction with the university, the following needs were highlighted to the RL team by the university’s bioinformatics department:

Classroom Needs: Primary use case to enable Student Classrooms and Groups for learning Analytics, Genomics Workloads, and Docker-based tools

Research Needs: Used by a small group of researchers pursuing higher degrees in Bioinformatics space

Addressing the Virtual Classroom and Research Needs with Research Gateway
The SaaS model of Research Gateway is used with a hub-and-spoke architecture that allows customers to configure their own AWS accounts for projects to control users, products, and budgets seamlessly.

The primary solution includes:

  • Professors set up classrooms and assign students for projects based on semester needs
  • Usage of basic tools like RStudio, EC2 with Docker, MySQL, Sagemaker
  • Special ask of forwarding and connecting port to shared data on cloud-based for local RStudio IDE was also successfully put to use
  • End-of-day automated reports to students and professors on server “still running” for cost optimization
  • Ability to create multiple projects in a single AWS Account + Region for flexibility
  • Ability to assign and enforce student-level budget controls to avoid overspending

Case Study 2: Driving Genomics Processing for Cancer Research of an Australian Academic Medical Center
While the existing research infrastructure is for on-premise setup due to security and privacy needs, the team is facing serious challenges with growing data and the influx of new genomics samples to be processed at scale. A team of researchers is taking the lead in evaluating AWS Cloud to solve the issues related to scale and drive faster research in the cloud with in-build security and governance guardrails.

Addressing Genomic Research Cloud Needs with Research Gateway
RL addressed the genomics workload migration needs of the hospital with the Research Gateway SaaS model using the hub-and-spoke architecture that allows the customer to have exclusive access to their data and research infrastructure by bringing their one AWS account. Also, the deployment of the software is in the Sydney region, complying with in-country data norms as per governance standards. Users can easily configure AWS accounts for genomics workload projects. They also get access to genomic research-related products in 1-click along with seamless budget tracking and pausing.

The following primary solution patterns were delivered:

  • Migration of existing HPC system using Slurm Workload Manager and Singularity Containers
  • Using Cromwell for Large Scale Genomic Samples Processing
  • Using complex pipelines with a mix of custom and public WDL pipelines like RNA-Seq
  • Large Sample and Reference Datasets
  • AWS Batch HPC leveraged for cost-effective and scalable computing
  • Specific Data and Security needs met with country-level data safeguards & compliance
  • Large set of custom tools and packages

The workload currently operates in an HPC environment on-premise, using slurm as the orchestrator and singularity containers. This involves converting singularity containers to docker containers so that they can be used with AWS Batch. The pipelines are based on Cromwell, which is one of the leading workflow orchestrator software available from the Broad Institute. The following picture shows the existing on-premise system and contrasts that with the target cloud-based system.

Relevance Lab, in partnership with AWS cloud, is driving frictionless outcomes by enabling secure and scalable research leveraging Research Gateway for various use-cases. By simplifying the setting up and running research workloads in a seamless manner in just 30 minutes with self-service access and cost control, the solution enables the creation of internal virtual labs and acceleration of complex genomic workloads.

To know more about virtual Cloud Analytics training labs and launching Genomics Research in less than 30 minutes explore the solution at or feel free to write to

Accelerating Genomics and High Performance Computing on AWS with Relevance Lab Research Gateway Solution
Enabling Frictionless Scientific Research in the Cloud with a 30 Minutes Countdown Now!