
Software architecture provides a high-level overview of what a software system looks like. At the very minimum, it shows the various logical pieces of the overall solution and the interaction between those pieces. (See C4 Model for architecture diagramming). The software architecture is like a map of the terrain for anybody who must deal with the system. Contrary to what many might think, software architecture is important even for non-engineering functions like sales, as many customers like to review the architecture to see how well it fits within their enterprise and whether it could introduce future issues by its adoption.


Goals of the Architecture
It is important to determine the goals for the system when deciding on the architecture. This should include both short-term and long-term goals.

Some of our important goals for RLCatalyst Research Gateway are:
1. Ease of Use
The basic question in our mind is always “How would customers like to use this system?”. Our product is targeted to researchers and academics who want to use the scalability and elasticity of the AWS cloud for ad-hoc and high-performance computing needs. These users are not experts at using the AWS console. So, we made things extremely simple for the user. Researchers can order products with a single click, and the portal sets up their resources without the user needing to understand any of the underlying complexities. Users can also interact with the products through the portal, eliminating the need to set up anything outside the portal (though they always have that option).

We also kept in mind the administrators of the system, for whom this might be just one among many systems they must manage. We therefore made it easy for administrators to add AWS accounts, create Organizational Units, and integrate Identity Providers. Our goal was for administrators to get the system up and running in less than 30 minutes.

2. Scalability, Performance, and Reliability
We followed the best practices recommended by AWS, and where possible, used standardized architecture models so that users would find it easy as well as familiar. For example, we deploy our system into a VPC with public and private subnets. The subnets are spread across multiple Availability Zones to guard against the possibility of one availability zone going down. The computing instances are deployed in the private subnet to prevent unauthorized access. We also use auto-scaling groups for the system to be able to pull in additional compute instances when the load is higher.
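
As an illustration of this layout, here is a minimal boto3 sketch (names and subnet IDs are placeholders, not values from the actual product) of an Auto Scaling group whose instances live in private subnets spread across Availability Zones:

```python
# A minimal sketch of the multi-AZ pattern described above: an Auto Scaling
# group whose instances live in private subnets across Availability Zones.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="research-gateway-app",            # hypothetical name
    LaunchTemplate={"LaunchTemplateName": "app-node", "Version": "$Latest"},
    MinSize=2,                      # at least one instance per AZ
    MaxSize=6,                      # room to scale out under load
    # One private subnet per Availability Zone guards against an AZ outage
    VPCZoneIdentifier="subnet-0aaa1111,subnet-0bbb2222",
    HealthCheckType="ELB",          # replace instances the load balancer marks unhealthy
    HealthCheckGracePeriod=300,
)
```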

3. Time to Market
One of our main goals was to bring the product to market quickly and put it in front of customers to gain early and valuable feedback. Developing the product as a partner of AWS was a great help, since we were able to use many AWS services for common application needs without spending time developing our own components for well-known use cases. For example, RLCatalyst Research Gateway does its user management via AWS Cognito, which provides the facility to create users, roles, and groups as well as the ability to interface with other Identity Provider systems.

Similarly, we use Amazon DocumentDB (with MongoDB API compatibility) as our database. This allows developers to use a local MongoDB instance, while QA and production systems use Amazon DocumentDB with the high availability of multi-AZ clusters and automated backups via AWS Backup and snapshots.
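
A minimal sketch of what this dev/prod split can look like in driver code (hostnames, credentials, and database names here are illustrative, not from the actual product):

```python
# The same driver code talks to a local MongoDB in development and to
# Amazon DocumentDB elsewhere.
import os
from pymongo import MongoClient

if os.environ.get("ENV") == "development":
    client = MongoClient("mongodb://localhost:27017")
else:
    client = MongoClient(
        "mongodb://app_user:secret@docdb-cluster.cluster-xyz.us-east-1"
        ".docdb.amazonaws.com:27017",
        tls=True,
        tlsCAFile="global-bundle.pem",   # Amazon-provided CA bundle
        replicaSet="rs0",
        readPreference="secondaryPreferred",
        retryWrites=False,               # DocumentDB does not support retryable writes
    )

projects = client["research_gateway"]["projects"]  # identical API either way
```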

4. Cost efficiency
This is one of the key concerns for every administrator. RLCatalyst Research Gateway uses a scalable architecture that not only lets the system scale up when load is high but also scales down when load is low, to optimize cost. We deploy our solution on EKS clusters and use Amazon DocumentDB clusters, which lets us choose sizes and instance types according to cost considerations.

We have also brought in features like the automatic shutdown of resources, so that idle compute instances that are not running any jobs shut down after 15 minutes of idle time. Additionally, even resources like ALBs are de-provisioned when the last compute instance behind them is de-provisioned.
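
A simplified sketch of how such an idle-shutdown check could be implemented with boto3 (the 15-minute window matches the feature described above; using CloudWatch CPU as the idleness signal, and the CPU threshold, are assumptions):

```python
# Stop any running instance whose average CPU stayed below a floor for the
# last 15 minutes.
from datetime import datetime, timedelta
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

def stop_idle_instances(idle_minutes: int = 15, cpu_floor: float = 2.0) -> None:
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            datapoints = cloudwatch.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
                StartTime=datetime.utcnow() - timedelta(minutes=idle_minutes),
                EndTime=datetime.utcnow(),
                Period=300,
                Statistics=["Average"],
            )["Datapoints"]
            # Idle: we have samples and every 5-minute average is below the floor
            if datapoints and all(p["Average"] < cpu_floor for p in datapoints):
                ec2.stop_instances(InstanceIds=[instance["InstanceId"]])
```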

We provide a robust cost governance dashboard, allowing users insights into their usage and budget consumption.

5. Security
Our target customers are in the research and scientific computing area, where data security is a key concern. We are frequently asked, “Will the system be secure? Can it help me meet regulatory requirements and compliance?”. The RLCatalyst Research Gateway architecture is developed with security in mind at each level. The use of SSL certificates, encryption of data at rest, and the ability to initiate actions remotely in a controlled manner are some of the architecture considerations.

Map of AWS Services

AWS Service | Purpose | Benefits
Amazon EC2, Auto Scaling | Elastic compute | Easily managed compute resources without the need to manage hardware. Integrates well with Infrastructure as Code (IaC).
Amazon Virtual Private Cloud (VPC) | Networking | Isolation of resources, easy management of traffic, isolation of traffic.
Application Load Balancer, AWS Certificate Manager | Load balancer, secure endpoint | An easy way to provide a single endpoint that routes traffic to multiple target groups. Integrates with AWS Certificate Manager for SSL support.
AWS Cost Explorer, AWS Budgets | Cost and governance | Fine-grained cost and usage data. Notifications when budget thresholds are reached.
AWS Service Catalog | Catalog of approved IT services on AWS | Control over which resources can be used in an AWS account.
AWS WAF (Web Application Firewall) | Application firewall | Helps manage malicious traffic.
Amazon Route 53 | DNS (Domain Name System) services | Hosted zones and API access to manage them.
Amazon CloudFront | CDN (Content Delivery Network) | Caches content close to end users to reduce latency and improve customer experience.
Amazon Cognito | User management | Authentication and authorization.
AWS Identity and Access Management (IAM) | Identity management | Granular control based on policies and roles.
Amazon DocumentDB | NoSQL database | MongoDB-compatible API.


Validation of the Solution
It is always good to validate your solution with an external review from the experts. AWS offers such an opportunity to all its partners by way of the AWS Foundational Technical Review. The review is valid for two years and is free of cost to partners. Looking at our design through the FTR Lens enabled us to see where our design could get better in terms of using the best practices (especially in the areas of security and cost-efficiency). Once these changes were implemented, we earned the “Reviewed by AWS” badge.

Summary
Relevance Lab developed the RLCatalyst Research Gateway in close partnership with AWS. One of the excellent tools available from AWS for any software architecture team is the AWS Well-Architected Framework, with its five pillars of Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization. Working within this framework greatly facilitates the development of a robust architecture that serves not only current but also future goals.

To know more about RLCatalyst Research Gateway architecture, feel free to write to marketing@relevancelab.com.

References
How to speed up the GEOS-Chem Earth Science Research using AWS Cloud?
Driving Frictionless Scientific Research on AWS Cloud
Leveraging AWS HPC for Accelerating Scientific Research on Cloud
Health Informatics and Genomics on AWS with RLCatalyst Research Gateway
Enabling Researchers with Next-Generation Sequencing (NGS) Leveraging Nextflow and AWS
8-Steps to Set-Up RLCatalyst Research Gateway




The CMS-1500 form is vital to the smooth functioning of the American health insurance system, and yet processing these manually filled-up, paper-based forms can be a nightmare. Relevance Lab has developed a new approach to automating claims processing that improves output quality and delivers cost savings.

CMS-1500 Processing Challenges
The CMS-1500 form, formerly known as the HCFA-1500 form, is the standard claim form used by a non-institutional provider or supplier to bill Medicare carriers under certain circumstances. Processing these important documents presents several challenges:


  • Large volume of information: A single form has 33 fields (plus sub-fields) to be filled in manually; multi-page claims are required if more than six services are provided.
  • Illegible handwriting: Since the forms are filled in manually (and often in a hurry), illegible or difficult-to-read entries are quite common.
  • Incomplete or inconsistent information: Fields are often missing or inconsistent (e.g., multiple spellings of the same name), complicating the task.
  • Poor scan quality: Scan quality can be poor due to misorientation of the form, folds, etc., making it difficult to recognize text.

Most users face an unappealing choice between costly manual processing and low-accuracy conventional automation, neither of which produces acceptable results. Manual processing is slow, laborious, and fatigue-prone, and its costs grow linearly with claim volumes, whether the manpower is in-house or outsourced. Conventional automation based on simple scanning and optical character recognition (OCR) struggles with such non-standardized data, leading to high error rates.

Relevance Lab has developed a solution to address these issues and make CMS-1500 claims processing simpler and more cost-efficient without compromising on accuracy.

Relevance Lab’s Smart Automation Solution
Our solution enhances the effectiveness of automation by utilizing artificial intelligence and machine learning techniques. At the same time, the workflow design ensures that the final sign-off on form validation is provided by a human.
An illustrative solution architecture is given below:


Salient features of the solution are as follows:


  • Best-in-class OCR: The solution uses the Tesseract open-source OCR engine (supported by Google), which delivers a high degree of character-recognition accuracy, to convert the scanned document into a searchable PDF (a minimal sketch of this step follows the list).
  • Processing and validation against master data: The document is analyzed using RL’s proprietary SPECTRA platform. Common errors (such as misaligned check-box entries) are corrected, and relevant fields are validated against master data to catch anomalies (e.g., spelling errors).
  • Assisted human review: The updated document is presented for human review. Fields that require attention are highlighted, together with algorithm-generated suggestions for possible corrections.
  • Automatic update of downstream data: Once approved, downstream systems are automatically updated with the validated data.
  • Self-learning: The iterative self-learning algorithm improves with every validation cycle, resulting in continuous refinement of accuracy. This improvement can be tracked over time through built-in trackers.
  • Workflow tracking: Built-in dashboards enable tracking a document’s progress through the cycle.
  • Role-based access: Role-based access to different modules of the system can be enabled to ensure data governance.
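
To make the OCR step concrete, here is a minimal sketch of converting a scanned form into a searchable PDF with Tesseract via the pytesseract wrapper (file names are illustrative; the downstream SPECTRA validation is proprietary and not shown):

```python
# Convert a scanned CMS-1500 form image into a searchable PDF with Tesseract.
import pytesseract
from PIL import Image

scan = Image.open("cms1500_scan.png")  # hypothetical input file

# Tesseract emits a searchable PDF: the original image with an invisible,
# selectable text layer on top.
pdf_bytes = pytesseract.image_to_pdf_or_hocr(scan, extension="pdf")
with open("cms1500_searchable.pdf", "wb") as f:
    f.write(pdf_bytes)

# Plain-text extraction, e.g. as input to field validation against master data
text = pytesseract.image_to_string(scan)
```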


The following diagram presents a typical process flow incorporating our solution:


Demonstrated Benefits
Our CMS-1500 processing solution delivers significant time and cost savings, and quality improvements. It frees up teams from tedious tasks like data entry, without compromising human supervision and control over the process. The solution is scalable, keeping costs low even when processing volumes increase manifold.

The solution is tried, tested, and proven to deliver substantial value. For example, in a recent implementation at a renowned healthcare provider, the Relevance Lab SPECTRA solution was able to reduce the claims processing time by over 90%. Instead of a dedicated team working daily to process claim forms, manual intervention is now required only once a week for review and approvals. The resources freed up are now more productively utilized. This has also led to an increase in accuracy through the elimination of “human errors” such as typos.

Powered by the RL SPECTRA analytics platform, the solution has successfully delivered productivity gains to multiple clients by efficiently ingesting and processing structured and unstructured data. The plug-and-play platform is easy to integrate with most common system environments and applications.

Conclusion
CMS-1500 claims processing can be significantly optimized by using Relevance Lab’s intelligent solution based on its SPECTRA platform that combines the speed and scalability of automation with the judgment of a human reviewer to deliver substantial productivity gains and cost savings to organizations.

For more details, please feel free to reach out to marketing@relevancelab.com.




While helping our customers use the cloud the right way with an Automation-First approach, Relevance Lab’s primary focus is to enable significant automation (70%+ achieved for large customers) of day-to-day tasks, with benefits in speed of delivery, quality improvement, and cost reduction. Large customers have complex organizational structures, with different groups focusing on infrastructure automation, application deployment automation, and service delivery automation. In many cases, a common architecture for planning, building, and running a proper end-to-end automation program is missing. To help enterprises adopt an Automation-First approach to cloud adoption covering all three aspects of infrastructure, applications, and service delivery, we help create a blueprint for an Automation Factory.

In this blog, we are sharing our approach for large customers with a complex landscape of infrastructure and applications. The focus of this blog is more on application deployment automation with custom and COTS (commercial off-the-shelf) products in Cloud.

Some of the most typical asks from customers with all their workloads in the AWS cloud are captured below:


  • Separation of roles between common infrastructure teams and multiple business units managing their own application needs
  • Infrastructure teams provide base AMI with CloudFormation stacks to provide basic OS-level compute workloads to application groups, who manage their own deployments
  • Application groups deal with a set of custom Java + .NET applications and COTS products, including Oracle Fusion Middleware stacks
  • Application groups manage the complete lifecycle of deployment and support in production environments
  • Application deployments are about 20% containerized and 80% direct installations in hybrid scenarios with legacy codebases
  • Different sets of tools are used, along with homegrown custom scripts
  • Primary pain points are to automate application and product (COTS) build and deploy lifecycle across different environments and upgrades
  • The solution is expected to leverage DevOps maturity and automation-led standardization for speed and flexibility
  • Need guidance on the choice of Automation Factory model between mutable vs. immutable designs

Key requirements from application groups are shared below based on the snapshot of products for which there is a need for automated installation and scalability at run-time. The shift needs to happen from “handcrafting” product installations to automated and easy deployment, preferably with immutable infrastructure.


Standard Products | COTS Products (High Priority) | COTS Products (Good to Have)
WebLogic | Oracle E-Business Suite (Financial Portal) | Cisco UCM
Tomcat 7, 8, & 9 | OBIEE | Kofax
Apache | Oracle Discoverer | IBM Business Rules Engine
IIS 10 | Oracle Siebel CRM | Aspect
Oracle 19 | Microsoft SQL Server Reporting Services | Avaya
SQL Server | Oracle Fusion | AS/400 Apps
MS SQL | SAP Enterprise | Adobe AEM


Relevance Lab Approach for Hyperautomation with RLCatalyst and BOTs
Our teams have delivered 50+ engagements across customers and created a mature automation framework that helps reuse and speed up the build-out of an Automation Factory using RLCatalyst BOTs and RLCatalyst Cloud Portals.

The figure below explains the RLCatalyst solutions for hyperautomation leveraging the Automation Service Bus (ASB) framework that allows easy integration with existing customer tools and cloud environments.


The key building block of this automation is the concept of BOTs. So what are BOTs?


  • BOTs are automation codes managed by Automation Service Bus orchestration
    • Infrastructure creation, update, and deletion
    • Application deployment lifecycle
    • Operational services, tasks, and workflows – Check, Act, Sensors
    • Interacting with Cloud and On-prem systems with integration adapters in a secure and auditable manner
    • Targeting any repetitive Operations tasks managed by humans – frequently, complex (time-consuming), security/compliance related
  • What are types of BOTs?
    • Templates – CloudFormation, Terraform, Azure Resource Models, Service Catalog
    • Lambda functions, Scripts (PowerShell/python/shell scripts)
    • Chef/Puppet/Ansible configuration tools – Playbooks, Cookbooks, etc.
    • API Functions (local and remote invocation capability)
    • Workflows and state management
    • UIBOTs (with UiPath, etc.) and un-assisted non-UI BOTs
    • Custom orchestration layer with integration to Self-Service Portals and API Invocation
    • Governance BOTs with guardrails – preventive and corrective
  • What do BOTs have?
    • Infra as a code stored in source code configuration (GitHub, etc.)
    • Separation of Logic and Data (see the sketch after this list)
    • Managed Lifecycle (BOTs Manager and BOTs Executors) for lifecycle support and error handling
    • Intelligent Orchestration – Task, workflow, decisioning, AI/ML
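
To make the “separation of logic and data” idea concrete, below is a hypothetical minimal BOT: the logic provisions a CloudFormation stack, while the per-request data arrives as a separate JSON document supplied by the orchestration layer (names and file paths are illustrative, not the RLCatalyst implementation):

```python
# A minimal "BOT" sketch: reusable provisioning logic, with request data
# kept outside the code.
import json
import boto3

def run_bot(template_url: str, stack_name: str, parameters: dict) -> str:
    """Provision a stack and return its ID; errors propagate to the BOT
    manager, which owns retries and error handling."""
    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=[{"ParameterKey": k, "ParameterValue": v}
                    for k, v in parameters.items()],
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]

if __name__ == "__main__":
    with open("bot_input.json") as f:   # the data, kept outside the logic
        request = json.load(f)
    print(run_bot(request["template_url"], request["stack_name"],
                  request["parameters"]))
```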

Proposed Solution to Customers
There are different approaches to achieving end-to-end automation, and the right solution depends on a proper assessment of the context of customer needs. Relevance Lab follows a consultative approach that helps do a proper assessment of customer needs, priorities, and business goals to create the right foundation and suggest a maturity model for an Automation Factory. Also, different engagement models are offered to customers covering the entire phase of the Plan-Build-Run lifecycle of automation initiatives, including organization design and change management.

The following table helps plan the right approach and maturity model to be adopted for BOTs targeting different levels of complexity for automation.


BOT Complexity | Functionality | Coverage | Leveraging Relevance Lab Products and Solutions
Level-1 | Standard cloud resource provisioning in a secure, multi-account setup covering compute, storage, and data | EC2 Linux, EC2 Windows, S3 buckets, RDS, SageMaker, ALB, EMR, VPC, etc. with AWS Service Catalog | AWS Console and ITSM portals; RLCatalyst Cloud Portal and BOTs Server; CI/CD pipelines with BOTs APIs
Level-2 | Standard application deployments covering middleware, databases, and open-source applications requiring a single-node setup. Single-node COTS setups can also be included, though they are more complex | Tomcat, Apache, MySQL, NGINX – common middleware and database stacks | Portal, CI/CD pipeline, and CLI variants: Option 1 – AMI-based (preferred model for immutable design); Option 2 – Docker-based; Option 3 – Chef/Ansible post-provision install and configure (mutable design). BUILD phase covers the Plan, Build, Test, Publish lifecycle; CONSUME phase covers the Production Deploy and Upgrade lifecycle
Level-3 | Multi-tier applications – 2-tier, 3-tier, N-tier with Web + App + DB combinations | Requires a combination of infrastructure, applications, post-provision configurations, and orchestration. Complex infrastructure with ALB and PaaS service integrations | Orchestration engine and service discovery/registry; Docker and Kubernetes clusters
Level-4 | Complex business apps – ERP, Oracle EBS, COTS, HPC clusters – not supported by standard catalog items. Complex workflows with integration to multiple third-party systems | UI- or system-driven custom orchestration flows and workflow modules; event-driven and state management; post-provisioning complex integrations | Pre-built BOTs library

Leveraging a combination of Relevance Lab products and solutions, we provide a mature Automation Factory blueprint to our customers, as shown below.


The above solution is built leveraging best practices from the AWS Well-Architected Framework and brings in a combination of AWS tools and third-party solutions like HashiCorp, Ansible, Docker, Kubernetes, etc. The key building blocks of the Automation Factory cover the following tools and concepts:


  • AWS AMI Builder Factory and the Golden AMI concept (see the sketch after this list)
  • HashiCorp Packer Scripts
  • OS and Hardening with Ansible
  • Vulnerability Assessment and Patch Management
  • AWS Inspector, AWS Parameter Store, AMI Catalog publishing, Multi-Account AWS Best Practices
  • AWS Service Catalog, Multi-Account Governance, Master and Consumption accounts
  • Self-Service Cloud Portals with guardrails and automated fulfillment
  • CI/CD Pipelines for non-user assisted workflows using RLCatalyst BOTs, Terraform Templates, Jenkins, Docker, and Kubernetes
  • Monitoring and diagnostics with Observability tools like RLCatalyst Command Center
  • Ongoing Governance, Cost Management, Lifecycle Management, Blue-Green Deployments, and Container Management
  • Cloud Accounts, VPC Automation, AWS Control Tower, AWS Management, and Governance Lens Automation
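
As an illustration of the Golden AMI concept, here is a hedged boto3 sketch of the final step of an AMI factory: baking an image from a hardened build instance and sharing it with consumption accounts (function, instance, and account identifiers are assumptions):

```python
# Bake an AMI from a hardened build instance, then grant launch permission
# to each consumption account (multi-account model).
import boto3

ec2 = boto3.client("ec2")

def publish_golden_ami(build_instance_id: str, name: str, account_ids: list) -> str:
    image = ec2.create_image(
        InstanceId=build_instance_id,
        Name=name,
        Description="Hardened, patched base image",
    )
    ami_id = image["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[ami_id])
    # Share the image with each consumption account
    ec2.modify_image_attribute(
        ImageId=ami_id,
        Attribute="launchPermission",
        LaunchPermission={"Add": [{"UserId": a} for a in account_ids]},
    )
    return ami_id
```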

Summary
The journey to adopting an Automation-First approach requires a strong foundation, which our Automation Factory solution offers, saving at least six months of in-house effort and about US$250K annually for large customers. The BOTs deployed can scale up to provide productivity gains equivalent to 4-5 full-time employees, with additional benefits in fulfillment SLAs, quality, and compliance. In the case of COTS deployments, especially with Oracle stacks, our BOTs have reduced deployment times from a few weeks to a few hours.

To know more about how we can help your automation journey, feel free to contact marketing@relevancelab.com.

Reference Links
Considerations for AWS AMI Factory Design
AWS Well-Architected Best Practices
ASB – A New Approach for Intelligent and Touchless Automation




We aim to enable the next-generation cloud-based platform for collaborative research on AWS, with frictionless access to research tools, data sets, processing pipelines, and an analytics workbench. It takes less than 30 minutes to launch a “MyResearchCloud” working environment for Principal Investigators and researchers, with security, scalability, and cost governance built in. The Software as a Service (SaaS) model is a preferable option for scientific research in the cloud, with tight control on data security, privacy, and regulatory compliance.

Below are the top five use cases where we have found MyResearchCloud to be a suitable solution for unlocking scientific research needs:

  • Need an RStudio solution on AWS Cloud with the ability to connect securely (using SSL) without having to worry about managing custom certificates and their lifecycle
  • Genomic pipeline processing using the Nextflow and Nextflow Tower (open-source) solution integrated with AWS Batch for easy deployment of open-source pipelines and associated cost tracking per researcher and per pipeline
  • Enable researchers with EC2 Linux and Windows servers to install their specific research tools and software, with the ability to add AMI-based researcher tools (both private and from the AWS Marketplace) with one click on MyResearchCloud
  • Use the SageMaker AI/ML workbench to drive data research (like COVID-19 impact analysis) with public data sets already available on the AWS cloud and create study-specific data sets
  • Enable a small group of Principal Investigators and researchers to manage research grant programs with tight budget control, self-service provisioning, and research data sharing

MyResearchCloud is a solution powered by the RLCatalyst Research Gateway product; it provides the basic environment with access to data, workspaces, an analytics workbench, and cloud pipelines, as explained in the figure below.


Currently, it is not easy for research institutes, their IT staff, and groups of principal investigators and researchers to leverage the cloud for scientific research. While these institutions face constraints with on-premise data centers and do have access to cloud accounts, converting a basic account into one with a secured network, secured access, the ability to create and publish a product/tools catalog, ingress and egress of data, sharing of analyses, and tight budget control is a non-trivial undertaking that diverts attention from “Science” to “Servers”.

We aim to provide researchers a standard out-of-the-box catalog, with the ability to also bring your own catalog, as explained in the figure below.


Based on our discussions with research stakeholders, especially small and medium ones, it was clear that users want something as easy to consume as other consumer-oriented services like e-shopping or consumer banking. This led to the simplified process of creating a “MyResearchCloud” with the following basic needs:


  • This “MyResearchCloud” is most suitable for smaller research institutions with one or a few groups of Principal Investigators (PIs) driving research with a few fellow researchers.
  • The model to set up, configure, collaborate, and consume needs to be extremely simple and come with pre-built templates, tools, and utilities.
  • PIs should have full control of their cloud accounts and spending, with dynamic visibility and smart alerts.
  • At any point, if the PI decides to stop using the solution, there should be no loss of productivity, and existing compute and data should be preserved.
  • It should be easy to invite other users to collaborate while still controlling their access and security.
  • Users should not be loaded with technical jargon while ordering simple products for day-to-day research using computation servers, data repositories, analysis IDE tools, and data-processing pipelines.

Based on the above ask, the following simple steps have been enabled:


Steps to Launch | Activity | Total Time from Start
Step 1 | As a Principal Investigator, create your own “MyResearchCloud” by using your email ID or Google ID to log in for the first time on Research Gateway. | 1 min
Step 2 | If using a personal email ID, get an activation link and log in for the first time with a secure password. | 4 min
Step 3 | Use your own AWS account and provide secure credentials for “MyResearchCloud” consumption. | 10 min
Step 4 | Create a new research project and set up your secure environment with default networking, secure connections, and a standard catalog. You can also leverage your existing setup and catalog. | 13 min
Step 5 | Invite new researchers, or start using the new setup to order products from a catalog covering data, compute, analytic tools, and workflow pipelines. | 15 min
Step 6 | Order the necessary products – EC2, S3, SageMaker/RStudio, Nextflow pipelines. PIs and researchers use the Research Gateway to interact with these tools without needing to access the AWS console. | 30 min


The picture below shows the easy way to get started with the new Launchpad and the 30-minute countdown.


Architecture Details
To balance the needs of Speed with Compliance, we have designed a unique model to allow Researchers to “Bring your own License” while leveraging the benefits of SaaS in a unique hybrid approach. Our solution provides a “Gateway” model of hub-and-spoke design where we provide and operate the “Hub” while enabling researchers to connect their own AWS Research accounts as a “Spoke”.

Security is a critical part of the SaaS architecture. In the hub-and-spoke model, the Research Gateway is hosted in our AWS account using cloud management and governance best practices controlled by AWS Control Tower, while each tenant is created using AWS security best practices of least-privilege and role-based access, so that no customer-specific keys or data are maintained in the Research Gateway. The architecture and SaaS product are validated under the AWS ISV Path program for Well-Architected principles and data-security best practices.
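
A minimal sketch of the access pattern implied by this design: the hub assumes a scoped role in the spoke (customer) account with short-lived credentials instead of storing customer keys (role names and session settings below are illustrative, not the product’s actual values):

```python
# Hub-and-spoke access: assume a customer-created role on demand.
import boto3

def spoke_session(spoke_role_arn: str, external_id: str) -> boto3.Session:
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=spoke_role_arn,              # role the customer created for the gateway
        RoleSessionName="research-gateway",  # illustrative session name
        ExternalId=external_id,              # guards against the confused-deputy problem
        DurationSeconds=900,                 # short-lived, least-privilege credentials
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# Example: act in the spoke account with the temporary credentials
# s3 = spoke_session("arn:aws:iam::111122223333:role/GatewaySpoke", "tenant-42").client("s3")
```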

The following diagram explains in more detail the hub-and-spoke design for the Research Gateway.


This de-coupled design makes it easy to use a shared gateway while connecting your own AWS account for consumption, with full control and transparency in billing and tracking. For many small and mid-sized research teams, this is the best balance between using a third-party-hosted account and having their own end-to-end setup. The structure is also useful for deploying a hosted solution covering multiple group entities (or conglomerates), typically a collaborative network of universities working under a central entity (usually funded by government grants) in large-scale genomics grant programs. For customers with more specific security and regulatory needs, both the hub and spoke accounts can be self-hosted. The flexible architecture suits different deployment models.


AWS Services that MyResearchCloud uses for each customer:


Service Needed for Secure Research | Solution Provided | Run-Time Costs for Customers
DNS-based friendly URL to access MyResearchCloud SaaS | RLCatalyst Research Gateway | No additional costs
Secure SSL-based connection to my resources | AWS ACM certificates used and an AWS ALB created for each project tenant | ALBs are created and deleted based on dependent resources to avoid fixed costs
Network design | Default VPC created for new accounts to save users the trouble of network setup | No additional costs
Security | Role-based access provided to RLCatalyst Research Gateway with no keys stored locally | No additional costs. Users can revoke RLCatalyst Research Gateway access at any time
IAM roles | AWS Cognito-based model for the hub | No additional costs for customers other than the SaaS user-based license
AWS resource consumption | Consumed directly based on user actions. Smart features are available by default, with a 15-minute auto-stop for idle resources to optimize spending | Actual usage costs; Spot instances are suggested to optimize large workloads
Research data storage | Default S3 bucket created per project, with the ability to share project data and also create private study data. Storage can be auto-mounted on compute instances with easy access, backup, and sync | Base AWS storage costs
AWS Budgets and cost tracking | Each project is configured to track budget vs. actual costs with auto-tagging for researchers. Notifications and controls pause or stop consumption when budgets are reached | No additional costs
Audit trail | All user actions are tracked in a secure audit trail visible to users | No additional costs
Standard catalog of research products | Standard catalog provided and uploaded to new projects. Users can also bring their own catalogs | No additional costs
Data ingress and egress for large data sets | Using standard cloud storage and data-transfer features, users can sync data to study buckets. Small sets of files can also be uploaded from the UI | Standard cloud data-transfer costs apply

In our experience, research institutions can enable new groups to use MyResearchCloud with small monthly budgets (starting at US$100 a month) and scale their cloud resources with cost control and optimized spending.

Summary
With the intent to make scientific research in the cloud as easy to access and consume as typical Business to Consumer (B2C) experiences, the new “MyResearchCloud” model from Relevance Lab provides flexibility, cost management, and secure collaboration to truly unlock the potential of the cloud. It gives researchers a fully functional workbench to go from “no cloud” to a “full cloud” launch in 30 minutes.

If this seems exciting and you would like to know more or try this out, do write to us at marketing@relevancelab.com.

Reference Links
Driving Frictionless Research on AWS Cloud with Self-Service Portal
Leveraging AWS HPC for Accelerating Scientific Research on Cloud
RLCatalyst Research Gateway Built on AWS
Health Informatics and Genomics on AWS with RLCatalyst Research Gateway
How to speed up the GEOS-Chem Earth Science Research using AWS Cloud?
RLCatalyst Research Gateway Demo
AWS training pathway for researchers and research IT




Relevance Lab announces the availability of a new product, RLCatalyst AppInsights, on the ServiceNow Store. The certified standalone application is available free of cost and offers a dynamic application-centric view of AWS resources.

Built on top of AWS Service Catalog AppRegistry and created in consultation with AWS teams, the product offers a unique solution for ServiceNow and AWS customers. It provides dynamic insights related to cost, health, cloud asset usage, compliance, and security, with the ability to take appropriate actions for operational excellence. This helps customers manage their multi-account, dynamic application CMDB (Configuration Management Database).

The product includes ServiceNow dashboards with metrics and actionable insights. The design has pre-built connectors to AWS services and the unique RL DataBridge, which provides integration with third-party applications using a serverless architecture for extended functionality.

Why do you need a Dynamic Application-Centric View for Cloud CMDB?
Cloud-based dynamic assets create great flexibility but add complexity for near real-time asset and CMDB tracking, especially for enterprises operating in complex multi-account, multi-region, and multi-application environments. Such enterprises, with complex cloud infrastructures and ITSM tools, struggle to change the paradigm from infrastructure-centric views to application-centric insights that are better aligned with business metrics, financial tracking, and end-user experiences.

While existing solutions using discovery tools and Service Management connectors provide a partial, infrastructure-centric view, a robust application-centric dynamic CMDB was the missing piece that is now addressed with this product. More details about the product’s features can be found in this blog.

Built on AWS Service Catalog AppRegistry
AWS Service Catalog AppRegistry helps create a repository of your applications and associated resources. These capabilities enable enterprise stakeholders to obtain the information they need for informed strategic and tactical decisions about cloud resources.

Leveraging AWS Service Catalog AppRegistry as the foundation for the application-centric views, RLCatalyst AppInsights enhances the value proposition and provides integration with ServiceNow.

Value adds provided:

  • Single pane of control for Cloud Operational Management with ServiceNow
  • Cost planning, tracking, and optimization across multi-region and complex cloud setups
  • Near real-time view of the assets, health, security, and compliance
  • Detection of idle capacity and orphaned resources
  • Automated remediation

This enables the entire lifecycle of cloud adoption (Plan, Build and Run) to be managed with significant business benefits of speed, compliance, quality, and cost optimization.

Looking Ahead
With the new product now available on the ServiceNow Store, it is easier for enterprises to download and try it for enhanced functionality on existing AWS and ServiceNow platforms. We expect to work closely with AWS partnership teams to drive adoption of AWS Service Catalog AppRegistry and solutions for TCAM (Total Cost of Application Management) in the market. This will help customers optimize their application asset tracking and cloud spend through better planning, monitoring, analysis, and corrective actions, via an intuitive UI-driven ServiceNow application at no additional cost.

To learn more about RLCatalyst AppInsights, feel free to write to marketing@relevancelab.com.




AWS provides a Service Management Connector that lets ServiceNow and Jira Service Desk end users provision, manage, and operate AWS resources securely via an ITSM portal. However, a similar solution does not exist for Freshservice. Relevance Lab’s RLCatalyst BOTs solution provides the same maturity of end-to-end automation for Freshservice customers, acting as an Automation Service Bus between ITSM tools and AWS cloud assets.

Freshservice is an intelligent service management platform comprising all the essential modules: Incident Management, Problem Management, Change Management, Release Management, Project Management, Knowledge Management, and Asset Management covering hardware, software, and contracts. It also provides consolidated reports, including analytics.

Many customers are adopting Freshservice as a cloud-based ITSM solution and orchestrating self-service requests for their organizations. A common automation need is user and workspace onboarding and offboarding, which involves integration with HR systems, AWS Service Catalog, and AWS Control Tower for proper management and governance. Similarly, using an Infrastructure as Code model, organizations use CloudFormation-based templates for complex workload provisioning with 1-click models.

The Freshservice workflow automator with RLCatalyst BOTs integration helps automate simple repetitive tasks, like assigning tickets to the right groups and setting up multi-level approvals, through a simple drag-and-drop interface that covers most simple use cases. The webhook option allows automation of more complex workflows by integrating with the right automation tools. In addition, the business-rules-for-forms feature lets you describe conditional logic and actions to create complex dynamic forms.

The diagram below illustrates the integration architecture between Freshservice, AWS, and RLCatalyst.


Using the integrated solution, organizations can automate use cases related to both End User Computing (EUC) and standard server-side workload provisioning. Two common examples are:

  • User and Workspace Provisioning: Onboard a new user and request an AWS WorkSpace, where the original request is generated by Workday/Taleo.
  • Server Infrastructure Provisioning, Application Deployment, and Configuration Updates: Request provisioning of a complex multi-node workload using a Service Catalog item fulfilled with an AWS CloudFormation template and post-provisioning setup.

The diagram below illustrates the EUC automation flow.


The steps to onboard a new user and WorkSpace in an automated manner are as follows:

  • RLCatalyst enables Freshservice to create a Service Request (SR) using the file generated from Workday or Taleo.
  • Once the SR is created, the Freshservice workflow automator triggers the approval workflow: auto approval, cost-based approval, or role-based approval.
  • Once the defined approval workflow completes successfully, Freshservice requests RLCatalyst to trigger its onboarding workflow.
  • RLCatalyst then invokes BOT 1 to create the user in Simple AD.
  • BOT 2 sends out a request to provision the AWS WorkSpace, while BOT 3 polls for the status of the WorkSpace creation.
  • Once BOT 3 reports successful provisioning, the workflow instructs AWS SNS to send a notification email to the end user with the WorkSpace details and login credentials.
  • Finally, RLCatalyst sends a request back to Freshservice to close the SR as fulfilled.
  • If WorkSpace provisioning fails, RLCatalyst instructs Freshservice to create an Incident for Root Cause Analysis (RCA). (A sketch of the webhook glue for this flow follows below.)
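
For illustration, here is a minimal sketch of the kind of webhook endpoint that could sit between the Freshservice workflow automator and a WorkSpace-provisioning BOT (the route, payload fields, and use of Flask are assumptions, not the actual RLCatalyst API):

```python
# Webhook glue: Freshservice posts the approved SR; the handler triggers
# WorkSpace provisioning and reports the initial status back.
import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
workspaces = boto3.client("workspaces")

@app.post("/webhooks/onboard")
def onboard():
    ticket = request.get_json()  # SR payload posted by the workflow automator
    result = workspaces.create_workspaces(Workspaces=[{
        "DirectoryId": ticket["directory_id"],  # the Simple AD directory
        "UserName": ticket["user_name"],
        "BundleId": ticket["bundle_id"],
    }])
    # create_workspaces is asynchronous; failures are reported per request
    failed = result.get("FailedRequests", [])
    return jsonify({"status": "failed" if failed else "pending"})
```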

Similarly, a user can request a multi-node application stack deployment in AWS using the Freshservice service catalog. The diagram below illustrates the steps:


  • Create the infrastructure with multiple AWS resources (EC2, S3, RDS, etc.).
  • Deploy one or more applications on the created instances (web tier, app tier, DB tier).
  • Configure the application with run-time information, e.g., creating DNS endpoints, binding an application’s listening IP address to the IP address of the created instance, and updating YAML files with environment-variable values.
  • Deploy monitoring agents for infrastructure health, application health, log monitoring, and service registry.
  • Set up network configurations such as hosted zones and routes, and security configurations such as SSL certificates.

The multi-stage orchestration requires a workflow for state and context management during the lifecycle, which is provided by RLCatalyst Workflow capabilities.

Relevance Lab is a Freshservice solution partner. We assist enterprises in adopting the AWS cloud with intelligent automation using RLCatalyst BOTs, and we also offer ServiceOne pre-integrated with Freshservice.

For a demo video and more details, please click here.

For more details, please feel free to reach out to marketing@relevancelab.com




Oracle Fusion provides invaluable support to many businesses for managing their transaction data. However, business users will be familiar with its limitations when it comes to generating even moderately complex analyses and reports involving large volumes of data. In a Big Data-driven world, this can become a major competitive disadvantage. Relevance Lab has designed SPECTRA, a Hadoop-based platform that makes Oracle Fusion reporting simple, quick, and economical, even when working with billions of transaction records.

Challenges with Oracle Fusion Reporting
Oracle Fusion users often struggle to extract reports from large transactional databases. Key issues include:


  • Inability to handle large volumes of data to generate accurate reports within reasonable timeframes.
  • Extracting integrated data from different modules of the ERP is not easy. It requires manual effort to synthesize fragmented reports, which makes the process time-consuming, costly, and error-prone. Similar problems arise when trying to combine ERP data with data from other sources.
  • Reports are static, not permitting a drill-down into the underlying drivers of reported information.
  • Self-service options are limited, and business users have to rely heavily on the IT department for new reports. It is not uncommon for weeks or months to pass between the first report request and the availability of the report.

Moreover, Oracle stopped supporting its reporting tool, Discoverer, in 2017, creating additional challenges for users who continue to rely on it.

How RL SPECTRA can Help
Relevance Lab recognizes the value to its clients of generating near real-time dynamic insights from large, ever-growing data volumes at reasonable costs. With that in mind, we have developed an Enterprise Data Lake (EDL) platform, SPECTRA, that automates the process of ingesting and processing huge volumes of data from the Oracle Cloud.

This Hadoop-based solution has advantages over traditional data warehouses and ETL solutions due to its:


  • superior performance through parallel processing capability and robustness when dealing with large volumes of data,
  • rich set of components like Spark, AI/ML libraries to derive insights from big data,
  • a high degree of scalability,
  • cost-effectiveness, and
  • ability to handle semi-structured and unstructured data.

After the initial data ingestion into the EDL, incremental data ingestion uses delta refresh logic to minimize the time and computing resources spent on ingestion.
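
A hedged sketch of what such delta-refresh ingestion can look like in Spark (table, column, and path names are illustrative; SPECTRA’s actual implementation is proprietary):

```python
# Pull only rows changed since the last high-water mark, then append them to
# the data lake's raw zone.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-refresh").getOrCreate()

# High-water mark persisted by the previous run (illustrative location)
last_mark = (spark.read.parquet("/lake/_meta/inventory_txn_watermark")
                  .agg(F.max("last_update_date"))
                  .first()[0])

# Pull only the delta from the Oracle source over JDBC (connection details assumed)
delta = (spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//erp-host:1521/FUSION")
    .option("query",
            "SELECT * FROM inventory_txn WHERE last_update_date > "
            f"TO_TIMESTAMP('{last_mark:%Y-%m-%d %H:%M:%S}', 'YYYY-MM-DD HH24:MI:SS')")
    .option("user", "etl_user").option("password", "***")
    .load())

# Append to the raw zone, partitioned by ingestion date for cheap reprocessing
delta = delta.withColumn("ingest_date", F.current_date())
delta.write.mode("append").partitionBy("ingest_date").parquet("/lake/raw/inventory_txn")
```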

SPECTRA provides users access to raw data (based on authorization), empowering them to understand and analyze data as per their requirements. It enables users to filter, sort, search, and download up to 6 million records in one go. The platform can also visualize data in charts and is compatible with standard dashboard tools.

This offering combines our deep Oracle and Hadoop expertise with extensive experience across industries.
With this solution, we have helped companies generate critical business reports from massive volumes of underlying data delivering substantial improvement in extraction and processing time, quality, and cost-effectiveness.


Use Case: Productivity Enhancement through Optimised Reporting for a Publishing Major
A global publishing major that had recently deployed Oracle Fusion Cloud applications for inventory, supply chain, and financial management discovered that these were inadequate to meet its complex reporting and analytical requirements.


  • The application was unable to accurately process the company’s billion-plus transaction records on the Oracle Fusion Cloud to generate a report on the inventory position.
  • Using an external tool was also challenging, as extracting data from the Oracle cloud to an external source would take several days and suffer multiple failures along the way.
  • This made the cost and quality reconciliation of copies of books lying in different warehouses and distribution centres across the world very difficult and time-consuming, as business users did not have timely and accurate visibility of on-hand quantities.
  • In turn, this had adverse business consequences such as inaccurate planning, higher inventory costs, and inefficient fulfilment.

The company reached out to Relevance Lab for a solution. Our SPECTRA platform automated and optimized the process of data ingestion, harmonization, transformation, and processing, keeping in mind the specific circumstances of the client. The deployment yielded multiple benefits:


  • On-Hand quantity and costing reports are now generated in less than an hour
  • Users can access raw data as well as multiple reports with near real-time data, giving them full flexibility and making the business more responsive to market dynamics
  • Overall, user effort is reduced by 150 hours per person per quarter by using SPECTRA for the inventory report, leading to higher productivity
  • With all the raw data in SPECTRA, several reconciliation procedures are in place to identify missing data between the Oracle cloud and its legacy system

The Hadoop-based architecture can be scaled flexibly in response to the continuously growing size of the transaction database and is also compatible with the client’s future technology roadmap.


Conclusion
RL’s big-data platform, SPECTRA, offers an effective and efficient future-ready solution to the reporting challenges in Oracle Fusion when dealing with large data sets. SPECTRA enables clients to access near real-time insights from their big data stored on the Oracle Cloud while delivering substantial cost and time savings.

To know more about our solutions or to book a call with us, please write to marketing@relevancelab.com.




Bioinformatics is a field of computational science that involves the analysis of sequences of biological molecules (DNA, RNA, or protein). It’s aimed at comparing genes and other sequences within an organism or between organisms, looking at evolutionary relationships between organisms, and using the patterns that exist across DNA and protein sequences to elucidate their function. Being an interdisciplinary branch of the life sciences, bioinformatics integrates computer science and mathematical methods to reveal the biological significance behind the continuously increasing biological data. It does this by developing methodology and analysis tools to explore the large volumes of biological data, helping to query, extract, store, organize, systematize, annotate, visualize, mine, and interpret complex data.

Advances in cloud computing and the availability of open-source genomic pipeline tools have given researchers powerful means to speed up the processing of next-generation sequencing (NGS). In this blog, we explain how to leverage the RLCatalyst Research Gateway portal to help researchers focus on science, not servers, while dealing with NGS and popular pipelines like RNA-Seq.

Steps and Challenges of RNA-Seq Analysis
Any bioinformatics analysis involving next-generation sequencing, such as RNA-Seq (an abbreviation of “RNA Sequencing”), consists of the following steps:


  • Mapping of millions of short sequencing reads to a reference genome, including the identification of splicing events
  • Quantifying expression levels of genes, transcripts, and exons
  • Differential analysis of gene expression among different biological conditions
  • Biological interpretation of differentially expressed genes

As seen in the figure below, RNA-Seq analysis for the identification of differentially expressed genes can be carried out with one of three protocols (A, B, C), involving different sets of bioinformatics tools. In study A, one might opt for TopHat, STAR, or HISAT for alignment of sequences and HTSeq for quantification, whereas the same steps can be performed using the Kallisto and Salmon tools (study B) or in combination with Cufflinks (study C); all of these yield the same results, which are further used in the identification of differentially expressed genes or transcripts.


Each of these individual steps is executed using a specific bioinformatics tool or set of tools, such as STAR, RSEM, HISAT2, or Salmon for gene-isoform counting, plus extensive quality control of the sequenced data. The major bottlenecks in RNA-Seq data analysis are the manual installation of software, deployment platforms, computational capacity, and cost.

The vast number of tools available for a single analysis, along with their different versions and compatibility constraints, makes the setup tricky. It can also be time-consuming, as proper configuration and version-compatibility assessment can take several months to complete.

Nextflow: Solution to Bottleneck
The most efficient way to tackle these hurdles is to use Nextflow-based pipelines that support cloud computing, where virtual systems can be provisioned at a fraction of the cost and the setup is smooth enough to be done by a single individual, with support for container systems (Docker and Singularity).

Nextflow is a reactive workflow framework and a programming DSL (Domain Specific Language) that eases the writing of data-intensive computational pipelines.

As seen in the diagram below, the infrastructure to use Nextflow in the AWS cloud consists of a head node (EC2 instance with Nextflow and Nextflow Tower open source software installed) and wired to an AWS Batch backend to handle the tasks created by Nextflow. AWS Batch creates worker nodes at run-time, which can be either on-demand instances or spot instances (for cost-efficiency). Data is stored in an S3 bucket to which the worker nodes in AWS Batch connect and pull the input data. Interim data and results are also stored in S3 buckets, as is the output. The pipeline to be run (e.g. RNA-Seq, DualRNA-Seq, ViralRecon, etc.) is pulled by the worker nodes as a container image from a public repo like DockerHub or BioContainers.

RLCatalyst Research Gateway takes care of provisioning the infrastructure (EC2 node, AWS Batch compute environment, and Job Queues) in the AWS cloud with all the necessary controls for networking, access, data security, and cost and budget monitoring. Nextflow takes care of creating the job definitions and submitting the tasks to Batch at run-time.

The researcher initiates the creation of the workspace from within the RLCatalyst Research Gateway portal. There is a wide selection of input parameters, including which pipeline to run, tuning parameters to control the sizing and cost-efficiency of the worker nodes, the location of input and output data, etc. Once the infrastructure is provisioned and ready, the researcher can connect to the head node via SSH and launch Nextflow jobs. The researcher can also connect to the Nextflow Tower UI to monitor the progress of jobs.


Pre-written Nextflow pipelines can be pulled from the nf-core GitHub repository and set up within minutes, allowing an entire analysis to run from a single command, with the results of each step displayed on the command line. Configuration of cloud resources is seamless as well, since Nextflow-based pipelines support batch computing, enabling the analysis to scale as it progresses. Researchers can thus focus on running the pipeline and analyzing output data instead of investing time in setup and configuration.

As seen in the pipeline output (MultiQC) report of the Nextflow-based RNA-Seq pipeline below, we can assess sequence quality from the FastQC scores, identify duplication scenarios from the contour plots, and examine gene biotypes along with the fragment-length distribution for each sample.


RLCatalyst Research Gateway enables the setup and provisioning of AWS cloud resources for such analysis with a few simple clicks, and the output of each run is saved in an S3 bucket, enabling easy data sharing. The provisioned resources are pre-configured with a proper design template and security architecture. In addition, RLCatalyst Research Gateway enables cost tracking for running projects, which can be paused, stopped, or deleted as convenient.

Steps for Running Nextflow-Based Pipelines in AWS Cloud for Genomic Research
Prerequisites for a researcher before starting data analysis:

  • A valid AWS account and access to the RLCatalyst Research Gateway portal
  • A publicly accessible S3 bucket with large Research Data sets accessible

Once done, below are the steps to execute this use case.

  • Log in to the RLCatalyst Research Gateway portal and select the project linked to your AWS account
  • Launch the Nextflow-Advanced product
  • Log in to the head node using SSH (the Nextflow software will already be installed on this node)
  • In the pipeline folder, modify the nextflow.config file to set the data locations according to your needs (GitHub repo, S3 bucket, etc.); these can also be passed on the command line (a minimal config example follows this list)
  • Run the Nextflow job on the head node; Nextflow automatically submits the tasks to the AWS Batch backend
  • Output data is copied to the specified output bucket
  • Once done, terminate the EC2 instance and check the cost spent on the use case
  • All costs related to the Nextflow project and researcher consumption are tracked automatically
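
For reference, a minimal nextflow.config along these lines (the queue, bucket, and CLI path are placeholders to be replaced with your project's values):

```
// Minimal nextflow.config sketch for the AWS Batch backend
process {
    executor = 'awsbatch'
    queue    = 'my-batch-job-queue'            // AWS Batch job queue (placeholder)
}
workDir = 's3://my-research-bucket/work'       // interim data lands on S3
aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'   // AWS CLI path on the worker AMI
    }
}
```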

Key Points

  • Bioinformatics involves developing methodology and analysis tools to analyze large volumes of biological data
  • The vast number of tools available for a single analysis, and their compatibility constraints, make the analysis setup tricky
  • RLCatalyst Research Gateway enables the setup and provisioning of Nextflow-based pipelines and AWS cloud resources with a few simple clicks

Summary
Researchers need powerful tools for collaboration and access to commonly used NGS pipelines with large data sets. Cloud computing makes this much easier, with access to workflows, data, computation, and storage. However, there is a learning curve in acquiring cloud-specific know-how and in using resources optimally for large-scale computations like RNA-Seq analysis pipelines, which can also be quite costly. Relevance Lab, working in close partnership with AWS, provides the RLCatalyst Research Gateway portal with commonly used pre-built Nextflow pipeline templates and integration with open-source repositories like nf-core and BioContainers. RLCatalyst Research Gateway enables execution of such scalable Nextflow-based pipelines on the cloud with a few clicks and configurations, along with cost-tracking and resource-execution controls. By using AWS Batch, the solution is highly scalable and optimized for on-demand consumption.

For more information, please feel free to write to marketing@relevancelab.com.





As enterprises continue to rapidly adopt the AWS cloud, the complexity and scale of operations on AWS have increased exponentially. Enterprises now operate hundreds or even thousands of AWS accounts to meet their enterprise IT needs. With this in mind, AWS Management & Governance has emerged as a major focus area that enterprises need to address holistically to ensure efficient, automated, performant, available, secure, and compliant cloud operations.


Governance360 integrated with Dash ComplyOps

Governance360
Relevance Lab has recently launched its Governance360 professional services offering in the AWS Marketplace. This offering builds upon Relevance Lab’s theme of helping customers adopt AWS the right way.

Governance360 brings together the framework, tooling, and process for implementing best-practices-based AWS Management & Governance at scale for multi-account AWS environments. It helps clients seamlessly manage their “Day after Cloud” operations on an ongoing basis. The tooling leveraged for implementing Governance360 can include native AWS tools and services, RL’s tools, and third-party industry tools.

Typically, a professional service like Governance360 is engaged either during or after a customer’s transition to the AWS cloud (infrastructure and application migration, or development on AWS).

Dash ComplyOps Platform
The Dash ComplyOps platform enables and automates the lifecycle of a client’s journey toward compliance of their AWS environments with industry-specific requirements such as HIPAA, HITRUST, SOC 2, and GDPR. It provides organizations with the ability to manage a robust cloud security program through the implementation of guardrails and controls, continuous compliance monitoring, and advanced reporting and remediation of security and compliance issues.


Relevance Lab and Dash Solutions have partnered to bring an end-to-end solution and professional services offering that helps customers realize an automated AWS Management & Governance posture for environments that must meet regulatory compliance needs.
As part of this partnership, the Dash ComplyOps platform is integrated within the overall Governance360 framework. The detailed mapping of features, tooling, and benefits (including Dash ComplyOps as a tool) across Governance360’s major topic areas is articulated below.

Benefits

Automation Lifecycle
  • Automate repetitive and time-consuming tasks
  • Automated setup of environments for common use cases such as regulatory, workloads, etc.
  • Codify best practices learned over time

Control Services
  • Automated & standardized account provisioning
  • Cost & budget management
  • Architecture for industry-standard compliance, monitoring, and remediation
  • Disaster recovery
  • Automated & continuous compliance monitoring, detection, and remediation

Proactive Monitoring
  • Dashboards for monitoring the AWS environment from infrastructure to application

Security Management
  • Ease of deployment of security controls at scale using a CI/CD pipeline
  • Infra and application security threat monitoring, prevention, detection & remediation

Service & Asset Management
  • Software and asset management practice with a real-time CMDB for applications & infrastructure
  • Incident management and auto-remediation

Workload Migration & Management
  • Best-practices-based workload migration and implementations on the AWS cloud

Regulatory Compliance
  • Compliance with industry regulatory standards


Engagement Flow

Discovery & Assessment (typical duration*: 1-3 weeks)
  • Understand current state, data, and management & governance goals

Analysis & Recommendation (typical duration*: 1-3 weeks)
  • Requirement analysis, apply the Governance360 framework, create recommendations, and obtain client sign-off
  • Recommendations include services, tools, and dashboards, along with expected outcomes and benefits
  • Use of native AWS services, RL’s monitoring & BOTs, the Dash ComplyOps platform, and other 3rd-party tools

Implementation (typical duration*: 2-8 weeks)
  • Implement, test, UAT, and production cutover of recommended services, tools, and dashboards

Hypercare (typical duration*: 1-2 weeks)
  • Post-implementation support – monitor and resolve any issues faced

* Duration depends on the complexity and scope of the requirements.

Summary
Relevance Lab is an AWS consulting partner and helps organizations achieve automation-led cloud management using Governance360, based on AWS best practices. For enterprises with regulatory compliance needs, integration with the Dash ComplyOps platform provides an advanced setup for operating in a multi-account environment. While enterprises can try to build some of these solutions themselves, doing so is time-consuming and error-prone and calls for a specialist partner. Relevance Lab has helped multiple enterprises with this need and has a reusable automated solution and pre-built library to meet the security and compliance needs of any organization.

For more details, please feel free to contact marketing@relevancelab.com.

References
Governance 360 – Are you using your AWS Cloud “The Right Way”
AWS Security Governance for enterprises “The Right Way”
Dash ComplyOps by Dash Solutions
Governance360 available on AWS Marketplace




2021 Blog, Blog, Featured, SWB Blog

Provide researchers access to secure RStudio instances in the AWS cloud by using Amazon-issued certificates in AWS Certificate Manager (ACM) and an Application Load Balancer (ALB)

Cloud computing offers the research community access to vast amounts of computational power, storage, specialized data tools, and public data sets, collectively referred to as Research IT, with the added benefit of paying only for what is used. However, researchers may not be experts in using the AWS Console to provision these services in the right way. This is where software solutions like Service Workbench on AWS (SWB) make it possible to deliver scientific research computing resources in a secure and easily accessible manner.

RStudio is popular software in the scientific research community, is supported by Service Workbench, and is part of many researchers’ day-to-day work. Yet installing RStudio securely on the AWS cloud and using it in a cost-effective manner is a non-trivial task, especially for researchers. With SWB, the goal is to make this process simple, secure, and cost-effective for researchers so that they can focus on “Science” and not “Servers”, thereby increasing their productivity.

Relevance Lab (RL), in partnership with AWS, set out to make the experience of using RStudio with Service Workbench on AWS simple and secure.

Technical Solution Goals

1. A researcher should be able to launch an RStudio instance in the AWS cloud from within the Service Workbench portal.
2. The RStudio instance comes fully loaded with the latest version of RStudio and a variety of other software packages that help in scientific research computing.
3. The user launches a URL to RStudio from within Service Workbench. This URL is unique, generated by SWB, and encoded with an authentication token, so the researcher can access the RStudio instance without remembering any passwords (see the sketch after this list). The URL is served over SSL so that all communication is encrypted in transit.
4. Maintaining the certificates used for SSL communication should be cost-effective and should not require excessive administrative effort.
5. The solution should isolate researcher-specific instances using allowed-IP lists controlled by the end user.
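The exact token format is internal to SWB; purely as an illustration of the pattern in goal 3, a time-limited, tamper-evident connection URL could be built along these lines (all names and values here are hypothetical, not SWB’s actual mechanism):

    import hashlib
    import hmac
    import time
    from urllib.parse import urlencode

    def signed_connection_url(base_url: str, user_id: str, secret: bytes, ttl_seconds: int = 300) -> str:
        """Build a connection URL carrying an expiring HMAC token (illustrative only)."""
        expires = int(time.time()) + ttl_seconds
        payload = f"{user_id}:{expires}".encode()
        token = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return f"{base_url}?{urlencode({'user': user_id, 'expires': expires, 'token': token})}"

    # Example: a link that stops working after five minutes
    print(signed_connection_url("https://rstudio.example.org", "researcher-42", b"shared-secret"))

The server side would recompute the HMAC and reject expired or altered links, which is why the user never needs a password.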

Comparison of Old and New Design Principles to Make the Researcher Experience Frictionless
The following section summarizes the old design and the new architecture. Based on feedback from researchers, the older design required a lot of setup complexity and lifecycle upgrades for security certificate management, slowing down researchers’ productivity. The new solution makes the lifecycle simple and frictionless, with smart features to keep ongoing costs optimized.


No. / RStudio Feature / Original Design Approach / New Design Approach

1. User-generated security certificate for SSL secure connections to RStudio
   Original: Users have to create a certificate (e.g., LetsEncrypt) and use it with the RStudio EC2 instance behind an NGINX server. This creates complexity in the certificate lifecycle: certificates are complex for end users to create, maintain, and renew, and the RStudio AMI also needs to manage the certificate lifecycle.
   New: Move from external certificates to AWS ACM. Bring in a shared AWS ALB (Application Load Balancer) and use AWS ACM certificates for each Hosting Account to simplify the certificate management lifecycle (see the sketch after this table).

2. SSL secure connection
   Original: An SSL connection is created with the NGINX server on the RStudio EC2 instance; tied to custom certificate management.
   New: Replaced with an ALB at the account level, shared by all RStudio instances in the account. The user-portal-to-ALB connection is secured by ACM; the ALB-to-RStudio-EC2 connection is encrypted with a unique self-signed certificate per RStudio instance.

3. Client role (IAM) changes in SWB
   Original: The client role is granted the permissions necessary for setup.
   New: Additional role privileges are added to handle the ALB.

4. ALB design
   Original: Not present in the original design.
   New: A shared ALB per Hosting Account, shared between projects. Each ALB is expected to cost about $20-50 monthly in shared mode with average use. An API model is used to create/delete the ALB.

5. Route 53 changes in the main account
   Original: A CNAME record is created with the EC2 DNS name.
   New: A CNAME record is created with the ALB DNS name.

6. RStudio AMI
   Original: Embedded with certificate details; tied to custom certificate management.
   New: Independent of user-provided certificate details. The AMI has also been enhanced: a self-signed SSL certificate and additional packages (commonly requested by researchers) are baked in.

7. RStudio CloudFormation Template (CFT)
   Original: The original template is removed from SWB.
   New: A new output indicates the “Need ALB” flag, and a new target group is created to which the ALB can route requests.

8. SWB Hosting Account configuration
   Original: No certificate had to be provisioned in AWS ACM.
   New: A manual process sets up a certificate in a new Hosting Account.

9. Active-count tracking of provisioned RStudio instances per Hosting Account
   Original: None.
   New: Needed to ensure the ALB is created when the first RStudio instance is provisioned and deleted after the last one is deleted, to optimize the ALB’s cost overhead.

10. SWB DynamoDB table changes
    Original: DynamoDB is used for all SWB tables.
    New: Modifications support the new design; entries are added to the existing DeploymentStore table in the SWB design.

11. SWB Provision Environment workflow
    Original: Standard design.
    New: An additional step checks whether the “Workspace Type” needs an ALB; if it does, the workflow either creates the ALB or passes a reference to the existing one.

12. SWB Terminate Environment workflow
    Original: Standard design.
    New: An additional step checks whether the last active RStudio instance is being deleted; if so, the ALB is also deleted to reduce idle costs.

13. Secure “Connection” action from the SWB portal to the RStudio instance
    Original: To give each user a secure connection to their RStudio instance, a unique connection URL, valid for a limited period, is generated during the user session.
    New: The original design is preserved; the routing is internally managed through the ALB, but the concept remains the same. Users do not have to remember a user id/password for RStudio, and a secure connection is always available.

14. Secure “Connection” from the SWB portal that prevents other users from accessing RStudio resources over the shared ALB
    Original: N/A.
    New: The design feature in item 13 ensures that, even with a shared ALB, the connection for a user (Researcher or PI) is restricted to their own provisioned RStudio instance; they cannot access other researchers’ instances. The unique connection is system-generated using a unique user-to-RStudio mapping.

15. ALB routing rules for secure RStudio connections over the shared ALB
    Original: N/A.
    New: Every time an RStudio instance is created, the ALB rules are updated to allow a secure connection between the user session and the linked instance; the same rules are cleaned up during the RStudio delete lifecycle. These routing-rule changes are managed from SWB code under the workflow customizations (items 11 and 12) using APIs.

16. RStudio configuration parameters related to CIDR
    Original: Only allow-listed IP addresses can connect to the associated RStudio instances; this can also be modified from configurations.
    New: The RStudio CFT takes a Classless Inter-Domain Routing (CIDR) range as an input parameter and passes it through as an output parameter; SWB takes the CIDR from the CFT output and updates the ALB listener rule for the respective target group.

17. Researcher cost tracking
    Original: RStudio costs are tracked per researcher; custom certificate costs, if any, are not tracked.
    New: RStudio costs are tagged and tracked per researcher; ALB costs are treated as shared costs for the Hosting Account.

18. RStudio packaging and delivery for a new customer – repository model
    Original: Bundled with the standard SWB repo and installed.
    New: RL creates a separate repo and hosts RStudio there with associated documentation and templates for customers to use.

19. RStudio packaging and delivery for a new customer – AWS Marketplace model
    Original: None.
    New: RL to provide RStudio on AWS Marketplace for SWB customers to add to the standard Service Catalog and import (future roadmap item).

20. Upgrade and support models for RStudio
    Original: Owned by the SWB team.
    New: Managed by RL teams.

21. UI modification for partner-provided products
    Original: No partner-provided products.
    New: Partner-provided products reside in the self-hosted repo; the SWB UI provides a mechanism to show partner names and a link to additional information.

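Items 1 and 5 above boil down to two AWS API calls per Hosting Account. A minimal sketch with boto3, assuming placeholder domain name, hosted-zone ID, and ALB DNS name (SWB performs the equivalent steps through its own automation):

    import boto3

    acm = boto3.client("acm", region_name="us-east-1")
    route53 = boto3.client("route53")

    # Request an Amazon-issued certificate for the hosting account's RStudio domain
    cert = acm.request_certificate(
        DomainName="rstudio.example.org",  # placeholder domain
        ValidationMethod="DNS",            # validated via a DNS record
    )
    print("Certificate ARN:", cert["CertificateArn"])

    # Point the domain at the shared ALB instead of an individual EC2 instance
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000000000EXAMPLE",  # placeholder hosted zone
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "rstudio.example.org",
                    "Type": "CNAME",
                    "TTL": 300,
                    "ResourceRecords": [
                        {"Value": "my-alb-1234567890.us-east-1.elb.amazonaws.com"}  # placeholder ALB DNS name
                    ],
                },
            }]
        },
    )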

The diagram below explains the interplay between the different design components.


Secure and Scalable Solution Architecture
Keeping in mind the above design goals, a secure and scalable architecture was implemented. It solves the problem of shared groups using products like RStudio that require secure HTTPS access, without the overhead of individual certificate management. The same approach can be reused for future researcher products with similar needs, with no additional implementation overhead, resulting in increased productivity and lower costs.


The Relevance Lab team designed a solution centered on an EC2 Linux instance with RStudio and relevant packages pre-installed and delivered as an AMI.

1. When the instance is provisioned, it is brought up without a public IP address.
2. All traffic to this instance is delivered via an Application Load Balancer (ALB). The ALB is shared across multiple RStudio instances within the same account to spread the cost over a larger number of users.
3. The ALB serves traffic over an SSL link secured with an Amazon-issued certificate maintained by AWS Certificate Manager.
4. ALB costs are further reduced by provisioning it on demand when the first RStudio instance is provisioned; conversely, the ALB is de-provisioned when the last RStudio instance is de-provisioned.
5. Traffic between the ALB and the RStudio instance is also secured with an SSL certificate, self-signed but unique to each instance.
6. The ALB listener rules enforce the allowed-IP list configured by the user (see the sketch below).
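As a sketch of point 6, a per-instance listener rule can combine a source-ip condition (the user’s allowed CIDR) with forwarding to that instance’s target group. The ARNs, priority, CIDR, and path pattern below are placeholders for illustration, not values from SWB:

    import boto3

    elbv2 = boto3.client("elbv2", region_name="us-east-1")

    # Allow only the user's CIDR to reach this RStudio instance's target group.
    # A second condition (here a path pattern, hypothetical) distinguishes
    # instances sharing the same ALB; the rule is deleted with delete_rule()
    # when the instance is terminated.
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/50dc6c495c0c9188/f2f7dc8efc522ab2",
        Priority=10,  # must be unique per rule on the listener
        Conditions=[
            {"Field": "source-ip", "SourceIpConfig": {"Values": ["203.0.113.0/24"]}},
            {"Field": "path-pattern", "PathPatternConfig": {"Values": ["/rstudio-i-0abc123/*"]}},
        ],
        Actions=[
            {
                "Type": "forward",
                "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/rstudio-i-0abc123/73e2d6bc24d8a067",
            },
        ],
    )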

Conclusion
Both the SWB and Relevance Lab RLCatalyst Research Gateway teams are committed to making scientific research frictionless for researchers. With this shared goal, the new initiative speeds up collaboration and will help deliver innovative open-source solutions leveraging Service Workbench on AWS and partner-provided solutions like this RStudio-with-ALB offering from Relevance Lab. The collaboration will soon add more solutions covering genomic pipeline orchestration with Nextflow, use of HPC Parallel Cluster, and secure research workspaces with AppStream 2.0, so stay tuned.

To get started with RStudio on SWB provided by Relevance Lab, use the following link:
Relevance Lab Github Repository for SWB Templates

For more information, feel free to contact marketing@relevancelab.com.

References
Service Workbench on AWS for driving Scientific Research
Service Workbench on AWS Documentation
Service Workbench on AWS Github Repository
RStudio Secure Architecture Patterns
Relevance Lab Research Gateway



