In the last few months AWS has focused on releasing new capabilities for Redshift into general availability, some of which close gaps with existing Snowflake capabilities. Today I’m going to highlight one of the new features called data sharing and why you should be excited if you operate in a multi-tenant environment. This is also relevant if you are currently exploring data architectures to either monetize or directly share data with consumers.
This is a feature many have been waiting for since Snowflake first released its data sharing feature and announced itself as the de facto data sharehouse in 2017. Redshift Data Sharing provides a practical solution to multi-tenancy and further supports DaaS (Data as a Service) use cases by separating storage from compute using the newer Redshift RA3 node types.
Data Sharing enables a substantial business case that is attractive to product teams running analytical workloads. It lets you isolate workloads and costs on a per-consumer basis. This means no more guesswork in developing usage formulas across multiple tenants or in devising complex pricing strategies. It also solves the headache of earlier multi-tenancy models, where one tenant’s behavior (heavy workloads or mischievous activity) could substantially degrade performance for other tenants because of the lack of isolation.
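To make the per-consumer isolation a bit more concrete, here is a minimal sketch of the producer side, written as a small Python helper that only builds SQL text. The `CREATE DATASHARE`, `ALTER DATASHARE` and `GRANT USAGE ON DATASHARE` statements are Redshift's data sharing commands; the share name, schema name and namespace GUID are hypothetical placeholders, and the helper is our own illustration rather than anything AWS ships.

```python
import re
import uuid
from typing import List, Sequence

# Only plain lowercase identifiers are accepted, so the generated SQL stays predictable.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def _identifier(name: str) -> str:
    """Return the name unchanged if it is a plain SQL identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain SQL identifier: {name!r}")
    return name


def producer_share_statements(
    share: str, schema: str, tenant_namespaces: Sequence[str]
) -> List[str]:
    """Build the producer-side SQL that exposes one schema to each tenant cluster."""
    share, schema = _identifier(share), _identifier(schema)
    statements = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
    ]
    for namespace in tenant_namespaces:
        # A consumer cluster is addressed by its namespace GUID; uuid.UUID validates the format.
        guid = str(uuid.UUID(namespace))
        statements.append(f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{guid}';")
    return statements


if __name__ == "__main__":
    # Hypothetical tenant namespace, for illustration only.
    for sql in producer_share_statements(
        "tenant_share", "analytics", ["13b8833d-17c6-4f16-8fe4-1a018f5ed00d"]
    ):
        print(sql)
```

Each tenant's consumer cluster gets its own grant, so the producer keeps one copy of the data while every tenant queries it with its own compute.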
While the feature itself carries no extra charge, you still pay for the underlying Redshift nodes at their respective sizes. However, the model lets you size the nodes for each tenant on a case-by-case basis, which opens up new opportunities for cost optimization.
Also, keep in mind, you can still take it one step further and build out the ability to elastically resize clusters for each tenant. This makes it easier to manage Redshift consumer cluster performance, especially in more complicated tenant-centric environments.
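As a rough illustration of what per-tenant elastic resizing could look like, the sketch below picks a target node count from an average CPU figure and builds the keyword arguments for the `resize_cluster` call on boto3's Redshift client (`Classic=False` requests an elastic resize). The thresholds, node limits, doubling/halving rule and cluster name are all assumptions made for this example, and real elastic resize only permits certain node counts per node type, which this sketch does not check.

```python
from typing import Dict, Optional, Union


def target_node_count(
    current_nodes: int,
    avg_cpu_percent: float,
    min_nodes: int = 2,          # hypothetical lower bound per tenant
    max_nodes: int = 8,          # hypothetical upper bound per tenant
    scale_up_at: float = 80.0,   # hypothetical CPU threshold for growing
    scale_down_at: float = 30.0, # hypothetical CPU threshold for shrinking
) -> int:
    """Double the cluster when busy, halve it when idle, and clamp to the limits."""
    if avg_cpu_percent >= scale_up_at:
        wanted = current_nodes * 2
    elif avg_cpu_percent <= scale_down_at:
        wanted = current_nodes // 2
    else:
        wanted = current_nodes
    return max(min_nodes, min(max_nodes, wanted))


def resize_request(
    cluster_id: str, current_nodes: int, avg_cpu_percent: float
) -> Optional[Dict[str, Union[str, int, bool]]]:
    """Return kwargs for an elastic resize, or None when no change is needed."""
    wanted = target_node_count(current_nodes, avg_cpu_percent)
    if wanted == current_nodes:
        return None
    return {"ClusterIdentifier": cluster_id, "NumberOfNodes": wanted, "Classic": False}


if __name__ == "__main__":
    # "tenant-a-consumer" is a made-up cluster identifier.
    print(resize_request("tenant-a-consumer", current_nodes=2, avg_cpu_percent=91.5))
```

With credentials configured, the returned dictionary could be passed along as `boto3.client("redshift").resize_cluster(**request)`; keeping the decision logic separate from the AWS call makes it easy to test per tenant.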
Internal Sharing & Data as a Service:
A common use case for multi-tenancy is an external SaaS model that feeds product features. However, this capability also enables internal data adoption and exploration by letting business units within an organization share data more rapidly. Data sharing can provide a great avenue for enabling others to run analytical workloads on a common set of data (whether pre-wrangled or raw). Internal use cases are no exception.
From an internal organization perspective, data sharing has created the opportunity to quickly deliver compute infrastructure for emerging use cases that business units want to explore. This can be particularly handy, for example, when a data science team needs access to pre-wrangled production data to start their exploration without impacting production performance. The same goes for any short-lived, ad hoc, compute-intensive data analysis, where we can spin up and then spin down a consumer node on a temporary basis and in an isolated manner.
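For the short-lived consumer scenario, the consumer side can be sketched in the same spirit. The helper below builds the SQL a temporary cluster would run to mount a share as a local database and to drop it again afterwards, plus the three-part name used to query a shared table. `CREATE DATABASE ... FROM DATASHARE ... OF NAMESPACE` is Redshift's syntax; the database, share, schema and table names and the producer namespace GUID are hypothetical.

```python
import re
import uuid
from typing import Dict, List

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def _identifier(name: str) -> str:
    """Return the name unchanged if it is a plain SQL identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain SQL identifier: {name!r}")
    return name


def consumer_plan(local_db: str, share: str, producer_namespace: str) -> Dict[str, List[str]]:
    """SQL for a temporary consumer: mount the share on setup, drop it on teardown."""
    local_db, share = _identifier(local_db), _identifier(share)
    guid = str(uuid.UUID(producer_namespace))  # validates the namespace GUID format
    return {
        "setup": [
            f"CREATE DATABASE {local_db} FROM DATASHARE {share} OF NAMESPACE '{guid}';"
        ],
        "teardown": [f"DROP DATABASE {local_db};"],
    }


def shared_table(local_db: str, schema: str, table: str) -> str:
    """Three-part name for querying a shared table from the consumer cluster."""
    return ".".join(_identifier(part) for part in (local_db, schema, table))


if __name__ == "__main__":
    plan = consumer_plan("ds_sandbox", "tenant_share", "13b8833d-17c6-4f16-8fe4-1a018f5ed00d")
    print(plan["setup"][0])
    print(f"SELECT COUNT(*) FROM {shared_table('ds_sandbox', 'analytics', 'orders')};")
    print(plan["teardown"][0])
```

Because the consumer only holds a reference to the shared objects, tearing the sandbox down does not touch the producer's data.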
Data sharing also helps split out resource usage in, for example, an API-driven environment. Where REST APIs serve complex aggregation requests, data sharing lets us move toward something that more closely resembles a decoupled model. The DW compute can scale in isolation from other use cases and from the API layer itself.
An existing challenge during the development of data solutions is the overhead and complexity of managing separate environments. Now development, UAT, and production versions of data can be shared and leveraged in isolation. This reduces the overhead of data development cycles, as it no longer requires the time sink of copying and replicating data between stages of development, traditionally a cumbersome process requiring a DBA or a manual set of processes. Data sharing promotes the idea of making analytics development cycles smoother and more centered around a DataOps mentality.
One notable limitation at the release of this feature is the inability to leverage concurrency scaling for any shared objects. There is a cap on how many users can access each shared object at the same time during query execution. While this matters mostly in extreme cases (long query times or very high request volumes), we really hope AWS can overcome this limitation in the future, since multi-tenant use cases may benefit from higher concurrency.
Indellient has been a part of today’s fastest-growing areas in the IT industry for over 15 years, including data analytics, digital content delivery, cloud application orchestration, cloud migration, data science, DevOps services, and application development.
Leveraging AWS technologies such as Redshift, EC2, Glue, SageMaker, and many more, we have been thoroughly impressed with the ability to build custom solutions to complex problems for our clients. AWS technologies provide the flexibility, scalability, and performance our customers need to meet their current and future objectives.