What is Redshift?
What is Snowflake?
Amazon Redshift is a fully-managed service that offers a petabyte-scale data warehouse. Redshift is ideal for analytic workloads and can integrate with business intelligence (BI) tools and standard SQL-based clients.
Snowflake is a data warehouse designed to run in cloud environments. You can run Snowflake on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Redshift is designed to deliver fast query and input/output (I/O) performance for any dataset size. To achieve this, it uses parallelizing with columnar storage technology to distribute queries across multiple nodes.
Running Snowflake does not require installation, configuration, or management of software or hardware. You can easily move data into Snowflake by using standard extract, transform, load (ETL) solutions.
Redshift automates many data warehouse management tasks, such as provisioning, monitoring, configuration, backup, and security.
The Snowflake architecture enables you to independently scale storage and compute. This capability lets you pay for compute and storage resources separately. Additionally, Snowflake provides a sharing capability that lets you quickly share secure and governed data, in real-time.
This is part of our series of articles about Snowflake security.
In this article, you will learn:
Common Features of Snowflake and Amazon Redshift
Snowflake and Redshift share many common features. For example, both Snowflake and Redshift use a massive parallel processing architecture and are column-oriented. Additionally, both are designed to achieve fast queries on vast amounts of data, to help make data-driven decisions.
Snowflake vs. Redshift: Architecture
Redshift uses clusters as its core component. A cluster can contain one or more compute nodes that store data. The leader node divides the work into several work nodes, with each work node operating in parallel. Each task node has its own storage and computing resources.
Snowflake, on the other hand, keeps storage and calculations independent and stores data centrally, accessible to all computing units. Each compute cluster retrieves and caches data locally for processing. The advantage of keeping storage and compute separate is that you can add more compute during performance bottlenecks without affecting existing workloads.
One of the unique aspects of Snowflake is that it manages data using micro-partition files (16MB in size) on object storage (S3 on Amazon, Blob Storage on Azure). Metadata keeps track of every micro-partition file, and maps logical architecture to physical storage, allowing Snowflake to provide features like cloning Terabyte-size tables in seconds.
Snowflake vs. Redshift: Scalability and Performance
Here is how Snowflake and Redshift compare on scalability and performance:
- Scaling—Snowflake’s architecture separates computing and storage. This architecture provides you with greater scalability. Redshift can now also separate compute and storage resources, but only if you use the RA3 Nodes. Otherwise, computing and storage are not independent of each other.
- Performance—according to a benchmark report by Fivetran, there is very little difference between Snowflake and Redshift in terms of performance. The results of this report showed that Redshift’s average time to run 99 queries from the TPC-DS benchmark, on a 1TB dataset, was 8.24 seconds, while Snowflake’s time for the same task was 8.21. Redshift’s average cost for this operation was $0.0746 and Snowflake’s was $0.0729. See our own performance test of Snowflake using the TCP-H benchmark.
Snowflake vs. Redshift: Storage and Replication
Here are key data storage aspects to keep in mind when choosing between Snowflake and Redshift:
- Data loading — Snowflake offers the use of the COPY INTO command. Redshift offers the use of the COPY command and data must be located in S3.
- Storage options — Snowflake offers S3, Azure blob storage, and Google Cloud Storage, either as external storage or internal (within Snowflake) storage as part of the Snowflake data cloud on each of the three public clouds. Redshift only supports AWS storage services.
- Data type conversions — require expertise when using both Snowflake and Redshift. To ensure copies are managed properly, administrators need to know how to implement automatic data type enforcing rules.
- Data sharing — Snowflake supports data sharing across multiple accounts. Redshift requires manual copying.
- Virtual warehouses — Snowflake lets you create and run multiple virtual warehouses on the same data and use different warehouses for different applications. You can also leverage this capability to share data across accounts without performing copying operations. Redshift Spectrum is a querying engine offered as a service, which lets you query S3 files.
Snowflake vs. Redshift: Security
Here are key aspects of security to keep in mind when choosing between Snowflake and Redshift:
- Access — AWS controls access to Redshift via Amazon Identity and Access Management (IAM), letting you reuse existing roles and permissions from other Amazon services. Snowflake has its own access control framework, which does not integrate with IAM on Amazon or other cloud providers. Snowflake supports direct authentication via OAuth 2.0, or federated authentication via Okta, Microsoft ADFS, Google G Suite, Microsoft Azure AD, OneLogin, and Ping Identity PingOne.
- Credentials — In Redshift, user credentials are provided when you associate a cluster with cluster security groups. Snowflake lets you use multi-factor authentication and single sign-on (SSO) for parent accounts.
- Loading data — Redshift provides server-side and client-side encryption during data loads, as well as hardware-accelerated SSL to copy, backup, and restore data. Snowflake encrypts data automatically during loading and unloading.
Learn more in our detailed posts on:
Snowflake vs. Redshift: Ecosystem and Integrations
Snowflake provides several ways to extend the use of its data warehouse with third party solutions:
- Snowflake Data Marketplace — provides access to an ecosystem of technology solutions pre-integrated with Snowflake, which can be purchased on the marketplace and added to Snowflake in a self service model.
- Snowflake Partner Network — a selection of companies who have certified relationships with Snowflake, including public cloud providers, data providers, and service partners that can help implement Snowflake projects and solutions.
- Data integrations — an ETL ecosystem that lets you ingest and process data from external sources, using data pipeline and BI tools like Fivetran, Informatica, Matillion, Stitch, Tableau, and Talend.
- Snowflake Partner Connect — lets you create trial accounts with Snowflake partners and seamlessly integrate the accounts with Snowflake, letting you try third-party services on Snowflake data. Supported services include Informatica, Stitch, Alteryx, Materllion, and Sisense.
When loading data into tables using simple commands, Snowflake also supports data transformations, which greatly simplifies basic transformations in the ETL pipeline.
Amazon Redshift also provides numerous options to integrate with third party solutions and external data sources:
- Integration with other AWS services — you can move data easily from Redshift to other services like Amazon Athena, Amazon EMR, and Amazon SageMaker, or use Amazon tools like AWS Glue or Kinesis Data Firehose to manage and process data. You can also run Redshift queries on data while it is still in operational databases running on Amazon RDS. Redshift ML is a tool built into Redshift that lets you train and deploy Amazon SageMaker models.
- Partner solutions — Redshift integrates with common data sources like Salesforce, Google Analytics, Facebook Ads, Slack, JIRA, Splunk, and Marketo, as well as third-party technology solutions including ThoughtSpot, Matillion, Tableau, and Onica.
Snowflake vs. Redshift: Limitations
Both Amazon Redshift and Snowflake are powerful systems, but each has its limitations.
Amazon Redshift JSON support is limited. By default, all JSON data is split into strings, which can be difficult to query and analyze. Amazon lacks some of the data management automation provided by Snowflake, and requires more manual processing. This makes Redshift less appropriate for companies that have a small technical team or lack in-house data engineering expertise. Redshift also offers limited support for complex data types like arrays and objects.
Snowflake has advanced security features, but they are not available in all price packages. Users with smaller data volumes, but stringent security requirements, might find it difficult to find a suitable package. Snowflake also doesn’t have seamless integration with the AWS technology stack. It also provides limited backup capabilities compared to Redshift.
Snowflake and Redshift Security with Satori
Whichever data store you choose, you can use Satori for universal access control and security. All Satori features work exactly the same for both Redshift and Snowflake so you can manage entitlements to datasets in either data store, define data anonymization policies for certain users and groups, get an Inventory of all sensitive data with Satori’s automated data classification and get full and infinite audit of all data access.