Datasets

A dataset is a collection of data store objects, such as tables or schemas, from one or more data stores, that you want to govern access to as a single unit.

For example, a set of tables in a Snowflake account that contain private customer information, such as name, address and purchase history, can be represented in Satori as a "Customer Data" dataset.

Data engineers create datasets as part of the data development lifecycle. Once a dataset is defined, you can assign data stewards to manage the day-to-day operations of access to its data.

When data consumers query data, Satori associates the query with the relevant datasets and applies the access rule permissions and policies that are defined on them.

Creating Datasets

To create a dataset, you need the Admin or Editor role, as defined in the management console.

  1. Go to the Datasets view and click the Add button.

  2. Provide a name and description for the dataset, and optionally assign data stewards.

  3. Select the data store locations to include in the dataset and, optionally, define the locations to exclude.

Checking Data Store Locations

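Satori uses a longest-match approach when checking whether a data store location is included in a dataset: the most specific included or excluded location that matches the queried location determines the result. Consider the following example dataset: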

Included Locations

Finance Snowflake Account / Forecast database / Q2 schema

Excluded Locations

Finance Snowflake Account / Forecast database / Q2 schema / Orders

When querying any table other than the "Orders" table in the Q2 schema, Satori associates the query with this dataset and applies any permissions or policies that are defined on it.
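
The longest-match check above can be sketched as follows. This is a minimal illustration, not Satori's implementation: it assumes locations are modeled as tuples of path segments, the function names `is_prefix` and `in_dataset` are hypothetical, and the "Invoices" table is an invented example. It also assumes that an exclusion wins if the same location is listed as both included and excluded.

```python
from typing import Sequence, Tuple

# Assumption: a location is a path of segments, e.g. (account, database, schema, table).
Location = Tuple[str, ...]


def is_prefix(prefix: Location, location: Location) -> bool:
    """Return True if `prefix` equals `location` or is one of its ancestors."""
    return location[:len(prefix)] == prefix


def in_dataset(
    location: Location,
    included: Sequence[Location],
    excluded: Sequence[Location],
) -> bool:
    """Longest match: the most specific matching entry decides the outcome."""
    best_include = max(
        (len(p) for p in included if is_prefix(p, location)), default=-1
    )
    best_exclude = max(
        (len(p) for p in excluded if is_prefix(p, location)), default=-1
    )
    # Hypothetical tie-break: on equal length (or no match at all), do not include.
    return best_include > best_exclude


q2 = ("Finance Snowflake Account", "Forecast database", "Q2 schema")
included = [q2]
excluded = [q2 + ("Orders",)]

print(in_dataset(q2 + ("Invoices",), included, excluded))  # True
print(in_dataset(q2 + ("Orders",), included, excluded))    # False
```

With this model, a query on any other table in the Q2 schema matches only the three-segment included location, while a query on Orders also matches the longer four-segment excluded location, which takes precedence.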

Dataset User Access Rules

Permissions to access datasets are defined for individual users or groups and are limited to a predefined time range. In addition, Satori can automatically revoke permissions if they are unused. This helps organizations avoid excess and unused permissions.
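
As a rough illustration of time-limited permissions with automatic revocation, the sketch below models a single access rule. It is an assumption-laden example, not Satori's data model or API: the `Permission` class, its field names, and the `is_active` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Permission:
    """Hypothetical access rule for one user or group on one dataset."""
    identity: str                                     # user or group name
    expires_at: datetime                              # end of the predefined time range
    last_used: datetime                               # last time the permission was used
    revoke_if_unused_for: Optional[timedelta] = None  # None disables unused-based revocation


def is_active(permission: Permission, now: datetime) -> bool:
    """Return True if the permission is within its time range and not stale."""
    if now >= permission.expires_at:
        return False  # the time range has ended
    unused_limit = permission.revoke_if_unused_for
    if unused_limit is not None and now - permission.last_used > unused_limit:
        return False  # unused for too long, treated as revoked
    return True


rule = Permission(
    identity="analysts",
    expires_at=datetime(2024, 7, 1),
    last_used=datetime(2024, 5, 25),
    revoke_if_unused_for=timedelta(days=30),
)
print(is_active(rule, datetime(2024, 6, 1)))  # True
print(is_active(rule, datetime(2024, 7, 2)))  # False
```

Evaluating both conditions at query time is only one possible design; a system could equally revoke stale permissions with a periodic background job.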

Satori provides three main capabilities for controlling dataset access. These access controls can be used in parallel to streamline the process of managing access to data.

Dataset Permissions

Dataset permissions enable data engineers and data stewards to grant access to datasets without requiring users to ask for access. Satori recommends that you use this method for providing access if you know which users or groups require access to a dataset and your organization's policy does not require an approval process.

When users query data, Satori checks for the required permissions. If they are found, Satori sends the query to the data store.

Access History

Satori audits every permission change and every access request.

Access Requests

Enable access requests to allow users who do not have the required permissions to request access. When these users query data, they receive a link to a service request.

Requests are sent via email to the dataset's data stewards and appear in the management console.

Self-Service Access

Enable self-service access to allow users who lack the required permissions to grant themselves predefined permissions. When these users query data, they receive a link to a form in which they state why they need access to the dataset, so that their access is audited. Once they submit the form, they are granted the relevant permissions defined on the dataset.

This method is the recommended alternative to standard dataset permissions because it audits users' access to datasets.

Managing Technical Metadata

Using the Data Inventory view of a dataset, data engineers or data stewards can review the results of the automatic data classification and override, remove or add tags as needed. See the Data Inventory section for more details.

Implementing Custom Policies

The Custom Policy view of a dataset enables data engineers or data stewards to implement custom data access policies using the Policy Engine.

See the Policy Engine Overview section for more details.