Elasticsearch Document-Level Security

Since its inception, Elasticsearch and the ecosystem of components around it, collectively referred to as the “Elastic Stack,” have been used for many applications, and the stack now includes data security features.

In this article, you will learn the following:

What is Elasticsearch?
The Basic Concepts of Elasticsearch
What is an Elasticsearch Document?
- Document Fields
Setting Up Document Level Security
Summary
Elasticsearch Security with Satori

This is part of our complete Elasticsearch Security guide.

What is Elasticsearch?

Elasticsearch is an open-source, distributed document store. Unlike traditional databases, which store information as rows of columnar data, Elasticsearch stores information as complex data structures serialized as JSON documents. The documents saved in an Elasticsearch cluster are distributed across its nodes and can be accessed immediately from any node.

The Basic Concepts of Elasticsearch

To better understand how Elasticsearch operates, it helps to review a few fundamental aspects of how it organizes data.

Documents

You index documents in Elasticsearch using JSON, the ubiquitous data interchange format of the internet. In this context, a document is like a row in a relational database: it represents a given entity, the thing you are searching for.

In Elasticsearch, a document can be any structured data encoded in JSON, with fields holding values such as numbers, strings, and dates. Each document has a unique ID that identifies it within its index.
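
As a minimal illustration, the Python sketch below builds a document containing a string, a number, and a date and serializes it to the JSON text that would be sent to Elasticsearch. The field names and the ID value are hypothetical; in Elasticsearch the ID is kept in the document's `_id` metadata field rather than inside the JSON body.

```python
import json
from datetime import date

# Hypothetical document ID; Elasticsearch stores it in the `_id` metadata field.
doc_id = "order-1001"

# Hypothetical document body: key-value pairs holding a string, a number,
# and a date (dates travel to Elasticsearch as strings).
document = {
    "customer": "Jane Doe",
    "total": 42.5,
    "order_date": date(2023, 5, 17).isoformat(),
}

# Serialize the document to the JSON text that would be indexed.
body = json.dumps(document)
print(doc_id, body)
```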

Indices

A collection of documents with similar characteristics is called an index. The index is the highest-level entity you can query against in Elasticsearch, so you can compare it to a database in a relational schema.

In most cases, the documents in an index are logically related. Each index is identified by a name, which you use to refer to it when indexing, searching, updating, and deleting its documents.

Put another way, an index is an optimized collection of documents, and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure.

For example, text fields are stored in inverted indices, while numeric and geo fields are stored in BKD trees. The ability to use these per-field data structures to assemble and return search results is what makes Elasticsearch fast.

Elasticsearch can also be schema-less, meaning that documents can be indexed without explicitly specifying how to handle each field. With dynamic mapping, Elasticsearch automatically detects and indexes new fields: as you index documents, it recognizes booleans, floating-point and integer values, dates, and strings and maps them to the appropriate Elasticsearch data types.
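
To make dynamic mapping concrete, here is a rough Python approximation of how Elasticsearch's default rules assign types to JSON values in a flat document. It is a simplified model, not Elasticsearch code: date detection here only accepts strings that Python's `datetime.fromisoformat` can parse, whereas Elasticsearch uses its own configurable date formats, and arrays and nulls are not handled.

```python
from datetime import datetime

def infer_field_type(value):
    """Approximate Elasticsearch's default dynamic mapping for one JSON value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # simplified stand-in for date detection
            return "date"
        except ValueError:
            # Elasticsearch maps other strings to `text` and, by default,
            # also adds a `keyword` sub-field.
            return "text"
    raise TypeError(f"unsupported value: {value!r}")

def infer_mapping(document):
    """Return a field-name -> inferred-type dict for a flat document."""
    return {field: infer_field_type(value) for field, value in document.items()}

print(infer_mapping({
    "in_stock": True,
    "quantity": 3,
    "price": 9.99,
    "added": "2023-05-17",
    "name": "Widget",
}))
```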

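It is often useful to index the same field in different ways for different purposes. For example, you might index a string field both as a text field for full-text search and as a keyword field for sorting or aggregations, or use more than one language analyzer to process a string field that contains user input.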
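
A sketch of such a mapping follows, written as a Python dict that mirrors the JSON body of a create-index request (`PUT /<index>`). It uses the `fields` mapping parameter (multi-fields) so that a hypothetical `title` field is indexed as analyzed `text`, as an exact-value `keyword` sub-field, and as a second `text` sub-field that uses the built-in `english` analyzer. The field and sub-field names are assumptions for illustration.

```python
import json

# Create-index request body: the hypothetical `title` field is indexed
# three ways via multi-fields.
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",  # standard analyzer, for full-text search
                "fields": {
                    "raw": {"type": "keyword"},  # exact value, for sorting/aggregations
                    "english": {"type": "text", "analyzer": "english"},  # English stemming
                },
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Queries can then target `title`, `title.raw`, or `title.english` depending on whether they need analyzed, exact, or language-aware matching.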

Inverted Index

Under the hood, Elasticsearch relies on the inverted index, the core mechanism behind full-text search engines.

An inverted index is a hashmap-like data structure that points from a word to the documents containing it. Rather than storing strings directly, it splits each document's text into individual search terms and maps each term to the documents in which that term appears.
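
The toy Python example below builds an inverted index over three short documents and looks up terms in it. It only illustrates the idea: the tokenizer is a crude stand-in for an Elasticsearch analyzer, the structure is nothing like Lucene's on-disk format, and the `search` helper uses simple AND matching without relevance scoring.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Crude analyzer: lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "A quick red fox jumps",
}
index = build_inverted_index(docs)
print(index["brown"])              # [1, 2]
print(search(index, "quick fox"))  # [1, 3]
```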

By leveraging distributed inverted indices, Elasticsearch quickly identifies the best matches for full-text searches, even across very large data sets. A stored document is indexed and becomes fully searchable in near real time, typically within about one second.

What is an Elasticsearch Document?

Whereas a SQL database stores data as rows in tables, Elasticsearch stores data as documents within an index, and its approach to records and indices differs markedly from that of a relational database.

Document Fields

At its core, each document is a JSON structure: a set of key-value pairs. The index mapping determines how these pairs are indexed, including each field's data type, such as text, keyword, float, date, geo_point, or another supported type.

Elasticsearch documents are called schema-less because users do not have to define the index's field structure in advance, and not all documents in an index need the same structure. However, once a field is mapped to a particular data type, that mapping must be consistent across all documents in the index.

You can also map the same field in multiple ways. To control exactly how fields are stored and indexed, you can define rules that govern dynamic mapping and specify mappings explicitly. This control is useful because you may need a keyword structure on a field for aggregations while also keeping an analyzed structure that supports full-text searches for particular words in that field.

Finally, creating your mappings allows you to:

Separate full-text string fields from exact-value string fields
Analyze text in a language-specific manner
Optimize fields for partial matching
Define custom date formats
Use data types that are not automatically detected, such as geo_point and geo_shape
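
The Python dict below sketches an explicit mapping in which each hypothetical field corresponds to one item in the list above; it mirrors the JSON body you might send when creating an index. The field names are assumptions, and `"dynamic": "strict"` is one possible rule for managing dynamic mapping (it rejects documents that contain unmapped fields).

```python
import json

# Hypothetical explicit mapping; each field illustrates one item from the list.
mapping_body = {
    "mappings": {
        "dynamic": "strict",  # reject documents containing unmapped fields
        "properties": {
            "sku": {"type": "keyword"},                               # exact-value string
            "description": {"type": "text", "analyzer": "english"},  # full-text, language-specific
            "product_name": {"type": "search_as_you_type"},           # partial (prefix) matching
            "released": {"type": "date", "format": "yyyy-MM-dd"},     # custom date format
            "warehouse": {"type": "geo_point"},                       # never detected dynamically
        },
    }
}

print(json.dumps(mapping_body, indent=2))
```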

Setting Up Document Level Security

Document-level security is straightforward to set up. You can restrict access to the data within a data stream or index by defining field- and document-level security permissions in a role and assigning that role to users. Elasticsearch supports both role-based access control (RBAC) and attribute-based access control (ABAC).

Once a user has been authenticated, access can be granted through RBAC, ABAC, or both. With RBAC, users are assigned roles, and each role grants privileges on secured resources, such as an Elasticsearch cluster or its indices. A user can then access sensitive data according to the permissions defined in their roles.
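
As a conceptual model of RBAC (not Elasticsearch's implementation), the Python sketch below maps hypothetical role names to index patterns and privileges, then checks whether any of a user's roles grants a requested privilege on an index. The role names and patterns are invented for illustration, and details such as privileges that imply others (for example, `all`) are not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical roles: each grants a set of privileges on index name patterns.
ROLES = {
    "sales_reader": [(["sales-*"], {"read"})],
    "logs_admin": [(["logs-*"], {"read", "write", "manage"})],
}

def is_authorized(user_roles, index, privilege):
    """Return True if any of the user's roles grants `privilege` on `index`."""
    for role in user_roles:
        for patterns, privileges in ROLES.get(role, []):
            if privilege in privileges and any(fnmatchcase(index, p) for p in patterns):
                return True
    return False

print(is_authorized(["sales_reader"], "sales-2023", "read"))   # True
print(is_authorized(["sales_reader"], "sales-2023", "write"))  # False
```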

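With ABAC, users must have the required attributes to access secured information. In Elasticsearch, this is done with templated role queries that compare attributes of the user, such as values stored in the user's metadata, with attributes stored in the documents.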
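
The sketch below shows a hypothetical templated role query keyed on a user attribute (`department` in the user's metadata) and a simplified Python renderer that substitutes the authenticated user's values into it. Elasticsearch actually renders such templates with Mustache; the `render` function here is an illustrative stand-in that only handles plain `{{_user.a.b}}` placeholders and does not escape values.

```python
import json
import re

# Hypothetical role query template: {{...}} placeholders refer to attributes
# of the authenticated user.
role_query = {
    "template": {
        "source": {"term": {"department": "{{_user.metadata.department}}"}}
    }
}

def render(template_source, user):
    """Simplified stand-in for Mustache rendering: replace each
    {{_user.a.b}} placeholder with the matching value from `user`."""
    text = json.dumps(template_source)

    def lookup(match):
        value = {"_user": user}
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return json.loads(re.sub(r"\{\{([\w.]+)\}\}", lookup, text))

user = {"username": "alice", "metadata": {"department": "finance"}}
print(render(role_query["template"]["source"], user))
# {'term': {'department': 'finance'}}
```

Because the query is resolved per user, one role can serve every department instead of requiring a separate role for each.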

A role can define both field- and document-level security permissions on a per-index basis. A role that does not specify field-level permissions grants access to all fields. Similarly, a role that does not specify document-level permissions grants access to all documents in the index.
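
For illustration, here is a hypothetical role body, as a Python dict mirroring the JSON you might send to the create-role endpoint (`PUT /_security/role/<name>`), that sets both kinds of permissions on one index. The index name and field names are assumptions; `field_security` (with `grant` and `except`) controls visible fields, and `query` limits visible documents.

```python
import json

# Hypothetical role combining field- and document-level security on one index.
role_body = {
    "indices": [
        {
            "names": ["customers"],  # hypothetical index
            "privileges": ["read"],
            "field_security": {
                "grant": ["*"],      # start from all fields...
                "except": ["ssn"],   # ...but hide this one
            },
            "query": {"term": {"country": "US"}},  # only US customer documents
        }
    ]
}

print(json.dumps(role_body, indent=2))
```

Omitting `field_security` would expose every field, and omitting `query` would expose every document in `customers`.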

A role can specify multiple permissions on the same data stream or index, and a user can be assigned multiple roles.

In addition to document-level security, access can also be limited at the field level. Let’s take a look at the benefits and drawbacks of using both levels of security.

Field Level Security

Field-level security permissions limit access to specific fields inside a document. This approach ensures that sensitive fields remain protected by access permissions even when the rest of the document is visible.
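
The following Python sketch models the effect of field-level security on a single document: a field stays visible only if it matches a `grant` pattern and no `except` pattern. This is a simplified model that handles only top-level fields with shell-style wildcards; Elasticsearch's own implementation also covers nested object paths. The sample document is invented.

```python
from fnmatch import fnmatchcase

def apply_field_security(document, grant, except_=()):
    """Return only the top-level fields a role may see (simplified model)."""
    return {
        field: value
        for field, value in document.items()
        if any(fnmatchcase(field, p) for p in grant)
        and not any(fnmatchcase(field, p) for p in except_)
    }

customer = {"name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789"}
print(apply_field_security(customer, grant=["*"], except_=["ssn"]))
# {'name': 'Jane Doe', 'email': 'jane@example.com'}
```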

A drawback of field-level security is that it places the burden of access management on the data engineering team, who must continuously grant, update, and revoke field access. In addition, the procedures for approving access to sensitive fields can cause delays.

Document Level Security

Document-level security, by contrast, restricts access to entire documents: a user can see only the documents that match their role's query.

When a user has multiple roles, Elasticsearch combines the document-level security queries of those roles for a given data stream or index with a logical OR. As a result, a document is returned if it matches the query of any one of the user's roles.

For instance, if one role grants access to an index without document-level security and another grants access to it with document-level security, document-level security is effectively not applied: a user with both roles can access every document in the index.
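
The OR behavior described above can be modeled in a few lines of Python. Each role that grants read access contributes either a predicate standing in for its document-level security query or `None` when it has no such query; a document is visible if any role allows it, so a single unrestricted role exposes the whole index. The function name, predicates, and sample documents are hypothetical.

```python
def visible_documents(documents, role_queries):
    """Model how a user's roles combine for one index.

    Each entry in `role_queries` is a predicate (a role's document-level
    security query) or None (a role without document-level security).
    """
    if any(query is None for query in role_queries):
        return list(documents)  # an unrestricted role exposes every document
    return [doc for doc in documents if any(query(doc) for query in role_queries)]

def emea_only(doc):
    return doc["region"] == "emea"

def apac_only(doc):
    return doc["region"] == "apac"

docs = [
    {"id": 1, "region": "emea"},
    {"id": 2, "region": "apac"},
    {"id": 3, "region": "amer"},
]

print([d["id"] for d in visible_documents(docs, [emea_only, apac_only])])  # [1, 2]
print([d["id"] for d in visible_documents(docs, [emea_only, None])])       # [1, 2, 3]
```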

To guarantee that users can read only their own documents, it makes sense to set up document-level security. In this scenario, each document must have a username or role name associated with it, so that the role query can use this information to filter documents.
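
A minimal sketch of such a role, assuming each document records its owner in a hypothetical `owner.username` field, is shown below as a Python dict mirroring the role's JSON body. The templated query uses the `{{_user.username}}` placeholder, which Elasticsearch fills in with the authenticated user's username; the index name is also an assumption.

```python
import json

# Hypothetical role: each user may read only documents whose `owner.username`
# field matches their own username.
owner_role = {
    "indices": [
        {
            "names": ["user-notes"],  # hypothetical index
            "privileges": ["read"],
            "query": {
                "template": {
                    "source": {"term": {"owner.username": "{{_user.username}}"}}
                }
            },
        }
    ]
}

print(json.dumps(owner_role, indent=2))
```

A single role defined this way can be assigned to every user, because the placeholder resolves differently for each of them.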

Document-level security does not apply to write APIs. If users who write to the same data stream or index do not use unique document IDs, they might overwrite each other's documents, either intentionally or unintentionally. The set_security_user ingest processor can add properties of the currently authenticated user to the documents being indexed.
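
As a sketch, the Python dict below mirrors the JSON body of an ingest pipeline (created with `PUT /_ingest/pipeline/<id>`) whose `set_security_user` processor stores the indexing user's username and roles under a hypothetical `owner` field, which a role query like the one above could then filter on. The pipeline ID and field name are assumptions.

```python
import json

pipeline_id = "add-owner"  # hypothetical pipeline ID

# The set_security_user processor copies properties of the authenticated
# user into the `owner` field of each document indexed through this pipeline.
pipeline_body = {
    "description": "Record which user indexed each document",
    "processors": [
        {"set_security_user": {"field": "owner", "properties": ["username", "roles"]}}
    ],
}

print(f"PUT /_ingest/pipeline/{pipeline_id}")
print(json.dumps(pipeline_body, indent=2))
```

Documents pass through the pipeline when it is named in the `pipeline` parameter of an index request or set as the index's `index.default_pipeline`. The processor only records who wrote a document; it does not stop one user from overwriting another's document, so unique IDs are still needed.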

Summary

Organizations in every industry must protect their clients’ information, so maintaining a high level of security is critical. In today’s business environment, clients’ trust and confidence in your organization matter more than ever.

Although securing documents is no easy endeavor, Elasticsearch’s field- and document-level security features make the task easier.

Elasticsearch Security with Satori

While Elasticsearch does make the task of document security easier, implementing and maintaining data security does pull data teams away from their core responsibilities.

Satori helps organizations streamline access to sensitive data stored in Elasticsearch. Learn more about how we help keep your Elasticsearch data access simple and secure, or read about our key capabilities.

To learn more, schedule a meeting with one of our experts.