What is Elasticsearch?


Elasticsearch is a free, open-source, RESTful, distributed search engine built on the Apache Lucene library. It was released in 2010 and quickly became one of the most popular search engines. It provides multitenant-capable full-text search over schema-free JSON documents through an HTTP web interface. Apart from full-text search, it is commonly used for business analytics and log analytics. It is developed in Java, and client libraries are available for most popular languages, such as Python, C# and PHP.

What does it do?

Elasticsearch can be used to search all kinds of data. It offers near-real-time search, scales well and supports multitenancy. It is often used to store data that needs to be sliced and diced or grouped by various dimensions. Its distributed architecture makes it possible to search and analyze huge volumes of data in near real time. Setting up a full-featured search cluster is easy, although running one at scale still requires substantial expertise. Compared to most NoSQL databases, Elasticsearch stands out for its powerful RESTful HTTP API, which lets you run fast searches over your data.
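To give a feel for the RESTful API, here is a minimal Python sketch that builds a full-text search request for the _search endpoint using only the standard library. It rests on several assumptions that are not part of this article: a node listening on http://localhost:9200, a hypothetical index called "articles" and a hypothetical "title" field. The request is only constructed, not sent, so the snippet runs without a cluster.

```python
import json
from urllib.request import Request

# Assumption: a local node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_search_request(index, field, text, size=10):
    """Build (but do not send) a full-text search request for one index."""
    body = {
        "size": size,                       # maximum number of hits to return
        "query": {"match": {field: text}},  # full-text "match" query on one field
    }
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "articles" and "title" are hypothetical names used for illustration only.
request = build_search_request("articles", "title", "distributed search")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a cluster actually running at that address, passing the request to urllib.request.urlopen would send it, and the response body would be a JSON document containing the matching hits.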

Basic Concepts

To understand Elasticsearch better, it helps to know the basic concepts behind its backend components.

Document – the basic unit of information that can be indexed. Many documents can be stored within one index.

Index – a collection of documents with similar characteristics, identified by a unique name.

Node – a single server that is part of a cluster and is identified by a name. It participates in the cluster's indexing and search capabilities.

Cluster – a collection of one or more nodes that together hold the entire data set.

Shard – a piece of an index. Each shard is itself a fully functional, independent index.

JVM – Elasticsearch is written in Java and therefore runs on the Java Virtual Machine (JVM), a runtime engine that executes bytecode on many operating systems.
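To make the relationship between documents, indices and shards more concrete, here is a minimal Python sketch of how the documents of one index could be spread across its primary shards. It is an illustration, not Elasticsearch's implementation: the function names are hypothetical, and CRC32 is only a stand-in for the hash function Elasticsearch uses internally. The idea it shows, hashing a routing value (by default the document ID) modulo the number of primary shards, follows the simplified routing formula described in the Elasticsearch documentation.

```python
import zlib
from collections import defaultdict


def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # Stand-in hash: CRC32 is used here only because it is stable across runs.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return dict(shards)


# Ten hypothetical documents in an index with three primary shards.
layout = distribute([f"doc-{i}" for i in range(10)], num_primary_shards=3)
for shard, ids in sorted(layout.items()):
    print(f"shard {shard}: {ids}")
```

Because the shard number is derived from the ID, the same document always maps to the same shard as long as the number of primary shards does not change.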

ELK Stack

The ELK Stack is one of the world's most popular log analytics platforms. ELK is an acronym for three open-source projects: Elasticsearch, Logstash and Kibana. Elasticsearch is the search and analytics engine, Logstash is the server-side data processing pipeline, and Kibana is the data visualization tool with charts and graphs. The ELK Stack is popular because it fulfills a need in the log management and analytics space. It gives users a powerful platform that collects and processes data from multiple data sources, stores it in a scalable, centralized data store, and provides a set of tools to analyze it.

Logs have always existed, and so have tools for analyzing them. What has changed is the underlying architecture of the environments that generate logs. Architectures have evolved into containers and microservices deployed on cloud or hybrid environments. This is where a centralized log management and analytics solution such as the ELK Stack really stands out.

The key capabilities of the ELK Stack are:

Storage – the ability to store data for extended periods of time.

Processing – the ability to transform logs into meaningful data for easier analysis.

Aggregation – the ability to collect and ship log messages from multiple data sources.

Analysis – the ability to create visualizations and dashboards on top of the queried data.
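The Processing capability is the easiest one to illustrate in a few lines. The sketch below is plain Python, not Logstash configuration, and it assumes a made-up log format of the form "<timestamp> <LEVEL> <service>: <message>". It turns raw log lines into structured JSON documents of the kind that could then be indexed and analyzed.

```python
import json
import re

# Assumption: a hypothetical log format "<timestamp> <LEVEL> <service>: <message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>[\w-]+):\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the line as a dict of named fields, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


raw_lines = [
    "2024-01-15T10:32:07Z ERROR checkout-api: payment gateway timed out",
    "2024-01-15T10:32:09Z INFO checkout-api: retry succeeded",
    "not a log line",  # lines that do not match the pattern are skipped
]

documents = [doc for doc in map(parse_log_line, raw_lines) if doc is not None]
for doc in documents:
    print(json.dumps(doc))
```

Once every line has named fields such as level and service, grouping and filtering by those fields becomes straightforward, which is what the Analysis step builds on.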

In 2015, a fourth component joined the stack: Beats, a family of lightweight, single-purpose data shippers.

Advantages and use cases

Some of Elasticsearch's advantages are:

Easy-to-use API – simple RESTful APIs make indexing, searching and querying data easy.

Speed – inverted indices are used to find the best matches for full-text searches, which keeps queries fast even on very large data sets.

Schema-free – it accepts JSON documents without requiring a schema to be defined up front.

Scalability – it is easy to scale, and it is reliable as well.

Clarity – aggregations let you zoom out and make sense of billions of log lines.

Compactness – a single engine combines different types of search, such as structured, unstructured, metrics, logging and geo search.
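The inverted index mentioned under Speed can be illustrated with a toy example. The sketch below is a deliberately simplified Python version, not how Lucene stores its indices: it maps each term to the set of document IDs that contain it, so a query only has to look up its terms instead of scanning every document. The sample documents and function names are made up for illustration, and there is no relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "Logstash is a data processing pipeline",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch"))
print(search(index, "data pipeline"))
```

The cost of a lookup depends on the number of query terms and the size of their posting sets rather than on the total number of documents, which is the intuition behind the speed claim.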

Elasticsearch is used in various ways. The most common use cases are:

Data store – it can serve as a document store, a searchable catalog or a logging system.

Data visualization

Application or website search – a very useful tool for effective and accurate searches.

Container monitoring

Logging and analytics – commonly used for analyzing data.

Infrastructure metrics