Big Data Analytics — Getting Started With Elasticsearch
The Elastic Stack has recently risen to fame in the realm of Big Data analytics and machine learning. The Elastic Stack is a suite of tools (i.e. Elasticsearch, Logstash, Kibana and Beats) for analyzing large quantities of data in real time. In this article, we’ll briefly cover Elasticsearch.
Elasticsearch is an open source, distributed, RESTful search engine. There’s a lot of information in that short sentence so let’s break it down.
- Open source — The source code for Elasticsearch is on GitHub and you can contribute if you’d like. It’s worth mentioning that the company Elastic has built a business around Elasticsearch and the rest of the Elastic Stack.
- Distributed — Elasticsearch is designed to horizontally scale using node clusters. In other words, it can run on top of multiple computer systems.
- RESTful — REST is a pattern followed when developing an API where actions are performed by making HTTP requests to different endpoints.
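The RESTful idea above can be made concrete with a short sketch: the same URL identifies one resource, and the HTTP verb decides what happens to it. A minimal Python example using only the standard library (the index, type, and id in the URL are hypothetical; the requests are only constructed here, not sent):

```python
from urllib.request import Request

# Hypothetical document endpoint -- accounts/person/1 is made up for illustration.
base = "http://localhost:9200/accounts/person/1"

# In a RESTful API, the verb determines the action on the resource at the URL:
# GET reads it, POST/PUT write it, DELETE removes it.
requests = {method: Request(base, method=method)
            for method in ("GET", "POST", "PUT", "DELETE")}

for method, req in requests.items():
    # No network traffic happens here -- the Request objects are only built.
    print(method, req.full_url)
```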
Elasticsearch allows you to store, search, and analyze big volumes of data in near real time. You might see Elasticsearch used for things like a web store that allows their customers to search for products, a business that wants to analyze and visualize consumer trends or a company that wants to aggregate, parse and perform queries on a set of logs.
Java
Elasticsearch runs on top of the JVM, so we need to have Java installed prior to installing Elasticsearch. You can verify whether Java is installed by running java -version. If it isn’t already installed, you can install it by running the following command.
sudo apt-get install default-jdk
Next, we need to make sure that the JAVA_HOME environment variable is set.
echo $JAVA_HOME
If nothing comes back, you’ll want to add the following line to your environment file, using sudo vi /etc/environment.
JAVA_HOME="/usr/lib/jvm/<java-version>"
To load the variable into your current shell, run source /etc/environment.
Elasticsearch
To start, download and install the public signing key.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
You may need to install the apt-transport-https package on Debian before proceeding.
sudo apt-get install apt-transport-https
The following line adds Elastic’s Debian package repository.
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
Finally, pull down and install the package.
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch isn’t automatically started after installation. You can start it by running the following command.
sudo systemctl start elasticsearch.service
To configure Elasticsearch to start automatically when the system boots up, you can run the following commands.
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
By default, Elasticsearch runs on port 9200. We can verify that it’s working by running curl localhost:9200 | jq '.'. If everything is working, you should see output similar to the following.
{
"name" : "zP41Q2p",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "z8FCPGNXTZymP8hmcas-YQ",
"version" : {
"number" : "6.7.0",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "8453f77",
"build_date" : "2019-03-21T15:32:29.844721Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
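The same health check can also be done programmatically. A minimal Python sketch, using only the standard library, that parses a response like the one above (a trimmed copy of the sample output is inlined here so the snippet runs without a live cluster):

```python
import json

# Trimmed sample response body from curl localhost:9200 (inlined; a real check
# would fetch this from the running cluster instead).
body = '''
{
  "name" : "zP41Q2p",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "6.7.0", "lucene_version" : "7.7.0" },
  "tagline" : "You Know, for Search"
}
'''

info = json.loads(body)
print(info["version"]["number"])  # -> 6.7.0
```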
Next, let’s create a data.json file with the following content.
{
"firstname": "John",
"lastname": "Doe"
}
To add data, we make a POST request to <host>/<index>/<type>/<id>.
curl -d "@data.json" -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/1 | jq '.'
We can verify that it was successful by making a GET request to the same endpoint.
curl localhost:9200/accounts/person/1 | jq '.'
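The same pair of requests can be expressed in Python with the standard library. A sketch, assuming Elasticsearch is listening on localhost:9200 as set up above; the requests are only constructed here, and passing them to urllib.request.urlopen would actually send them:

```python
import json
from urllib.request import Request

doc = {"firstname": "John", "lastname": "Doe"}
url = "http://localhost:9200/accounts/person/1"

# POST the document as JSON -- the equivalent of
# curl -d "@data.json" -H "Content-Type: application/json" -X POST ...
index_req = Request(url,
                    data=json.dumps(doc).encode("utf-8"),
                    headers={"Content-Type": "application/json"},
                    method="POST")

# GET it back to verify -- the equivalent of curl localhost:9200/accounts/person/1
fetch_req = Request(url, method="GET")

# urllib.request.urlopen(index_req) / urlopen(fetch_req) would send these.
```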
We can update a document by making a POST request to the _update endpoint.
curl -d '{"doc":{"age": 42}}' -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/1/_update | jq '.'
We can verify that it was successful by making a GET request to the same endpoint.
curl localhost:9200/accounts/person/1 | jq '.'
Let’s create a data2.json file with the following content.
{
"firstname": "Jane",
"lastname": "Smith",
"age": 28
}
Run the following command to add it to the store.
curl -d "@data2.json" -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/2 | jq '.'
We can also search for data using query strings.
curl 'localhost:9200/_search?q=john' | jq '.'
The following command will return all documents with an age field whose value is equal to 42.
curl 'localhost:9200/_search?q=age:42' | jq '.'
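Query strings with special characters (like the colon in age:42) should be URL-encoded when built programmatically. A small sketch using Python's urllib.parse to build search URLs like the ones above; the search_url helper is a hypothetical name introduced here for illustration:

```python
from urllib.parse import urlencode

def search_url(query, host="localhost:9200"):
    # Builds a _search URL with the q parameter properly percent-encoded.
    return f"http://{host}/_search?{urlencode({'q': query})}"

print(search_url("john"))
print(search_url("age:42"))  # -> http://localhost:9200/_search?q=age%3A42
```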
We can delete a specific document by making a DELETE request.
curl -X DELETE localhost:9200/accounts/person/1 | jq '.'
Finally, we can delete the full index.
curl -X DELETE localhost:9200/accounts | jq '.'
Cory Maklin