Trying out Elasticsearch 1.0

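Elasticsearch (Elasticsearch.org Open Source Distributed Real Time Search & Analytics | Elasticsearch) has apparently reached version 1.0, so I gave it a try.
For open-source search systems, the choice seems to come down to Solr (Apache Lucene - Apache Solr) or this one, and lately Elasticsearch articles seem to stand out more.
It may be at 1.0 now, but I never used any earlier version either, so I'm starting from scratch. Let's give it a go.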

Installation

I'm installing on CentOS. Download the RPM package from here (Elasticsearch.org Download ELK | Elasticsearch) and install it.
The installer kindly tells you to register it as a service, so I did exactly that.

$ curl -LO https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.noarch.rpm
$ sudo rpm -ihv elasticsearch-1.0.0.noarch.rpm 
$ sudo /sbin/chkconfig --add elasticsearch

Java is required, so install that as well: install the JRE and set the JAVA_HOME environment variable.
Java SE Runtime Environment 7 - Downloads | Oracle Technology Network | Oracle

$ sudo rpm -ihv jre-7u51-linux-x64.rpm 
$ export JAVA_HOME=/usr/java/jre1.7.0_51

Checking that it works

Starting it is easy:

$ sudo /sbin/service elasticsearch start
Starting elasticsearch:                                    [  OK  ]

With all settings left at their defaults, the process looks like this:

/usr/bin/java -Xms256m -Xmx1g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError 
-Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch 
-cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* 
-Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch 
-Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

The terminology in the documentation is confusing, so let me sort it out.

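cluster	The search system as a whole
node	A search system can be distributed across multiple servers (Elasticsearch processes); each of these is a node. If a cluster with the node's configured cluster name exists on the same network, the node joins that cluster automatically (in 1.0 this relies on multicast discovery, which is enabled by default)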

Let's look at the JSON these endpoints return:

$ curl -XGET 'http://localhost:9200/_cluster/state'
$ curl -XGET 'http://localhost:9200/_cluster/stats'
$ curl -XGET 'http://localhost:9200/_cluster/health'
$ curl -XGET 'http://localhost:9200/_nodes'
$ curl -XGET 'http://localhost:9200/_nodes/stats'
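Of these, _cluster/health is the one you are most likely to poll from a script. The following is a minimal shell sketch, not part of the original setup: it assumes a node at http://localhost:9200 (overridable via a hypothetical ES_URL variable), and the helper names es_cluster_status and es_wait_for_status are made up for illustration. The wait_for_status and timeout query parameters belong to the _cluster/health API itself.

```shell
# Hypothetical helpers; assumes a node reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print only the cluster status (green / yellow / red) from _cluster/health.
# The sed expression pulls the value of the "status" key out of the JSON.
es_cluster_status() {
  curl -s -XGET "$ES_URL/_cluster/health" |
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Ask the server to hold the response until the cluster reaches at least
# status $1, or until the timeout $2 (default 30s) expires.
es_wait_for_status() {
  curl -s -XGET "$ES_URL/_cluster/health?wait_for_status=$1&timeout=${2:-30s}"
}
```

For example, once the node is answering HTTP requests, es_wait_for_status yellow 60s blocks until the cluster is usable. On a single node, expect yellow rather than green once an index exists: with the default of one replica per shard, replica shards cannot be allocated on the same node as their primaries.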

That said, there seems to be a handy plugin for monitoring Elasticsearch that saves you from doing all this by hand.

ElasticHQ - ElasticSearch monitoring and management application.

$ cd /usr/share/elasticsearch
$ sudo ./bin/plugin --install royrusso/elasticsearch-HQ

With just that, opening http://localhost:9200/_plugin/HQ brings up a cool-looking screen like the one below that promises plenty of features!!
(Screenshot: ElasticHQ start screen)

Indexing something and searching for it

Again, I need to sort out the concepts first or I'll get confused.

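index	A single search system; one index can hold multiple types
type	A group (set) of things to be searched
mapping	Definitions of the attributes of each element (search key) of a document
document	An individual piece of searchable data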

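When you insert data, the index and type are created automatically even if you haven't defined them beforehand; this appears to be the default behavior. The mapping is also inferred automatically from the documents (clever! If it guesses wrong, or you want to enforce a particular definition, it seems you define the mapping explicitly yourself).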
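Once documents are in an index (for example the myindex created in the next step), the inferred mapping can be inspected, and an explicit mapping can be supplied when creating an index. Below is a minimal shell sketch assuming a node at http://localhost:9200 (via a hypothetical ES_URL variable); the helper names and the index name myindex2 are made up for illustration, while the _mapping endpoint and the "mappings" key of the create-index request are standard 1.x API.

```shell
# Hypothetical helpers; assumes a node reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show the mapping Elasticsearch inferred for index $1.
es_get_mapping() {
  curl -s -XGET "$ES_URL/$1/_mapping?pretty"
}

# Create index $1 with an explicit mapping for type $2.
# $3 is the JSON "properties" object, e.g. '{"age":{"type":"integer"}}'.
es_create_index_with_mapping() {
  curl -s -XPUT "$ES_URL/$1" -d "{\"mappings\":{\"$2\":{\"properties\":$3}}}"
}
```

Usage: es_get_mapping myindex, then es_create_index_with_mapping myindex2 mytype '{"age":{"type":"integer"}}'. A fresh index is used because the type of a field that already exists in a mapping generally cannot be changed in place; with dynamic mapping, a JSON integer such as "age": 40 is typically mapped as long.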

Inserting data

Send each document as JSON with a PUT to /[index_name]/[type_name]/[document_id].

$ curl -XPUT 'http://localhost:9200/myindex/mytype/doc1' -d '{ "name" : "Yamada Taro", "comment": "Hello! Elasticsearch World!"}'
{"_index":"myindex","_type":"mytype","_id":"doc1","_version":1,"created":true}
$ curl -XPUT 'http://localhost:9200/myindex/mytype/doc2' -d '{ "name" : "Tanaka Hanako", "comment": "Today is Febyrary 19. I like World Trip!"}'
{"_index":"myindex","_type":"mytype","_id":"doc2","_version":1,"created":true}
$ curl -XPUT 'http://localhost:9200/myindex/mytype/doc1' -d '{ "name" : "Yamada Taro", "comment": "Hello! Elasticsearch World!", "age":40}'
{"_index":"myindex","_type":"mytype","_id":"doc1","_version":4,"created":false}

To update a document, you simply overwrite it using the same document_id (created in the response becomes false and _version is incremented).
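Overwriting replaces the whole document. Two related 1.x features are partial updates through the _update endpoint (only the given fields are merged into the stored _source) and POSTing to the type without an id, which makes Elasticsearch generate one. A minimal sketch with hypothetical helper names, assuming a node at http://localhost:9200:

```shell
# Hypothetical helpers; assumes a node reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Merge the fields in $4 (a JSON object) into document $3 of index $1, type $2.
es_partial_update() {
  curl -s -XPOST "$ES_URL/$1/$2/$3/_update" -d "{\"doc\":$4}"
}

# Index JSON document $3 into index $1, type $2, letting Elasticsearch pick the id.
es_index_auto_id() {
  curl -s -XPOST "$ES_URL/$1/$2" -d "$3"
}
```

For example, es_partial_update myindex mytype doc1 '{"age":41}' changes only age. Internally the document is still reindexed as a whole, so _version goes up just as with a full PUT.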

Retrieving data (a single document)

$ curl -XGET 'http://localhost:9200/myindex/mytype/doc1'
{
  "_index":"myindex",
  "_type":"mytype",
  "_id":"doc1",
  "_version":4,
  "found":true,
  "_source" : { 
     "name" : "Yamada Taro", 
     "comment": "Hello! Elasticsearch World!", 
     "age":40
  }
}
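If only the document body is needed, the /_source suffix returns just the _source without the metadata, and a HEAD request answers whether a document exists (200 if found, 404 if not). A minimal sketch with hypothetical helper names, assuming a node at http://localhost:9200; curl's -I sends the HEAD request and -w prints the status code.

```shell
# Hypothetical helpers; assumes a node reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch only the stored _source of document $3 in index $1, type $2.
es_get_source() {
  curl -s -XGET "$ES_URL/$1/$2/$3/_source"
}

# Succeed (exit status 0) only if the document exists, i.e. HEAD returns 200.
es_doc_exists() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' -I "$ES_URL/$1/$2/$3")" = "200" ]
}
```

Usage: es_get_source myindex mytype doc1, or es_doc_exists myindex mytype doc1 && echo found.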

Searching

Searches can be scoped to a whole index or to a single type; here the search is limited to mytype.

$ curl -XGET 'http://localhost:9200/myindex/mytype/_search?q=Hello'   # 1 hit
{
  "took":3,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.13424811,
    "hits":[
      {
        "_index":"myindex","_type":"mytype","_id":"doc1","_score":0.13424811, 
        "_source" : { 
          "name" : "Yamada Taro", "comment": "Hello! Elasticsearch World!", "age": 40
        }
      }
    ]
  }
}

$ curl -XGET 'http://localhost:9200/myindex/mytype/_search?q=world'    # 2 hits
{
  "took":3,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":2,
    "max_score":0.13424811,
    "hits":[
      {
        "_index":"myindex","_type":"mytype","_id":"doc1","_score":0.13424811,
        "_source" : { 
          "name" : "Yamada Taro", "comment": "Hello! Elasticsearch World!", "age": 40
        }
      },{
        "_index":"myindex","_type":"mytype","_id":"doc2","_score":0.095891505,
        "_source" : { 
          "name" : "Tanaka Hanako", "comment": "Today is Febyrary 19. I like World Trip!"
        }
      }
    ]
  }
}
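The lowercase q=world matched "World" because the default standard analyzer lowercases terms both at index time and at query time, and a q= without a field name searches the catch-all _all field. The same search can be written in the query DSL as a request body, and the URI form can target a single field with field:value. A minimal sketch with hypothetical helper names, assuming a node at http://localhost:9200:

```shell
# Hypothetical helpers; assumes a node reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Full-text "match" query on field $3 for text $4, in index $1, type $2.
# Note: $4 is inserted into the JSON unescaped, so it must not contain quotes.
es_match() {
  curl -s -XGET "$ES_URL/$1/$2/_search" -d "{\"query\":{\"match\":{\"$3\":\"$4\"}}}"
}

# URI search across every type of index $1; $2 is the q value, e.g. name:yamada.
es_search_index() {
  curl -s -XGET "$ES_URL/$1/_search?q=$2"
}
```

Usage: es_match myindex mytype comment world should find the same two documents as q=world, while es_search_index myindex name:yamada restricts the search to the name field.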

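Checking HQ at this point, an index called myindex has indeed been defined.
(Screenshot: myindex listed in ElasticHQ)
You can also run searches from within HQ. It's very easy to use, and it already feels like a god-tier tool.
(Screenshot: running a search in ElasticHQ)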

For everything else, I'll dig into the reference.
Elasticsearch - Reference [1.x]

That's it for today.

keisyu's blog

A blog where a guy in his unlucky year (yakudoshi), who has barely done any engineering lately, rehabilitates his IT skills.