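Elasticsearch: Creating an Index and Indexing Data (Part 2)
Part 1 covered installing Elasticsearch and its basic usage. In this part we create an index of our own and write some data to it.
Creating an index
Basic syntax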
PUT /my_index
{
  "settings": { ... any settings ... },
  "mappings": {
    "type_one": { ... any mappings ... },
    "type_two": { ... any mappings ... },
    ...
  }
}
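When creating an index, we can configure the number of shards the index is stored in, its analyzers, and its type mappings. The shard count and analyzers are configured under settings, and type mappings under mappings. In Elasticsearch 6.x and earlier, a new index gets 5 primary shards and 1 replica by default (7.0 lowered the default to 1 primary shard); both values can be changed through settings. Here we simply spell out those defaults: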
PUT flight
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "flight"
}
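This creates an index named flight. The request body only restates the default settings, so it can be left out entirely if you do not need to change them. Next, we create a type named dynamic and write a document to it: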
PUT /flight/dynamic/1
{
  "flightNo": "CA1858",
  "flightDate": "2018-12-01",
  "depCode": "SHA",
  "arrCode": "PEK",
  "state": "到达",
  "subState": "",
  "depPlanTime": "2018-12-01 07:45",
  "arrPlanTime": "2018-12-01 10:10",
  "depReadyTime": "2018-12-01 07:45",
  "arrReadyTime": "2018-12-01 09:45",
  "depTime": "2018-12-01 07:54",
  "arrTime": "2018-12-01 07:46",
  "distance": 1076,
  "tailNo": "B2487",
  "depTerm": "T2",
  "arrTerm": "T3",
  "gate": "48",
  "luggage": "32"
}
Once a document is indexed, ES indexes all of its fields by default. You can also leave out the ID and let ES generate one; in that case send the request with POST instead of PUT (POST /flight/dynamic). The auto-generated ID is a URL-safe string, 22 characters long in early releases (for example wM0OSFhDQXGZAWDf0-drSA) and 20 characters in more recent ones. The PUT request above returns:
{
  "_index": "flight",
  "_type": "dynamic",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
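As an aside on the 22-character auto-generated ID mentioned above: that is exactly the length you get when 16 random bytes (a UUID) are encoded as URL-safe Base64 with the padding removed. The sketch below only illustrates where such a length can come from; the function name random_url_safe_id is made up for this example, and it is not Elasticsearch's actual ID generator.

```python
import base64
import uuid


def random_url_safe_id() -> str:
    """Encode 16 random bytes as URL-safe Base64 and drop the '=' padding.

    Hypothetical helper for illustration only, not Elasticsearch's algorithm.
    """
    raw = uuid.uuid4().bytes  # 16 random bytes
    # 16 bytes encode to 24 Base64 characters, the last 2 of which are padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    doc_id = random_url_safe_id()
    print(doc_id, len(doc_id))
```

The URL-safe alphabet uses - and _ instead of + and /, which is why such an ID can appear in a request path like /flight/dynamic/<id> without escaping.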
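Next, let's query the data. By default a search returns the complete _source of every hit. To cut the response down, list the fields you want in the _source parameter of the request: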
GET flight/dynamic/_search
{
  "query": { "match_all": {} },
  "_source": [ "flightNo", "flightDate" ]
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "flight",
        "_type": "dynamic",
        "_id": "1",
        "_score": 1,
        "_source": {
          "flightNo": "CA1858",
          "flightDate": "2018-12-01"
        }
      }
    ]
  }
}
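To make the effect of the _source list in the request above concrete, here is a minimal Python sketch of the idea: the stored document is unchanged, and only the requested fields are copied into the response. The function filter_source is a hypothetical name for this illustration, and it handles exact top-level field names only, which is a simplification of what Elasticsearch supports.

```python
from typing import Any, Dict, Iterable


def filter_source(source: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested top-level fields that exist in the document."""
    wanted = set(fields)
    return {key: value for key, value in source.items() if key in wanted}


# A shortened copy of the document indexed earlier.
stored = {
    "flightNo": "CA1858",
    "flightDate": "2018-12-01",
    "depCode": "SHA",
    "arrCode": "PEK",
}

if __name__ == "__main__":
    print(filter_source(stored, ["flightNo", "flightDate"]))
```

Asking for a field the document does not have simply leaves it out of the result rather than raising an error.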
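Configuring an analyzer
Analyzer overview
An analyzer splits a block of text into individual terms suitable for an inverted index. These terms are what searches are matched against to find documents.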
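An analyzer does three things:
1. Character filtering: tidy up the string before tokenization, for example by stripping HTML or converting & to and.
2. Tokenization: split the string into individual terms, for example on whitespace or punctuation.
3. Token filtering: normalize the terms, for example lowercasing Quick to quick, removing stop words such as a, and, and the, and adding synonyms (such as leap for jump).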
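The three stages listed above can be sketched as a small pipeline in plain Python. This is only an illustration of the order of operations under simplifying assumptions (ASCII letters and digits, a three-word stop list, one synonym pair); the names char_filter, tokenize, token_filter and analyze are invented for the example and are not Elasticsearch APIs.

```python
import html
import re
from typing import Dict, List

STOP_WORDS = {"a", "and", "the"}
SYNONYMS: Dict[str, List[str]] = {"jump": ["leap"]}


def char_filter(text: str) -> str:
    """Stage 1: strip HTML tags, decode entities, spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return text.replace("&", " and ")


def tokenize(text: str) -> List[str]:
    """Stage 2: split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens: List[str]) -> List[str]:
    """Stage 3: lowercase, drop stop words, add synonyms after each term."""
    result: List[str] = []
    for token in tokens:
        term = token.lower()
        if term in STOP_WORDS:
            continue
        result.append(term)
        result.extend(SYNONYMS.get(term, []))
    return result


def analyze(text: str) -> List[str]:
    """Run the three stages in order."""
    return token_filter(tokenize(char_filter(text)))


if __name__ == "__main__":
    print(analyze("<p>The Quick fox &amp; a dog jump</p>"))
```

Note that the and produced from & by the character filter is later removed again by the stop-word filter, which shows that each stage only sees the output of the one before it.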
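Suppose we have the flight number CA1858 and want a search for CA18 to return it. That requires the flight number to be indexed as partial terms. With the default analysis, the following query finds nothing: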
GET flight/dynamic/_search
{
  "query": {
    "match": { "flightNo": "CA18" }
  }
}
We can check how ES analyzes flightNo:
POST flight/_analyze
{
  "field": "flightNo",
  "text": "CA1858"
}

{
  "tokens": [
    {
      "token": "ca1858",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
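ES lowercased the flight number and put it in the inverted index as a single term, ca1858, without splitting it any further. A match query for CA18 is analyzed to ca18, which is a different term, so it finds nothing. Let's define a custom analyzer. Because PUT flight fails when the index already exists, delete it first (DELETE flight) and then recreate it: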
PUT flight
{
  "settings": {
    "analysis": {
      "analyzer": {
        "flightNoAnalyzer": {
          "tokenizer": "flightNoTokenizer"
        }
      },
      "tokenizer": {
        "flightNoTokenizer": {
          "type": "edge_ngram",
          "min_gram": 4,
          "max_gram": 8,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "dynamic": {
      "properties": {
        "flightNo": {
          "type": "text",
          "analyzer": "flightNoAnalyzer"
        }
      }
    }
  }
}
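edge_ngram is a tokenizer built into ES, which also ships with 8 built-in analyzers (see the official documentation for details). Here we define a custom analyzer, flightNoAnalyzer, that uses an edge_ngram tokenizer to emit prefixes of 4 to 8 characters, and map it to the flightNo field. CA1858 is now indexed as CA18, CA185, and CA1858, so once the document has been indexed again, the query for CA18 returns it.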
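The following Python sketch imitates what an edge n-gram tokenizer with min_gram 4 and max_gram 8 does to a flight number, and why the CA18 query then matches. It assumes the input is a single run of letters and digits, and it uses a toy dictionary as the inverted index. The helper names and the second flight number, MU5101, are made up for the example.

```python
from typing import Dict, List, Set


def edge_ngrams(text: str, min_gram: int = 4, max_gram: int = 8) -> List[str]:
    """Return the prefixes of text with length from min_gram to max_gram."""
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every prefix term to the IDs of the documents that produced it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, flight_no in docs.items():
        for term in edge_ngrams(flight_no):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Analyze the query the same way and return documents matching any term."""
    hits: Set[str] = set()
    for term in edge_ngrams(query):
        hits |= index.get(term, set())
    return hits


# Document 1 is the flight from this article; document 2 is hypothetical.
index = build_index({"1": "CA1858", "2": "MU5101"})

if __name__ == "__main__":
    print(edge_ngrams("CA1858"))
    print(sorted(search(index, "CA18")))
    print(sorted(search(index, "CA1")))
```

Two consequences are visible in the sketch. A query shorter than min_gram, such as CA1, produces no terms at all and therefore matches nothing. And because the analyzer defined above has a tokenizer but no lowercase filter, the terms keep their case, so a lowercase query such as ca18 does not match either.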