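Elasticsearch: Creating an Index and Data (Part 2)
Part 1 covered installing Elasticsearch and its basic usage. In this part we create an index of our own and write some data into it.
Creating an index
Basic syntax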
PUT /my_index
{
"settings": { ... any settings ... },
"mappings": {
"type_one": { ... any mappings ... },
"type_two": { ... any mappings ... },
...
}
}
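When creating an index we need to set the number of shards it is stored in, its analyzers, and its type mappings. The shard count and analyzers are configured under settings, and the type mappings under mappings. In the Elasticsearch version used here (before 7.0), a new index gets 5 primary shards and 1 replica by default, and settings can be used to change that. Here we keep the default values. The request is followed by the response ES returns: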
PUT flight
{
"settings" : {
"index" : {
"number_of_shards" : 5,
"number_of_replicas" : 1
}
}
}
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "flight"
}
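This creates the flight index. The body in braces only restates the default settings, so it can be omitted if you do not want to change them. Next, create a type named dynamic and write a document into it: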
PUT /flight/dynamic/1
{
"flightNo":"CA1858",
"flightDate":"2018-12-01",
"depCode":"SHA",
"arrCode":"PEK",
"state": "Arrived",
"subState": "",
"depPlanTime":"2018-12-01 07:45",
"arrPlanTime":"2018-12-01 10:10",
"depReadyTime":"2018-12-01 07:45",
"arrReadyTime":"2018-12-01 09:45",
"depTime":"2018-12-01 07:54",
"arrTime":"2018-12-01 07:46",
"distance":1076,
"tailNo":"B2487",
"depTerm":"T2",
"arrTerm":"T3",
"gate":"48",
"luggage":"32"
}
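After a document is created, ES indexes all of its fields by default. You can also leave out the ID (using POST instead of PUT), in which case ES generates one for you. In early versions the auto-generated ID is 22 characters long, for example wM0OSFhDQXGZAWDf0-drSA; more recent versions produce 20-character IDs. The response to the request above is: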
{
"_index": "flight",
"_type": "dynamic",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
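To see why an ID such as wM0OSFhDQXGZAWDf0-drSA comes out at 22 characters, here is a minimal Python sketch. It assumes the ID is a URL-safe Base64 encoding of a 16-byte UUID with the padding removed; this is an illustration of the length, not Elasticsearch's actual ID generator, and make_auto_id is a hypothetical name.

```python
import base64
import uuid


def make_auto_id() -> str:
    """Return a URL-safe Base64 encoding of a random UUID, without padding."""
    raw = uuid.uuid4().bytes                      # 16 random bytes
    encoded = base64.urlsafe_b64encode(raw)       # 24 bytes, the last two are "=="
    return encoded.rstrip(b"=").decode("ascii")   # drop the padding: 22 characters


doc_id = make_auto_id()
print(doc_id, len(doc_id))
```

Sixteen bytes encode to 24 Base64 characters, two of which are padding, which leaves 22. The URL-safe alphabet explains the "-" in the sample ID.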
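Now let's query the data. By default ES stores the original JSON document in _source and returns it in full with every hit. To return only some of the fields, list them in the _source parameter of the request. The request is followed by its response: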
GET flight/dynamic/_search
{
"query": { "match_all": {}},
"_source": [ "flightNo", "flightDate"]
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "flight",
"_type": "dynamic",
"_id": "1",
"_score": 1,
"_source": {
"flightNo": "CA1858",
"flightDate": "2018-12-01"
}
}
]
}
}
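The effect of the _source field list in the request above can be pictured with a small Python sketch. It only handles plain top-level field names (no wildcards or nested paths), and filter_source is a hypothetical helper, not an Elasticsearch API.

```python
from __future__ import annotations


def filter_source(source: dict, fields: list[str]) -> dict:
    """Keep only the listed top-level fields, similar to a `_source` field list."""
    return {name: source[name] for name in fields if name in source}


document = {
    "flightNo": "CA1858",
    "flightDate": "2018-12-01",
    "depCode": "SHA",
    "arrCode": "PEK",
}

# Mirrors "_source": ["flightNo", "flightDate"] from the search request.
print(filter_source(document, ["flightNo", "flightDate"]))
```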
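Configuring an analyzer
Introduction to analyzers
An analyzer breaks a block of text into individual terms suitable for an inverted index. These terms are what we search on to find the matching documents.
An analyzer performs three main functions:
1. Character filtering: tidy up the string, for example by stripping HTML or converting & to and
2. Tokenizing: split the string into individual terms, for example on whitespace or punctuation
3. Token filtering: for example, lowercase terms such as Quick, remove stop words such as a, and, the, and add synonyms (such as jump and leap)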
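The three stages listed above can be sketched in Python as a simple pipeline. This is a toy model for illustration, not how Elasticsearch implements analysis; the stop-word set, the synonym table, and all function names are assumptions made for the example.

```python
import re

STOP_WORDS = {"a", "and", "the"}      # words dropped by the token filter
SYNONYMS = {"jump": ["leap"]}         # hypothetical synonym table


def char_filter(text):
    """Stage 1: strip HTML tags and spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenize(text):
    """Stage 2: split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens):
    """Stage 3: lowercase, drop stop words, and add synonyms."""
    result = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        result.append(token)
        result.extend(SYNONYMS.get(token, []))
    return result


def analyze(text):
    """Run the three stages in order."""
    return token_filter(tokenize(char_filter(text)))


print(analyze("<p>The Quick fox & dog jump</p>"))
# ['quick', 'fox', 'dog', 'jump', 'leap']
```

Note that the "&" is first rewritten to "and" by the character filter and then removed as a stop word by the token filter, which shows how the stages feed into each other.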
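Suppose we have the flight number CA1858 and want typing CA18 to return the matching result. For that to work, the flight number has to be tokenized into prefixes. With the default mapping, the following query returns no data: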
GET flight/dynamic/_search
{
"query": {
"match": {
"flightNo": "CA18"
}
}
}
We can check how ES analyzes flightNo. The request is followed by its response:
POST flight/_analyze
{
"field": "flightNo",
"text": "CA1858"
}
{
"tokens": [
{
"token": "ca1858",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
}
]
}
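As we can see, ES lowercased the flight number and put it into the inverted index as a single term, without splitting it any further, so a query for CA18 cannot match. Below we define a custom analyzer. The flight index already exists and the analyzer of an existing field cannot be changed, so delete the index first (DELETE flight), recreate it with the request below, and then index the document again.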
PUT flight
{
"settings": {
"analysis": {
"analyzer": {
"flightNoAnalyzer": {
"tokenizer": "flightNoTokenizer"
}
},
"tokenizer": {
"flightNoTokenizer": {
"type": "edge_ngram",
"min_gram": 4,
"max_gram": 8,
"token_chars": ["letter","digit"]
}
}
}
},
"mappings": {
"dynamic": {
"properties": {
"flightNo": {
"type": "text",
"analyzer" : "flightNoAnalyzer"
}
}
}
}
}
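edge_ngram is one of ES's built-in tokenizers. ES also ships with 8 built-in analyzers; see the official documentation for details. Here we define a custom analyzer that uses edge_ngram to emit prefixes of 4 to 8 characters, and map it to the flightNo field. Once the document has been indexed with this mapping, a query for CA18 returns it. Note that this analyzer has no lowercase filter, so matching is case-sensitive: a query for ca18 would not match.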
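To make the difference between the two analyzers concrete, here is a rough Python model of what each one puts into the index for CA1858. It is a sketch under simplifying assumptions (ASCII letters and digits only, a match counted when any query term equals an indexed term); the function names are hypothetical and this is not Elasticsearch code.

```python
import re


def standard_like_terms(text):
    """Roughly the default behaviour seen earlier: one lowercased term per word."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def edge_ngram_terms(text, min_gram=4, max_gram=8):
    """Emit the prefixes of each letter/digit run, min_gram to max_gram characters long."""
    terms = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        longest = min(max_gram, len(word))
        for size in range(min_gram, longest + 1):
            terms.append(word[:size])
    return terms


def matches(query, indexed, analyzer):
    """True when any analyzed query term is among the analyzed indexed terms."""
    indexed_terms = set(analyzer(indexed))
    return any(term in indexed_terms for term in analyzer(query))


print(standard_like_terms("CA1858"))                    # ['ca1858']
print(edge_ngram_terms("CA1858"))                       # ['CA18', 'CA185', 'CA1858']
print(matches("CA18", "CA1858", standard_like_terms))   # False
print(matches("CA18", "CA1858", edge_ngram_terms))      # True
```

In this model a query shorter than min_gram, such as CA1, produces no terms at all and therefore matches nothing, and a lowercase query such as ca18 does not match the uppercase prefixes.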