ElasticSearch total distinct occurrences across whole data

- March 15, 2015

I am new to Elasticsearch (version 2.3.3), and I have data in the following format.

    {
        "title": "doc 1 title",
        "year": "14",
        "month": "06",
        "sentences": [
            {
                "id": 1,
                "text": "lorem ipsum dolor sit amet, consectetur adipiscing elit",
                "class": "introduction",
                "synth": "intr"
            },
            {
                "id": 2,
                "text": "donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.",
                "class": "introduction",
                "synth": "abstr"
            },
            {
                "id": 3,
                "text": "aliquam id tristique diam. suspendisse convallis convallis est ut condimentum.",
                "class": "main_content",
                "synth": "body"
            },
            {
                "id": 4,
                "text": "nunc ornare eros @ pretium faucibus. praesent congue cursus aliquet.",
                "class": "main_content",
                "synth": "body"
            },
            {
                "id": 5,
                "text": "integer pellentesque quam ut nulla dignissim hendrerit.",
                "class": "future_work",
                "synth": "ftr"
            },
            {
                "id": 6,
                "text": "pellentesque faucibus vehicula diam.",
                "class": "bibliography",
                "synth": "bio"
            }
        ]
    }

I have multiple documents like this: doc1, doc2, ..., doc700.

I am trying to write a query that gives me the total number of occurrences of every different "class" across the whole bulk of documents, sorted by year.

So, the outcome would be similar to the following.

    [
        {
            "year": "14",
            "count": [
                { "introduction": 1357 },
                { "main_content": 1021 },
                { "future_work": 490 },
                { "bibliography": 241 }
            ]
        },
        {
            "year": "15",
            "count": [
                { "introduction": 972 },
                { "main_content": 712 },
                { "future_work": 335 },
                { "bibliography": 81 }
            ]
        }
    ]

Is it possible to achieve this with a single request? Or would it be easier to run a separate query for every "class"?

Thank you very much.

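This can be done using a nested aggregation. If your existing mapping does not define "sentences" as a nested field, you can perhaps use the following mapping. Note that an existing field cannot be switched to nested in place, so you would need to create a new index with this mapping and reindex the data.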

    {
        "mappings": {
            "book": {
                "properties": {
                    "title": {
                        "type": "string"
                    },
                    "month": {
                        "type": "string"
                    },
                    "year": {
                        "type": "string"
                    },
                    "sentences": {
                        "type": "nested",
                        "properties": {
                            "synth": {
                                "type": "string"
                            },
                            "id": {
                                "type": "long"
                            },
                            "text": {
                                "type": "string"
                            },
                            "class": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }

Then run the following query:

    {
        "size": 0,
        "aggs": {
            "years": {
                "terms": {
                    "field": "year"
                },
                "aggs": {
                    "sentences": {
                        "nested": {
                            "path": "sentences"
                        },
                        "aggs": {
                            "classes": {
                                "terms": {
                                    "field": "sentences.class"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
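To make concrete what this aggregation counts, here is a rough pure-Python sketch (it does not talk to Elasticsearch). It assumes documents shaped like the one in the question; the names `make_doc`, `count_per_sentence` and `count_per_document` are made up for illustration. The first counter mirrors the nested query by counting every sentence; the second mirrors what a terms aggregation would return if "sentences" were a plain object field, where each parent document is counted at most once per class.

```python
from collections import Counter, defaultdict


def make_doc(year, classes):
    """Build a minimal document with one sentence per class name (hypothetical helper)."""
    return {
        "year": year,
        "sentences": [{"id": i + 1, "class": c} for i, c in enumerate(classes)],
    }


def count_per_sentence(docs):
    """Count every sentence once, as a terms aggregation inside a nested aggregation does."""
    counts = defaultdict(Counter)
    for doc in docs:
        for sentence in doc["sentences"]:
            counts[doc["year"]][sentence["class"]] += 1
    return counts


def count_per_document(docs):
    """Count each parent document at most once per class (non-nested behaviour)."""
    counts = defaultdict(Counter)
    for doc in docs:
        for cls in {s["class"] for s in doc["sentences"]}:
            counts[doc["year"]][cls] += 1
    return counts


# Same class layout as the example document in the question.
layout = [
    "introduction", "introduction",
    "main_content", "main_content",
    "future_work", "bibliography",
]
docs = [make_doc("14", layout), make_doc("14", layout), make_doc("15", layout[1:])]

per_sentence = count_per_sentence(docs)
per_document = count_per_document(docs)
print(dict(per_sentence["14"]))
print(dict(per_document["14"]))
```

With two year "14" documents, the per-sentence count for "introduction" is twice the per-document count, which is why the nested mapping matters for this question.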

And here is the result with some sample data:

    "aggregations": {
        "years": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "14",
                    "doc_count": 2,
                    "sentences": {
                        "doc_count": 12,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "introduction",
                                    "doc_count": 4
                                },
                                {
                                    "key": "main_content",
                                    "doc_count": 4
                                },
                                {
                                    "key": "bibliography",
                                    "doc_count": 2
                                },
                                {
                                    "key": "future_work",
                                    "doc_count": 2
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "15",
                    "doc_count": 1,
                    "sentences": {
                        "doc_count": 5,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "main_content",
                                    "doc_count": 2
                                },
                                {
                                    "key": "bibliography",
                                    "doc_count": 1
                                },
                                {
                                    "key": "future_work",
                                    "doc_count": 1
                                },
                                {
                                    "key": "introduction",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }

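Do not be confused by doc_count here: inside "classes" these are the true occurrences of each "class" within the main documents. The sentences are stored as nested documents tied to the main document, so each sentence is counted as a document of its own.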
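If you want the response in the shape sketched in the question, a small client-side transformation is enough. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict; the function name `reshape` is hypothetical, and the aggregation names ("years", "sentences", "classes") are the ones used in the query above. The `sample` value is a trimmed copy of the response shown above.

```python
def reshape(aggregations):
    """Turn the nested-aggregation result into a list of {"year", "count"} entries."""
    result = []
    for year_bucket in aggregations["years"]["buckets"]:
        # Each class bucket's doc_count is the number of sentences with that class.
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result.append({
            "year": year_bucket["key"],
            "count": [{b["key"]: b["doc_count"]} for b in class_buckets],
        })
    return result


# Trimmed copy of the sample response (only the fields reshape() reads).
sample = {
    "years": {
        "buckets": [
            {
                "key": "14",
                "doc_count": 2,
                "sentences": {
                    "doc_count": 12,
                    "classes": {
                        "buckets": [
                            {"key": "introduction", "doc_count": 4},
                            {"key": "main_content", "doc_count": 4},
                            {"key": "bibliography", "doc_count": 2},
                            {"key": "future_work", "doc_count": 2},
                        ]
                    },
                },
            },
            {
                "key": "15",
                "doc_count": 1,
                "sentences": {
                    "doc_count": 5,
                    "classes": {
                        "buckets": [
                            {"key": "main_content", "doc_count": 2},
                            {"key": "bibliography", "doc_count": 1},
                            {"key": "future_work", "doc_count": 1},
                            {"key": "introduction", "doc_count": 1},
                        ]
                    },
                },
            },
        ]
    }
}

reshaped = reshape(sample)
print(reshaped)
```

The class order within each year follows the order of the buckets in the response, so sort the list yourself if you need a fixed order.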

Hope this helps.
