Elasticsearch: total occurrences of each distinct "class" across the whole data set
I am new to Elasticsearch (version 2.3.3) and have data in the following format.
{
  "title": "doc 1 title",
  "year": "14",
  "month": "06",
  "sentences": [
    { "id": 1, "text": "lorem ipsum dolor sit amet, consectetur adipiscing elit", "class": "introduction", "synth": "intr" },
    { "id": 2, "text": "donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.", "class": "introduction", "synth": "abstr" },
    { "id": 3, "text": "aliquam id tristique diam. suspendisse convallis convallis est ut condimentum.", "class": "main_content", "synth": "body" },
    { "id": 4, "text": "nunc ornare eros @ pretium faucibus. praesent congue cursus aliquet.", "class": "main_content", "synth": "body" },
    { "id": 5, "text": "integer pellentesque quam ut nulla dignissim hendrerit.", "class": "future_work", "synth": "ftr" },
    { "id": 6, "text": "pellentesque faucibus vehicula diam.", "class": "bibliography", "synth": "bio" }
  ]
}
There are multiple such documents: doc1, doc2, ..., doc700.
I am trying to write a query that returns the total number of occurrences of every different "class" across the whole bulk of documents, sorted by year.
So the outcome would be similar to the following.
{
  "year" : "14",
  "count" : [
    { "introduction" : 1357 },
    { "main_content" : 1021 },
    { "future_work" : 490 },
    { "bibliography" : 241 }
  ],
  "year" : "15",
  "count" : [
    { "introduction" : 972 },
    { "main_content" : 712 },
    { "future_work" : 335 },
    { "bibliography" : 81 }
  ]
}
Is it possible to achieve this with a single request? Or would it be easier to query every "class" separately?
Thank you very much.
This can be done using a nested aggregation. If your existing mapping does not define "sentences" as a nested field, you can perhaps use the following:
{
  "mappings": {
    "book": {
      "properties": {
        "title": { "type": "string" },
        "month": { "type": "string" },
        "year": { "type": "string" },
        "sentences": {
          "type": "nested",
          "properties": {
            "synth": { "type": "string" },
            "id": { "type": "long" },
            "text": { "type": "string" },
            "class": { "type": "string" }
          }
        }
      }
    }
  }
}
Then run the following query:
{
  "size": 0,
  "aggs": {
    "years": {
      "terms": { "field": "year" },
      "aggs": {
        "sentences": {
          "nested": { "path": "sentences" },
          "aggs": {
            "classes": {
              "terms": { "field": "sentences.class" }
            }
          }
        }
      }
    }
  }
}
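As a hedged aside, the same request body can be assembled programmatically. The Python sketch below builds it as a plain dict; the function name build_class_count_query and its two parameters are made up for illustration. It also adds two things that are not in the query above: an explicit "size" on both terms aggregations (terms returns only the top 10 buckets by default) and an ascending order on the year buckets, which on Elasticsearch 2.x is spelled "_term". Treat both as assumptions about what you need, not as required parts of the query.

```python
import json


def build_class_count_query(max_years=50, max_classes=50):
    """Build the nested-aggregation request body as a plain dict.

    max_years and max_classes are illustrative bucket limits (assumptions),
    added because a terms aggregation returns only 10 buckets by default.
    """
    return {
        "size": 0,  # no search hits, only aggregations
        "aggs": {
            "years": {
                "terms": {
                    "field": "year",
                    "size": max_years,
                    # Assumption: sort year buckets ascending by key.
                    # "_term" is the 2.x spelling of this ordering.
                    "order": {"_term": "asc"},
                },
                "aggs": {
                    "sentences": {
                        # Step into the nested "sentences" documents.
                        "nested": {"path": "sentences"},
                        "aggs": {
                            "classes": {
                                "terms": {
                                    "field": "sentences.class",
                                    "size": max_classes,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_class_count_query(), indent=2))
```

The printed JSON can be sent to the _search endpoint of your index in the same way as the hand-written query.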
And here is the response for some sample data:
"aggregations": { "years": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "14", "doc_count": 2, "sentences": { "doc_count": 12, "classes": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "introduction", "doc_count": 4 }, { "key": "main_content", "doc_count": 4 }, { "key": "bibliography", "doc_count": 2 }, { "key": "future_work", "doc_count": 2 } ] } } }, { "key": "15", "doc_count": 1, "sentences": { "doc_count": 5, "classes": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "main_content", "doc_count": 2 }, { "key": "bibliography", "doc_count": 1 }, { "key": "future_work", "doc_count": 1 }, { "key": "introduction", "doc_count": 1 } ] } } } ] } }
Do not be confused by doc_count here; these are the true occurrences of each "class" inside the main documents. The sentences are stored as nested documents tied to the main document.
Hope this helps.
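If you want something closer to the per-year layout from the question, the response can be reshaped on the client side. This is only a minimal Python sketch: the function name class_counts_by_year is hypothetical, and the sample dict is a trimmed copy of the "aggregations" part of the response above.

```python
def class_counts_by_year(aggregations):
    """Map each year key to a {class: count} dict.

    Expects the "aggregations" object of the response, with the
    aggregation names used in the query ("years", "sentences", "classes").
    """
    result = {}
    for year_bucket in aggregations["years"]["buckets"]:
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result[year_bucket["key"]] = {
            b["key"]: b["doc_count"] for b in class_buckets
        }
    return result


# Trimmed copy of the sample response shown above.
sample = {
    "years": {
        "buckets": [
            {"key": "14", "doc_count": 2, "sentences": {"doc_count": 12, "classes": {"buckets": [
                {"key": "introduction", "doc_count": 4},
                {"key": "main_content", "doc_count": 4},
                {"key": "bibliography", "doc_count": 2},
                {"key": "future_work", "doc_count": 2},
            ]}}},
            {"key": "15", "doc_count": 1, "sentences": {"doc_count": 5, "classes": {"buckets": [
                {"key": "main_content", "doc_count": 2},
                {"key": "bibliography", "doc_count": 1},
                {"key": "future_work", "doc_count": 1},
                {"key": "introduction", "doc_count": 1},
            ]}}},
        ]
    }
}

if __name__ == "__main__":
    for year, counts in class_counts_by_year(sample).items():
        print(year, counts)
```

For the sample data this gives 4 "introduction" sentences for year "14" and 1 for year "15", matching the nested doc_count values in the response.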