c++ - Memory efficient map<pair<int,int>, set<int>> alternative
I have a huge amount (1500 million) of integer pairs, each one associated with a document id. My goal is to search for documents that have the same pair.
My first idea was to use a map (std::map) with the pair values as keys and the document ids as associated values, i.e. map<pair<int,int>, unordered_set<int>>
For example:
document1
  - pair1: (3, 9)
  - pair2: (5, 13)
document2
  - pair1: (4234, 13)
  - pair2: (5, 13)

map<pair<int,int>, unordered_set<int>> hashmap;
hashmap[{3, 9}].insert(1);
hashmap[{5, 13}].insert(1);
hashmap[{4234, 13}].insert(2);
hashmap[{5, 13}].insert(2);

This would result in:

key(3,9) = documents(1)
key(5,13) = documents(1,2)
key(4234,13) = documents(2)
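
For reference, the example above can be written as a small compilable sketch. The names `PairKey`, `DocIndex`, and `build_example_index` are made up for illustration and are not part of the original question.

```cpp
#include <cassert>
#include <map>
#include <unordered_set>
#include <utility>

// Hypothetical aliases for the types described in the question.
using PairKey = std::pair<int, int>;
using DocIndex = std::map<PairKey, std::unordered_set<int>>;

// Builds the index for the two example documents.
DocIndex build_example_index() {
    DocIndex index;
    index[{3, 9}].insert(1);      // document1, pair1
    index[{5, 13}].insert(1);     // document1, pair2
    index[{4234, 13}].insert(2);  // document2, pair1
    index[{5, 13}].insert(2);     // document2, pair2
    return index;
}
```

Looking up `std::make_pair(5, 13)` in the returned index yields the set containing documents 1 and 2. Each map node and each set bucket carries allocation overhead on top of the raw integers, and that overhead is what pushes the real memory use well above the raw data size.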
My problem is that this takes a huge amount of memory, more than my available 24 GB of RAM. Therefore I need an alternative with good performance for inserts and lookups that can fit into memory. In theory I'm using 1500 million * 3 (pairval1, pairval2, document-id) * 4 (bytes per integer) = 18 GB when overhead costs are not taken into account. Are there any alternatives for this problem?
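
The 18 GB estimate corresponds to storing each (pairval1, pairval2, document-id) triple as exactly 12 bytes with no per-entry overhead. One way to get close to that is a flat array of records that is sorted once and then searched with binary search. The sketch below is only an illustration of that idea, not something proposed in the original thread. `Record`, `finalize`, and `documents_for` are hypothetical names. It assumes all records are loaded before any lookup happens, since inserting into a sorted array afterwards is expensive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// One (pairval1, pairval2, document-id) triple, 12 bytes in total.
struct Record {
    std::int32_t a;
    std::int32_t b;
    std::int32_t doc;
};
static_assert(sizeof(Record) == 12, "three 4-byte integers, no padding");

// Orders records by the pair only; the document id is ignored.
bool pair_less(const Record& x, const Record& y) {
    return std::tie(x.a, x.b) < std::tie(y.a, y.b);
}

// Sorts all records in place by (a, b, doc) once loading is complete.
void finalize(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(),
              [](const Record& x, const Record& y) {
                  return std::tie(x.a, x.b, x.doc) < std::tie(y.a, y.b, y.doc);
              });
}

// Returns the ids of all documents that contain the pair (a, b).
// Requires that finalize() has been called on the vector.
std::vector<std::int32_t> documents_for(const std::vector<Record>& sorted,
                                        std::int32_t a, std::int32_t b) {
    const Record probe{a, b, 0};
    auto range = std::equal_range(sorted.begin(), sorted.end(), probe, pair_less);
    std::vector<std::int32_t> docs;
    for (auto it = range.first; it != range.second; ++it) {
        docs.push_back(it->doc);
    }
    return docs;
}
```

With the four example records loaded and `finalize` called, `documents_for(records, 5, 13)` returns documents 1 and 2. At 1500 million records this layout still needs about 18 GB, so it leaves little headroom within 24 GB of RAM.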
This might be a job for an embedded database such as SQLite, BerkeleyDB, or Tokyo Cabinet.
If the amount of data you're using exceeds your RAM, you need something that can work from disk.