c++ - Memory-efficient map<pair<int,int>, set<int>> alternative


I have a huge number (1,500 million) of integer pairs, each associated with one document-id. The goal is to search for documents that have the same pair.

My first idea was to use a map (std::map, which is a balanced tree rather than a hash map) with the pair values as keys and the associated document-ids as values, i.e. map<pair<int,int>, unordered_set<int>>.

For example:

document1
  - pair1: (3, 9)
  - pair2: (5, 13)

document2
  - pair1: (4234, 13)
  - pair2: (5, 13)

    map<pair<int,int>, unordered_set<int>> hashmap;
    hashmap[{3, 9}].insert(1);
    hashmap[{5, 13}].insert(1);

    hashmap[{4234, 13}].insert(2);
    hashmap[{5, 13}].insert(2);

would result in:

    key(3, 9)     = documents(1)
    key(5, 13)    = documents(1, 2)
    key(4234, 13) = documents(2)

My problem is that this takes a huge amount of memory, exceeding the available 24 GB of RAM. Therefore I need an alternative that still gives good insert and lookup performance and fits into memory. In theory I'm using 1500 million * 3 (pairval1, pairval2, document-id) * 4 (bytes per integer) = 18 GB when overhead costs are not taken into account. Are there any alternatives for this problem?

This might be a job for an embedded database such as SQLite, Berkeley DB or Tokyo Cabinet.

If the amount of data you're using exceeds your RAM, you need something that can work from disk.

