How to read a huge CSV file using mmap in Python?


I want to read a CSV file and perform an operation on it. I wrote a program for this requirement, but I'm not getting output because the file size is large, i.e. ~5 GB.

I'm using simple calls such as open, readline, etc. Meanwhile I explored memory-mapped file support in Python, but I didn't understand how to implement mmap.
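For reference, a minimal sketch of what mmap usage looks like for line-by-line reading (the function name is illustrative, not from the question's code): the whole file is mapped read-only, and `readline()` on the map returns `b""` at end of file.

```python
import mmap

def count_lines_mmap(path):
    """Count lines by scanning a memory-mapped file instead of read()ing it."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ keeps the map read-only.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            count = 0
            # mmap objects support readline(); iter(..., b"") stops at EOF.
            for _ in iter(mm.readline, b""):
                count += 1
            return count
        finally:
            mm.close()
```

Note that mmap avoids some copying between the kernel and Python, but it does not change the algorithmic cost of scanning the file.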

Can I implement reading of a large CSV file using mmap, or is there another way to reduce the runtime of the application?

I'm reading one CSV file and want to perform one task.

Task:

I want to read one CSV file, read the line_id column, and find the unique line_id values. For each unique line_id I want to find the maximum time_gap, i.e. pair each line_id with its corresponding maximum time_gap. After getting the unique line_id values and their maximum time_gap, I want to write these two columns to an output.csv file.
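This task can be done in a single pass over the file: keep a dict mapping each line_id to the largest time_gap seen so far. A sketch under the question's column names (function names are illustrative; it assumes time_gap parses as a number, which also avoids comparing values as strings, where "10" sorts before "2"):

```python
import csv

def max_gap_per_line(rows):
    """One pass over an iterable of row dicts: track the max time_gap per line_id."""
    best = {}
    for row in rows:
        lid = row["line_id"]
        gap = float(row["time_gap"])  # compare numerically, not lexicographically
        if lid not in best or gap > best[lid]:
            best[lid] = gap
    return best

def process(in_path, out_path):
    with open(in_path, newline="") as f:
        best = max_gap_per_line(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["line_id", "time_gap"])
        writer.writeheader()
        for lid, gap in best.items():
            writer.writerow({"line_id": lid, "time_gap": gap})
```

Because only the running maximums are held in memory, this handles a ~5 GB input with memory proportional to the number of unique line_id values, not the file size.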

I created a program for this task, and it works for small input files but not for large ones, i.e. ~2 GB.

My code:

    import csv
    import sys
    import getopt

    def csv_dict_reader(file_obj):
        # First pass: collect every line_id in the file.
        listoflineid = []
        reader = csv.DictReader(file_obj, delimiter=',')
        for line in reader:
            listoflineid.append(line['line_id'])

        set1 = set(listoflineid)
        new_dict = dict()

        # For each unique line_id, re-scan the whole file for its maximum time_gap.
        for se in set1:
            f1 = open("latency.csv")
            readerinput = csv.DictReader(f1, delimiter=',')
            for inpt in readerinput:
                if se == inpt['line_id']:
                    if se in new_dict:
                        # Compare numerically, not as strings.
                        if float(new_dict[se]) < float(inpt['time_gap']):
                            new_dict[se] = inpt['time_gap']
                    else:
                        new_dict[se] = inpt['time_gap']
            f1.close()

        print(new_dict)
        write_dict(new_dict)

    def write_dict(new_dict):
        name_list = ['line_id', 'time_gap']
        f = open('finaloutput.csv', 'w', newline='')
        writer = csv.DictWriter(f, delimiter=',', fieldnames=name_list)
        writer.writeheader()
        for key, value in new_dict.items():
            writer.writerow({'line_id': key, 'time_gap': value})
        f.close()
        print("check finaloutput.csv file...")

    if __name__ == "__main__":
        argv = sys.argv[1:]
        inputfile = ''
        outputfile = ''
        try:
            opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
        except getopt.GetoptError:
            print('test.py -i <inputfile> -o <outputfile>')
            sys.exit(2)
        for opt, arg in opts:
            if opt == '-h':
                print('test.py -i <inputfile> -o <outputfile>')
                sys.exit()
            elif opt in ("-i", "--ifile"):
                inputfile = arg
            elif opt in ("-o", "--ofile"):
                outputfile = arg

        with open(inputfile) as f_obj:
            csv_dict_reader(f_obj)

How can I reduce the execution time of the application?
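If mmap is still wanted in the picture, it can feed `csv.DictReader` directly, since `DictReader` accepts any iterable of text lines. A sketch assuming UTF-8 input (the function name is illustrative):

```python
import csv
import mmap

def dict_rows_mmap(path, encoding="utf-8"):
    """Yield CSV rows as dicts, reading lines from a memory-mapped file."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # readline() returns b"" at EOF; decode each line for DictReader.
            lines = (line.decode(encoding) for line in iter(mm.readline, b""))
            for row in csv.DictReader(lines):
                yield row
        finally:
            mm.close()
```

That said, the dominant cost in the code above is the nested loop, which re-reads the entire file once per unique line_id (O(unique_ids × rows)); replacing it with a single pass that keeps running maximums in a dict will matter far more than switching the I/O mechanism to mmap.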

