How to read a huge CSV file using mmap in Python?


I want to read a CSV file and perform an operation on it. I wrote a program for this requirement, but I'm not getting output because the file size is large, i.e. ~5 GB.

I'm using simple calls such as open, readline, etc. Meanwhile I explored memory-mapped file support in Python, but I didn't understand how to implement mmap.
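For reference, a minimal sketch of what mmap usage looks like for line-by-line reading (the function name is illustrative, not from the question's code): the whole file is mapped read-only, and `readline()` on the map returns `b""` at end of file.

```python
import mmap

def count_lines_mmap(path):
    """Count lines by scanning a memory-mapped file instead of read()ing it."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ keeps the map read-only.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            count = 0
            # mmap objects support readline(); iter(..., b"") stops at EOF.
            for _ in iter(mm.readline, b""):
                count += 1
            return count
        finally:
            mm.close()
```

Note that mmap avoids some copying between the kernel and Python, but it does not change the algorithmic cost of scanning the file.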

Can I implement reading of a large CSV file using mmap, or is there another way to reduce the runtime of the application?

I'm reading one CSV file and want to perform one task.

Task:

I want to read one CSV file, read the line_id column, and find the unique line_id values. For each unique line_id I want to find the maximum time_gap, i.e. pair each line_id with its corresponding maximum time_gap. After getting the unique line_id values and their maximum time_gap, I want to write these two columns to an output.csv file.
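This task can be done in a single pass over the file: keep a dict mapping each line_id to the largest time_gap seen so far. A sketch under the question's column names (function names are illustrative; it assumes time_gap parses as a number, which also avoids comparing values as strings, where "10" sorts before "2"):

```python
import csv

def max_gap_per_line(rows):
    """One pass over an iterable of row dicts: track the max time_gap per line_id."""
    best = {}
    for row in rows:
        lid = row["line_id"]
        gap = float(row["time_gap"])  # compare numerically, not lexicographically
        if lid not in best or gap > best[lid]:
            best[lid] = gap
    return best

def process(in_path, out_path):
    with open(in_path, newline="") as f:
        best = max_gap_per_line(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["line_id", "time_gap"])
        writer.writeheader()
        for lid, gap in best.items():
            writer.writerow({"line_id": lid, "time_gap": gap})
```

Because only the running maximums are held in memory, this handles a ~5 GB input with memory proportional to the number of unique line_id values, not the file size.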

I created a program for this task, and it works for small input files but not for large ones, i.e. ~2 GB.

My code:

    import csv
    import sys
    import getopt

    def csv_dict_reader(file_obj):
        # First pass: collect every line_id in the file.
        listoflineid = []
        reader = csv.DictReader(file_obj, delimiter=',')
        for line in reader:
            listoflineid.append(line['line_id'])

        set1 = set(listoflineid)
        new_dict = dict()

        # For each unique line_id, re-scan the whole file for its maximum time_gap.
        for se in set1:
            f1 = open("latency.csv")
            readerinput = csv.DictReader(f1, delimiter=',')
            for inpt in readerinput:
                if se == inpt['line_id']:
                    if se in new_dict:
                        # Compare numerically, not as strings.
                        if float(new_dict[se]) < float(inpt['time_gap']):
                            new_dict[se] = inpt['time_gap']
                    else:
                        new_dict[se] = inpt['time_gap']
            f1.close()

        print(new_dict)
        write_dict(new_dict)

    def write_dict(new_dict):
        name_list = ['line_id', 'time_gap']
        f = open('finaloutput.csv', 'w', newline='')
        writer = csv.DictWriter(f, delimiter=',', fieldnames=name_list)
        writer.writeheader()
        for key, value in new_dict.items():
            writer.writerow({'line_id': key, 'time_gap': value})
        f.close()
        print("check finaloutput.csv file...")

    if __name__ == "__main__":
        argv = sys.argv[1:]
        inputfile = ''
        outputfile = ''
        try:
            opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
        except getopt.GetoptError:
            print('test.py -i <inputfile> -o <outputfile>')
            sys.exit(2)
        for opt, arg in opts:
            if opt == '-h':
                print('test.py -i <inputfile> -o <outputfile>')
                sys.exit()
            elif opt in ("-i", "--ifile"):
                inputfile = arg
            elif opt in ("-o", "--ofile"):
                outputfile = arg

        with open(inputfile) as f_obj:
            csv_dict_reader(f_obj)

How can I reduce the execution time of the application?
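If mmap is still wanted in the picture, it can feed `csv.DictReader` directly, since `DictReader` accepts any iterable of text lines. A sketch assuming UTF-8 input (the function name is illustrative):

```python
import csv
import mmap

def dict_rows_mmap(path, encoding="utf-8"):
    """Yield CSV rows as dicts, reading lines from a memory-mapped file."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # readline() returns b"" at EOF; decode each line for DictReader.
            lines = (line.decode(encoding) for line in iter(mm.readline, b""))
            for row in csv.DictReader(lines):
                yield row
        finally:
            mm.close()
```

That said, the dominant cost in the code above is the nested loop, which re-reads the entire file once per unique line_id (O(unique_ids × rows)); replacing it with a single pass that keeps running maximums in a dict will matter far more than switching the I/O mechanism to mmap.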

