oledb - Get column names from Jet OLE DB in vb.net


I've written a function that reads CSV files and parametrizes them accordingly. For that I have a function GetTypesSQL that first queries the SQL table for the data types, so the columns can be adjusted before they are later inserted into SQL. The problem is that when I set HDR=Yes in Jet OLE DB, the column names come back as F1, F2, F3. To circumvent this issue I've set HDR=No and written loops instead, but now I get empty strings. What is the problem? Here is my code:

    Private Function GetCSVFile(ByVal file As String, ByVal min As Integer, ByVal max As Integer) As DataTable
        Dim constr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text &
            ";Extended Properties=""text;HDR=No;IMEX=1;FMT=Delimited;CharacterSet=65001"""
        Dim conn As New OleDb.OleDbConnection(constr)
        Dim dt As New DataTable
        Dim da As OleDb.OleDbDataAdapter = Nothing
        Dim getData As DataTable = Nothing

        Try
            Dim cmd As String = "SELECT * FROM [" & _table & ".csv]"
            da = New OleDb.OleDbDataAdapter(cmd, conn)
            da.Fill(min, max, dt)
            getData = New DataTable(_table)

            ' with HDR=No the first row holds the column names
            Dim firstRow As DataRow = dt.Rows(0)
            For i As Integer = 0 To dt.Columns.Count - 1
                Dim columnName As String = firstRow(i).ToString()
                Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
                getData.Columns.Add(newColumn)
            Next

            For i As Integer = 1 To dt.Rows.Count - 1
                Dim row As DataRow = dt.Rows(i)
                Dim newRow As DataRow = getData.NewRow()

                For j As Integer = 0 To getData.Columns.Count - 1
                    If row(j).GetType Is GetType(String) Then
                        Dim colValue As String = row(j).ToString()
                        colValue = ChangeEncoding(colValue)
                        colValue = ParseString(colValue)
                        colValue = ReplaceChars(colValue)
                        newRow(j) = colValue
                    Else
                        newRow(j) = row(j)
                    End If
                Next

                getData.Rows.Add(newRow)
                Application.DoEvents()
            Next
        Catch ex As OleDbException
            MessageBox.Show(ex.Message)
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        Finally
            dt.Dispose()
            da.Dispose()
        End Try

        Return getData
    End Function

And here is how I get the types from SQL; this one doesn't convert everything properly, for instance doubles:

    Private Sub GetTypesSQL()
        If (mListOfTypes Is Nothing) Then
            mListOfTypes = New List(Of Type)()
        End If

        mListOfTypes.Clear()

        Dim dtTabelShema As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)

        Using dtTabelShema
            For Each col As DataColumn In dtTabelShema.Columns
                mListOfTypes.Add(col.DataType)
            Next
        End Using
    End Sub

I think you have made this more complicated than it needs to be. For instance, the schema step creates an empty DataTable and harvests the datatypes from it. Why not use that first table itself rather than creating a new table from the types? The table need not be reconstructed over and over for each batch of rows imported.
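For example, a minimal sketch of that idea, reusing db.GetDataTable and _table from the question (everything else here is assumed for illustration):

    ' sketch: fetch the empty schema table once and clone it, instead of
    ' harvesting the DataTypes into mListOfTypes and rebuilding columns per batch
    Dim schemaTable As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)

    ' Clone() copies the columns and their DataTypes, so the destination
    ' table does not need to be reassembled for every batch of rows
    Dim getData As DataTable = schemaTable.Clone()
    getData.TableName = _table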

Generally, since OleDb will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. You are also redoing what OleDb does and copying the data to a different DataTable. Given that, I'd skip the overhead OleDb imposes and work with the raw data.

This creates the destination table using the CSV column names and the types from the database. If the CSV columns are not in the same order as they are delivered by the SELECT * query, it will fail.


The following uses a class to map CSV columns to DB table columns, so the code does not depend on the CSVs being in the same order (since they may be generated externally). My sample data CSV was intentionally not in the same order:

    Public Class CsvMapItem

        Public Property CsvIndex As Int32
        Public Property ColName As String = ""   ' optional
        Public Property DataType As Type

        Public Sub New(ndx As Int32, csvName As String,
                       dtCols As DataColumnCollection)

            CsvIndex = ndx

            For Each dc As DataColumn In dtCols
                If String.Compare(dc.ColumnName, csvName, True) = 0 Then
                    ColName = dc.ColumnName
                    DataType = dc.DataType
                    Exit For
                End If
            Next

            If String.IsNullOrEmpty(ColName) Then
                Throw New ArgumentException("Cannot find column: " & csvName)
            End If
        End Sub
    End Class

The code that parses the CSV uses CsvHelper, but in this case the TextFieldParser could be used instead, since the code just reads the CSV rows into a string array.
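If the TextFieldParser is preferred, only the reading part changes; a minimal sketch (the mapping and DataRow work in the main code below stay the same, and csvFile is the same path variable used there):

    ' sketch: reading the CSV rows as String() with TextFieldParser
    ' (Imports Microsoft.VisualBasic.FileIO at the top of the file)
    Using parser As New TextFieldParser(csvFile)
        parser.TextFieldType = FieldType.Delimited
        parser.SetDelimiters(",")
        parser.HasFieldsEnclosedInQuotes = True

        Dim csvNames As String() = parser.ReadFields()      ' header row

        Do While Not parser.EndOfData
            Dim data As String() = parser.ReadFields()      ' one data row at a time
            ' ... build and add the DataRow exactly as in the CsvParser loop below
        Loop
    End Using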

    Dim sql = String.Format("SELECT * FROM {0} WHERE ID<0", dbTblName)
    Dim rowCount As Int32 = 0
    Dim totalRows As Int32 = 0
    Dim sw As New Stopwatch
    sw.Start()

    Using dbcon As New MySqlConnection(MySQLConnStr)
        Using cmd As New MySqlCommand(sql, dbcon)

            Dim dtSample = New DataTable
            dbcon.Open()

            ' load the empty dt, create the insert command
            Dim daSample = New MySqlDataAdapter(cmd)
            Dim cb = New MySqlCommandBuilder(daSample)
            daSample.InsertCommand = cb.GetInsertCommand
            dtSample.Load(cmd.ExecuteReader())

            ' dtSample is not empty: it has all the column info
            ' we need

            Dim csvMap As New List(Of CsvMapItem)

            Using sr As New StreamReader(csvFile, False),
                  parser = New CsvParser(sr)

                ' col names from the CSV
                Dim csvNames = parser.Read()
                ' create a map of CSV index to dt ColumnName  (see note)
                For n As Int32 = 0 To csvNames.Length - 1
                    csvMap.Add(New CsvMapItem(n, csvNames(n), dtSample.Columns))
                Next

                ' line data read as string
                Dim data As String()
                data = parser.Read()
                Dim dr As DataRow

                Do Until data Is Nothing OrElse data.Length = 0

                    dr = dtSample.NewRow()

                    For Each item In csvMap
                        ' optional/as-needed type conversion
                        If item.DataType = GetType(Boolean) Then
                            ' "1" won't convert to Boolean, but (int)1 will
                            dr(item.ColName) = Convert.ToInt32(data(item.CsvIndex).Trim)
                        Else
                            dr(item.ColName) = data(item.CsvIndex).Trim
                        End If
                    Next
                    dtSample.Rows.Add(dr)
                    rowCount += 1

                    data = parser.Read()

                    If rowCount = 50000 OrElse (data Is Nothing OrElse data.Length = 0) Then
                        totalRows += daSample.Update(dtSample)
                        ' empty the table in case there are many more rows to come
                        dtSample.Rows.Clear()
                        rowCount = 0
                    End If
                Loop
            End Using

        End Using
    End Using
    sw.Stop()
    Console.WriteLine("Parsed and imported {0} rows in {1}", totalRows,
                      sw.Elapsed.TotalMinutes)

The processing loop updates the DB every 50k rows in case there are many, many rows. It also does this in one pass rather than reading N rows at a time through OleDb. Since CsvParser reads one row at a time, there should never be more than 50,001 rows' worth of data on hand at any time.

There may be special cases to handle in the type conversions, as shown with If item.DataType = GetType(Boolean) Then. A Boolean column read in as "1" can't be passed directly to a Boolean column, but converted to an Integer it can. There could be other conversions, such as for funky dates.
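For example, a date column could get its own branch in the same conversion block. A minimal sketch, assuming a hypothetical dd.MM.yyyy text format (the real format string depends on how the CSV was produced; CultureInfo needs Imports System.Globalization):

    If item.DataType = GetType(DateTime) Then
        ' parse the "funky date" text explicitly rather than relying on implicit conversion
        dr(item.ColName) = DateTime.ParseExact(data(item.CsvIndex).Trim,
                                               "dd.MM.yyyy",
                                               CultureInfo.InvariantCulture)
    ElseIf item.DataType = GetType(Boolean) Then
        dr(item.ColName) = Convert.ToInt32(data(item.CsvIndex).Trim)
    Else
        dr(item.ColName) = data(item.CsvIndex).Trim
    End If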

Time to process 250,001 rows: 3.7 minutes. An app which needs to apply string transforms to every single string column will take longer. I'm pretty sure that using the CsvReader in CsvHelper, you could have those applied as part of parsing to a type.
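A rough illustration of that last point, with the caveats that the CsvReader constructor differs between CsvHelper versions (older ones take only the TextReader), the ImportRow class is invented for the example, and the question's ChangeEncoding/ParseString/ReplaceChars helpers are assumed to be reachable (e.g. made Shared):

    ' sketch: let CsvHelper materialize typed records while property setters
    ' apply the string scrubbing as part of parsing
    ' (Imports CsvHelper and Imports System.Globalization at the top of the file)
    Public Class ImportRow                          ' hypothetical row type
        Private _name As String
        Public Property Name As String
            Get
                Return _name
            End Get
            Set(value As String)
                ' transforms run while the record is being parsed
                _name = ReplaceChars(ParseString(ChangeEncoding(value)))
            End Set
        End Property
        Public Property Amount As Double
    End Class

    ' usage: headers are matched to property names by CsvHelper
    Using sr As New StreamReader(csvFile),
          csv As New CsvReader(sr, CultureInfo.InvariantCulture)
        For Each rec As ImportRow In csv.GetRecords(Of ImportRow)()
            ' rec.Name is already scrubbed here; copy it to a DataRow or insert directly
        Next
    End Using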


There is a potential disaster waiting to happen here, since this is meant to be an all-purpose importer/scrubber.

    For i As Integer = 0 To dt.Columns.Count - 1
        Dim columnName As String = firstRow(i).ToString()
        Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
        getData.Columns.Add(newColumn)
    Next

Both the question and the self-answer build the new table using the column names from the CSV and the datatypes from a SELECT * query on the destination table. So, this assumes the CSV columns are in the same order that SELECT * will return them, and that the CSVs always use the same names as the tables.

The answer above is marginally better in that it finds and matches columns based on name.

A more robust solution would be to write a little utility app with which a user maps a DB column name to a CSV index. Save the results to a List(Of CsvMapItem) and serialize it. There could be a whole collection of these saved to disk. Then, rather than creating the map based on dead reckoning, just deserialize the desired one for use as the csvMap in the above code.
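A minimal sketch of that idea. The CsvMapEntry DTO, the file paths and the XmlSerializer choice are assumptions for illustration; System.Type itself will not XML-serialize, so only the index and column name are persisted and the DataType can be re-read from the destination table after loading:

    ' sketch: persist a user-built column map and load it again later
    ' (Imports System.IO and Imports System.Xml.Serialization at the top of the file)
    Public Class CsvMapEntry                    ' hypothetical, serializable map entry
        Public Property CsvIndex As Int32
        Public Property ColName As String
    End Class

    Public Sub SaveMap(map As List(Of CsvMapEntry), path As String)
        Dim ser As New XmlSerializer(GetType(List(Of CsvMapEntry)))
        Using fs As New FileStream(path, FileMode.Create)
            ser.Serialize(fs, map)
        End Using
    End Sub

    Public Function LoadMap(path As String) As List(Of CsvMapEntry)
        Dim ser As New XmlSerializer(GetType(List(Of CsvMapEntry)))
        Using fs As New FileStream(path, FileMode.Open)
            Return DirectCast(ser.Deserialize(fs), List(Of CsvMapEntry))
        End Using
    End Function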

