C# - Comparing two CSV files by column and value and displaying line numbers of differing values
I'm writing a comparer for two CSV files that have columns, with the corresponding values for each column on every new line. The columns are specified on the first line of the file. Each line after that contains the data for each column.
I'm trying to create a program that can handle files with differing numbers of lines and columns, display the line numbers of the values that differ, and then create a new text file that lists the line number, column name, and value from file 1 and file 2.
The comparison should be done based on some identifier instead of line by line. If any column data specified in the column row is missing, then display the number of columns with missing data.
So, for example:
worker1.csv:
name;age;height;gender;
bob;21;190;male
john;35;182;male
rose;
mary;20;175;female
worker2.csv
name;age;height;gender
bob;21;185;male
john;30;186;male
mary;
output.csv
differences found in mary:
file 2, line number 3, missing 3 values
differences found in bob:
file 1, line number 1, height: 190
file 2, line number 1, height: 185
differences found in john:
file 1, line number 2, age: 35, height: 182
file 2, line number 3, age: 30, height: 186
How should I do this? I tried LINQ's Except on the lines of both files, but how can I get the line numbers?
This is a little more complicated than it first appears. But if you approach it step by step, it's doable.
I'm going to assume that you have enough memory to load one of the files' records into a dictionary. If your files are very large, things get a lot more complicated.
The first thing you want to do is load one of the files into a dictionary, indexed by ID. In my example, I'll assume that the ID is the name. Each record will be stored in a FileLine instance:

    class FileLine
    {
        public int LineNumber;
        public string Name;
        public int Age;
        public int Height;
        public string Gender;
    }
And your dictionary:

    Dictionary<string, FileLine> file1Lines = new Dictionary<string, FileLine>();
Now, read the file into that dictionary:

    int lineNumber = 0;
    foreach (var line in File.ReadLines("worker1.csv"))
    {
        // split the line and assign the fields.
        // End up with name, age, height, and gender variables.
        ++lineNumber;
        var theLine = new FileLine
        {
            LineNumber = lineNumber,
            Name = name,
            Age = age,
            Height = height,
            Gender = gender
        };
        file1Lines.Add(theLine.Name, theLine);
    }
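The "split the line" step is left as a comment above. Here is a minimal sketch of one way to do it, assuming the fields are separated by semicolons with no quoting, and that a trailing `;` (as in the sample header and in `rose;`) should not count as a field. The `LineParser` class and its method names are made up for illustration. It keeps the fields as strings keyed by column name rather than filling a FileLine, so that a short row can be represented without parse errors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class LineParser
{
    // Splits a semicolon-separated line and drops the empty entry
    // produced by a trailing ';' (as in "name;age;height;gender;").
    public static string[] SplitFields(string line)
    {
        var fields = line.Split(';').Select(f => f.Trim()).ToList();
        if (fields.Count > 0 && fields[fields.Count - 1].Length == 0)
            fields.RemoveAt(fields.Count - 1);
        return fields.ToArray();
    }

    // Maps column names to values. Columns the row has no data for
    // are simply left out of the dictionary.
    public static Dictionary<string, string> ToRecord(string[] columns, string line)
    {
        var fields = SplitFields(line);
        var record = new Dictionary<string, string>();
        for (int i = 0; i < columns.Length && i < fields.Length; i++)
            record[columns[i]] = fields[i];
        return record;
    }

    // Number of header columns the row has no value for.
    public static int MissingCount(string[] columns, Dictionary<string, string> record)
    {
        return columns.Count(c => !record.ContainsKey(c));
    }
}
```

With the sample data, the header yields four columns, and the row `rose;` keeps only `name`, so `MissingCount` reports 3 for it.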
Now, you can read the second file, look up each item in the dictionary, and report any differences:

    lineNumber = 0;
    foreach (var line in File.ReadLines("worker2.csv"))
    {
        // split the line and create a FileLine instance.
        // We'll call it line2.
        // Then, see if that line is in the file1Lines dictionary.
        FileLine line1;
        if (!file1Lines.TryGetValue(line2.Name, out line1))
        {
            // the line didn't exist in the first file
        }
        else
        {
            // Now compare individual fields
            if (line2.Age != line1.Age)
            {
                // report that the fields are different
            }
            // Do the same with the other fields
        }
    }
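For the "compare individual fields" step, here is a rough sketch that collects the differing columns and formats a report line in the style of the question's output.csv. It assumes each record is held as a column-name-to-value dictionary of strings, which is my assumption rather than the FileLine fields used above, and `RecordComparer` is a hypothetical name. Columns missing from either record are skipped here, since missing data is reported separately in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class RecordComparer
{
    // Returns the columns whose values are present in both records but differ.
    public static List<string> DifferingColumns(
        IReadOnlyList<string> columns,
        IReadOnlyDictionary<string, string> first,
        IReadOnlyDictionary<string, string> second)
    {
        var result = new List<string>();
        foreach (var column in columns)
        {
            if (first.TryGetValue(column, out var a) &&
                second.TryGetValue(column, out var b) &&
                a != b)
            {
                result.Add(column);
            }
        }
        return result;
    }

    // Builds one report line from the given columns of a record.
    public static string FormatLine(int fileNumber, int lineNumber,
        IEnumerable<string> columns, IReadOnlyDictionary<string, string> record)
    {
        var parts = columns.Select(c => $"{c}: {record[c]}");
        return $"file {fileNumber}, line number {lineNumber}, " + string.Join(", ", parts);
    }
}
```

For the two john rows in the sample, `DifferingColumns` returns age and height, and `FormatLine(1, 2, ...)` on the first record gives `file 1, line number 2, age: 35, height: 182`.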
Now, if you want to keep track of lines that are in the first file but not in the second file, then create a HashSet<string> and, whenever you find a record in the second file, add its name to the hash set. When you're done with the second file, you can compare the hash set with the keys in the dictionary. So if your hash set is called foundRecords, then you'd have:

    var recordsNotFound = file1Lines.Keys.Except(foundRecords);
    foreach (var name in recordsNotFound)
    {
        // look up the item in the dictionary to report that it was not found
    }
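To see the hash set idea in isolation, here is a small sketch that works on names only. `MissingRecords` and `OnlyInFirst` are hypothetical names, and for simplicity it adds every name seen in the second file to the set, which gives the same result from Except as adding only the matched ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class MissingRecords
{
    // Names that are keys in the first file's dictionary
    // but were never seen while reading the second file.
    public static List<string> OnlyInFirst(
        IEnumerable<string> file1Keys, IEnumerable<string> file2Names)
    {
        var foundRecords = new HashSet<string>();
        foreach (var name in file2Names)
            foundRecords.Add(name); // Add quietly ignores duplicates

        return file1Keys.Except(foundRecords).ToList();
    }
}
```

With the sample files, the first file has bob, john, rose, and mary, and the second has bob, john, and mary, so the only name returned is rose.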