C# - Best way to remove duplicates from DataTable depending on column values

- April 15, 2014

I have a DataSet that contains only one table, so you could say I'm working with a DataTable here.

The code you see below works, but I want to have the best and most efficient way to perform this task because I work with a lot of data here.

Basically, the data from the table should later be stored in a database, where the primary key, of course, must be unique.

The primary key of the data I work with is in a column called Computer Name. Each entry also has a date in another column called Date.

I wrote a function that searches for duplicates in the Computer Name column and then compares the dates of these duplicates to delete all but the newest.

The function I wrote looks like this:

private void MergeDuplicate(DataSet importedData)
{
    Dictionary<string, List<DataRow>> systems = new Dictionary<string, List<DataRow>>();
    DataSet importedDataCopy = importedData.Copy();
    importedData.Tables[0].Clear();

    // Group the copied rows by computer name.
    foreach (DataRow dr in importedDataCopy.Tables[0].Rows)
    {
        string systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, new List<DataRow>());
        }
        systems[systemName].Add(dr);
    }

    foreach (KeyValuePair<string, List<DataRow>> entry in systems)
    {
        if (entry.Value.Count > 1)
        {
            int firstDataRowIndex = 0;
            int secondDataRowIndex = 1;

            // Compare the first two rows of the group until only one row is left.
            while (entry.Value.Count > 1)
            {
                DateTime time1 = Validation.ConvertStringIntoDateTime(entry.Value[firstDataRowIndex]["Date"].ToString());
                DateTime time2 = Validation.ConvertStringIntoDateTime(entry.Value[secondDataRowIndex]["Date"].ToString());

                // Delete the older entry so that the newest one survives.
                if (DateTime.Compare(time1, time2) <= 0)
                {
                    entry.Value.RemoveAt(firstDataRowIndex);
                }
                else
                {
                    entry.Value.RemoveAt(secondDataRowIndex);
                }
            }
        }

        // Put the single remaining row of each group back into the original table.
        importedData.Tables[0].ImportRow(entry.Value[0]);
    }
}

My question is: since the code works, what is the best and fastest/most efficient way to perform this task?

I appreciate any answers!

I think this can be done more efficiently. You copy the DataSet once with DataSet importedDataCopy = importedData.Copy(); and then you copy it again into a dictionary and delete the unnecessary data from the dictionary. I would rather remove the unnecessary information in one pass. Something like this:

private void MergeDuplicate(DataSet importedData)
{
    Dictionary<string, DataRow> systems = new Dictionary<string, DataRow>();
    int i = 0;

    while (i < importedData.Tables[0].Rows.Count)
    {
        DataRow dr = importedData.Tables[0].Rows[i];
        string systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, dr);

            // Nothing was removed, so move on to the next row.
            i++;
        }
        else
        {
            // Existing date is the date in the dictionary.
            DateTime existing = Validation.ConvertStringIntoDateTime(systems[systemName]["Date"].ToString());

            // Candidate date is the date of the current DataRow.
            DateTime candidate = Validation.ConvertStringIntoDateTime(dr["Date"].ToString());

            // If the candidate date is greater than the existing date, replace the existing DataRow
            // with the candidate DataRow and delete the existing DataRow from the table.
            if (DateTime.Compare(existing, candidate) < 0)
            {
                importedData.Tables[0].Rows.Remove(systems[systemName]);
                systems[systemName] = dr;
            }
            else
            {
                importedData.Tables[0].Rows.Remove(dr);
            }

            // A row was removed, so the following rows shifted down by one and
            // index i already points at the next unprocessed row. Do not increment.
        }
    }
}
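A variation on the same one-pass idea, as a rough sketch rather than a definitive implementation: walk the rows once, remember the newest row per computer name together with its already parsed date, and only remove the losing rows after the loop, so no index bookkeeping is needed. The class name, the `ParseDate` helper (a hypothetical stand-in for `Validation.ConvertStringIntoDateTime`, which is not shown in the question) and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class DuplicateRemovalSketch
{
    // Hypothetical stand-in for the asker's Validation.ConvertStringIntoDateTime.
    // Assumes the dates can be parsed with the invariant culture.
    private static DateTime ParseDate(object value)
    {
        string text = Convert.ToString(value, CultureInfo.InvariantCulture);
        return DateTime.Parse(text, CultureInfo.InvariantCulture);
    }

    // Keeps only the newest row per key and returns the number of removed rows.
    public static int KeepNewestPerKey(DataTable table, string keyColumn, string dateColumn)
    {
        // Newest row seen so far for each key, with its parsed date cached
        // so that every date string is parsed exactly once.
        var newest = new Dictionary<string, KeyValuePair<DataRow, DateTime>>();
        var toRemove = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string key = row[keyColumn].ToString();
            DateTime date = ParseDate(row[dateColumn]);

            KeyValuePair<DataRow, DateTime> current;
            if (!newest.TryGetValue(key, out current))
            {
                // First row for this key.
                newest[key] = new KeyValuePair<DataRow, DateTime>(row, date);
            }
            else if (date > current.Value)
            {
                // The candidate is newer: the previously kept row loses.
                toRemove.Add(current.Key);
                newest[key] = new KeyValuePair<DataRow, DateTime>(row, date);
            }
            else
            {
                // The candidate is older or equal: it loses.
                toRemove.Add(row);
            }
        }

        // Remove the losing rows only after the enumeration has finished,
        // because the row collection must not change while it is iterated.
        foreach (DataRow row in toRemove)
        {
            table.Rows.Remove(row);
        }
        return toRemove.Count;
    }

    // Small sample table (made-up data) with three entries for PC-01.
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("Systems");
        table.Columns.Add("Computer Name", typeof(string));
        table.Columns.Add("Date", typeof(string));
        table.Rows.Add("PC-01", "2014-01-10");
        table.Rows.Add("PC-02", "2014-02-01");
        table.Rows.Add("PC-01", "2014-03-05");
        table.Rows.Add("PC-01", "2013-12-24");
        return table;
    }
}
```

Calling `DuplicateRemovalSketch.KeepNewestPerKey(importedData.Tables[0], "Computer Name", "Date")` would apply this to the table from the question. With the sample table, two of the three PC-01 rows are removed and the 2014-03-05 entry is kept. The extra list costs some memory, but it trades that for a plain foreach loop and a single parse per row.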
