Creating a table of duplicates from a SAS data set with over 50 variables
I have a large SAS data set (54 variables and over 10 million observations) that I need to load into Teradata. There are duplicates that must come along too, and my machine is not configured for MultiLoad. I want to create a table of the 300,000 duplicates so I can append them to the original load, which did not accept them. The logic I've read in other posts seems to be for tables with only a few variables. Is there a way to create a new table containing each observation that has the same combination of all 54 variables? I'm trying to avoid PROC SORT ... BY logic that lists all 54 variables. The Query Builder method seemed inefficient as well. Thanks.
Using PROC SORT is the way to do it; you just need a nicer key to sort on.
Create some test data:
data have; x = 1; y = 'a'; output; output; x = 2; output; run;
Create a new field that is the equivalent of appending all of the fields in the row together and running the result through the MD5() hashing algorithm. This gives a short field that, for practical purposes, uniquely identifies the combination of values on the row:
data temp; length hash $16; set have; hash = md5(cats(of _all_)); run;
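MD5() returns 16 raw bytes, which is why HASH is declared as $16. Those bytes are not printable, so applying the $HEX32. format is one way to inspect the keys. A small sketch against the test data above (the PROC PRINT step is an illustration added here, not part of the original answer); the two identical rows should show the same 32-digit hex value:

```sas
/* Test data from above: the row (x=1, y='a') appears twice. */
data have;
  x = 1; y = 'a'; output; output;
  x = 2; output;
run;

/* Same hashing step as above: MD5 of all values concatenated by CATS. */
data temp;
  length hash $16;
  set have;
  hash = md5(cats(of _all_));
run;

/* MD5 output is 16 binary bytes; $HEX32. displays it as 32 hex digits. */
proc print data=temp;
  format hash $hex32.;
run;
```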
Now use PROC SORT with the new HASH field as the BY key, and write the duplicate records to a table named WANT:
proc sort data=temp nodupkey dupout=want; by hash; run;
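One caveat on the CATS approach: CATS strips leading and trailing blanks and inserts no delimiter, so rows such as ('ab', 'c') and ('a', 'bc') concatenate to the same string "abc" and therefore get the same hash. If that matters for your data, PROC SORT can also key on every variable without typing 54 names, because the BY statement accepts _ALL_. A sketch against the same test data (the output names NODUPS and DUPS are placeholders). It compares all variables instead of one 16-byte key, so its speed on 10 million rows is worth testing, but the duplicates table comes out without an extra HASH column to drop before appending:

```sas
/* Test data from above: the row (x=1, y='a') appears twice. */
data have;
  x = 1; y = 'a'; output; output;
  x = 2; output;
run;

/* BY _ALL_ sorts on every variable in HAVE, so NODUPKEY keeps the first */
/* copy of each full record in NODUPS and writes extra copies to DUPS.   */
proc sort data=have out=nodups nodupkey dupout=dups;
  by _all_;
run;
```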