Creating a table of duplicates from SAS data set with over 50 variables

- January 15, 2010

I have a large SAS data set (54 variables, 10 million observations) that I need to load into Teradata. There are duplicates that must come along, and my machine is not configured for MultiLoad. I want to create a table of the 300,000 duplicates so that I can append them to the original load, which did not accept them. The logic I've read in other posts seems to work only for tables with a few variables. Is there a way to create a new table from each observation that has the same combination of all 54 variables? I'm trying to avoid PROC SORT ... BY logic that lists all 54 variables. The Query Builder method seemed inefficient as well. Thanks.

Using proc sort is a good way to do it; you just need a nicer way to key off of it.

Create some test data.

data have;
  x = 1;
  y = 'a';
  output;
  output;
  x = 2;
  output;
run;

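Create a new field that is equivalent to appending all of the fields in the row together and then running the result through the md5() hashing algorithm. This gives you a nice short field that, for practical purposes, uniquely identifies the combination of values on the row.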

data temp;
  length hash $16;
  set have;
  hash = md5(cats(of _all_));
run;
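md5() returns 16 bytes of binary data, so the hash can look like garbage when you browse or print the table. If you want a readable key, one option is to store it as 32 hex characters instead. This is only a sketch; the names temp_hex and hash_hex are made up for illustration and are not part of the steps above.

```sas
/* Sketch: same hash as above, stored as printable hex text.        */
/* temp_hex and hash_hex are illustrative names, not from the post. */
data temp_hex;
  length hash_hex $32;
  set have;
  /* $hex32. writes the 16 binary bytes as 32 hex characters */
  hash_hex = put(md5(cats(of _all_)), $hex32.);
run;
```

The hex version sorts and dedupes the same way; it just costs 16 more bytes per row.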

Now use proc sort with our new hash field as the key. Output the duplicate records to a table named 'want':

proc sort data=temp nodupkey dupout=want;
  by hash;
run;
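If the goal is only to avoid typing out all 54 variable names, a different approach that skips the hash is to sort by the special _all_ variable list. This is a sketch of that alternative, not the method described above, and on 10 million rows it may well be slower than sorting on a single 16-byte key. The output names nodups and want_all are made up for illustration.

```sas
/* Sketch of an alternative: key on every variable via _all_.     */
/* nodups and want_all are illustrative names, not from the post. */
proc sort data=have out=nodups nodupkey dupout=want_all;
  by _all_;
run;
```

As with the hash version, the dupout table holds only the extra copies of each row; the first occurrence stays in the main output.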
