csv - How can I compare two columns in two different rows in Python?
I want to go through each line of a CSV file and check whether the first field of line 1 is the same as the first field of the next line, and so on. If it finds a match, I want to ignore the two lines that contain the same field and keep the lines where there is no match.
Here is an example dataset (no_dup.txt):
    ac_gene_id          m_gene_id
    ensgmog00000015632  ensorlg00000010573
    ensgmog00000015632  ensorlg00000010585
    ensgmog00000003747  ensorlg00000006947
    ensgmog00000003748  ensorlg00000004636
Basically, I want to exclude lines 1 and 2, since they contain the same first field (ensgmog00000015632), and keep lines 3 and 4.
Here is the code I have tried, but I couldn't finish it:
    prev = None
    with open("no_dup.txt", 'r') as fh_in:
        for line in fh_in:
            line = line.strip()
            if line.startswith("e"):
                line1 = line.split()
                print("initial gene = " + line1[0])
                if prev is not None or prev != line1[0]:
                    prev = line1[0]
I think a clean way of doing this is to make a map of each entry -> list of lines.
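    entries = {}
    with open('no_dup.txt', 'r') as fh_in:
        for line in fh_in:
            if not line.strip():
                continue  # skip blank lines, split()[0] would fail on them
            entry = line.split()[0]
            if entry in entries:
                entries[entry].append(line)
            else:
                entries[entry] = [line]

    # keep only the entries that were seen on exactly one line
    for matches in entries.values():
        if len(matches) == 1:
            print(matches[0].rstrip())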
You should note that this will not preserve the order of the entries wherever a plain dict is unordered (Python 2, and Python 3 before 3.7).
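If the original order matters, one possible sketch is a two-pass approach: count how often each first field occurs, then walk the lines again in their original order and keep only those whose first field was counted once. This is not the code from the answer above, only a variation on it. The function name `unique_first_field_lines` and the in-memory `sample` list (standing in for the contents of no_dup.txt) are my own assumptions for illustration.

```python
from collections import Counter


def unique_first_field_lines(lines):
    """Return the lines whose first field occurs exactly once, in input order."""
    # Drop blank lines so that split()[0] cannot raise IndexError.
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    # First pass: count how often each first field appears.
    counts = Counter(row.split()[0] for row in rows)
    # Second pass: keep only the rows whose first field was seen once.
    return [row for row in rows if counts[row.split()[0]] == 1]


# Hypothetical stand-in for the contents of no_dup.txt.
sample = [
    "ac_gene_id m_gene_id\n",
    "ensgmog00000015632 ensorlg00000010573\n",
    "ensgmog00000015632 ensorlg00000010585\n",
    "ensgmog00000003747 ensorlg00000006947\n",
    "ensgmog00000003748 ensorlg00000004636\n",
]

for kept in unique_first_field_lines(sample):
    print(kept)
```

To run it against the real file, pass the open file object instead of `sample`, for example `with open("no_dup.txt") as fh_in: result = unique_first_field_lines(fh_in)`. Note that the header line is kept as well, because its first field also occurs only once. Unlike the adjacent-line comparison in the question, this drops repeated first fields even when the repeats are not on consecutive lines.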