Python - Go through a tar archive in memory to extract metadata?
I have several tar archives that I need to extract/read in memory. The problem is that each tar contains many zip archives, and each zip contains unique XML documents.
So the structure of each tar is as follows: tar -> directories -> zips -> xml.
Obviously I could extract a single tar manually, but I have 1000 tar archives of 3 GB each, and each one contains 6000 zip archives. I'm looking for a way to handle the .tar archives in memory and extract the XML data from each zip. Is there a way to do this?
this should doable, since of relevant methods have non-disk-related options.
Lots of loops here, so let's dig in.
For each tar archive:
- Call tarfile.open to open the tar archive. (docs)
- Call .getmembers on the resulting TarFile instance to get a list of the zips (or other files) contained in the archive. (docs)
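The outer loop could look roughly like the sketch below. It assumes the zips can be recognized by a .zip suffix; the glob pattern and the helper names zip_member_names and scan_tars are hypothetical.

```python
import glob
import tarfile


def zip_member_names(tar_path):
    """Return the names of all regular .zip members in one tar archive.

    Only the member headers are inspected here; no member data is written to disk.
    """
    with tarfile.open(tar_path) as tar:  # default mode "r" detects compression transparently
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(".zip")
        ]


def scan_tars(pattern):
    """Map every tar archive matching a glob pattern to its zip member names."""
    return {path: zip_member_names(path) for path in sorted(glob.glob(pattern))}
```

For example, scan_tars("/data/tars/*.tar") (a made-up path) returns a dict from each tar path to the zip names inside it. Iterating over the TarFile directly (for member in tar:) instead of calling getmembers() lets you start on the first zip before every header has been read, which may matter with 6000 members per archive.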
For each zip within the tar archive:
- Once you know which member file (i.e., which one of the zips) you want to go through, call .extractfile on the TarFile instance to get a file object for that zip. (docs)
- Instantiate a new zipfile.ZipFile with that file object in order to open the zip so you can work with it. (docs)
- Call .infolist on the ZipFile instance to get a list of the files it contains (including the XML files). (docs)
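Put together, the steps for the zips might look like this sketch. It assumes the tar is opened in tarfile's default random-access mode, because ZipFile needs to seek inside the zip and the streaming "r|" modes are not designed for random access; the .zip/.xml suffix checks and the name xml_entries are illustrative assumptions.

```python
import tarfile
import zipfile


def xml_entries(tar_path):
    """Yield (zip_member_name, ZipInfo) for every XML file inside every zip in a tar.

    Each zip is opened straight from the tar via extractfile(); it is never
    written to disk.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.lower().endswith(".zip")):
                continue
            # extractfile() gives a file object over the zip's bytes inside the tar
            with tar.extractfile(member) as zip_fileobj, zipfile.ZipFile(zip_fileobj) as zf:
                for info in zf.infolist():
                    if not info.is_dir() and info.filename.lower().endswith(".xml"):
                        yield member.name, info
```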
For each XML file within the zip:
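The steps above stop at this point. One plausible way to finish the innermost loop, which is an assumption rather than part of the original answer, is to open each XML entry with ZipFile.open and parse it with the standard library's xml.etree.ElementTree. The generator name iter_xml_roots and the suffix checks are hypothetical.

```python
import tarfile
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_roots(tar_path):
    """Yield (zip_name, xml_name, root_element) for every XML file in every zip of a tar.

    Everything goes through file objects: nothing is extracted to disk.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar:  # headers are read as the loop advances
            if not (member.isfile() and member.name.lower().endswith(".zip")):
                continue
            with tar.extractfile(member) as zip_fileobj, zipfile.ZipFile(zip_fileobj) as zf:
                for info in zf.infolist():
                    if info.is_dir() or not info.filename.lower().endswith(".xml"):
                        continue
                    # ZipFile.open returns a file-like stream, so ElementTree can read
                    # the entry incrementally instead of needing it as one bytes object
                    with zf.open(info) as xml_file:
                        root = ET.parse(xml_file).getroot()
                    yield member.name, info.filename, root
```

A caller would loop with for zip_name, xml_name, root in iter_xml_roots(path): and pull whatever metadata it needs out of root. ET.parse builds the full tree for each document; for very large XML files, ET.iterparse on the same stream would keep memory lower.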