python - Go through tar archive in memory to extract metadata?


I have several tar archives that I need to extract/read in memory. The problem is that each tar contains many zip archives, and each of those contains unique XML documents.

So the structure of each tar is as follows: tar -> directories -> zips -> xml.

Obviously I could manually extract a single tar, but I have 1000 tar archives of 3 GB each, and each contains around 6000 zip archives. I'm looking for a way to handle the .tar archives in memory and extract the XML data from each zip. Is there a way to do this?

This should be doable, since all of the relevant methods have non-disk-based options.

There are lots of loops here, so let's dig in. A short sketch follows each group of steps below.

For each tar archive:

  • Call tarfile.open to open the tar archive. (docs)
  • Call .getmembers on the resulting TarFile instance to get a list of the zips (or other files) contained in the archive. (docs)
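
A minimal sketch of those two calls, assuming a hypothetical archive name and that your zips can be recognized by a .zip suffix:

    import tarfile

    # "archives.tar" is a hypothetical path standing in for one of your tar files.
    with tarfile.open("archives.tar") as tar:
        # getmembers() returns a TarInfo object for every entry, directories included,
        # so filter down to the regular files that look like zips.
        zip_members = [m for m in tar.getmembers()
                       if m.isfile() and m.name.endswith(".zip")]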

For each zip within the tar archive:

  • Once you know which member file (i.e., one of the zips) you want to go through, call .extractfile on the TarFile instance to get a file object for that zip. (docs)
  • Instantiate a new zipfile.ZipFile with that file object in order to open the zip so you can work with it. (docs)
  • Call .infolist on the ZipFile instance to get a list of the files it contains (including the XML files). (docs)
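
A sketch of that step, continuing from the snippet above (tar is the open TarFile and member is one of the zip entries); reading the member into an io.BytesIO buffer keeps everything in memory and gives ZipFile a seekable object to work with:

    import io
    import zipfile

    # `tar` and `member` come from the previous sketch.
    # extractfile() returns a file-like object without writing anything to disk;
    # copying it into BytesIO gives ZipFile a seekable, in-memory buffer.
    zip_bytes = io.BytesIO(tar.extractfile(member).read())
    with zipfile.ZipFile(zip_bytes) as zf:
        xml_infos = [info for info in zf.infolist()
                     if info.filename.endswith(".xml")]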

For each XML file within the zip:

  • Call .open on the ZipFile instance in order to get a file object for one of the XML files. (docs)
  • You now have a file object corresponding to one of the XML files. Do whatever you want with it: .read it, copy it to disk somewhere, stick it in ElementTree (docs), etc.
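
Putting it all together, here is one way the whole nested loop might look. The archive name and the .zip/.xml filters are assumptions about your layout, and the ElementTree parsing at the end is just a placeholder for whatever metadata extraction you actually need:

    import io
    import tarfile
    import zipfile
    import xml.etree.ElementTree as ET

    # "archives.tar" is a hypothetical path; loop over your 1000 tars the same way.
    with tarfile.open("archives.tar") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".zip")):
                continue
            # Pull the zip out of the tar entirely in memory.
            zip_bytes = io.BytesIO(tar.extractfile(member).read())
            with zipfile.ZipFile(zip_bytes) as zf:
                for info in zf.infolist():
                    if not info.filename.endswith(".xml"):
                        continue
                    with zf.open(info) as xml_file:
                        tree = ET.parse(xml_file)   # or xml_file.read(), copy to disk, etc.
                        root = tree.getroot()
                        # ... pull whatever metadata you need from `root` here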
