python - Extracting specific src attributes from script tags


I want to get the JS file names from the input content that contain jquery as a substring, using re.

This is my code:

Step 1: Extract the JS file names from the content.

>>> data = """    <script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-ui.min.js"/>
...     <script type="text/javascript" src="js/abc_bsub.js"/>
...     <script type="text/javascript" src="js/abc_core.js"/>
...     <script type="text/javascript" src="js/abc_explore.js"/>
...     <script type="text/javascript" src="js/abc_qaa.js"/>"""
>>> import re
>>> re.findall('src="js/([^"]+)"', data)
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js', 'abc_bsub.js', 'abc_core.js', 'abc_explore.js', 'abc_qaa.js']

Step 2: Keep only the JS files whose names contain the substring jquery.

>>> [ii for ii in re.findall('src="js/([^"]+)"', data) if "jquery" in ii]
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js']

Can I do step 2 inside step 1, that is, get this result directly from the re pattern?

Sure you can. One way is to use

re.findall('src="js/([^"]*jquery[^"]*)"', data) 

This matches everything after "js/ up to the nearest ", but only if that text contains jquery somewhere. If you know more about the position of jquery (for example, if it's always at the start), you can adjust the regex accordingly.

If you want to make sure that jquery is not directly surrounded by other alphanumeric characters, use word boundary anchors:
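As a minimal sketch of that adjustment, the snippet below compares the "anywhere" pattern with one that only accepts jquery at the very start of the file name. The sample data is a shortened version of the question's input, and the file abc_jquery_helper.js is a made-up name added purely to show the difference.

```python
import re

# Shortened sample input; abc_jquery_helper.js is a hypothetical file
# added only to illustrate the difference between the two patterns.
data = '''<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_jquery_helper.js"/>
<script type="text/javascript" src="js/abc_core.js"/>'''

# jquery may appear anywhere in the file name.
anywhere = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

# jquery must come directly after "js/", i.e. at the start of the name.
at_start = re.findall(r'src="js/(jquery[^"]*)"', data)

print(anywhere)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js', 'abc_jquery_helper.js']
print(at_start)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js']
```

Dropping the leading [^"]* is all it takes to pin jquery to the start, since nothing is then allowed between js/ and jquery.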

re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data) 
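To see what the word boundaries change, here is a small sketch on made-up file names (myjquery.js, jquery2.js and abc_jquery_helper.js are hypothetical and not from the question). Keep in mind that \b treats digits and the underscore as word characters too, so jquery2 and _jquery_ are rejected along with myjquery.

```python
import re

# Hypothetical file names chosen to exercise the word boundary behaviour.
data = '''<script src="js/jquery-ui.min.js"/>
<script src="js/myjquery.js"/>
<script src="js/jquery2.js"/>
<script src="js/abc_jquery_helper.js"/>
<script src="js/jquery.js"/>'''

# Without boundaries: any name containing the substring jquery.
loose = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

# With boundaries: jquery must not touch a letter, digit or underscore.
strict = re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)

print(loose)   # all five file names
print(strict)  # ['jquery-ui.min.js', 'jquery.js']
```

Note the raw string prefix (r'...'): without it, Python would read \b as a backspace character instead of a word boundary.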
