python - How to read an ASP.NET page with BeautifulSoup?


I am trying to scrape data from a webpage using Beautiful Soup.

I am running into problems when I try to convert the HTML document into a BeautifulSoup object.

When I run this code

soup = BeautifulSoup(html_doc)

the error message I am getting is:

SyntaxError: Non-ASCII character '\xa9' in file c:/users/mlee/pycharmprojects/bstest/htmlparse.py on line 683, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I believe this is because there are ASP.NET ViewState objects in the HTML, which are base64 encoded.

Is there a suggested workaround, or do I have to use a different tool?

Also, I am interested in getting the JavaScript-generated portions of the text. Is there a better way of doing this?

Thank you!

Put the header

#!/usr/bin/env python
# -*- coding: utf-8 -*-

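at the very top of your htmlparse.py file (the encoding declaration must be on the first or second line), and make sure PyCharm saves the file as UTF-8.

This has nothing to do with ASP.NET or ViewState. You have non-ASCII (UTF-8) characters in your source file, and Python 2 refuses to read them unless the file declares its encoding.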
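To see which lines trigger the error and whether a declaration is already present, something like the following may help. It is a minimal diagnostic sketch in Python 3, not part of Beautiful Soup: the helper names `declared_encoding` and `non_ascii_lines` and the `sample` bytes are made up for illustration, and the regular expression is the one given in PEP 263.

```python
import re

# Pattern for a source-encoding declaration, as given in PEP 263.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the encoding declared on line 1 or 2 of the source, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def non_ascii_lines(source: bytes):
    """Return the 1-based numbers of lines that contain non-ASCII bytes."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if any(byte > 127 for byte in line)
    ]


# Hypothetical stand-in for the contents of a script such as htmlparse.py.
sample = (
    b"#!/usr/bin/env python\n"
    b"# -*- coding: utf-8 -*-\n"
    b"html_doc = '\xc2\xa9 example'\n"
)
print(declared_encoding(sample))
print(non_ascii_lines(sample))
```

To check a real script, read it in binary mode (for example `open("htmlparse.py", "rb").read()`) and pass the bytes to both helpers. If the HTML is pasted into the script as a string literal, keeping it in a separate file and reading it at run time avoids the source-encoding question altogether.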

"I am interested in getting the JavaScript-generated portions of the text. Is there a better way of doing this?"

You might want to use Selenium WebDriver with its Python bindings for this task. Another option is PhantomJS.
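A rough sketch of the Selenium route is below. It rests on several assumptions: a recent Selenium release, Chrome plus a matching driver installed locally, and the `bs4` package for parsing. The function names `fetch_rendered_html` and `parse_rendered` are illustrative only. Headless Chrome is used instead of PhantomJS because PhantomJS development has since been suspended.

```python
def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the browser reports that the document has finished loading.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        # Always close the browser, even if loading or waiting failed.
        driver.quit()


def parse_rendered(url):
    """Return a BeautifulSoup tree built from the browser-rendered page."""
    from bs4 import BeautifulSoup

    return BeautifulSoup(fetch_rendered_html(url), "html.parser")
```

Calling `parse_rendered` with the page's URL should return a soup that includes the script-generated markup. Note that `document.readyState` only signals that the initial load has finished. Content fetched later by AJAX may need an explicit wait for a specific element instead.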

