python - How to read an ASP.NET page with BeautifulSoup?
I'm trying to scrape data from a webpage using Beautiful Soup.
I'm running into problems when I try to convert the HTML document into a BeautifulSoup object.
When I run this code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")
the error message I'm getting is:
SyntaxError: Non-ASCII character '\xa9' in file c:/users/mlee/pycharmprojects/bstest/htmlparse.py on line 683, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I believe this is because there are ASP.NET viewstate objects in the HTML that are base64 encoded.
Is there a suggested workaround, or do I have to use a different tool?
Also, I'm interested in getting the JavaScript-generated portions of the text. Is there a better way of doing this?
Thank you!
Put this header at the very top of htmlparse.py (the coding declaration must be on the first or second line):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
and make sure PyCharm saves the file UTF-8 encoded.
This has nothing to do with ASP.NET or the viewstate; you simply have non-ASCII characters in the file.
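A minimal sketch of why the byte '\xa9' shows up and one way to avoid the problem altogether. '\xa9' is most likely the copyright sign ©, which is a single byte in Latin-1/cp1252 but two bytes in UTF-8; that is why the file's real encoding has to match the coding declaration. The sketch also assumes the page was saved to a file (the name page.html and the helper load_html are hypothetical) and reads it at runtime with an explicit encoding, so the HTML never has to be pasted into the .py source as a literal.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import tempfile

# The copyright sign, written as an ASCII-only escape so this source stays ASCII.
COPYRIGHT = u"\u00a9"

# Latin-1/cp1252 encode it as the single byte 0xA9 (the byte in the error message);
# UTF-8 encodes it as the two bytes 0xC2 0xA9.
latin1_bytes = COPYRIGHT.encode("latin-1")  # b'\xa9'
utf8_bytes = COPYRIGHT.encode("utf-8")      # b'\xc2\xa9'


def load_html(path, encoding="utf-8"):
    """Hypothetical helper: read saved HTML as text, decoding it explicitly."""
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Demo: write a tiny page to a temporary file, then read it back as text.
tmp_dir = tempfile.mkdtemp()
page_path = os.path.join(tmp_dir, "page.html")  # hypothetical saved copy of the page
with io.open(page_path, "w", encoding="utf-8") as f:
    f.write(u"<html><body><p>" + COPYRIGHT + u" footer</p></body></html>")

html_doc = load_html(page_path)
shutil.rmtree(tmp_dir)  # clean up the temporary directory
```

html_doc is now a decoded text string and can be passed to BeautifulSoup(html_doc, "html.parser") exactly as in the question.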
Regarding "I'm interested in getting the JavaScript-generated portions of the text. Is there a better way of doing this?":
You might want to use Selenium WebDriver with its Python bindings for this task. Another option is PhantomJS.
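A rough sketch of the Selenium approach, assuming Selenium and a Firefox/geckodriver setup are installed (an assumption; other WebDriver browsers work similarly). The function names fetch_rendered_html and rendered_text are hypothetical. The idea is to let a real browser execute the page's JavaScript, then hand the resulting HTML to BeautifulSoup.

```python
def fetch_rendered_html(url, timeout=10):
    """Hypothetical helper: load url in a browser and return the page source.

    With current browser drivers, page_source generally reflects the DOM after
    the page's scripts have run, not just the raw HTML sent by the server.
    """
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox + geckodriver available
    try:
        driver.get(url)
        # Wait for the initial document to finish loading. Content fetched later
        # via AJAX may still need an explicit wait for a specific element.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def rendered_text(url):
    """Hypothetical helper: parse the rendered page and return its visible text."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(fetch_rendered_html(url), "html.parser")
    return soup.get_text(" ", strip=True)
```

Usage would be something like rendered_text("http://example.com/page.aspx") with your own URL. Keeping the browser step separate from the parsing step means the rest of an existing BeautifulSoup script can stay unchanged.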