python - Figuring out how to web scrape with BeautifulSoup -


i trying scrape data in table "periods" , "percent per annum" (table 4) columns in url:

my code follows, think getting confused how refer row above first date , corresponding number , hence error attributeerror: 'nonetype' object has no attribute 'gettext' in line row_name = row.findnext('td.header_units').gettext().

from bs4 import beautifulsoup import urllib2   url = "http://sdw.ecb.europa.eu/browsetable.do?node=qview&series_key=165.yc.b.u2.eur.4f.g_n_a.sv_c_ym.sr_30y"  content = urllib2.urlopen(url).read() soup = beautifulsoup(content)  desired_table = soup.findall('table')[4]  # find columns want data headers1 = desired_table.findall('td.header_units') headers2 = desired_table.findall('td.header') desired_columns = [] th in headers1: #i'm working `headers1` see if have right idea     desired_columns.append([headers1.index(th), th.gettext()])  # iterate through each row grabbing data desired columns rows = desired_table.findall('tr')  row in rows[1:]:     cells = row.findall('td')     row_name = row.findnext('td.header_units').gettext()     column in desired_columns:         print(cells[column[0]].text.encode('ascii', 'ignore'), row_name.encode('ascii', 'ignore'), column[1].encode('ascii', 'ignore')) 

thank you

this put elements in tuples pairs:

from bs4 import beautifulsoup import requests  r = requests.get(     "http://sdw.ecb.europa.eu/browsetable.do?node=qview&series_key=165.yc.b.u2.eur.4f.g_n_a.sv_c_ym.sr_30y") soup = beautifulsoup(r.content)  data = iter(soup.find("table", {"class": "tablestats"}).find("td", {"class": "header"}).find_all_next("tr"))   headers = (next(data).text, next(data).text) table_items =  [(a.text, b.text) ele in data a, b in [ele.find_all("td")]]  a, b in table_items:     print(u"period={}, percent per annum={}".format(a, b if b.strip() else "null")) 

output:

period=2015-06-09, percent per annum=1.842026 period=2015-06-08, percent per annum=1.741636 period=2015-06-07, percent per annum=null period=2015-06-06, percent per annum=null period=2015-06-05, percent per annum=1.700042 period=2015-06-04, percent per annum=1.667431 

Comments

Popular posts from this blog

javascript - gulp-nodemon - nodejs restart after file change - Error: listen EADDRINUSE events.js:85 -

Fatal Python error: Py_Initialize: unable to load the file system codec. ImportError: No module named 'encodings' -

oracle - Changing start date for system jobs related to automatic statistics collections in 11g -