html - Scrapy XPath construction for tables of data - yielding empty brackets
I am attempting to build XPath expressions for the data items I would like to extract from several hundred pages of a site that are all formatted the same way. An example page is https://weedmaps.com/dispensaries/cannabicare
As can be seen, the site has headings, and within those headings are rows of item names and prices. I am trying to extract the sections, item names, and item prices (whether per gram, eighth, or ounce, or, for edibles, price per unit) and keep them categorized. Example Scrapy item fields are the following:
sativa_item_name = scrapy.Field()
sativa_item_price_gram = scrapy.Field()
sativa_item_price_eighth = scrapy.Field()
sativa_item_price_quarter = scrapy.Field()
edible_item_name = scrapy.Field()
edible_item_price_each = scrapy.Field()
And so on and so forth. I am able to extract the item names and the price per gram with XPaths such as the following:
response.xpath('.//div/span[@class="item_name"]/text()').extract()
response.xpath('//div[@data-price-name="price_gram"]/span/text()').extract()
I can't figure out how to extract the items within the heading containers, for example the price per gram of items in the hybrid category, or the price each and item name in the edible category.
They are separated by ids such as id="menu_item_category_4", but when I try something like:
response.xpath('//div[@id="menu_item_category_4"]/span[@class="item_name"]/text()').extract()
It yields empty brackets and no results. Any guidance on this would be greatly appreciated. Thank you for taking the time to look at this!
What you see in the browser is the page after JavaScript (presumably Angular) has formatted it.
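One way to check this is to list which attributes actually appear in the HTML as downloaded, before any JavaScript runs. The sketch below uses only the standard library's `html.parser`; `raw_html` is a hypothetical stand-in for `response.text`, trimmed from the sample markup in this answer, and `AttributeCollector` is an illustrative helper, not part of Scrapy. Whether the real page source contains `menu_item_category_4` has to be checked against the actual response.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect id and data-category-name values from raw (unrendered) HTML."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.category_names = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "id" in attributes:
            self.ids.append(attributes["id"])
        if "data-category-name" in attributes:
            self.category_names.add(attributes["data-category-name"])


# Hypothetical stand-in for response.text (assumption: trimmed sample markup).
raw_html = """
<div class="menu_item" data-category-id="1" data-category-name="indica" id="menu_item_5390083">
  <div class="col-md-4 name"><span class="item_name">banana og - member pricing</span></div>
</div>
"""

collector = AttributeCollector()
collector.feed(raw_html)

print(collector.ids)                            # ['menu_item_5390083']
print(sorted(collector.category_names))         # ['indica']
print("menu_item_category_4" in collector.ids)  # False for this fragment
```

If an id you are selecting on never shows up in this list for the real response, an XPath built on it will return an empty list no matter how it is written.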
If you run the HTML source through an HTML source beautifier and search for <span class="item_name">,
you'll see a pattern like this, in repeating blocks of:
<div class="menu_item" data-category-id="1" data-category-name="indica" data-json="{}" id="menu_item_5390083" style="position: relative; overflow: visible;">
  <div class="js-edit"><a class="btn" href="/new_admin/dispensaries/cannabicare/menu_items/banana-og-member-pricing/edit"><i class="icon-edit">edit</i></a></div>
  <div class="menu-item-form-container js-form" style="display: none;"></div>
  <div class="menu-item-content js-content">
    <div class="row">
      <div class="col-md-4 name"><span class="item_name">banana og - member pricing</span></div>
      <div class="col-md-8 js-prices prices menu-item-prices">
        <div class="col-sm-2 col-md-2 price-container" data-price-name="price_gram"><span class="price">9 </span><span class="price-label">g</span></div>
        <div class="col-sm-2 col-md-2 price-container" data-price-name="price_eighth"><span class="price">30 </span><span class="price-label">1/8</span></div>
        <div class="col-sm-2 col-md-2 price-container" data-price-name="price_quarter"><span class="price">60 </span><span class="price-label">1/4</span></div>
        <div class="col-sm-2 col-md-2 price-container" data-price-name="price_half_ounce"><span class="price">90 </span><span class="price-label">1/2</span></div>
        <div class="col-sm-2 col-md-2 price-container" data-price-name="price_ounce"><span class="price">165 </span><span class="price-label">oz</span></div>
      </div>
    </div>
    <div class="row item-options" style="display: none;">
      <div class="col-md-3 text"></div>
      <div class="col-md-2 category-id">
        <div class="category-id-select" style="display: none;"></div>
      </div>
      <div class="current-category-id" id="current-category-menu-item-5390083" style="display: none;">1</div>
    </div>
    <div class="row">
      <div class="col-md-12 dispensary_name"><a href="/dispensaries/cannabicare">cannabicare</a></div>
    </div>
    <div style="height:1px"></div>
    <div class="row item_details">
      <div class="col-md-10">75% indica / 25% sativa</div>
    </div>
  </div>
</div>
This is the HTML you'll need to work on.
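Note that in this markup the item_name span sits several levels below the menu_item div. That may be one contributor to the empty result in the question, since a single `/span` step only matches direct children while `//span` matches descendants at any depth. Here is a minimal sketch of the difference. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy selectors (an assumption made to keep the example dependency-free), on a fragment trimmed from the sample above.

```python
import xml.etree.ElementTree as ET

# Fragment trimmed from the sample markup (assumption: well-formed enough for an XML parser).
fragment = """
<div class="menu_item" data-category-name="indica" id="menu_item_5390083">
  <div class="menu-item-content js-content">
    <div class="row">
      <div class="col-md-4 name"><span class="item_name">banana og - member pricing</span></div>
    </div>
  </div>
</div>
""".strip()

root = ET.fromstring(fragment)

# Child step: only spans that are direct children of the outer div.
direct_children = root.findall("./span[@class='item_name']")

# Descendant step: spans at any depth below the outer div.
descendants = root.findall(".//span[@class='item_name']")

print(len(direct_children))                 # 0
print([span.text for span in descendants])  # ['banana og - member pricing']
```

In Scrapy terms, the corresponding change would be using `//span[@class="item_name"]` rather than `/span[@class="item_name"]` after the container step.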
And you can extract the data using something like:
# Python 2 print statements, matching the output shown below.
for category in response.css('div.menu_item'):
    print "--- category:", category.xpath('@data-category-name').extract()
    # The first row inside the content div holds the item name and its prices.
    for row in category.css('div.menu-item-content > div.row:first-child'):
        print row.xpath('string(.//span[@class="item_name"])').extract()
        for price in row.css('div.prices > div.price-container'):
            print "price:", price.xpath('@data-price-name').extract(), price.css('span.price::text').extract()
which outputs:
--- category: [u'indica']
[u'banana og - member pricing']
price: [u'price_gram'] [u'9 ']
price: [u'price_eighth'] [u'30 ']
price: [u'price_quarter'] [u'60 ']
price: [u'price_half_ounce'] [u'90 ']
price: [u'price_ounce'] [u'165 ']
--- category: [u'indica']
[u'purple kush - member pricing']
price: [u'price_gram'] [u'9 ']
price: [u'price_eighth'] [u'30 ']
price: [u'price_quarter'] [u'60 ']
price: [u'price_half_ounce'] [u'90 ']
price: [u'price_ounce'] [u'165 ']
...
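To keep the results categorized the way the question asks, the extracted values could be folded into one record per item, grouped by category. The following is only a sketch: `build_record` and the `extracted` list are hypothetical, the field names follow the question's `<category>_item_<field>` scheme, and it assumes every price text is numeric (`float()` would raise on anything else).

```python
from collections import defaultdict


def build_record(category, name, prices):
    """Build one flat record from the values extracted for a menu item.

    `prices` is a list of (price_name, raw_text) pairs, e.g. ("price_gram", "9 ").
    """
    record = {f"{category}_item_name": name.strip()}
    for price_name, raw_text in prices:
        # Strip the trailing space seen in the output and convert to a number.
        record[f"{category}_item_{price_name}"] = float(raw_text.strip())
    return record


# Values copied from the first item in the output above.
extracted = [
    ("indica", "banana og - member pricing",
     [("price_gram", "9 "), ("price_eighth", "30 "), ("price_quarter", "60 "),
      ("price_half_ounce", "90 "), ("price_ounce", "165 ")]),
]

# Group the records by category name.
menu = defaultdict(list)
for category, name, prices in extracted:
    menu[category].append(build_record(category, name, prices))

print(menu["indica"][0]["indica_item_price_gram"])  # 9.0
```

In a spider, the same record could be yielded from the callback instead of being collected in a dictionary.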