html - Scrapy XPath construction for tables of data - yielding empty brackets


I am attempting to build out XPath constructs for data items that I would like to extract from several hundred pages of a site, all of which are formatted the same way. An example page is https://weedmaps.com/dispensaries/cannabicare

As can be seen, the site has headings, and within those headings are rows of item names and prices. I am trying to extract the sections, item names, and item prices (whether per gram, eighth, or ounce, or, for edibles, the price per unit) and keep them categorized. For example, my Scrapy item fields would be the following:

sativa_item_name = scrapy.Field()
sativa_item_price_gram = scrapy.Field()
sativa_item_price_eighth = scrapy.Field()
sativa_item_price_quarter = scrapy.Field()
edible_item_name = scrapy.Field()
edible_item_price_each = scrapy.Field()

And so on and so forth. I am able to extract all item names and all per-gram prices with XPaths such as the following:

response.xpath('.//div/span[@class="item_name"]/text()').extract()
response.xpath('//div[@data-price-name="price_gram"]/span/text()').extract()

What I can't figure out is how to extract the items within their heading containers, for example the price per gram for items in the hybrid category, or the price each and the item name in the edible category.

They are separated by something such as id="menu_item_category_4", but when I do something like:

response.xpath('//div[@id="menu_item_category_4"]/span[@class="item_name"]/text()').extract() 

It yields empty brackets and no results. Any guidance on this would be beyond appreciated. Thank you for taking the time to look at this!

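The thing is, what you see in your browser is the page after JavaScript has formatted it, presumably with Angular. Scrapy only receives the raw HTML source, so an XPath built from the rendered page can match nothing.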

If you run the HTML source through an HTML source beautifier and search for <span class="item_name">, you'll see a pattern like this, in repeating blocks of

<div class="menu_item" data-category-id="1" data-category-name="indica" data-json="{}" id="menu_item_5390083" style="position: relative; overflow: visible;">
    <div class="js-edit"><a class="btn" href="/new_admin/dispensaries/cannabicare/menu_items/banana-og-member-pricing/edit"><i class="icon-edit">edit</i></a></div>
    <div class="menu-item-form-container js-form" style="display: none;"></div>
    <div class="menu-item-content js-content">
        <div class="row">
            <div class="col-md-4 name"><span class="item_name">banana og - member pricing</span></div>
            <div class="col-md-8 js-prices prices menu-item-prices">
                <div class="col-sm-2 col-md-2 price-container" data-price-name="price_gram"><span class="price">9 </span><span class="price-label">g</span></div>
                <div class="col-sm-2 col-md-2 price-container" data-price-name="price_eighth"><span class="price">30 </span><span class="price-label">1/8</span></div>
                <div class="col-sm-2 col-md-2 price-container" data-price-name="price_quarter"><span class="price">60 </span><span class="price-label">1/4</span></div>
                <div class="col-sm-2 col-md-2 price-container" data-price-name="price_half_ounce"><span class="price">90 </span><span class="price-label">1/2</span></div>
                <div class="col-sm-2 col-md-2 price-container" data-price-name="price_ounce"><span class="price">165 </span><span class="price-label">oz</span></div>
            </div>
        </div>
        <div class="row item-options" style="display: none;">
            <div class="col-md-3 text"></div>
            <div class="col-md-2 category-id">
                <div class="category-id-select" style="display: none;"></div>
            </div>
            <div class="current-category-id" id="current-category-menu-item-5390083" style="display: none;">1</div>
        </div>
        <div class="row">
            <div class="col-md-12 dispensary_name"><a href="/dispensaries/cannabicare">cannabicare</a></div>
        </div>
        <div style="height:1px"></div>
        <div class="row item_details">
            <div class="col-md-10">75% indica / 25% sativa</div>
        </div>
    </div>
</div>

This is the HTML you'll need to work on.

And you can extract the data using something like:

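# Python 2 print statements, matching the u'...' output shown below.
for category in response.css('div.menu_item'):
    # Each menu_item block carries its category in a data attribute.
    print "--- category:", category.xpath('@data-category-name').extract()
    # The first row of the content block holds the item name and its prices.
    for row in category.css('div.menu-item-content > div.row:first-child'):
        print row.xpath('string(.//span[@class="item_name"])').extract()
        # One price-container per unit (gram, eighth, quarter, ...).
        for price in row.css('div.prices > div.price-container'):
            print "price:", price.xpath('@data-price-name').extract(), price.css('span.price::text').extract()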

which outputs:

--- category: [u'indica']
[u'banana og - member pricing']
price: [u'price_gram'] [u'9 ']
price: [u'price_eighth'] [u'30 ']
price: [u'price_quarter'] [u'60 ']
price: [u'price_half_ounce'] [u'90 ']
price: [u'price_ounce'] [u'165 ']
--- category: [u'indica']
[u'purple kush - member pricing']
price: [u'price_gram'] [u'9 ']
price: [u'price_eighth'] [u'30 ']
price: [u'price_quarter'] [u'60 ']
price: [u'price_half_ounce'] [u'90 ']
price: [u'price_ounce'] [u'165 ']
...
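To keep items grouped by category, one option is to key on the data-category-name attribute that each menu_item block carries in the raw HTML. The sketch below is only an illustration. It uses the standard library's xml.etree.ElementTree rather than Scrapy so that it runs on its own, and the SAMPLE markup, the item names "example hybrid" and "example brownie", the price_unit price name, and the parse_menu helper are all assumptions made for the example, not taken from the site.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, trimmed sample in the shape of the raw HTML shown above.
# "example hybrid", "example brownie" and "price_unit" are made up.
SAMPLE = """
<div id="menu">
  <div class="menu_item" data-category-name="indica">
    <span class="item_name">banana og - member pricing</span>
    <div class="price-container" data-price-name="price_gram"><span class="price">9 </span></div>
    <div class="price-container" data-price-name="price_eighth"><span class="price">30 </span></div>
  </div>
  <div class="menu_item" data-category-name="hybrid">
    <span class="item_name">example hybrid</span>
    <div class="price-container" data-price-name="price_gram"><span class="price">10 </span></div>
  </div>
  <div class="menu_item" data-category-name="edible">
    <span class="item_name">example brownie</span>
    <div class="price-container" data-price-name="price_unit"><span class="price">5 </span></div>
  </div>
</div>
"""


def parse_menu(html):
    """Group items by category: {category: [{"name": ..., "prices": {...}}]}."""
    root = ET.fromstring(html)
    menu = defaultdict(list)
    for item in root.findall(".//div[@class='menu_item']"):
        category = item.get("data-category-name")
        name = item.find(".//span[@class='item_name']").text.strip()
        prices = {}
        # Any div with a data-price-name attribute is one price entry.
        for box in item.findall(".//div[@data-price-name]"):
            price_text = box.find("span[@class='price']").text
            prices[box.get("data-price-name")] = float(price_text)
        menu[category].append({"name": name, "prices": prices})
    return dict(menu)


menu = parse_menu(SAMPLE)

# Price per gram for every item in the hybrid category.
hybrid_gram = [(i["name"], i["prices"]["price_gram"]) for i in menu["hybrid"]]
print(hybrid_gram)
```

In a spider, the same attribute tests should carry over to Scrapy selectors, for example something along the lines of response.xpath('//div[@data-category-name="hybrid"]//div[@data-price-name="price_gram"]/span[@class="price"]/text()'), though the exact category and price names need to be checked against the real page source.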
