javascript - Scrapy + phantomjs + selenium raise NotImplementedError

- May 15, 2012

I'm trying to scrape a page with JavaScript elements. I've made this, which has a part to pass the 302 (that I tested, and it works fine):

    import scrapy
    from selenium import webdriver


    class bllanguage(scrapy.Spider):
        handle_httpstatus_list = [302]
        name = "bllanguage"
        allowed_domains = ["http://explore.com/"]
        start_urls = [
            'http://explore.com/this-other-url'
        ]

        def __init__(self):
            driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')

        def start_requests(self):
            for u in self.start_urls:
                r = scrapy.Request(url=u, dont_filter=True, callback=self.parse)
                r.meta['dont_redirect'] = True
                yield r

            def parse(self, response):
                driver.get(response.url)
                print response.url

It gives me this error:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/usr/lib/pymodules/python2.7/scrapy/spiders/__init__.py", line 76, in parse
        raise NotImplementedError
    NotImplementedError

I can't understand what I'm doing wrong. If anyone can give me a hint, that would be great.

Edit: as @paul trmbrth said, I had to put "def parse" at the same indentation as "def start_requests". Now I have this:

    line 30, in parse
        driver.get(response.url)
    NameError: global name 'driver' is not defined

But I defined it in def __init__. Why is that?
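
Both errors look consistent with ordinary Python scoping rules rather than anything specific to Scrapy or Selenium. The following is only a minimal sketch of that idea, not the actual spider: BaseSpider and FakeDriver are hypothetical stand-ins (assumptions, not real Scrapy or Selenium classes) so that it runs without either library. A function defined inside another method never becomes a method of the class, so the base class's parse is the one that gets called; and a name assigned inside __init__ without "self." is local to __init__ and is gone by the time parse runs.

```python
class BaseSpider(object):
    """Hypothetical stand-in for scrapy.Spider: parse() must be overridden."""

    def parse(self, response):
        raise NotImplementedError


class FakeDriver(object):
    """Hypothetical stand-in for a Selenium driver; it only records URLs."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


class NestedParseSpider(BaseSpider):
    def start_requests(self):
        # Defined inside start_requests, so it is just a local function here
        # and never overrides BaseSpider.parse.
        def parse(self, response):
            return response
        return []


class LocalDriverSpider(BaseSpider):
    def __init__(self):
        driver = FakeDriver()  # local name, discarded when __init__ returns

    def parse(self, response):
        driver.get(response)  # no such name in this scope: NameError
        return response


class FixedSpider(BaseSpider):
    def __init__(self):
        self.driver = FakeDriver()  # stored on the instance

    def parse(self, response):  # class level, so it overrides the base parse
        self.driver.get(response)
        return response


def call_parse(spider, url):
    """Return spider.parse(url), or the name of the exception it raised."""
    try:
        return spider.parse(url)
    except (NotImplementedError, NameError) as exc:
        return type(exc).__name__


URL = 'http://explore.com/this-other-url'
nested_result = call_parse(NestedParseSpider(), URL)
local_result = call_parse(LocalDriverSpider(), URL)
fixed_spider = FixedSpider()
fixed_result = call_parse(fixed_spider, URL)

print(nested_result)
print(local_result)
print(fixed_result)
```

Under that reading, the likely change in the spider above would be to assign self.driver in __init__ and call self.driver.get(response.url) in parse, though that has not been tested against the real page here.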
