Proxies in Scrapy

Hammad Rauf
2 min read · Jun 6, 2021
Scrapy framework

Scrapy is a Python framework for scraping websites. Scraping e-commerce sites such as Amazon, Walmart, and eBay is not easy: they block your IP address (HTTP 403 error), and you have to wait a certain amount of time before you regain access.

Most of the time, websites also present a captcha (as shown in the image). Amazon asks you to solve a captcha after it detects bot activity. Scrapy cannot solve captchas, but we can avoid triggering them by changing the IP address after every request or every few requests, a technique called rotating proxies.
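To make the idea of rotating proxies concrete, here is a minimal sketch of a Scrapy downloader middleware that assigns the next proxy from a list to every outgoing request. The class name RotatingProxyMiddleware and the PROXY_LIST setting are hypothetical names chosen for this illustration, and the proxy addresses you put in the list would have to come from your own provider. The only Scrapy behaviour it relies on is that the proxy for a request is read from request.meta["proxy"].

```python
import itertools


class RotatingProxyMiddleware:
    """Sketch of a downloader middleware that cycles through a proxy list."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        # itertools.cycle repeats the list forever, one proxy per call to next().
        self._proxies = itertools.cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, not a built-in Scrapy one.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy picks up the proxy to use from request.meta["proxy"].
        request.meta["proxy"] = next(self._proxies)
        return None  # None lets Scrapy continue handling the request
```

To try it, you would list your proxy URLs under PROXY_LIST in settings.py and register the class in DOWNLOADER_MIDDLEWARES. This version changes the IP on every request; switching every few requests would only need a counter around the next() call.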

Captcha on Amazon

This is one example of a captcha/IP block. There are many more types of captcha, such as those from Google and Cloudflare, and each has its own format.

Scraper API is a service that provides proxies, so each request can go out from a different IP address. While using their service, we do not need to worry about IP blocks (HTTP 403 error).

Scraper API is easy to use. After you log in, you will get your secret API key (APIKEY).

http://api.scraperapi.com?api_key=APIKEY&url=http://amazon.com/

This is the format for using Scraper API: the request URL carries your secret key (api_key) and the target link (url).

yield Request('http://api.scraperapi.com?api_key=APIKEY&url=http://amazon.com/', callback=self.amazon_page)

This is how you can use Scraper API in Scrapy: replace APIKEY with the secret key you get from this link, and yield the request from your spider.
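Writing the URL by hand works for a simple target like http://amazon.com/, but a target link that has its own query string (for example a search page with &page=2) would have those parameters mistaken for parameters of the API call unless the link is URL-encoded. The small helper below is a sketch of one way to build the request URL safely. The function name scraperapi_url is made up for this example, and it assumes only the api_key and url parameters shown in the format above.

```python
from urllib.parse import urlencode

# Endpoint as given in the URL format above.
SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com"


def scraperapi_url(api_key, target_url):
    """Build the proxied URL for target_url (hypothetical helper)."""
    # urlencode percent-encodes the target link, so any '&' or '?' inside it
    # stays part of the url parameter instead of splitting the query string.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


print(scraperapi_url("APIKEY", "http://amazon.com/"))
```

Inside a spider, the same request as before would then read: yield Request(scraperapi_url('APIKEY', 'http://amazon.com/'), callback=self.amazon_page).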

scraper api

On sign-up you get some free requests, so you can test your script/spider.

Disclaimer:

This blog contains an affiliate link; each purchase gives the author a percentage of the amount, which supports future blogs and content.
