Omniferum Posted April 27, 2011 I'm having a problem scraping http://www.gonvisor.com; it keeps giving me a 403 error. Anybody got any ideas how to resolve this? Previous issues point to a header error of some sort. I tried analyzing the HTTP traffic while loading the page but got nothing of any substance.
shawn Posted April 27, 2011 403 is "forbidden". Typically this means that your user-agent isn't allowed on their site. Trying with the following forged IE header works fine (for me): Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727) It's also possible that your IP has been banned due to abuse, or for exceeding a certain number of hits over a period (common rules for tools like APF/BFD).
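For anyone wanting to test this outside Ketarin, a minimal sketch in Python of sending the forged IE header shawn mentions (the user-agent string is copied from his post; the snippet only builds the request and does not fetch the page, since the site may no longer respond):

```python
import urllib.request

# Forge the IE user-agent from shawn's post so the server doesn't
# reject the request with 403 Forbidden.
UA = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; "
      ".NET CLR 1.1.4322; .NET CLR 2.0.50727)")

req = urllib.request.Request(
    "http://www.gonvisor.com/",
    headers={"User-Agent": UA},
)
# urllib.request.urlopen(req) would perform the actual fetch;
# here we just confirm the header is attached.
print(req.get_header("User-agent"))
```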
Omniferum Posted April 27, 2011 (Author) Whoops, I thought that referrer field was only for the download URL, not for everything. I was messing around with httpx://&header:accept stuff. Thanks shawn
CybTekSol Posted May 28, 2011 I'm finding it necessary to add a 'user agent' entry more often these days... your thoughts on this, Shawn?
shawn Posted May 28, 2011 Sadly, it's very common for servers to verify a valid connection now, especially if the data is being hosted in a cloud setup. I have the following custom variables set up in my Ketarin to help get around these issues:

ie32 = Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)
ie64 = Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; SLCC1; Tablet PC 2.0; .NET4.0C)
firefox = Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
opera = Opera/9.80 (Windows NT 6.1; U; en) Presto/2.6.30 Version/10.62
chrome = Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3
wget = wget/1.9+cvs-stable+(red+hat+modified)
curl = pycurl/7.18.2

If you have UA header issues, start trying to fix it with curl and wget, and if they don't work, use ie32, ie64 and the others. Usually it'll work by the time you get to ie32.
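The "try them in order" strategy above can be sketched in Python like this. The user-agent strings are copied from the thread (only the first three are shown for brevity); the fetch_with_fallback helper and its opener parameter are my own illustration for testing, not part of Ketarin:

```python
import urllib.error
import urllib.request

# User-agent strings from shawn's post, in the order he suggests
# trying them (curl/wget first, then browser-like agents).
USER_AGENTS = {
    "curl": "pycurl/7.18.2",
    "wget": "wget/1.9+cvs-stable+(red+hat+modified)",
    "ie32": ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; "
             "Trident/4.0; SLCC2; .NET CLR 2.0.50727; "
             ".NET CLR 3.5.30729; .NET CLR 3.0.30729; "
             "Media Center PC 6.0; .NET4.0C)"),
}

def fetch_with_fallback(url, agents=("curl", "wget", "ie32"),
                        opener=urllib.request.urlopen):
    """Retry a URL with increasingly browser-like user-agents.

    Moves on to the next agent only when the server answers 403;
    any other HTTP error is re-raised unchanged.
    """
    for name in agents:
        req = urllib.request.Request(
            url, headers={"User-Agent": USER_AGENTS[name]})
        try:
            with opener(req) as resp:
                return name, resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 403:
                raise
    raise RuntimeError("every user-agent was rejected with 403")
```

The opener parameter exists so the fallback logic can be exercised against a stub server instead of a live site.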
CybTekSol Posted May 31, 2011 WGet as 'user agent' has been very effective for me overall, so far.
shawn Posted May 31, 2011 Me too -- as long as the site is actually intended to distribute files. If it's a "mom and pop" or a very small business, it'll likely fail completely.