Ketarin not decoding data in HTML

Hi - I'm not sure if this is a bug in Ketarin, .NET, a fault of the web dev of the site, or a bit of none/all of the above.

On the download page of the encryption software VeraCrypt, the link to the download has an encoded "+" character in the HTML, showing up as "&#43;". Firefox decodes this automagically and the link works, but Ketarin requests the URL as-is.

The offending line in the page is as follows:

<a href="https://launchpad.net/veracrypt/trunk/1.21/&#43;download/VeraCrypt%20Setup%201.21.exe">

Requesting this via Ketarin gets a 404, the same with cURL. I assume the web server is seeing the "#" and assuming it's part of a document anchor. Or something else weird. Either way I'm not sure where the fault lies but I'm erring towards the site being "less incorrect" :-)
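
The "#" guess can be checked with a quick sketch. This uses Python's standard urllib.parse purely as an illustration; it is not what Ketarin or cURL run internally, but it shows how a generic URL parser splits the raw href. Since clients do not send the fragment to the server, the request path most likely ends at the "&", which would explain the 404.

```python
from urllib.parse import urlsplit

# The href exactly as it appears in the page source (entity not decoded).
raw_href = (
    "https://launchpad.net/veracrypt/trunk/1.21/"
    "&#43;download/VeraCrypt%20Setup%201.21.exe"
)

parts = urlsplit(raw_href)

# Everything after the first "#" is treated as the fragment,
# so the path that would be requested stops at the "&".
print("path:    ", parts.path)
print("fragment:", parts.fragment)
```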

For the moment I've just split the URL variable into two parts but figured I'd report this in case it is a bug in Ketarin.

It's bad form within the URI specification, but it's an edge case, so browsers will usually allow it anyway. This is "HTML-encoded" (uses an ampersand escape), not "URL-encoded" (uses a percent escape). URIs are supposed to be encoded with URL-encoding.

In situations like these, I would recommend you pre-process the URL by performing a replacement operation on it, i.e. swapping "&#43;" for "+" before the request is made.
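
To illustrate the idea outside of Ketarin (whose own function syntax is not reproduced here), a minimal Python sketch of the same replacement might look like this. The function name fix_plus_entity is made up for the example.

```python
def fix_plus_entity(url: str) -> str:
    """Replace the HTML-encoded plus sign with a literal '+'."""
    # Hypothetical helper: only handles the one entity seen on this page.
    return url.replace("&#43;", "+")


raw_href = (
    "https://launchpad.net/veracrypt/trunk/1.21/"
    "&#43;download/VeraCrypt%20Setup%201.21.exe"
)
fixed = fix_plus_entity(raw_href)
print(fixed)
```

The "%20" is left alone on purpose, since that part is already proper URL-encoding.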


Alternatively, you could pass it to multireplace to swap out a series of broken encodings like this (or any other string selections you wanted to replace).
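
As a rough sketch of the multi-replacement approach, again in Python rather than Ketarin's own syntax: the pair list below is an assumed example, not a list taken from any site. Python's standard html.unescape is shown next to it as a more general way to decode every entity in one go.

```python
import html

# Assumed example pairs: (broken encoding, intended character).
REPLACEMENTS = [
    ("&#43;", "+"),
    ("&#45;", "-"),
    ("&amp;", "&"),
]


def multi_replace(text: str, pairs) -> str:
    """Apply each (old, new) replacement in order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


raw_href = (
    "https://launchpad.net/veracrypt/trunk/1.21/"
    "&#43;download/VeraCrypt%20Setup%201.21.exe"
)

by_table = multi_replace(raw_href, REPLACEMENTS)
by_unescape = html.unescape(raw_href)  # decodes all HTML entities at once
print(by_table)
print(by_unescape)
```

The table gives you control over exactly which strings get swapped, while html.unescape saves you from maintaining the list if the page uses many different entities.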

