Omniferum Posted December 22, 2010

I've been trying to further refine my regex:

[^\r]+((=|"|'|http).*\.exe)

That's the furthest I've gotten. It works in the sense of 'preference', i.e. it will find " or ' or http first and then go all the way to the extension. The problem is that it doesn't leave out the " or ', and if I try adding a negated character class like [^"'] it stuffs the whole thing up. I want something that works like this, but annoyingly it doesn't:

[^\r]+(([^"=']|http).*\.exe)

Any pointers? If I can get this down I'll have a regex that finds the download URL for ALL my apps. I even tried working from the end. I'll admit MOST of my time went on not knowing that Ketarin uses singleline, case-insensitive regex, so that made a lot of time go bye-bye. Then again, I didn't even know that regex came in flavors, so balls to me.
floele Posted December 22, 2010

I added a note regarding these options to the wiki.
shawn Posted December 22, 2010

I think what you're after is:

[='"]([^'"\s]*\.exe)[\s'"]
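A minimal sketch of how shawn's pattern behaves, written in Python rather than in Ketarin itself. It assumes that `re.IGNORECASE | re.DOTALL` roughly approximates the case-insensitive, singleline options mentioned earlier in the thread; the sample HTML and the `find_exe_urls` helper are made up for illustration. The point is that the delimiters sit outside the capture group, so the group holds only the URL.

```python
import re

# Assumption: Ketarin's "singleline" and case-insensitive options are
# approximated here with re.DOTALL and re.IGNORECASE.
PATTERN = re.compile(r"""[='"]([^'"\s]*\.exe)[\s'"]""", re.IGNORECASE | re.DOTALL)


def find_exe_urls(html):
    """Return the capture group of every match, without the delimiters."""
    return PATTERN.findall(html)


# Hypothetical sample covering double-quoted, single-quoted and unquoted links.
sample = (
    '<a href="http://example.com/files/setup.exe">Quoted</a>\n'
    "<a href='/downloads/Tool.EXE'>Single quoted</a>\n"
    "<a href=http://example.com/app.exe >Unquoted</a>"
)

print(find_exe_urls(sample))
```

Note that the pattern needs a delimiter on both sides, so a bare URL sitting at the very start or end of the text would not be picked up.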
Omniferum Posted December 22, 2010 (Author)

Hm, I just realized that perhaps I am a silly bunny. My original regex of [^"']+ worked fine; however, all the problem URLs I have with the http problem come after the = in a href=". I think I remember omitting it because "header information may be in downloads" but forgot. So [^"'=]+64\.exe works fine. Finally found my bloody answer.
Omniferum Posted December 22, 2010 (Author)

Hm, well that fixed a few. Some others don't actually start with = " ' and just start with http. I keep trying to find a way to work back from the file extension, i.e. find \.exe, then match the very first http as a preference; if that isn't found, go through the " ' = group or the very last / without a space after/before it.
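One way to picture the "prefer http, otherwise fall back to a delimiter" idea is as two passes. The sketch below is Python, not a single Ketarin regex, and both patterns and the `find_exe_link` helper are illustrative assumptions rather than anything proposed in the thread. It also skips the "very last /" fallback.

```python
import re

# Assumption: mimic Ketarin's case-insensitive, singleline matching.
FLAGS = re.IGNORECASE | re.DOTALL

# Preferred: an absolute URL with no whitespace, quotes or angle brackets
# between "http" and ".exe".
HTTP_URL = re.compile(r"""https?://[^\s"'<>]*\.exe""", FLAGS)

# Fallback: whatever follows a = " or ' delimiter, up to ".exe".
DELIMITED = re.compile(r"""[="']([^\s"'=<>]+\.exe)""", FLAGS)


def find_exe_link(text):
    """Return the first http(s) .exe link, else the first delimited one."""
    match = HTTP_URL.search(text)
    if match:
        return match.group(0)
    match = DELIMITED.search(text)
    return match.group(1) if match else None


print(find_exe_link('<a href="http://example.com/dl/tool.exe">x</a>'))
print(find_exe_link("<a href='files/tool.exe'>x</a>"))
print(find_exe_link("Get it at http://example.com/a.exe now"))
```

Because the http pattern forbids whitespace and quotes inside the URL, it effectively anchors on the http closest to the extension instead of dragging in earlier text.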
shawn Posted December 23, 2010

You're going to be very hard-pressed to find one pattern that works in every situation. Generally, you can find one that'll work across most apps on a site, but even then you'll often need to include a filepart inside (like "x86" or "x64"), so there's no silver bullet.
Omniferum Posted December 23, 2010 (Author)

I'm not trying to get it to fit file extensions or specific builds. I just want it to be able to find the extension and pick the first http or " or ' it finds working its way back from \.exe.

[^"'=]+[^"']+\.exe is pretty much the BEST I've come across.
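For reference, a small Python sketch of what that last pattern matches, again assuming `re.IGNORECASE | re.DOTALL` as a stand-in for Ketarin's options. The inputs and the `first_match` helper are hypothetical. It suggests the pattern is clean when a quote or = precedes the URL, but picks up leading text when nothing delimits it.

```python
import re

# Assumption: mimic Ketarin's case-insensitive, singleline matching.
PATTERN = re.compile(r"""[^"'=]+[^"']+\.exe""", re.IGNORECASE | re.DOTALL)


def first_match(text):
    """Return the first full match of the pattern, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


# Quoted link: the match starts right after the opening quote.
print(first_match('<a href="http://example.com/a.exe">Download</a>'))

# Bare link in running text: nothing stops the negated classes,
# so the words before the URL are included in the match.
print(first_match("Get http://example.com/a.exe here"))
```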