vertigo, posted April 13, 2021

I'm trying to figure out exactly how regex parsing works when creating a variable: specifically, why some parts of a match are included and some aren't, and how to grab multiple parts into one variable, if that's even possible.

Here's the example I'm working with:

blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext

And here's one of the many regex variations I've tried:

(?<=AIMP for Windows.*?)v([\d\.]+[\d]*)[\D]+([\d]+)<

I'm fairly familiar with regex, though not as good with it as I'd like to be, and I've read the mini-tutorial by appyface, but I'm still unclear on some things.

First, how does Ketarin determine which parts of the found string to actually use (highlighted in red: 4.70 in this example) and which parts not to use (highlighted in blue: v and , build 2248< in this example)? I would think I'd have to put those in a look-ahead and a look-behind, but leaving them as part of the regex match doesn't cause them to be included in the variable.

Second, is there any way to use the 4.70, skip the ", build ", and then concatenate the 2248 (preferably with a . between it and the first part)? Or is the only way to do this to set one part to one variable and the other part to another, then set the version variable to var1+var2 (assuming that's even possible in Ketarin)?
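For comparison, outside Ketarin a general-purpose regex API exposes every capturing group of a single match, so combining the two parts is plain string formatting. The Python sketch below is a minimal illustration under two assumptions: it uses the sample text from the post, and it drops the look-behind (?<=AIMP for Windows.*?), since Python's re module only accepts fixed-width look-behinds and the sample text doesn't contain that phrase anyway. It does not show how Ketarin itself handles groups.

```python
import re
from typing import Optional

# Sample page text from the post above.
PAGE = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"

# The poster's pattern minus the look-behind (assumption: Python's re
# rejects variable-width look-behinds like (?<=AIMP for Windows.*?)).
PATTERN = re.compile(r"v([\d\.]+[\d]*)[\D]+([\d]+)<")


def version_with_build(text: str) -> Optional[str]:
    """Return '<version>.<build>' built from the two capturing groups."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # group(1) holds "4.70" and group(2) holds "2248"; ", build " is
    # matched by [\D]+ outside both groups, so it is simply left out.
    return f"{m.group(1)}.{m.group(2)}"


print(version_with_build(PAGE))  # 4.70.2248
```

Because ", build " lies between the two groups rather than inside either one, it never reaches the output.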
Ambimind, posted April 14, 2021

I understand your ultimate goal to be: retrieve AIMP's current version number, including the build number, in the format [versionNumber].[build]

Here are two possible solutions using Ketarin.

Solution A retrieves the HTTP headers for the file, then extracts the version and build info from the file name, all within a single variable.

Solution B uses two dependent variables: first the version+build string is extracted from the webpage into versionstr, then the ", build " text is removed using a replace function in version (note that version is set to "Textual content").
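A rough Python analogue of both solutions, as a sketch only: the actual Ketarin settings were shown in screenshots that aren't reproduced here. For Solution A, the redirect target is a hypothetical Location header value (the example.com URL and the aimp_4.70.2248.exe file name are invented for illustration; the real file name format may differ). For Solution B, the replacement text is assumed to be "." so that the result matches the [versionNumber].[build] format.

```python
import re
from typing import Optional

PAGE = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"

# Hypothetical redirect target, as a Location header might report it.
LOCATION = "https://example.com/download/aimp_4.70.2248.exe"


def solution_a(location: str) -> Optional[str]:
    """Sketch of A: one capturing group grabs version and build at once,
    which only works because they are adjacent in the (assumed) file name."""
    m = re.search(r"aimp_(\d+(?:\.\d+)+)\.exe", location)
    return m.group(1) if m else None


def solution_b(page: str) -> Optional[str]:
    """Sketch of B: two dependent steps, mirroring versionstr and version."""
    # Step 1 ("versionstr"): capture "4.70, build 2248" from the page.
    m = re.search(r"v(\d[\d.]*, build \d+)<", page)
    if m is None:
        return None
    versionstr = m.group(1)
    # Step 2 ("version"): replace ", build " with "." (assumed replacement).
    return versionstr.replace(", build ", ".")


print(solution_a(LOCATION))  # 4.70.2248
print(solution_b(PAGE))      # 4.70.2248
```

The trade-off mirrors the thread: A needs only one pattern but depends on the file name layout, while B needs a second step but works from the visible page text.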
vertigo, posted April 14, 2021

Interesting. A seems cleaner; B seems easier. I didn't even think of approaching it that way, pulling the full string and then removing the part I don't want. Still, if that's possible, I wonder whether it could be done the way I was thinking, by taking each part individually and combining them, albeit with more complexity and an additional variable.

I'm not sure how to accomplish A, though, and would appreciate a brief explanation. I'd also still like clarification on how Ketarin decides which part of a match is actually used, e.g. in my examples, why the v isn't included even when it's after, not in, the look-behind.
Ambimind, posted April 14, 2021

On 4/14/2021 at 1:35 PM, vertigo said:
I'm not sure how to accomplish A, though, and would appreciate if you could give a brief explanation.

The URL shown in solution A is taken from AIMP's Windows download page. Although it contains no obvious reference to an executable, when it is followed, their web server always redirects to the latest version of AIMP's exe. When a Ketarin variable is given a URL that does not point to a file with searchable textual content, it returns the file's HTTP headers instead; solution A uses this feature to determine the latest version+build number.

On 4/14/2021 at 1:35 PM, vertigo said:
And I'd also still like clarification on how it's decided what part of a match is actually used

When a regular expression uses "capturing groups" (e.g. G1 and G2), only the match of the first capturing group (G1), highlighted in red, is returned. This is true even when another valid capturing group (G2) is matched (Ketarin does not mark it). In this scenario, the blue-highlighted text is the rest of the text matched by the regex, which is not returned. When a regex has no capturing groups, the blue-highlighted text is the match that will be returned as the variable's value.

Note: I've deliberately shortened the original regex to improve readability.
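The selection rule described above can be modeled in a few lines of Python. ketarin_style_value is a hypothetical helper that mimics the described behavior, not Ketarin's actual code: it returns group 1 if the pattern has capturing groups, otherwise the whole match. The last example also bears on the earlier look-behind question: a look-behind is checked but not consumed, so it keeps the v out of the whole match even when there are no groups.

```python
import re
from typing import Optional

PAGE = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"


def ketarin_style_value(pattern: str, text: str) -> Optional[str]:
    """Hypothetical model of the rule described above: return group 1 if
    the pattern has capturing groups, otherwise the whole match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.re.groups counts the capturing groups in the compiled pattern.
    return m.group(1) if m.re.groups >= 1 else m.group(0)


# Two groups: only group 1 comes back, although group 2 ("2248") matched too.
print(ketarin_style_value(r"v([\d\.]+[\d]*)[\D]+([\d]+)<", PAGE))  # 4.70
# No groups: the whole match is returned, so the v is included.
print(ketarin_style_value(r"v[\d.]+", PAGE))  # v4.70
# No groups, but v sits in a look-behind, which is checked, not consumed.
print(ketarin_style_value(r"(?<=v)[\d.]+", PAGE))  # 4.70
```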
vertigo, posted April 26, 2021

Thanks a lot for the help! That's a good trick to know about the header. When you say capturing groups, I assume you mean the parts in parentheses, so in the example

(?<=AIMP for Windows.*?)v([\d\.]+[\d]*)[\D]+([\d]+)<

the first capturing group is ([\d\.]+[\d]*) and the second is ([\d]+), but only the first, as you said, is actually returned? Too bad there's no way to return multiple groups.
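That reading of which parentheses form capturing groups matches standard regex semantics: the look-behind (?<=...) is parenthesized but does not capture, so the pattern has exactly two groups, and both of them do capture. A quick Python check follows, using the fixed-width look-behind (?<=AIMP ) as a stand-in, since Python's re rejects the variable-width original. The (?:...) syntax shown last is the non-capturing group in Python's re and many other flavors; whether Ketarin treats it the same way is an assumption not tested here.

```python
import re

PAGE = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"

# (?<=AIMP ) is a fixed-width stand-in for the original look-behind.
pattern = re.compile(r"(?<=AIMP )v([\d\.]+[\d]*)[\D]+([\d]+)<")
print(pattern.groups)  # 2: the look-behind's parentheses don't count

m = pattern.search(PAGE)
print(m.groups())  # ('4.70', '2248'): both groups do capture

# (?:...) groups without capturing, so the build becomes group 1.
nc = re.compile(r"v(?:[\d\.]+[\d]*)[\D]+([\d]+)<")
print(nc.groups)  # 1
print(nc.search(PAGE).group(1))  # 2248
```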