Ketarin forum

Regex help


vertigo

I'm trying to figure out how exactly the regex parsing works when creating a variable, specifically why some parts are included and some aren't and how to grab multiple parts into one variable, if that's even possible. Here's the example I'm working with:

Quote

blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext

 

And here's one of many variations of regex I've tried:

Quote

(?<=AIMP for Windows.*?)v([\d\.]+[\d]*)[\D]+([\d]+)<

 

I'm fairly familiar with regex, though not as good with it as I'd like to be, and I've read the mini-tutorial by appyface, but I'm still unclear on some things. First, how does it determine what parts of the found string to actually use (red highlighted, 4.70 in this example) and which parts to not use (blue highlighted, v and , build 2248< in this example)? I would think I would have to put those in a look-ahead and look-behind, but leaving them to be part of the regex match doesn't cause them to be included in the variable. Second, is there any way to have the 4.70 used, skip the ", build " and then concatenate the 2248 (preferably with a . between it and the first part)? Or is the only way to do this to set one to one variable, the other to another, then set the version variable to var1+var2 (assuming that's even possible in Ketarin)?


I understand your ultimate goal to be: retrieve AIMP's current version number, including the build number, in the format [versionNumber].[build]

Below I present two possible solutions using Ketarin.

Solution A retrieves the HTTP headers for the file, then extracts the version and build info from the file name, all within a single variable.

Solution B uses two dependent variables: first the version+build string is extracted from the webpage into versionstr, then the ", build " text is removed using a replace function in version (note that version's contents are set to "Textual content").

[Screenshots: KsTJKOo.png, kgchtnM.png]
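For readers who want to try the two-step idea of solution B outside Ketarin, here is a minimal Python sketch. It is not Ketarin's own syntax. The regex is an assumption (the exact pattern from the screenshot is not reproduced here), and replacing ", build " with "." is assumed from the stated target format [versionNumber].[build].

```python
import re

# Sample page text from the original post.
page = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"

# Step 1 (plays the role of the "versionstr" variable):
# capture the whole version+build string. The pattern is an assumption.
match = re.search(r"AIMP v([\d.]+, build \d+)<", page)
versionstr = match.group(1) if match else ""

# Step 2 (plays the role of the "version" variable):
# swap the ", build " text for a dot (assumed from the target format).
version = versionstr.replace(", build ", ".")

print(versionstr)  # 4.70, build 2248
print(version)     # 4.70.2248
```

The second step only post-processes text that the first step already captured, which is why the two variables have to be evaluated in that order.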


Interesting. A seems cleaner, B seems easier. I didn't even think of working it that way, pulling the full string then removing the part I don't want. Though if that's possible, I do wonder if it could be done the way I was thinking, by taking each part individually and combining them, albeit requiring more complexity and an additional variable. I'm not sure how to accomplish A, though, and would appreciate if you could give a brief explanation.

And I'd also still like clarification on how it's decided what part of a match is actually used, e.g. in my examples, why the v isn't included even when after, not in, the look-behind.


On 4/14/2021 at 1:35 PM, vertigo said:

I'm not sure how to accomplish A, though, and would appreciate if you could give a brief explanation.

The URL shown in solution A is taken from AIMP's Windows download page. Although it contains no obvious reference to an executable, the web server always redirects it to the latest version of AIMP's exe.
When a Ketarin variable is given a URL that does not point to a file with searchable text content, it returns the file's HTTP headers instead. Solution A uses this feature to determine the latest version and build number.
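To illustrate the idea behind solution A, here is a small Python sketch that pulls a dotted version out of a file name found in header text. The header block and the file name `aimp_4.70.2248.exe` are invented for illustration and are not AIMP's real response. The helper `version_from_headers` is likewise hypothetical.

```python
import re

# Hypothetical response headers. The file name is made up for this example
# and is not taken from AIMP's server.
headers = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/octet-stream\r\n"
    'Content-Disposition: attachment; filename="aimp_4.70.2248.exe"\r\n'
)

def version_from_headers(text):
    """Return the dotted number that directly precedes '.exe', or None."""
    m = re.search(r"(\d+(?:\.\d+)+)\.exe", text)
    return m.group(1) if m else None

print(version_from_headers(headers))  # 4.70.2248
```

Anchoring the pattern on `.exe` keeps it from picking up other dotted numbers in the headers, such as the `1.1` in the status line.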

On 4/14/2021 at 1:35 PM, vertigo said:

And I'd also still like clarification on how it's decided what part of a match is actually used

When a regular expression contains "capturing groups" (e.g. G1 and G2), only the text matched by the first capturing group (G1), highlighted in red, is returned.
This is true even when another valid capturing group (G2) also matches; Ketarin does not mark that match.
In this scenario, blue highlighting indicates the rest of the text matched by the regex, which is not returned.

When capturing groups are absent from a regex, blue backed text indicates the match that will be returned as the variable value.

Note: I've deliberately shortened the original regex to improve readability.

[Screenshot: kq98QEC.png]
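The rule described in this post (first capturing group if there is one, otherwise the whole match) can be tried out in Python. This is only a sketch of the described behavior, not Ketarin's code, and `ketarin_like_value` is a hypothetical helper. The look-behind from the original regex is left out because Python's `re` module rejects variable-length look-behind, which the .NET regex engine accepts.

```python
import re

text = "blahblahblahtext<h1>AIMP v4.70, build 2248</h1>blahblahblahmoretext"

def ketarin_like_value(pattern, text):
    """Mimic the described rule: group 1 if the regex has groups, else the whole match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups >= 1 else m.group(0)

# Shortened regex with two capturing groups (look-behind omitted).
with_groups = r"v([\d.]+)\D+(\d+)<"
m = re.search(with_groups, text)
print(m.group(0))  # v4.70, build 2248<  (whole match: red plus blue text)
print(m.group(1))  # 4.70                (first group: the red text)
print(m.group(2))  # 2248                (second group: matched but not returned)

print(ketarin_like_value(with_groups, text))        # 4.70
print(ketarin_like_value(r"v[\d.]+\D+\d+<", text))  # v4.70, build 2248<
```

This also shows why the leading `v` is dropped even though it sits outside the look-behind: it is part of the whole match but not part of group 1.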


  • 2 weeks later...

Thanks a lot for the help! That's a good trick to know about the header.

When you say capturing groups, I assume you mean the parts in parentheses, so in the example

(?<=AIMP for Windows.*?)v([\d\.]+[\d]*)[\D]+([\d]+)<

the first capturing group is ([\d\.]+[\d]*) and the second is ([\d]+), but only the first, as you said, is actually captured? Too bad there's not a way to do multiples.
