Yuri-Tech, posted March 19, 2018:

I could use some help, please, as I don't have much experience with .NET regex or JavaScript/HTML5.

I'm trying to get the SUMo version from the changelog URL (and not the download URL):

{ChangelogURL}: http://www.kcsoftwares.com/bugs/changelog_page.php?project_id=11

{VersionFromChangeLogURL}, the regex I use successfully: `(?<=changelog_page\.php\?project_id\=11\"\>.*?SUMo.*?changelog_page\.php\?version_id\=\d+\"\>).*?([\d\.]+).*?(?=<\/)`

I wonder if there is a safer, better-performing way to scrape the version number. Using Chromium's "Inspect" tool (F12 / Ctrl+Shift+I), I can see a reference to this page: http://www.kcsoftwares.com/bugs/changelog_page.php?version_id=954, which would give the regex less input to search. However, I can't figure out how to scrape the `id` value, which currently equals `954`.

Maybe there's also a way to grab the version number from POST data or JSON data?

I hope someone can help me work through this example so I'll be able to do it myself next time. Thank you.
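For anyone testing this pattern outside the .NET-based tool: Python's `re` module only allows fixed-width lookbehind, so the variable-length lookbehind above can't be used as-is there. The sketch below swaps the lookbehind for capture groups and returns both the `version_id` and the version text. It assumes markup shaped like the hypothetical fragment shown (not copied from the real site); the names `SAMPLE_CHANGELOG`, `CHANGELOG_RE` and `latest_sumo_version`, and the version `5.6.3`, are made up for illustration.

```python
import re

# Hypothetical markup, loosely modelled on the changelog page; the real HTML may differ.
# The version "5.6.3" is invented; only the id 954 comes from the post above.
SAMPLE_CHANGELOG = (
    '<h4><a href="changelog_page.php?project_id=11">SUMo</a> - '
    '<a href="changelog_page.php?version_id=954">5.6.3</a></h4>'
)

# Python's re rejects variable-length lookbehind, so capture groups replace it.
CHANGELOG_RE = re.compile(
    r'changelog_page\.php\?project_id=11">'      # the SUMo project link
    r'.*?SUMo.*?'                                  # product name, then anything up to...
    r'changelog_page\.php\?version_id=(\d+)">'   # ...the version link; group 1 = version_id
    r'[^<]*?(\d+(?:\.\d+){1,3})'                  # group 2 = first dotted number in the link text
)


def latest_sumo_version(html):
    """Return (version_id, version) for the first SUMo entry, or None if nothing matches."""
    m = CHANGELOG_RE.search(html)
    return (m.group(1), m.group(2)) if m else None


print(latest_sumo_version(SAMPLE_CHANGELOG))  # ('954', '5.6.3')
```

If it matches, group 1 could be appended to `http://www.kcsoftwares.com/bugs/changelog_page.php?version_id=` to build the smaller per-version URL. Returning `None` instead of raising makes a layout change on the site easy to detect.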
MAPJe71, posted March 20, 2018:

Try the following regex for {VersionFromChangeLogURL}: `(?<=<h4(?:[^<]|<(?!h4))*(?-i:SUMo)(?:[^<]|<(?!h4))*)\d+(?:\.\d+){1,3}`
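One way to read the pattern above is to spell it out piece by piece. The sketch below is a Python port, not the original .NET pattern: because Python's `re` rejects variable-length lookbehind, the lookbehind becomes a consuming prefix and the version comes from capture group 1. The markup and version numbers in `sample` are hypothetical, `sumo_version_from_headings` is an illustrative name, and the scoped flag `(?-i:...)` needs Python 3.6 or later.

```python
import re

# Python port of the lookbehind pattern: the prefix is consumed instead of looked behind,
# and the version is returned from capture group 1.
H4_SUMO_RE = re.compile(
    r"""
    <h4                   # an <h4 opening tag
    (?:[^<]|<(?!h4))*?    # any text that does not open another <h4
    (?-i:SUMo)            # product name, case-sensitive despite IGNORECASE
    (?:[^<]|<(?!h4))*?    # more text, still before the next <h4
    (\d+(?:\.\d+){1,3})   # version: 2 to 4 dot-separated numbers
    """,
    re.IGNORECASE | re.VERBOSE,
)


def sumo_version_from_headings(html):
    """Return the first version number after 'SUMo' inside an <h4> section, or None."""
    m = H4_SUMO_RE.search(html)
    return m.group(1) if m else None


# Hypothetical markup and version numbers, for illustration only.
sample = '<h4>Other 1.2.3</h4><h4><a href="#">SUMo</a> - 5.6.3</h4>'
print(sumo_version_from_headings(sample))  # 5.6.3
```

The lazy `*?` quantifiers make the match stop at the first dotted number after `SUMo`, which in the common case is the same number the leftmost match of the lookbehind version returns. The `(?:[^<]|<(?!h4))` pieces are what keep the search from running into the next `<h4>` heading.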
Yuri-Tech (thread author), posted March 20, 2018:

Thank you, MAPJe71, that's a clever regular expression. Would you mind explaining the procedure you follow to approach and solve a problem like this (if you have one)?

I still think there should be a better way to access the version number. If anyone has an idea that uses site inspection, that would be great and enlightening. If there is no easy method, that would also be good to know.
MAPJe71, posted March 21, 2018:

Quoting Yuri-Tech: "I'm trying to get the SUMo version from the changelog URL (and not the download URL)"

Why not get it from the download page instead, with the following regex? `(?<=(?-i:SUMo)(?:[^<]|<(?!/(?:code|li)))*)\d+(?:\.\d+){1,3}`
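The download-page pattern can be ported to Python the same way, with a capture group in place of the lookbehind. This sketch assumes the version appears after `SUMo` and before the next closing `</code>` or `</li>` tag; the sample markup, the version number, and the name `download_page_version` are hypothetical.

```python
import re

# Port of the download-page pattern with a capture group instead of a lookbehind.
DOWNLOAD_VERSION_RE = re.compile(
    r"""
    (?-i:SUMo)                      # product name, case-sensitive
    (?:[^<]|<(?!/(?:code|li)))*?    # text that does not cross a closing </code> or </li>
    (\d+(?:\.\d+){1,3})             # version: 2 to 4 dot-separated numbers
    """,
    re.IGNORECASE | re.VERBOSE,
)


def download_page_version(html):
    """Return the first version number after 'SUMo' before a closing </code> or </li>, or None."""
    m = DOWNLOAD_VERSION_RE.search(html)
    return m.group(1) if m else None


# Hypothetical markup and version number, for illustration only.
print(download_page_version('<ul><li>Download <code>SUMo 5.6.3</code> installer</li></ul>'))  # 5.6.3
```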
Yuri-Tech (thread author), posted April 11, 2018:

MAPJe71, thanks for your answer and the regex. I actually already use the download page. I asked whether there's a better way to scrape the version number from the changelog for two reasons:
1) To learn a cleaner approach I can reuse on other sites, so it won't break in the future.
2) Wherever possible, I use two variables for the version number and compare them with a script (no coding needed), which makes it easier to spot website changes that may break something.

I'm looking for a cleaner solution, for example running a regex against this URL to get the version: https://justgetflux.com/update/v4/windows-download.json

When I grabbed it, the version didn't appear on the main download site (https://justgetflux.com/), but if you go to site inspection -> Sources -> Page -> update/v4, there's the JSON file with the version. I really had to dig to find it, so I'm looking for an explanation or a simple way to scrape it. Any suggestions along these lines?
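Since that URL serves JSON, a JSON parser is arguably more robust than a regex: it doesn't care about key order, whitespace, or line breaks. A minimal Python sketch, assuming the file is a JSON object with a top-level `"version"` key (the real file's full layout isn't shown in this thread). The sample payload and version number are made up, and `fetch_text` and `version_from_json` are illustrative names; `fetch_text` needs network access and isn't exercised by the offline example.

```python
import json
import urllib.request

FLUX_JSON_URL = "https://justgetflux.com/update/v4/windows-download.json"


def fetch_text(url, timeout=10):
    """Download a URL and decode it as UTF-8 (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def version_from_json(text, key="version"):
    """Return the top-level `key` of a JSON object as a string, or None if absent or null."""
    data = json.loads(text)
    if not isinstance(data, dict) or data.get(key) is None:
        return None
    return str(data[key])


# Offline example with a made-up payload; for live data use
# version_from_json(fetch_text(FLUX_JSON_URL)).
print(version_from_json('{"version": "4.70", "other": 1}'))  # 4.70
```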
shawn, posted April 13, 2018:

`"version":"([\d\.]+)"`
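If the tool can only apply a regex to the raw JSON text, the capture-group route works too. This Python sketch puts the `+` outside the character class so the whole dotted number is captured, and it also tolerates spaces around the colon, which a strict `"version":"..."` pattern would miss. The sample text and the name `version_from_text` are hypothetical.

```python
import re

# Capture group around the quoted value; \s* tolerates spaces around the colon.
VERSION_RE = re.compile(r'"version"\s*:\s*"(\d+(?:\.\d+)*)"')


def version_from_text(text):
    """Return the first quoted dotted version after a "version" key, or None."""
    m = VERSION_RE.search(text)
    return m.group(1) if m else None


print(version_from_text('{ "version" : "4.7.1", "x": 1 }'))  # 4.7.1
```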