Help with site inspection for grabbing json data

Yuri-Tech · March 19, 2018

I could use some help, please, as I don't have much experience with .NET regex and JavaScript/HTML5.

I'm trying to get the SUMo version from the changelog URL (and not the download URL):

{ChangelogURL}

http://www.kcsoftwares.com/bugs/changelog_page.php?project_id=11

The {VersionFromChangeLogURL} regex I'm using successfully:

(?<=changelog_page\.php\?project_id\=11\"\>.*?SUMo.*?changelog_page\.php\?version_id\=\d+\"\>).*?([\d\.]+).*?(?=<\/)

I wonder if there is a safer, better-performing way to scrape the version number.
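Python's `re` module only allows fixed-width lookbehind, so the variable-length `(?<=...)` above cannot be reused there as-is. A minimal sketch of an equivalent check, assuming the page HTML is already in a string: the lookbehind anchors move in front of a capture group, and `re.S` lets `.*?` cross line breaks. `SAMPLE_HTML` and its version number are made up for illustration, not taken from the real changelog.

```python
import re

# Same anchors as the .NET pattern, but written as ordinary matching plus a
# capture group, because Python's re only supports fixed-width lookbehind.
CHANGELOG_RE = re.compile(
    r'changelog_page\.php\?project_id=11">.*?SUMo.*?'
    r'changelog_page\.php\?version_id=\d+">'
    r'.*?([\d.]+).*?(?=</)',
    re.S,  # let .*? cross line breaks in the HTML
)

# Hypothetical snippet; the real page markup may differ.
SAMPLE_HTML = (
    '<a href="changelog_page.php?project_id=11">SUMo</a>\n'
    '<a href="changelog_page.php?version_id=954">SUMo 1.2.3</a>'
)


def version_from_changelog(html):
    """Return the first version string after the SUMo changelog link, or None."""
    match = CHANGELOG_RE.search(html)
    return match.group(1) if match else None


print(version_from_changelog(SAMPLE_HTML))  # 1.2.3
```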

I tried using Chromium's "Inspect" tool (F12 / Ctrl+Shift+I),

and I can see there is a reference to this page:

http://www.kcsoftwares.com/bugs/changelog_page.php?version_id=954

which gives the regex less input to search, but I can't figure out how to scrape the `id` div data, which currently equals `954`.
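One way to use that reference is a two-step scrape: pull the numeric `version_id` out of the link, then build the per-version changelog URL from it. A minimal Python sketch; the helper names and the sample markup are hypothetical, and the regex assumes the link contains `changelog_page.php?version_id=<digits>` the way the URL above does. Whether the first id on the page is the newest version is also an assumption.

```python
import re

BASE_URL = "http://www.kcsoftwares.com/bugs/changelog_page.php"

# Capture the digits after version_id= in any changelog link.
VERSION_ID_RE = re.compile(r"changelog_page\.php\?version_id=(\d+)")


def first_version_id(html):
    """Return the first version_id found in the page, or None."""
    match = VERSION_ID_RE.search(html)
    return match.group(1) if match else None


def version_page_url(version_id):
    """Build the per-version changelog URL for a given id."""
    return f"{BASE_URL}?version_id={version_id}"


# Hypothetical markup for illustration only.
SAMPLE_LINK = '<a href="changelog_page.php?version_id=954">SUMo 1.2.3</a>'
vid = first_version_id(SAMPLE_LINK)
print(vid)                    # 954
print(version_page_url(vid))  # http://www.kcsoftwares.com/bugs/changelog_page.php?version_id=954
```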

Maybe there's also a way to grab the version number from POST data or JSON data?

I hope someone can help me figure out this example so I'll be able to do it myself next time.

Thank you

MAPJe71 · March 20, 2018

Try the following regex for {VersionFromChangeLogURL}:

(?<=<h4(?:[^<]|<(?!h4))*(?-i:SUMo)(?:[^<]|<(?!h4))*)\d+(?:\.\d+){1,3}
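Reading the pattern piece by piece: `<h4` anchors on a heading; `(?:[^<]|<(?!h4))*` consumes anything except the start of another `<h4`, so the search stays inside one heading block (it may still cross a closing `</h4>`); `(?-i:SUMo)` switches off case-insensitive matching just for the product name; and `\d+(?:\.\d+){1,3}` accepts versions with two to four numeric parts. Python's `re` cannot run the unbounded lookbehind, so this sketch moves the same pieces in front of a capture group and makes the loops lazy. It assumes the surrounding tool matches case-insensitively by default (presumably why `(?-i:...)` is there) and mirrors that with `re.I`; the sample headings are invented.

```python
import re

# Python port of the <h4> pattern: the lookbehind becomes a prefix plus a
# capture group, and the loops are lazy so the first version after "SUMo" wins.
H4_VERSION_RE = re.compile(
    r"<h4(?:[^<]|<(?!h4))*?"  # heading start, then anything except a new "<h4"
    r"(?-i:SUMo)"              # product name, matched case-sensitively
    r"(?:[^<]|<(?!h4))*?"      # still inside the same heading block
    r"(\d+(?:\.\d+){1,3})",    # version: 2 to 4 dot-separated numbers
    re.I,  # assumption: mirrors a tool that matches case-insensitively by default
)

# Made-up headings: the lowercase "sumo" one is skipped because of (?-i:...).
SAMPLE_HTML = (
    "<h4>sumo 9.9</h4>\n"
    "<h4>SUMo - 1.2.3 - 2018-03-20</h4>"
)


def version_from_h4(html):
    """Return the first version inside an <h4> block that mentions SUMo, or None."""
    match = H4_VERSION_RE.search(html)
    return match.group(1) if match else None


print(version_from_h4(SAMPLE_HTML))  # 1.2.3
```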

Yuri-Tech · March 20, 2018

Thank you MAPJe71,

That's one clever regular expression.

Would you mind explaining your procedure, i.e. how you approach a problem like this (if you have one :)

Actually, I thought there should be a better way to access the version number.

If anyone has an idea that uses site inspection, that would be great and mind-enriching.

If there is no easy method, that would also be good to know.

MAPJe71 · March 21, 2018

Quoting Yuri-Tech:

I'm trying to get the SUMo version from the changelog URL (and not the download URL)

Why not get it from the download page and use the following regex:

(?<=(?-i:SUMo)(?:[^<]|<(?!/(?:code|li)))*)\d+(?:\.\d+){1,3}
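The same idea applied to the download-page pattern: here `<(?!/(?:code|li))` lets the match cross any tag except a closing `</code>` or `</li>`, so the version has to appear in the same code span or list item as "SUMo". A Python sketch with the lookbehind turned into a capture group and `re.I` standing in for the tool's assumed case-insensitive default; the list markup below is hypothetical.

```python
import re

DOWNLOAD_RE = re.compile(
    r"(?-i:SUMo)"                    # product name, case-sensitive
    r"(?:[^<]|<(?!/(?:code|li)))*?"  # anything except a closing </code> or </li>
    r"(\d+(?:\.\d+){1,3})",          # version: 2 to 4 dot-separated numbers
    re.I,
)

# Hypothetical markup: the first "SUMo" item has no version before </li>,
# so the search moves on to the next occurrence.
SAMPLE_HTML = (
    "<li>SUMo</li>"
    "<li>Other tool 9.9</li>"
    "<li><code>SUMo 1.2.3</code></li>"
)


def download_page_version(html):
    """Return the first version in the same item/code span as SUMo, or None."""
    match = DOWNLOAD_RE.search(html)
    return match.group(1) if match else None


print(download_page_version(SAMPLE_HTML))  # 1.2.3
```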

Yuri-Tech · April 11, 2018

MAPJe71, thanks for your answer and the regex.

Actually, I'm already using the download page.

I asked whether there's a better way to scrape the version number from the changelog for two reasons:

1) To learn a cleaner approach I can reuse on other sites, so it is less prone to future glitches.

2) Wherever possible, I use two variables for the version number and compare them with a script, without coding, so it's easier to identify website changes that may break something.
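For the cross-check in point 2, it can help to normalize both scraped values before comparing them, so that formatting differences such as stray spaces or a trailing `.0` are not reported as a site change, while a failed scrape (no value) is. A minimal Python sketch; the function names are hypothetical and it assumes plain dot-separated numeric versions.

```python
def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3); return None for missing or non-numeric input."""
    if text is None:
        return None
    parts = text.strip().split(".")
    if not all(part.isdecimal() for part in parts):
        return None
    numbers = [int(part) for part in parts]
    # Drop trailing zeros so that 1.2 and 1.2.0 compare as the same version.
    while len(numbers) > 1 and numbers[-1] == 0:
        numbers.pop()
    return tuple(numbers)


def versions_agree(from_download_page, from_changelog):
    """True only if both scraped values parse and name the same version."""
    a = parse_version(from_download_page)
    b = parse_version(from_changelog)
    return a is not None and a == b


print(versions_agree("1.2.3", " 1.2.3 "))  # True
print(versions_agree("1.2", "1.2.0"))      # True
print(versions_agree("1.2.3", "1.2.4"))    # False
print(versions_agree("1.2.3", None))       # False: one scrape came back empty
```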

I'm looking for a cleaner solution, for example running a regex on this URL to get the version:

https://justgetflux.com/update/v4/windows-download.json

When I grabbed it, the version didn't appear on the main download site (https://justgetflux.com/),

but if you go to site inspection -> Sources -> Page -> update/v4,

there's the JSON file with the version.

I really had to dig to find it, so I'm looking for an explanation or a simple way to scrape it.

Any suggestions along these lines?
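When the version lives in a JSON file like this, parsing the JSON is usually sturdier than a regex, because it does not care about key order, spacing, or line breaks. A minimal Python sketch using only the standard library; it assumes the file is a JSON object with a top-level `"version"` field (not verified against the live file), and `SAMPLE_BODY` is made up, with deliberate extra whitespace.

```python
import json
from urllib.request import urlopen

FLUX_JSON_URL = "https://justgetflux.com/update/v4/windows-download.json"


def fetch_text(url, timeout=10):
    """Download a URL and decode it as UTF-8 (network access required)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def version_from_json(body, key="version"):
    """Return data[key] as a string from a JSON object, or None if it is missing."""
    data = json.loads(body)
    value = data.get(key) if isinstance(data, dict) else None
    return str(value) if value is not None else None


# Made-up body in the assumed shape; the spacing would defeat a compact regex.
SAMPLE_BODY = '{\n  "version" : "1.2.3"\n}'
print(version_from_json(SAMPLE_BODY))  # 1.2.3

# Live use (not run here):
# print(version_from_json(fetch_text(FLUX_JSON_URL)))
```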

shawn · April 13, 2018

"version":"([\d\.]+)"
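The placement of `+` decides what this captures: in `([\d\.]+)` the quantifier repeats the character class and captures the whole version, whereas in `([\d\.+])` the `+` is just another member of the class, so it matches exactly one digit, dot, or plus sign. A quick Python check against a made-up compact body; note that either pattern still assumes no whitespace around the colon.

```python
import re

BODY = '{"version":"1.2.3"}'  # made-up example in compact form

# Quantifier outside the brackets: one or more digits or dots.
good = re.search(r'"version":"([\d\.]+)"', BODY)
# '+' inside the brackets is a literal: exactly one digit, dot, or plus.
bad = re.search(r'"version":"([\d\.+])"', BODY)

print(good.group(1))  # 1.2.3
print(bad)            # None
```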
