Ketarin forum
Yuri-Tech

Help with site inspection for grabbing JSON data


I could use some help, please, as I don't have much experience with .NET regex or JavaScript/HTML5.

I'm trying to get the SUMo version from the changelog URL (and not the download URL):

{ChangelogURL}

    http://www.kcsoftwares.com/bugs/changelog_page.php?project_id=11

The {VersionFromChangeLogURL} regex I'm using successfully:

(?<=changelog_page\.php\?project_id\=11\"\>.*?SUMo.*?changelog_page\.php\?version_id\=\d+\"\>).*?([\d\.]+).*?(?=<\/)
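For reference, here is a minimal Python sketch of the same idea. Python's `re` module only accepts fixed-width lookbehind, so the `(?<=...)` part is swapped for ordinary matching plus a capture group around the version. The sample HTML and the `version_from_changelog` name are hypothetical, and the real changelog markup may differ.

```python
import re
from typing import Optional

# Hypothetical, simplified markup; the real changelog page may differ.
SAMPLE_HTML = (
    '<a href="changelog_page.php?project_id=11">SUMo</a> - '
    '<a href="changelog_page.php?version_id=954">SUMo 1.2.3</a></td>'
)

# Same anchors as the Ketarin regex above, except that the variable-width
# lookbehind becomes plain matching and the version is taken from group 1.
CHANGELOG_VERSION = re.compile(
    r'changelog_page\.php\?project_id=11">.*?SUMo'
    r'.*?changelog_page\.php\?version_id=\d+">'
    r'.*?([\d.]+).*?(?=</)'
)


def version_from_changelog(html: str) -> Optional[str]:
    """Return the first run of digits and dots after the anchors, or None."""
    match = CHANGELOG_VERSION.search(html)
    return match.group(1) if match else None


print(version_from_changelog(SAMPLE_HTML))  # prints 1.2.3
```

The lazy `.*?` pieces keep each step as short as possible, which is also what keeps the original pattern from running past the first SUMo entry.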

 

I wonder if there is a safer, better-performing way to scrape the version number.

I tried Chromium's Inspect tool (F12 / Ctrl+Shift+I), and I can see that the page references this URL:

http://www.kcsoftwares.com/bugs/changelog_page.php?version_id=954

That page gives the regex less input to search, but I can't work out how to scrape the `id` value, which is currently `954`.
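The `954` appears as the `version_id` query parameter in the URL above, so a capture group after `version_id=` is enough to pull it out of a link. A minimal Python sketch follows. It assumes the first `version_id` link on the project changelog page belongs to the newest SUMo release; the sample HTML and both helper names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical fragment of the project changelog page.
SAMPLE_HTML = '<a href="changelog_page.php?version_id=954">SUMo 1.2.3</a>'

# Capture the digits that follow "version_id=" in a changelog link.
VERSION_ID = re.compile(r'changelog_page\.php\?version_id=(\d+)')

BASE = "http://www.kcsoftwares.com/bugs/changelog_page.php"


def first_version_id(html: str) -> Optional[str]:
    """Return the first version_id found in the page, or None."""
    match = VERSION_ID.search(html)
    return match.group(1) if match else None


def version_page_url(version_id: str) -> str:
    """Build the narrower per-version changelog URL from the captured id."""
    return f"{BASE}?version_id={version_id}"


vid = first_version_id(SAMPLE_HTML)
if vid is not None:
    print(version_page_url(vid))
```

The per-version page could then be searched with a much shorter version regex, at the cost of one extra download.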

Is there perhaps also a way to grab the version number from POST data or JSON data?

I hope someone can help me figure out this example so I'll be able to do it myself next time.

Thank you


Try the following regex for {VersionFromChangeLogURL}:

(?<=<h4(?:[^<]|<(?!h4))*(?-i:SUMo)(?:[^<]|<(?!h4))*)\d+(?:\.\d+){1,3}
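Reading that regex piece by piece: `<h4` anchors on a heading, `(?:[^<]|<(?!h4))*` walks forward over any text or tag that does not open another `<h4`, `(?-i:SUMo)` demands the exact case "SUMo" even if the rest of the pattern is case-insensitive, and `\d+(?:\.\d+){1,3}` is a version with two to four numeric parts. Python's `re` cannot run that unbounded lookbehind, so this sketch moves it into the match and captures the version in group 1. The sample HTML is made up, and `re.IGNORECASE` is an assumption suggested by the `(?-i:...)` group.

```python
import re
from typing import Optional

# Hypothetical markup: one heading per product, SUMo's is the second.
SAMPLE_HTML = '<h4>DUMo - 2.0.1</h4><h4><a href="#">SUMo</a> - 1.2.3</h4>'

# Capture-group port of the lookbehind regex. IGNORECASE is an assumption;
# (?-i:SUMo) still forces an exact-case match of the product name.
H4_VERSION = re.compile(
    r'<h4(?:[^<]|<(?!h4))*?(?-i:SUMo)(?:[^<]|<(?!h4))*?(\d+(?:\.\d+){1,3})',
    re.IGNORECASE,
)


def version_from_h4(html: str) -> Optional[str]:
    """First version after "SUMo" before the next <h4> opens, or None."""
    match = H4_VERSION.search(html)
    return match.group(1) if match else None


print(version_from_h4(SAMPLE_HTML))  # prints 1.2.3
```

Because `<(?!h4)` lets closing tags like `</h4>` through, the "section" really runs from one `<h4` to the next, which is what stops the DUMo heading from leaking into the SUMo match.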

 


Thank you, MAPJe71.

That's one clever regular expression.

Would you mind explaining your procedure for approaching a problem like this (if you have one :))?

Actually, I thought there might be a better way to access the version number.

Any ideas involving site inspection would be great and enlightening.

If there is no easy method, that would also be good to know.

 

Quote: "I'm trying to get the SUMo version from the changelog URL (and not the download URL)"

Why not get it from the download page and use the following regex:

(?<=(?-i:SUMo)(?:[^<]|<(?!/(?:code|li)))*)\d+(?:\.\d+){1,3}
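The same trick generalizes: start at a case-sensitive anchor word and accept the first version number that appears before a given closing tag. Here is a small Python sketch with a hypothetical `first_version_after` helper and made-up list markup, again using a capture group because Python's `re` lacks unbounded lookbehind.

```python
import re
from typing import Iterable, Optional

VERSION = r'\d+(?:\.\d+){1,3}'


def first_version_after(html: str, anchor: str,
                        stop_tags: Iterable[str]) -> Optional[str]:
    """Return the first version after `anchor`, without crossing any
    closing tag listed in `stop_tags` (e.g. </code> or </li>)."""
    stops = "|".join(re.escape(tag) for tag in stop_tags)
    pattern = re.compile(
        rf'(?-i:{re.escape(anchor)})(?:[^<]|<(?!/(?:{stops})))*?({VERSION})',
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return match.group(1) if match else None


# Made-up download-page fragment, not the real kcsoftwares markup.
SAMPLE_HTML = '<li>DUMo <code>2.0.1</code></li><li>SUMo <code>1.2.3</code></li>'
print(first_version_after(SAMPLE_HTML, "SUMo", ["code", "li"]))  # prints 1.2.3
```

Opening tags such as `<code>` are allowed through because only `</code>` and `</li>` are listed as stops.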

 


MAPJe71, thanks for your answer and the regex.

Actually, I was already using the download page.

I asked whether there's a better way to scrape the version number from the changelog for two reasons:

1) To learn a cleaner approach I can reuse on other sites, so it's less likely to break in the future.

2) Wherever possible, I scrape the version number into two variables and compare them with a script (no extra coding), so it's easier to spot website changes that may break something.
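For point 2, comparing the two scraped values numerically avoids false alarms; as plain text, "1.10" sorts before "1.9". A minimal Python sketch, assuming both values are dot-separated integers; the function and parameter names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.12.3' into (5, 12, 3) so versions compare numerically.
    Raises ValueError if a part is not an integer (e.g. a broken scrape)."""
    parts = [int(p) for p in text.strip().split(".")]
    # Drop trailing zeros so that '1.2' and '1.2.0' count as the same version.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def versions_agree(from_download: str, from_changelog: str) -> bool:
    """True when both scraped values name the same version."""
    return parse_version(from_download) == parse_version(from_changelog)


print(versions_agree("1.2.0", "1.2"))  # prints True
print(versions_agree("1.10", "1.9"))   # prints False
```

A `ValueError` here is itself a useful signal: it usually means one of the regexes grabbed something that is not a version any more.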

 

I'm looking for a cleaner solution, for example running a version regex against this URL:

https://justgetflux.com/update/v4/windows-download.json

When I set this one up, the version didn't appear on the main download site (https://justgetflux.com/), but if you go to the inspector -> Sources -> Page -> update/v4, there's the JSON file with the version.

I really had to dig to find it, so I'm looking for an explanation or a simple way to scrape it.

Any suggestions along these lines?
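When the version sits in a JSON file, such as the f.lux URL above, parsing the JSON is usually sturdier than a regex because key order and whitespace stop mattering. The key that holds the version in that file isn't shown in this thread, so this Python sketch takes the key name as a parameter; `find_key`, `fetch_json`, and the sample document are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def fetch_json(url: str) -> Any:
    """Download and decode a JSON document (not called below)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`.
    A JSON null under that key is treated like "not found"."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up structure; the real windows-download.json may look different.
SAMPLE = '{"windows": {"installer": {"version": "1.2.3"}}}'
print(find_key(json.loads(SAMPLE), "version"))  # prints 1.2.3
```

In a regex-only tool, a pattern such as `"version"\s*:\s*"([^"]+)"` would be the closest equivalent, under the same assumption about the key name and provided the value is a plain string.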
