Kuppet : Javascript rendered pages for Ketarin

Ambimind · June 20, 2021

Download Link

17 April 2022 Update
- In summary, kuppet now has a proper command-line interface, can be used standalone and also handles the shadow DOM.
- All dependencies have been updated to their latest versions.
- Documentation is now improved; use the -h or --help option to access it.

18 April 2022 Update
- Fixed bug preventing application of user agent string
- Removed "-V" options
- Updated config file

21 April 2022 Update
- Implemented request specific options through post data
- Added post data option : redirection-delay
- Improved readability of standard logs, "-l"

Logitech_Options.xml

Edited April 21, 2022 by Ambimind
New version of kuppet uploaded

shawn · July 18, 2021

Thank you, @Ambimind! This looks like it could be very useful.

jusseppe · September 10, 2021

I keep getting this error:

[attached image: image.png]

what am I doing wrong?

[attached image: image.png]

[attached image: image.png]

Ambimind · September 14, 2021

Thanks for clearly reporting your problem, jusseppe.
Please try the new version I've uploaded (see original post); I think I've found the cause of the error - entirely my fault.
Also, see the working example I've uploaded.

An alternative resolution : If your firewall is not blocking chrome and/or kuppet, you can try changing kuppet's default working port :

Tip : If you remove "-WindowStyle hidden" from the command line, kuppet will tell you a short story

jusseppe · September 14, 2021

thanks for your support!

I redownloaded the new package, replaced the old files, set to port 8008, opened it on my router (TCP), disabled the firewall on my PC and used your working configuration but I still get this error:

[attached image: image.png]

any ideas on the possible issue?

thanks again!

jusseppe · October 28, 2021

Hello Ambimind,

is there anything else we can try?

Let me know, thank you!

shawn · December 11, 2021

Hi, @jusseppe!

I've had a chance now to try this myself and the biggest issue was that the alternate port I attempted to use would not work until I closed and reopened Ketarin. This made all the difference for me. I've pasted the exact text you'll want to use in the "global variables" feature below to make it easier to ensure it is correct.

For the variable named "run_kuppet":

if(!(PS kuppet -ea 0)) {START -WindowStyle hidden BIN\kuppet 8008}

Or, alternatively, use this to show the log as it happens:

if(!(PS kuppet -ea 0)) {START BIN\kuppet 8008}

For the variable named ">":

{run_kuppet:ps}http://localhost:8008/

shawn · December 11, 2021

@Ambimind - this is fantastic. Thank you!

Can I request a feature? According to the log, it looks like it is re-requesting the same URLs with each instance of a variable that makes a request. Is it possible to make Kuppet cache the results temporarily until it has moved on to the next Application? Since I gather a lot of content from some pages this could save me a LOT of repeat requests.

Are there other switches than the port that it runs on?

shawn · December 11, 2021

One more feature request: can I get the complete webpage contents and not just the body element?

shawn · December 11, 2021

It is caching - yay! I didn't read the log very carefully.

It looks like the parameters are:

kuppet.exe [port] [timeout-ms]

It disconnects if idle for 30s or more no matter what.

It's using the following user-agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.63 Safari/537.36

Can you make it possible to change the user-agent? Some sites block HeadlessChrome by default.

shawn · December 11, 2021

The cache appears to only survive about 10 seconds - so on a large download or on a slow site, it may not keep the cache long enough for the post-update scripts to run without re-downloading the variable webpages. I've changed the command used to extend the timeout to:

if(!(PS kuppet -ea 0)) {START -WindowStyle hidden BIN\kuppet "8008 30000"}

This increases the timeout to 30 seconds and resolves my immediate problem, but it will not accept any value above 30000 (30s). I suspect there is a hard limit in place to ensure that Kuppet doesn't stay in memory forever. Unfortunately, this forces some applications to re-request data in order to perform my post-update scripts. I guarantee on some sites (just experienced it with SnagIt) this will become a problem. Re-requesting pages can get you throttled with a recaptcha or cloudflare stall. Grrrr.

Please increase the hard limit to 5m or more, and/or allow us to configure caching on an execution basis. Maybe allow Kuppet to run without a hard limit (timeout=0) - and use the timeout only in relation to the cache duration. I leave Ketarin open 24/7 anyway so there's no reason I can think of to close Kuppet any more frequently than I reboot (once per month or so).

Something like this:

kuppet.exe [port] [cache-timeout-ms] [hard-limit-ms] [custom-user-agent]
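
As a rough illustration of the behaviour being requested (not Kuppet's actual implementation), here is a minimal Python sketch of a per-URL cache whose entries expire after a configurable number of milliseconds, with 0 meaning "never expire". The class name, method names and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """Hypothetical cache keyed by URL; not taken from Kuppet's source."""

    def __init__(self, ttl_ms: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_ms = ttl_ms          # 0 means entries never expire
        self._clock = clock           # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, url: str, html: str) -> None:
        # Remember the page together with the time it was stored.
        self._entries[url] = (self._clock(), html)

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, html = entry
        age_ms = (self._clock() - stored_at) * 1000
        if self.ttl_ms and age_ms > self.ttl_ms:
            del self._entries[url]    # expired: the caller must fetch the page again
            return None
        return html
```

With a 30000 ms timeout, a page stored at the start of an update would be gone by the time a post-update script asks for it 31 seconds later, which is the situation described above.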

Thanks, again, @Ambimind - this has already resolved problems I've been having with about a dozen Applications.

shawn · December 12, 2021

Or maybe a configuration file that it will parse to populate these variables.

shawn · December 20, 2021

The ability to replace the user-agent string has become essential. The Cloudfront servers that are responsible for Logitech, SnagIt and others are now completely blocking the HeadlessChrome user-agent.

shawn · January 26, 2022

It looks like there's a problem with synchronicity. The global variable for {run_kuppet} must not be getting parsed as frequently as it needs to be by Ketarin since Kuppet isn't staying alive long enough - or even launching at all if Ketarin has been open for an extended period of time. Since I always have Ketarin open this means that Kuppet often doesn't behave well.

To resolve this problem I've rewritten several of my apps to avoid Cloudfront and rewrote the {run_kuppet} to simply:

START "BIN\kuppet.exe" "8008 30000"

This no longer checks for an existing instance of Kuppet, which means that each subsequent call will attempt to load it and will be ignored because an instance is already running. Since I'm not using the "-WindowStyle hidden" flag, this results in several windows popping up briefly and disappearing almost instantaneously. This is a good thing since it indicates that the first instance is still running, and allows Kuppet to be reinstantiated on demand should it time out.

Now almost every app that I'm using Kuppet for is working well, with a random stall on a couple sites that use site throttling and large files (but the file still downloads fine). I think increasing the Kuppet timeout (or eliminating it entirely) will resolve both of these problems.

shawn · February 22, 2022

The many popping windows that steal focus have continued to be a problem, so here's a workaround.

Change {run_kuppet} to:

Add-Type -Assembly Microsoft.VisualBasic; [Microsoft.VisualBasic.Interaction]::Shell('k:\ketarin\BIN\kuppet.exe 8008 30000', 'MinimizedNoFocus') | Out-Null;

The path passed to Shell must be the complete path to kuppet.exe. The next parameter is the port, and the one after that is the 30s timeout. "MinimizedNoFocus" still runs the app but loads it as a minimized window in the background, so it neither steals focus nor pops up as a distraction.

An alternative, if you don't mind the many windows popping up, is to use "NormalNoFocus", which will still allow it in the foreground as a restored/normal window, but still won't steal focus.

"| Out-Null" is required, or else the new process ID returned by Shell is inserted in front of the URL when it is built with {>}.

wdsarin9 · April 5, 2022

Thank you for the useful information

Ambimind · April 17, 2022

@shawn Thanks for your feedback, many of the changes in the new version were motivated by it. Your questions and further feedback are welcome.

@jusseppe Could you please try the latest version?

shawn · April 17, 2022

Yay! Thank you, @Ambimind. I'll report any issues here.

shawn · April 18, 2022

I've been testing it and have found a few issues:

The user agent option is never properly observed. No amount of fiddling with the userAgentArray in the kuppet.config file or -u or --user-agent parameters allows this to be changed. With the kuppet.config file removed and verbose logging enabled the user-agent assignment does appear only within the header summary, but what is sent to the server remains either:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4889.0 Safari/537.36

or:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4889.0 Safari/537.36

In either case, the deciding factor here is actually whether the --chrome-visible option is enabled. If it's visible then it uses Chrome, otherwise it uses HeadlessChrome as the user-agent.
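
The two strings above differ only in the "HeadlessChrome" versus "Chrome" product token. Here is a minimal Python sketch of the substitution a working user-agent override would need to make, assuming nothing about how kuppet applies it internally (the function name is hypothetical):

```python
HEADLESS_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/100.0.4889.0 Safari/537.36")


def strip_headless_token(user_agent: str) -> str:
    """Return the user agent with the HeadlessChrome token replaced by Chrome."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")
```

Applied to the second string above, this yields the first one, which is what the server sees when Chrome is visible.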

The visibility option works perfectly - in that it either shows or hides the Chrome sandbox - usually in the background of monitor 1, but not always -- for some reason this is inconsistent.

I don't see that the standard logging option does anything at all.

Port, timeouts, wait, website-url, verbose logging, and version, too, each work well.

The only real issue is that the user-agent is never being assigned from either the kuppet.config or the parameters, and the current need to enable visibility in order to avoid the HeadlessChrome user-agent is a major downer. At least it's loading in the background most of the time, though, so it's not that big of a deal. The importance of avoiding HeadlessChrome cannot be overstated. Almost every site now blocks it by default. The new visibility option allows this to continue to operate as expected though.

Thanks, again, @Ambimind

shawn · April 18, 2022

To enable visibility in kuppet.config you can use chromeVisible:

"chromeVisible"   : 1

Ambimind · April 18, 2022

5 hours ago, shawn said:

The user agent option is never properly observed.

Nice catch, thank you; I think it should be fixed now.

5 hours ago, shawn said:

I don't see that the standard logging option does anything at all.

If '-L' is used, it implies '-l'; while '-L' is set, setting/unsetting '-l' has no effect. I've added a note about this in the help.
To prove it to yourself, remove '-L' and set '-l'; this should remove all log lines starting with "===:" and retain all those starting with "---:".
Both can be excluded, in which case no logging occurs.
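
Here is a minimal Python sketch of the rule described above, assuming every log line starts with either "===:" or "---:". The function name and the sample lines are made up for illustration; this is not kuppet's code.

```python
from typing import List


def visible_log_lines(lines: List[str], standard: bool, verbose: bool) -> List[str]:
    """Model of '-l' (standard) and '-L' (verbose) as described in this post."""
    if verbose:
        # '-L' implies '-l', so both kinds of line are shown whatever '-l' is.
        return [line for line in lines if line.startswith(("===:", "---:"))]
    if standard:
        # '-l' alone keeps the "---:" lines and drops the "===:" lines.
        return [line for line in lines if line.startswith("---:")]
    # Neither flag set: no logging at all.
    return []


# Invented log text, only the prefixes come from the thread.
sample = ["---: request received", "===: header summary"]
```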

5 hours ago, shawn said:

The visibility option works perfectly - in that it either shows or hides the Chrome sandbox - usually in the background of monitor 1, but not always -- for some reason this is inconsistent.

I wasn't able to reproduce this. To avoid any possible confusion, the '-V' (capital V) option has been removed from the command-line interface.
If a request is served from cache, chrome is not launched. I found this may give the impression of inconsistency, when it is by design.
I've updated the config file with the relevant key 'chromeVisible'; I suggest using 'true'/'false' (without quotes) in place of '0'/'1'.

Usage Note
When setting command line options within Ketarin global variables, the options must be quoted and separated by spaces, as shown below:
if(!(PS kuppet -ea 0)) {START -win hidden BIN\kuppet '-v --port=9000'}
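
If you generate your Ketarin variables with a script, here is a small Python sketch of how such a value could be assembled from a list of options. The helper name is hypothetical; the surrounding command text is copied from the line above.

```python
from typing import List


def run_kuppet_command(options: List[str]) -> str:
    """Build the global-variable text with all options inside one quoted string."""
    # Join the options with spaces and wrap the whole list in a single pair of quotes.
    quoted = "'" + " ".join(options) + "'"
    return r"if(!(PS kuppet -ea 0)) {START -win hidden BIN\kuppet " + quoted + "}"
```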

shawn · April 19, 2022

Thank you, @Ambimind! That fixes the user-agent functionality I was most concerned with.

Thank you for explaining the logging behavior. That makes a lot of sense and works exactly as you describe.

Please do not ship the distribution's config file under the name kuppet.config. It should be named "kuppet.config.sample" or similar so that it won't overwrite any custom kuppet.config file in the destination directory.

It's working perfectly with CloudFlare now.

I am having problems with a site using a specific WordPress plugin, though. This is only causing problems with one site I'm downloading from, but the software there is AWESOME and the last thing I want is to lose access to it. The plugin is CleanTalk Spam Protect, and it has a built-in Anti-Crawler FireWall that's responsible for imposing a delay of between 3 and 60 seconds with a cookie and cache-buster refresh/redirect that is resulting in Kuppet not making it to the "real" content of the page.

I'm sure there are going to be many similar situations but I think it can be addressed in one of two ways each time. In both cases you'll want to have an option to turn it on or off to either prevent unwanted redirects or unnecessary delays.

1) Ensure that an option exists whether to impose the redirect behavior or not (I suggest followRedirects or similar). On some sites we will not want to follow redirects. On others (like this one) we will.

When this option is enabled you simply allow the JavaScript or redirect to proceed as long as it's within the page-load-timeout period. Scanning the produced content for "window.location" should be sufficient to determine whether to wait "up to page-load-timeout" seconds for the redirect to occur. I'm sure some sites will obfuscate this, but for those that don't this should be an easy check and probably the easiest solution.

2) Provide a minWait option. Like the followRedirects option above, this would provide a solution for this scenario by allowing the page to finish loading and following any redirects or timed output before considering the page fully loaded. For example, setting a minWait of 70 seconds and the page-load-timeout at 90 seconds would allow any page with a 60 second imposed delay to complete. This is probably the more universal solution since it will allow certain other scripted content sufficient time to load, too.

The only other approach I can think of is to watch for anonymous setTimeout calls by writing a handler that captures every call to setTimeout, which is probably overkill. Otherwise I would do that so we could check the delay of each timeout and allow anything under page-load-timeout to proceed, capturing the page contents only once all of those timeouts had fired. This would also address pages where the content sits behind an advertisement delay or other interstitial, though I avoid those sites like the plague.
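
To make the two suggestions concrete, here is a minimal Python sketch of how the wait before capturing a page could be chosen. followRedirects and minWait are only the option names proposed in this post, not existing kuppet options, and the function name is hypothetical.

```python
def seconds_before_capture(html: str, follow_redirects: bool,
                           min_wait_s: float, page_load_timeout_s: float) -> float:
    """Return how long to keep the page open before capturing its contents."""
    if follow_redirects and "window.location" in html:
        # A scripted redirect looks likely: allow it up to the full timeout.
        return page_load_timeout_s
    # Otherwise honour the minimum wait, but never exceed the page-load timeout.
    return min(min_wait_s, page_load_timeout_s)
```

With a minWait of 70 and a page-load-timeout of 90, a page with a 60 second imposed delay would get 70 seconds to settle before its contents are captured.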

Ambimind · April 19, 2022

1 hour ago, shawn said:

The plugin is CleanTalk Spam Protect, and it has a built-in Anti-Crawler FireWall that's responsible for imposing a delay of between 3 and 60 seconds with a cookie and cache-buster refresh/redirect that is resulting in Kuppet not making it to the "real" content of the page.

I'm unable to reproduce the behaviors you've described. I've attached the job I created to test it : It works fine with or without kuppet.

cleantalk-spam-protect.xml

shawn · April 19, 2022

Hi! It's not downloading the CleanTalk plugin that's the problem, but downloading from sites that are using it. Here's an example. However, the plugin must use a connection counter or something, too, since often the first request (after several hours) works fine but any subsequent request (say, for another app from the same site) will fail. If you set up a new app and just try to capture the version number you'll see how it misbehaves by the second or third request. Here's an app for it though -- Kuppet is used in {versionstub}.

setdefaultbrowser.xml

Ambimind · April 20, 2022

4 hours ago, shawn said:

Hi! It's not downloading the CleanTalk plugin that's the problem, but downloading from sites that are using it. Here's an example. However, the plugin must use a connection counter or something, too, since often the first request (after several hours) works fine but any subsequent request (say, for another app from the same site) will fail. If you set up a new app and just try to capture the version number you'll see how it misbehaves by the second or third request. Here's an app for it though -- Kuppet is used in {versionstub}.

setdefaultbrowser.xml

Got it - I can reproduce the behavior at will, now.
Pardon my confusion; it was clearly stated in your original post.
