Ketarin forum

Kuppet : Javascript rendered pages for Ketarin


Ambimind

Download Link

17 April 2022 Update

- In summary, kuppet now has a proper command-line interface, can be used standalone, and also handles the shadow DOM.
- All dependencies have been updated to their latest versions.
- Documentation has been improved; please use the -h or --help options to access it.

18 April 2022 Update
- Fixed bug preventing application of user agent string
- Removed the "-V" option
- Updated config file

21 April 2022 Update
- Implemented request specific options through post data
- Added post data option redirection-delay
- Improved readability of standard logs ("-l")

Logitech_Options.xml

Edited by Ambimind
New version of kuppet uploaded

  • 4 weeks later...
  • 1 month later...

Thanks for clearly reporting your problem, jusseppe.
Please try the new version I've uploaded (see the original post); I think I've found the cause of the error, which was entirely my fault.
Also, see the working example I've uploaded.

An alternative resolution: if your firewall is not blocking Chrome and/or kuppet, you can try changing kuppet's default working port:
[Screenshot: changing kuppet's default port]

Tip: if you remove "-WindowStyle hidden" from the command line, kuppet will tell you a short story ;)


  • 1 month later...
  • 1 month later...

Hi, @jusseppe!

I've had a chance to try this myself now, and the biggest issue was that the alternate port I tried would not work until I closed and reopened Ketarin. That made all the difference for me. I've pasted below the exact text to use in the "global variables" feature, to make it easier to get it right.

For the variable named "run_kuppet":

if(!(PS kuppet -ea 0)) {START -WindowStyle hidden BIN\kuppet 8008}

Or, alternatively, use this to show the log as it happens:

if(!(PS kuppet -ea 0)) {START BIN\kuppet 8008}

For the variable named ">":

{run_kuppet:ps}http://localhost:8008/
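How the pieces fit together, as I understand them: `{run_kuppet:ps}` runs the `run_kuppet` variable as PowerShell (starting kuppet if it isn't already running) and contributes no text of its own, so `{>}` expands to `http://localhost:8008/`. A download URL written as `{>}https://example.com/app.zip` should therefore reach kuppet with the target address appended to the path. The JavaScript sketch below reproduces that composition and fetches a page through a running instance; the helper names and example address are hypothetical, and reading the target from the request path is an assumption based on this variable rather than documented kuppet behavior.

```javascript
// Compose the address Ketarin produces for "{>}" + target URL.
// Assumption: kuppet listens on localhost and takes the page to render from
// the request path, as the "{>}" variable implies. Helper names are hypothetical.
function viaKuppet(targetUrl, port = 8008) {
  return `http://localhost:${port}/${targetUrl}`;
}

// Fetch the rendered HTML through a running kuppet instance.
// Uses the global fetch available in Node 18+.
async function fetchRendered(targetUrl, port = 8008) {
  const response = await fetch(viaKuppet(targetUrl, port));
  if (!response.ok) {
    throw new Error(`kuppet answered HTTP ${response.status} for ${targetUrl}`);
  }
  return response.text();
}

// Example (requires kuppet to be running on port 8008):
// fetchRendered("https://example.com/downloads").then((html) => console.log(html.length));
```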


@Ambimind - this is fantastic. Thank you!

Can I request a feature? According to the log, it looks like it re-requests the same URLs with each instance of a variable that makes a request. Is it possible to make Kuppet cache the results temporarily until it moves on to the next Application? Since I gather a lot of content from some pages, this could save me a LOT of repeat requests.

Are there other switches than the port that it runs on?


It is caching - yay! I didn't read the log very carefully. 

It looks like the parameters are:

kuppet.exe [port] [timeout-ms]

It disconnects if idle for 30s or more no matter what.

It's using the following user-agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.63 Safari/537.36

Can you make it possible to change the user-agent? Some sites block HeadlessChrome by default.


The cache appears to only survive about 10 seconds - so on a large download or on a slow site, it may not keep the cache long enough for the post-update scripts to run without re-downloading the variable webpages. I've changed the command used to extend the timeout to:

if(!(PS kuppet -ea 0)) {START -WindowStyle hidden BIN\kuppet "8008 30000"}

This increases the timeout to 30 seconds and resolves my immediate problem, but it will not accept any value above 30000 (30s). I suspect there is a hard limit in place to ensure that Kuppet doesn't stay in memory forever. Unfortunately, this forces some applications to re-request data in order to perform my post-update scripts. I guarantee this will become a problem on some sites (I just experienced it with SnagIt). Re-requesting pages can get you throttled with a reCAPTCHA or Cloudflare stall. Grrrr.

Please increase the hard limit to 5 minutes or more, and/or allow us to configure caching per execution. Maybe allow Kuppet to run without a hard limit (timeout=0) and use the timeout only for the cache duration. I leave Ketarin open 24/7 anyway, so I can't think of any reason to close Kuppet more often than I reboot (once a month or so).

Something like this:

kuppet.exe [port] [cache-timeout-ms] [hard-limit-ms] [custom-user-agent]
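To make the request concrete, here is a hypothetical model of the two separate lifetimes: cached pages expire after `cacheTimeoutMs`, while the process is only told to exit after `hardLimitMs` without requests, with `0` meaning never. None of these names are real kuppet options, and the clock is injectable so the timing can be checked without waiting.

```javascript
// Hypothetical model of the proposed options; kuppet has no such settings.
// cacheTimeoutMs: how long a rendered page may be served from cache.
// hardLimitMs: idle time before the process should exit (0 = never).
class RenderCache {
  constructor({ cacheTimeoutMs, hardLimitMs = 0, now = Date.now }) {
    this.cacheTimeoutMs = cacheTimeoutMs;
    this.hardLimitMs = hardLimitMs;
    this.now = now; // injectable clock, milliseconds
    this.entries = new Map(); // url -> { html, storedAt }
    this.lastRequestAt = now();
  }

  // Return cached HTML for url, or undefined if absent or expired.
  get(url) {
    this.lastRequestAt = this.now();
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.cacheTimeoutMs) {
      this.entries.delete(url); // stale: the caller should render again
      return undefined;
    }
    return entry.html;
  }

  // Store freshly rendered HTML for url.
  set(url, html) {
    this.entries.set(url, { html, storedAt: this.now() });
  }

  // True once the process has been idle for hardLimitMs (never when 0).
  shouldExit() {
    return this.hardLimitMs > 0 && this.now() - this.lastRequestAt >= this.hardLimitMs;
  }
}
```

With `cacheTimeoutMs: 300000`, a page rendered once would stay available to the post-update scripts for five minutes, while `hardLimitMs: 0` would keep the process alive indefinitely.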

Thanks, again, @Ambimind - this has already resolved problems I've been having with about a dozen Applications.


  • 1 month later...

It looks like there's a problem with synchronicity. The global variable for {run_kuppet} must not be getting parsed as frequently as it needs to be by Ketarin since Kuppet isn't staying alive long enough - or even launching at all if Ketarin has been open for an extended period of time. Since I always have Ketarin open this means that Kuppet often doesn't behave well.

To resolve this problem I've rewritten several of my apps to avoid Cloudfront and changed {run_kuppet} to simply:

START "BIN\kuppet.exe" "8008 30000"

This no longer checks for an existing instance of Kuppet, so every subsequent call attempts to launch it and is ignored because an instance is already running. Since I don't use the hidden-window flag, this results in several windows popping up and disappearing almost instantly. That's actually a good thing: it shows that the first instance is still running, and it lets Kuppet be restarted on demand if it times out.

Now almost every app I'm using Kuppet for works well, apart from an occasional stall on a couple of sites that use throttling and serve large files (the file still downloads fine). I think increasing the Kuppet timeout (or eliminating it entirely) would resolve both of these problems.


  • 4 weeks later...

The many popping windows that steal focus have continued to be a problem, so here's a workaround.

Change {run_kuppet} to:

Add-Type -Assembly Microsoft.VisualBasic; [Microsoft.VisualBasic.Interaction]::Shell('k:\ketarin\BIN\kuppet.exe 8008 30000', 'MinimizedNoFocus') | Out-Null;

The path passed to Shell must be the complete path to kuppet.exe. The next parameter is the port, and the one after that is the 30s timeout. "MinimizedNoFocus" still runs the app but starts it in the background as a minimized window, so it neither steals focus nor pops up as a distraction.

An alternative, if you don't mind the windows popping up, is "NormalNoFocus", which still shows it in the foreground as a restored/normal window but won't steal focus.

"| Out-Null" is required; otherwise the process ID that Shell returns is inserted in front of the URL when it is built with {>}.


  • 1 month later...

I've been testing it and have found a few issues:

The user agent option is never properly observed. No amount of fiddling with the userAgentArray in the kuppet.config file or with the -u or --user-agent parameters changes it. With the kuppet.config file removed and verbose logging enabled, the user-agent assignment appears only in the header summary, but what is sent to the server remains either:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4889.0 Safari/537.36

or:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4889.0 Safari/537.36

In either case, the deciding factor here is actually whether the --chrome-visible option is enabled. If it's visible then it uses Chrome, otherwise it uses HeadlessChrome as the user-agent. 
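Since the two strings above differ only in the `HeadlessChrome/` versus `Chrome/` token, a non-headless value for the `-u`/`--user-agent` option can be derived mechanically. A small JavaScript sketch (the function name is illustrative, not part of kuppet):

```javascript
// Swap the headless product token for the regular one. Illustrative helper;
// the resulting string is what one might pass to kuppet's -u/--user-agent.
function unheadless(userAgent) {
  return userAgent.replace(/\bHeadlessChrome\//, "Chrome/");
}

const headless =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) HeadlessChrome/100.0.4889.0 Safari/537.36";

console.log(unheadless(headless));
// → Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4889.0 Safari/537.36
```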

The visibility option works perfectly, in that it either shows or hides the Chrome sandbox, usually in the background of monitor 1, but not always; for some reason this is inconsistent.

I don't see that the standard logging option does anything at all.

Port, timeouts, wait, website-url, verbose logging, and version, too, each work well.

The only real issue is that the user-agent is never assigned from either kuppet.config or the parameters, and having to enable visibility to avoid the HeadlessChrome user-agent is a major downer. At least it loads in the background most of the time, so it's not that big of a deal. The importance of avoiding HeadlessChrome cannot be overstated: almost every site now blocks it by default. The new visibility option lets this continue to work as expected, though.

Thanks, again, @Ambimind


5 hours ago, shawn said:

The user agent option is never properly observed. 

Nice catch, thank you; I think it should be fixed now.

5 hours ago, shawn said:

I don't see that the standard logging option does anything at all.

If '-L' is used, it implies '-l', so while '-L' is set, setting/unsetting '-l' has no effect. I've added a note about this in the help.
To prove it to yourself, remove '-L', and set '-l', this should remove all log lines starting with "===:" and retain all those starting with "---:".
Both can be excluded, in which case no logging occurs.
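The rules above amount to a small decision table. The sketch below models them so each flag combination is easy to check; it illustrates the described behavior with hypothetical function names and is not kuppet's source.

```javascript
// Model of the logging rules described above; not kuppet's source.
// Standard lines start with "---:", verbose-only lines with "===:".
function visiblePrefixes({ standard = false, verbose = false } = {}) {
  if (verbose) return ["---:", "===:"]; // -L implies -l, so -l changes nothing
  if (standard) return ["---:"]; // -l alone
  return []; // neither option: no logging
}

// Keep only the lines a given combination of options would print.
function filterLog(text, options) {
  const prefixes = visiblePrefixes(options);
  return text
    .split(/\r?\n/)
    .filter((line) => prefixes.some((prefix) => line.startsWith(prefix)));
}
```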

5 hours ago, shawn said:

The visibility option works perfectly - in that it either shows or hides the Chrome sandbox - usually in the background of monitor 1, but not always -- for some reason this is inconsistent.

I wasn't able to reproduce this. To avoid any possible confusion, the '-V' (capital V) option has been removed from the command-line interface.
If a request is served from cache, Chrome is not launched. I found this may give the impression of inconsistency, when it is by design.
I've updated the config file with the relevant key, 'chromeVisible'; I suggest using 'true'/'false' (without quotes) in place of '0'/'1'.

Usage Note
When setting command line options within Ketarin global variables, the options must be quoted and separated by spaces, as shown below: 
if(!(PS kuppet -ea 0)) {START -win hidden BIN\kuppet '-v --port=9000'}


Thank you, @Ambimind! That fixes the user-agent functionality I was most concerned with.

Thank you for explaining the logging behavior. That makes a lot of sense and works exactly as you describe.

Please don't ship the config file in the distribution as kuppet.config. It should be named "kuppet.config.sample" or similar, so that it won't overwrite a custom kuppet.config file in the destination directory.

It's working perfectly with CloudFlare now.

I am having problems with a site using a specific WordPress plugin, though. This is only causing problems with one site I'm downloading from, but the software there is AWESOME and the last thing I want is to lose access to it. The plugin is CleanTalk Spam Protect, and it has a built-in Anti-Crawler FireWall that's responsible for imposing a delay of between 3 and 60 seconds with a cookie and cache-buster refresh/redirect that is resulting in Kuppet not making it to the "real" content of the page.

I'm sure there are going to be many similar situations but I think it can be addressed in one of two ways each time. In both cases you'll want to have an option to turn it on or off to either prevent unwanted redirects or unnecessary delays.

1) Ensure that an option exists whether to impose the redirect behavior or not (I suggest followRedirects or similar). On some sites we will not want to follow redirects. On others (like this one) we will.

When this option is enabled you simply allow the JavaScript or redirect to proceed as long as it's within the page-load-timeout period. Scanning the produced content for "window.location" should be sufficient to determine whether to wait "up to page-load-timeout" seconds for the redirect to occur. I'm sure some sites will obfuscate this, but for those that don't this should be an easy check and probably the easiest solution.

2) Provide a minWait option. Like the followRedirects option above, this would provide a solution for this scenario by allowing the page to finish loading and following any redirects or timed output before considering the page fully loaded. For example, setting a minWait of 70 seconds and the page-load-timeout at 90 seconds would allow any page with a 60 second imposed delay to complete. This is probably the more universal solution since it will allow certain other scripted content sufficient time to load, too.
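The two suggestions can be combined into one decision step that a renderer would evaluate repeatedly while a page loads. This is a hypothetical JavaScript sketch: `followRedirects`, `minWaitMs`, and `pageLoadTimeoutMs` mirror the names proposed above rather than real kuppet options, the meta-refresh pattern is an extra assumption beyond "window.location", and obfuscated redirects will slip past a plain text scan. With `minWaitMs` at 70 s and `pageLoadTimeoutMs` at 90 s, a page still inside the minimum wait at 65 s yields "wait" and a loaded page at 70 s yields "capture".

```javascript
// Hypothetical sketch of the two proposals; none of these are kuppet settings.

// Proposal 1: a cheap check for an impending client-side redirect.
function looksLikeRedirect(html) {
  return /window\.location|<meta[^>]+http-equiv=["']?refresh/i.test(html);
}

// Proposal 2 (plus 1): decide what to do at a given moment of the page load.
// Returns "wait", "capture", or "timeout".
function nextStep(
  { elapsedMs, loaded, html },
  { followRedirects = false, minWaitMs = 0, pageLoadTimeoutMs }
) {
  if (elapsedMs >= pageLoadTimeoutMs) return "timeout"; // give up, capture what exists
  if (elapsedMs < minWaitMs || !loaded) return "wait"; // always allow minWait
  if (followRedirects && looksLikeRedirect(html)) return "wait"; // let the redirect run
  return "capture";
}
```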

The only way I can see to watch for anonymous setTimeout calls is to write a handler that captures every call to setTimeout (which is probably overkill). If it weren't, I would suggest exactly that: check the delay of each timeout, let every timeout shorter than page-load-timeout fire, and only capture the page contents once all of them have triggered. This would also handle pages where the content sits behind an advertisement delay or other interstitial, though I avoid those sites like the plague.
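For what it's worth, the setTimeout handler described above is less code than it sounds. The sketch below wraps a `setTimeout`/`clearTimeout` pair so that every timer scheduled at or below a limit (for example the page-load-timeout) is counted until it fires or is cleared; `settled()` resolves once none remain. It is a hypothetical helper that handles function callbacks only, and it would have to be injected before the page's own scripts run (Puppeteer, for instance, offers `page.evaluateOnNewDocument` for that); whether kuppet could use it this way is an assumption.

```javascript
// Hypothetical helper: count timers scheduled on `scope` (window in a page)
// with a delay at or below limitMs until each one fires or is cleared.
// Handles function callbacks only, not the legacy string form.
function trackTimers(scope, limitMs) {
  const originalSet = scope.setTimeout;
  const originalClear = scope.clearTimeout;
  const pending = new Set(); // ids of tracked timers still outstanding
  let waiters = [];

  // Resolve everyone waiting on settled() once nothing is pending.
  function release() {
    if (pending.size > 0) return;
    const ready = waiters;
    waiters = [];
    ready.forEach((resolve) => resolve());
  }

  scope.setTimeout = function (callback, delay = 0, ...args) {
    const tracked = Number(delay) <= limitMs;
    const id = originalSet.call(scope, () => {
      try {
        callback(...args);
      } finally {
        // Timers scheduled inside callback were added before this delete,
        // so chained delays keep the page "unsettled".
        if (tracked) {
          pending.delete(id);
          release();
        }
      }
    }, delay);
    if (tracked) pending.add(id);
    return id;
  };

  scope.clearTimeout = function (id) {
    originalClear.call(scope, id);
    if (pending.delete(id)) release();
  };

  return {
    pendingCount: () => pending.size,
    // Resolves immediately if no short timer is pending.
    settled: () =>
      new Promise((resolve) => {
        waiters.push(resolve);
        release();
      }),
    restore() {
      scope.setTimeout = originalSet;
      scope.clearTimeout = originalClear;
    },
  };
}
```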


1 hour ago, shawn said:

The plugin is CleanTalk Spam Protect, and it has a built-in Anti-Crawler FireWall that's responsible for imposing a delay of between 3 and 60 seconds with a cookie and cache-buster refresh/redirect that is resulting in Kuppet not making it to the "real" content of the page.

I'm unable to reproduce the behavior you've described. I've attached the job I created to test it: it works fine with or without kuppet.

cleantalk-spam-protect.xml


Hi! The problem isn't downloading the CleanTalk plugin itself, but downloading from sites that use it. Here's an example. The plugin must also use a connection counter or something similar, since the first request (after several hours) often works fine but any subsequent request (say, for another app from the same site) fails. If you set up a new app and just try to capture the version number, you'll see it misbehave by the second or third request. Here's an app for it; Kuppet is used in {versionstub}.

setdefaultbrowser.xml


4 hours ago, shawn said:

Hi! It's not downloading CleanTalk plugin that's the problem, but from sites that are using it.

Got it - I can reproduce the behavior at will, now.
Pardon my confusion; it was clearly stated in your original post.
