appyface Posted September 18, 2010 (edited): Thought I'd give this URL a try: hxxp://www.java.com/en/download/manual.jsp Ketarin is not able to load this page in the variable pane so I can scrape it. I use WatchThatPage for my page-watching service, and it loads this page fine. Is there a way to get the server to render the page contents for Ketarin? TIA, --appyface P.S. I know there are other web sources for downloading the Java JRE and JDK. I'm only interested in methods to get the original sites' downloads via Ketarin, as these are the problematic ones (e.g. java.com, oracle.com/technetwork/java/javase/downloads/index.html). Please don't point me to 3rd-party sites, thanks :-)
floele Posted September 18, 2010: "Ketarin is not able to load this page in the variable pane so I can scrape it." Why not? Works fine for me.
appyface Posted September 18, 2010: @flo I'm asking YOU why not :-) I copy/paste the URL and click the 'load' button, and nothing happens except a brief screen flash I can't read. Win7 Pro 64-bit, new Ketarin 1.5.x.
floele Posted September 18, 2010: Are you sure that you did not forget to scroll down?
appyface Posted September 18, 2010 (edited): OK, the box that briefly flashes is "The content is being loaded"... Good gravy, yes, I did not scroll down far enough... I paged down a few times and *assumed* nothing had happened there. Thank you. I'm glad this was an easy fix! Whew! LOL
appyface Posted September 18, 2010 (edited): Still not quite right... the page that loads into Ketarin does not contain the same links that I get when looking at the page with IE8. For example, in IE8 look at the second Windows download entry (it's for 64-bit). Hover over the orange download arrow to the left of it and note that the bundle-id is 41293. However, I can't find that bundle-id anywhere in the text that Ketarin loads.
floele Posted September 18, 2010: I guess that it is a user agent issue.
floele Posted September 18, 2010: Maybe it's time for application-specific user agents?
appyface Posted September 18, 2010: Agreed! :-)
shawn Posted September 19, 2010: Appy, consider using this instead:
<?xml version='1.0' encoding='utf-8'?>
<Jobs>
  <ApplicationJob xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Guid="56063164-e18f-4c70-806d-04ed851f2222">
    <WebsiteUrl>http://www.java.com/en/</WebsiteUrl>
    <UserNotes />
    <LastFileSize>16299808</LastFileSize>
    <LastFileDate>2010-09-12T02:29:12.4995617</LastFileDate>
    <IgnoreFileInformation>false</IgnoreFileInformation>
    <DownloadBeta>Default</DownloadBeta>
    <DownloadDate>2009-07-09T13:28:52</DownloadDate>
    <CheckForUpdatesOnly>false</CheckForUpdatesOnly>
    <VariableChangeIndicator />
    <CanBeShared>false</CanBeShared>
    <ShareApplication>false</ShareApplication>
    <ExclusiveDownload>false</ExclusiveDownload>
    <HttpReferer />
    <Variables>
      <item>
        <key>
          <string>aversion</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>RegularExpression</VariableType>
            <Regex>Recommended Version ([^&lt;&gt;]+?)\s*&lt;/strong</Regex>
            <Url>http://java.com/en/download/manual.jsp</Url>
            <Name>aversion</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>version</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <Url>http://java.com/en/download/manual.jsp</Url>
            <StartText>Recommended Version </StartText>
            <EndText> &lt;/strong</EndText>
            <TextualContent>{aversion:regexreplace:[^\d]+:u}</TextualContent>
            <Name>version</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>URL</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>StartEnd</VariableType>
            <Regex />
            <Url>http://java.com/en/download/manual.jsp#win</Url>
            <StartText>Offline" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=</StartText>
            <EndText>" onclick</EndText>
            <Name>URL</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>swebsite</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent>http://www.java.com/winoffline_installer/</TextualContent>
            <Name>swebsite</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>schangelog</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent>http://java.sun.com/javase/{version:split:u:0}/webnotes/ReleaseNotes.html</TextualContent>
            <Name>schangelog</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>snotes</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent />
            <Name>snotes</Name>
          </UrlVariable>
        </value>
      </item>
    </Variables>
    <ExecuteCommand />
    <ExecutePreCommand />
    <Category>Plugins</Category>
    <SourceType>FixedUrl</SourceType>
    <DeletePreviousFile>true</DeletePreviousFile>
    <Enabled>true</Enabled>
    <FileHippoId />
    <TargetPath>.\{category}\{appname:regexreplace:([\s\t\r\n\-\\&amp;\/]+):_}-{version}.{url:ext}</TargetPath>
    <FixedDownloadUrl>http://javadl.sun.com/webapps/download/AutoDL?BundleId={URL}</FixedDownloadUrl>
    <Name>Java x86</Name>
  </ApplicationJob>
</Jobs>
Or FileHippo.
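For anyone trying to follow how the profile above turns the page text into a version string, here is a rough Python sketch of the `aversion` -> `version` -> `schangelog` chain. It is only an approximation of what Ketarin's `regexreplace` and `split` functions appear to do in this profile, not Ketarin's own code. The sample HTML fragment and the function names `derive_version` and `major_version` are made up for illustration; the real markup on java.com may differ.

```python
import re

# Hypothetical page fragment (an assumption, not copied from java.com).
SAMPLE_HTML = "<strong>Recommended Version 6 Update 26 </strong>"


def derive_version(html: str) -> str:
    """Approximate the 'aversion' and 'version' variables from the profile."""
    # Same pattern as the <Regex> of the 'aversion' variable.
    match = re.search(r"Recommended Version ([^<>]+?)\s*</strong", html)
    if match is None:
        raise ValueError("version text not found in page")
    aversion = match.group(1)
    # {aversion:regexreplace:[^\d]+:u} -> each run of non-digits becomes "u".
    return re.sub(r"[^\d]+", "u", aversion)


def major_version(version: str) -> str:
    """Approximate {version:split:u:0}: split on 'u', keep the first part."""
    return version.split("u")[0]


version = derive_version(SAMPLE_HTML)
changelog = (
    "http://java.sun.com/javase/"
    + major_version(version)
    + "/webnotes/ReleaseNotes.html"
)
print(version)
print(changelog)
```

The point of the two-step design is that the scraped text is captured once (`aversion`) and the short form is derived from it, so the changelog URL and the target file name both follow automatically when the page text changes.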
shawn Posted May 30, 2011: It's best to start a new thread when you're asking about a different application.
CybTekSol Posted May 31, 2011: UksosoFF's inquiry regarding Oracle/Sun JDK moved to new thread here:
appyface Posted July 4, 2011 (edited): For whatever reason (a mistake on Oracle's part on the web page? My dumb luck?) I've been able to scrape the JRE offline installers from http://www.java.com/en/download/manual.jsp for a while now -- until this last update, 6u26. Now I'm back to the problem of Ketarin not loading the same content that IE8 and Firefox see on the web page (user agent issue?). So... back to my thread... any ideas on how to scrape the offline installers from this page again? Kind regards, --appyface
shawn Posted July 4, 2011: It uses either a cookie (that does high-byte math) or a UA header to ensure that it's offering 64-bit to 64-bit computers. Use this user-agent header and it'll work fine:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; SLCC1; Tablet PC 2.0; .NET4.0C)
Here's a revised 64-bit app profile:
<?xml version='1.0' encoding='utf-8'?>
<Jobs>
  <ApplicationJob xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Guid="019d2cdb-5370-45b8-9d8d-14c012033dd0">
    <WebsiteUrl>http://www.java.com/en/</WebsiteUrl>
    <UserAgent>Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; SLCC1; Tablet PC 2.0; .NET4.0C)</UserAgent>
    <UserNotes />
    <LastFileSize>16852768</LastFileSize>
    <LastFileDate>2011-06-07T14:21:40.8008915</LastFileDate>
    <IgnoreFileInformation>false</IgnoreFileInformation>
    <DownloadBeta>Default</DownloadBeta>
    <DownloadDate>2009-07-09T13:28:52</DownloadDate>
    <CheckForUpdatesOnly>false</CheckForUpdatesOnly>
    <VariableChangeIndicator />
    <CanBeShared>true</CanBeShared>
    <ShareApplication>false</ShareApplication>
    <ExclusiveDownload>false</ExclusiveDownload>
    <HttpReferer />
    <SetupInstructions />
    <Variables>
      <item>
        <key>
          <string>aversion</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>RegularExpression</VariableType>
            <Regex>Recommended Version ([^&lt;&gt;]+?)\s*&lt;/strong</Regex>
            <Url>http://java.com/en/download/manual.jsp</Url>
            <Name>aversion</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>version</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <Url>http://java.com/en/download/manual.jsp</Url>
            <StartText>Recommended Version </StartText>
            <EndText> &lt;/strong</EndText>
            <TextualContent>{aversion:regexreplace:[^\d]+:u}</TextualContent>
            <Name>version</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>URL</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>RegularExpression</VariableType>
            <Regex>Windows\s*\(64\-bit\)" href="http\://[^'"]+BundleId=(\d+)"</Regex>
            <Url>http://java.com/en/download/manual.jsp#win</Url>
            <StartText>Offline" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=</StartText>
            <EndText>" onclick</EndText>
            <Name>URL</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>swebsite</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent>http://www.java.com/winoffline_installer/</TextualContent>
            <Name>swebsite</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>schangelog</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent>http://java.sun.com/javase/{version:split:u:0}/webnotes/ReleaseNotes.html</TextualContent>
            <Name>schangelog</Name>
          </UrlVariable>
        </value>
      </item>
      <item>
        <key>
          <string>snotes</string>
        </key>
        <value>
          <UrlVariable>
            <RegexRightToLeft>false</RegexRightToLeft>
            <VariableType>Textual</VariableType>
            <Regex />
            <TextualContent />
            <Name>snotes</Name>
          </UrlVariable>
        </value>
      </item>
    </Variables>
    <ExecuteCommand />
    <ExecutePreCommand />
    <ExecuteCommandType>Batch</ExecuteCommandType>
    <ExecutePreCommandType>Batch</ExecutePreCommandType>
    <Category>Plugins</Category>
    <SourceType>FixedUrl</SourceType>
    <DeletePreviousFile>true</DeletePreviousFile>
    <Enabled>true</Enabled>
    <FileHippoId />
    <LastUpdated>2011-06-07T14:21:40.8008915</LastUpdated>
    <TargetPath>.\{category}\{appname:regexreplace:([\s\t\r\n\-\\&amp;\/]+):_}-{version}.{url:ext}</TargetPath>
    <FixedDownloadUrl>http://javadl.sun.com/webapps/download/AutoDL?BundleId={URL}</FixedDownloadUrl>
    <Name>Java x64</Name>
  </ApplicationJob>
</Jobs>
Of course, you could always just use FileHippo to get 'em, too, as this avoids the maintenance burden completely.
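To test the user-agent idea outside Ketarin, something like the following Python sketch could help. It builds a request that carries the 64-bit IE8 user agent from the post above and applies the same `URL` regex to the returned HTML. This is an assumption-laden illustration, not how Ketarin works internally: the function names `build_request`, `fetch_page`, and `extract_bundle_id` are invented here, the sample anchor markup is a guess at what the page served at the time, and the bundle-id 41293 is simply the value mentioned earlier in this thread.

```python
import re
import urllib.request

# User agent string quoted from the post above.
USER_AGENT = (
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; "
    "Trident/4.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; "
    ".NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; "
    "SLCC1; Tablet PC 2.0; .NET4.0C)"
)
PAGE_URL = "http://java.com/en/download/manual.jsp"

# Same pattern as the <Regex> of the 'URL' variable in the 64-bit profile.
BUNDLE_RE = re.compile(
    r'''Windows\s*\(64\-bit\)" href="http\://[^'"]+BundleId=(\d+)"'''
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request that presents the given user agent to the server."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_page(url: str, user_agent: str) -> str:
    """Download the page text (network access; not called in this sketch)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_bundle_id(html: str) -> str:
    """Return the 64-bit BundleId, or raise if the link is not in the page."""
    match = BUNDLE_RE.search(html)
    if match is None:
        raise ValueError("64-bit link not found; the server may have "
                         "served a page without it for this user agent")
    return match.group(1)


# Hypothetical markup for the 64-bit entry (an assumption for illustration).
SAMPLE_HTML = (
    '<a title="Download Java software for Windows (64-bit)" '
    'href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=41293" '
    'onclick="track()">'
)

bundle_id = extract_bundle_id(SAMPLE_HTML)
download_url = (
    "http://javadl.sun.com/webapps/download/AutoDL?BundleId=" + bundle_id
)
print(download_url)
```

Calling `extract_bundle_id(fetch_page(PAGE_URL, USER_AGENT))` would run the same check against the live page; if it raises, the 64-bit link was not in the HTML that came back for that user agent.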
appyface Posted July 5, 2011: Thank you for your reply, Shawn. Where do I put the user-agent in Ketarin?
shawn Posted July 5, 2011: Open the app profile, advanced settings, user agent.
appyface Posted July 5, 2011: Oh geez, I bet I looked at that panel a dozen times this morning and I never noticed that addition. I'll check it out. Thanks again, Shawn!