Ketarin forum

Little problem with scraping NOD32 version from FileHippo


Stalker


Now that's a classic =). From what I see, I imagine that

- Windows Live Messenger 2009 (14.0.8064)
- Ad-Aware 2009 8.0.0.0
- Foobar2000 0.9.6.3
- 3DMark Vantage 1.0.1

etc. are also in this list. I can solve 97% of the versions, but for the remaining 3% I will fall back to "Date added". Is this of any interest?

Edited by FranciscoR

(?<=\<td\>.*?)(\s\(?\d+?\.\d+?.*?|[a-z]+?\s\d{1,2},\s\d{4})(?=\</td\>)

It's actually more difficult than I first thought. ;)

- Windows Live Messenger 2009 (14.0.8064) = (14.0.8064)

- Ad-Aware 2009 8.0.0.0 = 8.0.0.0

- Foobar2000 0.9.6.3 = 0.9.6.3

- 3DMark Vantage 1.0.1 = 1.0.1

- NOD32 AntiVirus 4.0.314 = 4.0.314

- Windows Media Player 11 = October 30, 2006

If I find a better solution, I'll post it here. I'm using the technical tab to get the version.

Edited by FranciscoR

1. For downloads such as Nvidia, Ketarin will match 182.08 WHQL XP (the first match); for downloads such as Windows Media Player 11, it will match the date, October 30, 2006.

2. Yes, that additional space is the reason why I say "not so easy". If you have a better suggestion... =)
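A minimal TypeScript sketch of both points. JavaScript's RegExp, like .NET's, accepts the variable-length lookbehind these patterns use, so the first pattern in this thread runs verbatim. Assumptions: the rows are a hypothetical, simplified stand-in for the tech page (FileHippo's real markup is not shown here), the default date is a made-up placeholder, and the "i" flag is inferred from the "October 30, 2006" result (without it, [a-z] cannot match the capital O).

```typescript
// FranciscoR's first pattern, copied verbatim; String.raw keeps every backslash.
const firstPattern = new RegExp(
  String.raw`(?<=\<td\>.*?)(\s\(?\d+?\.\d+?.*?|[a-z]+?\s\d{1,2},\s\d{4})(?=\</td\>)`,
  "i", // ASSUMPTION: case-insensitive, as the "October 30, 2006" result suggests
);

// Hypothetical, simplified tech-page rows, one row per line.
// NOT FileHippo's real markup; the default date is a made-up placeholder.
function techPage(title: string, dateAdded = "January 1, 2009"): string {
  return [
    `<tr><td><b>Title:</b></td><td>${title}</td></tr>`,
    `<tr><td><b>Date added:</b></td><td>${dateAdded}</td></tr>`,
  ].join("\n");
}

// Whole text of the first match on the page (undefined if nothing matches).
function firstMatch(html: string): string | undefined {
  return firstPattern.exec(html)?.[0];
}

// JSON.stringify makes the leading space visible.
console.log(JSON.stringify(firstMatch(techPage("NVIDIA Forceware 182.08 WHQL XP"))));
// prints " 182.08 WHQL XP" (note the leading space)
console.log(JSON.stringify(firstMatch(techPage("Windows Media Player 11", "October 30, 2006"))));
// prints "October 30, 2006" (no version in the title, so the date row is the first match)
```

The leading space comes from the \s inside the captured alternative; the later patterns in this thread move \s into the lookbehind, which keeps it out of the result.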

Edited by FranciscoR

Flo,

Sometimes I get a perfectly clear match in Expresso but not in Ketarin (no red highlight, for instance, for http://www.filehippo.com/download_windows_media_player/tech/) with an expression like

(?<=\<td\>[a-z].*?\s)(\(?\d+?\.\d+?.*?)(?=\</td\>)|([a-z]+?\s\d{1,2},\s\d{4})

which, by the way, solves the above issue. Can you verify whether this is my mistake?
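One plausible reading of the Expresso/Ketarin difference, assuming (this is not confirmed anywhere in the thread) that Ketarin takes the first capture group whenever the pattern has one: the top-level "|" leaves the date branch outside group 1, so on a page where only the date matches, the whole match is the date but group 1 is empty. The TypeScript sketch below uses hypothetical, simplified rows (not FileHippo's real HTML) and an assumed case-insensitive flag.

```typescript
// FranciscoR's second pattern, verbatim. The top-level "|" splits it into
// a version branch (group 1) and a date branch (group 2).
const secondPattern = new RegExp(
  String.raw`(?<=\<td\>[a-z].*?\s)(\(?\d+?\.\d+?.*?)(?=\</td\>)|([a-z]+?\s\d{1,2},\s\d{4})`,
  "i", // ASSUMPTION: case-insensitive matching
);

// Hypothetical, simplified rows for the WMP11 tech page; not FileHippo's real markup.
const wmpPage = [
  "<tr><td><b>Title:</b></td><td>Windows Media Player 11</td></tr>",
  "<tr><td><b>Date added:</b></td><td>October 30, 2006</td></tr>",
].join("\n");

const m = secondPattern.exec(wmpPage);
console.log(m?.[0]); // whole match: October 30, 2006
console.log(m?.[1]); // group 1 (version branch): undefined
console.log(m?.[2]); // group 2 (date branch): October 30, 2006
```

A tool that highlights the whole match would show the date, while one that reads group 1 would come back empty.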

Edited by FranciscoR

@Stalker,

I know it sounds crazy, but I use a template even for FileHippo apps, to customize the {version} scrape and set other personal preferences (path, etc.). Try this regex on the tech tab:

((?<=\>Title:\<.*?\s)(\(?\d+?\.\d+?.*?)(?=\</[a-z]{2}\>)|(?<=\>Date\sadded:\<.*?\<[a-z]{2}\>)([a-z]+?\s\d{1,2}\,\s\d{4})(?=\</[a-z]{2}\>))

I use this in my template and it has proved to be reliable, but as I have noted before, there is no such thing as perfect when it comes to a 'universal' regex for a dynamic site! ;)

Addendum: After seeing FranciscoR's post regarding Expresso vs. Ketarin: this regex works in Ketarin but fails in Expresso for the "Date added:" field on WMP11.
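A TypeScript sketch of what this pattern anchors on. Because both lookbehinds key on the "Title:" and "Date added:" labels rather than on whatever follows <td>, a title can start with any character. Assumptions: the rows are a hypothetical, simplified stand-in for the tech page, the "i" flag is assumed, the default date is a placeholder, and reading the outer group 1 presumes Ketarin uses the first capture group.

```typescript
// CybTekSol's pattern, verbatim. Group 1 wraps both branches; group 2 is the
// version (after "Title:"), group 3 the date (after "Date added:").
const labelPattern = new RegExp(
  String.raw`((?<=\>Title:\<.*?\s)(\(?\d+?\.\d+?.*?)(?=\</[a-z]{2}\>)|(?<=\>Date\sadded:\<.*?\<[a-z]{2}\>)([a-z]+?\s\d{1,2}\,\s\d{4})(?=\</[a-z]{2}\>))`,
  "i", // ASSUMPTION: case-insensitive matching
);

// Hypothetical, simplified rows, one per line ("." does not cross newlines,
// so each lookbehind stays inside its own row). Placeholder default date.
function techPage(title: string, dateAdded = "January 1, 2009"): string {
  return [
    `<tr><td><b>Title:</b></td><td>${title}</td></tr>`,
    `<tr><td><b>Date added:</b></td><td>${dateAdded}</td></tr>`,
  ].join("\n");
}

// Value of the outer group 1, whichever branch matched.
function scrapeVersion(html: string): string | undefined {
  return labelPattern.exec(html)?.[1];
}

console.log(scrapeVersion(techPage("Windows Live Messenger 2009 (14.0.8064)"))); // (14.0.8064)
console.log(scrapeVersion(techPage("Notepad++ 5.2"))); // 5.2
console.log(scrapeVersion(techPage("Windows Media Player 11", "October 30, 2006"))); // October 30, 2006
```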

Edited by CybTekSol

In case anyone is wondering, this is a modified version of my FileHippo template. I do not ask for the 'Application name' in the template because it auto-fills after the template is imported: simply click the 'FileHippo ID:' text field, then the 'Application name:' text field. Flo, if you do not want this template posted, please delete this post. ;) I have not posted this in the 'Template Forum' before because of the built-in support for FileHippo, but it does demonstrate that a template can be helpful in many other ways!

<?xml version="1.0" encoding="utf-16"?>
<Jobs>
 <ApplicationJob xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <DownloadBeta>Default</DownloadBeta>
   <DownloadDate xsi:nil="true" />
   <VariableChangeIndicator />
   <CanBeShared>true</CanBeShared>
   <ShareApplication>false</ShareApplication>
   <HttpReferer />
   <Variables>
     <item>
       <key>
         <string>version</string>
       </key>
       <value>
         <UrlVariable>
           <VariableType>RegularExpression</VariableType>
           <Regex>((?&lt;=\>Title:\&lt;.*?\s)(\(?\d+?\.\d+?.*?)(?=\&lt;/[a-z]{2}\>)|(?&lt;=\>Date\sadded:\&lt;.*?\&lt;[a-z]{2}\>)([a-z]+?\s\d{1,2}\,\s\d{4})(?=\&lt;/[a-z]{2}\>))</Regex>
           <Url><placeholder name="App Page URL from FileHippo?" value="http://www.filehippo.com/download_firefox/" />tech/</Url>
           <Name>version</Name>
         </UrlVariable>
       </value>
     </item>
   </Variables>
   <ExecuteCommand />
   <Category><placeholder name="Category" value="FileHippo" /></Category>
   <SourceType>FileHippo</SourceType>
   <DeletePreviousFile>true</DeletePreviousFile>
   <Enabled>true</Enabled>
   <FileHippoId><placeholder name="App Page URL from FileHippo?" value="http://www.filehippo.com/download_firefox/" /></FileHippoId>
   <LastUpdated xsi:nil="true" />
   <TargetPath><placeholder name="TargetPath" value="{target}\{category}\{appname:replace: :_}_v{version:replace: :_}.{url:ext}" /></TargetPath>
   <FixedDownloadUrl />
   <Name />
 </ApplicationJob>
</Jobs>

@Stalker,

Try it and see if you like it... ;)
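The TargetPath placeholder in this template is what turns spaces in the application name and version (for example "3.0 Beta 2") into underscores. As a rough illustration only, here is a hypothetical TypeScript helper that expands the three forms the template uses: {name}, {name:replace:X:Y} and {url:ext}. It mimics one reading of that syntax and is not Ketarin's implementation; the target folder, application name and URL are made-up sample values.

```typescript
// Hypothetical expander for the TargetPath placeholder above.
// ASSUMPTION: it mimics only {name}, {name:replace:X:Y} and {url:ext} as we
// read them from the template; it is not Ketarin's own code.
function expandTargetPath(pattern: string, vars: Record<string, string>): string {
  return pattern.replace(/\{([^{}]+)\}/g, (whole: string, body: string) => {
    const [key = "", fn, from, to] = body.split(":");
    const value = vars[key];
    if (value === undefined) return whole; // unknown variable: leave untouched
    if (fn === "replace" && from !== undefined && to !== undefined) {
      return value.split(from).join(to); // replace every occurrence of `from`
    }
    if (fn === "ext") {
      // Extension of the last path segment, ignoring any query string (assumption).
      const file = (value.split("?")[0] ?? "").split("/").pop() ?? "";
      const dot = file.lastIndexOf(".");
      return dot >= 0 ? file.slice(dot + 1) : "";
    }
    return value;
  });
}

// The TargetPath value from the template, verbatim.
const targetPath = String.raw`{target}\{category}\{appname:replace: :_}_v{version:replace: :_}.{url:ext}`;

console.log(
  expandTargetPath(targetPath, {
    target: String.raw`D:\Downloads`, // hypothetical sample values
    category: "FileHippo",
    appname: "Thunderbird",
    version: "3.0 Beta 2",
    url: "http://example.com/download/setup.exe",
  }),
); // D:\Downloads\FileHippo\Thunderbird_v3.0_Beta_2.exe
```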


Edited by CybTekSol

This is a bit strange, but if I enclose my latest regex in one big capture group with two non-capturing subgroups, I get a match in both Expresso and Ketarin:

((?<=\<td\>[a-z].*?\s)(?:\(?\d+?\.\d+?.*?)(?=\</td\>)|(?:[a-z]+?\s\d{1,2},\s\d{4}))

The variation by CybTekSol also does the trick:

((?<=\>Title:\<.*?\s)(\(?\d+?\.\d+?.*?)(?=\</[a-z]{2}\>)|(?<=\>Date\sadded:\<.*?\<[a-z]{2}\>)([a-z]+?\s\d{1,2}\,\s\d{4})(?=\</[a-z]{2}\>))

IMO, this is fixed. Flo, will either of these do?
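A quick TypeScript check of why the wrapping helps, assuming (unconfirmed in the thread) that Ketarin reads the first capture group. With the outer group, group 1 holds the result whichever branch matched, and because \s now sits inside the lookbehind, the version comes back without the leading space. The rows are a hypothetical, simplified stand-in for FileHippo's markup; the "i" flag and the placeholder date are assumptions.

```typescript
// FranciscoR's wrapped pattern, verbatim: one outer capture group,
// both inner alternatives non-capturing.
const wrappedPattern = new RegExp(
  String.raw`((?<=\<td\>[a-z].*?\s)(?:\(?\d+?\.\d+?.*?)(?=\</td\>)|(?:[a-z]+?\s\d{1,2},\s\d{4}))`,
  "i", // ASSUMPTION: case-insensitive matching
);

// Hypothetical, simplified rows; not FileHippo's real markup. Placeholder default date.
function techPage(title: string, dateAdded = "January 1, 2009"): string {
  return [
    `<tr><td><b>Title:</b></td><td>${title}</td></tr>`,
    `<tr><td><b>Date added:</b></td><td>${dateAdded}</td></tr>`,
  ].join("\n");
}

// Group 1 is populated by either branch.
function group1(html: string): string | undefined {
  return wrappedPattern.exec(html)?.[1];
}

console.log(group1(techPage("NVIDIA Forceware 182.08 WHQL XP"))); // 182.08 WHQL XP
console.log(group1(techPage("Windows Media Player 11", "October 30, 2006"))); // October 30, 2006
```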


Using ((?<=\<td\>[a-z].*?\s)(?:\(?\d+?\.\d+?.*?)(?=\</td\>)|(?:[a-z]+?\s\d{1,2},\s\d{4})), expect to see the following versions:

Firefox 3.0.7 = 3.0.7

Yahoo! Messenger 9.0.0.2136 = 9.0.0.2136

Firefox 3.0.7 = 3.0.7

Flash Player 10.0.22.87 (IE) = 10.0.22.87 (IE)

Google Chrome 1.0.154.48 = 1.0.154.48

Google Desktop 5.8.809.23506 = 5.8.809.23506

Internet Explorer 8.0 RC1 = 8.0 RC1

Maxthon 2.5.1.4751 = 2.5.1.4751

Opera 9.64 = 9.64

eMule 0.49c = 0.49c

FrostWire 4.17.2 = 4.17.2

LimeWire Basic 5.1.1 = 5.1.1

Shareaza 2.4.0.0 = 2.4.0.0

uTorrent 1.8.3 Beta 14755 = 1.8.3 Beta 14755

Vuze 4.1.0.4 = 4.1.0.4

AIM 6.8.14.6 = 6.8.14.6

Google Talk 1.0.0.104 Beta = 1.0.0.104 Beta

Pidgin 2.5.5 = 2.5.5

Skype 4.0.0.206 = 4.0.0.206

Thunderbird 3.0 Beta 2 = 3.0 Beta 2

Trillian 3.1.12.0 = 3.1.12.0

Windows Live Messenger 2009 (14.0.8064) = (14.0.8064)

Yahoo! Messenger 9.0.0.2136 = 9.0.0.2136

CuteFTP 8.3.2 Home = 8.3.2 Home

FileZilla 3.2.2.1 = 3.2.2.1

FlashGet 1.9.6.1073 = 1.9.6.1073

GMail Drive 1.0.13 = 1.0.13

Adobe Reader 9.0 = 9.0

Foxit Reader 3.0.1301 = 3.0.1301

OpenOffice.org 3.0.1 Final = 3.0.1 Final

Notepad++ 5.2 = 5.2

VMware Player 2.5.1 = 2.5.1

Ad-Aware 2009 8.0.0.0 = 8.0.0.0

CWShredder 2.19 = 2.19

HijackThis 2.0.2 = 2.0.2

Rootkit Revealer 1.71 = 1.71

Spybot Search & Destroy 1.6.2 = 1.6.2

SpywareBlaster 4.1 = 4.1

Windows Defender 1.1.1593 = 1.1.1593

Comodo Firewall 3.0.25.378 = 3.0.25.378

PeerGuardian 2.0 Beta 6c = 2.0 Beta 6c

Sunbelt Personal Firewall 4.6.1861 = 4.6.1861

Sygate Personal Firewall 5.6.2808 = 5.6.2808

ZoneAlarm Free 8.0.065.0 = 8.0.065.0

AntiVir Personal 8.2.00.337 = 8.2.00.337

Avast! Home Edition 4.8.1335 = 4.8.1335

AVG Free Edition 8.5.278 = 8.5.278

CCleaner 2.17.853 = 2.17.853

Recuva 1.24.399 = 1.24.399

Tweak UI 2.1 = 2.1

7-Zip 4.65 = 4.65

WinRAR 3.80 = 3.80

WinZip 12.0.8252 = 12.0.8252

3DMark Vantage 1.0.1 = 1.0.1

CPU-Z 1.50 = 1.50

Sandra Lite XII (15.72) = (15.72)

Hamachi 1.0.3.0 = 1.0.3.0

RealVNC 4.1.3 = 4.1.3

Foobar2000 0.9.6.3 = 0.9.6.3

iTunes 8.0.2.20 = 8.0.2.20

K-Lite Codec Pack 4.70 (Full) = 4.70 (Full)

MediaMonkey 3.1.0.1222 Beta = 3.1.0.1222 Beta

QuickTime Alternative 2.8.0 = 2.8.0

QuickTime Player 7.60.92.0 = 7.60.92.0

Real Alternative 1.90 = 1.90

RealPlayer 11.0.0.581 = 11.0.0.581

Songbird 1.0.0 = 1.0.0

VLC Media Player 0.9.8a = 0.9.8a

Winamp 5.55 Full = 5.55 Full

Windows Media Player 11 = October 30, 2006

DAEMON Tools Lite 4.30.3 = 4.30.3

DeepBurner 1.9.0.228 = 1.9.0.228

DVD Shrink 3.2.0.15 = 3.2.0.15

ImgBurn 2.4.2.0 = 2.4.2.0

Nero Burning Rom 9.2.6.0 = 9.2.6.0

ObjectDock 1.9 = 1.9

RocketDock 1.3.5 = 1.3.5

Samurize 1.64.3 = 1.64.3

WindowBlinds 6.4 = 6.4

Yahoo! Widget Engine 4.5.1 = 4.5.1

FastStone Image Viewer 3.7 = 3.7

IrfanView 4.23 = 4.23

Paint.NET 3.36 = 3.36

Picasa 3.1 Build 70.73 = 3.1 Build 70.73

.NET Framework Version 3.5 SP1 = 3.5 SP1

ATI Catalyst Drivers 9.2 XP = 9.2 XP

DirectX 9.0c (Nov 08) = 9.0c (Nov 08)

IntelliPoint 6.3 = 6.3

IntelliType Pro 6.3 = 6.3

Java Runtime Environment 1.6.0.12 = 1.6.0.12

NVIDIA Forceware 182.08 WHQL XP = 182.08 WHQL XP
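A list like this doubles as a regression check. The TypeScript sketch below runs the pattern above over a few entries copied from the list and reports any whose group 1 differs. Assumptions: the rows are a hypothetical, simplified stand-in for FileHippo's markup (so a pass here says nothing about the live site), the "i" flag and the placeholder date are assumed, and only titles starting with a letter are included, since with this assumed markup the [a-z] right after <td> never matches a title that starts with a digit or a dot.

```typescript
// The pattern from the post above, verbatim.
const pattern = new RegExp(
  String.raw`((?<=\<td\>[a-z].*?\s)(?:\(?\d+?\.\d+?.*?)(?=\</td\>)|(?:[a-z]+?\s\d{1,2},\s\d{4}))`,
  "i", // ASSUMPTION: case-insensitive matching
);

// Hypothetical, simplified rows; not FileHippo's real markup. Placeholder default date.
function techPage(title: string, dateAdded = "January 1, 2009"): string {
  return [
    `<tr><td><b>Title:</b></td><td>${title}</td></tr>`,
    `<tr><td><b>Date added:</b></td><td>${dateAdded}</td></tr>`,
  ].join("\n");
}

interface Case {
  title: string;
  expected: string;
  dateAdded?: string;
}

// Entries copied from the list above.
const cases: Case[] = [
  { title: "Firefox 3.0.7", expected: "3.0.7" },
  { title: "Thunderbird 3.0 Beta 2", expected: "3.0 Beta 2" },
  { title: "K-Lite Codec Pack 4.70 (Full)", expected: "4.70 (Full)" },
  { title: "Windows Live Messenger 2009 (14.0.8064)", expected: "(14.0.8064)" },
  { title: "Windows Media Player 11", expected: "October 30, 2006", dateAdded: "October 30, 2006" },
];

// One message per case whose group 1 differs from the expected value.
function mismatches(re: RegExp, list: Case[]): string[] {
  const out: string[] = [];
  for (const c of list) {
    const got = re.exec(techPage(c.title, c.dateAdded))?.[1];
    if (got !== c.expected) out.push(`${c.title}: expected "${c.expected}", got "${got}"`);
  }
  return out;
}

console.log(mismatches(pattern, cases)); // []
```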


Yours is OK; that's why I am now using your prefix. ;) I didn't retest everything after Stalker's comment: 7-Zip, 3DMark and .NET were capturing the date (their titles start with a digit or a dot, which the [a-z] right after <td> does not match). Before I realized I could put the space before the version to work, I had started testing with [a-z], and later on I forgot to remove it. =)
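The effect described here can be reproduced in a TypeScript sketch. It assumes hypothetical, simplified rows in which no <td> on the title's line is directly followed by a letter (an assumption about the markup), plus an assumed "i" flag and a made-up placeholder date. Under those assumptions the [a-z]-prefixed pattern cannot start its version branch for 7-Zip, 3DMark or .NET and falls through to the date row, while CybTekSol's "Title:"-anchored prefix still finds the version.

```typescript
// The [a-z]-prefixed pattern and CybTekSol's label-anchored pattern, both verbatim.
const letterPrefixed = new RegExp(
  String.raw`((?<=\<td\>[a-z].*?\s)(?:\(?\d+?\.\d+?.*?)(?=\</td\>)|(?:[a-z]+?\s\d{1,2},\s\d{4}))`,
  "i", // ASSUMPTION: case-insensitive matching
);
const labelAnchored = new RegExp(
  String.raw`((?<=\>Title:\<.*?\s)(\(?\d+?\.\d+?.*?)(?=\</[a-z]{2}\>)|(?<=\>Date\sadded:\<.*?\<[a-z]{2}\>)([a-z]+?\s\d{1,2}\,\s\d{4})(?=\</[a-z]{2}\>))`,
  "i",
);

// Hypothetical, simplified rows; "January 1, 2009" is a made-up placeholder date.
function techPage(title: string, dateAdded = "January 1, 2009"): string {
  return [
    `<tr><td><b>Title:</b></td><td>${title}</td></tr>`,
    `<tr><td><b>Date added:</b></td><td>${dateAdded}</td></tr>`,
  ].join("\n");
}

for (const title of ["7-Zip 4.65", "3DMark Vantage 1.0.1", ".NET Framework Version 3.5 SP1"]) {
  const page = techPage(title);
  console.log(title, "|", letterPrefixed.exec(page)?.[1], "|", labelAnchored.exec(page)?.[1]);
}
// 7-Zip 4.65 | January 1, 2009 | 4.65
// 3DMark Vantage 1.0.1 | January 1, 2009 | 1.0.1
// .NET Framework Version 3.5 SP1 | January 1, 2009 | 3.5 SP1
```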

