Spybot 2.0: the scan method

I've seen it mentioned as a request for 2.0, and it has been a controversial topic for a long time, so I thought it deserved its own 2.0 blog entry.

The standard AV (antivirus) approach to scanning is filesystem-based: the scanner iterates through all, or selected, partitions or folders. Extend that to AS (antispyware), and you add a full registry iteration as well. Each file or registry entry is then compared against a set of detection rules.
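To make the iteration approach concrete, here is a minimal Python sketch under simplifying assumptions: the only "detection rule" is an exact SHA-256 match, the rule set `bad_hashes` and all function names are hypothetical, and the registry half is left out (on Windows it would need a similar walk over keys). Its running time grows with the number of files on disk, whatever the number of rules.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def iterate_and_match(root: str, bad_hashes: dict[str, str]) -> list[tuple[str, str]]:
    """Visit every file below root and compare it against the rule set.

    bad_hashes maps a SHA-256 hex digest to a (hypothetical) threat name.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                digest = sha256_of(path)
            except OSError:
                continue  # locked or vanished file: skip it
            if digest in bad_hashes:
                hits.append((path, bad_hashes[digest]))
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Downloads"))
        # A renamed copy in an arbitrary folder is still found by a full walk.
        copy = os.path.join(root, "Downloads", "totally_harmless.exe")
        with open(copy, "wb") as f:
            f.write(b"example payload")
        rules = {hashlib.sha256(b"example payload").hexdigest(): "Example.Threat"}
        print(iterate_and_match(root, rules))
```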

Our "current" (1.x) approach is mostly pattern-based: it iterates through a list of definitions and tries to find malware that way. It's actually quite a bit more complex than that, but I'll leave it there for this introduction.
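For contrast, a minimal sketch of a pattern-iterating scanner, assuming a hypothetical `Definition` record that lists the exact paths a threat is known to use plus an optional content hash. Real definitions are far richer than this; the point is that the loop runs over definitions, so cost grows with the number of definitions and their locations rather than with the number of files, and a copy placed anywhere else is never looked at.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class Definition:
    """Hypothetical pattern-style definition: where to look and what to expect there."""
    threat: str
    paths: list[str]               # exact locations this threat is known to use
    sha256: Optional[str] = None   # optional content check at those locations


def scan_definitions(definitions: list[Definition]) -> list[tuple[str, str]]:
    """Iterate over definitions (not over the disk) and check only the listed paths."""
    hits = []
    for d in definitions:
        for path in d.paths:
            if not os.path.isfile(path):
                continue
            if d.sha256 is not None:
                with open(path, "rb") as f:
                    if hashlib.sha256(f.read()).hexdigest() != d.sha256:
                        continue
            hits.append((d.threat, path))
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        known = os.path.join(root, "evil.dll")
        with open(known, "wb") as f:
            f.write(b"example payload")
        defs = [Definition("Example.Threat", [known, os.path.join(root, "other.dll")])]
        print(scan_definitions(defs))  # only the listed locations are checked
```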

Going back in time, the reason we chose the pattern-based approach is clear. Around the year 2000, only a few dozen threats existed: simple software with clear patterns to detect. Pattern-based detection meant very fast scan times. A downside is that what is often called "inactive" malware does not get detected this way, for example if you copy malware files to a different location. Different locations are not a problem with active malware: if the malware changes where it installs itself, the detection patterns need an update anyway (changed files mean different hashes). Well, things soon got a bit more sophisticated than that.

About four years ago, we already had a hybrid 2.0 filesystem/registry-based scanner ready, combined with a few optimization features from our pattern-based scanner, but we ran into a big disadvantage of that approach. If you take a look at the OpenSBI wiki, you'll notice that we are able to link various detection patterns together, e.g. using the name of a detected file to flag an associated registry entry, and vice versa. If you scan registry and filesystem at the same time, each side constantly has to wait for "final" results from the other, creating a deadlock. Regular filesystem/registry-iterating scanners cannot use partial results in other patterns, because the order in which events appear is undefined.
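The ordering argument can be illustrated with a dependency graph: if each linked rule declares which rules it needs, the rules can be evaluated in topological order, so a registry rule always sees the file rule's result. This is a generic sketch, not the OpenSBI mechanism; the rule names and the in-memory `FILES`/`RUN_KEY` data are hypothetical. It uses Python's standard `graphlib` (3.9+), which raises `CycleError` for mutually dependent rules, the analogue of the deadlock described above.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional

# A rule gets the results of earlier rules and returns a finding or None.
Rule = Callable[[dict[str, Optional[str]]], Optional[str]]


def run_linked_rules(rules: dict[str, Rule],
                     deps: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Run each rule only after every rule it depends on has produced its result.

    Raises graphlib.CycleError if rules depend on each other in a loop.
    """
    results: dict[str, Optional[str]] = {}
    graph = {name: deps.get(name, []) for name in rules}
    for name in TopologicalSorter(graph).static_order():
        results[name] = rules[name](results)
    return results


# Hypothetical in-memory stand-ins for the file system and a Run registry key.
FILES = [r"C:\Windows\system32\evil123.dll", r"C:\Windows\notepad.exe"]
RUN_KEY = {"Updater": "rundll32 evil123.dll,Start", "Volume": "sndvol.exe"}


def find_dropped_dll(_results: dict[str, Optional[str]]) -> Optional[str]:
    """File rule: report the bare file name of a (toy-matched) malicious DLL."""
    for path in FILES:
        name = path.rsplit("\\", 1)[-1]
        if name.lower().startswith("evil") and name.lower().endswith(".dll"):
            return name
    return None


def find_autostart(results: dict[str, Optional[str]]) -> Optional[str]:
    """Registry rule: flag the Run value that loads the DLL found by the file rule."""
    dll = results["dropped_dll"]
    if dll is None:
        return None
    for value_name, command in RUN_KEY.items():
        if dll.lower() in command.lower():
            return value_name
    return None


# Rules may be listed in any order; the declared dependency fixes the evaluation order.
results = run_linked_rules(
    {"autostart": find_autostart, "dropped_dll": find_dropped_dll},
    {"autostart": ["dropped_dll"]},
)
print(results)
```

Here the file rule runs first and reports `evil123.dll`, and the registry rule then flags the `Updater` value. In an unordered filesystem/registry walk, the Run key might be visited before the DLL, and the registry rule would have nothing to link to yet.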

Another downside of filesystem scanners concerns when problems get fixed. We often encounter malware that uses several small stubs to reinstall itself. In a pure on-demand scan, with partial fixing happening the moment a problem is encountered, a pattern-iterating scanner can scan and fix problems that belong together within a shorter timespan. That reduces the chance that the malware's reinstallation cycle completes before the malware has been completely removed.

Third, scan time is essential. Users already complain about a scan that takes "an hour". A filesystem/registry-iterating scanner would, like AVs, take multiple hours, and most users probably would not accept that. The obvious best countermeasure would be real-time on-access protection, which would make regular full on-demand scans less necessary (they are still very much needed after an infection, or when cleaning from a bootable PE CD, for example). The big problem with on-access scanning is, of course, the AV companies: even now, although our near-access scanner was designed specifically not to conflict with them, they use on-access scanning as a reason to force users to uninstall our product. In the current situation, switching to a system that depends on a filesystem/registry scan iterator would therefore really mean product suicide for us. It's a "nice" (for them) and illegal form of unfair competition by the AVs mentioned, used to keep competitors like us down; it's something we're not going to tolerate for much longer, so don't take this as any indication of what might or might not appear in the final 2.0.

Well, this blog entry has already grown long just discussing the theory behind the two approaches, so I'll save how we're working on hybrid modes for another post.
 
Take a look at some useful conversations here:

http://forums.spybot.info/showthread.php?t=41003

To cut a long story short, let me suggest using the technique that Avira Free edition uses. :)

Moreover, at least for a full system scan, you could scan track-wise (by HDD track) instead of in Explorer order. Files in the same folder get scanned alphabetically, which actually increases I/O because they are physically located on different tracks.
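The suggestion amounts to sorting files by physical position instead of by name. A minimal sketch, assuming the first physical block of each file is already known (obtaining it is OS-specific, e.g. the FIEMAP ioctl on Linux or `FSCTL_GET_RETRIEVAL_POINTERS` on Windows, and is not shown) and modelling seek cost simply as the block distance the head travels. It also assumes each file is one contiguous extent, which fragmentation breaks; that is exactly the objection raised in the reply.

```python
def seek_distance(order: list[str], first_block: dict[str, int]) -> int:
    """Total head travel, in blocks, reading files in this order from block 0."""
    position, total = 0, 0
    for name in order:
        total += abs(first_block[name] - position)
        position = first_block[name]
    return total


def trackwise_order(files: list[str], first_block: dict[str, int]) -> list[str]:
    """Order files by their (assumed known) first physical block instead of by name."""
    return sorted(files, key=lambda name: first_block[name])


if __name__ == "__main__":
    # Hypothetical block positions for four files in one folder.
    first_block = {"a.dll": 900, "b.exe": 10, "c.sys": 500, "d.ini": 20}
    alphabetical = sorted(first_block)
    print(seek_distance(alphabetical, first_block))
    print(seek_distance(trackwise_order(alphabetical, first_block), first_block))
```

With these made-up positions, the alphabetical order travels 2760 blocks and the track-wise order 900.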
 
Avira is an antivirus application; Spybot-S&D is an anti-spyware application! Does your Avira finish in 7 minutes like Spybot does, or does it take as long as antivirus applications usually do? Have you read a single word of the arguments above?

And no, track-wise/sector-wise scanning would not work unless you booted from another system, fully defragmented the drive, and then scanned. Great, now you've got 7 hours instead of 4 hours instead of 7 minutes!
 
First of all I don't use Avira. :)

Well, I know that Avira is an AV, but it scans at 14 MBPS. Since Avira is not open-source software, I don't know how it works! But there must be something "wow wow special" in Avira's engine: although its detection database is highly bloated (even game tweaking utilities are flagged as Trojans!), it scans impressively fast.

Spybot has the reputation of being the world's first anti-spyware software, so since version 2 is just around the corner, I think you guys/gals should also integrate some "wow wow special" stuff into the program to make the engine really fast.

Dear PepiMK, I have thoroughly read the arguments you posted; my intention was just to make a suggestion. :sad:
 
You can't just make scanning faster overnight.

Every engine has good and bad sides. Comparing Ad-aware with Spybot, I've noticed the following:

Spybot's engine takes longer to scan, but it detects twice as many problems, and I can get new detection updates quickly (in less than 40 minutes). The Spybot team clearly spent most of its effort on the engine.

Ad-aware's engine takes less time, but it mostly detects tracking cookies, and updating takes an hour.

Avira is antivirus software, and the difference between an antivirus engine and an antispyware engine is big.

Note: No offense intended to Ad-aware, Spybot, or other users.
 
On the first scan, build an integrity database, and in subsequent scans, scan only the items that have changed. Of course, this assumes that the first scan is done on a clean system.
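A minimal sketch of that idea, assuming a cheap fingerprint of file size plus modification time (a hypothetical choice; malware can forge timestamps, so a hardened version would hash contents). The function names are illustrative. It still walks the whole tree; it only avoids re-examining unchanged files, and it inherits the clean-first-scan assumption.

```python
import os
import tempfile

Fingerprint = tuple[int, int]  # (size in bytes, modification time in ns)


def fingerprint(path: str) -> Fingerprint:
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def walk_files(root: str):
    """Yield the path of every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def build_baseline(root: str) -> dict[str, Fingerprint]:
    """First scan: record a fingerprint for every file (assumed to be clean!)."""
    return {path: fingerprint(path) for path in walk_files(root)}


def changed_since(root: str, baseline: dict[str, Fingerprint]) -> list[str]:
    """Later scans: only new files or files whose fingerprint differs need scanning."""
    return sorted(p for p in walk_files(root) if baseline.get(p) != fingerprint(p))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "app.exe"), "w") as f:
            f.write("v1")
        baseline = build_baseline(root)
        with open(os.path.join(root, "dropped.dll"), "w") as f:
            f.write("new file")
        print(changed_since(root, baseline))  # only dropped.dll needs scanning
```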
 
To PepiMK:

A scanner that searches all files on all hard drives, as AntiVir does, for example, is limited by, among other things, the following two factors:
- CPU speed (& RAM)
- Number of files on the hard drive

A scanner that, like Spybot, looks for malware in specific locations on the hard drive depends only on CPU speed and not on the rest of the files on the hard drive. That's why I think your scanner is better.

This may sound a bit harsh, but if I have an older computer that needs an hour or more for one run of the Spybot scanner, then maybe it's time to upgrade it or get a new one.
Sure, you at Spybot can change the detection rules so that the scan time gets somewhat shorter, but in the medium term you simply need, to put it plainly, more CPU power... ;) ... if you can afford it.

I used to have a 600 MHz computer that took well over 30 minutes with version 1.4 and not even 100,000 signatures.
Now I have 2 x 1.86 GHz and need about 13 minutes.

I can well imagine that with 4 x 3.00 GHz a scan might take only 5-7 minutes.

There is simply a point at which the number of signatures is so large that a computer's scan time exceeds the "acceptable" time... ;)

Spybot won't search entire hard drives in version 2.x either. So someone could put malware into a new folder without the scanner finding it during a scan. How will you compensate for that?

Is Spybot fundamentally designed to also look for hidden processes and files (rootkits)?
 
@khagaroth: sadly, that assumption (that the first scan runs on a clean system) is not really something we can take for granted. But you were thinking very much in the right direction: avoiding scanning "good" files is indeed an approach that is becoming more and more important these days (just think about the regular F/Ps some AVs have on important system files, so it's a matter of safety even more than of speed), and it's one we have implemented, though through a different avenue than relying on previous scans ;)
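One common way to skip known-good files without depending on earlier scans is a shipped list of known-good hashes. The sketch below assumes such a list (`known_good`, hypothetical) and works on in-memory `(name, bytes)` pairs for simplicity; it is not a description of how Spybot implements this. Hashing still reads each file, so the main gain is the one stressed above: files on the list can never produce a false positive.

```python
import hashlib


def files_to_scan(files: dict[str, bytes], known_good: set[str]) -> list[str]:
    """Return names of files whose SHA-256 is NOT on the known-good list.

    Files on the list never reach the detection rules, so they cannot be
    reported as false positives.
    """
    return [
        name
        for name, data in files.items()
        if hashlib.sha256(data).hexdigest() not in known_good
    ]


if __name__ == "__main__":
    # Hypothetical contents; in practice known_good would ship with the product.
    files = {"kernel32.dll": b"genuine system file", "unknown.exe": b"something else"}
    known_good = {hashlib.sha256(b"genuine system file").hexdigest()}
    print(files_to_scan(files, known_good))
```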

@Matt: you've raised a really important point there... no matter how much you optimize, the number of bad guys grows too. We've practically already used up the advantage we gained with the release of 1.6, and 2.0 shortens scan times again, but of course one should also ask how long that will last.

Still... unfortunately the hardware industry doesn't subsidize us, so I'm always careful with upgrade recommendations. Sure, nobody should be using Windows 9x anymore today, but if security required a high-end gaming machine, that would be pretty sad.

Spybot actually detects rootkits very well. A while ago we added three plugins for that via an update (they should already have been included in 1.6 anyway). Our RootAlyzer and the TotalCommander plugins also serve as demos of this :)

As for malware in a new folder: the very simplified answer is that either this behavior is already built into the malware, in which case we have to capture it when writing the rules and take those folders into account, or it's a new version of the malware, which needs new rules to be detected anyway.
That doesn't take heuristics into account, though; often enough we also try to cover future versions of malware files...

That said, it's not that Spybot 2 wouldn't scan all files; rather, it doesn't look for everything in every file. What we're after is a hybrid solution that combines the advantages of both models.
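One way to read "not looking for everything in every file" is to index rules by a cheap property of the file, so each file is only checked against the rules that can apply to it. A minimal sketch, assuming rules keyed by file extension and expressed as predicates over the file's bytes (both hypothetical simplifications, not Spybot 2's actual rule format):

```python
from collections import defaultdict
from typing import Callable

# A rule is a (hypothetical) name plus a predicate over the file's contents.
Rule = tuple[str, Callable[[bytes], bool]]


def index_rules(rules: list[tuple[str, Rule]]) -> dict[str, list[Rule]]:
    """Group rules by the (lower-cased) file extension they apply to."""
    index: dict[str, list[Rule]] = defaultdict(list)
    for ext, rule in rules:
        index[ext.lower()].append(rule)
    return dict(index)


def check_file(name: str, data: bytes, index: dict[str, list[Rule]]) -> list[str]:
    """Every file is visited, but only the rules for its extension are evaluated."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return [rule_name for rule_name, matches in index.get(ext, []) if matches(data)]


if __name__ == "__main__":
    index = index_rules([
        ("exe", ("Example.PEStub", lambda d: d.startswith(b"MZ") and b"stub" in d)),
        ("ini", ("Example.BadIni", lambda d: b"evil=" in d)),
    ])
    print(check_file("setup.EXE", b"MZ...stub...", index))
    print(check_file("notes.txt", b"MZ stub evil=", index))  # no .txt rules, nothing checked
```

The trade-off is that a rule indexed under the wrong key is never applied, so the keys have to be chosen conservatively.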
 
@xpsunny:

Sorry man. I think my English isn't good enough to write such a long text in English without major mistakes. ;)
 