A suggestion for the scan speed increase
Hello,
This is my suggestion for increasing the scan speed mechanism.
Well, there are many AV/AS products out there with millions of detection patterns. For example, Avira detects more than a million malware samples, Kaspersky has about 14 lakh (1.4 million) malware signatures, and so on. Yet they still scan quickly (courtesy: AV-Comparatives).
As I have suggested before, it would be great if Spybot scanned for malware track-wise (following the HDD tracks). At the moment the same file (for example shell32.dll) gets scanned over and over for different malware signatures.
The scan is performed with the signatures as the reference point. How about scanning each file once and comparing it with the list of signatures?
FYI, that is the scanning method most AV/AS products out there use.
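To make the two loop orders concrete, here is a minimal sketch in C. Everything in it is hypothetical: the `signature` and `file` structs, the two function names, and the plain substring test that stands in for real signature matching. It only counts how often a file would be opened under each order; it is not how Spybot or any AV product actually works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: a plain string pattern to look for. */
struct signature {
    const char *name;
    const char *pattern;
};

/* Hypothetical file: contents held in memory for the sake of the sketch. */
struct file {
    const char *path;
    const char *contents;
};

/* Signature-first order: every signature walks over every file, so each
 * file is opened once per signature. Returns the number of file opens
 * and stores the number of matches in *hits. */
size_t scan_signature_first(const struct signature *sigs, size_t nsigs,
                            const struct file *files, size_t nfiles,
                            size_t *hits)
{
    size_t opens = 0;
    assert(hits != NULL);
    *hits = 0;
    for (size_t s = 0; s < nsigs; s++) {
        for (size_t f = 0; f < nfiles; f++) {
            opens++; /* the file is read again for this signature */
            if (strstr(files[f].contents, sigs[s].pattern) != NULL)
                (*hits)++;
        }
    }
    return opens;
}

/* File-first order: each file is opened once and compared against the
 * whole signature list while it is in memory. */
size_t scan_file_first(const struct signature *sigs, size_t nsigs,
                       const struct file *files, size_t nfiles,
                       size_t *hits)
{
    size_t opens = 0;
    assert(hits != NULL);
    *hits = 0;
    for (size_t f = 0; f < nfiles; f++) {
        opens++; /* the file is read exactly once */
        for (size_t s = 0; s < nsigs; s++) {
            if (strstr(files[f].contents, sigs[s].pattern) != NULL)
                (*hits)++;
        }
    }
    return opens;
}
```

Both functions find the same matches; only the number of file opens differs (signatures times files versus files).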
Here is a document that will help you understand what I mean. :)
Please excuse me for the hand-made document. :laugh:
A picture says a thousand words.
http://img412.imageshack.us/img412/632/scan0001jb6.png (http://img412.imageshack.us/my.php?image=scan0001jb6.png)
http://img412.imageshack.us/img412/scan0001jb6.png/1/w1160.png (http://g.imageshack.us/img412/scan0001jb6.png/1/)
This one gives a better overview of the technology.
http://img242.imageshack.us/img242/2659/scan0001ic0.png
Please delete posts #2, #4, #5.
Well, it's not that simple, I'm afraid ;)
You forgot the arrows going back from files 1 and 3 (row 2) into signature 2 (row 1); see the last part of Wiki: Algo-Prefix (http://wiki.spybot.info/index.php/AlgoPrefix) for an example. Some patterns are partially defined by re-using previous scan results. Add to that the extra complexity of cross-dependencies with the registry scan.
Plus a few more things ;) Tomorrow's 2.0 blog (http://forums.spybot.info/blog.php?u=1&blogcategoryid=6) post will deal with the problem of various scanner concepts.
Are you referring to screenshot 2? Please neglect the first screenshot.
You said, "You forgot the arrows going back from file 1 and three (row 2) into signature 2 (row 1)". Well, that is exactly what I wanted to convey! The current mechanism scans only the most probable zone of infection.
Let me explain with an example: Signature 1 contains the detection code for a few system32 files infected by trojan.fujack; let's assume they are shell.dll, shell32.dll, etc. Now assume these files are named 1, 3, etc. Signature 1 will then scan only the files that are specifically targeted by that malware.
Now Signature 2 scans for infected shell32.dll file only.
Conclusion: shell32.dll file gets scanned twice for two different signatures.
If you look at row 3 and row 2, the file scanning works as a simple conditional sequence, "if ... else".
Well, I wrote a simple "if..else" ladder in C for detecting malware via MD5.
For information purposes only: I am a 1st-semester student and a newbie in the field of programming, so sorry in advance for any errors in the code :rolleyes:
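/* Program to detect malware based on MD5 hashes (illustrative sketch only) */
#include <stdio.h>
#include <string.h>

#define NUM_SIGNATURES 2
#define NUM_FILES 3

/* Signature base: MD5 hashes of known malware (placeholder values) */
static const char *signature_md5[NUM_SIGNATURES] = {
    "00000000000000000000000000000001",
    "00000000000000000000000000000002"
};

/* Files to check and their MD5 hashes (placeholder values; a real
   scanner would compute the hash from the file contents) */
static const char *file_name[NUM_FILES] = { "shell.dll", "shell32.dll", "notepad.exe" };
static const char *file_md5[NUM_FILES] = {
    "00000000000000000000000000000009",
    "00000000000000000000000000000002",
    "00000000000000000000000000000008"
};

int main(void)
{
    int i, k;

    for (i = 0; i < NUM_FILES; i++)            /* each file is visited once */
    {
        int infected = 0;
        for (k = 0; k < NUM_SIGNATURES; k++)   /* compare with every signature */
        {
            if (strcmp(file_md5[i], signature_md5[k]) == 0)
            {
                infected = 1;
                break;
            }
        }
        if (infected)
            printf("del  %s\n", file_name[i]);  /* stands in for the "del" procedure */
        else
            printf("skip %s\n", file_name[i]);  /* stands in for the "skip" procedure */
    }
    return 0;
}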
Nope, it's far more complicated than that ;)
Standard MD5 hashes can be used only on static files. An MD5 could match only a single instance of a possibly very random file. If we detected all files by MD5, the detection database would be hundreds of MB in size, which is unacceptable, and it would still miss many variants that morph so much that you could never collect them all! If you take a look here (http://wiki.spybot.info/index.php/Category:Advanced_file_parameters), you'll notice dozens of other parameters that can be used instead of MD5 (and those are just the public ones), where sometimes a single parameter would replace thousands of MD5s.
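A small C sketch of why an exact hash covers only one variant while a single content parameter can cover many. The assumptions are loud ones: FNV-1a is used purely as a short stand-in for MD5 (the C standard library has no MD5), and `matches_marker` is a made-up rule, not one of the parameters from the wiki page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a (32-bit), used here only as a compact stand-in for MD5. */
uint32_t fnv1a(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Exact-hash rule: matches only a byte-for-byte identical file. */
int matches_hash(const unsigned char *data, size_t len, uint32_t known)
{
    return fnv1a(data, len) == known;
}

/* Hypothetical parameter rule: matches any file that contains the
 * given byte marker, no matter what else changes around it. */
int matches_marker(const unsigned char *data, size_t len,
                   const unsigned char *marker, size_t mlen)
{
    assert(mlen > 0);
    if (len < mlen)
        return 0;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(data + i, marker, mlen) == 0)
            return 1;
    }
    return 0;
}
```

Two variants that differ in a single byte need two hash entries, but one marker rule catches both as long as the marker itself survives the change.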
And no file gets scanned twice, "even" in a (good) pattern-based scanner. Besides the obvious method, caching of results, we use a hybrid approach that is far from the linearity we display in the form of malware names during the scan. That display is simply a simplification, since the actual progress could not easily be shown in 2D.
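The "caching of results" part can be sketched roughly like this in C. The names (`scan_cache`, `cached_property`, `expensive_property`) and the fixed-size table are invented for illustration; this shows plain result caching only, not the hybrid approach mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 8

/* Hypothetical per-file cache: the expensive property is computed once
 * and then reused by every signature that asks for it. */
struct scan_cache {
    int      valid[MAX_FILES];
    unsigned value[MAX_FILES];
    size_t   computations; /* how often the expensive step really ran */
};

/* Stand-in for an expensive step such as reading and hashing a file. */
static unsigned expensive_property(size_t file_id)
{
    return (unsigned)(file_id * 31u + 7u);
}

/* Returns the property for file_id, computing it only on first use. */
unsigned cached_property(struct scan_cache *c, size_t file_id)
{
    assert(c != NULL && file_id < MAX_FILES);
    if (!c->valid[file_id]) {
        c->value[file_id] = expensive_property(file_id);
        c->valid[file_id] = 1;
        c->computations++;
    }
    return c->value[file_id];
}
```

However many signatures ask about the same file, the expensive step runs once per file.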
You also misunderstood the "arrows back" I'm afraid. Take a look at this simple example (not real syntax but a bit simplified and constructed OpenSBI syntax):
File:"test","<$WINDIR>\<regexpr>([^\.]*\.exe)","filesize=12345,md=ABCD..."
RegyKey:"test",HKLM,"\Test\","<regexpr>([a-z]*)","testvalue=<$REGMATCH1>"
File:"test","<$WINDIR>\<$REGMATCH1>","..."
There's a registry key that depends on a file's name, and another file that depends on the registry key's name.
In pure filesystem/registry iteration, that would imply that the registry scanner has to wait until the file scanner has finished scanning everything, while at the same time the file scanner has to wait until the registry scanner has completely finished. That situation cannot be resolved; the scanner would simply hang forever. A good equivalent in coding would be deadlocks in threads (see also semaphores etc.). The simple iteration approach would thus have to give up any dependencies between detections, which would slow everything down even more and might make detection of some of the worst malware impossible.
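Here is a rough C sketch of that ordering problem, using a simple repeated-pass dependency ordering. The function name, the matrix representation and the node numbering are all assumptions made for the illustration, and it says nothing about how Spybot itself resolves dependencies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8

/* Tries to order n nodes so that every node comes after the nodes it
 * depends on. depends[a][b] != 0 means "a needs the result of b".
 * Writes the order to out and returns how many nodes could be ordered;
 * a result smaller than n means the rest is stuck in a dependency cycle. */
size_t resolve_order(size_t n, int depends[MAX_NODES][MAX_NODES],
                     size_t out[MAX_NODES])
{
    int done[MAX_NODES] = {0};
    size_t count = 0;
    int progress = 1;

    assert(n <= MAX_NODES);
    while (progress) {
        progress = 0;
        for (size_t a = 0; a < n; a++) {
            if (done[a])
                continue;
            int ready = 1;
            for (size_t b = 0; b < n; b++) {
                if (depends[a][b] && !done[b]) {
                    ready = 0;
                    break;
                }
            }
            if (ready) {
                done[a] = 1;
                out[count++] = a;
                progress = 1;
            }
        }
    }
    return count;
}
```

With two nodes, "whole file scan" and "whole registry scan", each needing the other, nothing can be ordered and the function returns 0. With the three individual rules from the example (file rule, registry rule that needs it, second file rule that needs the registry rule), all three can be ordered.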
What you describe is the old AV concept, which does not take the registry and the modern complexity of malware into account ;)
Okay.....Now I understand what you were trying to say. :oops:
Well, in which programming language is Spybot written?
spybotsandra
2008-12-08, 16:40
Hello,
It's written in delphi. :)
Best regards
Sandra
Team Spybot
The language does not really matter though ;)
Any high-level language that compiles to native code is more or less "the same"; the frameworks in use are, imho, a much bigger difference.
(edit: except for one security aspect, where I have to praise ObjectPascal for its string handling. Quite a lot of common security holes are based on buffer overflows, which are typical for those silly C zero-terminated strings ;) )
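For anyone wondering what the zero-terminated string problem looks like in practice, here is a minimal C sketch. `copy_bounded` is a made-up helper, not a standard function; it shows the length check that a plain strcpy leaves to the programmer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, whose capacity is dst_size bytes including the
 * terminating zero. Returns 1 if src fit completely, 0 if it had to be
 * truncated. Unlike strcpy, it never writes more than dst_size bytes. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1); /* keep room for the terminator */
        dst[dst_size - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1); /* includes the terminating zero */
    return 1;
}
```

A C string does not carry its own length, so strcpy into an 8-byte buffer would happily write past the end when the source is longer; the check above is what has to be added by hand every time.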