Microsoft Research (MSR) has embarked on a new project to automatically seek out search engine spam before it can be used to defraud advertisers on MSN, Yahoo and Google. Called Strider Search Defender, the tool combines two other MSR projects: Strider HoneyMonkey and URL Tracer.
The effort is headed by researcher Yi-Min Wang and targets a major problem now plaguing the Web: blog spam. The basic premise of Strider Search Defender is that spammers use what Wang calls "doorway pages": pages placed on reputable hosting and blog services. These doorway pages pull ads from a "target page" operated by the spammer.
Instead of reading a page's actual content to decide whether it is spam, Microsoft is taking a context-based approach that analyzes URL redirection. Because many Web sites use redirection to show search engine crawlers one page and human visitors another, a check based on the content the crawler sees can be fooled, so this methodology could prove more effective.
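To make the redirection idea concrete, here is a minimal sketch of following a redirect chain from a doorway page to its final destination. It is not Microsoft's implementation, which the article does not describe. It assumes a hypothetical precomputed `redirects` table (URL to the URL it redirects to) in place of live page fetches, and the function names `resolve_redirect_chain` and `final_domain` are illustrative only.

```python
from urllib.parse import urlparse


def resolve_redirect_chain(start_url, redirects, max_hops=10):
    """Follow redirects from start_url using a precomputed lookup table.

    `redirects` is hypothetical input mapping a URL to the URL it
    redirects to; a real crawler would have to fetch and render pages
    to observe this. Returns the visited URLs, ending at the final one.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):  # cap hops so long chains cannot run forever
        nxt = redirects.get(current)
        if nxt is None:        # no further redirect: current is the destination
            break
        if nxt in seen:        # redirect loop detected; stop here
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


def final_domain(start_url, redirects):
    """Return the hostname of the page the chain finally lands on."""
    return urlparse(resolve_redirect_chain(start_url, redirects)[-1]).hostname


# Usage with made-up URLs: a blog doorway bounces through an
# intermediate hop before landing on the spammer's target domain.
redirects = {
    "http://blog.example.com/cheap-pills": "http://hop.example.net/r?id=7",
    "http://hop.example.net/r?id=7": "http://spam-target.example/landing",
}
print(final_domain("http://blog.example.com/cheap-pills", redirects))
```

The point of the sketch is that the verdict depends on where a page sends visitors rather than on the words it shows a crawler, which is what makes a redirection-based check harder to fool with cloaked content.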
In addition, Wang notes that large-scale spammers create hundreds or thousands of doorway pages that either redirect to or retrieve ads from a single domain. Finding the target pages connected to a large number of doorways makes it possible to stop an entire spam operation in a single pass.
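The "many doorways, one target" observation amounts to grouping doorway pages by the domains they contact and flagging domains with unusually many distinct doorways. The sketch below illustrates that grouping under stated assumptions. The input `doorway_to_targets` (each doorway URL mapped to the URLs it redirects to or pulls ads from) is hypothetical, as is the `min_doorways` threshold of 100. The sketch groups by hostname for simplicity, where a production system might group by registered domain instead. It is not the Strider Search Defender algorithm itself.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find_spam_targets(doorway_to_targets, min_doorways=100):
    """Return target hosts contacted by at least `min_doorways` distinct doorways.

    `doorway_to_targets` is hypothetical input: doorway URL -> list of URLs
    that doorway redirects to or loads ads from. The default threshold is
    an illustrative assumption, not a published value.
    """
    doorways_by_target = defaultdict(set)  # target host -> set of doorway URLs
    for doorway, target_urls in doorway_to_targets.items():
        for url in target_urls:
            host = urlparse(url).hostname
            if host:
                # A set counts each doorway once, even if it hits the target repeatedly.
                doorways_by_target[host].add(doorway)
    flagged = {
        host: len(doorways)
        for host, doorways in doorways_by_target.items()
        if len(doorways) >= min_doorways
    }
    # Largest operations first.
    return dict(sorted(flagged.items(), key=lambda kv: -kv[1]))


# Usage with made-up data: 150 blog doorways share one ad source,
# while three ordinary pages link to an unrelated news site.
data = {f"http://blog{i}.example.com/p": ["http://ads.spam-target.example/feed.js"] for i in range(150)}
data.update({f"http://site{i}.example.org/": ["http://news.example.org/"] for i in range(3)})
print(find_spam_targets(data))
```

Ranking targets by doorway count shows why a single pass can be effective: once a target host is flagged, every doorway recorded in its set is identified along with it.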