Monday, September 7, 2009

Hackers Outsmart Google in Search Ranking

—— Is Google unwittingly helping hackers steer users to virus-downloading sites? ——

First, let's ponder the following questions:

  • Do you trust the search results from the three most popular search sites (Google, Yahoo, and Microsoft Bing) to be free of malicious websites?

  • Do you know what your webservers are doing while you are not watching or checking?


The first event happened more than a month ago, when Amazon was acquiring Zappos, the online shoe seller. I was searching Google for "Nick Swinmurn Zappos" and, unaware that TWO virus-downloading sites were displayed on the FIRST page of the search results, I clicked through the links and got a fake virus infection dialog. I quickly killed the browser before the virus program was downloaded. After this shocking encounter, I submitted a complaint on the Google feedback site. To Google's credit, those two virus links were removed shortly after my complaint. Thinking that it might be pure coincidence, I did not investigate any further.


Fake_VirusInfection_Dialog



A month later, I was searching Google for "Kenmore Dishwasher Racks" to look for some parts. Unfortunately, virus-downloading sites ranked very high on the 4th page of the Google search results, as shown below. Without paying much attention, I again clicked through a malicious link and realized that the page was transitioning to yet another virus-downloading site. I had to kill the browser to halt the download.

GoogleSearch_Kenmore_P4


After closing the browser, I was agitated and alarmed by this incident. I decided to investigate two things: how these malicious sites actually worked, and which search terms might return more high-ranking virus-downloading sites. I used Wireshark to capture the browser traffic and found that all of those malicious websites were redirecting with the HTTP "302 Found" status code. There were several redirections before the final virus-downloading sites, which had catchy names such as "remove-adware" and "virus-scanner". Note: the hostname shown below is not the real malicious virus-downloading site.

Webserver_HTTP302
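
For readers who would rather not read a packet capture, the same chain of "302 Found" hops can be listed with a few lines of Python. This is only a minimal sketch of the idea, not the tool I used (that was Wireshark). The names `fetch_status` and `trace_redirects` and the `.example` hostnames are made up for illustration, and the fetcher is pluggable so the logic can be tried without touching a live site. Anything suspicious should only be probed from an isolated machine.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

# Status codes that carry a Location header pointing at the next hop.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status(url, timeout=5):
    """Send one GET without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(start_url, fetch=fetch_status, max_hops=10):
    """Return the list of (url, status) hops, stopping at the first non-redirect."""
    hops = []
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


if __name__ == "__main__":
    # Hypothetical canned responses standing in for a hacked page and its hops.
    canned = {
        "http://hacked.example/kenmore.php": (302, "http://hop.example/go"),
        "http://hop.example/go": (302, "/scan"),
        "http://hop.example/scan": (200, None),
    }
    for hop in trace_redirects("http://hacked.example/kenmore.php",
                               fetch=lambda u: canned[u]):
        print(hop)
```

The `max_hops` limit is there because a redirect chain can loop forever; ten hops is an arbitrary choice.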


After finishing the packet capture and understanding the search link redirection, the next task was to determine which search terms caused Google to display more virus-downloading sites. The Google search results for "Kenmore Dishwasher Parts" were a really big surprise: FIVE different malicious sites were displayed on the 6th page. Most of them were PHP-based pages with random directory names; very few were plain "html" pages. Apparently these redirecting sites had been hacked by the virus distributors without the knowledge of the legitimate site owners. They included personal websites, professional websites, private companies, and academic institutions (e.g. .edu). I informed one site owner about the hacked pages, and the owner was very glad to be notified so that the site could be cleaned up.

GoogleSearch_Kenmore_P6


GoogleSearch_Kenmore_P7


I tested more search terms such as "Processor", "Pokemon", and other phrases popular with teens, but no malicious site was displayed within the first ten pages. It seemed that for these popular keywords there were plenty of websites with real content ranked high enough to keep malicious sites from squeezing into the top 100 results. Then I searched Google for some less popular but useful terms like "Toro Sprinkler Parts". Wow, here was the jackpot: one malicious site on the first page, and three on the 4th page. It seemed that "the hackers really understood the inner secret of Google Search Ranking with very thoughtful keyword content". I performed this research over a period of several days, and the Google search results came up with different hijacked sites each time, but their contents were mostly similar.

GoogleSearch_ToroRank_P1


GoogleSearch_ToroRank_P4


These search results confirmed that "the virus hackers are systematically targeting Google search for high-ranking placement of their hacked sites with crafted fake content". The cached page on Google gives some insight into the targeted "Dishwasher" content page. The page content consisted only of several hundred separate phrases, all embedded with the "dishwasher" keyword. At the bottom of the page there were some links pointing to other pages on the same hacked site. After reviewing the page content, loud questions must be raised for Google: "What are Google's algorithms actually based on to give these hacker pages such a high search ranking? Is it purely the number of search keywords embedded in the page, or the keyword in the page name itself?" Entering "site:hijacked-virusRedirect.com" made Google show all the topics and pages of a hacked site. The hijacked sites seemed to have 150 - 300 PHP pages each, all built around certain keywords.
Dishwasher_php_Content
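
To make the "hundreds of phrases, all containing the keyword" pattern concrete, here is a rough sketch of how such a page could be flagged. I have no knowledge of how Google actually scores pages, so this is purely an illustration. The function names and the thresholds (`min_phrases=100`, `min_share=0.9`) are my own arbitrary assumptions, not measured values.

```python
import re


def keyword_stats(text, keyword):
    """Count phrases and measure how many of them contain the keyword."""
    kw = keyword.lower()
    # Treat line breaks and common separators as phrase boundaries.
    phrases = [p.strip() for p in re.split(r"[\n.;|]+", text) if p.strip()]
    words = re.findall(r"[a-z0-9']+", text.lower())
    with_kw = sum(1 for p in phrases if kw in p.lower())
    return {
        "phrase_count": len(phrases),
        "phrase_share": with_kw / len(phrases) if phrases else 0.0,
        "word_density": words.count(kw) / len(words) if words else 0.0,
    }


def looks_stuffed(text, keyword, min_phrases=100, min_share=0.9):
    """Flag pages made of many short phrases that nearly all repeat the keyword."""
    stats = keyword_stats(text, keyword)
    return (stats["phrase_count"] >= min_phrases
            and stats["phrase_share"] >= min_share)


if __name__ == "__main__":
    # Synthetic stand-in for the fake content page described above.
    fake_page = "\n".join(f"kenmore dishwasher part {i}" for i in range(200))
    print(keyword_stats(fake_page, "dishwasher"))
    print(looks_stuffed(fake_page, "dishwasher"))
```

A real shop page about dishwashers would also mention the keyword often, which is why the sketch looks at the share of phrases rather than the raw count alone.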

While I was still working on this research blog, another amazing search result struck me. I tried to find a biology book on Google with the search keywords "biology miller levine fifth". There was yet another surprising virus-downloading site on the FIRST page of the search results: "biology-5th-edition-miller-and-levine.virus-sytz.com" (virus-sytz.com was not the real virus site), with all the keywords embedded in the subdomain name. Further searching on Google revealed an incredibly large number of keyword-combo subdomains: 19,800 sites under this one domain. It seemed that these hackers had a warehouse full of pages for popular search keyword combos in their arsenal. These hackers were getting bolder and more sophisticated, with a dedicated webserver and DNS server to serve tons of subdomains of possible virus sites. None of the content of these sites was CACHED on Google, apparently because of a meta tag embedded in the pages (the robots "noarchive" directive is the one that keeps a page out of Google's cache). The content of these virus-downloading sites was neither captured nor fully analyzed before DNS resolution of the main domain name was blocked somewhere. Thank the lord, at least there are some forces working behind the scenes to shield us from these unsafe Google search results.

GoogleSearch_Biology_RankedP1


GoogleSearch_VirusSite_20K


GoogleSearch_VirusSite_DNS_Err
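
The keyword-in-subdomain trick is easy to measure. The sketch below compares the search terms with the words packed into a hostname's subdomain labels. It is an illustration only: the function names are hypothetical, and it naively assumes the registered domain is the last two labels, which is wrong for suffixes like ".co.uk".

```python
import re


def subdomain_tokens(hostname):
    """Split the subdomain labels of a hostname into lowercase word tokens."""
    labels = hostname.lower().rstrip(".").split(".")
    # Naive assumption: the last two labels are the registered domain.
    sub_labels = labels[:-2]
    tokens = set()
    for label in sub_labels:
        tokens.update(t for t in re.split(r"[-_]", label) if t)
    return tokens


def query_overlap(hostname, query):
    """Fraction of the query's words that appear in the hostname's subdomain."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not query_words:
        return 0.0
    return len(query_words & subdomain_tokens(hostname)) / len(query_words)


if __name__ == "__main__":
    host = "biology-5th-edition-miller-and-levine.virus-sytz.com"
    print(query_overlap(host, "biology miller levine fifth"))  # 0.75
    print(query_overlap("www.example.com", "biology miller levine fifth"))  # 0.0
```

Three of my four search words appear in the subdomain ("fifth" does not match "5th"), so the overlap is 0.75. A high overlap combined with an unfamiliar domain is the kind of signal that should raise suspicion.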


Since I search on Google more often than on Yahoo or Microsoft Bing, I wanted to find out whether the other two search engines had the same problem of high-ranking virus-downloading sites in their search results. Searching for "Kenmore Dishwasher Parts" on Yahoo or Bing did not turn up any sites like those on Google in the first 10 pages (top 100 sites). However, searching for "Toro Sprinkler parts" on Yahoo did display a couple of virus-downloading sites on one day, but not on the following day. Microsoft Bing seemed to have no virus-downloading sites among the top 100 sites for the few search terms I tried.

YahooSearch_ToroRank_P2


In conclusion, here is the "Open Challenge for Google" to investigate and work on: "Can Google, as the world's No. 1 search engine with so much technology talent in its employ, improve its search algorithms and indexing procedures to make the search results free of malicious virus sites, or at least to keep them out of the top 100 results for the most frequent searches?" This is definitely not a challenge for Google alone, however. It would be smart and prudent business practice for the Yahoo and Microsoft combo to take this challenge seriously too. Certainly, there is a lot of work to be done. We cannot underestimate the threat from these hackers. Let's summarize some analyses and ideas about how the battle against the hackers' search-ranking cat-and-mouse games can be won:

  1. As shown above in the fake "Dishwasher_php" content page, it is very obvious that the search algorithms must be drastically improved to be smarter and more heuristic than the current ones, which seem to be based mostly on the number of keywords in the content.

  2. As shown above in the hackers' sites, these hackers are apparently well-prepared and well-organized, with 19,800 popular keyword combinations ready to be highly ranked in a Google search. These keyword combos are a very good starting point for the major search engines' technologists and security experts to tackle the weaknesses in their search algorithms.

  3. Fast crawling and indexing of websites may position the Google search engine as the search technology leader. But without safe results, fast results will greatly diminish trust in the search engine. Although Google can show off some of its fast results as indexed "10 mins ago" or "3 hours ago", truly "Hacker Free" results may be what marks the real technology leader for unsuspecting internet search users.
    To achieve this goal, there may be a need for an automatic screening program that revisits newly indexed sites with an emulated browser and analyzes the real content from a browser's point of view (e.g. redirection) before the indexing results are released to the general public. However, hackers' webservers can certainly deploy GeoIP technology to serve fake content based on their IP blacklists. I had exactly this experience when I researched the site redirection. The same search links gave different content to computers at two different IP addresses: one was redirected to the hacker site, while the other got garbage content. I also once clicked the Google "cached" link, and the page was redirected to the hacker site. It seems that Google should start analyzing its own cached content to check whether the pages contain embedded tags that perform a redirection.

  4. The verification procedure for adding a site on Google webmaster tools does not seem to be very secure. A hacker who can create a Google account and break into an unsecured webserver can likely add the hacked pages for Google to index. A more secure approach would involve the domain registrar in the initial setup of the webmaster account on Google. Subsequent page submissions for Google to index could then be secured through a separate channel of communication.
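
The cached-content check suggested in item 3 could start from something as simple as scanning stored HTML for a meta refresh tag. The sketch below uses Python's standard `html.parser` module. `MetaRefreshFinder` and `find_meta_redirects` are names I made up, and the sample page is invented. It only catches `<meta http-equiv="refresh">` redirects, not JavaScript redirects or the server-side 302 responses described earlier.

```python
import re
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the target URLs of <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://host/path".
        match = re.search(r"url\s*=\s*['\"]?([^'\";\s]+)",
                          attr.get("content", ""), re.IGNORECASE)
        if match:
            self.targets.append(match.group(1))


def find_meta_redirects(html):
    """Return every meta-refresh redirect target found in an HTML string."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets


if __name__ == "__main__":
    # Invented sample standing in for a cached copy of a hacked page.
    cached = ('<html><head><META HTTP-EQUIV="Refresh" '
              'CONTENT="0; URL=http://scanner.example/go"></head>'
              '<body>dishwasher parts</body></html>')
    print(find_meta_redirects(cached))
```

A page whose cached copy redirects to a different host would be a candidate for a closer look before it is shown to searchers.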


Finally, here are some tips and thoughts to help average internet search users avoid the traps of virus-downloading websites.

  1. Be careful with the settings of Internet Explorer. When you click through one of those malicious links, the virus may be automatically downloaded and installed on the computer. During this entire research, I used Google Chrome, which let me control and terminate the virus download from the malicious links. Other browsers like Firefox should also be safer to use, with a plug-in to disable JavaScript.

  2. Be alert and suspicious with the search results. If a result has a random domain and/or path name and a page name that exactly matches your keywords, it may be a hacker page. If you are trying to find a product and the high-ranking results come from "edu" or unknown websites, that should raise some flags too. Be careful with newly indexed search results carrying tags like "n hours/mins ago". Sometimes it may be safer to preview the content by clicking the "Cached" link.

  3. Set up the Windows XP accounts on your home computers with as few "Administrator Privilege" accounts as possible. If a user does download and install a virus, you will save a great deal of time cleaning up the mess from a "limited account" rather than an "administrator account".

This post was originally published on SecuredTao at wordpress.com. Oddly enough, that post was not indexed by Google for a couple of weeks after my last update. Now, let me see how fast Google can index its own Blogger site.

Until we can get 100% safe search results, we need to be careful with our searches on the Internet - it's a dangerous world out there.