AVG Destroys Web Analytics

I had a call from a client yesterday who was concerned that their web stats had shown a sharp increase over the weekend. An increase wouldn’t normally cause concern, but the client is (quite rightly) very skeptical about such jumps. I said I’d investigate the source of the extra traffic and hopefully put their mind at rest.

A quick look at the Awstats reports is normally enough to highlight an issue if one exists. It’s usually a new spider or some kind of strange spidering activity, but in this case it wasn’t. I couldn’t see anything out of the ordinary: this was an increase in visits with a reasonably sensible increase in page views and so on. What next, I thought. “Is the traffic from Google?” The site in question normally receives about 75% of its traffic from Google, but a quick tot-up of the figures showed this was more in the region of 30% for this month. OK, so it’s an increase in direct traffic, a massive increase in fact. Time to delve into the log files by hand!

I copied the previous day’s log file across to my laptop and dropped it into TextPad (I’m always amazed how well it copes with large text files, nice one TextPad!). I started to look through the file and it looked reasonably normal. Then I spotted a block of requests, only half a dozen or so, for the same file one after another. What made this particularly odd was that the file being requested was a tracking page used within the site to record data back to the SQL server. I continued to sift through the file and noticed the same block several more times, each time from a completely different IP address, not even in the same range. Could this be a DDoS, I thought? Possibly, although we’ve never seen one before. I looked for some commonality between the blocks and noticed that none had any referrer information and all seemed to use the same (slightly strange-looking) user agent (UA). The user agent in question was:

Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)

I googled this and spotted a Webmaster World entry entitled “AVG Toolbar Glitch May Be Causing Visitor Loss”. Sounds interesting, I thought. To be honest, it was the only link that wasn’t to somebody’s stats page! At least I’m not alone on this one.

The forum discussion on Webmaster World described exactly what I was seeing, and many other webmasters were seeing it too. Unfortunately, this isn’t down to a rogue spider, a hack attempt or a DDoS. No, it’s the latest version of AVG anti-virus.

Grisoft (the people behind AVG) purchased LinkScanner back in December 2007. One of its features:

LinkScanner automatically analyzes results returned by Google and other search engines and places a check mark next to sites believed to be safe.

In fact, LinkScanner analyses results from search engines in general (not just Google) and is browser independent. This may sound like a good idea from a security point of view, but from a webmaster or website owner’s point of view it is not good at all.

If your site appears well in the search engines, as everyone strives to do, your website is, or is going to be, hugely affected by this. Essentially, every time your site appears in a user’s results, regardless of whether they click on it, your website log files and therefore your statistics will show that person as a real visitor coming to your site. Because the IP address is the user’s own IP address, we can’t filter on that. At first look it would appear we can filter on this user agent. Unfortunately, I spotted another one:

Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)

This one, however, is even worse. This time it’s a legitimate user agent, which means you can’t filter it out or rewrite it to another page on your site without the risk of blocking or harming real visitors. The first user agent is different: because it lacks a space (or plus) between the last semi-colon and the 1813, it doesn’t follow the standard pattern used by Microsoft.
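To make the difference between the two user agents concrete, here is a minimal sketch in Python. The helper name `looks_like_linkscanner` is made up for illustration, and the check assumes the only reliable tell is the missing space (or plus) before the 1813. It is not an official AVG signature, and it can do nothing about the second, legitimate-looking user agent.

```python
import re

# Hypothetical pattern: a user agent ending in ";1813)" with no space or "+"
# between the semi-colon and the number. This is an assumption based purely on
# the log entries quoted in this post.
SUSPECT_UA = re.compile(r";1813\)$")


def looks_like_linkscanner(user_agent: str) -> bool:
    """Return True when the user agent ends in ';1813)' with no separator."""
    return bool(SUSPECT_UA.search(user_agent.strip()))
```

The first user agent quoted above matches this check. The second one (ending in `;+SV1)`) does not, and neither would a well-formed `; 1813)`, which is exactly why the second user agent is the bigger headache.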

So, we get to the crux of the problem: AVG has destroyed web analytics for people who use a log file analysis tool. Not only that, they are also wasting our bandwidth and the disk space on our servers!

Can we filter it out of our logs? Perhaps. They do seem to follow a pattern.

  • A request for the result in the SERP (often missing the trailing slash)
  • One or more requests for associated JavaScript files
  • A subsequent request for the root of the site
  • One or more further requests for associated JavaScript files

That’s the pattern. It also serves as a prefetching routine, which may speed up your eventual click on a result, if you ever make one, that is.

I’m no Perl expert (.NET is my bag), but I’m pretty sure a Perl guru could knock up a quick log processing script that parses your logs (IIS and Apache versions would differ, I guess) and removes this spam. It is spam at the end of the day: we didn’t ask for it and it’s wasting our resources dealing with it.

Any takers?
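As a rough starting point, here is a sketch of such a filter in Python rather than Perl. It assumes an IIS log in W3C extended format, where a `#Fields:` line names the space-separated columns and the user agent column is called `cs(User-Agent)`. The function name `filter_w3c_log` and the `;1813)` pattern are my own assumptions for illustration. It only drops the malformed user agent, so the legitimate-looking `SV1` requests will still get through.

```python
import re
from typing import Iterable, Iterator

# Assumed signature: user agent ending in ";1813)" with no space or "+" before
# the number, as seen in the log entries quoted in this post.
SUSPECT_UA = re.compile(r";1813\)$")


def filter_w3c_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines, skipping requests whose user agent matches SUSPECT_UA.

    Directive lines (starting with '#') are passed through untouched. The
    position of the user agent column is re-read from every '#Fields:' line,
    since IIS may write a new header whenever logging restarts.
    """
    ua_index = None
    for line in lines:
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
                if "cs(User-Agent)" in fields:
                    ua_index = fields.index("cs(User-Agent)")
                else:
                    ua_index = None
            yield line
            continue
        parts = line.split()
        if (
            ua_index is not None
            and ua_index < len(parts)
            and SUSPECT_UA.search(parts[ua_index])
        ):
            continue  # drop the suspected LinkScanner request
        yield line
```

To use it, open the original log, pass the file object to `filter_w3c_log` and write each yielded line to a new file, then point your stats package at the cleaned copy. Apache logs would need a different parser, since the combined log format quotes the user agent instead of replacing spaces with plus signs.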

I’ve now disabled the LinkScanner component on my machine at home and am encouraging friends to do the same. To be honest, I’m considering ditching AVG completely and using something else. I used to recommend AVG to everyone. I can’t do that anymore.

UPDATE: I have a possible LogParser solution, let me know if it helps.

Note: If you’re not seeing the block of requests for a single file in your logs but think you’re seeing this problem, here’s why we were/are seeing that. Essentially, we include a link to an ASP page as the source of a JavaScript include. It sounds a bit dodgy but it does the job. I think LinkScanner is expecting a header or similar from this request, which it doesn’t receive because the response isn’t really the file it thinks it is. I suspect that it’s therefore requesting the page again and again until it gives up. I intend to get rid of this tracker ASAP and implement it in a more elegant way!


18 Responses to “AVG Destroys Web Analytics”

  1. Simon Zerafa Says:

    Hi,

    The following .CMD file will install AVG 8.0 Free without the offending Linkscanner / SafeSearch function:

    @echo off
    setlocal enableextensions enabledelayedexpansion
    echo.
    echo Installing AVG Free v8.0 …
    echo.
    echo Please wait …

    pushd %~dp0

    start /wait avg_free_stf_en_8_100a1295.exe /HIDE /NO_WELCOME /NOAVGTOOLBAR /DONT_START_APPS /REMOVE_FEATURE fea_AVG_SafeSurf /REMOVE_FEATURE fea_AVG_SafeSearch /ADD_FEATURE fea_AVG_EmailPlugins /ADD_FEATURE fea_AVG_Exchange_plugin /ADD_FEATURE fea_AVG_EMC /ADD_FEATURE fea_AVG_Office_2000_plugin /QUIT_IF_INSTALLED /LOG "C:\AVG8INST.LOG"

    popd
    echo.
    echo Installation Completed.
    echo.
    pause

    This is a silent install of AVG, so apart from the .CMD window nothing will appear and there are no prompts for the user.

    Kind Regards

    Simon Zerafa
    Simon’s PC Services

  2. osblues Says:

    Thanks Simon, that’ll come in useful 🙂

    AVG will have to sort out this feature otherwise everyone will simply not install it or disable it. I’ll mention this to AVG when they get back to me.

  3. Web analytics, AVG couldn't care less | DoesWhat.com Says:

    […] OSBlues were one of the first to notice the issue when one of their clients noticed a sharp increase in traffic, an analysis of the log files showed that the increase in traffic was mostly from Google. Eventually it was worked out that the user agent “Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)” was causing the trouble. After a search on Google there was a connection with AVG, which was then worked out to be ‘LinkScanner’. […]

  4. AVG Antivirus - The Web Bandwidth Killer - Cornell Finch Says:

    […] AVG scans a site it (the site) will record that hit in the web statistics. OSBlues recounts an interesting story about an investigation for one of his users where there was a spike […]

  5. AVG scanner blasts internet with fake traffic - Computer Forums Says:

    […] security of users, it’s a real pain for website owners and webmasters," Beale tells us, having blogged about this growing problem. "It’s causing people to think their traffic is increasing, costing […]

  6. Matt Solar’s Search Blog » Blog Archive » AVG Scanner v. Web Analytics Says:

    […] And… If your site appears well in the search engines, as everyone strives to do, your website is or is going to be hugely affected by this.  Essentially this means, that everytime your site appears in a users results, regardless of whether they click on it, your website logfiles and thefore your statistics will show that person as a real visitor coming to your site.  Now, because the IP address is the users IP address, we can’t filter on that, at first look it would appear we can filter on this useragent, unfortunately I spotted another one – OSBlues.com […]

  7. IDSEO Says:

    HI

    Can you please confirm whether Google Analytics data will be affected by this problem, or is it only Server Log stats?

    Thanks

  8. osblues Says:

    IDSEO – To be honest, there are probably only two groups of people who know for sure on that, Google and AVG. When Roger Thompson’s team at AVG get back to me, I will ask them.

    I have also set up some custom filters in a test Google Analytics profile to see if I can spot anything from the UserAgent AVG are using.

    I’ll post here with updates.

  9. IDSEO Says:

    Thanks – I would appreciate that.

  10. Judah Phillips at Web Analytics Demystified » Blog Archive » AVG Link Scanner Bot Executes JavaScript!!!!!! Says:

    […] here.  Read the Register’s first article here.  And check out the dude’s blog who broke the news first and responses from AVG here and […]

  11. AVG Update: Yet More Fake Traffic With New Disguises | Infosecurity.US Says:

    […] The Register [2] OSBlues First Post and OSBlues Second Post Sphere: Related […]

  12. Unexplained Spikes in Web Traffic « Network Observations Says:

    […] of users, it’s a real pain for website owners and webmasters,” Beale tells us, having blogged about this growing problem. “It’s causing people to think their traffic is increasing, […]

  13. Zero Day mobile edition Says:

    […] of users, it’s a real pain for website owners and webmasters,” Beale tells us, having blogged about this growing problem. “It’s causing people to think their traffic is increasing, […]

  14. Vorwürfe: AVG verursacht unnötigen Internet-Traffic - News | ZDNet.de Security - Sicherheit Says:

    […] von einem Blog-Posting des britischen IT-Beraters Adam Beale sieht sich der IT-Security-Anbieter AVG derzeit mit Vorwürfen […]

  15. osblues Says:

    IDSEO, apologies for the long delay on this one, things have been pretty hectic at work.

    I can confirm that LinkScanner DOES NOT execute any javascript on pages and therefore Google Analytics is unaffected.

  16. IDSEO Says:

    Sorry – I have been on holiday and just getting back to this now. I appreciate your confirmation on this. Thanks.

  17. F900 Cameraman Says:

    I had this happen over a three day period increasing my web traffic around 1900%. They all came from unique URLs, as you mentioned during certain times of the day.

    It was bizarre and confusing.

    Thank you for looking into this and for your blog. Not even my hosting company knew the answer.

