Hitwise Shock: Drudge Rank Misleading
The Drudge Report recently displayed a headline indicating that it ranked above many prominent news sites in “market share”. It cited a weekly report from a company called Hitwise. From what I can tell, this rank is misleading.
I wish to make an informal refutation of the implied analysis that the recent Hitwise rank of news sites indicates that the Drudge site is more popular and/or important than The New York Times, Fox News, People Magazine, Yahoo! Weather, and others. Hitwise is a web statistics tracking company owned by Experian, one of the big-three credit reporting agencies. Hitwise obtains the raw information it uses to compile web traffic statistics directly from ISPs.
The basis for my refutation of the implied analysis is quite simple. Anyone familiar with the Drudge Report could just as easily point this out. Drudge’s page instructs your browser to refresh once every three minutes, thus inflating the hit counts. Drudge presumably does this to keep you “up to date”, but as I will argue below, this is a very outdated method. What might motivate Drudge to continue to use it? Read on.

The Drudge Report, a hand-edited collection of news headlines, is unique among websites today. It has a very anachronistic look and feel: it consists of a single page, the home page, with few internal links of its own. Technically speaking, the site is constructed completely of plain HTML except for the advertisements (a prominent ad is always placed at the top). This format was extremely popular in the mid 1990s when the web was brand new. The original NerdWorld [screenshot] and the original Yahoo Directory [screenshot] quickly come to mind. The site could very well be maintained by someone (perhaps Drudge himself) hand-editing a file in Notepad.
That’s fine in and of itself; that’s how the web was intended to work from the start. Scientists, laypeople, etc. can write basic HTML to distribute information easily to potentially anyone. Yet from the beginning the web also had more complex, extensible features for more “advanced” things. It is these “advanced” features that changed rapidly, over and over, for ten years or so, wandering to and fro. This includes the fashion of forcing users onto Java implementations, all sorts of embedded objects such as the various competing media players, and the short-lived dream of Microsoft’s OCX objects. The popularity of many proprietary features also rose and waned along with the popularity of the sponsoring browser (a battle within the browser wars). One of these was the use of META tags to do all sorts of things that are now normally done with JavaScript. And there were many competing methods of doing the same thing in JavaScript, too. I have casually observed that since around 2004 there have not been any significant “feature battles” within the (still ongoing) browser wars, as the standards have been settled upon. To say it briefly, the consensus is: Flash for games/video/animation/fancyshit (as opposed to slow, bloated Java or insecure OCX); and AJAX for most everything else (this includes JavaScript, CSS, and XML, but a broader explanation is beyond the scope here).
Drudge uses an outdated method to update the information on a user’s browser screen. The proper way would be to use AJAX to change the visible text dynamically. The updated text could be fetched by a now-standard AJAX process, safely in the background on the user’s computer. This does not require a refresh, and the information is updated as often as the website owner likes, even in real time, like Google’s live streaming quotes in its finance section or its email service Gmail. But Drudge uses a brute-force refresh of the page. This exactly mimics a user pressing the “refresh” button, except it’s done without the user’s permission. That means when you visit the Drudge Report site, your reading will be disrupted by an automatic refresh again and again as you work through the headlines. This happens once every three minutes. If you keep the page open for a moment more than three minutes, it will register with your ISP that you have visited the page two times. That’s two hits, where for normal websites (including all the ones that rank below and above Drudge) it would register as only one. The potential cumulative effect on hit statistics is not hard to see. Website tracking services, internal or external, could not tell the difference between a user-initiated visit and an automatic brute-force browser refresh merely by compiling hit counts of URLs.
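To put rough numbers on that, here is a minimal sketch in Python. It assumes that every automatic refresh is logged as one more hit and that the refresh fires exactly every 180 seconds. The function name and the sample visit lengths are made up for illustration; they do not come from Hitwise or from Drudge.

```python
def hits_for_visit(seconds_on_page: int, refresh_interval: int = 180) -> int:
    """Hits logged for one visit: the initial load plus one per automatic refresh.

    A refresh_interval of 0 models a normal page that never reloads itself.
    """
    if seconds_on_page < 0:
        raise ValueError("seconds_on_page must be non-negative")
    if refresh_interval <= 0:
        return 1
    return 1 + seconds_on_page // refresh_interval


if __name__ == "__main__":
    # Hypothetical visit lengths, in minutes.
    for minutes in (1, 4, 10, 30):
        seconds = minutes * 60
        print(
            f"{minutes} min on page: "
            f"{hits_for_visit(seconds, 0)} hit without refresh, "
            f"{hits_for_visit(seconds)} with a three-minute refresh"
        )
```

Under these assumptions a reader who leaves the page open for half an hour counts eleven times, while the same reader on a page that does not refresh counts once.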
It has probably served Drudge well over the years to keep this old-fashioned refresh method on his site. He could afford to have his site reprogrammed with AJAX. It’s standard stuff now: even most very small-time business sites implement some AJAX.
There is no indication in the Hitwise report (or in their background statement, or anywhere else on the web where I’ve done my due diligence) that they take the brute-force refresh on Drudge’s site into account in their site popularity ratings! (Hitwise uses the term “market share” instead of popularity.) If it seems ridiculous that Drudge outranks The New York Times and Yahoo, that’s because it is. Yahoo! web pages are the 2nd most popular set on the internet, for god’s sake. Would advertisers relying on such ranking data be well advised to take that into consideration? Hitwise’s opacity on the matter of how, or if, it takes Drudge’s outdated browser-refresh method into account is bound to make it less relevant among serious web stat analysts.
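For what it’s worth, discounting the refreshes would not be hard if the tracker keeps a timestamp and some kind of visitor identifier for each hit. That is an assumption on my part; I have no idea what Hitwise actually stores. A rough sketch in Python, with invented names and an invented sample log: a hit counts as a new visit only if the same visitor has not requested the same URL within the last few minutes.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (visitor_id, url, unix_timestamp)
Hit = Tuple[str, str, int]


def count_visits(hits: Iterable[Hit], window: int = 200) -> Dict[str, int]:
    """Count visits per URL, treating repeat hits within `window` seconds as one visit."""
    last_seen: Dict[Tuple[str, str], int] = {}  # (visitor, url) -> latest hit time
    visits: Dict[str, int] = {}                 # url -> number of visits
    for visitor, url, ts in sorted(hits, key=lambda h: h[2]):
        key = (visitor, url)
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            visits[url] = visits.get(url, 0) + 1
        last_seen[key] = ts  # each refresh keeps the current visit alive
    return visits


if __name__ == "__main__":
    log = [
        ("a", "drudge", 0), ("a", "drudge", 180), ("a", "drudge", 360),  # one reader, two auto-refreshes
        ("b", "nytimes", 10),
        ("a", "drudge", 5000),  # the same reader comes back much later
    ]
    print("raw hits:", len(log))
    print("visits:", count_visits(log))
```

The window of 200 seconds is picked only because it is a little longer than the three-minute refresh. A filter like this would also swallow a reader who hits “refresh” by hand inside the window, so it is a rough correction, not an exact one.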
However odd the Drudge Report is, it has somehow managed to remain very important. If you want to keep up on what headlines Drudge is currently cutting and pasting, there are alternatives to using his site directly. Blipnews.com describes itself as the Drudge Report news aggregator. One might forgive Blipnews for spamming the comments sections of news sites linked from Drudge only because it displays the headlines in a mildly more thoughtful format than Drudge itself. Another aggregator of the aggregator is found at DrudgeReportArchives.com. Besides having daily snapshots of past Drudge Reports, it has a feature which lists current and past compilations of Drudge headlines in a format that is as nice to look at and work with as an Excel spreadsheet. Perhaps out of habit I use the official Drudge site, but only with the NoScript Firefox add-on to keep Drudge from refreshing my browser (not because I am attempting to make hit stats fair, but because it’s damn annoying).
External Information:
US Ranking of News Sites from Hitwise for week ending October 25, 2008 (see source below)
1. Yahoo News
2. CNN.com
3. MSNBC
4. The Weather Channel
5. Google News
6. Drudge Report
7. The New York Times
8. Fox News
9. People Magazine
10. Yahoo! Weather
External Links:
Drudge Report (C) 2009
url:http://www.drudgereport.com
Drudge Report Archives (C) 2008
url:http://www.drudgereportarchives.com/dsp/links_recap.htm
Blipnews.com
url:http://blipnews.com
Hitwise News and Media Category Weekly Report (for the week ending October 25, 2008)
url:http://www.drudgereport.com/hit3.pdf
(available on ReTran USA for now because the drudge site uses a strange embedding method for the PDF doc that raises security flags on many browsers)
url:http://www.retran.com/wtf/hit3.pdf
Drudge Report snapshot containing the “Hitwise Shock” headline via DrudgeReportArchives
url:http://www.drudgereportarchives.com/data/2008/10/31/20081031_153241.htm
Hitwise, a subsidiary of Experian(tm)
url:http://www.hitwise.com
To see how similar Drudge is to ancient web formats:
Nerd World (1997 version via archive.org)
screenshot:http://retran.com/wtf/nerdworld97.jpg
url:http://web.archive.org/web/19971210162538/http://www.nerdworld.com/
Yahoo! (1996 version via archive.org)
screenshot:http://retran.com/wtf/yahoo96.jpg
url:http://web.archive.org/web/19961017235908/http://www2.yahoo.com/
[ps: why is Drudge copyrighted for a year that has not happened yet?]