Favicon poster project

See the interactive viewer at nmap.org.

Details of the favicon posters

These graphics show roughly one million favicon files, collected in a giant survey near the beginning of 2010, combined into informative posters. They were originally presented at my talk at FOSDEM 2010.

These are older versions of the posters. For the latest version and an interactive viewer, see https://nmap.org/favicon/. The text here has a little more information than is shown there.

Icons of the Web, scaled by number of sites

What you see is the result of a large-scale scan of web site favorites icons (“favicons”) using the Nmap security scanner and the Nmap Scripting Engine (NSE). The scan was done in December 2009.

The sites scanned were the entire contents of the Open Directory Project (dmoz.org), and the external links of the English, German, French, and Spanish Wikipedias, limited to domain names only. For each domain name, www.domain was also scanned if domain did not already begin with “www.”. Omitting duplicates, there were 8,172,390 sites.
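As a rough illustration of how such a target list could be built (this is not the code used for the survey), the sketch below adds a "www." variant for every name that does not already start with "www." and drops duplicates. The function name `expand_with_www` and the lowercasing step are assumptions made for the example.

```python
def expand_with_www(domains):
    """Return each name plus its "www." variant, without duplicates."""
    targets = set()
    for name in domains:
        # Assumption: names are compared case-insensitively.
        name = name.strip().lower()
        if not name:
            continue
        targets.add(name)
        # Only prepend "www." when the name does not already begin with it.
        if not name.startswith("www."):
            targets.add("www." + name)
    return sorted(targets)


if __name__ == "__main__":
    sample = ["example.com", "www.example.com", "www.example.org", "Example.COM"]
    for target in expand_with_www(sample):
        print(target)
```

Note that a name that already begins with "www." is scanned as given; its bare counterpart is not added.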

For each of these, an NSE script downloaded the favicon, calculated its MD5 hash, and finally counted the number of domains under each hash. 995,152 unique icons were retrieved; of those, 799,924 were loadable by PerlMagick and the remaining 195,228 were considered non-image files. The smallest icons are those that appeared only once; they are scaled to 16 × 16 pixels at 600 pixels per inch, or about 0.68 mm on a side. Icons that appeared twice have approximately twice the area, and so on.
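The counting and scaling arithmetic can be sketched as follows. This is a simplified model, not the poster programs themselves: it assumes the area is exactly proportional to the domain count, with no rounding, and the function names are made up for the example.

```python
import hashlib
import math
from collections import Counter

BASE_SIDE_PX = 16       # side of an icon that appeared on one domain
PIXELS_PER_INCH = 600
MM_PER_INCH = 25.4


def count_domains_per_hash(icons_by_domain):
    """Map MD5 hex digest -> number of domains that served those exact bytes."""
    return Counter(hashlib.md5(data).hexdigest()
                   for data in icons_by_domain.values())


def side_in_pixels(count):
    """Side length that makes the icon's area proportional to count."""
    return BASE_SIDE_PX * math.sqrt(count)


def pixels_to_mm(pixels):
    """Convert a length in pixels to millimetres at the print resolution."""
    return pixels / PIXELS_PER_INCH * MM_PER_INCH


if __name__ == "__main__":
    icons = {"a.example": b"icon-1", "b.example": b"icon-1", "c.example": b"icon-2"}
    for digest, count in count_domains_per_hash(icons).items():
        side = side_in_pixels(count)
        print(digest, count, round(side, 1), round(pixels_to_mm(side), 2))
```

With these constants a 16-pixel side works out to 16 / 600 × 25.4 ≈ 0.68 mm, matching the figure above.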

Details of the scan: The script first retrieved the root document and searched for an element of the form <link rel="icon" href="...">. If such an element was present, the favicon was retrieved from the given URL. If the element did not exist, or if the URL could not be retrieved, the favicon was looked for at /favicon.ico. Up to five redirects were followed for every document retrieved. When redirects led to a different site, only the final redirected-to site was included in the count. After redirects, only responses with an HTTP status code of 200 were counted. Files larger than 10 megabytes were discarded. When multiple icons were present in a file, the image with the greatest size and color depth was used.
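The lookup order described above can be sketched in Python. The survey itself used an NSE script, so this is only an illustration of the logic: it parses an already-downloaded root document and returns the URLs to try, in order. The class and function names are hypothetical, and matching any rel value that contains the token "icon" (for example "shortcut icon") is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Record the href of the first <link> whose rel contains "icon"."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # Attribute values may be None when written without a value.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attributes.get("href"):
            self.href = attributes["href"]


def candidate_icon_urls(root_url, root_html):
    """Return favicon URLs to try: the <link> target first, then /favicon.ico."""
    finder = IconLinkFinder()
    finder.feed(root_html)
    candidates = []
    if finder.href:
        candidates.append(urljoin(root_url, finder.href))
    fallback = urljoin(root_url, "/favicon.ico")
    if fallback not in candidates:
        candidates.append(fallback)
    return candidates


if __name__ == "__main__":
    html = '<html><head><link rel="icon" href="/img/fav.png"></head></html>'
    print(candidate_icon_urls("http://example.com/", html))
```

Redirect handling, the status code check, and the size limit are left out; a caller would fetch each candidate in turn and stop at the first one that succeeds.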

Programming and design were done by David Fifield. Scanning was done by Brandon Enright. More information on the scans, with data files and programs, is available at http://nmap.org/favicon/. For more on Nmap see http://nmap.org/.

Icons of the Web, scaled by Alexa reach

What you see is the result of a large-scale scan of web site favorites icons (“favicons”) using the Nmap security scanner and the Nmap Scripting Engine (NSE).

The sites scanned were the one million domain names with the greatest “reach” according to Alexa on January 19, 2010, plus the one million names created by prepending “www.” to the former.

For each of these, an NSE script downloaded the favicon, calculated its MD5 hash, then summed the reach of the sites under each hash. 328,427 unique icons were retrieved; of those, 288,945 were loadable by PerlMagick and the remaining 39,482 were considered non-image files.

The reach was not known exactly for every site. On January 27, 2010, the reach was looked up for a sample of 178 sites, and the reach of the remaining sites was calculated by the formula reach = 66.1682 × rank^(-0.9337). The formula comes from a linear regression of log(reach) versus log(rank) of the sampled sites. This chart shows the closeness of the fit between the estimate and sample: (see chart).
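For readers who want to see the estimate at work, here is a small sketch of both the formula and the kind of log-log least-squares fit that produces it. Two things are assumptions: the exponent is taken as negative (-0.9337), which is what makes reach fall as rank grows and gives roughly 0.0001% to 0.0002% at rank one million, and reach is expressed in percent. The 178 sampled values are not reproduced here, so the fit is demonstrated on synthetic points generated from the formula itself.

```python
import math

COEFFICIENT = 66.1682
EXPONENT = -0.9337   # assumed negative: reach decreases as rank increases


def estimate_reach(rank):
    """Estimated reach (assumed to be in percent) for a site of the given rank."""
    return COEFFICIENT * rank ** EXPONENT


def fit_power_law(samples):
    """Fit reach = a * rank**b by least squares on log(reach) vs. log(rank).

    samples is a list of (rank, reach) pairs; returns (a, b).
    """
    xs = [math.log(rank) for rank, _ in samples]
    ys = [math.log(reach) for _, reach in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


if __name__ == "__main__":
    # Synthetic samples only; real data would scatter around the line.
    samples = [(rank, estimate_reach(rank)) for rank in (1, 10, 100, 1000, 10000)]
    a, b = fit_power_law(samples)
    print(f"a = {a:.4f}, b = {b:.4f}")
    print(f"estimated reach at rank 1,000,000: {estimate_reach(1_000_000):.6f}")
```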

The area of each icon is proportional to the sum of the reach of all sites using that icon. When both a bare domain name and its “www.” counterpart used the same icon, only one of them was counted. The smallest icons, those corresponding to sites with approximately 0.0001% reach, are scaled to 8 × 8 pixels at 600 pixels per inch, or about 0.34 mm on a side. The larger icons are scaled proportionally, with their size constrained to a multiple of 8 pixels. The largest icon is 5,968 × 5,968 pixels, and the whole diagram is 18,720 × 18,720 pixels.
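The sizing rule can be sketched like this. It is a simplified model rather than the actual placement program: rounding to the nearest multiple of 8 pixels is an assumption (the text only says the size is a multiple of 8), as is treating 0.0001% as the exact reach of an 8 × 8 icon. All names are hypothetical.

```python
import math

UNIT_PX = 8                  # sizes are multiples of this
MIN_REACH_PERCENT = 0.0001   # reach drawn as one 8 x 8 unit (assumed exact)


def strip_www(name):
    """Return the name without a leading "www." label."""
    return name[4:] if name.startswith("www.") else name


def reach_per_icon(observations):
    """Sum reach per icon hash, counting a bare name and its www. twin once.

    observations is an iterable of (domain, icon_hash, reach_percent).
    """
    seen = set()
    totals = {}
    for domain, icon_hash, reach in observations:
        key = (strip_www(domain), icon_hash)
        if key in seen:
            continue
        seen.add(key)
        totals[icon_hash] = totals.get(icon_hash, 0.0) + reach
    return totals


def icon_side_px(total_reach_percent):
    """Side length with area proportional to reach, as a multiple of 8 pixels."""
    ideal = UNIT_PX * math.sqrt(total_reach_percent / MIN_REACH_PERCENT)
    units = max(1, round(ideal / UNIT_PX))   # assumption: round to nearest
    return units * UNIT_PX


if __name__ == "__main__":
    observations = [
        ("example.com", "h1", 0.0003),
        ("www.example.com", "h1", 0.0003),   # same icon as the bare name: skipped
        ("example.org", "h1", 0.0001),
        ("www.example.net", "h2", 0.0001),
    ]
    for icon_hash, reach in reach_per_icon(observations).items():
        print(icon_hash, icon_side_px(reach))
```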

Details of the scan: The script first retrieved the root document and searched for an element of the form <link rel="icon" href="...">. If such an element was present, the favicon was retrieved from the given URL. If the element did not exist, or if the URL could not be retrieved, the favicon was looked for at /favicon.ico. Up to five redirects were followed for every document retrieved. An icon was considered to belong to a domain name even when redirects led away from that domain name, or the icon was on a different domain. After redirects, only responses with an HTTP status code of 200 were counted. When multiple icons were present in a file, the image with the greatest size and color depth is shown.
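An ICO file can hold several images, listed in a directory at the start of the file. The posters were built with PerlMagick, so the sketch below is only an illustration of picking "the greatest size and color depth" by reading that directory directly. Comparing pixel area first and bit depth second is an assumption, since the text does not say how ties are broken, and the function name is hypothetical.

```python
import struct


def best_ico_entry(data):
    """Return (width, height, bits_per_pixel, offset, size) of the preferred image.

    Preference (an assumption): largest pixel area, then greatest bit depth.
    """
    # ICONDIR header: reserved (0), type (1 = icon), number of images.
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    best = None
    for index in range(count):
        # Each ICONDIRENTRY is 16 bytes, starting right after the 6-byte header.
        (width, height, _colors, _reserved, _planes,
         bits, size, offset) = struct.unpack_from("<BBBBHHII", data, 6 + 16 * index)
        # A stored width or height of 0 means 256 pixels.
        width = width or 256
        height = height or 256
        key = (width * height, bits)
        if best is None or key > best[0]:
            best = (key, (width, height, bits, offset, size))
    if best is None:
        raise ValueError("ICO file contains no images")
    return best[1]


if __name__ == "__main__":
    header = struct.pack("<HHH", 0, 1, 2)
    small = struct.pack("<BBBBHHII", 16, 16, 0, 0, 1, 32, 100, 38)
    large = struct.pack("<BBBBHHII", 32, 32, 0, 0, 1, 8, 200, 138)
    print(best_ico_entry(header + small + large))
```

Some files leave the bit depth field in the directory as 0, so a more careful reader would also inspect the image data itself.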

Viewer beware: The chart should not be taken as an authoritative measure of the popularity of the sites presented, because large scans over the Internet are inexact. Sites were only counted if their favicon could be downloaded. Because of unpredictable network effects, some sites, such as Bing, Baidu, and Amazon, are shown smaller than they should be.

Programming and design were done by David Fifield. Scanning was done by Brandon Enright. More information on the scans, with data files and programs, is available at http://nmap.org/favicon/. For more on Nmap see http://nmap.org/.

The posters, when printed, are bigger than you would think. On the left they are shown to scale with a man of average height. The image on the right is an extreme close-up of the printing.

Downloads

Note: The “combined” graphic is a different version from the one shown at https://nmap.org/favicon/. It scales the icons by number of sites instead of by reach (as if all sites had equal reach), and it uses a different data set (eight million domains and 800,000 graphical icons). (See preview.) The “alexa” graphic is the same as the one on the Nmap page, except that the PDF has different information at the bottom and the PNG, alexa-1.1-good.png, has half the resolution of the full zoom of the interactive viewer (its smallest icons are 8 × 8). I made the half-resolution PNG with a small hack to favicon-place.pl from favicon-progs. See the changelog for differences between the versions.

Posters.

Graphic files. Check the file sizes before downloading. These are purposely not hyperlinked to discourage you from trying to open them in a web browser. The “good” files are image files and the “bad” files are non-image files. “empty” is a special case of “bad.”

Miscellaneous prototypes I made while brainstorming and writing the programs.

Programs

The programs that created this image are under Bazaar revision control. Run

bzr get https://www.bamsoftware.com/bzr/favicon-progs

to get a copy. There is a README file with some hints on usage. If you just want the NSE script that did the survey, it is here: favicon-survey.nse.

Data files

The complete survey data, with lists of sites for each hash and all icon files, are so big that I don't want to host them here permanently. They are two squashfs images of 1.3 and 2.9 GB. Email me if you want a copy.

The processed frequency files (site counts for dmoz/Wikipedia and reach for Alexa), and the placement files that say where and how big each icon should be, are here: favicon-data-1.0.tar.bz2.

Alexa's top one million sites of January 19, 2010.

Changelog

Version 1.0, February 2010. Original release.

Version 1.1, March 2010. Reduced color depth of PNG files from 48 bits to 24 bits, as I intended in the first place. Fixed y axis tick marks in the Alexa graph. Fixed the explanatory text for Alexa to say that it summed the reach of the sites using an icon, rather than just counting the sites.

External links

OWASP favicon database project headed by Vlatko Košturjak, original author of http-favicon.nse. See also his page about the Web survey he did.

A similar survey, including more domains, done in 2008.

