<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Noel Schutt &#187; Mandelbrot</title>
	<atom:link href="http://schutt.org/blog/tag/mandelbrot/feed/" rel="self" type="application/rss+xml" />
	<link>http://schutt.org/blog</link>
	<description></description>
	<lastBuildDate>Sun, 05 Feb 2012 12:14:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Zipf&#8217;s Law</title>
		<link>http://schutt.org/blog/2009/10/zipf-law/</link>
		<comments>http://schutt.org/blog/2009/10/zipf-law/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 16:31:54 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandelbrot]]></category>
		<category><![CDATA[Zipf]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=388</guid>
		<description><![CDATA[I&#8217;ve run across the interesting Zipfian distribution several times recently. Zipf&#8217;s law states that for many kinds of data, particularly word counts, an item&#8217;s frequency is inversely proportional to its rank in the frequency table. For example, the most common word occurs about twice as often as the second most common word and three times as often as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve run across the interesting <a href="http://mathworld.wolfram.com/ZipfDistribution.html">Zipfian distribution</a> several times recently. <a href="http://mathworld.wolfram.com/ZipfsLaw.html">Zipf&#8217;s law</a> states that for many kinds of data, particularly word counts, an item&#8217;s frequency is inversely proportional to its rank in the frequency table. For example, the most common word occurs about twice as often as the second most common word, three times as often as the third, and so on. This means that if you plot count against rank on a log-log plot, you will see a straight line with a slope of about -1.</p>
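<p>Written as a formula, the law says the item of rank <i>r</i> has frequency <i>f</i>(<i>r</i>) = <i>C</i>/<i>r</i><sup><i>s</i></sup>, where <i>C</i> is the count of the top item and <i>s</i> is close to 1. Taking logarithms gives log <i>f</i> = log <i>C</i> &#8722; <i>s</i> log <i>r</i>, which is why the log-log plot is a straight line with slope &#8722;<i>s</i>. The minimal Python sketch that follows assumes an idealized case with a made-up <i>C</i> of 1200 and <i>s</i> = 1. It illustrates the arithmetic and does not use the data behind the charts.</p>

```python
import math

# Idealized Zipf's law: the item of rank r has count C / r**s.
# C = 1200 and S = 1.0 are made-up illustrative values, not data from this post.
C = 1200
S = 1.0


def zipf_count(rank, c=C, s=S):
    """Expected count of the item at the given rank under Zipf's law."""
    return c / rank ** s


counts = [zipf_count(r) for r in range(1, 6)]
print(counts)  # [1200.0, 600.0, 400.0, 300.0, 240.0]

# The top item is 2x the second and 3x the third; the second is only 1.5x the third.
print(counts[0] / counts[1], counts[0] / counts[2], counts[1] / counts[2])  # 2.0 3.0 1.5

# On log-log axes the points lie on a line whose slope is -s.
slope = (math.log(counts[4]) - math.log(counts[0])) / (math.log(5) - math.log(1))
print(round(slope, 6))  # -1.0
```
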
<p>I decided to try it on several of the most common religious texts. The key requirement was that I could download them from Project Gutenberg, one of my favorite websites. They also share another trait: many of their professed followers rarely study or understand them. I also tried several other texts but left them off the chart for simplicity.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png" alt="Zipf's Law for common religious texts." title="Zipf's Law for common religious texts." width="346" height="349" class="aligncenter size-full wp-image-491" /></a></p>
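<p>Producing rank-frequency data for a chart like this comes down to counting words and sorting the counts. Here is a minimal Python sketch. The tokenizer (lowercased runs of letters and apostrophes) is an assumption and not necessarily what was used for the chart, and a real Project Gutenberg file would also need its license header and footer stripped before counting.</p>

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, count) pairs for the words in text, most frequent first.

    Words are lowercased runs of letters and apostrophes; this simple
    tokenizer is an illustrative assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() with no argument returns every word, sorted by descending count.
    return [(rank, count) for rank, (_, count) in enumerate(counts.most_common(), start=1)]


sample = "the cat and the dog and the bird"
print(rank_frequency(sample))  # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
```

<p>Plotting the ranks against the counts on log-log axes then gives a chart of the kind shown above.</p>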
<p>The rule also holds reasonably well outside linguistics, for example for city populations and website visits. I plotted the number of hits to each page on my website and the populations of cities in the United States, and both gave fairly good fits.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png" alt="Zipf's law for US cities" title="Zipf's law for US cities" width="345" height="349" class="aligncenter size-full wp-image-492" /></a></p>
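<p>One way to put a number on a &#8220;fairly good fit&#8221; is to fit a straight line to the log-log points by least squares, then read off the exponent <i>s</i> (the negated slope) and the <i>R</i><sup>2</sup> of the fit. The helper below is hypothetical and is not how the charts above were made. A log-log regression is a quick diagnostic, and maximum-likelihood methods are generally preferred when the exponent needs to be estimated carefully.</p>

```python
import math


def fit_zipf_exponent(counts):
    """Least-squares fit of log(count) = a - s * log(rank).

    counts must be positive, sorted in descending order (rank 1 first),
    and not all equal. Returns (s, r_squared).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # Zipf's exponent is the negated slope of the log-log line.
    return -slope, r_squared


# Idealized, made-up counts that follow 1000 / rank exactly.
ideal = [1000 / r for r in range(1, 11)]
s, r2 = fit_zipf_exponent(ideal)
print(round(s, 6), round(r2, 6))  # 1.0 1.0
```
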
<p>Zipf&#8217;s law and its later <a href="http://en.wikipedia.org/wiki/Zipf–Mandelbrot_law">extensions</a> are common and useful enough that there is an R package, zipfR, for working with them. The package developers, Stefan Evert and Marco Baroni, posted a good <a href="http://zipfr.r-forge.r-project.org//#counting_words">introduction</a> to Zipf&#8217;s law and its uses on the zipfR project page. Their presentation materials are also a good example of slide design.</p>
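<p>The best-known extension, the Zipf&#8211;Mandelbrot law, adds an offset <i>q</i> to the rank: <i>f</i>(<i>r</i>) = <i>C</i>/(<i>r</i> + <i>q</i>)<sup><i>s</i></sup>. Setting <i>q</i> = 0 and <i>s</i> = 1 recovers the plain law, while a positive <i>q</i> flattens the curve across the top few ranks. Here is a minimal Python sketch. The function and its parameter names are illustrative and are not taken from the zipfR package.</p>

```python
def zipf_mandelbrot(rank, c, q, s):
    """Zipf-Mandelbrot form: count proportional to 1 / (rank + q)**s.

    c is a scale factor, q the rank offset and s the exponent. These names
    are illustrative. With q = 0 and s = 1 this is plain Zipf's law.
    """
    return c / (rank + q) ** s


plain = [zipf_mandelbrot(r, 1000, 0, 1) for r in range(1, 4)]
shifted = [zipf_mandelbrot(r, 1000, 2, 1) for r in range(1, 4)]

# Plain Zipf: the top item is twice the second.
print(plain[0] / plain[1])  # 2.0
# With q = 2 the ratio drops to (2 + 2) / (1 + 2) = 4/3, a flatter head.
print(round(shifted[0] / shifted[1], 4))  # 1.3333
```
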
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/10/zipf-law/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

