<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jörn&#039;s Blog</title>
	<atom:link href="http://joernhees.de/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://joernhees.de/blog</link>
	<description>Science, code and links.</description>
	<lastBuildDate>Thu, 29 Dec 2011 14:25:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Duolingo: Learn a language and translate the web</title>
		<link>http://joernhees.de/blog/2011/12/28/duolingo-learn-a-language-and-translate-the-web/</link>
		<comments>http://joernhees.de/blog/2011/12/28/duolingo-learn-a-language-and-translate-the-web/#comments</comments>
		<pubDate>Wed, 28 Dec 2011 14:39:33 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[human computation]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[ted]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=414</guid>
		<description><![CDATA[Another one of Luis von Ahn&#8217;s ingenious projects: http://duolingo.com lets you learn a language for free and translate the web in the background. There is a pretty recent TED talk by him, and below you can find their introductory video on YouTube:]]></description>
			<content:encoded><![CDATA[<p>Another one of <a href="http://www.cs.cmu.edu/~biglou/">Luis von Ahn</a>&#8217;s ingenious projects: <a href="http://duolingo.com">http://duolingo.com</a> lets you learn a language for free and translate the web in the background.<br />
There is a pretty recent <a href="http://www.ted.com/talks/luis_von_ahn_massive_scale_online_collaboration.html">TED talk by him</a>, and below you can find their <a href="http://www.youtube.com/embed/WyzJ2Qq9Abs">introductory video on YouTube</a>:<br />
<iframe width="640" height="360" src="http://www.youtube.com/embed/WyzJ2Qq9Abs" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/12/28/duolingo-learn-a-language-and-translate-the-web/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Interesting talk about &#8220;Filter Bubbles&#8221;</title>
		<link>http://joernhees.de/blog/2011/09/22/interesting-talk-about-filter-bubbles/</link>
		<comments>http://joernhees.de/blog/2011/09/22/interesting-talk-about-filter-bubbles/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 16:21:29 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[filter bubbles]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[information filtering]]></category>
		<category><![CDATA[personalization]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[talk]]></category>
		<category><![CDATA[ted talk]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=408</guid>
		<description><![CDATA[A few days ago I stumbled upon an interesting TED talk by Eli Pariser about the ever-increasing personalization of the web, its search results, your Facebook news feed, &#8230; Do you think that you still see the whole picture &#8230; <a href="http://joernhees.de/blog/2011/09/22/interesting-talk-about-filter-bubbles/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A few days ago I stumbled upon an interesting <a href="http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html">TED talk by Eli Pariser</a> about the ever-increasing personalization of the web, its search results, your Facebook news feed, &#8230; Do you think that you still see the whole picture, or are you already caught in your own filtered information bubble? (thx to <a href="https://plus.google.com/112399767740508618350/posts/DnK9wkEnE2J">Kingsley Idehen</a>)<br />
<object width="526" height="374"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always"/><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talk/stream/2011/Blank/EliPariser_2011-320k.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/EliPariser-2011.embed_thumbnail.jpg&#038;vw=512&#038;vh=288&#038;ap=0&#038;ti=1091&#038;lang=&#038;introDuration=15330&#038;adDuration=4000&#038;postAdDuration=830&#038;adKeys=talk=eli_pariser_beware_online_filter_bubbles;year=2011;theme=what_s_next_in_tech;theme=new_on_ted_com;theme=a_taste_of_ted2011;theme=bold_predictions_stern_warnings;event=TED2011;tag=Culture;tag=Global+Issues;tag=Technology;tag=journalism;tag=politics;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="526" height="374" allowFullScreen="true" allowScriptAccess="always" flashvars="vu=http://video.ted.com/talk/stream/2011/Blank/EliPariser_2011-320k.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/EliPariser-2011.embed_thumbnail.jpg&#038;vw=512&#038;vh=288&#038;ap=0&#038;ti=1091&#038;lang=&#038;introDuration=15330&#038;adDuration=4000&#038;postAdDuration=830&#038;adKeys=talk=eli_pariser_beware_online_filter_bubbles;year=2011;theme=what_s_next_in_tech;theme=new_on_ted_com;theme=a_taste_of_ted2011;theme=bold_predictions_stern_warnings;event=TED2011;tag=Culture;tag=Global+Issues;tag=Technology;tag=journalism;tag=politics;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/09/22/interesting-talk-about-filter-bubbles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mac OS X Harddisk high Load Cycle Counts</title>
		<link>http://joernhees.de/blog/2011/09/16/mac-os-x-harddisk-high-load-cycle-counts/</link>
		<comments>http://joernhees.de/blog/2011/09/16/mac-os-x-harddisk-high-load-cycle-counts/#comments</comments>
		<pubDate>Fri, 16 Sep 2011 14:37:03 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[193]]></category>
		<category><![CDATA[apm]]></category>
		<category><![CDATA[clicking]]></category>
		<category><![CDATA[hard disk]]></category>
		<category><![CDATA[hard drive]]></category>
		<category><![CDATA[hd]]></category>
		<category><![CDATA[hdd]]></category>
		<category><![CDATA[hdparm]]></category>
		<category><![CDATA[hitachi]]></category>
		<category><![CDATA[load cycle count]]></category>
		<category><![CDATA[mac]]></category>
		<category><![CDATA[mac book pro]]></category>
		<category><![CDATA[mac os x]]></category>
		<category><![CDATA[mbp]]></category>
		<category><![CDATA[setup]]></category>
		<category><![CDATA[smart]]></category>
		<category><![CDATA[wear down]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=360</guid>
		<description><![CDATA[Mac OS X's default power management settings might wear your hard drive down unnecessarily. This post provides background information and explains how to change these settings. <a href="http://joernhees.de/blog/2011/09/16/mac-os-x-harddisk-high-load-cycle-counts/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Short summary: Mac OS X&#8217;s default power management settings might wear your hard drive down unnecessarily. This post provides background information and explains how to change these settings.<span id="more-360"></span></p>
<p>I recently got a new MacBook Pro, and one interesting thing I noticed was a light &#8220;click&#8221; (a clicking noise) from it whenever it was idle for a few seconds. I pay attention to such things since I heard about <a title="e.g. this bugreport" href="https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695" target="_blank">problems with power management settings under Ubuntu</a>, which could quickly wear down a hard drive. I experienced this myself when one of my old hard drives started to sound like a frog :-/. So I installed smartmontools (via either <a href="http://www.macports.org/" target="_blank">MacPorts</a> or <a href="http://www.finkproject.org/" target="_blank">Fink</a>) and checked:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;height:400px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">smartctl <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>disk0<br />
smartctl <span style="color: #000000;">5.40</span> <span style="color: #000000;">2010</span>-<span style="color: #000000;">10</span>-<span style="color: #000000;">16</span> r3189 <span style="color: #7a0874; font-weight: bold;">&#91;</span>x86_64-apple-darwin10.7.3<span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">local</span> build<span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
...<br />
=== START OF INFORMATION SECTION ===<br />
Device Model:     Hitachi HTS725050A9A362<br />
...<br />
User Capacity:    <span style="color: #000000;">500</span>,<span style="color: #000000;">107</span>,<span style="color: #000000;">862</span>,016 bytes<br />
...<br />
<span style="color: #666666;">ID# </span>ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE<br />
<span style="color: #000000;">1</span> Raw_Read_Error_Rate     0x000b   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   062    Pre-fail  Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">2</span> Throughput_Performance  0x0005   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   040    Pre-fail  Offline      -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">3</span> Spin_Up_Time            0x0007   <span style="color: #000000;">164</span>   <span style="color: #000000;">164</span>   033    Pre-fail  Always       -       <span style="color: #000000;">2</span><br />
<span style="color: #000000;">4</span> Start_Stop_Count        0x0012   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">421</span><br />
<span style="color: #000000;">5</span> Reallocated_Sector_Ct   0x0033   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   005    Pre-fail  Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">7</span> Seek_Error_Rate         0x000b   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   067    Pre-fail  Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">8</span> Seek_Time_Performance   0x0005   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   040    Pre-fail  Offline      -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">9</span> Power_On_Hours          0x0012   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">351</span><br />
<span style="color: #000000;">10</span> Spin_Retry_Count        0x0013   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   060    Pre-fail  Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">12</span> Power_Cycle_Count       0x0032   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">220</span><br />
<span style="color: #000000;">160</span> Unknown_Attribute       0x0032   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">191</span> G-Sense_Error_Rate      0x000a   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">21474836480</span><br />
<span style="color: #000000;">192</span> Power-Off_Retract_Count 0x0032   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">30064771073</span><br />
<span style="color: #000000;">193</span> Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       <span style="color: #000000;">36492</span><br />
<span style="color: #000000;">194</span> Temperature_Celsius     0x0002   <span style="color: #000000;">148</span>   <span style="color: #000000;">148</span>   000    Old_age   Always       -       <span style="color: #000000;">37</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>Min<span style="color: #000000; font-weight: bold;">/</span>Max <span style="color: #000000;">18</span><span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">42</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
<span style="color: #000000;">195</span> Hardware_ECC_Recovered  0x000a   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">196</span> Reallocated_Event_Count 0x0032   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">197</span> Current_Pending_Sector  0x0022   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">198</span> Offline_Uncorrectable   0x0008   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Offline      -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">199</span> UDMA_CRC_Error_Count    0x000a   <span style="color: #000000;">200</span>   <span style="color: #000000;">200</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">223</span> Load_Retry_Count        0x000a   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">0</span><br />
<span style="color: #000000;">254</span> Free_Fall_Sensor        0x0032   <span style="color: #000000;">100</span>   <span style="color: #000000;">100</span>   000    Old_age   Always       -       <span style="color: #000000;">66</span><br />
...</div></div>
<p>As you can see, I have a Hitachi 500 GB 7200 rpm drive. The puzzling value here is the Load_Cycle_Count: 36,492 load cycles in the 351 hours the HD was powered on, so approx. 100 per hour.<br />
Put simply, the load cycle count is how often your HD decided to park its heads. Depending on the manufacturer and HD model, this can mean several things. In my case it is the number of times the HD&#8217;s heads were moved to a ramp next to the platters. The advantage is that in this &#8220;parked&#8221; position the drive can shut down some energy-consuming parts, and it is much harder to damage the drive (the heads are not over the platters, so there is nothing there for a <a href="http://en.wikipedia.org/wiki/Head_crash">head crash</a>).<br />
<div id="attachment_396" class="wp-caption aligncenter" style="width: 483px"><a href="http://www.hitachigst.com/tech/techlib.nsf/techdocs/9076679E3EE4003E86256FAB005825FB/$file/LoadUnload_white_paper_FINAL.pdf"><img src="http://joernhees.de/blog/wp-content/uploads/2011/09/RampLoadUnloadDynamics.png" alt="" title="RampLoadUnloadDynamics" width="473" height="339" class="size-full wp-image-396" /></a><p class="wp-caption-text">Ramp Load/Unload Dynamics (c) Hitachi</p></div><br />
The downside of parking the heads is that HDs are usually not designed to do this every few seconds. Typical rated limits range from 300,000 to 600,000 load cycles (<a href="http://en.wikipedia.org/wiki/S.M.A.R.T." target="_blank">link</a>). (This doesn&#8217;t mean your HD will break as soon as it exceeds the limit, just that it&#8217;s more likely to fail if worn down like that.)</p>
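<p>To make the arithmetic explicit, here is a minimal sketch in plain shell. The function names <code class="codecolorer bash default"><span class="bash">cycles_per_hour</span></code> and <code class="codecolorer bash default"><span class="bash">hours_until_limit</span></code> are made up for illustration, the numbers are the ones from the smartctl output above, and the projection assumes the rate stays constant, which is a simplification.</p>

```shell
#!/bin/sh
# Average load cycles per powered-on hour (integer division, rounds down).
# Arguments: raw Load_Cycle_Count, raw Power_On_Hours.
cycles_per_hour() {
    echo $(( $1 / $2 ))
}

# Remaining powered-on hours until a rated limit is reached,
# assuming the rate stays constant (a simplification).
# Arguments: rated limit, cycles already used, cycles per hour.
hours_until_limit() {
    echo $(( ($1 - $2) / $3 ))
}

# Values taken from the smartctl output above.
rate=$(cycles_per_hour 36492 351)
echo "cycles per hour: $rate"
echo "hours left until 300000 cycles: $(hours_until_limit 300000 36492 "$rate")"
echo "hours left until 600000 cycles: $(hours_until_limit 600000 36492 "$rate")"
```

<p>Plug in your own Load_Cycle_Count and Power_On_Hours to get a rough feeling for how quickly your drive approaches its rated limit.</p>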
<p>To watch how your Load_Cycle_Count develops, you can run this small one-liner in a terminal:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">while</span> true ; <span style="color: #000000; font-weight: bold;">do</span> <span style="color: #007800;">s</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$(date)</span> <span style="color: #007800;">$(smartctl -a /dev/disk0 | grep 'Load_Cycle_Count')</span>&quot;</span> ; <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$s</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">tee</span> <span style="color: #660033;">-a</span> hddLoadCounts.log ; <span style="color: #c20cb9; font-weight: bold;">sleep</span> <span style="color: #000000;">60</span> ; <span style="color: #000000; font-weight: bold;">done</span><br />
Di <span style="color: #000000;">17</span> Mai <span style="color: #000000;">2011</span> <span style="color: #000000;">14</span>:<span style="color: #000000;">43</span>:<span style="color: #000000;">20</span> CEST <span style="color: #000000;">193</span> Load_Cycle_Count 0x0012 097 097 000 Old_age Always - <span style="color: #000000;">36492</span><br />
...</div></div>
<p>The script will log the load cycle count to your terminal and a file called <code class="codecolorer bash default"><span class="bash">hddLoadCounts.log</span></code> in the current directory every minute.</p>
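<p>Once the log has collected a few entries, a small helper can sum up how much the count grew. This is only a sketch: the function name <code class="codecolorer bash default"><span class="bash">load_cycle_growth</span></code> is made up, and it assumes that the raw value is the last field of every log line, as in the sample output above.</p>

```shell
#!/bin/sh
# Print how much the raw Load_Cycle_Count grew between the first and the
# last entry of a log written by the one-liner above.
# Assumption: the raw value is the last whitespace-separated field of each line.
load_cycle_growth() {
    awk 'NR == 1 { first = $NF } { last = $NF } END { if (NR) print last - first }' "$1"
}

# Only run against the log if it already exists in the current directory.
if [ -f hddLoadCounts.log ]; then
    load_cycle_growth hddLoadCounts.log
fi
```

<p>With one entry per minute, dividing the result by the number of logged minutes gives the average increase per minute.</p>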
<p>You might notice that when doing nothing but browsing, this count increases by 2-8 every minute. Playing music with iTunes seems to stop this, as the HD is kept busy reading your music. Doing the maths, you&#8217;ll find that your drive may well reach over 300,000 load cycles within the first half year (lucky music listeners, yours will last much longer <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ).</p>
<p>As I had a bad feeling about this, I went on to have a look at <a href="http://www.hitachigst.com/tech/techlib.nsf/products/Travelstar_7k500" target="_blank">Hitachi&#8217;s technical specs for my HD</a>. <a href="http://www.hitachigst.com/tech/techlib.nsf/techdocs/3FDCAB792901CF4B862575D8005AB39B/$file/TS7K500_DS.pdf" target="_blank">Here</a> you can find that my HD is designed for up to 600,000 load cycles (page 2), meaning approx. 6,000 hours at the rate of 100 cycles per hour. In the <a href="http://www.hitachigst.com/tech/techlib.nsf/techdocs/C9CA0F26B56D6DC5862576320081F434/$file/TS7K500_OEM_Specification_R14.pdf" target="_blank">specs</a> on page 135 you can find that if Advanced Power Management is enabled, the deepest reachable power-saving mode depends on the Advanced Power Management level. In general the level is between 1 (power saving) and 254 (performance). If the level is 0 or 255, no power saving is done; if it is 1-127, the deepest mode is &#8220;Standby&#8221;; if it is 128-191, it&#8217;s &#8220;Low Power Idle&#8221;; and if it is 192-254, it&#8217;s &#8220;Active Idle&#8221;.</p>
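<p>As a rough sketch, the level ranges can be written down as a small shell function. The name <code class="codecolorer bash default"><span class="bash">apm_mode</span></code> is made up for illustration, and the boundaries are simply the ones quoted from the Hitachi specs for my drive; other drives may use different ones.</p>

```shell
#!/bin/sh
# Map an APM level (0-255) to the deepest power-saving mode, following the
# ranges quoted from the Travelstar 7K500 specs. Other drives may differ.
# Assumption: the argument is an integer.
apm_mode() {
    level=$1
    if [ "$level" -lt 0 ] || [ "$level" -gt 255 ]; then
        echo "invalid level: $level" >&2
        return 1
    elif [ "$level" -eq 0 ] || [ "$level" -eq 255 ]; then
        echo "no power saving"
    elif [ "$level" -le 127 ]; then
        echo "Standby"
    elif [ "$level" -le 191 ]; then
        echo "Low Power Idle"
    else
        echo "Active Idle"
    fi
}

apm_mode 128   # a level in the 128-191 range
apm_mode 192   # the first level of the 192-254 range
```

<p>The boundary between 191 and 192 is the interesting one here, since unloading the heads to the ramp is what Load_Cycle_Count counts.</p>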
<p>As we&#8217;ll find out in a second, the default value (to which Mac OS X sometimes seems to reset the drive) seems to be 128, so &#8220;Low Power Idle&#8221; mode. The three power-saving levels are explained in Section 12.6 &#8220;Advanced Power Management (Adaptive Battery Life Extender 3) Feature&#8221; of the specs. In short: &#8220;Active Idle&#8221; mode cuts down power consumption by 45-55%, the heads are parked near the mid-diameter of the disk, and recovering takes about 20 ms. In &#8220;Low Power Idle&#8221; mode power is cut down by 60-65%, the heads are unloaded to the ramp (this is the &#8220;parking&#8221; counted by Load_Cycle_Count), and recovering takes 300 ms. The transition into these modes is magically done internally by the HD (it observes what&#8217;s going on and decides what to do next), taking into account the Advanced Power Management level. (&#8220;Standby&#8221; mode isn&#8217;t mentioned here, but it surely unloads the heads to the ramp, as it spins down the HD&#8230; recovery takes long, but that is unimportant here, as our problem is with &#8220;Low Power Idle&#8221; mode.)</p>
<p>So how do we find out which Advanced Power Management (APM) level our HDD uses?<br />
This doesn&#8217;t seem to be very easy in Mac OS X, as there&#8217;s nothing like <code class="codecolorer bash default"><span class="bash">hdparm</span></code> on Linux.<br />
There is the hdapm tool, but it can&#8217;t read the value; it can only set it. We&#8217;ll learn why this tool is necessary in a moment, but first let&#8217;s find out what the current value is.<br />
The easiest way to accomplish this was to throw in a Linux boot CD (Knoppix, Ubuntu, whatever you like), reboot, boot from the CD (hold down the &#8220;c&#8221; key), then fire up a terminal, become root (<code class="codecolorer bash default"><span class="bash"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #660033;">-i</span></span></code>) and check the current APM value:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">hdparm <span style="color: #660033;">-B</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>sda</div></div>
<p>For me it was 128.<br />
You can check the immediate effect from within the Live CD using <code class="codecolorer bash default"><span class="bash">smartctl <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>sda</span></code>. As before, the load cycle count kept increasing.</p>
<p>As I always handle my laptop with care and can live with 10% more power consumption by my HD, I decided to change the default. WARNING: This might not be suitable for you; it&#8217;s your decision.</p>
<p>To stop this rapid growth of the load cycles, I first tried to set the value to 191, but I could still observe a rapid increase.<br />
After setting the value to 192, it immediately stopped:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">hdparm <span style="color: #660033;">-B192</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>sda</div></div>
<p>Afterwards I rebooted. The Load_Cycle_Count increased by 1 over the reboot and did not increase any further during a couple of hours of runtime (without iTunes keeping my HD busy and draining my battery). To my surprise, the next day my logs showed that the load cycle count was increasing rapidly again. I rebooted back into Linux and found the value had been reset to 128. Weird. I set it to 192 again and rebooted; the count stopped increasing, but at some point I again found it increasing rapidly. Based on this I assume that Mac OS X or something else (like Windows run via Boot Camp) sometimes resets the value to 128. My first guess was that it is reset after resuming from sleep, but I couldn&#8217;t reproduce the reset that way. If someone finds out what causes it, let us know in the comments.</p>
<p>To overcome this problem it seems sufficient to have a tool which explicitly sets the APM level to something meaningful once during system startup. That&#8217;s where the <a href="http://mckinlay.net.nz/hdapm/">hdapm</a> tool comes in: you can download it from the given page and install it as described in the <a href="http://mckinlay.net.nz/hdapm/usage.html">user guide</a>.<br />
Afterwards edit <code class="codecolorer bash default"><span class="bash"><span style="color: #000000; font-weight: bold;">/</span>Library<span style="color: #000000; font-weight: bold;">/</span>LaunchDaemons<span style="color: #000000; font-weight: bold;">/</span>hdapm.plist</span></code> to set the correct APM value. As a reference my file looks like this:</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;height:400px;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;UTF-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span><br />
<span style="color: #00bbdd;">&lt;!DOCTYPE plist PUBLIC &quot;-//Apple//DTD PLIST 1.0//EN&quot; &quot;http://www.apple.com/DTDs/PropertyList-1.0.dtd&quot;&gt;</span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plist</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;dict<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Label<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>hdapm<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Disabled<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;false</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ProgramArguments<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;array<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>/usr/local/bin/hdapm<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>disk0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>192<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/array<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ServiceDescription<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Set ATA Advanced Power Management level<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/string<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>RunAtLoad<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;true</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>LaunchOnlyOnce<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/key<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;true</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/dict<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plist<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></div>
<p>Note that for other drives, especially those from other manufacturers, 192 might not be the right value. If you have found the correct values for other drives, you are welcome to share them (preferably with links to tech specs) in the comments.</p>
<p>Now, four months after discovering the problem, my load cycle count has increased by only about 500 (which is about the number of times I sent the Mac to standby). Isn&#8217;t that quite a figure compared to the 36500 within the first month? <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
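<p>To put the two load cycle figures side by side, here is a rough back-of-the-envelope sketch in Python. The day counts (30 days for the first month, 120 days for the four months) are assumptions for illustration, not measured values:</p>

```python
# Figures quoted in the post; the day counts are rough assumptions.
cycles_before_fix = 36500   # load cycles within the first month
days_before_fix = 30        # assumed length of "the first month"
cycles_after_fix = 500      # additional load cycles after the fix
days_after_fix = 4 * 30     # assumed length of "four months"

# Average load cycles per day before and after the fix.
rate_before = cycles_before_fix / days_before_fix
rate_after = cycles_after_fix / days_after_fix

# How many times lower the daily rate is after the fix.
reduction_factor = rate_before / rate_after

print("before: %.0f cycles/day" % rate_before)
print("after: %.1f cycles/day" % rate_after)
print("reduction: about %.0fx" % reduction_factor)
```

<p>Under these assumptions the rate drops from roughly 1200 load cycles per day to about 4 per day.</p>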
<p>Edit (Sep. 22, 2011): Revised my guess about sleep causing the reset. Thx to Sam.</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/09/16/mac-os-x-harddisk-high-load-cycle-counts/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Live mapping of tweets, facebook msgs, emails, sms&#8230;</title>
		<link>http://joernhees.de/blog/2011/09/16/live-mapping-of-tweets-facebook-emails-sms/</link>
		<comments>http://joernhees.de/blog/2011/09/16/live-mapping-of-tweets-facebook-emails-sms/#comments</comments>
		<pubDate>Fri, 16 Sep 2011 12:25:56 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analyzing]]></category>
		<category><![CDATA[disaster]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[live]]></category>
		<category><![CDATA[management]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[tweets]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=386</guid>
		<description><![CDATA[Reading the Wikimedia blog, I stumbled upon this interesting post. They mention a framework called Ushahidi (Swahili for &#8220;testimony&#8221;) with its subproject SwiftRiver, which can be used to track and verify the reliability of news concerning current trending topics, &#8230; <a href="http://joernhees.de/blog/2011/09/16/live-mapping-of-tweets-facebook-emails-sms/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Reading the Wikimedia blog, I stumbled upon <a href="http://blog.wikimedia.org/2011/09/13/ushahidi-to-track-breaking-news-trends-on-wikipedia/">this</a> interesting post. They mention a framework called <a href="http://ushahidi.com">Ushahidi</a> (Swahili for &#8220;testimony&#8221;) with its subproject SwiftRiver, which can be used to track and verify the reliability of news concerning current trending topics, possibly helping editors of Wikipedia to enhance its quality.</p>
<p>Digging into it, I found out that the framework is used for live mapping (collection, aggregation and visualization) of disaster- and event-related messages sent via all kinds of channels (e.g., Twitter, Facebook, email, SMS&#8230;). One example is the <a href="http://haiti.ushahidi.com">2010 Haiti earthquake</a>, where it helped to coordinate the search and rescue teams.</p>
<p>I find it quite fascinating how much people sitting at home in their living rooms might be able to help others in a disaster region, so I&#8217;d like to recommend this talk:<br />
<iframe width="640" height="360" src="http://www.youtube.com/embed/Hh_PiVqf8BA" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/09/16/live-mapping-of-tweets-facebook-emails-sms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LaTeX Thesis Skeleton</title>
		<link>http://joernhees.de/blog/2011/03/08/latexthesis-skeleton/</link>
		<comments>http://joernhees.de/blog/2011/03/08/latexthesis-skeleton/#comments</comments>
		<pubDate>Tue, 08 Mar 2011 15:11:01 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bachelor]]></category>
		<category><![CDATA[latex]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[skeleton]]></category>
		<category><![CDATA[template]]></category>
		<category><![CDATA[thesis]]></category>
		<category><![CDATA[uni-kl]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=346</guid>
		<description><![CDATA[As it might be useful for other students (especially for computer science students at the University of Kaiserslautern), I decided to invest some time and create a skeleton for a thesis. The project can be found on github: http://github.com/joernhees/thesis-skeleton. I&#8217;ll &#8230; <a href="http://joernhees.de/blog/2011/03/08/latexthesis-skeleton/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As it might be useful for other students (especially for computer science students at the University of Kaiserslautern), I decided to invest some time and create a skeleton for a thesis.</p>
<p>The project can be found on github: <a href="http://github.com/joernhees/thesis-skeleton">http://github.com/joernhees/thesis-skeleton</a>.<br />
I&#8217;ll happily include / pull changes.</p>
<p>Quick instructions to get started with your thesis:</p>
<ol>
<li>Make sure you have git; otherwise install it (e.g., on Ubuntu: <code class="codecolorer bash default"><span class="bash"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">aptitude</span> <span style="color: #c20cb9; font-weight: bold;">install</span> git-core</span></code>)</li>
<li>Run this:
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #c20cb9; font-weight: bold;">git clone</span> git:<span style="color: #000000; font-weight: bold;">//</span>github.com<span style="color: #000000; font-weight: bold;">/</span>joernhees<span style="color: #000000; font-weight: bold;">/</span>thesis-skeleton.git myMasterThesis</div></div>
<p>This creates a directory called <code class="codecolorer bash default"><span class="bash">myMasterThesis</span></code> in the current directory; it is itself a git repository and contains a thesis directory.</li>
<li>Enter it and have a look at thesis.pdf</li>
<li>Insert your name, title, supervisors, etc. in thesis.tex.</li>
<li><a href="http://book.git-scm.com">Get familiar with git</a>, <a href="http://book.git-scm.com/3_normal_workflow.html">this</a> is a good start.</li>
</ol>
<p>That&#8217;s it.</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/03/08/latexthesis-skeleton/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Interesting analysis of a post&#8217;s life cycle</title>
		<link>http://joernhees.de/blog/2011/01/24/interesting-analysis-of-a-posts-life-cycle/</link>
		<comments>http://joernhees.de/blog/2011/01/24/interesting-analysis-of-a-posts-life-cycle/#comments</comments>
		<pubDate>Mon, 24 Jan 2011 16:05:34 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[lifecycle]]></category>
		<category><![CDATA[links]]></category>
		<category><![CDATA[post]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=342</guid>
		<description><![CDATA[Corte.si did it again. This time a very interesting analysis of what happens when he posts on his blog and tweets about it. Most interesting: the number of bots that accessed his page just seconds after he published it, where &#8230; <a href="http://joernhees.de/blog/2011/01/24/interesting-analysis-of-a-posts-life-cycle/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Corte.si did it again.</p>
<p>This time <a href="http://corte.si/posts/socialmedia/post-lifecycle/index.html" target="_blank">a very interesting analysis of what happens when he posts on his blog and tweets about it</a>.</p>
<p>Most interesting: the number of bots that accessed his page just seconds after he published it, where the bulk of the human readers came from, and how the number of subscribers changed.</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/01/24/interesting-analysis-of-a-posts-life-cycle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BetterRelations (beta): some updates</title>
		<link>http://joernhees.de/blog/2011/01/13/betterrelations-some-updates/</link>
		<comments>http://joernhees.de/blog/2011/01/13/betterrelations-some-updates/#comments</comments>
		<pubDate>Thu, 13 Jan 2011 21:16:17 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[LODgames]]></category>
		<category><![CDATA[BetterRelations]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[linked open data]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=337</guid>
		<description><![CDATA[Well, in a hopefully final coding &#8220;flash&#8221; tonight, I added some frequently requested features, most importantly a &#8220;can&#8217;t decide&#8221; button: Enjoy (also see the first post)]]></description>
			<content:encoded><![CDATA[<p>Well, in a hopefully final coding &#8220;flash&#8221; tonight, I added some frequently requested features, most importantly a &#8220;can&#8217;t decide&#8221; button:</p>
<div id="attachment_338" class="wp-caption aligncenter" style="width: 652px"><a href="http://lodgames.kl.dfki.de/betterRelations/"><img class="size-full wp-image-338 " title="The BetterRelations Game in Action (click to play)" src="http://joernhees.de/blog/wp-content/uploads/2011/01/screenshot_betterRelations_inRoundBarack1.png" alt="" width="642" height="351" /></a><p class="wp-caption-text">The BetterRelations Game in action (click to play)</p></div>
<p>Enjoy <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>(also see <a href="http://joernhees.de/blog/2011/01/12/introducing-betterrelations/">the first post</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/01/13/betterrelations-some-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing: BetterRelations &#8211; a Game with a Purpose</title>
		<link>http://joernhees.de/blog/2011/01/12/introducing-betterrelations/</link>
		<comments>http://joernhees.de/blog/2011/01/12/introducing-betterrelations/#comments</comments>
		<pubDate>Wed, 12 Jan 2011 11:43:38 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[LODgames]]></category>
		<category><![CDATA[beta]]></category>
		<category><![CDATA[BetterRelations]]></category>
		<category><![CDATA[gwap]]></category>
		<category><![CDATA[linked data]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=324</guid>
		<description><![CDATA[As many of you know, I&#8217;m developing a game called BetterRelations for my master&#8217;s thesis. It is now available: BetterRelations (alpha) The game collects pairwise user preferences, which are then used to rate Linked Data triples by &#8220;Importance&#8221;. It would be cool &#8230; <a href="http://joernhees.de/blog/2011/01/12/introducing-betterrelations/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As many of you know, I&#8217;m developing a game called BetterRelations for my master&#8217;s thesis. It is now available:</p>
<h1><strong><a href="http://lodgames.kl.dfki.de/betterRelations/" target="_blank">BetterRelations</a></strong> (alpha)</h1>
<div id="attachment_325" class="wp-caption aligncenter" style="width: 653px"><a href="http://lodgames.kl.dfki.de/betterRelations/" target="_blank"><img class="size-full wp-image-325  " title="The BetterRelations Game in action (click to play)" src="http://joernhees.de/blog/wp-content/uploads/2011/01/screenshot_betterRelations_inRoundBarack.png" alt="" width="643" height="347" /></a><p class="wp-caption-text">The BetterRelations Game in action (click to play)</p></div>
<p>The game collects pairwise user preferences, which are then used to rate Linked Data triples by &#8220;Importance&#8221;. It would be cool if you found time to play the game, maybe during your lunch break, and helped me collect the data for my thesis.</p>
<p>Feedback and bug reports are very welcome. If you know other interested players, feel free to forward the link or this post: the more people, the better <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>More to come, stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2011/01/12/introducing-betterrelations/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Python unicode doctest howto in a doctest</title>
		<link>http://joernhees.de/blog/2010/12/15/python-unicode-doctest-howto-in-a-doctest/</link>
		<comments>http://joernhees.de/blog/2010/12/15/python-unicode-doctest-howto-in-a-doctest/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 07:12:34 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[doctest]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[raw]]></category>
		<category><![CDATA[russian strings]]></category>
		<category><![CDATA[string]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=314</guid>
		<description><![CDATA[Another thing that has been on my stack for quite a while is a unicode doctest howto, as I remember being quite lost when I first tried to test encoding stuff in a doctest. So I thought the &#8230; <a href="http://joernhees.de/blog/2010/12/15/python-unicode-doctest-howto-in-a-doctest/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Another thing that has been on my stack for quite a while is a unicode doctest howto, as I remember being quite lost when I first tried to test encoding stuff in a doctest.<br />
So I thought the ultimate way to show how to do this would be in a doctest <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
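<p>The doctest below is written for Python 2. For comparison, here is a minimal Python 3 sketch (an addition, not part of the original module) of the same basic facts about the letter ä: its UTF-8 encoding is the two bytes c3 a4, and its Unicode code point is 228 (hex e4). The variable names are purely illustrative:</p>

```python
# Python 3 sketch: str holds Unicode text, bytes holds raw bytes.
text = "\u00e4"                      # the letter a-umlaut as text
utf8_bytes = text.encode("utf-8")    # two bytes: 0xc3 0xa4
code_point = ord(text)               # Unicode code point, hex e4

# Decoding with the right codec round-trips the text;
# decoding with the wrong codec yields two unrelated characters.
round_trip = utf8_bytes.decode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes), code_point, round_trip == text, len(mojibake))
```

<p>In Python 3 the two string types are called <code>bytes</code> and <code>str</code>, and they are never converted into each other implicitly, so the encoding always has to be named explicitly.</p>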
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #808080; font-style: italic;"># -*- coding: utf-8 -*-</span><br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> testDocTestUnicode<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; ur<span style="color: #483d8b;">&quot;&quot;&quot;Non ascii letters in doctests actually are tricky. The reason why<br />
&nbsp; &nbsp; &nbsp; &nbsp; things work here that usually don't (each marked with a #BAD!) is<br />
&nbsp; &nbsp; &nbsp; &nbsp; explained near the end of this doctest, but the essence is: we<br />
&nbsp; &nbsp; &nbsp; &nbsp; didn't only fix the encoding of this file, but also changed the default<br />
&nbsp; &nbsp; &nbsp; &nbsp; encoding via sys.setdefaultencoding, which you should never do.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; This file has a utf8 input encoding, which python is informed about by<br />
&nbsp; &nbsp; &nbsp; &nbsp; the first line: # -*- coding: utf-8 -*-. This means that for example an<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä is 2 bytes: 11000011 10100100 (hexval &quot;c3a4&quot;).<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; There are two types of strings in Python 2.x: &quot;&quot; aka byte strings and<br />
&nbsp; &nbsp; &nbsp; &nbsp; u&quot;&quot; aka unicode string. For these two types two different things happen<br />
&nbsp; &nbsp; &nbsp; &nbsp; when parsing a file:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; If python encounters a non ascii char in a byte string (e.g., &quot;ä&quot;) it<br />
&nbsp; &nbsp; &nbsp; &nbsp; will check if there's an input encoding given (yes, utf8) and then check<br />
&nbsp; &nbsp; &nbsp; &nbsp; if the 2 bytes ä is a valid utf-8 encoded char (yes it is). It will then<br />
&nbsp; &nbsp; &nbsp; &nbsp; simply keep the ä as its 2 byte utf-8 encoding in this byte-string<br />
&nbsp; &nbsp; &nbsp; &nbsp; internal representation. If you print it and you're lucky to have a utf8<br />
&nbsp; &nbsp; &nbsp; &nbsp; console you'll see an ä again. If you're not lucky and for example have<br />
&nbsp; &nbsp; &nbsp; &nbsp; a iso-8859-15 encoding on your console you'll see 2 strange chars<br />
&nbsp; &nbsp; &nbsp; &nbsp; (probably Ã€) instead. So python will simply write the byte-string to<br />
&nbsp; &nbsp; &nbsp; &nbsp; output.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print &quot;ä&quot; #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; If there was no encoding given, we'd get a SyntaxError: Non-ASCII<br />
&nbsp; &nbsp; &nbsp; &nbsp; character '<span style="color: #000099; font-weight: bold;">\x</span>c3' in file ..., which is the first byte of our 2 byte ä.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Where did the '<span style="color: #000099; font-weight: bold;">\x</span>c3' come from? Well, this is python's way of writing a<br />
&nbsp; &nbsp; &nbsp; &nbsp; non ascii byte to ascii output (which is always safe, so perfect for<br />
&nbsp; &nbsp; &nbsp; &nbsp; this error message): it will write a <span style="color: #000099; font-weight: bold;">\x</span> and then two hex chars for each<br />
&nbsp; &nbsp; &nbsp; &nbsp; byte. Python does the same if we call:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print repr(&quot;ä&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Or just<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; &quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; It also works the other way around, so you can give an arbitrary byte by<br />
&nbsp; &nbsp; &nbsp; &nbsp; using the same <span style="color: #000099; font-weight: bold;">\x</span>XX escape sequences:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print &quot;<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4&quot; #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Oh look, we hit the utf8 representation of an ä, what luck. You'll ask<br />
&nbsp; &nbsp; &nbsp; &nbsp; how do I then print &quot;<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4&quot; to my console? You can either double all<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;<span style="color: #000099; font-weight: bold;">\&quot;</span> or tell python it's a raw string:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print &quot;<span style="color: #000099; font-weight: bold;">\\</span>xc3<span style="color: #000099; font-weight: bold;">\\</span>xa4&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print r&quot;<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; If python encounters a unicode string in our document (e.g., u&quot;ä&quot;) it<br />
&nbsp; &nbsp; &nbsp; &nbsp; will use the specified file encoding to convert our 2 byte utf8 ä into a<br />
&nbsp; &nbsp; &nbsp; &nbsp; unicode string. This is the same as calling &quot;ä&quot;.decode(myFileEncoding):<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print u&quot;ä&quot; # BAD for another reason!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u&quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; u'<span style="color: #000099; font-weight: bold;">\x</span>e4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; &quot;ä&quot;.decode(&quot;utf-8&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; u'<span style="color: #000099; font-weight: bold;">\x</span>e4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Python's internal unicode representation of this string is never exposed<br />
&nbsp; &nbsp; &nbsp; &nbsp; to the user (it could be UTF-16 or 32 or anything else, anyone?).<br />
&nbsp; &nbsp; &nbsp; &nbsp; The hex e4 corresponds to 11100100, the unicode ord value of the char ä,<br />
&nbsp; &nbsp; &nbsp; &nbsp; which is decimal 228.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; ord(u'ä')<br />
&nbsp; &nbsp; &nbsp; &nbsp; 228<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; And the same again backwards, we can use the <span style="color: #000099; font-weight: bold;">\x</span>XX escaping to denote a<br />
&nbsp; &nbsp; &nbsp; &nbsp; hex unicode point or raw not to interpret such escaping:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print u&quot;<span style="color: #000099; font-weight: bold;">\x</span>e4&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print ur&quot;<span style="color: #000099; font-weight: bold;">\x</span>e4&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000099; font-weight: bold;">\x</span>e4<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Oh, noticed the difference? This time print did some magic. I told<br />
&nbsp; &nbsp; &nbsp; &nbsp; you, you'll never see python's internal representation of a unicode<br />
&nbsp; &nbsp; &nbsp; &nbsp; string. So whenever print receives a unicode string it will try to<br />
&nbsp; &nbsp; &nbsp; &nbsp; convert it to your output encoding (sys.stdout.encoding), which works in a<br />
&nbsp; &nbsp; &nbsp; &nbsp; terminal, but won't work if you're for example redirecting output to a<br />
&nbsp; &nbsp; &nbsp; &nbsp; file. In such cases you have to convert the string into the desired<br />
&nbsp; &nbsp; &nbsp; &nbsp; encoding explicitly:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u&quot;ä&quot;.encode(&quot;utf8&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print u&quot;ä&quot;.encode(&quot;utf8&quot;) #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; If that last line confused you a bit: We converted the unicode string<br />
&nbsp; &nbsp; &nbsp; &nbsp; to a byte-string, which was then simply copied byte-wise by print and<br />
&nbsp; &nbsp; &nbsp; &nbsp; voila, we got an ä.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; This all is done before the string even reaches doctest.<br />
&nbsp; &nbsp; &nbsp; &nbsp; So you might have written something like all the above in doctests,<br />
&nbsp; &nbsp; &nbsp; &nbsp; and probably saw them failing. In most cases you probably just <br />
&nbsp; &nbsp; &nbsp; &nbsp; forgot the ur'''prefix''', but sometimes you had it and were confused.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Well this is good, as all of the above #BAD! examples don't make much sense.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Bummer, right.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; The reason is: we made assumptions on the default encoding all over the<br />
&nbsp; &nbsp; &nbsp; &nbsp; place, which is not a thing you would ever want to do in production<br />
&nbsp; &nbsp; &nbsp; &nbsp; code. We did this by calling sys.setdefaultencoding(&quot;UTF-8&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; below. Without this you'll usually get unicode warnings like this one:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;UnicodeWarning: Unicode equal comparison failed to convert both<br />
&nbsp; &nbsp; &nbsp; &nbsp; arguments to Unicode - interpreting them as being unequal&quot;.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Just fire up a python interpreter (not pydev, as I noticed it seems to<br />
&nbsp; &nbsp; &nbsp; &nbsp; fiddle with the default setting).<br />
&nbsp; &nbsp; &nbsp; &nbsp; Try: u&quot;ä&quot; == &quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; You should get:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; arguments to Unicode - interpreting them as being unequal<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; False<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; This actually is very good, as it warns you that you're comparing some<br />
&nbsp; &nbsp; &nbsp; &nbsp; byte-string from whatever location (could be a file) to a unicode string.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Shall python guess the encoding? Silently? Probably a bad idea.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Now if you do the following in your python interpreter:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; import sys<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; reload(sys)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sys.setdefaultencoding(&quot;utf8&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u&quot;ä&quot; == &quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; You should get:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; No wonder, you explicitly told python to interpret the &quot;ä&quot; as utf8<br />
&nbsp; &nbsp; &nbsp; &nbsp; encoded when nothing else specified.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; So what's the problem in our docstrings again? We had these bad<br />
&nbsp; &nbsp; &nbsp; &nbsp; examples:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print &quot;ä&quot; #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print &quot;<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4&quot; #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print u&quot;ä&quot;.encode(&quot;utf8&quot;) #BAD!<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Well, we're in a ur'''docstring''' here, so what doctest does is: it<br />
&nbsp; &nbsp; &nbsp; &nbsp; takes the part after &gt;&gt;&gt; and exec(utes) it. There's one special feature<br />
&nbsp; &nbsp; &nbsp; &nbsp; of exec I wasn't aware of: if you pass a unicode string to it, it will<br />
&nbsp; &nbsp; &nbsp; &nbsp; convert the chars back to utf-8:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; exec u'print repr(&quot;ä&quot;)'<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; exec u'print repr(&quot;<span style="color: #000099; font-weight: bold;">\x</span>e4&quot;)'<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; This means that even though one might think that print &quot;ä&quot; in this<br />
&nbsp; &nbsp; &nbsp; &nbsp; unicode docstring will get print &quot;<span style="color: #000099; font-weight: bold;">\x</span>e4&quot;, it will print as if you wrote<br />
&nbsp; &nbsp; &nbsp; &nbsp; print &quot;ä&quot; outside of a unicode string, so as if you wrote print<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4&quot;. Let this twist your mind for a second. The doctest will<br />
&nbsp; &nbsp; &nbsp; &nbsp; execute as if there had been no conversion to a unicode string, which is<br />
&nbsp; &nbsp; &nbsp; &nbsp; what you want. But now comes the comparison. It will see what comes out<br />
&nbsp; &nbsp; &nbsp; &nbsp; of that and compare to the next line from this docstring, which now is a<br />
&nbsp; &nbsp; &nbsp; &nbsp; unicode &quot;ä&quot;, so <span style="color: #000099; font-weight: bold;">\x</span>e4. Hence we're now comparing u'<span style="color: #000099; font-weight: bold;">\x</span>e4' == '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'.<br />
&nbsp; &nbsp; &nbsp; &nbsp; If you didn't notice, this is the same we did in the python interpreter<br />
&nbsp; &nbsp; &nbsp; &nbsp; above: we were comparing u&quot;ä&quot; == &quot;ä&quot;. And again python tells us &quot;Hmm,<br />
&nbsp; &nbsp; &nbsp; &nbsp; I don't know, shall I guess how to convert &quot;ä&quot; to u&quot;ä&quot;? Probably not, so<br />
&nbsp; &nbsp; &nbsp; &nbsp; I'll evaluate to False.&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Summary:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Always specify the source encoding: # -*- coding: utf-8 -*-<br />
&nbsp; &nbsp; &nbsp; &nbsp; and _ALWAYS_, no excuse, use utf-8. Repeat it: I will never use<br />
&nbsp; &nbsp; &nbsp; &nbsp; iso-8859-x, latin-1 or anything else, I'll use UTF-8 so I can write<br />
&nbsp; &nbsp; &nbsp; &nbsp; Jörn and he can actually read his name once.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Use ur'''...''' surrounded docstrings (so a raw unicode docstring).<br />
&nbsp; &nbsp; &nbsp; &nbsp; The prefix order matters: Python 2 accepts ur'''...''', but ru'''...'''<br />
&nbsp; &nbsp; &nbsp; &nbsp; is a SyntaxError (and it always makes me think of Russian strings).<br />
&nbsp; &nbsp; &nbsp; &nbsp; Never compare a unicode string with a byte string. This means: don't<br />
&nbsp; &nbsp; &nbsp; &nbsp; mix u&quot;ä&quot; and &quot;ä&quot;, they're not the same. Also, the expected result line<br />
&nbsp; &nbsp; &nbsp; &nbsp; can only match unicode strings or plain ascii, no other encoding.<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; The following are bad comparisons, as they compare byte strings with<br />
&nbsp; &nbsp; &nbsp; &nbsp; unicode strings. They'll cause warnings and evaluate to False:<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; u&quot;ä&quot; == &quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; #False<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; &quot;ä&quot;.decode(&quot;utf8&quot;) == &quot;ä&quot; <br />
&nbsp; &nbsp; &nbsp; &nbsp; #False<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; print &quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; #ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; So finally a few working examples: &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; &quot;ä&quot; # if file encoding is utf8<br />
&nbsp; &nbsp; &nbsp; &nbsp; '<span style="color: #000099; font-weight: bold;">\x</span>c3<span style="color: #000099; font-weight: bold;">\x</span>a4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u&quot;ä&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; u'<span style="color: #000099; font-weight: bold;">\x</span>e4'<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Here both are unicode, so no problem. Nevertheless it is a bad idea to<br />
&nbsp; &nbsp; &nbsp; &nbsp; match the output of print, due to the print magic mentioned above. Also<br />
&nbsp; &nbsp; &nbsp; &nbsp; think about i18n: time formats, commas, dots, float precision, etc. <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; print u&quot;ä&quot; # unicode even after exec, no prob.<br />
&nbsp; &nbsp; &nbsp; &nbsp; ä<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Better:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; &quot;ä&quot; == &quot;ä&quot; # compares byte-strings<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u&quot;ä&quot;.encode(&quot;utf8&quot;) == &quot;ä&quot; # compares byte-strings<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u&quot;ä&quot; == u&quot;ä&quot; # compares unicode-strings<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; &quot;ä&quot;.decode(&quot;utf8&quot;) == u&quot;ä&quot; # compares unicode-strings<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &quot;&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">pass</span><br />
<br />
<br />
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ <span style="color: #66cc66;">==</span> <span style="color: #483d8b;">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span><br />
&nbsp; &nbsp; <span style="color: #008000;">reload</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #dc143c;">sys</span>.<span style="color: black;">setdefaultencoding</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;UTF-8&quot;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># DON'T DO THIS. READ THE ABOVE @UndefinedVariable</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">doctest</span><br />
&nbsp; &nbsp; <span style="color: #dc143c;">doctest</span>.<span style="color: black;">testmod</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div></div>
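<p>A minimal sketch of the same rule in Python 3 syntax (an assumption: the post itself targets Python 2, where byte strings are <code>str</code> and text is <code>unicode</code>; the variable names are only for illustration). Text and bytes never compare equal until one side is explicitly encoded or decoded:</p>

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: str is text (unicode), bytes is the encoded form.
text = "ä"                         # one code point, U+00E4
utf8_bytes = text.encode("utf-8")  # two bytes: 0xC3 0xA4

# Mixing the two types never matches; Python 3 does not guess an encoding.
mixed_equal = (text == utf8_bytes)

# Comparing like with like works once the conversion is explicit.
bytes_equal = (utf8_bytes == b"\xc3\xa4")
text_equal = (utf8_bytes.decode("utf-8") == "\xe4")

print(mixed_equal, bytes_equal, text_equal)  # False True True
```

<p>The point is the same as in the docstring above: decide on one type per comparison and convert explicitly, instead of relying on a default encoding.</p>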
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2010/12/15/python-unicode-doctest-howto-in-a-doctest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to restrict the length of a unicode string</title>
		<link>http://joernhees.de/blog/2010/12/14/how-to-restrict-the-length-of-a-unicode-string/</link>
		<comments>http://joernhees.de/blog/2010/12/14/how-to-restrict-the-length-of-a-unicode-string/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 15:21:06 +0000</pubDate>
		<dc:creator>joern</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf8]]></category>
		<category><![CDATA[utils]]></category>

		<guid isPermaLink="false">http://joernhees.de/blog/?p=297</guid>
		<description><![CDATA[Ha, not with me! It&#8217;s a pretty common tripwire: Imagine you have a unicode string and for whatever reason (which should be a good reason, so make sure you really need this) you need to make sure that its UTF-8 &#8230; <a href="http://joernhees.de/blog/2010/12/14/how-to-restrict-the-length-of-a-unicode-string/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Ha, not with me!<br />
It&#8217;s a pretty common tripwire: Imagine you have a unicode string and for whatever reason (which should be a good reason, so make sure you really need this) you need to make sure that its UTF-8 representation has at most maxsize bytes.<br />
The first and in this case worst attempt is probably <code class="codecolorer python default"><span class="python">unicodeStr<span style="color: black;">&#91;</span>:maxsize<span style="color: black;">&#93;</span></span></code>, as its <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> representation could be up to 4 times as long (UTF-8 needs 1 to 4 bytes per codepoint).<br />
The next, still flawed attempt could be this <code class="codecolorer python default"><span class="python"><span style="color: #008000;">unicode</span><span style="color: black;">&#40;</span>unicodeStr.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>:maxsize<span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> <span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span></span></code>: it could cut the multi-byte UTF-8 representation of a codepoint in half (example: <code class="codecolorer python default"><span class="python"><span style="color: #008000;">unicode</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">&quot;jörn&quot;</span>.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>:<span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> <span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span></span></code>). Luckily python will tell you by raising a UnicodeDecodeError.</p>
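<p>Both pitfalls can be reproduced in a few lines. This is a sketch in Python 3 syntax (an assumption, since the snippets in this post are Python 2); the variable names are made up for illustration:</p>

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: slicing text counts code points, not UTF-8 bytes.
name = "jörn"
encoded = name.encode("utf-8")

char_count = len(name)     # 4 code points
byte_count = len(encoded)  # 5 bytes, because "ö" needs two of them

# Cutting after 2 bytes splits the two-byte "ö" (0xC3 0xB6) in half,
# so strict decoding refuses the dangling lead byte.
try:
    encoded[:2].decode("utf-8")
    cut_failed = False
except UnicodeDecodeError:
    cut_failed = True

print(char_count, byte_count, cut_failed)  # 4 5 True
```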
<p>The last attempt actually wasn&#8217;t that wrong, as it only lacked the <code class="codecolorer python default"><span class="python">errors<span style="color: #66cc66;">=</span><span style="color: #483d8b;">&quot;ignore&quot;</span></span></code> flag:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #008000;">unicode</span><span style="color: black;">&#40;</span>myUnicodeStr.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>:maxsize<span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> <span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: #66cc66;">,</span> errors<span style="color: #66cc66;">=</span><span style="color: #483d8b;">&quot;ignore&quot;</span><span style="color: black;">&#41;</span></div></div>
<p>One might think we&#8217;re done now, but this depends on your <a href="http://en.wikipedia.org/wiki/Unicode_normalization">Unicode Normalization Form</a>: Unicode allows <a href="http://en.wikipedia.org/wiki/Combining_character">Combined Characters</a>, for example the precomposed <code class="codecolorer python default"><span class="python">u<span style="color: #483d8b;">&quot;ü&quot;</span></span></code> could be represented by the decomposed sequence <code class="codecolorer python default"><span class="python">u<span style="color: #483d8b;">&quot;u&quot;</span></span></code> and <code class="codecolorer python default"><span class="python">u<span style="color: #483d8b;">&quot;¨&quot;</span></span></code> (see <a href="http://en.wikipedia.org/wiki/Unicode_normalization">Unicode Normalization</a>).<br />
In my case I know that my unicode strings are in Unicode Normalization Form C (NFC) (at least the <a href="http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal">RDF Literal Specs</a> say so). This means that if there is a precomposed char, it will be used. Nevertheless, Unicode potentially allows for combined characters which do not have a precomposed canonical equivalent. In this case not even normalizing would help: multiple unicode chars would remain, leading to multiple multi-byte UTF-8 chars. Here I&#8217;m unsure what the universal solution is&#8230; if such a <code class="codecolorer python default"><span class="python">u<span style="color: #483d8b;">&quot;ü&quot;</span></span></code> gets split, is it better to keep a <code class="codecolorer python default"><span class="python">u<span style="color: #483d8b;">&quot;u&quot;</span></span></code> or nothing? You have to decide.<br />
I decided to keep the &#8220;u&#8221; in the hopefully very rare case this occurs.<br />
So use the following with care:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:640px;height:400px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">def</span> truncateUTF8length<span style="color: black;">&#40;</span>unicodeStr<span style="color: #66cc66;">,</span> maxsize<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; ur<span style="color: #483d8b;">&quot;&quot;&quot; This method can be used to truncate the length of a given unicode<br />
&nbsp; &nbsp; &nbsp; &nbsp; string such that the corresponding utf-8 string won't exceed<br />
&nbsp; &nbsp; &nbsp; &nbsp; maxsize bytes. It will take care of multi-byte utf-8 chars intersecting<br />
&nbsp; &nbsp; &nbsp; &nbsp; with the maxsize limit: either the whole char fits or it will be<br />
&nbsp; &nbsp; &nbsp; &nbsp; truncated completely. Make sure that unicodeStr is in Unicode<br />
&nbsp; &nbsp; &nbsp; &nbsp; Normalization Form C (NFC), else strange things can happen as<br />
&nbsp; &nbsp; &nbsp; &nbsp; mentioned in the examples below.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Returns a unicode string, so if you need it encoded as utf-8, call<br />
&nbsp; &nbsp; &nbsp; &nbsp; .encode(&quot;utf-8&quot;) after calling this method.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u&quot;ö&quot;, 2) == u&quot;ö&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u&quot;ö&quot;, 1) == u&quot;&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; u'<span style="color: #000099; font-weight: bold;">\u</span>1ebf'.encode('utf-8') == '<span style="color: #000099; font-weight: bold;">\x</span>e1<span style="color: #000099; font-weight: bold;">\x</span>ba<span style="color: #000099; font-weight: bold;">\x</span>bf'<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u'hi<span style="color: #000099; font-weight: bold;">\u</span>1ebf', 2) == u&quot;hi&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u'hi<span style="color: #000099; font-weight: bold;">\u</span>1ebf', 3) == u&quot;hi&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u'hi<span style="color: #000099; font-weight: bold;">\u</span>1ebf', 4) == u&quot;hi&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; truncateUTF8length(u'hi<span style="color: #000099; font-weight: bold;">\u</span>1ebf', 5) == u&quot;hi<span style="color: #000099; font-weight: bold;">\u</span>1ebf&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; True<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; Make sure the unicodeStr is in NFC (see unicodedata.normalize(&quot;NFC&quot;, ...) ).<br />
&nbsp; &nbsp; &nbsp; &nbsp; The following would not be true, as e and u'<span style="color: #000099; font-weight: bold;">\u</span>0301' would be separate<br />
&nbsp; &nbsp; &nbsp; &nbsp; unicode chars. This could be handled with unicodedata.combining<br />
&nbsp; &nbsp; &nbsp; &nbsp; and a loop deleting chars from the end until after the first non<br />
&nbsp; &nbsp; &nbsp; &nbsp; combining char, but this is _not_ done here!<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; u'e<span style="color: #000099; font-weight: bold;">\u</span>0301'.encode('utf-8') == 'e<span style="color: #000099; font-weight: bold;">\x</span>cc<span style="color: #000099; font-weight: bold;">\x</span>81'<br />
&nbsp; &nbsp; &nbsp; &nbsp; #True<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; truncateUTF8length(u'e<span style="color: #000099; font-weight: bold;">\u</span>0301', 0) == u&quot;&quot; # not in NFC (u'<span style="color: #000099; font-weight: bold;">\x</span>e9'), but in NFD<br />
&nbsp; &nbsp; &nbsp; &nbsp; #True<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; truncateUTF8length(u'e<span style="color: #000099; font-weight: bold;">\u</span>0301', 1) == u&quot;&quot; #decodes to utf-8: <br />
&nbsp; &nbsp; &nbsp; &nbsp; #True<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; truncateUTF8length(u'e<span style="color: #000099; font-weight: bold;">\u</span>0301', 2) == u&quot;&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; #True<br />
&nbsp; &nbsp; &nbsp; &nbsp; #&gt;&gt;&gt; truncateUTF8length(u'e<span style="color: #000099; font-weight: bold;">\u</span>0301', 3) == u&quot;e<span style="color: #000099; font-weight: bold;">\u</span>0301&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; #True<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">unicode</span><span style="color: black;">&#40;</span>unicodeStr.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>:maxsize<span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> <span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: #66cc66;">,</span> errors<span style="color: #66cc66;">=</span><span style="color: #483d8b;">&quot;ignore&quot;</span><span style="color: black;">&#41;</span></div></div>
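<p>The docstring mentions that a split combining sequence could be handled with <code>unicodedata.combining</code> and a loop. A minimal sketch of that idea follows, in Python 3 syntax (an assumption, as the function above is Python 2). The helper name <code>truncate_utf8_keep_clusters</code> is hypothetical, it assumes a non-negative <code>maxsize</code>, and it makes the opposite choice to the one above: if a base char would lose its combining marks, the base char is dropped too.</p>

```python
import unicodedata


def truncate_utf8_keep_clusters(text, maxsize):
    """Hypothetical helper: truncate so the UTF-8 form fits in maxsize bytes
    and never leave a base char without its trailing combining marks."""
    text = unicodedata.normalize("NFC", text)
    raw = text.encode("utf-8")
    if len(raw) == len(raw[:maxsize]):
        return text  # nothing was cut
    # errors="ignore" drops a partially cut multi-byte sequence at the end.
    kept = raw[:maxsize].decode("utf-8", errors="ignore")
    # kept is a prefix of text, so text[len(kept)] is the first dropped char.
    if unicodedata.combining(text[len(kept)]):
        # Remove marks of the same cluster that still fit, then the base char.
        while kept and unicodedata.combining(kept[-1]):
            kept = kept[:-1]
        kept = kept[:-1]
    return kept


# "q" plus U+0301 has no precomposed form, so NFC leaves it as two chars.
print(repr(truncate_utf8_keep_clusters("hiq\u0301", 4)))  # 'hi'
```

<p>Whether dropping the whole cluster is better than keeping the bare base char depends on your data, so treat this as one possible policy rather than the right one.</p>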
<p>Unicode and UTF-8 are nice, but if you don&#8217;t pay attention they will leave a lot of sleeping bugs in your code. And yes, I&#8217;d probably care less if there was no &#8220;ö&#8221; in my name <img src='http://joernhees.de/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>PS: Günther, this is SFW. :p</p>
]]></content:encoded>
			<wfw:commentRss>http://joernhees.de/blog/2010/12/14/how-to-restrict-the-length-of-a-unicode-string/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

