<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[...and then it crashed]]></title>
  <link href="https://blog.etianen.com/atom.xml" rel="self"/>
  <link href="https://blog.etianen.com/"/>
  <updated>2014-01-19T17:19:48+00:00</updated>
  <id>https://blog.etianen.com/</id>
  <author>
    <name><![CDATA[Dave Hall]]></name>
    
  </author>
  <generator uri="https://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Don't use Gunicorn to host your Django sites on Heroku]]></title>
    <link href="https://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django/"/>
    <updated>2014-01-19T17:30:00+00:00</updated>
    <id>https://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django</id>
    <content type="html"><![CDATA[<p><a href="https://gunicorn.org/">Gunicorn</a> is a pure-Python HTTP server that&rsquo;s widely used for deploying Django (and other Python) sites in production. <a href="https://www.heroku.com/">Heroku</a> is an excellent Platform as a Service (PaaS) provider that will host any Python HTTP application, and recommends using Gunicorn to power your apps.</p>

<p><em>Unfortunately, the process model of Gunicorn makes it unsuitable for running production Python sites on Heroku.</em></p>

<h2>Gunicorn is designed to be used behind a buffering reverse proxy</h2>

<p>Gunicorn uses a pre-forking process model by default. This means that network requests are handed off to a pool of worker processes, and that these worker processes take care of reading the entire HTTP request from the client and writing the entire response back. If the client has a fast network connection, the entire request/response cycle takes a fraction of a second. However, if the client is slow (or <a href="https://ha.ckers.org/slowloris/">deliberately misbehaving</a>), the request can take much longer to complete.</p>

<p>Because Gunicorn has a relatively small (2x CPU cores) pool of workers, it can only handle a small number of concurrent requests. If all the worker processes become tied up waiting for network traffic, the entire server will become unresponsive. To the outside world, your web application will cease to exist.</p>

<p>For this reason, Gunicorn <a href="https://docs.gunicorn.org/en/latest/deploy.html">strongly recommends</a> that it be used behind a buffering reverse proxy, like <a href="https://wiki.nginx.org/Main">Nginx</a>. This means that the entire request and response will be buffered, protecting Gunicorn from delays caused by a slow network.</p>

<p>However, while Heroku does provide <a href="https://devcenter.heroku.com/articles/http-routing#request-buffering">limited request/response buffering</a>, large file uploads/downloads can still bypass the buffer. This means that your site is still trivially vulnerable to accidental (or deliberate) Denial of Service (DoS) attacks.</p>
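
<p>To make the worker-starvation argument concrete, here is a minimal sketch of a fixed worker pool serving queued requests. It is a simplified model, not Gunicorn&rsquo;s actual scheduler: the <code>finish_times</code> helper, the pool size of four and the millisecond timings are all assumptions chosen for illustration.</p>

```python
import heapq


def finish_times(durations_ms, workers):
    """Return when each queued request finishes on a fixed pool of workers.

    Requests are served in arrival order; each one occupies a worker for
    its whole duration, including time spent waiting on a slow client.
    """
    free_at = [0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    finished = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# Hypothetical workload: four slow clients (30 s each) arrive just
# before one fast request that needs only 100 ms.
unbuffered = finish_times([30000] * 4 + [100], workers=4)

# Assumed effect of a buffering proxy: every request holds a worker
# for only the 100 ms the application itself needs.
buffered = finish_times([100] * 5, workers=4)

print(unbuffered[-1], buffered[-1])
```

<p>In this model the fast request is stuck behind the slow clients for about 30 seconds without buffering, but completes in 200 ms once the slow network transfer is kept away from the workers.</p>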

<h2>The Waitress HTTP server protects you from slow network clients</h2>

<p><a href="https://pylons.readthedocs.org/projects/waitress/en/latest/">Waitress</a> is a pure-Python HTTP server that supports <a href="https://pylons.readthedocs.org/projects/waitress/en/latest/design.html">request and response buffering</a>, using in-memory and temporary file buffers to completely shield your Python application from slow network clients.</p>

<p>Waitress can be installed in your Heroku app using <a href="https://www.pip-installer.org/en/latest/">pip</a>:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>pip install waitress
</span><span class='line'><span class="nv">$ </span>pip freeze &gt; requirements.txt
</span></code></pre></td></tr></table></div></figure>


<p>And then added to your Procfile like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>web: waitress-serve --port<span class="o">=</span><span class="nv">$PORT</span> <span class="o">{</span>project_name<span class="o">}</span>.wsgi:application
</span></code></pre></td></tr></table></div></figure>
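
<p>If you would rather start the server from Python than from the command line, Waitress also provides a <code>serve()</code> function. The sketch below is an illustration under assumptions rather than a replacement for the Procfile entry: <code>hello_app</code> is a hypothetical stand-in for your project&rsquo;s WSGI application, the fallback port of 8000 is made up, and <code>threads=4</code> is just an explicit pool size picked for the example.</p>

```python
import os


def hello_app(environ, start_response):
    """Hypothetical stand-in for your project's WSGI application."""
    body = b"Hello from Waitress\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def main():
    # Imported here so the module can be loaded without Waitress installed.
    from waitress import serve

    # Heroku passes the port to bind to in the PORT environment variable.
    port = int(os.environ.get("PORT", "8000"))
    # threads sets the size of the fixed worker pool (an example value).
    serve(hello_app, host="0.0.0.0", port=port, threads=4)
```

<p>Calling <code>main()</code> starts the server and blocks until the process is stopped.</p>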


<h2>Why not use Gunicorn async workers?</h2>

<p>The Gunicorn docs suggest using an alternative <a href="https://docs.gunicorn.org/en/latest/design.html?highlight=async#choosing-a-worker-type">async worker class</a> when serving requests directly to the internet. This avoids the problem of slow network clients by allowing thousands of asynchronous HTTP requests to be processed in parallel.</p>

<p>Unfortunately, this approach introduces a different problem. The Django ORM will open a separate database connection for each request, quickly leading to thousands of simultaneous database connections being created. On the cheaper <a href="https://www.heroku.com/postgres">Heroku Postgres</a> plans, this can easily cause requests to fail due to refused database connections.</p>

<p>By using a fixed pool of worker threads, Waitress makes it much easier to control the number of database connections being opened by Django, while still protecting you against slow network traffic.</p>

<h2>Check out django-herokuapp on GitHub</h2>

<p>For an easy quickstart, and a more in-depth guide to running Django apps on Heroku, please check out the <a href="https://github.com/etianen/django-herokuapp">django-herokuapp project</a> on GitHub.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Working with unicode streams in Python]]></title>
    <link href="https://blog.etianen.com/blog/2013/10/05/python-unicode-streams/"/>
    <updated>2013-10-05T15:00:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/10/05/python-unicode-streams</id>
    <content type="html"><![CDATA[<p>When working with unicode in Python, the standard approach is to use the <code>str.decode()</code> and <code>unicode.encode()</code> methods to convert whole strings between the builtin <code>unicode</code> and <code>str</code> types.</p>

<p>As an example, here&rsquo;s a simple way to load the contents of a <em>utf-16</em> file, remove all <em>vertical tab</em> codepoints, and write it out as <em>utf-8</em>. (This can be important when working with broken XML.)</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="c"># Load the file contents.</span>
</span><span class='line'><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;input.txt&quot;</span><span class="p">,</span> <span class="s">&quot;rb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="nb">input</span><span class="p">:</span>
</span><span class='line'>    <span class="n">data</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Decode binary data as utf-16.</span>
</span><span class='line'><span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s">&quot;utf-16&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Remove vertical tabs.</span>
</span><span class='line'><span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">u&quot;</span><span class="se">\u000B</span><span class="s">&quot;</span><span class="p">,</span> <span class="s">u&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Encode unicode data as utf-8.</span>
</span><span class='line'><span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s">&quot;utf-8&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Write the data as utf-8.</span>
</span><span class='line'><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;output.txt&quot;</span><span class="p">,</span> <span class="s">&quot;wb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">output</span><span class="p">:</span>
</span><span class='line'>    <span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>This approach works just fine unless you have to deal with really big files. At that point, loading all the data into RAM becomes a problem.</p>

<h2>Using a streaming encoder/decoder</h2>

<p>The Python standard library includes the <code>codecs</code> module, which allows you to incrementally move through a file, loading only a small chunk of unicode data into memory at a time.</p>

<p>The simplest way is to modify the above example to use the <code>codecs.open()</code> helper.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="kn">import</span> <span class="nn">codecs</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Open both input and output streams.</span>
</span><span class='line'><span class="nb">input</span> <span class="o">=</span> <span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s">&quot;input.txt&quot;</span><span class="p">,</span> <span class="s">&quot;rb&quot;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-16&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">output</span> <span class="o">=</span> <span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s">&quot;output.txt&quot;</span><span class="p">,</span> <span class="s">&quot;wb&quot;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-8&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Stream chunks of unicode data.</span>
</span><span class='line'><span class="k">with</span> <span class="nb">input</span><span class="p">,</span> <span class="n">output</span><span class="p">:</span>
</span><span class='line'>    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
</span><span class='line'>        <span class="c"># Read a chunk of data.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">4096</span><span class="p">)</span>
</span><span class='line'>        <span class="k">if</span> <span class="ow">not</span> <span class="n">chunk</span><span class="p">:</span>
</span><span class='line'>            <span class="k">break</span>
</span><span class='line'>        <span class="c"># Remove vertical tabs.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="n">chunk</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">u&quot;</span><span class="se">\u000B</span><span class="s">&quot;</span><span class="p">,</span> <span class="s">u&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Write the chunk of data.</span>
</span><span class='line'>        <span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Files are horrible&hellip; let&rsquo;s use iterators!</h2>

<p>Dealing with files can get tedious. For complex processing tasks, it can be nice to just deal with iterators of unicode data.</p>

<p>Here&rsquo;s an efficient way to read an iterator of unicode chunks from a file using <code>iterdecode()</code>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
</span><span class='line'><span class="kn">from</span> <span class="nn">codecs</span> <span class="kn">import</span> <span class="n">iterdecode</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Returns an iterator of unicode chunks from the given path.</span>
</span><span class='line'><span class="k">def</span> <span class="nf">iter_unicode_chunks</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">encoding</span><span class="p">):</span>
</span><span class='line'>    <span class="c"># Open the input file.</span>
</span><span class='line'>    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s">&quot;rb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="nb">input</span><span class="p">:</span>
</span><span class='line'>        <span class="c"># Convert the binary file into binary chunks.</span>
</span><span class='line'>        <span class="n">binary_chunks</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">partial</span><span class="p">(</span><span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">,</span> <span class="mi">4096</span><span class="p">),</span> <span class="s">b&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Convert the binary chunks into unicode chunks.</span>
</span><span class='line'>        <span class="k">for</span> <span class="n">unicode_chunk</span> <span class="ow">in</span> <span class="n">iterdecode</span><span class="p">(</span><span class="n">binary_chunks</span><span class="p">,</span> <span class="n">encoding</span><span class="p">):</span>
</span><span class='line'>            <span class="k">yield</span> <span class="n">unicode_chunk</span>
</span></code></pre></td></tr></table></div></figure>


<p>Here&rsquo;s how to write an iterator of unicode chunks to a file using <code>iterencode()</code>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="kn">from</span> <span class="nn">codecs</span> <span class="kn">import</span> <span class="n">iterencode</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Writes an iterator of unicode chunks to the given path.</span>
</span><span class='line'><span class="k">def</span> <span class="nf">write_unicode_chunks</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">unicode_chunks</span><span class="p">,</span> <span class="n">encoding</span><span class="p">):</span>
</span><span class='line'>    <span class="c"># Open the output file.</span>
</span><span class='line'>    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s">&quot;wb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">output</span><span class="p">:</span>
</span><span class='line'>        <span class="c"># Convert the unicode chunks to binary.</span>
</span><span class='line'>        <span class="k">for</span> <span class="n">binary_chunk</span> <span class="ow">in</span> <span class="n">iterencode</span><span class="p">(</span><span class="n">unicode_chunks</span><span class="p">,</span> <span class="n">encoding</span><span class="p">):</span>
</span><span class='line'>            <span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">binary_chunk</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Using these two functions, removing all <em>vertical tab</em> codepoints from a stream of unicode data just becomes a case of plumbing everything together.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="c"># Load the unicode chunks from the file.</span>
</span><span class='line'><span class="n">unicode_chunks</span> <span class="o">=</span> <span class="n">iter_unicode_chunks</span><span class="p">(</span><span class="s">&quot;input.txt&quot;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-16&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Modify the unicode chunks.</span>
</span><span class='line'><span class="n">unicode_chunks</span> <span class="o">=</span> <span class="p">(</span>
</span><span class='line'>    <span class="n">chunk</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">u&quot;</span><span class="se">\u000B</span><span class="s">&quot;</span><span class="p">,</span> <span class="s">u&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">chunk</span>
</span><span class='line'>    <span class="ow">in</span> <span class="n">unicode_chunks</span>
</span><span class='line'><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Write the chunks to a file.</span>
</span><span class='line'><span class="n">write_unicode_chunks</span><span class="p">(</span><span class="s">&quot;output.txt&quot;</span><span class="p">,</span> <span class="n">unicode_chunks</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-8&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Why even bother with the <code>codecs</code> module?</h2>

<p>It might seem simpler to just read binary chunks from a regular <code>file</code> object, decoding and re-encoding each chunk using the standard <code>str.decode()</code> and <code>unicode.encode()</code> methods like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="c"># BAD IDEA! DON&#39;T DO IT THIS WAY!</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Open both input and output streams.</span>
</span><span class='line'><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;input.txt&quot;</span><span class="p">,</span> <span class="s">&quot;rb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="nb">input</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;output.txt&quot;</span><span class="p">,</span> <span class="s">&quot;wb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">output</span><span class="p">:</span>
</span><span class='line'>    <span class="c"># Iterate over chunks of binary data.</span>
</span><span class='line'>    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
</span><span class='line'>        <span class="c"># Read a chunk of data.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">4096</span><span class="p">)</span>
</span><span class='line'>        <span class="k">if</span> <span class="ow">not</span> <span class="n">chunk</span><span class="p">:</span>
</span><span class='line'>            <span class="k">break</span>
</span><span class='line'>        <span class="c"># UNSAFE: Decode binary data as utf-16.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="n">chunk</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s">&quot;utf-16&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Remove vertical tabs.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="n">chunk</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">u&quot;</span><span class="se">\u000B</span><span class="s">&quot;</span><span class="p">,</span> <span class="s">u&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Encode unicode data as utf-8.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="n">chunk</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s">&quot;utf-8&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Write the chunk of data.</span>
</span><span class='line'>        <span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Unfortunately, some unicode codepoints are encoded as more than one byte of binary data. Simply reading a chunk of bytes from a file and passing it to <code>decode()</code> can result in an unexpected <code>UnicodeDecodeError</code> if your chunk happens to split up a multi-byte codepoint.</p>

<p>Using the tools in <code>codecs</code> will help keep you safe from unpredictable crashes in production!</p>
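
<p>As a small illustration of that failure mode, the sketch below splits a UTF-8 byte string in the middle of a two-byte codepoint, then decodes the same pieces with <code>codecs.getincrementaldecoder()</code>, which keeps the incomplete bytes until the rest arrives. The sample string and the split position are assumptions picked for the demo, and the snippet is written to run on both Python 2.7 and Python 3.</p>

```python
import codecs

# Sample text ending in a codepoint that needs two bytes in UTF-8.
text = u"caf\u00e9"
data = text.encode("utf-8")

# Split the bytes so the final codepoint is cut in half.
first, second = data[:4], data[4:]

# Decoding the first piece on its own fails.
try:
    first.decode("utf-8")
    split_failed = False
except UnicodeDecodeError:
    split_failed = True

# An incremental decoder buffers the dangling byte between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [
    decoder.decode(first),
    decoder.decode(second),
    decoder.decode(b"", final=True),  # flush; raises if bytes are left over
]
result = u"".join(pieces)

print(split_failed, result == text)
```

<p>The naive call raises <code>UnicodeDecodeError</code>, while the incremental decoder reassembles the original text regardless of where the chunk boundary falls.</p>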

<h2>What about Python 3?</h2>

<p>Python 3 makes working with unicode files a lot easier. The built-in function <code>open()</code> contains all the functionality you need to easily modify unicode data and switch between encodings.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='py3'><span class='line'><span class="c"># Open both input and output streams.</span>
</span><span class='line'><span class="nb">input</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;input.txt&quot;</span><span class="p">,</span> <span class="s">&quot;rt&quot;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-16&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">output</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;output.txt&quot;</span><span class="p">,</span> <span class="s">&quot;wt&quot;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&quot;utf-8&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># Stream chunks of unicode data.</span>
</span><span class='line'><span class="k">with</span> <span class="nb">input</span><span class="p">,</span> <span class="n">output</span><span class="p">:</span>
</span><span class='line'>    <span class="k">while</span> <span class="k">True</span><span class="p">:</span>
</span><span class='line'>        <span class="c"># Read a chunk of data.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">4096</span><span class="p">)</span>
</span><span class='line'>        <span class="k">if</span> <span class="ow">not</span> <span class="n">chunk</span><span class="p">:</span>
</span><span class='line'>            <span class="k">break</span>
</span><span class='line'>        <span class="c"># Remove vertical tabs.</span>
</span><span class='line'>        <span class="n">chunk</span> <span class="o">=</span> <span class="n">chunk</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\u000B</span><span class="s">&quot;</span><span class="p">,</span> <span class="s">&quot;&quot;</span><span class="p">)</span>
</span><span class='line'>        <span class="c"># Write the chunk of data.</span>
</span><span class='line'>        <span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
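
<p>The iterator style from earlier carries over to Python 3 as well. The following is a minimal sketch, assuming Python 3 only: the helper names <code>iter_text_chunks</code>, <code>write_text_chunks</code> and <code>strip_vertical_tabs</code> are hypothetical, and <code>newline=""</code> is passed so that line endings are left untouched.</p>

```python
from functools import partial


def iter_text_chunks(path, encoding, size=4096):
    """Yield decoded text chunks of up to `size` characters from a file."""
    with open(path, "rt", encoding=encoding, newline="") as source:
        # read() returns "" at end of file, which stops the iterator.
        for chunk in iter(partial(source.read, size), ""):
            yield chunk


def write_text_chunks(path, chunks, encoding):
    """Encode an iterable of text chunks and write them to a file."""
    with open(path, "wt", encoding=encoding, newline="") as target:
        for chunk in chunks:
            target.write(chunk)


def strip_vertical_tabs(src, dst):
    """Copy a utf-16 file to utf-8, dropping vertical tab codepoints."""
    chunks = iter_text_chunks(src, "utf-16")
    cleaned = (chunk.replace("\u000B", "") for chunk in chunks)
    write_text_chunks(dst, cleaned, "utf-8")
```

<p>Because the file object decodes incrementally, a chunk boundary never lands in the middle of a codepoint, and only one chunk is held in memory at a time.</p>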


<p>Python 3 rules! Happy coding!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using Django querysets effectively]]></title>
    <link href="https://blog.etianen.com/blog/2013/06/08/django-querysets/"/>
    <updated>2013-06-08T17:00:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/06/08/django-querysets</id>
    <content type="html"><![CDATA[<p>Object Relational Mapping (ORM) systems make interacting with an SQL database much easier, but have a reputation for being inefficient and slower than raw SQL.</p>

<p>Using an ORM effectively means understanding a little about how it queries the database. In this post, I&rsquo;ll highlight ways of efficiently using the <a href="https://docs.djangoproject.com/en/dev/topics/db/models/">Django ORM</a> system for medium and huge datasets.</p>

<h2>Django querysets are lazy</h2>

<p>A queryset in Django represents a number of rows in the database, optionally filtered by a query. For example, the following code represents all people in the database whose first name is &lsquo;Dave&rsquo;:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">person_set</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">first_name</span><span class="o">=</span><span class="s">&quot;Dave&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>The above code doesn&rsquo;t run any database queries. You can take the <code>person_set</code> and apply additional filters, or pass it to a function, and nothing will be sent to the database. This is good, because querying the database is one of the things that significantly slows down web applications.</p>

<p>To fetch the data from the database, you need to iterate over the queryset:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="k">for</span> <span class="n">person</span> <span class="ow">in</span> <span class="n">person_set</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="n">person</span><span class="o">.</span><span class="n">last_name</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Django querysets have a cache</h2>

<p>The moment you start iterating over a queryset, all the rows matched by the queryset are fetched from the database and converted into Django models. This is called <em>evaluation</em>. These models are then stored by the queryset&rsquo;s built-in cache, so that if you iterate over the queryset again, you don&rsquo;t end up running the same query twice.</p>

<p>For example, the following code will only execute one database query:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">pet_set</span> <span class="o">=</span> <span class="n">Pet</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">species</span><span class="o">=</span><span class="s">&quot;Dog&quot;</span><span class="p">)</span>
</span><span class='line'><span class="c"># The query is executed and cached.</span>
</span><span class='line'><span class="k">for</span> <span class="n">pet</span> <span class="ow">in</span> <span class="n">pet_set</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="n">pet</span><span class="o">.</span><span class="n">first_name</span><span class="p">)</span>
</span><span class='line'><span class="c"># The cache is used for subsequent iteration.</span>
</span><span class='line'><span class="k">for</span> <span class="n">pet</span> <span class="ow">in</span> <span class="n">pet_set</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="n">pet</span><span class="o">.</span><span class="n">last_name</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<h2><code>if</code> statements trigger queryset evaluation</h2>

<p>The most useful thing about the queryset cache is that it allows you to efficiently test if your queryset contains rows, and then only iterate over them if at least one row was found:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">restaurant_set</span> <span class="o">=</span> <span class="n">Restaurant</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">cuisine</span><span class="o">=</span><span class="s">&quot;Indian&quot;</span><span class="p">)</span>
</span><span class='line'><span class="c"># The `if` statement evaluates the queryset.</span>
</span><span class='line'><span class="k">if</span> <span class="n">restaurant_set</span><span class="p">:</span>
</span><span class='line'>    <span class="c"># The cache is used for subsequent iteration.</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">restaurant</span> <span class="ow">in</span> <span class="n">restaurant_set</span><span class="p">:</span>
</span><span class='line'>        <span class="k">print</span><span class="p">(</span><span class="n">restaurant</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
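
<p>The behaviour described so far (lazy creation, evaluation on first use, and a cache for later use) can be modelled in a few lines of plain Python. This is only a toy sketch to make the idea concrete, not Django&rsquo;s implementation: the <code>ToyQuerySet</code> class, its attributes and the restaurant names are all hypothetical, and the &ldquo;database&rdquo; is just a list.</p>

```python
class ToyQuerySet:
    """A toy stand-in for a queryset: lazy, with a result cache."""

    def __init__(self, rows):
        self._rows = rows          # Pretend these rows live in the database.
        self._result_cache = None  # Nothing has been fetched yet.
        self.query_count = 0       # How many times we "hit the database".

    def _fetch_all(self):
        # Only run the (pretend) query if the cache is still empty.
        if self._result_cache is None:
            self.query_count += 1
            self._result_cache = list(self._rows)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __bool__(self):
        # An `if` statement evaluates the whole queryset, then checks
        # whether any rows came back.
        self._fetch_all()
        return bool(self._result_cache)


restaurant_set = ToyQuerySet(["Curry Leaf", "Spice Garden"])
queries_before = restaurant_set.query_count  # Creating the set runs nothing.

names = []
if restaurant_set:                 # First use: one query, cache populated.
    for name in restaurant_set:    # Served from the cache.
        names.append(name)
    for name in restaurant_set:    # Served from the cache again.
        names.append(name)

queries_after = restaurant_set.query_count
```

<p>In this model the <code>if</code> statement and both loops share a single &ldquo;query&rdquo;, which is the same trade-off the real queryset cache makes: fewer queries, at the cost of holding every row in memory.</p>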


<h2>The queryset cache is a problem if you don&rsquo;t need all the results</h2>

<p>Sometimes, rather than iterating over results, you just want to see if at least one result exists. In that case, simply using an <code>if</code> statement on the queryset will still fully evaluate the queryset and populate its cache, even if you never plan on using those results!</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">city_set</span> <span class="o">=</span> <span class="n">City</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">&quot;Cambridge&quot;</span><span class="p">)</span>
</span><span class='line'><span class="c"># The `if` statement evaluates the queryset.</span>
</span><span class='line'><span class="k">if</span> <span class="n">city_set</span><span class="p">:</span>
</span><span class='line'>    <span class="c"># We don&#39;t need the results of the queryset here, but the</span>
</span><span class='line'>    <span class="c"># ORM still fetched all the rows!</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="s">&quot;At least one city called Cambridge still stands!&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>



<p>To avoid this, use the <code>exists()</code> method to check whether at least one matching row was found:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">tree_set</span> <span class="o">=</span> <span class="n">Tree</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s">&quot;deciduous&quot;</span><span class="p">)</span>
</span><span class='line'><span class="c"># The `exists()` check avoids populating the queryset cache.</span>
</span><span class='line'><span class="k">if</span> <span class="n">tree_set</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
</span><span class='line'>    <span class="c"># No rows were fetched from the database, so we save on</span>
</span><span class='line'>    <span class="c"># bandwidth and memory.</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="s">&quot;There are still hardwood trees in the world!&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
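
<p>To see why an existence check can be so much cheaper than fetching rows, here is a rough sketch using Python&rsquo;s built-in <code>sqlite3</code> module rather than Django. The <code>tree</code> table and the <code>any_rows()</code> helper are made up for illustration, and the exact SQL that Django generates for <code>exists()</code> depends on the version and database backend; this only shows the general shape of such a query.</p>

```python
import sqlite3

# An in-memory database with a hypothetical `tree` table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tree (name TEXT, type TEXT)")
connection.executemany(
    "INSERT INTO tree (name, type) VALUES (?, ?)",
    [("Oak", "deciduous"), ("Beech", "deciduous"), ("Pine", "evergreen")],
)


def any_rows(connection, tree_type):
    """Return True if at least one matching row exists."""
    # LIMIT 1 lets the database stop at the first match, and selecting
    # the constant 1 avoids transferring any column data.
    cursor = connection.execute(
        "SELECT 1 FROM tree WHERE type = ? LIMIT 1", (tree_type,)
    )
    return cursor.fetchone() is not None


has_deciduous = any_rows(connection, "deciduous")
has_tropical = any_rows(connection, "tropical")
```

<p>However many rows match, at most one tiny row crosses the wire, and no model instances need to be built.</p>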


<h2>The queryset cache is a problem if your queryset is huge</h2>

<p>If you&rsquo;re dealing with thousands of rows of data, fetching them all into memory at once can be very wasteful. Even worse, huge querysets can lock up server processes, causing your entire web application to grind to a halt.</p>

<p>To avoid populating the queryset cache, but to still iterate over all your results, use the <code>iterator()</code> method to fetch the data in chunks, and throw away old rows when they&rsquo;ve been processed.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">star_set</span> <span class="o">=</span> <span class="n">Star</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
</span><span class='line'><span class="c"># The `iterator()` method ensures only a few rows are fetched from</span>
</span><span class='line'><span class="c"># the database at a time, saving memory.</span>
</span><span class='line'><span class="k">for</span> <span class="n">star</span> <span class="ow">in</span> <span class="n">star_set</span><span class="o">.</span><span class="n">iterator</span><span class="p">():</span>
</span><span class='line'>    <span class="k">print</span><span class="p">(</span><span class="n">star</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Of course, using the <code>iterator()</code> method to avoid populating the queryset cache means that iterating over the same queryset again will execute another query. So use <code>iterator()</code> with caution, and make sure that your code is organised to avoid repeated evaluation of the same huge queryset.</p>
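
<p>The idea of fetching in chunks and discarding old rows can be sketched at the database-driver level with Python&rsquo;s built-in <code>sqlite3</code> module. This is an illustration of the general technique, not of how Django implements <code>iterator()</code>; the <code>star</code> table, the <code>iter_rows()</code> helper and the chunk size of 4 are assumptions made for the example.</p>

```python
import sqlite3

# An in-memory database with a hypothetical `star` table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE star (name TEXT)")
connection.executemany(
    "INSERT INTO star (name) VALUES (?)",
    [("Star %d" % i,) for i in range(10)],
)


def iter_rows(cursor, chunk_size=4):
    """Yield rows one at a time, fetching them in small chunks."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        for row in chunk:
            yield row
        # `chunk` is replaced on the next pass, so the rows that have
        # already been processed can be garbage-collected.


cursor = connection.execute("SELECT name FROM star ORDER BY rowid")
count = 0
for (name,) in iter_rows(cursor):
    count += 1

# Nothing was cached, so the exhausted cursor has no more rows to give.
# Iterating again would need a fresh query.
leftover = list(iter_rows(cursor))
```

<p>Only one chunk is held in memory at a time, but a second pass over the data means a second trip to the database.</p>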

<h2><code>if</code> statements are a problem if your queryset is huge</h2>

<p>As shown previously, the queryset cache is great for combining an <code>if</code> statement with a <code>for</code> statement, allowing conditional iteration over a queryset. For huge querysets, however, populating the queryset cache is not an option.</p>

<p>The simplest solution is to combine <code>exists()</code> with <code>iterator()</code>, avoiding populating the queryset cache at the expense of running two database queries.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">molecule_set</span> <span class="o">=</span> <span class="n">Molecule</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
</span><span class='line'><span class="c"># One database query to test if any rows exist.</span>
</span><span class='line'><span class="k">if</span> <span class="n">molecule_set</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
</span><span class='line'>    <span class="c"># Another database query to start fetching the rows in batches.</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">molecule</span> <span class="ow">in</span> <span class="n">molecule_set</span><span class="o">.</span><span class="n">iterator</span><span class="p">():</span>
</span><span class='line'>        <span class="k">print</span><span class="p">(</span><span class="n">molecule</span><span class="o">.</span><span class="n">velocity</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>A more complex solution is to make use of Python&rsquo;s <a href="https://docs.python.org/2/library/itertools.html">advanced iteration methods</a> to take a peek at the first item in the <code>iterator()</code> before deciding whether to continue iteration.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">atom_set</span> <span class="o">=</span> <span class="n">Atom</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
</span><span class='line'><span class="c"># One database query to start fetching the rows in batches.</span>
</span><span class='line'><span class="n">atom_iterator</span> <span class="o">=</span> <span class="n">atom_set</span><span class="o">.</span><span class="n">iterator</span><span class="p">()</span>
</span><span class='line'><span class="c"># Peek at the first item in the iterator.</span>
</span><span class='line'><span class="k">try</span><span class="p">:</span>
</span><span class='line'>    <span class="n">first_atom</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">atom_iterator</span><span class="p">)</span>
</span><span class='line'><span class="k">except</span> <span class="ne">StopIteration</span><span class="p">:</span>
</span><span class='line'>    <span class="c"># No rows were found, so do nothing.</span>
</span><span class='line'>    <span class="k">pass</span>
</span><span class='line'><span class="k">else</span><span class="p">:</span>
</span><span class='line'>    <span class="c"># At least one row was found, so iterate over</span>
</span><span class='line'>    <span class="c"># all the rows, including the first one.</span>
</span><span class='line'>    <span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">chain</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">atom</span> <span class="ow">in</span> <span class="n">chain</span><span class="p">([</span><span class="n">first_atom</span><span class="p">],</span> <span class="n">atom_iterator</span><span class="p">):</span>
</span><span class='line'>        <span class="k">print</span><span class="p">(</span><span class="n">atom</span><span class="o">.</span><span class="n">mass</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
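
<p>If you need this pattern in more than one place, the peek-then-chain logic can be wrapped in a small helper. The following is a minimal sketch in plain Python: the <code>peek()</code> function and the <code>fake_rows()</code> generator are hypothetical names, and the generator simply stands in for the result of <code>iterator()</code>.</p>

```python
from itertools import chain


def peek(iterable):
    """Return (is_empty, iterator) without losing the first item."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        # Nothing to iterate over: hand back an empty iterator.
        return True, iter(())
    # Put the first item back in front of the remaining items.
    return False, chain([first], iterator)


def fake_rows():
    # Stands in for a queryset's iterator(): rows arrive one at a time.
    for mass in (1.008, 4.0026, 6.94):
        yield mass


is_empty, rows = peek(fake_rows())
masses = [] if is_empty else list(rows)

nothing_found, _ = peek(iter([]))
```

<p>Because the helper only ever consumes one item before deciding, it still needs just a single pass over the underlying data.</p>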


<h2>Beware of naive optimisation</h2>

<p>The queryset cache exists in order to reduce the number of database queries made by your application, and under normal usage will ensure that your database is only queried when necessary.</p>

<p>Using the <code>exists()</code> and <code>iterator()</code> methods allows you to optimise the memory usage of your application. However, because they don&rsquo;t populate the queryset cache, they can lead to extra database queries.</p>

<p>So code carefully, and if things start to slow down, take a look at the bottlenecks in your code, and see if a little queryset optimisation might help things along.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Javascript object inheritance isn't complicated]]></title>
    <link href="https://blog.etianen.com/blog/2013/05/26/javascript-inheritance/"/>
    <updated>2013-05-26T20:49:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/05/26/javascript-inheritance</id>
    <content type="html"><![CDATA[<p>Javascript&rsquo;s method of object inheritance causes a lot of confusion, even amongst experienced programmers. This is largely because it doesn&rsquo;t follow the <a href="https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)">classical inheritance</a> pattern found in many other popular programming languages, such as Java, PHP, Python and Ruby.</p>

<p>Instead, Javascript uses a <a href="https://en.wikipedia.org/wiki/Prototype-based_programming">prototype inheritance</a> pattern, which is a little different. To confuse matters, many frameworks attempt to &ldquo;fix&rdquo; Javascript inheritance by making it work more like classical inheritance. The end result is a mess.</p>

<p>Thankfully, Javascript inheritance is actually pretty easy!</p>

<h2>Defining a new class</h2>

<p>Let&rsquo;s define a new class, <code>Animal</code>. Animals have a name, an age, and can make a noise.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='js'><span class='line'><span class="c1">// Define the Animal constructor.</span>
</span><span class='line'><span class="kd">function</span> <span class="nx">Animal</span><span class="p">(</span><span class="nx">name</span><span class="p">,</span> <span class="nx">age</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="k">this</span><span class="p">.</span><span class="nx">name</span> <span class="o">=</span> <span class="nx">name</span><span class="p">;</span>
</span><span class='line'>    <span class="k">this</span><span class="p">.</span><span class="nx">age</span> <span class="o">=</span> <span class="nx">age</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Define the makeNoise method.</span>
</span><span class='line'><span class="nx">Animal</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">makeNoise</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span><span class='line'>    <span class="k">return</span> <span class="s2">&quot;Snuffle&quot;</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>You can instantiate an <code>Animal</code> like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='js'><span class='line'><span class="kd">var</span> <span class="nx">animal</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Animal</span><span class="p">(</span><span class="s2">&quot;Fluffy&quot;</span><span class="p">,</span> <span class="mi">5</span><span class="p">);</span>
</span><span class='line'><span class="nx">animal</span><span class="p">.</span><span class="nx">makeNoise</span><span class="p">();</span>  <span class="c1">// =&gt; &quot;Snuffle&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Inheriting from this class</h2>

<p>Let&rsquo;s now make a <code>Dog</code>, which is like an animal, but also has a breed, and makes a different noise.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='js'><span class='line'><span class="kd">function</span> <span class="nx">Dog</span><span class="p">(</span><span class="nx">name</span><span class="p">,</span> <span class="nx">age</span><span class="p">,</span> <span class="nx">breed</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="c1">// Call the parent constructor.</span>
</span><span class='line'>    <span class="nx">Animal</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="nx">name</span><span class="p">,</span> <span class="nx">age</span><span class="p">);</span>
</span><span class='line'>    <span class="c1">// Add dog-specific constructor logic.</span>
</span><span class='line'>    <span class="k">this</span><span class="p">.</span><span class="nx">breed</span> <span class="o">=</span> <span class="nx">breed</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Extend the Animal class.</span>
</span><span class='line'><span class="nx">Dog</span><span class="p">.</span><span class="nx">prototype</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">create</span><span class="p">(</span><span class="nx">Animal</span><span class="p">.</span><span class="nx">prototype</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Extend the makeNoise method.</span>
</span><span class='line'><span class="nx">Dog</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">makeNoise</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">parentNoise</span> <span class="o">=</span> <span class="nx">Animal</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">makeNoise</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="k">this</span><span class="p">);</span>
</span><span class='line'>    <span class="k">return</span> <span class="nx">parentNoise</span> <span class="o">+</span> <span class="s2">&quot;... Woof!&quot;</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>You can then instantiate a <code>Dog</code> like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='js'><span class='line'><span class="kd">var</span> <span class="nx">dog</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Dog</span><span class="p">(</span><span class="s2">&quot;Spot&quot;</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="s2">&quot;Golden Retriever&quot;</span><span class="p">);</span>
</span><span class='line'><span class="nx">dog</span><span class="p">.</span><span class="nx">makeNoise</span><span class="p">();</span>  <span class="c1">// =&gt; &quot;Snuffle... Woof!&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>What about private methods and properties?</h2>

<p>While it&rsquo;s possible to implement something similar to private methods and properties, it probably isn&rsquo;t worth the time and effort (and performance penalty) of using them. Simply prefixing non-public methods and properties with an underscore is a good way of indicating that they&rsquo;re not part of the public API.</p>

<h2>What about interfaces?</h2>

<p>Interfaces can be useful, but Javascript doesn&rsquo;t support them. In any case, they would add to the download size of your code.</p>

<h2>What about <code>prototype.constructor</code>?</h2>

<p>If your code depends on <code>prototype.constructor</code> being set, then you can use the following helper method instead of calling <code>Object.create()</code> directly.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='js'><span class='line'><span class="kd">function</span> <span class="nx">inherits</span><span class="p">(</span><span class="nx">child</span><span class="p">,</span> <span class="nx">parent</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="nx">child</span><span class="p">.</span><span class="nx">prototype</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">create</span><span class="p">(</span><span class="nx">parent</span><span class="p">.</span><span class="nx">prototype</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">constructor</span><span class="o">:</span> <span class="p">{</span>
</span><span class='line'>            <span class="nx">value</span><span class="o">:</span> <span class="nx">child</span><span class="p">,</span>
</span><span class='line'>            <span class="nx">enumerable</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
</span><span class='line'>            <span class="nx">writable</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
</span><span class='line'>            <span class="nx">configurable</span><span class="o">:</span> <span class="kc">true</span>
</span><span class='line'>        <span class="p">}</span>
</span><span class='line'>    <span class="p">});</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>In reality, <code>prototype.constructor</code> isn&rsquo;t very useful, so it&rsquo;s probably best to just call <code>Object.create()</code> directly.</p>

<h2>Using inheritance in your code</h2>

<p>Javascript code has an unfortunate habit of turning into a mess of nested callbacks and copy-and-paste logic. Defining a hierarchy of helper classes is just one of the many techniques that allow you to write modular, maintainable code.</p>

<p>For example, consider building:</p>

<ul>
<li><strong>A hierarchy of Javascript form validation classes.</strong> These could all inherit from a base <code>Validator</code> class that implements basic checks for required fields. Subclasses could provide integer validation, date validation, password length validation, etc.</li>
<li><strong>A set of models that map to server-side database objects.</strong> These could all inherit from a common <code>Model</code> class that contains the HTTP synchronization logic.</li>
<li><strong>A related set of <a href="https://docs.angularjs.org/guide/dev_guide.services.creating_services">AngularJS services</a></strong>. Data-driven services could inherit from a base <code>Service</code> class that provides logic for updating data within bound scopes.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Embedding HTML5 video is still pretty hard]]></title>
    <link href="https://blog.etianen.com/blog/2013/05/19/html5-video-is-hard/"/>
    <updated>2013-05-19T09:28:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/05/19/html5-video-is-hard</id>
    <content type="html"><![CDATA[<p>In the early days of the HTML5 movement, I wrote the first major cross-browser compatibility shim for HTML5 <code>&lt;video&gt;</code> and <code>&lt;audio&gt;</code> tags. It was called <a href="https://html5media.info">html5media.js</a>.</p>

<p>At the time, I assumed that the shim would be obsolete within a few years, just as soon as major browsers adopted a common standard and video codec. Unfortunately, the shim is still used by hundreds of thousands of people each day, and embedding video is just as confusing as ever.</p>

<h2>So how do I embed video in my site?</h2>

<p>Please, just save yourself a headache, and host your video on <a href="https://youtube.com">YouTube</a>, <a href="https://vimeo.com">Vimeo</a>, or some other third-party service. They employ some very clever people who&rsquo;ve solved all the problems with embedding video.</p>

<h2>Haha&hellip; no, really. How do I embed video in my site?</h2>

<p>Take a deep breath. In order to embed video in your site, there are four major groups of people you need to keep happy:</p>

<ol>
<li>Modern browsers using commercial codecs (Chrome, Safari, IE9+)</li>
<li>Modern browsers using open-source codecs (Firefox, Opera)</li>
<li>Legacy browsers (IE8)</li>
<li>Under-powered mobile devices (iPhone 3GS, cheap Android)</li>
</ol>


<p>For the rest of this post, I&rsquo;ll take you through the steps required to allow an increasing number of people to watch your video.</p>

<h2>Embedding video for modern browsers with commercial codecs</h2>

<p>The simplest video embed code you can possibly use is as follows:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='html'><span class='line'><span class="cp">&lt;!DOCTYPE html&gt;</span>
</span><span class='line'><span class="nt">&lt;html&gt;</span>
</span><span class='line'>    <span class="nt">&lt;body&gt;</span>
</span><span class='line'>        <span class="nt">&lt;video</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span> <span class="na">width=</span><span class="s">640</span> <span class="na">height=</span><span class="s">360</span> <span class="na">controls</span><span class="nt">&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/body&gt;</span>
</span><span class='line'><span class="nt">&lt;/html&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p>Congratulations! Your video will now play in:</p>

<ul>
<li>Chrome</li>
<li>Safari (inc. Mobile Safari on iPhone 4+)</li>
<li>IE9+</li>
</ul>


<h2>Adding support for legacy browsers</h2>

<p>In order to make your video work in legacy browsers, you need to add a script tag to the <code>&lt;head&gt;</code> of your document. This script, the venerable <a href="https://html5media.info">html5media.js</a>, will provide a Flash video player fallback for legacy browsers.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='html'><span class='line'><span class="cp">&lt;!DOCTYPE html&gt;</span>
</span><span class='line'><span class="nt">&lt;html&gt;</span>
</span><span class='line'>    <span class="nt">&lt;head&gt;</span>
</span><span class='line'>        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">&quot;https://api.html5media.info/1.1.5/html5media.min.js&quot;</span><span class="nt">&gt;&lt;/script&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/head&gt;</span>
</span><span class='line'>    <span class="nt">&lt;body&gt;</span>
</span><span class='line'>        <span class="nt">&lt;video</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span> <span class="na">width=</span><span class="s">640</span> <span class="na">height=</span><span class="s">360</span> <span class="na">controls</span><span class="nt">&gt;&lt;/video&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/body&gt;</span>
</span><span class='line'><span class="nt">&lt;/html&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p><strong>Note</strong>: The syntax of the <code>&lt;video&gt;</code> tag has changed to include an explicit closing tag, to avoid confusing older browsers.</p>

<p>Fantastic! Your video will now play in:</p>

<ul>
<li>Chrome</li>
<li>Safari (inc. Mobile Safari on iPhone 4+)</li>
<li>IE9+</li>
<li>IE8 (via Flash)</li>
<li>Firefox (via Flash)</li>
<li>Opera (via Flash)</li>
</ul>


<p>At this point, the vast majority of internet users will be able to play your video. The only people who&rsquo;ll be left out will be:</p>

<ul>
<li>Firefox or Opera users without Flash</li>
<li>Owners of under-powered mobile devices.</li>
</ul>


<h2>Adding Flash-free support for modern browsers with open-source codecs</h2>

<p>To allow Firefox and Opera users to view your video using their native players, you need to transcode your video into an open-source format, and embed both files in your page. I&rsquo;d recommend using the free <a href="https://www.mirovideoconverter.com/">Miro Video Converter</a> to transcode your video to <em>WebM</em> format. You can then embed it using the following code:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='html'><span class='line'><span class="cp">&lt;!DOCTYPE html&gt;</span>
</span><span class='line'><span class="nt">&lt;html&gt;</span>
</span><span class='line'>    <span class="nt">&lt;head&gt;</span>
</span><span class='line'>        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">&quot;https://api.html5media.info/1.1.5/html5media.min.js&quot;</span><span class="nt">&gt;&lt;/script&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/head&gt;</span>
</span><span class='line'>    <span class="nt">&lt;body&gt;</span>
</span><span class='line'>        <span class="nt">&lt;video</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span> <span class="na">width=</span><span class="s">640</span> <span class="na">height=</span><span class="s">360</span> <span class="na">controls</span><span class="nt">&gt;</span>
</span><span class='line'>            <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span><span class="nt">&gt;&lt;/source&gt;</span>
</span><span class='line'>            <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">&quot;video.webm&quot;</span><span class="nt">&gt;&lt;/source&gt;</span>
</span><span class='line'>        <span class="nt">&lt;/video&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/body&gt;</span>
</span><span class='line'><span class="nt">&lt;/html&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p><strong>Note</strong>: We&rsquo;re adding explicit closing tags to <code>&lt;source&gt;</code> elements to avoid confusing legacy browsers.</p>

<p>Unbelievable! Now your video will play in:</p>

<ul>
<li>Chrome</li>
<li>Safari (inc. Mobile Safari on iPhone 4+)</li>
<li>IE9+</li>
<li>IE8 (via Flash)</li>
<li>Firefox <del>(via Flash)</del></li>
<li>Opera <del>(via Flash)</del></li>
</ul>


<p>It&rsquo;s just the owners of under-powered mobile devices who&rsquo;ll struggle to play your video now.</p>

<h2>Adding support for under-powered mobile devices</h2>

<p>The latest mobile devices support high-resolution video, but cheap Android phones and the iPhone 3GS will refuse to play anything higher-resolution than about 320 x 180 pixels. To keep these devices happy, you need to transcode your video to this lower resolution. <a href="https://www.mirovideoconverter.com/">Miro Video Converter</a> has a built-in iPhone 3GS setting, so just use that.</p>

<p>Now you can embed your video using the following code:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='html'><span class='line'><span class="cp">&lt;!DOCTYPE html&gt;</span>
</span><span class='line'><span class="nt">&lt;html&gt;</span>
</span><span class='line'>    <span class="nt">&lt;head&gt;</span>
</span><span class='line'>        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">&quot;https://api.html5media.info/1.1.5/html5media.min.js&quot;</span><span class="nt">&gt;&lt;/script&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/head&gt;</span>
</span><span class='line'>    <span class="nt">&lt;body&gt;</span>
</span><span class='line'>        <span class="nt">&lt;video</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span> <span class="na">width=</span><span class="s">640</span> <span class="na">height=</span><span class="s">360</span> <span class="na">controls</span><span class="nt">&gt;</span>
</span><span class='line'>            <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">&quot;video.mp4&quot;</span> <span class="na">media=</span><span class="s">&quot;only screen and (min-device-width: 568px)&quot;</span><span class="nt">&gt;&lt;/source&gt;</span>
</span><span class='line'>            <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">&quot;video-low.mp4&quot;</span> <span class="na">media=</span><span class="s">&quot;only screen and (max-device-width: 568px)&quot;</span><span class="nt">&gt;&lt;/source&gt;</span>
</span><span class='line'>            <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">&quot;video.webm&quot;</span><span class="nt">&gt;&lt;/source&gt;</span>
</span><span class='line'>        <span class="nt">&lt;/video&gt;</span>
</span><span class='line'>    <span class="nt">&lt;/body&gt;</span>
</span><span class='line'><span class="nt">&lt;/html&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p>OMG! What a monster! But now everyone will be able to play your video!</p>

<ul>
<li>Chrome</li>
<li>Safari (inc. Mobile Safari on iPhone 4+)</li>
<li>IE9+</li>
<li>IE8 (via Flash)</li>
<li>Firefox <del>(via Flash)</del></li>
<li>Opera <del>(via Flash)</del></li>
<li>Mobile Safari (iPhone 3GS)</li>
<li>Android Browser (inc. cheap Android phones)</li>
</ul>


<h2>Help! My video still isn&rsquo;t playing!</h2>

<p>The most common causes of problems are:</p>

<ul>
<li>Video encoding errors.</li>
<li>Incorrect server configuration.</li>
</ul>


<p>There&rsquo;s a page full of troubleshooting information on the <a href="https://github.com/etianen/html5media/wiki/embedding-video">html5media video hosting wiki</a>. Your problem is almost certainly covered there.</p>

<h2>I want to customize the player UI, and make it look consistent across all browsers!</h2>

<p>Ahahahahahahahaha!</p>

<p>Ahahahaha!</p>

<p><strong><em>No.</em></strong></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Git makes writing code easier!]]></title>
    <link href="https://blog.etianen.com/blog/2013/05/11/git-makes-writing-code-easier/"/>
    <updated>2013-05-11T14:35:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/05/11/git-makes-writing-code-easier</id>
    <content type="html"><![CDATA[<p>Git is a distributed version control system that&rsquo;s quite tricky to get into, and it can be hard to justify spending time getting to know it.</p>

<p>By the end of this post, I hope you&rsquo;ll be in a better position to use Git on <em>all</em> your software projects, and understand the benefits of doing so.</p>

<h2>Git is easy to start using.</h2>

<p>Installing Git on your system is simply a case of selecting the correct <a href="https://git-scm.com/downloads">Git installer</a>, and downloading the software onto your computer. Once you&rsquo;ve got Git installed, setting up a Git repository for your software project is as simple as typing the following commands into a terminal:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span><span class="nb">cd</span> /path/to/your/project
</span><span class='line'><span class="nv">$ </span>git init
</span></code></pre></td></tr></table></div></figure>


<h2>Commit your code to Git as often as possible</h2>

<p>The more frequently you commit, the easier it is to go back at a later date and understand the work you&rsquo;ve been doing (and maybe even undo some of that work). Committing is easy!</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git add .
</span><span class='line'><span class="nv">$ </span>git commit -a -m <span class="s2">&quot;Description of commit&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>If you&rsquo;ve made a terrible mistake, simply roll back to the last commit</h2>

<p>This is why it&rsquo;s a good idea to commit your work frequently. If you make a stupid coding mistake, and want to revert your code back to how it was before you broke everything, then just run the following command:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git reset --hard
</span></code></pre></td></tr></table></div></figure>


<h2>Go back in time and see how your code used to look</h2>

<p>Just type the following command to display a history of all your past commits:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git log
</span></code></pre></td></tr></table></div></figure>


<p>To revert your codebase to some time in the past, simply copy the corresponding <em>commit hash</em> to your clipboard, close the Git log by pressing <code>q</code>, then type the following command:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git checkout YOUR_COMMIT_HASH -- .
</span></code></pre></td></tr></table></div></figure>


<p>(A commit hash looks a bit like this: <code>814c219a338006492bf6f751d958461dd3e8b775</code>)</p>

<p>Once you&rsquo;ve finished with the older version of your code, you can go back to the latest version by running the following command:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git reset --hard
</span></code></pre></td></tr></table></div></figure>


<p>Alternatively, if you want to keep this older version of your code (and discard any changes you&rsquo;ve made since then), simply commit it using the following commands, and keep working:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git add .
</span><span class='line'><span class="nv">$ </span>git commit -a -m <span class="s2">&quot;Rolled back to previous version of code&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Back up your code on a server</h2>

<p>To protect against hard drive failure, it&rsquo;s a good idea to back up your code. You can either set up your own code hosting service, or save yourself some effort and get a free <a href="https://bitbucket.org/">BitBucket</a> account.</p>

<p>Once you&rsquo;ve created a remote repository, you can connect your local codebase to it using the following commands:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git remote add origin ssh://bitbucket.org/your-repo-name.git
</span><span class='line'><span class="nv">$ </span>git push origin master -u
</span></code></pre></td></tr></table></div></figure>


<p>Then, whenever you&rsquo;ve made a few commits that you want to push to the server, just run the following command:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git push
</span></code></pre></td></tr></table></div></figure>


<h2>Work with other people</h2>

<p>Once you&rsquo;ve put your code online, you can invite other people to work on it too. In order to get a copy of your code, they just need to run the following commands:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git clone ssh://bitbucket.org/your-repo-name.git
</span><span class='line'><span class="nv">$ </span><span class="nb">cd </span>your-repo-name
</span></code></pre></td></tr></table></div></figure>


<p>They can then make changes, <code>commit</code> them, and <code>push</code> them to the server. In order for you to see the changes that they have made, just run the following command on your machine:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git pull
</span></code></pre></td></tr></table></div></figure>


<h2>Sometimes things go wrong when you <code>push</code></h2>

<p>If you try to <code>push</code> your code, and you get an error message saying this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>! <span class="o">[</span>rejected<span class="o">]</span> master -&gt; master <span class="o">(</span>non-fast-forward<span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Don&rsquo;t worry, just run <code>git pull</code>, and try pushing again.</p>

<h2>Sometimes things go wrong when you <code>pull</code></h2>

<p>If you try to <code>pull</code> some code, and get an error message saying this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>CONFLICT <span class="o">(</span>content<span class="o">)</span>: Merge conflict in some_file
</span></code></pre></td></tr></table></div></figure>


<p>Don&rsquo;t worry, this just means that two people have tried to edit the same file. Just open the conflicting file in your editor, fix the contents, and run the following commands:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>git add .
</span><span class='line'><span class="nv">$ </span>git commit -a -m <span class="s2">&quot;Merging in changes&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Can&rsquo;t I just use Dropbox?</h2>

<p>Dropbox is pretty good. However, for coding projects, Git has some key advantages:</p>

<ol>
<li>With Git, you choose when to save a version, and those versions get meaningful descriptions.</li>
<li>Git is extremely good at merging changes made by multiple people to the same file.</li>
<li>Git can save versions, and rollback to previous versions, <em>even when offline</em>.</li>
<li>When you understand Git, and know some of its more complex features, <em>it can do magic</em>.</li>
</ol>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Processing XML with Python - you're probably doing it wrong]]></title>
    <link href="https://blog.etianen.com/blog/2013/04/14/python-xml/"/>
    <updated>2013-04-14T11:16:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/04/14/python-xml</id>
    <content type="html"><![CDATA[<p>If you think that processing XML in Python sucks, and your code is eating up hundreds of megabytes of RAM just to process a simple document, then don&rsquo;t worry. You&rsquo;re probably using <code>xml.dom.minidom</code>, and there is a much better way&hellip;</p>

<h2>The test &ndash; books.xml</h2>

<p>For this example, we&rsquo;ll be attempting to process a 43MB document containing 4000 books. The test data can be downloaded <a href="https://blog.etianen.com/downloads/code/python-xml/books.xml">here</a>. The two methods shall be tested by providing implementations of <code>iter_authors_and_descriptions</code>.</p>

<figure class='code'><figcaption><span> (xml_test.py)</span> <a href='https://blog.etianen.com/downloads/code/python-xml/xml_test.py'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class='py3'><span class='line'><span class="kn">import</span> <span class="nn">resource</span><span class="o">,</span> <span class="nn">time</span>
</span><span class='line'><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">minidom</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span><span class='line'>    <span class="n">authors</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
</span><span class='line'>    <span class="n">max_description</span> <span class="o">=</span> <span class="mi">0</span>
</span><span class='line'>    <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span><span class='line'>    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">&quot;books.xml&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">handle</span><span class="p">:</span>
</span><span class='line'>        <span class="k">for</span> <span class="n">author</span><span class="p">,</span> <span class="n">description</span> <span class="ow">in</span> <span class="n">iter_authors_and_descriptions</span><span class="p">(</span><span class="n">handle</span><span class="p">):</span>
</span><span class='line'>            <span class="n">authors</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">author</span><span class="p">)</span>
</span><span class='line'>            <span class="n">max_description</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_description</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">description</span><span class="p">))</span>
</span><span class='line'>    <span class="c"># Print out the report.</span>
</span><span class='line'>    <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span><span class='line'>    <span class="n">report</span> <span class="o">=</span> <span class="n">resource</span><span class="o">.</span><span class="n">getrusage</span><span class="p">(</span><span class="n">resource</span><span class="o">.</span><span class="n">RUSAGE_SELF</span><span class="p">)</span>
</span><span class='line'>    <span class="nb">print</span><span class="p">(</span><span class="s">&quot;Unique authors: {}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">authors</span><span class="p">)))</span>
</span><span class='line'>    <span class="nb">print</span><span class="p">(</span><span class="s">&quot;Longest description: {}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">max_description</span><span class="p">))</span>
</span><span class='line'>    <span class="nb">print</span><span class="p">(</span><span class="s">&quot;Time taken: {} ms&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">((</span><span class="n">end_time</span> <span class="o">-</span> <span class="n">start_time</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">)))</span>
</span><span class='line'>    <span class="nb">print</span><span class="p">(</span><span class="s">&quot;Max memory: {} MB&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">ru_maxrss</span> <span class="o">/</span> <span class="mi">1024</span> <span class="o">/</span> <span class="mi">1024</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&quot;__main__&quot;</span><span class="p">:</span>
</span><span class='line'>    <span class="n">main</span><span class="p">()</span>
</span></code></pre></td></tr></table></div></figure>
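
<p>If you&rsquo;d rather not download the full 43MB file just to try things out, a much smaller stand-in should do. The sketch below is only a guess at the layout: the root element name <code>catalog</code>, the function name <code>write_sample_books</code> and the output filename are all my own inventions, and the only thing taken from the test code is that each <code>book</code> has an <code>author</code> and a <code>description</code>. The timings and memory figures you get from it will obviously not be comparable with the real file.</p>

```python
import random
from xml.etree import ElementTree


def write_sample_books(path, book_count=100, seed=0):
    """Write a small stand-in for books.xml (hypothetical layout)."""
    rng = random.Random(seed)
    # Assumption: a root element (named "catalog" here) wrapping book elements.
    root = ElementTree.Element("catalog")
    for _ in range(book_count):
        book = ElementTree.SubElement(root, "book")
        author = ElementTree.SubElement(book, "author")
        # Draw from a smaller pool of names so that some authors repeat.
        author.text = "Author {}".format(rng.randrange(book_count // 2 + 1))
        description = ElementTree.SubElement(book, "description")
        description.text = "word " * rng.randrange(1, 50)
    ElementTree.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    write_sample_books("books-sample.xml")
```

<p>Point the <code>open()</code> call in the test script at the generated file to run either implementation against it.</p>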


<h2>The wrong way &ndash; minidom</h2>

<p>The minidom method is both awkward and inefficient.</p>

<figure class='code'><figcaption><span> (minidom.py)</span> <a href='https://blog.etianen.com/downloads/code/python-xml/minidom.py'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='py3'><span class='line'><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">minidom</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">get_child_text</span><span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="n">child_name</span><span class="p">):</span>
</span><span class='line'>    <span class="n">child</span> <span class="o">=</span> <span class="n">parent</span><span class="o">.</span><span class="n">getElementsByTagName</span><span class="p">(</span><span class="n">child_name</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</span><span class='line'>    <span class="k">return</span> <span class="s">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
</span><span class='line'>        <span class="n">grandchild</span><span class="o">.</span><span class="n">data</span>
</span><span class='line'>        <span class="k">for</span> <span class="n">grandchild</span>
</span><span class='line'>        <span class="ow">in</span> <span class="n">child</span><span class="o">.</span><span class="n">childNodes</span>
</span><span class='line'>        <span class="k">if</span> <span class="n">grandchild</span><span class="o">.</span><span class="n">nodeType</span> <span class="ow">in</span> <span class="p">(</span><span class="n">grandchild</span><span class="o">.</span><span class="n">TEXT_NODE</span><span class="p">,</span> <span class="n">grandchild</span><span class="o">.</span><span class="n">CDATA_SECTION_NODE</span><span class="p">)</span>
</span><span class='line'>    <span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">iter_authors_and_descriptions</span><span class="p">(</span><span class="n">handle</span><span class="p">):</span>
</span><span class='line'>    <span class="n">document</span> <span class="o">=</span> <span class="n">minidom</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">handle</span><span class="p">)</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">document</span><span class="o">.</span><span class="n">getElementsByTagName</span><span class="p">(</span><span class="s">&quot;book&quot;</span><span class="p">):</span>
</span><span class='line'>        <span class="k">yield</span> <span class="p">(</span>
</span><span class='line'>            <span class="n">get_child_text</span><span class="p">(</span><span class="n">book</span><span class="p">,</span> <span class="s">&quot;author&quot;</span><span class="p">),</span>
</span><span class='line'>            <span class="n">get_child_text</span><span class="p">(</span><span class="n">book</span><span class="p">,</span> <span class="s">&quot;description&quot;</span><span class="p">),</span>
</span><span class='line'>        <span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Due to loading the entire document in one chunk, minidom takes a long time to run, and uses a lot of memory.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Unique authors: 999
</span><span class='line'>Longest description: 63577
</span><span class='line'>Time taken: 368 ms
</span><span class='line'>Max memory: 107 MB</span></code></pre></td></tr></table></div></figure>
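
<p>One caveat about the memory figure: the units of <code>ru_maxrss</code> depend on the platform. As far as I know it&rsquo;s reported in bytes on macOS and in kilobytes on Linux, so the <code>/ 1024 / 1024</code> in the test script only gives megabytes on the former. A minimal sketch of a platform-aware helper, assuming those two conventions (the name <code>max_rss_mb</code> is mine, not part of the download):</p>

```python
import resource
import sys


def max_rss_mb():
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: macOS reports bytes; Linux and most other Unixes report kilobytes.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / (1024 * 1024)


if __name__ == "__main__":
    print("Max memory: {:.1f} MB".format(max_rss_mb()))
```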


<h2>The right way &ndash; cElementTree</h2>

<p>The cElementTree method is also awkward (this is XML, after all). However, by using the <code>iterparse()</code> method to avoid loading the whole document into memory, a great deal more efficiency can be achieved.</p>

<figure class='code'><figcaption><span> (cElementTree.py)</span> <a href='https://blog.etianen.com/downloads/code/python-xml/cElementTree.py'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='py3'><span class='line'><span class="kn">from</span> <span class="nn">xml.etree</span> <span class="k">import</span> <span class="n">cElementTree</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">iter_elements_by_name</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span><span class='line'>    <span class="n">events</span> <span class="o">=</span> <span class="n">cElementTree</span><span class="o">.</span><span class="n">iterparse</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">events</span><span class="o">=</span><span class="p">(</span><span class="s">&quot;start&quot;</span><span class="p">,</span> <span class="s">&quot;end&quot;</span><span class="p">,))</span>
</span><span class='line'>    <span class="n">_</span><span class="p">,</span> <span class="n">root</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">events</span><span class="p">)</span>  <span class="c"># Grab the root element.</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">events</span><span class="p">:</span>
</span><span class='line'>        <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="s">&quot;end&quot;</span> <span class="ow">and</span> <span class="n">elem</span><span class="o">.</span><span class="n">tag</span> <span class="o">==</span> <span class="n">name</span><span class="p">:</span>
</span><span class='line'>            <span class="k">yield</span> <span class="n">elem</span>
</span><span class='line'>            <span class="n">root</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span>  <span class="c"># Free up memory by clearing the root element.</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">iter_authors_and_descriptions</span><span class="p">(</span><span class="n">handle</span><span class="p">):</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">iter_elements_by_name</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="s">&quot;book&quot;</span><span class="p">):</span>
</span><span class='line'>        <span class="k">yield</span> <span class="p">(</span>
</span><span class='line'>            <span class="n">book</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&quot;author&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">,</span>
</span><span class='line'>            <span class="n">book</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&quot;description&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">,</span>
</span><span class='line'>        <span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>The results speak for themselves. Compared with the previous approach, cElementTree processes the same XML in roughly half the time (192 ms versus 368 ms) and uses less than 10% of the memory (8 MB versus 107 MB).</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Unique authors: 999
</span><span class='line'>Longest description: 63577
</span><span class='line'>Time taken: 192 ms
</span><span class='line'>Max memory: 8 MB</span></code></pre></td></tr></table></div></figure>
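
<p>As a rough illustration of how the two statistics above might be gathered, the sketch below feeds a tiny in-memory catalogue through the same <code>iterparse()</code> pattern. The <code>summarise()</code> and <code>build_sample()</code> helpers and the sample data are assumptions made for this example, not part of the downloadable benchmark. It imports <code>ElementTree</code> rather than <code>cElementTree</code>, because the latter name was removed in Python 3.9 and the C accelerator has been picked up automatically since Python 3.3.</p>

```python
import io
from xml.etree import ElementTree


def iter_elements_by_name(handle, name):
    events = ElementTree.iterparse(handle, events=("start", "end"))
    _, root = next(events)  # The first event is the start of the root element.
    for event, elem in events:
        if event == "end" and elem.tag == name:
            yield elem
            root.clear()  # Runs only after the caller has finished with elem.


def summarise(handle):
    """Hypothetical helper: count unique authors and find the longest description."""
    authors = set()
    longest = 0
    for book in iter_elements_by_name(handle, "book"):
        authors.add(book.find("author").text)
        longest = max(longest, len(book.find("description").text))
    return len(authors), longest


def build_sample():
    """Build a small made-up catalogue so the example needs no external file."""
    catalogue = ElementTree.Element("catalogue")
    records = [("Ann", "Short"), ("Bob", "A longer one"), ("Ann", "Mid")]
    for author, description in records:
        book = ElementTree.SubElement(catalogue, "book")
        ElementTree.SubElement(book, "author").text = author
        ElementTree.SubElement(book, "description").text = description
    return ElementTree.tostring(catalogue)  # Serialised as bytes.


unique_authors, longest_description = summarise(io.BytesIO(build_sample()))
print("Unique authors:", unique_authors)
print("Longest description:", longest_description)
```

<p>Because each <code>book</code> is consumed before <code>root.clear()</code> runs, memory use stays roughly proportional to a single record rather than to the whole document.</p>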


<h2>Conclusion</h2>

<p>These tests are hardly scientific, so feel free to download the code and see how it runs in your own environment. In any case, the next time your servers get melted by an XML document, consider giving cElementTree a spin.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How to turn nginx into a caching, authenticated Twitter API proxy]]></title>
    <link href="https://blog.etianen.com/blog/2013/04/12/nginx-twitter-api-proxy/"/>
    <updated>2013-04-12T10:22:00+01:00</updated>
    <id>https://blog.etianen.com/blog/2013/04/12/nginx-twitter-api-proxy</id>
    <content type="html"><![CDATA[<p>Very soon, the old <a href="https://dev.twitter.com/docs/api/1">Twitter 1.0 API</a> will be turned off, making a switch to the <a href="https://dev.twitter.com/docs/api/1.1">1.1 API</a> essential. Unfortunately, the new API has a couple of restrictions that can make the transition very difficult.</p>

<ul>
<li><strong>Mandatory authentication HTTP headers</strong> &ndash; Using JSONP is now impossible.</li>
<li><strong>Restrictive crossdomain.xml</strong> &ndash; Using CORS is now impossible.</li>
</ul>


<p><em>The result of these changes is that it is now impossible to access the Twitter API directly from the browser.</em></p>

<h2>So, just use a proxy, right?</h2>

<p>A simple solution is to write your own proxy server, which can then run on your own domain. The minimum features for a useful Twitter API proxy are:</p>

<ul>
<li>Adds the required authentication HTTP headers to your request.</li>
<li>Caches the results to avoid exceeding API rate limits.</li>
</ul>


<p>Writing a Python/Ruby/PHP script to handle this is easy, but it&rsquo;s a waste of valuable server resources. Far better to let <a href="https://wiki.nginx.org/Main">nginx</a>, the best caching reverse proxy server in the world, do the hard work instead.</p>

<h2>Step 1 &ndash; Create a Twitter application</h2>

<p>Creating a Twitter application allows you to authenticate with the API. Just visit <a href="https://dev.twitter.com/apps/new">https://dev.twitter.com/apps</a> and register your application with Twitter.</p>

<p>Once your new app is created, head over to its detail page and make a note of the <em>consumer key</em> and <em>consumer secret</em>. You&rsquo;ll need these for the next step.</p>

<h2>Step 2 &ndash; Obtain a bearer token for the application</h2>

<p>The easiest way to authenticate with the Twitter API is to obtain a <em>bearer token</em> for your proxy server, which is a simple code that can be sent as an HTTP header with every request.</p>

<p>To obtain your bearer token, run the following shell commands, substituting your own <em>consumer key</em> and <em>consumer secret</em>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span><span class="nb">export </span><span class="nv">CONSUMER_KEY</span><span class="o">=</span>XXXXXXXXXXXXXXXXXXXXX
</span><span class='line'><span class="nv">$ </span><span class="nb">export </span><span class="nv">CONSUMER_SECRET</span><span class="o">=</span>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
</span><span class='line'><span class="nv">$ </span>curl -H <span class="s2">&quot;Authorization: Basic `echo -ne &quot;</span><span class="nv">$CONSUMER_KEY</span>:<span class="nv">$CONSUMER_SECRET</span><span class="s2">&quot; | base64`&quot;</span> -d <span class="s2">&quot;grant_type=client_credentials&quot;</span> https://api.twitter.com/oauth2/token
</span></code></pre></td></tr></table></div></figure>


<p>After a few seconds, your terminal will print out a JSON string containing your <em>bearer token</em>. It will look something like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='json'><span class='line'><span class="p">{</span><span class="nt">&quot;token_type&quot;</span><span class="p">:</span><span class="s2">&quot;bearer&quot;</span><span class="p">,</span><span class="nt">&quot;access_token&quot;</span><span class="p">:</span><span class="s2">&quot;AAAAAAAAAAAAAAAAAAAAAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&quot;</span><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Make a note of the <code>access_token</code> field. You&rsquo;ll need this for the next step.</p>
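
<p>For anyone who would rather not rely on shell quoting, the same two steps can be sketched in Python: build the <code>Authorization: Basic</code> header from the key and secret, then pull <code>access_token</code> out of the JSON reply. The function names and the sample credentials below are made up for illustration, and the sketch deliberately stops short of making the network request.</p>

```python
import base64
import json


def basic_auth_header(consumer_key, consumer_secret):
    """Return the value of the Authorization header for the token request."""
    credentials = "{}:{}".format(consumer_key, consumer_secret).encode("ascii")
    # b64encode never inserts line breaks, so the header stays on one line.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def extract_bearer_token(response_body):
    """Pull the access_token field out of the JSON returned by the token endpoint."""
    payload = json.loads(response_body)
    if payload.get("token_type") != "bearer":
        raise ValueError("unexpected token type: {!r}".format(payload.get("token_type")))
    return payload["access_token"]


# Placeholder credentials and a placeholder response, for illustration only.
header = basic_auth_header("my-key", "my-secret")
token = extract_bearer_token('{"token_type":"bearer","access_token":"AAAA-example"}')
print(header)
print(token)
```

<p>The value returned by <code>extract_bearer_token()</code> is the string that replaces the bearer token placeholder in the nginx configuration.</p>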

<h2>Step 3 &ndash; Update your nginx configuration file</h2>

<p>Simply place the following settings in your nginx configuration, adjusting paths as necessary. In particular, make sure that <code>proxy_cache_path</code>, <code>server_name</code> and <code>root</code> are all correct. Most important of all, replace the <code>INSERT_YOUR_BEARER_TOKEN</code> placeholder with the <em>bearer token</em> you obtained in step 2.</p>

<figure class='code'><figcaption><span> (nginx.conf)</span> <a href='https://blog.etianen.com/downloads/code/nginx-twitter-api/nginx.conf'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
</pre></td><td class='code'><pre><code class='nginx'><span class='line'><span class="c1"># This defines a 10 megabyte cache for the proxy service, and needs to live</span>
</span><span class='line'><span class="c1"># outside of the virtual host configuration. Adjust the path according to</span>
</span><span class='line'><span class="c1"># your environment.</span>
</span><span class='line'><span class="k">proxy_cache_path</span>  <span class="s">/var/cache/nginx/twitter_api_proxy</span> <span class="s">levels=1:2</span> <span class="s">keys_zone=twitter_api_proxy:10m</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'><span class="c1"># The virtual host configuration.</span>
</span><span class='line'><span class="k">server</span> <span class="p">{</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1"># If you want to secure your proxy with SSL, replace with the appropriate SSL configuration.</span>
</span><span class='line'>  <span class="kn">listen</span> <span class="mi">80</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1"># Replace this with the name of the domain you wish to run your proxy on.</span>
</span><span class='line'>  <span class="kn">server_name</span> <span class="s">api.twitter.yourdomain.com</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1"># Replace this with your own document root.</span>
</span><span class='line'>  <span class="kn">root</span> <span class="s">/var/www</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1"># This setting attempts to use files in the document root before</span>
</span><span class='line'>  <span class="c1"># hitting the Twitter proxy. This allows you to put a permissive</span>
</span><span class='line'>  <span class="c1"># crossdomain.xml file in your document root, and have it show up</span>
</span><span class='line'>  <span class="c1"># in the browser.</span>
</span><span class='line'>  <span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
</span><span class='line'>    <span class="kn">try_files</span> <span class="nv">$uri</span> <span class="nv">$uri/index.html</span> <span class="s">@twitter</span><span class="p">;</span>
</span><span class='line'>  <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1"># The Twitter proxy code!</span>
</span><span class='line'>  <span class="kn">location</span> <span class="s">@twitter</span> <span class="p">{</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1"># Caching settings, to avoid rate limits on the API service.</span>
</span><span class='line'>    <span class="kn">proxy_cache</span> <span class="s">twitter_api_proxy</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_cache_use_stale</span> <span class="s">error</span> <span class="s">updating</span> <span class="s">timeout</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_cache_valid</span> <span class="mi">200</span> <span class="mi">302</span> <span class="mi">404</span> <span class="mi">5m</span><span class="p">;</span>  <span class="c1"># The server cache expires after 5 minutes - adjust as required.</span>
</span><span class='line'>    <span class="kn">proxy_ignore_headers</span> <span class="s">X-Accel-Expires</span> <span class="s">Expires</span> <span class="s">Cache-Control</span> <span class="s">Set-Cookie</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1"># Hide Twitter&#39;s own caching headers - we&#39;re applying our own.</span>
</span><span class='line'>    <span class="kn">proxy_hide_header</span> <span class="s">X-Accel-Expires</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_hide_header</span> <span class="s">Expires</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_hide_header</span> <span class="s">Cache-Control</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_hide_header</span> <span class="s">pragma</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">proxy_hide_header</span> <span class="s">set-cookie</span><span class="p">;</span>
</span><span class='line'>    <span class="kn">expires</span> <span class="mi">5m</span><span class="p">;</span>  <span class="c1"># The browser cache expires after 5 minutes - adjust as required.</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1"># Set the correct host name to connect to the Twitter API.</span>
</span><span class='line'>    <span class="kn">proxy_set_header</span> <span class="s">Host</span> <span class="s">api.twitter.com</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1"># Add authentication headers - edit and add in your own bearer token.</span>
</span><span class='line'>    <span class="kn">proxy_set_header</span> <span class="s">Authorization</span> <span class="s">&quot;Bearer</span> <span class="s">INSERT_YOUR_BEARER_TOKEN&quot;</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1"># Actually proxy the request to Twitter API!</span>
</span><span class='line'>    <span class="kn">proxy_pass</span> <span class="s">https://api.twitter.com</span><span class="p">;</span>
</span><span class='line'>  <span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Phew! That&rsquo;s it. Simply restart nginx and open the following URL in your browser to make sure that everything is working.</p>

<p><a href="http://api.twitter.yourdomain.com/1.1/search/tweets.json?q=cats">http://api.twitter.yourdomain.com/1.1/search/tweets.json?q=cats</a></p>
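
<p>If you prefer a scripted check to a browser, a minimal sketch along these lines builds the same search URL and fetches it through the proxy. It assumes the plain-HTTP <code>listen 80</code> configuration above and the placeholder domain <code>api.twitter.yourdomain.com</code>; <code>proxy_url()</code> and <code>search_tweets()</code> are hypothetical helper names.</p>

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

PROXY_HOST = "api.twitter.yourdomain.com"  # Placeholder: use your own server_name.


def proxy_url(path, scheme="http", **params):
    """Build a URL on the proxy for the given API path and query parameters."""
    return urlunsplit((scheme, PROXY_HOST, path, urlencode(params), ""))


def search_tweets(query):
    """Fetch search results via the proxy (requires the proxy to be running)."""
    with urlopen(proxy_url("/1.1/search/tweets.json", q=query)) as response:
        return json.load(response)


url = proxy_url("/1.1/search/tweets.json", q="cats")
print(url)
```

<p>With the five-minute <code>proxy_cache_valid</code> setting, calling <code>search_tweets("cats")</code> twice in quick succession should be answered from the nginx cache the second time, so only the first call counts against the API rate limit.</p>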
]]></content>
  </entry>
  
</feed>
