Advertising In Planned Obsolescence
A presentation I had to put together for my Science Communications unit at Uni.
Simple Continuous Integration
For my Photokarma project, which I’ve blogged about previously, I wanted a very easy way to keep the live testing server up to date with the Git repository. To do this I whipped up a simple little Python script using Flask, along with a bash script.
GitHub Service Hooks

This process is all made possible by GitHub’s service hooks feature. To set it up on the GitHub side, I simply set the POST Receive URL to a URL that my application is listening on.
simple-ci.py
This part of the application is very, very simple. It waits for a request to hit the set URL, and when that happens it executes the shell script.
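As a rough, dependency-free sketch of this idea, the listener below uses the standard library's `http.server` in place of Flask. The hook path `/github-hook`, the port and the `bash run.sh` invocation are assumptions chosen for illustration.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed values: the path set as the POST Receive URL on GitHub,
# and the command that runs the deploy script.
HOOK_PATH = "/github-hook"
DEPLOY_COMMAND = ["bash", "run.sh"]


def should_deploy(method: str, path: str) -> bool:
    """Only a POST to the hook path triggers a deploy."""
    return method == "POST" and path == HOOK_PATH


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and discard the body; this sketch ignores GitHub's payload.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if should_deploy(self.command, self.path):
            # Popen returns immediately, so GitHub gets its response
            # without waiting for the pull and restart to finish.
            subprocess.Popen(DEPLOY_COMMAND)
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces until interrupted."""
    HTTPServer(("", port), HookHandler).serve_forever()
```

Calling `serve()` starts the listener. The sketch performs no authentication, so anyone who discovers the URL can trigger a redeploy.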
run.sh
Here is where most of the magic happens for this little project. The code is fairly self-explanatory: it first pulls the latest updates from GitHub, checks whether there are any changes to the pip requirements, collects the static files, and finally either starts or restarts the Gunicorn process.
One thing that my script does not do is run database migrations. I figure that they are a somewhat risky operation, and it’s best to do them manually and shut down the application for a minute or so to ensure that nothing goes awry.
I intend to use this script, with the project_location and PID_FILE variables changed accordingly, on all the other Python/Django projects that I work on, as it streamlines the deployment side of my workflow and requires minimal involvement from me in the majority of cases.
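The same steps can be sketched in Python with `subprocess`. This is a Python rendering of the sequence described, not the bash script itself; `PROJECT_LOCATION` and `PID_FILE` mirror the script's variables, while their values, the `requirements.txt` comparison and the `photokarma.wsgi:application` module path are assumptions. The restart branch relies on Gunicorn's documented behaviour of starting fresh workers when its master process receives `SIGHUP`.

```python
import hashlib
import os
import signal
import subprocess
from pathlib import Path
from typing import Optional

# Assumed locations; change these per project.
PROJECT_LOCATION = Path("/srv/photokarma")
PID_FILE = Path("/tmp/photokarma-gunicorn.pid")


def file_digest(path: Path) -> str:
    """SHA-1 of a file's contents, or '' if the file doesn't exist."""
    return hashlib.sha1(path.read_bytes()).hexdigest() if path.exists() else ""


def running_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file if that process is alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
        os.kill(pid, 0)  # signal 0 only checks that the process exists
        return pid
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return None


def deploy(project: Path = PROJECT_LOCATION, pid_file: Path = PID_FILE) -> None:
    def run(*cmd: str) -> None:
        subprocess.run(cmd, cwd=project, check=True)

    requirements = project / "requirements.txt"
    before = file_digest(requirements)
    run("git", "pull")
    # Only reinstall when the pull actually changed requirements.txt.
    if file_digest(requirements) != before:
        run("pip", "install", "-r", "requirements.txt")
    run("python", "manage.py", "collectstatic", "--noinput")
    pid = running_pid(pid_file)
    if pid is None:
        # Not running: start Gunicorn in the background, recording its PID.
        run("gunicorn", "--daemon", "--pid", str(pid_file),
            "photokarma.wsgi:application")
    else:
        # Running: SIGHUP makes Gunicorn replace its workers with new code.
        os.kill(pid, signal.SIGHUP)
```

Keeping the PID check in one small function makes the "start or restart" decision easy to test without actually deploying anything.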
Paginating Markdown
For a project that I’m working on at the moment, I have to paginate a long Markdown document into smaller pages. To do that, I decided to split the document at each top-level heading (a line such as # top level heading).
I’m using Django as the framework for this project, so I start off with this fairly simple model, with two methods that do all the magic, and also the template that will be used to present the page:
The first 3 lines of the Session class are pretty straightforward, just declaring my database fields. The interesting part is the two methods, session.pages() and session.get_page(), which I’ll now go over.
pages()
The pages method generates a list of all the pages, starting a new page each time it finds a single # (which marks a top-level heading) at the start of a line. One issue I haven’t worked out how to solve is that the regular expression used to find the split points consumes the characters it matches, so the character after the hash disappears. This means that, currently, a heading needs a space directly after the #.
The add_h1 function inside the pages method puts the # and the space back after the split has cut them off.
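One possible way around the disappearing-character problem, assuming Python 3.7 or later (where `re.split` accepts patterns that can match an empty string), is a zero-width lookahead: it marks the split point without consuming anything, so nothing needs to be put back. This is a sketch, not the model's actual method; the `(?!#)` part is an added assumption that keeps `##` subheadings on the same page.

```python
import re
from typing import List

# Zero-width split point: the start of any line beginning with exactly one '#'.
# The lookahead matches without consuming characters, so nothing has to be
# put back afterwards, and "#Heading" (no space) also starts a new page.
TOP_LEVEL_HEADING = re.compile(r"^(?=#(?!#))", re.MULTILINE)


def pages(content: str) -> List[str]:
    """Split a Markdown document into pages at each top-level heading.

    Element 0 holds whatever precedes the first heading (for example an
    HTML comment); if the document starts with a heading it is ''.
    """
    return TOP_LEVEL_HEADING.split(content)
```

Like any line-based split, this would also break on a `#` at the start of a line inside a fenced code block in the Markdown.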
I think that I need some form of caching for this, possibly using memoization. I would need to do some profiling to find out if that would be worthwhile.
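If profiling does show the split is worth caching, a minimal memoization sketch is `functools.lru_cache` keyed on the content string itself, so an edited session simply misses the cache instead of serving stale pages. The function name, the lookahead pattern and the cache size are assumptions.

```python
import re
from functools import lru_cache
from typing import Tuple

TOP_LEVEL_HEADING = re.compile(r"^(?=#(?!#))", re.MULTILINE)


@lru_cache(maxsize=32)
def cached_pages(content: str) -> Tuple[str, ...]:
    """Split content into pages, remembering results for recent documents.

    Returns a tuple so callers can't mutate the shared cached value.
    """
    return tuple(TOP_LEVEL_HEADING.split(content))
```

`cached_pages.cache_info()` reports hits and misses, which gives a quick read on whether the cache is earning its keep. A hit still hashes and compares the content string, so the saving is the regex split rather than all of the work.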
get_page()
Next up is the get_page method, which is called by my controller; its result is fed into my template.
The first 4 lines of it are some simple error checking, making sure that the requested page is within the range of pages available. page[0] is going to be garbage, as I’ll have an HTML comment at the start of my session.content, so I ensure that the lowest page that can be requested is page[1]. I then make sure that the requested page isn’t beyond the end of the list.
Next, I declare my Page namedtuple, which is what I’m going to return at the end of the method and pass into the template. namedtuples act much like class instances, so in the template they are basically indistinguishable from real Django models, which is nice.
In these two if/else blocks I determine whether there is a next or previous page relative to the current page, and if there is, its URL is passed on. Otherwise it is set to None, which the template can check for.
Finally, I create my Page instance and return it, up into the controller which then passes it off to the template.
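Putting those steps together, a framework-free sketch of get_page might look like the following. It is written as a plain function over the content string rather than a Django model method, it splits with a lookahead regex (where the original pages() consumes the `# ` and restores it with add_h1), and the `Page` field names and the `page_url()` URL scheme are hypothetical.

```python
import re
from collections import namedtuple
from typing import List, Optional

TOP_LEVEL_HEADING = re.compile(r"^(?=#(?!#))", re.MULTILINE)

# Hypothetical field names; the template reads page.next and page.previous.
Page = namedtuple("Page", ["number", "content", "previous", "next"])


def page_url(number: int) -> str:
    """Hypothetical URL scheme; a Django view would normally use reverse()."""
    return f"/session/page/{number}/"


def get_page(content: str, number: int) -> Page:
    pages: List[str] = TOP_LEVEL_HEADING.split(content)
    # pages[0] is the preamble (e.g. an HTML comment), so real pages start at 1,
    # and the last valid page number is len(pages) - 1.
    if not 1 <= number < len(pages):
        raise IndexError(f"page {number} is out of range")
    previous: Optional[str] = page_url(number - 1) if number > 1 else None
    next_url: Optional[str] = page_url(number + 1) if number + 1 < len(pages) else None
    return Page(number, pages[number], previous, next_url)
```

In a Django view, the IndexError could be translated into a 404, for example by raising `django.http.Http404`.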
template.html
The template is pretty straightforward: it takes the Markdown source and converts it into HTML using a Django filter. It then checks whether Page.next and Page.previous are set; if they are, it provides those links, allowing the user to navigate back and forth, and if there isn’t a next page, it provides the option to “complete” the session, an application-specific action.
Conclusion
I think that this approach is far more robust than the alternative, which is to have a separate model for pages and to store the content for each page in a separate instance. That approach makes trivial things like inserting a page in the middle of a session rather difficult.
Another piece of functionality that I need to introduce is a shortcode-like system for linking quizzes, which are an important part of this application, into the middle of sessions. My current line of thought is to have lines that look like # <!--quiz:45345--> with the number being the id of a given quiz.
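A minimal sketch of picking those shortcodes out of a page, assuming exactly the `<!--quiz:ID-->` form above (tolerating whitespace inside the comment is an extra assumption); `quiz_ids` is a hypothetical helper name. Because the proposed lines start with `# `, each quiz would, under the existing split, also begin a new page.

```python
import re
from typing import List

# Matches shortcodes like <!--quiz:45345--> and captures the numeric id.
QUIZ_SHORTCODE = re.compile(r"<!--\s*quiz:(\d+)\s*-->")


def quiz_ids(page: str) -> List[int]:
    """Return the quiz ids referenced on a page, in order of appearance."""
    return [int(quiz_id) for quiz_id in QUIZ_SHORTCODE.findall(page)]
```

The view could then fetch the matching objects in one query, e.g. `Quiz.objects.filter(pk__in=ids)` for a hypothetical `Quiz` model.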
Another Virgin Email Sent
I was told that I would receive a call back this past Friday, which never happened, and I’m starting to get pretty damn annoyed with Virgin. Having heard no response to the last message I sent, here’s another one I’ve sent them.
Hi, I have an ongoing issue with mobile data charges that still hasn’t been resolved. I was meant to receive a call back on Friday the 20th after speaking to a representative in Sydney on the 13th, but he did not call back. First and foremost, I do not believe that these data charges are legitimate; second, the representative said that the fee would be reduced to a third, which also hasn’t happened; and finally, my 3G access has not been reactivated, which was also meant to have happened. I’m currently compiling this information and am ready to lodge a complaint with the Telecommunications Industry Ombudsman if this can’t be resolved internally with your customer service representatives.
Third Correspondence With Virgin
For the last week now I’ve been trying to get excess data fees taken off my account, as I believe that they’re inaccurate. In my previous conversation with them, they concluded that it must have been the torrent client on my computer chewing through the data, and the representative said they would reduce the roughly $300 in overage fees to a third, or about $100. This is where this email kicks off:
Hi, I spoke to someone last week about issues with my mobile data, in particular 1.2 GB of data used in a 3.5-hour session. For that amount of data to be transferred in that time, it would have had to be moving at a constant rate of ~100 kB/s.
The person I spoke to suspected it was a torrent client on my computer unknowingly seeding, which I was unable to disprove until just now. I realised that I had a speed throttle on my torrent client limiting uploads to only 50 kB/s, so even if it had been open, which I highly doubt in the first place, it could have uploaded at most half as much data as is alleged. I also raised my doubt that my phone has ever been able to upload at 100 kB/s or more, and I still doubt it.
I keep the limit enabled because, without it, the torrent client saturates the upstream connection and slows downstream traffic to a trickle.
In combination with my 3G Watchdog logs, which show far lower usage for the given day and month, I believe this session is still unexplained and that it is an error in Virgin’s data monitoring systems.
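The figures in the email can be checked directly. Reading "kb/s" as kilobytes per second and using binary units (1 GB = 1024 × 1024 kB), both assumptions:

```latex
\begin{align*}
3.5\ \text{h} &= 3.5 \times 3600\ \text{s} = 12\,600\ \text{s} \\
\frac{1.2\ \text{GB}}{12\,600\ \text{s}}
  &= \frac{1.2 \times 1024 \times 1024\ \text{kB}}{12\,600\ \text{s}}
  \approx 99.9\ \text{kB/s} \\
50\ \text{kB/s} \times 12\,600\ \text{s} &= 630\,000\ \text{kB} \approx 615\ \text{MB} \approx 0.6\ \text{GB}
\end{align*}
```

So 1.2 GB in 3.5 hours does require a sustained ~100 kB/s, and a client capped at 50 kB/s of uploading could send at most about 0.6 GB in that window, half the alleged usage.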
Let’s hope this is resolved soon…
GitHub: Processes & Methodologies
I wrote this essay for my SE110 unit.
GitHub is an online service that aims to make sharing code as easy as possible and to simplify code collaboration. GitHub started in late 2007 with three co-founders hacking in coffee shops around San Francisco and has now grown to have over a million users and to host more than two million code repositories.
GitHub is a product built around Git, a Distributed Version Control System (DVCS) created by Linus Torvalds in 2005 to make collaboration in software development far easier. It differs from a traditional Version Control System such as Subversion (SVN) in that ‘forking’ code into a new branch is cheap and very easy. These branches can then be worked on independently and merged back together into a ‘master’ branch. On GitHub, when someone who isn’t the owner of a repository wants to have their changes merged into master, they do it through a ‘pull request’.
User Interface (UI) Driven Development
A process commonly used by GitHub is “UI Driven Development”, wherein the UI of a given feature is completed to a functional degree before any middleware programming is undertaken. The reasoning is that until the UI exists, exactly how the code will need to interact with it is not known. If the middleware is programmed without the UI, there is the potential for large amounts of redundant code that the interface simply won’t need, or worse, for the UI and the user experience (UX) to be compromised and bent into a form that fits the middleware.
In GitHub’s early days this meant that the Ruby-to-Git bindings in the deep backend could be worked on while the user interfaces were being created, but the specific controllers that linked the two together, for example the code that generated the “Repository View”, could not be written until the user interface was complete, as writing them earlier would have interfered with the production of the interface and risked producing redundant code.
Continuous Integration (CI)
GitHub were firm believers in agile principles, and one way in which they applied them was through CI. CI is the practice of frequently merging and automatically testing changes; combined with automated deployment, it allows the live application to be updated many times a day. GitHub claims to update their site roughly 15-20 times on a given day as various employees add their small fixes or improvements to the application.
GitHub uses its own infrastructure as the basis for its CI workflow. When changes are ‘pushed’ into the master branch of the GitHub repository, a testing server runs the comprehensive test suite, looking for errors in the code. If that succeeds, the code is pushed out onto the production servers in a ‘rolling’ configuration, with only a small number of servers updating at a given time. Rolling out a large number of small updates this way allowed them to minimise the risk of any given update going wrong.
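The rolling pattern described here can be sketched in a few lines. This is a generic illustration, not GitHub's actual tooling: `servers`, `batch_size` and the `deploy` callback are hypothetical, and stopping at the first failed batch is an assumed policy.

```python
from typing import Callable, Iterator, List, Sequence


def batches(servers: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size servers."""
    for start in range(0, len(servers), batch_size):
        yield list(servers[start:start + batch_size])


def rolling_deploy(
    servers: Sequence[str], batch_size: int, deploy: Callable[[str], bool]
) -> List[str]:
    """Update servers one batch at a time; return those updated successfully.

    If any server in a batch fails, stop there so the remaining servers
    keep running the previous version while the problem is investigated.
    """
    updated: List[str] = []
    for batch in batches(servers, batch_size):
        succeeded = [server for server in batch if deploy(server)]
        updated.extend(succeeded)
        if len(succeeded) < len(batch):
            break
    return updated
```

A small batch size limits how many servers a bad update can reach before it is caught.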
Pull Requests
One methodology GitHub used internally, which then became a prominent feature of the GitHub application itself, is the use of pull requests for code review. Whenever a new feature or patch is being worked on, a new branch is created and changes are committed to that branch. When the programmer is ready to have their code either considered for inclusion in the master branch or looked over by colleagues, they submit a pull request, which draws attention to the changes at hand; if everything looks good, the changes are then merged into the master branch.
Culture
To ensure that their employees consistently produce the best possible work, GitHub fostered an enjoyable work environment that kept them in “the zone” as much as possible. To achieve this, employees could come into the office at any time of day, or even work from home or remotely. GitHub recognised that different people operate best at different hours, and embraced that fact. They also recognised that meetings were a barrier between programmers and writing code, so they reduced required meetings to near zero, with the time instead used to actually do work.
All of GitHub’s internal communications go through the 37signals product “Campfire”, which allows discussions to be asynchronous, enabling any employee to join a given conversation at any point and still have a complete understanding of what is going on. Campfire keeps comprehensive logs, so all communications can be referenced later, reducing the time spent re-answering the same questions or re-solving a problem already figured out in a previous discussion.
Customer Relations Via Twitter
As GitHub is itself a web application, it is fitting that they use another web application, Twitter, for customer relationship management (CRM), in particular to find out when an update to the product has gone awry. Whenever a new feature is pushed into production, Twitter is watched for angry or annoyed customers complaining about the site breaking in some way, shape or form. If there is little noise on Twitter after an update, they know that it has been successful; otherwise a quick fix or a rollback needs to be considered.
Hiring
When hiring people to work on GitHub, certain techniques were used to make sure that the best possible people for the job were employed. A key one is hiring people from Open Source projects. This has proven to be a very effective approach, as their Open Source work shows off their programming ability and their passion for the area, removing the need to discover these through the interview process and letting more time be spent actually working on the product.
Creative Freedom
The employees working on GitHub are given near-full freedom to work on whatever part of GitHub they like. This means that people are always working on aspects of the project that they are passionate about, which helps ensure a high quality of work.
GitHub’s employees are hired for their expertise in their respective fields, and their judgement is trusted when working in those fields. It is widely accepted that a micro-managing leadership hierarchy is extremely detrimental to employee performance.
Conclusion
GitHub has employed a large number of multifaceted methodologies which aim to increase the quality of their product. By removing as many barriers as possible between their employees and their work, they are able to keep them in ‘the zone’ of productivity as much as possible. By working intelligently, they can concentrate their efforts on the areas that will most improve the product at a given time, and through the sustained use of these methodologies they have transformed themselves from three men in a San Francisco coffee shop with an idea into the online repository for code.
Bibliography
- “How GitHub Works” by Zach Holman (2011)
- “Ten Lessons from GitHub’s First Year” by Tom Preston-Werner (2008)
- “Bootstrapped, Profitable & Proud: GitHub” by Matt Linderman (2010)
- “Startups Open Sourced: Stories to inspire and educate” by Jared Tame (2011)
I’ve started to work on the frontend of Photokarma, my idea for a website that lets users exchange constructive criticism in a fresh way. For this prototype/beta of the site I’m trying to rush out a version of the UI that contains all the buttons and links that the site is going to need, and then down the track I’ll change the frontend to something more customised.
The choice to do this is based on a chapter in the book Startups Open Sourced, where Tom Preston-Werner is interviewed and puts forward the idea of UI Driven Development, in which the UI and the UX come first and all the other code in the project comes second. When done correctly, the product ends up better: instead of having to make UI compromises to suit the codebase, you have free rein over the design, and as a developer you have to embrace the challenges that come with that.