Example software development projects
These are a few representative examples of the systems I developed whilst at the BBC (there were many others). I've chosen examples that I designed, implemented, skinned and documented myself, rather than those I designed and tasked someone else with implementing. In all cases, components have unit tests and API documentation.
idstore - an anonymous web storage system
This is a web server storage mechanism I developed whilst at the BBC, providing access to a "locker" (a block of storage) with a "key" (a unique ID).
By virtue of being minimal ("malloc for the web"), it's very versatile and has been put to a wide range of uses. It's well suited to web 2.0 clients (AJAX or Flash, for example) that need some way to maintain state between sessions, and to "send to a friend" or "group challenge" scenarios where two or more users need access to a private resource.
It's also useful in combination with a personalisation system which can be used as a "key ring" to hold the pointers to the storage.
The system consists of:
- MySQL database providing the storage. InnoDB tables used so hot backups can be taken.
- Perl API providing the storage primitives (fetch, store, delete, touch etc.)
- mod_perl application providing the web interface with output in a range of formats (e.g. JSON and XML as well as templated output).
- HTTP remoting interface (proxy module and mod_perl server script) allowing back-office systems to publish and retrieve data, and to find out what has changed since they last did so.
- Weekly cleaner process to expunge inactive IDs (command-line script run by cron)
- Web admin interface to allow BBC staff to set the inactivity expiry interval on an instance etc.
- User guide for client-side developers, support guide for webmasters, demo site with minimal examples
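The locker-and-key model and the storage primitives listed above can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the original, keeps everything in memory instead of MySQL, and every name in it (`LockerStore`, `expunge_inactive`, the `now` parameter) is hypothetical. Whether a fetch counts as activity in the real system is not stated, so this sketch assumes only store and touch do.

```python
import secrets
import time


class LockerStore:
    """In-memory stand-in for the database-backed locker storage (illustrative only)."""

    def __init__(self, expiry_seconds):
        self.expiry_seconds = expiry_seconds
        self._lockers = {}  # key -> (data, time of last activity)

    def store(self, data, key=None, now=None):
        """Write data to a locker, minting a new unguessable key if none is given."""
        now = time.time() if now is None else now
        if key is None:
            key = secrets.token_hex(16)
        self._lockers[key] = (data, now)
        return key

    def fetch(self, key):
        """Return the locker contents, or None if the key is unknown."""
        entry = self._lockers.get(key)
        return None if entry is None else entry[0]

    def touch(self, key, now=None):
        """Mark a locker as active without changing its contents."""
        now = time.time() if now is None else now
        if key not in self._lockers:
            return False
        data, _ = self._lockers[key]
        self._lockers[key] = (data, now)
        return True

    def delete(self, key):
        """Remove a locker; report whether it existed."""
        return self._lockers.pop(key, None) is not None

    def expunge_inactive(self, now=None):
        """What a periodic cleaner might do: drop lockers idle past the expiry interval."""
        now = time.time() if now is None else now
        stale = [k for k, (_, seen) in self._lockers.items()
                 if now - seen > self.expiry_seconds]
        for k in stale:
            del self._lockers[k]
        return len(stale)
```

A client would keep only the returned key (for instance in a cookie or a personalisation "key ring") and present it on later requests to get at its data.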
Some of the services on bbc.co.uk built with this include:
Batch processing system
Whilst at the BBC, I developed a system for asynchronous batch processing. Its main features are:
- A web user interface to administer and inspect queues
- Optional workflow layer: a linear workflow may be defined to chain processes together. Once a job has completed successfully in one stage of the workflow it is automatically transferred to the next.
- Dependencies: A job may have dependencies (in the same, or other queues). The job will not run until all its dependencies have completed successfully.
- Prioritisation based on age and importance
- Scheduling: Jobs can be marked as "don't run before date/time". Jobs whose scheduled time has already passed take maximum priority, ahead of those due ASAP.
- Parallel processing: multiple dequeuer processes may be run on a machine to take advantage of multiple CPUs or hyperthreads
- Load balancing: queues may be assigned to multiple hosts to spread the load of computationally intensive tasks
- Support for heterogeneous environment: some parts of a workflow may run in a Windows environment and others in a Unix environment.
- Simple API for writing plugins: Dequeuer commands are simply Perl modules with an execute() method that prints output to STDOUT and errors to STDERR, and returns true/false to indicate success or failure.
- Archival: Archiver process collects all the finished jobs and their output into an archive file
- Email notification
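How the dependency, scheduling and prioritisation rules above might combine when a dequeuer picks its next job can be sketched as follows. This is a Python illustration, not the original Perl. The `Job` fields and function names are hypothetical, and the exact ordering (overdue scheduled jobs first, then importance, then age) is an assumption, since the source says only that priority depends on age and importance and that overdue jobs come first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    created: float                      # submission time (epoch seconds)
    importance: int = 0                 # higher means more important
    not_before: Optional[float] = None  # None means "run as soon as possible"
    depends_on: list = field(default_factory=list)  # names of prerequisite jobs
    status: str = "queued"              # queued | succeeded | failed


def runnable(job, jobs_by_name, now):
    """A job may run once it is due and every dependency has succeeded."""
    if job.status != "queued":
        return False
    if job.not_before is not None and job.not_before > now:
        return False
    return all(jobs_by_name[d].status == "succeeded" for d in job.depends_on)


def sort_key(job):
    """Smaller sorts first. Only runnable jobs are ranked, so any job that
    still carries a not_before time is one whose time has already passed."""
    overdue_rank = 0 if job.not_before is not None else 1
    return (overdue_rank, -job.importance, job.created)


def next_job(jobs, now):
    """Pick the highest-priority runnable job, or None if nothing can run."""
    by_name = {j.name: j for j in jobs}
    candidates = [j for j in jobs if runnable(j, by_name, now)]
    return min(candidates, key=sort_key) if candidates else None
```

Note that a job whose dependency has failed simply never becomes runnable here; how the real system reports or retries such jobs is not covered by this sketch.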
The system consists of:
- A MySQL database providing the data model for jobs, queues, workflows, host assignments etc.
- A Perl object model (Workflows, Queues, Jobs, Commands, Logging etc.)
  - Encapsulates logic such as the prioritisation algorithm, dependency tracking etc.
  - Command plugin architecture so new commands can be deployed and used in the running system.
  - Logging mechanism to capture STDOUT/STDERR from commands and to abstract a hierarchically organised directory structure for logfiles
  - Serialisation used to store arbitrary job/queue data in the DB
- A dequeuer daemon - this is the process that actually runs the jobs.
- A web management interface consisting of:
  - mod_perl scripts to expose the relevant parts of the object model over HTTP
  - XHTML templates to abstract the presentation
  - CSS for styling
  - User guide with screenshots etc.
- An init.d/ startup script to control the daemon
- An archival process (run via cron) to expunge successful jobs from database (and their logfiles) and generate a tar archive that can be stored offline.
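The command plugin contract described earlier (an execute() method that prints to STDOUT and STDERR and returns true or false) can be illustrated with a short Python sketch of how a dequeuer might run a plugin and capture its output for the job log. The original plugins were Perl modules; the `CountWords` plugin, the `run_command` helper and the `job_data` dictionary are all hypothetical, and treating an uncaught exception as a failed job is an assumption.

```python
import contextlib
import io
import sys


class CountWords:
    """Hypothetical command plugin: succeeds if it is given some text."""

    def execute(self, job_data):
        text = job_data.get("text", "")
        if not text:
            print("no text supplied", file=sys.stderr)
            return False
        print(f"{len(text.split())} words")
        return True


def run_command(command, job_data):
    """Run one plugin, capturing what it prints, as a dequeuer might."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            ok = bool(command.execute(job_data))
        except Exception as exc:  # assumption: a crashing plugin is a failed job
            print(f"uncaught error: {exc}", file=sys.stderr)
            ok = False
    return {"success": ok, "stdout": out.getvalue(), "stderr": err.getvalue()}
```

Because the plugin only prints and returns a flag, it needs no knowledge of queues, logging or the database, which is what keeps new commands cheap to write.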
The system is used to manage publishing of content from a variety of sources to bbc.co.uk and mirror servers and for running QA processes over content. There are plans to use it for distributing other "heavy lifting" across machines (such as HTML email generation or processing of user-submitted images or AV content).
Wildfacts
This is an example of more applied work, done in close collaboration with editorial staff and client-side developers to produce an end product. It's a small content management database used to manage and publish a set of several hundred animal factfiles for the nature website. The output is here.
The system consists of:
- An Oracle database to model the content (species taxonomy, animals, images, video clips, links etc)
- A user interface in MS Access/VBA talking to backend Oracle DB over ODBC
- A Perl publishing script to generate leaf pages and a search repository. Written as a command-line script so it can either be run from cron for a nightly publish or launched from a button in MS Access via a generic CGI wrapper (which I also wrote). It makes use of a set of reusable read, transform and write components I wrote for publishing tabular data, which have subsequently formed the basis of all the department's database publishing.
- HTML Templates built by client-side developers
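The read, transform and write split mentioned above for publishing tabular data might look something like the following. This is a Python sketch, not the original Perl components: CSV text stands in for the Oracle query results, `string.Template` stands in for the HTML templates, and the function names and the slug rule are all hypothetical.

```python
import csv
import io
from string import Template


def read_rows(source):
    """Reader: yield each record as a dict (CSV text stands in for a DB query)."""
    yield from csv.DictReader(io.StringIO(source))


def add_slug(row):
    """Transform: derive a page filename stem from the animal's name."""
    slug = row["name"].lower().replace(" ", "_")
    return {**row, "slug": slug}


def write_pages(rows, template):
    """Writer: render one leaf page per row, keyed by output filename."""
    return {f"{row['slug']}.html": template.substitute(row) for row in rows}


def publish(source, transforms, template):
    """Chain reader -> transforms -> writer; each stage is independently reusable."""
    rows = read_rows(source)
    for transform in transforms:
        rows = map(transform, rows)
    return write_pages(rows, template)
```

For example, with made-up data, `publish("name,summary\nRed Fox,A small canid\n", [add_slug], Template("<h1>$name</h1><p>$summary</p>"))` returns a dictionary with a single `red_fox.html` entry. A real publisher would also need to escape HTML and write the pages to disk, which this sketch leaves out.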
The initial build involved collating data from a number of sources (a defunct MySQL database and several Excel spreadsheets). Some ad-hoc Perl scripts were used to marshal the data into the new schema.