Using Python’s argparse for a “turn on”/”turn off” argument

What’s argparse ?

Argparse is a Python standard library module which “makes it easy to write user-friendly command-line interfaces”. The 2.x doco is here, the 3.x doco is here. Before 2.7 the optparse module was supplied as part of Python but that has been deprecated in favour of argparse.

“turn on” / “turn off” type arguments

I was working on some code yesterday and I wanted an argument of the “turn on” / “turn off” type. For instance you might want the output to be verbose or not. It’s not uncommon to see this implemented by means of a

--verbose

argument. When ‘--verbose’ is present the program provides verbose output, when it’s absent it doesn’t.

How then ?

A nice neat way to do this is to make use of the `action` (2.x and 3.x) argument of the `add_argument` method and to combine that with use of the `set_defaults` method so that a value is set in the case when the argument is not used by the user.

Here’s an example taken from my django-row-count project :

parser.add_argument('--echotostdout', dest='echotostdout', action='store_true')
parser.set_defaults(echotostdout=False)
args = parser.parse_args()

In this case a command line argument …

--echotostdout

… sets an attribute

args.echotostdout

… to True if it’s present as an argument on the command line and to False if it’s absent.
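To see both cases without leaving the interpreter, here is a minimal self-contained sketch of the same pattern. The `build_parser` helper and the description text are my own inventions for illustration; passing an explicit list to `parse_args` simply stands in for a real command line.

```python
import argparse


def build_parser():
    # Hypothetical helper: wraps the two calls shown above in one place
    parser = argparse.ArgumentParser(description="store_true demonstration")
    parser.add_argument('--echotostdout', dest='echotostdout',
                        action='store_true')
    parser.set_defaults(echotostdout=False)
    return parser


parser = build_parser()

# An explicit argument list is used instead of sys.argv so both
# cases can be tried side by side
flag_present = parser.parse_args(['--echotostdout'])
flag_absent = parser.parse_args([])

print(flag_present.echotostdout)
print(flag_absent.echotostdout)
```

The first `print` shows the value when the flag is given and the second the default supplied by `set_defaults`.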

Django and Heroku – getting it working

What follows is based on a short talk I gave to the New Zealand Python User Group in Feb 2015. This blog post provides some specifics on areas I was only able to hand wave over during the talk.

Motivation

I recently tried to deploy a Django side project to Heroku.

I’d previously used Heroku for a Ruby on Rails project and remembered it being very straightforward so I was surprised to find it wasn’t that great an experience. The documentation is fragmentary and seems to have been only partially updated to reflect changes in Django and the Heroku environment.

“Simplest Possible”

In the end I decided to suspend my original project and try to make the simplest possible Django project work on Heroku. For “simplest possible” I chose the “Polls” project from the Django Tutorial . I got it working and the code is available in my github account:  https://github.com/shearichard/polls17/tree/v2.0 . If you’re interested the version of the Project which works locally and before I made any changes to support use on Heroku is here : https://github.com/shearichard/polls17/tree/v1.0 .

What needed to be done

To complement the Heroku documentation I’m going to record here the changes that were made to the Project between v1.0 (working locally) and v2.0 (working on Heroku).

The files to which changes were applied to support use in Heroku are as follows :

mysite/mysite/settings.py (before and after)
mysite/mysite/settings_heroku.py (after – there was no ‘before’ for this file !)
mysite/mysite/wsgi.py (before and after)
requirements.txt (before and after)

settings.py

diff --git a/mysite/mysite/settings.py b/mysite/mysite/settings.py
index cb992c1..b2082ba 100644
--- a/mysite/mysite/settings.py
+++ b/mysite/mysite/settings.py
@@ -87,4 +87,5 @@ USE_TZ = True
# https://docs.djangoproject.com/en/1.7/howto/static-files/

STATIC_URL = '/static/'
STATIC_ROOT = 'staticfiles'
TEMPLATE_DIRS = [os.path.join(BASE_DIR, 'templates')]

 

settings_heroku.py

The settings_heroku.py file was completely new for use within the Heroku environment and we can see it referenced below from within wsgi.py when a test is made to see if the code is running within Heroku.

The final form of settings_heroku.py is as follows :

from .settings import *

import dj_database_url
DATABASES['default'] =  dj_database_url.config()

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
STATIC_ROOT = 'staticfiles'
STATIC_URL = '/static/'

STATICFILES_DIRS = (
    os.path.join(BASE_DIR, 'static'),
)
# Simplified static file serving.
# https://warehouse.python.org/project/whitenoise/
STATICFILES_STORAGE = 'whitenoise.django.GzipManifestStaticFilesStorage'

Things worthy of note here are :

  • we import the whole of the local settings file (referenced here as ‘.settings’) and then change or add to it as necessary.
  • we make use of the dj-database-url to pick up the database configuration to be used in the Heroku environment
  • `STATIC_ROOT` and `STATICFILES_DIRS` are not needed in the standard version of the ‘Polls’ project but they are needed when we move to Heroku so they’re added here.
  • `STATIC_URL` is already defined in the standard settings file and so doesn’t actually need to be in settings_heroku.py at all.
  • `STATICFILES_STORAGE` allows for the use of Whitenoise, a module which lets WSGI apps (such as this one) serve their own static files, something which hadn’t previously been possible. There are other good reasons to use Whitenoise in the areas of file compression and cache-header handling.
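For anyone wondering what dj-database-url is doing on our behalf: it reads the `DATABASE_URL` environment variable and turns it into a Django `DATABASES` entry. The following is only a rough standard-library sketch of that idea, not the library’s own code. The function name `database_config_from_url` and the example URL are made up, and the real library copes with many more cases than this does.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def database_config_from_url(url):
    # Hypothetical helper: split a database URL into the keys
    # Django expects in a DATABASES entry
    parts = urlparse(url)
    return {
        # Engine name as used by Django 1.7 for PostgreSQL (assumed here)
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': parts.path.lstrip('/'),
        'USER': parts.username,
        'PASSWORD': parts.password,
        'HOST': parts.hostname,
        'PORT': parts.port,
    }


# Made-up URL in the same shape as a DATABASE_URL value
example_url = 'postgres://someuser:secret@db.example.com:5432/pollsdb'
config = database_config_from_url(example_url)
print(config['NAME'], config['HOST'], config['PORT'])
```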

wsgi.py

The version of wsgi.py before the changes for Heroku is very straightforward and can be seen below.

"""
WSGI config for mysite project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see

https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/

"""

import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

To make wsgi.py work for Heroku there are essentially three changes:

  • Make the settings file used dependent on the existence of an environment variable, ‘DYNO’. If it’s present then the code is running on Heroku and the server is started using the settings_heroku.py file shown above, otherwise we continue to use the settings.py file.
  • To make use of Whitenoise we take the output of `get_wsgi_application` and use it as an argument when instantiating a `DjangoWhiteNoise` object.
  • Lastly, and least important, we redirect standard output to standard error. This isn’t necessary at all and is something I did to make for easier diagnosis of issues while getting the Heroku specific version working.

diff --git a/mysite/mysite/wsgi.py b/mysite/mysite/wsgi.py
index 15c7d49..e5e1e5c 100644
--- a/mysite/mysite/wsgi.py
+++ b/mysite/mysite/wsgi.py
@@ -8,7 +8,20 @@ https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/
"""

import os
import sys

#Allows us to see useful stuff in Gunicorn output
sys.stdout = sys.stderr

#Rely upon env var 'DYNO' to determine if we are
#running within Heroku
if 'DYNO' in os.environ:
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings_heroku")
else:
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

from django.core.wsgi import get_wsgi_application
from whitenoise.django import DjangoWhiteNoise

application = get_wsgi_application()
application = DjangoWhiteNoise(application)
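
The settings selection in the diff above can be tried in isolation with plain dictionaries standing in for `os.environ`. This is just a sketch: the helper names `settings_module_for` and `configure`, and the `mysite.settings_test` module, are hypothetical. It also shows a detail worth knowing, namely that `setdefault` leaves an existing `DJANGO_SETTINGS_MODULE` value untouched.

```python
def settings_module_for(environ):
    # Same test as in wsgi.py: Heroku sets DYNO inside a running dyno
    if 'DYNO' in environ:
        return 'mysite.settings_heroku'
    return 'mysite.settings'


def configure(environ):
    # setdefault only writes the key when it is not already present
    environ.setdefault('DJANGO_SETTINGS_MODULE',
                       settings_module_for(environ))
    return environ['DJANGO_SETTINGS_MODULE']


on_heroku = configure({'DYNO': 'web.1'})
local = configure({})
# 'mysite.settings_test' is a made-up module name for illustration
preset = configure({'DYNO': 'web.1',
                    'DJANGO_SETTINGS_MODULE': 'mysite.settings_test'})

print(on_heroku)
print(local)
print(preset)
```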

 

requirements.txt

The requirements.txt (created as the output from a `pip freeze` command) reflects the libraries installed at any given point.

Here’s the diff of requirements.txt between the local installation and the ‘Heroku’ ready installation.

As can be seen the extra libraries required by the migration to Heroku were :

  • dj-database-url
  • gunicorn
  • whitenoise
diff --git a/requirements.txt b/requirements.txt
index 98b2fd1..4e189d2 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,7 @@
Django==1.7.4
Pygments==2.0.2
argparse==1.2.1
dj-database-url==0.3.0
django-extensions==1.5.0
django-pdb==0.4.1
fancycompleter==0.4
gunicorn==19.2.1
@@ -11,5 +12,6 @@ psycopg2==2.6
pyflakes==0.8.1
pyrepl==0.8.4
six==1.9.0
whitenoise==1.0.6
wmctrl==0.1
wsgiref==0.1.2

A general point about Project structure

A good deal of the Heroku documentation assumes that your project directory (the one that contains manage.py) is also your root directory. This isn’t how I do things. I prefer my root directory to contain stuff like .gitignore, requirements.txt, README.md etc and to have a directory within the root which is my project directory.

If your project is similarly structured it’s worth bearing in mind that the Procfile required by Heroku should include “--pythonpath ./mysite” (where ‘mysite’ is the name of your project directory) as an argument to the gunicorn invocation … I had a number of issues before I did this. Here’s an example of the argument in use.
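
A Procfile along the following lines would carry that argument. Treat it as a sketch rather than a copy of the file in my repository: it assumes the project directory is called ‘mysite’ and that the WSGI module is `mysite.wsgi`, as in the Polls example, and the `--log-file -` part (which sends gunicorn’s log to standard output) is optional.

```
web: gunicorn mysite.wsgi --pythonpath ./mysite --log-file -
```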

A general point about the Heroku CLI

The Heroku Toolbelt includes the Heroku CLI which allows you to manage Heroku apps from the command line. For instance this :

heroku ps --app foo

Provides a list of running dynos in your ‘foo’ application.

Anyway the strange thing is that it seems to me that almost every command you issue via the Heroku CLI requires the

--app foo

argument, where ‘foo’ is the name of your application, and yet the documentation never mentions that ! You work it out pretty quickly because you can’t do much without it but it’s strange all the same.

In conclusion

Using the free levels of Heroku for running a Django project gives you access to a really high quality hosting environment at a very attractive price (free as long as you don’t get too much traffic or data). Once you’ve got over the bumps it works really well and for many people will be a good solution for hobby projects.

Openshift client tools install instructions are wrong for Ubuntu 14.x

Summary

This is a short post to provide a correction to the Openshift getting started instructions for Ubuntu for those using Ubuntu 14.x

Caveat

What follows concerns Ruby and what I know about Ruby could be written on the back of an envelope.

Background

I have an Ubuntu 14.x headless box and I want to make use of the Openshift client tools from it. The Ubuntu machine had never had Ruby installed on it previously and the Openshift client tools make use of Ruby so I had to install that.

What do they say ?

Under the heading “Setting up the OpenShift Environment on Ubuntu” you’re told to install Ruby from scratch like this :

$ sudo apt-get install ruby-full rubygems git-core

When I tried that and then went on to install the client tools gem I got :

$ sudo gem install rhc
[sudo] password for rshea:
ERROR:  While executing gem ... (Zlib::DataError)
incorrect header check

So what do you do ?

With help from this Stackoverflow question I adapted the initial command to read

$ sudo apt-get install ruby-full rubygems-integration git-core

Then a bit more

The documentation does tell you to do the following to update your Ruby stack:

$ sudo gem install rubygems-update
$ sudo update_rubygems

It’s a bit odd the way this is tucked away at the bottom as it seems to have been necessary since Ubuntu 11.10. After that I was able to run

$ sudo gem install rhc

followed by

$ rhc setup

After that everything worked as it should.

Dominate: Manipulate HTML DOM using Python

This is a talk I gave in March 2014 which I never got around to doing a blog post for.

Dominate is : “a Python library for creating and manipulating HTML documents using an elegant DOM API” .

There’s a part of me which deep down feels that using templates is “wrong” and that procedural processing is the way to go … it might be a deluded part of me but it is a part of me ! Anyway as a result when I read about Dominate I had to give it a spin.

I gave a talk to the Wellington branch of the New Zealand Python User group and the slides are here : https://s3.amazonaws.com/shearichard/dominate-demo-NZPUG-2014-March.pdf . They are really very simple examples but, I hope, instructive.

As part of that I put my examples up on Bitbucket and they’re available here : https://bitbucket.org/rshea/nzpug201403/src

 

Charting with Django : three approaches

This is a belated (and hasty) post about a talk I gave in October 2013 at the Wellington branch of the New Zealand Python User Group.

Comparing three different charting libraries

In the talk I compared three different approaches to providing charts within a Django project.

The three different approaches used were :

Chartit
Django-Graphos
Chartkick

Sample code and slides

I built a Django project and an application for each of the three approaches and that code is available here : https://bitbucket.org/rshea/django-charts-demo .

The slides for my talk are available here as a PDF : https://s3.amazonaws.com/shearichard/django-with-charts.pdf

Conclusion in brief

If you’re only interested in my conclusion I would suggest Django-Graphos – read the slides for why.

Python you want a string you get a tuple – howzat ?

Summary

How come you’re getting a tuple when you passed a string ?

Don’t do this at home

This is something that happened to me today. It really perplexed me so maybe this post will help someone else.

My class

I’d got a class a bit like the one below:

class cat(object):
    def __init__(self, name, colour, weight):
        self.name = name
        self.colour = colour,
        self.weight = weight
    def report(self):
        print self.name
        print self.colour
        print self.weight

Using it

But when I tried to use it like this:

mycat = cat('Garfield', 'Marmalade', 10)
mycat.report()

the output looked like this :

Garfield
('Marmalade',)
10

The problem being the attribute `colour` was being stored as a tuple.

The Answer

Looking back on it the problem is quite obvious but I was so busy looking at other parts of the situation (which was significantly more complex than my cat example) that I missed it for quite a while.

Within the __init__ method I had inadvertently appended a comma onto the end of the self.colour assignment and Python takes that to mean, in our example, colour is the first element of a tuple.
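
Here is a cut-down sketch that puts the broken and the fixed assignment side by side. The class names `Cat` and `FixedCat` are just for this illustration, and `print(repr(...))` is used so the code runs unchanged on both Python 2 and Python 3.

```python
class Cat(object):
    def __init__(self, name, colour):
        self.name = name
        # Stray trailing comma: Python builds a one-element tuple
        self.colour = colour,


class FixedCat(object):
    def __init__(self, name, colour):
        self.name = name
        # No trailing comma: the string is stored as a string
        self.colour = colour


broken = Cat('Garfield', 'Marmalade')
fixed = FixedCat('Garfield', 'Marmalade')

print(repr(broken.colour))
print(repr(fixed.colour))
```

It is the comma, not any brackets, which makes the tuple, which is why the mistake is so easy to overlook at the end of a line.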

repoze.catalog and ZODB beginners example – part 2

Summary

The second of two posts which illustrate how to use repoze.catalog alongside ZODB. The first post can be seen at: “repoze.catalog and ZODB beginners example – part 1” .

Where we’re up to

In the first post I explained how you can have objects stored within a ZODB database indexed by repoze.catalog and why that was sometimes a good idea. In this post I’m going to demonstrate searching for the previously stored objects using repoze.catalog’s search facilities. If you haven’t read the first post I suggest you read that now because what follows assumes you have.

Finding ZODB objects with repoze.catalog

As discussed in the first post repoze.catalog allows you to index arbitrary properties of the objects you save into a ZODB database and then do complex searches on those properties to extract only the objects you’re interested in.

The example I’m showing here demonstrates how we can search through those objects we added in the example of the last post using a number of criteria.

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to find objects
being stored in ZODB. This example has the catalog and ZODB
within the same repository
'''
from myzodb import MyZODB
from persistent import Persistent

from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager
from repoze.catalog.query import InRange, Lt

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

def print_all_city_instances(myzodbinst):
    '''
    Pull everything keyed under 'cities' out of the
    ZODB instance (without any regard to the
    repoze.catalog cataloguing) and print them
    '''
    print ""
    print "About to dump all City Instances:"
    for acity in myzodbinst.dbroot['cities'].itervalues():
        print acity
    print ""

def print_city_query_results(myzodbinst, res):
    '''
    Use the list of integers returned by a
    repoze.catalog query to pull elements
    keyed underneath 'cities' in the ZODB
    instance which we are using repoze.catalog
    to catalogue
    '''
    print ""
    print "Objects stored in ZODB corresponding"
    print "to the repoze.catalog resultset:"
    for idx in res:
        print myzodbinst.dbroot['cities'][idx]
    print ""

if __name__ == '__main__':
    #Setup access to the repoze.catalog instance
    factory = FileStorageCatalogFactory('../data/mdcatalog.db',
                                        'mycatalog')
    manager = ConnectionManager()
    catalog = factory(manager)
    #Setup access to the ZODB instance containing data
    #catalogued by the repoze.catalog instance
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    #Demonstrate we really have all the Cities
    print_all_city_instances(myzodbinstance)
    #Demonstrate use of `Lt` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'less than' value on the `population` index"
    print "Populations less than 1,000,000"
    numdocs, results = catalog.query(Lt('populations', 1000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

    #Demonstrate use of `InRange` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'InRange' values on the `population` index"
    print "Populations between 1,000,000 and 4,000,000"
    numdocs, results = catalog.query(InRange('populations',
                                              1000000, 4000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

Example Step by Step

Here’s a breakdown on what’s happening in the above example

Initialize repoze.catalog

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')
manager = ConnectionManager()
catalog = factory(manager)

Here we connect to our repoze.catalog repository and instantiate a `catalog` object

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database instance and provides `storage`, `db`, `connection` and `dbroot` properties to help the programmer interact with the ZODB storage, database and connection objects. `MyZODB` also provides a `close` method to cleanly close the ZODB connection, database and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()

Dump contents of ZODB without using repoze.catalog

The first data access we do in the above example is just a simple dump of every object, held under the key ‘cities’, in our ZODB database. Notice we are not using repoze.catalog at all at this point. By viewing this data we can be sure that the subsequent queries using repoze.catalog do what we think they do.

So we call the function `print_all_city_instances`

print_all_city_instances(myzodbinstance)

which iterates over the ‘cities’ element of the `dbroot` property of the ZODB `connection` to allow us to see everything that’s in the ZODB database.

for acity in myzodbinst.dbroot['cities'].itervalues():
    print acity

Our output looks like this :

About to dump all City Instances:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Jakarta  (Pop: 10187595)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)
Santiago  (Pop: 5428590)
Buenos Aires  (Pop: 2891082)

Demonstrating the `Lt` function of repoze.catalog

The next thing that happens in the sample is to make use of the `Lt` function offered by repoze.catalog

numdocs, results = catalog.query(Lt('populations', 1000000))

In the previous post when we initialized our repoze.catalog we created a `populations` index which was associated with the `population` property of our `City` class (take a look at the previous post if you’ve forgotten the details).

Our use of the `Lt` method asks repoze.catalog to find all `City` instances stored in our ZODB database with a population of less than 1,000,000. As you can see we get two objects returned which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is a list of integers which are keys used when storing into ZODB those objects which satisfy the search criteria.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)

It’s worth mentioning that whilst there are many comparator methods offered by repoze.catalog.query not all of them are applicable to all index types. In this example of the `Lt` method we are searching on an index, ‘populations’, of type CatalogFieldIndex (as set up in the first post), which does support `Lt`, but not all index types do.

Demonstrating the `InRange` function of repoze.catalog

Finally in the sample we show off the `InRange` function offered by repoze.catalog

numdocs, results = catalog.query(InRange('populations',
                                         1000000, 4000000))

As with the previous example we utilise the previously created catalog index ‘populations’ to find instances of `City` – in this case those instances that have their `population` property set to a value between 1,000,000 and 4,000,000.

We do this by using the  `InRange` method offered by repoze.catalog. As with the `Lt` example above we get two objects returned which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is a list of integers which are keys used when storing into ZODB those objects which satisfy the search criteria.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Buenos Aires  (Pop: 2891082)
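
If you want to convince yourself that those two result sets are right without involving repoze.catalog at all, a plain Python filter over the same city data (keyed by the same integers used in the first post) will do it. This is only a cross-check sketch: the `less_than` and `in_range` helpers are my own names, tuples stand in for `City` instances, and I have assumed both ends of the range are inclusive, which makes no difference for this data.

```python
# Same city data and integer keys as used in the first post
cities = {
    1: ('Windhoek', 322500),
    2: ('Pretoria', 525387),
    3: ('Nairobi', 3138295),
    4: ('Maputo', 1244227),
    5: ('Jakarta', 10187595),
    6: ('Canberra', 358222),
    7: ('Wellington', 393400),
    8: ('Santiago', 5428590),
    9: ('Buenos Aires', 2891082),
}


def less_than(cities, limit):
    # Keys of cities with a population strictly below the limit
    return sorted(docid for docid, (_name, pop) in cities.items()
                  if pop < limit)


def in_range(cities, low, high):
    # Keys of cities with low <= population <= high (inclusive, assumed)
    return sorted(docid for docid, (_name, pop) in cities.items()
                  if low <= pop <= high)


lt_ids = less_than(cities, 1000000)
range_ids = in_range(cities, 1000000, 4000000)

print([cities[docid][0] for docid in lt_ids])
print([cities[docid][0] for docid in range_ids])
```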

Credit where credit’s due

As with part one of this two part post the example I’ve shown here owes some parts to one of the examples on the repoze.catalog website and the structure of `MyZODB` was taken from the article ‘Example Driven ZODB’ cited in the first post.

repoze.catalog and ZODB beginners example – part 1

Summary

The first of two posts which illustrate how to use repoze.catalog alongside ZODB

What’s ZODB ?

To quote the ZODB home page :

“The ZODB is a native object database, that stores your objects while allowing you to work with any paradigms that can be expressed in Python. Thereby your code becomes simpler, more robust and easier to understand”

What’s repoze.catalog ?

repoze.catalog is one of a number of frameworks which can be used to supply indexing for ZODB for those circumstances where accessing objects stored in ZODB would otherwise be unacceptably slow

Intended Audience

I’m assuming that readers of this post have a basic familiarity with ZODB. If you don’t there are lots of good resources out there of which ‘Example Driven ZODB‘ is a good example.

What’s the purpose of this post ?

For any reasonably experienced Python programmer using ZODB and repoze.catalog is pretty straightforward. Unfortunately a new user of repoze.catalog cannot find an example on the repoze.catalog site which shows both how to catalog items and save them into ZODB. This is understandable as repoze.catalog is not only for use with ZODB but I thought it was worthwhile doing a specific example for that scenario and that’s what’s on this page.

Where repoze.catalog helps ZODB

Because of the nature of ZODB it’s easy to access objects by the value they’re keyed on but otherwise it’s a question of a sequential search.

So instances of a class that look like this :

class Country(Persistent):
    def __init__(self, name, pop):
        self.name = name
        self.population = pop

Might be saved into the `root` property of a ZODB `connection` object like this :

root['un']['nz'] = Country('New Zealand', 4000000)

But subsequently if we wanted to obtain `Country` instances on the basis of population the key doesn’t help us at all and a scan of all `Country` objects would be necessary, like this :

for cou in root['un'].itervalues():
    if cou.population > 1000000:
        print cou.name

When we use repoze.catalog to catalogue a ZODB database we specify the properties of the objects saved in ZODB which interest us and by which we will subsequently want to find those objects.

repoze.catalog allows us to quickly and easily search for objects with property values that interest us.
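
To give a feel for why an index helps, here is a toy sketch of the general idea: keep (value, key) pairs sorted so that a “less than” lookup becomes a binary search instead of a scan of every object. To be clear, this is not how repoze.catalog is implemented and `ToyFieldIndex` is an invented class. Apart from New Zealand the population figures are made up.

```python
import bisect


class ToyFieldIndex(object):
    '''Toy stand-in for a field index: a sorted list of (value, docid)'''
    def __init__(self):
        self._entries = []

    def index_doc(self, docid, value):
        # insort keeps the list sorted as each entry is added
        bisect.insort(self._entries, (value, docid))

    def less_than(self, limit):
        # (limit,) sorts before any (limit, docid) pair, so this finds
        # the position of the first entry whose value is >= limit
        cut = bisect.bisect_left(self._entries, (limit,))
        return [docid for _value, docid in self._entries[:cut]]


# Integer keys mapped to populations; only the first figure is from the post
populations = {1: 4000000, 2: 300000, 3: 65000000, 4: 900000}

index = ToyFieldIndex()
for docid, population in populations.items():
    index.index_doc(docid, population)

small = index.less_than(1000000)
print(small)
```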

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to catalogue objects
being stored in ZODB. This example has the catalog and ZODB
database as separate repositories
'''
from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager

from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.text import CatalogTextIndex

import transaction
from persistent import Persistent
from BTrees.OOBTree import OOBTree

from myzodb import MyZODB

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')

_initialized = False

def initialize_catalog():
    '''
    Create a repoze.catalog instance and specify
    indices of interest

    NB: Use of global variable
    '''
    global _initialized
    if not _initialized:
        # create a catalog
        manager = ConnectionManager()
        catalog = factory(manager)
        # set up indexes
        catalog['names'] = CatalogTextIndex('name')
        catalog['populations'] = CatalogFieldIndex('population')
        # commit the indexes
        manager.commit()
        manager.close()
        _initialized = True

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

if __name__ == '__main__':
    initialize_catalog()
    manager = ConnectionManager()
    catalog = factory(manager)
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    myzodbinstance.dbroot['cities'] = OOBTree()

    #For ease of demonstration set up a local dict
    #containing a number of `City` instances keyed
    #by a unique integer
    cities = {
        1:City('Windhoek', 322500),
        2:City('Pretoria', 525387),
        3:City('Nairobi', 3138295),
        4:City('Maputo', 1244227),
        5:City('Jakarta', 10187595),
        6:City('Canberra', 358222),
        7:City('Wellington', 393400),
        8:City('Santiago', 5428590),
        9:City('Buenos Aires', 2891082),
    }
    #Iterate over our local dict and for each
    #element generate the catalog entry for
    #repoze.catalog and add the corresponding
    #instance to the ZODB database we are
    #cataloguing
    for docid, doc in cities.items():
        catalog.index_doc(docid, doc)
        myzodbinstance.dbroot['cities'][docid] = doc
        transaction.commit()

    myzodbinstance.close()

Example Step by Step

Here’s a breakdown on what’s happening in the above example

Initialize repoze.catalog

initialize_catalog()
manager = ConnectionManager()
catalog = factory(manager)

The `initialize_catalog` function creates a repoze.catalog instance and initializes two indices : `names` and `populations`. These index the `name` and `population` properties of any objects indexed with the repoze.catalog instance just created

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database instance and provides `storage`, `db`, `connection` and `dbroot` properties to help the programmer interact with the ZODB storage, database and connection objects. `MyZODB` also provides a `close` method to cleanly close the ZODB connection, database and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()

Create a sub-tree in ZODB for our `City` objects

Now we have an instance of `MyZODB` we can treat the `dbroot` property (which holds the root object returned by the `root()` method of the ZODB `connection` object) as a plain old dictionary and assign a value to it under some key of our choosing. For our example, because we’re going to save a set of `City` objects, we’ve chosen ‘cities’.

At this stage we just assign an instance of `OOBTree` to that ‘cities’ key. An OOBTree instance acts like a dictionary but, when a lot of elements are within it, works much more efficiently for the purposes of ZODB.

myzodbinstance.dbroot['cities'] = OOBTree()

Create a set of `City` objects

Now we pause for a moment and make ourselves a set of `City` objects and put them into a dictionary for later use.

What’s significant here is that the key used to save each `City` instance is a unique integer which has no specific meaning in itself, we’ll see why in a moment.

cities = {
    1:City('Windhoek', 322500),
    2:City('Pretoria', 525387),
    3:City('Nairobi', 3138295),
    4:City('Maputo', 1244227),
    5:City('Jakarta', 10187595),
    6:City('Canberra', 358222),
    7:City('Wellington', 393400),
    8:City('Santiago', 5428590),
    9:City('Buenos Aires', 2891082),
}
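
The integer keys don’t have to be typed in by hand. As a small sketch, `enumerate` can hand them out for you. To keep the snippet runnable without ZODB installed I’ve used a hypothetical `CityRecord` namedtuple in place of the persistent `City` class.

```python
from collections import namedtuple

# Hypothetical stand-in for the persistent City class
CityRecord = namedtuple('CityRecord', ['name', 'population'])

records = [
    CityRecord('Windhoek', 322500),
    CityRecord('Pretoria', 525387),
    CityRecord('Nairobi', 3138295),
]

# enumerate(..., start=1) pairs each record with a unique integer,
# and dict() turns those pairs into the same shape as `cities` above
cities = dict(enumerate(records, start=1))

for docid in sorted(cities):
    print(docid, cities[docid].name)
```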

Save `City` objects to ZODB and index them

Now at last we’re going to do what we’ve come for.

We iterate over our set of `City` instances and for each one we make use of the `index_doc` method of repoze.catalog. Notice that the two arguments are the integer we’ve arbitrarily associated with each `City` instance, ‘docid’ in this example, and the `City` instance itself, ‘doc’ in this example. By using the `index_doc` method we update the catalog entries maintained by repoze.catalog.

In the same iteration we assign the `City` object instance, ‘doc’, to our `OOBTree` (stored under the ‘cities’ key of `dbroot`) using as a key the same integer we’ve just passed to the `index_doc` call.

Finally we make use of the ZODB transaction manager to commit our changes. Because repoze.catalog is actually a ZODB database inside, our single transaction is sufficient to commit both the catalog update and the actual update of the ZODB database.

for docid, doc in cities.items():
    catalog.index_doc(docid, doc)
    myzodbinstance.dbroot['cities'][docid] = doc
    transaction.commit()

Credit where credit’s due

The example I’ve shown here owes some parts to one of the examples on the repoze.catalog website. The structure of `MyZODB` was taken from the article cited above, ‘Example Driven ZODB’. Lastly I got some useful advice in response to a question I posed on StackOverflow and I’m grateful to the people who provided answers.

In Closing

That’s all there is to it ! In many small scale instances there’s no need to do anything other than use ZODB as it comes and not worry about indexing: machines are fast and many applications deal with relatively small data sets. However, if you do need it, repoze.catalog (or one of the other, similar, cataloguing tools) is a useful way to squeeze more speed out of ZODB.

This has been a very long blog post by my standards so I’m going to show how to access the data indexed under repoze.catalog (and prove that it all actually works !) in a blog post next week.

Droopy : Very simple HTTP file uploads

Summary

Droopy is a mini web server which makes allowing file uploads very easy

I just want to upload this file !

I’ve been writing a system which shares processing tasks across two machines.

Part of this involved shipping an image file from one machine to the other, doing some stuff to the file and then bringing the file back again.

I was looking around for easy ways to move the files and I found droopy .

To quote the author, Pierre Duquesne, “Droopy is a mini Web server whose sole purpose is to let others upload files to your computer”. It’s a single Python script so as long as you’ve got Python installed starting the server is as simple as this

python droopy --message "Upload the bb images here" --picture 0.jpg --dl 8080

And you’re ready to upload files immediately

Image in screendump courtesy of paloetic via flickr

Because of the `--dl` argument used to launch droopy in my example above you also have the option to download files from the same directory you’re uploading to.

How I used it

Uploading Files using Requests and Droopy

My solution was written in python so uploading files to the droopy server was very easy using the excellent requests library

import requests

def uploadfile(filepath, uploadurl, fileformelementname="upfile"):
    '''
    This will invoke an upload to the webserver
    on the VM
    '''
    # Open in binary mode and make sure the file handle is
    # closed once the POST has completed
    with open(filepath, 'rb') as fh:
        files = {fileformelementname: fh}
        r = requests.post(uploadurl, files=files)
    return r.status_code

uploadStatus = uploadfile(currentFile.fullpath, UPLOADURL, "upfile")

Download Files using urllib and Droopy

I can’t now remember why but I decided to do the download using urllib instead of Requests

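import os
import urllib
from urlparse import urljoin

def downloadfile(filename, dloadurl, outputdirectory):
    '''
    Pull the converted file off the droopy server
    '''
    # dloadurl should end with '/' so that urljoin appends the
    # filename rather than replacing the last path segment
    fullurl = urljoin(dloadurl, filename)
    fulloutputpath = os.path.join(outputdirectory, 'divided', filename)

    urllib.urlretrieve(fullurl, fulloutputpath)

downloadfile(currentFile.outputName, DOWNLOADURL, IMGDIR)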

Summary

Droopy provides a very useful, very simple web server for both uploading and downloading files. Combined with Python it makes a very useful facility for moving files around under programmatic control.

Versions

All of the above was done on Python 2.7.x.
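
For anyone on Python 3, where `urljoin` and `urlretrieve` have moved into `urllib.parse` and `urllib.request`, the download function might be sketched like this. I haven’t run this against a droopy server: the `build_download_paths` helper, the `http://localhost:8080/` URL and the `/tmp/images` directory are all made up for illustration, and only the path building is exercised at the bottom.

```python
import os
from urllib.parse import urljoin
from urllib.request import urlretrieve


def build_download_paths(filename, dloadurl, outputdirectory):
    # Hypothetical helper: work out where to fetch from and save to
    fullurl = urljoin(dloadurl, filename)
    fulloutputpath = os.path.join(outputdirectory, 'divided', filename)
    return fullurl, fulloutputpath


def downloadfile(filename, dloadurl, outputdirectory):
    '''Pull the converted file off the server (Python 3 version)'''
    fullurl, fulloutputpath = build_download_paths(
        filename, dloadurl, outputdirectory)
    urlretrieve(fullurl, fulloutputpath)
    return fulloutputpath


# Only the URL and path building is tried here, no network access
url, path = build_download_paths('0.jpg', 'http://localhost:8080/',
                                 '/tmp/images')
print(url)
print(path)
```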