repoze.catalog and ZODB beginners example – part 2

repoze.catalog and ZODB beginners example – part 2

Summary

The second of two posts which illustrate how to use repoze.catalog alongside ZODB. The first post can be seen at: “repoze.catalog and ZODB beginners example – part 1” .

Where we’re up to

In the first post I explained how you can have objects stored within a ZODB database indexed by repoze.catalog and why that was sometimes a good idea. In this post I’m going to demonstrate searching for the previously stored objects using repoze.catalog’s search facilities. If you haven’t read the first post I suggest you read that now because what follows assumes you have.

Finding ZODB objects with repoze.catalog

As discussed in the first post repoze.catalog allows you to index arbitrary properties of the objects you save into a ZODB database and then do complex searches on those properties to extract only the objects you’re interested in.

The example I’m showing here demonstrates how we can search through those objects we added in the example of the last post using a number of criteria.

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to find objects
being stored in ZODB. This example has the catalog and ZODB
within the same repository
'''
from myzodb import MyZODB
from persistent import Persistent

from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager
from repoze.catalog.query import InRange, Lt

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

def print_all_city_instances(myzodbinst):
    '''
    Pull everything keyed under 'cities' out of the
    ZODB instance (without any regard to the
    repoze.catalog cataloguing and print them
    '''
    print ""
    print "About to dump all City Instances:"
    for acity in myzodbinst.dbroot['cities'].itervalues():
        print acity
    print ""

def print_city_query_results(myzodbinst, res):
    '''
    Use the list of integers returned by a
    repoze.catalog query to pull elements
    keyed underneath 'cities' in the ZODB
    instance which we are using repoze.catalog
    to catalogue
    '''
    print ""
    print "Objects stored in ZODB corresponding"
    print "to the repoze.catalog resultset:"
    for idx in res:
        print myzodbinst.dbroot['cities'][idx]
    print ""

if __name__ == '__main__':
    #Setup access to the repoze.catalog instance
    factory = FileStorageCatalogFactory('../data/mdcatalog.db',
                                        'mycatalog')
    manager = ConnectionManager()
    catalog = factory(manager)
    #Setup access to the ZODB instance containing data
    #catalogued by the repoze.catalog instance
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    #Demonstrate we really have all the Cities
    print_all_city_instances(myzodbinstance)
    #Demonstrate use of `Lt` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'less than' value on the `population` index"
    print "Populations less than 1,000,000"
    numdocs, results = catalog.query(Lt('populations', 1000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

    #Demonstrate use of `InRange` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'InRange' values on the `population` index"
    print "Populations between 1,000,000 and 4,000,000"
    numdocs, results = catalog.query(InRange('populations',
                                              1000000, 4000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

Example Step by Step

Here’s a breakdown on what’s happening in the above example

Initialize repoze.catalog

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')
manager = ConnectionManager()
catalog = factory(manager)

Here we connect to our repoze.catalog repository and instantiate a `catalog` object

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database instance and provides : `storage`; `db`;`connection`; and `dbroot` properties to help the programmer interact with the ZODB database, connection, storage objects. `MyZODB` also provides a close method to cleanly close the ZODB database, connection and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()</pre>

Dump contents of ZODB without using repoze.catalog

The first data access we do in the above example is just a simple dump of every object, held under the key ‘cities’, in our ZODB database. Notice we are not using repoze.catalog at all at this point. By viewing this data we can be sure that the subsequent queries using repoze.catalog do what we think they do.

So we call the function `print_all_city_instances`

print_all_city_instances(myzodbinstance)

which iterates over the ‘cities’ element of the `dbroot` property of the ZODB `connection` to allow us to see everything that’s in the ZODB database.

for acity in myzodbinst.dbroot['cities'].itervalues():
    print acity

Our output looks like this :

About to dump all City Instances:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Jakarta  (Pop: 10187595)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)
Santiago  (Pop: 5428590)
Buenos Aires  (Pop: 2891082)

Demonstrating the `Lt` function of repoze.catalog

The next thing that happens in the sample is to make use of the `Lt` function offered by repoze.catalog

numdocs, results = catalog.query(Lt('populations', 1000000))

In the previous post when we initialized our repoze.catalog we created a `populations` index which was associated with the `population` property of our `City` class (take a look at the previous post if you’ve forgotten the details).

Our use of the `Lt` method asks repoze.catalog to find all `City` instances stored in our ZODB database with a population of less than 1,000,000. As you can see we get two objects returned which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is a list of integers which are keys used when storing into ZODB those objects which satisfy the search criteria.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)

It’s worth mentioning that whilst there are many comporator methods offered by repoze.catalog.query not all of them are applicable to all index types. In this example of the `Lt` method we are searching on an index, ‘populations’ of type CatalogTextIndex which does offer the `Lt` method but not all do.

Demonstrating the `InRange` function of repoze.catalog

Finally in the sample we show off the `InRange` function offered by repoze.catalog

 numdocs, results = catalog.query(InRange('populations',
                                          1000000, 4000000))

As with the previous example we utilise the previously created catalog index ‘populations’ to find instances of `City` – in this case those instances that have their `population` property set to a value between 1,000,000 and 4,000,000.

We do this by using the  `InRange` method offered by repoze.catalog. As with the `Lt` example above we get two objects returned which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is a list of integers which are keys used when storing into ZODB those objects which satisfy the search critiera.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Buenos Aires  (Pop: 2891082)

Credit where credits due

As with part one of this two part post the example I’ve shown here owes some parts to one of the examples on the repoze.catalog website and the structure of the `myZODB` was taken from the article cited above,  ‘Example Driven ZODB‘ .

repoze.catalog and ZODB beginners example – part 1

repoze.catalog and ZODB beginners example – part 1

Summary

The first of two posts which illustrate how to use repoze.catalog alongside ZODB

What’s ZODB ?

To quote the ZODB home page :

“The ZODB is a native object database, that stores your objects while allowing you to work with any paradigms that can be expressed in Python. Thereby your code becomes simpler, more robust and easier to understand”

What’s repoze.catalog ?

repoze.catalog is one of a number of frameworks which can be used to supply indexing for ZODB for those circumstances where accessing objects stored in ZODB would otherwise be unacceptably slow

Intended Audience

I’m assuming that readers of this post have a basic familiarity with ZODB. If you don’t there are lots of good resources out there of which ‘Example Driven ZODB‘ is a good example.

What’s the purpose of this post ?

For any reasonably experienced Python programmer using ZODB and repoze.catalog is pretty straightforward. Unfortunately a new user of repoze.catalog cannot find an example on the repoze.catalog site which shows both how to catalog items and save them into ZODB. This is understandable as repoze.catalog is not only for use with ZODB but I thought it was worthwhile doing a specific example for that scenario and that’s what on this page.

Where repoze.catalog helps ZODB

Because of the nature of ZODB it’s easy to access objects by the value they’re keyed on but otherwise it’s a question of a sequential search.

So instances of a class that look like this :

class Country(Persistent):
    def __init__(self, pop):
        self.name = name
        self.population = pop

Might be saved into the `root` property of a ZODB `connection` object like this :

root['un']['nz'] = Country('New Zealand', 4000000)

But subsequently if we wanted to obtain `Country` instances on the basis of population the key doesn’t help us at all and a scan of all `Country` objects would be necessary, like this :

for cou in root['un'].itervalues():
    if cou.population > 1000000:
        print cou.name

When we use repoze.catalog to catalogue a ZODB database we specify properties of objects that will be saved in ZODB and which interest us and by which will subsequently want to find the objects.

repoze.catalog allows us quickly and easily search for objects with property values that interest us.

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to catalogue objects
being stored in ZODB. This example has the catalog and ZODB
database as seperate repositories
'''
from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager

from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.text import CatalogTextIndex

import transaction
from persistent import Persistent
from BTrees.OOBTree import OOBTree

from myzodb import MyZODB

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')

_initialized = False

def initialize_catalog():
    '''
    Create a repoze.catalog instance and specify
    indices of intereset

    NB: Use of global variable
    '''
    global _initialized
    if not _initialized:
        # create a catalog
        manager = ConnectionManager()
        catalog = factory(manager)
        # set up indexes
        catalog['names'] = CatalogTextIndex('name')
        catalog['populations'] = CatalogFieldIndex('population')
        # commit the indexes
        manager.commit()
        manager.close()
        _initialized = True

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

if __name__ == '__main__':
    initialize_catalog()
    manager = ConnectionManager()
    catalog = factory(manager)
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    myzodbinstance.dbroot['cities'] = OOBTree()

    #For ease of demonstration set up a local dict
    #containing a number of `City` instances keyed
    #by a unique integer
    cities = {
        1:City('Windhoek', 322500),
        2:City('Pretoria', 525387),
        3:City('Nairobi', 3138295),
        4:City('Maputo', 1244227),
        5:City('Jakarta', 10187595),
        6:City('Canberra', 358222),
        7:City('Wellington', 393400),
        8:City('Santiago', 5428590),
        9:City('Buenos Aires', 2891082),
    }
    #Iterate over our local dict and for each
    #element generate the catlog entry for
    #repoze.catalog and add the corresponding
    #instance to the ZODB database we are
    #cataloguing
    for docid, doc in cities.items():
        catalog.index_doc(docid, doc)
        myzodbinstance.dbroot['cities'][docid] = doc
        transaction.commit()

    myzodbinstance.close()

Example Step by Step

Here’s a breakdown on what’s happening in the above example

Initialize repoze.catalog

initialize_catalog()
manager = ConnectionManager()
catalog = factory(manager)

The `initialize_catalog` function creates a repoze.catalog instance and initializes two indices : `names` and `populations`. These index the `name` and `population` properties of any objects indexed with the repoze.catalog instance just created

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database instance and provides : `storage`; `db`;`connection`; and `dbroot` properties to help the programmer interact with the ZODB database, connection, storage objects. `MyZODB` also provides a close method to cleanly close the ZODB database, connection and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()</pre>

Create a sub-tree in ZODB for our `City` objects

Now we have an instance of `MyZODB` we can treat the `dbroot` property (which corresponds to the ZODB `dbroot` property of the ZODB `connection` object) as a plain old dictionary and assign a value to it under some key of our choosing, for our example because we’re going to save a set of `City` objects we’ve chosen ‘cities’.

At this stage we just assign an instance of `OOBTree` to that ‘cities’ key. An OOBTree instance acts like a dictionary but, when a lot of elements are within it, works much more efficiently for the purposes of ZODB.

myzodbinstance.dbroot['cities'] = OOBTree()

Create a set of `City` objects

Now we pause for a moment and make ourselves a set of `City` objects and put them into a dictionary for later use.

What’s significant here is that the key used to save each `City` instance is a unique integer which has no specific meaning in itself, we’ll see why in a moment.

cities = {
1:City('Windhoek', 322500),
2:City('Pretoria', 525387),
3:City('Nairobi', 3138295),
4:City('Maputo', 1244227),
5:City('Jakarta', 10187595),
6:City('Canberra', 358222),
7:City('Wellington', 393400),
8:City('Santiago', 5428590),
9:City('Buenos Aires', 2891082),
}

Save `City` objects to ZODB and index them

Now at last we’re going to do what we’ve come for.

We iterate over our set of `City` instances and for each one we make use of the `index_doc` method of repoze.catalog . Notice that the two arguments are the integer we’ve arbitarily associated with each `City` instance, ‘docid’ in this example, and the `City` instance itself, ‘doc’ in this example. By using the `index_doc` method we update the catalog entries maintained by repoze.catalog

In the same interation we assign the `City` object instance, ‘doc’ to our `OOBTree` (stored under the ‘cities’ key of `dbroot`) using as an index the same integer we’ve just passed to the `index_doc` call.

Finally we make use of the ZODB Transaction manager to commit our changes. Because repoze.catalog is actually a ZODB database inside our single transaction is sufficient to commit both the catalog update and the actual update of the ZODB database.

for docid, doc in cities.items():
    catalog.index_doc(docid, doc)
    myzodbinstance.dbroot['cities'][docid] = doc
    transaction.commit()

Credit where credits due

The example I’ve shown here owes some parts to one of the examples on the repoze.catalog website. The structure of the `myZODB` was taken from the article cited above,  ‘Example Driven ZODB‘ . Lastly I got some useful advice in response to a question I posed on StackOverflow and I’m grateful to the people who provided answers .

In Closing

That’s all there is to it ! In many small scale instances there’s no need to do anything other than use ZODB as it comes and not worry about indexing – machines are fast and many applications deal with relatively small data sets however if you do need it repoze.catalog (or one of the other, similar, cataloguing tools) is a useful way to squeeze more speed out of ZODB.

This has been a very long blog post by my standards so I’m going to show how to access the data indexed under repoze.catalog (and prove that it all actually works !) in a blog post next week.

Droopy : Very simple HTTP file uploads

Droopy : Very simple HTTP file uploads

Summary

Droopy is a mini web server which makes allowing file uploads very easy

I just want to upload this file !

I’ve been writing a system which shares processing tasks across two machines.

Part of this involved shipping an image file from one machine to the other; doing some stuff to the file and then; bringing the file back again.

I was looking around for easy ways to move the files and I found droopy .

To quote the author, Pierre  Duqeusne, “Droopy is a mini Web server whose sole purpose is to let others upload files to your computer”. It’s a single python script so as long you’ve got Python installed starting the server is as simple as this

python droopy --message "Upload the bb images here" --picture 0.jpg --dl 8080

And you’re ready to upload files immediately

Image in screendump courtesy of paloetic via flickr

Because of the `–dl` argument used to launch droopy in my example above you also have the option to download files from the same director you’re uploading to.

How I used it

Uploading Files using Requests and Droopy

My solution was written in python so uploading files to the droopy server was very easy using the excellent requests library

def uploadfile(filepath, uploadurl, fileformelementname="upfile"):
    '''
    This will invoke an upload to the webserver
    on the VM
    '''

    files = {fileformelementname : open(filepath,'rb')}
    r = requests.post(uploadurl, files=files)
    return r.status_code

uploadStatus = uploadfile(currentFile.fullpath, UPLOADURL, "upfile")

Download Files using urllib and Droopy

I can’t now remember why but I decided to do the download using urllib instead of Requests

def downloadfile(filename, dloadurl, outputdirectory):
    '''
    Pull the converted file off the droopy server
    '''

    fullurl = urljoin(dloadurl, filename)
    fulloutputpath = os.path.join(outputdirectory, 'divided', filename)

    urllib.urlretrieve(fullurl, fulloutputpath)

downloadfile(currentFile.outputName, DOWNLOADURL, IMGDIR)

Summary

Droopy provides a very useful, very simple web server for both uploading and downloading files. Combined with Python it makes a very useful facility for moving files around under programmtic control.

Versions

All of the above was done on Python 2.7.x.

Using Lillypond to show the ‘sim’ notation in a score

How can you include ‘sim’ in your score ?

Summary

How to define the ‘sim’ notation when using Lilypond to engrave a score. This is a little off topic for my blog but I found it difficult to find this information myself so I’m hoping others will find it here.

Lilypond and Frescobaldi

Lilypond is a file format allowing you to define a musical score using simple text notation; Lilypond is also the program which takes that text notation and outputs a score in, for instance, Acrobat PDF format. It’s a pretty amazing technology.

In using Lilypond I’ve found it very useful to use the Frescobaldi sheet music text editor which provides support for writing files in the Lilypond format.

What I’ve done

My music theory knowledge isn’t great but I’ve really enjoyed using Lilypond and Frescobaldi to engrave a score for a relative of mine. It’s just an amazing technology and the output is beautiful.

It took a little time to get to grips with some of the ideas although the Lilypond manuals are very good. I did however find one little thing I couldn’t work out how to do and that is the subject of this, rather top heavy post.

‘sim’ notation on a score

Sometimes in scores you want the text ‘sim.’ to appear as shown below in the screen dump.

I looked all over the documentation and there just didn’t seem to be a way of including that. However I visited the Lilypond IRC channel (#lilypond@irc.freenode.net) which you can visit directly by going to the support page and paging down a little.

The good folk on there were able to point me in the right direction. By appending

_\markup{\large \italic "sim."}

to the note I wished ‘sim’ to appear under the right thing was done.

Just to make that a little clearer here are a few bars which include the annotation to create the ‘sim’

As you might imagine the

\large

only controls how large the ‘sim’ appear on the score and I fiddled around with this a little until I found an adjective which produced text that matched the rest of the score. There are a number of options for altering the size of text displayed as can be seen in the Selecting font and font size part of the Lilypond manual.

Crazy little thing called : SimpleHTTPServer

Crazy little thing called : SimpleHTTPServer

Summary

Python brings you a one line web-server !

Simplest webserver ever !

I was logged onto a tiny VPS I use sometimes and thinking it would be useful to have a web-server to help move a few files off it. The thought of installing a ‘real’ web-server seemed a bit much for what would five minutes of use so I started thinking about Python (of course !).

It turns out that you can fire up a (completely unsecured and rather wild-west) web server like this :

python -m SimpleHTTPServer

When you type that command the current directory becomes the root of a web-resource, accessible via port 8000 – rather than the default port 80 we usually use for HTTP, and any files, and sub-directories, in the current directory can be accessed like this

http://mylittlevps.kubadev.com:8000/thefileIwant.txt

and, for stuff in sub-directories, like this

http://mylittlevps.kubadev.com:8000/foo/bar/anotherfileIwant.txt

Don’t say you weren’t warned

There are of course all sort of circumstances where doing this would be a REALLY BAD IDEA ™ but all the same it’s nice to know it’s there if you need it !

optparse: simulating ‘–help’

optparse: simulating ‘–help’

Summary

Getting optparse to show a summary of script options under program control.

What do you use optparse for ?

optparse is a Python standard library for parsing command-line options.

optparse provides a relatively simple way of

* defining a set of command line arguments which the user may enter
* parsing the command line entered and storing the results of the parse

I need help !

By default optparse will respond to a either of :

<yourscript> -h

or

<yourscript> --help

by printing a summary of your scripts options.

What I learned today

In the script I was working on today I wanted to respond to the user not entering any argument at all by printing a summary of all options available (in other words as if they user had entered ‘–help’ or ‘-h’ as an argument).

There is a way you can do this but for some reason it’s not in the list of OptionParser methods shown in the documentation.

print_help()

If you call the method ‘print_help’ as shown in the code below optparse will respond by by printing the same text that would be shown if the user were to enter an argument of ‘–help’.

parser = OptionParser(description=desc, usage=usage)
parser.add_option(  "-i", "--inbox", action="store",  dest="inbox",
metavar="INBOX", help="Location of INBOX")
parser.add_option(  "-o", "--outpath", action="store", dest="outpath",
metavar="PATH", help="PATH to output csv file")
parser.add_option(  "-v", "--verbose", action="store_true",
dest="verbose", help="Show each file processed")

(options, args) = parser.parse_args()

if (options.inbox is None) and (options.outpath is None):
  parser.print_help()
  exit(-1)
elif not os.path.exists(options.inbox):
  parser.error('inbox location does not exist')
elif not os.path.exists(os.path.dirname(options.outpath)):
  parser.error('path to ouput location does not exist')

return options

Using Paramiko to control an EC2 instance

An example of using Paramiko to issue commands to an EC2 instance

Summary

An example of using the Python library Paramiko to ‘remote control’ an EC2 instance .

“Do that, Do this”

Recently I’ve been looking into the use of the Bellatrix library to start, control and stop Amazon EC2 instances (my posts about that are here and here).

Spinning off the side of that I’ve taken a look at the Paramiko module which “implements the SSH2 protocol for secure (encrypted and authenticated) connections to remote machines”.

There’s a good article on beginning to use Paramiko “SSH Programming with Paramiko” by Jesse Noller which I found very helpful but there’s enough stuff I had to change to deal with using EC2 and the controlling Python script running on Windows that I thought it would be worth recording my sample script.

Installing Paramiko

So first off I’d seen the comments about Paramiko maybe needing a special compilation step for installation to Windows but I’m pleased to say that’s not true, I downloaded 1.7.7.1 to my Windows Vista machine, did a quick…

python setup.py install

… and it all went very smoothly, just to be sure I tried out an import …

>>> import paramiko

… no problem.

Starting the EC2 instance

I now needed a server to talk to so I used Bellatrix to spin up a micro instance of Ubuntu like this :

python "C:\bin\installed\Python2.6\Scripts\bellatrix" start --security_groups mySecGrp ami-3e9b4957 mykeypair

The arguments you can see here are :

  • “mySecGrp” is a Security Group I’ve previously setup via the AWS Management Console;
  • ‘ami-3e9b4957’ is the AMI of Ubuntu 10.04 (Lucid Lynx); and
  •  ‘mykeypair’ is the name of a Key Pair that, again, I’ve previously setup via the AWS Management Console.

When you run that command you get an output that looks like this :

C:\Users\Richard Shea>python "C:\bin\installed\Python2.6\Scripts\bellatrix" start --security_groups mySecGrp ami-3e9b4957 mykeypair
2012-04-19 21:27:03,135 INFO starting EC2 instance...
2012-04-19 21:27:03,180 INFO ami:ami-3e9b4957 type:t1.micro key_name:mykeypair security_groups:mySecGrp new size:None
2012-04-19 21:27:07,657 INFO starting image: ami-3e9b4957 key mykeypair type t1.micro shutdown_behavior terminate new size None
2012-04-19 21:27:08,555 INFO we got 1 instance (should be only one).
2012-04-19 21:27:08,556 INFO tagging instance:i-f1234567 key:Name value:Bellatrix started me
2012-04-19 21:27:12,361 INFO instance:i-f1234567 was successfully tagged with: key:Name value:Bellatrix started me
2012-04-19 21:27:12,361 INFO getting the dns name for instance: i-f1234567 time out is: 300 seconds...
2012-04-19 21:27:34,173 INFO DNS name for i-f1234567 is ec2-10-20-30-40.compute-1.amazonaws.com
2012-04-19 21:27:34,173 INFO waiting until instance: i-f1234567 is ready. Time out is: 300 seconds...
2012-04-19 21:27:34,174 INFO Instance i-f1234567 is running

And the key thing here is that we now now have access to the host name of the EC2 instance we’ve just spun up:

ec2-10-20-30-40.compute-1.amazonaws.com

Talking to the EC2 instance

We’re now ready to send commands to our new instance. Making use of some of Jesses code I was able to write :

import paramiko
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('ec2-107-22-80-32.compute-1.amazonaws.com',
            username='ubuntu',
            key_filename='''mykeypair-ssh2-rsa.openssh''')
stdin, stdout, stderr = ssh.exec_command("uptime;ls -l;touch mickymouse;ls -l;uptime")
stdin.flush()
data = stdout.read().splitlines()
for line in data:
    print line
ssh.close()

Anyone who’s got this far can probably see what’s happening here, but just to be sure :

  1. having instantiated an instance of paramiko.SSHClient we’re able to use our private key file and the address of our EC2 server to start an SSH session.
  2. We then use the exec_command method to submit a string of commands and get back three references to files corresponding to : standard input, standard output and standard error.
  3. By reading through the standard output file we can print locally the output from the commands executed on the EC2 instance.

The Key Thing

As you can see to identify ourselves to the remote server we’re doing a key exchange. Our private key is ‘mykeypair-ssh2-rsa.openssh’. A point worth mentioning here is that generally I logon to EC2 instances using the excellent PuTTY . The private key files used by default by PuTTY are not in the same format as the ones required by Paramiko so as a result when I first tried this I found Paramiko fell over complaining my ‘key_filename’ argument was ‘not a valid dsa private key file’.

PuttyGen to the rescue

Well the great thing is that PuTTY actually comes with a tool PuttyGen which will import a standard PuTTY key file (foo.ppk) – you need to do ‘Conversions’ | ‘Import Key’ and then ‘Conversions’ | ‘Export OpenSSH Key’

Ubuntu Specific

Bear in mind that the way the SSHClient connect method is used above is suitable for an Ubuntu instance as it is by default however you can’t rely on all *nix instance working just that way.

Seeing the Output

Just to close out I’ll show you the output

12:39:45 up  3:12,  0 users,  load average: 0.08, 0.02, 0.01
total 0
total 0
-rw-r--r-- 1 ubuntu ubuntu 0 2012-04-19 12:39 mickymouse
12:39:45 up  3:12,  0 users,  load average: 0.08, 0.02, 0.01

Powerful Stuff

The combination of Bellatrix allowing you to spin up EC2 instances with a single command and Paramiko allowing you send arbitary commands to those servers is powerful stuff and I’m impressed at the work done by their respective developers. Of course Bellatrix can do ‘for free’ what I’ve used Paramiko to do here are part of it’s Provisioning commands but it was an interesting exercise for me to do my own version of that.

Bellatrix Provisioning Commands

A quick guide to built-in Bellatrix provisioning commands

Summary

A table of Bellatrix provisioning commands as at 1.1.2

Bellatrix Provisioning

As mentioned in my previous post Bellatrix allows you to start and provision Amazon Web Services EC2 instances.In this context ‘provision’ refers to the process of installing software, creating directories, changing file permissions etc.

The built in commands are documented here in the official doco but before I found that I’d ripped the cmds.py file apart and produced the following table. It’s a little more succinct than the official version so I think it’s worth publishing here.

Version Alert

For obvious reasons this type of thing is prone to Bellatrix changing. This page is valid as of the 1.1.2 version of Bellatrix.

Provisioning Commands

Command Description
apt_get_install Return the “sudo apt-get install” command
apt_get_update Executes apt-get update.
chmod Applies the chmod command
createSoftLink Creates a new soft link
copy Copy a file using -f so it doesn’t fail if the destination exists
create_django_project Creates a Django project
createVirtualEnv Generate a new Python virtual environment using virtualenv
executeInVirtualEnv Execute a command within a virtualenv environmen
flatCommands Given a list of commands it will return a single one concatenated by ‘&&’ so they will be executed in sequence until any of them fail
install_pip Install pip using apt-get install
install_nginx Install Nginx in Ubuntu using the repo they provide as described in http://wiki.nginx.org/Install#Ubuntu_PPA
installPackageInVirtualEnv Install a Python package into a virtualenv
mkdir Created a new directory. “-p” flag is used so the command generates the same result regardless whether the directory exists or not.
pip_install Install a Python package using pip
sudo Execute a list of commands using sudo
wget Downloads a web resource using wget

Bellatrix + Windows Users – Two things to remember

Bellatrix – Windows Idiosyncracies

Summary

Two things you might find useful if you’re starting to use Bellatrix on Windows.

What’s Bellatrix Again ?

Bellatrix allows you browse and control Amazon EC2 instances using simple commands on your local machine.

For instance :

bellatrix start ami key_name

run on your machine will launch an EC2 instance within Amazon Web Services based on the named ami (Amazon Machine Image) and allow you to contact it using the key_name you have specified.

You’ve been able to do something like this with the Python package Boto for some time but Bellatrix provides a lot of ready to use commands rather than having to write your own scripts.

Bellatrix on Windows Then ?

Location of Home

I’m a bit embarrassed to write this but in the Bellatrix doco which describes what config files you need and where to place them it refers to

<your_home>

I’m familiar with the idea of ‘Home’ on a unix box but in a Windows enviroment ? I wasn’t really sure what it meant (I’ve only been using Windows for 15 years so you can see where the confusion crept in). I managed to persuade myself it was referring to the Bellatrix directory within the Python site-packages directory !

Well here’s the news for the Windows users out there who are as ‘home-challenged’ as I am. It turns out you should putting those config files in a directory located at :

%HOMEPATH%\.bellatrix

which for me is

C:\Users\Richard Shea\.bellatrix

Yes but what about the dots ?

Which brings me onto another windows specific thing – how do you produce a directory with a dot as its first character ? I started off trying to do it via File Explorer and discovered something strange. File Explorer won’t let you specify a directory name that starts with a dot … unless you specify the directory name as having a trailing dot as well … if you do that File Explorer quietly throws away the trailing dot and you have the name you wanted in the first place ! Weird … or perhaps “only on Windows”

In closing

Anyway the good news is I’ve got it up and running and listing my instances ! Next step is to to use Bellatrix to start an EC2 instance and provision it as a Nagare environment. There’ll be a blog post about that by the time I’ve done, I’m sure !

Datejs – Date.isAfter – no such function ?

date.js – missing methods

Summary

Save yourself a lot of pain and ensure you’re using the right version of  Datejs.

Datejs – It’s great !

Let me start by saying that Datejs, an open-source JavaScript Date Library, is great ! It provides a lot of much needed functionality to the area of dates in Javascript.

Weirdly missing methods

Having said that what I want to highlight is that if you use the standard download links offered by the website (as shown below) you’ll get a version of Datejs which is not the most current one.

What’s more it seems that the version you will get contains a number of defects.

I can vouch for this as I’ve spent a couple of hours today wondering why the ‘.isAfter’ and the ‘.compare’ methods didn’t seem to exist in the form they are documented ! Very frustrating !

What You Should Do

Instead of using the download link on the first page to get the date.js file go here instead : http://www.datejs.com/build/date.js. The version number is the same, “1.0 Alpha-1”, but the build date is “2008-05-13”.

At least at the time of writing – it’s the “2008-05-13” build you want.

Credit Where Credits Due

I’m grateful to Ben McIntyre whose post on the Datejs forums alerted me to this problem, I only wish I’d seen it earlier !