repoze.catalog and ZODB beginners example – part 2

Summary

The second of two posts illustrating how to use repoze.catalog alongside ZODB. The first post is “repoze.catalog and ZODB beginners example – part 1”.

Where we’re up to

In the first post I explained how you can have objects stored within a ZODB database indexed by repoze.catalog and why that was sometimes a good idea. In this post I’m going to demonstrate searching for the previously stored objects using repoze.catalog’s search facilities. If you haven’t read the first post I suggest you read that now because what follows assumes you have.

Finding ZODB objects with repoze.catalog

As discussed in the first post repoze.catalog allows you to index arbitrary properties of the objects you save into a ZODB database and then do complex searches on those properties to extract only the objects you’re interested in.

The example I’m showing here demonstrates how we can search through those objects we added in the example of the last post using a number of criteria.

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to find objects
being stored in ZODB. This example has the catalog and ZODB
within the same repository
'''
from myzodb import MyZODB
from persistent import Persistent

from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager
from repoze.catalog.query import InRange, Lt

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % (self.name, self.population)

def print_all_city_instances(myzodbinst):
    '''
    Pull everything keyed under 'cities' out of the
    ZODB instance (without any regard to the
    repoze.catalog cataloguing) and print it
    '''
    print ""
    print "About to dump all City Instances:"
    for acity in myzodbinst.dbroot['cities'].itervalues():
        print acity
    print ""

def print_city_query_results(myzodbinst, res):
    '''
    Use the list of integers returned by a
    repoze.catalog query to pull elements
    keyed underneath 'cities' in the ZODB
    instance which we are using repoze.catalog
    to catalogue
    '''
    print ""
    print "Objects stored in ZODB corresponding"
    print "to the repoze.catalog resultset:"
    for idx in res:
        print myzodbinst.dbroot['cities'][idx]
    print ""

if __name__ == '__main__':
    #Setup access to the repoze.catalog instance
    factory = FileStorageCatalogFactory('../data/mdcatalog.db',
                                        'mycatalog')
    manager = ConnectionManager()
    catalog = factory(manager)
    #Setup access to the ZODB instance containing data
    #catalogued by the repoze.catalog instance
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    #Demonstrate we really have all the Cities
    print_all_city_instances(myzodbinstance)
    #Demonstrate use of `Lt` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'less than' value on the `population` index"
    print "Populations less than 1,000,000"
    numdocs, results = catalog.query(Lt('populations', 1000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

    #Demonstrate use of `InRange` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'InRange' values on the `population` index"
    print "Populations between 1,000,000 and 4,000,000"
    numdocs, results = catalog.query(InRange('populations',
                                              1000000, 4000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

Example Step by Step

Here’s a breakdown of what’s happening in the above example.

Initialize repoze.catalog

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')
manager = ConnectionManager()
catalog = factory(manager)

Here we connect to our repoze.catalog repository and instantiate a `catalog` object.

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database and provides `storage`, `db`, `connection` and `dbroot` properties to help the programmer interact with the ZODB storage, database, connection and root objects. `MyZODB` also provides a `close` method to cleanly close the ZODB connection, database and storage.

`MyZODB` is not explicitly included in the above example but it looks like this:

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()
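
The example script never calls `close`, which is fine for a short demo but worth doing in longer-lived code. Because `MyZODB` exposes a `close` method, the standard library's `contextlib.closing` can make sure it runs. The sketch below uses a hypothetical stand-in class, `FakeStore`, so that it runs without ZODB installed; `MyZODB('../data/mdzdb.fs')` is assumed to slot in the same way since it offers the same `close` method.

```python
from contextlib import closing


class FakeStore(object):
    """Hypothetical stand-in for MyZODB: anything with a close() method."""

    def __init__(self, path):
        self.path = path
        self.dbroot = {"cities": {}}
        self.closed = False

    def close(self):
        self.closed = True


# closing() calls store.close() on exit, even if the body raises.
with closing(FakeStore("example.fs")) as store:
    store.dbroot["cities"][0] = "Windhoek"
    was_open_inside = not store.closed

print(was_open_inside, store.closed)
```

The `with` block keeps the clean-up next to the code that opened the resource, so an exception part-way through a query cannot leave the storage open.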

Dump contents of ZODB without using repoze.catalog

The first data access we do in the above example is just a simple dump of every object, held under the key ‘cities’, in our ZODB database. Notice we are not using repoze.catalog at all at this point. By viewing this data we can be sure that the subsequent queries using repoze.catalog do what we think they do.

So we call the function `print_all_city_instances`

print_all_city_instances(myzodbinstance)

which iterates over the ‘cities’ element of the `dbroot` property of our `MyZODB` instance to allow us to see everything that’s in the ZODB database.

for acity in myzodbinst.dbroot['cities'].itervalues():
    print acity

Our output looks like this :

About to dump all City Instances:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Jakarta  (Pop: 10187595)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)
Santiago  (Pop: 5428590)
Buenos Aires  (Pop: 2891082)

Demonstrating the `Lt` function of repoze.catalog

The next thing that happens in the sample is to make use of the `Lt` (“less than”) comparator offered by repoze.catalog:

numdocs, results = catalog.query(Lt('populations', 1000000))

In the previous post when we initialized our repoze.catalog we created a `populations` index which was associated with the `population` property of our `City` class (take a look at the previous post if you’ve forgotten the details).

Our use of `Lt` asks repoze.catalog to find all `City` instances stored in our ZODB database with a population of less than 1,000,000. As you can see, `catalog.query` returns two objects, which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is an iterable of integers: the keys under which the objects that satisfy the search criteria were stored in ZODB.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)
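
To make the role of those integer keys concrete, here is a minimal sketch in plain Python that mimics the lookup without ZODB or repoze.catalog. The keys 0 to 8 and the helper `toy_lt` are assumptions made purely for illustration: the real keys are whatever was used when the objects were stored in part 1, and the real index is a repoze.catalog index rather than a dict.

```python
# Toy stand-in for dbroot['cities']: integer key -> (name, population).
# The keys 0..8 are assumed for illustration only.
cities = {
    0: ("Windhoek", 322500),
    1: ("Pretoria", 525387),
    2: ("Nairobi", 3138295),
    3: ("Maputo", 1244227),
    4: ("Jakarta", 10187595),
    5: ("Canberra", 358222),
    6: ("Wellington", 393400),
    7: ("Santiago", 5428590),
    8: ("Buenos Aires", 2891082),
}

# Toy stand-in for the 'populations' index: integer key -> indexed value.
populations_index = {key: pop for key, (name, pop) in cities.items()}


def toy_lt(index, limit):
    """Return (count, sorted keys) for entries whose value is below limit."""
    keys = sorted(key for key, value in index.items() if value < limit)
    return len(keys), keys


numdocs, results = toy_lt(populations_index, 1000000)
print(numdocs, results)

# The keys are only useful because they lead back to the stored objects.
found = [cities[key][0] for key in results]
print(found)
```

The point of the sketch is the division of labour: the index only ever hands back keys, and it is the separate keyed store that turns those keys back into objects.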

It’s worth mentioning that whilst repoze.catalog.query offers many comparators, not all of them are applicable to all index types. In this example we use `Lt` against ‘populations’, an index of type CatalogFieldIndex, which supports `Lt`; some other index types do not.
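
One way to picture this restriction is an index base class that rejects every comparator by default, with each index type overriding only the ones it can answer. The sketch below is a toy model written for this post, not repoze.catalog's source; the class names `ToyIndex`, `ToyOrderedIndex` and `ToyWordIndex` and their method names are hypothetical.

```python
class ToyIndex(object):
    """Base class: every comparator is unsupported unless overridden."""

    def __init__(self):
        self.values = {}  # integer key -> indexed value

    def index_doc(self, key, value):
        self.values[key] = value

    def apply_lt(self, limit):
        raise NotImplementedError(
            "Lt is not supported by %s" % type(self).__name__)

    def apply_contains(self, text):
        raise NotImplementedError(
            "Contains is not supported by %s" % type(self).__name__)


class ToyOrderedIndex(ToyIndex):
    """Holds orderable values, so 'less than' makes sense."""

    def apply_lt(self, limit):
        return sorted(k for k, v in self.values.items() if v < limit)


class ToyWordIndex(ToyIndex):
    """Holds text, so substring matching makes sense but ordering does not."""

    def apply_contains(self, text):
        return sorted(k for k, v in self.values.items() if text in v)


pops = ToyOrderedIndex()
pops.index_doc(0, 322500)
pops.index_doc(1, 3138295)

names = ToyWordIndex()
names.index_doc(0, "Windhoek")
names.index_doc(1, "Nairobi")

lt_result = pops.apply_lt(1000000)
contains_result = names.apply_contains("robi")

# Asking the word index for 'less than' fails loudly rather than guessing.
try:
    names.apply_lt(1000000)
    unsupported_message = None
except NotImplementedError as exc:
    unsupported_message = str(exc)

print(lt_result, contains_result, unsupported_message)
```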

Demonstrating the `InRange` function of repoze.catalog

Finally, in the sample we show off the `InRange` comparator offered by repoze.catalog:

numdocs, results = catalog.query(InRange('populations',
                                         1000000, 4000000))

As with the previous example we utilise the previously created catalog index ‘populations’ to find instances of `City` – in this case those instances that have their `population` property set to a value between 1,000,000 and 4,000,000.

We do this by using the `InRange` comparator offered by repoze.catalog. As with the `Lt` example above, `catalog.query` returns two objects, which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is an iterable of integers: the keys under which the objects that satisfy the search criteria were stored in ZODB.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Buenos Aires  (Pop: 2891082)
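
A range query like this is cheap when an index keeps its values sorted, because both ends of the range can be found by binary search. The sketch below illustrates that idea with Python's `bisect` module. It is not a description of how repoze.catalog is implemented: the helper `toy_in_range` and the keys 0 to 8 are hypothetical, and treating both bounds as inclusive is an assumption.

```python
import bisect

# (population, key) pairs, kept sorted by population.
# Keys 0..8 are assumed for illustration only.
sorted_index = sorted([
    (322500, 0), (525387, 1), (3138295, 2),
    (1244227, 3), (10187595, 4), (358222, 5),
    (393400, 6), (5428590, 7), (2891082, 8),
])
sorted_values = [value for value, key in sorted_index]


def toy_in_range(low, high):
    """Return (count, keys) for low <= value <= high (inclusive bounds)."""
    start = bisect.bisect_left(sorted_values, low)
    stop = bisect.bisect_right(sorted_values, high)
    keys = sorted(key for value, key in sorted_index[start:stop])
    return len(keys), keys


range_numdocs, range_results = toy_in_range(1000000, 4000000)
print(range_numdocs, range_results)
```

With the assumed keys, the three matches correspond to Nairobi, Maputo and Buenos Aires, the same cities as in the output above.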

Credit where credit’s due

As with part one of this two-part post, the example I’ve shown here owes some parts to one of the examples on the repoze.catalog website, and the structure of `MyZODB` was taken from the article ‘Example Driven ZODB’.
