repoze.catalog and ZODB beginners example – part 2

Summary

The second of two posts which illustrate how to use repoze.catalog alongside ZODB. The first post is “repoze.catalog and ZODB beginners example – part 1”.

Where we’re up to

In the first post I explained how you can have objects stored within a ZODB database indexed by repoze.catalog and why that was sometimes a good idea. In this post I’m going to demonstrate searching for the previously stored objects using repoze.catalog’s search facilities. If you haven’t read the first post I suggest you read that now because what follows assumes you have.

Finding ZODB objects with repoze.catalog

As discussed in the first post repoze.catalog allows you to index arbitrary properties of the objects you save into a ZODB database and then do complex searches on those properties to extract only the objects you’re interested in.

The example I’m showing here demonstrates how we can search through those objects we added in the example of the last post using a number of criteria.

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to find objects
being stored in ZODB. This example keeps the catalog and the
ZODB database in separate files
'''
from myzodb import MyZODB
from persistent import Persistent

from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager
from repoze.catalog.query import InRange, Lt

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

def print_all_city_instances(myzodbinst):
    '''
    Pull everything keyed under 'cities' out of the
    ZODB instance (without any regard to the
    repoze.catalog cataloguing) and print it
    '''
    print ""
    print "About to dump all City Instances:"
    for acity in myzodbinst.dbroot['cities'].itervalues():
        print acity
    print ""

def print_city_query_results(myzodbinst, res):
    '''
    Use the list of integers returned by a
    repoze.catalog query to pull elements
    keyed underneath 'cities' in the ZODB
    instance which we are using repoze.catalog
    to catalogue
    '''
    print ""
    print "Objects stored in ZODB corresponding"
    print "to the repoze.catalog resultset:"
    for idx in res:
        print myzodbinst.dbroot['cities'][idx]
    print ""

if __name__ == '__main__':
    #Setup access to the repoze.catalog instance
    factory = FileStorageCatalogFactory('../data/mdcatalog.db',
                                        'mycatalog')
    manager = ConnectionManager()
    catalog = factory(manager)
    #Setup access to the ZODB instance containing data
    #catalogued by the repoze.catalog instance
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    #Demonstrate we really have all the Cities
    print_all_city_instances(myzodbinstance)
    #Demonstrate use of `Lt` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'less than' value on the `population` index"
    print "Populations less than 1,000,000"
    numdocs, results = catalog.query(Lt('populations', 1000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

    #Demonstrate use of `InRange` on the `population` index
    print ""
    print "*" * 60
    print "Looking for 'InRange' values on the `population` index"
    print "Populations between 1,000,000 and 4,000,000"
    numdocs, results = catalog.query(InRange('populations',
                                              1000000, 4000000))
    print "Raw Result: "
    print (numdocs, [ x for x in results ])
    print_city_query_results(myzodbinstance, results)

Example Step by Step

Here’s a breakdown of what’s happening in the above example.

Initialize repoze.catalog

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')
manager = ConnectionManager()
catalog = factory(manager)

Here we connect to our repoze.catalog repository and instantiate a `catalog` object

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database and provides `storage`, `db`, `connection` and `dbroot` properties to help the programmer interact with the ZODB storage, database and connection objects. `MyZODB` also provides a `close` method to cleanly close the ZODB connection, database and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()
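The script in this post never calls `close`, so here is a minimal sketch of one way to make sure the cleanup happens. It relies on `contextlib.closing` from the standard library, which works with any object that has a `close` method. `FakeZODB` and `count_cities` are hypothetical stand-ins of mine so that the snippet runs without a database file, and the snippet is written for Python 3, unlike the Python 2 listings in this post.

```python
from contextlib import closing


class FakeZODB:
    """Hypothetical stand-in for MyZODB: same dbroot/close surface, no real database."""

    def __init__(self, path):
        self.path = path
        self.dbroot = {'cities': {}}
        self.closed = False

    def close(self):
        self.closed = True


def count_cities(zodb_wrapper):
    """Return how many objects are keyed under 'cities'."""
    return len(zodb_wrapper.dbroot['cities'])


wrapper = FakeZODB('../data/mdzdb.fs')
with closing(wrapper) as db:
    # close() runs when this block exits, even if an exception is raised
    number_of_cities = count_cities(db)
```

With the real `MyZODB` the same `with closing(...)` line should apply unchanged, since all it needs is the `close` method shown above.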

Dump contents of ZODB without using repoze.catalog

The first data access we do in the above example is just a simple dump of every object, held under the key ‘cities’, in our ZODB database. Notice we are not using repoze.catalog at all at this point. By viewing this data we can be sure that the subsequent queries using repoze.catalog do what we think they do.

So we call the function `print_all_city_instances`

print_all_city_instances(myzodbinstance)

which iterates over the ‘cities’ element of the `dbroot` property of the ZODB `connection` to allow us to see everything that’s in the ZODB database.

for acity in myzodbinst.dbroot['cities'].itervalues():
    print acity

Our output looks like this :

About to dump all City Instances:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Jakarta  (Pop: 10187595)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)
Santiago  (Pop: 5428590)
Buenos Aires  (Pop: 2891082)

Demonstrating the `Lt` comparator of repoze.catalog

The next thing that happens in the sample is to make use of the `Lt` comparator offered by repoze.catalog.query

numdocs, results = catalog.query(Lt('populations', 1000000))

In the previous post when we initialized our repoze.catalog we created a `populations` index which was associated with the `population` property of our `City` class (take a look at the previous post if you’ve forgotten the details).

Our use of the `Lt` comparator asks repoze.catalog to find all `City` instances stored in our ZODB database with a population of less than 1,000,000. As you can see, the query returns two values, which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is an iterable of integers: the docids of the objects which satisfy the search criteria. Because we used the same integers as keys when storing the objects into ZODB, each one leads straight back to its object.
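The step from docids back to stored objects can be sketched with plain dictionaries. This is an illustration only: `resolve_docids` is a hypothetical helper, the `store` dict stands in for `dbroot['cities']`, and skipping unknown docids is my assumption about what you might want if the catalog and the database ever drifted apart. It is written for Python 3.

```python
def resolve_docids(docids, store):
    """Yield (docid, object) pairs for every docid that is present in store."""
    for docid in docids:
        if docid in store:
            yield docid, store[docid]


# Stand-in for dbroot['cities']: docid -> stored object (names only, for brevity)
store = {1: 'Windhoek', 2: 'Pretoria', 6: 'Canberra', 7: 'Wellington'}

# 99 plays the part of a docid the catalog knows about but the store does not
found = list(resolve_docids([1, 2, 6, 7, 99], store))
```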

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Windhoek  (Pop: 322500)
Pretoria  (Pop: 525387)
Canberra  (Pop: 358222)
Wellington  (Pop: 393400)

It’s worth mentioning that whilst there are many comparators offered by repoze.catalog.query, not all of them are applicable to all index types. In this example we are searching on an index, ‘populations’, of type CatalogFieldIndex (see the catalog setup in the first post), which does support `Lt`, but not every index type does.

Demonstrating the `InRange` comparator of repoze.catalog

Finally in the sample we show off the `InRange` comparator offered by repoze.catalog.query

numdocs, results = catalog.query(InRange('populations',
                                         1000000, 4000000))

As with the previous example we utilise the previously created catalog index ‘populations’ to find instances of `City` – in this case those instances that have their `population` property set to a value between 1,000,000 and 4,000,000.

We do this by using the `InRange` comparator offered by repoze.catalog.query. As with the `Lt` example above, the query returns two values, which I’ve named `numdocs` and `results`.

`numdocs` is an integer showing how many instances have been found which meet the criteria.

`results` is an iterable of integers: the docids of the objects which satisfy the search criteria, which are also the keys we used when storing those objects into ZODB.

We then use our function

print_city_query_results(myzodbinstance, results)

to output the objects found. The resulting output looks like this :

Objects stored in ZODB corresponding
to the repoze.catalog resultset:
Nairobi  (Pop: 3138295)
Maputo  (Pop: 1244227)
Buenos Aires  (Pop: 2891082)
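A brute-force scan over the same nine cities is a cheap way to confirm that both result sets are what we would expect. The sketch below uses no catalog at all. The helper names are hypothetical, it is written for Python 3, and treating both range bounds as inclusive is my assumption (it makes no difference for this data, since no population sits exactly on a bound).

```python
# The nine cities from part 1, keyed by the docid used there
populations = {
    1: ('Windhoek', 322500),
    2: ('Pretoria', 525387),
    3: ('Nairobi', 3138295),
    4: ('Maputo', 1244227),
    5: ('Jakarta', 10187595),
    6: ('Canberra', 358222),
    7: ('Wellington', 393400),
    8: ('Santiago', 5428590),
    9: ('Buenos Aires', 2891082),
}


def scan_less_than(data, limit):
    """Return the sorted docids whose population is strictly below limit."""
    return sorted(docid for docid, (_name, pop) in data.items() if pop < limit)


def scan_in_range(data, low, high):
    """Return the sorted docids whose population satisfies low <= pop <= high."""
    return sorted(docid for docid, (_name, pop) in data.items()
                  if low <= pop <= high)


less_than_docids = scan_less_than(populations, 1000000)
in_range_docids = scan_in_range(populations, 1000000, 4000000)
```

The docids found this way correspond to the cities printed for the `Lt` and `InRange` queries above.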

Credit where credit’s due

As with part one of this two part post, the example I’ve shown here owes some parts to one of the examples on the repoze.catalog website, and the structure of `MyZODB` was taken from the article cited in part one, ‘Example Driven ZODB’.

repoze.catalog and ZODB beginners example – part 1

Summary

The first of two posts which illustrate how to use repoze.catalog alongside ZODB

What’s ZODB ?

To quote the ZODB home page :

“The ZODB is a native object database, that stores your objects while allowing you to work with any paradigms that can be expressed in Python. Thereby your code becomes simpler, more robust and easier to understand”

What’s repoze.catalog ?

repoze.catalog is one of a number of frameworks which can be used to supply indexing for ZODB for those circumstances where accessing objects stored in ZODB would otherwise be unacceptably slow

Intended Audience

I’m assuming that readers of this post have a basic familiarity with ZODB. If you don’t, there are lots of good resources out there, of which ‘Example Driven ZODB’ is a good example.

What’s the purpose of this post ?

For any reasonably experienced Python programmer, using ZODB and repoze.catalog is pretty straightforward. Unfortunately a new user of repoze.catalog cannot find an example on the repoze.catalog site which shows both how to catalog items and how to save them into ZODB. This is understandable, as repoze.catalog is not only for use with ZODB, but I thought it was worthwhile doing a specific example for that scenario and that’s what’s on this page.

Where repoze.catalog helps ZODB

Because of the nature of ZODB it’s easy to access objects by the value they’re keyed on but otherwise it’s a question of a sequential search.

So instances of a class that look like this :

class Country(Persistent):
    def __init__(self, name, pop):
        self.name = name
        self.population = pop

Might be saved into the `root` property of a ZODB `connection` object like this :

root['un']['nz'] = Country('New Zealand', 4000000)

But subsequently if we wanted to obtain `Country` instances on the basis of population the key doesn’t help us at all and a scan of all `Country` objects would be necessary, like this :

for cou in root['un'].itervalues():
    if cou.population > 1000000:
        print cou.name

When we use repoze.catalog to catalogue a ZODB database we specify the properties of the stored objects which interest us and by which we will subsequently want to find those objects.

repoze.catalog then allows us to quickly and easily search for objects with property values that interest us.
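To make the difference concrete, here is a toy comparison between a sequential scan and a lookup in a pre-sorted structure. It is only an illustration of what an index buys you, not a description of how repoze.catalog works internally. The function names are hypothetical, the figures are the city populations used in the example code further down, and it is written for Python 3.

```python
from bisect import bisect_right

# key -> (name, population), as used in the example code in this post
cities = {
    1: ('Windhoek', 322500),
    2: ('Pretoria', 525387),
    3: ('Nairobi', 3138295),
    4: ('Maputo', 1244227),
    5: ('Jakarta', 10187595),
    6: ('Canberra', 358222),
    7: ('Wellington', 393400),
    8: ('Santiago', 5428590),
    9: ('Buenos Aires', 2891082),
}


def scan(threshold):
    """Sequential search: every stored object has to be examined."""
    return sorted(key for key, (_name, pop) in cities.items() if pop > threshold)


# Built once, when the objects are stored: (population, key) pairs in sorted order
by_population = sorted((pop, key) for key, (_name, pop) in cities.items())
sorted_pops = [pop for pop, _key in by_population]


def lookup(threshold):
    """Indexed search: a binary search finds where the matches start."""
    start = bisect_right(sorted_pops, threshold)
    return sorted(key for _pop, key in by_population[start:])


scanned = scan(1000000)
looked_up = lookup(1000000)
```

Both approaches return the same keys; the difference is that the sorted structure is paid for once at write time, whereas the scan is paid for on every query.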

Example Code

Here’s my example code and underneath I’ll expand a little more on what each part does:

'''
Demonstrates how to use repoze.catalog to catalogue objects
being stored in ZODB. This example has the catalog and ZODB
database as separate repositories
'''
from repoze.catalog.catalog import FileStorageCatalogFactory
from repoze.catalog.catalog import ConnectionManager

from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.text import CatalogTextIndex

import transaction
from persistent import Persistent
from BTrees.OOBTree import OOBTree

from myzodb import MyZODB

factory = FileStorageCatalogFactory('../data/mdcatalog.db', 'mycatalog')

_initialized = False

def initialize_catalog():
    '''
    Create a repoze.catalog instance and specify
    indices of interest

    NB: Use of global variable
    '''
    global _initialized
    if not _initialized:
        # create a catalog
        manager = ConnectionManager()
        catalog = factory(manager)
        # set up indexes
        catalog['names'] = CatalogTextIndex('name')
        catalog['populations'] = CatalogFieldIndex('population')
        # commit the indexes
        manager.commit()
        manager.close()
        _initialized = True

class City(Persistent):
    '''Represents a City by name and population'''
    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop
    def __str__(self):
        return "%s  (Pop: %s)" % \
                (self.name, \
                str(self.population))

if __name__ == '__main__':
    initialize_catalog()
    manager = ConnectionManager()
    catalog = factory(manager)
    myzodbinstance = MyZODB('../data/mdzdb.fs')
    myzodbinstance.dbroot['cities'] = OOBTree()

    #For ease of demonstration set up a local dict
    #containing a number of `City` instances keyed
    #by a unique integer
    cities = {
        1:City('Windhoek', 322500),
        2:City('Pretoria', 525387),
        3:City('Nairobi', 3138295),
        4:City('Maputo', 1244227),
        5:City('Jakarta', 10187595),
        6:City('Canberra', 358222),
        7:City('Wellington', 393400),
        8:City('Santiago', 5428590),
        9:City('Buenos Aires', 2891082),
    }
    #Iterate over our local dict and for each
    #element generate the catalog entry for
    #repoze.catalog and add the corresponding
    #instance to the ZODB database we are
    #cataloguing
    for docid, doc in cities.items():
        catalog.index_doc(docid, doc)
        myzodbinstance.dbroot['cities'][docid] = doc
        transaction.commit()

    myzodbinstance.close()

Example Step by Step

Here’s a breakdown of what’s happening in the above example.

Initialize repoze.catalog

initialize_catalog()
manager = ConnectionManager()
catalog = factory(manager)

The `initialize_catalog` function creates a repoze.catalog instance and initializes two indices: `names` and `populations`. These index the `name` and `population` properties of any objects indexed with the repoze.catalog instance just created.

Make our ZODB database ready for use

myzodbinstance = MyZODB('../data/mdzdb.fs')

`MyZODB` is a convenience class which wraps up the instantiation of a ZODB database and provides `storage`, `db`, `connection` and `dbroot` properties to help the programmer interact with the ZODB storage, database and connection objects. `MyZODB` also provides a `close` method to cleanly close the ZODB connection, database and storage.

`MyZODB` is not explicitly included in the above example but it looks like this :

from ZODB import FileStorage, DB
class MyZODB(object):
    '''Manage the state of a ZODB FileStorage connection'''
    def __init__(self, path):
        self.storage = FileStorage.FileStorage(path)
        self.db = DB(self.storage)
        self.connection = self.db.open()
        self.dbroot = self.connection.root()
    def close(self):
        self.connection.close()
        self.db.close()
        self.storage.close()

Create a sub-tree in ZODB for our `City` objects

Now we have an instance of `MyZODB` we can treat the `dbroot` property (which holds the root object returned by the `root()` method of the ZODB `connection` object) as a plain old dictionary and assign a value to it under some key of our choosing. For our example, because we’re going to save a set of `City` objects, we’ve chosen ‘cities’.

At this stage we just assign an instance of `OOBTree` to that ‘cities’ key. An OOBTree instance acts like a dictionary but, when a lot of elements are within it, works much more efficiently for the purposes of ZODB.

myzodbinstance.dbroot['cities'] = OOBTree()

Create a set of `City` objects

Now we pause for a moment and make ourselves a set of `City` objects and put them into a dictionary for later use.

What’s significant here is that the key used to save each `City` instance is a unique integer which has no specific meaning in itself; we’ll see why in a moment.

cities = {
    1:City('Windhoek', 322500),
    2:City('Pretoria', 525387),
    3:City('Nairobi', 3138295),
    4:City('Maputo', 1244227),
    5:City('Jakarta', 10187595),
    6:City('Canberra', 358222),
    7:City('Wellington', 393400),
    8:City('Santiago', 5428590),
    9:City('Buenos Aires', 2891082),
}
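Typing the integers by hand is fine for nine cities, but for a longer list it is easy to repeat one by mistake. A minimal sketch of letting `enumerate` hand out the docids instead is shown below. The plain `City` class is a stand-in of mine for the `Persistent` version above so that the snippet runs without ZODB installed, `raw_cities` is a hypothetical name, and the snippet is written for Python 3.

```python
class City:
    """Plain stand-in for the post's Persistent City class."""

    def __init__(self, cityname, citypop):
        self.name = cityname
        self.population = citypop


raw_cities = [
    ('Windhoek', 322500),
    ('Pretoria', 525387),
    ('Nairobi', 3138295),
    ('Maputo', 1244227),
    ('Jakarta', 10187595),
    ('Canberra', 358222),
    ('Wellington', 393400),
    ('Santiago', 5428590),
    ('Buenos Aires', 2891082),
]

# enumerate(..., start=1) hands out 1, 2, 3, ... so no docid is typed by hand
cities = {
    docid: City(name, pop)
    for docid, (name, pop) in enumerate(raw_cities, start=1)
}
```

This only works for a one-off load like this example; if objects were added over several runs, the next free docid would need to be remembered somewhere.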

Save `City` objects to ZODB and index them

Now at last we’re going to do what we’ve come for.

We iterate over our set of `City` instances and for each one we make use of the `index_doc` method of repoze.catalog. Notice that the two arguments are the integer we’ve arbitrarily associated with each `City` instance, ‘docid’ in this example, and the `City` instance itself, ‘doc’ in this example. By using the `index_doc` method we update the catalog entries maintained by repoze.catalog.

In the same iteration we assign the `City` object instance, ‘doc’, to our `OOBTree` (stored under the ‘cities’ key of `dbroot`), using as its key the same integer we’ve just passed to the `index_doc` call.

Finally we make use of the ZODB transaction manager to commit our changes. Because the repoze.catalog catalog is itself stored in a ZODB database, our single transaction is sufficient to commit both the catalog update and the actual update of the ZODB database.

for docid, doc in cities.items():
    catalog.index_doc(docid, doc)
    myzodbinstance.dbroot['cities'][docid] = doc
    transaction.commit()
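Since everything depends on the catalog and the `OOBTree` agreeing on the docid, it can help to do both writes in one place. The sketch below shows that idea with stand-ins so that it runs without repoze.catalog or ZODB installed. `FakeCatalog`, `add_document` and `remove_document` are hypothetical names of mine, the plain dict stands in for `dbroot['cities']`, and committing the transaction is deliberately left to the caller. It is written for Python 3.

```python
class FakeCatalog:
    """Hypothetical stand-in with the same index_doc/unindex_doc method names."""

    def __init__(self):
        self.indexed = {}

    def index_doc(self, docid, obj):
        self.indexed[docid] = obj

    def unindex_doc(self, docid):
        self.indexed.pop(docid, None)


def add_document(catalog, store, docid, doc):
    """Index doc and store it under the same docid, refusing to reuse a docid."""
    if docid in store:
        raise KeyError('docid %r is already in use' % (docid,))
    catalog.index_doc(docid, doc)
    store[docid] = doc


def remove_document(catalog, store, docid):
    """Remove doc from both the catalog and the store."""
    catalog.unindex_doc(docid)
    del store[docid]


catalog = FakeCatalog()
store = {}  # stand-in for dbroot['cities']
add_document(catalog, store, 1, 'Windhoek')
add_document(catalog, store, 2, 'Pretoria')
remove_document(catalog, store, 1)
```

The real catalog object offers `index_doc` (used above in this post) and, as far as I know, `unindex_doc` as well, so the two helper functions should carry over, but treat that as something to check against the repoze.catalog documentation.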

Credit where credit’s due

The example I’ve shown here owes some parts to one of the examples on the repoze.catalog website. The structure of `MyZODB` was taken from the article cited above, ‘Example Driven ZODB’. Lastly I got some useful advice in response to a question I posed on StackOverflow and I’m grateful to the people who provided answers.

In Closing

That’s all there is to it ! In many small scale instances there’s no need to do anything other than use ZODB as it comes and not worry about indexing: machines are fast and many applications deal with relatively small data sets. However, if you do need it, repoze.catalog (or one of the other, similar, cataloguing tools) is a useful way to squeeze more speed out of ZODB.

This has been a very long blog post by my standards so I’m going to show how to access the data indexed under repoze.catalog (and prove that it all actually works !) in a blog post next week.