I have a dictionary website that needs to store more than 50,000 xml files of words in the datastore. The following is a snapshot of some of the xml files.
$ ls
ebhi.xml         ekaṃsena.xml    ekunasattati.xml  esaṃ.xml
edhati.xml       ekamsikata.xml  ekūnasattati.xml  esanam.xml
edha.xml         ekaṁsikatā.xml  ekunasatthi.xml   esānaṃ.xml
edhi.xml         ekamsika.xml    ekūnasaṭṭhi.xml   esana.xml
edh.xml          ekaṃsika.xml    ekunasiti.xml     esāna.xml
edisaka.xml      ekanika.xml     ekūnāsīti.xml     esanā.xml
edisa.xml        ekānika.xml     ekunatimsati.xml  esani.xml
ehi-bhikkhu.xml  ekantam.xml     ekūnatiṃsati.xml  esanī.xml
ehibhikkhu.xml   ekantaṃ.xml     ekunavisati.xml   esanta.xml
...
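Before we start, we need to define some terminology that comes from GAE. Objects in the GAE datastore are known as entities. Each entity in the Datastore has a key that uniquely identifies it. The key consists of the following components: the kind of the entity, which categorizes it for Datastore queries; an identifier for the individual entity, which is either a key name string or an integer numeric ID; and an optional ancestor path locating the entity within the Datastore hierarchy.
That gives us four terms: entity, key, kind, and id (identifier). Refer to "Entities, Properties, and Keys" in the Google Cloud Platform Python documentation for more details.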
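To make the four terms concrete, here is a minimal in-memory sketch. It is not the GAE Datastore API: the class ToyDatastore, its methods, and the kind name 'Word' are hypothetical names invented for illustration. It only shows the idea that a key built from a kind and an id identifies exactly one entity.

```python
class ToyDatastore(object):
    """Hypothetical in-memory stand-in, NOT the GAE Datastore API."""

    def __init__(self):
        # Maps a key, modeled here as (kind, id), to the entity's properties.
        self._entities = {}

    def put(self, kind, entity_id, **properties):
        """Store an entity under its key; an existing entity is replaced."""
        key = (kind, entity_id)
        self._entities[key] = dict(properties)
        return key

    def get(self, kind, entity_id):
        """Return the entity's properties, or None if no such key exists."""
        return self._entities.get((kind, entity_id))


store = ToyDatastore()
store.put('Word', 'ebhi', xmlfilename='ebhi.xml')
store.put('Word', 'edhati', xmlfilename='edhati.xml')
```

In this toy model, storing a second entity under the same kind and id replaces the first one, which is why a unique id per word matters.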
In my application, I define a kind called PaliWord for storing xml files in the datastore.
from google.appengine.ext import ndb
...
class PaliWord(ndb.Model):
    xmlfilename = ndb.StringProperty()
    xmlfiledata = ndb.TextProperty()
To store one xml file in the datastore, I use the file name without the .xml extension as the id of the entity. The name and data of the file are stored in the entity.
def storeToNDB(filename, filedata):
    # id = filename without .xml extension
    paliword = PaliWord(id=filename[0:-4],
                        xmlfilename=filename,
                        xmlfiledata=filedata)
    paliword.put()
    return '%s : ok' % filename
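The slice filename[0:-4] assumes every name ends in exactly ".xml". As a slightly more defensive variant, the sketch below derives the id with os.path.splitext and walks a directory to prepare the values that storeToNDB expects. The helper names entity_id_from_filename and iter_xml_files are hypothetical, and the sketch assumes the files are UTF-8 encoded; it does not touch the datastore itself.

```python
import io
import os


def entity_id_from_filename(filename):
    """Return the file name without its .xml extension, for use as the id."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() != '.xml':
        raise ValueError('expected an .xml file, got %r' % filename)
    return stem


def iter_xml_files(directory):
    """Yield (entity_id, filename, filedata) for each .xml file in directory."""
    for filename in sorted(os.listdir(directory)):
        if not filename.lower().endswith('.xml'):
            continue  # skip anything that is not a word file
        path = os.path.join(directory, filename)
        # Assumption: the xml files are stored as UTF-8 text.
        with io.open(path, encoding='utf-8') as f:
            yield entity_id_from_filename(filename), filename, f.read()
```

Each yielded filename and filedata pair could then be passed to storeToNDB, one file at a time.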
To retrieve a specific entity, we simply pass its id, i.e., the file name without the .xml extension, to the get_by_id() method of the PaliWord kind. The call returns the entity if it exists and None otherwise. We can then process the entity however we want.
def lookup(word):
    paliword = PaliWord.get_by_id(word)
    if paliword:
        return decodeXML(paliword.xmlfilename, paliword.xmlfiledata.encode('utf8'))
    else:
        return u'查無此字(No Such Word)'
In my dictionary application, using the App Engine Datastore is simple and straightforward because of the characteristics of my data: each word corresponds to one xml file, and the file name without its extension serves as the id for lookups. If you have a similar application and want to know how to use the GAE Datastore, I hope this is helpful.