FeatherDB - Java JSON Document database
Marcus R. Breese
Posted: Apr 11, 2008
Updated: Apr 11, 2008
Source: http://svn.fourspaces.com/public/featherdb/trunk

I've been holding onto this code for a while now (months)...  I haven't finished polishing is, and chances are I won't.  But there are a few important concepts in here that I think are important to get out into the open.

FeatherDB was my attempt at a Java clone of CouchDB.  When I first found out about CouchDB, I was very intrigued.  A non-relational database fits one of my projects perfectly... when you find yourself fighting with Hibernate just to get a schema mapped, there is an issue on one side or the other.  In this case it was me. [I should note that I actually like relational databases for most things, it just so happens that my particular application doesn't map well to a static schema]

So, I did what any self-respecting geek would do - I downloaded the code and tried to get it running.  And that led to the first problem: I was on a Windows machine and the (at the time) binary download for Windows washorribly out of date.  Not to fear though, I had a linux box handy, so I just moved over to that machine.  I downloaded CouchDB, and got it running.  Then I had the problem that the java library wasn't compatible with the newer JSON syntax (the old version was all XML, all the time).
  No problem there either... I spend a day or two getting my feet wet with the REST API, and wrote a new library couchdb4j.

Now, I was off and running working with CouchDB, and non-relational nirvana. 

Or at least I should have been.  At the time, CouchDB was (and is) very much a work in progress... there were a bunch of things that were planned but were missing, such as: support for stored/named views and any sort of authentication.  No problem!  I'll just add this myself!

Problem: CouchDB is written in Erlang.

I don't do Erlang.

I don't have anything against Erlang... we've never met.  I just don't want to learn another language.  Especially one that is so specialized.  I know that I should learn a language a year, but Erlang was just too much of a hurdle for me to get in and start mucking around in code.  Plus, would you really accept a patch from a guy that is just learning a language?  Neither would I.

No problem!  I'll write my own Java version of CouchDB... and do it better.  The result was/is FeatherDB. 

FeatherDB is a Java based document database.  It has an all HTTP/REST interface.  It supports querying by a Java class (added as a jar to the server), or via JavaScript (uses Java6's JavaScript support).  In theory you could also search by any language that is on the JVM and implements the correct interface.

FeatherDB allows for flexibility in the backend storage, allowing you to use any backend you'd like.  I wrote three: a filesystem based backend, an all-in-memory backend, and a caching backend that uses an in-memory hash and punts to a supporting backend when needed.  (I'd like to add a BerkeleyDB or Derby backend to make things a bit more resilient).

FeatherDB uses an embedded Jetty HTTP server to handle all interaction. 

There were a few things about the way CouchDB handles documents that I didn't like, so FeatherDB does things a little bit differently. 

Like CouchDB:

  1. Documents are accessed through the HTTP interface by a REST API.
  2. Documents are versioned.  Always.  You can access any version of any document.
  3. Document information is stored and retrieved in JSON format.
  4. The Document space is "flat".  Documents can link to other documents, but there is no concept of "joining" documents.
  5. You query the database through 'views'.  Views are also documents.
Unlike CouchDB (last I checked):
  1. Access is controlled through an authentication layer.
    • Users are authenticated
      • though HTTP-Auth
      • or by loading /_auth?username=foo&password=bar
      • or via a previously generated 'token' (an authenticated user can generate a token, and pass it to an unauthenticated user for temporary access).
    • Users can be restricted to read-only, write-only, or read-write access to a database (document-level ACLs proved to be too slow).
  2. Documents can have "common" data that is shared among all revisions.
    • Any attribute whose name starts with "_" is determined to be "common".
    • When you update that attribute, it is updated for all revisions.
    • This makes it trivial to support a feature like tagging, where you want to update all revisions of a document, including future versions.
  3. Documents don't need to be JSON documents.  A "document" in FeatherDB lingo is any file of any HTTP Content-type. This has a few nice side-effects.  The FeatherDB server can now serve any type of file to any client.  For example, you can upload a binary image to the server, load it in your browser, and it renders correctly! 
    • No need to base64 encode a binary payload. 
    • Content-types are determined from the HTTP Content-type header. 
    • Everything is stored in binary format, unless the incoming HTTP Content-type has an associated handler. 
      • So far, there are handlers for:
        • text/html
        • text/plain
        • application/javascript (JSON)
        • and image/{png,gif,jpeg}
    • Only JSON documents can be properly queried
    • A document's id can contain the '/' character.  This means that you can 'fake' an HTTP server's paths by being creative with your document ids (Similar in this respect to Amazon's S3).
  4. Because the content of a document is no longer JSON, there is the need for a "Meta-data" JSON file to accompany the main file.  This meta-file is what stores the document's id, revision, etc...  For JSON only documents, the JSON data is stored directly in the meta-data entry.  For other documents, this is a separate file.  Information in this meta-file can be dynamically added by the content-type handler.  So, for example, the image handler can add width/height information to the record, and binary files can have their MD5/SHA1 sums calculated automatically on insert.
    • If you're keeping score, this means that using the default file-system backend, there are 2 files per revision for non-JSON documents, plus one more to store the common data.
  5. The JavaScript views can query the database for associated documents.  Okay, so this actually is like "joining" documents.  Because the JavaScript engine is hosted in the same JVM, you can query the Java backend from the JavaScript view.  This lets you create joined views of your data, if needed
Those are the major differences, and things that I think should be in CouchDB.  As far as the status of FeatherDB, it has sat in SVN untouched for a few months, so I wanted to try and make it public, in case anyone else was interested.  AFAIK, it works.  However, there aren't any included libraries for accessing the server, so the API is a little undocumented.  It does follow the CouchDB API as closely as possible though.
You can access the project via SVN at: http://svn.fourspaces.com/public/featherdb/trunk
If you are interested in more information about it, leave a comment, or send me an email at: mbreese at fourspaces dot com.  Currently, the project requires Java 6 for the Scripting API support.
 
Getting started
  1. Download the code via SVN from the above url
  2. Run "ant run".
This starts the server with the default configuration:
  • Port: 8889
  • Admin username: sa
  • Admin passowrd:password
  • Backend: File system (directory "testdb")
  • Allow anonymous access
To access the server, connect to http://localhost:8889.
Important URIs 
  • GET /_all_dbs -> list all databases
  • GET /_auth -> authenticate
    • You can authenticate via HTTP-Auth, or by passing the request parameters "username" and "password" (?username=sa&password=pass)
    • If anonymous access is enabled, all requests are treated as if authenticated as an administrator
  • GET /_invalidate -> invalidate the current credentials
  • GET /_sessions -> show active authenticated sessions (not in anonymous mode)
  • GET /_shutdown -> shutsdown the server (must be admin)
Databases
  • GET /{dbnme} -> db stats
  • PUT /{dbname} -> add a database (must be authenticated as admin)
  • DELETE /{dbname} -> remove a database (must be authenticated as admin)
Documents
  • GET /{dbname}/{documentid} -> Get the document's current revision
    • If this isn't found, but /db/docid/index is found, this will be returned instead
    • Optional parameters:
      • showMeta=true -> returns the meta-information for the document (see above)
      • showRevisions=true -> includes a list of available revisions in the meta-information
  • GET /{dbname}/{documentid}/{revision} -> Get the document's content (see above)
  • POST or PUT /{dbname}/{documentid} -> write the request's body as a new revision of the given document
    • If the documentid doesn't exist, it is created
  • POST /{dbname} -> write a new document, but use a generated id
  • DELETE /{dbname}/{documentid} -> delete the document (must be able to write to db)
Views
  • POST /{dbname}/_temp_view -> perform an adhoc query 
    • The contents of the POST should be in the format of a javascript function
    • Ex:
function(doc) { if (doc.value=='foo') {map(doc.id,doc.value); }}
  • POST or PUT /{dbname}/viewname/functionname -> add/update a new view.
    • Note: the view name must start with an underscore ('_').
    • The default functionname is "default"
  • GET /{dbname}/_all_docs -> returns a list of all document ids
    • This actually calls an included view named "_all_docs"
Views can be either written in either Java or JavaScript.  View documents are JSON documents that have the following attributes:
'view_type': 'application/javascript' or 'java:fully.qualified.class.Name'
Java views must implement the interface: com.fourspaces.featherdb.views.View
JavaScript views are implemented as JSON documents in the format:
 { 
'view_type': 'application/javascript',

'view1': function(doc) {
	// your code here
	if (doc.val='foo') {
		return doc; 
	} ,
'view2': function(doc) {
	// your code here
	if (doc.val='foo') {
		map('key',doc.val); 
	} 
} 
So, as you see, JavaScript views are functions that take a JavaScript object as input, and either returns a JSON object, or builds a map with a key and value. (See CouchDB docs for more information).
There is another associated JavaScript method that you can call, and that is: function get(id,rev,db).  From JavaScript, you can retrieve other documents by id, revisions (optional), and database (optional).
Views are maintained by a "ViewRunner".  The ViewRunner is responsible for maintaining the index of results for documents in the database.  The only included ViewRunner doesn't store an index, but instead iterates over all of the documents in the database on each request.   
Configuration
FeatherDB is configured by writing a "featherdb.properties" file. In the src/java/com/fourspaces/featherdb directory, there is a 'default.properties' file that shows the default properties and their values.  Your featherdb.properties file will overwrite the default values.
 
Additional Notes
If you are using an access mechanism that doesn't support cookies, you'll need to pass a "token" with each authenticated access.  This can be passed as a request parameter (?token=asdfasdf), or via an HTTP Header "FeatherDB-Token".  The token is given to you when you access "/_auth".  This isn't needed in anonymous access mode.
There is a severe lack of unit tests.  I'm sorry :).  Specifically, there is a lack of tests for running views, but it _should_ work.
I make no promises about this code.  If it works for you, great.  If not, let me know, and I'll see what I can do about it.  As always, patches are gratefully accepted. 
 

 Future directions

I'm not sure what to do with this project.  I'd prefer to not maintain a separate code base that may or may not track the CouchDB API.  I'd personally like to see the changes in the document structures implemented in CouchDB, but I'm not sure if it is possible given the split in a document data and meta data.  Then again, I'm completely unfamiliar with the CouchDB code base, so perhaps this will be possible.

I wanted to make this all public to see if anyone else may be interested in using this type of database and may be interested in helping to flesh it all out.

Hope you enjoy it! 

Comments

Elliot ShepherdApr 12, 2008 10:17 AM
Hi, just came across this!

I'm currently using CouchDB4J in a new project I've just started (with some changes so it works with the latest couch), but this looks to be an even better fit... (I can hack at it!)

The changes you've made sound good (especially the common attributes!), as long as it remains as simple to use as couch.

I'm downloading it now and will have a play...

Cheers!
Elliot

C.M.Apr 12, 2008 10:31 AM
I will also download it. Sounds cool.

--c
C.MApr 12, 2008 10:32 AM
and maybe you can create a project for it in google code ? this way more ppl will learn of your work.

--c
American JeffApr 13, 2008 5:43 PM
I don't see the word 'distributed' anywhere. That would be another major difference.
Nic DoyeApr 14, 2008 2:45 PM
Excellent, I'd got grails talking to CouchDB last week, via your couchdb4j, but an embedded server would be even better.

Great work.
Marcus R. BreeseApr 15, 2008 8:58 AM
@American Jeff
Re: Distributed - Via the plugin backend storage mechanism, it actually wouldn't be that hard to make the system distributed. There would need to be a new backend written that delegates requests to other servers. The trick is to figure out the logic to correctly 'guess' which server has the data you're interested in. However, this is entirely possible.

@Nic Doye
If you'd like an embedded version, take a look at some of the unit tests (all 3 of them :), and you'll see an example of how to have an embeddable version of FeatherDB. It's actually pretty easy. You just need to create a instance of the Server, and you're off and running.
Peter FeinApr 15, 2008 5:50 PM
Sounds similar to my Python project GrassyKnoll: http://grassyknoll.googlecode.com
siliApr 15, 2008 9:35 PM
you're a lazy (not the good kind) of programmer.
Marcus R. BreeseApr 15, 2008 10:23 PM
@sili
I can't argue that I'm lazy, but I'm curious as to exactly what aspect of my laziness is particularly bad...
Scott G. FillywickApr 15, 2008 11:54 PM
The part where you partially re-implemented the wheel because you didn't want to learn a foreign language =)
berkayApr 16, 2008 2:58 AM
good stuff! don't pay attention to this lazy bs.
we struggle with the static schema a lot and featherdb offer hope!
also working with grails so it seems we're not alone.
good luck!
berkay
Scott G. FillywickApr 16, 2008 2:30 PM
Oh, it's not "lazy BS". It's that he wrote something that doesn't have the single most critical property of CouchDB: simple distributed scalability.

The thing Erlang is good at.

To say that "it actually wouldn't be that hard to make the system distributed" is disingenuous. Distributed systems are hard, always. This is remarkably similar to AppDrop. They "ported" App Engine to EC2, and then stated that it wouldn't be hard to provide a big table API on top of MySQL -- missing the point of BigTable (and App Engine) *entirely*.
AaronMay 9, 2008 1:35 AM
so a port of couchdb to scala is they way to go!

Add comment

Your name
Your link