Download source code: nub-0.3.2.zip
Download API documentation: nub-0.3.0-doc.zip
Online API documentation: doc/index.html (v. 1.0/0.3.2)
NUB is a set of high-performance C++ codes providing indexes for variable-length keys, compressed resource files, and an istream interface for blocks of memory. It now supports Unicode keys and file sizes of up to about 17 billion gigabytes (2^64 bytes).
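For readers unfamiliar with the idea of an istream over a block of memory, here is a minimal sketch of the general technique using only the standard library. The names MemoryBuf and first_word are made up for illustration and are not taken from the NUB sources; NUB's own class may be organized quite differently.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical class (not from NUB): a read-only stream buffer that
// exposes an existing block of memory as the get area of a streambuf.
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(const char* data, std::size_t size) {
        // setg() takes non-const pointers; the block is only ever read.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);  // begin, current position, end
    }
};

// Hypothetical helper: extracts the first whitespace-delimited token
// from a memory block through the ordinary istream interface.
std::string first_word(const char* data, std::size_t size) {
    MemoryBuf buf(data, size);
    std::istream in(&buf);  // any code that accepts std::istream& works here
    std::string word;
    in >> word;
    return word;
}
```

This sketch supports sequential reads only. Seeking with seekg() would additionally require overriding seekoff() and seekpos() in the buffer class.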
2012-03-14: I have decided to further document Index.h when I can get to it. That thing is a beast. Needs more explanation of what da heck it's doing. If you are a neophyte C++ programmer, please refer to IndexT - reading the source code will drive you nuts.
2011-03-01: After I was contracted by 10gen, Inc. to work on their MongoDB product line, further development of NUB was indefinitely suspended. MongoDB already does more than what I had been planning to add to NUB. Ah, the sorrows of solitary development, but joys too: there is no way a *team* could have written NUB. After being away from the NUB code for a while, it is a challenge for me to read and understand, and I wrote it. It took a tremendous amount of concentrated, coherent thinking to take local indexing to the level achieved by NUB.
NUB is by no means dead or obsolete. NUB takes the science and invention of indexing to new levels. If you need networking and a boatload of features, then go with MongoDB. However, if you want the fastest possible local indexes customized to your application, then NUB is the way to go. It utilizes templates to rework the index file structure so as to optimize the space and time needed to handle your index specification. This code is actually a sort of index design compiler. It will produce machine code tailored to your desired indexing need.
2010-08-21: NUB version 0.3.2 came a long way from 0.0.1. The index file structure is actually at major version 6. ResourceFile needs some more work, but NUB 0.3.3 is to be repackaged and released soon as NUB 1.0. *.ndx and *.dat files will include a signature to aid in recognition and verification. This will change the major version of the index files to 7. ResourceFiles will be reorganized into allocation units of a specifiable NodeSize to improve speed slightly and will include a header of at least NodeSize bytes (with a signature).
2009-02-07: NUB 0.3.2 released. Nodes merged in more cases during key deletion. Template interface changed slightly as did the interface of only a few functions. Only minor changes to your code are required to fix things. The online documentation is currently at
. 2012-03-12: I finally upgraded the online documentation to v. 0.3.2. Oops, sorry for my procrastination. Doxygen has made some good progress. Cool. Seriously, NUB 0.3.2 could be renamed to 1.0... There are no bugs anywhere except perhaps in DataFile.cpp. I am not sure about that. I never got around to proving it correct, but you probably have your own data files you'd like to get flying with NUB's IndexT.
There are not many more opportunities for improving balance. Performance should be better than that of a B-tree now. In a B-tree, there is no special provision for variable-length keys. NUB indexes are a considerable improvement as far as packing more keys per node. However, NUB indexes are still very similar to B-trees. In general, the focus is on dynamic response to on-the-fly inserts and deletes. Because of this, the nodes of the trees usually run between half full and full. There is still a need to pack the tree for distribution of a release where there will be few additional modifications to a largely static resource set. Work on IndexT::pack() is in progress.
2009-02-06: NUB 0.3.1 released. I have addressed the main area where imbalance can occur. During key deletion, sibling nodes are now combined in the majority of cases where this is possible. ResourceFile now also combines free areas during deletion, but this has not been thoroughly tested yet.
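To illustrate what combining free areas during deletion means, here is a minimal sketch of the general technique. It is not ResourceFile's actual code: the FreeList class and its in-memory std::map are assumptions made for illustration, and the real bookkeeping lives in the file itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical free-space tracker (not NUB's ResourceFile).
// Maps the starting offset of each free area to its length in bytes.
class FreeList {
public:
    // Marks [offset, offset + length) as free, merging it with any
    // free area that touches it on either side.
    void release(std::uint64_t offset, std::uint64_t length) {
        auto next = areas_.lower_bound(offset);
        // Merge with the preceding area if it ends exactly at offset.
        if (next != areas_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                length += prev->second;
                areas_.erase(prev);  // does not invalidate next
            }
        }
        // Merge with the following area if it starts where this one ends.
        if (next != areas_.end() && offset + length == next->first) {
            length += next->second;
            areas_.erase(next);
        }
        areas_[offset] = length;
    }

    std::size_t count() const { return areas_.size(); }

    // Length of the free area starting at offset, or 0 if there is none.
    std::uint64_t length_at(std::uint64_t offset) const {
        auto it = areas_.find(offset);
        return it == areas_.end() ? 0 : it->second;
    }

private:
    std::map<std::uint64_t, std::uint64_t> areas_;
};
```

Releasing the gap between two free areas leaves a single larger area, which is what keeps a file from fragmenting into many small holes after repeated deletions.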
2009-02-02: NUB 0.3.0 released. Unicode keys now supported. Index has been generalized to a customizable template allowing specification of:
- a KeyInterface to be used for that index
- index file offset size
- data file offset size
- node size
- cache size
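As a rough picture of what such a customizable template can look like, here is a sketch. Every name in it (AsciizKey, IndexSketch, the parameter order, the defaults, and the cache size being counted in nodes) is an assumption for illustration only; see the IndexT documentation for the real interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical key policy (not NUB's KeyInterface): describes how keys
// are compared and how many bytes one key occupies when stored.
struct AsciizKey {
    static int compare(const char* a, const char* b) { return std::strcmp(a, b); }
    static std::size_t stored_size(const char* key) { return std::strlen(key) + 1; }
};

// Hypothetical index template. The five parameters mirror the list above;
// the compiler generates a distinct type for each combination.
template <class KeyInterface,
          class IndexOffset = std::uint32_t,   // offset type within the index file
          class DataOffset = std::uint64_t,    // offset type within the data file
          std::size_t NodeSize = 4096,         // bytes per node
          std::size_t CacheSize = 64>          // cached nodes (assumed unit)
struct IndexSketch {
    using key_interface = KeyInterface;
    using index_offset = IndexOffset;
    using data_offset = DataOffset;
    static constexpr std::size_t node_size = NodeSize;
    static constexpr std::size_t cache_size = CacheSize;

    static_assert(NodeSize > sizeof(IndexOffset) + sizeof(DataOffset),
                  "a node must hold at least one pair of offsets");
};

// Two specifications: a compact index for small files and one for huge files.
using SmallIndex = IndexSketch<AsciizKey, std::uint32_t, std::uint32_t, 512, 16>;
using HugeIndex = IndexSketch<AsciizKey, std::uint64_t, std::uint64_t, 8192, 256>;
```

Because the sizes are compile-time parameters, a small index does not pay for 64-bit offsets it never needs.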
Since NUB now supports huge files, I am rethinking the pack() approach. It may be more appropriate to work on the extra codes needed to do some balancing on the fly.
2009-01-30: NUB 0.2.0 released. Now includes support for huge index and resource files. File offsets are now 64 bits, allowing file sizes of up to about 17 billion gigabytes (2^64 bytes).
There are currently no special codes to balance the tree during insert and delete so a tree can become badly unbalanced under certain dynamic circumstances. To compensate for this, I am working on a function Index::pack() which will balance the tree and optimize it for fastest access. This would be used once your resource file is complete (prior to release). I should warn you that for very large indexes this function takes a good bit of time.
A Note on NUB Index Files
NUB indexes originally targeted storing ASCIIZ variable-length keys to resource files in a highly efficient manner. Because of this requirement, NUB indexes are not B-trees, though they are very similar. B-trees are a variant of M-way branching trees. NUB indexes are not of any particular order M. Instead, each node packs in as many keys as will fit.
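The difference from a fixed order M can be sketched as follows. This is an illustration only, not NUB's node layout: PackedNode is a made-up class that stores NUL-terminated keys back to back and leaves out the child pointers and data offsets a real node would also carry.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-size node (not NUB's format). The number of keys
// it holds depends on the key lengths rather than on a fixed order M.
class PackedNode {
public:
    explicit PackedNode(std::size_t node_size)
        : bytes_(node_size), used_(0), count_(0) {}

    // Appends the key if it fits; returns false when there is no room.
    // Assumes the key contains no embedded NUL characters.
    bool append(const std::string& key) {
        const std::size_t need = key.size() + 1;  // key bytes plus the NUL
        if (need > bytes_.size() - used_) return false;
        std::memcpy(bytes_.data() + used_, key.c_str(), need);
        used_ += need;
        ++count_;
        return true;
    }

    std::size_t key_count() const { return count_; }
    std::size_t free_bytes() const { return bytes_.size() - used_; }

private:
    std::vector<char> bytes_;  // the node's raw storage
    std::size_t used_;         // bytes occupied so far
    std::size_t count_;        // keys stored so far
};
```

A 16-byte node in this sketch holds three keys of 2, 5, and 6 characters, whereas a layout that reserved a fixed slot per key would have to size every slot for the longest key.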
NUB grew out of a start on the Johnny Bashful project. Unfortunately, independent development takes too long these days. The project died for lack of my time to spend on it. It may get resurrected some day. The NUB project is currently undergoing refinement. Support for Unicode keys plus huge index and resource files (64-bit) coming soon (DONE!). Index::pack() is next on the list. I have made one attempt at it, but it resulted in fast but large indexes in some cases.