STOR Tutorial¶

STOR is broken down into two major sections:

The Document

The Store

We will provide examples here about these sections, their constituent classes, and operations.

Note

All the examples assume that the following headers are included:

//for Documents usage
#include <stor/document/document.h>

//for Store usage
#include <stor/store/store.h>

And that we are using the following namespace implicitly:

using namespace esft::stor;

Or explicitly:

esft::stor::<some_class>

C++ STL Headers for classes like std::string, etc. are omitted.

Document¶

Documents are the objects we put into the Store’s Collections.

Think of Documents as JSON objects with unique identifiers.

Note

The C++ syntax R"( … )", known as a raw string literal, allows us to build regex-friendly strings and takes care of the many character escapes that we would otherwise need when writing JSON by hand.
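As a small illustration of the note above, the sketch below builds the same JSON text twice, once with manual escapes and once as a raw string literal. It uses only the standard library, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// JSON text written with manual escapes: every inner quote needs a backslash.
std::string escaped_json() {
    return "{\"key\": \"hello\", \"list\": [1, 2, 3]}";
}

// The same text as a raw string literal: R"( opens it, )" closes it,
// and everything in between is taken verbatim.
std::string raw_json() {
    return R"({"key": "hello", "list": [1, 2, 3]})";
}

// Sanity check: both spellings produce identical text.
void check_raw_string() {
    assert(escaped_json() == raw_json());
}
```

Either string could then be handed to a document constructor as in the Instantiation examples below.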

Note

If you don’t care about uniqueness and just want to use a Document as a JSON, you can. Just don’t add it to a Collection without a unique ID, or you will get an exception.

A Document is a Node and can be composed of other Nodes.

Nodes can represent:

primitive types:
- int
- long
- double
- bool
- string
objects, in the form: {“key”: Node}
arrays: [Node, Node, …]

Note

A Node object is a thin handle to a RapidJSON “Node” (rapidjson::Value) and is given to you by value by STOR.

You can cache it but do not access it after its owner Document has been destroyed.

To copy the content of a Node rather than the handle, deep copy it into another Node with node::copy (see Misc below).
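The difference between copying a handle and deep copying its content can be pictured with a toy model. The types below (toy_value, toy_handle) are hypothetical stand-ins, not STOR classes, and the sketch only assumes that a handle refers to a value owned elsewhere.

```cpp
#include <cassert>
#include <string>

// Hypothetical value that would be owned by a document.
struct toy_value {
    std::string text;
};

// Thin handle: copying the handle copies the pointer, not the value.
struct toy_handle {
    toy_value* target;

    // Deep copy: overwrite the value this handle points to with the
    // content behind `other`.
    void copy(const toy_handle& other) { target->text = other.target->text; }
};

// Walk through the difference between the two kinds of copy.
void handle_demo() {
    toy_value a{"x"}, b{"y"};
    toy_handle h1{&a};
    toy_handle h2 = h1;           // handle copy: h2 aliases a
    h2.target->text = "changed";
    assert(a.text == "changed");  // visible through the original value

    toy_handle h3{&b};
    h3.copy(h1);                  // deep copy: b now holds its own "changed"
    a.text = "later";
    assert(b.text == "changed");  // unaffected by later edits to a
}
```

The same model shows why a cached handle must not outlive its owner: once the owning value is gone, the pointer inside the handle dangles.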

Part of the Node API is inspired by the Jackson API.

Instantiation¶

A Document can be created as an empty JSON:

document doc;

a primitive JSON:

document doc("5");

a JSON array:

//empty
document doc("[]");
//with data
document doc(R"([1, "a", false, 2.0])");
//
//Or Alternatively...
//
//empty
document doc = document::as_array();

a JSON object:

//empty
document doc("{}");
//with data
document doc(R"(
    {
       "key1" : 1,
       "key2": "hello",
       "key3": true,
       "key4": 2.0,
       "key5": 3000000000,
       "key6": [1,2,3],
       "key7": {
                "subkey": 5
               }
    }
)");
//
//Or Alternatively...
//
//empty
document doc = document::as_object();

A Document may also be initialized from a std::istream, pointing to some character data representing a valid JSON object:

document doc;
some_json_istream >> doc;

A Document identifier can be queried, like so:

const auto &id = doc.id();

Note

A Document identifier is generated by default in the constructor via the static function:

document::make_id()

which uses Boost.UUID. You can override that by providing your own identifier. If you do, it is your responsibility to guarantee uniqueness of this identifier when inserting this document in a Collection.

Interaction¶

Once we have a Document, we can check what kind of Node we are looking at, move between its constituent Nodes, and add, remove, or modify them.

Type Checking¶

Let’s say we have an Object Document:

node &n = doc; //not necessary, as document extends node, but it makes the point.
n.is_num();    //false
n.is_int();    //false
n.is_bool();   //false
n.is_long();   //false
n.is_double(); //false
n.is_string(); //false
n.is_object(); //true
n.is_array();  //false
n.is_null();   //false

Value Extraction¶

We can get the value of a Node like this:

int value             = n.as_int();
bool value            = n.as_bool();
long long value       = n.as_long();
double value          = n.as_double();
std::string value     = n.as_string();
node value            = n.as_object();
node value            = n.as_array();
std::pair<const char*,
          size_t> p   = n.as_cstring();//returns the actual length,
                                       //in case string contains \0,
                                       //which is possible in a JSON.
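The reason as_cstring returns a length alongside the pointer can be seen with the standard library alone. In the sketch below, which makes no STOR calls and uses hypothetical helper names, a string containing an embedded \0 is measured both ways.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Build a 3-byte string whose middle byte is '\0'.
std::string with_embedded_null() {
    return std::string("a\0b", 3);
}

// Pointer-plus-length view of a string, mirroring the shape of the pair
// that as_cstring is described as returning.
std::pair<const char*, std::size_t> as_pointer_and_length(const std::string& s) {
    return {s.data(), s.size()};
}

// Length as seen by C-style functions, which stop at the first '\0'.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void embedded_null_demo() {
    const std::string s = with_embedded_null();
    assert(as_pointer_and_length(s).second == 3);  // full length survives
    assert(c_style_length(s) == 1);                // C-style view is truncated
}
```

Rebuilding a std::string from the pair with std::string(p.first, p.second) keeps all the bytes, whereas constructing it from the pointer alone would stop at the first \0.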

Note

The node::as_<type>() functions throw an esft::stor::document_exception if invoked on a Node of the wrong type.

Navigation¶

From an Object Node, we can get a child Node by indexing with its key:

node child = node["child_key"];

If the Node with key “child_key” is not contained in our node, we will get an esft::stor::document_exception.

To check whether such node exists, from an Object Node:

bool exists = node.has("child_key");

From an Array Node, we can get the members by their position:

node mem = node[0];

To query the size of an Array Node (or Object Node):

node.size();

Object or Array Nodes may also be iterated over with an STL-like forward iterator:

//iterators range to mutable nodes
auto beg = node.begin();
auto end = node.end();
//iterators range to immutable nodes
auto beg = node.cbegin();
auto end = node.cend();
//iterating styles
//1)
while (beg != end) { auto node = *beg; ++beg; }
//2)
std::for_each(beg, end,[]( [const] node& n){} );
//3)
for (; beg != end; ++beg) {auto node = *beg; }
//4)
for ([const] auto &n : node) {}

Note

For an Object Node, the iterator over its children may be used to get the key of a child, like so:

auto child_key = some_iterator.key();

Modification¶

If we have a Value Node, then we may change its value, like so:

node = 1;
node.is_int();//true

node = 2.0;
node.is_double();//true

node = false;
node.is_bool();//true

node = "hello";
node.is_string();//true

node = 5L;
node.is_long();//true

If we have an Array Node, then we may add a new node, like so:

//let's assume node is an empty array
node.empty();//true

node.add(1);
node[0].is_int();//true

node.add(2.0);
node[1].is_double();//true

node.add(true);
node[2].is_bool();//true

node.add("hello");
node[3].is_string();//true

node.add(5L);
node[4].is_long();//true

node.add_array();
node[5].is_array();//true
node[5].empty();//true

node.add_object();
node[6].is_object();//true
node[6].empty();//true

We may also add all members of one Array Node to another, like so:

node.add(other.cbegin(), other.cend());

Or if we have a vector of nodes vect:

node.add(vect);

Note

node::add returns a reference to the current node, so we could do the following:

node
    .add(1)
    .add(2.0)
    .add(true)
    .add("hello");

which would yield:

[1, 2.0, true, "hello"]
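Chaining works because each call returns a reference to the object it was called on. The sketch below shows that pattern on a hypothetical toy_array class. It is not STOR's node, only an illustration of the return-by-reference design, and it supports just three value types to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an Array Node that stores items as JSON text.
class toy_array {
public:
    // Each add() returns *this so that calls can be chained.
    toy_array& add(int v) { items_.push_back(std::to_string(v)); return *this; }
    toy_array& add(bool v) { items_.push_back(v ? "true" : "false"); return *this; }
    // A const char* overload is needed: without it, string literals would
    // silently bind to the bool overload.
    toy_array& add(const char* v) {
        items_.push_back(std::string("\"") + v + "\"");
        return *this;
    }

    // Join the stored items into a JSON array.
    std::string json() const {
        std::string out = "[";
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (i != 0) out += ", ";
            out += items_[i];
        }
        return out + "]";
    }

private:
    std::vector<std::string> items_;
};

void chaining_demo() {
    toy_array arr;
    arr.add(1).add(true).add("hello");
    assert(arr.json() == "[1, true, \"hello\"]");
}
```

Returning the current object keeps every chained call at the same level, which is why a chain of add calls fills one array rather than nesting.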

We can remove Array Node members, like so:

node.remove(node.cbegin(), node.cend());
//which is equivalent to...
auto it = node.cbegin();
while (it != node.cend()) { it = node.remove(it); }
//which is equivalent to...
node.remove_all();
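The loop form above follows the same idiom as erasing from a standard container: the removal call hands back the iterator to continue from. Below is a minimal sketch with std::vector standing in for an Array Node, assuming node::remove(it) behaves like erase in this respect, as the example suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Remove every element one at a time, always continuing from the iterator
// that erase() returns, because the erased iterator itself is invalidated.
std::size_t remove_one_by_one(std::vector<int>& values) {
    std::size_t removed = 0;
    auto it = values.begin();
    while (it != values.end()) {
        it = values.erase(it);
        ++removed;
    }
    return removed;
}

void remove_demo() {
    std::vector<int> values{1, 2, 3};
    assert(remove_one_by_one(values) == 3);
    assert(values.empty());
}
```

Incrementing the old iterator after removal, instead of using the returned one, is the usual source of bugs in this kind of loop.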

If we have an Object Node, we may add children, like so:

//put key/value, with value as a primitive
node.put("key", 1);//yields  {"key": 1}
node.put("key", 2.0);//yields {"key": 2.0}
node.put("key", true);//yields {"key": true}
node.put("key", "hello");//yields {"key": "hello"}
node.put("key", 5L);//yields {"key": 5}

//add a key "key" pointing to Node of type Object, then
//add a key "subkey" pointing to Node of type int directly
//into the newly added Object Node.
node.with("key").put("subkey", 1);//yields {"key": {"subkey": 1}}

//add a key "key" pointing to Node of type Array, then
//add a member Node of type int directly
//into the newly added Array Node.
node.with_array("key").add(1);//yields {"key": [1]}

Note

As noted in the above example, node::with(key) and node::with_array(key) return the newly added node with key key, so you can fluidly start modifying the new node.

node::put instead, like node::add above, returns a reference to the current node. So:

node
    .put("a",1)
    .put("b",2);

will yield:

{
   "a": 1,
   "b": 2
}

while:

node
    .put("a",1)
    .with("b").put("c", true)
              .with_array("d").add(5L)
                              .add(2.0);

will yield:

{
    "a": 1,
    "b": {"c": true,
          "d": [5, 2.0]
         }
}

We can remove Object Node children, like so:

node.remove("child_key");
//or
node.remove(node.cbegin(), node.cend());
//which is equivalent to...
auto it = node.cbegin();
while (it != node.cend()) { it = node.remove(it); }
//which is equivalent to...
node.remove_all();

Misc¶

To check if an Array Node or Object Node is empty:

node.empty();
//or
node.size() == 0;

You can get the JSON representation of a Node with:

auto json = node.json();

And you can overwrite the representation of a Node with a JSON string, like so:

std::string my_json = R"({"a": 1})";
node.json(my_json);

To deep copy the content of another Node v:

node.copy(v);

To write the content of the Node, as a JSON, to a std::ostream os:

node.write_to_stream(os);
//or
os << node;

Store¶

A Store is made up of named Collections. Collections contain Documents.

Create a Database¶

You can create/open an unencrypted database, like so:

std::string path_to_db = "/some/path";
std::string name_of_db = "my_db";
bool remove_on_destruction = false;

store my_db(path_to_db, name_of_db, remove_on_destruction);

Note

As noted from the above example, an unencrypted store may be removed from disk when the corresponding store object goes out of scope and is destroyed.

Or, if you compiled STOR with the -DSTOR_CRYPTO=ON option, you may create/open an encrypted database:

std::string path_to_db = "/some/path";
std::string name_of_db = "my_db";
std::string my_key = "123456789abcdefg";//16-character key
std::unique_ptr<access_manager> am{new access_manager{my_key}};

store my_db(path_to_db, name_of_db, std::move(am));

Create/Remove Collections¶

Once a database instance has been initialized, we can create or access a Collection by the name of the collection:

auto &my_collection = my_db["my_collection"];

In the above example, “my_collection” is created if it didn’t exist or is returned if it already existed. To check whether a collection exists without creating it, you can use:

my_db.has("my_collection");

To remove a collection you can instead do:

my_db.remove("my_collection");

Note

Collections are always returned by reference, as it is the Database that owns them and frees the associated resources upon destruction.

CRUD Documents in a Collection¶

Once we have a Collection, we can Create, Read, Update, Delete Documents within.

Given some Document doc, initialized as shown in the above section, we can upsert it (update it if it exists, or insert it if it doesn’t), like so:

my_collection.put(doc);

Note

Note how this operation replaces an existing document. This is determined by the doc.id(), which is used as “primary key”. This is why it’s important for these IDs to be unique, or data would be lost.

You can remove it, like so:

my_collection.remove(doc);
//or
my_collection.remove(doc.id());

Or check for a document’s existence:

my_collection.has(doc.id());

You can also get a document by its ID:

auto my_doc = my_collection["some_unique_doc_id"];

An example of how to use these functions could be the following:

std::string my_doc_id = "some_unique_doc_id";

auto doc_to_update = my_collection[my_doc_id];
doc_to_update.with("new_obj").put("important reminder", "buy candies");

my_collection.put(doc_to_update);
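The upsert behaviour of put described above can be pictured with a toy model keyed by document ID. The toy_collection class below is hypothetical and built on std::map. It says nothing about how STOR stores data and only shows why a reused ID overwrites the earlier document.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a Collection: document id -> JSON text.
class toy_collection {
public:
    // Upsert: insert when the id is new, replace when it already exists.
    // Returns true if an existing entry was replaced.
    bool put(const std::string& id, const std::string& json) {
        auto it = docs_.find(id);
        if (it == docs_.end()) {
            docs_.emplace(id, json);
            return false;
        }
        it->second = json;
        return true;
    }

    bool has(const std::string& id) const { return docs_.count(id) != 0; }
    const std::string& get(const std::string& id) const { return docs_.at(id); }
    std::size_t size() const { return docs_.size(); }

private:
    std::map<std::string, std::string> docs_;
};

void upsert_demo() {
    toy_collection c;
    assert(!c.put("id-1", "{\"a\": 1}"));  // first put inserts
    assert(c.put("id-1", "{\"a\": 2}"));   // same id: the old content is replaced
    assert(c.size() == 1);
    assert(c.get("id-1") == "{\"a\": 2}");
}
```

Two distinct documents that accidentally share an ID would collapse into one entry in exactly this way, which is the data loss the note above warns about.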

Queries¶

Documents can be extracted from a Collection by more than just their IDs. We can define some “indices” to fields of a Document and query them by those fields.

Indices¶

An index’s role is to point to some field of a Document, so that a Collection is aware of this field when inserting a Document and can use it to facilitate querying Documents by that field.

Think of an Index as a “dot-separated path” to a field:

Given a Document doc with the below JSON representation:

doc = {
            "a": 1,
            "b": "hello",
            "c": {"sub": 5}
       }

We would be able to query documents by the keys “a”, “b”, and “sub” with the following indices:

index_path("a");
index_path("b");
index_path("c.sub");
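To make the “dot-separated path” idea concrete, the sketch below splits a path such as "c.sub" into keys and follows them through a miniature nested structure. Everything here (toy_node, split_path, resolve) is hypothetical and not part of STOR, and the string field "b" is left out to keep the model to integer leaves.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of a JSON node: either an int leaf or an object.
struct toy_node {
    bool is_leaf = false;
    int value = 0;
    std::map<std::string, std::shared_ptr<toy_node>> children;
};

// Split "c.sub" into {"c", "sub"}.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '.')) parts.push_back(part);
    return parts;
}

// Follow the path one key at a time; return nullptr if a key is missing.
const toy_node* resolve(const toy_node& root, const std::string& path) {
    const toy_node* current = &root;
    for (const auto& key : split_path(path)) {
        auto it = current->children.find(key);
        if (it == current->children.end()) return nullptr;
        current = it->second.get();
    }
    return current;
}

std::shared_ptr<toy_node> make_leaf(int v) {
    auto n = std::make_shared<toy_node>();
    n->is_leaf = true;
    n->value = v;
    return n;
}

// Build {"a": 1, "c": {"sub": 5}} for the example.
toy_node make_example() {
    toy_node root;
    root.children["a"] = make_leaf(1);
    auto c = std::make_shared<toy_node>();
    c->children["sub"] = make_leaf(5);
    root.children["c"] = c;
    return root;
}

void index_path_demo() {
    const toy_node doc = make_example();
    assert(resolve(doc, "a")->value == 1);
    assert(resolve(doc, "c.sub")->value == 5);
    assert(resolve(doc, "c.missing") == nullptr);
}
```

In this model the path "c" resolves to an object rather than a leaf, so it has no single value to compare against.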

Note

In the current implementation, we can only practically index “primitive” fields. What this means is that the index will have to point to a primitive Node within a Document and not an Object or an Array Node.

i.e.

Given:

doc = {
        "a": 1,
        "b": {"ba": 3},
        "c": [1,2,3]
      }

The following indices will produce desirable results, given the operations introduced below:

index_path("a");
index_path("b.ba");

While these won’t:

index_path("b");
index_path("c");

Future implementations may change this, for example by allowing an Array Node to be indexed and its members checked for a given value with the $in operator.

To register an Index with a collection, you can do:

my_collection.add_index(index_path("some.path.to.field"));
//or more simply
my_collection.add_index("some.path.to.field");

Multiple indices may be registered at once, like so:

//some range of iterators `beg` and `end` pointing to
//valid index_path or strings convertible to index_path
my_collection.add_indices(beg, end);

//some STL-like container `container` (with begin(), end() functions)
my_collection.add_indices(container);

Finally, indices may be removed, like so:

my_collection.clear_indices();

And queried, like so:

const auto &index_set = my_collection.indices();
for (const auto &index: index_set){
    std::cout << index.str() << "\n";
}

Supported Query Operations¶

Below, we present the currently implemented query operations. The syntax for these operations is influenced by that of MongoDB.

Queries are performed, like so:

query = /** a Document of type Object Node with a JSON
        representation matching a valid `Query Object` **/
auto matching_documents_set = my_collection.find( query );

Query Objects can be divided into two families of operations:

                |---Leaf Operations: [ "$eq", "$neq", "$gt", "$gte", "$lt", "$lte" ]
Operation-------|
                |---Node Operations: [ "$and", "$or" ]

General Query Syntax:

Leaf Operations:
{ "$<leaf_operator>": {"<index_path>": <value>} }
Node Operations (or Aggregating Operations):
{ "$<node_operator>": [ <Operation>, <Operation> ] }

Now, let’s explore some examples.

Let’s use the following JSON representations of Documents:

+------------------+------------------+
|    Document A    |    Document B    |
+------------------+------------------+
|{                 |{                 |
|    "a": 1,       |    "a": 2,       |
|    "z": true,    |    "z": true,    |
|    "sub": {      |    "sub": {      |
|      "b": "aaa"  |      "b": "baa"  |
|    }             |    }             |
|}                 |}                 |
+------------------+------------------+

Let’s assume that both these Documents have been added in the collection my_collection and that the following indices have been registered:

my_collection.add_indices(std::vector<index_path>{"a", "z", "sub.b"});

Operations:

Equality:

//returns set with: Document A
my_collection.find(R"( {"$eq": {"a": 1}} )");

//returns set with: Document B
my_collection.find(R"( {"$eq": {"sub.b": "baa"}} )");

//returns set with: Nothing
my_collection.find(R"( {"$eq": {"a": 5}} )");

Non-Equality:

//returns set with: Document B
my_collection.find(R"( {"$neq": {"a": 1}} )");

//returns set with: Document A
my_collection.find(R"( {"$neq": {"a": 2}} )");

//returns set with: Document A, B
my_collection.find(R"( {"$neq": {"a": 5}} )");

//returns set with: Nothing
my_collection.find(R"( {"$neq": {"z": true}} )");

Greater Than:

//returns set with: Document B
my_collection.find(R"( {"$gt": {"a": 1}} )");

//returns set with: Document A, B
my_collection.find(R"( {"$gt": {"a": 0}} )");

//returns set with: Nothing
my_collection.find(R"( {"$gt": {"a": 2}} )");

Greater-Equal Than:

//returns set with: Document A, B
my_collection.find(R"( {"$gte": {"a": 1}} )");

//returns set with: Document B
my_collection.find(R"( {"$gte": {"a": 2}} )");

//returns set with: Nothing
my_collection.find(R"( {"$gte": {"a": 3}} )");

Less Than:

//returns set with: Nothing
my_collection.find(R"( {"$lt": {"a": 1}} )");

//returns set with: Document A
my_collection.find(R"( {"$lt": {"a": 2}} )");

//returns set with: Document A, B
my_collection.find(R"( {"$lt": {"a": 3}} )");

Less-Equal Than:

//returns set with: Document A
my_collection.find(R"( {"$lte": {"a": 1}} )");

//returns set with: Document A, B
my_collection.find(R"( {"$lte": {"a": 2}} )");

//returns set with: Nothing
my_collection.find(R"( {"$lte": {"a": 0}} )");

OR:

//returns set with: Document A, B
my_collection.find(R"(
                        {
                          "$or": [
                                    {"$eq":  {"sub.b": "aaa"}},
                                    {"$eq":  {"sub.b": "baa"}}
                                 ]
                        }
                     )");

AND:

//returns set with: Document A
my_collection.find(R"(
                        {
                          "$and": [
                                    {"$eq":  {"z": true}},
                                    {"$neq": {"a": 2}   }
                                  ]
                        }
                     )");

//returns set with: Document A,B
my_collection.find(R"(
                        {
                          "$and": [
                                    {"$eq":  {"z": true}},
                                    {
                                      "$or": [
                                                {"$eq":  {"sub.b": "aaa"}},
                                                {"$eq":  {"sub.b": "baa"}}
                                             ]
                                    }
                                  ]
                        }
                     )");
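The way leaf and node operations compose can be mimicked with plain predicates. The sketch below is a toy evaluator, not STOR's query engine: the names toy_doc, leaf, op_and, op_or and toy_find are hypothetical, documents are flattened to integer fields, booleans are stored as 0/1, and the string field "sub.b" is omitted for brevity.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened document: index path -> integer value.
using toy_doc = std::map<std::string, int>;
using predicate = std::function<bool(const toy_doc&)>;

// Leaf operation: compare the value stored at `path` against `value`.
predicate leaf(const std::string& op, const std::string& path, int value) {
    return [op, path, value](const toy_doc& d) -> bool {
        auto it = d.find(path);
        if (it == d.end()) return false;  // field absent: no match
        const int v = it->second;
        if (op == "$eq")  return v == value;
        if (op == "$neq") return v != value;
        if (op == "$gt")  return v > value;
        if (op == "$gte") return v >= value;
        if (op == "$lt")  return v < value;
        if (op == "$lte") return v <= value;
        return false;                     // unknown operator
    };
}

// Node operation "$and": every child operation must match.
predicate op_and(std::vector<predicate> ops) {
    return [ops](const toy_doc& d) -> bool {
        for (const auto& op : ops) {
            if (!op(d)) return false;
        }
        return true;
    };
}

// Node operation "$or": at least one child operation must match.
predicate op_or(std::vector<predicate> ops) {
    return [ops](const toy_doc& d) -> bool {
        for (const auto& op : ops) {
            if (op(d)) return true;
        }
        return false;
    };
}

// Names of the documents that satisfy the query, in map order.
std::vector<std::string> toy_find(const std::map<std::string, toy_doc>& docs,
                                  const predicate& query) {
    std::vector<std::string> matches;
    for (const auto& entry : docs) {
        if (query(entry.second)) matches.push_back(entry.first);
    }
    return matches;
}

// Documents A and B from the table above, reduced to fields "a" and "z".
std::map<std::string, toy_doc> example_docs() {
    return {
        {"A", {{"a", 1}, {"z", 1}}},
        {"B", {{"a", 2}, {"z", 1}}},
    };
}

void query_demo() {
    const auto docs = example_docs();
    // {"$and": [ {"$eq": {"z": true}}, {"$neq": {"a": 2}} ]} matches only A.
    const auto hits = toy_find(docs, op_and({leaf("$eq", "z", 1), leaf("$neq", "a", 2)}));
    assert(hits.size() == 1 && hits[0] == "A");
}
```

Because node operations accept any predicate as a child, nesting an "$or" inside an "$and" needs no special handling in this model.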

Backup/Restoration of a Collection¶

We can write the content of a Collection to some std::ostream os for backup, like so:

bool keep_indices = false;//choose whether you want to backup the indices as well
my_collection.deflate(os, keep_indices);

And restore it from a std::istream is pointing to data created by a previous deflation, like so:

my_collection.inflate(is);

Warning

The current implementation does not preserve encryption across these operations, so your database will be deflated unencrypted. A future implementation will keep the deflated data encrypted. For now, be aware of this limitation.

Misc¶

Sync/Async¶

Write operations against a Database’s collections may happen in a synchronous or asynchronous fashion, as defined by LevelDB.

By default, a store is initialized in “async” mode. You can change that by doing:

my_db.set_async(false);

Or switch it back to async:

my_db.set_async(true);

You can check the current mode, like so:

bool async = my_db.is_async();