Introduction to Lovefield, a cross-browser SQL-Like Query Engine

Posted on Oct 9, 2015

class: center, middle

An introduction to Lovefield

cross-browser SQL-Like relational query engine

by Ivan Fraixedes

@ifraixedes

http://ivan.fraixed.es


Agenda

#### 1. Introduction

2. Data Store

3. Schema

4. Query

5. Transactions

5. Architecture & Implementation

6. sql.js

#### 7. Demo

8. Credits


class: center, middle

Introduction

???

The main goal is to provide a relational query engine for web apps after WebSQL was deprecated (end of 2010) and apps need to work online & offline and the current standard (IndexedDB) achieve that goal however it is a key-value store.

IndexedDB is a perfect solution for use cases where a key-value store fits very well however there are use cases which requires to select and sort data by multiple columns or make relations between different data entities, which can be also achieved by a key-value store but make the implementation more complex and larger, hence less maintainable and increasing the development time not just for the itself implementation but testing as well.

Those mentioned use cases fit very well to relational database (a.k.a SQL databases) and for this reason Lovefield has been created.

Lovefield provides structural data queries besides transactions making more attractive to be used for these kind of web apps.

Assumptions:

  • At most 2GB.
  • Limited subset of SQL-2003.
  • It doesn’t provide storage technology, uses existing ones.

Mentioned goodness:

  • SQL-Lie relational db query engine which covers most uses cases supported by WebSQL.
  • Chrome Apps v2 compliant.
  • Drop-in library.
  • Can be used as a component.
  • Can be used in several JS frameworks: Closure, jQuery, Polymer, AngularJS, etc.
  • cross-browser.
  • Low-end device support.
  • It can be used with different storages.

Data Store

No data store is provided, it uses existing data stores Currently supported:

  • IndexedDB
  • Memory (not persistent)
  • Firebase (cloud hosted, non-local)
  • WebSQL* (For Safari compatibility)
  • LocalStore* (PoC)

* Experimental

???

Lovefield doesn’t provide a data store, it uses existing data stores; it abstracts data persistence into classes to make it adaptive to different storage media and technologies besides to help to decouple storage from the query engine.

Internally it uses rows as data unit but the select queries return plain JS objects. For insert statements the JS plain object must be converted in row data units; Lovefield provide helper createRow function to do that.

Data is organised to optimise the performance of the data store, so it cannot guarantee the raw data readability; moreover specific data store requirements as authentication (e.g. Firebase), allocated size aren’t managed by Lovefield.


Schema

  • A schema is created with a schema builder which is instantiated through a schema builder with lf.schema.create() static function.

  • A schema builder instance has two functions createTable() and connect()

??? As a usual SQL database the schema is the the definition of the tables, their columns which and constraints. Database name must abide a specific naming rule and a version must be specified as an integer greater than 0; any schema update requires to increase this number otherwise won’t be update or if is smaller it will thrown an error;

lf.schema.create() static function provides a single point of truth and prevents the document from being outdated. Schema builder instances is stateful: “building” and “finalised”; schema can only be modified in building state.

Columns can be nullable by they aren’t by default, so there are default values for each type except ARRAY_BUFFER and OBJECT whose default value is null. Indexes key can only be set on “STRING” and “NUMBER” the reset of them are converted to those with the exception of ARRAY_BUFFER and OBJECT cannot put as index or have any constraint nor they aren’t searchable (use on WHERE clauses).

  • As SQL columns are specified with a name, type and optional constraints

??? Constraints available are: primary key, foreign key, unique (multiple column accepted), Null & Not-nullable.

// Begin schema creation.
var schemaBuilder = lf.schema.create('crdb', 1);

schemaBuilder.createTable('Asset').
    addColumn('id', lf.Type.STRING).
    addColumn('asset', lf.Type.STRING).
    addColumn('timestamp', lf.Type.INTEGER).
    addPrimaryKey(['id']);

// Schema is defined, now connect to the database instance.
schemaBuilder.connect().then(
    function(db) {
      // Schema is not mutable once the connection to DB has established.
    }, console.error);

??? Lovefield implements its own indexes, it doesn not use any index system provided by the store; it uses by default a B+ Tree structure and only index non-null and indexable types. By default indexes aren’t persisted to the store and they’re build in memory during loading; they can be persisted with persistentIndex()


Query

  • Lovefiled offers 4 types of queries as we expect to find in a persitent storage; the famous CRUD acronym

  • Queries ARE NOT created with raw SQL statements; they are created though a query builder

  • The queries are composed with a “chainable pattern” ???

All the operations are performed as a chain; we execute a function which define the operation (select, create, etc.) to perform and we get a query builder instance which has available a set of functions to specify the query that we desire; each function specify define a different part of the operations (for example the tables, conditions, etc.) and returns the same query builder instance to be chainable.

  • Only a few functions (depending of the operation) can be called more than once

  • exec function executes the query and returns a promise which resolve with the results or reject in case of error ???

When the query is specified as we want, we must call exec to execute it;


Query: Insert/Create (C)

Insert depends of another function createRow which transform a plain JS object in a Lovefield Row object

var infoCard = db.getSchema().table('InfoCard');

var row = infoCard.createRow({
  'id': 'something',
  'lang': 140
});

db.insertOrReplace().
  into(infoCard).
  values([row]).
  exec().
  catch(console.error);

Query: Select (R)

  • Select function receive the COLUMNS which we want to retrieve

  • Only supports inner joins and left outer joins

??? There are 2 functions to perform them however INNER JOIN can be done with WHERE clause as SQL

  • from function to specify the TABLE(s)

  • where function to specify CONDITION(s)

  • limit & skip functions for PAGINATION

  • orderBy function to SORT (it can be called more than once)

  • groupBy function for aggregation (HAVING operation is not supported)

Query: Select (R) // EXAMPLE

var infoCard = db.getSchema().table('InfoCard');
var asset = db.getSchema().table('Asset');
var q = db.select(infoCard.id, infoCard.itag, asset.timestamp).
    from(infoCard.as(card))
    innerJoin(asset, asset.id.eq(infoCard.id)).
    where(asset.id.eq('1')).
    exec().then(function(rows) {
      // Prefixed columns, context involves two tables
      console.log(
	rows[0]['card']['id'],
	rows[0]['card']['tag'],
	rows[0]['Asset']['timestamp']);
    }).
    catch(console.error);

??? Mention ALIAS and how the results format.


Query: Update (U)

// UPDATE order SET amount = 51, currency = 'EUR'
//   WHERE currency = 'DEM' AND amount = 100;

var order = db.getSchema().table('order');
db.update(order).
    set(order.amount, 51).
    set(order.currency, 'EUR').
    where(lf.op.and(
	order.currency.eq('DEM'), order.amount.eq(100))).
    exec().
    catch(console.error);

???

Mention that set can be called more than once.


Query: Delete (D)

// DELETE FROM infoCard WHERE lang = 'es';
db.delete().
  from(infoCard).
  where(infoCard.lang.eq('es')).
  exec().
  catch(console.error);

db.delete().
  from(infoCard).
  exec().
  catch(console.error); // Delete everything in infoCard

Query: Extras // Parametrised Queries

var p = db.getSchema().table('Photo');
// UPDATE Photo SET timestamp = ?1, local = ?2 WHERE id = ?0;
var q2 = db.
    update(p).
    set(p.timestamp, lf.bind(1)).
    set(p.local, lf.bind(2)).
    where(p.id.eq(lf.bind(0)));

q2.bind(['id3', 345, false]).
  exec().
  catch(console.error);  // update without reconstructing query

q2.bind(['id4', 2222, true]).
  exec().
  catch(console.error);

??? LIMIT and SKIP can also be parametrized


Query: Extras // Observers

var p = db.getSchema().table('Photo');
var query = db.select().from(p).where(p.id.eq('1'));

// Handler shares exactly same syntax as the handler for Array.observe.
var handler = function(changes) {
  // Will be called every time there is a change until db.unobserve is called.
};
db.observe(query, handler);

// The call below will trigger changes to the observed select. Internally
// Lovefield will run the query again if the scope overlaps, therefore please
// be aware of performance consequences of complex SELECT.
db.update(p).set(p.title, 'New Title').where(p.id.eq('1')).exec();

// Remember to release observer to avoid leaking.
db.unobserve(query, handler);

??? Lovefield supports data observation for select queries, and the syntax is very similar to ES7 Array.observe() Combining parametrized query with Observers can be used to handle a common scenario of updating data in MVC environment


Transactions

It also has transactions; 2 types:

  1. Implicit: Created automatically when exec is called.

  2. Explicit: similar to SQL BEGIN => COMMIT or ROLLBACK and they are executed through a transaction object which is created with db.createTransaction

  3. Lovefield can execute, in a transaction, queries which are referencing data from results of previous queries using tx.attach()

???

All queries are executed inside of a transaction, its cycle is:

  1. Creation
  2. Binding (optional) in case of paremetrised values
  3. Execution within a transaction context
  4. Finalise, commit or rollback

When a transaction (either implicit or explicit) is resolved, no further queries can be made with it.


Architecture & Implementation

  1. Its architecture has been very affected by IndexedDB specification

??? Lovefield specification has been affected by some caveats in IndexedDB specification:

  • Auto commit when a transaction has gone outside of its message loop. e.g XHR call in IndexedDB event callback.
  • Upgrade schema can only be upgraded on onupgradeneeded event so it can be only on initialisation time and it isn’t possible to rename tables and the only workaround it’s recreate the table with the exact content and delete the old one which cannot be done safely so user must be do it manually outside of the onUpgrade function.
  • Close the conneciton is not guaranteed because the IndexedDB doesn’t guarantee that if there’re connections open, so Lovefield doens’t offer a reliable way to close and reopen to upgrade the schema.
  • First writer win due IndexedDB allows multiple connections from different processes/tabs but it doesn’t offer a general blocking; hence the first writer to reach the DB wins and the others, with conflict scope, will be aborted. Moreover it doesn’t offer observer/events cross sessions/processes/tabs.

  1. Tables are mapped to IndexedDB stores and rows to objects.

??? The storage format is on the left Lovefield and on the right the IndexedDB equivalent

  • Database - Database
  • Table - Object store
  • Row - Object in object store

Lovefield’s rows are two fields: an ID (unique row accross DB) and value which is an object where each field represents a column.

  1. Efforts are focused on IndexedDB as store due it’s been declared standard

??? There is an experimental Bundled mode where the difference is that an object in IndexedDB’s object store contains up to 512 Lovefield’s rows. The test done suggest that it’s better for 10k+ rows; it’s always slow on write/update but it’s faster on read operations however for smaller databases than 10+rows this mode may experience slower performance. Data is harder to inspect via web developer tools and to convert from one format to another, requires to export, delete all, connect and import back.

Other stores are structured different and they have different caveats.

  1. Assumption that there is only one connection at given time.

???

All of these things has specified that for now Lovefield assume to have only one connection to database instance at a given time; having several can produce data inconsistency; Lovefield’s team is working on this issue evaluating several proposals, right now the best practices is to use a background process (background page, WebWorker or ServiceWorker) to connect to db and the clients use postMessage to that component to perform the operations.

  1. Results are references to the object in the cache, NO COPIES.

???

All the objects returned in the results are references, no copies of the objects stored in the Lovefield internal cache, so NOT TO MODIFY those objects is a RULE that developer must follow.


sql.js

sql.js is a port of SQLite to JavaScript, by compiling the SQLite C code with Emscripten. no C bindings or node-gyp compilation here.

//Create the database
var db = new SQL.Database();
// Run a query without reading the results
db.run("CREATE TABLE test (col1, col2);");
// Insert two rows: (1,111) and (2,222)
db.run("INSERT INTO test VALUES (?,?), (?,?)", [1,111,2,222]);

// Prepare a statement
var stmt = db.prepare("SELECT * FROM test WHERE col1 BETWEEN $start AND $end");
stmt.getAsObject({$start:1, $end:1}); // {col1:1, col2:111}

// Bind new values
stmt.bind({$start:1, $end:2});
while(stmt.step()) { //
    var row = stmt.getAsObject();
    // [...] do something with the row of result
}

???

The API is quite small and all the queries are done using “standard” SQL (SQLite compatbile) so we don’t have to learn a new API as in Lovefield, even though its API is similar to SQL.

The tradeoff is that I’m not really sure how it performs and if it’s exactly possible to store the database in an IndexedDB field for example and retrieve the SQLite DB each time that user access to manage it and how efficient it can be because I to avoid to lose data each write should persist the SQLite DB to IndexedDB so it would require to store the whole data for any small modification.


Credits

This material is heavily taken form the resources listed above, in case of any modification or addition the same LICENSE than the original work is applied. Lovelfield documentation and code snippets have been taken from Lovefield project which is under Apache License and sql.js documentation and code snippets have been taken from sql.js project which is under MIT License