3.6. TODO

3.6.1. For next release

  • VS Code syntax highlighting: https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide

  • Interpreted_File with concurrency and fsync: std::streambuf from joedb::File

  • more joedbc code generation:

    • Interpreted_Local_Client (support for lock and sync)

    • Split Database with Database_Storage parent

    • Compiler options: allow encapsulation:

      • make read/write access to some fields private

      • allow writing custom member functions

      • example: encapsulate stone-vector allocation / deletion for kifusnap training set

      • indexes:

        • encapsulate multi-column update (cannot write column individually)

        • find_or_new_<index>(cols)

        • delete_<index>(cols)

        • update_<index>(id, cols)

        • in case of unique index failure, throw before actually inserting

      • private access to dropped fields (for old custom functions), cleared at the time of drop

  • Blob cache:

    • keep blob translation index in a joedb file (erasable)

    • write blobs to another file with max size

    • when the max size is reached, wrap around to the start of the file (evicting overwritten entries)

  • joedb_pack: fill holes left by deleted elements, like write_json.

  • Add support for vcpkg

  • Use clang-format (try to customize it, use tabs)

  • non-durable transactions that do not break durability:

    • switch checkpoints only after durable transaction

    • use negative value for non-durable checkpoint

    • when opening a file: if the non-durable checkpoint is equal to the file size, accept it by default (with an option to refuse)

    • client option to checkpoint its file every n seconds

    • try to remove default_checkpoint: the checkpoint level should be a parameter of push and transaction.
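
The negative-value idea for non-durable checkpoints can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not joedb code: the `Checkpoint` struct, the function names, and the fallback to the last durable checkpoint when a non-durable one is rejected.

```cpp
#include <cstdint>

// Hypothetical in-memory view of a checkpoint (not a joedb type).
struct Checkpoint
{
 std::int64_t position; // size of the journal covered by this checkpoint
 bool durable;          // true if the data was synced before checkpointing
};

// Assumed encoding: durable checkpoints are stored as-is, non-durable ones
// are stored negated. Position 0 cannot be marked non-durable this way,
// which is harmless since an empty journal has nothing to lose.
inline std::int64_t encode_checkpoint(Checkpoint checkpoint)
{
 return checkpoint.durable ? checkpoint.position : -checkpoint.position;
}

inline Checkpoint decode_checkpoint(std::int64_t stored)
{
 if (stored >= 0)
  return Checkpoint{stored, true};
 return Checkpoint{-stored, false};
}

// Rule applied when opening a file: a non-durable checkpoint is accepted
// only if it matches the actual file size (and the option allows it).
// Otherwise fall back to the last durable checkpoint (an assumption).
inline std::int64_t get_trusted_position
(
 std::int64_t stored,
 std::int64_t last_durable,
 std::int64_t file_size,
 bool accept_non_durable = true
)
{
 const Checkpoint checkpoint = decode_checkpoint(stored);

 if (checkpoint.durable)
  return checkpoint.position;

 if (accept_non_durable && checkpoint.position == file_size)
  return checkpoint.position;

 return last_durable;
}
```

With this encoding, a reader that only understands durable checkpoints can tell at a glance (from the sign) that a value must not be trusted blindly.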

3.6.2. New Operations and Types

  • delete_vector

  • Add an undo operation to the log. This way, it is possible to keep all branches of history.

  • Use diff for large-string update

  • Differentiate between “storage type” and “usage type”:

    • remove bool type and use int8 instead, with bool usage

    • usages: bool(int8), date(int64).

    • uint8, uint16, uint32, uint64

    • custom usage label: IP address(int32), URL(string), PNG file(string), UTF8(string) (use base64 instead for JSON output), …?
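
The split between "storage type" and "usage type" might look like the following sketch. The enum names, `Field_Type`, and `to_display_string` are hypothetical and only illustrate the idea that the usage changes how a value is presented, not how it is stored.

```cpp
#include <cstdint>
#include <string>

// How the value is laid out in the file (hypothetical subset).
enum class Storage_Type {int8, int32, int64, string};

// How the value is meant to be interpreted (hypothetical subset).
enum class Usage_Type {none, boolean, date, ip_address, url};

struct Field_Type
{
 Storage_Type storage;
 Usage_Type usage;
};

// The usages named in the list above: bool(int8) and date(int64).
constexpr Field_Type bool_field{Storage_Type::int8, Usage_Type::boolean};
constexpr Field_Type date_field{Storage_Type::int64, Usage_Type::date};

// Output code can switch on the usage while storage code ignores it.
// Only the boolean usage gets special formatting in this sketch.
inline std::string to_display_string(Field_Type type, std::int64_t value)
{
 if (type.usage == Usage_Type::boolean)
  return value ? "true" : "false";

 return std::to_string(value);
}
```

In this design, removing the bool storage type costs nothing at the file-format level: a bool column is an int8 column whose usage label tells tools to print "true" or "false".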

3.6.3. Blobs

  • network protocol extension to handle local blob cache without downloading everything

  • zero-copy access to blob data using memory-mapped file
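
The bounded local blob cache mentioned under "For next release" (blobs written to a file with a maximum size, restarting from the start when full, evicting overwritten entries) could be prototyped in memory as follows. This is a sketch under assumptions: `Blob_Cache` is a hypothetical class, a `std::string` stands in for the blob file, and a `std::map` stands in for the translation index that the TODO item would keep in a joedb file.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

class Blob_Cache
{
 private:
  struct Entry
  {
   std::size_t offset;
   std::size_t size;
  };

  std::string data;                    // stands in for the fixed-size blob file
  std::map<std::int64_t, Entry> index; // stands in for the translation index
  std::size_t write_position = 0;

 public:
  explicit Blob_Cache(std::size_t max_size): data(max_size, '\0')
  {
  }

  // Returns false if the blob cannot be cached (empty, or larger than the file).
  bool write(std::int64_t key, const std::string &blob)
  {
   if (blob.empty() || blob.size() > data.size())
    return false;

   // Not enough room before the end of the file: start again from the start.
   if (write_position + blob.size() > data.size())
    write_position = 0;

   const std::size_t begin = write_position;
   const std::size_t end = begin + blob.size();

   // Drop any previous copy of this key, then evict every entry whose
   // bytes are about to be overwritten.
   index.erase(key);
   for (auto it = index.begin(); it != index.end();)
   {
    const Entry &entry = it->second;
    if (entry.offset < end && begin < entry.offset + entry.size)
     it = index.erase(it);
    else
     ++it;
   }

   data.replace(begin, blob.size(), blob);
   index[key] = Entry{begin, blob.size()};
   write_position = end;
   return true;
  }

  // Returns false on a cache miss (never cached, or evicted).
  bool read(std::int64_t key, std::string &result) const
  {
   const auto it = index.find(key);
   if (it == index.end())
    return false;
   result = data.substr(it->second.offset, it->second.size);
   return true;
  }
};
```

The linear scan used for eviction keeps the sketch short. An index ordered by offset would avoid scanning every entry on each write.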

3.6.4. On-disk Storage

  • In a directory

  • A checkpoint file (2 copies, valid if identical)

  • A subdirectory for each table

  • One file per column vector

  • One file for string data (string column = size + start_index)

  • Use memory-mapped files (is there a portable way?)
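
The "2 copies, valid if identical" rule for the checkpoint file can be sketched as below. The function names are hypothetical, the file content is modelled as a `std::string`, and the value is stored in host byte order, which a real on-disk format would probably want to fix explicitly.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Build the content of a checkpoint file: the same value written twice.
inline std::string make_checkpoint_file(std::int64_t checkpoint)
{
 std::string content(2 * sizeof checkpoint, '\0');
 std::memcpy(&content[0], &checkpoint, sizeof checkpoint);
 std::memcpy(&content[sizeof checkpoint], &checkpoint, sizeof checkpoint);
 return content;
}

// The file is valid only if it has the expected size and both copies are
// identical. A write interrupted between the two copies is thus detected.
// On failure, the output parameter is left untouched.
inline bool parse_checkpoint_file
(
 const std::string &content,
 std::int64_t &checkpoint
)
{
 std::int64_t copies[2];

 if (content.size() != sizeof copies)
  return false;

 std::memcpy(copies, content.data(), sizeof copies);

 if (copies[0] != copies[1])
  return false;

 checkpoint = copies[0];
 return true;
}
```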

3.6.5. Compiler

  • check that vector range is OK in constructor of vector update

  • modularize code generation

    • Each module should have:

      • required include files

      • data structure for storing data

      • additional hidden table fields?

      • triggers (after/before insert/update/delete)

      • public methods

    • Possible to modularize:

      • indexes

      • sort functions

      • referential integrity

      • safety checks

      • incrementally updated group-by queries

  • use std::set and std::multiset for indexes? Might be better for strings.

  • Table options:

    • no_delete: allows more efficient indexing (+smaller code)

    • last N (for web access log) (last 0 = none)

  • Allow the user to write custom event-processing functions and store information in custom data structures (for instance: collect statistics from web access log without storing whole log in RAM).

  • Compiler utilities:

    • referential integrity

    • queries (SQL compiler?)

    • incrementally-updated group-by queries (OLAP, hypercube, …)

3.6.6. Concurrency

  • joedb_server:

  • resume a very large download from where it stopped (use hash to check before continuing?)

  • SHA-256: option for either none, fast or full.

  • Connection_Multiplexer for multiple parallel backup servers? Complicated: requires asynchronous client code.

  • Do not crash on write error, continue to allow reading?

  • Notifications from server to client, in a second channel:

    • when another client makes a push

    • when the lock times out

    • when the server is interrupted

    • ping

  • SQLite connection (store checkpoint and lock in DB + fail on pull if there is anything to pull)

3.6.7. Use case: log with safe real-time remote backup

  • log rotation, ability to delete or compress early part of the log:

    • multi-part file

    • keeps a table with all parts

    • keep first part as schema definition + checkpoint

    • skip deleted parts when reading

    • option to compress a part at rotation time

  • Asynchronous Server Connection (for tamper-proof log backup)

    • does not wait for confirmation after push

    • can batch frequent pushes (do not send new push until after receiving the previous push confirmation)

    • keeps working even if server dies
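
The batching behaviour of such an asynchronous connection can be sketched as a small single-threaded state machine. `Async_Push_Connection` and its `send` callback are hypothetical names. The callback stands in for whatever transport would actually ship bytes to the server, and the journal is a `std::string` rather than a file.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

class Async_Push_Connection
{
 private:
  std::function<void(const std::string &)> send; // hypothetical transport
  std::string journal;       // local log: always written first
  std::size_t sent_size = 0; // bytes already handed to the transport
  bool waiting = false;      // true while a push awaits confirmation

  // Send everything not yet sent, unless a push is still unconfirmed.
  void send_if_possible()
  {
   if (!waiting && sent_size < journal.size())
   {
    const std::string batch = journal.substr(sent_size);
    sent_size = journal.size();
    waiting = true;
    send(batch);
   }
  }

 public:
  explicit Async_Push_Connection
  (
   std::function<void(const std::string &)> send_function
  ):
   send(std::move(send_function))
  {
  }

  // Never blocks: data goes to the local journal, and to the server only
  // if no earlier push is pending.
  void push(const std::string &data)
  {
   journal += data;
   send_if_possible();
  }

  // To be called when the server confirms the previous push.
  void on_confirmation()
  {
   waiting = false;
   send_if_possible();
  }

  std::size_t get_unsent_size() const
  {
   return journal.size() - sent_size;
  }
};
```

If the server dies, no confirmation arrives: pushes keep accumulating in the local journal and the writer is never blocked. Frequent pushes made while one is in flight are merged into a single batch sent after the confirmation.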

3.6.8. Performance

3.6.9. joedb_admin

  • serve with boost::beast.

  • work as a client to a joedb_server.

  • customizable GUI, similar to the ICGA database editor.

3.6.10. Other Ideas

  • One separate class for each exception, like joedb::exception::Out_Of_Date.

  • ability to indicate minimum joedb version in joedbc (and joedbi?)

  • better readable interface:

    • a separate table abstraction (that could be used for query output)

    • cursors on tables

  • Deal properly with inf and nan everywhere (logdump, joedb_admin, …)

  • Note that standard SQL has no portable representation of inf and nan. Use NULL instead.

  • Raw commands in interpreter?

  • import from SQL

  • namespace for each subdir?