This is the blog section.
Files in these directories will be listed in reverse chronological order.
Announcing the release of Spice v1.0-rc.2
Spice v1.0.0-rc.2 is the second release candidate for the first major version of Spice.ai OSS. This release continues to build on the stability of Spice for production use, including key Data Connector graduations, bug fixes, and AI features.
MS SQL and File Data Connectors: Graduated from Alpha to Beta.
GraphQL and Databricks Delta Lake Data Connectors: Graduated from Beta to Release Candidate.
gospice SDK Release: The Spice Go SDK has been updated to v7.0, adding support for refreshing datasets and upgrading dependencies.
Azure AI Support: Added support for both LLMs and embedding models. Example spicepod.yml configuration:

embeddings:
  - name: azure
    from: azure:text-embedding-3-small
    params:
      endpoint: https://your-resource-name.openai.azure.com
      azure_api_version: 2024-08-01-preview
      azure_deployment_name: text-embedding-3-small
      azure_api_key: ${ secrets:SPICE_AZURE_API_KEY }

models:
  - name: azure
    from: azure:gpt-4o-mini
    params:
      endpoint: https://your-resource-name.openai.azure.com
      azure_api_version: 2024-08-01-preview
      azure_deployment_name: gpt-4o-mini
      azure_api_key: ${ secrets:SPICE_AZURE_TOKEN }
Accelerate subsets of columns: Spice now supports acceleration for specific columns from a federated source. Specify the desired columns directly in the Refresh SQL for more selective and efficient data acceleration.
Example spicepod.yml configuration:

datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
    acceleration:
      refresh_sql: SELECT tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, total_amount FROM taxi_trips
Sharepoint Authentication Parameters: Sharepoint authentication now uses access tokens instead of authorization codes, via the sharepoint_bearer_token parameter. The sharepoint_auth_code parameter has been removed.
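As a sketch of the updated configuration (the secret name below is illustrative; the dataset path mirrors the Sharepoint example used elsewhere in these notes):

```yaml
datasets:
  - from: sharepoint:drive:Documents/path:/important_documents/
    name: important_documents
    params:
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
```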
Data Connector Delimiters: The from parameter of the dataset configuration now supports / and :// as delimiters, in addition to :. The following examples are equivalent:

from: postgres://my_postgres_table
from: postgres/my_postgres_table
from: postgres:my_postgres_table

Some data connectors, such as s3, which only accepts ://, place further restrictions on the allowed delimiter.
The file data connector has changed how it interprets the :// delimiter to match how most other URL parsers work, i.e. file://my_file_path. Previously, the file path was interpreted as /my_file_path. Now, it is interpreted as a relative path, i.e. my_file_path.
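For illustration (file names hypothetical, and assuming the connector follows standard URL parsing for absolute paths):

```yaml
datasets:
  - from: file://data/my_file.csv        # now resolved relative to the working directory
    name: relative_example
  - from: file:///var/data/my_file.csv   # a leading slash after :// keeps the path absolute
    name: absolute_example
```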
Spice Search limit: The limit is now applied to the final search result, instead of separately to each dataset involved in a search before aggregation.
- baggage header by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3722
- llms dependencies: mistralrs, async-openai by @Jeadie in https://github.com/spiceai/spiceai/pull/3725
- jsonl for object store by @Jeadie in https://github.com/spiceai/spiceai/pull/3726
- create_accelerated_table by @sgrebnov in https://github.com/spiceai/spiceai/pull/3739
- sentence_*_config.json, download HF async, use TEI functions by @Jeadie in https://github.com/spiceai/spiceai/pull/3724
- http_requests metric and deprecate http_requests_total by @sgrebnov in https://github.com/spiceai/spiceai/pull/3748
- Map type mapping to arrow type by @Sevenannn in https://github.com/spiceai/spiceai/pull/3776
- /v1/packages/generate API to generate a Spicepod package from a GitHub repo. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3782
- Spice-Target-Source header for spice add by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3783
- spice connect for connecting to existing Spice.ai instances by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3790
- eval spicepod component; basic HTTP api to run eval. by @Jeadie in https://github.com/spiceai/spiceai/pull/3766
- trace_id & parent_span_id overrides for v1/chat/completion by @Jeadie in https://github.com/spiceai/spiceai/pull/3791
- :, / or :// as the delimiter for the data connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3821
- read_write mode support for Postgres Data Connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/3813
- spice.ai data connector dataset path format to <org>/<app>/datasets/<table_reference> by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3828
- Memtable by @Jeadie in https://github.com/spiceai/spiceai/pull/3829
- spice login abfs by @Sevenannn in https://github.com/spiceai/spiceai/pull/3844
- crates/llms dependencies to ‘spiceai’ branch by @Jeadie in https://github.com/spiceai/spiceai/pull/3846
- spice.eval.{results, runs} tables. by @Jeadie in https://github.com/spiceai/spiceai/pull/3780
- tokio::test per test/model by @Jeadie in https://github.com/spiceai/spiceai/pull/3696
- max_completion_tokens vs max_tokens for openai vs azure by @Jeadie in https://github.com/spiceai/spiceai/pull/3869
- evalconverter that creates spice eval components. by @Jeadie in https://github.com/spiceai/spiceai/pull/3864
- evals accelerated tables updates in debug mode by @sgrebnov in https://github.com/spiceai/spiceai/pull/3884
- endpoint parameter required by @sgrebnov in https://github.com/spiceai/spiceai/pull/3883

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.0.0-rc.1...v1.0.0-rc.2
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v1.0-rc.1
Spice v1.0.0-rc.1 marks the release candidate for the first major version of Spice.ai OSS. This milestone includes key Connector and Accelerator graduations and bug fixes, positioning Spice for a stable and production-ready release.
API Key Authentication: Spice now supports optional authentication for API endpoints via configurable API keys, for additional security and control over runtime access.
Example Spicepod.yml configuration:
runtime:
  auth:
    api-key:
      enabled: true
      keys:
        - ${ secrets:api_key } # Load from a secret store
        - my-api-key # Or specify directly

Usage:

- Pass the API key in the X-API-Key header.
- Pass the API key in the Authorization header as a Bearer token.
- Pass the API key with the --api-key flag for CLI commands.

For more details on using API Key auth, refer to the API Auth documentation.
DuckDB Data Connector: Has graduated from Beta to Release Candidate.
Arrow and DuckDB Data Accelerators: Both have graduated from Beta to Release Candidates.
Debezium Kafka Integration: Spice now supports secure authentication and encryption options for Kafka connections when using Debezium for Change Data Capture (CDC). The previous limitation of PLAINTEXT protocol-only connections has been lifted; Spice now supports additional Kafka security configurations.
Example Spicepod.yml configuration:
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: my_dataset
    params:
      kafka_security_protocol: SASL_SSL
      kafka_sasl_mechanism: SCRAM-SHA-512
      kafka_sasl_username: kafka
      kafka_sasl_password: ${secrets:kafka_sasl_password}
      kafka_ssl_ca_location: ./certs/kafka_ca_cert.pem
Model Parameters: The params.spice_tools parameter has been replaced by params.tools. Backward compatibility is maintained for existing configurations using params.spice_tools.
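A minimal sketch of the renamed parameter (the model and the auto value are illustrative assumptions, not taken from the release notes):

```yaml
models:
  - name: my_model
    from: openai:gpt-4o-mini
    params:
      tools: auto   # previously params.spice_tools
```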
Dataset Accelerator State: The ready_state parameter has been moved to the dataset level.
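A sketch of the new placement (dataset names illustrative):

```yaml
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    ready_state: on_load   # now a dataset-level field, no longer under acceleration
    acceleration:
      enabled: true
```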
Ready Handler Response: The response body of the /v1/ready handler has been changed from Ready (uppercase) to ready (lowercase) for consistency and adherence to standards.
Default Kafka Security for Debezium: The default kafka_security_protocol parameter for Debezium datasets has changed from PLAINTEXT to SASL_SSL, improving security by default.
Metrics Name Updates: Adjustments have been made to specific metrics for improved observability and accuracy:
Before | v1.0-rc.1 |
---|---|
catalogs_load_error | catalog_load_errors |
catalogs_status | catalog_load_state |
datasets_acceleration_append_duration_ms, datasets_acceleration_load_duration_ms | dataset_acceleration_refresh_duration_ms {mode: append/full} |
datasets_acceleration_last_refresh_time | dataset_acceleration_last_refresh_time_ms |
datasets_acceleration_refresh_error | dataset_acceleration_refresh_errors |
datasets_count | dataset_active_count |
datasets_load_error | dataset_load_errors |
datasets_status | dataset_load_state |
datasets_unavailable_time | dataset_unavailable_time_ms |
embeddings_count | embeddings_active_count |
embeddings_load_error | embeddings_load_errors |
embeddings_status | embeddings_load_state |
flight_do_action_duration_ms, flight_do_get_get_primary_keys_duration_ms, flight_do_get_get_catalogs_duration_ms, flight_do_get_get_schemas_duration_ms, flight_do_get_get_sql_info_duration_ms, flight_do_get_table_types_duration_ms, flight_do_get_get_tables_duration_ms, flight_do_get_prepared_statement_query_duration_ms, flight_do_get_simple_duration_ms, flight_do_get_statement_query_duration_ms, flight_do_put_duration_ms, flight_handshake_request_duration_ms, flight_list_actions_duration_ms, flight_get_flight_info_request_duration_ms | flight_request_duration_ms {method: method_name, command: command_name} |
flight_do_action_requests, flight_do_exchange_data_updates_sent, flight_do_exchange_requests, flight_do_put_requests, flight_do_get_requests, flight_handshake_requests, flight_list_actions_requests, flight_list_flights_requests, flight_get_flight_info_requests, flight_get_schema_requests | flight_requests {method: method_name, command: command_name} |
http_requests_duration_ms | http_request_duration_ms |
models_count | model_active_count |
models_load_duration_ms | model_load_duration_ms |
models_load_error | model_load_errors |
models_status | model_load_state |
tool_count | tool_active_count |
tool_load_error | tool_load_errors |
tools_status | tool_load_state |
query_count | query_executions |
query_execution_duration | query_execution_duration_ms |
results_cache_hit_count | results_cache_hits |
results_cache_item_count | results_cache_items_count |
results_cache_max_size | results_cache_max_size_bytes |
results_cache_request_count | results_cache_requests |
results_cache_size | results_cache_size_bytes |
secrets_stores_load_duration_ms | secrets_store_load_duration_ms |
bytes_processed | query_processed_bytes |
bytes_returned | query_returned_bytes |
spiced_runtime_flight_server_start | runtime_flight_server_started |
spiced_runtime_http_server_start | runtime_http_server_started |
views_load_error | view_load_errors |
- refresh-sql via CLI by @sgrebnov in https://github.com/spiceai/spiceai/pull/3374
- params.model_type for most HF LLMs by @Jeadie in https://github.com/spiceai/spiceai/pull/3342
- query_duration_seconds and http_requests_duration_seconds with milliseconds metrics by @sgrebnov in https://github.com/spiceai/spiceai/pull/3251
- Extension<Runtime> to HTTP routes to simplify tooling in NSQL. by @Jeadie in https://github.com/spiceai/spiceai/pull/3384
- --pods-watcher-enabled by @Jeadie in https://github.com/spiceai/spiceai/pull/3428
- datatype_is_semantically_equal in verify_schema by @Sevenannn in https://github.com/spiceai/spiceai/pull/3423
- TableReference quoting for MySQL by @Jeadie in https://github.com/spiceai/spiceai/pull/3461
- params.tools, not params.spice_tools. Allow backwards compatibility to params.spice_tools. by @Jeadie in https://github.com/spiceai/spiceai/pull/3473
- v1/nsql by @Jeadie in https://github.com/spiceai/spiceai/pull/3487
- document_similarity to return markdown, not JSON. by @Jeadie in https://github.com/spiceai/spiceai/pull/3477
- datafusion-table-providers version by @Jeadie in https://github.com/spiceai/spiceai/pull/3503
- text-embeddings-inference and mistral.rs from downstream. by @Jeadie in https://github.com/spiceai/spiceai/pull/3505
- ready_state to dataset level by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3526
- --force option to spice upgrade to force it to upgrade to the latest released version by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3527
- spice search error handling by @sgrebnov in https://github.com/spiceai/spiceai/pull/3571
- spice search to default to only datasets with embeddings by @sgrebnov in https://github.com/spiceai/spiceai/pull/3588
- none by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3598
- /v1/datasets APIs when app is locked by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3601
- query_invocations to query_executions by @sgrebnov in https://github.com/spiceai/spiceai/pull/3613
- --set-runtime CLI flags by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3619
- v1/datasets api to indicate if dataset can be used in vector search by @sgrebnov in https://github.com/spiceai/spiceai/pull/3644
- spice search to warn if dataset is not ready and won’t be included in search by @sgrebnov in https://github.com/spiceai/spiceai/pull/3590
- llms crate, with basic Anthropic test. by @Jeadie in https://github.com/spiceai/spiceai/pull/3647
- microsoft/Phi-3-mini-4k-instruct to llms crate testing, with MODEL_SKIPLIST & MODEL_ALLOWLIST by @Jeadie in https://github.com/spiceai/spiceai/pull/3690

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.20.0-beta...v1.0.0-rc.1
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.20-beta
Spice v0.20.0-beta improves federated query performance with column pruning and adds support for Metal (Apple Silicon) and CUDA (NVIDIA) accelerators. The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from Beta to Release Candidates. The Arrow, DuckDB, and SQLite Data Accelerators have graduated from Alpha to Beta.
Data Connectors: The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from beta to release candidate.

Data Accelerators: The Arrow, DuckDB, and SQLite Data Accelerators have graduated from alpha to beta.
Metal and CUDA Support: Added support for Metal (Apple Silicon) and CUDA (NVIDIA) for AI/ML workloads, including embeddings and local LLM inference.
For instructions on compiling a Metal or CUDA binary, see the Installation Docs.
Example invalid connection string:
DRIVER={/path/to/driver.so};SERVER=localhost;DATABASE=master
Example valid connection string:
DRIVER={My ODBC Driver};SERVER=localhost;DATABASE=master
Where My ODBC Driver is the name of an ODBC driver registered in the ODBC driver manager.
- metal & cuda flags for spice by @Jeadie in https://github.com/spiceai/spiceai/pull/3212
- spice upgrade by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3341

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.4-beta...v0.20.0-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.4-beta
Spice v0.19.4-beta introduces a new localpod Data Connector, improvements to accelerator resiliency and control, and a new configuration to control when accelerated datasets are considered ready.

localpod Connector: Implement a “tiered” acceleration strategy with a new localpod Data Connector that can be used to accelerate datasets from other datasets registered in Spice.
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_check_interval: 60s
  - from: localpod:my_dataset
    name: my_localpod_dataset
    acceleration:
      enabled: true
Refreshes on the localpod’s parent dataset will automatically be synchronized with the localpod dataset.
Improved Accelerator Resiliency: When Spice is restarted, if the federated source for a dataset configured with a file-based accelerator is not available, the dataset will still load from the existing file data and will attempt to connect to the federated source in the background for future refreshes.
Accelerator Ready State: Control when an accelerated dataset is considered “ready” by the runtime with the new ready_state parameter.
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    acceleration:
      enabled: true
      ready_state: on_load # or on_registration

- ready_state: on_load: Default. The dataset is considered ready after the initial load of the accelerated data. For file-based accelerated datasets that have existing data, this means the dataset is ready immediately.
- ready_state: on_registration: The dataset is considered ready when the dataset is registered in Spice. Queries against this dataset before the data is loaded will fall back to the federated source.

Accelerated datasets configured with ready_state: on_load (the default behavior) that are not ready will return an error instead of returning zero results.
- ROLLUP and GROUPING by @sgrebnov in https://github.com/spiceai/spiceai/pull/3277
- stddev by @sgrebnov in https://github.com/spiceai/spiceai/pull/3279
- spice_sys_dataset_checkpoint to store federated table schema by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3303

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.3-beta...v0.19.4-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.3-beta
Spice v0.19.3-beta improves the performance and stability of data connectors and accelerators, including faster queries across multiple federated sources by optimizing how filters are applied. Anthropic has also been added as an LLM provider.
DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, expanding TPC-DS coverage and correctness.
GitHub Data Connector Beta Milestone: The GitHub Data Connector has graduated to Beta after extensive testing, stability, and performance improvements.
Anthropic Models Provider: Anthropic has been added as an LLM provider, including support for streaming.
Example spicepod.yml:

models:
  - from: anthropic:claude-3-5-sonnet-20240620
    name: claude_3_5_sonnet
    params:
      anthropic_api_key: ${ secrets:SPICE_ANTHROPIC_API_KEY }
None.
- text_embedding_inference::Infer for more complete embedding solution by @Jeadie in https://github.com/spiceai/spiceai/pull/3199
- v1/nsql. by @Jeadie in https://github.com/spiceai/spiceai/pull/3105
- localpod Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3249

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.2-beta...v0.19.3-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.2-beta
Spice v0.19.2-beta continues to improve performance and stability of data connectors and data accelerators, further expands TPC-DS coverage, and includes several bug fixes.
DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, improving TPC-DS query support and correctness.
TPC-DS Snapshots: Extended support for TPC-DS benchmarks with added snapshot tests for validating query plans and result accuracy.
PostgreSQL Accelerator Beta: The Postgres Data Accelerator has been promoted to Beta quality.

The hive_infer_partitions parameter has been changed to hive_partitioning_enabled; it now defaults to false and must be explicitly enabled.

2bcf481b4abe9d0bd6bb2479ce49020df66ff97f.

- unnest support for federated plans by @sgrebnov in https://github.com/spiceai/spiceai/pull/3133
- .clone() unnecessarily by @Jeadie in https://github.com/spiceai/spiceai/pull/3128
- get_schema to construct logical plan and return that schema. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3131
- spice-postgres-tpcds-bench image by @sgrebnov in https://github.com/spiceai/spiceai/pull/3140
- doRuntimeApiRequest by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3157
- -build.{GIT_SHA} for unreleased versions by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3159
- hive_infer_partitions by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3160

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.1-beta...v0.19.2-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.1-beta
Spice v0.19.1 brings further performance and stability improvements to data connectors, including improved query push-down for file-based connectors (s3, abfs, file, ftp, sftp) that use Hive-style partitioning.
TPC-H and TPC-DS Coverage: Expanded coverage for TPC-H and TPC-DS benchmarking suites across accelerators and connectors.
GitHub Connector Array Filter: The GitHub connector now supports filter push-down for the array_contains function in SQL queries when using search query mode.
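As a sketch, a GitHub dataset in search query mode (mirroring the github_query_mode example shown in the v0.18.3-beta notes below) against which such filters can be pushed down:

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues/trunk
    name: spiceai.issues
    params:
      github_query_mode: search   # enables filter push-down for array_contains
      github_token: ${secrets:GITHUB_TOKEN}
```

A query such as SELECT title FROM spiceai.issues WHERE array_contains(labels, 'bug') could then push the filter to the GitHub Search API (the labels column name is an assumption here, not taken from these notes).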
NSQL CLI Command: A new spice nsql CLI command has been added to easily query datasets with natural language from the command line.
None

f22b96601891856e02a73d482cca4f6100137df8.

- merge_group checks for PR workflows by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3058
- /v1/sql/ and /v1/nsql by @Jeadie in https://github.com/spiceai/spiceai/pull/3032
- sql_query_keep_partition_by_columns & enable by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3065
- EXCEPT, INTERSECT, duplicate names by @sgrebnov in https://github.com/spiceai/spiceai/pull/3069
- hf_token from params/secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/3071
- hive_infer_partitions to remaining object store connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3086

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.0-beta...v0.19.1-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19-beta
Spice v0.19.0-beta brings performance improvements for accelerators and expanded TPC-DS coverage. A new Azure Blob Storage data connector has also been added.
Improved TPC-DS Coverage: Enhanced support for TPC-DS derived queries.
CLI SQL REPL: The CLI SQL REPL (spice sql) now supports multi-line editing and tab indentation. Note: a terminating semi-colon ‘;’ is now required for each executed SQL block.
Azure Storage Data Connector: A new Azure Blob Storage data connector (abfs://) has been added, enabling federated SQL queries on files stored in Azure Blob-compatible endpoints, including Azure BlobFS (abfss://) and Azure Data Lake (adl://). Supported file formats can be specified using the file_format parameter.
Example spicepod.yml:

datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      azure_account: spiceadls
      azure_access_key: abc123==
      file_format: csv
For a full list of supported files, see the Object Store File Formats documentation.
For more details, see the Azure Blob Storage Data Connector documentation.
Spice.ai Data Connector: The key for the Spice.ai Cloud Platform Data Connector has changed from spiceai to spice.ai. To upgrade, change uses of from: spiceai: to from: spice.ai:.
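As a before/after sketch (the dataset path is hypothetical):

```yaml
# Before
datasets:
  - from: spiceai:path.to.my_dataset
    name: my_dataset

# After
datasets:
  - from: spice.ai:path.to.my_dataset
    name: my_dataset
```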
GitHub Data Connector: The Pull Requests column login has been renamed to author.
CLI SQL REPL: A terminating semi-colon ‘;’ is now required for each executed SQL block.
Spicepod Hot-Reload: When running spiced directly, hot-reload of the spicepod.yml configuration is now disabled. Run with spice run to use hot-reload.
826814ab149aad8ee668454c83a0650fb8b18d60.

- paths-ignore: by @Jeadie in https://github.com/spiceai/spiceai/pull/2906
- spiceai data connector to spice.ai by @sgrebnov in https://github.com/spiceai/spiceai/pull/2899
- paths-ignore for docs. by @Jeadie in https://github.com/spiceai/spiceai/pull/2911
- x-spiceai-app-id metadata in spiceai data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/2934
- params.file_format: md. by @Jeadie in https://github.com/spiceai/spiceai/pull/2943
- --pods-watcher-enabled. Watcher disabled by default for spiced. by @ewgenius in https://github.com/spiceai/spiceai/pull/2953
- mdx file extensions to apply a markdown splitter by @ewgenius in https://github.com/spiceai/spiceai/pull/2977
- messages[*].tool_calls for local models by @Jeadie in https://github.com/spiceai/spiceai/pull/2957
- round for Postgres) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2984
- trace by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2995
- spice login writes to .env.local if present by @slyons in https://github.com/spiceai/spiceai/pull/2996

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.3-beta...v0.19.0-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18.3-beta
The Spice v0.18.3-beta release includes several quality-of-life improvements, including verbosity flags for spiced and the Spice CLI, vector search over larger documents with support for chunking dataset embeddings, and multiple performance enhancements. Additionally, the release includes several bug fixes, dependency updates, and optimizations, including updated table providers and significantly improved GitHub data connector performance for issues and pull requests.
GitHub Query Mode: A new github_query_mode: search parameter has been added to the GitHub Data Connector, which uses the GitHub Search API to enable faster and more efficient querying of issues and pull requests when using filters.
Example spicepod.yml:

- from: github:github.com/spiceai/spiceai/issues/trunk
  name: spiceai.issues
  params:
    github_query_mode: search # Use GitHub Search API
    github_token: ${secrets:GITHUB_TOKEN}
Output Verbosity: Higher verbosity output levels can be specified through flags for both spiced and the Spice CLI.
Example command line:
spice -v
spice --very-verbose
spiced -vv
spiced --verbose
Embedding Chunking: Chunking can be enabled and configured to preprocess input data before generating dataset embeddings. This improves the relevance and precision for larger pieces of content.
Example spicepod.yml:

- name: support_tickets
  embeddings:
    - column: conversation_history
      use: openai_embeddings
      chunking:
        enabled: true
        target_chunk_size: 128
        overlap_size: 16
        trim_whitespace: true
For details, see the Search Documentation.
b0af91992699ecbf5adf2036a07122578f06150e.

- -v, -vv, --verbose, --very-verbose. by @Jeadie in https://github.com/spiceai/spiceai/pull/2831
- spiceai data connector to spice.ai by @sgrebnov in https://github.com/spiceai/spiceai/pull/2680
- BytesProcessedRule to be an optimizer rather than an analyzer rule by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2867
- spiceai data connector to spice.ai” by @sgrebnov in https://github.com/spiceai/spiceai/pull/2881
- no process-level CryptoProvider available when using REPL and TLS by @sgrebnov in https://github.com/spiceai/spiceai/pull/2887
- log/slog to spice CLI tool by @Jeadie in https://github.com/spiceai/spiceai/pull/2859

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.2-beta...v0.18.3-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18.1-beta
The v0.18.1-beta release continues to improve runtime performance and reliability. Performance for accelerated queries joining multiple datasets has been significantly improved with join push-down support. The GraphQL, MySQL, and SharePoint data connectors have better reliability and error handling, and a new Microsoft SQL Server data connector has been introduced. Task History now has fine-grained configuration, including the ability to disable the feature entirely. A new spice search CLI command has been added, enabling development-time embeddings-based searches across datasets.
Join push-down for accelerations: Queries to the same accelerator will now push-down joins, significantly improving acceleration performance for queries joining multiple tables.
Microsoft SQL Server Data Connector: Use from: mssql: to access and accelerate Microsoft SQL Server datasets.
Example spicepod.yml:

datasets:
  - from: mssql:path.to.my_dataset
    name: my_dataset
    params:
      mssql_connection_string: ${secrets:mssql_connection_string}
See the Microsoft SQL Server Data Connector documentation.
Task History: Task History can be configured in the spicepod.yml, including the ability to include or truncate outputs such as the results of a SQL query.
Example spicepod.yml:

runtime:
  task_history:
    enabled: true
    captured_output: truncated
    retention_period: 8h
    retention_check_interval: 15m
See the Task History Spicepod reference for more information on possible values and behaviors.
Search CLI Command: Use the spice search CLI command to perform embeddings-based searches across search-configured datasets. Note: Search requires the ai feature to be installed.
Refresh on File Changes: File Data Connector data refreshes can be configured to be triggered when the source file is modified, through a file system watcher. Enable the watcher by adding file_watcher: enabled to the acceleration parameters.
Example spicepod.yml:

datasets:
  - from: file://path/to/my_file.csv
    name: my_file
    acceleration:
      enabled: true
      refresh_mode: full
      params:
        file_watcher: enabled
The Query History table runtime.query_history has been deprecated and removed in favor of the Task History table runtime.task_history. The Task History table tracks tasks across all features, such as SQL query, vector search, and AI completion, in a unified table.
See the Task History documentation.
- json_pointer and improve error messaging. by @Jeadie in https://github.com/spiceai/spiceai/pull/2713
- spice search CLI command by @lukekim in https://github.com/spiceai/spiceai/pull/2739

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.0-beta...v0.18.1-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18-beta.
The v0.18.0-beta release adds new Sharepoint and File data connectors, introduces AWS Identity and Access Management (IAM) support for the S3 Data Connector, improves performance of the GitHub connector, and increases the overall reliability of all data accelerators. The /ready API endpoint was enhanced to report as ready only when all components, including loaded data, have successfully reported readiness.
Sharepoint Data Connector: Use from: sharepoint: to access and accelerate documents stored in Microsoft 365 OneDrive for Business (Sharepoint). The CLI also includes a new spice login sharepoint command to aid in local development and testing.
Example spicepod.yml:

datasets:
  - from: sharepoint:drive:Documents/path:/important_documents/
    name: important_documents
    params:
      sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
      sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
      sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}
See the Sharepoint Data Connector documentation.
AWS Identity and Access Management (IAM) for S3: A new s3_auth parameter for the s3 data connector configures the authentication method to use when connecting to S3. Supported values are public, key, and iam_role. Use s3_auth: iam_role to assume the instance IAM role.
Example spicepod.yml:

datasets:
  - from: s3://my-bucket
    name: bucket
    params:
      s3_auth: iam_role # Assume IAM role of instance
See the S3 Data Connector documentation.
File Data Connector: Use from: file: to query files stored on locally accessible filesystems.
Example spicepod.yml
:
datasets:
- from: file://path/to/customer.parquet
name: customer
params:
file_format: parquet
See the File Data Connector documentation.
Improved /ready API: Now includes the initial data load for accelerated datasets in addition to component readiness, ensuring readiness is only reported when data has loaded and can be successfully queried.
GitHub Data Connector: The data type for time-related columns has changed from Utf8 to Timestamp. To upgrade, update data type references to timestamp. For example, if using time_format:, change uses of time_format: ISO8601 to time_format: timestamp.
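As a sketch of the upgrade, using the GitHub issues dataset from this release's examples (the time_column name shown here is hypothetical), the spicepod entry would change to:

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    time_column: created_at   # hypothetical column name
    time_format: timestamp    # previously: time_format: ISO8601
    params:
      github_token: ${secrets:GITHUB_TOKEN}
```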
Ready API: The /ready API reports ready only when all components have reported ready and data is fully loaded. To upgrade, evaluate uses of the Ready API (such as Kubernetes readiness probes) and consider how it might affect system behavior.
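For example, a Kubernetes readiness probe pointing at the Ready API might look like the following sketch; the endpoint path and port (8090 is the default HTTP port since v0.16) depend on your deployment:

```yaml
readinessProbe:
  httpGet:
    path: /v1/ready
    port: 8090
  initialDelaySeconds: 5
  periodSeconds: 10
```

With this release, such a probe will not pass until accelerated data has fully loaded, so pods receive traffic only once queries can succeed.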
No major dependency updates.
refresh_mode: append by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2609
/ready to only report ready when components have all reported Ready by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2600
s3_auth parameter to configure IAM role authentication by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2611
/ready to only mark a dataset ready iff the initial refresh completed by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2630
error decoding response body GitHub file connector bug by @sgrebnov in https://github.com/spiceai/spiceai/pull/2645
issues data connector schema upfront by @sgrebnov in https://github.com/spiceai/spiceai/pull/2646
spill_to_disk_and_rehydration integration test by @sgrebnov in https://github.com/spiceai/spiceai/pull/2658
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.4-beta...v0.18.0-beta
Announcing the release of Spice v0.17.4-beta.
The v0.17.4-beta release adds compatibility, performance, and reliability improvements to the DuckDB and SQLite accelerators. The GitHub data connector adds a Stargazers table, Snowflake and Clickhouse data connectors have improved resiliency for empty tables, and core data processing and quality has been improved.
Improved benchmarking, testing, and robustness of data accelerators: Continued compatibility, performance, and reliability improvements for SQLite and DuckDB data accelerators and expanded performance and quality testing.
GitHub Stargazers: The GitHub Data Connector adds support for a /stargazers table, making it easy to query GitHub Stargazers using SQL!
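For instance, a query over an accelerated stargazers dataset might look like the sketch below; the dataset name and the login and starred_at columns are illustrative and depend on your spicepod and the connector's schema:

```sql
-- Ten most recent stargazers (illustrative column names)
SELECT login, starred_at
FROM stargazers
ORDER BY starred_at DESC
LIMIT 10;
```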
None.
datafusion (fixes subquery alias table unparsing for SQLite) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2532
period +- jitter by @Jeadie in https://github.com/spiceai/spiceai/pull/2534
POST /v1/datasets/:name/acceleration/refresh by @Jeadie in https://github.com/spiceai/spiceai/pull/2515
RwLock from EmbeddingModelStore by @Jeadie in https://github.com/spiceai/spiceai/pull/2541
refresh_data_window by @ewgenius in https://github.com/spiceai/spiceai/pull/2578
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.3-beta...v0.17.4-beta
Announcing the release of Spice v0.17.3-beta.
The v0.17.3-beta release further improves data accelerator robustness and adds a new github data connector that makes accelerating GitHub Issues, Pull Requests, Commits, and Blobs easy.
Improved benchmarking, testing, and robustness of data accelerators: Continued improvements to benchmarking and testing of data accelerators, leading to more robust and reliable data accelerators.
GitHub Connector (alpha): Connect to GitHub and accelerate Issues, Pull Requests, Commits, and Blobs.
datasets:
# Fetch all rust and golang files from spiceai/spiceai
- from: github:github.com/spiceai/spiceai/files/trunk
name: spiceai.files
params:
include: '**/*.rs; **/*.go'
github_token: ${secrets:GITHUB_TOKEN}
# Fetch all issues from spiceai/spiceai. Similar for pull requests, commits, and more.
- from: github:github.com/spiceai/spiceai/issues
name: spiceai.issues
params:
github_token: ${secrets:GITHUB_TOKEN}
None.
spice upgrade
docker pull spiceai/spiceai:latest
spiceai/spiceai:latest or spiceai/spiceai:0.17.3-beta
delta_kernel from 0.2.0 to 0.3.0.
files support (basic fields) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2393
--force flag to spice install to force it to install the latest released version by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2395
spice chat by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2396
include param support to GitHub Data Connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2397
content column to GitHub Connector when dataset is accelerated by @sgrebnov in https://github.com/spiceai/spiceai/pull/2400
crates/llms/src/chat/ by @Jeadie in https://github.com/spiceai/spiceai/pull/2439
spice chat by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2442
labels and hashes to primitive arrays by @sgrebnov in https://github.com/spiceai/spiceai/pull/2452
datafusion version to the latest by @sgrebnov in https://github.com/spiceai/spiceai/pull/2456
/ for S3 data connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2458
accelerated_refresh to task_history table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2459
assignees and labels fields to github issues and github pulls datasets by @ewgenius in https://github.com/spiceai/spiceai/pull/2467
updatedAt field to GitHub connector by @ewgenius in https://github.com/spiceai/spiceai/pull/2474
updated_at by @lukekim in https://github.com/spiceai/spiceai/pull/2479
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.2-beta...v0.17.3-beta
Announcing the release of Spice v0.17.2-beta!
The v0.17.2-beta release focuses on improving data accelerator compatibility, stability, and performance. Expanded data type support for DuckDB, SQLite, and PostgreSQL data accelerators (and data connectors) enables significantly more data types to be accelerated. Error handling and logging have also been improved, and several bugs have been fixed.
Expanded Data Type Support for Data Accelerators: DuckDB, SQLite, and PostgreSQL Data Accelerators now support a wider range of data types, enabling acceleration of more diverse datasets.
Enhanced Error Handling and Logging: Improvements have been made to aid in troubleshooting and debugging.
Anonymous Usage Telemetry: Optional, anonymous, aggregated telemetry has been added to help improve Spice. This feature can be disabled. For details about collected data, see the telemetry documentation.
To opt out of telemetry:
Using the CLI flag:
spice run -- --telemetry-enabled false
Or add configuration to spicepod.yaml:
runtime:
telemetry:
enabled: false
Improved Benchmarking: A suite of performance benchmarking tests has been added to the project, helping to maintain and improve runtime performance, a top priority for the project.
None.
v0.17.2-beta by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2203
retrieved_primary_keys in v1/search by @Jeadie in https://github.com/spiceai/spiceai/pull/2176
runtime.task_history table for queries, and embeddings by @Jeadie in https://github.com/spiceai/spiceai/pull/2191
metrics-rs with OpenTelemetry Metrics by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2240
description field from spicepod.yaml and include in LLM context by @ewgenius in https://github.com/spiceai/spiceai/pull/2261
connection_pool_size in the Postgres Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2251
DocumentSimilarityTool by @Jeadie in https://github.com/spiceai/spiceai/pull/2263
runtime.metrics table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2296
secrets.inject_secrets when secret not found. by @Jeadie in https://github.com/spiceai/spiceai/pull/2306
DataAccelerator::init() for SQLite acceleration federation by @peasee in https://github.com/spiceai/spiceai/pull/2293
disable_query_push_down option to acceleration settings by @y-f-u in https://github.com/spiceai/spiceai/pull/2327
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/2312
v1/search: include WHERE condition, allow extra columns in projection. by @Jeadie in https://github.com/spiceai/spiceai/pull/2328
task_history nested spans by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2337
bytes_processed telemetry metric by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2343
runtime.metrics/Prometheus as well by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2352
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.1-beta...v0.17.2-beta
The v0.17.1-beta minor release focuses on enhancing stability, performance, and usability. The Flight interface now supports the GetSchema API, and the s3, ftp, sftp, http, https, and databricks data connectors have added support for a client_timeout parameter.
Flight API GetSchema: The GetSchema API is now supported by the Flight interface. The schema of a dataset can be retrieved using GetSchema with the PATH or CMD FlightDescriptor types. The CMD FlightDescriptor type is used to get the schema of an arbitrary SQL query passed as the CMD bytes. The PATH FlightDescriptor type is used to retrieve the schema of a dataset.
Client Timeout: A client_timeout parameter has been added for Data Connectors: ftp, sftp, http, https, and databricks. When defined, the client timeout configures Spice to stop waiting for a response from the data source after the specified duration. The default timeout is 30 seconds.
datasets:
- from: ftp://remote-ftp-server.com/path/to/folder/
name: my_dataset
params:
file_format: csv
# Example client timeout
client_timeout: 30s
ftp_user: my-ftp-user
ftp_pass: ${secrets:my_ftp_password}
TLS is now required to be explicitly enabled. Enable TLS on the command line using --tls-enabled true:
spice run -- --tls-enabled true --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem
Or in the spicepod.yml with enabled: true:
runtime:
tls:
# TLS explicitly enabled
enabled: true
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem
v1/models by @Jeadie in https://github.com/spiceai/spiceai/pull/2152
EmbeddingConnector by @Jeadie in https://github.com/spiceai/spiceai/pull/2165
CREATE TABLE... and infer on first write by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2167
GetSchema API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2169
flightsubscriber/flightpublisher tools by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2194
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.0-beta...v0.17.1-beta
Announcing the first beta release of Spice.ai OSS!
The core Spice runtime has graduated from alpha to beta! Components, such as Data Connectors and Models, follow independent release milestones. Data Connectors graduating from alpha to beta include databricks, spiceai, postgres, s3, odbc, and mysql. From beta to 1.0, the project will focus on improving performance and scaling to larger datasets.
This release also includes enhanced security with Transport Layer Security (TLS) secured APIs, a new spice install
CLI command, and several performance and stability improvements.
Enable TLS using the --tls-certificate-file and --tls-key-file command-line flags:
spice run -- --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem
Or configure in the spicepod.yml:
runtime:
tls:
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem
Get started with TLS by following the TLS Sample. For more details see the TLS Documentation.
spice install: Running the spice install CLI command will download and install the latest version of the runtime.
spice install
Improved SQLite and DuckDB compatibility: The SQLite and DuckDB accelerators support more complex queries and additional data types.
Pass through arguments from spice run to runtime: Arguments passed to spice run are now passed through to the runtime.
Secrets replacement within connection strings: Secrets are now replaced within connection strings:
datasets:
- from: mysql:my_table
name: my_table
params:
mysql_connection_string: mysql://user:${secrets:mysql_pw}@localhost:3306/db
The odbc data connector is now optional and has been removed from the released binaries. To use the odbc data connector, use the official Spice Docker image or build the Spice runtime from source.
To build Spice from source with the odbc feature:
cargo build --release --features odbc
To use the official Spice Docker image from DockerHub:
# Pull the latest official Spice image
docker pull spiceai/spiceai:latest
# Pull the official v0.17-beta Spice image
docker pull spiceai/spiceai:0.17.0-beta
unixodbc for E2E test release installation by @peasee in https://github.com/spiceai/spiceai/pull/2063
json_pointer param optional for the GraphQL connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2072
spice install CLI command by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2090
delta_kernel to 0.2.0 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2102
spice run and spice sql to runtime by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2123
spice sql by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2125
--tls flag by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2128
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.16.0-alpha...v0.17-beta
The v0.16-alpha release is the first candidate release for the beta milestone on a path to finalizing the v1.0 developer and user experience. Upgraders should be aware of several breaking changes designed to improve the Secrets configuration experience and to make authoring spicepod.yml files more consistent. See the Breaking Changes section below for details. Additionally, the Spice Java SDK was released, providing Java developers a simple but powerful native experience to query Spice.
secrets configuration in spicepod.yaml:
secrets:
- from: env
name: env
- from: aws_secrets_manager:my_secret_name
name: aws_secret
Secrets managed by configured Secret Stores can be referenced in component params using the syntax ${<store_name>:<key>}. E.g.
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${ env:MY_PG_PASS }
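Conceptually, the replacement works like simple string interpolation over component params. The sketch below is illustrative only; the resolve helper and its regex are not Spice's actual implementation:

```python
import re

# Conceptual sketch of resolving ${<store_name>:<key>} secret references.
# Not Spice's implementation -- for illustration of the syntax only.
SECRET_REF = re.compile(r"\$\{\s*(\w+):([\w./-]+)\s*\}")

def resolve(value: str, stores: dict) -> str:
    """Replace each ${store:key} reference with the secret's value."""
    return SECRET_REF.sub(lambda m: stores[m.group(1)][m.group(2)], value)

stores = {"env": {"MY_PG_PASS": "hunter2"}}
print(resolve("postgresql://user:${ env:MY_PG_PASS }@localhost:5432/db", stores))
# postgresql://user:hunter2@localhost:5432/db
```

Note that the same syntax works inside larger strings, which is how secrets can be embedded in connection strings.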
Java Client SDK: The Spice Java SDK has been released for JDK 17 or greater.
Federated SQL Query: Significant stability and reliability improvements have been made to federated SQL query support in most data connectors.
ODBC Data Connector: Providing a specific SQL dialect to query ODBC data sources is now supported using the sql_dialect param. For example, when querying Databricks using ODBC, the databricks dialect can be specified to ensure compatibility. Read the ODBC Data Connector documentation for more details.
spicepod.yml schema. File-based secrets stored in the ~/.spice/auth file are no longer supported. See Secret Stores Documentation for full reference. To upgrade Secret Stores, rename any parameters ending in _key to remove the _key suffix and specify a secret inline via the secret replacement syntax (${<secret_store>:<key>}):
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass_key: my_pg_pass
to:
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${secrets:my_pg_pass}
And ensure the MY_PG_PASS environment variable is set.
time_format has changed from unix_seconds to timestamp. To upgrade:
datasets:
- from:
name: my_dataset
# Explicitly define format when not specified.
time_format: unix_seconds
The default HTTP port has changed from port 3000 to port 8090 to avoid conflicting with frontend apps, which typically use the 3000 range. If an SDK is used, upgrade it at the same time as the runtime. To upgrade and continue using port 3000, run spiced with the --http command line argument:
# Using Dockerfile or spiced directly
spiced --http 127.0.0.1:3000
The default metrics port has changed from port 9000 to port 9090 to avoid conflicting with other metrics protocols, which typically use port 9000. To upgrade and continue using port 9000, run spiced with the --metrics command line argument:
# Using Dockerfile or spiced directly
spiced --metrics 127.0.0.1:9000
json_path has been replaced with json_pointer to access nested data from the result of the GraphQL query. See the GraphQL Data Connector documentation for full details and RFC 6901 (JSON Pointer). To upgrade, change:
json_path: my.json.path
To:
json_pointer: /my/json/pointer
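The difference from json_path is that a JSON pointer is a /-separated path of exact member names and array indexes. The minimal RFC 6901 resolver below is an illustrative sketch, not the connector's code:

```python
# Minimal RFC 6901 JSON Pointer resolver, illustrating what a
# json_pointer such as /my/json/pointer selects.
def resolve_pointer(doc, pointer: str):
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    for token in pointer.lstrip("/").split("/"):
        # Unescape per RFC 6901: ~1 -> "/", then ~0 -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc

result = {"data": {"users": [{"name": "ada"}, {"name": "lin"}]}}
print(resolve_pointer(result, "/data/users/1/name"))  # lin
```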
params parameters. Prefixed parameter names help ensure parameters do not collide. For example, the Databricks data connector specific params are now prefixed with databricks:
datasets:
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
token: MY_TOKEN
To upgrade:
datasets:
# Example for Spark Connect
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com # Now prefixed with databricks
databricks_token: ${secrets:my_token} # Now prefixed with databricks
Refer to the Data Connector documentation for parameter naming changes in this release.
Clickhouse Data Connector: The clickhouse_connection_timeout parameter has been renamed to connection_timeout as it applies to the client and is not Clickhouse configuration itself.
To upgrade, change:
clickhouse_connection_timeout: time
To:
connection_timeout: time
No major dependency updates.
spice chat command, to interact with deployed spiced instance in spice.ai cloud by @ewgenius in https://github.com/spiceai/spiceai/pull/1990
/v1/chat/completions with streaming in spice chat cli command by @ewgenius in https://github.com/spiceai/spiceai/pull/1998
spice chat command, add --model flag by @ewgenius in https://github.com/spiceai/spiceai/pull/2007
${ <secret>:<key> } by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2026
connector and runtime categories by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2028
dataset configure endpoint param by @sgrebnov in https://github.com/spiceai/spiceai/pull/2052
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.2-alpha...v0.16.0-alpha
The v0.15.2-alpha minor release focuses on enhancing stability, performance, and introduces Catalog Providers for streamlined access to Data Catalog tables. Unity Catalog, Databricks Unity Catalog, and the Spice.ai Cloud Platform Catalog are supported in v0.15.2-alpha. The reliability of federated query push-down has also been improved for the MySQL, PostgreSQL, ODBC, S3, Databricks, and Spice.ai Cloud Platform data connectors.
Catalog Providers: Catalog Providers streamline access to Data Catalog tables. Initial catalog providers supported are Databricks Unity Catalog, Unity Catalog and Spice.ai Cloud Platform Catalog.
For example, to configure Spice to connect to tpch tables in the Spice.ai Cloud Platform Catalog, use the new catalogs: section in the spicepod.yml:
catalogs:
- name: spiceai
from: spiceai
include:
- tpch.*
sql> show tables
+---------------+--------------+---------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------+---------------+------------+
| spiceai | tpch | region | BASE TABLE |
| spiceai | tpch | part | BASE TABLE |
| spiceai | tpch | customer | BASE TABLE |
| spiceai | tpch | lineitem | BASE TABLE |
| spiceai | tpch | partsupp | BASE TABLE |
| spiceai | tpch | supplier | BASE TABLE |
| spiceai | tpch | nation | BASE TABLE |
| spiceai | tpch | orders | BASE TABLE |
| spice | runtime | query_history | BASE TABLE |
+---------------+--------------+---------------+------------+
Time: 0.001866958 seconds. 9 rows.
ODBC Data Connector Push-Down: The ODBC Data Connector now supports query push-down for joins, improving performance for joined datasets configured with the same odbc_connection_string.
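A sketch of two ODBC datasets sharing one connection string so joins between them are eligible for push-down; the table names and secret key are hypothetical:

```yaml
datasets:
  - from: odbc:orders
    name: orders
    params:
      odbc_connection_string: ${secrets:odbc_conn}
  - from: odbc:customers
    name: customers
    params:
      odbc_connection_string: ${secrets:odbc_conn}
```

A query joining orders and customers can then execute in the ODBC source rather than in the Spice runtime.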
Improved Spicepod Validation: Improved spicepod.yml validation has been added, including warnings when loading resources with duplicate names (datasets, views, models, embeddings).
None.
catalog from Spicepod. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1903
Runtime by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1906
spice.ai CatalogProvider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1925
UnityCatalog catalog provider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1940
Databricks catalog provider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1941
params into dataset_params by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1947
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.1-alpha...v0.15.2-alpha
The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines, which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector and supporting deletion vectors.
Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.
Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.
ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.
GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.
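A sketch of a GraphQL dataset using unnest_depth; the endpoint and pointer are hypothetical, and parameter names should be checked against the GraphQL Data Connector documentation:

```yaml
datasets:
  - from: graphql:https://api.example.com/graphql
    name: example_items
    params:
      json_pointer: /data/items
      unnest_depth: 2   # flatten nested objects up to two levels deep
```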
None.
The MySQL, PostgreSQL, SQLite and DuckDB DataFusion TableProviders developed by Spice AI have been donated to the datafusion-contrib/datafusion-table-providers community repository.
From the v0.15.1-alpha release, a new dependency is taken on datafusion-contrib/datafusion-table-providers.
datafusion-table-providers crate by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1873
delta-rs with delta-kernel-rs and add new delta data connector. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1878
delta tables by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1891
delta to delta_lake by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1892
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.0-alpha...v0.15.1-alpha
The v0.15-alpha release introduces support for streaming database changes with Change Data Capture (CDC) into accelerated tables via a new Debezium connector, configurable retry logic for data refresh, and the release of a new C# SDK to build with Spice in Dotnet.
Debezium data connector with Change Data Capture (CDC): Sync accelerated datasets with Debezium data sources over Kafka in real-time.
Data Refresh Retries: By default, accelerated datasets attempt to retry data refreshes on transient errors. This behavior can be configured using refresh_retry_enabled and refresh_retry_max_attempts.
C# Client SDK: A new C# Client SDK has been released for developing applications in Dotnet.
Integrating Debezium CDC is straightforward. Get started with the Debezium CDC Sample, read more about CDC in Spice, and read the Debezium data connector documentation.
Example Spicepod using Debezium CDC:
datasets:
- from: debezium:cdc.public.customer_addresses
name: customer_addresses_cdc
params:
debezium_transport: kafka
debezium_message_format: json
kafka_bootstrap_servers: localhost:19092
acceleration:
enabled: true
engine: duckdb
mode: file
refresh_mode: changes
Example Spicepod configuration limiting refresh retries to a maximum of 10 attempts:
datasets:
- from: eth.blocks
name: blocks
acceleration:
refresh_retry_enabled: true
refresh_retry_max_attempts: 10
refresh_check_interval: 30s
None.
No major dependency updates.
feature-- branches by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1788
symlink -> symlink_file. by @Jeadie in https://github.com/spiceai/spiceai/pull/1793
Unsupported DataType: conversion for time predicates by @sgrebnov in https://github.com/spiceai/spiceai/pull/1795
clippy::module_name_repetitions lint by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1812
v1/search that performs vector search. by @Jeadie in https://github.com/spiceai/spiceai/pull/1836
embeddings with models by @Jeadie in https://github.com/spiceai/spiceai/pull/1829
"cmake-build" feature to rdkafka for windows by @Jeadie in https://github.com/spiceai/spiceai/pull/1840
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.1-alpha...v0.15.0-alpha
The v0.14.1-alpha release is focused on quality, stability, and type support with improvements in PostgreSQL, DuckDB, and GraphQL data connectors.
None.
No major dependency updates.
spiceai/async-openai to solve Deserialize issue in v1/embed by @Jeadie in https://github.com/spiceai/spiceai/pull/1707
v1/assist into a VectorSearch struct by @Jeadie in https://github.com/spiceai/spiceai/pull/1699
spiceai/duckdb-rs, support LargeUTF8 by @Jeadie in https://github.com/spiceai/spiceai/pull/1746
tonic::async_trait -> async_trait::async_trait by @Jeadie in https://github.com/spiceai/spiceai/pull/1757
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.0-alpha...v0.14.1-alpha
The v0.14-alpha release focuses on enhancing accelerated dataset performance and data integrity, with support for configuring primary keys and indexes. Additionally, the GraphQL data connector has been introduced, along with improved dataset registration and loading error information.
Accelerated Datasets: Ensure data integrity using primary key and unique index constraints. Configure conflict handling to either upsert new data or drop it. Create indexes on frequently filtered columns for faster queries on larger datasets.
GraphQL Data Connector: Initial support for using GraphQL as a data source.
Example Spicepod showing how to use primary keys and indexes with accelerated datasets:
datasets:
- from: eth.blocks
name: blocks
acceleration:
engine: duckdb # Use DuckDB acceleration engine
primary_key: '(hash, timestamp)'
indexes:
number: enabled # same as `CREATE INDEX ON blocks (number);`
'(number, hash)': unique # same as `CREATE UNIQUE INDEX ON blocks (number, hash);`
on_conflict:
'(hash, number)': drop # possible values: drop (default), upsert
'(hash, timestamp)': upsert
Primary Keys, constraints, and indexes are currently supported when using SQLite, DuckDB, and PostgreSQL acceleration engines.
Learn more with the indexing quickstart and the primary key sample.
Read the Local Acceleration documentation.
None.
runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1678
runtime.metrics by @ewgenius in https://github.com/spiceai/spiceai/pull/1681
labels to properties and make it nullable by @ewgenius in https://github.com/spiceai/spiceai/pull/1686
tpch_q7, tpch_q8, tpch_q9, tpch_q14 by @sgrebnov in https://github.com/spiceai/spiceai/pull/1683
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/1653
primary_key in Spicepod and create in accelerated table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1687
ArrayDistance scalar UDF by @Jeadie in https://github.com/spiceai/spiceai/pull/1697
on_conflict behavior for accelerated tables with constraints by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1688
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.3-alpha...v0.14.0-alpha
The v0.13.3-alpha release is focused on quality and stability with improvements to metrics, telemetry, and operability.
Ready API: Added a /v1/ready API that returns success once all datasets and models are loaded and ready.
Enhanced Grafana dashboard: The dashboard now includes charts for query duration and failures, the last update time of accelerated datasets, the count of refresh errors, and the last successful time the runtime was able to access federated datasets.
array_distance as euclidean distance between Float32[] by @Jeadie in https://github.com/spiceai/spiceai/pull/1601
crates/runtime/src/http/v1/ by @Jeadie in https://github.com/spiceai/spiceai/pull/1619
/v1/ready API that returns 200 when all datasets have loaded by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1629
v1/assist response and panic bug. Include primary keys in response too by @Jeadie in https://github.com/spiceai/spiceai/pull/1635
err_code to query_failures metric by @sgrebnov in https://github.com/spiceai/spiceai/pull/1639
ObjectStoreMetadataTable & ObjectStoreTextTable by @Jeadie in https://github.com/spiceai/spiceai/pull/1649
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/1648
Time Since Offline chart to Grafana dashboard by @sgrebnov in https://github.com/spiceai/spiceai/pull/1664
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.2-alpha...v0.13.3-alpha
The v0.13.2-alpha release is focused on quality and stability with improvements to federated query push-down, telemetry, and query history.
Filesystem Data Connector: Adds the Filesystem Data Connector for directly using files as data sources.
Federated Query Push-Down: Improved stability and schema compatibility for federated queries.
Enhanced Telemetry: Runtime Metrics now include last update time for accelerated datasets, count of refresh errors, and new metrics for query duration and failures.
Query History: Enabled query history logging for Arrow Flight queries in addition to HTTP queries.
- spice_cloud - connect to cloud api by @ewgenius in https://github.com/spiceai/spiceai/pull/1523
- llm UX in spicepod.yaml by @Jeadie in https://github.com/spiceai/spiceai/pull/1545
- runtime.metrics schema, if remote (spiceai) data connector provided by @ewgenius in https://github.com/spiceai/spiceai/pull/1554
- object_store table provider for UTF8 data formats by @Jeadie in https://github.com/spiceai/spiceai/pull/1562
- query_duration_seconds and query_failures metrics by @sgrebnov in https://github.com/spiceai/spiceai/pull/1575
- /app as a default workdir in spiceai docker image by @ewgenius in https://github.com/spiceai/spiceai/pull/1586
- EmbeddingConnector by @Jeadie in https://github.com/spiceai/spiceai/pull/1592

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.1-alpha...v0.13.2
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.13.1-alpha release of Spice is a minor update focused on stability, quality, and operability. Query result caching provides protection against bursts of queries, and schema support for datasets has been added for logical grouping. An issue where Refresh SQL predicates were not pushed down to underlying data sources has been resolved, along with improved Acceleration Refresh logging.
Results Caching: Introduced query results caching to handle bursts of requests and support caching of non-accelerated results, such as refresh data returned on zero results. Results caching is enabled by default with a 1s
item time-to-live (TTL). Learn more.
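As a sketch of how the cache might be tuned, the runtime section of a spicepod.yml could look like the following (the results_cache keys shown are assumptions based on typical Spice runtime configuration, not confirmed by this post):

```yaml
runtime:
  results_cache:
    enabled: true          # results caching is on by default
    item_ttl: 1s           # default time-to-live per cached item
    cache_max_size: 128MiB # assumed size key; adjust to your workload
```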
Query History Logging: Recent queries are now logged in the new spice.runtime.query_history
dataset with a default retention of 24 hours. Query history is initially enabled for HTTP queries only (not Arrow Flight queries).
Dataset Schemas: Added support for dataset schemas, allowing logical grouping of datasets by separating the schema name from the table name with a period (.). E.g.
datasets:
  - from: mysql:app1.identities
    name: app.users
  - from: postgres:app2.purchases
    name: app.purchases
In this example, queries against app.users will be federated to app1.identities in MySQL, and queries against app.purchases will be federated to app2.purchases in PostgreSQL.
@y-f-u @Jeadie @sgrebnov @ewgenius @phillipleblanc @lukekim @gloomweaver @Sevenannn
- file_format parameter required for S3/FTP/SFTP connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1455
- file_format from dataset path by @ewgenius in https://github.com/spiceai/spiceai/pull/1489
- file_format to helm chart sample dataset by @ewgenius in https://github.com/spiceai/spiceai/pull/1493
- file_format prompt for s3 and ftp datasets in Dataset Configure CLI if no extension detected by @ewgenius in https://github.com/spiceai/spiceai/pull/1494
- runtime schema by @ewgenius in https://github.com/spiceai/spiceai/pull/1524

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.0-alpha...v0.13.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.13.0-alpha release significantly improves federated query performance and efficiency with Query Push-Down. Query push-down allows SQL queries to be directly executed by underlying data sources, such as joining tables using the same data connector. Query push-down is supported for all SQL-based and Arrow Flight data connectors. Additionally, runtime metrics, including query duration, are now collected and can be accessed in the spice.runtime.metrics
table. This release also includes a new FTP/SFTP data connector and improved CSV support for the S3 data connector.
Federated Query Push-Down (#1394): All SQL and Arrow Flight data connectors support federated query push-down.
Runtime Metrics (#1361): Runtime metric collection can be enabled using the --metrics
flag and accessed by querying the spice.runtime.metrics
table.
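Once enabled, the collected metrics can be queried like any other Spice table, e.g.:

```sql
SELECT * FROM spice.runtime.metrics;
```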
FTP & SFTP data connector (#1355) (#1399): Added support for using FTP and SFTP as data sources.
Improved CSV support (#1411) (#1414): S3/FTP/SFTP data connectors support CSV files with expanded CSV options.
- release cargo feature to docker builds by @ewgenius in https://github.com/spiceai/spiceai/pull/1377
- spice.runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1361
- runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1408

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.2-alpha...v0.13.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12.2-alpha release introduces data streaming and key-pair authentication for the Snowflake data connector, enables general append
mode data refreshes for time-series data, improves connectivity error messages, adds nested folders support for the S3 data connector, and exposes nodeSelector and affinity keys in the Helm chart for better Kubernetes management.
Improved Connectivity Error Messages: Error messages provide clearer, actionable guidance for misconfigured settings or unreachable data connectors.
Snowflake Data Connector Improvements: Enables data streaming by default and adds support for key-pair authentication in addition to passwords.
API for Refresh SQL Updates: Update dataset Refresh SQL via API.
Append Data Refresh: Append mode data refreshes for time-series data are now supported for all data connectors. Specify a dataset time_column
with refresh_mode: append
to only fetch data more recent than the latest local data.
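A minimal sketch of such a dataset definition (the source table, dataset name, and column name are hypothetical):

```yaml
datasets:
  - from: postgres:public.events   # hypothetical source table
    name: events
    time_column: created_at        # hypothetical temporal column
    acceleration:
      enabled: true
      refresh_mode: append         # only fetch rows newer than the latest local data
```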
Docker Image Update: The spiceai/spiceai:latest
Docker image now includes the ODBC data connector. For a smaller footprint, use spiceai/spiceai:latest-slim
.
Helm Chart Improvements: nodeSelector
and affinity
keys are now supported in the Helm chart for improved Kubernetes deployment management.
POST /v1/datasets/:name/refresh changed to POST /v1/datasets/:name/acceleration/refresh to be consistent with the Spicepod.yaml structure.
- release feature in docker image by @ewgenius in https://github.com/spiceai/spiceai/pull/1324
- DataConnectorResult and DataConnectorError by @ewgenius in https://github.com/spiceai/spiceai/pull/1339

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.1-alpha...v0.12.2-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12.1-alpha release introduces a new Snowflake data connector, support for UUID and TimestampTZ types in the PostgreSQL connector, and improved error messages across all data connectors. The Clickhouse data connector enables data streaming by default. The public SQL interface now restricts DML and DDL queries. Additionally, accelerated tables now fully support NULL values, and issues with schema conversion in these tables have been resolved.
Snowflake Data Connector: Initial support for Snowflake as a data source.
Clickhouse Data Streaming: Enables data streaming by default, eliminating in-memory result collection.
Read-only SQL Interface: Disables DML (INSERT/UPDATE/DELETE) and DDL (CREATE/ALTER TABLE) queries for improved data source security.
Error Message Improvements: Improved the error messages for commonly encountered issues with data connectors.
Accelerated Tables: Supports NULL values across all data types and fixes schema conversion errors for consistent type handling.
- GITHUB_TOKEN environment variable in the installation script, if available, to avoid rate limiting in CI workflows by @ewgenius in https://github.com/spiceai/spiceai/pull/1302
- spice login spark by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1303

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.0-alpha...v0.12.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12-alpha release introduces Clickhouse and Apache Spark data connectors, adds support for limiting refresh data periods for temporal datasets, and includes upgraded Spice Client SDKs compatible with Spice OSS.
Clickhouse data connector: Use Clickhouse as a data source with the clickhouse:
scheme.
Apache Spark Connect data connector: Use Apache Spark Connect connections as a data source using the spark:
scheme.
Refresh data window: Limit accelerated dataset refreshes to a specified window, configured as a duration from the present time, for faster and more efficient refreshes.
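A sketch of this configuration, assuming the setting is named refresh_data_window (the dataset source and column names are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/events/   # hypothetical source
    name: events
    time_column: event_time        # hypothetical temporal column
    acceleration:
      enabled: true
      refresh_data_window: 24h     # only refresh the most recent 24 hours of data
```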
ODBC data connector: Use ODBC connections as a data source using the odbc:
scheme. The ODBC data connector is currently optional and not included in default builds. It can be conditionally compiled using the odbc
cargo feature when building from source.
Spice Client SDK Support: The official Spice SDKs have been upgraded with support for Spice OSS.
The refresh_interval acceleration setting has been changed to refresh_check_interval to make it clearer that it is the check interval versus the data interval.
- SELECT count(*) for Sqlite Data Accelerator by @sgrebnov in https://github.com/spiceai/spiceai/pull/1166
- show tables in Spice SQL & update next version to v0.12.0-alpha by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1206

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.11.1-alpha...v0.12.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.11.1-alpha release introduces retention policies for accelerated datasets, native Windows installation support, and integration of catalog and schema settings for the Databricks Spark connector. Several bugs have also been fixed for improved stability.
Retention Policies for Accelerated Datasets: Automatic eviction of data from accelerated time-series datasets when a specified temporal column exceeds the retention period, optimizing resource utilization.
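A sketch of a retention policy, assuming it is configured with retention_period and retention_check_interval acceleration settings (all names and values here are illustrative, not confirmed by this post):

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks   # hypothetical source dataset
    name: eth_recent_blocks
    time_column: timestamp             # temporal column used for eviction
    acceleration:
      enabled: true
      refresh_mode: append
      retention_check_enabled: true
      retention_period: 90d            # evict rows older than 90 days
      retention_check_interval: 1h     # how often eviction runs
```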
Windows Installation Support: Native Windows installation support, including upgrades.
Databricks Spark Connect Catalog and Schema Settings: Improved translation between DataFusion and Spark, providing better Spark Catalog support.
- refresh_sql and manual refresh to e2e tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/1125
- spice dataset configure by @ewgenius in https://github.com/spiceai/spiceai/pull/1140
- spice upgrade on Windows by @sgrebnov in https://github.com/spiceai/spiceai/pull/1155

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.11.0-alpha...v0.11.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The Spice v0.11-alpha release significantly improves the Databricks data connector with Databricks Connect (Spark Connect) support, adds the DuckDB data connector, and adds the AWS Secrets Manager secret store. In addition, enhanced control over accelerated dataset refreshes, improved SSL security for MySQL and PostgreSQL connections, and overall stability improvements have been added.
DuckDB data connector: Use DuckDB databases or connections as a data source.
AWS Secrets Manager Secret Store: Use AWS Secrets Managers as a secret store.
Custom Refresh SQL: Specify a custom SQL query for dataset refresh using refresh_sql
.
Dataset Refresh API: Trigger a dataset refresh using the new CLI command spice refresh
or via API.
Expanded SSL support for Postgres: SSL mode now supports disable
, require
, prefer
, verify-ca
, verify-full
options with the default mode changed to require
. Added pg_sslrootcert
parameter for setting a custom root certificate and the pg_insecure
parameter is no longer supported.
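For illustration, a PostgreSQL dataset using these options might look like the sketch below (pg_sslmode is an assumed parameter name; the host, table, and certificate path are hypothetical):

```yaml
datasets:
  - from: postgres:my_schema.my_table
    name: my_table
    params:
      pg_host: db.example.com
      pg_sslmode: verify-full                # one of: disable, require, prefer, verify-ca, verify-full
      pg_sslrootcert: /etc/ssl/certs/ca.pem  # custom root certificate
```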
Databricks Connect: Choose between using Spark Connect or Delta Lake when using the Databricks data connector for improved performance.
Internal architecture refactor: The internal architecture of spiced
was refactored to simplify the creation of data components and to improve alignment with DataFusion concepts.
@edmondop’s first contribution github.com/spiceai/spiceai/pull/1110!
- NULL values by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
- NULL values for NUMERIC by @gloomweaver in https://github.com/spiceai/spiceai/pull/1068
- spice refresh CLI command for dataset refresh by @sgrebnov in https://github.com/spiceai/spiceai/pull/1112
- TEXT and DECIMAL types support and properly handling NULL for MySQL by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
- DATE and TINYINT types support for MySQL by @ewgenius in https://github.com/spiceai/spiceai/pull/1065
- ssl_rootcert_path parameter for MySql data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1079
- LargeUtf8 support and explicitly passing the schema to data accelerator SqlTable by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1077
- pg_insecure parameter support from Postgres by @ewgenius in https://github.com/spiceai/spiceai/pull/1081

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.2-alpha...v0.11.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.10.2-alpha (Apr 9, 2024)!
The v0.10.2-alpha release adds the MySQL data connector and makes external data connections more robust on initialization.
MySQL data connector: Connect to any MySQL server, including SSL support.
Data connections verified at initialization: Verify endpoints and authorization for external data connections (e.g. databricks, spice.ai) at initialization.
- show tables; parsing in the Spice SQL repl.
- lookback_size (& improve SpiceAI’s ModelSource) by @Jeadie in https://github.com/spiceai/spiceai/pull/1016

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.1-alpha...v0.10.2-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.10.1-alpha!
The v0.10.1-alpha release focuses on stability, bug fixes, and usability by improving error messages when using SQLite data accelerators, improving the PostgreSQL support, and adding a basic Helm chart.
Improved PostgreSQL support for Data Connectors: TLS is now supported with PostgreSQL Data Connectors, and there are improved VARCHAR and BPCHAR conversions through Spice.
Improved Error Messages: Simplified error messages from Spice when propagating errors from Data Connectors and Accelerator Engines.
Spice Pods Command: The spice pods
command can give you quick statistics about models, dependencies, and datasets that are loaded by the Spice runtime.
Spice.ai can be deployed to Kubernetes using Helm. Here’s a quick guide to get started:
Step 1. (Optional) Start a local kind
cluster:
go install sigs.k8s.io/[email protected]
kind create cluster
Step 2. Install Spice in your Kubernetes cluster using Helm:
helm repo add spiceai https://helm.spiceai.org
helm install spiceai spiceai/spiceai
Step 3. Verify that the Spice pods are running:
kubectl get pods
kubectl logs deploy/spiceai
Step 4. Run the Spice SQL REPL inside the running pod:
kubectl exec -it deploy/spiceai -- spiced --repl
Learn more about deploying Spice.ai to Kubernetes
- spice login in environments with no browser. (https://github.com/spiceai/spiceai/pull/994)
- spice pods returns incorrect counts. (https://github.com/spiceai/spiceai/pull/998)

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
TL;DR: We’ve rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets sourced from any database, data warehouse or data lake. Learn more at github.com/spiceai/spiceai.
In September, 2021, we introduced Spice.ai OSS as a runtime for building AI-driven applications using time-series data.
We quickly ran into a big problem in making these applications work… data, the fuel for intelligent software, was painfully difficult to access, operationalize, and use, not only in machine learning, but also in web frontends, backend applications, dashboards, data pipelines, and notebooks. And we had to make hard tradeoffs between cost and query performance.
We felt this pain every day building 100TB+ scale data and AI systems for the Spice.ai Cloud Platform. So we took our learnings and infused them back into Spice.ai OSS with the capabilities we wished we had.
We rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse or data lake.
Spice is a fast, lightweight (< 150MB) single binary designed to be deployed alongside your application, dashboard, and within your data or machine learning pipelines. Spice federates SQL query across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.), and data lakes (S3, MinIO, Databricks, etc.) so you can easily use and combine data wherever it lives. Datasets, declaratively defined, can be materialized and accelerated using your engine of choice, including DuckDB, SQLite, PostgreSQL, and in-memory Apache Arrow records, for ultra-fast, low-latency query. Accelerated engines run in your infrastructure, giving you flexibility and control over price and performance.
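As a sketch of how a declaratively defined, accelerated dataset might look (the pod name, source table, and engine choice are illustrative):

```yaml
version: v1beta1
kind: Spicepod
name: my_app                       # hypothetical pod name
datasets:
  - from: postgres:public.orders   # hypothetical federated source
    name: orders
    acceleration:
      enabled: true
      engine: duckdb               # or sqlite, postgres; in-memory Arrow is the default
```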
The next-generation of Spice.ai OSS enables:
Better applications. Accelerate and co-locate data with frontend and backend applications, for high concurrent queries, serving more users with faster page loads and data updates. Try the CQRS sample app.
Snappy dashboards, analytics, and BI. Faster, more responsive dashboards without massive compute costs. Spice supports Arrow Flight SQL (JDBC/ODBC/ADBC) for connectivity with Tableau, Looker, PowerBI, and more. Watch the Apache Superset with Spice demo.
Faster data pipelines, machine learning training and inference. Co-locate datasets with pipelines where the data is needed to minimize data-movement and improve query performance. Predict hard drive failure with the SMART data demo.
Easily query many data sources. Federated SQL query across databases, data warehouses, and data lakes using Data Connectors.
Spice is open-source, Apache 2.0 licensed, and is built using industry-leading technologies including Apache DataFusion, Arrow, and Arrow Flight SQL. We’re launching with several built-in Data Connectors and Accelerators and Spice is extensible so more will be added in each release. If you’re interested in contributing, we’d love to welcome you to the community!
You can download and run Spice in less than 30 seconds by following the quickstart at github.com/spiceai/spiceai.
Spice, rebuilt in Rust, introduces a unified SQL query interface, making it simpler and faster to build data-driven applications. The lightweight Spice runtime is easy to deploy and makes it possible to materialize and query data from any source quickly and cost-effectively. Applications can serve more users, dashboards and analytics can be snappier, and data and ML pipelines finish faster, without the heavy lifting of managing data.
For developers this translates to less time wrangling data and more time creating innovative applications and business value.
Check out and star the project on GitHub!
Thank you,
Phillip
Announcing the release of Spice v0.10-alpha!
The Spice.ai v0.10-alpha release focused on additions and updates to improve stability, usability, and the overall Spice developer experience.
Public Bucket Support for S3 Data Connector: The S3 Data Connector now supports public buckets in addition to buckets requiring an access id and key.
JDBC-Client Connectivity: Improved connectivity for JDBC clients, like Tableau.
User Experience Improvements:
- spice login postgres command, streamlining the process for connecting to PostgreSQL databases.

Grafana Dashboard: Improving the ability to monitor Spice deployments, a standard Grafana dashboard is now available.
- spice login postgres command
- spice status with dataset metrics
- show tables output

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.9.1-alpha!
The v0.9.1 release focused on stability, bug fixes, and usability by adding spice CLI commands for listing Spicepods (spice pods), Models (spice models), Datasets (spice datasets), and improved status (spice status) details. In addition, the Arrow Flight SQL (flightsql) data connector and SQLite (sqlite) data store were added.
FlightSQL data connector: Arrow Flight SQL can now be used as a connector for federated SQL query.
SQLite data backend: SQLite can now be used as a data store for acceleration.
- FlightSQL data connector (flightsql).
- SQLite data store (sqlite).
- spice pods, spice status, spice datasets, and spice models CLI commands.
- GET /v1/spicepods API for listing loaded Spicepods.
- spiced Docker CI build and release.
- linux/arm64 binary build.
- spice sql REPL panics when query result is too large. (https://github.com/spiceai/spiceai/pull/875)
- --access-secret in spice s3 login. (https://github.com/spiceai/spiceai/pull/894)

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.9-alpha!
The v0.9 release adds several data connectors including the Spice data connector for the ability to connect to other Spice instances. Improved observability for the Spice runtime has been added with the new /metrics
endpoint for monitoring deployed instances.
Arrow Flight SQL endpoint: The Arrow Flight endpoint now supports Flight SQL, including JDBC, ODBC, and ADBC, enabling database clients like DBeaver or BI applications like Tableau to connect to and query the Spice runtime.
Spice.ai data connector: Use other Spice runtime instances as data connectors for federated SQL query across Spice deployments and for chaining Spice runtimes.
Keyring secret store: Use the operating system native credential store, like macOS keychain for storing secrets used by the Spice runtime.
PostgreSQL data connector: PostgreSQL can now be used as both a data store for acceleration and as a connector for federated SQL query.
Databricks data connector: Databricks as a connector for federated SQL query across Delta Lake tables.
S3 data connector: S3 as a connector for federated SQL query across Parquet files stored in S3.
Metrics endpoint: Added new /metrics
endpoint for Spice runtime observability and monitoring with the following metrics:
- spiced_runtime_http_server_start counter
- spiced_runtime_flight_server_start counter
- datasets_count gauge
- load_dataset summary
- load_secrets summary
- datasets/load_error counter
- datasets/count counter
- models/load_error counter
- models/count counter
- Keyring secret store (keyring).
- PostgreSQL data connector (postgres).
- Spice.ai data connector (spiceai).
- Databricks data connector (databricks) - Delta Lake support.
- S3 data connector (s3) - Parquet support.
- /v1/models API.
- /v1/status API.
- /metrics API.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.8-alpha!
This is a minor release that builds on the new Rust-based runtime, adding stability and a preview of new features for the first major release.
Secrets management: Spice 0.8 runtime can now configure and retrieve secrets from local environment variables and in a Kubernetes cluster.
Data tables can be locally accelerated using PostgreSQL.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.7-alpha!
Spice v0.7-alpha is an all new implementation of Spice written in Rust. The Spice v0.7 runtime provides developers with a unified SQL query interface to locally accelerate and query data tables sourced from any database, data warehouse, or data lake.
Learn more and get started in minutes with the updated Quickstart in the repository README!
DataFusion SQL Query Engine: Spice v0.7 leverages the Apache DataFusion query engine to provide very fast, high quality SQL query across one or more local or remote data sources.
Data tables can be locally accelerated using Apache Arrow in-memory or by DuckDB.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
In February, we announced Spice.ai OSS v0.6 with its data processing and transport completely rebuilt upon Apache Flight. This enables Spice.ai OSS to scale to datasets 10-100 times larger and brings Spice.ai into the Apache Arrow ecosystem paving the way for integrations with many popular projects, like Apache Parquet, pandas and big data systems like Hive, Drill, Spark, Snowflake, BigQuery, and many more.
In Spice.ai OSS v0.6.1 we announced a new big data system integration… our own, Spice.xyz!
Spice.xyz is data and AI infrastructure for web3.
It's web3 data made easy. Insanely fast and purpose designed for applications and ML.
Spice.xyz delivers data in Apache Arrow format, over high-performance Apache Arrow Flight APIs to your application, notebook, ML pipeline, and of course, to the Spice.ai runtime.
With Spice.ai OSS v0.6.1, a new Apache Arrow Flight data connector was made available, creating a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with Spice.xyz, developers can quickly and easily build web3 data-driven applications that learn and adapt using Spice.ai.
To read the announcement post for Spice.xyz, visit blog.spice.xyz.
Apache Arrow is a specification for an in-memory columnar data format that's very efficient for analytics operations. Arrow's zero-copy read semantics coupled with the Flight client-server framework mean extremely fast and efficient data transport and access without serialization overhead. This enables high-performance bulk-data scenarios, critical for data-driven applications and ML. These properties enable an open architecture based on Apache Arrow, Flight, and Parquet.
Paul Dix, CTO of InfluxData wrote a fantastic post on the Arrow ecosystem and why the future core of InfluxDB is built with Arrow. Sam Crowder also wrote A (Recent) History of Batch Data showing how Arrow is a cornerstone of modern data architecture.
Joining projects like InfluxDB, the core of both Spice.ai OSS and Spice.xyz are built with a foundation of Arrow and Flight. This means they benefit from the same high-performance data operations, they work great with each other and other projects in the ecosystem.
Betting on Arrow in Spice.ai enables exciting new applications because AI needs AI-ready data.
Previously it was difficult to efficiently get bulk data from a provider like Spice.xyz to the Spice.ai engine, but now it’s just a matter of configuring the connection through a few lines of YAML.
Imagine creating an application to trade NFTs. With Spice.xyz, developers can query Ethereum for data relating to NFT trading activity. That data is then delivered in the high-performance Arrow format to the Spice.ai runtime. The application's Spicepod could learn how to value NFTs based upon their trading history and the communities their owners have been engaged in. And this could all be done in real-time, something not feasible before.
In addition, using the Arrow Flight connector, other exciting applications are enabled across a ton of domains, like IoT, financial applications, security monitoring, and many more.
To get somewhere you need a goal or destination, a vehicle to get there, and fuel for that vehicle.
When it comes to intelligent, AI-driven applications, Spice.xyz now provides the Spice.ai vehicle with a massive pipeline of web3 data fuel.
The next step is to make it easier for developers to define the destination for the vehicle. Upcoming on the Spice.ai OSS roadmap is the ability for developers to define goals for how the decision-engine should learn. Like learning to maximize measurement βAβ or optimizing to a target of βBβ.
For example, in web3, this might be to build a client that can learn and adapt to optimize Ethereum Gas Fee prices for token swaps. The goal would be to minimize the gas fee, a problem we experienced first-hand when we built defly.ai. Today you have to encode that goal into your reward function, but our plan is to help do that for you, and all you have to do is tell us the end goal.
Goal-oriented learning applies to many domains, whether it be minimizing fees in crypto or maximizing engagement on a social platform. And personally, weβre excited about the eventual ability to apply Spice.ai and just say βminimize my taxesβ :-)
Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!
If youβd like to get involved, weβd love to talk. Try out Spice.ai OSS, Spice.xyz, email us βhey,β get in touch on Discord, or reach out on Twitter.
Luke
Announcing the release of Spice.ai v0.6.1-alpha! πΆ
Building upon the Apache Arrow support in v0.6-alpha, Spice.ai now includes new Apache Arrow data processor and Apache Arrow Flight data connector components! Together, these create a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with big data systems from the Apache Arrow ecosystem like Hive, Drill, Spark, Snowflake, and BigQuery, it’s now easier than ever to combine big data with Spice.ai.
And we’re also excited to announce the release of Spice.xyz! π
Spice.xyz is data and AI infrastructure for web3. Itβs web3 data made easy. Insanely fast and purpose designed for applications and ML.
Spice.xyz delivers data in Apache Arrow format, over high-performance Apache Arrow Flight APIs to your application, notebook, ML pipeline, and of course through these new data components, to the Spice.ai runtime.
Read the announcement post at blog.spice.ai.
Now built with Go 1.18.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Announcing the release of Spice.ai v0.6-alpha! πΉ
Spice.ai now scales to datasets 10-100 larger enabling new classes of uses cases and applications! π We’ve completely rebuilt Spice.ai’s data processing and transport upon Apache Arrow, a high-performance platform that uses an in-memory columnar format. Spice.ai joins other major projects including Apache Spark, pandas, and InfluxDB in being powered by Apache Arrow. This also paves the way for high-performance data connections to the Spice.ai runtime using Apache Arrow Flight and import/export of data using Apache Parquet. We’re incredibly excited about the potential this architecture has for building intelligent applications on top of a high-performance transport between application data sources the Spice.ai AI engine.
From data connectors, to REST API, to AI engine, we’ve now rebuilt Spice.ai’s data processing and transport on the Apache Arrow project. Specifically, using the Apache Arrow for Go implementation. Many thanks to Matt Topol for his contributions to the project and guidance on using it.
This release includes a change to the Spice.ai runtime to AI Engine transport from sending text CSV over gRPC to Apache Arrow Records over IPC (Unix sockets).
This is a breaking change to the Data Processor interface, as it now uses `arrow.Record` instead of `Observation`.
Before v0.6, Spice.ai would not scale beyond hundreds of thousands of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.15KiB | 3.0005s | 0.0000s | 0.0100s | 423.754MiB |
csv | 20,000 | 1.61MiB | 2.9765s | 0.0000s | 0.0938s | 479.644MiB |
csv | 200,000 | 16.31MiB | 0.2778s | 0.0000s | NA (error) | 0.000MiB |
csv | 2,000,000 | 164.97MiB | 0.2573s | 0.0050s | NA (error) | 0.000MiB |
json | 2,000 | 301.79KiB | 3.0261s | 0.0000s | 0.0282s | 422.135MiB |
json | 20,000 | 2.97MiB | 2.9020s | 0.0000s | 0.2541s | 459.138MiB |
json | 200,000 | 29.85MiB | 0.2782s | 0.0010s | NA (error) | 0.000MiB |
json | 2,000,000 | 300.39MiB | 0.3353s | 0.0080s | NA (error) | 0.000MiB |
After building on Arrow, Spice.ai now easily scales beyond millions of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.14KiB | 2.8281s | 0.0000s | 0.0194s | 439.580MiB |
csv | 20,000 | 1.61MiB | 2.7297s | 0.0000s | 0.0658s | 461.836MiB |
csv | 200,000 | 16.30MiB | 2.8072s | 0.0020s | 0.4830s | 639.763MiB |
csv | 2,000,000 | 164.97MiB | 2.8707s | 0.0400s | 4.2680s | 1897.738MiB |
json | 2,000 | 301.80KiB | 2.7275s | 0.0000s | 0.0367s | 436.238MiB |
json | 20,000 | 2.97MiB | 2.8284s | 0.0000s | 0.2334s | 473.550MiB |
json | 200,000 | 29.85MiB | 2.8862s | 0.0100s | 1.7725s | 824.089MiB |
json | 2,000,000 | 300.39MiB | 2.7437s | 0.0920s | 16.5743s | 4044.118MiB |
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Last month in the v0.5-alpha version, a new learning algorithm was added to Spice.ai: Soft Actor-Critic. This is a very popular algorithm in the Reinforcement Learning field. Let’s see what it is and why this is an interesting addition.
The previous article Understanding Q-learning: How a Reward Is All You Need is not necessary but can be helpful to understand this article.
DeepMind first introduced the actor-critic approach in deep learning in a 2016 paper. We can think of this approach as having two tasks: choosing which action to take, and evaluating the value of actions. These tasks are carried out by two different neural networks, or a single network that branches out into two heads. The actor is the part that outputs the policy, while the critic outputs the values.
In most cases, this model was proven to perform very well, better than Deep Q-Learning. The actor is trained to prefer actions associated with the best values from the critic. The critic is trained to correctly estimate rewards (current and future ones) of the actions.
Both will improve over time though we have to keep in mind that the critic is unlikely to evaluate all possible actions in the environment as it will only see actions from states that the actor is likely to take (the policy).
This bias of the system toward its policy is important: the algorithm is meant to train on-policy. The actor-critic duo works together: trying to train it with inputs and outputs from another system (humans, or even itself in past iterations of its own training) will not work.
Multiple improvements were made to limit the bias of the actor-critic approach but the necessity to train on-policy remains. This is very limiting as being able to train from any experience can be very valuable for time and data efficiency.
Soft Actor-Critic allows an Actor-Critic network to train off-policy. It was introduced in a paper in 2018 and included multiple additions to improve its parent algorithm. The main difference is the introduction of the entropy of the actor outputs during the training phase.
The entropy measures the chaos/order of a system (or uncertainty). If a system always acts the same way, the entropy is minimal. Here the actor’s entropy is maximum if all possible actions have the same weight (same probability) and minimum if the actor always chooses a single action with 100% confidence.
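For a discrete set of actions, this entropy can be written with the standard definition (added here for reference):

```latex
H\bigl(\pi(\cdot \mid s)\bigr) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s)
```

It is maximal when all actions are equally likely and zero when a single action has probability 1, matching the description above.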
During the training phase, the actor is trained to maintain the entropy of its outputs at a specific value.
The introduction of the entropy changes the goal of the training: not only to find the best outputs, but also to keep exploring the other actions. The critic part will be trained on all actions, even if they may occur only in rare cases.
There are other essential parts, such as having 2 critics and being able to output continuous values, but the entropy is the crucial difference in this algorithm’s training and potential.
As we saw above, the Actor-Critic algorithm is known to outperform Deep Q-Learning in most cases. If we also want to leverage previous data (off-policy training), Soft Actor-Critic is a natural choice. Despite its better theoretical results, this approach is heavier, making it more suitable for complex tasks. For simpler tasks, Deep Q-Learning will still be an appealing option for its speed of training and its capability to quickly converge to a good solution.
We can think of Soft Actor-Critic as a complex machine designed to take actions while keeping a variety of possibilities. Sometimes several options seem equally rewarding: a simpler algorithm would take what it evaluates as the best one, even though the margin is small and the precision of its evaluation shouldn’t be enough. This tendency to quickly converge to a solution has its benefits and inconveniences.
Adding new algorithms is essential to Spice.ai, so the procedure was designed to be straightforward.
Looking at the source code, the code related to training agents is in the `ai/src` folder. This part of the code uses the Python language, as most modern AI libraries are distributed in Python.
In this folder, every agent lives in the `algorithms` folder, and each has its own subfolder. There is an `agent_interface` file that defines the main class the different agents should inherit from, and a `factory` script responsible for creating instances of an agent from a given algorithm name.
Adding a new agent is simple:
1. Create a new subfolder in `algorithms`.
2. Add a JSON file with `algorithm_id`, `name`, and `docs_link` (see the other JSON files as an example) in the folder.
3. Implement the `SpiceAIAgent` class defined in the `agent_interface` script.
4. Update the `factory` script to instantiate the new implementation when its name is called.

For the new agent, inheriting from the main `SpiceAIAgent` class, 5 functions need to be implemented.
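As an illustrative sketch only, a new agent might be shaped like the code below. Note that the class name `SpiceAIAgent` comes from the post, but the method name shown is a hypothetical stand-in, not one of the actual five functions from the real `agent_interface` script — consult the source for the actual signatures.

```python
import random


class SpiceAIAgent:
    """Sketch of a base class every learning algorithm inherits from.

    The real interface lives in ai/src and defines five required functions;
    this skeleton is illustrative only.
    """

    def __init__(self, state_shape, action_size):
        self.state_shape = state_shape
        self.action_size = action_size


class RandomAgent(SpiceAIAgent):
    """A trivial agent that picks actions uniformly at random."""

    def __init__(self, state_shape, action_size):
        super().__init__(state_shape, action_size)
        self._rng = random.Random(0)

    def act(self, state):
        # Hypothetical method name: choose an action index for the given state.
        return self._rng.randrange(self.action_size)
```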
Soft Actor-Critic is a fascinating algorithm that performs well in complex environments. We now support Soft Actor-Critic in Spice.ai, which is another step forward in constantly improving the performance of the AI engine. Additionally, we’ll continue improving existing algorithms and adding new ones over time. We designed the platform for ease of implementation and experimentation, so if you’d like to try building your own agent, you can get the source code on GitHub and contribute to the platform. Say hi on Discord, reach out on Twitter or email us.
I hope you enjoyed this post and learned something new.
Corentin
AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to out-perform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we’ll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.
While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic. And the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.
We don’t want any decision, though. We want high-quality, informed decisions. If you consider making higher quality, informed decisions over time, you need three classes of information. These classes are historical information, real-time or present information, and the results of your decisions.
Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.
Second, you need a real-time updated stream of data as it happens to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.
Finally, if we’re going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.
Using all three data classes together results in higher quality decisions over time. Broad data across these classes are useful, and we could make some nice trades with that. Still, we can make an even higher quality trading decision with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes with global or local variants. We combine global information, like what worked well for everyone, and local experience, what worked well for us and our situation, to make the best, overall informed decision.
Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.
First, you need a way to gather and consume historical information, like stock prices, and keep that updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed that back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year, to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn’t enough, as we learned from Phillip’s AI needs AI-ready data post, all three data classes need to be in a format that ML can use.
If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not well suited to applications that want to learn and adapt in real-time. Like a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency to apply that data or model in your application and a long turn-around time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture and send the results of the decisions your application made as a result of using those models back to the data team to incorporate in future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.
And, if you can’t afford a data team, you have to figure out how to do all that yourself.
Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.
Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready so that all three classes are consistent and ML available.
The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.
The Spice.ai approach short circuits the traditional waterfall-like data process by keeping as much data as possible application local instead of round-tripping through an external pipeline or team, especially valuable for real-time data. The application can learn and adapt faster by reducing the latency of decision consequences to learning.
Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information or an “interpretation” of a time range as input to learning. The trading example could be as simple as labeling a time range as a good time to buy or providing additional contextual information such as tax considerations, etc. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.
While Spice.ai focuses on ensuring consistent ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaptation.
In summary, to build an intelligent application driven from AI recommended decisions, a significant amount of data engineering can be required to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn rather than the runtime data engineering.
The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.
Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!
If you want to get involved, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
Luke
A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application’s goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application’s logic. You can think of the component that does this as a “decision engine.” This post will explore a brief history of decision engines and use-cases for this application class.
The idea to make intelligent decision-making applications is not new. Developers first created these applications around the 1970s¹, and they are some of the earliest examples of using artificial intelligence to solve real-world problems.
The first applications used a class of decision engines called “expert systems.” A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.
Some uses of expert systems include:
However, the resources required to build expert systems make employing them infeasible for many applications². They often need a significant time and resource investment to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.
With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules to power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?
Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature during business hours, only adjusting using in-room sensor data, if at all. This behavior potentially over-cools at business close as the outside temperature lowers and the building starts vacating.
With Spice.ai: Using Spice.ai, the application combines time-series data from multiple data sources, including the time of day, day of the week, building/room occupancy, outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as the room naturally cools towards the end of the day. As the occupancy decreases, the decision engine is rewarded for maintaining the desired temperature and minimizing energy consumption/cost.
Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, the order is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app gets more popular with customers and the number of restaurants, drivers, and customers increases, the heuristic needs to be constantly tuned or supplemented with human operators to handle the demand.
With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from data, including patterns in both the restaurant and driver’s order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.
Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or Pancake Swap. In both cases, the routing of orders is likely to be either a form of traditional expert system based upon rules or even manually routed.
With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the most optimal route or exchange to execute the transaction and get you the best trade.
A new class of applications that can learn and adapt is made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.
If you’d like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Discord, reach out on Twitter or email us.
Phillip
Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22–23. ISBN 978-0-13-103805-9. ↩︎
Kendal, S. L., & Creen, M. (2007). An introduction to knowledge engineering. London: Springer. ISBN 978-1-84628-475-5 ↩︎
Announcing the release of Spice.ai v0.5.1-alpha!
This minor release builds upon v0.5-alpha adding the ability to start training from the dashboard plus support for monitoring training runs with TensorBoard.
A “Start Training” button has been added to the pod page on the dashboard so that you can easily start training runs from that context.
Training runs can now be started by:
- Clicking the new “Start Training” button on the pod page in the dashboard
- Using the `spice train` CLI command
- Making a POST request to `/api/v0.1/pods/{pod name}/train`
TensorBoard monitoring is now supported when using the DQL (default) or the new SACD learning algorithm that was announced in v0.5-alpha.
When enabled, TensorBoard logs will automatically be collected and an “Open TensorBoard” button will be shown on the pod page in the dashboard.
Logging can be enabled at the pod level with the `training_loggers` pod param or per training run with the CLI `--training-loggers` argument.
Support for VPG will be added in v0.6-alpha. The design allows for additional loggers to be added in the future. Let us know what you’d like to see!
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
There are two general ways to train an AI to match a given expectation: we can either give it the expected outputs (commonly named labels) for different inputs; we call this supervised learning. Or we can provide a reward for each output as a score: this is reinforcement learning (RL).
Supervised learning works by tweaking all the parameters (weights in neural networks) to fit the desired outputs, expecting that given enough input/label pairs the AI will find common rules that generalize for any input.
Reinforcement learning’s reward is often provided from a simple function that can score any output: we don’t know what specific output would be best, but we can recognize how good the result is. In this latter statement there are two underlying concepts we will address in this post:
Those questions are already mostly answered, and many algorithms deal with those topics. Our journey here will be to understand how we tackle those questions and end up with a beautiful formula that is at the core of modern approaches to RL:
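That formula is the Q-learning update rule, whose parts (the discount factor and the learning rate) the rest of this post builds up step by step:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```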
The vast majority, if not all, of modern RL algorithms are based on the principles of Q-learning: the idea is to evaluate a ‘reward expectation’ for each possible action. If we can have a good evaluation, we could maximize the reward by choosing actions with the maximum evaluated rewards. The function giving this expected reward is named Q. For now, we will assume we can have a reward for any action.
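Under that assumption, the ideal `Q` function simply returns the reward (a reconstruction of the relation the next paragraph refers to):

```latex
Q(s_t, a_t) = r(s_t, a_t)
```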
The `t` indices show that the state and action aren’t constant and will vary, usually with time/action taken. On the other hand, the `Q` function and the reward function `r` are unique functions that ideally return the ’expected reward’ for any (state, action) pair.
For now, we will assume we can have a reward that gives an objective and perfect evaluation of each state/action.
We know that actions’ outcomes (rewards) will vary depending on the current state we are in, otherwise the problem would be trivial to solve. If the states that are relevant to our actions can be numbered, a simple way would be to build a table with all the possible states/action pairs. There are different ways to build such a table depending on how we can interact with our environment. Eventually, we would have a good ‘map’ to guide us to do the best actions.
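A minimal sketch of such a Q-table in Python (a generic illustration, not Spice.ai’s implementation; the states and actions are invented for the example): state/action pairs index into a table of learned values, and the best action for a state is the one with the highest stored value.

```python
from collections import defaultdict

# Q-table: maps (state, action) -> expected reward. Unseen pairs default to 0.
q_table = defaultdict(float)

def best_action(state, actions):
    """Greedy policy: pick the action with the highest stored Q value."""
    return max(actions, key=lambda a: q_table[(state, a)])

# Fill in a few observed rewards for a hypothetical state.
q_table[("low_battery", "recharge")] = 1.0
q_table[("low_battery", "keep_working")] = -0.5

print(best_action("low_battery", ["recharge", "keep_working"]))  # recharge
```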
When the number of variables of the environment relevant to our actions/rewards becomes too large, the number of possible states grows quickly. It doesn’t take a lot of possible parameters to make the Q-table approach unfeasible. Neural networks are known to work very nicely and efficiently in high dimensionality (with many input variables). They also generalize well, so the idea in Deep Q-Learning is to use a neural network to predict the different Q values for each action given a state.
In this case, we do not need to give the state/action pairs but only the state, as the neural network would exhaustively return all the Q values associated with each action. Outputting all actions’ Q value is a common method as the general cases have a complex environment but a smaller number of possible actions.
This method works very well. It is similar to supervised learning with states as inputs and rewards as labels. We assumed so far that we had a reward for each action, and we chose the next action with the best reward (called a greedy policy). In many cases this is not enough: even if an action would yield the best reward at a given state, this may affect the next state so that we wouldn’t optimize the reward in the long term. Also, if we can’t have a reward for each action, we usually give 0 as a reward. We will not be able to choose the right action if they affect later states despite not yielding different rewards at the current state.
The sparsity of rewards or the long-term calculation of total reward (non-greedy policies) leads us to diverge from supervised learning and learn potential future rewards.
TD-learning is a clever way to account for potential future value without knowing them yet. TD is a model-free class of algorithms: it does not simulate future states. The main idea is to consider all the rewards of a sequence of actions to give a better value than just the reward of the next action.
We can, for instance, sum all the future rewards:
Mathematically this can be written as:
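With r_t denoting the reward at step t, the sum over the whole trajectory is (reconstructed to match the description):

```latex
Q(s_t, a_t) = r_t + r_{t+1} + r_{t+2} + \dots
```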
This is named TD(0): the simplest form of TD method, accumulating all the rewards.
We could try different trajectories (sequence of actions) and retrospectively get the final reward for each action, but this has 2 drawbacks: the environment is usually too vast, and the sequence of actions might not even have a definite end. Also, such exhaustive methods might not be very efficient. Instead, we can evaluate the ‘value’ of the next state overall, like the maximum of all its possible rewards (direct reward), and add this value to the reward of a given action.
If a state can have different branches, we can select the best one, and this would be our policy, the way we choose actions. This simple form of taking the maximum is called the ‘greedy’ policy.
This can be written down as:
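In expected-value form (a reconstruction consistent with the surrounding text), the value of the next state is the expected reward over the actions the policy may take:

```latex
V(s_{t+1}) = \mathbb{E}\left[\, r(s_{t+1}, a) \,\right]
```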
The expected value notation is defined as:
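That is, each possible outcome is weighted by its probability:

```latex
\mathbb{E}[X] = \sum_{i} p_i \, x_i
```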
For a greedy policy, the probabilities `p` would all be set to 0 except the one associated with the highest return, which is set to 1 (in case of equality between n actions, we would attribute 1/n as probabilities to get the same expected value).
The expected reward can be replaced by the Q function we used earlier, which now can be denominated to be specific to our chosen policy (named π):
We previously discussed the problem of not being able to go through all the states exhaustively and that the evaluation of the Q value from a neural network could help. We want to use the TD method to have a better value estimation that will consider potential future rewards.
The TD(0) method is elegant as we can, in fact, only use the next state’s expected value instead of all future ones. The idea is that with successive evaluations, we build a chain of dependencies as each state’s value depends on the next one.
We can see that the greedy policy would work even with null rewards in the trajectory. We can make our greedy policy explicit, going back to using the Q value instead of the state value V:
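In Q-value form (reconstructed from the surrounding text), the target is the immediate reward plus the best Q value reachable from the next state:

```latex
Q(s_t, a_t) = r(s_t, a_t) + \max_{a} Q(s_{t+1}, a)
```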
We need to fix a problem: if a trajectory grows too long or never ends, a state value can potentially grow indefinitely. To counter that, we can add a discount factor (originally named lambda, usually referred to as gamma in Q-learning) for the next state’s value:
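Adding the discount factor γ, the target becomes:

```latex
Q(s_t, a_t) = r + \gamma \, \max_{a} Q(s_{t+1}, a)
```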
Notice that we simplify the reward notation for clarity.
To avoid exploding values, this discount has to be between 0 and 1 (strictly below 1). We can think about it as giving more importance to the direct reward than to future ones. As the contribution of later rewards decreases, the chain of actions can grow without the calculated value growing. If the reward has an upper limit, the value will also be bounded.
The sparsity of rewards is also solved: giving only a positive reward after many non-rewarding steps will create smooth values for the intermediate states. Any reward, positive or negative, will diffuse its value to the neighboring states.
Finally, as we train a neural network to estimate the Q function, we need to update its target with successive iteration. We cannot fully trust the estimator (a neural network here) to give the correct value, so we introduce a learning rate to update the target smoothly.
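With the learning rate α, the full update is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \, \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```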
That is it! We now understand all the parts of this formula. Over multiple training steps with different states, the training should find a good average Q function. While training, the estimator uses its own output to train itself (commonly referred to as bootstrapping): it is as if it is chasing itself. Bootstrapping can lead to instability in the training process. There are many additional methods to help against such instability.
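A worked numeric sketch of one such update step (generic Q-learning, not Spice.ai-specific code; the states, actions, and values are invented for illustration):

```python
# One Q-learning update step: Q <- Q + alpha * (r + gamma * max_next - Q)
alpha, gamma = 0.1, 0.9   # learning rate and discount factor

q = {("s0", "a"): 0.0, ("s0", "b"): 0.0,
     ("s1", "a"): 2.0, ("s1", "b"): 1.0}

def td_update(state, action, reward, next_state):
    max_next = max(q[(next_state, a)] for a in ("a", "b"))
    target = reward + gamma * max_next          # bootstrapped target
    q[(state, action)] += alpha * (target - q[(state, action)])

# Taking action "a" in s0 yields reward 0 but leads to the valuable state s1:
td_update("s0", "a", reward=0.0, next_state="s1")
print(q[("s0", "a")])  # 0.18 -- value diffuses back from s1 even with zero reward
```

Note how the zero-reward step still gains value from its successor state, which is exactly the diffusion of reward to neighboring states described above.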
From giving rewards, sparse or not, binary or fine-grained, we have a smooth space of values for all our states/actions so the AI can follow a greedy policy to the best outcome.
This way of training is not a silver bullet and there is no guarantee that the AI will find a correlation from the information given as state to the returned reward.
We can see how our rewards are used to train the AI’s policies using Q-learning. By understanding the many iterations required and the bootstrapping issues, we can help our AI by carefully giving relevant state information and rewards.
We didn’t see how the AI’s algorithm can explore different actions given an environment here. Spice.ai’s technology focuses exclusively on off-policy training where we only have past data and cannot interact with the environment. RL is a vast topic and currently quickly growing. Robotics is a fantastic field of application; many other areas are yet to be explored with such a technology. We hope to push forward the technology and its field of application with our platform.
If you’d like to partner with us on the mission of making new applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.
I hope you enjoy this post and learn new things.
Corentin
We are excited to announce the release of Spice.ai v0.5-alpha!
Highlights include a new learning algorithm called “Soft Actor-Critic” (SAC), fixes to the behavior of `spice upgrade`, and a more consistent authoring experience for reward functions.
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
The addition of the Soft Actor-Critic (Discrete) (SAC) learning algorithm is a significant improvement to the power of the AI engine. It is not set as the default algorithm yet, so to start using it pass the `--learning-algorithm sacd` parameter to `spice train`. We’d love to get your feedback on how it’s working!
With the addition of the reward function files that allow you to edit your reward function in a Python file, the behavior of starting a new training session by editing the reward function code was lost. With this release, that behavior is restored.
In addition, there is a breaking change to the variables used to access the observation state and interpretations. This change was made to better reflect the purpose of the variables and make them easier to work with in Python.
| Previous (Type) | New (Type) |
| --- | --- |
| `prev_state` (SimpleNamespace) | `current_state` (dict) |
| `prev_state.interpretations` (list) | `current_state_interpretations` (list) |
| `new_state` (SimpleNamespace) | `next_state` (dict) |
| `new_state.interpretations` (list) | `next_state_interpretations` (list) |
The Spice.ai CLI will no longer recommend “upgrading” to an older version. An issue was also fixed where trying to upgrade the Spice.ai CLI using spice upgrade
on Linux would return an error.
- Renamed `prev_state` and `new_state` to `current_state` and `next_state` to be consistent with the reward function files.
- Fixed the `spice upgrade` command.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
A significant challenge when developing an app powered by AI is providing the machine learning (ML) engine with data in a format that it can use to learn. To do that, you need to normalize the numerical data, one-hot encode categorical data, and decide what to do with incomplete data - among other things.
This data handling is often challenging! For example, to learn from Bitcoin price data, the prices are best normalized to a range between -1 and 1. Values too close to 0 are also a problem because of the lack of precision in floating-point representations (usually under 1e-5).
As a developer, if you are new to AI and machine learning, a great talk that explains the basics is Machine Learning Zero to Hero. Spice.ai makes the process of getting the data into an AI-ready format easy by doing it for you!
You write code with if statements and functions, but your machine only understands 1s and 0s. When you write code, you leverage tools, like a compiler, to translate that human-readable code into a machine-readable format.
Similarly, data for AI needs to be translated or “compiled” to be understood by the ML engine. You may have heard of tensors before; they are simply another word for a multi-dimensional array and they are the language of ML engines. All inputs to and all outputs from the engine are in tensors. You could use the following techniques when converting (or “compiling”) source data to a tensor.
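The two "compilation" techniques named above, normalization and one-hot encoding, can be sketched in a few lines of Python. This is purely illustrative; Spice.ai performs these transformations internally:

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Min-max scale numerical values into the range [lo, hi]."""
    v_min, v_max = min(values), max(values)
    span = (v_max - v_min) or 1.0  # avoid division by zero on constant data
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

# Bitcoin-style prices scaled into [-1, 1] for the ML engine
prices = [30000.0, 45000.0, 60000.0]
print(normalize(prices))                         # [-1.0, 0.0, 1.0]
print(one_hot("sell", ["buy", "sell", "hold"]))  # [0.0, 1.0, 0.0]
```

The resulting lists are exactly the kind of numerical vectors that get stacked into the tensors an ML engine consumes.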
There are excellent tools like Pandas, Numpy, scipy, and others that make the process of data transformation easier. However, most of these tools are Python libraries and frameworks - which means having to learn Python if you don’t know it already. Plus, when building intelligent apps (instead of just doing pure data analysis), this all needs to work on real-time data in production.
The tools mentioned above are not designed for building real-time apps. They are often designed for analytics/data science.
In your app, you will need to do this data compilation in real-time - and you can’t rely on a local script to help process your data. It becomes trickier if the team responsible for the initial training of the machine learning model is not the team responsible for deploying it out into production.
How data is loaded and processed in a static dataset is likely very different from how the data is loaded and processed in real-time as your app is live. The result often is two separate codebases that are maintained by different teams that are both responsible for doing the same thing! Ensuring that those codebases stay consistent and evolve together is another challenge to tackle.
Spice.ai handles the “compilation” of data for you.
You specify the data that your ML should learn from in a Spicepod. The Spice.ai runtime handles the logistics of gathering the data and compiling it into an AI-ready format.
It does this by using many techniques described earlier, such as normalization and one-hot encoding. And because we’re continuing to evolve Spice.ai, our data compilation will only get better over time.
In addition, the design of the Spice.ai runtime naturally ensures that the data used for both the training and real-time cases are consistent. Spice.ai uses the same data-components and runtime logic to produce the data. And not only that, you can take this a step further and share your Spicepod with someone else, and they would be able to use the same AI-ready data for their applications.
Spice.ai handles the process of compiling your data into an AI-ready format in a way that is consistent both during the training and real-time stages of the ML engine. A Spicepod defines which data to get and where to get it. Sharing this Spicepod allows someone else to use the same AI-ready data format in their application.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Phillip
In my previous post, Teaching Apps how to Learn with Spicepods, I introduced Spicepods as packages of configuration that describe an application’s data-driven goals and how it should learn from data. To leverage Spice.ai in your application, you can author a Spicepod from scratch or build upon one fetched from the spicerack.org registry. In this post, we’ll walk through the creation and authoring of a Spicepod step-by-step from scratch.
As a refresher, a Spicepod consists of:
We’ll create the Spicepod for the ServerOps Quickstart, an application that learns when to optimally run server maintenance operations based upon the CPU-usage patterns of a server machine.
We’ll also use the Spice CLI, which you can install by following the Getting Started guide or Getting Started YouTube video.
Modern web development workflows often include a file watcher to hot-reload so you can iteratively see the effect of your change with a live preview.
Spice.ai takes inspiration from this and enables a similar authoring experience for Spicepod manifests. If you first start the Spice.ai runtime in your application root before creating your Spicepod, it will watch for changes and apply them continuously so that you can develop in a fast, iterative workflow.
You would normally do this by opening two terminal windows side-by-side: one running the runtime via the spice run command, and one for entering CLI commands. In addition, you can open the Spice.ai dashboard at http://localhost:8000 to preview changes as you make them.
The easiest way to create a Spicepod is to use the Spice.ai CLI command: spice init <Spicepod name>
. We’ll make one in the ServerOps Quickstart application called serverops
.
The CLI saves the Spicepod manifest file in the spicepods
directory of your application. You can see it created a new serverops.yaml file, which should be included in your application and be committed to your source repository. Let’s take a look at it.
The initialized manifest file is very simple. It contains a name and three main sections:
We’ll walk through each of these in detail, and as a Spicepod author, you can always reference the documentation for the Spicepod manifest syntax.
You author and edit Spicepod manifest files in your favorite text editor with a combination of Spice.ai CLI helper commands. We eventually plan to have a VS Code extension and dashboard/portal editing abilities to make this even easier.
To build an intelligent, data-driven application, we must first start with data.
A Spice.ai dataspace is a logical grouping of data with definitions of how that data should be loaded and processed, usually from a single source. A combination of its data source and its name identifies it, for example, nasdaq/msft or twitter/tweets. Read more about Dataspaces in the Core Concepts documentation.
Let’s add a dataspace to the Spicepod manifest to load CPU metric data from a CSV file. This file is a snapshot of data from InfluxDB, a time-series database we like.
We can see this dataspace is identified by its source hostmetrics
and name cpu
. It includes a data
section with a file data connector, the path to the file, and a data processor to know how to process it. In addition, it defines a single measurement usage_idle
under the measurements section, which is a measurement of CPU load. In Spice.ai, measurements are the core primitive the AI engine uses to learn and are always numerical data. Spice.ai includes a growing library of community-contributed data connectors and data processors you can use in your Spicepod to access data. You can also contribute your own.
Finally, because the data is a snapshot of live data loaded from a file, we must set a Spicepod epoch_time
that defines the data’s start Unix time.
Now we have a dataspace, called hostmetrics/cpu
, that loads CSV data from a file and processes the data into a usage_idle
measurement. The file connector might be swapped out with the InfluxDB connector in a production application to stream real-time CPU metrics into Spice.ai. And in addition, applications can always send real-time data to the Spice.ai runtime through its API with a simple HTTP POST (and in the future, using Web Sockets and gRPC).
Now that the Spicepod has data, let’s define some data-driven actions so the ServerOps application can learn when is the best time to take them. We’ll add three actions using the CLI helper command, spice action add
.
And in the manifest:
The Spicepod now has data and possible actions, so we can now define how it should learn when to take them. Similar to how humans learn, we can set rewards or punishments for actions taken based on their effect and the data. Let’s add scaffold rewards for all actions using the spice rewards add
command.
We now have rewards set for each action. The rewards are uniform (all the same), meaning the Spicepod is rewarded the same for each action. Higher rewards are better, so if we change perform_maintenance
to 2, the Spicepod will learn to perform maintenance more often than the other actions. Of course, instead of setting these arbitrarily, we want to learn from data, and we can do that by referencing the state of data at each time-step in the time-series data as the AI engine trains.
The rewards themselves are just code. Currently, we support Python, either inline or in an external .py code file, and we plan to support several other languages. The reward code can access the time-step state through the `prev_state` and `new_state` variables and the dataspace name. For the full documentation, see Rewards.
Let’s add this reward code to perform_maintenance, which will reward performing maintenance when there is low CPU usage.
```python
cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
cpu_usage_delta = cpu_usage_prev - cpu_usage_new
reward = cpu_usage_delta / 100
```
This code takes the CPU usage (100 minus the idle time) delta between the previous time state and the current time state, and sets the reward to a normalized delta value between -1 and 1. When CPU usage moves from a higher `cpu_usage_prev` to a lower `cpu_usage_new`, it’s a better time to run server maintenance, so we reward the inverse of the delta, e.g. 80% - 50% = 30% = 0.3. However, if CPU usage moves from lower to higher, 50% - 80% = -30% = -0.3, it’s a bad time to run maintenance, so we provide a negative reward, or “punish” the action.
Through these rewards and punishments, together with the CPU metric data, the Spicepod will learn when it is a good time to perform maintenance, acting as the decision engine for the ServerOps application. You might be thinking you could write code without AI to do this, which is true, but handling the variety of cases, like CPU spikes, or patterns in the data, like cyclical server load, would take a lot of code and development time. Applying AI helps you build faster.
The manifest now has defined data, actions, and rewards. The Spicepod can get data to learn which actions to take and when based on the rewards provided.
If the Spice.ai runtime is running, the Spicepod automatically trains each time the manifest file is saved. As this happens, reward performance can be monitored in the dashboard.
Once a training run completes, the application can query the Spicepod for a decision recommendation by calling the recommendations API http://localhost:8000/api/v0.1/pods/serverops/recommendation. The API returns a JSON document that provides the recommended action, the confidence of taking that action, and when that recommendation is valid.
In the ServerOps Quickstart, this API is called from the server maintenance PowerShell script to make an intelligent decision on when to run maintenance. The ServerOps Sample, which uses live data, can be continuously trained to learn and adapt even as the live data changes due to load patterns changing.
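A sketch of how an application might consume such a recommendation: the `action` and `confidence` field names below are assumptions based on the description above, not the exact response schema:

```python
def choose_action(response, min_confidence=0.8):
    """Act on a recommendation only when its confidence is high enough.

    `action` and `confidence` are assumed field names; consult the
    Spice.ai API documentation for the exact response shape.
    """
    if response.get("confidence", 0.0) >= min_confidence:
        return response["action"]
    return "no_op"  # fall back to doing nothing on low confidence

sample = {"action": "perform_maintenance", "confidence": 0.92}
print(choose_action(sample))  # perform_maintenance
```

Gating on confidence like this lets the app adopt recommendations incrementally instead of blindly acting on every response.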
The full Spicepod manifest from this walkthrough can be added from spicerack.org using the spice add quickstarts/serverops
command.
Leveraging Spice.ai as the decision engine for your server maintenance application helps you build smarter applications faster, applications that continue to learn and adapt even as usage patterns change over time.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Luke
Announcing the release of Spice.ai v0.4.1-alpha!
This point release focuses on fixes and improvements to v0.4-alpha. Highlights include AI engine performance improvements, updates to the dashboard observations data grid, notification of new CLI versions, and several bug fixes.
A special acknowledgment to @Adm28, who added the CLI upgrade detection and prompt, which notifies users of new CLI versions and prompts to upgrade.
Overall training performance has been improved by up to 13% by removing a lock in the AI engine.
In versions before v0.4.1-alpha, performance was especially impacted when streaming new data during a training run.
The dashboard observations datagrid now automatically resizes to the window width, and headers are easier to read, with automatic grouping into dataspaces. In addition, column widths are also resizable.
When run, the Spice.ai CLI will now automatically check for new CLI versions, at most once a day.
If it detects a new version, it will print a notification to the console on spice version
, spice run
or spice add
commands prompting the user to upgrade using the new spice upgrade
command.
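The once-a-day throttle can be sketched as a simple timestamp comparison. This is an illustration of the idea, not the CLI's actual implementation:

```python
import time

DAY_SECONDS = 24 * 60 * 60

def should_check_for_update(last_check_ts, now=None):
    """Return True when at least a day has passed since the last check."""
    now = time.time() if now is None else now
    return (now - last_check_ts) >= DAY_SECONDS

print(should_check_for_update(0, now=100_000))       # True  (>= 1 day elapsed)
print(should_check_for_update(50_000, now=100_000))  # False (< 1 day elapsed)
```

In practice the last-check timestamp would be persisted to disk between CLI invocations.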
- Fixed an issue with a `time_format` of `hex` or values prefixed with `0x`.
- Fixed an issue with the `Spicepods` directory, and a resulting error when loading a non-Spicepod file.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
The Spice.ai project strives to help developers build applications that leverage new AI advances which can be easily trained, deployed, and integrated. A previous blog post introduced Spicepods: a declarative way to create AI applications with Spice.ai technology. While there are many libraries and platforms in the space, Spice.ai is focused on time-series data aligning to application-centric and frequently time-dependent data, and a Reinforcement Learning approach, which can be more developer-friendly than expensive, labeled supervised learning.
This post will discuss some of the challenges and directions for the technology we are developing.
Time series AI has become more popular over recent years, and there is extensive literature on the subject, including time-series-focused neural networks. Research in this space points to the likelihood that there is no silver bullet, and a single approach to time series AI will not be sufficient. However, for developers, this can make building a product complex, as it comes with the challenge of exploring and evaluating many algorithms and approaches.
A fundamental challenge of time series is the data itself. The shape and length are usually variable and can even be infinite (real-time streams of data). The volume of data required is often too much for simple and efficient machine learning algorithms such as Decision Trees. This challenge makes Deep Learning a popular choice for processing such data. Several types of neural networks have been shown to work well with time series, so let’s review some of the common classes:
While not a complete representation of classes of neural networks, this list represents the areas of the most potential for Spice.ai’s time-series AI technology. We also see other interesting paradigms to explore when improving the core technology, like Memory Augmented Neural Networks (MANN) or neural network-based Genetic Algorithms.
Reinforcement Learning (RL) has grown steadily, especially in fields like robotics. Usually, RL doesn’t require as much data processing as Supervised Learning, where large datasets can be demanding for hardware and people alike. RL is more dynamic: agents aren’t trained to replicate specific behaviors/outputs but to explore and ’exploit’ their environment to maximize a given reward.
Most of today’s research is based on environments the agent can interact with during the training process, known as online learning. Efficient training processes usually have multiple agent/environment pairs training together and sharing their experiences. An environment that agents can interact with enables taking actions that differ from the actual historical state, known as on-policy learning; using only past experiences, without an environment, is off-policy learning.
Spice.ai is initially taking an off-policy approach, where an environment (either pre-made or given by the user) is not required. Despite limiting the exploration of agents, this aligns to an application-centric approach as:
The Spice.ai approach to time series AI can be described as ‘Data-Driven’ Reinforcement Learning. This domain is very exciting, and we are building upon excellent research that is being published. The Berkeley Artificial Intelligence Research (BAIR) blog shows the potential of this field, as do many other research entities that have made great discoveries, like DeepMind, OpenAI, Facebook AI, and Google AI (among many others). We are inspired by and building upon all this research in Reinforcement Learning to develop core Spice.ai technology.
If you are interested in Reinforcement Learning, we recommend following these blogs, and if you’d like to partner with us on the mission of making it easier to build intelligent applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.
Corentin
We are excited to announce the release of Spice.ai v0.4-alpha!
Highlights include support for authoring reward functions in a code file, the ability to specify the time of recommendation, and ingestion support for transaction/correlation ids. Authoring reward functions in a code file is a significant improvement to the developer experience over specifying functions inline in the YAML manifest, and we are looking forward to your feedback on it!
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
The spice upgrade
command was added in the v0.3.1-alpha release, so you can now upgrade from v0.3.1 to v0.4 by simply running spice upgrade
in your terminal. Special thanks to community member @Adm28 for contributing this feature!
In addition to defining reward code inline, it is now possible to author reward code in functions in a separate Python file.
The reward function file path is defined by the reward_funcs
property.
A function defined in the code file is mapped to an action by authoring its name in the with
property of the relevant reward.
Example:
```yaml
training:
  reward_funcs: my_reward.py
  rewards:
    - reward: buy
      with: buy_reward
    - reward: sell
      with: sell_reward
    - reward: hold
      with: hold_reward
```
Learn more in the documentation: docs.spiceai.org/concepts/rewards/external
Spice.ai can now learn from cyclical patterns, such as daily, weekly, or monthly cycles.
To enable automatic cyclical field generation from the observation time, specify one or more time categories, such as `month` or `dayofweek`, in the `time` section of the pod manifest.
For example, by specifying `month`, the Spice.ai engine automatically creates a field in the AI engine data stream called `time_month_{month}`, with the value calculated from the month to which that timestamp relates.
Example:
```yaml
time:
  categories:
    - month
    - dayofweek
```
Supported category values are:

- `month`
- `dayofmonth`
- `dayofweek`
- `hour`
Learn more in the documentation: docs.spiceai.org/reference/pod/#time
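A sketch of how such cyclical fields could be derived from an observation timestamp. The `time_{category}_{value}` field-name pattern follows the description above; the rest is an illustration, not the engine's actual code:

```python
from datetime import datetime, timezone

def cyclical_fields(unix_ts, categories=("month", "dayofweek")):
    """Derive time-category fields like time_month_11 from a timestamp."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    values = {
        "month": t.month,          # 1-12
        "dayofmonth": t.day,       # 1-31
        "dayofweek": t.weekday(),  # 0 = Monday
        "hour": t.hour,            # 0-23
    }
    return {f"time_{c}_{values[c]}": 1 for c in categories}

# 2020-11-18 20:00:00 UTC, a Wednesday
print(cyclical_fields(1605729600))  # {'time_month_11': 1, 'time_dayofweek_2': 1}
```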
It is now possible to specify the time of recommendations fetched from the /recommendation
API.
Valid times are from pod epoch_time
to epoch_time + period
.
Previously the API only supported recommendations based on the time of the last ingested observation.
Requests are made in the following format: GET http://localhost:8000/api/v0.1/pods/{pod}/recommendation?time={unix_timestamp}
An example for quickstarts/trader
GET http://localhost:8000/api/v0.1/pods/trader/recommendation?time=1605729600
Specifying {unix_timestamp}
as 0
will return a recommendation based on the latest data. An invalid {unix_timestamp}
will return a result that has the valid time range in the error message:
```json
{
  "response": {
    "result": "invalid_recommendation_time",
    "message": "The time specified (1610060201) is outside of the allowed range: (1610057600, 1610060200)",
    "error": true
  }
}
```
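The range check behind this error message can be sketched as follows (a simplified illustration; the response shape mirrors the example above):

```python
def validate_recommendation_time(t, epoch_time, period):
    """Accept t within [epoch_time, epoch_time + period]; 0 means 'latest'."""
    if t == 0:
        return {"result": "ok", "time": "latest"}
    end = epoch_time + period
    if epoch_time <= t <= end:
        return {"result": "ok", "time": t}
    return {
        "result": "invalid_recommendation_time",
        "message": f"The time specified ({t}) is outside of the allowed "
                   f"range: ({epoch_time}, {end})",
        "error": True,
    }

# Reproduces the error case shown above: 1610060201 is one second past the end
print(validate_recommendation_time(1610060201, 1610057600, 2600))
```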
- Added support for ingestion of transaction/correlation ids (e.g. `order_id`, `trace_id`) in the pod manifest.
- Fixed an issue when the `training` section is not included in the manifest.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
The last post in this series, Making Apps that Learn and Adapt, described the shift from building AI/ML solutions to building apps that learn and adapt. But, how does the app learn? And as a developer, how do I teach it what it should learn?
With Spice.ai, we teach the app how to learn using a Spicepod.
Imagine you own a restaurant. You created a menu, hired staff, constructed the kitchen and dining room, and got off to a great start when it first opened. However, over the years, your customers’ tastes changed, you’ve had to make compromises on ingredients, and there’s a hot new place down the street… business is stagnating, and you know that you need to make some changes to stay competitive.
You have a few options. First, you could gather all the data, such as customer surveys, seasonal produce metrics, and staff performance profiles. You may even hire outside consultants. You then take this data to your office, and after spending some time organizing, filtering, and collating it, you’ve discovered an insight! Your seafood dishes sell poorly and cost the most… you are losing money! You spend several weeks or months perfecting a new menu, which you roll out with much fanfare! And thenβ¦ business is still poor. What!? How could this be? It was a data-driven approach! You start the process again. While this approach is a worthy option, it has long latency from data to learning to implementation.
Another option is to build real-time learning and adaption directly into the restaurant. Imagine a staff member whose sole job was learning and adapting how the restaurant should operate; let’s name them Blue. You write a guide for Blue that defines certain goal metrics, like customer food ratings, staff happiness, and of course, profit. Blue tracks each dish served, from start to finish, from who prepared it to its temperature, its costs, and its final customer taste rating. Blue not only learns from each customer review as each dish is consumed but also how dish preparation affects other goal metrics, like profitability. The restaurant staff consults Blue to determine any adjustments to improve goal metrics as they work. The latency from data to learning to adaptation has been reduced from weeks or months to minutes. This option, of course, is not feasible for most restaurants, but software applications can use this approach. Blue and his instructions are analogous to the Spice.ai runtime and manifest.
In the Spice.ai model, developers teach the app how to learn by describing goals and rewarding its actions, much like how a parent might teach a child. As these rewards are applied in training, the app learns what actions maximize its rewards towards the defined goals.
Returning to the restaurant example, you can think of the Spice.ai runtime as Blue, and Spicepod manifests as the guide on how Blue should learn. Individual staff members would consult with Blue for ongoing recommendations on decisions to make and how to act. These goals and rewards are defined in Spicepods or “pods” for short. Spicepods are packages of configuration that describe the application’s goals and how it should learn from data. Although it’s not a direct analogy, Spicepods and their manifests can be conceptualized similarly to Docker containers and Dockerfiles. Whereas Dockerfiles define the packaging of your app, Spicepods specify the packaging of your app’s learning and data.
Anatomy of a Spicepod
A Spicepod consists of:
Developers author Spicepods using the spice
CLI command such as with spice pod init <name>
or simply by creating a manifest file such as mypod.yaml
in the spicepods
directory of their application.
Here’s an example of the Tweet Recommendation Quickstart Spicepod manifest.
A screenshot of the Spicepod manifest for the Tweet Recommendation Quickstart
You can see the data definitions under dataspaces
, the actions the application may take under actions
, and their rewards when training.
In the next post, I’ll walk through in detail each section of the pod manifest. In the meantime, you can review the documentation for a complete reference of the Spicepod manifest syntax.
Spicepods as packages
On disk, Spicepods are generally layouts of a manifest file, seed data, and trained models, but they can also be exported as zipped packages.
A screenshot of the Spicepod layout for the trader quickstart application
When the runtime exports a Spicepod using the spice export
command, it is saved with a .spicepod
extension. It can then be shared, archived, or imported into another instance of the Spice.ai runtime.
Soon, we also expect to enable publishing of .spicepods
to spicerack.org, from where community-created Spicepods can easily be added to your application using spice add <pod name>
(currently, only Spice AI published pods are available on spicerack.org).
Treating Spicepods as packages and enabling their sharing and distribution through spicerack.org will help developers share their “restaurant guides” and build upon each other’s work, much like they do with npmjs.org or pypi.org. In this way, developers can together build better and more intelligent applications.
In the next post, we’ll dive deeper into authoring a Spicepod manifest to create an intelligent application. Follow @spice_ai on Twitter to get an update when we post.
If you haven’t already, read the first post in the series, Making Apps that Learn and Adapt.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Luke
In the Spice.ai announcement blog post, we shared some of the inspiration for the project stemming from challenges in applying and integrating AI/ML into a neurofeedback application. Building upon those ideas, in this post, we explore the shift in approach from a focus of data science and machine learning (ML) to apps that learn and adapt.
As a developer, I’ve followed the AI/ML space with keen interest and been impressed with the advances and announcements that only seem to be increasing. stateof.ai recently published its 2021 report, and once again, it’s been another great year of progress. At the same time, it’s still more challenging than ever for mainstream developers to integrate AI/ML into their applications. For most developers, where AI/ML is not their full-time job, and without the support of a dedicated ML team, creating and developing an intelligent application that learns and adapts is still too hard.
Most solutions on the market, even those that claim they are for developers, focus on helping make ML easier instead of making it easier to build applications. These solutions have been great for advancing ML itself but have not helped developers leverage ML in their apps to make them intelligent. Even when a developer successfully integrates ML into an application, it might make that application smart, but often does not help the app continue to learn and adapt over time.
Traditionally, the industry has viewed AI/ML as separate from the application. A pipeline, service, or team is provided with data, which trains on that data, and can then provide answers or insights. These solutions are often created with a waterfall-like approach, gathering and defining requirements, designing, implementing, testing, and deploying. Sometimes this process can take months or even years.
With Spice.ai, we propose a new approach to building applications. By bringing AI/ML alongside your compute and data and incorporating it as part of your application, the app can incrementally adopt recommendations from the AI engine and in addition the AI engine can learn from the application’s data and actions. This approach shifts from waterfall-like to agile-like, where the AI engine ingests streams of application and external data, along with the results of the application’s actions, to continuously learn. This virtuous feedback cycle from the app to the AI engine and back again enables the app to get smarter and adapt over time. In this approach, building your application is developing the ML.
Being part of the application is not just conceptual. Development teams deploy the Spice.ai runtime and AI engine with the application as a sidecar or microservice, enabling the app services and runtime to work together and for data to be kept application local. A developer teaches the AI engine how to learn by defining application goals and rewards for actions the application takes. The AI Engine observes the application and the consequences of its actions, which feeds into its experience. As the AI engine learns, the application can adapt.
As developers shift from thinking about disparate applications and ML to building applications where AI that learns and adapts is integrated as a core part of the application logic, a new class of intelligent applications will emerge. And as technical talent becomes even more scarce, applications built this way will be necessary, not just to stay competitive but to be built at all.
In the next post, I’ll discuss the concept of Spicepods, bundles of configuration that describes how the application should learn, and how the Spice.ai runtime hosts and uses them to help developers make applications that learn.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started! 🎉
Luke
We are excited to announce the release of Spice.ai v0.3.1-alpha! 🎉
This point release focuses on fixes and improvements to v0.3-alpha. Highlights include the ability to specify both seed and runtime data, to select custom named fields for `time` and `tags`, a new `spice upgrade` command, and several bug fixes.
A special acknowledgment to @Adm28, who added the new `spice upgrade` command, which enables the CLI to self-update and, in turn, auto-update the runtime.
The CLI can now be updated using the new `spice upgrade` command. This command will check for, download, and install the latest Spice.ai CLI release, which will become active on its next run.
When run, the CLI will check for the matching version of the Spice.ai runtime and will automatically download and install it as necessary.
The versions of both the Spice.ai CLI and runtime can be checked with the `spice version` CLI command.
When working with streaming data sources, like market prices, it’s often also useful to seed the dataspace with historical data. Spice.ai enables this with the new `seed_data` node in the dataspace configuration. The syntax is exactly the same as the `data` syntax. For example:
```yaml
dataspaces:
  - from: coinbase
    name: btcusd
    seed_data:
      connector: file
      params:
        path: path/to/seed/data.csv
      processor:
        name: csv
    data:
      connector: coinbase
      params:
        product_ids: BTC-USD
      processor:
        name: json
```
The seed data will be fetched first, before the runtime data is initialized. Both sets of connectors and processors use the dataspace-scoped `measurements`, `categories`, and `tags` for processing, and both data sources are merged in the pod-scoped observation timeline.
Before v0.3.1-alpha, data was required to include a specific `time` field. In v0.3.1-alpha, the JSON and CSV data processors now support selecting a specific field to populate the time field. An example selector using the `created_at` column for `time` is:
```yaml
data:
  processor:
    name: csv
    params:
      time_selector: created_at
```
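The effect of `time_selector` can be illustrated with a small Python sketch. The field names here are hypothetical, and the real CSV processor lives inside the runtime; this only mirrors the selection behavior:

```python
import csv
import io

def process_csv(raw, time_selector="time"):
    """Read CSV rows and populate each observation's time from the
    column named by time_selector, keeping the rest as data fields."""
    reader = csv.DictReader(io.StringIO(raw))
    observations = []
    for row in reader:
        time = row.pop(time_selector)
        observations.append({"time": time, "data": row})
    return observations

raw = "created_at,likes\n1632000000,42\n1632000060,17\n"
obs = process_csv(raw, time_selector="created_at")
print(obs[0]["time"])  # → 1632000000
```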
Before v0.3.1-alpha, tags were required to be placed in a `_tags` field. In v0.3.1-alpha, any field can now be selected to populate tags. Tags are pod-unique string values, and the union of all selected fields makes up the resulting tag list. For example:
```yaml
dataspace:
  from: twitter
  name: tweets
  tags:
    selectors:
      - tags
      - author_id
    values:
      - spice_ai
      - spicy
```
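A rough Python sketch of the selection logic, for illustration only: the tag list is the union of the values found in all selected fields.

```python
def collect_tags(observation, selectors):
    """Union the values of all selected fields into a single tag list."""
    tags = set()
    for field in selectors:
        value = observation.get(field)
        if value is None:
            continue
        if isinstance(value, list):
            tags.update(value)
        else:
            tags.add(value)
    return sorted(tags)

tweet = {"tags": ["spice_ai", "spicy"], "author_id": "spice_ai", "text": "hi"}
print(collect_tags(tweet, ["tags", "author_id"]))  # → ['spice_ai', 'spicy']
```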
- New `spice upgrade` command for self-upgrade of the Spice.ai CLI.
- New `seed_data` node in the dataspace configuration, enabling the dataspace to be seeded with an alternative source of data.
- New `time_selector` parameter for selecting the field used to populate `time`.
- New `selectors` list for selecting the fields used to populate tags.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
We are excited to announce the release of Spice.ai v0.3-alpha! 🎉
This release adds support for ingestion, automatic encoding, and training of categorical data, enabling more use-cases and datasets beyond just numerical measurements. For example, perhaps you want to learn from data that includes a category of t-shirt sizes, with discrete values, such as small, medium, and large. The v0.3 engine now supports this and automatically encodes the categorical string values into numerical values that the AI engine can use. Also included is a preview of data visualizations in the dashboard, which is helpful for developers as they author Spicepods and dataspaces.
A special acknowledgment to @sboorlagadda, who submitted the first-ever Spice.ai feature contribution from the community! He added the ability to list pods from the CLI with the new `spice pods list` command. Thank you, @sboorlagadda!
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
In v0.1, the runtime and AI engine only supported ingesting numerical data. In v0.2, tagged data was accepted and automatically encoded into fields available for learning. In this release, v0.3, categorical data can now also be ingested and automatically encoded into fields available for learning. This is a breaking change: the manifest format now separates numerical measurements from categorical data.
Pre-v0.3, the manifest author specified numerical data using the `fields` node.
In v0.3, numerical data is now specified under `measurements` and categorical data under `categories`. E.g.
```yaml
dataspaces:
  - from: event
    name: stream
    measurements:
      - name: duration
        selector: length_of_time
        fill: none
      - name: guest_count
        selector: num_guests
        fill: none
    categories:
      - name: event_type
        values:
          - dinner
          - party
      - name: target_audience
        values:
          - employees
          - investors
    tags:
      - tagA
      - tagB
```
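The automatic encoding of categorical values is conceptually a one-hot encoding over the declared values. A minimal sketch of that idea (not the AI engine's actual implementation):

```python
def one_hot(category_values, value):
    """Encode a categorical string value as a one-hot numeric vector
    over the values declared for that category in the manifest."""
    return [1.0 if v == value else 0.0 for v in category_values]

# The "event_type" category declared above has two values.
event_type_values = ["dinner", "party"]
print(one_hot(event_type_values, "party"))  # → [0.0, 1.0]
```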
A top piece of community feedback was the ability to visualize data. After first running Spice.ai, we’d often hear from developers, “how do I see the data?”. A preview of data visualizations is now included in the dashboard on the pod page.
Once the Spice.ai runtime has started, you can view the loaded pods on the dashboard and fetch them via the API at localhost:8000/api/v0.1/pods. To make it even easier, we’ve added the ability to list them via the CLI with the new `spice pods list` command, which shows the list of pods and their manifest paths.
A new Coinbase data connector is included in v0.3, enabling the streaming of live market ticker prices from Coinbase Pro. Enable it by specifying the `coinbase` data connector and providing a list of Coinbase Pro product ids, e.g. “BTC-USD”. A new sample demonstrating it is also available, with its associated Spicepod available from the spicerack.org registry. Get it with `spice add samples/trader`.
A new Tweet Recommendation Quickstart has been added. Given the past tweet activity and metrics of an account, this app can recommend when to tweet, comment, or retweet to maximize the account’s like count, interaction rate, and reach.
A new Trader Sample has been added in addition to the Trader Quickstart. The sample uses the new Coinbase data connector to stream live Coinbase Pro ticker data for learning.
- JSON is now supported for the `/observations` API. Previously, only CSV was supported.
- Fixed an issue where the `/observations` endpoint was not providing fully qualified field names.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Announcing the release of Spice.ai v0.2.1-alpha! 🎉
This point release focuses on fixes and improvements to v0.2-alpha. Highlights include the ability to specify how missing data should be treated and a new production mode for `spiced`.
This release supports the ability to specify how the runtime should treat missing data. Previous releases filled missing data with the last value (or initial value) in the series. While this makes sense for some data, e.g., market prices of a stock or cryptocurrency, it does not make sense for discrete data, e.g., ratings. In v0.2.1, developers can now add the `fill` parameter on a dataspace field to specify the behavior. This release supports the fill types `previous` and `none`. The default is `previous`.
Example in a manifest:
```yaml
dataspaces:
  - from: twitter
    name: tweets
    fields:
      - name: likes
        fill: none # The new fill parameter
```
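The difference between the two fill types can be sketched in Python. This is illustrative only; the real logic lives inside the runtime:

```python
def fill_series(values, fill="previous"):
    """Fill missing (None) entries in a series: 'previous' carries the
    last seen value forward; 'none' leaves the gaps unfilled."""
    if fill == "none":
        return values
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        else:
            last = v
        filled.append(v)
    return filled

prices = [10.0, None, None, 12.5]
print(fill_series(prices))               # → [10.0, 10.0, 10.0, 12.5]
print(fill_series(prices, fill="none"))  # → [10.0, None, None, 12.5]
```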
`spiced` now defaults to a new production mode when run standalone (not via the CLI), with development mode now explicitly set with the `--development` flag. Production mode does not activate development-time features, such as the Spicepod file watcher. The CLI always runs `spiced` in development mode, as it is not expected to be used in production deployments.
- Added the `fill` parameter to dataspace fields to specify how missing values should be treated.
- Combined into a single `spiceai` release instead of separate `spice` and `spiced` releases.
- Added a production mode for `spiced`. Production mode does not activate the file watcher.
- Fixed an issue where `epoch_time` was not set, which would cause data not to be sent to the AI engine.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
We are excited to announce the release of Spice.ai v0.2-alpha! 🎉
This release is the first major version since the initial v0.1 announcement and includes significant improvements based upon community and customer feedback. If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
In the first release, the runtime and AI engine could only ingest numerical data. In v0.2, tagged data is accepted and automatically encoded into fields available for learning. For example, it’s now possible to include a “liked” tag when using tweet data, automatically encoded to a 0/1 field for training. Both CSV and the new JSON observation formats support tags. The v0.3 release will add additional support for sets of categorical data.
Previously, the runtime would trigger each data connector to fetch on a 15-second interval. In v0.2, we upgraded the interface for data connectors to a push/streaming model, which enables continuous streaming of data into the environment and AI engine.
Spice.ai works together with your application code and works best when it’s provided continuous feedback. This feedback could come from the application itself, for example, ratings, likes, thumbs-up/down, or profit from trades, or from external expertise. The interpretations API was introduced in v0.1.1, and v0.2 adds AI engine support, providing a way to give meaning, or an interpretation, to ranges of time-series data, which are then available within reward functions. For example, a time range of stock prices could be a “good time to buy,” or perhaps Tuesday mornings are a “good time to tweet,” and an application or expert can teach the AI engine this through interpretations, providing a shortcut to its learning.
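Conceptually, an interpretation attaches a label to a time range, which a reward function can then consult. A hypothetical Python sketch of that idea; the record shape and function names here are invented for illustration and do not reflect the actual interpretations API:

```python
# Hypothetical interpretation records: a label attached to a time range.
interpretations = [
    {"start": 1632000000, "end": 1632003600, "name": "good_time_to_buy"},
]

def has_interpretation(interpretations, time, name):
    """True if any interpretation with the given name covers `time`,
    e.g. for use inside a reward function."""
    return any(i["name"] == name and i["start"] <= time <= i["end"]
               for i in interpretations)

print(has_interpretation(interpretations, 1632001800, "good_time_to_buy"))  # → True
```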
- New `/pods//dataspaces` API.
- New `/pods//diagnostics` API.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
AI has recently seen some impressive advances, like with OpenAI Codex and DeepMind AlphaFold 2. And at the same time, for most developers, leveraging AI to create intelligent applications is still way too hard. The Data Science Hierarchy of Needs pyramid from 2017 still illustrates it well; there are too many unmet needs in applying ML in applications.
We faced the same AI development challenges many developers do. Even though we had years of engineering experience at Microsoft and GitHub, there was too much to learn and build, and we simply didn’t have the time, resources, or tools to learn and utilize AI effectively in the project. After experiencing this pain ourselves, we saw an opportunity to make it better for everyone.
Today, we are making Spice.ai, a new open source project that helps developers use deep learning to create intelligent applications, available on GitHub. We’re looking for feedback on the direction. It’s not finished; in fact, we only started this summer, and we invite you to try out the alpha.
Figure 1. Adding a Spice.ai pod, training and getting a recommendation in three commands
Like many developer stories, it all started with a side-project. We were interested in neurofeedback, a type of biofeedback therapy that reinforces healthy brain function but can cost up to $15,000. We wanted to make it accessible to more people, so we set out to build a system that leverages AI to deliver neurofeedback more cost-effectively. Using AI for the application was much more challenging than expected, and this sparked the inspiration for Spice.ai.
In the neurofeedback project, we worked with brain activity EEG data - time series data. We realized that time series data applies to many domains, from health and biometrics to finance, sales, logistics, security, IoT, and application monitoring. The amount of time series data in these fields is growing exponentially, and extracting insights from this data to make more intelligent software will determine the success of the next generation of applications.
We also realized that time series data is often sensitive, such as health, financial, and security data. Instead of sending all data to a 3rd-party AI service, we needed the choice to bring the AI runtime to wherever our data and compute lived, whether in the cloud, on-premises, or on edge devices.
Spice.ai is an open source, portable runtime for training and using deep learning on time series data. It’s written in Golang and Python and runs as a container or microservice with applications calling a simple HTTP API. It’s deployable to any public cloud, on-premises, and edge.
The vision for Spice.ai is to make creating intelligent applications as easy as possible for developers in their development environment of choice. Spice.ai brings AI development to their editor in any language or framework with a fast, iterative, inner development loop, continuous-integration (CI), and continuous-deployment (CD) workflows.
The Spice.ai runtime also includes a library of community-driven components for streaming and processing time series data, enabling developers to quickly and easily combine data with learning to create intelligent models.
Developers can write easy-to-understand and reusable “pods” with manifests that connect these data components with a simple definition of the learning environment. These pods also serve as a package for the resulting trained model.
Modern developers build together with the community by leveraging registries such as npm, NuGet, and pip. The registry for sharing and using pods is spicerack.org. As the community shares more and more pods, developers can quickly build upon each others’ work, initially by sharing manifests and eventually by reusing fully-trained models.
We are currently piloting Spice.ai with several companies to create the next generation of modern applications, such as optimizing in-store pickups for a large online retailer or scheduling optimizations for healthcare workers and resources. We’ve already seen some cool use cases, including suspicious login detection, intelligent cloud-spend analysis, and order routing for a food delivery app.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page.
This mission is a huge undertaking and Spice.ai v0.1-alpha has many gaps, including limited deep learning algorithms and training scale, streaming data, simulated environments, and offline learning modes. Pods aren’t searchable or even listed on spicerack.org yet. But if the vision resonates with you, join us! Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started! 🎉