This is the blog section.
Files in these directories will be listed in reverse chronological order.
Announcing the release of Spice v1.0-rc.2
Spice v1.0.0-rc.2 is the second release candidate for the first major version of Spice.ai OSS. This release continues to build on the stability of Spice for production use, including key Data Connector graduations, bug fixes, and AI features.
MS SQL and File Data Connectors: Graduated from Alpha to Beta.
GraphQL and Databricks Delta Lake Data Connectors: Graduated from Beta to Release Candidate.
gospice SDK Release: The Spice Go SDK has been updated to v7.0, adding support for refreshing datasets and upgrading dependencies.
Azure AI Support: Added support for both LLMs and embedding models. Example spicepod.yml configuration:

embeddings:
  - name: azure
    from: azure:text-embedding-3-small
    params:
      endpoint: https://your-resource-name.openai.azure.com
      azure_api_version: 2024-08-01-preview
      azure_deployment_name: text-embedding-3-small
      azure_api_key: ${ secrets:SPICE_AZURE_API_KEY }

models:
  - name: azure
    from: azure:gpt-4o-mini
    params:
      endpoint: https://your-resource-name.openai.azure.com
      azure_api_version: 2024-08-01-preview
      azure_deployment_name: gpt-4o-mini
      azure_api_key: ${ secrets:SPICE_AZURE_TOKEN }
Accelerate subsets of columns: Spice now supports acceleration for specific columns from a federated source. Specify the desired columns directly in the Refresh SQL for more selective and efficient data acceleration.
Example spicepod.yml configuration:

datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
    acceleration:
      refresh_sql: SELECT tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, total_amount FROM taxi_trips
Sharepoint Authentication Parameters: Sharepoint authentication now uses access tokens instead of authorization codes, via the sharepoint_bearer_token parameter. The sharepoint_auth_code parameter has been removed.
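As a sketch of the updated configuration (the secret name below is illustrative; the dataset path mirrors the Sharepoint example used elsewhere in these notes):

```yaml
datasets:
  - from: sharepoint:drive:Documents/path:/important_documents/
    name: important_documents
    params:
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
```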
Data Connector Delimiters: The from parameter of the dataset configuration now supports / and :// as delimiters, in addition to :. The following examples are equivalent:

from: postgres://my_postgres_table
from: postgres/my_postgres_table
from: postgres:my_postgres_table

Some data connectors, such as s3, which only accepts ://, place further restrictions on the allowed delimiter.
The file data connector has changed how it interprets the :// delimiter to match how most other URL parsers work, i.e. file://my_file_path. Previously, the file path was interpreted as /my_file_path. Now, it is interpreted as a relative path, i.e. my_file_path.
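For illustration (file names hypothetical, and assuming the connector follows standard URL parsing for absolute paths):

```yaml
datasets:
  - from: file://data/my_file.csv        # now resolved relative to the working directory
    name: relative_example
  - from: file:///var/data/my_file.csv   # a leading slash after :// keeps the path absolute
    name: absolute_example
```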
Spice Search limit: The limit is now applied to the final search result, instead of separately to each dataset involved in a search before aggregation.
- baggage header by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3722
- llms dependencies: mistralrs, async-openai by @Jeadie in https://github.com/spiceai/spiceai/pull/3725
- jsonl for object store by @Jeadie in https://github.com/spiceai/spiceai/pull/3726
- create_accelerated_table by @sgrebnov in https://github.com/spiceai/spiceai/pull/3739
- sentence_*_config.json, download HF async, use TEI functions by @Jeadie in https://github.com/spiceai/spiceai/pull/3724
- http_requests metric and deprecate http_requests_total by @sgrebnov in https://github.com/spiceai/spiceai/pull/3748
- Map type mapping to arrow type by @Sevenannn in https://github.com/spiceai/spiceai/pull/3776
- /v1/packages/generate API to generate a Spicepod package from a GitHub repo. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3782
- Spice-Target-Source header for spice add by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3783
- spice connect for connecting to existing Spice.ai instances by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3790
- eval spicepod component; basic HTTP api to run eval. by @Jeadie in https://github.com/spiceai/spiceai/pull/3766
- trace_id & parent_span_id overrides for v1/chat/completion by @Jeadie in https://github.com/spiceai/spiceai/pull/3791
- :, / or :// as the delimiter for the data connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3821
- read_write mode support for Postgres Data Connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/3813
- spice.ai data connector dataset path format to <org>/<app>/datasets/<table_reference> by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3828
- Memtable by @Jeadie in https://github.com/spiceai/spiceai/pull/3829
- spice login abfs by @Sevenannn in https://github.com/spiceai/spiceai/pull/3844
- crates/llms dependencies to ‘spiceai’ branch by @Jeadie in https://github.com/spiceai/spiceai/pull/3846
- spice.eval.{results, runs} tables. by @Jeadie in https://github.com/spiceai/spiceai/pull/3780
- tokio::test per test/model by @Jeadie in https://github.com/spiceai/spiceai/pull/3696
- max_completion_tokens vs max_tokens for openai vs azure by @Jeadie in https://github.com/spiceai/spiceai/pull/3869
- evalconverter that creates spice eval components. by @Jeadie in https://github.com/spiceai/spiceai/pull/3864
- evals accelerated tables updates in debug mode by @sgrebnov in https://github.com/spiceai/spiceai/pull/3884
- endpoint parameter required by @sgrebnov in https://github.com/spiceai/spiceai/pull/3883

Full Changelog: https://github.com/spiceai/spiceai/compare/v1.0.0-rc.1...v1.0.0-rc.2
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v1.0-rc.1
Spice v1.0.0-rc.1 marks the release candidate for the first major version of Spice.ai OSS. This milestone includes key Connector and Accelerator graduations and bug fixes, positioning Spice for a stable and production-ready release.
API Key Authentication: Spice now supports optional authentication for API endpoints via configurable API keys, for additional security and control over runtime access.
Example Spicepod.yml configuration:
runtime:
  auth:
    api-key:
      enabled: true
      keys:
        - ${ secrets:api_key } # Load from a secret store
        - my-api-key # Or specify directly

Usage:

- Pass the API key in the X-API-Key header.
- Pass the API key in the Authorization header as a Bearer token.
- Pass the API key with the --api-key flag for CLI commands.

For more details on using API Key auth, refer to the API Auth documentation.
DuckDB Data Connector: Has graduated from Beta to Release Candidate.
Arrow and DuckDB Data Accelerators: Both have graduated from Beta to Release Candidates.
Debezium Kafka Integration: Spice now supports secure authentication and encryption options for Kafka connections when using Debezium for Change Data Capture (CDC). The previous limitation of PLAINTEXT protocol-only connections has been lifted; Spice now supports additional Kafka security configurations.
Example Spicepod.yml configuration:
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: my_dataset
    params:
      kafka_security_protocol: SASL_SSL
      kafka_sasl_mechanism: SCRAM-SHA-512
      kafka_sasl_username: kafka
      kafka_sasl_password: ${secrets:kafka_sasl_password}
      kafka_ssl_ca_location: ./certs/kafka_ca_cert.pem
Model Parameters: The params.spice_tools parameter has been replaced by params.tools. Backward compatibility is maintained for existing configurations using params.spice_tools.
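A minimal sketch of the renamed parameter (the model and the auto value are illustrative assumptions, not taken from the release notes):

```yaml
models:
  - name: my_model
    from: openai:gpt-4o-mini
    params:
      tools: auto   # previously params.spice_tools
```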
Dataset Accelerator State: The ready_state parameter has been moved to the dataset level.
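A sketch of the new placement (dataset names illustrative):

```yaml
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    ready_state: on_load   # now a dataset-level field, no longer under acceleration
    acceleration:
      enabled: true
```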
Ready Handler Response: The response body of the /v1/ready handler has been changed from Ready (uppercase) to ready (lowercase) for consistency and adherence to standards.
Default Kafka Security for Debezium: The default kafka_security_protocol parameter for Debezium datasets has changed from PLAINTEXT to SASL_SSL, improving security by default.
Metrics Name Updates: Adjustments have been made to specific metrics for improved observability and accuracy:
Before | v1.0-rc.1 |
---|---|
catalogs_load_error | catalog_load_errors |
catalogs_status | catalog_load_state |
datasets_acceleration_append_duration_ms, datasets_acceleration_load_duration_ms | dataset_acceleration_refresh_duration_ms {mode: append/full} |
datasets_acceleration_last_refresh_time | dataset_acceleration_last_refresh_time_ms |
datasets_acceleration_refresh_error | dataset_acceleration_refresh_errors |
datasets_count | dataset_active_count |
datasets_load_error | dataset_load_errors |
datasets_status | dataset_load_state |
datasets_unavailable_time | dataset_unavailable_time_ms |
embeddings_count | embeddings_active_count |
embeddings_load_error | embeddings_load_errors |
embeddings_status | embeddings_load_state |
flight_do_action_duration_ms, flight_do_get_get_primary_keys_duration_ms, flight_do_get_get_catalogs_duration_ms, flight_do_get_get_schemas_duration_ms, flight_do_get_get_sql_info_duration_ms, flight_do_get_table_types_duration_ms, flight_do_get_get_tables_duration_ms, flight_do_get_prepared_statement_query_duration_ms, flight_do_get_simple_duration_ms, flight_do_get_statement_query_duration_ms, flight_do_put_duration_ms, flight_handshake_request_duration_ms, flight_list_actions_duration_ms, flight_get_flight_info_request_duration_ms | flight_request_duration_ms {method: method_name, command: command_name} |
flight_do_action_requests, flight_do_exchange_data_updates_sent, flight_do_exchange_requests, flight_do_put_requests, flight_do_get_requests, flight_handshake_requests, flight_list_actions_requests, flight_list_flights_requests, flight_get_flight_info_requests, flight_get_schema_requests | flight_requests {method: method_name, command: command_name} |
http_requests_duration_ms | http_request_duration_ms |
models_count | model_active_count |
models_load_duration_ms | model_load_duration_ms |
models_load_error | model_load_errors |
models_status | model_load_state |
tool_count | tool_active_count |
tool_load_error | tool_load_errors |
tools_status | tool_load_state |
query_count | query_executions |
query_execution_duration | query_execution_duration_ms |
results_cache_hit_count | results_cache_hits |
results_cache_item_count | results_cache_items_count |
results_cache_max_size | results_cache_max_size_bytes |
results_cache_request_count | results_cache_requests |
results_cache_size | results_cache_size_bytes |
secrets_stores_load_duration_ms | secrets_store_load_duration_ms |
bytes_processed | query_processed_bytes |
bytes_returned | query_returned_bytes |
spiced_runtime_flight_server_start | runtime_flight_server_started |
spiced_runtime_http_server_start | runtime_http_server_started |
views_load_error | view_load_errors |
- refresh-sql via CLI by @sgrebnov in https://github.com/spiceai/spiceai/pull/3374
- params.model_type for most HF LLMs by @Jeadie in https://github.com/spiceai/spiceai/pull/3342
- query_duration_seconds and http_requests_duration_seconds with milliseconds metrics by @sgrebnov in https://github.com/spiceai/spiceai/pull/3251
- Extension<Runtime> to HTTP routes to simplify tooling in NSQL. by @Jeadie in https://github.com/spiceai/spiceai/pull/3384
- --pods-watcher-enabled by @Jeadie in https://github.com/spiceai/spiceai/pull/3428
- datatype_is_semantically_equal in verify_schema by @Sevenannn in https://github.com/spiceai/spiceai/pull/3423
- TableReference quoting for MySQL by @Jeadie in https://github.com/spiceai/spiceai/pull/3461
- params.tools, not params.spice_tools. Allow backwards compatibility to params.spice_tools. by @Jeadie in https://github.com/spiceai/spiceai/pull/3473
- v1/nsql by @Jeadie in https://github.com/spiceai/spiceai/pull/3487
- document_similarity to return markdown, not JSON. by @Jeadie in https://github.com/spiceai/spiceai/pull/3477
- datafusion-table-providers version by @Jeadie in https://github.com/spiceai/spiceai/pull/3503
- text-embeddings-inference and mistral.rs from downstream. by @Jeadie in https://github.com/spiceai/spiceai/pull/3505
- ready_state to dataset level by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3526
- --force option to spice upgrade to force it to upgrade to the latest released version by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3527
- spice search error handling by @sgrebnov in https://github.com/spiceai/spiceai/pull/3571
- spice search to default to only datasets with embeddings by @sgrebnov in https://github.com/spiceai/spiceai/pull/3588
- none by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3598
- /v1/datasets APIs when app is locked by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3601
- query_invocations to query_executions by @sgrebnov in https://github.com/spiceai/spiceai/pull/3613
- --set-runtime CLI flags by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3619
- v1/datasets api to indicate if dataset can be used in vector search by @sgrebnov in https://github.com/spiceai/spiceai/pull/3644
- spice search to warn if dataset is not ready and won’t be included in search by @sgrebnov in https://github.com/spiceai/spiceai/pull/3590
- llms crate, with basic Anthropic test. by @Jeadie in https://github.com/spiceai/spiceai/pull/3647
- microsoft/Phi-3-mini-4k-instruct to llms crate testing, with MODEL_SKIPLIST & MODEL_ALLOWLIST by @Jeadie in https://github.com/spiceai/spiceai/pull/3690

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.20.0-beta...v1.0.0-rc.1
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.20-beta
Spice v0.20.0-beta improves federated query performance with column pruning and adds support for Metal (Apple Silicon) and CUDA (NVIDIA) accelerators. The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from Beta to Release Candidates. The Arrow, DuckDB, and SQLite Data Accelerators have graduated from Alpha to Beta.
Data Connectors: The S3, PostgreSQL, MySQL, and GitHub Data Connectors have graduated from beta to release candidate.

Data Accelerators: The Arrow, DuckDB, and SQLite Data Accelerators have graduated from alpha to beta.
Metal and CUDA Support: Added support for Metal (Apple Silicon) and CUDA (NVIDIA) for AI/ML workloads, including embeddings and local LLM inference.
For instructions on compiling a Metal or CUDA binary, see the Installation Docs.
Example invalid connection string:
DRIVER={/path/to/driver.so};SERVER=localhost;DATABASE=master
Example valid connection string:
DRIVER={My ODBC Driver};SERVER=localhost;DATABASE=master
Where My ODBC Driver is the name of an ODBC driver registered in the ODBC driver manager.
- metal & cuda flags for spice by @Jeadie in https://github.com/spiceai/spiceai/pull/3212
- spice upgrade by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3341

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.4-beta...v0.20.0-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.4-beta
Spice v0.19.4-beta introduces a new localpod Data Connector, improvements to accelerator resiliency and control, and a new configuration to control when accelerated datasets are considered ready.

localpod Connector: Implement a “tiered” acceleration strategy with a new localpod Data Connector that can be used to accelerate datasets from other datasets registered in Spice.
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_check_interval: 60s
  - from: localpod:my_dataset
    name: my_localpod_dataset
    acceleration:
      enabled: true
Refreshes on the localpod’s parent dataset will automatically be synchronized with the localpod dataset.
Improved Accelerator Resiliency: When Spice is restarted, if the federated source for a dataset configured with a file-based accelerator is not available, the dataset will still load from the existing file data and will attempt to connect to the federated source in the background for future refreshes.
Accelerator Ready State: Control when an accelerated dataset is considered “ready” by the runtime with the new ready_state parameter.
datasets:
  - from: s3://my_bucket/my_dataset
    name: my_dataset
    acceleration:
      enabled: true
      ready_state: on_load # or on_registration

- ready_state: on_load: Default. The dataset is considered ready after the initial load of the accelerated data. For file-based accelerated datasets that have existing data, this means the dataset is ready immediately.
- ready_state: on_registration: The dataset is considered ready when the dataset is registered in Spice. Queries against this dataset before the data is loaded will fall back to the federated source.

Accelerated datasets configured with ready_state: on_load (the default behavior) that are not ready will return an error instead of returning zero results.
- ROLLUP and GROUPING by @sgrebnov in https://github.com/spiceai/spiceai/pull/3277
- stddev by @sgrebnov in https://github.com/spiceai/spiceai/pull/3279
- spice_sys_dataset_checkpoint to store federated table schema by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3303

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.3-beta...v0.19.4-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.3-beta
Spice v0.19.3-beta improves the performance and stability of data connectors and accelerators, including faster queries across multiple federated sources by optimizing how filters are applied. Anthropic has also been added as an LLM provider.
DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, expanding TPC-DS coverage and correctness.
GitHub Data Connector Beta Milestone: The GitHub Data Connector has graduated to Beta after extensive testing, stability, and performance improvements.
Anthropic Models Provider: Anthropic has been added as an LLM provider, including support for streaming.
Example spicepod.yml:

models:
  - from: anthropic:claude-3-5-sonnet-20240620
    name: claude_3_5_sonnet
    params:
      anthropic_api_key: ${ secrets:SPICE_ANTHROPIC_API_KEY }
None.
- text_embedding_inference::Infer for more complete embedding solution by @Jeadie in https://github.com/spiceai/spiceai/pull/3199
- v1/nsql. by @Jeadie in https://github.com/spiceai/spiceai/pull/3105
- localpod Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3249

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.2-beta...v0.19.3-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.2-beta
Spice v0.19.2-beta continues to improve performance and stability of data connectors and data accelerators, further expands TPC-DS coverage, and includes several bug fixes.
DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, improving TPC-DS query support and correctness.
TPC-DS Snapshots: Extended support for TPC-DS benchmarks with added snapshot tests for validating query plans and result accuracy.
PostgreSQL Accelerator Beta: The Postgres Data Accelerator has been promoted to Beta quality.

The hive_infer_partitions parameter has been changed to hive_partitioning_enabled; it now defaults to false and must be explicitly enabled.

2bcf481b4abe9d0bd6bb2479ce49020df66ff97f.

- unnest support for federated plans by @sgrebnov in https://github.com/spiceai/spiceai/pull/3133
- .clone() unnecessarily by @Jeadie in https://github.com/spiceai/spiceai/pull/3128
- get_schema to construct logical plan and return that schema. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3131
- spice-postgres-tpcds-bench image by @sgrebnov in https://github.com/spiceai/spiceai/pull/3140
- doRuntimeApiRequest by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3157
- -build.{GIT_SHA} for unreleased versions by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3159
- hive_infer_partitions by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3160

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.1-beta...v0.19.2-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19.1-beta
Spice v0.19.1 brings further performance and stability improvements to data connectors, including improved query push-down for file-based connectors (s3, abfs, file, ftp, sftp) that use Hive-style partitioning.
TPC-H and TPC-DS Coverage: Expanded coverage for TPC-H and TPC-DS benchmarking suites across accelerators and connectors.
GitHub Connector Array Filter: The GitHub connector now supports filter push-down for the array_contains function in SQL queries when using search query mode.
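As a sketch, a GitHub dataset in search query mode (mirroring the github_query_mode example shown in the v0.18.3-beta notes below) against which such filters can be pushed down:

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues/trunk
    name: spiceai.issues
    params:
      github_query_mode: search   # enables filter push-down for array_contains
      github_token: ${secrets:GITHUB_TOKEN}
```

A query such as SELECT title FROM spiceai.issues WHERE array_contains(labels, 'bug') could then push the filter to the GitHub Search API (the labels column name is an assumption here, not taken from these notes).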
NSQL CLI Command: A new spice nsql CLI command has been added to easily query datasets with natural language from the command line.
None

f22b96601891856e02a73d482cca4f6100137df8.

- merge_group checks for PR workflows by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3058
- /v1/sql/ and /v1/nsql by @Jeadie in https://github.com/spiceai/spiceai/pull/3032
- sql_query_keep_partition_by_columns & enable by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3065
- EXCEPT, INTERSECT, duplicate names by @sgrebnov in https://github.com/spiceai/spiceai/pull/3069
- hf_token from params/secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/3071
- hive_infer_partitions to remaining object store connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3086

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.19.0-beta...v0.19.1-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.19-beta
Spice v0.19.0-beta brings performance improvements for accelerators and expanded TPC-DS coverage. A new Azure Blob Storage data connector has also been added.
Improved TPC-DS Coverage: Enhanced support for TPC-DS derived queries.
CLI SQL REPL: The CLI SQL REPL (spice sql) now supports multi-line editing and tab indentation. Note: a terminating semi-colon ‘;’ is now required for each executed SQL block.
Azure Storage Data Connector: A new Azure Blob Storage data connector (abfs://) has been added, enabling federated SQL queries on files stored in Azure Blob-compatible endpoints, including Azure BlobFS (abfss://) and Azure Data Lake (adl://). Supported file formats can be specified using the file_format parameter.
Example spicepod.yml:

datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      azure_account: spiceadls
      azure_access_key: abc123==
      file_format: csv
For a full list of supported files, see the Object Store File Formats documentation.
For more details, see the Azure Blob Storage Data Connector documentation.
Spice.ai Data Connector: The key for the Spice.ai Cloud Platform Data Connector has changed from spiceai to spice.ai. To upgrade, change uses of from: spiceai: to from: spice.ai:.
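As a before/after sketch (the dataset path is hypothetical):

```yaml
# Before
datasets:
  - from: spiceai:path.to.my_dataset
    name: my_dataset

# After
datasets:
  - from: spice.ai:path.to.my_dataset
    name: my_dataset
```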
GitHub Data Connector: The Pull Requests column login has been renamed to author.
CLI SQL REPL: A terminating semi-colon ‘;’ is now required for each executed SQL block.
Spicepod Hot-Reload: When running spiced directly, hot-reload of the spicepod.yml configuration is now disabled. Run with spice run to use hot-reload.
826814ab149aad8ee668454c83a0650fb8b18d60.

- paths-ignore: by @Jeadie in https://github.com/spiceai/spiceai/pull/2906
- spiceai data connector to spice.ai by @sgrebnov in https://github.com/spiceai/spiceai/pull/2899
- paths-ignore for docs. by @Jeadie in https://github.com/spiceai/spiceai/pull/2911
- x-spiceai-app-id metadata in spiceai data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/2934
- params.file_format: md. by @Jeadie in https://github.com/spiceai/spiceai/pull/2943
- --pods-watcher-enabled. Watcher disabled by default for spiced. by @ewgenius in https://github.com/spiceai/spiceai/pull/2953
- mdx file extensions to apply a markdown splitter by @ewgenius in https://github.com/spiceai/spiceai/pull/2977
- messages[*].tool_calls for local models by @Jeadie in https://github.com/spiceai/spiceai/pull/2957
- round for Postgres) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2984
- trace by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2995
- spice login writes to .env.local if present by @slyons in https://github.com/spiceai/spiceai/pull/2996

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.3-beta...v0.19.0-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18.3-beta
The Spice v0.18.3-beta release includes several quality-of-life improvements, including verbosity flags for spiced and the Spice CLI, vector search over larger documents with support for chunking dataset embeddings, and multiple performance enhancements. Additionally, the release includes several bug fixes, dependency updates, and optimizations, including updated table providers and significantly improved GitHub data connector performance for issues and pull requests.
GitHub Query Mode: A new github_query_mode: search parameter has been added to the GitHub Data Connector, which uses the GitHub Search API to enable faster and more efficient querying of issues and pull requests when using filters.
Example spicepod.yml:

- from: github:github.com/spiceai/spiceai/issues/trunk
  name: spiceai.issues
  params:
    github_query_mode: search # Use GitHub Search API
    github_token: ${secrets:GITHUB_TOKEN}
Output Verbosity: Higher verbosity output levels can be specified through flags for both spiced and the Spice CLI.
Example command line:
spice -v
spice --very-verbose
spiced -vv
spiced --verbose
Embedding Chunking: Chunking can be enabled and configured to preprocess input data before generating dataset embeddings. This improves the relevance and precision for larger pieces of content.
Example spicepod.yml:

- name: support_tickets
  embeddings:
    - column: conversation_history
      use: openai_embeddings
      chunking:
        enabled: true
        target_chunk_size: 128
        overlap_size: 16
        trim_whitespace: true
For details, see the Search Documentation.
b0af91992699ecbf5adf2036a07122578f06150e.

- -v, -vv, --verbose, --very-verbose. by @Jeadie in https://github.com/spiceai/spiceai/pull/2831
- spiceai data connector to spice.ai by @sgrebnov in https://github.com/spiceai/spiceai/pull/2680
- BytesProcessedRule to be an optimizer rather than an analyzer rule by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2867
- spiceai data connector to spice.ai” by @sgrebnov in https://github.com/spiceai/spiceai/pull/2881
- no process-level CryptoProvider available when using REPL and TLS by @sgrebnov in https://github.com/spiceai/spiceai/pull/2887
- log/slog to spice CLI tool by @Jeadie in https://github.com/spiceai/spiceai/pull/2859

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.2-beta...v0.18.3-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18.1-beta
The v0.18.1-beta release continues to improve runtime performance and reliability. Performance for accelerated queries joining multiple datasets has been significantly improved with join push-down support. The GraphQL, MySQL, and SharePoint data connectors have better reliability and error handling, and a new Microsoft SQL Server data connector has been introduced. Task History now has fine-grained configuration, including the ability to disable the feature entirely. A new spice search CLI command has been added, enabling development-time embeddings-based searches across datasets.
Join push-down for accelerations: Queries to the same accelerator will now push-down joins, significantly improving acceleration performance for queries joining multiple tables.
Microsoft SQL Server Data Connector: Use from: mssql: to access and accelerate Microsoft SQL Server datasets.
Example spicepod.yml:

datasets:
  - from: mssql:path.to.my_dataset
    name: my_dataset
    params:
      mssql_connection_string: ${secrets:mssql_connection_string}
See the Microsoft SQL Server Data Connector documentation.
Task History: Task History can be configured in the spicepod.yml, including the ability to include or truncate outputs such as the results of a SQL query.
Example spicepod.yml:

runtime:
  task_history:
    enabled: true
    captured_output: truncated
    retention_period: 8h
    retention_check_interval: 15m
See the Task History Spicepod reference for more information on possible values and behaviors.
Search CLI Command: Use the spice search CLI command to perform embeddings-based searches across search-configured datasets. Note: Search requires the ai feature to be installed.
Refresh on File Changes: File Data Connector data refreshes can be configured to be triggered when the source file is modified, through a file system watcher. Enable the watcher by adding file_watcher: enabled to the acceleration parameters.
Example spicepod.yml:

datasets:
  - from: file://path/to/my_file.csv
    name: my_file
    acceleration:
      enabled: true
      refresh_mode: full
      params:
        file_watcher: enabled
The Query History table runtime.query_history has been deprecated and removed in favor of the Task History table runtime.task_history. The Task History table tracks tasks across all features, such as SQL query, vector search, and AI completion, in a unified table.
See the Task History documentation.
- json_pointer and improve error messaging. by @Jeadie in https://github.com/spiceai/spiceai/pull/2713
- spice search CLI command by @lukekim in https://github.com/spiceai/spiceai/pull/2739

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.18.0-beta...v0.18.1-beta
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.18-beta.
The v0.18.0-beta release adds new Sharepoint and File data connectors, introduces AWS Identity and Access Management (IAM) support for the S3 Data Connector, improves performance of the GitHub connector, and increases the overall reliability of all data accelerators. The /ready API endpoint was enhanced to report as ready only when all components, including loaded data, have successfully reported readiness.
Sharepoint Data Connector: Use from: sharepoint: to access and accelerate documents stored in Microsoft 365 OneDrive for Business (Sharepoint). The CLI also includes a new spice login sharepoint command to aid in local development and testing.
Example spicepod.yml:

datasets:
  - from: sharepoint:drive:Documents/path:/important_documents/
    name: important_documents
    params:
      sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
      sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
      sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}
See the Sharepoint Data Connector documentation.
AWS Identity and Access Management (IAM) for S3: A new s3_auth parameter for the s3 data connector configures the authentication method to use when connecting to S3. Supported values are public, key, and iam_role. Use s3_auth: iam_role to assume the instance IAM role.
Example spicepod.yml:

datasets:
  - from: s3://my-bucket
    name: bucket
    params:
      s3_auth: iam_role # Assume IAM role of instance
See the S3 Data Connector documentation.
File Data Connector: Use from: file: to query files stored on locally accessible filesystems.
Example spicepod.yml
:
datasets:
- from: file://path/to/customer.parquet
name: customer
params:
file_format: parquet
See the File Data Connector documentation.
Improved /ready API: Now includes the initial data load for accelerated datasets in addition to component readiness, ensuring readiness is only reported when data has loaded and can be successfully queried.
GitHub Data Connector: The data type for time-related columns has changed from Utf8 to Timestamp. To upgrade, update data type references to timestamp. For example, if using time_format:, change uses of time_format: ISO8601 to time_format: timestamp.
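As a sketch of the upgrade, using the GitHub issues dataset from this release's examples (the time_column name shown here is hypothetical), the spicepod entry would change to:

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    time_column: created_at   # hypothetical column name
    time_format: timestamp    # previously: time_format: ISO8601
    params:
      github_token: ${secrets:GITHUB_TOKEN}
```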
Ready API: The /ready API reports ready only when all components have reported ready and data is fully loaded. To upgrade, evaluate uses of the Ready API (such as Kubernetes readiness probes) and consider how it might affect system behavior.
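For example, a Kubernetes readiness probe pointing at the Ready API might look like the following sketch; the endpoint path and port (8090 is the default HTTP port since v0.16) depend on your deployment:

```yaml
readinessProbe:
  httpGet:
    path: /v1/ready
    port: 8090
  initialDelaySeconds: 5
  periodSeconds: 10
```

With this release, such a probe will not pass until accelerated data has fully loaded, so pods receive traffic only once queries can succeed.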
No major dependency updates.
refresh_mode: append by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2609
/ready to only report ready when components have all reported Ready by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2600
s3_auth parameter to configure IAM role authentication by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2611
/ready to only mark a dataset ready iff the initial refresh completed by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2630
error decoding response body GitHub file connector bug by @sgrebnov in https://github.com/spiceai/spiceai/pull/2645
issues data connector schema upfront by @sgrebnov in https://github.com/spiceai/spiceai/pull/2646
spill_to_disk_and_rehydration integration test by @sgrebnov in https://github.com/spiceai/spiceai/pull/2658
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.4-beta...v0.18.0-beta
Announcing the release of Spice v0.17.4-beta.
The v0.17.4-beta release adds compatibility, performance, and reliability improvements to the DuckDB and SQLite accelerators. The GitHub data connector adds a Stargazers table, Snowflake and Clickhouse data connectors have improved resiliency for empty tables, and core data processing and quality has been improved.
Improved benchmarking, testing, and robustness of data accelerators: Continued compatibility, performance, and reliability improvements for SQLite and DuckDB data accelerators and expanded performance and quality testing.
GitHub Stargazers: The GitHub Data Connector adds support for a /stargazers table, making it easy to query GitHub Stargazers using SQL!
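For instance, a query over an accelerated stargazers dataset might look like the sketch below; the dataset name and the login and starred_at columns are illustrative and depend on your spicepod and the connector's schema:

```sql
-- Ten most recent stargazers (illustrative column names)
SELECT login, starred_at
FROM stargazers
ORDER BY starred_at DESC
LIMIT 10;
```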
None.
datafusion (fixes subquery alias table unparsing for SQLite) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2532
period +- jitter by @Jeadie in https://github.com/spiceai/spiceai/pull/2534
POST /v1/datasets/:name/acceleration/refresh by @Jeadie in https://github.com/spiceai/spiceai/pull/2515
RwLock from EmbeddingModelStore by @Jeadie in https://github.com/spiceai/spiceai/pull/2541
refresh_data_window by @ewgenius in https://github.com/spiceai/spiceai/pull/2578
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.3-beta...v0.17.4-beta
Announcing the release of Spice v0.17.3-beta.
The v0.17.3-beta release further improves data accelerator robustness and adds a new github data connector that makes accelerating GitHub Issues, Pull Requests, Commits, and Blobs easy.
Improved benchmarking, testing, and robustness of data accelerators: Continued improvements to benchmarking and testing of data accelerators, leading to more robust and reliable data accelerators.
GitHub Connector (alpha): Connect to GitHub and accelerate Issues, Pull Requests, Commits, and Blobs.
datasets:
# Fetch all rust and golang files from spiceai/spiceai
- from: github:github.com/spiceai/spiceai/files/trunk
name: spiceai.files
params:
include: '**/*.rs; **/*.go'
github_token: ${secrets:GITHUB_TOKEN}
# Fetch all issues from spiceai/spiceai. Similar for pull requests, commits, and more.
- from: github:github.com/spiceai/spiceai/issues
name: spiceai.issues
params:
github_token: ${secrets:GITHUB_TOKEN}
None.
spice upgrade
docker pull spiceai/spiceai:latest
spiceai/spiceai:latest or spiceai/spiceai:0.17.3-beta
delta_kernel from 0.2.0 to 0.3.0.
files support (basic fields) by @sgrebnov in https://github.com/spiceai/spiceai/pull/2393
--force flag to spice install to force it to install the latest released version by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2395
spice chat by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2396
include param support to GitHub Data Connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2397
content column to GitHub Connector when dataset is accelerated by @sgrebnov in https://github.com/spiceai/spiceai/pull/2400
crates/llms/src/chat/ by @Jeadie in https://github.com/spiceai/spiceai/pull/2439
spice chat by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2442
labels and hashes to primitive arrays by @sgrebnov in https://github.com/spiceai/spiceai/pull/2452
datafusion version to the latest by @sgrebnov in https://github.com/spiceai/spiceai/pull/2456
/ for S3 data connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2458
accelerated_refresh to task_history table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2459
assignees and labels fields to github issues and github pulls datasets by @ewgenius in https://github.com/spiceai/spiceai/pull/2467
updatedAt field to GitHub connector by @ewgenius in https://github.com/spiceai/spiceai/pull/2474
updated_at by @lukekim in https://github.com/spiceai/spiceai/pull/2479
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.2-beta...v0.17.3-beta
Announcing the release of Spice v0.17.2-beta!
The v0.17.2-beta release focuses on improving data accelerator compatibility, stability, and performance. Expanded data type support for DuckDB, SQLite, and PostgreSQL data accelerators (and data connectors) enables significantly more data types to be accelerated. Error handling and logging have also been improved, and several bugs have been fixed.
Expanded Data Type Support for Data Accelerators: DuckDB, SQLite, and PostgreSQL Data Accelerators now support a wider range of data types, enabling acceleration of more diverse datasets.
Enhanced Error Handling and Logging: Improvements have been made to aid in troubleshooting and debugging.
Anonymous Usage Telemetry: Optional, anonymous, aggregated telemetry has been added to help improve Spice. This feature can be disabled. For details about collected data, see the telemetry documentation.
To opt out of telemetry:
Using the CLI flag:
spice run -- --telemetry-enabled false
Or add configuration to spicepod.yaml:
runtime:
telemetry:
enabled: false
Improved Benchmarking: A suite of performance benchmarking tests has been added to the project, helping to maintain and improve runtime performance, a top priority for the project.
None.
v0.17.2-beta by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2203
retrieved_primary_keys in v1/search by @Jeadie in https://github.com/spiceai/spiceai/pull/2176
runtime.task_history table for queries, and embeddings by @Jeadie in https://github.com/spiceai/spiceai/pull/2191
metrics-rs with OpenTelemetry Metrics by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2240
description field from spicepod.yaml and include in LLM context by @ewgenius in https://github.com/spiceai/spiceai/pull/2261
connection_pool_size in the Postgres Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2251
DocumentSimilarityTool by @Jeadie in https://github.com/spiceai/spiceai/pull/2263
runtime.metrics table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2296
secrets.inject_secrets when secret not found. by @Jeadie in https://github.com/spiceai/spiceai/pull/2306
DataAccelerator::init() for SQLite acceleration federation by @peasee in https://github.com/spiceai/spiceai/pull/2293
disable_query_push_down option to acceleration settings by @y-f-u in https://github.com/spiceai/spiceai/pull/2327
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/2312
v1/search: include WHERE condition, allow extra columns in projection. by @Jeadie in https://github.com/spiceai/spiceai/pull/2328
task_history nested spans by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2337
bytes_processed telemetry metric by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2343
runtime.metrics/Prometheus as well by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2352
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.1-beta...v0.17.2-beta
The v0.17.1-beta minor release focuses on enhancing stability, performance, and usability. The Flight interface now supports the GetSchema API, and the s3, ftp, sftp, http, https, and databricks data connectors have added support for a client_timeout parameter.
Flight API GetSchema: The GetSchema API is now supported by the Flight interface. The schema of a dataset can be retrieved using GetSchema with the PATH or CMD FlightDescriptor types. The CMD FlightDescriptor type is used to get the schema of an arbitrary SQL query passed as the CMD bytes. The PATH FlightDescriptor type is used to retrieve the schema of a dataset.
Client Timeout: A client_timeout parameter has been added for Data Connectors: ftp, sftp, http, https, and databricks. When defined, the client timeout configures Spice to stop waiting for a response from the data source after the specified duration. The default timeout is 30 seconds.
datasets:
- from: ftp://remote-ftp-server.com/path/to/folder/
name: my_dataset
params:
file_format: csv
# Example client timeout
client_timeout: 30s
ftp_user: my-ftp-user
ftp_pass: ${secrets:my_ftp_password}
TLS is now required to be explicitly enabled. Enable TLS on the command line using --tls-enabled true:
spice run -- --tls-enabled true --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem
Or in the spicepod.yml with enabled: true:
runtime:
tls:
# TLS explicitly enabled
enabled: true
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem
v1/models by @Jeadie in https://github.com/spiceai/spiceai/pull/2152
EmbeddingConnector by @Jeadie in https://github.com/spiceai/spiceai/pull/2165
CREATE TABLE... and infer on first write by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2167
GetSchema API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2169
flightsubscriber/flightpublisher tools by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2194
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.17.0-beta...v0.17.1-beta
Announcing the first beta release of Spice.ai OSS!
The core Spice runtime has graduated from alpha to beta! Components, such as Data Connectors and Models, follow independent release milestones. Data Connectors graduating from alpha to beta include databricks, spiceai, postgres, s3, odbc, and mysql. From beta to 1.0, the project will focus on improving performance and scaling to larger datasets.
This release also includes enhanced security with Transport Layer Security (TLS) secured APIs, a new spice install
CLI command, and several performance and stability improvements.
Enable TLS using the --tls-certificate-file and --tls-key-file command-line flags:
spice run -- --tls-certificate-file /path/to/cert.pem --tls-key-file /path/to/key.pem
Or configure in the spicepod.yml:
runtime:
tls:
certificate_file: /path/to/cert.pem
key_file: /path/to/key.pem
Get started with TLS by following the TLS Sample. For more details see the TLS Documentation.
spice install: Running the spice install CLI command will download and install the latest version of the runtime.
spice install
Improved SQLite and DuckDB compatibility: The SQLite and DuckDB accelerators support more complex queries and additional data types.
Pass through arguments from spice run to runtime: Arguments passed to spice run are now passed through to the runtime.
Secrets replacement within connection strings: Secrets are now replaced within connection strings:
datasets:
- from: mysql:my_table
name: my_table
params:
mysql_connection_string: mysql://user:${secrets:mysql_pw}@localhost:3306/db
The odbc data connector is now optional and has been removed from the released binaries. To use the odbc data connector, use the official Spice Docker image or build the Spice runtime from source.
To build Spice from source with the odbc feature:
cargo build --release --features odbc
To use the official Spice Docker image from DockerHub:
# Pull the latest official Spice image
docker pull spiceai/spiceai:latest
# Pull the official v0.17-beta Spice image
docker pull spiceai/spiceai:0.17.0-beta
unixodbc for E2E test release installation by @peasee in https://github.com/spiceai/spiceai/pull/2063
json_pointer param optional for the GraphQL connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2072
spice install CLI command by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2090
delta_kernel to 0.2.0 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2102
spice run and spice sql to runtime by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2123
spice sql by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2125
--tls flag by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2128
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.16.0-alpha...v0.17-beta
The v0.16-alpha release is the first candidate release for the beta milestone on a path to finalizing the v1.0 developer and user experience. Upgraders should be aware of several breaking changes designed to improve the Secrets configuration experience and to make authoring spicepod.yml files more consistent. See the Breaking Changes section below for details. Additionally, the Spice Java SDK was released, providing Java developers a simple but powerful native experience to query Spice.
secrets configuration in spicepod.yaml:
secrets:
- from: env
name: env
- from: aws_secrets_manager:my_secret_name
name: aws_secret
Secrets managed by configured Secret Stores can be referenced in component params using the syntax ${<store_name>:<key>}. E.g.
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${ env:MY_PG_PASS }
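Conceptually, the replacement works like simple string interpolation over component params. The sketch below is illustrative only; the resolve helper and its regex are not Spice's actual implementation:

```python
import re

# Conceptual sketch of resolving ${<store_name>:<key>} secret references.
# Not Spice's implementation -- for illustration of the syntax only.
SECRET_REF = re.compile(r"\$\{\s*(\w+):([\w./-]+)\s*\}")

def resolve(value: str, stores: dict) -> str:
    """Replace each ${store:key} reference with the secret's value."""
    return SECRET_REF.sub(lambda m: stores[m.group(1)][m.group(2)], value)

stores = {"env": {"MY_PG_PASS": "hunter2"}}
print(resolve("postgresql://user:${ env:MY_PG_PASS }@localhost:5432/db", stores))
# postgresql://user:hunter2@localhost:5432/db
```

Note that the same syntax works inside larger strings, which is how secrets can be embedded in connection strings.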
Java Client SDK: The Spice Java SDK has been released for JDK 17 or greater.
Federated SQL Query: Significant stability and reliability improvements have been made to federated SQL query support in most data connectors.
ODBC Data Connector: Providing a specific SQL dialect to query ODBC data sources is now supported using the sql_dialect param. For example, when querying Databricks using ODBC, the databricks dialect can be specified to ensure compatibility. Read the ODBC Data Connector documentation for more details.
spicepod.yml schema. File-based secrets stored in the ~/.spice/auth file are no longer supported. See Secret Stores Documentation for full reference. To upgrade Secret Stores, rename any parameters ending in _key to remove the _key suffix and specify a secret inline via the secret replacement syntax (${<secret_store>:<key>}):
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass_key: my_pg_pass
to:
datasets:
- from: postgres:my_table
name: my_table
params:
pg_host: localhost
pg_port: 5432
pg_pass: ${secrets:my_pg_pass}
And ensure the MY_PG_PASS environment variable is set.
time_format has changed from unix_seconds to timestamp. To upgrade:
datasets:
- from:
name: my_dataset
# Explicitly define format when not specified.
time_format: unix_seconds
The default HTTP port has changed from port 3000 to port 8090 to avoid conflicting with frontend apps, which typically use the 3000 range. If an SDK is used, upgrade it at the same time as the runtime. To upgrade and continue using port 3000, run spiced with the --http command line argument:
# Using Dockerfile or spiced directly
spiced --http 127.0.0.1:3000
The default metrics port has changed from port 9000 to port 9090 to avoid conflicting with other metrics protocols, which typically use port 9000. To upgrade and continue using port 9000, run spiced with the --metrics command line argument:
# Using Dockerfile or spiced directly
spiced --metrics 127.0.0.1:9000
json_path has been replaced with json_pointer to access nested data from the result of the GraphQL query. See the GraphQL Data Connector documentation for full details and RFC 6901 (JSON Pointer). To upgrade, change:
json_path: my.json.path
To:
json_pointer: /my/json/pointer
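The difference from json_path is that a JSON pointer is a /-separated path of exact member names and array indexes. The minimal RFC 6901 resolver below is an illustrative sketch, not the connector's code:

```python
# Minimal RFC 6901 JSON Pointer resolver, illustrating what a
# json_pointer such as /my/json/pointer selects.
def resolve_pointer(doc, pointer: str):
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    for token in pointer.lstrip("/").split("/"):
        # Unescape per RFC 6901: ~1 -> "/", then ~0 -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc

result = {"data": {"users": [{"name": "ada"}, {"name": "lin"}]}}
print(resolve_pointer(result, "/data/users/1/name"))  # lin
```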
params parameters. Prefixed parameter names help ensure parameters do not collide. For example, the Databricks data connector specific params are now prefixed with databricks:
datasets:
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
token: MY_TOKEN
To upgrade:
datasets:
# Example for Spark Connect
- from: databricks:spiceai.datasets.my_awesome_table # A reference to a table in the Databricks unity catalog
name: my_delta_lake_table
params:
mode: spark_connect
databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com # Now prefixed with databricks
databricks_token: ${secrets:my_token} # Now prefixed with databricks
Refer to the Data Connector documentation for parameter naming changes in this release.
Clickhouse Data Connector: The clickhouse_connection_timeout parameter has been renamed to connection_timeout as it applies to the client and is not Clickhouse configuration itself.
To upgrade, change:
clickhouse_connection_timeout: time
To:
connection_timeout: time
No major dependency updates.
spice chat command, to interact with deployed spiced instance in spice.ai cloud by @ewgenius in https://github.com/spiceai/spiceai/pull/1990
/v1/chat/completions with streaming in spice chat cli command by @ewgenius in https://github.com/spiceai/spiceai/pull/1998
spice chat command, add --model flag by @ewgenius in https://github.com/spiceai/spiceai/pull/2007
${ <secret>:<key> } by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2026
connector and runtime categories by @phillipleblanc in https://github.com/spiceai/spiceai/pull/2028
dataset configure endpoint param by @sgrebnov in https://github.com/spiceai/spiceai/pull/2052
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.2-alpha...v0.16.0-alpha
The v0.15.2-alpha minor release focuses on enhancing stability, performance, and introduces Catalog Providers for streamlined access to Data Catalog tables. Unity Catalog, Databricks Unity Catalog, and the Spice.ai Cloud Platform Catalog are supported in v0.15.2-alpha. The reliability of federated query push-down has also been improved for the MySQL, PostgreSQL, ODBC, S3, Databricks, and Spice.ai Cloud Platform data connectors.
Catalog Providers: Catalog Providers streamline access to Data Catalog tables. Initial catalog providers supported are Databricks Unity Catalog, Unity Catalog and Spice.ai Cloud Platform Catalog.
For example, to configure Spice to connect to tpch tables in the Spice.ai Cloud Platform Catalog, use the new catalogs: section in the spicepod.yml:
catalogs:
- name: spiceai
from: spiceai
include:
- tpch.*
sql> show tables
+---------------+--------------+---------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------+---------------+------------+
| spiceai | tpch | region | BASE TABLE |
| spiceai | tpch | part | BASE TABLE |
| spiceai | tpch | customer | BASE TABLE |
| spiceai | tpch | lineitem | BASE TABLE |
| spiceai | tpch | partsupp | BASE TABLE |
| spiceai | tpch | supplier | BASE TABLE |
| spiceai | tpch | nation | BASE TABLE |
| spiceai | tpch | orders | BASE TABLE |
| spice | runtime | query_history | BASE TABLE |
+---------------+--------------+---------------+------------+
Time: 0.001866958 seconds. 9 rows.
ODBC Data Connector Push-Down: The ODBC Data Connector now supports query push-down for joins, improving performance for joined datasets configured with the same odbc_connection_string.
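A sketch of two ODBC datasets sharing one connection string so joins between them are eligible for push-down; the table names and secret key are hypothetical:

```yaml
datasets:
  - from: odbc:orders
    name: orders
    params:
      odbc_connection_string: ${secrets:odbc_conn}
  - from: odbc:customers
    name: customers
    params:
      odbc_connection_string: ${secrets:odbc_conn}
```

A query joining orders and customers can then execute in the ODBC source rather than in the Spice runtime.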
Improved Spicepod Validation: Improved spicepod.yml validation has been added, including warnings when loading resources with duplicate names (datasets, views, models, embeddings).
None.
catalog from Spicepod. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1903
Runtime by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1906
spice.ai CatalogProvider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1925
UnityCatalog catalog provider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1940
Databricks catalog provider by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1941
params into dataset_params by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1947
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.1-alpha...v0.15.2-alpha
The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines, which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector and supporting deletion vectors.
Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.
Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.
ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.
GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.
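A sketch of a GraphQL dataset using unnest_depth; the endpoint and pointer are hypothetical, and parameter names should be checked against the GraphQL Data Connector documentation:

```yaml
datasets:
  - from: graphql:https://api.example.com/graphql
    name: example_items
    params:
      json_pointer: /data/items
      unnest_depth: 2   # flatten nested objects up to two levels deep
```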
None.
The MySQL, PostgreSQL, SQLite and DuckDB DataFusion TableProviders developed by Spice AI have been donated to the datafusion-contrib/datafusion-table-providers community repository.
From the v0.15.1-alpha release, a new dependency is taken on datafusion-contrib/datafusion-table-providers.
datafusion-table-providers crate by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1873
delta-rs with delta-kernel-rs and add new delta data connector. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1878
delta tables by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1891
delta to delta_lake by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1892
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.0-alpha...v0.15.1-alpha
The v0.15-alpha release introduces support for streaming database changes with Change Data Capture (CDC) into accelerated tables via a new Debezium connector, configurable retry logic for data refresh, and the release of a new C# SDK to build with Spice in Dotnet.
Debezium data connector with Change Data Capture (CDC): Sync accelerated datasets with Debezium data sources over Kafka in real-time.
Data Refresh Retries: By default, accelerated datasets attempt to retry data refreshes on transient errors. This behavior can be configured using refresh_retry_enabled and refresh_retry_max_attempts.
C# Client SDK: A new C# Client SDK has been released for developing applications in Dotnet.
Integrating Debezium CDC is straightforward. Get started with the Debezium CDC Sample, read more about CDC in Spice, and read the Debezium data connector documentation.
Example Spicepod using Debezium CDC:
datasets:
- from: debezium:cdc.public.customer_addresses
name: customer_addresses_cdc
params:
debezium_transport: kafka
debezium_message_format: json
kafka_bootstrap_servers: localhost:19092
acceleration:
enabled: true
engine: duckdb
mode: file
refresh_mode: changes
Example Spicepod configuration limiting refresh retries to a maximum of 10 attempts:
datasets:
- from: eth.blocks
name: blocks
acceleration:
refresh_retry_enabled: true
refresh_retry_max_attempts: 10
refresh_check_interval: 30s
None.
No major dependency updates.
feature-- branches by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1788
symlink -> symlink_file. by @Jeadie in https://github.com/spiceai/spiceai/pull/1793
Unsupported DataType: conversion for time predicates by @sgrebnov in https://github.com/spiceai/spiceai/pull/1795
clippy::module_name_repetitions lint by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1812
v1/search that performs vector search. by @Jeadie in https://github.com/spiceai/spiceai/pull/1836
embeddings with models by @Jeadie in https://github.com/spiceai/spiceai/pull/1829
"cmake-build" feature to rdkafka for windows by @Jeadie in https://github.com/spiceai/spiceai/pull/1840
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.1-alpha...v0.15.0-alpha
The v0.14.1-alpha release is focused on quality, stability, and type support with improvements in PostgreSQL, DuckDB, and GraphQL data connectors.
None.
No major dependency updates.
spiceai/async-openai to solve Deserialize issue in v1/embed by @Jeadie in https://github.com/spiceai/spiceai/pull/1707
v1/assist into a VectorSearch struct by @Jeadie in https://github.com/spiceai/spiceai/pull/1699
spiceai/duckdb-rs, support LargeUTF8 by @Jeadie in https://github.com/spiceai/spiceai/pull/1746
tonic::async_trait -> async_trait::async_trait by @Jeadie in https://github.com/spiceai/spiceai/pull/1757
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.0-alpha...v0.14.1-alpha
The v0.14-alpha release focuses on enhancing accelerated dataset performance and data integrity, with support for configuring primary keys and indexes. Additionally, the GraphQL data connector has been introduced, along with improved dataset registration and loading error information.
Accelerated Datasets: Ensure data integrity using primary key and unique index constraints. Configure conflict handling to either upsert new data or drop it. Create indexes on frequently filtered columns for faster queries on larger datasets.
GraphQL Data Connector: Initial support for using GraphQL as a data source.
Example Spicepod showing how to use primary keys and indexes with accelerated datasets:
datasets:
- from: eth.blocks
name: blocks
acceleration:
engine: duckdb # Use DuckDB acceleration engine
primary_key: '(hash, timestamp)'
indexes:
number: enabled # same as `CREATE INDEX ON blocks (number);`
'(number, hash)': unique # same as `CREATE UNIQUE INDEX ON blocks (number, hash);`
on_conflict:
'(hash, number)': drop # possible values: drop (default), upsert
'(hash, timestamp)': upsert
Primary Keys, constraints, and indexes are currently supported when using SQLite, DuckDB, and PostgreSQL acceleration engines.
Learn more with the indexing quickstart and the primary key sample.
Read the Local Acceleration documentation.
None.
runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1678
runtime.metrics by @ewgenius in https://github.com/spiceai/spiceai/pull/1681
labels to properties and make it nullable by @ewgenius in https://github.com/spiceai/spiceai/pull/1686
tpch_q7, tpch_q8, tpch_q9, tpch_q14 by @sgrebnov in https://github.com/spiceai/spiceai/pull/1683
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/1653
primary_key in Spicepod and create in accelerated table by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1687
ArrayDistance scalar UDF by @Jeadie in https://github.com/spiceai/spiceai/pull/1697
on_conflict behavior for accelerated tables with constraints by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1688
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.3-alpha...v0.14.0-alpha
The v0.13.3-alpha release is focused on quality and stability with improvements to metrics, telemetry, and operability.
Ready API: Added a /v1/ready API that returns success once all datasets and models are loaded and ready.
Enhanced Grafana dashboard: The dashboard now includes charts for query duration and failures, the last update time of accelerated datasets, the count of refresh errors, and the last successful time the runtime was able to access federated datasets.
array_distance as euclidean distance between Float32[] by @Jeadie in https://github.com/spiceai/spiceai/pull/1601
crates/runtime/src/http/v1/ by @Jeadie in https://github.com/spiceai/spiceai/pull/1619
/v1/ready API that returns 200 when all datasets have loaded by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1629
v1/assist response and panic bug. Include primary keys in response too by @Jeadie in https://github.com/spiceai/spiceai/pull/1635
err_code to query_failures metric by @sgrebnov in https://github.com/spiceai/spiceai/pull/1639
ObjectStoreMetadataTable & ObjectStoreTextTable by @Jeadie in https://github.com/spiceai/spiceai/pull/1649
v1/assist by @Jeadie in https://github.com/spiceai/spiceai/pull/1648
Time Since Offline chart to Grafana dashboard by @sgrebnov in https://github.com/spiceai/spiceai/pull/1664
Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.2-alpha...v0.13.3-alpha
The v0.13.2-alpha release is focused on quality and stability with improvements to federated query push-down, telemetry, and query history.
Filesystem Data Connector: Adds the Filesystem Data Connector for directly using files as data sources.
Federated Query Push-Down: Improved stability and schema compatibility for federated queries.
Enhanced Telemetry: Runtime Metrics now include last update time for accelerated datasets, count of refresh errors, and new metrics for query duration and failures.
Query History: Enabled query history logging for Arrow Flight queries in addition to HTTP queries.
- spice_cloud - connect to cloud api by @ewgenius in https://github.com/spiceai/spiceai/pull/1523
- llm UX in spicepod.yaml by @Jeadie in https://github.com/spiceai/spiceai/pull/1545
- runtime.metrics schema, if remote (spiceai) data connector provided by @ewgenius in https://github.com/spiceai/spiceai/pull/1554
- object_store table provider for UTF8 data formats by @Jeadie in https://github.com/spiceai/spiceai/pull/1562
- query_duration_seconds and query_failures metrics by @sgrebnov in https://github.com/spiceai/spiceai/pull/1575
- /app as a default workdir in spiceai docker image by @ewgenius in https://github.com/spiceai/spiceai/pull/1586
- EmbeddingConnector by @Jeadie in https://github.com/spiceai/spiceai/pull/1592

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.1-alpha...v0.13.2
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.13.1-alpha release of Spice is a minor update focused on stability, quality, and operability. Query result caching provides protection against bursts of queries, and schema support for datasets has been added for logical grouping. An issue where Refresh SQL predicates were not pushed down to underlying data sources has been resolved, along with improved Acceleration Refresh logging.
Results Caching: Introduced query results caching to handle bursts of requests and support caching of non-accelerated results, such as refresh data returned on zero results. Results caching is enabled by default with a 1s
item time-to-live (TTL). Learn more.
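As a sketch of how the cache might be tuned, the runtime section of a spicepod.yml could look like the following (the results_cache keys shown are assumptions based on typical Spice runtime configuration, not confirmed by this post):

```yaml
runtime:
  results_cache:
    enabled: true          # results caching is on by default
    item_ttl: 1s           # default time-to-live per cached item
    cache_max_size: 128MiB # assumed size key; adjust to your workload
```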
Query History Logging: Recent queries are now logged in the new spice.runtime.query_history
dataset with a default retention of 24 hours. Query history is initially enabled for HTTP queries only (not Arrow Flight queries).
Dataset Schemas: Added support for dataset schemas, allowing logical grouping of datasets by separating the schema name from the table name with a period (.). E.g.
datasets:
  - from: mysql:app1.identities
    name: app.users
  - from: postgres:app2.purchases
    name: app.purchases
In this example, queries against app.users will be federated to app1.identities in MySQL, and queries against app.purchases will be federated to app2.purchases in PostgreSQL.
@y-f-u @Jeadie @sgrebnov @ewgenius @phillipleblanc @lukekim @gloomweaver @Sevenannn
- file_format parameter required for S3/FTP/SFTP connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1455
- file_format from dataset path by @ewgenius in https://github.com/spiceai/spiceai/pull/1489
- file_format to helm chart sample dataset by @ewgenius in https://github.com/spiceai/spiceai/pull/1493
- file_format prompt for s3 and ftp datasets in Dataset Configure CLI if no extension detected by @ewgenius in https://github.com/spiceai/spiceai/pull/1494
- runtime schema by @ewgenius in https://github.com/spiceai/spiceai/pull/1524

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.13.0-alpha...v0.13.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.13.0-alpha release significantly improves federated query performance and efficiency with Query Push-Down. Query push-down allows SQL queries to be directly executed by underlying data sources, such as joining tables using the same data connector. Query push-down is supported for all SQL-based and Arrow Flight data connectors. Additionally, runtime metrics, including query duration, are now collected and can be accessed in the spice.runtime.metrics
table. This release also includes a new FTP/SFTP data connector and improved CSV support for the S3 data connector.
Federated Query Push-Down (#1394): All SQL and Arrow Flight data connectors support federated query push-down.
Runtime Metrics (#1361): Runtime metric collection can be enabled using the --metrics
flag and accessed by querying the spice.runtime.metrics
table.
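Once enabled, the collected metrics can be queried like any other Spice table, e.g.:

```sql
SELECT * FROM spice.runtime.metrics;
```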
FTP & SFTP data connector (#1355) (#1399): Added support for using FTP and SFTP as data sources.
Improved CSV support (#1411) (#1414): S3/FTP/SFTP data connectors support CSV files with expanded CSV options.
- release cargo feature to docker builds by @ewgenius in https://github.com/spiceai/spiceai/pull/1377
- spice.runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1361
- runtime.metrics table by @ewgenius in https://github.com/spiceai/spiceai/pull/1408

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.2-alpha...v0.13.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12.2-alpha release introduces data streaming and key-pair authentication for the Snowflake data connector, enables general append
mode data refreshes for time-series data, improves connectivity error messages, adds nested folders support for the S3 data connector, and exposes nodeSelector and affinity keys in the Helm chart for better Kubernetes management.
Improved Connectivity Error Messages: Error messages provide clearer, actionable guidance for misconfigured settings or unreachable data connectors.
Snowflake Data Connector Improvements: Enables data streaming by default and adds support for key-pair authentication in addition to passwords.
API for Refresh SQL Updates: Update dataset Refresh SQL via API.
Append Data Refresh: Append mode data refreshes for time-series data are now supported for all data connectors. Specify a dataset time_column
with refresh_mode: append
to only fetch data more recent than the latest local data.
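A minimal sketch of such a dataset definition (the source table, dataset name, and column name are hypothetical):

```yaml
datasets:
  - from: postgres:public.events   # hypothetical source table
    name: events
    time_column: created_at        # hypothetical temporal column
    acceleration:
      enabled: true
      refresh_mode: append         # only fetch rows newer than the latest local data
```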
Docker Image Update: The spiceai/spiceai:latest
Docker image now includes the ODBC data connector. For a smaller footprint, use spiceai/spiceai:latest-slim
.
Helm Chart Improvements: nodeSelector
and affinity
keys are now supported in the Helm chart for improved Kubernetes deployment management.
POST /v1/datasets/:name/refresh changed to POST /v1/datasets/:name/acceleration/refresh to be consistent with the Spicepod.yaml structure.
- release feature in docker image by @ewgenius in https://github.com/spiceai/spiceai/pull/1324
- DataConnectorResult and DataConnectorError by @ewgenius in https://github.com/spiceai/spiceai/pull/1339

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.1-alpha...v0.12.2-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12.1-alpha release introduces a new Snowflake data connector, support for UUID and TimestampTZ types in the PostgreSQL connector, and improved error messages across all data connectors. The Clickhouse data connector enables data streaming by default. The public SQL interface now restricts DML and DDL queries. Additionally, accelerated tables now fully support NULL values, and issues with schema conversion in these tables have been resolved.
Snowflake Data Connector: Initial support for Snowflake as a data source.
Clickhouse Data Streaming: Enables data streaming by default, eliminating in-memory result collection.
Read-only SQL Interface: Disables DML (INSERT/UPDATE/DELETE) and DDL (CREATE/ALTER TABLE) queries for improved data source security.
Error Message Improvements: Improved the error messages for commonly encountered issues with data connectors.
Accelerated Tables: Supports NULL values across all data types and fixes schema conversion errors for consistent type handling.
- GITHUB_TOKEN environment variable in the installation script, if available, to avoid rate limiting in CI workflows by @ewgenius in https://github.com/spiceai/spiceai/pull/1302
- spice login spark by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1303

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.12.0-alpha...v0.12.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.12-alpha release introduces Clickhouse and Apache Spark data connectors, adds support for limiting refresh data periods for temporal datasets, and includes upgraded Spice Client SDKs compatible with Spice OSS.
Clickhouse data connector: Use Clickhouse as a data source with the clickhouse:
scheme.
Apache Spark Connect data connector: Use Apache Spark Connect connections as a data source using the spark:
scheme.
Refresh data window: Limit accelerated dataset refreshes to a specified window, configured as a duration from the present time, for faster and more efficient refreshes.
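A sketch of this configuration, assuming the setting is named refresh_data_window (the dataset source and column names are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/events/   # hypothetical source
    name: events
    time_column: event_time        # hypothetical temporal column
    acceleration:
      enabled: true
      refresh_data_window: 24h     # only refresh the most recent 24 hours of data
```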
ODBC data connector: Use ODBC connections as a data source using the odbc:
scheme. The ODBC data connector is currently optional and not included in default builds. It can be conditionally compiled using the odbc
cargo feature when building from source.
Spice Client SDK Support: The official Spice SDKs have been upgraded with support for Spice OSS.
The refresh_interval acceleration setting has been changed to refresh_check_interval to make it clearer that it is the check interval versus the data interval.
- SELECT count(*) for Sqlite Data Accelerator by @sgrebnov in https://github.com/spiceai/spiceai/pull/1166
- show tables in Spice SQL & update next version to v0.12.0-alpha by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1206

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.11.1-alpha...v0.12.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The v0.11.1-alpha release introduces retention policies for accelerated datasets, native Windows installation support, and integration of catalog and schema settings for the Databricks Spark connector. Several bugs have also been fixed for improved stability.
Retention Policies for Accelerated Datasets: Automatic eviction of data from accelerated time-series datasets when a specified temporal column exceeds the retention period, optimizing resource utilization.
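A sketch of a retention policy, assuming it is configured with retention_period and retention_check_interval acceleration settings (all names and values here are illustrative, not confirmed by this post):

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks   # hypothetical source dataset
    name: eth_recent_blocks
    time_column: timestamp             # temporal column used for eviction
    acceleration:
      enabled: true
      refresh_mode: append
      retention_check_enabled: true
      retention_period: 90d            # evict rows older than 90 days
      retention_check_interval: 1h     # how often eviction runs
```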
Windows Installation Support: Native Windows installation support, including upgrades.
Databricks Spark Connect Catalog and Schema Settings: Improved translation between DataFusion and Spark, providing better Spark Catalog support.
- refresh_sql and manual refresh to e2e tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/1125
- spice dataset configure by @ewgenius in https://github.com/spiceai/spiceai/pull/1140
- spice upgrade on Windows by @sgrebnov in https://github.com/spiceai/spiceai/pull/1155

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.11.0-alpha...v0.11.1-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
The Spice v0.11-alpha release significantly improves the Databricks data connector with Databricks Connect (Spark Connect) support, adds the DuckDB data connector, and adds the AWS Secrets Manager secret store. In addition, enhanced control over accelerated dataset refreshes, improved SSL security for MySQL and PostgreSQL connections, and overall stability improvements have been added.
DuckDB data connector: Use DuckDB databases or connections as a data source.
AWS Secrets Manager Secret Store: Use AWS Secrets Managers as a secret store.
Custom Refresh SQL: Specify a custom SQL query for dataset refresh using refresh_sql
.
Dataset Refresh API: Trigger a dataset refresh using the new CLI command spice refresh
or via API.
Expanded SSL support for Postgres: SSL mode now supports disable
, require
, prefer
, verify-ca
, verify-full
options with the default mode changed to require
. Added pg_sslrootcert
parameter for setting a custom root certificate and the pg_insecure
parameter is no longer supported.
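For illustration, a PostgreSQL dataset using these options might look like the sketch below (pg_sslmode is an assumed parameter name; the host, table, and certificate path are hypothetical):

```yaml
datasets:
  - from: postgres:my_schema.my_table
    name: my_table
    params:
      pg_host: db.example.com
      pg_sslmode: verify-full                # one of: disable, require, prefer, verify-ca, verify-full
      pg_sslrootcert: /etc/ssl/certs/ca.pem  # custom root certificate
```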
Databricks Connect: Choose between using Spark Connect or Delta Lake when using the Databricks data connector for improved performance.
Internal architecture refactor: The internal architecture of spiced
was refactored to simplify the creation of data components and to improve alignment with DataFusion concepts.
@edmondop’s first contribution github.com/spiceai/spiceai/pull/1110!
- NULL values by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
- NULL values for NUMERIC by @gloomweaver in https://github.com/spiceai/spiceai/pull/1068
- spice refresh CLI command for dataset refresh by @sgrebnov in https://github.com/spiceai/spiceai/pull/1112
- TEXT and DECIMAL types support and properly handling NULL for MySQL by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
- DATE and TINYINT types support for MySQL by @ewgenius in https://github.com/spiceai/spiceai/pull/1065
- ssl_rootcert_path parameter for MySql data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1079
- LargeUtf8 support and explicitly passing the schema to data accelerator SqlTable by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1077
- pg_insecure parameter support from Postgres by @ewgenius in https://github.com/spiceai/spiceai/pull/1081

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.2-alpha...v0.11.0-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.10.2-alpha (Apr 9, 2024)!
The v0.10.2-alpha release adds the MySQL data connector and makes external data connections more robust on initialization.
MySQL data connector: Connect to any MySQL server, including SSL support.
Data connections verified at initialization: Verify endpoints and authorization for external data connections (e.g. databricks, spice.ai) at initialization.
- show tables; parsing in the Spice SQL repl.
- lookback_size (& improve SpiceAI’s ModelSource) by @Jeadie in https://github.com/spiceai/spiceai/pull/1016

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.1-alpha...v0.10.2-alpha
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.10.1-alpha!
The v0.10.1-alpha release focuses on stability, bug fixes, and usability by improving error messages when using SQLite data accelerators, improving the PostgreSQL support, and adding a basic Helm chart.
Improved PostgreSQL support for Data Connectors: TLS is now supported with PostgreSQL Data Connectors, and there are improved VARCHAR and BPCHAR conversions through Spice.
Improved Error Messages: Simplified error messages from Spice when propagating errors from Data Connectors and Accelerator Engines.
Spice Pods Command: The spice pods
command can give you quick statistics about models, dependencies, and datasets that are loaded by the Spice runtime.
Spice.ai can be deployed to Kubernetes using Helm. Here’s a quick guide to get started:
Step 1. (Optional) Start a local kind
cluster:
go install sigs.k8s.io/[email protected]
kind create cluster
Step 2. Install Spice in your Kubernetes cluster using Helm:
helm repo add spiceai https://helm.spiceai.org
helm install spiceai spiceai/spiceai
Step 3. Verify that the Spice pods are running:
kubectl get pods
kubectl logs deploy/spiceai
Step 4. Run the Spice SQL REPL inside the running pod:
kubectl exec -it deploy/spiceai -- spiced --repl
Learn more about deploying Spice.ai to Kubernetes
- spice login in environments with no browser. (https://github.com/spiceai/spiceai/pull/994)
- spice pods returns incorrect counts. (https://github.com/spiceai/spiceai/pull/998)

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
TL;DR: We’ve rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets sourced from any database, data warehouse or data lake. Learn more at github.com/spiceai/spiceai.
In September, 2021, we introduced Spice.ai OSS as a runtime for building AI-driven applications using time-series data.
We quickly ran into a big problem in making these applications work… data, the fuel for intelligent software, was painfully difficult to access, operationalize, and use, not only in machine learning, but also in web frontends, backend applications, dashboards, data pipelines, and notebooks. And we had to make hard tradeoffs between cost and query performance.
We felt this pain every day building 100TB+ scale data and AI systems for the Spice.ai Cloud Platform. So we took our learnings and infused them back into Spice.ai OSS with the capabilities we wished we had.
We rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse or data lake.
Spice is a fast, lightweight (< 150MB) single binary designed to be deployed alongside your application, dashboard, and within your data or machine learning pipelines. Spice federates SQL query across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.), and data lakes (S3, MinIO, Databricks, etc.) so you can easily use and combine data wherever it lives. Datasets, declaratively defined, can be materialized and accelerated using your engine of choice, including DuckDB, SQLite, PostgreSQL, and in-memory Apache Arrow records, for ultra-fast, low-latency query. Accelerated engines run in your infrastructure, giving you flexibility and control over price and performance.
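As a sketch of how a declaratively defined, accelerated dataset might look (the pod name, source table, and engine choice are illustrative):

```yaml
version: v1beta1
kind: Spicepod
name: my_app                       # hypothetical pod name
datasets:
  - from: postgres:public.orders   # hypothetical federated source
    name: orders
    acceleration:
      enabled: true
      engine: duckdb               # or sqlite, postgres; in-memory Arrow is the default
```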
The next-generation of Spice.ai OSS enables:
Better applications. Accelerate and co-locate data with frontend and backend applications, for high concurrent queries, serving more users with faster page loads and data updates. Try the CQRS sample app.
Snappy dashboards, analytics, and BI. Faster, more responsive dashboards without massive compute costs. Spice supports Arrow Flight SQL (JDBC/ODBC/ADBC) for connectivity with Tableau, Looker, PowerBI, and more. Watch the Apache Superset with Spice demo.
Faster data pipelines, machine learning training and inference. Co-locate datasets with pipelines where the data is needed to minimize data-movement and improve query performance. Predict hard drive failure with the SMART data demo.
Easily query many data sources. Federated SQL query across databases, data warehouses, and data lakes using Data Connectors.
Spice is open-source, Apache 2.0 licensed, and is built using industry-leading technologies including Apache DataFusion, Arrow, and Arrow Flight SQL. We’re launching with several built-in Data Connectors and Accelerators and Spice is extensible so more will be added in each release. If you’re interested in contributing, we’d love to welcome you to the community!
You can download and run Spice in less than 30 seconds by following the quickstart at github.com/spiceai/spiceai.
Spice, rebuilt in Rust, introduces a unified SQL query interface, making it simpler and faster to build data-driven applications. The lightweight Spice runtime is easy to deploy and makes it possible to materialize and query data from any source quickly and cost-effectively. Applications can serve more users, dashboards and analytics can be snappier, and data and ML pipelines finish faster, without the heavy lifting of managing data.
For developers this translates to less time wrangling data and more time creating innovative applications and business value.
Check out and star the project on GitHub!
Thank you,
Phillip
Announcing the release of Spice v0.10-alpha!
The Spice.ai v0.10-alpha release focused on additions and updates to improve stability, usability, and the overall Spice developer experience.
Public Bucket Support for S3 Data Connector: The S3 Data Connector now supports public buckets in addition to buckets requiring an access id and key.
JDBC-Client Connectivity: Improved connectivity for JDBC clients, like Tableau.
User Experience Improvements:
- spice login postgres command, streamlining the process for connecting to PostgreSQL databases.

Grafana Dashboard: Improving the ability to monitor Spice deployments, a standard Grafana dashboard is now available.
- spice login postgres command
- spice status with dataset metrics
- show tables output

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.9.1-alpha!
The v0.9.1 release focused on stability, bug fixes, and usability by adding spice CLI commands for listing Spicepods (spice pods), Models (spice models), Datasets (spice datasets), and improved status (spice status) details. In addition, the Arrow Flight SQL (flightsql) data connector and SQLite (sqlite) data store were added.
FlightSQL data connector: Arrow Flight SQL can now be used as a connector for federated SQL query.
SQLite data backend: SQLite can now be used as a data store for acceleration.
- FlightSQL data connector (flightsql).
- SQLite data store (sqlite).
- spice pods, spice status, spice datasets, and spice models CLI commands.
- GET /v1/spicepods API for listing loaded Spicepods.
- spiced Docker CI build and release.
- linux/arm64 binary build.
- spice sql REPL panics when query result is too large. (https://github.com/spiceai/spiceai/pull/875)
- --access-secret in spice s3 login. (https://github.com/spiceai/spiceai/pull/894)

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.9-alpha!
The v0.9 release adds several data connectors including the Spice data connector for the ability to connect to other Spice instances. Improved observability for the Spice runtime has been added with the new /metrics
endpoint for monitoring deployed instances.
Arrow Flight SQL endpoint: The Arrow Flight endpoint now supports Flight SQL, including JDBC, ODBC, and ADBC, enabling database clients like DBeaver or BI applications like Tableau to connect to and query the Spice runtime.
Spice.ai data connector: Use other Spice runtime instances as data connectors for federated SQL query across Spice deployments and for chaining Spice runtimes.
Keyring secret store: Use the operating system native credential store, like macOS keychain for storing secrets used by the Spice runtime.
PostgreSQL data connector: PostgreSQL can now be used as both a data store for acceleration and as a connector for federated SQL query.
Databricks data connector: Databricks as a connector for federated SQL query across Delta Lake tables.
S3 data connector: S3 as a connector for federated SQL query across Parquet files stored in S3.
Metrics endpoint: Added new /metrics
endpoint for Spice runtime observability and monitoring with the following metrics:
- spiced_runtime_http_server_start counter
- spiced_runtime_flight_server_start counter
- datasets_count gauge
- load_dataset summary
- load_secrets summary
- datasets/load_error counter
- datasets/count counter
- models/load_error counter
- models/count counter
- Keyring secret store (keyring).
- PostgreSQL data connector (postgres).
- Spice.ai data connector (spiceai).
- Databricks data connector (databricks) - Delta Lake support.
- S3 data connector (s3) - Parquet support.
- /v1/models API.
- /v1/status API.
- /metrics API.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.8-alpha!
This is a minor release that builds on the new Rust-based runtime, adding stability and a preview of new features for the first major release.
Secrets management: Spice 0.8 runtime can now configure and retrieve secrets from local environment variables and in a Kubernetes cluster.
Data tables can be locally accelerated using PostgreSQL.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
Announcing the release of Spice v0.7-alpha!
Spice v0.7-alpha is an all new implementation of Spice written in Rust. The Spice v0.7 runtime provides developers with a unified SQL query interface to locally accelerate and query data tables sourced from any database, data warehouse, or data lake.
Learn more and get started in minutes with the updated Quickstart in the repository README!
DataFusion SQL Query Engine: Spice v0.7 leverages the Apache DataFusion query engine to provide very fast, high quality SQL query across one or more local or remote data sources.
Data tables can be locally accelerated using Apache Arrow in-memory or by DuckDB.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.
In February, we announced Spice.ai OSS v0.6 with its data processing and transport completely rebuilt upon Apache Flight. This enables Spice.ai OSS to scale to datasets 10-100 times larger and brings Spice.ai into the Apache Arrow ecosystem paving the way for integrations with many popular projects, like Apache Parquet, pandas and big data systems like Hive, Drill, Spark, Snowflake, BigQuery, and many more.
In Spice.ai OSS v0.6.1 we announced a new big data system integration… our own, Spice.xyz!
Spice.xyz is data and AI infrastructure for web3.
It's web3 data made easy. Insanely fast and purpose designed for applications and ML.
Spice.xyz delivers data in Apache Arrow format, over high-performance Apache Arrow Flight APIs to your application, notebook, ML pipeline, and of course, to the Spice.ai runtime.
With Spice.ai OSS v0.6.1, a new Apache Arrow Flight data connector was made available, creating a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with Spice.xyz, developers can quickly and easily build web3 data-driven applications that learn and adapt using Spice.ai.
To read the announcement post for Spice.xyz, visit blog.spice.xyz.
Apache Arrow is a specification for an in-memory columnar data format that's very efficient for analytics operations. Arrow's zero-copy read semantics coupled with the Flight client-server framework mean extremely fast and efficient data transport and access without serialization overhead. This enables high-performance bulk-data scenarios, critical for data-driven applications and ML. These properties enable an open architecture based on Apache Arrow, Flight, and Parquet.
Paul Dix, CTO of InfluxData wrote a fantastic post on the Arrow ecosystem and why the future core of InfluxDB is built with Arrow. Sam Crowder also wrote A (Recent) History of Batch Data showing how Arrow is a cornerstone of modern data architecture.
Joining projects like InfluxDB, the core of both Spice.ai OSS and Spice.xyz are built with a foundation of Arrow and Flight. This means they benefit from the same high-performance data operations, they work great with each other and other projects in the ecosystem.
Betting on Arrow in Spice.ai enables exciting new applications because AI needs AI-ready data.
Previously it was difficult to efficiently get bulk data from a provider like Spice.xyz to the Spice.ai engine, but now it’s just a matter of configuring the connection through a few lines of YAML.
Imagine creating an application to trade NFTs. With Spice.xyz, developers can query Ethereum for data relating to NFT trading activity. That data is then delivered in the high-performance Arrow format to the Spice.ai runtime. The application's Spicepod could learn how to value NFTs based upon their trading history and the communities their owners have been engaged in. And this could all be done in real-time, something not feasible before.
In addition, using the Arrow Flight connector, other exciting applications are enabled across a ton of domains, like IoT, financial applications, security monitoring, and many more.
To get somewhere you need a goal or destination, a vehicle to get there, and fuel for that vehicle.
When it comes to intelligent, AI-driven applications, Spice.xyz now provides the Spice.ai vehicle with a massive pipeline of web3 data fuel.
The next step is to make it easier for developers to define the destination for the vehicle. Upcoming on the Spice.ai OSS roadmap is the ability for developers to define goals for how the decision-engine should learn. Like learning to maximize measurement βAβ or optimizing to a target of βBβ.
For example, in web3, this might be to build a client that can learn and adapt to optimize Ethereum Gas Fee prices for token swaps. The goal would be to minimize the gas fee, a problem we experienced first-hand when we built defly.ai. Today you have to encode that goal into your reward function, but our plan is to help do that for you, and all you have to do is tell us the end goal.
Goal-oriented learning applies to many domains, whether it be minimizing fees in crypto or maximizing engagement on a social platform. And personally, weβre excited about the eventual ability to apply Spice.ai and just say βminimize my taxesβ :-)
Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!
If youβd like to get involved, weβd love to talk. Try out Spice.ai OSS, Spice.xyz, email us βhey,β get in touch on Discord, or reach out on Twitter.
Luke
Announcing the release of Spice.ai v0.6.1-alpha! πΆ
Building upon the Apache Arrow support in v0.6-alpha, Spice.ai now includes new Apache Arrow data processor and Apache Arrow Flight data connector components! Together, these create a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with big data systems from the Apache Arrow ecosystem like Hive, Drill, Spark, Snowflake, and BigQuery, it’s now easier than ever to combine big data with Spice.ai.
And we’re also excited to announce the release of Spice.xyz! π
Spice.xyz is data and AI infrastructure for web3. Itβs web3 data made easy. Insanely fast and purpose designed for applications and ML.
Spice.xyz delivers data in Apache Arrow format, over high-performance Apache Arrow Flight APIs to your application, notebook, ML pipeline, and of course through these new data components, to the Spice.ai runtime.
Read the announcement post at blog.spice.ai.
Now built with Go 1.18.
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Announcing the release of Spice.ai v0.6-alpha! πΉ
Spice.ai now scales to datasets 10-100 larger enabling new classes of uses cases and applications! π We’ve completely rebuilt Spice.ai’s data processing and transport upon Apache Arrow, a high-performance platform that uses an in-memory columnar format. Spice.ai joins other major projects including Apache Spark, pandas, and InfluxDB in being powered by Apache Arrow. This also paves the way for high-performance data connections to the Spice.ai runtime using Apache Arrow Flight and import/export of data using Apache Parquet. We’re incredibly excited about the potential this architecture has for building intelligent applications on top of a high-performance transport between application data sources the Spice.ai AI engine.
From data connectors, to REST API, to AI engine, we’ve now rebuilt Spice.ai’s data processing and transport on the Apache Arrow project. Specifically, using the Apache Arrow for Go implementation. Many thanks to Matt Topol for his contributions to the project and guidance on using it.
This release includes a change to the Spice.ai runtime to AI Engine transport from sending text CSV over gRPC to Apache Arrow Records over IPC (Unix sockets).
This is a breaking change to the Data Processor interface, as it now uses `arrow.Record` instead of `Observation`.
Before v0.6, Spice.ai would not scale beyond hundreds of thousands of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.15KiB | 3.0005s | 0.0000s | 0.0100s | 423.754MiB |
csv | 20,000 | 1.61MiB | 2.9765s | 0.0000s | 0.0938s | 479.644MiB |
csv | 200,000 | 16.31MiB | 0.2778s | 0.0000s | NA (error) | 0.000MiB |
csv | 2,000,000 | 164.97MiB | 0.2573s | 0.0050s | NA (error) | 0.000MiB |
json | 2,000 | 301.79KiB | 3.0261s | 0.0000s | 0.0282s | 422.135MiB |
json | 20,000 | 2.97MiB | 2.9020s | 0.0000s | 0.2541s | 459.138MiB |
json | 200,000 | 29.85MiB | 0.2782s | 0.0010s | NA (error) | 0.000MiB |
json | 2,000,000 | 300.39MiB | 0.3353s | 0.0080s | NA (error) | 0.000MiB |
After building on Arrow, Spice.ai now easily scales beyond millions of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.14KiB | 2.8281s | 0.0000s | 0.0194s | 439.580MiB |
csv | 20,000 | 1.61MiB | 2.7297s | 0.0000s | 0.0658s | 461.836MiB |
csv | 200,000 | 16.30MiB | 2.8072s | 0.0020s | 0.4830s | 639.763MiB |
csv | 2,000,000 | 164.97MiB | 2.8707s | 0.0400s | 4.2680s | 1897.738MiB |
json | 2,000 | 301.80KiB | 2.7275s | 0.0000s | 0.0367s | 436.238MiB |
json | 20,000 | 2.97MiB | 2.8284s | 0.0000s | 0.2334s | 473.550MiB |
json | 200,000 | 29.85MiB | 2.8862s | 0.0100s | 1.7725s | 824.089MiB |
json | 2,000,000 | 300.39MiB | 2.7437s | 0.0920s | 16.5743s | 4044.118MiB |
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Last month in the v0.5-alpha version, a new learning algorithm was added to Spice.ai: Soft Actor-Critic. This is a very popular algorithm in the Reinforcement Learning field. Let’s see what it is and why this is an interesting addition.
The previous article Understanding Q-learning: How a Reward Is All You Need is not necessary but can be helpful to understand this article.
DeepMind first introduced the actor-critic approach in deep learning in a 2016 paper. We can think of this approach as having two tasks: choosing which action to take, and evaluating the value of actions. These tasks are carried out by two different neural networks, or a single network that branches out into two heads. The actor is the part that outputs the policy, while the critic outputs the values.
In most cases, this model was proven to perform very well, better than Deep Q-Learning. The actor is trained to prefer actions associated with the best values from the critic. The critic is trained to correctly estimate rewards (current and future ones) of the actions.
Both will improve over time though we have to keep in mind that the critic is unlikely to evaluate all possible actions in the environment as it will only see actions from states that the actor is likely to take (the policy).
This bias of the system toward its policy is important: the algorithm is meant to train on-policy. The actor-critic duo works together: trying to train it with inputs and outputs from another system (humans, or even itself in past iterations of its own training) will not work.
Multiple improvements were made to limit the bias of the actor-critic approach but the necessity to train on-policy remains. This is very limiting as being able to train from any experience can be very valuable for time and data efficiency.
Soft Actor-Critic allows an Actor-Critic network to train off-policy. It was introduced in a paper in 2018 and included multiple additions to improve its parent algorithm. The main difference is the introduction of the entropy of the actor outputs during the training phase.
The entropy measures the chaos/order of a system (or uncertainty). If a system always acts the same way, the entropy is minimal. Here the actor’s entropy is maximum if all possible actions have the same weight (same probability) and minimum if the actor always chooses a single action with 100% confidence.
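For a discrete set of actions, this entropy can be written with the standard definition (added here for reference):

```latex
H\bigl(\pi(\cdot \mid s)\bigr) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s)
```

It is maximal when all actions are equally likely and zero when a single action has probability 1, matching the description above.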
During the training phase, the actor is trained to maintain the entropy of its outputs at a specific value.
The introduction of the entropy changes the goal of the training: not only to find the best outputs, but also to keep exploring the other actions. The critic part will be trained on all actions, even if they may occur only in rare cases.
There are other essential parts, such as having 2 critics and being able to output continuous values, but the entropy is the crucial difference in this algorithm’s training and potential.
As we saw above, the Actor-Critic algorithm is known to outperform Deep Q-Learning in most cases. If we also want to leverage previous data (off-policy training), Soft Actor-Critic is a natural choice. Despite its better theoretical results, this approach is heavier, making it more suitable for complex tasks. For simpler tasks, Deep Q-Learning will still be an appealing option for its speed of training and its capability to quickly converge to a good solution.
We can think of Soft Actor-Critic as a complex machine designed to take actions while keeping a variety of possibilities. Sometimes several options seem equally rewarding: a simpler algorithm would take what it evaluates as the best one, even though the margin is small and the precision of its evaluation shouldn’t be enough. This tendency to quickly converge to a solution has its benefits and inconveniences.
Adding new algorithms is essential to Spice.ai, so the procedure was designed to be straightforward.
Looking at the source code, the code related to training agents is in the `ai/src` folder. This part of the code uses the Python language, as most modern AI libraries are distributed in Python.
In this folder, every agent lives in the `algorithms` folder, and each has its own subfolder. There is an `agent_interface` file that defines the main class the different agents should inherit from, and a `factory` script responsible for creating instances of an agent from a given algorithm name.
Adding a new agent is simple:
1. Create a new subfolder in `algorithms`.
2. Add a JSON file with `algorithm_id`, `name`, and `docs_link` (see the other JSON files as an example) in the folder.
3. Implement the `SpiceAIAgent` class defined in the `agent_interface` script.
4. Update the `factory` script to instantiate the new implementation when its name is called.

For the new agent, inheriting from the main `SpiceAIAgent` class, 5 functions need to be implemented.
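As an illustrative sketch only, a new agent might be shaped like the code below. Note that the class name `SpiceAIAgent` comes from the post, but the method name shown is a hypothetical stand-in, not one of the actual five functions from the real `agent_interface` script — consult the source for the actual signatures.

```python
import random


class SpiceAIAgent:
    """Sketch of a base class every learning algorithm inherits from.

    The real interface lives in ai/src and defines five required functions;
    this skeleton is illustrative only.
    """

    def __init__(self, state_shape, action_size):
        self.state_shape = state_shape
        self.action_size = action_size


class RandomAgent(SpiceAIAgent):
    """A trivial agent that picks actions uniformly at random."""

    def __init__(self, state_shape, action_size):
        super().__init__(state_shape, action_size)
        self._rng = random.Random(0)

    def act(self, state):
        # Hypothetical method name: choose an action index for the given state.
        return self._rng.randrange(self.action_size)
```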
Soft Actor-Critic is a fascinating algorithm that performs well in complex environments. We now support Soft Actor-Critic in Spice.ai, which is another step forward in constantly improving the performance of the AI engine. Additionally, we’ll continue improving existing algorithms and adding new ones over time. We designed the platform for ease of implementation and experimentation, so if you’d like to try building your own agent, you can get the source code on GitHub and contribute to the platform. Say hi on Discord, reach out on Twitter or email us.
I hope you enjoyed this post and learned something new.
Corentin
AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to out-perform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we’ll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.
While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic. And the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.
We don’t want any decision, though. We want high-quality, informed decisions. If you consider making higher quality, informed decisions over time, you need three classes of information. These classes are historical information, real-time or present information, and the results of your decisions.
Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.
Second, you need a real-time updated stream of data as it happens to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.
Finally, if we’re going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.
Using all three data classes together results in higher quality decisions over time. Broad data across these classes are useful, and we could make some nice trades with that. Still, we can make an even higher quality trading decision with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes with global or local variants. We combine global information, like what worked well for everyone, and local experience, what worked well for us and our situation, to make the best, overall informed decision.
Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.
First, you need a way to gather and consume historical information, like stock prices, and keep that updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed that back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year, to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn’t enough, as we learned from Phillip’s AI needs AI-ready data post, all three data classes need to be in a format that ML can use.
If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not well suited to applications that want to learn and adapt in real-time. Like a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency to apply that data or model in your application and a long turn-around time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture and send the results of the decisions your application made as a result of using those models back to the data team to incorporate in future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.
And, if you can’t afford a data team, you have to figure out how to do all that yourself.
Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.
Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready so that all three classes are consistent and ML available.
The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.
The Spice.ai approach short circuits the traditional waterfall-like data process by keeping as much data as possible application local instead of round-tripping through an external pipeline or team, especially valuable for real-time data. The application can learn and adapt faster by reducing the latency of decision consequences to learning.
Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information or an “interpretation” of a time range as input to learning. The trading example could be as simple as labeling a time range as a good time to buy or providing additional contextual information such as tax considerations, etc. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.
While Spice.ai focuses on ensuring consistent ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaptation.
In summary, to build an intelligent application driven from AI recommended decisions, a significant amount of data engineering can be required to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn rather than the runtime data engineering.
The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.
Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!
If you want to get involved, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
Luke
A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application’s goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application’s logic. You can think of the component that does this as a “decision engine.” This post will explore a brief history of decision engines and use-cases for this application class.
The idea to make intelligent decision-making applications is not new. Developers first created these applications around the 1970s¹, and they are some of the earliest examples of using artificial intelligence to solve real-world problems.
The first applications used a class of decision engines called “expert systems.” A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.
Some uses of expert systems include:
However, the resources required to build expert systems make employing them infeasible for many applications². They often need a significant time and resource investment to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.
With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules to power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?
Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature during business hours, only adjusting using in-room sensor data, if at all. This behavior potentially over-cools at business close as the outside temperature lowers and the building starts vacating.
With Spice.ai: Using Spice.ai, the application combines time-series data from multiple data sources, including the time of day, day of the week, building/room occupancy, outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as the room naturally cools towards the end of the day. As the occupancy decreases, the decision engine is rewarded for maintaining the desired temperature and minimizing energy consumption/cost.
Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, the order is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app gets more popular with customers and the number of restaurants, drivers, and customers increases, the heuristic needs to be constantly tuned or supplemented with human operators to handle the demand.
With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from data, including patterns in both the restaurant and driver’s order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.
Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or Pancake Swap. In both cases, the routing of orders is likely to be either a form of traditional expert system based upon rules or even manually routed.
With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the most optimal route or exchange to execute the transaction and get you the best trade.
A new class of applications that can learn and adapt is made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.
If you’d like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Discord, reach out on Twitter or email us.
Phillip
Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22–23. ISBN 978-0-13-103805-9. ↩︎
Kendal, S. L., & Creen, M. (2007). An introduction to knowledge engineering. London: Springer. ISBN 978-1-84628-475-5 ↩︎
Announcing the release of Spice.ai v0.5.1-alpha!
This minor release builds upon v0.5-alpha adding the ability to start training from the dashboard plus support for monitoring training runs with TensorBoard.
A “Start Training” button has been added to the pod page on the dashboard so that you can easily start training runs from that context.
Training runs can now be started by:
- Clicking the new “Start Training” button on the pod page in the dashboard
- Using the `spice train` CLI command
- Making a POST request to `/api/v0.1/pods/{pod name}/train`
TensorBoard monitoring is now supported when using the DQL (default) or the new SACD learning algorithm that was announced in v0.5-alpha.
When enabled, TensorBoard logs will automatically be collected and an “Open TensorBoard” button will be shown on the pod page in the dashboard.
Logging can be enabled at the pod level with the `training_loggers` pod param or per training run with the CLI `--training-loggers` argument.
Support for VPG will be added in v0.6-alpha. The design allows for additional loggers to be added in the future. Let us know what you’d like to see!
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
There are two general ways to train an AI to match a given expectation: we can either give it the expected outputs (commonly named labels) for different inputs; we call this supervised learning. Or we can provide a reward for each output as a score: this is reinforcement learning (RL).
Supervised learning works by tweaking all the parameters (weights in neural networks) to fit the desired outputs, expecting that given enough input/label pairs the AI will find common rules that generalize for any input.
Reinforcement learning’s reward is often provided from a simple function that can score any output: we don’t know what specific output would be best, but we can recognize how good the result is. In this latter statement there are two underlying concepts we will address in this post:
Those questions are already mostly answered, and many algorithms deal with those topics. Our journey here will be to understand how we tackle those questions and end up with a beautiful formula that is at the core of modern approaches to RL:
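That formula is the Q-learning update rule, whose parts (the discount factor and the learning rate) the rest of this post builds up step by step:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```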
The vast majority, if not all, of modern RL algorithms are based on the principles of Q-learning: the idea is to evaluate a ‘reward expectation’ for each possible action. If we can have a good evaluation, we could maximize the reward by choosing actions with the maximum evaluated rewards. The function giving this expected reward is named Q. For now, we will assume we can have a reward for any action.
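Under that assumption, the ideal `Q` function simply returns the reward (a reconstruction of the relation the next paragraph refers to):

```latex
Q(s_t, a_t) = r(s_t, a_t)
```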
The `t` indices show that the state and action aren’t constant and will vary, usually with time/action taken. On the other hand, the `Q` function and the reward function `r` are unique functions that ideally return the ’expected reward’ for any (state, action) pair.
For now, we will assume we can have a reward that gives an objective and perfect evaluation of each state/action.
We know that actions’ outcomes (rewards) will vary depending on the current state we are in, otherwise the problem would be trivial to solve. If the states that are relevant to our actions can be numbered, a simple way would be to build a table with all the possible states/action pairs. There are different ways to build such a table depending on how we can interact with our environment. Eventually, we would have a good ‘map’ to guide us to do the best actions.
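A minimal sketch of such a Q-table in Python (a generic illustration, not Spice.ai’s implementation; the states and actions are invented for the example): state/action pairs index into a table of learned values, and the best action for a state is the one with the highest stored value.

```python
from collections import defaultdict

# Q-table: maps (state, action) -> expected reward. Unseen pairs default to 0.
q_table = defaultdict(float)

def best_action(state, actions):
    """Greedy policy: pick the action with the highest stored Q value."""
    return max(actions, key=lambda a: q_table[(state, a)])

# Fill in a few observed rewards for a hypothetical state.
q_table[("low_battery", "recharge")] = 1.0
q_table[("low_battery", "keep_working")] = -0.5

print(best_action("low_battery", ["recharge", "keep_working"]))  # recharge
```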
When the number of variables of the environment relevant to our actions/rewards becomes too large, the number of possible states grows quickly. It doesn’t take a lot of possible parameters to make the Q-table approach unfeasible. Neural networks are known to work very nicely and efficiently in high dimensionality (with many input variables). They also generalize well, so the idea in Deep Q-Learning is to use a neural network to predict the different Q values for each action given a state.
In this case, we do not need to give the state/action pairs but only the state, as the neural network would exhaustively return all the Q values associated with each action. Outputting all actions’ Q value is a common method as the general cases have a complex environment but a smaller number of possible actions.
This method works very well. It is similar to supervised learning with states as inputs and rewards as labels. We assumed so far that we had a reward for each action, and we chose the next action with the best reward (called a greedy policy). In many cases this is not enough: even if an action would yield the best reward at a given state, this may affect the next state so that we wouldn’t optimize the reward in the long term. Also, if we can’t have a reward for each action, we usually give 0 as a reward. We will not be able to choose the right action if they affect later states despite not yielding different rewards at the current state.
The sparsity of rewards or the long-term calculation of total reward (non-greedy policies) leads us to diverge from supervised learning and learn potential future rewards.
TD-learning is a clever way to account for potential future value without knowing them yet. TD is a model-free class of algorithms: it does not simulate future states. The main idea is to consider all the rewards of a sequence of actions to give a better value than just the reward of the next action.
We can, for instance, sum all the future rewards:
Mathematically this can be written as:
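With r_t denoting the reward at step t, the sum over the whole trajectory is (reconstructed to match the description):

```latex
Q(s_t, a_t) = r_t + r_{t+1} + r_{t+2} + \dots
```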
This is named TD(0): the simplest form of TD method, accumulating all the rewards.
We could try different trajectories (sequence of actions) and retrospectively get the final reward for each action, but this has 2 drawbacks: the environment is usually too vast, and the sequence of actions might not even have a definite end. Also, such exhaustive methods might not be very efficient. Instead, we can evaluate the ‘value’ of the next state overall, like the maximum of all its possible rewards (direct reward), and add this value to the reward of a given action.
If a state can have different branches, we can select the best one, and this would be our policy, the way we choose actions. This simple form of taking the maximum is called the ‘greedy’ policy.
This can be written down as:
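In expected-value form (a reconstruction consistent with the surrounding text), the value of the next state is the expected reward over the actions the policy may take:

```latex
V(s_{t+1}) = \mathbb{E}\left[\, r(s_{t+1}, a) \,\right]
```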
The expected value notation is defined as:
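That is, each possible outcome is weighted by its probability:

```latex
\mathbb{E}[X] = \sum_{i} p_i \, x_i
```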
For a greedy policy, the probabilities `p` would all be set to 0 except the one associated with the highest return, which is set to 1 (in case of equality between n actions, we would attribute 1/n as probabilities to get the same expected value).
The expected reward can be replaced by the Q function we used earlier, which now can be denominated to be specific to our chosen policy (named π):
We previously discussed the problem of not being able to go through all the states exhaustively and that the evaluation of the Q value from a neural network could help. We want to use the TD method to have a better value estimation that will consider potential future rewards.
The TD(0) method is elegant as we can, in fact, only use the next state’s expected value instead of all future ones. The idea is that with successive evaluations, we build a chain of dependencies as each state’s value depends on the next one.
We can see that the greedy policy would work even with null rewards in the trajectory. We can make our greedy policy explicit, going back to using the Q value instead of the state value V:
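In Q-value form (reconstructed from the surrounding text), the target is the immediate reward plus the best Q value reachable from the next state:

```latex
Q(s_t, a_t) = r(s_t, a_t) + \max_{a} Q(s_{t+1}, a)
```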
We need to fix a problem: if a trajectory grows too long or never ends, a state value can potentially grow indefinitely. To counter that, we can add a discount factor (originally named lambda, usually referred to as gamma in Q-learning) for the next state’s value:
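Adding the discount factor γ, the target becomes:

```latex
Q(s_t, a_t) = r + \gamma \, \max_{a} Q(s_{t+1}, a)
```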
Notice that we simplify the reward notation for clarity.
To avoid exploding values, this discount has to be between 0 and 1 (strictly below 1). We can think about it as giving more importance to the direct reward than to future ones. As the contribution of later rewards decreases, the chain of actions can grow without the calculated value growing. If the reward has an upper limit, the value will also be bounded.
The sparsity of rewards is also solved: giving only a positive reward after many non-rewarding steps will create smooth values for the intermediate states. Any reward, positive or negative, will diffuse its value to the neighboring states.
Finally, as we train a neural network to estimate the Q function, we need to update its target with successive iteration. We cannot fully trust the estimator (a neural network here) to give the correct value, so we introduce a learning rate to update the target smoothly.
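With the learning rate α, the full update is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \, \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```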
That is it! We now understand all the parts of this formula. Over multiple training steps with different states, the training should find a good average Q function. While training, the estimator uses its own output to train itself (commonly referred to as bootstrapping): it is as if it is chasing itself. Bootstrapping can lead to instability in the training process. There are many additional methods to help against such instability.
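A worked numeric sketch of one such update step (generic Q-learning, not Spice.ai-specific code; the states, actions, and values are invented for illustration):

```python
# One Q-learning update step: Q <- Q + alpha * (r + gamma * max_next - Q)
alpha, gamma = 0.1, 0.9   # learning rate and discount factor

q = {("s0", "a"): 0.0, ("s0", "b"): 0.0,
     ("s1", "a"): 2.0, ("s1", "b"): 1.0}

def td_update(state, action, reward, next_state):
    max_next = max(q[(next_state, a)] for a in ("a", "b"))
    target = reward + gamma * max_next          # bootstrapped target
    q[(state, action)] += alpha * (target - q[(state, action)])

# Taking action "a" in s0 yields reward 0 but leads to the valuable state s1:
td_update("s0", "a", reward=0.0, next_state="s1")
print(q[("s0", "a")])  # 0.18 -- value diffuses back from s1 even with zero reward
```

Note how the zero-reward step still gains value from its successor state, which is exactly the diffusion of reward to neighboring states described above.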
From giving rewards, sparse or not, binary or fine-grained, we have a smooth space of values for all our states/actions so the AI can follow a greedy policy to the best outcome.
This way of training is not a silver bullet and there is no guarantee that the AI will find a correlation from the information given as state to the returned reward.
We can see how our rewards are used to train the AI’s policies using Q-learning. By understanding the many iterations required and the bootstrapping issues, we can help our AI by carefully giving relevant state information and rewards.
We didn’t see how the AI’s algorithm can explore different actions given an environment here. Spice.ai’s technology focuses exclusively on off-policy training where we only have past data and cannot interact with the environment. RL is a vast topic and currently quickly growing. Robotics is a fantastic field of application; many other areas are yet to be explored with such a technology. We hope to push forward the technology and its field of application with our platform.
If you’d like to partner with us on the mission of making new applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.
I hope you enjoy this post and learn new things.
Corentin
We are excited to announce the release of Spice.ai v0.5-alpha!
Highlights include a new learning algorithm called “Soft Actor-Critic” (SAC), fixes to the behavior of `spice upgrade`, and a more consistent authoring experience for reward functions.
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
The addition of the Soft Actor-Critic (Discrete) (SAC) learning algorithm is a significant improvement to the power of the AI engine. It is not set as the default algorithm yet, so to start using it pass the `--learning-algorithm sacd` parameter to `spice train`. We’d love to get your feedback on how it’s working!
With the addition of the reward function files that allow you to edit your reward function in a Python file, the behavior of starting a new training session by editing the reward function code was lost. With this release, that behavior is restored.
In addition, there is a breaking change to the variables used to access the observation state and interpretations. This change was made to better reflect the purpose of the variables and make them easier to work with in Python.
| Previous (Type) | New (Type) |
| --- | --- |
| `prev_state` (SimpleNamespace) | `current_state` (dict) |
| `prev_state.interpretations` (list) | `current_state_interpretations` (list) |
| `new_state` (SimpleNamespace) | `next_state` (dict) |
| `new_state.interpretations` (list) | `next_state_interpretations` (list) |
The Spice.ai CLI will no longer recommend “upgrading” to an older version. An issue was also fixed where trying to upgrade the Spice.ai CLI using spice upgrade
on Linux would return an error.
- Renamed `prev_state` and `new_state` to `current_state` and `next_state` to be consistent with the reward function files.
- Fixed the `spice upgrade` command.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
A significant challenge when developing an app powered by AI is providing the machine learning (ML) engine with data in a format that it can use to learn. To do that, you need to normalize the numerical data, one-hot encode categorical data, and decide what to do with incomplete data - among other things.
This data handling is often challenging! For example, to learn from Bitcoin price data, the prices are best normalized to a range between -1 and 1. Values too close to 0 are also a problem because of the lack of precision in floating-point representations (usually under 1e-5).
As a developer, if you are new to AI and machine learning, a great talk that explains the basics is Machine Learning Zero to Hero. Spice.ai makes the process of getting the data into an AI-ready format easy by doing it for you!
You write code with if statements and functions, but your machine only understands 1s and 0s. When you write code, you leverage tools, like a compiler, to translate that human-readable code into a machine-readable format.
Similarly, data for AI needs to be translated or “compiled” to be understood by the ML engine. You may have heard of tensors before; they are simply another word for a multi-dimensional array and they are the language of ML engines. All inputs to and all outputs from the engine are in tensors. You could use the following techniques when converting (or “compiling”) source data to a tensor.
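The two "compilation" techniques named above, normalization and one-hot encoding, can be sketched in a few lines of Python. This is purely illustrative; Spice.ai performs these transformations internally:

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Min-max scale numerical values into the range [lo, hi]."""
    v_min, v_max = min(values), max(values)
    span = (v_max - v_min) or 1.0  # avoid division by zero on constant data
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

# Bitcoin-style prices scaled into [-1, 1] for the ML engine
prices = [30000.0, 45000.0, 60000.0]
print(normalize(prices))                         # [-1.0, 0.0, 1.0]
print(one_hot("sell", ["buy", "sell", "hold"]))  # [0.0, 1.0, 0.0]
```

The resulting lists are exactly the kind of numerical vectors that get stacked into the tensors an ML engine consumes.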
There are excellent tools like Pandas, Numpy, scipy, and others that make the process of data transformation easier. However, most of these tools are Python libraries and frameworks - which means having to learn Python if you don’t know it already. Plus, when building intelligent apps (instead of just doing pure data analysis), this all needs to work on real-time data in production.
The tools mentioned above are not designed for building real-time apps. They are often designed for analytics/data science.
In your app, you will need to do this data compilation in real-time - and you can’t rely on a local script to help process your data. It becomes trickier if the team responsible for the initial training of the machine learning model is not the team responsible for deploying it out into production.
How data is loaded and processed in a static dataset is likely very different from how the data is loaded and processed in real-time as your app is live. The result often is two separate codebases that are maintained by different teams that are both responsible for doing the same thing! Ensuring that those codebases stay consistent and evolve together is another challenge to tackle.
Spice.ai handles the “compilation” of data for you.
You specify the data that your ML should learn from in a Spicepod. The Spice.ai runtime handles the logistics of gathering the data and compiling it into an AI-ready format.
It does this by using many techniques described earlier, such as normalization and one-hot encoding. And because we’re continuing to evolve Spice.ai, our data compilation will only get better over time.
In addition, the design of the Spice.ai runtime naturally ensures that the data used for both the training and real-time cases are consistent. Spice.ai uses the same data-components and runtime logic to produce the data. And not only that, you can take this a step further and share your Spicepod with someone else, and they would be able to use the same AI-ready data for their applications.
Spice.ai handles the process of compiling your data into an AI-ready format in a way that is consistent both during the training and real-time stages of the ML engine. A Spicepod defines which data to get and where to get it. Sharing this Spicepod allows someone else to use the same AI-ready data format in their application.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Phillip
In my previous post, Teaching Apps how to Learn with Spicepods, I introduced Spicepods as packages of configuration that describe an application’s data-driven goals and how it should learn from data. To leverage Spice.ai in your application, you can author a Spicepod from scratch or build upon one fetched from the spicerack.org registry. In this post, we’ll walk through the creation and authoring of a Spicepod step-by-step from scratch.
As a refresher, a Spicepod consists of:
We’ll create the Spicepod for the ServerOps Quickstart, an application that learns when to optimally run server maintenance operations based upon the CPU-usage patterns of a server machine.
We’ll also use the Spice CLI, which you can install by following the Getting Started guide or Getting Started YouTube video.
Modern web development workflows often include a file watcher to hot-reload so you can iteratively see the effect of your change with a live preview.
Spice.ai takes inspiration from this and enables a similar authoring experience for Spicepod manifests. If you first start the Spice.ai runtime in your application root before creating your Spicepod, it will watch for changes and apply them continuously so that you can develop in a fast, iterative workflow.
You would normally do this by opening two terminal windows side-by-side: one running the runtime via the spice run command, and one for entering CLI commands. In addition, you can open the Spice.ai dashboard at http://localhost:8000 to preview changes as you make them.
The easiest way to create a Spicepod is to use the Spice.ai CLI command: spice init <Spicepod name>
. We’ll make one in the ServerOps Quickstart application called serverops
.
The CLI saves the Spicepod manifest file in the spicepods
directory of your application. You can see it created a new serverops.yaml file, which should be included in your application and be committed to your source repository. Let’s take a look at it.
The initialized manifest file is very simple. It contains a name and three main sections:
We’ll walk through each of these in detail, and as a Spicepod author, you can always reference the documentation for the Spicepod manifest syntax.
You author and edit Spicepod manifest files in your favorite text editor with a combination of Spice.ai CLI helper commands. We eventually plan to have a VS Code extension and dashboard/portal editing abilities to make this even easier.
To build an intelligent, data-driven application, we must first start with data.
A Spice.ai dataspace is a logical grouping of data with definitions of how that data should be loaded and processed, usually from a single source. A combination of its data source and its name identifies it, for example, nasdaq/msft or twitter/tweets. Read more about Dataspaces in the Core Concepts documentation.
Let’s add a dataspace to the Spicepod manifest to load CPU metric data from a CSV file. This file is a snapshot of data from InfluxDB, a time-series database we like.
We can see this dataspace is identified by its source hostmetrics
and name cpu
. It includes a data
section with a file data connector, the path to the file, and a data processor to know how to process it. In addition, it defines a single measurement usage_idle
under the measurements section, which is a measurement of CPU load. In Spice.ai, measurements are the core primitive the AI engine uses to learn and are always numerical data. Spice.ai includes a growing library of community-contributed data connectors and data processors you can use in your Spicepod to access data. You can also contribute your own.
Finally, because the data is a snapshot of live data loaded from a file, we must set a Spicepod epoch_time
that defines the data’s start Unix time.
Now we have a dataspace, called hostmetrics/cpu
, that loads CSV data from a file and processes the data into a usage_idle
measurement. The file connector might be swapped out with the InfluxDB connector in a production application to stream real-time CPU metrics into Spice.ai. And in addition, applications can always send real-time data to the Spice.ai runtime through its API with a simple HTTP POST (and in the future, using Web Sockets and gRPC).
Now that the Spicepod has data, let’s define some data-driven actions so the ServerOps application can learn when is the best time to take them. We’ll add three actions using the CLI helper command, spice action add
.
And in the manifest:
The Spicepod now has data and possible actions, so we can now define how it should learn when to take them. Similar to how humans learn, we can set rewards or punishments for actions taken based on their effect and the data. Let’s add scaffold rewards for all actions using the spice rewards add
command.
We now have rewards set for each action. The rewards are uniform (all the same), meaning the Spicepod is rewarded the same for each action. Higher rewards are better, so if we change perform_maintenance
to 2, the Spicepod will learn to perform maintenance more often than the other actions. Of course, instead of setting these arbitrarily, we want to learn from data, and we can do that by referencing the state of data at each time-step in the time-series data as the AI engine trains.
The rewards themselves are just code. Currently, we support Python, either inline or in an external .py code file, and we plan to support several other languages. The reward code can access the time-step state through the `prev_state` and `new_state` variables and the dataspace name. For the full documentation, see Rewards.
Let’s add this reward code to perform_maintenance, which will reward performing maintenance when there is low CPU usage.
```python
cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
cpu_usage_delta = cpu_usage_prev - cpu_usage_new
reward = cpu_usage_delta / 100
```
This code takes the CPU usage (100 minus the idle time) delta between the previous time state and the current time state, and sets the reward to a normalized delta value between -1 and 1. When CPU usage moves from a higher `cpu_usage_prev` to a lower `cpu_usage_new`, it’s a better time to run server maintenance, so we reward the inverse of the delta, e.g. 80% - 50% = 30% = 0.3. However, if CPU usage moves from lower to higher, 50% - 80% = -30% = -0.3, it’s a bad time to run maintenance, so we provide a negative reward, or “punish” the action.
Through these rewards and punishments, together with the CPU metric data, the Spicepod will learn when it is a good time to perform maintenance, acting as the decision engine for the ServerOps application. You might be thinking you could write code without AI to do this, which is true, but handling the variety of cases, like CPU spikes, or patterns in the data, like cyclical server load, would take a lot of code and development time. Applying AI helps you build faster.
The manifest now has defined data, actions, and rewards. The Spicepod can get data to learn which actions to take and when based on the rewards provided.
If the Spice.ai runtime is running, the Spicepod automatically trains each time the manifest file is saved. As this happens, reward performance can be monitored in the dashboard.
Once a training run completes, the application can query the Spicepod for a decision recommendation by calling the recommendations API http://localhost:8000/api/v0.1/pods/serverops/recommendation. The API returns a JSON document that provides the recommended action, the confidence of taking that action, and when that recommendation is valid.
In the ServerOps Quickstart, this API is called from the server maintenance PowerShell script to make an intelligent decision on when to run maintenance. The ServerOps Sample, which uses live data, can be continuously trained to learn and adapt even as the live data changes due to load patterns changing.
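A sketch of how an application might consume such a recommendation: the `action` and `confidence` field names below are assumptions based on the description above, not the exact response schema:

```python
def choose_action(response, min_confidence=0.8):
    """Act on a recommendation only when its confidence is high enough.

    `action` and `confidence` are assumed field names; consult the
    Spice.ai API documentation for the exact response shape.
    """
    if response.get("confidence", 0.0) >= min_confidence:
        return response["action"]
    return "no_op"  # fall back to doing nothing on low confidence

sample = {"action": "perform_maintenance", "confidence": 0.92}
print(choose_action(sample))  # perform_maintenance
```

Gating on confidence like this lets the app adopt recommendations incrementally instead of blindly acting on every response.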
The full Spicepod manifest from this walkthrough can be added from spicerack.org using the spice add quickstarts/serverops
command.
Leveraging Spice.ai as the decision engine for your server maintenance application helps you build smarter applications faster, applications that continue to learn and adapt even as usage patterns change over time.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Luke
Announcing the release of Spice.ai v0.4.1-alpha!
This point release focuses on fixes and improvements to v0.4-alpha. Highlights include AI engine performance improvements, updates to the dashboard observations data grid, notification of new CLI versions, and several bug fixes.
A special acknowledgment to @Adm28, who added the CLI upgrade detection and prompt, which notifies users of new CLI versions and prompts to upgrade.
Overall training performance has been improved by up to 13% by removing a lock in the AI engine.
In versions before v0.4.1-alpha, performance was especially impacted when streaming new data during a training run.
The dashboard observations datagrid now automatically resizes to the window width, and headers are easier to read, with automatic grouping into dataspaces. In addition, column widths are also resizable.
When run, the Spice.ai CLI will now automatically check for new CLI versions, at most once a day.
If it detects a new version, it will print a notification to the console on spice version
, spice run
or spice add
commands prompting the user to upgrade using the new spice upgrade
command.
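The once-a-day throttle can be sketched as a simple timestamp comparison. This is an illustration of the idea, not the CLI's actual implementation:

```python
import time

DAY_SECONDS = 24 * 60 * 60

def should_check_for_update(last_check_ts, now=None):
    """Return True when at least a day has passed since the last check."""
    now = time.time() if now is None else now
    return (now - last_check_ts) >= DAY_SECONDS

print(should_check_for_update(0, now=100_000))       # True  (>= 1 day elapsed)
print(should_check_for_update(50_000, now=100_000))  # False (< 1 day elapsed)
```

In practice the last-check timestamp would be persisted to disk between CLI invocations.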
- Fixed an issue with a `time_format` of `hex` or values prefixed with `0x`.
- Fixed an issue with the `Spicepods` directory, and a resulting error when loading a non-Spicepod file.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
The Spice.ai project strives to help developers build applications that leverage new AI advances which can be easily trained, deployed, and integrated. A previous blog post introduced Spicepods: a declarative way to create AI applications with Spice.ai technology. While there are many libraries and platforms in the space, Spice.ai is focused on time-series data aligning to application-centric and frequently time-dependent data, and a Reinforcement Learning approach, which can be more developer-friendly than expensive, labeled supervised learning.
This post will discuss some of the challenges and directions for the technology we are developing.
Time series AI has become more popular over recent years, and there is extensive literature on the subject, including time-series-focused neural networks. Research in this space points to the likelihood that there is no silver bullet, and a single approach to time series AI will not be sufficient. However, for developers, this can make building a product complex, as it comes with the challenge of exploring and evaluating many algorithms and approaches.
A fundamental challenge of time series is the data itself. The shape and length are usually variable and can even be infinite (real-time streams of data). The volume of data required is often too much for simple and efficient machine learning algorithms such as Decision Trees. This challenge makes Deep Learning a popular choice for processing such data. Several types of neural networks have been shown to work well with time series, so let’s review some of the common classes:
While not a complete representation of classes of neural networks, this list represents the areas of the most potential for Spice.ai’s time-series AI technology. We also see other interesting paradigms to explore when improving the core technology, like Memory Augmented Neural Networks (MANN) or neural network-based Genetic Algorithms.
Reinforcement Learning (RL) has grown steadily, especially in fields like robotics. Usually, RL doesn’t require as much data processing as Supervised Learning, where large datasets can be demanding for hardware and people alike. RL is more dynamic: agents aren’t trained to replicate specific behaviors/outputs but to explore and ’exploit’ their environment to maximize a given reward.
Most of today’s research is based on environments the agent can interact with during the training process, known as online learning. Efficient training processes usually have multiple agent/environment pairs training together and sharing their experiences. An environment that agents can interact with enables taking actions that differ from the actual historical state, known as on-policy learning; using only past experiences, without an environment, is off-policy learning.
Spice.ai is initially taking an off-policy approach, where an environment (either pre-made or given by the user) is not required. Despite limiting the exploration of agents, this aligns to an application-centric approach as:
The Spice.ai approach to time series AI can be described as ‘Data-Driven’ Reinforcement Learning. This domain is very exciting, and we are building upon excellent research that is being published. The Berkeley Artificial Intelligence Research (BAIR) blog shows the potential of this field, as do many other research entities that have made great discoveries, like DeepMind, OpenAI, Facebook AI, and Google AI (among many others). We are inspired by and building upon all this research in Reinforcement Learning to develop core Spice.ai technology.
If you are interested in Reinforcement Learning, we recommend following these blogs, and if you’d like to partner with us on the mission of making it easier to build intelligent applications by leveraging RL, we invite you to discuss with us on Discord, reach out on Twitter or email us.
Corentin
We are excited to announce the release of Spice.ai v0.4-alpha!
Highlights include support for authoring reward functions in a code file, the ability to specify the time of recommendation, and ingestion support for transaction/correlation ids. Authoring reward functions in a code file is a significant improvement to the developer experience over specifying functions inline in the YAML manifest, and we are looking forward to your feedback on it!
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
The spice upgrade
command was added in the v0.3.1-alpha release, so you can now upgrade from v0.3.1 to v0.4 by simply running spice upgrade
in your terminal. Special thanks to community member @Adm28 for contributing this feature!
In addition to defining reward code inline, it is now possible to author reward code in functions in a separate Python file.
The reward function file path is defined by the reward_funcs
property.
A function defined in the code file is mapped to an action by authoring its name in the with
property of the relevant reward.
Example:
```yaml
training:
  reward_funcs: my_reward.py
  rewards:
    - reward: buy
      with: buy_reward
    - reward: sell
      with: sell_reward
    - reward: hold
      with: hold_reward
```
Learn more in the documentation: docs.spiceai.org/concepts/rewards/external
Spice.ai can now learn from cyclical patterns, such as daily, weekly, or monthly cycles.
To enable automatic cyclical field generation from the observation time, specify one or more time categories, such as `month` or `dayofweek`, in the `time` section of the pod manifest.
For example, by specifying `month`, the Spice.ai engine automatically creates a field in the AI engine data stream called `time_month_{month}`, with the value calculated from the month to which that timestamp relates.
Example:
```yaml
time:
  categories:
    - month
    - dayofweek
```
Supported category values are:

- `month`
- `dayofmonth`
- `dayofweek`
- `hour`
Learn more in the documentation: docs.spiceai.org/reference/pod/#time
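A sketch of how such cyclical fields could be derived from an observation timestamp. The `time_{category}_{value}` field-name pattern follows the description above; the rest is an illustration, not the engine's actual code:

```python
from datetime import datetime, timezone

def cyclical_fields(unix_ts, categories=("month", "dayofweek")):
    """Derive time-category fields like time_month_11 from a timestamp."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    values = {
        "month": t.month,          # 1-12
        "dayofmonth": t.day,       # 1-31
        "dayofweek": t.weekday(),  # 0 = Monday
        "hour": t.hour,            # 0-23
    }
    return {f"time_{c}_{values[c]}": 1 for c in categories}

# 2020-11-18 20:00:00 UTC, a Wednesday
print(cyclical_fields(1605729600))  # {'time_month_11': 1, 'time_dayofweek_2': 1}
```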
It is now possible to specify the time of recommendations fetched from the /recommendation
API.
Valid times are from pod epoch_time
to epoch_time + period
.
Previously the API only supported recommendations based on the time of the last ingested observation.
Requests are made in the following format: GET http://localhost:8000/api/v0.1/pods/{pod}/recommendation?time={unix_timestamp}
An example for quickstarts/trader
GET http://localhost:8000/api/v0.1/pods/trader/recommendation?time=1605729600
Specifying {unix_timestamp}
as 0
will return a recommendation based on the latest data. An invalid {unix_timestamp}
will return a result that has the valid time range in the error message:
```json
{
  "response": {
    "result": "invalid_recommendation_time",
    "message": "The time specified (1610060201) is outside of the allowed range: (1610057600, 1610060200)",
    "error": true
  }
}
```
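The range check behind this error message can be sketched as follows (a simplified illustration; the response shape mirrors the example above):

```python
def validate_recommendation_time(t, epoch_time, period):
    """Accept t within [epoch_time, epoch_time + period]; 0 means 'latest'."""
    if t == 0:
        return {"result": "ok", "time": "latest"}
    end = epoch_time + period
    if epoch_time <= t <= end:
        return {"result": "ok", "time": t}
    return {
        "result": "invalid_recommendation_time",
        "message": f"The time specified ({t}) is outside of the allowed "
                   f"range: ({epoch_time}, {end})",
        "error": True,
    }

# Reproduces the error case shown above: 1610060201 is one second past the end
print(validate_recommendation_time(1610060201, 1610057600, 2600))
```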
- Added support for ingestion of transaction/correlation ids (e.g. `order_id`, `trace_id`) in the pod manifest.
- Fixed an issue when the `training` section is not included in the manifest.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
The last post in this series, Making Apps that Learn and Adapt, described the shift from building AI/ML solutions to building apps that learn and adapt. But, how does the app learn? And as a developer, how do I teach it what it should learn?
With Spice.ai, we teach the app how to learn using a Spicepod.
Imagine you own a restaurant. You created a menu, hired staff, constructed the kitchen and dining room, and got off to a great start when it first opened. However, over the years, your customers’ tastes changed, you’ve had to make compromises on ingredients, and there’s a hot new place down the street… business is stagnating, and you know that you need to make some changes to stay competitive.
You have a few options. First, you could gather all the data, such as customer surveys, seasonal produce metrics, and staff performance profiles. You may even hire outside consultants. You then take this data to your office, and after spending some time organizing, filtering, and collating it, you’ve discovered an insight! Your seafood dishes sell poorly and cost the most… you are losing money! You spend several weeks or months perfecting a new menu, which you roll out with much fanfare! And thenβ¦ business is still poor. What!? How could this be? It was a data-driven approach! You start the process again. While this approach is a worthy option, it has long latency from data to learning to implementation.
Another option is to build real-time learning and adaption directly into the restaurant. Imagine a staff member whose sole job was learning and adapting how the restaurant should operate; let’s name them Blue. You write a guide for Blue that defines certain goal metrics, like customer food ratings, staff happiness, and of course, profit. Blue tracks each dish served, from start to finish, from who prepared it to its temperature, its costs, and its final customer taste rating. Blue not only learns from each customer review as each dish is consumed but also how dish preparation affects other goal metrics, like profitability. The restaurant staff consults Blue to determine any adjustments to improve goal metrics as they work. The latency from data to learning to adaptation has been reduced from weeks or months to minutes. This option, of course, is not feasible for most restaurants, but software applications can use this approach. Blue and his instructions are analogous to the Spice.ai runtime and manifest.
In the Spice.ai model, developers teach the app how to learn by describing goals and rewarding its actions, much like how a parent might teach a child. As these rewards are applied in training, the app learns what actions maximize its rewards towards the defined goals.
Returning to the restaurant example, you can think of the Spice.ai runtime as Blue, and Spicepod manifests as the guide on how Blue should learn. Individual staff members would consult with Blue for ongoing recommendations on decisions to make and how to act. These goals and rewards are defined in Spicepods or “pods” for short. Spicepods are packages of configuration that describe the application’s goals and how it should learn from data. Although it’s not a direct analogy, Spicepods and their manifests can be conceptualized similarly to Docker containers and Dockerfiles. Whereas Dockerfiles define the packaging of your app, Spicepods specify the packaging of your app’s learning and data.
Anatomy of a Spicepod
A Spicepod consists of:
Developers author Spicepods using the spice
CLI command such as with spice pod init <name>
or simply by creating a manifest file such as mypod.yaml
in the spicepods
directory of their application.
Here’s an example of the Tweet Recommendation Quickstart Spicepod manifest.
A screenshot of the Spicepod manifest for the Tweet Recommendation Quickstart
You can see the data definitions under dataspaces
, the actions the application may take under actions
, and their rewards when training.
In the next post, I’ll walk through in detail each section of the pod manifest. In the meantime, you can review the documentation for a complete reference of the Spicepod manifest syntax.
Spicepods as packages
On disk, Spicepods are generally layouts of a manifest file, seed data, and trained models, but they can also be exported as zipped packages.
A screenshot of the Spicepod layout for the trader quickstart application
When the runtime exports a Spicepod using the spice export
command, it is saved with a .spicepod
extension. It can then be shared, archived, or imported into another instance of the Spice.ai runtime.
Soon, we also expect to enable publishing of .spicepods
to spicerack.org, from where community-created Spicepods can easily be added to your application using spice add <pod name>
(currently, only Spice AI published pods are available on spicerack.org).
Treating Spicepods as packages and enabling their sharing and distribution through spicerack.org will help developers share their “restaurant guides” and build upon each other’s work, much like they do with npmjs.org or pypi.org. In this way, developers can together build better and more intelligent applications.
In the next post, we’ll dive deeper into authoring a Spicepod manifest to create an intelligent application. Follow @spice_ai on Twitter to get an update when we post.
If you haven’t already, read the first post in the series, Making Apps that Learn and Adapt.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started!
Luke
In the Spice.ai announcement blog post, we shared some of the inspiration for the project stemming from challenges in applying and integrating AI/ML into a neurofeedback application. Building upon those ideas, in this post, we explore the shift in approach from a focus of data science and machine learning (ML) to apps that learn and adapt.
As a developer, I’ve followed the AI/ML space with keen interest and been impressed with the advances and announcements that only seem to be increasing. stateof.ai recently published its 2021 report, and once again, it’s been another great year of progress. At the same time, it’s still more challenging than ever for mainstream developers to integrate AI/ML into their applications. For most developers, where AI/ML is not their full-time job, and without the support of a dedicated ML team, creating and developing an intelligent application that learns and adapts is still too hard.
Most solutions on the market, even those that claim they are for developers, focus on helping make ML easier instead of making it easier to build applications. These solutions have been great for advancing ML itself but have not helped developers leverage ML in their apps to make them intelligent. Even when a developer successfully integrates ML into an application, it might make that application smart, but often does not help the app continue to learn and adapt over time.
Traditionally, the industry has viewed AI/ML as separate from the application. A pipeline, service, or team is provided with data, which trains on that data, and can then provide answers or insights. These solutions are often created with a waterfall-like approach, gathering and defining requirements, designing, implementing, testing, and deploying. Sometimes this process can take months or even years.
With Spice.ai, we propose a new approach to building applications. By bringing AI/ML alongside your compute and data and incorporating it as part of your application, the app can incrementally adopt recommendations from the AI engine and in addition the AI engine can learn from the application’s data and actions. This approach shifts from waterfall-like to agile-like, where the AI engine ingests streams of application and external data, along with the results of the application’s actions, to continuously learn. This virtuous feedback cycle from the app to the AI engine and back again enables the app to get smarter and adapt over time. In this approach, building your application is developing the ML.
Being part of the application is not just conceptual. Development teams deploy the Spice.ai runtime and AI engine with the application as a sidecar or microservice, enabling the app services and runtime to work together and for data to be kept application local. A developer teaches the AI engine how to learn by defining application goals and rewards for actions the application takes. The AI Engine observes the application and the consequences of its actions, which feeds into its experience. As the AI engine learns, the application can adapt.
As developers shift from thinking about disparate applications and ML to building applications where AI that learns and adapts is integrated as a core part of the application logic, a new class of intelligent applications will emerge. And as technical talent becomes even more scarce, applications built this way will be necessary, not just to stay competitive but to be built at all.
In the next post, I’ll discuss the concept of Spicepods, bundles of configuration that describes how the application should learn, and how the Spice.ai runtime hosts and uses them to help developers make applications that learn.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!
Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started! 🎉
Luke
We are excited to announce the release of Spice.ai v0.3.1-alpha! 🎉
This point release focuses on fixes and improvements to v0.3-alpha. Highlights include the ability to specify both seed and runtime data, to select custom named fields for `time` and `tags`, a new `spice upgrade` command, and several bug fixes.
A special acknowledgment to @Adm28, who added the new `spice upgrade` command, which enables the CLI to self-update and, in turn, auto-update the runtime.
The CLI can now be updated using the new `spice upgrade` command. This command will check for, download, and install the latest Spice.ai CLI release, which will become active on its next run.
When run, the CLI will check for the matching version of the Spice.ai runtime and will automatically download and install it as necessary.
The versions of both the Spice.ai CLI and runtime can be checked with the `spice version` CLI command.
When working with streaming data sources, like market prices, it’s often also useful to seed the dataspace with historical data. Spice.ai enables this with the new `seed_data` node in the dataspace configuration. The syntax is exactly the same as the `data` syntax. For example:
```yaml
dataspaces:
  - from: coinbase
    name: btcusd
    seed_data:
      connector: file
      params:
        path: path/to/seed/data.csv
      processor:
        name: csv
    data:
      connector: coinbase
      params:
        product_ids: BTC-USD
      processor:
        name: json
```
The seed data will be fetched first, before the runtime data is initialized. Both sets of connectors and processors use the dataspace-scoped `measurements`, `categories`, and `tags` for processing, and both data sources are merged in the pod-scoped observation timeline.
Before v0.3.1-alpha, data was required to include a specific `time` field. In v0.3.1-alpha, the JSON and CSV data processors now support selecting a specific field to populate the time field. An example selector using the `created_at` column for `time` is:
```yaml
data:
  processor:
    name: csv
    params:
      time_selector: created_at
```
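The effect of `time_selector` can be illustrated with a small Python sketch. The field names here are hypothetical, and the real CSV processor lives inside the runtime; this only mirrors the selection behavior:

```python
import csv
import io

def process_csv(raw, time_selector="time"):
    """Read CSV rows and populate each observation's time from the
    column named by time_selector, keeping the rest as data fields."""
    reader = csv.DictReader(io.StringIO(raw))
    observations = []
    for row in reader:
        time = row.pop(time_selector)
        observations.append({"time": time, "data": row})
    return observations

raw = "created_at,likes\n1632000000,42\n1632000060,17\n"
obs = process_csv(raw, time_selector="created_at")
print(obs[0]["time"])  # → 1632000000
```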
Before v0.3.1-alpha, tags were required to be placed in a `_tags` field. In v0.3.1-alpha, any field can now be selected to populate tags. Tags are pod-unique string values, and the union of all selected fields makes up the resulting tag list. For example:
```yaml
dataspace:
  from: twitter
  name: tweets
  tags:
    selectors:
      - tags
      - author_id
    values:
      - spice_ai
      - spicy
```
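A rough Python sketch of the selection logic, for illustration only: the tag list is the union of the values found in all selected fields.

```python
def collect_tags(observation, selectors):
    """Union the values of all selected fields into a single tag list."""
    tags = set()
    for field in selectors:
        value = observation.get(field)
        if value is None:
            continue
        if isinstance(value, list):
            tags.update(value)
        else:
            tags.add(value)
    return sorted(tags)

tweet = {"tags": ["spice_ai", "spicy"], "author_id": "spice_ai", "text": "hi"}
print(collect_tags(tweet, ["tags", "author_id"]))  # → ['spice_ai', 'spicy']
```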
- New `spice upgrade` command for self-upgrade of the Spice.ai CLI.
- New `seed_data` node in the dataspace configuration, enabling the dataspace to be seeded with an alternative source of data.
- New `time_selector` parameter for selecting the field used to populate `time`.
- New `selectors` list for selecting the fields used to populate tags.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
We are excited to announce the release of Spice.ai v0.3-alpha! 🎉
This release adds support for ingestion, automatic encoding, and training of categorical data, enabling more use-cases and datasets beyond just numerical measurements. For example, perhaps you want to learn from data that includes a category of t-shirt sizes, with discrete values, such as small, medium, and large. The v0.3 engine now supports this and automatically encodes the categorical string values into numerical values that the AI engine can use. Also included is a preview of data visualizations in the dashboard, which is helpful for developers as they author Spicepods and dataspaces.
A special acknowledgment to @sboorlagadda, who submitted the first-ever Spice.ai feature contribution from the community! He added the ability to list pods from the CLI with the new `spice pods list` command. Thank you, @sboorlagadda!
If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
In v0.1, the runtime and AI engine only supported ingesting numerical data. In v0.2, tagged data was accepted and automatically encoded into fields available for learning. In this release, v0.3, categorical data can now also be ingested and automatically encoded into fields available for learning. This is a breaking change: the manifest format now separates numerical measurements from categorical data.
Pre-v0.3, the manifest author specified numerical data using the `fields` node.
In v0.3, numerical data is now specified under `measurements` and categorical data under `categories`. E.g.
```yaml
dataspaces:
  - from: event
    name: stream
    measurements:
      - name: duration
        selector: length_of_time
        fill: none
      - name: guest_count
        selector: num_guests
        fill: none
    categories:
      - name: event_type
        values:
          - dinner
          - party
      - name: target_audience
        values:
          - employees
          - investors
    tags:
      - tagA
      - tagB
```
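The automatic encoding of categorical values is conceptually a one-hot encoding over the declared values. A minimal sketch of that idea (not the AI engine's actual implementation):

```python
def one_hot(category_values, value):
    """Encode a categorical string value as a one-hot numeric vector
    over the values declared for that category in the manifest."""
    return [1.0 if v == value else 0.0 for v in category_values]

# The "event_type" category declared above has two values.
event_type_values = ["dinner", "party"]
print(one_hot(event_type_values, "party"))  # → [0.0, 1.0]
```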
A top piece of community feedback was the ability to visualize data. After first running Spice.ai, we’d often hear from developers, “how do I see the data?”. A preview of data visualizations is now included in the dashboard on the pod page.
Once the Spice.ai runtime has started, you can view the loaded pods on the dashboard and fetch them via the API at localhost:8000/api/v0.1/pods. To make it even easier, we’ve added the ability to list them via the CLI with the new `spice pods list` command, which shows the list of pods and their manifest paths.
A new Coinbase data connector is included in v0.3, enabling the streaming of live market ticker prices from Coinbase Pro. Enable it by specifying the `coinbase` data connector and providing a list of Coinbase Pro product ids, e.g. “BTC-USD”. A new sample demonstrating it is also available, with its associated Spicepod available from the spicerack.org registry. Get it with `spice add samples/trader`.
A new Tweet Recommendation Quickstart has been added. Given the past tweet activity and metrics of an account, this app can recommend when to tweet, comment, or retweet to maximize the account’s like count, interaction rate, and reach.
A new Trader Sample has been added in addition to the Trader Quickstart. The sample uses the new Coinbase data connector to stream live Coinbase Pro ticker data for learning.
- JSON is now supported for the `/observations` API. Previously, only CSV was supported.
- Fixed an issue where the `/observations` endpoint was not providing fully qualified field names.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
Announcing the release of Spice.ai v0.2.1-alpha! 🎉
This point release focuses on fixes and improvements to v0.2-alpha. Highlights include the ability to specify how missing data should be treated and a new production mode for `spiced`.
This release supports the ability to specify how the runtime should treat missing data. Previous releases filled missing data with the last value (or initial value) in the series. While this makes sense for some data, e.g., market prices of a stock or cryptocurrency, it does not make sense for discrete data, e.g., ratings. In v0.2.1, developers can now add the `fill` parameter on a dataspace field to specify the behavior. This release supports the fill types `previous` and `none`. The default is `previous`.
Example in a manifest:
```yaml
dataspaces:
  - from: twitter
    name: tweets
    fields:
      - name: likes
        fill: none # The new fill parameter
```
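The difference between the two fill types can be sketched in Python. This is illustrative only; the real logic lives inside the runtime:

```python
def fill_series(values, fill="previous"):
    """Fill missing (None) entries in a series: 'previous' carries the
    last seen value forward; 'none' leaves the gaps unfilled."""
    if fill == "none":
        return values
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        else:
            last = v
        filled.append(v)
    return filled

prices = [10.0, None, None, 12.5]
print(fill_series(prices))               # → [10.0, 10.0, 10.0, 12.5]
print(fill_series(prices, fill="none"))  # → [10.0, None, None, 12.5]
```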
`spiced` now defaults to a new production mode when run standalone (not via the CLI), with development mode now explicitly set with the `--development` flag. Production mode does not activate development-time features, such as the Spicepod file watcher. The CLI always runs `spiced` in development mode, as it is not expected to be used in production deployments.
- Added the `fill` parameter to dataspace fields to specify how missing values should be treated.
- Combined into a single `spiceai` release instead of separate `spice` and `spiced` releases.
- Added a production mode for `spiced`. Production mode does not activate the file watcher.
- Fixed an issue where `epoch_time` was not set, which would cause data not to be sent to the AI engine.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
We are excited to announce the release of Spice.ai v0.2-alpha! 🎉
This release is the first major version since the initial v0.1 announcement and includes significant improvements based upon community and customer feedback. If you are new to Spice.ai, check out the getting started guide and star spiceai/spiceai on GitHub.
In the first release, the runtime and AI engine could only ingest numerical data. In v0.2, tagged data is accepted and automatically encoded into fields available for learning. For example, it’s now possible to include a “liked” tag when using tweet data, automatically encoded to a 0/1 field for training. Both CSV and the new JSON observation formats support tags. The v0.3 release will add additional support for sets of categorical data.
Previously, the runtime would trigger each data connector to fetch on a 15-second interval. In v0.2, we upgraded the interface for data connectors to a push/streaming model, which enables continuous streaming of data into the environment and AI engine.
Spice.ai works together with your application code and works best when it’s provided continuous feedback. This feedback could come from the application itself, for example, ratings, likes, thumbs-up/down, or profit from trades, or from external expertise. The interpretations API was introduced in v0.1.1, and v0.2 adds AI engine support, providing a way to give meaning, or an interpretation, to ranges of time-series data, which are then available within reward functions. For example, a time range of stock prices could be a “good time to buy,” or perhaps Tuesday mornings are a “good time to tweet,” and an application or expert can teach the AI engine this through interpretations, providing a shortcut to its learning.
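Conceptually, an interpretation attaches a label to a time range, which a reward function can then consult. A hypothetical Python sketch of that idea; the record shape and function names here are invented for illustration and do not reflect the actual interpretations API:

```python
# Hypothetical interpretation records: a label attached to a time range.
interpretations = [
    {"start": 1632000000, "end": 1632003600, "name": "good_time_to_buy"},
]

def has_interpretation(interpretations, time, name):
    """True if any interpretation with the given name covers `time`,
    e.g. for use inside a reward function."""
    return any(i["name"] == name and i["start"] <= time <= i["end"]
               for i in interpretations)

print(has_interpretation(interpretations, 1632001800, "good_time_to_buy"))  # → True
```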
- New `/pods//dataspaces` API.
- New `/pods//diagnostics` API.

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
AI has recently seen some impressive advances, like with OpenAI Codex and DeepMind AlphaFold 2. And at the same time, for most developers, leveraging AI to create intelligent applications is still way too hard. The Data Science Hierarchy of Needs pyramid from 2017 still illustrates it well; there are too many unmet needs in applying ML in applications.
We faced the same AI development challenges many developers do. Even though we had years of engineering experience at Microsoft and GitHub, there was too much to learn and build, and we simply didn’t have the time, resources, or tools to learn and utilize AI effectively in the project. After experiencing this pain ourselves, we saw an opportunity to make it better for everyone.
Today, we are making Spice.ai, a new open source project that helps developers use deep learning to create intelligent applications, available on GitHub. We’re looking for feedback on the direction. It’s not finished; in fact, we only started this summer, and we invite you to try out the alpha.
Figure 1. Adding a Spice.ai pod, training and getting a recommendation in three commands
Like many developer stories, it all started with a side-project. We were interested in neurofeedback, a type of biofeedback therapy that reinforces healthy brain function but can cost up to $15,000. We wanted to make it accessible to more people, so we set out to build a system that leverages AI to deliver neurofeedback more cost-effectively. Using AI for the application was much more challenging than expected, and this sparked the inspiration for Spice.ai.
In the neurofeedback project, we worked with brain activity EEG data - time series data. We realized that time series data applies to many domains, from health and biometrics to finance, sales, logistics, security, IoT, and application monitoring. The amount of time series data in these fields is growing exponentially, and extracting insights from this data to make more intelligent software will determine the success of the next generation of applications.
We also realized that time series data is often sensitive, such as health, financial, and security data. Instead of sending all data to a 3rd-party AI service, we needed the choice to bring the AI runtime to wherever our data and compute lived, whether in the cloud, on-premises, or on edge devices.
Spice.ai is an open source, portable runtime for training and using deep learning on time series data. It’s written in Golang and Python and runs as a container or microservice with applications calling a simple HTTP API. It’s deployable to any public cloud, on-premises, and edge.
The vision for Spice.ai is to make creating intelligent applications as easy as possible for developers in their development environment of choice. Spice.ai brings AI development to their editor in any language or framework with a fast, iterative, inner development loop, continuous-integration (CI), and continuous-deployment (CD) workflows.
The Spice.ai runtime also includes a library of community-driven components for streaming and processing time series data, enabling developers to quickly and easily combine data with learning to create intelligent models.
Developers can write easy-to-understand and reusable “pods” with manifests that connect these data components with a simple definition of the learning environment. These pods also serve as a package for the resulting trained model.
Modern developers build together with the community by leveraging registries such as npm, NuGet, and pip. The registry for sharing and using pods is spicerack.org. As the community shares more and more pods, developers can quickly build upon each others’ work, initially by sharing manifests and eventually by reusing fully-trained models.
We are currently piloting Spice.ai with several companies to create the next generation of modern applications, such as optimizing in-store pickups for a large online retailer or scheduling optimizations for healthcare workers and resources. We’ve already seen some cool use cases, including suspicious login detection, intelligent cloud-spend analysis, and order routing for a food delivery app.
Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page.
This mission is a huge undertaking and Spice.ai v0.1-alpha has many gaps, including limited deep learning algorithms and training scale, streaming data, simulated environments, and offline learning modes. Pods aren’t searchable or even listed on spicerack.org yet. But if the vision resonates with you, join us! Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.
If you are interested in partnering, we’d love to talk. Try out Spice.ai, email us “hey,” get in touch on Discord, or reach out on Twitter.
We are just getting started! 🎉