Announcing the release of Spice.ai v0.6-alpha

By Spice AI (@spice_ai) | Tuesday, February 08, 2022

Announcing the release of Spice.ai v0.6-alpha! 🏹

Spice.ai now scales to datasets 10-100x larger, enabling new classes of use cases and applications! 🚀 We’ve completely rebuilt Spice.ai’s data processing and transport upon Apache Arrow, a high-performance platform that uses an in-memory columnar format. Spice.ai joins other major projects including Apache Spark, pandas, and InfluxDB in being powered by Apache Arrow. This also paves the way for high-performance data connections to the Spice.ai runtime using Apache Arrow Flight and import/export of data using Apache Parquet. We’re incredibly excited about the potential this architecture has for building intelligent applications on top of a high-performance transport between application data sources and the Spice.ai AI engine.

Highlights in v0.6-alpha

Massive improvement in data loading performance and dataset scale

From data connectors, to REST API, to AI engine, we’ve now rebuilt Spice.ai’s data processing and transport on the Apache Arrow project, specifically using the Apache Arrow for Go implementation. Many thanks to Matt Topol for his contributions to the project and guidance on using it.

This release changes the transport between the Spice.ai runtime and the AI engine from text CSV sent over gRPC to Apache Arrow Records sent over IPC (Unix sockets).

This is a breaking change to the Data Processor interface, as it now uses arrow.Record instead of Observation.
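
To make the difference between the two transports concrete, here is a minimal sketch in plain Python (standard library only). It is not the Arrow API or Spice.ai's implementation; the `rows` data and the helper names `to_csv_bytes`, `to_columns`, and `from_column_bytes` are hypothetical. It contrasts a row-by-row text encoding with one typed, contiguous buffer per column, which is the general idea behind a columnar format such as Arrow.

```python
import csv
import io
from array import array

# Hypothetical observations: (timestamp, price) rows, as a row-oriented
# source might deliver them. The values are made up for illustration.
rows = [(1_644_278_400 + i, 100.0 + i * 0.25) for i in range(1000)]


def to_csv_bytes(rows):
    """Row-oriented text encoding: every value is rendered as text, row by row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time", "price"])
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_columns(rows):
    """Column-oriented encoding: one typed, contiguous buffer per field."""
    return {
        "time": array("q", (t for t, _ in rows)),   # 64-bit signed integers
        "price": array("d", (p for _, p in rows)),  # 64-bit floats
    }


def from_column_bytes(typecode, raw):
    """Rebuild a column from its raw bytes without parsing any text."""
    col = array(typecode)
    col.frombytes(raw)
    return col


columns = to_columns(rows)
price_bytes = columns["price"].tobytes()
restored = from_column_bytes("d", price_bytes)

print("CSV bytes:   ", len(to_csv_bytes(rows)))
print("Column bytes:", sum(len(c.tobytes()) for c in columns.values()))
print("Round trip OK:", restored == columns["price"])
```

A typed column buffer can be read back without parsing text, which is one reason a columnar binary transport tends to hold up better than text CSV as row counts grow.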

Benchmarking v0.6

Before v0.6, Spice.ai would not scale into the hundreds of thousands of rows.

Format	Rows	Data Size	Process Time	Load Time	Transport Time	Memory Usage
csv	2,000	163.15KiB	3.0005s	0.0000s	0.0100s	423.754MiB
csv	20,000	1.61MiB	2.9765s	0.0000s	0.0938s	479.644MiB
csv	200,000	16.31MiB	0.2778s	0.0000s	NA (error)	0.000MiB
csv	2,000,000	164.97MiB	0.2573s	0.0050s	NA (error)	0.000MiB
json	2,000	301.79KiB	3.0261s	0.0000s	0.0282s	422.135MiB
json	20,000	2.97MiB	2.9020s	0.0000s	0.2541s	459.138MiB
json	200,000	29.85MiB	0.2782s	0.0010s	NA (error)	0.000MiB
json	2,000,000	300.39MiB	0.3353s	0.0080s	NA (error)	0.000MiB

After building on Arrow, Spice.ai now easily scales beyond millions of rows.

Format	Rows	Data Size	Process Time	Load Time	Transport Time	Memory Usage
csv	2,000	163.14KiB	2.8281s	0.0000s	0.0194s	439.580MiB
csv	20,000	1.61MiB	2.7297s	0.0000s	0.0658s	461.836MiB
csv	200,000	16.30MiB	2.8072s	0.0020s	0.4830s	639.763MiB
csv	2,000,000	164.97MiB	2.8707s	0.0400s	4.2680s	1897.738MiB
json	2,000	301.80KiB	2.7275s	0.0000s	0.0367s	436.238MiB
json	20,000	2.97MiB	2.8284s	0.0000s	0.2334s	473.550MiB
json	200,000	29.85MiB	2.8862s	0.0100s	1.7725s	824.089MiB
json	2,000,000	300.39MiB	2.7437s	0.0920s	16.5743s	4044.118MiB
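
As a quick reading of the post-Arrow results, the transport times above can be turned into approximate throughput. The sketch below simply divides rows by transport time using the figures from the table; the helper name `rows_per_second` is introduced here for illustration.

```python
# Transport times in seconds, copied from the post-Arrow benchmark table.
after_arrow = {
    ("csv", 2_000): 0.0194,
    ("csv", 20_000): 0.0658,
    ("csv", 200_000): 0.4830,
    ("csv", 2_000_000): 4.2680,
    ("json", 2_000): 0.0367,
    ("json", 20_000): 0.2334,
    ("json", 200_000): 1.7725,
    ("json", 2_000_000): 16.5743,
}


def rows_per_second(rows, seconds):
    """Approximate transport throughput for one benchmark run."""
    return rows / seconds


for (fmt, rows), seconds in after_arrow.items():
    rate = rows_per_second(rows, seconds)
    print(f"{fmt:4} {rows:>9,} rows  {rate:>12,.0f} rows/s")
```

For the 2,000,000-row runs this works out to roughly 470,000 rows per second for CSV and roughly 120,000 rows per second for JSON.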

New in this release

Adds Apache Arrow data processing and transport.
Fixes TensorBoard logging and monitoring when using GitHub Codespaces and Docker.
Adds a Polling HTTP Data Connector.

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!

Discord: https://discord.gg/kZnTfneP5u
Reddit: https://www.reddit.com/r/spiceai
Twitter: @spice_ai
Email: [email protected]