Announcing the release of Spice.ai v0.6-alpha
Categories:
Announcing the release of Spice.ai v0.6-alpha! 🏹
Spice.ai now scales to datasets 10-100 larger enabling new classes of uses cases and applications! 🚀 We’ve completely rebuilt Spice.ai’s data processing and transport upon Apache Arrow, a high-performance platform that uses an in-memory columnar format. Spice.ai joins other major projects including Apache Spark, pandas, and InfluxDB in being powered by Apache Arrow. This also paves the way for high-performance data connections to the Spice.ai runtime using Apache Arrow Flight and import/export of data using Apache Parquet. We’re incredibly excited about the potential this architecture has for building intelligent applications on top of a high-performance transport between application data sources the Spice.ai AI engine.
Highlights in v0.6-alpha
Massive improvement in data loading performance and dataset scale
From data connectors, to REST API, to AI engine, we’ve now rebuilt Spice.ai’s data processing and transport on the Apache Arrow project. Specifically, using the Apache Arrow for Go implementation. Many thanks to Matt Topol for his contributions to the project and guidance on using it.
This release includes a change to the Spice.ai runtime to AI Engine transport from sending text CSV over gGPC to Apache Arrow Records over IPC (Unix sockets).
This is a breaking change to the Data Processor interface, as it now uses arrow.Record
instead of Observation
.
Benchmarking v0.6
Before v0.6, Spice.ai would not scale into the 100s of 1000s of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.15KiB | 3.0005s | 0.0000s | 0.0100s | 423.754MiB |
csv | 20,000 | 1.61MiB | 2.9765s | 0.0000s | 0.0938s | 479.644MiB |
csv | 200,000 | 16.31MiB | 0.2778s | 0.0000s | NA (error) | 0.000MiB |
csv | 2,000,000 | 164.97MiB | 0.2573s | 0.0050s | NA (error) | 0.000MiB |
json | 2,000 | 301.79KiB | 3.0261s | 0.0000s | 0.0282s | 422.135MiB |
json | 20,000 | 2.97MiB | 2.9020s | 0.0000s | 0.2541s | 459.138MiB |
json | 200,000 | 29.85MiB | 0.2782s | 0.0010s | NA (error) | 0.000MiB |
json | 2,000,000 | 300.39MiB | 0.3353s | 0.0080s | NA (error) | 0.000MiB |
After building on Arrow, Spice.ai now easily scales beyond millions of rows.
Format | Row Number | Data Size | Process Time | Load Time | Transport time | Memory Usage |
---|---|---|---|---|---|---|
csv | 2,000 | 163.14KiB | 2.8281s | 0.0000s | 0.0194s | 439.580MiB |
csv | 20,000 | 1.61MiB | 2.7297s | 0.0000s | 0.0658s | 461.836MiB |
csv | 200,000 | 16.30MiB | 2.8072s | 0.0020s | 0.4830s | 639.763MiB |
csv | 2,000,000 | 164.97MiB | 2.8707s | 0.0400s | 4.2680s | 1897.738MiB |
json | 2,000 | 301.80KiB | 2.7275s | 0.0000s | 0.0367s | 436.238MiB |
json | 20,000 | 2.97MiB | 2.8284s | 0.0000s | 0.2334s | 473.550MiB |
json | 200,000 | 29.85MiB | 2.8862s | 0.0100s | 1.7725s | 824.089MiB |
json | 2,000,000 | 300.39MiB | 2.7437s | 0.0920s | 16.5743s | 4044.118MiB |
New in this release
- Adds Apache Arrow data processing and transport.
- Fixes TensorBoard logging and monitoring when using GitHub Codespaces and Docker.
- Adds Polling HTTP Data Connector
Resources
Community
Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved. We will also be starting a community call series soon!
- Discord: https://discord.gg/kZnTfneP5u
- Reddit: https://www.reddit.com/r/spiceai
- Twitter: @spice_ai
- Email: [email protected]