DataStax, a generative AI data company, has announced plans to bolster enterprise retrieval-augmented generation (RAG) use cases by integrating the new NVIDIA NIM inference microservices and NeMo Retriever microservices with Astra DB. The integration is expected to deliver markedly faster RAG data solutions and, in turn, richer client experiences.
The standout feature of the integration is its ability to generate vector embeddings 20x faster than other well-known cloud embedding services. Users can also anticipate an 80% reduction in cost for these services, a potent combination of efficiency and affordability.
Developing generative AI applications often presents significant challenges for organisations as they grapple with the technological intricacies, security demands, and cost barriers associated with vectorising both existing and new unstructured data for effortless integration into large language models (LLMs). The added requirement of generating embeddings in near-real time and effectively indexing data within a vector database only heightens these hurdles.
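To make the embed-and-index workflow concrete, here is a minimal sketch of the loop these products accelerate. Everything in it is illustrative: `embed` is a toy bag-of-words vector over a fixed vocabulary standing in for a learned embedding model (such as one served by NeMo Retriever), and `VectorIndex` is a toy in-memory stand-in for a vector database such as Astra DB.

```python
import math
import re

# Toy fixed vocabulary; a real system would use a learned dense
# embedding model instead of bag-of-words counts.
VOCAB = ["refund", "shipping", "policy", "region", "days"]

def embed(text: str) -> list[float]:
    """Bag-of-words vector over VOCAB, L2-normalised."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorIndex:
    """In-memory stand-in for a vector database."""
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        # Vectorise the unstructured text, then index it.
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Embed the query and rank stored items by dot product
        # (cosine similarity, since vectors are normalised).
        q = embed(query)
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(q, it[1])))
        return [text for text, _ in scored[:k]]

index = VectorIndex()
index.add("refund policy: refunds are issued within 30 days")
index.add("shipping times vary by region")
print(index.search("how do I get a refund")[0])
# → refund policy: refunds are issued within 30 days
```

The retrieved passage would then be injected into an LLM prompt; the hurdles the article describes come from running the `embed` and `add` steps at enterprise scale and in near-real time.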
Through its collaboration with NVIDIA, DataStax aims to eliminate these issues. NVIDIA NeMo Retriever can generate over 800 embeddings per second per GPU, a throughput that pairs well with DataStax's Astra DB, which can process more than 4,000 transactions per second at low latency on affordable storage. The combined approach substantially lowers total cost of ownership while ensuring rapid embedding generation and indexing.
DataStax and NVIDIA's collaboration has already yielded impressive results: Astra DB vector performance for RAG use cases running on NVIDIA H100 Tensor Core GPUs has achieved a 20x improvement. Coupled with NVIDIA NeMo Retriever, Astra DB and DataStax Enterprise offer a fast vector database RAG solution built on a scalable NoSQL database that can run on any storage medium. Developers can easily replace their existing embedding model with NIM thanks to the integrated RAGStack, powered by LangChain and LlamaIndex.
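The reason swapping an embedding model is easy in a LangChain-style stack is that indexing code depends only on a small embeddings interface, not on any concrete model. The sketch below illustrates that pattern with two toy backends; `FakeNIMEmbeddings` is a hypothetical stand-in (a real NIM-backed client would call a hosted endpoint rather than compute locally), and the two-method interface mirrors the one LangChain expects from embedding backends.

```python
from typing import Protocol

class Embeddings(Protocol):
    """Mirrors the two methods LangChain expects of an embedding backend."""
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...
    def embed_query(self, text: str) -> list[float]: ...

class LengthEmbeddings:
    """Toy local model: 2-dim vector of character and word counts."""
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]
    def embed_query(self, text: str) -> list[float]:
        return [float(len(text)), float(len(text.split()))]

class FakeNIMEmbeddings:
    """Hypothetical stand-in for a NIM-backed client; a real one would
    POST to a NIM embedding endpoint instead of computing locally."""
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]
    def embed_query(self, text: str) -> list[float]:
        return [float(sum(map(ord, text)) % 97), 1.0]  # dummy values

def index_corpus(model: Embeddings, docs: list[str]) -> list[list[float]]:
    # The indexing path never names a concrete model, so swapping
    # backends is a one-line change at the call site.
    return model.embed_documents(docs)

docs = ["alpha", "beta"]
print(len(index_corpus(LengthEmbeddings(), docs)))   # → 2
print(len(index_corpus(FakeNIMEmbeddings(), docs)))  # → 2
```

Because both backends satisfy the same protocol, moving from a self-hosted model to NIM changes only which class is constructed, which is what makes the RAGStack swap a low-friction migration.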
Additionally, DataStax has launched a developer preview of a new feature named "Vectorize." Designed to perform embedding generation at the database tier, Vectorize lets users generate embeddings with DataStax's NeMo microservices instance instead of their own, passing the cost savings directly to customers.
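The shift Vectorize makes is where the embedding runs: instead of the client computing a vector and sending it with each write, the client ships raw text and the database tier embeds it. The sketch below contrasts the two insert paths; `VectorStore`, `server_embed`, and `toy_embed` are all illustrative names, with `server_embed` standing in for the NeMo microservices instance the database would invoke.

```python
from typing import Callable

class VectorStore:
    """Toy store contrasting client-side embedding with
    database-tier ("Vectorize"-style) embedding."""
    def __init__(self, server_embed: Callable[[str], list[float]]) -> None:
        self.server_embed = server_embed  # model the database tier calls
        self.rows: list[dict] = []

    def insert_with_vector(self, text: str, vector: list[float]) -> None:
        # Classic path: the client ran the embedding model itself.
        self.rows.append({"text": text, "vector": vector})

    def insert_vectorize(self, text: str) -> None:
        # Database-tier path: the client ships raw text and the
        # store generates the embedding on its side.
        self.rows.append({"text": text, "vector": self.server_embed(text)})

def toy_embed(text: str) -> list[float]:
    """Stand-in embedding model."""
    return [float(len(text))]

store = VectorStore(toy_embed)
store.insert_with_vector("hello", toy_embed("hello"))  # client-side embed
store.insert_vectorize("hello")                        # database-tier embed
print(store.rows[0]["vector"] == store.rows[1]["vector"])  # → True
```

Both paths land the same vector in the store; the difference is that the second requires no embedding infrastructure on the client, which is the source of the cost savings the article describes.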
Chet Kapoor, chairman and CEO of DataStax, noted, "With a wealth of unstructured data at their disposal, ranging from software logs to customer chat history, enterprises hold a cache of valuable domain knowledge and real-time insights essential for generative AI applications." Kapoor highlighted the benefits of integrating NVIDIA NIM into RAGStack to overcome challenges that enterprises confront, thereby helping them to "make significant strides in their genAI application development."
Similarly, Tisson Mathew, CEO and founder of Skypoint, underscored how vital reducing embedding-generation time is to meeting his organisation's stringent five-second SLA for generating responses for frontline healthcare providers.
Commenting on the DataStax and NVIDIA integration, Kari Briski, vice president of AI software at NVIDIA, said, "Using the integration of NVIDIA NIM and NeMo Retriever microservices with the DataStax Astra DB, businesses can significantly reduce latency and harness the full power of AI-driven data solutions."