You can use the same query tools to search vector indexes as you use for the rest of your data, giving you the option to search either by similarity or by exact match. This approach is similar to how large-scale search engines work, and it can help find and rank results in large semistructured data sets, for example, when searching for relevant reviews on an e-commerce site. Fabric requires a vector policy for each Cosmos DB container used for vector search; the policy defines the vectors' data type, their dimensionality, and the distance function used to find similar vectors. Vector index types such as DiskANN support vectors of up to 4,096 dimensions, and they need at least 1,000 vectors to be inserted before their quantization is accurate.
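As a minimal sketch, assuming the JSON policy format used by Cosmos DB for NoSQL, the following Python builds the two policies a vector-enabled container needs: a vector embedding policy (path, data type, dimensions, distance function) and an indexing policy that declares a DiskANN vector index. It also defines a query that combines an exact-match filter with similarity ranking through `VectorDistance`. The helper name `build_vector_policies`, the `/embedding` path, the `productId` field, and the 1,536-dimension size are hypothetical choices for illustration; the code only builds the definitions and does not connect to any service.

```python
# Values accepted in a Cosmos DB for NoSQL vector embedding policy
# (assumed here; check the current service documentation).
ALLOWED_DISTANCE = {"cosine", "dotproduct", "euclidean"}
ALLOWED_TYPES = {"float32", "int8", "uint8"}
MAX_DIMENSIONS = 4096


def build_vector_policies(path: str, dimensions: int,
                          distance: str = "cosine",
                          data_type: str = "float32",
                          index_type: str = "diskANN") -> tuple[dict, dict]:
    """Hypothetical helper: return (vector_embedding_policy, indexing_policy)."""
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        raise ValueError(f"dimensions must be between 1 and {MAX_DIMENSIONS}")
    if distance not in ALLOWED_DISTANCE:
        raise ValueError(f"unsupported distance function: {distance}")
    if data_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported data type: {data_type}")

    embedding_policy = {
        "vectorEmbeddings": [{
            "path": path,
            "dataType": data_type,
            "distanceFunction": distance,
            "dimensions": dimensions,
        }]
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular range index.
        "excludedPaths": [{"path": f"{path}/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return embedding_policy, indexing_policy


# Hybrid query: the WHERE clause is an exact match, the ORDER BY ranks by
# vector similarity (most similar first).
HYBRID_QUERY = """
SELECT TOP 5 c.id, c.text, VectorDistance(c.embedding, @queryVector) AS score
FROM c
WHERE c.productId = @productId
ORDER BY VectorDistance(c.embedding, @queryVector)
"""

embedding_policy, indexing_policy = build_vector_policies("/embedding", 1536)
```

With the `azure-cosmos` Python SDK, these dictionaries would typically be passed when the container is created (recent SDK versions accept `indexing_policy` and `vector_embedding_policy` arguments on `create_container`); verify the parameter names against the SDK version you use.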
Querying Cosmos DB in Fabric
When you query data stored in Cosmos DB through Fabric's OneLake, you're working with a mirrored copy of your Cosmos DB data. As you store data, it's copied across in the Delta Parquet format used by Fabric, allowing you to use any of the supported query tools, including Power BI Desktop for ad hoc analysis. Queries here can span all your operational data, not just Cosmos DB, treating it as a unified whole, while applications that need that data can still take advantage of Cosmos DB's feature set.
This also lets you use other Fabric features with your Cosmos DB data, for example, to quickly add embeddings and a vector index so the data can serve as part of the grounding data for an AI application based on retrieval-augmented generation (RAG).