Some more resources:
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge, https://arxiv.org/abs/2310.11703
- Survey of Vector Database Management Systems, https://arxiv.org/abs/2310.14021
- What are Embeddings, https://raw.githubusercontent.com/veekaybee/what_are_embeddi...
---
h/t: https://twitter.com/eatonphil/status/1745524630624862314 and https://twitter.com/ChristophMolnar/status/17457316026829826...
Here's my attempt at this: Embeddings: What they are and why they matter https://simonwillison.net/2023/Oct/23/embeddings/
Available as an annotated presentation + an optional 38-minute video.
Thanks for writing this one, Simon. I read it some time ago and just wanted to recommend it to folks browsing the comments; it's really good!
Nicely written and very approachable. You might add a paragraph on why to use cosine similarity, since it gives a chance to illustrate how the n-dimensional embedding vector is actually used.
Very helpful to make it clear, in concrete terms
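On the cosine similarity point, here's a rough sketch of what that could look like in plain Python. The vectors are made-up toy values, not real embeddings, and the function name is just my own; the idea is only that cosine similarity compares the direction of two vectors and ignores their length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]  # query scaled by 2, so same direction
orthogonal = [0.0, 0.0, 3.0]      # shares no non-zero dimension with query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

The scaled copy scores (approximately) 1 and the orthogonal vector scores 0, which is the usual argument for it: only direction matters, so a vector's magnitude doesn't affect the ranking.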
Throwing a few more on here (mix of beginner and advanced):
- Wikipedia article: https://en.wikipedia.org/wiki/Vector_database
- Vector Database 101: https://zilliz.com/learn/introduction-to-unstructured-data
- ANN & Similarity search: https://vinija.ai/concepts/ann-similarity-search/
- Distributed database: https://15445.courses.cs.cmu.edu/fall2021/notes/21-distribut...
Throwing one more - https://www.pinecone.io/learn/vector-database/