[Langchain] Qdrant Vector DB
Reference: "Vector Database 구축 실습" (hands-on guide to building a vector database), Sionic AI blog
https://blog.sionic.ai/vector-database-practice
Following the Sionic AI blog post, I pulled the Qdrant image with Docker and ran it locally.
The example code below adds documents to Qdrant and then runs a vector search against them.
# Loading, splitting, and embedding (with an on-disk embedding cache)
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings
from langchain.storage import LocalFileStore
# Qdrant client and the LangChain vector store wrapper
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain_qdrant import QdrantVectorStore
from uuid import uuid4  # only needed for the commented-out UUID ids below
cache_dir = LocalFileStore("./.cache/")
loader = UnstructuredFileLoader(r"C:\Users\user\Desktop\LHS\Project\document\모욕.txt")
# splitter = RecursiveCharacterTextSplitter()
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,
    chunk_overlap=100,
)
docs = loader.load_and_split(text_splitter=splitter)
embeddings = OpenAIEmbeddings()
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)
client = QdrantClient("http://localhost:6333")
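# Note: recreate_collection drops any existing "test1" collection before creating it.
# size must match the embedding dimension (1536 for the default OpenAIEmbeddings model).
client.recreate_collection(
    collection_name="test1",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)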
vector_store = QdrantVectorStore(
    client=client,
    collection_name="test1",
    embedding=cached_embeddings,
)
# Qdrant point ids must be unsigned integers or UUIDs.
# uuids = [str(uuid4()) for _ in range(len(docs))]
ids = list(range(1, len(docs) + 1))
vector_store.add_documents(documents=docs, ids=ids)
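# Query (Korean): "A coworker says they will sue me over a rude remark. Could it count as criminal insult?"
query = "직장 동료가 무례한 표현을 했다고 고소하겠다고 하는데, 모욕죄가 성립할 수 있어?"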
results = vector_store.similarity_search_with_score(query=query, k=3)
for res, score in results:
    print(score, res.page_content)
The results it returns are fairly accurate.
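To make the printed scores easier to read: the collection above is created with Distance.COSINE, so each score should be a cosine similarity between the query embedding and a chunk embedding, with higher meaning closer. The following is a minimal pure-Python sketch of that ranking on toy 3-dimensional vectors. The helper names `cosine_similarity` and `top_k` are made up for illustration, and this is not how Qdrant implements search internally (it uses an approximate index rather than a full scan).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (score, index) pairs for the k most similar vectors, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for real 1536-dimensional embeddings (hypothetical data).
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
toy_query = [1.0, 0.0, 0.0]

for score, idx in top_k(toy_query, toy_docs, k=2):
    print(round(score, 3), idx)
```

Here the first toy vector points in the same direction as the query and ranks first, and the third vector, which only partly overlaps with it, ranks second.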
Next, I will use LangChain to connect the vector search results, a prompt, and an LLM.
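As a rough preview of that step, the retrieved chunks can be stuffed into a prompt as context before the question is sent to an LLM. The sketch below uses plain Python only. The function `build_prompt`, its `max_chunks` parameter, and the template wording are all assumptions for illustration, not the implementation from the follow-up post and not a LangChain API.

```python
def build_prompt(question, chunks, max_chunks=3):
    """Join retrieved chunks into a numbered context block and append the question."""
    # Keep at most max_chunks chunks and drop any that are empty after stripping.
    selected = [c.strip() for c in chunks[:max_chunks] if c.strip()]
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical stand-ins for res.page_content from the search results.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
print(build_prompt("Could this count as criminal insult?", retrieved))
```

In the script above, `chunks` could be built as `[res.page_content for res, _ in results]`, and the returned string would then be passed to whichever chat model is used.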