[Langchain] Qdrant Vector DB

mugan1 2024. 11. 12. 22:13

https://blog.sionic.ai/vector-database-practice

 


Following the Sionic AI blog post linked above, I pulled the qdrant image with docker, added documents to qdrant, and ran a vector search.
Example code:

# LangChain pieces: loader, splitter, embeddings, and the on-disk embedding cache
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# Qdrant client and its LangChain wrapper
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

from uuid import uuid4  # only needed for the commented-out UUID variant of the ids

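# Local directory where computed embeddings are cached
cache_dir = LocalFileStore("./.cache/")

# Source document to index (모욕 = "insult")
loader = UnstructuredFileLoader(r"C:\Users\user\Desktop\LHS\Project\document\모욕.txt")

# Alternative: splitter with default settings
# splitter = RecursiveCharacterTextSplitter()
# Split into chunks of up to 500 tokens, with 100 tokens of overlap between neighbors
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,
    chunk_overlap=100,
)

docs = loader.load_and_split(text_splitter=splitter)
embeddings = OpenAIEmbeddings()
# Wrap the embedder so vectors for already-seen texts are read from the cache
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)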

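# Connect to the Qdrant instance running in docker
client = QdrantClient("http://localhost:6333")

# Drop and recreate the collection. The vector size must match the embedding
# dimension (1536 for text-embedding-ada-002, the OpenAIEmbeddings default).
# Note: newer qdrant-client releases deprecate recreate_collection in favor of
# collection_exists + create_collection.
client.recreate_collection(
    collection_name="test1",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# LangChain wrapper that embeds texts and reads/writes the "test1" collection
vector_store = QdrantVectorStore(
    client=client,
    collection_name="test1",
    embedding=cached_embeddings,
)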

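# Qdrant point IDs must be unsigned integers or UUIDs; sequential integers are used here
# uuids = [str(uuid4()) for i in range(len(docs))]
ids = [i + 1 for i in range(len(docs))]
vector_store.add_documents(documents=docs, ids=ids)

# The query stays in Korean because the indexed document is Korean. In English:
# "A coworker is threatening to press charges over a rude remark. Can that constitute the crime of insult?"
query = "직장동료가 무례한 표현을 했다고 고소하겠다고 하는데, 모욕죄가 성립할 수 있어?"
results = vector_store.similarity_search_with_score(query=query, k=3)
# Print the similarity score followed by the text of each of the top 3 chunks
for res, score in results:
    print(score, res.page_content)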
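
The splitter settings in the code above (chunk_size=500, chunk_overlap=100) mean that neighboring chunks share roughly 100 tokens, so a sentence cut at a chunk boundary still appears whole in one of them. Below is a minimal sketch of that sliding-window idea. It is an assumption-laden simplification: `chunk_with_overlap` is a hypothetical helper, and the fake token list stands in for tiktoken tokens. The real RecursiveCharacterTextSplitter works differently (it splits recursively on separators and only measures length in tokens).

```python
def chunk_with_overlap(tokens, chunk_size=500, chunk_overlap=100):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical input: 1200 fake "tokens"
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_with_overlap(tokens)
print([len(c) for c in chunks])
```

With a larger overlap the index holds more redundant text, but a match near a boundary is less likely to lose its surrounding context.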

 

The search returns fairly accurate results.
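
For reading the printed scores: the collection is created with Distance.COSINE, so each score is a cosine similarity and a higher value means a closer match. Here is a rough sketch of what that ranking computes, using toy 3-dimensional vectors instead of real 1536-dimensional embeddings. `cosine_similarity`, `top_k`, and the sample vectors are all hypothetical, and Qdrant normally relies on an index (HNSW) rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, vectors, k=3):
    """Return the k (score, id) pairs with the highest cosine similarity."""
    scored = [(cosine_similarity(query_vec, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy "embeddings" keyed by point id
vectors = {
    1: [1.0, 0.0, 0.0],
    2: [0.8, 0.6, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]
for score, doc_id in top_k(query_vec, vectors, k=3):
    print(round(score, 3), doc_id)
```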

 

Next, I will use langchain to connect the vector search results, a prompt, and an LLM.
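
As a preview of that step, the simplest way to connect search results to an LLM is to paste the retrieved chunks into the prompt as context. The sketch below only builds the prompt string, so it runs without an API key. `build_prompt`, the template wording, and the placeholder chunks are assumptions for illustration, not the implementation the follow-up post uses. In a real chain the resulting string would be sent to a chat model.

```python
def build_prompt(question, chunks):
    """Join retrieved chunks into a numbered context block and append the question.

    Hypothetical helper; the template wording is an assumption.
    """
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical stand-ins for res.page_content from the search results
retrieved = [
    "Placeholder text of the first retrieved chunk.",
    "Placeholder text of the second retrieved chunk.",
]
prompt = build_prompt("Can a rude remark constitute the crime of insult?", retrieved)
print(prompt)
```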
