<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>무제</title>
    <link>https://mugan1.tistory.com/</link>
    <description></description>
    <language>ko</language>
    <pubDate>Sat, 9 May 2026 06:17:13 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>mugan1</managingEditor>
    <item>
      <title>[Langchain] Processing a Large Precedent Dataset with Dask</title>
      <link>https://mugan1.tistory.com/81</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;I have more than 50,000 court precedent JSON files and wanted to merge them into a single CSV for easier management.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Eventually I will need a database, but for now I will merge the precedent data into a single dataframe and manage it locally.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To do this I used dask, which can read the JSON files and convert them into a dataframe quickly.&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1733750720327&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import dask.bag as db
import json
import glob

directory_path = &quot;./1.판례&quot;

json_files = glob.glob(f&quot;{directory_path}/*.json&quot;)

bag = db.from_sequence(json_files)

def load_json(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

def handle_missing_values(record):
    # Missing ID/code fields (판례일련번호: precedent serial number, 사건종류코드: case type code)
    # become 0; every other missing field becomes an empty string.
    for key, value in record.items():
        if key == '판례일련번호' or key == '사건종류코드':
            if value is None or value == '':
                record[key] = 0
        elif value is None or value == '':
            record[key] = &quot;&quot;
    return record

data_bag = bag.map(load_json).map(handle_missing_values)

df = data_bag.to_dataframe()

print(df.compute())&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With just this short script, a conversion that used to take about 10 minutes finished in 47 seconds.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However, manipulating the data freely in a dask dataframe incurs a lot of latency.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;So for now I will save the processed data to CSV,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and reload it into a pandas dataframe for the preprocessing work.&amp;nbsp;&lt;/p&gt;
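&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of that save-then-reload round trip. It uses only the standard library csv module as a stand-in so that it runs without dask or pandas installed; with the real dataframe, the dask dataframe's to_csv and pandas.read_csv would play the same roles. The sample records, the title field, the helper names and the file name are all assumptions made for illustration.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import csv
import os
import tempfile

# Hypothetical records, as they would look after handle_missing_values
records = [
    {'판례일련번호': 1001, '사건종류코드': 400101, 'title': 'sample case A'},
    {'판례일련번호': 1002, '사건종류코드': 0, 'title': ''},
]

def save_records(rows, path):
    # Write every record into one CSV file with a shared header
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def load_records(path):
    # Reload the CSV; every value comes back as a string
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

path = os.path.join(tempfile.mkdtemp(), 'precedents.csv')
save_records(records, path)
loaded = load_records(path)
print(len(loaded), loaded[0]['판례일련번호'])&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because CSV stores everything as text, the csv module returns the numeric columns as strings; pandas.read_csv infers numeric dtypes on load, which is one more reason to do the preprocessing in pandas.&lt;/p&gt;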
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1419&quot; data-origin-height=&quot;604&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d6sabh/btsLcZah7r7/4SJFF6eJSbfaVi39GHMn1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d6sabh/btsLcZah7r7/4SJFF6eJSbfaVi39GHMn1K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d6sabh/btsLcZah7r7/4SJFF6eJSbfaVi39GHMn1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd6sabh%2FbtsLcZah7r7%2F4SJFF6eJSbfaVi39GHMn1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1419&quot; height=&quot;604&quot; data-origin-width=&quot;1419&quot; data-origin-height=&quot;604&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The output also looks cleaner in pandas, which is nice!&lt;/p&gt;</description>
      <category>Project/LLM</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/81</guid>
      <comments>https://mugan1.tistory.com/81#entry81comment</comments>
      <pubDate>Mon, 9 Dec 2024 22:30:02 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] No. 2004: Number of Trailing Zeros in a Combination</title>
      <link>https://mugan1.tistory.com/80</link>
      <description>&lt;pre id=&quot;code_1733728673859&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

n, m = map(int, input().split())

def two_count(n):
    two = 0
    while n != 0:
        n = n // 2
        two += n
    return two

def five_count(n):
    five = 0
    while n != 0:
        n = n // 5
        five += n
    return five

print(min(two_count(n) - two_count(n - m) - two_count(m), five_count(n) - five_count(n - m) - five_count(m)))&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
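&lt;p data-ke-size=&quot;size16&quot;&gt;I could not solve this one either. I only counted the exponent of 5, but because the combination formula divides by m! and (n-m)!, the factors of 2 can cancel out as well, so counting only the 5s can give a wrong answer (e.g. 5 x 3 = 15 contains a 5 but no 2, so it has no trailing zero).&lt;/p&gt;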
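&lt;p data-ke-size=&quot;size16&quot;&gt;To make the point concrete, here is a small sketch that counts the exponent of a prime p in n! (the same idea as two_count and five_count above, generalised to any prime) and cross-checks the result against math.comb for small inputs. The helper names are my own, chosen for illustration.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import math

def prime_exponent_in_factorial(n, p):
    # Exponent of the prime p in n!: n//p + n//p^2 + ...
    count = 0
    while n:
        n //= p
        count += n
    return count

def trailing_zeros_of_comb(n, m):
    # Exponent of p in C(n, m) = exponent in n! minus the exponents in m! and (n - m)!
    def exponent(p):
        return (prime_exponent_in_factorial(n, p)
                - prime_exponent_in_factorial(m, p)
                - prime_exponent_in_factorial(n - m, p))
    return min(exponent(2), exponent(5))

def trailing_zeros_brute(n, m):
    # Count the zeros on the exact value; only practical for small n
    digits = str(math.comb(n, m))
    return len(digits) - len(digits.rstrip('0'))

print(trailing_zeros_of_comb(5, 1), trailing_zeros_brute(5, 1))  # 0 0
print(trailing_zeros_of_comb(25, 12), trailing_zeros_brute(25, 12))  # 2 2&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;C(5, 1) = 5 is the smallest case where counting only 5s fails: the exponent of 5 is 1, but the exponent of 2 is 0, so there is no trailing zero.&lt;/p&gt;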
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A problem to come back to and review.&lt;/p&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/80</guid>
      <comments>https://mugan1.tistory.com/80#entry80comment</comments>
      <pubDate>Mon, 9 Dec 2024 16:21:55 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] No. 15990: 1, 2, 3 Addition 5</title>
      <link>https://mugan1.tistory.com/79</link>
      <description>&lt;pre id=&quot;code_1733725412653&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

t = int(input())

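# dp[i][k]: number of ways to write i as a sum of 1, 2, 3 whose last term is k + 1,
# with no two consecutive terms equal
dp = [[0 for _ in range(3)] for _ in range(100001)]

dp[1] = [1, 0, 0]
dp[2] = [0, 1, 0]
dp[3] = [1, 1, 1]

for i in range(4, 100001):
    # A sum ending in 1 extends a sum of i - 1 that ends in 2 or 3, and likewise for 2 and 3
    dp[i][0] = (dp[i - 1][1] + dp[i - 1][2]) % 1000000009
    dp[i][1] = (dp[i - 2][0] + dp[i - 2][2]) % 1000000009
    dp[i][2] = (dp[i - 3][0] + dp[i - 3][1]) % 1000000009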

for _ in range(t):
    n = int(input())
    print(sum(dp[n]) % 1000000009)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A hard problem. How is anyone supposed to solve this? I clearly need a lot more practice.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;While 1, 2, 3 Addition was a simple recurrence problem, this one is more involved and needs a two-dimensional array.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The key point:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;When i is 6,&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5 + 1 = 4 + 2 = 3 + 3 = 6, so&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;append 1 to the sums of 5 that end in 2 or 3,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;append 2 to the sums of 4 that end in 1 or 3,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and append 3 to the sums of 3 that end in 1 or 2, and each of these makes 6.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The recurrence is built from this rule.&lt;/p&gt;
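&lt;p data-ke-size=&quot;size16&quot;&gt;A brute-force cross-check of the rule, as a sketch with function names of my own: enumerate every ordered sum of 1, 2, 3 in which no two neighbouring terms are equal, group the sums by their last term, and compare with the table built from the recurrence (without the modulus, since the numbers stay small here).&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def count_by_last_term(n):
    # counts[k] = number of valid sums of n whose last term is k + 1
    counts = [0, 0, 0]

    def extend(remaining, last):
        if remaining == 0:
            counts[last - 1] += 1
            return
        for term in range(1, min(3, remaining) + 1):
            if term != last:  # equal neighbours are not allowed
                extend(remaining - term, term)

    extend(n, 0)
    return counts

def dp_table(limit):
    # Same recurrence as the solution above, without the modulus
    dp = [[0, 0, 0] for _ in range(max(limit, 3) + 1)]
    dp[1], dp[2], dp[3] = [1, 0, 0], [0, 1, 0], [1, 1, 1]
    for i in range(4, limit + 1):
        dp[i][0] = dp[i - 1][1] + dp[i - 1][2]
        dp[i][1] = dp[i - 2][0] + dp[i - 2][2]
        dp[i][2] = dp[i - 3][0] + dp[i - 3][1]
    return dp

print(count_by_last_term(6), dp_table(6)[6])&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The enumeration is exponential, so it is only useful for checking small values of n, but both lists agree there.&lt;/p&gt;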
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I doubt I could solve it even if I tried again.&lt;/p&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/79</guid>
      <comments>https://mugan1.tistory.com/79#entry79comment</comments>
      <pubDate>Mon, 9 Dec 2024 15:26:41 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] No. 9095: 1, 2, 3 Addition</title>
      <link>https://mugan1.tistory.com/78</link>
      <description>&lt;pre id=&quot;code_1733403721612&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

t = int(input())

# n is below 11, so fill the table once before reading the test cases
array = [0] * 11
array[1], array[2], array[3] = 1, 2, 4
for i in range(4, 11):
    array[i] = array[i-1] + array[i-2] + array[i-3]

for _ in range(t):
    n = int(input())
    print(array[n])&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A basic DP problem.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It can be solved once you spot the pattern in how the number of ways grows each time n increases by 1.&lt;/p&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/78</guid>
      <comments>https://mugan1.tistory.com/78#entry78comment</comments>
      <pubDate>Thu, 5 Dec 2024 22:02:43 +0900</pubDate>
    </item>
    <item>
      <title>Collection of LLM Resources</title>
      <link>https://mugan1.tistory.com/77</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;1. LLM curriculum&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://fastcampus.co.kr/data_online_rag&quot;&gt;https://fastcampus.co.kr/data_online_rag&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1732950559107&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;RAG를 활용한 완성도 높은 LLM 서비스 구축 With langchain &amp;amp; llamaindex | 패스트캠퍼스&quot; data-og-description=&quot;할루시네이션부터 데이터 유출 고민까지 한 번에 해결하세요!&quot; data-og-host=&quot;fastcampus.co.kr&quot; data-og-source-url=&quot;https://fastcampus.co.kr/data_online_rag&quot; data-og-url=&quot;https://fastcampus.co.kr/data_online_rag&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/D9MOf/hyXGMoyCmx/grltMKmQ3rUqh1UhMkPFMK/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/cBNCf7/hyXGGPpEal/4Qn52WfKnApH6KXBpkhIbK/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630&quot;&gt;&lt;a href=&quot;https://fastcampus.co.kr/data_online_rag&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://fastcampus.co.kr/data_online_rag&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/D9MOf/hyXGMoyCmx/grltMKmQ3rUqh1UhMkPFMK/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/cBNCf7/hyXGGPpEal/4Qn52WfKnApH6KXBpkhIbK/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;RAG를 활용한 완성도 높은 LLM 서비스 구축 With langchain &amp;amp; llamaindex | 패스트캠퍼스&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;할루시네이션부터 데이터 유출 고민까지 한 번에 해결하세요!&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;fastcampus.co.kr&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This looks like a good curriculum.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. LLM fine-tuning&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://velog.io/@gaetokk/LLM-%ED%9B%88%EB%A0%A8-%EC%9A%A9%EC%96%B4%EB%93%A4-%EC%A0%95%EB%A6%AC&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://velog.io/@gaetokk/LLM-%ED%9B%88%EB%A0%A8-%EC%9A%A9%EC%96%B4%EB%93%A4-%EC%A0%95%EB%A6%AC&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. Project references&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://github.com/hunsii/LawBot?tab=readme-ov-file&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://github.com/hunsii/LawBot?tab=readme-ov-file&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1732951826374&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;object&quot; data-og-title=&quot;GitHub - hunsii/LawBot: LLM을 활용한 대화형 유사 판례 검색 시스템입니다.&quot; data-og-description=&quot;LLM을 활용한 대화형 유사 판례 검색 시스템입니다. Contribute to hunsii/LawBot development by creating an account on GitHub.&quot; data-og-host=&quot;github.com&quot; data-og-source-url=&quot;https://github.com/hunsii/LawBot?tab=readme-ov-file&quot; data-og-url=&quot;https://github.com/hunsii/LawBot&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/cDPaUx/hyXGAuVkxt/vAEvJJjXSzBfI4OMQAjZM0/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600,https://scrap.kakaocdn.net/dn/b6QIGe/hyXDhqdVdX/h1TqbWZGqAzjB9r2TryekK/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600&quot;&gt;&lt;a href=&quot;https://github.com/hunsii/LawBot?tab=readme-ov-file&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://github.com/hunsii/LawBot?tab=readme-ov-file&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/cDPaUx/hyXGAuVkxt/vAEvJJjXSzBfI4OMQAjZM0/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600,https://scrap.kakaocdn.net/dn/b6QIGe/hyXDhqdVdX/h1TqbWZGqAzjB9r2TryekK/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;GitHub - hunsii/LawBot: LLM을 활용한 대화형 유사 판례 검색 시스템입니다.&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;LLM을 활용한 대화형 유사 판례 검색 시스템입니다. Contribute to hunsii/LawBot development by creating an account on GitHub.&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;github.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1733795104228&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;object&quot; data-og-title=&quot;GitHub - boostcampaitech5/level3_nlp_finalproject-nlp-08: 사용자가 채팅웹을 통해 자신이 처한 법률적 상황을 &quot; data-og-description=&quot;사용자가 채팅웹을 통해 자신이 처한 법률적 상황을 제시하면, 입력에 대한 문맥을 모델이 이해하여 가이드라인을 제시하고, 유사한 상황의 판례를 제공하는 웹 서비스입니다. (2023.08.18 서비스 &quot; data-og-host=&quot;github.com&quot; data-og-source-url=&quot;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&quot; data-og-url=&quot;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/caAwTt/hyXKnvNRGY/PsW8eQlaD3WoQttEsRu6V0/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600,https://scrap.kakaocdn.net/dn/pWbJQ/hyXKlksUbo/hFnuDx6NDZg0ZVkjfUrCrk/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600&quot;&gt;&lt;a href=&quot;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://github.com/boostcampaitech5/level3_nlp_finalproject-nlp-08&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/caAwTt/hyXKnvNRGY/PsW8eQlaD3WoQttEsRu6V0/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600,https://scrap.kakaocdn.net/dn/pWbJQ/hyXKlksUbo/hFnuDx6NDZg0ZVkjfUrCrk/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;GitHub - boostcampaitech5/level3_nlp_finalproject-nlp-08: 사용자가 채팅웹을 통해 자신이 처한 법률적 상황을&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;사용자가 채팅웹을 통해 자신이 처한 법률적 상황을 제시하면, 입력에 대한 문맥을 모델이 이해하여 가이드라인을 제시하고, 유사한 상황의 판례를 제공하는 웹 서비스입니다. (2023.08.18 서비스&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;github.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Project/LLM</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/77</guid>
      <comments>https://mugan1.tistory.com/77#entry77comment</comments>
      <pubDate>Sat, 30 Nov 2024 16:23:34 +0900</pubDate>
    </item>
    <item>
      <title>[Langchain] Evaluating an LLM with LLMs</title>
      <link>https://mugan1.tistory.com/76</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;I used the automated evaluation code provided by Allganize. It uses three evaluator models and is designed to output &quot;O&quot; when my model's answer agrees closely with the ground truth (GT) and &quot;X&quot; otherwise.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For the model I used the cost-efficient gpt-4o-mini through ChatOpenAI, and qdrant as the vector DB.&lt;/p&gt;
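&lt;pre id=&quot;code_1732866671352&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# Imports are not shown in the original post; the paths below are assumed
# for recent langchain / qdrant-client releases and may differ by version.
import asyncio
from glob import glob

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# model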
llm = ChatOpenAI(
    temperature=0.1,
    model = &quot;gpt-4o-mini&quot;
)

cache_dir = LocalFileStore(&quot;./.cache/&quot;)

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator=&quot;\n&quot;,
    chunk_size=600,
    chunk_overlap=100,
)
embeddings = OpenAIEmbeddings()
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)
client = QdrantClient(&quot;http://localhost:6333&quot;)
client.recreate_collection(
     collection_name=&quot;law&quot;,
     vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
vectorstore = QdrantVectorStore(
    client=client,
    collection_name=&quot;law&quot;,
    embedding=cached_embeddings,
)

pdf_files = glob(r&quot;C:\Users\user\Desktop\LHS\Project\Evaluate\*.pdf&quot;)
index_counter = {&quot;current_index&quot;: 0}
async def process_pdf(file):
    print(file)
    loader = UnstructuredFileLoader(file)
    docs = loader.load_and_split(text_splitter=splitter)
    ids = []
    for _ in range(len(docs)):
        ids.append(index_counter[&quot;current_index&quot;])
        index_counter[&quot;current_index&quot;] += 1
    vectorstore.add_documents(documents=docs, ids=ids)

await asyncio.gather(*(process_pdf(file) for file in pdf_files))

retriever = vectorstore.as_retriever()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Long answers seemed to make both generation and evaluation too slow, so I condensed the text with a map-reduce approach, which produces comparatively short answers.&lt;/p&gt;
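&lt;pre id=&quot;code_1732866729909&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# Assumed import paths (not shown in the original post)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

map_doc_prompt = ChatPromptTemplate.from_messages(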
    [
        (
            &quot;system&quot;,
            &quot;&quot;&quot;
            Use the following portion of a long document to see if any of the text is relevant to answer the question. Return any relevant text verbatim. If there is no relevant text, return : ''
            -------
            {context}
            &quot;&quot;&quot;,
        ),
        (&quot;human&quot;, &quot;{question}&quot;),
    ]
)

map_doc_chain = map_doc_prompt | llm

def map_docs(inputs):
    documents = inputs[&quot;documents&quot;]
    question = inputs[&quot;question&quot;]
    return &quot;\n\n&quot;.join(
        map_doc_chain.invoke(
            {&quot;context&quot;: doc.page_content, &quot;question&quot;: question}
        ).content
        for doc in documents
    )
    # for doc in documents:
    #     result = map_doc_chain.invoke({&quot;context&quot;: doc.page_content, &quot;question&quot;: question}).content
    #     print(result)
    # return 

map_chain = {
    &quot;documents&quot;: retriever,
    &quot;question&quot;: RunnablePassthrough(),
} | RunnableLambda(map_docs)

final_prompt = ChatPromptTemplate.from_messages(
    [
        (
            &quot;system&quot;,
            &quot;&quot;&quot;
            You are a competent lawyer. Answer questions in Korean using only the following context.
            Given the following extracted parts of a long document and a question, create a final answer. 
            If you don't know the answer, just say that you don't know. Don't try to make up an answer.
            ------
            {context}
            &quot;&quot;&quot;,
        ),
        (&quot;human&quot;, &quot;{question}&quot;),
    ]
)

chain = {&quot;context&quot;: map_chain, &quot;question&quot;: RunnablePassthrough()} | final_prompt | llm&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Download the dataset from Hugging Face and filter it with pandas.&lt;/p&gt;
&lt;pre id=&quot;code_1732866862119&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import pandas as pd

from datasets import load_dataset
dataset = load_dataset(&quot;allganize/RAG-Evaluation-Dataset-KO&quot;)
df = pd.DataFrame(dataset['test'])

# .copy() so the new column below is added to an independent frame, not a slice of df
law_df = df[df['domain'] == 'law'].copy()
responses = []

for question in law_df[&quot;question&quot;]:
    responses.append(chain.invoke(question).content)

law_df[&quot;map_reduce_answer&quot;] = responses&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;And the result of evaluating with the automated Colab code provided by Allganize...?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;['X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'X',&amp;nbsp;'O',&amp;nbsp;'O',&amp;nbsp;'X']&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;35 correct out of 60!&amp;nbsp;&lt;/p&gt;
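&lt;p data-ke-size=&quot;size16&quot;&gt;The tally can be reproduced in a few lines. The string below is the O/X list above written compactly, ten verdicts per group; the variable names are only for illustration.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# The 60 verdicts from the list above, ten per group
verdicts = ''.join('''
XXXOXOOOXX OOOOOXOXOO OOXXOOXXOO
XXOOOOOXOO OOOOXOOOXX XXXXXOXOOX
'''.split())

correct = verdicts.count('O')
total = len(verdicts)
print(f'{correct}/{total} = {correct / total:.1%}')  # 35/60 = 58.3%&lt;/code&gt;&lt;/pre&gt;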
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Compared against the leaderboard, that is roughly on par with Upstage's gpt4 model.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next, I will study how to process precedent and legal QA datasets at scale.&amp;nbsp;&lt;/p&gt;</description>
      <category>Project/LLM</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/76</guid>
      <comments>https://mugan1.tistory.com/76#entry76comment</comments>
      <pubDate>Fri, 29 Nov 2024 16:59:24 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] No. 1676: Number of Trailing Zeros in a Factorial</title>
      <link>https://mugan1.tistory.com/75</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;1. My solution&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It passed, but it looks like a mess next to the model answer.&lt;/p&gt;
&lt;pre id=&quot;code_1732864560192&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

n = int(input())

array = [1]*501

for i in range(1, n+1):
    array[i] = array[i-1]*i

answer = 0 
while array[n] % 10 == 0:
    array[n] = array[n] // 10    
    answer +=1
print(answer)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
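&lt;p data-ke-size=&quot;size16&quot;&gt;2. A good solution&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Wow. Since 10 is 2x5 and N! always contains at least as many factors of 2 as of 5, you only need to count the factors of 5....&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Since N goes up to 500, it is enough to go up to the cube of 5 (125); the next power, 625, is already above 500.&lt;/p&gt;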
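&lt;p data-ke-size=&quot;size16&quot;&gt;A quick way to convince yourself of this, sketched with helper names of my own: compare the count of 5s with the zeros actually found at the end of N! for every N up to 500.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import math

def zeros_by_counting_fives(n):
    # N//5 + N//25 + N//125 covers every power of 5 up to 500
    return n // 5 + n // 25 + n // 125

def zeros_by_brute_force(n):
    # Strip the trailing zeros from the exact value of N!
    digits = str(math.factorial(n))
    return len(digits) - len(digits.rstrip('0'))

mismatches = [n for n in range(501)
              if zeros_by_counting_fives(n) != zeros_by_brute_force(n)]
print(mismatches)  # []
print(zeros_by_counting_fives(500))  # 124&lt;/code&gt;&lt;/pre&gt;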
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;How does anyone come up with this...&lt;/p&gt;
&lt;pre id=&quot;code_1732864641634&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;N = int(input())
print(N//5 + N//25 + N//125)&lt;/code&gt;&lt;/pre&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/75</guid>
      <comments>https://mugan1.tistory.com/75#entry75comment</comments>
      <pubDate>Fri, 29 Nov 2024 16:20:19 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] No. 6588: Goldbach's Conjecture</title>
      <link>https://mugan1.tistory.com/74</link>
      <description>&lt;pre id=&quot;code_1732713391878&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

number = [True] * 1000001

# Sieve of Eratosthenes: number[i] stays True only if i is prime (indices 0 and 1 are never read below)
for i in range(2, int(len(number) ** 0.5) + 1):
    if number[i]:
        for j in range(2 * i, 1000001, i):
            number[j] = False

while 1:
    n = int(input())
    if n == 0:
        break

    for i in range(n - 3, 2, -2):
        if (number[i] == True) and (number[n - i] == True):
            print(f&quot;{n} = {n-i} + {i}&quot;)
            break
    else:
        print('&quot;Goldbach\'s conjecture is wrong.&quot;')&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Ugh, got it wrong again.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I had thought this would be easy to solve once you know the sieve of Eratosthenes..&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/74</guid>
      <comments>https://mugan1.tistory.com/74#entry74comment</comments>
      <pubDate>Wed, 27 Nov 2024 22:17:07 +0900</pubDate>
    </item>
    <item>
      <title>[Baekjoon] 10820 String Analysis / 1463 Make It One</title>
      <link>https://mugan1.tistory.com/73</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;1. 10820 String Analysis&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;In Python, input() raises EOFError at end of input, so catching it with except lets the process exit normally.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;sys.stdin.readline() returns an empty string at EOF instead of raising, so the loop never stops and the submission fails...&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666;&quot;&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Wow, I had no idea there was such a difference...&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
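&lt;p data-ke-size=&quot;size16&quot;&gt;The difference can be seen without a judge by feeding both readers from an in-memory stream. This is a sketch; the helper names and the sample text are made up for illustration.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import io
import sys

def read_with_input(text):
    # input() raises EOFError once the stream is exhausted
    original_stdin = sys.stdin
    sys.stdin = io.StringIO(text)
    lines = []
    try:
        while True:
            lines.append(input())
    except EOFError:
        pass
    finally:
        sys.stdin = original_stdin
    return lines

def read_with_readline(text):
    # readline() returns '' at EOF instead of raising, so check for it explicitly
    stream = io.StringIO(text)
    lines = []
    while True:
        line = stream.readline()
        if line == '':
            break
        lines.append(line.rstrip('\n'))
    return lines

sample = 'Hello World\n123 abc\n'
print(read_with_input(sample))
print(read_with_readline(sample))&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Both calls return the same two lines; the only difference is how the end of input is detected. A blank line in the middle of the input comes back from readline() as a lone newline, not as an empty string, so the explicit check does not stop early.&lt;/p&gt;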
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1732623868995&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
# input = sys.stdin.readline

while True:
    try:
        s = input()
        capital = 0
        letter = 0
        num = 0
        space = 0
        for i in s :
            if i.isupper():
                capital += 1
            elif i.islower():
                letter +=1
            elif i.isnumeric():
                num +=1
            elif i == &quot; &quot; :
                space += 1
        print(letter, capital, num, space)
    except EOFError:  # end of input reached
        break&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. 1463 Make It One&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I have finally started on DP problems, and as expected they are hard..&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The key is to compare the minimum for the previous number plus 1 against the minimum for the quotient (when i is divisible by 2 or 3) plus 1...&lt;/p&gt;
&lt;pre id=&quot;code_1732626561403&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import sys
input = sys.stdin.readline

n = int(input())

tmp = [0]*((10**6)+1)

for i in range(2, n+1):
    tmp[i] = tmp[i-1]+1

    if i % 2 ==0:
        tmp[i] = min(tmp[i], tmp[i//2]+1)
    if i % 3 ==0:
        tmp[i] = min(tmp[i], tmp[i//3]+1)
print(tmp[n])&lt;/code&gt;&lt;/pre&gt;</description>
      <category>Study/Coding Test Wrong-Answer Notes</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/73</guid>
      <comments>https://mugan1.tistory.com/73#entry73comment</comments>
      <pubDate>Tue, 26 Nov 2024 22:10:35 +0900</pubDate>
    </item>
    <item>
      <title>[Langchain] Model Evaluation - 1</title>
      <link>https://mugan1.tistory.com/72</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;1. A blog post that summarizes LLM evaluation methods well&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://gagadi.tistory.com/58&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://gagadi.tistory.com/58&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1732436045356&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[LLM Evaluation] LLM 성능 평가 방법 : Metric, Benchmark, LLM-as-a-judge 등&quot; data-og-description=&quot;  LLM 성능 평가 방법 정리&amp;nbsp;&amp;nbsp;  개요&amp;nbsp;LLM의 성능을 제대로 측정하는 작업은 모델의 개발 과정뿐만 아니라 수많은 LLM 중 어떤 모델을 선택할 것인지 결정하는 상황에서도 매우 중요하다. 즉, LLM&quot; data-og-host=&quot;gagadi.tistory.com&quot; data-og-source-url=&quot;https://gagadi.tistory.com/58&quot; data-og-url=&quot;https://gagadi.tistory.com/58&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/uoCe5/hyXDcgJWtz/puv0G5GFjnXJBvtNjowBLK/img.jpg?width=800&amp;amp;height=581&amp;amp;face=0_0_800_581,https://scrap.kakaocdn.net/dn/cn77EM/hyXDfqYo7U/yXUr5JxaehlCE9HHSz6Jw1/img.jpg?width=800&amp;amp;height=581&amp;amp;face=0_0_800_581,https://scrap.kakaocdn.net/dn/zqubY/hyXDniejWa/SSXpneJzqXZIFPZNleat1k/img.jpg?width=1140&amp;amp;height=982&amp;amp;face=0_0_1140_982&quot;&gt;&lt;a href=&quot;https://gagadi.tistory.com/58&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://gagadi.tistory.com/58&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/uoCe5/hyXDcgJWtz/puv0G5GFjnXJBvtNjowBLK/img.jpg?width=800&amp;amp;height=581&amp;amp;face=0_0_800_581,https://scrap.kakaocdn.net/dn/cn77EM/hyXDfqYo7U/yXUr5JxaehlCE9HHSz6Jw1/img.jpg?width=800&amp;amp;height=581&amp;amp;face=0_0_800_581,https://scrap.kakaocdn.net/dn/zqubY/hyXDniejWa/SSXpneJzqXZIFPZNleat1k/img.jpg?width=1140&amp;amp;height=982&amp;amp;face=0_0_1140_982');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[LLM Evaluation] LLM 성능 평가 방법 : Metric, Benchmark, LLM-as-a-judge 등&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;  LLM 성능 평가 방법 정리&amp;nbsp;&amp;nbsp;  개요&amp;nbsp;LLM의 성능을 제대로 측정하는 작업은 모델의 개발 과정뿐만 아니라 수많은 LLM 중 어떤 모델을 선택할 것인지 결정하는 상황에서도 매우 중요하다. 즉, LLM&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;gagadi.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. A blog post that explains in detail how to use Allganize's automated evaluation code&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1732436985850&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[RAG] RAG 벤치마크 데이터셋 &amp;amp; 성능 평가 리뷰 : RAG-Evaluation-Dataset-KO&quot; data-og-description=&quot;개요&amp;nbsp;한국어 RAG 솔루션 성능 평가를 위해 RAG 벤치마크 데이터셋과 평가 관련 리서치를 진행,올거나이즈에서 운영중인 RAG 리더보드에서 사용하는 벤치마크 데이터셋을 찾게 되었다.&amp;nbsp;https://huggin&quot; data-og-host=&quot;didi-universe.tistory.com&quot; data-og-source-url=&quot;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&quot; data-og-url=&quot;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/UH1r4/hyXDeFBv8Z/KPLazZ7TiRuO8tS3XTVCxK/img.png?width=800&amp;amp;height=546&amp;amp;face=0_0_800_546,https://scrap.kakaocdn.net/dn/HRqvZ/hyXDdNs4Mv/6OSw7uxMfElnDx3XnUrVxk/img.png?width=800&amp;amp;height=546&amp;amp;face=0_0_800_546,https://scrap.kakaocdn.net/dn/rakZn/hyXDf5Bdsb/6ehxFTQLuWDZ5KHuGVcfd1/img.png?width=3630&amp;amp;height=1306&amp;amp;face=0_0_3630_1306&quot;&gt;&lt;a href=&quot;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://didi-universe.tistory.com/entry/RAG-RAG-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%85%8B-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EB%A6%AC%EB%B7%B0-RAG-Evaluation-Dataset-KO&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/UH1r4/hyXDeFBv8Z/KPLazZ7TiRuO8tS3XTVCxK/img.png?width=800&amp;amp;height=546&amp;amp;face=0_0_800_546,https://scrap.kakaocdn.net/dn/HRqvZ/hyXDdNs4Mv/6OSw7uxMfElnDx3XnUrVxk/img.png?width=800&amp;amp;height=546&amp;amp;face=0_0_800_546,https://scrap.kakaocdn.net/dn/rakZn/hyXDf5Bdsb/6ehxFTQLuWDZ5KHuGVcfd1/img.png?width=3630&amp;amp;height=1306&amp;amp;face=0_0_3630_1306');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[RAG] RAG Benchmark Dataset &amp;amp; Performance Evaluation Review: RAG-Evaluation-Dataset-KO&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Overview&amp;nbsp;While researching RAG benchmark datasets and evaluation methods to assess the performance of Korean RAG solutions, I found the benchmark dataset used by the RAG leaderboard that Allganize operates.&amp;nbsp;https://huggin&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;didi-universe.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To evaluate the Lawyer LLM I built, I will set up the following pipeline.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1) From the dataset provided by Allganize, extract only the entries whose domain is law&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2) Download the related PDF documents&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; For steps 1 and 2, once testing is complete, I should collect additional Law QA data and prepare roughly 50 to 100 questions&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3) Implement and test the evaluation automation code&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4) Validate performance (compare models, summarization methods, and so on)&lt;/p&gt;</description>
      <category>Project/LLM</category>
      <author>mugan1</author>
      <guid isPermaLink="true">https://mugan1.tistory.com/72</guid>
      <comments>https://mugan1.tistory.com/72#entry72comment</comments>
      <pubDate>Sun, 24 Nov 2024 17:50:49 +0900</pubDate>
    </item>
  </channel>
</rss>