# Learn Weave with Serverless Inference

> Learn the basics of Weave by using Serverless Inference to trace model calls, compare outputs, and run evaluations.

This guide shows you how to learn the basics of W\&B Weave by using it with [Serverless Inference](/ko/inference). With Serverless Inference, you can build and trace LLM applications on live open-source models without setting up your own infrastructure or managing API keys from multiple providers. Your W\&B API key gives you access to [all models hosted on Serverless Inference](/ko/inference/models). By the end of this guide, you will have traced LLM calls, compared models, and run an evaluation that you can review in the Weave UI.

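## What you'll learn

In this guide, you will:

* Set up Weave and Serverless Inference.
* Build a basic LLM application with automatic tracing.
* Compare multiple models.
* Evaluate model performance on a dataset.
* View the results in the Weave UI.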

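## Prerequisites

* A [W\&B account](https://wandb.ai/signup)
* Python 3.10+ or Node.js 18+
* The required packages:
  * **Python**: `pip install weave openai`
  * **TypeScript**: `npm install weave openai`
* Your W\&B API key from [wandb.ai/settings](https://wandb.ai/settings), set as the `WANDB_API_KEY` environment variable. Serverless Inference authenticates with this key, so you do not need an OpenAI API key.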
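Serverless Inference authenticates with your W\&B API key. It can therefore help to read the key from the `WANDB_API_KEY` environment variable and fail early with a clear message when the variable is missing. The following is a minimal sketch. `get_wandb_api_key` is a hypothetical helper and is not part of the `weave` or `openai` packages.

```python
import os


def get_wandb_api_key(var_name: str = "WANDB_API_KEY") -> str:
    """Hypothetical helper: return the W&B API key from the environment or fail clearly."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Point the user at where the key lives instead of failing later with a 401.
        raise RuntimeError(
            f"{var_name} is not set. Copy your API key from https://wandb.ai/settings "
            f"and export it, for example: export {var_name}=<your-key>"
        )
    return key
```

With this helper, you could pass `api_key=get_wandb_api_key()` to `openai.OpenAI(...)` in the Python examples instead of the hard-coded placeholder.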

## Trace your first LLM call

This section shows how to make a single LLM call and have Weave trace it automatically. Doing this first confirms that your setup works before you move on to more complex examples.

To get started, copy and paste the code example below. It uses Llama 3.1 8B on Serverless Inference. When you run this code, Weave:

* Traces the LLM call automatically.
* Logs the inputs, outputs, latency, and token usage.
* Provides a link where you can view the trace in the Weave UI.

```python lines theme={null}
import weave
import openai

# Initialize Weave. Replace [YOUR-TEAM] with your team name.
weave.init("[YOUR-TEAM]/inference-quickstart")

# Create an OpenAI-compatible client that points to Serverless Inference
client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
    project="[YOUR-TEAM]/my-first-weave-project",  # Required for usage tracking
)

# Decorate the function to enable tracing. It uses the standard OpenAI client.
@weave.op()
def ask_llama(question: str) -> str:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ],
    )
    return response.choices[0].message.content

# Call the function; Weave traces everything automatically
result = ask_llama("What are the benefits of using W&B Weave for LLM development?")
print(result)
```

```typescript twoslash lines theme={null}
// @noErrors
import * as weave from 'weave';
import OpenAI from 'openai';

// Initialize Weave. Replace the values in "[]" with your own.
await weave.init("[YOUR-TEAM]/inference-quickstart");

// Create an OpenAI-compatible client that points to Serverless Inference
const client = new OpenAI({
  baseURL: 'https://api.inference.wandb.ai/v1', // Serverless Inference endpoint
  apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY', // Replace with your API key or set the WANDB_API_KEY environment variable
});

// Wrap the function with weave.op to enable tracing
const askLlama = weave.op(async function askLlama(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: 'meta-llama/Llama-3.1-8B-Instruct',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: question }
    ],
  });
  return response.choices[0].message.content || '';
});

// Call the function; Weave traces everything automatically
const result = await askLlama('What are the benefits of using W&B Weave for LLM development?');
console.log(result);
```
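The stock `openai` client works once `base_url` points at Serverless Inference because the service is OpenAI-compatible. The sketch below shows roughly what the client sends, assuming the standard OpenAI chat completions wire format (`POST {base_url}/chat/completions` with a bearer token). `build_chat_request` is a hypothetical helper. It only builds the request and never sends it. The real client also adds headers that this sketch omits, such as the one that carries the `project` setting.

```python
import json
import urllib.request

# Base URL taken from the examples above.
BASE_URL = "https://api.inference.wandb.ai/v1"


def build_chat_request(api_key: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the W&B API key acts as the bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_WANDB_API_KEY", "meta-llama/Llama-3.1-8B-Instruct", "Hello")
print(req.get_method(), req.full_url)
```

Running this prints `POST https://api.inference.wandb.ai/v1/chat/completions`. The `model` field in the body is the same model ID string that the `openai` client examples pass.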

## Build a text summarization app

Now that you have traced a single LLM call, this section shows how Weave traces nested operations across multiple functions. This lets you see how a real multi-step LLM application is captured in the UI. Run the following basic summarization app. `summarize_text` calls `extract_key_points` and `create_summary`, and all three are Weave ops, so the two inner calls are recorded as children of the `summarize_text` call.

```python lines theme={null}
import weave
import openai

# Initialize Weave. Replace the values in "[]" with your own.
weave.init("[YOUR-TEAM]/inference-quickstart")

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
    project="[YOUR-TEAM]/my-first-weave-project",  # Required for usage tracking
)

@weave.op()
def extract_key_points(text: str) -> list[str]:
    """Extract key points from a text."""
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "Extract 3-5 key points from the text. Return each point on a new line."},
            {"role": "user", "content": text}
        ],
    )
    # Return the response as a list of lines, skipping empty lines
    return [line for line in response.choices[0].message.content.strip().splitlines() if line.strip()]

@weave.op()
def create_summary(key_points: list[str]) -> str:
    """Create a concise summary based on key points."""
    points_text = "\n".join(f"- {point}" for point in key_points)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "Create a one-sentence summary based on these key points."},
            {"role": "user", "content": f"Key points:\n{points_text}"}
        ],
    )
    return response.choices[0].message.content

@weave.op()
def summarize_text(text: str) -> dict:
    """Main summarization pipeline."""
    key_points = extract_key_points(text)
    summary = create_summary(key_points)
    return {
        "key_points": key_points,
        "summary": summary
    }

# Try it with sample text
sample_text = """
The Apollo 11 mission was a historic spaceflight that landed the first humans on the Moon on July 20, 1969.
Commander Neil Armstrong and lunar module pilot Buzz Aldrin descended to the lunar surface while Michael Collins remained in orbit.
Armstrong became the first person to step onto the Moon, followed by Aldrin 19 minutes later.
They spent about two and a quarter hours together outside the spacecraft, collecting samples and taking photographs.
"""

result = summarize_text(sample_text)
print("Key Points:", result["key_points"])
print("\nSummary:", result["summary"])
```

```typescript twoslash lines theme={null}
// @noErrors
import * as weave from 'weave';
import OpenAI from 'openai';

// Initialize Weave. Replace the values in "[]" with your own.
await weave.init('[YOUR-TEAM]/inference-quickstart');

const client = new OpenAI({
  baseURL: 'https://api.inference.wandb.ai/v1',
  apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY', // Replace with your API key or set the WANDB_API_KEY environment variable
});

const extractKeyPoints = weave.op(async function extractKeyPoints(text: string): Promise<string[]> {
  const response = await client.chat.completions.create({
    model: 'meta-llama/Llama-3.1-8B-Instruct',
    messages: [
      { role: 'system', content: 'Extract 3-5 key points from the text. Return each point on a new line.' },
      { role: 'user', content: text }
    ],
  });
  // Return the response as a list of lines, skipping empty lines
  const content = response.choices[0].message.content || '';
  return content.split('\n').map(line => line.trim()).filter(line => line.length > 0);
});

const createSummary = weave.op(async function createSummary(keyPoints: string[]): Promise<string> {
  const pointsText = keyPoints.map(point => `- ${point}`).join('\n');
  const response = await client.chat.completions.create({
    model: 'meta-llama/Llama-3.1-8B-Instruct',
    messages: [
      { role: 'system', content: 'Create a one-sentence summary based on these key points.' },
      { role: 'user', content: `Key points:\n${pointsText}` }
    ],
  });
  return response.choices[0].message.content || '';
});

const summarizeText = weave.op(async function summarizeText(text: string): Promise<{key_points: string[], summary: string}> {
  const keyPoints = await extractKeyPoints(text);
  const summary = await createSummary(keyPoints);
  return {
    key_points: keyPoints,
    summary: summary
  };
});

// Try it with sample text
const sampleText = `
The Apollo 11 mission was a historic spaceflight that landed the first humans on the Moon on July 20, 1969.
Commander Neil Armstrong and lunar module pilot Buzz Aldrin descended to the lunar surface while Michael Collins remained in orbit.
Armstrong became the first person to step onto the Moon, followed by Aldrin 19 minutes later.
They spent about two and a quarter hours together outside the spacecraft, collecting samples and taking photographs.
`;

const result = await summarizeText(sampleText);
console.log('Key Points:', result.key_points);
console.log('\nSummary:', result.summary);
```

## Compare multiple models

A common use case for Weave is comparing how different models respond to the same prompt. Serverless Inference gives you access to multiple models. Use the following code to compare responses from Llama and DeepSeek:

```python lines theme={null}
import weave
import openai

# Initialize Weave. Replace [YOUR-TEAM] with your team name.
weave.init("[YOUR-TEAM]/inference-quickstart")

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
    project="[YOUR-TEAM]/my-first-weave-project",  # Required for usage tracking
)

# Define a Model class for comparing different LLMs
class InferenceModel(weave.Model):
    model_name: str

    @weave.op()
    def predict(self, question: str) -> str:
        response = client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "user", "content": question}
            ],
        )
        return response.choices[0].message.content

# Create different model instances
llama_model = InferenceModel(model_name="meta-llama/Llama-3.1-8B-Instruct")
deepseek_model = InferenceModel(model_name="deepseek-ai/DeepSeek-V3.1")

# Compare the responses
test_question = "Explain quantum computing in one paragraph for a high school student."

print("Llama 3.1 8B response:")
print(llama_model.predict(test_question))
print("\n" + "="*50 + "\n")
print("DeepSeek V3.1 response:")
print(deepseek_model.predict(test_question))
```

```typescript twoslash lines theme={null}
// @noErrors
import * as weave from 'weave';
import OpenAI from 'openai';

// Initialize Weave. Replace [YOUR-TEAM] with your team name.
await weave.init("[YOUR-TEAM]/inference-quickstart");

const client = new OpenAI({
  baseURL: 'https://api.inference.wandb.ai/v1',
  apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY', // Replace with your API key or set the WANDB_API_KEY environment variable
});

// Create model functions with weave.op (weave.Model is not supported in TypeScript)
function createModel(modelName: string) {
  return weave.op(async function predict(question: string): Promise<string> {
    const response = await client.chat.completions.create({
      model: modelName,
      messages: [
        { role: 'user', content: question }
      ],
    });
    return response.choices[0].message.content || '';
  });
}

// Create different model instances
const llamaModel = createModel('meta-llama/Llama-3.1-8B-Instruct');
const deepseekModel = createModel('deepseek-ai/DeepSeek-V3.1');

// Compare the responses
const testQuestion = 'Explain quantum computing in one paragraph for a high school student.';

console.log('Llama 3.1 8B response:');
console.log(await llamaModel(testQuestion));
console.log('\n' + '='.repeat(50) + '\n');
console.log('DeepSeek V3.1 response:');
console.log(await deepseekModel(testQuestion));
```
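When you compare more than two models, a small helper keeps the loop in one place. This sketch assumes only that each model exposes a `model_name` attribute and a `predict(question)` method, as `InferenceModel` does. `compare_responses` and `EchoModel` are hypothetical names. `EchoModel` is an offline stand-in, so you can try the helper without an API key.

```python
from typing import Protocol


class Predictor(Protocol):
    """Anything with a model_name and a predict(question) method, like InferenceModel."""
    model_name: str

    def predict(self, question: str) -> str: ...


def compare_responses(models: list[Predictor], question: str) -> dict[str, str]:
    """Send the same question to every model and collect the answers by model name."""
    return {model.model_name: model.predict(question) for model in models}


# Hypothetical offline stand-in that just echoes the question.
class EchoModel:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def predict(self, question: str) -> str:
        return f"[{self.model_name}] {question}"


results = compare_responses(
    [EchoModel("meta-llama/Llama-3.1-8B-Instruct"), EchoModel("deepseek-ai/DeepSeek-V3.1")],
    "Explain quantum computing in one paragraph for a high school student.",
)
for name, answer in results.items():
    print(f"{name}:\n{answer}\n{'=' * 50}")
```

With the real models, pass `[llama_model, deepseek_model]` instead. Because `InferenceModel.predict` is a Weave op, each call made through the helper is still traced.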

## Evaluate model performance

Ad hoc comparisons only go so far. This section shows how to run a structured evaluation across a dataset so that you can measure and compare model quality systematically. Use Weave's built-in `EvaluationLogger` to evaluate model performance on a Q\&A task. It gives you structured evaluation tracking, including automatic aggregation, token usage capture, and rich comparison features in the UI.

Add the following code to the script you used in the previous section:

```python lines theme={null}
from typing import Optional

from weave import EvaluationLogger

# Create a simple dataset
dataset = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Name a primary color", "expected_one_of": ["red", "blue", "yellow"]},
]

# Define a scorer
@weave.op()
def accuracy_scorer(expected: str, output: str, expected_one_of: Optional[list[str]] = None) -> dict:
    """Score the accuracy of the model output."""
    output_clean = output.strip().lower()
    if expected_one_of:
        is_correct = any(option.lower() in output_clean for option in expected_one_of)
    else:
        is_correct = expected.lower() in output_clean
    return {"correct": is_correct, "score": 1.0 if is_correct else 0.0}

# Evaluate a model with Weave's EvaluationLogger
def evaluate_model(model: InferenceModel, dataset: list[dict]):
    """Run evaluation on a dataset using Weave's built-in evaluation framework."""
    # Initialize the EvaluationLogger BEFORE calling the model so that token usage is captured.
    # This matters especially on Serverless Inference, where it enables cost tracking.
    # Make the model name a valid identifier by replacing "/", "-", and "." with underscores.
    safe_model_name = model.model_name.replace("/", "_").replace("-", "_").replace(".", "_")
    eval_logger = EvaluationLogger(
        model=safe_model_name,
        dataset="qa_dataset"
    )

    for example in dataset:
        # Get the model's prediction
        output = model.predict(example["question"])

        # Log the prediction
        pred_logger = eval_logger.log_prediction(
            inputs={"question": example["question"]},
            output=output
        )

        # Score the output
        score = accuracy_scorer(
            expected=example.get("expected", ""),
            output=output,
            expected_one_of=example.get("expected_one_of")
        )

        # Log the score
        pred_logger.log_score(
            scorer="accuracy",
            score=score["score"]
        )

        # Finish logging for this prediction
        pred_logger.finish()

    # Log the summary; Weave aggregates the accuracy scores automatically
    eval_logger.log_summary()
    print(f"Evaluation complete for {model.model_name} (logged as: {safe_model_name}). View results in the Weave UI.")

# Compare multiple models, a key feature of Weave's evaluation framework
models_to_compare = [
    llama_model,
    deepseek_model,
]

for model in models_to_compare:
    evaluate_model(model, dataset)

# In the Weave UI, go to the Evals tab to compare results across models
```

```typescript twoslash lines theme={null}
// @noErrors
import { EvaluationLogger } from 'weave';

// Create a simple dataset
interface DatasetExample {
  question: string;
  expected?: string;
  expected_one_of?: string[];
}

const dataset: DatasetExample[] = [
  { question: 'What is 2 + 2?', expected: '4' },
  { question: 'What is the capital of France?', expected: 'Paris' },
  { question: 'Name a primary color', expected_one_of: ['red', 'blue', 'yellow'] },
];

// Define a scorer
const accuracyScorer = weave.op(function accuracyScorer(args: {
  expected: string;
  output: string;
  expected_one_of?: string[];
}): { correct: boolean; score: number } {
  const outputClean = args.output.trim().toLowerCase();
  let isCorrect: boolean;
  if (args.expected_one_of) {
    isCorrect = args.expected_one_of.some(option =>
      outputClean.includes(option.toLowerCase())
    );
  } else {
    isCorrect = outputClean.includes(args.expected.toLowerCase());
  }
  return { correct: isCorrect, score: isCorrect ? 1.0 : 0.0 };
});

// Evaluate a model with Weave's EvaluationLogger
async function evaluateModel(
  model: (question: string) => Promise<string>,
  modelName: string,
  dataset: DatasetExample[]
): Promise<void> {
  // Initialize the EvaluationLogger BEFORE calling the model so that token usage is captured.
  // This matters especially on Serverless Inference, where it enables cost tracking.
  // Make the model name a valid identifier by replacing "/", "-", and "." with underscores.
  const safeModelName = modelName.replace(/\//g, '_').replace(/-/g, '_').replace(/\./g, '_');
  const evalLogger = new EvaluationLogger({
    name: 'inference_evaluation',
    model: { name: safeModelName },
    dataset: 'qa_dataset'
  });

  for (const example of dataset) {
    // Get the model's prediction
    const output = await model(example.question);

    // Log the prediction
    const predLogger = evalLogger.logPrediction(
      { question: example.question },
      output
    );

    // Score the output
    const score = await accuracyScorer({
      expected: example.expected || '',
      output: output,
      expected_one_of: example.expected_one_of
    });

    // Log the score
    predLogger.logScore('accuracy', score.score);

    // Finish logging for this prediction
    predLogger.finish();
  }

  // Log the summary; Weave aggregates the accuracy scores automatically
  await evalLogger.logSummary();
  console.log(`Evaluation complete for ${modelName} (logged as: ${safeModelName}). View results in the Weave UI.`);
}

// Compare multiple models, a key feature of Weave's evaluation framework
const modelsToCompare = [
  { model: llamaModel, name: 'meta-llama/Llama-3.1-8B-Instruct' },
  { model: deepseekModel, name: 'deepseek-ai/DeepSeek-V3.1' },
];

for (const { model, name } of modelsToCompare) {
  await evaluateModel(model, name, dataset);
}

// In the Weave UI, go to the Evals tab to compare results across models
```

After running these examples, you can see everything logged to Weave: the traced LLM calls, the nested summarization pipeline, the model comparison, and the full evaluation. When you run the examples, trace links are printed to your terminal. Click a link to view the trace in the Weave UI, where you can:

* Review a timeline of all your LLM calls
* Inspect the inputs and outputs of each operation
* View token usage and estimated costs (captured automatically by `EvaluationLogger`)
* Analyze latency and performance metrics
* Go to the **Evals** tab to see aggregated evaluation results
* Use the **Compare** feature to compare and analyze the performance of different models
* Page through specific examples to see how different models performed on the same input
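Before you spend credits on a full run, it can help to check the scorer and the aggregation offline. The sketch below is a plain-Python approximation, not Weave code. `accuracy_score`, `dry_run`, and the canned answers are hypothetical. The sketch assumes that the number you care about is the mean of the per-example `score` values. Weave's own summary may report additional or differently named statistics.

```python
from statistics import mean
from typing import Callable, Optional


def accuracy_score(output: str, expected: str = "", expected_one_of: Optional[list[str]] = None) -> float:
    """Same substring rule as accuracy_scorer, returning only the numeric score."""
    output_clean = output.strip().lower()
    if expected_one_of:
        is_correct = any(option.lower() in output_clean for option in expected_one_of)
    else:
        is_correct = expected.lower() in output_clean
    return 1.0 if is_correct else 0.0


def dry_run(predict: Callable[[str], str], dataset: list[dict]) -> float:
    """Score every example with predict and return the mean score (no Weave, no network)."""
    scores = [
        accuracy_score(
            predict(example["question"]),
            expected=example.get("expected", ""),
            expected_one_of=example.get("expected_one_of"),
        )
        for example in dataset
    ]
    return mean(scores)


dataset = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Name a primary color", "expected_one_of": ["red", "blue", "yellow"]},
]

# Hypothetical canned answers standing in for a model.
canned = {
    "What is 2 + 2?": "2 + 2 = 4",
    "What is the capital of France?": "Lyon",
    "Name a primary color": "Blue.",
}
print(dry_run(lambda q: canned[q], dataset))
```

The canned answers score 1.0, 0.0, and 1.0, so the mean is about 0.67. Note that the substring rule is lenient. An answer of `14` would also count as correct for an expected `4`, so you may want a stricter comparison for numeric answers.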

## Available models

For the full list of available models, see the [Available models section](/ko/inference/models) of the Serverless Inference documentation.

## Next steps

Now that you know the basics, explore Weave and Serverless Inference further with these resources:

* **Use the Playground**: [Try models interactively](/ko/weave/guides/tools/playground#access-the-playground) in the Weave Playground
* **Build evaluations**: Learn about [systematic evaluation](/ko/weave/guides/core-types/evaluations) of LLM applications
* **Try other integrations**: Weave works with [OpenAI, Anthropic, and many other services](/ko/weave/guides/integrations)

## Troubleshooting

### Authentication errors

If you get an authentication error, check the following:

1. You have a valid W\&B account.
2. You are using the correct API key from [wandb.ai/settings](https://wandb.ai/settings).
3. Your project name follows the `your-team/your-project` format.
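A quick local check can catch the most common naming mistakes before you call `weave.init`: a leftover `[YOUR-TEAM]` placeholder or a missing team prefix. This is a hypothetical helper. Its pattern is an assumption based only on the `your-team/your-project` form above, and W\&B may apply stricter rules to team and project names.

```python
import re

# Assumed rule: exactly one "/" separating two non-empty parts, with no whitespace.
PROJECT_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def check_project_name(name: str) -> None:
    """Hypothetical check: raise ValueError if name does not look like 'your-team/your-project'."""
    if "[" in name or "]" in name:
        raise ValueError(f"{name!r} still contains a placeholder such as [YOUR-TEAM]")
    if not PROJECT_PATTERN.match(name):
        raise ValueError(f"{name!r} is not in the form 'your-team/your-project'")


check_project_name("my-team/inference-quickstart")  # passes silently
```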

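### Rate limit errors

Serverless Inference enforces per-project concurrency limits. If you hit a rate limit, try the following:

* Reduce the number of concurrent requests.
* Add delays between calls.
* Consider upgrading your plan if you need higher limits.

For details, see the [Serverless Inference usage limits documentation](/ko/inference/usage-limits).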
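A common way to add delays between calls is to retry with exponential backoff. The following minimal sketch assumes that rate-limit failures surface as an exception. `RateLimited` and `call_with_backoff` are hypothetical names. The `sleep` parameter exists only so that you can test the delays without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on an HTTP 429 response."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited and roughly doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up after the last attempt
            # Exponential delay (base, 2*base, 4*base, ...) plus random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

With the `openai` Python client, you would catch `openai.RateLimitError` instead of the stand-in class and wrap a call such as `lambda: ask_llama(question)`. The client also retries some failed requests on its own, and its `max_retries` constructor argument controls how many times.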

### Insufficient credits

The free tier includes a limited amount of credit. For details, see the [usage and limits documentation](https://docs.wandb.ai/inference/usage-limits/).