Llama 2

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone et al. (Meta AI) · 2023

Introduction

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models.

We are releasing the following models to the public for research and commercial use: Llama 2, the pretrained base models, and Llama 2-Chat, their fine-tuned dialogue versions, each with 7B, 13B, and 70B parameters.

Pretraining

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. We made several improvements to the pretraining corpus, including more thorough data cleaning and the use of more diverse data sources.

Compared to Llama 1, we trained Llama 2 on 40% more data. We also increased the context length from 2,048 tokens to 4,096 tokens, allowing the model to understand and generate longer texts.
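
The description above gives the new context length but not how training sequences are assembled. As a minimal sketch, assume pretraining examples are cut from one long concatenated token stream into fixed-length, non-overlapping windows. `pack_into_windows` is a hypothetical helper and the token IDs are placeholders. With the same stream, doubling the window from 2,048 to 4,096 halves the number of examples and doubles how far back each position can attend.

```python
from typing import List


def pack_into_windows(token_ids: List[int], context_length: int) -> List[List[int]]:
    """Split a flat token stream into consecutive, non-overlapping windows.

    Hypothetical helper: the trailing remainder shorter than `context_length`
    is simply dropped here; a real pipeline might pad it or carry it over.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    n_full = len(token_ids) // context_length
    return [token_ids[i * context_length:(i + 1) * context_length] for i in range(n_full)]


stream = list(range(10_000))  # placeholder token IDs standing in for tokenized text
llama1_style = pack_into_windows(stream, 2048)  # 2,048-token windows
llama2_style = pack_into_windows(stream, 4096)  # 4,096-token windows

print(len(llama1_style), len(llama2_style))  # 4 2
# Within one window, the last position can condition on (context_length - 1) earlier tokens.
print(len(llama2_style[0]) - 1)  # 4095
```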

Fine-Tuning: SFT and RLHF

The first stage of our fine-tuning approach is Supervised Fine-Tuning (SFT). We collected high-quality instruction-response pairs and fine-tuned Llama 2 on them to produce helpful and harmless responses.
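
The description above does not state the SFT objective. A common setup, assumed here, concatenates each instruction and response, applies ordinary next-token cross-entropy, and masks the prompt positions so that only response tokens contribute to the loss. Plain Python lists stand in for model tensors; `IGNORE_INDEX`, `build_labels`, and `masked_nll` are hypothetical names.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # hypothetical sentinel: positions with this label add no loss


def build_labels(prompt_ids: Sequence[int], response_ids: Sequence[int]) -> List[int]:
    """Labels for prompt + response, with every prompt position masked out."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


def masked_nll(token_logprobs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood over the non-masked (response) positions.

    token_logprobs[t][v] is the model's log-probability of vocabulary item v at
    position t, assumed already shifted so that position t predicts labels[t].
    """
    if len(token_logprobs) != len(labels):
        raise ValueError("one row of log-probabilities is needed per label")
    losses = [-token_logprobs[t][y] for t, y in enumerate(labels) if y != IGNORE_INDEX]
    if not losses:
        raise ValueError("no response tokens to train on")
    return sum(losses) / len(losses)


# Toy example: vocabulary of 3 items, 2 prompt tokens, 2 response tokens.
prompt, response = [0, 1], [2, 0]
labels = build_labels(prompt, response)
uniform = [[math.log(1 / 3)] * 3 for _ in labels]  # a model with no preference
loss = masked_nll(uniform, labels)
print(labels, round(loss, 4))  # [-100, -100, 2, 0] 1.0986
```

Because the prompt positions are masked, changing the model's predictions on the instruction tokens would leave this loss unchanged.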

The second stage is Reinforcement Learning with Human Feedback (RLHF). We collected more than 1 million human preference annotations, then trained a reward model to score the quality of responses based on human preferences.
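
The text above says the reward model learns from human preferences but does not give its loss. A standard choice for pairwise comparisons, assumed here, is a binary ranking loss that pushes the score of the preferred response above the score of the rejected one. Plain floats stand in for the reward model's scalar outputs, and the `margin` argument is an optional, illustrative extra.

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def pairwise_ranking_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """-log(sigmoid(r_chosen - r_rejected - margin)) for one preference pair.

    Small when the human-preferred response scores well above the rejected
    one; grows roughly linearly when the ordering is wrong.
    """
    return softplus(-(r_chosen - r_rejected - margin))


def mean_ranking_loss(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_ranking_loss(c, r) for c, r in pairs) / len(pairs)


print(round(pairwise_ranking_loss(1.0, 1.0), 4))  # tie: 0.6931
print(round(pairwise_ranking_loss(3.0, 0.0), 4))  # correct order: 0.0486
print(round(pairwise_ranking_loss(0.0, 3.0), 4))  # wrong order: 3.0486
```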

We use Proximal Policy Optimization (PPO) to train Llama 2-Chat to generate responses that maximize the reward model's score, while a penalty discourages large deviations from the original model.
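
A minimal sketch of the two ingredients named above. It assumes the penalty on large deviations is a KL term estimated from the per-token log-probability ratio between the current policy and the original model, and it uses the standard PPO clipped surrogate. All function names and the values of `beta` and `clip_eps` are illustrative, not taken from the text.

```python
import math
from typing import Sequence


def kl_penalized_reward(reward: float,
                        policy_logprobs: Sequence[float],
                        ref_logprobs: Sequence[float],
                        beta: float) -> float:
    """Reward-model score minus beta times a single-sample KL estimate.

    sum_t [log pi_theta(y_t) - log pi_0(y_t)] over the sampled response tokens
    estimates KL(pi_theta || pi_0); the further the fine-tuned policy drifts
    from the original model, the more the reward is reduced.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must cover the same tokens")
    log_ratio = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * log_ratio


def ppo_clipped_objective(new_logprob: float, old_logprob: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one action; the optimizer maximizes this.

    Clipping the probability ratio to [1 - clip_eps, 1 + clip_eps] removes the
    incentive to move the policy far in a single update.
    """
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


shaped = kl_penalized_reward(1.0, [-1.0, -0.5], [-1.2, -0.9], beta=0.1)
gain = ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0)
print(round(shaped, 4), round(gain, 4))  # 0.94 1.2
```

Taking the minimum of the clipped and unclipped terms makes the objective pessimistic: a ratio of 1.5 earns credit only up to 1.2 when the advantage is positive, but is penalized in full when the advantage is negative.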

Safety

Safety is a central consideration in the development of Llama 2-Chat. We conducted extensive red-teaming exercises where human annotators attempted to elicit unsafe behaviors from the model.

We have observed a tension between safety and helpfulness. Making a model too cautious can make it refuse legitimate requests, reducing its usefulness. We tried to strike the right balance in Llama 2-Chat.
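
The tension can be made concrete with a toy refusal rule. Purely for illustration, assume each prompt has a risk score and a ground-truth harmful/benign label, and that the model refuses whenever the score reaches a threshold. The scores, labels, and the `refusal_tradeoff` helper are hypothetical; this is not the procedure used to tune Llama 2-Chat.

```python
from typing import List, Tuple


def refusal_tradeoff(examples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (false_refusal_rate, unsafe_answer_rate) for a risk-score threshold.

    examples: (risk_score, is_harmful) pairs; the toy model refuses a prompt
    when risk_score >= threshold and answers it otherwise.
    """
    benign = [s for s, harmful in examples if not harmful]
    harmful = [s for s, is_harmful in examples if is_harmful]
    if not benign or not harmful:
        raise ValueError("need at least one benign and one harmful example")
    false_refusals = sum(1 for s in benign if s >= threshold) / len(benign)
    unsafe_answers = sum(1 for s in harmful if s < threshold) / len(harmful)
    return false_refusals, unsafe_answers


# Made-up toy data: (risk score, actually harmful?)
toy = [(0.1, False), (0.3, False), (0.6, False), (0.4, True), (0.7, True), (0.9, True)]
for threshold in (0.2, 0.5, 0.8):
    false_refusal, unsafe_answer = refusal_tradeoff(toy, threshold)
    print(f"threshold={threshold}: false refusals {false_refusal:.2f}, "
          f"unsafe answers {unsafe_answer:.2f}")
```

With these toy numbers, lowering the threshold from 0.8 to 0.2 drives the unsafe-answer rate from 0.67 to 0.00 while the false-refusal rate rises from 0.00 to 0.67. No single threshold makes both rates zero.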

Evaluation and Results

We conducted human evaluations comparing Llama 2-Chat with ChatGPT and other leading models. On the helpfulness metric, Llama 2-Chat 70B is close to ChatGPT, and significantly outperforms all other open-source models.
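
Pairwise human evaluations like this are commonly summarized as win, tie, and loss rates for one model against another. A minimal sketch follows. It assumes each judgment is recorded as one of three labels. The judgment list is made-up toy data, not results from the evaluation described here.

```python
from collections import Counter
from typing import Dict, Iterable

VERDICTS = ("win", "tie", "loss")  # verdicts from the point of view of model A


def win_tie_loss(judgments: Iterable[str]) -> Dict[str, float]:
    """Fraction of pairwise judgments that are wins, ties, and losses for model A."""
    counts = Counter(judgments)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments")
    return {v: counts[v] / total for v in VERDICTS}


toy_judgments = ["win", "tie", "loss", "win", "tie", "loss", "loss", "win"]
rates = win_tie_loss(toy_judgments)
print(rates)  # {'win': 0.375, 'tie': 0.25, 'loss': 0.375}
```

Reporting ties separately matters. In this toy data, model A loses 37.5% of comparisons but wins or ties 62.5% of them.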

On safety benchmarks, Llama 2-Chat shows a lower rate of generating harmful content compared to other models, including some closed-source models. However, we acknowledge that no model is perfectly safe.

Conclusion

In this work, we have developed Llama 2, a collection of pretrained and fine-tuned LLMs ranging from 7B to 70B parameters. We make these models freely available for research and commercial use, hoping to foster an open ecosystem for LLM development.

Original source: Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone et al. (Meta AI) (2023)

Content restructured for learning purposes.