Running the llama3 70B model on my own server (feat. airllm)

2024.04.19 - [Data & AI/LLM] - Trying llama3 easily and for free (feat. huggingface)

In the previous post, I shared how to use llama3 inside the huggingface platform.

Today!!!

Let's look at how to run this llama3 70B model on your own server!!

The way to do it is with airllm!!

1. What is AirLLM?

AirLLM is an open-source library that makes it possible to run large language models (LLMs) at the 70B-parameter scale on a single 4GB GPU card.
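
To get a feel for why this is notable, here is a rough back-of-the-envelope estimate. It is only a sketch: it assumes 16-bit weights (2 bytes per parameter) and 80 equally sized transformer layers for Llama 3 70B, and it ignores embeddings, activations, and the KV cache. These numbers are illustrative assumptions, not figures taken from the AirLLM documentation.

```python
# Rough memory estimate; every figure here is an illustrative assumption.
PARAMS = 70e9          # ~70 billion parameters
BYTES_PER_PARAM = 2    # assuming fp16/bf16 weights
NUM_LAYERS = 80        # assumed number of transformer layers in Llama 3 70B


def full_model_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of all weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def per_layer_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM,
                 num_layers=NUM_LAYERS):
    """Approximate size of one layer, treating all layers as equal."""
    return full_model_gb(params, bytes_per_param) / num_layers


print(f"whole model: ~{full_model_gb():.0f} GB")
print(f"one layer:   ~{per_layer_gb():.2f} GB")
```

Under these assumptions the full set of weights is far larger than a 4GB card, while a single layer fits comfortably, which is why handling the weights piece by piece matters.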

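Key features:

Low memory usage: AirLLM reduces memory usage by paging model weights in from disk. This is what allows a 70B LLM to run on a single 4GB GPU.
No loss of model quality: AirLLM does not rely on model compression techniques such as quantization, distillation, or pruning, so the model runs without degraded output quality.
Ease of use: AirLLM is compatible with the Hugging Face Transformers library, so existing LLM code is easy to reuse.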
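
The idea of paging weights in from disk can be illustrated with a toy sketch. This is not AirLLM's actual implementation: the "layers" here are just a scale and a shift stored in pickle files, and the function names (`save_layers`, `apply_layer`, `run_layer_by_layer`) are hypothetical. It only shows the general pattern of keeping a single layer in memory at a time.

```python
import os
import pickle
import tempfile


def save_layers(layers, directory):
    """Write each layer's weights to its own file and return the paths."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths


def apply_layer(weights, x):
    """Toy 'layer': a scale and a shift standing in for a transformer block."""
    return [weights["scale"] * v + weights["shift"] for v in x]


def run_layer_by_layer(paths, x):
    """Load one layer, apply it, and drop it before loading the next."""
    for path in paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)  # only this layer is held in memory
        x = apply_layer(weights, x)
        del weights                   # release it before the next load
    return x


toy_layers = [{"scale": 2.0, "shift": 1.0}, {"scale": 0.5, "shift": 0.0}]
with tempfile.TemporaryDirectory() as tmp:
    layer_paths = save_layers(toy_layers, tmp)
    print(run_layer_by_layer(layer_paths, [1.0, 2.0]))
```

The trade-off of this pattern is speed: every forward pass has to read each layer from disk again, so lower memory use comes at the cost of slower generation.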

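Main advantages:

Inexpensive hardware: With AirLLM, running a powerful LLM does not require costly dedicated hardware.
Privacy: AirLLM runs the LLM locally, so there is no need to upload your data to the cloud.
Offline work: Once the model weights have been downloaded, AirLLM can run the LLM without an internet connection.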
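
For offline use, one option is the `HF_HUB_OFFLINE` environment variable from the Hugging Face Hub client, which tells it to use only the local cache. The helper below is a minimal sketch; `enable_hf_offline` is a hypothetical name, and whether AirLLM honors this setting in every code path is an assumption that has not been verified here.

```python
import os


def enable_hf_offline(env=os.environ):
    """Ask the Hugging Face Hub client to use only locally cached files."""
    env["HF_HUB_OFFLINE"] = "1"
    return env


# Call this before importing any Hugging Face library, and only after
# the model has already been downloaded once.
enable_hf_offline()
```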

Use cases:

Natural language processing: AirLLM can be used for natural language processing tasks such as text generation, translation, summarization, and question answering.
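Machine learning: AirLLM can be used to run and evaluate large models as part of machine-learning work. It is an inference library, so it is suited to evaluation rather than training.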
Research: AirLLM can be used to explore new applications of LLMs.

2. How to use AirLLM:

To use AirLLM, follow these steps.

1. Install AirLLM.

pip install airllm
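
To confirm the install worked before loading a large model, a quick check like the following can help. It is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("airllm"):
    print("airllm not found; run: pip install airllm")
```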

2. Use AirLLM to feed input to the model and generate output.

from airllm import AutoModel

# Maximum number of input tokens; longer prompts are truncated
MAX_LENGTH = 128

# Load the model by its Hugging Face repo id...
model = AutoModel.from_pretrained("v2ray/Llama-3-70B")

# ...or by the model's local path
#model = AutoModel.from_pretrained("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
    'What is the capital of United States?',
    #'I like',
]

# Tokenize the prompt into PyTorch tensors
input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False)

# Generate up to 20 new tokens; .cuda() requires a CUDA-capable GPU
generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

# Decode the generated token ids back into text
output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

When you run this, a number of model files get downloaded first!!
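
Because a 70B model is a very large download, it may be worth checking free disk space beforehand. The sketch below uses only the standard library; `has_enough_disk` is a hypothetical helper, and the 150 GB threshold is an assumed rough figure (about 140 GB for 16-bit weights plus some headroom), not an official requirement.

```python
import shutil


def has_enough_disk(path, required_gb):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


REQUIRED_GB = 150  # assumed rough figure, not an official requirement
if not has_enough_disk(".", REQUIRED_GB):
    print(f"Less than {REQUIRED_GB} GB free; the download may not fit.")
```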

And the result!?

References

AirLLM blog post: https://huggingface.co/blog/lyogavin/llama3-airllm
AirLLM GitHub repository: https://github.com/lyogavin/Anima/tree/main/air_llm
Meta-Llama-3-8B model page on Hugging Face: https://huggingface.co/meta-llama/Meta-Llama-3-8B

Attribution, NonCommercial, NoDerivatives

일등박사의 연구소