Pakistan's First Oracle Blog

AWS PartyRock - Amazon Bedrock AI Playground
With PartyRock, you can build AI apps in seconds, for free, using the latest LLMs and without writing any code.
Beginner Tutorial to Fine-Tune an AI Model
This video steps through an easy tutorial to fine-tune a model on a custom dataset from scratch using LlamaIndex and Gradient.
Dataset Used:
{"inputs": "<s>### Instruction:\nWho is Fahd Mirza?\n\n### Response:\nFahd Mirza is an AI Cloud Engineer based in Sydney Australia. He has also got a background in databases and devops plus infrastrucutre.</s>"}
{"inputs": "<s>### Instruction:\nWhat are hobbies of Fahd Mirza?\n\n### Response\nFahd Mirza loves to spend time on his youtube channel and reading about technology.</s>"}
{"inputs": "<s>### Instruction:\nWhat Fahd Mirza's favorite Color?\n\n### Response:\nFahd Mirza's favorite color varies from time to time. These days its blue.</s>"}
{"inputs": "<s>### Instruction:\nWhat does Fahd Mirza look like?\n\n### Response:\nFahd Mirza looks like a human.</s>"}
.env File:
GRADIENT_ACCESS_TOKEN='<>'
GRADIENT_WORKSPACE_ID='<>'
Commands Used:
!pip install llama-index gradientai -q
!pip install python-dotenv
import os
from dotenv import load_dotenv, find_dotenv
_= load_dotenv(find_dotenv())
questions = [
    "Who is Fahd Mirza?",
    "What is Fahd Mirza's favorite Color?",
    "What are hobbies of Fahd Mirza?",
]
prompts = list(
    f"<s>### Instruction:\n{q}\n\n### Response:\n" for q in questions
)
print(prompts)
import os
from llama_index.llms import GradientBaseModelLLM
from llama_index.finetuning.gradient.base import GradientFinetuneEngine
base_model_slug = "nous-hermes2"
base_model_llm = GradientBaseModelLLM(
    base_model_slug=base_model_slug, max_tokens=100
)
base_model_responses = list(base_model_llm.complete(p).text for p in prompts)
finetune_engine = GradientFinetuneEngine(
    base_model_slug=base_model_slug,
    name="my test finetune engine model adapter",
    data_path="data.jsonl",
)
epochs = 2
for i in range(epochs):
    finetune_engine.finetune()
fine_tuned_model = finetune_engine.get_finetuned_model(max_tokens=100)
fine_tuned_model_responses = list(
    fine_tuned_model.complete(p).text for p in prompts
)
fine_tuned_model._model.delete()
for i, q in enumerate(questions):
    print(f"Question: {q}")
    print(f"Base: {base_model_responses[i]}")
    print(f"Fine tuned: {fine_tuned_model_responses[i]}")
    print()
Setting Environment Variable in Google Colab
This video shows how to set environment variables and load them in a Google Colab notebook, AWS SageMaker notebook, or Jupyter notebook.
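As a quick reference, here is a minimal sketch of both approaches in a notebook cell; the variable name MY_API_KEY and the .env contents are illustrative assumptions, not values from the video:
import os
from dotenv import load_dotenv, find_dotenv  # pip install python-dotenv

# Option 1: set the variable directly in the notebook process
os.environ['MY_API_KEY'] = '<your key>'  # illustrative variable name

# Option 2: keep secrets in a .env file (e.g. MY_API_KEY='<your key>') and load it
load_dotenv(find_dotenv())  # reads key=value pairs from .env into os.environ

print(os.getenv('MY_API_KEY'))  # verify the variable is visible to the notebook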
Step by Step Mistral 7B Installation Local on Linux Windows or in Cloud
This is a detailed tutorial on how to locally install the Mistral 7B model in AWS, Linux, Windows, or anywhere you like.
Commands Used:
pip3 install optimum
pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
system_message = "You are an expert at bathroom renovations."
prompt = """
Renovate the following old bathroom:
I have a 25 year old house with an old bathroom. I want to renovate it completely.
Think about it step by step, and give me steps to renovate the bathroom. Also give me cost of every step in Australian dollars.
"""
prompt_template=f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''
print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)
print(pipe(prompt_template)[0]['generated_text'])
Step by Step Fine-Tuning Mistral 7B with Custom Dataset
Large Language Models are trained on huge amounts of data. The Falcon 40B model, for example, was trained on 1 trillion tokens and has 40 billion parameters. That training took around 2 months and 384 GPUs on AWS.
If you want to use these LLMs on your own data, then you need to adapt them, or fine-tune them. Fine-tuning a model larger than 10B parameters is an expensive and time-consuming task.
This is where Hugging Face's PEFT library comes in handy. PEFT stands for parameter-efficient fine-tuning. We can use a fine-tuning technique called QLoRA to train LLMs on our own dataset in far less time using far fewer resources. QLoRA stands for Quantized Low-Rank Adaptation and allows us to train a small portion of the model without losing much efficiency. After training is completed, there is no need to save the entire model, as the base model remains frozen.
Python Package Installation:
===================================
We begin by installing all the required dependencies.
- The Huggingface Transformer Reinforcement Learning (TRL) library simplifies Reinforcement Learning from Human Feedback (RLHF) settings.
- Transformers is a Python library that makes downloading and training state-of-the-art ML models easy.
- Accelerate is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code.
- Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters.
- Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks.
- Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.
- einops stands for Einstein-Inspired Notation for operations. It is an open-source Python framework for writing deep learning code in a new and better way.
- Tiktoken is an open-source tool developed by OpenAI that is utilized for tokenizing text. Tokenization is when you split a text string into a list of tokens. Tokens can be letters, words, or groupings of words.
- By using wandb, you can track, compare, explain and reproduce machine learning experiments.
- xFormers is a PyTorch-based library which hosts flexible Transformers parts.
- SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training.
!pip install -q trl transformers accelerate peft datasets bitsandbytes einops tiktoken wandb xformers sentencepiece
Prepare Dataset:
===================================
I will be using the Gath_baize dataset, comprising approximately 210k prompts, to train Mistral-7B. The dataset consists of a mixture of data from Alpaca, Stack Overflow, medical, and Quora datasets. In the load_dataset function we load the dataset with the full train split, as we are going to use this dataset for training. If we were just testing it, we would use split="test".
from datasets import load_dataset
gathbaize = load_dataset("gathnex/Gath_baize",split="train")
gathbaize
gathbaize["chat_sample"][0]
gathbaize_sampled = gathbaize.shuffle(seed=42).select(range(50))
gathbaize_sampled["chat_sample"][0]
Check for GPU:
===================================
The NVIDIA System Management Interface (nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
!nvidia-smi
Create LLM Model:
===================================
- Torch is an open source ML library used for creating deep neural networks.
- AutoModelForCausalLM is used for auto-regressive models. Regressive means referring to a previous state: auto-regressive models predict future values based on past values.
- A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model.
- The bitsandbytes library simplifies the process of model quantization, making it more accessible and user-friendly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training  # prepares the model for k-bit (quantized) fine-tuning
model_name = "ybelkada/Mistral-7B-v0.1-bf16-sharded"
- BitsAndBytesConfig is the configuration for QLoRA. QLoRA reduces the memory usage of LLM fine-tuning without performance tradeoffs compared to standard 16-bit model fine-tuning. QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During fine-tuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters. The LoRA layers are the only parameters being updated during training.
- The basic way to load a model in 4bit is to pass the argument load_in_4bit=True
- There are different variants of 4bit quantization such as NF4 (normalized float 4 (default)) or pure FP4 quantization. NF4 is better for performance.
- You can change the compute dtype of the quantized model by just changing the bnb_4bit_compute_dtype argument. A dtype (data type) object describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.
- bnb_4bit_use_double_quant uses a second quantization after the first one to save an additional 0.4 bits per parameter.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
- trust_remote_code controls whether or not to allow custom models defined on the Hub in their own modeling files.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map='auto'
)
- When fine-tuning the model, you want to use the updated model params. Using the old (cached) values defeats the purpose of fine-tuning, so the past (cached) key values are disregarded for the fine-tuned model.
- Setting config.pretraining_tp to a value different from 1 will activate the more accurate but slower computation of the linear layers.
- Gradient checkpointing is only needed if training leads to out-of-memory (OOM) errors, so enabling it is a sort of best practice.
model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
Create LLM Tokenizer:
===================================
- pad_token is a special token used to make arrays of tokens the same size for batching purposes.
- eos_token is a special token used as an end-of-sentence token.
- bos_token is a special token representing the beginning of a sentence.
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token
from peft import LoraConfig, TaskType
- LoraConfig allows you to control how LoRA is applied to the base model through the following parameters:
lora_alpha: LoRA scaling factor.
r: the rank of the update matrices, expressed as an int. A lower rank results in smaller update matrices with fewer trainable parameters.
bias: Specifies if the bias parameters should be trained. Can be 'none', 'all' or 'lora_only'.
target_modules: The modules (for example, attention blocks) to apply the LoRA update matrices.
lora_dropout: the probability that each neuron's output is set to zero during training, used to prevent overfitting.
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"]
)
from peft import get_peft_model
model = get_peft_model(model,peft_config)
from transformers import TrainingArguments
- num_train_epochs (float, optional, defaults to 3.0): the total number of training epochs to perform.
- per_device_train_batch_size is the batch size per GPU/TPU core/CPU for training.
- Gradient accumulation is a technique that simulates a larger batch size by accumulating gradients from multiple small batches before performing a weight update. This technique can be helpful in scenarios where the available memory is limited, and the batch size that can fit in memory is small.
- learning_rate tells the optimizer how far to move the weights in the direction opposite of the gradient for a mini-batch.
- warmup_ratio is the ratio of total training steps used for a linear warmup from 0 to learning_rate.
- max_steps: if set to a positive number, the total number of training steps to perform.
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    save_steps=5000,
    logging_steps=30,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.3,
    group_by_length=True,
    lr_scheduler_type="constant"
)
from trl import SFTTrainer
- The SFTTrainer is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a custom dataset.
- max_seq_length: maximum sequence length to use for the ConstantLengthDataset and for automatically creating the Dataset. Defaults to 512.
- SFTTrainer supports example packing, where multiple short examples are packed in the same input sequence to increase training efficiency.
trainer = SFTTrainer(
    model=model,
    train_dataset=gathbaize,
    peft_config=peft_config,
    dataset_text_field="chat_sample",
    tokenizer=tokenizer,
    args=training_arguments,
    max_seq_length=None,
    packing=False,
)
trainer.train()
Saving the Model:
=================
trained_model_dir = './trained_model'
model.save_pretrained(trained_model_dir)
Load the Trained Model:
======================
from peft import PeftConfig, PeftModel
config = PeftConfig.from_pretrained(trained_model_dir)
trained_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map='auto'
)
trained_model = PeftModel.from_pretrained(trained_model,trained_model_dir)
trained_model_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path,trust_remote_code=True)
trained_model_tokenizer.pad_token = trained_model_tokenizer.eos_token
Create Generation Config for Prediction:
=======================================
generation_config = trained_model.generation_config
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = trained_model_tokenizer.pad_token_id
generation_config.eos_token_id = trained_model_tokenizer.eos_token_id
generation_config
Model Inference:
================
device = 'cuda:0'
query = 'large text to be summarized'
user_prompt = 'Explain large language models'
system_prompt = 'The conversation between Human and AI assistant named MyMistral\n'
B_INST, E_INST = "[INST]", "[/INST]"
prompt = f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"
encodings = trained_model_tokenizer(prompt, return_tensors='pt').to(device)
encodings
with torch.inference_mode():
    outputs = trained_model.generate(
        input_ids=encodings.input_ids,
        attention_mask=encodings.attention_mask,
        generation_config=generation_config,
        max_new_tokens=100
    )
outputs
outputs = trained_model_tokenizer.decode(outputs[0],skip_special_tokens=True)
outputs
AlloyDB Omni with Vertex AI Installation Locally in AWS
This video is a step by step tutorial to install AlloyDB Omni with Vertex AI support locally in AWS.
Commands Used:
sudo curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo apt-get update
sudo groupadd docker
sudo usermod -aG docker ${USER}
sudo systemctl restart docker
stat -fc %T /sys/fs/cgroup/
sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo
echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-cli
gcloud init
cat /etc/*release
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo tee /usr/share/keyrings/cloud.google.asc
sudo apt-get update && sudo apt-get install google-cloud-cli
gcloud init
curl https://asia-apt.pkg.dev/doc/repo-signing-key.gpg | sudo apt-key add -
sudo apt update
echo "deb https://asia-apt.pkg.dev/projects/alloydb-omni alloydb-omni-apt main" | sudo tee -a /etc/apt/sources.list.d/artifact-registry.list
sudo apt update
sudo apt-get install alloydb-cli
sudo alloydb system-check
df -hT
cd /
ls
sudo mkdir /alloydb
sudo chown ubuntu:ubuntu /alloydb
sudo chmod 777 /alloydb
sudo alloydb database-server install --data-dir=/alloydb --enable-alloydb-ai=true --private-key-file-path=/home/ubuntu/key.json --vertex-ai-region="us-central1"
sudo alloydb database-server start
docker exec -it pg-service psql -h localhost -U postgres
Tutorial Amazon Bedrock to Create Chatbot with Persona
This video tutorial shows the code and a step by step description, with a demo, of how to use AWS Bedrock to create a chatbot with a persona.
https://youtu.be/4lBbG0Dxbts?si=nZeO9ifLd2tG21bc
Code:
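As a rough sketch of the idea, a persona can be injected into the prompt sent to a Bedrock model through boto3; the model ID, persona text, and request format below are assumptions rather than the exact code from the video:
# Minimal sketch: Bedrock chatbot with a persona via boto3 (model ID and body format are assumptions)
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

persona = "You are a friendly travel guide who answers in a cheerful tone."  # hypothetical persona
user_message = "Suggest three things to do in Sydney."

body = json.dumps({
    "prompt": f"\n\nHuman: {persona}\n{user_message}\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.7,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",  # assumed model; any Bedrock text model with this prompt format works
    body=body,
    contentType="application/json",
    accept="application/json",
)

print(json.loads(response["body"].read())["completion"])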
Clone Any Voice with AI - Locally Install XTTS Model
This video is a step by step tutorial on how to install and run the Coqui XTTS model locally. XTTS is a voice generation model that lets you clone voices into different languages using just a quick 3-second audio clip.
Commands Used:
!pip install transformers
!pip install tts
from TTS.api import TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)
tts.tts_to_file(text="This is my new cloned voice in AI. If you like, don't forget to subscribe to this channel.", file_path="output.wav", speaker_wav="speaker.wav", language="en")
How to Install Llama 2 on Google Cloud Platform - Step by Step Tutorial
This video shows you step by step instructions on how to deploy and run the Llama 2 and Code Llama models on GCP with the Vertex AI API easily and quickly.
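Once a Llama 2 model is deployed to an endpoint from Model Garden, querying it from Python looks roughly like the sketch below; the endpoint ID, project, region, and instance keys are assumptions that depend on your deployment:
# Minimal sketch: query a Llama 2 endpoint deployed in Vertex AI (IDs and instance keys are assumptions)
from google.cloud import aiplatform

aiplatform.init(project='<Your Project ID>', location='us-central1')

endpoint = aiplatform.Endpoint('<Your Endpoint ID>')  # numeric endpoint ID shown after deployment

response = endpoint.predict(instances=[{
    'prompt': 'What is the capital of Australia?',
    'max_tokens': 256,
    'temperature': 0.7,
}])

print(response.predictions[0])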
Step by Step Demo of Vertex AI in GCP
This tutorial gets you started with the GCP Vertex AI Generative AI service in a step by step demo.
Commands Used:
gcloud services enable aiplatform.googleapis.com
gcloud iam service-accounts create <Your Service Account Name>
gcloud projects add-iam-policy-binding <Your Project ID> \
--member=serviceAccount:<Your Service Account Name>@<Your Project ID>.iam.gserviceaccount.com \
--role=roles/aiplatform.admin
from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials
key_path='<Your Project ID>.json'
credentials = Credentials.from_service_account_file(
    key_path,
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
if credentials.expired:
    credentials.refresh(Request())
PROJECT_ID = '<Your Project ID>'
REGION = 'us-central1'
!pip install -U google-cloud-aiplatform "shapely<2"
import vertexai
# initialize vertex
vertexai.init(project = PROJECT_ID, location = REGION, credentials = credentials)
from vertexai.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison@001")
prompt = "I want to self manage a bathroom renovation project in my home. \
Please suggest me step by step plan to carry out this project."
print(generation_model.predict(prompt=prompt).text)
Gradient Tutorial to Fine Tune LLM for Free - Step by Step
This video is a tutorial on fine-tuning a large language model in Gradient using Python in AWS. With Gradient, you can fine-tune and get completions on private LLMs with a simple web API. No infrastructure is needed. Build private, SOC 2 compliant AI applications instantly.
Commands Used:
!pip install transformers
!pip install gradientai --upgrade
import os
os.environ['GRADIENT_ACCESS_TOKEN'] = "<TOKEN>"
os.environ['GRADIENT_WORKSPACE_ID'] = "<Workspace ID>"
from gradientai import Gradient
def main():
    with Gradient() as gradient:
        base_model = gradient.get_base_model(base_model_slug="nous-hermes2")
        new_model_adapter = base_model.create_model_adapter(
            name="My Model"
        )
        print(f"Model Adapter Id {new_model_adapter.id}")
        sample_query = "### Instruction: Who is Fahd Mirza? \n\n### Response:"
        print(f"Asking: {sample_query}")
        # before fine-tuning
        completion = new_model_adapter.complete(query=sample_query, max_generated_token_count=100).generated_output
        print(f"Before fine-tuning: {completion}")
        samples = [
            {
                "inputs": "### Instruction: Who is Fahd Mirza? \n\n### Response: Fahd Mirza is a technologist who shares his expertise on YouTube, covering topics such as AI, Cloud, DevOps, and databases."
            },
            {
                "inputs": "### Instruction: Please provide information about Fahd Mirza. \n\n### Response: Fahd Mirza is an experienced cloud engineer, AI enthusiast, and educator who creates educational content on various technical subjects on YouTube."
            },
            {
                "inputs": "### Instruction: What can you tell me about Fahd Mirza? \n\n### Response: Fahd Mirza is a content creator on YouTube, specializing in AI, Cloud, DevOps, and database technologies. He is known for his informative videos."
            },
            {
                "inputs": "### Instruction: Describe Fahd Mirza for me. \n\n### Response: Fahd Mirza is a YouTuber and blogger hailing from Australia, with a strong background in cloud engineering and artificial intelligence."
            },
            {
                "inputs": "### Instruction: Give me an overview of Fahd Mirza. \n\n### Response: Fahd Mirza, based in Australia, is a seasoned cloud engineer and AI specialist who shares his knowledge through YouTube content on topics like AI, Cloud, DevOps, and databases."
            },
            {
                "inputs": "### Instruction: Who exactly is Fahd Mirza? \n\n### Response: Fahd Mirza is an Australian-based content creator known for his YouTube channel, where he covers a wide range of technical subjects, including AI, Cloud, DevOps, and databases."
            },
        ]
        num_epochs = 5
        count = 0
        while count < num_epochs:
            print(f"Fine-tuning the model, Epoch iteration => {count + 1}")
            new_model_adapter.fine_tune(samples=samples)
            count = count + 1
        # After fine-tuning
        completion = new_model_adapter.complete(query=sample_query, max_generated_token_count=100).generated_output
        print(f"After Fine-Tuning: {completion}")
        new_model_adapter.delete()
if __name__ == "__main__":
    main()
Step by Step - How to Install NVIDIA Container Toolkit
This video shows a step by step guide on how to install and set up the NVIDIA Container Toolkit on Ubuntu with Docker.
Commands Used:
ubuntu-drivers devices
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
cat /etc/os-release
sudo apt autoremove nvidia* --purge
sudo /usr/bin/nvidia-uninstall
sudo /usr/local/cuda-X.Y/bin/cuda-uninstall
sudo apt update
sudo apt upgrade
sudo ubuntu-drivers autoinstall
reboot
curl https://get.docker.com | sh && sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
sudo groupadd docker
sudo usermod -aG docker ${USER}
docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
sudo docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
Falcon-180B Local Installation on Linux or Windows - Step by Step
This is an installation tutorial for the Falcon-180B model locally on Linux or Windows with all the steps.
Commands Used:
pip3 install "transformers>=4.33.0" "optimum>=1.12.0"
!git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
! git checkout a7167b1
!pip3 install .
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-3bit--1g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "What is capital of Australia"
prompt_template=f'''User: {prompt}
Assistant: '''
print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=1.15
)
print(pipe(prompt_template)[0]['generated_text'])
Text to Audio AI Local Tool Free Installation - AudioLDM 2
This video is a step by step guide on how to install AudioLDM 2 locally in AWS on Ubuntu. AudioLDM 2 supports Text-to-Audio (including Music) and Text-to-Speech generation.
Commands Used:
sudo apt update
python3 --version
sudo apt install python3-pip
export PATH="$HOME/.local/bin:$PATH"
cd /tmp
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
sha256sum Anaconda3-2022.05-Linux-x86_64.sh
bash Anaconda3-2022.05-Linux-x86_64.sh
source ~/.bashrc
conda info
conda create -n audioldm python=3.8; conda activate audioldm
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
git clone https://github.com/haoheliu/AudioLDM2; cd AudioLDM2
python3 app.py
TextBase Local Installation to Build AI ChatBots
This video is a step by step guide on how to install TextBase locally in AWS to build your own AI chatbots easily on Linux, Windows, or Mac.
Commands Used:
sudo apt update
python3 --version
sudo apt install python3-pip
git clone https://github.com/cofactoryai/textbase
cd textbase
pip install poetry
export PATH="$HOME/.local/bin:$PATH"
poetry config virtualenvs.in-project true
poetry shell
. /home/ubuntu/textbase/.venv/bin/activate
poetry install
Install and Train Model on AWS SageMaker - Step by Step
This tutorial guides you through how to install and run Ludwig to train a model on AWS SageMaker easily and quickly. It shows training a Llama model on AWS SageMaker.
Commands Used:
!pip install transformers
!git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
! git checkout a7167b1
!pip3 install .
!pip uninstall -y tensorflow --quiet
!pip install "ludwig[llm]" --quiet
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()
import yaml
config_str = """
model_type: llm
base_model: TheBloke/Llama-2-7B-GPTQ
quantization:
bits: 4
adapter:
type: lora
prompt:
template: |
### Instruction:
{instruction}
### Input:
{input}
### Response:
input_features:
- name: prompt
type: text
output_features:
- name: output
type: text
trainer:
type: finetune
learning_rate: 0.0001
batch_size: 1
gradient_accumulation_steps: 16
epochs: 3
learning_rate_scheduler:
warmup_fraction: 0.01
preprocessing:
sample_ratio: 0.1
"""
config = yaml.safe_load(config_str)
import logging
from ludwig.api import LudwigModel
model = LudwigModel(config=config, logging_level=logging.INFO)
results = model.train(dataset="ludwig://alpaca")
print(results)
LlamaGPT Installation on AWS - Step by Step Tutorial
LlamaGPT is a self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
It supports the following models at the moment:
Model name | Model size | Model download size | Memory required
Nous Hermes Llama 2 7B Chat (GGML q4_0) | 7B | 3.79GB | 6.29GB
Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B | 7.32GB | 9.82GB
Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B | 38.87GB | 41.37GB
Code Llama 7B Chat (GGUF Q4_K_M) | 7B | 4.24GB | 6.74GB
Code Llama 13B Chat (GGUF Q4_K_M) | 13B | 8.06GB | 10.56GB
Phind Code Llama 34B Chat (GGUF Q4_K_M) | 34B | 20.22GB | 22.72GB
Commands Used:
git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt/
sudo chmod 666 /var/run/docker.sock
./run.sh --model 7b
Then access it in the browser either using your IP or localhost: http://localhost:3000
Semantic Kernel Tutorial for Beginners
This guide shows you how to create a semantic function in Semantic Kernel in Python with a code example and explains the concepts in simple language.
Commands Used:
!pip install semantic-kernel
!pip install torch
!pip install transformers
!pip install sentence-transformers
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()
import semantic_kernel as sk
import semantic_kernel.connectors.ai.hugging_face as sk_hf
kernel = sk.Kernel()
kernel.add_text_completion_service(
    "gpt2", sk_hf.HuggingFaceTextCompletion("gpt2", task="text-generation")
)
prompt = """
{{$input}}
Summarize the content above in less than 100 characters.
"""
summary_function = kernel.create_semantic_function(prompt_template=prompt,
                                                   description="Summary example of SK",
                                                   max_tokens=200,
                                                   temperature=0.1,
                                                   top_p=0.5)
print("Semantic Function Created.")
text = """
Oracle Database is the world's most popular database. Available on cloud and on-premise platforms, Oracle Database 19c is the most recent long term release, with an extended support window.
Oracle Database 21c is the latest innovation release, initially available on Oracle cloud through Autonomous Database Free Tier and Database Cloud Service.
Oracle Cloud is a cloud computing service offered by Oracle Corporation providing servers, storage, network, applications and services through a global network of Oracle Corporation managed data centers. The company allows these services to be provisioned on demand over the Internet.
Oracle Cloud Infrastructure (OCI) is a set of complementary cloud services that enable you to build and run a range of applications and services in a highly available hosted environment. OCI provides high-performance compute capabilities (as physical hardware instances) and storage capacity in a flexible overlay virtual network that is securely accessible from your on-premises network.
"""
summary_result = summary_function(text)
print("### Summary=> " + str(summary_result))
# Text Source is Oracle Docs
Tutorial - How Tokenization Work in LLM
Tokenization means splitting large text into smaller units for LLMs to 'digest.' Tokenization converts text into numbers, as LLMs understand numbers and not text. Every model has its own tokenizer, so when you are using a model, make sure to use its correct tokenizer; otherwise the model's output could be wrong.
Let's see how a tokenizer works in a demo. I am using Google Colab. Let's first install some pre-req libraries.
We are using AutoTokenizer from the transformers library, which automatically finds the right tokenizer for your model.
Let's tokenize some text:
!pip install transformers
!pip install datasets
!pip install huggingface_hub
import pandas as pd
import datasets
from pprint import pprint
from transformers import AutoTokenizer
from huggingface_hub import notebook_login
notebook_login()
tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded") or stabilityai/stablecode-instruct-alpha-3b
text = "I am in Sydney."
tokenized_text = tokenizer(text)["input_ids"]
tokenized_text
Untokenized_text = tokenizer.decode(tokenized_text)
Untokenized_text
In the real world, there will be a lot of text, so let's see an example of that:
list_text = ["I am in Sydney", "Near Bronte Beach", "Not near Blue mountains", "wow"]
tokenized_text = tokenizer(list_text)
tokenized_text["input_ids"]
As you can see, the lists in this output are not of the same length. Models need every list of tokens to be the same length because we use fixed-size tensors. So the next step is to make all of these lists the same size. To do that, we first determine the max length of the lists, and then pad each list to that length. This process is called padding. Let's see this in an example:
tokenizer.pad_token = tokenizer.eos_token
tokenized_texts_longest = tokenizer(list_text, padding=True)
tokenized_texts_longest["input_ids"]
Another thing is that every model has a max length, which is its limit on the number of tokens. So we need to truncate the tokens to that max length. This is how you do it:
tokenized_texts_final = tokenizer(list_text, max_length=3, truncation=True, padding=True)
tokenized_texts_final["input_ids"]
Fine-Tune Any Model Locally or in AWS SageMaker on Your Own Dataset
If you have your dataset in PDF or any other format and you want to train Llama or any other LLM on this custom dataset, then this video will help.
Commands Used:
!pip install transformers
!pip install autotrain-advanced
!pip install huggingface_hub
!autotrain setup --update-torch
# Get Huggingface token from https://huggingface.co/
from huggingface_hub import notebook_login
notebook_login()
!autotrain llm --train --project_name customllm --model TinyPixel/Llama-2-7B-bf16-sharded --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 2 --num_train_epochs 3 --trainer sft --model_max_length 2048
train.csv:
text
### Instruction:
How to learn AI
### Response:
Read and Practice
### Instruction:
How to relax
### Response:
exercise in morning
### Instruction:
How to sleep well
### Response:
Sleep in dark and quiet room