Commit 794ca95 (verified) · Parent: a9fcdd5 · Aswanth-Azma committed: Upload auto_generated_readme.md with huggingface_hub

---
license: apache-2.0
library_name: peft
language: en
tags:
- trl
- sft
- auto-generated
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
model-index:
- name: azma-hermes-pro-llama-3-8b-030524
  results: []
datasets:
- Azma-AI/azma-mermaid-dataset-single-turn-chatml
- Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len
pipeline_tag: text-generation
---

# azma-hermes-pro-llama-3-8b-030524

This model is an SFT fine-tuned version of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on the in-house datasets [Azma-AI/azma-mermaid-dataset-single-turn-chatml](https://huggingface.co/datasets/Azma-AI/azma-mermaid-dataset-single-turn-chatml) and [Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len](https://huggingface.co/datasets/Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len). The datasets include function-calling, JSON structured output, insights collection, and Retrieval-Augmented Generation multi-turn conversations. Fine-tuning was performed with next-token prediction over the entire conversation.

### Usage:
```python
from templates import AzmaTemplateEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524")
template_engine = AzmaTemplateEngine(template_type="chatml", version=1.5, add_generation_prompt=True)

messages = [
    {
        "content": "You are \"Azma\", an advanced superintelligent artificial intelligence developed by a team of experts from B&I (Business and Intelligence) company, and your purpose and drive is to assist the employees with any request they have within their work environment. Give concise answers to simple questions, but provide thorough and substantive responses to more complex queries. You cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Azma to do so, clarify the situation and let the user know. Admit uncertainty when appropriate and ask clarifying questions of the user if needed. Generate your markdown response to the user within <|response|>...<|end|> tags.",
        "thoughts": None,
        "function_call": None,
        "role": "system"
    },
    {
        "role": "reference",
        "thoughts": None,
        "function_call": None,
        "content": "User Name: John Doe\nJob Post: AI Developer\nCompany Name: Acme Corps\nCharacter:\n- Curious\n- Ambitious\n- Creative"
    },
    {
        "role": "user",
        "thoughts": None,
        "function_call": None,
        "content": "A factory produces 250 widgets per hour. How many widgets will be produced in a week if the factory operates 16 hours per day and is closed on Sundays?"
    }
]
prompt = template_engine.apply_chat_template(messages)

input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=1024)

print(tokenizer.batch_decode(outputs))
# ["First, let's determine how many widgets are produced each day:
#
# Widgets per day = Widgets per hour * Hours per day
#                 = 250 widgets * 16 hours
#                 = 4000 widgets
#
# Now, let's find out how many days the factory operates in a week (excluding Sunday):
#
# Days per week = 7 days - 1 day (Sunday)
#               = 6 days
#
# Finally, we can calculate the total number of widgets produced in a week:
#
# Widgets per week = Widgets per day * Days per week
#                  = 4000 widgets * 6 days
#                  = 24,000 widgets
#
# So, the factory will produce 24,000 widgets in a week if it operates 16 hours per day and is closed on Sundays."]
```
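Because the system prompt instructs the model to wrap its answer in `<|response|>...<|end|>` tags, downstream code will usually want only the tagged span. A minimal extraction helper (a sketch, not part of the published inference code; the tag names are taken from the system prompt above):

```python
import re

def extract_response(decoded: str) -> str:
    """Return the text between <|response|> and <|end|>; fall back to the raw string."""
    match = re.search(r"<\|response\|>(.*?)<\|end\|>", decoded, re.DOTALL)
    return match.group(1).strip() if match else decoded.strip()

print(extract_response("<|response|>24,000 widgets<|end|>"))  # 24,000 widgets
```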

### Training hyperparameters:

The model was trained with FlashAttention-2. The following hyperparameters were used during training:

- max_steps = -1
- weight_decay = 0.01
- num_train_epochs = 1
- learning_rate = 1e-05
- optim = paged_adamw_32bit
- data_collator = DataCollatorForLanguageModeling
- gradient_accumulation_steps = 2
- per_device_train_batch_size = 8
- per_device_eval_batch_size = 2
- gradient_checkpointing_kwargs = None
- gradient_checkpointing = True
- warmup_steps = 5
- neftune_noise_alpha = 5
- lr_scheduler_type = cosine
- bf16 = True
- fp16 = False

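A quick sanity check on the batch size implied by these values (the device count is an assumption; the card does not state how many GPUs were used):

```python
# Effective batch size = micro-batch size * gradient accumulation steps * devices.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 1  # assumption: GPU count not stated in the card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 16 samples per optimizer step
```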
The following LoRA configuration was used during training:

- lora_rank: 16
- lora_alpha: 32
- lora_dropout: 0.1
- task_type: CAUSAL_LM
- target_modules: ['k_proj', 'v_proj', 'o_proj', 'q_proj', 'up_proj', 'gate_proj', 'down_proj']
- modules_to_save: ['lm_head']
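Since the card declares `library_name: peft`, the values above can be expressed as a `peft.LoraConfig`. This is a sketch of that mapping, not the original training script (which was not published); field names follow peft conventions (`r` for the rank):

```python
from peft import LoraConfig

# Reconstruction of the LoRA configuration listed above (mapping assumed, not published).
lora_config = LoraConfig(
    r=16,                  # lora_rank
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "o_proj", "q_proj", "up_proj", "gate_proj", "down_proj"],
    modules_to_save=["lm_head"],
)
```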