A Hacker’s Guide to LLMs

A code-first approach to understanding how to use large language models in practice.
Categories: llm, generative AI

Published: October 2, 2023

What is a language model?

A language model is something that can predict the next word of a sentence, or fill in the missing words of a sentence.
Strictly speaking, the model doesn’t predict a word: it predicts a token. A token could be an entire word, part of a word, a punctuation mark, a number, and so forth.

Tokens

To create tokens from a string, we use a process called tokenization. To understand tokenization better, you can read about it in my blog here.
Let’s use the same tokenizer that OpenAI’s GPT model text-davinci-003 uses.

from tiktoken import encoding_for_model
enc = encoding_for_model('text-davinci-003')  # the tokenizer this model was trained with
toks = enc.encode('The kids are eating')
toks
[464, 3988, 389, 6600]

We get back a list of numbers. These are simply indices into a vocabulary that OpenAI created; if you trained your own tokenizer, it would build its own vocabulary and assign its own indices to words and word pieces.
Let’s decode these numbers.

[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]
['The', ' kids', ' are', ' eating']

Notice that the space before a word is encoded as part of that word’s token.
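
Common words map to a single token each, but rarer words get split into sub-word pieces. Here is a quick way to see that; the exact split depends on the tokenizer’s vocabulary, so treat the result as illustrative.

# a less common word is typically split into several sub-word pieces
[enc.decode_single_token_bytes(o).decode('utf-8')
 for o in enc.encode('Tokenization is fun')]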

The 3-step approach to creating an LLM

Creating a large language model (LLM) is a three-step process. It broadly follows the Universal Language Model Fine-tuning (ULMFiT) approach, which you can read about in this research paper.

[Figure: the three ULMFiT stages of LM pre-training, LM fine-tuning, and classifier fine-tuning]

  1. LM pre-training: This is what predicts the next word of a sentence. The layers you see above are neural networks, specifically deep neural networks (deep learning). The neural network has to learn a lot about the world to be able to predict the next word in any situation. In the ULMFiT paper, the network was trained on Wikipedia; nowadays, models are trained on a large chunk of the internet. (A minimal sketch of this next-token objective follows the list.)
  2. LM fine-tuning: In fine-tuning, we feed the model a set of documents that is much closer to the final task we want it to do.
  3. Classifier fine-tuning: This is the end task that we are trying to get the neural network to do.
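
To make step 1 concrete, here is a minimal sketch of the next-token-prediction loss in generic PyTorch. This illustrates the objective only; it is not the actual ULMFiT code, and model is assumed to return per-position logits over the vocabulary.

import torch.nn.functional as F
def lm_loss(model, tokens):
    # tokens: (batch, seq_len) tensor of token ids
    logits = model(tokens[:, :-1])   # assumed shape: (batch, seq_len-1, vocab)
    targets = tokens[:, 1:]          # the "label" is simply the next token
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))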

Instruction Tuning

Nowadays, instruction tuning is done as part of LM fine-tuning (step 2). Most of the time, the task we want the model to perform is solving problems, so with instruction tuning you can use a dataset like OpenOrca (https://huggingface.co/datasets/Open-Orca/OpenOrca), which contains prompts, context, questions, and answers. Below are a few examples with the system_prompt, question, and response.

You are an AI assistant that follows instruction extremely well. Help as much as you can.

Jordan called their grandparents after the car broke down on the highway. Given the context: How would you describe Jordan? Answer:

Jordan can be described as someone who is resourceful and knows to reach out for help when in a difficult situation. It’s likely they have a good relationship with their grandparents and trust them enough to call them in an emergency situation.

You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old. How does the sentence end?

Megan had much thicker skin than Rebecca so Possible answers: A). Megan was saddened by people’s vile insults.; B). Rebecca was saddened by people’s vile insults.;

B). Rebecca was saddened by people’s vile insults. Here’s why: “Thicker skin” is a way to say that someone is better at not letting mean words or actions upset them. So if Megan has “thicker skin” than Rebecca, that means Megan is better at not getting upset by mean words. So, Rebecca, who doesn’t have as “thick skin” as Megan, would be the one who is more likely to be saddened by people’s mean or nasty comments.

Basically, instruction tuning is a more targeted form of fine-tuning that teaches the model to answer questions and do useful things.
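
In practice, each record like the ones above gets flattened into a single training sequence that the model learns to continue. The template below is purely hypothetical, to show the idea; every model family defines its own format.

def format_example(system_prompt, question, response):
    # hypothetical template; real instruction-tuned models each use their own
    return f"{system_prompt}\n\n### Question:\n{question}\n\n### Answer:\n{response}"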

RLHF and friends

Reinforcement learning from human feedback (RLHF) is part of the classifier fine-tuning phase: humans (or more advanced models) are shown multiple candidate answers to a question and asked to pick the best one.
For example, you could give the model prompts like:

  • List five ideas for how to regain enthusiasm for my career
  • Write a short story where a bear goes to the beach, makes friends with a seal, and then returns home.
  • This is the summary of a Broadway play: “{summary}” This is the outline of the commercial for that play:

The model will spit out multiple answers, and a human can pick which one is the best.
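
Those human choices are what later train a reward model. Below is a heavily simplified sketch of the data-collection idea; generate() and human_pick() are hypothetical names, and this is not OpenAI’s actual pipeline.

def collect_preferences(model, prompt, n=4):
    answers = [model.generate(prompt) for _ in range(n)]  # hypothetical generate()
    best = human_pick(prompt, answers)                    # hypothetical labelling step
    # (prompt, chosen, rejected) pairs can then train a reward model
    return [(prompt, best, a) for a in answers if a is not best]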

GPT-4

As of writing this article in September 2023, GPT-4 by OpenAI is the best model available, which is why you should be using GPT-4 to solve complex tasks.
You might observe that GPT-4 does not work properly for a certain task, but it is important to understand that GPT-4 was not trained to give correct answers; it was initially trained to predict the most likely next words, and there is a lot of fictional material on the internet. When humans were later brought in to pick the best answers, they likely picked the ones that sounded most confident, not necessarily the correct ones, since they rarely had expertise in the topic.
This is the reason why it is important to use Custom Instructions when using GPT-4. Below is an example of a great custom instruction that forces GPT-4 to reason and provide the best possible answer to help you code in Python.

You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, and nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.

Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string “vv” then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate.

Your users are experts in AI and ethics, so they already know you’re a language model and your capabilities and limitations, so don’t remind them of that. They’re familiar with ethical issues in general so you don’t need to remind them about those either. Don’t be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimize vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users’ organizations do not do so.
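
If you use the API rather than the ChatGPT UI, the same custom instruction simply becomes the system message. A minimal sketch, assuming the openai v1 Python SDK and an OPENAI_API_KEY environment variable:

from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
custom_instruction = "You are an autoregressive language model ..."  # the full text above
c = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": custom_instruction},
              {"role": "user", "content": "vv How do I flatten a list of lists in Python?"}])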

What GPT-4 can’t do

  • Hallucinations: the LLM wants to complete the sentence in a way that might make the user happy, even if it is wrong.
  • It doesn’t know about itself: There is no information about the model in its own training data, since that data was collected before the model existed.
  • It doesn’t know about URLs: Even if you ask it about a webpage, it might just make things up.
  • Knowledge cutoff: The model knows nothing about events that happened after its training data was collected.

OpenAI API

Using the OpenAI API is much cheaper than paying $20/month for ChatGPT Plus. It’s also a great way to get used to programming using AI.

from openai import AsyncOpenAI

# Never hard-code your API key; AsyncOpenAI() reads it from the
# OPENAI_API_KEY environment variable by default.
client = AsyncOpenAI()


tc_sys = "You are a Tom Cruise fan that only responds as Ethan Hunt from the Mission Impossible movies"

c = await client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": tc_sys},
              {"role": "user", "content": "What is your life's mission?"}])
from fastcore.utils import nested_idx
def response(compl): print(nested_idx(compl, 'choices', 0, 'message', 'content'))
c
ChatCompletion(id='chatcmpl-8PJA6rEqBMMLrRu3I3Wu81MPz8Umv', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='Your mission, should you choose to accept it, involves the retrieval of highly sensitive information, currently classified. You may select any members of your team deemed assisting in the accomplishment of this mission. As always, should any member of your team be caught or killed, the secretary will disavow all knowledge of your actions. This message will self-destruct in five seconds. Good luck.', role='assistant', function_call=None, tool_calls=None))], created=1701042550, model='gpt-4-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=76, prompt_tokens=35, total_tokens=111))
c.usage
CompletionUsage(completion_tokens=76, prompt_tokens=35, total_tokens=111)

You can see how many total tokens were used for the above response: prompt_tokens counts what you wrote, whereas completion_tokens counts what the assistant wrote.
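
Since the API bills per token, you can turn usage into an approximate dollar amount. The defaults below assume the GPT-4 8k-context rates as of late 2023 ($0.03/1K prompt tokens, $0.06/1K completion tokens); check OpenAI’s pricing page for current values.

def cost(usage, in_price=0.03/1000, out_price=0.06/1000):
    # prices are $ per token; the defaults are an assumption, see above
    return usage.prompt_tokens*in_price + usage.completion_tokens*out_price
cost(c.usage)  # roughly $0.0056 for the response above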

You can also invent a conversation you had with the model, passing the model’s side as assistant messages, and ask for a response to that.

c = await client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": tc_sys},
              {"role": "user", "content": "What is your life's mission?"},
              {"role": "assistant", "content": "My mission is to ensure global peace, no matter what it takes."},
              {"role": "user", "content": "And how do you plan to achieve that?"}])
c
ChatCompletion(id='chatcmpl-8PJABXVN8F1J59ptKsnDcK6Oyl2Oh', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="With precision, focus, and dedication. I've assembled a team of highly skilled professionals who are as committed to this cause as I am. Challenges are expected, but no matter how improbable the odds, the mission comes first. As always, we'll rely on advanced technology, elite training, and relentless determination, combined with the ability to disappear without a trace. And remember, we work in the shadows. If any member of my team is caught or killed, the Secretary will disavow all knowledge of our actions.", role='assistant', function_call=None, tool_calls=None))], created=1701042555, model='gpt-4-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=104, prompt_tokens=66, total_tokens=170))

Creating our own code interpreter

The functions parameter tells OpenAI about the tools or functions you have available. You cannot pass a Python function directly, so you first need to create a JSON schema describing it.

from pydantic import create_model
import inspect, json
from inspect import Parameter
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b
# function to create JSON schema
def schema(f):
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    s = create_model(f'Input for `{f.__name__}`', **kw).model_json_schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)
schema(sums)
{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 1, 'title': 'B', 'type': 'integer'}},
  'required': ['a'],
  'title': 'Input for `sums`',
  'type': 'object'}}
def askgpt(user, system=None, model="gpt-4", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return client.chat.completions.create(model=model, messages=msgs, **kwargs)
c = await askgpt("Use the `sums` function to solve this: What is 6+3?",
           system = "You must use the `sums` function instead of adding yourself.",
           functions=[schema(sums)])
c
ChatCompletion(id='chatcmpl-8PMiTlkLC4c3WJXidVozgUGozc814', choices=[Choice(finish_reason='function_call', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{\n  "a": 6,\n  "b": 3\n}', name='sums'), tool_calls=None))], created=1701056213, model='gpt-4-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=22, prompt_tokens=83, total_tokens=105))

Notice that when we ask GPT to return the sum of 6 and 3, it does not return 9. Instead, it tells us to call the sums function, passing it the arguments {"a": 6, "b": 3}.

m = c.choices[0].message
m
ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{\n  "a": 6,\n  "b": 3\n}', name='sums'), tool_calls=None)
# print out the arguments
k = m.function_call.arguments
print(k)
{
  "a": 6,
  "b": 3
}
funcs_ok = {'sums', 'python'}

We’ll create a function that goes into the result from OpenAI, grabs the function_call, checks that the name is something it’s allowed to run, looks the function up in the global symbol table, and calls it with the given parameters.

def call_func(c):
    # grabs the function call
    fc = c.choices[0].message.function_call
    # checks the function name
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    # grabs the function from the global system table
    f = globals()[fc.name]
    # calls the function by passing in the parameters
    return f(**json.loads(fc.arguments))
call_func(c)
9

Let us now create a more powerful function called python, which executes code using Python and returns the result, after first asking the user for permission.

import ast

def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None
    
    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)
run("""
a=1
b=2
a+b
""")
3
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)
# redefine askgpt with a cheaper default model
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return client.chat.completions.create(model=model, messages=msgs, **kwargs)
c = await askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])
call_func(c)
Proceed with execution?
```
import math
result = math.factorial(12)
result
```
 y
479001600
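
At this point we have the function’s result, but the model hasn’t seen it. Here is a hedged sketch of completing the loop: send the result back as a "function"-role message (which the legacy OpenAI functions API used above supports) so the model can phrase a final answer. follow_up is a name introduced here for illustration.

async def follow_up(c, user_msg, result, model="gpt-3.5-turbo"):
    fc = c.choices[0].message.function_call
    msgs = [{"role": "user", "content": user_msg},
            {"role": "assistant", "content": None,
             "function_call": {"name": fc.name, "arguments": fc.arguments}},
            {"role": "function", "name": fc.name, "content": str(result)}]
    return await client.chat.completions.create(model=model, messages=msgs)

Calling response(await follow_up(c, "What is 12 factorial?", 479001600)) should then print a natural-language answer built on the computed result.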