• scruiser@awful.systems · edited · 6 hours ago

    GPT-1 is 117 million parameters, GPT-2 is 1.5 billion, GPT-3 is 175 billion, and GPT-4 is undisclosed but estimated at around 1.7 trillion. The tokens needed for training scale linearly with model size (edit: I originally said training compute does too, but looking at the Wikipedia page I was wrong in a way that makes your case even worse: training compute scales quadratically with model size, going up 2 OOM for every 10x in parameters). They are improving… but only getting a linear improvement in training loss for a geometric increase in model size and training time.

    A hypothetical GPT-5 would have 10 trillion parameters and would genuinely need to be AGI to have the remotest hope of paying off its training. And it would need more quality tokens than they have left; they’ve already scraped the internet (including many copyrighted sources and sources that asked not to be scraped).

    So that’s exactly why OpenAI has been screwing around with fine-tuning setups under illegible naming schemes instead of just releasing a GPT-5. But fine-tuning can only shift what you’re getting within distribution, so it trades off into more hallucinations, or overly obsequious output, or whatever the latest problem they’re having.
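
    To make the quadratic point concrete, here’s a back-of-the-envelope sketch in Python. It assumes the common approximations (training compute ≈ 6·N·D FLOPs and a Chinchilla-style token budget D ≈ 20·N) — my illustrative assumptions, not official figures for any of these models:

    ```python
    # Rough scaling sketch (back-of-the-envelope, not official numbers).
    # Assumes training FLOPs ~ 6 * N * D and a compute-optimal token budget D ~ 20 * N.
    # Under those assumptions compute grows quadratically with parameter count N.

    models = {
        "GPT-1": 117e6,          # parameters
        "GPT-2": 1.5e9,
        "GPT-3": 175e9,
        "GPT-4 (est.)": 1.7e12,  # undisclosed; widely cited estimate
    }

    for name, n_params in models.items():
        tokens = 20 * n_params          # assumed compute-optimal token budget
        flops = 6 * n_params * tokens   # ~6 FLOPs per parameter per token
        print(f"{name:>14}: N={n_params:.2e}  D~{tokens:.2e} tokens  compute~{flops:.2e} FLOPs")

    # Each ~10x in parameters implies ~10x more tokens and ~100x (2 OOM) more compute.
    ```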

    Lowering the model temperature makes it pick its best guess for the next token instead of randomizing among probable guesses; it doesn’t improve what the best guess is, and you can still get hallucinations even when picking the “best” next token.
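
    For anyone unclear on what temperature actually does, here’s a minimal toy sketch (my own illustration, not any particular model’s sampler): the logits are just divided by T before the softmax, so low T concentrates probability on the token the model already ranks highest without ever changing which token that is.

    ```python
    # Minimal sketch of temperature scaling at sampling time (illustrative only).
    # Lowering T sharpens the softmax toward the model's top-ranked token; it never
    # changes which token is ranked highest, so a confidently wrong "best" token
    # is still wrong at T -> 0.
    import math
    import random

    def sample_next_token(logits, temperature=1.0):
        if temperature <= 0:  # treat T=0 as pure argmax ("greedy" decoding)
            return max(range(len(logits)), key=lambda i: logits[i])
        scaled = [l / temperature for l in logits]
        m = max(scaled)                              # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(range(len(logits)), weights=probs, k=1)[0]

    # Hypothetical logits over a tiny vocabulary: index 0 is the "best guess"
    # at every temperature; low T just makes it win almost every draw.
    logits = [2.0, 1.5, 0.3, -1.0]
    print(sample_next_token(logits, temperature=0.2))   # almost always index 0
    print(sample_next_token(logits, temperature=1.5))   # more spread across tokens
    ```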

    And lol at you trying to reverse the accusation against LLMs by accusing me of regurgitating/hallucinating.

    • vivendi@programming.dev · 6 hours ago

      Small-scale models, like Mistral Small or the Qwen series, are achieving SOTA performance with fewer than 50 billion parameters. QwQ-32B could already rival shitGPT with 32 billion parameters, and the new Qwen3 and Gemma (from Google) are almost black magic.

      Gemma 4B is more comprehensible than GPT-4o; the performance race is fucking insane.

      ClosedAI is 90% hype. Their models are benchmark princesses, but they need huuuuuuge active parameter sizes to effectively reach their numbers.

      Everything said in this post is independently verifiable by taking 5 minutes to search shit up, and yet you couldn’t even bother to do that.