“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

  • vane@lemmy.world
    1 month ago

    This study is bullshit, because they only trace evaluations and not the training process that aligns tokens with probabilities.

      • vane@lemmy.world
        1 month ago

        Well, every civilisation needs its prophets. Our civilisation built prophet machines that will kill us. We just didn’t get to the killing step yet.

        • froztbyte@awful.systems
          1 month ago

          yeah but see, these grifters all heard it as “every civilisation needs its profits”. just a shame they suck at that too

          • vane@lemmy.world
            1 month ago

            No prophet worked for free, and they were always near the rulers and near big money. The story repeats itself, just the times are different and we can instant message each other.