• Novocirab@feddit.org
    link
    fedilink
    English
    arrow-up
    14
    ·
    edit-2
    3 days ago

    There should be a federated system for blocking IP ranges that other server operators within a chain of trust have already identified as belonging to crawlers. A bit like fediseer.com, but possibly more decentralized.

    (Here’s another advantage of Markov chain maze generators like Nepenthes: Even when crawlers recognize that they have been served garbage and they delete it, one still has obtained highly reliable evidence that the requesting IPs are crawlers.)

    Also, whenever one is only partially confident in a classification of an IP range as a crawler, instead of blocking it outright one can serve proof-of-works tasks (à la Anubis) with a complexity proportional to that confidence. This could also be useful in order to keep crawlers somewhat in the dark about whether they’ve been put on a blacklist.

      • Novocirab@feddit.org
        link
        fedilink
        English
        arrow-up
        3
        ·
        edit-2
        3 days ago

        Thanks. Makes sense that things roughly along those lines already exist, of course. CrowdSec’s pricing, which apparently start at 900$/months, seem forbiddingly expensive for most small-to-medium projects, though. Do you or does anyone else know a similar solution for small or even nonexistent budgets? (Personally I’m not running any servers or projects right now, but may do so in the future.)

        • Opisek@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          3 days ago

          There are many continuously updated IP blacklists on GitHub. Personally I have an automation that sources 10+ of such lists and blocks all IPs that appear on like 3 or more of them. I’m not sure there are any blacklists specific to “AI”, but as far as I know, most of them already included particularly annoying scrapers before the whole GPT craze.

      • rekabis@lemmy.ca
        link
        fedilink
        English
        arrow-up
        1
        ·
        edit-2
        3 days ago

        Holy shit, those prices. Like, I wouldn’t be able to afford any package at even 10% the going rate.

        Anything available for the lone operator running a handful of Internet-addressable servers behind a single symmetrical SOHO connection? As in, anything for the other 95% of us that don’t have literal mountains of cash to burn?

        • Opisek@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          3 days ago

          They do seem to have a free tier of sorts. I don’t use them personally, I only know of their existence and I’ve been meaning to give them a try. Seeing the pricing just now though, I might not even bother, unless the free tier is worth anything.