In an age of LLMs, is it time to reconsider human-edited web directories?

AJ Sadauskas@aus.social · edit-2 8 months ago

merthyr1831@lemmy.world · 8 months ago

This is how it’s gonna go. We’ll get human-curated search results, before someone “innovates” by mildly automating the process, until someone “innovates” again by using AI to automate it further. Time is a circle.

bsammon@lemmy.sdf.org · 8 months ago

“Lycos, Excite, AltaVista, and of course Yahoo all were originally web directories of this sort.”

Both Wikipedia and my own memory disagree with you about Lycos and AltaVista. I’m pretty sure they both started as search engines. Maybe they briefly dabbled in being “portals”.

AJ Sadauskas@aus.social · 8 months ago

@bsammon And this Archive.org capture of Lycos.com from 1998 contradicts your memory: https://web.archive.org/web/19980109165410/http://lycos.com/

See those links under “WEB GUIDES: Pick a guide, then explore the Web!”?

See the links below that say Autos/Business/Money/Careers/News/Computers/People/Education/Shopping/Entertainment/Space/Sci-Fi/Fashion/Sports/Games/Government/Travel/Health/Kids?

That’s exactly what I’m referring to.

Here’s the page where you submitted your website to Lycos: https://web.archive.org/web/19980131124504/http://lycos.com/addasite.html

As far as the early search engines went, some were more sophisticated than others, and they improved over time. Some simply crawled the webpages on the sites in the directory, others

But yes, Lycos was definitely an example of the type of web directory I described.

bsammon@lemmy.sdf.org · 8 months ago

1998 isn’t “originally” when Lycos started in 1994. That 1998 snapshot would be their “portal” era, I’d imagine.

And the page where you submitted your website to Lycos – that’s no different than what Google used to have. It just submitted your website to the spider. There’s no indication in that snapshot that suggests that it would get your site added to a curated web-directory.

Those late-90s web-portal sites were a pale imitation of what the web indices of Yahoo, and later DMoz/ODP, were at their peak. I imagine that the Lycos portal, for example, was only managed/edited by a small handful of Lycos employees, and they were moving as fast as they could in the direction of charging websites for being listed in their portal/directory. The portal fad may have died out before they got many companies to pony up for listings.

I think in the Lycos and AltaVista cases, they were both search engines originally (mid 90s) and then jumped on the “portal” bandwagon in the late 90s with half-assed efforts that don’t deserve to be held up as examples of something we might want to recreate.

Yahoo and DMoz/ODP are the only two instances I am aware of that had a significant (like, numbered in the thousands) number of websites listed, and a good level of depth.

Dieter Komendera@hachyderm.io · 8 months ago

@ajsadauskas @degoogle It sounds a bit like Kagi’s Small Web initiative and search. Have you seen it? https://blog.kagi.com/small-web

harsh3466@lemmy.ml · 8 months ago

I don’t know if this is the intent of their small web effort, but the first impression I got just now when I clicked through and saw this was, “gross”.

Scummy looking/feeling marketing hustle is a huge turn off.

Moonrise2473@lemmy.ml · 8 months ago

Main problems are:

1. Link rot
2. Sneakily inserted sponsored links

Tinyrabbit ✅@floss.social · 8 months ago

@Moonrise2473 @ajsadauskas
3. Infinitely growing list of categories.
4. Mis-categorisation

I remember learning HTML (4.0) and reading that you should put info in a <meta> tag about the categories your page fits in, and that would help search engines. Did it also help web directories?
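
For reference, the convention being recalled here is the `<meta name="keywords" content="...">` tag, which the HTML 4 specification suggested as a hint for search engines. Below is a minimal sketch of how a crawler or a directory tool could read it. The sample page and the function names are made up for illustration, and this says nothing about whether any real directory actually used these tags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_keywords(html):
    """Return the comma-separated keywords as a cleaned, lowercased list."""
    parser = MetaTagParser()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    return [k.strip().lower() for k in raw.split(",") if k.strip()]


# Hypothetical page, for illustration only.
SAMPLE_PAGE = """
<html><head>
<title>Vintage Bicycle Repair</title>
<meta name="keywords" content="bicycles, repair, Vintage">
<meta name="description" content="Hypothetical page about fixing old bikes.">
</head><body></body></html>
"""

print(extract_keywords(SAMPLE_PAGE))
```

Because the page author writes these keywords, they are self-reported; a human editor would still need to check them, which ties back to the mis-categorisation problem above.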

Bärchelor of Science@social.tchncs.de · 8 months ago

@ajsadauskas @degoogle I mean, we could still use all the modern tools. I’m hosting a SearXNG instance myself, and there is currently an ever-growing block list for AI-generated websites that I regularly import to keep up to date. You could also turn it into an allow list: block all websites by default and allow them gradually.
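
The block list versus allow list idea can be sketched in a few lines. This is a generic illustration, not SearXNG’s actual configuration or plugin API; the function names and the example domains are assumptions made up for the sketch.

```python
from urllib.parse import urlsplit


def host_matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_results(urls, blocklist=(), allowlist=None):
    """Filter result URLs by hostname.

    With an allowlist, only listed hosts pass (default deny) and the
    blocklist is ignored. Without one, everything passes except hosts
    on the blocklist (default allow).
    """
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if allowlist is not None:
            if host_matches(host, allowlist):
                kept.append(url)
        elif not host_matches(host, blocklist):
            kept.append(url)
    return kept


# Hypothetical search results, for illustration only.
results = [
    "https://docs.example.org/guide",
    "https://ai-spam.example.com/answer",
    "https://blog.example.net/post",
]

print(filter_results(results, blocklist={"ai-spam.example.com"}))
print(filter_results(results, allowlist={"example.org"}))
```

The trade-off is the usual one: a block list has to keep growing to keep up with new junk sites, while an allow list starts empty and only ever shows what someone has vetted.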

Bärchelor of Science@social.tchncs.de · 8 months ago

@ajsadauskas @degoogle I started that because it bothered me that you couldn’t just report a website to DuckDuckGo that was obviously a Stack Overflow crawler. This problem has persisted for as long as Reddit and Stack Overflow have been a thing. Why are there no measures from search engines to get a handle on it?

I never understood that.

Albert Cardona@mathstodon.xyz · 8 months ago

@ajsadauskas @degoogle

Yes to all. For a while I’ve been de facto using a minuscule subset of the web. My gateway to other, relevant websites is via human-to-human recommendations, primarily in a place like this.

Albert Cardona@mathstodon.xyz · 8 months ago

@ajsadauskas @degoogle

And just now, as seen at the bottom of a blog post:

“Post a Comment
Unfortunately because of spam with embedded links (which then flag up warnings about the whole site on some browsers), I have to personally moderate all comments. As a result, your comment may not appear for some time. In addition, I cannot publish comments with links to websites because it takes too much time to check whether these sites are legitimate.”

raffaele@digipres.club · 8 months ago

@ajsadauskas @degoogle A bit of Yahoo history here; it started as a web directory: https://www.wired.com/1996/05/indexweb/

Brad Enslen@mastodon.social · 8 months ago

@ajsadauskas @degoogle Since I run a small directory this is a fascinating conversation to me.

There is a place for small human edited directories along with search engines like Wiby and Searchmysite which have human review before websites are entered. Also of note: Marginalia search.

I don’t see a need for huge directories like the old Yahoo, Looksmart and ODP directories. But directories that serve a niche ignored by Google are useful.

Bernard Sheppard@mastodon.au · 8 months ago

@bradenslen @ajsadauskas @degoogle Looksmart! There’s a blast from the past.

As a very early internet user (suburbia.org.au - look it up, and who ran it) and a database guy, what I learnt very early is that any search engine needed users who knew how to write highly selective queries to get highly specific results.

Google - despite everything - can still be used as a useful tool - if you are a skilled user.

I am still surprised that you are not taught how to perform critical internet searching in primary school. It is as important as the three Rs.

ᴇᴍᴘᴇʀᴏʀ 帝@feddit.uk · 8 months ago

“But directories that serve a niche ignored by Google are useful.”

This is a good point - as search is increasingly enshittified too (from top down, with corporate interests, and bottom up, from SEO manipulation and dodgy sites) it makes sense for topics or communities often drowned out by the noise.

I also see you are using webrings - another blast from the past that has its uses.

René Seindal@mastodon.social · 8 months ago

@ajsadauskas @degoogle DMOZ was once an important part of the internet, but it too suffered from abuse and manipulation for traffic.

For many DMOZ was the entry point to the web. Whatever you were looking for, you started there.

Google changed that, first for the better, then for the worse.

Michelle Hughes@a2mi.social · 8 months ago

@ajsadauskas @degoogle

It looks like there are a couple of projects to continue the DMOZ directory. I hope they’re sharing work with each other!

ᴇᴍᴘᴇʀᴏʀ 帝@feddit.uk · 8 months ago

Got any links?

Michelle Hughes@a2mi.social · 8 months ago

@Emperor

Yeah. Sorry, I was hesitant to post links at first before I vetted them.

It looks like “Curlie” is the official continuation of the DMOZ project:

https://curlie.org/

The other ones I was seeing, it turns out, are static mirrors of 2017 DMOZ.

ᴇᴍᴘᴇʀᴏʀ 帝@feddit.uk · 8 months ago

Thanks for that, a real blast from the past. I have a vague memory that I was an editor on the ODP or dmoz back in the day.

“Sorry, I was hesitant to post links at first before I vetted them.”

Yes, perhaps not coincidentally, I thought it best to ask for a human-curated link.

Michelle Hughes@a2mi.social · 8 months ago

@Emperor

Y’know, come to think of it, Wikipedia might be a better project to point to here. All the content on there is hand curated. When I’m interested in a subject, I usually go to Wikipedia first instead of a search engine. Sometimes I am directed out to other websites from there.

I set up a quick keyword search so I can type “wp blah blah blah” into my URL bar and it searches Wikipedia.

https://support.mozilla.org/en-US/kb/how-search-from-address-bar?redirectslug=Smart+keywords&redirectlocale=en-US
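
A keyword search like this boils down to a URL template with a `%s` placeholder that gets replaced by whatever follows the keyword. Here is a rough sketch of that idea; it is not how Firefox implements it internally, and the table and function names are made up for illustration.

```python
from urllib.parse import quote_plus

# Keyword -> URL template. "%s" marks where the search terms go.
KEYWORD_SEARCHES = {
    "wp": "https://en.wikipedia.org/wiki/Special:Search?search=%s",
}


def expand_keyword_search(typed):
    """Turn 'wp some terms' into a search URL, or None if no keyword matches."""
    keyword, _, query = typed.strip().partition(" ")
    template = KEYWORD_SEARCHES.get(keyword)
    if template is None or not query.strip():
        return None
    return template.replace("%s", quote_plus(query.strip()))


print(expand_keyword_search("wp web directory"))
```

Adding another curated site is just one more entry in the table, which makes this a cheap way to route searches to sources you already trust.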

Mei Lin@rubber.social · 8 months ago

@ajsadauskas @degoogle
I’ve already seen new webrings forming.

Or maybe that was old webrings updating?

ᴇᴍᴘᴇʀᴏʀ 帝@feddit.uk · 8 months ago

Yeah, I was just looking at a webring and thinking “these still have a use”. They could definitely help with discoverability on a broad front. I help admin feddit.uk and had pondered reaching out to other British Fediverse services to make a Britiverse. However, how to hold it all together and navigate between them was proving tricky or clunky until I was looking at the webring and thought “FedRing”. Now that could work.
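
At its core a webring is just a circular list: each member links to the next and previous one, and the last wraps around to the first. A minimal sketch of that navigation follows; the class and the member list are hypothetical (the `.example` hosts are placeholders), not a description of any existing FedRing.

```python
class Webring:
    """A ring of sites where navigation wraps around at both ends."""

    def __init__(self, members):
        if not members:
            raise ValueError("a webring needs at least one member")
        self.members = list(members)

    def _step(self, current, offset):
        # list.index raises ValueError if the site is not in the ring.
        index = self.members.index(current)
        return self.members[(index + offset) % len(self.members)]

    def next(self, current):
        return self._step(current, 1)

    def previous(self, current):
        return self._step(current, -1)


# Hypothetical member list; the .example hosts are placeholders.
fedring = Webring(["feddit.uk", "alpha.example", "beta.example"])

print(fedring.next("feddit.uk"))
print(fedring.next("beta.example"))
print(fedring.previous("feddit.uk"))
```

The only shared state is the ordered member list, so the ring can be maintained by hand, which is very much in the spirit of a human-edited directory.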

photonic_sorcerer@lemmy.dbzer0.com · 8 months ago

What’s to say we won’t have AI-curated lists and directories? That way we don’t have to deal with link rot and the like. I think the issue is the algorithms used for search. We need better ones, better AI, not more frivolous human labor.

happyborg@fosstodon.org · 8 months ago

@ajsadauskas
I agree we need better and remember the early days well. Before indexes we passed URLs, in fact just IP addresses of servers we’d visit to see what was there, and that was often a directory of documents, papers etc. It filled us with awe, but let’s not dial back that far!

Another improvement will be #LocalLLMs both for privacy and personalised settings. Much of the garbage now is in service of keeping us searching rather than finding what we want.
@degoogle

Alexander Hay@mastodon.social · 8 months ago

@ajsadauskas @degoogle Curation is elation.

AK Ritter@fluffs.au · 8 months ago

@ajsadauskas @degoogle I’ve been thinking back to the days of web rings and reciprocal links, back when people had their own websites to add links to. I have been wanting to go back to that mode as well.

ᴇᴍᴘᴇʀᴏʀ 帝@feddit.uk · 8 months ago

Indeed. As I mentioned below, something like a webring (a FedRing) might be the solution to something I was pondering.

It is increasingly clear to me that a lot of directions Web 1.0 was evolving in were diverted or just killed off by Big Tech’s landgrab, which built walled gardens. I see the Fediverse as a return to the idea of blogs (micro and macro), forums, etc. but in a more natural progression to interoperability. This still isn’t perfect and there may be other early web ideas, like webrings, that improve discoverability.