After 2.5 years of intensive research and programming, the entire OpenWebSearch.eu project team is excited to grant access to the pilot of the first-ever federated pan-European Open Web Index (OWI).

From June onward, commercial and scientific development teams of any size as well as interested individuals are welcome to access and make use of almost a petabyte (and growing) of open web data under a general research license or – upon request – under a designated commercial license as well.

Given that the European Commission has launched the InvestAI initiative to mobilize €200 billion of investment in artificial intelligence, the Open Web Index comes with perfect timing.

The OpenWebSearch.eu consortium is actively calling on early adopters to pioneer innovative projects around vertical web search, argumentative search, LLM applications including RAG, and more.

“The OWI symbolizes a first step towards true European digital sovereignty and is a fundamental step in paving the way for a comprehensive open European AI landscape,” says Community Manager Ursula Gmelch, adding:

“Our goal behind this initial pilot phase is to onboard a range of projects from diverse domains to get early feedback in. We look forward to users confirming the quality and value in current functionalities and/or helping us pivot in such ways that real market demands can be met and further expanded upon.”

An official kick-off event will be hosted on 6 June from 10 am to 12 pm (noon) CEST via Zoom.

Registration for the event is open at the following link:

https://cscfi.zoom.us/meeting/register/eATIpDQ5TZidh4Jzkim6FQ#/registration

  • plyth@feddit.org · 2 hours ago

    This part of the FAQ makes the project interesting:

    Services like Google’s Search Console allow website operators to optimize their search page for Google – thus Google crowdsources the robust parsing without making this information available to third parties.

    A new search engine is at a disadvantage without that data. Website operators don’t bother maintaining their information with an unknown search engine. Hopefully OWS becomes popular enough that operators use it, e.g. to indicate when their site needs a recrawl or which parts of their site should be indexed.

    • randomname@scribe.disroot.org (OP) · 12 hours ago

      In a nutshell, this is what I understand, too. It may take some time until it gets fully competitive, but it could soon become a better alternative to gatekeepers like Google imho.

      Addition from a brief article I just found:

      The EU’s Open Web Index Project: Another Step Toward Digital Independence

      The Open Web Index (OWI) is an open-source initiative under the European Union’s Horizon Programme, aimed at democratizing web-search technologies and strengthening Europe’s digital sovereignty. The project will launch in June 2025, providing a common web index accessible to all and decoupling the indexing infrastructure from the search services that use it. In doing so, the OWI offers not only technical innovations but also a paradigm shift in the global search market—today, a single player (Google) holds over ninety percent of the market share and determines access to online information.

      The project’s core idea is to make web crawling, metadata enrichment, and indexing a shared European resource. Development takes place in large data centres that process terabytes of raw data each day and publish the entire index as open data. All software components are open-source, and the CIFF format ensures that systems based on Lucene, Solr, or Terrier can connect to the OWI seamlessly. Thus, with minimal effort, researchers and developers can create vertical search engines that rank results according to specific criteria such as sustainability or privacy priorities […]

  • General_Effort@lemmy.world · 7 hours ago

    Only few search engines index the Web at scale. Third parties who want to develop downstream applications based on web search fully depend on the terms and conditions of the few vendors. The public availability of the large-scale Common Crawl does not alleviate the situation, as it is often cheaper to crawl and index only a smaller collection focused on a downstream application scenario than to build and maintain an index for a general collection the size of the Common Crawl. Our goal is to improve this situation by developing the Open Web Index.

    The Open Web Index is a publicly funded basic infrastructure from which downstream applications will be able to select and compile custom indexes in a simple and transparent way. Our goal is to establish the Open Web Index along with associated data products as a new open web information intermediary.

    https://downloads.webis.de/publications/papers/hendriksen_2024.pdf

    This paper seems to give a good, quick overview.

    It looks to be the usual EU tech project. Doing more to achieve less in a desperate, hopeless attempt to make up for the stupidity and greed of European elites.

  • albert180@piefed.social · 11 hours ago

    Given that the European Commission has launched the InvestAI initiative to mobilize €200 billion of investment in artificial intelligence, the Open Web Index comes with perfect timing.

    But these dumbfucks cut the awesome NGI Zero grant for this, which funded many awesome open source projects.