Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • Salamander@mander.xyz
    18 days ago

    Ahh, really?! Thanks for letting me know. I will see if there is something I can do to throttle that after the holidays. Curious to see what solutions others come up with.

      • Salamander@mander.xyz
        3 days ago

        That’s interesting. I still don’t fully understand the implications from a user-experience perspective. It looks as if the proof-of-work would go unnoticed when using a user client but would present a more significant challenge for an automated scraping bot. So, it does look promising. I still don’t understand what it would do to a bot such as a ‘PlantID bot’ and other good bots. Do they have a heavy soul? I’ll look into it.
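
        To make the asymmetry concrete, here is a rough sketch of a generic hashcash-style proof-of-work check. It is an assumption about how such schemes usually work, not the code of any particular tool; the names `solve`, `verify`, and the challenge string are made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes verification.

    Expected work is roughly 2**difficulty hashes.
    """
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge string and difficulty, chosen only for the example.
nonce = solve("example-challenge", 12)
print(verify("example-challenge", nonce, 12))
```

        Under these assumptions, a person who solves one challenge per visit barely notices it, while a client that has to solve a fresh challenge for many requests pays the cost each time. A well-behaved bot that cannot run the solver at all would be blocked unless it is explicitly exempted.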

        For now, I have modified https://mander.xyz/robots.txt, copying the file that Dave from lemmy.nz found to work to prevent at least some scraping and bot load.
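
        One way to check what a robots.txt file actually permits is Python's standard `urllib.robotparser`. The sketch below uses made-up rules and made-up bot names (`ExampleScraperBot`, `PlantIDBot`); it does not reproduce the actual contents of the mander.xyz file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real mander.xyz/robots.txt.
EXAMPLE_RULES = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /search
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# The named scraper is blocked everywhere; other bots are only kept out of /search.
print(is_allowed(EXAMPLE_RULES, "ExampleScraperBot", "https://example.org/post/1"))
print(is_allowed(EXAMPLE_RULES, "PlantIDBot", "https://example.org/post/1"))
print(is_allowed(EXAMPLE_RULES, "PlantIDBot", "https://example.org/search?q=fern"))
```

        Note that robots.txt is advisory: it only reduces load from crawlers that choose to honor it.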