HelloHotel
CSAM sourcing?
Where do these people get that much CSAM? Somebody once said that, to the best of their understanding, it was new CSAM images each time, meaning not many repeats. My collection of Reddit memes takes up ~15-30GB, and all of sbubby takes up ~5GB. Where is it pooled from?
The most fried part of my brain says “One of the big companies trying to absorb the fediverse is doing this to undermine their competition,” but I have zero evidence.
Most companies that build CSAM detectors, by nature of their work, have a lot of it. Likely thousands of photos and videos were willingly handed over to be put into some vault to fight against its existence. If it’s a large corporation attacking us, it necessarily means a leak from a CSAM vault, whether it was intentional (an authorized attack) or not (opsec mistakes or insiders). Or it means there was no vault (negligence), or it wasn’t transferred securely (opsec mistakes).
It’s just the only motive I can even think of beyond it being a rogue crank.
It’s not hard to build a bot that scrapes a webpage for its images; they can easily aggregate that much content over decades.
I’m generally on the side of reposting for archival and continuation. However, the “throw it out there” half-assedness and lack of transparency of these services make it a no-deal. If I were to remake one (and I’ve thought about it), a simple “upload and done” approach would be disgusting. These bots (clarified: they should probably create their own instance for the task) need communal love and care to be anything but “a fire hose of content”. I propose the following:
- Allow the community to control most aspects of the bot’s behavior.
- Do not upload/add to queue unless initiated by a Lemmy user
- Allow users to vote on post deletion; add resistance to deletion, or disable it, if there are many non-bot comments.
- Allow users to vote on the bot’s upload speed and what gets priority (up/down vote this comment)
- Pin an admin post to act like a “settings menu” for the project
- Pin an unobtrusive admin comment in every post to vote on actions for the post
- Use as little boilerplate as possible. Hide in a spoiler what you can’t avoid.
- Use one bot account for uploads, so that blocking one user blocks the whole service.
- Put human and machine readable metadata of the original and repost in a spoiler.
- Use 8 or so well-labeled “sorting bot accounts” to approximate the upvotes of the source relative to its neighboring comments. There should be no more sway than ±8 votes. Bot votes should be disclosed in the metadata.
- Call the bot something like “reddit archive” and put the source’s username in the post/comment body
- Allow off-instance admins to moderate bot posts
- Prefix all communities with something like "auto: " for transparency
- Allow partial reuploads and omission of threads for admin/data cleanup purposes.
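
To make the vote cap and the metadata spoiler from the list above a bit more concrete, here is a rough Python sketch. Everything in it is an assumption on my part rather than an existing bot or API: the names `bot_votes` and `metadata_spoiler` are made up, "relative to its neighboring comments" is read as a simple linear rescale, and the `::: spoiler` block assumes the instance renders Lemmy-style spoilers.

```python
import json

MAX_BOT_VOTES = 8  # the proposal caps sorting-bot sway at ±8 votes


def bot_votes(source_score: int, neighbor_scores: list[int]) -> int:
    """Return a net bot vote between -MAX_BOT_VOTES and +MAX_BOT_VOTES.

    Assumption: the source score is rescaled linearly between the lowest
    and highest score among the comment and its neighbors.
    """
    scores = neighbor_scores + [source_score]
    low, high = min(scores), max(scores)
    if high == low:
        return 0  # no spread, so the bots stay neutral
    fraction = (source_score - low) / (high - low)  # 0.0 .. 1.0
    return round((2 * fraction - 1) * MAX_BOT_VOTES)


def metadata_spoiler(source_user: str, source_url: str, votes: int) -> str:
    """Wrap JSON metadata (readable by humans and machines) in a spoiler."""
    meta = {
        "source_user": source_user,
        "source_url": source_url,
        "bot_votes": votes,  # disclosed, as the proposal asks
    }
    body = json.dumps(meta, indent=2)
    return f"::: spoiler Archive metadata\n{body}\n:::"


if __name__ == "__main__":
    votes = bot_votes(75, [0, 100])
    print(metadata_spoiler("example_user", "https://example.invalid/post/1", votes))
```

The linear rescale is just the simplest thing that keeps the ordering of the source thread; a community vote could swap in a different mapping without touching the metadata format.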