For The Love Of The Web. Posting Publicly Is Going To Get Used In Some Way

white cloud sky - Photo by Kumiko SHIMIZU on UnSplash.com

Sam Cole over at 404 Media wrote an article about a Hugging Face Machine Learning Librarian making a public data set of 1 million Bluesky posts available to everyone for Machine Learning.

People were, of course, outraged. After all, it’s the Internet. People thrive on being outraged, pissed off, and otherwise salty.

What people seem to miss is that what they’re posting on Bluesky is public and scrapable.

The way this guy made the data set was a bit sloppy and, in my opinion, irresponsible. He didn’t anonymize the data and left personally identifiable information in the data set. He also didn’t get consent from people first.

Yeah, I agree it feels a bit icky that this was done, mostly without consent or anonymizing the data. But for the love of the Web, what you put online publicly is PUBLIC. People will see it and possibly use it for whatever they want. How hard is this to grasp?

According to Sam’s article, this kind of data collection is also in a legal gray area right now, and the question is working its way through courts around the world.

To give some credit to the librarian, he took down the data set after getting quite a bit of “feedback.” 😵‍💫😜

But that didn’t stop the trolls from making even bigger data sets and putting them online.

I really do in fact understand why people are upset, but those posts are public. Don’t post stuff and expect it to be private when it’s PUBLIC!

Honestly, I’m fine with the content I post publicly being used to train LLMs and AI, because it will improve the technology that I benefit from.

I agree with Rand Fishkin, the founder of Moz and Sparktoro.

He posted on Bluesky:

I know others are probably upset about this, but LLM training is, for me, a benefit of participating in spaces like this. I *want* my word usage, brands, and content to be part of how AI answers questions in the future. Just like I wanted Google to index my websites.

- Rand Fishkin (@randfish.bsky.social) December 8, 2024 at 4:06 PM

I don’t think that’s a crazy desire. Right? Am I completely off-base? What do you think?

Comments

3 responses to “For The Love Of The Web. Posting Publicly Is Going To Get Used In Some Way”

    1. LOL. I’m sick of people getting all bent out of shape when they should know that anything public is going to be publicly accessible. That’s what. LOL

  1. @seth People do not understand that the web is public. People expect that social media businesses have the ability to block access by scrapers and other scammers. What people ought to do is stop giving free content to social media businesses. All of them in it for the money seem to be run by malevolent aberrations.