Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Stopthatgirl7@lemmy.world · 16 days ago

gcheliotis@lemmy.world · 14 days ago

The real question here is why the researcher “librarian” didn’t even attempt to anonymize the dataset before making it available. Full anonymization isn’t a trivial task, but at least removing unique identifiers or replacing them with randomly generated ones would be good practice.
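
To make the "replace them with randomly generated ones" idea concrete, here is a minimal sketch of that step in Python. It assumes each post is a dict and that the account identifier lives in a field called `author`; both the record shape and the field name are hypothetical and not taken from the actual dataset. This is pseudonymization only, not full anonymization: the post text can still contain handles, names, or other identifying details.

```python
import secrets


def pseudonymize(records, id_field="author"):
    """Return copies of the records with id_field replaced by a random token.

    The same original identifier maps to the same token within one call, so
    posts can still be grouped per author. The mapping lives only in memory
    and is discarded when the function returns.

    Note: `id_field` and the dict-per-post layout are assumptions made for
    this example, not the real dataset schema.
    """
    mapping = {}
    cleaned_records = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            # 8 random bytes rendered as 16 hex characters
            mapping[original] = secrets.token_hex(8)
        cleaned = dict(record)  # shallow copy, leaves the input untouched
        cleaned[id_field] = mapping[original]
        cleaned_records.append(cleaned)
    return cleaned_records


if __name__ == "__main__":
    # Made-up sample data for illustration only
    posts = [
        {"author": "did:example:alice", "text": "first post"},
        {"author": "did:example:bob", "text": "hello"},
        {"author": "did:example:alice", "text": "second post"},
    ]
    for post in pseudonymize(posts):
        print(post)
```

Using random tokens rather than a hash of the identifier means nobody can recover the original by hashing known account IDs and comparing, which is probably the main reason to prefer "randomly generated" here.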