Luu Tuyen@lemmy.world to Technology@lemmy.world · English · 2 months ago
TikTok’s parent launched a web scraper that’s gobbling up the world’s online data 25-times faster than OpenAI (fortune.com)
jagged_circle@feddit.nl · English · 1 month ago (edited)
This is fine. I support archiving the Internet.
It kinda drives me crazy how normalized anti-scraping rhetoric is. There is nothing wrong with (rate limited) scraping.
The only bots we need to worry about are the ones that POST, not the ones that GET.
purrtastic@lemmy.nz · English · 1 month ago
It’s not fine. They are not archiving the internet.
I had to ban their user agent after very aggressive scraping that would have taken down our servers. Fuck this shitty behaviour.
Ghostalmedia@lemmy.world · English · 1 month ago
ByteDance ain’t looking to build an archival tool. This is to train gen AI models.
WhyJiffie@sh.itjust.works · English · 1 month ago
This is neither archiving nor rate-limited, if the AI training purpose and scraping 25 times faster than a large company did not make that obvious.