There it begins. Nothing good will ever come from this.
No it won’t. Media already laid the groundwork for people to hate on AI. Now they will keep focus on areas where when you read it we all come to the same common sense legislation solution. Then will come a bill to strip us of more things that made the internet awesome and we will cheer. Web scrapping and data sharing can fuck off. Pirates sent to North Korean prison camps. Sharing accounts with family, you’re flagged for an audit. Nintendo modders, more like criminals.
Scraping not scrapping
Parappa the scrappa
It’s illegal when a regular person steals something, but it’s innovation and courage when a huge corporation steals something. Interesting how that works.
Honestly it’s fucking angering. So much regulation and geo-restrictions and licensing schemes… but it’s cool that there are data brokers, and shit like this. On top of it all, Chrome is screwing us with Manifest V3 and killing ad blocking in Chrome. It’s already in the Canary build.
WHAT THE FUCK IS WRONG WITH THIS SPECIES?!
Capitalism.
Google are actually doing really awesome work with manifest v3. A pimp needs to smack their b1tches around every once in a while to remind them who’s boss.
What’s being stolen?
Data, network bandwidth, and CPU/Processing time from essentially every website in the world, and when you’re paying for cloud power to run your website the cost of webscrapers running a train on your digital asshole adds up QUICK.
It’s why normal human being people get sued to shit for webscraping data from certain companies who care. But companies don’t get sued because go fuck yourself. Kill bytedance.
Any regular person can scrape and use public data for AI. It’s not illegal for companies or individuals, and it shouldn’t be.
Not surprising that Bytedance would want to gobble up every bit of data they can as fast as possible.
Google’s mission statement was originally something about controlling the world’s data. If Google has competition, that might be a good thing?
Yeah, but we were hoping for competition that wasn’t worse than Google…
What makes you think they’re worse than Google?
https://en.m.wikipedia.org/wiki/ByteDance
Mostly what they have said and done, but also largely what they intend to continue saying and doing.
Can you distill it down for me?
It’s the same old Yankee speech: “it’s Chinese, so it must be really bad”. They’re definitely no worse than Google or Facebook.
They come from an environment where the government actively encourages and sometimes funds stealing copyrighted information couched in a strong history of disregard for human rights. I’m not defending Google, and yes the US government has given them leeway, but if there is the potential for something worse than Google - Bytedance is it.
We’ve had this thing hammering our servers. The scraper uses randomized user-agents browser/OS combinations and comes from a number of distinct IP ranges in different datacenters around the world, but all the IPs track back to Bytedance.
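For anyone wanting to do the same kind of tracing on their own logs, here is a minimal sketch of matching client IPs against a list of network ranges. The ranges below are reserved documentation blocks used as placeholders, not real ByteDance ranges, and the names `SUSPECT_NETWORKS`, `is_suspect`, and `count_suspect_hits` are made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real
# ByteDance ranges. Swap in ranges you have verified yourself.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_suspect(ip: str) -> bool:
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SUSPECT_NETWORKS)


def count_suspect_hits(client_ips) -> int:
    """Count how many logged client IPs belong to the listed networks."""
    return sum(1 for ip in client_ips if is_suspect(ip))
```

Since the user agents are randomized, matching on network ranges like this is probably more reliable than matching on the User-Agent header, as long as the ranges are kept up to date.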
Wouldn’t be surprised if they’re just cashing out while TikTok is still public in the US. One last desperate grab at value-add for the parent company before the shutdown.
Also a great way to burn the infrastructure for subsequent use. After this, you can guarantee every data security company is going to add the TikTok servers to their firewalls and blacklists. So the American company that tries to harvest the property is going to be tripping over these legacy bulwarks for years after.
This has nothing to do with TikTok other than ByteDance being a shareholder in TikTok.
As for what ByteDance plans to do with a new LLM, a person familiar with the company’s ambitions said one goal has to do with the search function for TikTok.
Last week, TikTok released an update to its current search function focused on [keywords for ads], basically allowing advertisers to search in real time for words that are trending on TikTok. It allows marketers to build an ad with relevant keywords that would ostensibly help the ad show up on the screens of more users.
…
“Given the audience and the amount of use, TikTok with a search environment that is a completely biddable space with keywords and topics, that would be very interesting to a lot of people spending a ton of money with Google right now,” the person said.
A dark vision just flashed in my mind. And I am certain this is what will happen. AI-generated ads done in real time based on the latest “trending” thing. Presented to users basically as soon as the topic has the slightest amount of “trend”.
Just emitting untold amounts of CO2 to show you generated ads in near real time.
Also it doesn’t respect robots.txt (the file that tells bots whether or not a given page can be accessed), unlike most AI scraping bots.
This is fine. I support archiving the Internet.
It kinda drives me crazy how normalized anti-scraping rhetoric is. There is nothing wrong with (rate limited) scraping
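To make "rate limited" concrete, here is a rough sketch of a scraper that honors robots.txt and waits between requests, using Python's standard `urllib.robotparser`. The robots.txt content, the URLs, and the helper names (`build_parser`, `plan_crawl`, `crawl`) are assumptions for illustration; a real crawler would download the site's own robots.txt and pass in a real `fetch` function.

```python
import time
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical); a real crawler would fetch it.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def plan_crawl(parser, user_agent, urls, default_delay=1.0):
    """Return (allowed_urls, delay_seconds) according to robots.txt."""
    # Fall back to a default delay when the site declares no Crawl-delay.
    delay = parser.crawl_delay(user_agent) or default_delay
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    return allowed, delay


def crawl(parser, user_agent, urls, fetch, sleep=time.sleep):
    """Fetch only allowed URLs, pausing between consecutive requests."""
    allowed, delay = plan_crawl(parser, user_agent, urls)
    results = []
    for i, url in enumerate(allowed):
        if i > 0:
            sleep(delay)  # the rate limit: wait before every request after the first
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` in as parameters keeps the sketch testable without real network calls or real waiting.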
The only bots we need to worry about are the ones that POST, not the ones that GET
It’s not fine. They are not archiving the internet.
I had to ban their user agent after very aggressive scraping that would have taken down our servers. Fuck this shitty behaviour.
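A user-agent ban of this kind can be sketched in a few lines. This assumes the crawler identifies itself with a token such as "Bytespider" in its User-Agent header, which is an assumption here and not something this thread confirms; `BLOCKED_AGENT_TOKENS`, `is_blocked_agent`, and `status_for_request` are hypothetical names.

```python
# Assumed token; adjust to whatever actually shows up in your access logs.
BLOCKED_AGENT_TOKENS = ("bytespider",)


def is_blocked_agent(user_agent_header) -> bool:
    """Case-insensitive substring match against the blocklist."""
    ua = (user_agent_header or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def status_for_request(headers: dict) -> int:
    """Return 403 for blocked agents, 200 otherwise."""
    return 403 if is_blocked_agent(headers.get("User-Agent")) else 200
```

This only helps against bots that announce themselves honestly; a scraper sending randomized browser user agents would pass straight through.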
This is neither archiving nor rate-limited, in case the AI training purpose and the scraping 25 times faster than a large company did not make it obvious.
Bytedance ain’t looking to build an archival tool. This is to train gen AI models.
Every major AI company did this. Let them do it. What is there to lose here?
People like to act as if archiving has never been a thing until about a year ago at which point it was suddenly invented and is now a threat in some nebulous way.
This isn’t archiving.
It’s not that it’s a threat, it’s that there’s a difference between archiving for preservation and crawling other people’s content for the purpose of making money off it (in a way that does not benefit the content creator).
crawling other people’s content for the purpose of making money off it (in a way that does not benefit the content creator).
You’re describing capitalism there, bud
If a foreign Dictatorship’s military op wants to know every facet of your life, then you can be damn sure it’s a threat.