The company also stated that the data will be used to train Google Translate and its Cloud AI to perform better. The move has alarmed the community, with experts warning that such scraping could eventually harm users, since companies often include stolen private data when training their products.
Training on Public Data
With generative AI surging in popularity, big tech companies are changing their policies to adapt. After rushing to build their large language models, they are now tuning their data-collection rules to favour their products, at the cost of the community’s privacy.
The latest to do so is Google, which modified its privacy policy over the weekend so that it can use publicly available data to train its products, namely Bard, Cloud AI and Google Translate. The company has changed the wording in its policy from “language models” to “AI models”.
This is a formal way of informing the public, rather than seeking permission, that Google can use their publicly available data to train its products. Experts have warned that this trend could harm the public’s privacy, and some people are already suing OpenAI for massively scraping personal data from the internet, including “stolen private information,” to train its GPT models without prior consent.
We are likely to see many more such lawsuits as more companies develop their own generative AI products. The trend has already led some website owners to take steps to limit scraping or to profit from the generative AI boom.
Companies like Reddit and Twitter have crafted new rules to limit the use of their free resources. For example, Reddit has revamped its API pricing to charge hefty sums, while Twitter has restricted how many tweets its users can view in order to curb excessive data scraping.