Over the weekend, Google updated its privacy policy to let users know that it'll use any of their publicly available data to train its large language models, like Bard.
The company also stated that the data will be used to help Google Translate and its Cloud AI perform better. The move alarmed the community, as experts warn that such scraping could eventually harm users, since companies often end up including stolen private data when training their products.
Training on Public Data
With generative AI surging in popularity, big tech companies are changing their policies to adapt. After rushing to build their large language models, they are now tuning their data-collection rules to favour their products, at the cost of the community's privacy.
The latest to do so is Google, which modified its privacy policy over the weekend to allow publicly available data to be used for training its products, namely Bard, Cloud AI and Google Translate. The company has changed the wording in its policy from "language models" to "AI models".
This is a formal way of informing the public (not seeking permission) that Google can use their public data to train its products. Experts warned that such a trend could harm the public's privacy in future, with some people already suing OpenAI for massively scraping personal data from the internet, including "stolen private information," to train its GPT models without prior consent.
We'll likely see plenty of similar lawsuits in the future as more companies develop their own generative AI products. The trend has already led some public website owners to take specific steps to limit scraping or to profit from the generative AI boom.
Companies like Reddit and Twitter have crafted new rules to limit the use of their free resources. For example, Reddit has revamped its API pricing to charge hefty sums, while Twitter has restricted how many tweets its users can view in order to curb excessive data scraping.