Today I read this, IMHO rather clickbaity, blog entry by Heather Meeker called "Is Copyright Eating AI?". And I was not impressed.
All of this is dancing around the real question - should anyone just be free to claim “Fair Use!” and ingest whatever data, content, or code they can scrape from the internet to feed Yet Another “AI” (Artificial Intelligence) or “ML” (Machine Learning) training model?
Heather tries to build a case for that in her blog entry. Interestingly, she avoids one very important term: Opt-In. She does talk about Opt-Out, but not about the, in my humble opinion, far more important Opt-In approach.
I took it as an opportunity to make a point. In the most ironic way.
I asked ChatGPT to write some legal language that prohibits the use of any of my content as input for such a training model without my explicit consent. And ChatGPT happily delivered, unaware of the irony:
You are prohibited from using any data or content from this website for training of or developing any artificial intelligence or machine learning model without obtaining explicit consent from the copyright owner. Any unauthorized use of data or content for AI/ML purposes will be considered a violation of copyright laws and may result in legal action.
So know now, dear human or machine reading my ramblings, you need my explicit consent (opt-in) to feed it into the hype machine called AI/ML :)
(This clause is obviously not really enforceable, but at least it sends a message. Feel free to copy and use it wherever you want - except in an AI/ML model, obviously. I propose to use the #NoAIML hashtag :)
They also tell us here that their crawlers use the 126.96.36.199/28 IPv4 address range, in case you want to add a block rule to your firewall.
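If you would rather check this at the application level than in the firewall, here is a minimal sketch in Python using the standard `ipaddress` module. The function name `is_blocked` and the constant `BLOCKED_RANGE` are my own invention for illustration. The range is simply the one quoted above, so verify it against the crawler operator's own documentation before relying on it.

```python
import ipaddress

# The range as quoted in this post. strict=False lets Python accept an
# address with host bits set and normalise it to the enclosing /28 network.
BLOCKED_RANGE = ipaddress.ip_network("126.96.36.199/28", strict=False)


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside the blocked range."""
    return ipaddress.ip_address(client_ip) in BLOCKED_RANGE


if __name__ == "__main__":
    for candidate in ("126.96.36.200", "126.96.36.208"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

A /28 covers only 16 addresses, so a check like this is cheap enough to run on every request.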
Oh, and if you want to know which IPv4 and IPv6 ranges Google uses for their various bots and crawlers — here is the current list as JSON.
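A list like that is easy to turn into a lookup. The sketch below assumes the JSON has a top-level `prefixes` array whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. That is how I understand Google's published files to be laid out, but treat the key names as an assumption and check the actual file. The sample data uses reserved documentation ranges, not real Google addresses, and the helper names are hypothetical.

```python
import ipaddress
import json

# Stand-in for the downloaded file; the prefixes are documentation ranges,
# not real crawler addresses.
SAMPLE_JSON = """
{
  "creationTime": "2023-01-01T00:00:00.000000",
  "prefixes": [
    {"ipv6Prefix": "2001:db8::/32"},
    {"ipv4Prefix": "192.0.2.0/24"}
  ]
}
"""


def load_prefixes(text):
    """Parse the JSON text and return a list of ip_network objects."""
    data = json.loads(text)
    networks = []
    for entry in data.get("prefixes", []):
        # Each entry is assumed to hold exactly one of the two keys.
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix, strict=False))
    return networks


def matches_any(address, networks):
    """Return True if address lies in any network of the same IP version."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in networks)


if __name__ == "__main__":
    crawler_networks = load_prefixes(SAMPLE_JSON)
    for candidate in ("192.0.2.55", "2001:db8::1", "198.51.100.7"):
        print(candidate, matches_any(candidate, crawler_networks))
```

To use it for real, you would replace `SAMPLE_JSON` with the contents of the downloaded file and refresh it regularly, since such lists change over time.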