It was a setting I found myself in. You see, there is nothing wrong with bashing Microsoft. The question at times is how long until the bashing is no longer a civic duty, but a personal pleasure. As such I started reading the article (at https://www.cbc.ca/news/business/new-york-times-openai-lawsuit-copyright-1.70697010) headlined ‘New York Times sues OpenAI, Microsoft for copyright infringement’, and it is there that we are given a few parts. The first that caught my eye was ““Defendants seek to free-ride on the Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” according to the complaint filed Wednesday in Manhattan Federal Court.”

The reason I am (to some extent) siding with Microsoft is that a newspaper only has value until it is printed. At that point it becomes public domain. Now, the paper has a case when you consider the situation that someone is copying THEIR result for personal gain. Yet that is not the case here. They are teaching a machine learning model to create new work. Consider that this is not an easy part. First the machine needs to learn ALL the articles that a certain writer has written. So not all the articles of the New York Times as one pile, but separately the articles from every writer. Now we could (operative word) get to a setting where something alike is created on new properties, events that are happening now. That is no longer a copy; that is an original article created in the style of a certain writer.
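To make that per-writer idea concrete, here is a minimal Python sketch of what such a preparation step could look like: grouping a corpus by byline so each writer gets their own training set. The Article fields and the build_author_corpora helper are my own illustrative assumptions, not anything taken from the actual OpenAI pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Article:
    author: str   # the byline on the piece
    title: str
    body: str

def build_author_corpora(articles):
    """Group articles by byline so a style model could be trained
    (or fine-tuned) per writer, rather than on the newspaper as
    one undifferentiated pile."""
    corpora = defaultdict(list)
    for art in articles:
        corpora[art.author].append(art.body)
    return corpora

# Hypothetical usage: one corpus per byline.
articles = [
    Article("A. Writer", "Piece one", "Text of piece one..."),
    Article("A. Writer", "Piece two", "Text of piece two..."),
    Article("B. Writer", "Other piece", "Text of the other piece..."),
]
for author, texts in build_author_corpora(articles).items():
    print(author, "->", len(texts), "articles")
```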
As such we get the delusional statement from the New York Times: “The Times is not seeking a specific amount of damages, but said it believes OpenAI and Microsoft have caused “billions of dollars” in damages by illegally copying and using its works.” Delusional for valuing the damages at billions of dollars whilst their revenue was a lot less than a billion dollars. Then there is the other setting: is learning from the public domain a crime? Even if it includes the articles of tomorrow, is it a crime then? You see, the law is not ready for machine learning algorithms. It isn’t even ready for the concept of machine learning at present.
Now, this doesn’t apply to everything. Newspapers are the vocalisations of fact (or at least they used to be). The issue of skating towards design patents is a whole other mess.
As such OpenAI and Microsoft are facing an uphill battle, yet in the case of the New York Times, and perhaps the Washington Post and the Guardian, I am not so sure. You see, as I see it, it hangs on one simple setting: is a published newspaper to be regarded as public domain? The paper is owned, and as such these articles cannot be resold, but there is the grinding cog: the articles were never used as such. They were used to teach a learning model to create new original work, and that is a setting newspapers were never ready for. None of these media laws will give coverage in that setting. This is probably why the NY Times is crying foul by the billions.
The law in these settings is complex, but overall, as this is a learning model, I do not believe the NY Times has a case. And I could be wrong. My setting is that published articles become public domain to some degree. At worst OpenAI (and Microsoft too) would need to own one copy of every newspaper used, but that is as far as I can go.
The danger here is not merely that this is done; it is that the material is “often taken from the internet”, and this becomes an exercise in ‘trust but verify’. There is so much fake and edited material on the internet. One slip-up and the machine learning routines fail. So we see not merely the writer; we see writer, publication, time of release, path of release, connected issues and connected articles, and all these elements can hurt the machine learning algorithm. One slip-up and it is back to the drawing board, often teaching the system from scratch.
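To make ‘trust but verify’ concrete, here is a minimal Python sketch of a provenance check that drops records with missing or malformed metadata before they ever reach training. The field names and the rules are my own assumptions, not any real pipeline’s schema.

```python
from datetime import datetime

# Hypothetical provenance record for one scraped article; these
# field names are illustrative assumptions, not a real schema.
REQUIRED = ("writer", "publication", "released", "source_url")

def verify(record):
    """'Trust but verify': reject records whose provenance is
    incomplete or inconsistent, since one bad record can poison
    the training set and force a retrain."""
    if any(not record.get(k) for k in REQUIRED):
        return False  # missing writer, publication, date or path
    try:
        datetime.fromisoformat(record["released"])
    except ValueError:
        return False  # malformed time of release
    return record["source_url"].startswith("https://")

records = [
    {"writer": "A. Writer", "publication": "NYT",
     "released": "2023-12-27", "source_url": "https://nytimes.com/..."},
    {"writer": "", "publication": "NYT",
     "released": "not-a-date", "source_url": "http://mirror.example"},
]
clean = [r for r in records if verify(r)]
print(len(clean), "of", len(records), "records pass verification")
```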
And all that is before we consider that editors also change stories and adjust for length; as such it is a slightly bigger mess than you might consider at the start. To see that we need to return to June this year, when we were given “The FTC is demanding documents from Open AI, ChatGPT’s creator, about data security and whether its chatbot generates false information.” If we consider the impact, we need to realise that the chatbot does not generate false information; it was handed wrong and false information from the start, and the model merely did what it was given. That is the danger: operators and programmers not properly vetting information.
Almost the end of the year, enjoy.