A recent Business Insider report described how big tech companies bar others from using the output of their Artificial Intelligence (hereinafter AI) systems to train other AI models. The hypocrisy in this restriction is that those models were themselves trained on data available on various websites across the internet. In this scenario, two questions arise for generative AI: 1) To what extent does copyright protect content present on websites? 2) Is copyright applicable to content generated using large language models or generative AI? For this discussion, we will examine these two questions in the context of Indian law, assuming that both the generative AI entity and the website owner come under Indian jurisdiction.
Applicability of copyright protection during AI model training
Various Indian courts have categorically held that it is not necessary to register an artistic or literary work in order to protect it or to create an intellectual property right such as copyright in it. Design, content, and images on websites are copyrighted the moment such content goes live, subject to the scheme and limitations of the law operating at that time, and the copyright in such work is held by the website owner unless a contrary position exists. The Copyright Act, 1957 (hereinafter Act) governs the protection of artistic and literary work. As per the Act, copyright protects the right of copyright holders to reproduce the work, to translate the work, and to make adaptations of the work. Any person or organization acting in a manner that violates any of these rights is said to be infringing copyright. The Act recognizes some exceptions that do not amount to copyright infringement. Section 52 outlines exceptions such as fair dealing, private use, research, and the reporting of current affairs. There are many other scenarios, but for present purposes the exceptions listed above are enough.
‘Fair dealing’ has not been explicitly defined in the Act. In Civic Chandran v Ammini Amma, the court dealt with fair dealing and held that it must be determined in light of 1) the quantum and value of the matter taken, 2) the purpose for which the content was taken, and 3) the likelihood of competition between the two works. Super Cassettes v Chintamani Rao lists further parameters of fair dealing. Paraphrasing from the judgment: the defense of fair dealing depends on the length of the content used and the length of the comment made upon such content; freedom of expression as well as public interest shall be considered; the mere fact that a work is used for commercial purposes does not make the use unfair; and transformative use can be deemed fair. When applying these parameters, the standard of a fair or honest person shall be used, and public need shall not be treated the same as public interest.
The data used to train generative AI largely falls within the categories of literary and artistic work: written content, computer programs, photographs, digital images, and so on. Given how current generative AI works, the data on which a model is trained is not directly reproduced, wholly or partially; instead, it is paraphrased. Even in the case of translation, the translation is either built on such paraphrased content or made from content that the user has provided. Generative AI output can nevertheless be brought under the definition of adaptation, which is defined as any abridgment or version in which a story or action is wholly conveyed, or any rearrangement or alteration of such work.
Pinpointing the exact source of the data from which a given output is generated is difficult because of the “black box” nature of AI systems, and the difficulty increases further because multiple inputs are fed into this “black box”. Generally, such inputs are of two kinds: first, input provided by the user, and second, input obtained through the internet. The second kind can theoretically encompass the entire internet, but in practice it is drawn from an ocean of websites to which numerous pages are added and removed every day. A large language model, which is the heart of generative AI, is trained on this human-generated data to answer any query posed to it in much the same way a human being would. With continuous training and upgrading of such systems, the generated content will contain very little plagiarised material in the case of literary work, or will be significantly distinct from the artistic work from which it drew inspiration.
From the scheme and objective of the Act, it is clear that merely collecting data does not amount to copyright infringement. It is the communication and publication of such data in the forms specified under the Act that would constitute actual infringement. In the case of generative AI, it becomes very difficult to distinguish adaptation, which could lead to infringement, from transformation, which can be a defense based on fair dealing. The Information Technology Act, 2000 (hereinafter IT Act) treats web scraping, that is, extracting data without permission, as an offense and provides compensation to the website owner. Further, the IT Act provides for punishment of up to 3 years of imprisonment and a fine of up to 5 lakh rupees for offenses under section 43. If website owners believe that their website was used for training generative AI, the IT Act would give them a more definite remedy than the Copyright Act.
The extent of copyright protection for AI-generated content
The next question to ponder is what protection the output of an AI system enjoys. This is a tricky subject. The most important objective of copyright is to protect human effort and the intellectual property generated from it, so that people benefit from their hard work, intellect, and sacrifices, and no one robs them of that benefit. Before diving into the legal provisions, the first question that comes to mind is: who is the author of AI-generated content? Can the AI system itself be called an author in general terms? Is the group of people who created the AI system deemed to be the author? As a computer system cannot be said to be a juristic person, the answer to the second question is negative. To answer the remaining questions, we refer to the definition of “author” in the Act, which reads: “Author means, in relation to any literary, dramatic, musical or artistic work which is computer-generated, the person who causes the work to be created.” When this definition is stretched to generative AI, two types of person can be said to have caused the work to be created: first, the person or group of persons who created the AI system, and second, the person who queries the AI system. It is reasonable to deem the first type of person an author. The second type is included as a possibility only because there is no definite authority on this question: when a person queries a generative AI system, they cause content to be generated, and to some extent how that content is generated is determined by the user’s input history. As per the Act, the author will be the first owner of the copyright. In the case of the first type of person, there are multiple people, and generally such copyright will lie with the entity that developed the AI system.
Another question that arises is whether AI-generated content satisfies all the parameters of the Act. As per Section 13 (1), copyright is available to original artistic or literary work. In EBC v Modak, the “minimum amount of creativity” test was preferred over the “sweat of brow” test for determining the originality of a work. But both tests are based on the creativity and hard work of intelligent human beings. As per the Act, the only requirement for literary and artistic work is “originality”. As per multiple court decisions, originality does not necessarily mean novelty.
As the Act presently stands, generative AI content can claim protection under copyright. But the whole scheme and objective of the Act is to protect the hard work of human beings. The tests widely accepted in the Indian legal system for determining originality are also built around the creativity of intelligent human beings. Including AI-generated content under the Act would exceed the limits envisioned when the Act was framed. At the same time, refusing any protection to AI-generated content could trample on the rights of the AI system’s creators.
When considering generative AI, there are two sides on which copyright can be applied: the input side, against the AI system creators, and the output side, in favor of the AI system creators. Applying the Act on the input side is complex, and actual infringement is difficult to establish. The IT Act, on the other hand, can provide concrete remedies against AI creators. On the output side, AI system creators can claim protection under the Act if the current scheme is interpreted literally. But when we consider the objective of the Act, content generated by AI should not be eligible for the same level of protection as content generated by a human being. The Act needs to be interpreted so as to protect AI system creators while ensuring that a computer system is not placed on the same level as a human being.
Alistair Barr, “AI hypocrisy: OpenAI, Google and Anthropic won’t let their data be used to train other AI models, but they use everyone else’s content”, Business Insider, June 3, 2023
The Copyright Act, 1957 (Act 14 of 1957), Section 52
Civic Chandran and Ors. v. C. Ammini Amma and Ors., 1996 SCCOnLine Ker 63
Super Cassettes Industries Limited v. Mr. Chintamani Rao & Others, (2012) 49 PTC 1
The Information Technology Act, 2000 (Act 21 of 2000), Section 43
The Information Technology Act, 2000 (Act 21 of 2000), Section 66
The Copyright Act, 1957 (Act 14 of 1957), Section 2 Clause (d)
The Copyright Act, 1957 (Act 14 of 1957), Section 17
Eastern Book Company and Others v. D. B. Modak and Another, (2008) 1 MLJ 361 (S.C.)
Viraj Thakur is currently pursuing an LLB at Law Center-1, Faculty of Law, University of Delhi. Viraj holds a B.E. (Electronics and Telecommunication) from RAIT, Nerul. Prior to joining law school, he gained 3.5 years of experience in intellectual property as a Patent Analyst at one of the leading patent service provider companies. He is passionate about Intellectual Property law, Cyber law, Technology law, and Law and Policy in Emerging Technology. His research work demonstrates attention to detail, analytical rigor, and innovative thinking. He is excited to start his legal career and make a positive impact.