
AI and Copyright:
Safeguarding Creativity in an AI-Driven World
**Lavanaya Jain
“The monkeys will continue typing away until every work of Shakespeare is randomly created.”
– Jesse Anderson
INTRODUCTION
In the 21st century, data is the new oil. The advent of large language models (LLMs) has disrupted the sphere of copyright by tacitly encroaching on data owners’ copyrighted material. The constant siphoning of data by these advanced chatbots has blurred the boundaries of copyright protection. These artificial intelligence (AI) tools owe their sophisticated services to large datasets that they feed on without the consent of the authors. Although these tools have access to copyrighted material, the AI does not read, understand, or enjoy the data as a human would (much like plagiarism detectors), nor is the data made available to the general public. The dilemma prompts us to ask whether the painter made the painting with his own brush. Is AI trying to outpace human imagination by bringing ‘cheap creativity’ into the race? The defence raised by proponents of AI is the concept of ‘fair use’, which could prove dangerous if AI is given blanket immunity under its veil.
Furthermore, given that machine learning puts human authors’ livelihoods in jeopardy, it seems unfair to train AI on their data without compensating the owners for their labour. This article explores whether India’s infant copyright regime can accommodate the novel copyright issues raised by AI. It also analyses the controversial debate around the ‘idea–expression’ dichotomy, which will largely decide the fate of AI in India.
UNDERSTANDING HOW AI LEARNS AND CREATES
LLMs operate like parasites, feeding on copyrighted material created by humans through Text and Data Mining (TDM). TDM is the process of feeding LLMs colossal amounts of data for training, loosely mirroring how a human mind learns. This training process requires the LLM to create digital copies of the data, and the quality of the results depends on the quality of the content fed through TDM. However, most AI systems merely store uncopyrightable mathematical representations abstracted from the training data and hold the actual training data in memory only temporarily. Further, they generate new phrases, pictures, and other content by deducing the underlying patterns, relationships, and structures in the data. This practice is sheltered under the catch-all defence of ‘fair use’. For developers to capitalise on the ‘fair use’ defence, they have to contend with the “non-expressive use” doctrine. Under this doctrine, the copied data must be used for purely non-expressive purposes and must not communicate the expressive aspects of the work if the use is to be considered fair. In India, this doctrine was recognised by the Delhi High Court in The Press of the University of Cambridge v. B.D. Bhandari. The Court held that copyright protection applies to the creative expression of ideas, and not to the idea itself. As such, any party may reproduce a work if the reproduced content is transformative in nature. Therefore, developers of AI may avail themselves of the ‘fair use’ exception, as long as the results produced by the AI are not similar to the copied content and do not have a substitutive effect.
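The idea that a model retains abstracted patterns rather than the works themselves can be illustrated with a deliberately simplified sketch. The Python code below is a toy, not how any production LLM works: it turns a sentence into integer token IDs and records which token tends to follow which. Real LLMs use subword tokenizers and neural networks with billions of parameters; the function names (tokenize, bigram_counts, predict_next) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

def tokenize(text, vocab):
    """Toy tokenizer: map each lower-cased word to an integer ID,
    adding unseen words to the vocabulary as they appear."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids

def bigram_counts(ids):
    """Record how often each token ID is followed by each other token ID.
    These counts, not the sentence itself, are what this toy 'model' keeps."""
    counts = defaultdict(Counter)
    for current, nxt in zip(ids, ids[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, vocab, word):
    """Return the word most often seen after `word`, or None if unknown."""
    inverse = {i: w for w, i in vocab.items()}
    wid = vocab.get(word)
    if wid is None or not counts[wid]:
        return None
    best_id, _ = counts[wid].most_common(1)[0]
    return inverse[best_id]

vocab = {}
ids = tokenize("to be or not to be", vocab)
counts = bigram_counts(ids)
print(ids)                                # [0, 1, 2, 3, 0, 1]
print(predict_next(counts, vocab, "to"))  # be
```

Even in this toy, repeatedly choosing the most likely next word starting from “to” walks back through the training sentence, which hints at how storing only statistics can still give rise to the memorization concern raised in the next paragraph.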
Even with these copyright protections, machine learning poses a constant threat to artists and others working in creative fields, because models can memorize data and generate art in the style of a particular artist. For instance, a controversy was sparked when a game designer won an art competition by submitting an image created by an AI text-to-image generator. The event led people to ask: is the digital manipulation of art the demise of artistry? Does this imply that Sylvia Plath was a great poet only because there was no AI that could outwrite her by consuming her data?
The rapid advancement of AI is a double-edged sword, since it has made the creative sector highly dependent on it. For instance, The Lifestyle of the Richard and Family is a dramatic work created by Roslyn Helper with the help of an unsung hero: text prediction AI. The application runs on an algorithm similar to the simple autocorrect found on smartphones. As Helper describes it, she merely provided the skeleton of the work, and the AI, learning from her language patterns, filled in all the immaculate details. Numerous such instances reflect the innovative use of AI to generate creativity. This compels us to acknowledge a new reality: curbing the use of sophisticated AI may deter creators themselves.
MACHINE LEARNING THROUGH THE LENS OF INDIAN LEGISLATION
Intellectual property law in India recognizes that copyright protection extends only to the creative manifestation of ideas, and not to the ideas themselves. This principle is in line with Article 19(1)(a) of the Constitution of India and requires that public access to ideas not be restricted. However, the Copyright Act, 1957 does not explicitly address whether TDM for training AI constitutes fair dealing or transformative use of the original work.
This position of law has been heavily debated in India. In R.G. Anand v. Delux Films, the Supreme Court of India held that ideas are not protected by copyright; as long as the subsequent work is entirely new, there is no infringement of the author’s copyright. However, in Anil Gupta v. Kunal Dasgupta, the Delhi High Court held that ideas can be protected if a substantial amount of labour has gone into formulating them. Thus, the Copyright Act, 1957 is of limited help in addressing the challenges posed by AI, although it can pave the way for new, comprehensive legislation.
Having said that, US law appears to overlook the commercial implications that fair dealing may have for such use of a work. Applied together, the two approaches appear to have a levelling effect, at least for Indian legislation.
TACKLING COPYRIGHT CONCERNS IN AI
Although it is important to protect the intellectual property of authors and creators, it may be detrimental to insist on consent for every use of their property. Doing so puts researchers and developers at the mercy of right holders, who may act in a rent-seeking manner by demanding licence fees. With the evolution of AI, authors can limit the use of their data by AI through mechanisms such as the robots.txt file, which tells web crawlers, including those gathering AI training data, which parts of a website they may access. However, this solution leaves AI to train only on data available in the public domain, diminishing the quality of the results produced. On the other hand, an approach in which the law gives AI a free hand to process all data would defeat the intent of the Copyright Act, 1957, which is to protect and encourage creativity.
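The robots.txt mechanism mentioned above is simply a plain-text file at the root of a website that follows the Robots Exclusion Protocol, and compliance is voluntary on the crawler’s part. The sketch below uses Python’s standard-library urllib.robotparser to show how a compliant crawler would read a hypothetical file that blocks a crawler identifying itself as GPTBot (the user-agent name OpenAI has published for its crawler) while allowing all others. The file contents and the example.com URL are illustrative assumptions, and crawler names should be checked against each operator’s own documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an author's website: it asks the crawler
# identifying as "GPTBot" to stay away from the whole site, while
# allowing every other crawler ("*") to access everything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network access

page = "https://example.com/poems/ode.html"
print(parser.can_fetch("GPTBot", page))         # False
print(parser.can_fetch("SomeSearchBot", page))  # True
```

The check only works if the crawler chooses to run it: a crawler that ignores the file is not technically prevented from fetching the page, which is one reason robots.txt is an imperfect safeguard for authors.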
Since this is a developing legal issue, it is crucial for India to take a cue from international law and foreign legislation. Countries such as Japan have given AI free rein to use copyrighted works for model development. The European Parliament, on the other hand, has chosen a middle ground by proposing legislation that would mandate the disclosure of copyrighted material used to train AI, in order to give data owners due recognition. It should also be noted that while the EU’s disclosure strategy may produce a more equitable outcome for authors, “the need to obtain individual authorizations” imposed by the EU adds massive transaction costs for digital industries. US jurisprudence on the issue revolves around the contentious doctrine of “fair use”. The defence offers significant scope for the creation of new works, provided the use satisfies the four-factor test: the purpose of the use, the nature of the copyrighted work, the amount used, and the effect on the market. However, this approach remains controversial because it allows users to utilize an author’s content without prior permission, which runs contrary to the fundamental principles of copyright law.
The current intellectual property laws in India are too primitive to deal with copyright infringement by AI. In essence, these laws were never intended to address the concerns now arising from the increased use of AI and LLMs. Therefore, new legislation on AI, building on the position of law in the EU, is needed to act as an omnibus statute that deals with current issues and also operates ex ante. Nonetheless, such a statute cannot be treated as a panacea, and before enacting it the authorities must do the groundwork of answering a few key questions. The fair dealing provision under Section 52(1)(a) of the Indian Copyright Act, 1957, is still in its early stages and requires further refinement through judicial decisions. For instance, in Civic Chandran v. Ammini Amma, the Kerala High Court made a significant effort to broaden the scope of fair dealing in India by establishing certain guiding principles. The primary step should be to bring full transparency to the process of machine learning. In addition to giving data owners their due, such a step would open up the possibility of royalties for these owners. Any financial incentive correlated with the revenue generated would give data owners a persuasive reason to provide access to their data. Legislators will have to tread carefully, because the rules must address data owners’ concerns without impeding AI development, ensuring that neither side of the issue is compromised. That is to say, the best course of action at this time is to create a legal framework that fosters AI development while maintaining certain restrictions in the sphere of creativity.
CONCLUSION
For India to navigate the complexities of copyright in AI, it is of prime importance to settle whether training AI through TDM falls within the purview of ‘fair use’ and is protected by an exemption. Despite its significance, the non-expressive use doctrine may not always fully resolve AI conflicts involving copyright infringement. A dogmatic conviction that literary property encompasses every possible use of an author’s work is unhelpful when it comes to novel concerns at the edge of copyright law. The development of the fair use doctrine, unpredictable as it is, is the main means by which copyright law adapts to possible market imperfections.
These concerns require efforts by all stakeholders so that digitalization in India unfolds in an equitable and inclusive manner. A policy that promotes the maintenance of open-source resources, so that AI models can use them and improve, should be enforced. Furthermore, facilitating widespread data access may advance AI technology without infringing the rights of copyright owners. A legal framework should be envisioned that preserves original work and improves transparency without threatening the advancement of AI.
**Lavanaya Jain is a fourth-year law student at the Army Institute of Law, Mohali, with a keen interest in corporate law, competition law, and intellectual property rights. Passionate about understanding the legal frameworks governing corporate operations, market dynamics, and intellectual property protection, she has actively engaged in academic projects and internships to develop practical expertise. In addition to her academic pursuits, she enjoys writing about contemporary developments in law, exploring emerging legal trends, and their societal and practical impacts. She is an avid participant in moot court competitions, where she has honed her research, advocacy, and analytical skills.
Disclaimer: The views expressed in this blog do not necessarily align with the views of the Vidhi Centre for Legal Policy.