Pixels vs. plagiarism: Navigating copyright infringement in the age of AI

August 15, 2023


You’ve probably heard about authors filing lawsuits against OpenAI, the company behind ChatGPT. Beyond writers, artists and comedians are also resorting to legal action against top artificial intelligence companies.

It is fair to say that the AI industry has found itself entangled in a copyright puzzle, and the ongoing legal clash between human creators and machine learning is only going to heat up. If you’re wondering what these lawsuits are all about and whether AI is actually stealing ideas from authors and artists across the world, you’ve come to the right place. This blog takes a deep dive into the rapidly growing world of generative AI and discusses some of the most significant copyright cases the industry faces.

What is generative AI?

Did you know that some experts estimate that as much as 90% of online content could be AI-generated by 2026?

While this figure may seem dramatic at first glance, it shouldn’t come as a surprise considering how generative AI has completely revolutionized the way we create content.

For those unfamiliar with the term, generative AI is a technology that creates new content, like text or images, by learning patterns from existing datasets. It relies on neural networks, a type of machine learning model loosely inspired by the human brain, to process input data and generate coherent output. By recognizing and replicating patterns, generative AI can produce diverse outputs, including but not limited to text, images, and music. This output can closely resemble the data the model was trained on.

Today, AI tools such as ChatGPT and DALL-E are easily accessible to online users in various parts of the world. This accessibility has opened the door to rapid and personalized text generation, image creation, and innovative storytelling, all while freeing up valuable time for creators and businesses.

However, as this technology advances, it has blurred the lines between what is truly authentic and what is a mere imitation.

Intellectual property and copyright infringement

Intellectual property usually refers to human inventions, literary or artistic works, and symbols, designs, or names used in commerce. It has three main types: copyrights, trademarks, and patents. Copyrights protect creative works like books, music, and movies. Meanwhile, trademarks safeguard brand identifiers and patents grant exclusive rights to inventions.

Copyright infringement occurs when someone uses, reproduces, or distributes copyrighted material without permission. The unauthorized use of protected content not only violates the exclusive rights of the creator but can also result in legal consequences, including fines and other penalties, since copyright infringement is illegal in most parts of the world.

Is AI stealing your intellectual property?

Here comes the big question: should you be wary of artificial intelligence stealing your intellectual property?

Well, the answer can’t be a simple yes or no.

The thing is, AI learns from massive amounts of data, like books, articles, and other writing available on the internet, which allows it to mimic human-like creativity and produce diverse content. Companies like OpenAI and Stability AI say that they’re protected by something called “fair use.” This is a doctrine in US law that allows limited use of copyrighted material without asking for permission from the owner. However, whether they can use this defense successfully depends in large part on whether the use is considered “transformative,” meaning the content generated by the AI is meaningfully different from the originals.

Nevertheless, to answer the primary question, AI itself doesn’t steal intellectual property. However, there are concerns that AI can be misused to assist in the unauthorized copying or distribution of copyrighted materials. Additionally, AI algorithms can generate content that may infringe upon intellectual property rights, which is exactly what the AI copyright infringement lawsuits below allege.

Copyright lawsuits against AI companies

Here are some of the most significant AI copyright lawsuits that you should know about.

The GitHub Copilot case

In November 2022, a programmer filed a class action lawsuit against tech giant Microsoft, its subsidiary GitHub, and OpenAI. The suit claimed that Copilot, an advanced code-generating AI system unveiled by GitHub in 2021, was trained on billions of lines of copyrighted code without permission.

@github copilot, with "public code" blocked, emits large chunks of my copyrighted code, with no attribution, no LGPL license. For example, the simple prompt "sparse matrix transpose, cs_" produces my cs_transpose in CSparse. My code on left, github on right. Not OK. pic.twitter.com/sqpOThi8nf
— Tim Davis (@DocSparse) October 16, 2022

The Midjourney, Stability AI, and DeviantArt case

In January 2023, three artists named Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action against Stability AI, Midjourney, and DeviantArt. All three companies have created AI art tools that, according to the lawsuit, infringed the rights of “millions of artists” by training on approximately five billion images that were “scraped” from the internet “without the consent of the original artists.”

1/ As I learned more about how the deeply exploitative AI media models practices I realized there was no legal precedent to set this right. Let’s change that.

Read more about our class action lawsuit, including how to contact the firm here: https://t.co/yvX4YZMfrG
— Karla Ortiz (@kortizart) January 15, 2023

This is undoubtedly one of the most critical AI and intellectual property lawsuits at the moment.

Stability AI vs Getty Images

Stability AI, the company behind Stable Diffusion, has also found itself in the legal crosshairs of Getty Images, one of the biggest suppliers of stock images and videos in the world. The stock image company has sued the AI organization for reportedly using millions of images from its site without permission to train its AI art-generating tool.

Tremblay v. OpenAI Inc

ChatGPT may make the prospect of writing a book a lot less daunting than before, but aspiring writers should be aware of the lawsuit filed against OpenAI by two best-selling authors: Paul Tremblay and Mona Awad.

Tremblay is a science fiction and horror author, while Mona Awad writes novels. The two have filed a lawsuit against the AI company claiming that its chatbot copied and used their materials without their consent. The class action also alleges that ChatGPT can present an alarmingly accurate summary of the authors’ respective works.

The primary problem here is that the plaintiffs believe the books were “copied by OpenAI and ingested by the underlying OpenAI Language Model” without permission. They also cited a 2020 paper from OpenAI stating that about 15% of the training dataset for GPT-3 came from “two internet-based books corpora,” one of which, according to the authors, comes from shadow libraries that use peer-to-peer file transfers to illegally distribute and publish thousands of copyrighted works.

Sarah Silverman vs OpenAI and Meta

Popular comedian and author Sarah Silverman has also joined the race to potentially transform the artificial intelligence industry by suing OpenAI and Meta, the parent company of Facebook, for copyright infringement. The entertainer has filed the lawsuit along with authors Christopher Golden and Richard Kadrey.

The trio of plaintiffs alleges that OpenAI and Meta trained ChatGPT and LLaMA, respectively, on datasets containing materials from illegal websites such as Bibliotik, Library Genesis, and Z-Library. These websites are called shadow libraries because they allow individuals to download and share books in bulk via torrents.

What do AI copyright lawsuits mean for the future of the industry?

There is no denying that the ongoing AI copyright lawsuits hold massive implications for the future of the overall industry.

That being said, it is also crucial to understand that these legal conflicts revolve around one big question: Can AI technologies, including software and tools, freely use copyrighted material without obtaining permission? These cases are still in the early stages, but their results could have far-reaching consequences, potentially influencing how companies develop and train their AI systems in the coming years.

If the courts place limitations on the usage of copyrighted data, it may hinder the innovation and progress of AI systems, particularly those relying on large datasets for training. Meanwhile, if the courts establish more permissive guidelines, it could pave the way for AI developers to incorporate copyrighted material into their creations, potentially expediting the development of new applications and capabilities.

Nevertheless, one can’t ignore that these AI copyright infringement suits signal a broader shift in how artificial intelligence companies deal with existing intellectual property laws.

To sum it up

The explosive popularity of ChatGPT and similar tools has sparked a much-needed discussion about AI and intellectual property. With AI-generated art resembling famous paintings and AI-written articles mirroring the work of established journalists, the line between inspiration and infringement has become quite blurred. As a result, creators may lose control over their work, which challenges traditional notions of ownership.

However, it is also important to remember that AI helps researchers, artists, writers, and musicians enhance their creativity and generate new ideas. Therefore, experts need to figure out how to encourage creativity while protecting the rights of the original creators.

On a side note, if you are a fan of ChatGPT but have concerns about its data privacy measures, please feel free to check out PureAI – your ultimate solution to an anonymous and secure AI experience. The tool is exclusively available through the Member Area on PureVPN, so make sure to sign up and log in to use it.

Furthermore, stay connected to PureVPN Blog to learn more about artificial intelligence and AI copyright infringement.
