AI Governance & Risk Management · Guide

AI and Intellectual Property Regulations: Copyright, Patents, and Trade Secrets

March 8, 2025 · 14 min read · Michael Lansdowne Hauge
For: Legal/Compliance, CTO/CIO, Consultant, CISO, IT Manager, CEO/Founder, CMO, CHRO, Head of Operations, Data Science/ML

Navigate IP law for AI systems. Understand copyright in training data and AI outputs, patentability of AI inventions, trade secret protection for models, and licensing frameworks for AI-generated content.


Key Takeaways

  1. Purely AI-generated works generally lack copyright protection; human authorship is required for protection.
  2. The legality of training AI on copyrighted works is unresolved and varies by jurisdiction and use case.
  3. AI cannot be named as an inventor; human inventors must contribute to conception for valid patents.
  4. Trade secret protection is central for safeguarding proprietary models, weights, and training data.
  5. Generative AI providers face growing transparency and IP compliance obligations, especially in the EU.
  6. AI users bear significant risk for infringing outputs and should implement review and governance controls.

The rapid proliferation of generative AI has collided with intellectual property frameworks designed for an era of exclusively human creators. The resulting legal uncertainty touches every dimension of IP law: copyright, patents, and trade secrets. Organizations deploying AI systems now face a landscape where fundamental questions remain unanswered. Can AI-generated works receive copyright protection when no human author exists? Does scraping copyrighted content for model training constitute fair use? Can an AI system be named as an inventor on a patent filing? And how does one protect proprietary AI models as trade secrets when those models are served through cloud APIs?

The stakes are enormous. The U.S. Copyright Office maintains that copyright requires human authorship and has rejected registrations for AI-generated works. Courts remain split on whether training AI on copyrighted materials qualifies as fair use, with major lawsuits against OpenAI, Stability AI, and others still unresolved. The U.S. Patent and Trademark Office refuses to recognize AI as an inventor but will grant patents for AI-implemented inventions where human inventors are identified. This guide maps the evolving IP landscape and provides practical strategies for protecting AI innovations while navigating deeply uncertain legal terrain.

Can AI-Generated Works Be Copyrighted?

The U.S. Copyright Office issued guidance in 2023 establishing a clear position: purely AI-generated works cannot receive copyright protection. The reasoning flows from a long-standing doctrinal requirement that copyright applies only to "original works of authorship," and "authorship" has been interpreted to require a human creator since the Supreme Court's 1884 decision in Burrow-Giles. Just as works produced by nature (the well-known monkey selfie case among them) lack copyright eligibility, works produced by a non-human AI system fall outside the statute's protective scope.

Several high-profile registration decisions have crystallized this principle. In the Zarya of the Dawn case (2023), the Copyright Office initially granted copyright for a graphic novel containing Midjourney-generated images, then rescinded protection for the AI-generated images themselves while preserving copyright only for the human-authored text and the author's creative arrangement. The AI-generated artwork Théâtre D'opéra Spatial, also created with Midjourney, was denied copyright registration in 2023. And A Recent Entrance to Paradise, a text generated with GPT-3, was denied registration in 2022.

The picture becomes more nuanced when humans collaborate meaningfully with AI tools. The Copyright Office recognizes copyright protection when a human exercises creative control over AI output and the human contribution is more than de minimis. Writing detailed prompts, selecting and arranging AI outputs, and substantively editing results can produce copyrightable work. An author who writes prompts, curates a thousand AI images, and arranges them into a graphic novel may hold copyright in the selection, arrangement, and human-authored text. By contrast, typing a simple prompt such as "create image of sunset over ocean" and accepting the first output produced by Midjourney is insufficient to establish copyright in the resulting image.

AI Training on Copyrighted Works: Fair Use?

At the center of the most consequential IP disputes in a generation lies a deceptively simple question. AI models are trained on billions of copyrighted works (books, articles, images, code) scraped from the internet without permission. Is this copyright infringement, or does it qualify as fair use?

The answer will emerge from a series of landmark lawsuits now working through the courts. In Authors Guild v. OpenAI, authors allege that GPT was trained on their books without license. Getty Images v. Stability AI claims that Stable Diffusion was trained on Getty's stock photo library without permission. The New York Times v. OpenAI and Microsoft argues that training on news articles infringes copyright and undermines licensing markets. A class action against GitHub Copilot alleges that training on open-source code violates license terms and constitutes infringement.

The plaintiffs' arguments center on four claims. First, training involves copying entire works into the training corpus, violating the reproduction right. Second, the trained model is allegedly a derivative work of the training corpus. Third, AI substitutes for human creators, depressing demand for original works and undermining licensing markets. Fourth, AI merely remixes training data without adding new meaning or purpose, making the use non-transformative.

Defendants counter on each point. They argue that the model learns statistical patterns rather than storing or reproducing specific works, making the use fundamentally transformative. They characterize training as intermediate copying for analysis, analogous to the Google Books scanning project and search engine indexing. They contend that AI outputs do not replace specific works but serve a different market and function. And they invoke the public benefit of advancing science, innovation, and access to information.

The fair use analysis under 17 USC Section 107 involves four factors, each presenting unresolved tensions. On the first factor (purpose and character of use), the commercial nature of AI companies' operations weighs against fair use, while the potentially transformative nature of pattern learning weighs in favor. Courts have not yet aligned on whether training is "transformative" in the legal sense. The second factor (nature of the copyrighted work) tilts more favorably when factual works like news and research are involved, compared to highly creative works like novels and art. Training corpora typically contain both. The third factor (amount used) weighs against fair use because training uses entire works, though defendants argue this is necessary for the transformative purpose of pattern learning. The fourth factor (effect on the market) depends heavily on whether outputs compete with or closely mimic specific works. Where outputs substitute for originals, this weighs against fair use. Where they serve distinct purposes, it supports fair use. The emergence of AI training licenses adds another dimension: as rightsholders develop these licensing markets, ignoring them may itself weigh against a fair use finding.

Courts may ultimately distinguish between factual and creative works, and between research and commercial applications. Regardless of doctrinal outcomes, significant settlement pressure may push AI companies toward broad licensing agreements.

The European Union has taken a different approach. The EU Copyright Directive creates a specific text and data mining (TDM) exception. Article 3 permits TDM for scientific research purposes. Article 4 allows commercial TDM unless rights holders explicitly opt out. This opt-out framework creates a fundamentally different landscape from the uncertain fair use analysis playing out in U.S. courts.

Even where the legality of training remains contested, a separate infringement risk arises from AI outputs themselves. Under the substantial similarity test, AI output infringes if it is substantially similar to a copyrighted work. Intent is irrelevant; unintentional copying can still constitute infringement. The fact that Stable Diffusion has generated images containing visible Getty watermarks, for instance, suggests close copying of training data.

The memorization problem compounds this risk. Large models sometimes memorize and reproduce training data verbatim or near-verbatim. Researchers have documented language models reproducing passages from books in their training sets, and image models can output near-identical copies of training images when prompted in certain ways.

Liability attaches at multiple points in the chain. AI users face direct liability if they create, publish, or commercialize infringing outputs. AI providers face potential liability under contributory or vicarious infringement theories. Available defenses include de minimis (the copying is too trivial to constitute infringement), fair use (the output is transformative and does not substitute for the original), and DMCA safe harbor (limited protection for providers hosting user content, contingent on notice-and-takedown compliance).

Licensing AI Training Data

A market for licensed AI training data is rapidly taking shape, driven by AI companies seeking to mitigate copyright risk. Industry projections put this market at billions of dollars by the mid-2020s as licensing agreements with publishers, stock photo services, and content creators become standard practice.

Content licensing deals represent the most prominent model. AI companies are licensing archives from publishers, news organizations, and stock photo libraries, with deals between OpenAI and major publishers among the most visible examples. Terms are typically confidential but often involve multimillion-dollar payments along with attribution or branding requirements.

Opt-out mechanisms offer an alternative framework. The robots.txt file allows website operators to instruct crawlers not to scrape their content, and some AI companies voluntarily honor it. Services like Spawning.ai allow creators to signal that their works should not be used for training. The legal status of these mechanisms remains uncertain, with violations more likely to implicate contract or computer access laws than copyright itself.
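The robots.txt mechanism can be checked programmatically. The sketch below uses Python's standard urllib.robotparser to test whether a crawler may fetch a page; GPTBot is OpenAI's documented training-crawler name, but the policy shown is an illustrative example of an opt-out, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt the whole site out of one AI training
# crawler (GPTBot, OpenAI's documented bot name) while leaving the site
# open to all other crawlers. Bot names vary by vendor.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/articles/piece.html"
# The AI training crawler is blocked site-wide...
print(parser.can_fetch("GPTBot", url))    # False
# ...while other crawlers remain free to fetch the same page.
print(parser.can_fetch("OtherBot", url))  # True
```

As the article notes, honoring such directives is voluntary for crawlers; the check above only tells a compliant operator what the site has asked for.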

AI-friendly licenses add further nuance. Some Creative Commons licenses (such as CC BY) allow commercial AI training, while others (such as CC BY-NC-ND) restrict it. Permissive open-source licenses like MIT and Apache are generally viewed as compatible with AI training, but copyleft licenses like GPL raise unresolved derivative-work questions.

Micro-licensing platforms are experimenting with paying creators small amounts when their works are used for training. Artist-focused platforms and stock providers integrating AI tools have begun offering these arrangements, though challenges around attribution, tracking, and scalable payment infrastructure remain substantial.

AI and Patent Law

Can AI Be Named as Inventor?

The current answer across every major jurisdiction is no.

The question was tested definitively through the DABUS cases (2018 to 2023), in which Dr. Stephen Thaler filed patent applications naming his AI system "DABUS" as the sole inventor. The USPTO rejected the application, holding that "inventor" must be a natural person. The UK Intellectual Property Office and UK Supreme Court reached the same conclusion, finding that patent law contemplates only human inventors. The European Patent Office rejected the filing on the grounds that an inventor must have legal personality. In Australia, a Federal Court judge initially ruled that an AI could be named as inventor, but the Full Federal Court overturned that decision on appeal. South Africa granted a DABUS patent with minimal examination, a result that has proven uninfluential elsewhere.

The rationale is consistent across jurisdictions. Statutory language referring to "inventor," "individual," and "person" is interpreted as requiring a human being. Inventors must sign oaths and assign rights, and AI systems cannot hold or transfer legal rights. At a policy level, patent systems are designed to incentivize human innovation.

The practical consequence is significant. AI-assisted inventions must list the human inventors who contributed to the conception of the invention. Purely AI-generated inventions with no human conceptual contribution are currently unpatentable in most jurisdictions.

Patenting AI-Implemented Inventions

Where a human conceives the inventive concept and uses AI as a tool to implement or validate it, the resulting invention can be patentable, subject to the standard requirements of subject matter eligibility, novelty, non-obviousness, and utility.

AI patent applications face several distinctive challenges. Abstract idea rejections under the Alice/Mayo framework are common, as many AI claims are characterized as abstract mathematical algorithms or data processing. Applicants must demonstrate a concrete technical improvement or application, such as improved hardware performance, reduced latency, or better compression.

Enablement requirements under 35 USC Section 112 demand that the specification teach a person of ordinary skill in the art (POSITA) how to make and use the invention without undue experimentation. For AI inventions, this often means disclosing model architecture, training methodology, data characteristics, and performance metrics. Written description requirements add a further burden: applicants must demonstrate they possessed the claimed invention at the time of filing. Overly broad claims to "an AI system" without specific technical detail risk rejection.

The question of obviousness is also evolving. As AI tools become standard instruments in the inventor's toolkit, what qualifies as "obvious" to a POSITA may shift upward. Current doctrine still evaluates obviousness from a human perspective, but AI-assisted design may raise the bar over time.

Inventorship for AI-Assisted Inventions

Determining inventorship for AI-assisted inventions depends on the conception test: the inventor is the person who forms in their mind a definite and permanent idea of the complete and operative invention. When a human formulates the problem, defines the solution space, and interprets AI outputs to reach a specific inventive concept, that human qualifies as the inventor. Current law does not recognize AI as a co-inventor. Where AI contributes substantially to conception and the human contribution is minimal, there is a real risk that no valid human inventor exists, potentially rendering the invention unpatentable.

Organizations should document human contributions to conception meticulously, including problem framing, model design choices, and interpretation of results. Internal and external communications should avoid characterizing the AI as the "inventor."

Trade Secrets for AI Models

Why Trade Secret Instead of Patent?

Trade secret protection offers several compelling advantages for AI models. It requires no public disclosure of model architecture, weights, or training data. Protection can last indefinitely as long as secrecy is maintained. There is no examination or registration process, and the scope of protectable information is broad, covering data, processes, parameters, and more.

The disadvantages are equally significant. Trade secret law provides no protection against independent development or reverse engineering. Protection is permanently lost once the secret becomes public. And enforcement requires proving both misappropriation and that reasonable secrecy measures were in place.

What Qualifies as a Trade Secret?

Under the Uniform Trade Secrets Act (UTSA) and the Defend Trade Secrets Act (DTSA), trade secret protection requires three elements: the information must consist of a formula, pattern, compilation, program, device, method, technique, or process; it must derive independent economic value from not being generally known; and it must be subject to reasonable efforts to maintain secrecy.

For AI organizations, the range of protectable assets is substantial. Model architectures and custom layers, proprietary training datasets and curated corpora, hyperparameters and optimization strategies, training pipelines with preprocessing and augmentation techniques, and pre-trained weights with fine-tuned checkpoints all qualify as potential trade secrets.

Reasonable secrecy measures form the foundation of enforceability. These include role-based access controls with least-privilege permissions, NDAs and confidentiality clauses with employees, contractors, and partners, encryption of data and model artifacts at rest and in transit, logging and monitoring of access to sensitive systems, and watermarking or fingerprinting of models to trace leaks.
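As a minimal illustration of the fingerprinting idea, the sketch below hashes serialized weights so that a leaked checkpoint can be matched byte-for-byte against internal releases. The weight bytes are invented placeholders, and real watermarking schemes go further by embedding signals that survive fine-tuning; this sketch only detects exact copies.

```python
import hashlib

def fingerprint(weights: bytes) -> str:
    """SHA-256 digest of serialized model weights: a crude fingerprint
    for tracing a leaked checkpoint back to a specific internal release.
    Detects byte-identical copies only; any retraining breaks the hash."""
    return hashlib.sha256(weights).hexdigest()

# Placeholder byte strings standing in for serialized checkpoints.
internal = b"\x00\x01fake-serialized-weights\x02"
leaked   = b"\x00\x01fake-serialized-weights\x02"
modified = b"\x00\x01fake-serialized-weights-tuned"

print(fingerprint(leaked) == fingerprint(internal))    # True: exact copy traced to a release
print(fingerprint(modified) == fingerprint(internal))  # False: any change defeats the hash
```

Keeping a registry of such digests per release, alongside access logs, supports the "reasonable secrecy measures" showing that trade secret enforcement requires.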

Cloud-hosted AI presents particular challenges for maintaining trade secret status. API access exposes model behavior, enabling potential model extraction or membership inference attacks. Customers may infer aspects of training data from outputs. Mitigations include rate limiting, output filtering, and privacy-preserving training techniques.
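Rate limiting is the simplest of these mitigations to sketch. A minimal per-client token bucket, with illustrative capacity and refill values rather than recommended ones, might look like this:

```python
import time

class TokenBucket:
    """Minimal per-client token bucket, a common way to throttle API
    queries and slow down model-extraction attempts. The capacity and
    refill rate here are illustrative assumptions."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A client bursting far above the refill rate is cut off quickly:
bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(10)]
print(results.count(True))  # 5 — only the initial burst is served
```

In production the bucket would be keyed per API credential and combined with the output filtering and privacy-preserving training techniques mentioned above.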

One strategic tension deserves particular attention: once an organization releases model weights or training data publicly, trade secret protection is permanently lost. Many organizations navigate this by open-sourcing older models or subsets of their technology stack while keeping frontier models and data pipelines protected as trade secrets.

Licensing AI-Generated Content

Who Owns AI-Generated Content?

Ownership of AI-generated content depends on the degree of human involvement. Under current U.S. guidance, purely AI-generated content without meaningful human authorship is not copyrightable. Such content effectively falls into the public domain, and anyone can copy or reuse it without restriction.

Where a human contributes original expression through detailed prompting, selection, arrangement, and editing, that human may hold copyright in those contributions. A designer who uses AI to generate many images, then heavily edits and composes them into a unique layout, can claim protection in the resulting work.

Contractual terms add another layer of complexity. Many AI providers assign or grant broad rights in outputs to users, even where the copyright status of those outputs is uncertain. Some providers reserve rights to use inputs and outputs for model improvement unless users explicitly opt out.

Commercial Use of AI-Generated Content

The absence of copyright protection for purely AI-generated content creates a fundamental commercial challenge: without copyright, there is no legal mechanism to prevent others from copying that content. This significantly weakens any IP-based competitive advantage for purely AI-generated assets such as marketing images or stock content.

At the same time, AI outputs may still infringe third-party rights if they are substantially similar to protected works. Users who publish or commercialize such outputs face potential infringement claims regardless of the output's own copyright status.

Organizations can mitigate these risks through several approaches. Adding substantial human creativity through editing, composition, and narrative strengthens copyright claims in the resulting work. Reviewing outputs for similarity to known works is particularly important in high-risk domains such as logos, characters, and code. Seeking contractual indemnities or warranties from enterprise AI vendors provides an additional layer of protection. IP insurance, including errors and omissions policies covering copyright and trademark claims, offers a financial backstop.

Licensing Models

AI model licenses span a broad spectrum. Open-source models released under MIT, Apache 2.0, or similar licenses allow broad commercial use, while copyleft licenses like GPL impose share-alike obligations. AI-specific licenses, including Responsible AI Licenses (RAIL), restrict harmful uses while permitting commercial deployment. Proprietary API access comes with contractual restrictions on use, redistribution, and benchmarking.

Content licensing follows its own patterns. Platforms now offer AI-generated images or videos with commercial licenses and IP warranties. Enterprise contracts may treat outputs as work-for-hire or grant exclusive licenses, subject to provider policies and the underlying uncertainty around copyright status.

Regulatory Developments

The U.S. Copyright Office has launched formal inquiries into three critical areas: the copyrightability of AI-generated works and human-AI collaborations, the legal treatment of training on copyrighted works, and liability for infringing AI outputs. Potential outcomes include new guidance from the Office, legislative proposals from Congress, and influential case law from the litigation already working through the federal courts.

EU AI Act and IP Intersections

The EU AI Act introduces transparency obligations that directly intersect with intellectual property law. Providers must disclose when content is AI-generated (Article 50), and providers of general-purpose AI models must publish sufficiently detailed summaries of the copyrighted content used in training (Article 53). These requirements are designed to help rightsholders identify unauthorized use and enforce their rights.

The Copyright Directive (DSM Directive, Article 17) adds further obligations. Platforms hosting user content can face direct liability for copyright infringement and must implement licensing, filtering, or takedown mechanisms. As AI-generated content proliferates across these platforms, similar obligations are likely to extend to AI content providers.

International Developments

Beyond the U.S. and EU, several jurisdictions are charting distinct paths. The United Kingdom considered a broad TDM exception applicable to any purpose but paused the reform after significant industry pushback. China's generative AI regulations require respect for IP rights and discourage training on unlicensed content, though enforcement mechanisms are still evolving. Japan has adopted a notably permissive stance: incidental copying for data analysis and AI training is generally allowed under Article 30-4, making the country an attractive jurisdiction for AI research and development.

Practical IP Strategies

For AI Developers

Training data compliance should begin with a preference for licensed, public domain, or clearly permissioned datasets. Where organizations rely on fair use or local TDM exceptions, documenting the legal analysis and risk assessments is essential. Honoring opt-out signals where feasible reduces both legal and reputational risk.

Output controls represent a critical second line of defense. Implementing filters to detect and block outputs that closely match known copyrighted works or contain watermarks helps prevent infringement. Periodic testing for memorization, with corresponding adjustments to training or safety layers, further reduces exposure.
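One simple way to approximate such a filter is n-gram overlap between a candidate output and a corpus of known works. The sketch below is an illustrative heuristic rather than a production matcher; the n-gram size, sample texts, and implied threshold are all assumptions.

```python
def ngram_overlap(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's word n-grams that also appear in the
    reference. A high score flags near-verbatim reproduction for human
    review; the n-gram size is an illustrative choice."""
    def grams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    cand = grams(candidate)
    if not cand:
        return 0.0
    return len(cand & grams(reference)) / len(cand)

reference = ("it was the best of times it was the worst of times "
             "it was the age of wisdom")
verbatim = "it was the best of times it was the worst of times"
original = "the model produced an entirely different sentence about summer weather today"

print(ngram_overlap(verbatim, reference))  # 1.0: every 8-gram matches; block or escalate
print(ngram_overlap(original, reference))  # 0.0: no overlap; pass
```

The same check, run over a model's own training corpus instead of third-party works, doubles as a crude periodic memorization test.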

Terms of service should clearly address ownership of inputs and outputs, allocate risk and responsibility for infringing use (through user indemnities and limitations of liability), and establish acceptable-use policies with enforcement mechanisms.

Patent and trade secret strategy requires deliberate choices about what to protect and how. Core technical innovations that are difficult to keep secret (hardware designs, protocols) are often best protected through patents. Training data, model weights, and proprietary pipelines are better suited to trade secret protection. Defensive publications can prevent competitors from patenting widely used techniques.

For AI Users and Deployers

Understanding ownership and rights begins with a careful review of provider terms of service, focusing on ownership, license scope, and reuse rights. For custom AI solutions, negotiating work-for-hire arrangements or assignments may be strategically important.

Managing infringement risk requires establishing review processes for high-stakes outputs, particularly in branding, product design, and production code. Maintaining records of prompts, edits, and human contributions creates an evidentiary foundation for any future disputes.

Commercialization practices should include adding human creative input to AI outputs used in branding, content, and product features. Vendor selection criteria should incorporate IP warranties, indemnities, and the provider's overall compliance posture.

For Content Creators

Controlling the use of existing works requires proactive measures. Deploying robots.txt directives and registering with Do Not Train services signals preferences to AI companies. Choosing licenses (such as specific Creative Commons variants) that reflect a clear stance on AI training provides additional contractual protection.

Monitoring and enforcement tools include reverse image search and dataset search utilities for identifying potential misuse. DMCA takedowns and platform policies offer mechanisms for addressing AI-generated content that infringes existing works.

Monetizing participation in the AI ecosystem is also becoming viable. Micro-licensing platforms and collective bargaining initiatives offer new revenue channels. Content creators who control valuable archives or catalogs may find direct licensing deals with AI companies increasingly attractive.

Common Questions

Do I own the copyright in content I create with AI?

You may own copyright only in the parts of the work that reflect your own original expression. Simple prompting with no meaningful creative input typically does not create copyrightable authorship, but substantial prompting, editing, and arrangement can support a human authorship claim.

Is it legal to train AI models on copyrighted works?

The legality is unsettled and depends on jurisdiction and context. In the U.S., courts are still weighing whether training on scraped copyrighted content is fair use. In the EU and some other regions, specific text and data mining exceptions apply, often with opt-out rights for rightsholders.

Can an AI system be named as an inventor on a patent?

No. Major patent offices, including the USPTO, EPO, and UKIPO, require inventors to be natural persons. AI-assisted inventions must list human inventors who contributed to conception.

How can we protect our AI models as trade secrets?

Protect models and data through access controls, encryption, NDAs, monitoring, and clear confidentiality policies. Avoid releasing model weights or training data publicly, and document your reasonable efforts to maintain secrecy.

Who is liable when AI output infringes copyright?

The user who generates and distributes the infringing output is typically directly liable. The AI provider may face secondary liability if it knowingly facilitates infringement or profits from it while having the ability to control it.

EU Text and Data Mining Rules

The EU Copyright Directive allows text and data mining for scientific research and, with an opt-out mechanism, for commercial purposes. AI developers targeting EU data should implement processes to detect and honor rightsholder opt-outs.

Multi-billion dollar

Projected size of the AI training data licensing market by the mid-2020s

Source: Industry analyst projections cited in IP and AI market reports

"For frontier AI systems, the most valuable IP is often not the code but the combination of proprietary data, training pipelines, and model weights—assets that are usually best protected as trade secrets rather than patents."

AI and IP regulatory practitioners

References

  1. EU AI Act — Regulatory Framework for Artificial Intelligence. European Commission, 2024.
  2. AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (NIST), 2023.
  3. OECD Principles on Artificial Intelligence. OECD, 2019.
  4. ASEAN Guide on AI Governance and Ethics. ASEAN Secretariat, 2024.
  5. Model AI Governance Framework (Second Edition). PDPC and IMDA Singapore, 2020.
  6. ISO/IEC 42001:2023 — Artificial Intelligence Management System. International Organization for Standardization, 2023.
  7. Personal Data Protection Act 2012. Personal Data Protection Commission Singapore, 2012.
Michael Lansdowne Hauge

Managing Partner · HRDF-Certified Trainer (Malaysia), Delivered Training for Big Four, MBB, and Fortune 500 Clients, 100+ Angel Investments (Seed–Series C), Dartmouth College, Economics & Asian Studies

Advises leadership teams across Southeast Asia on AI strategy, readiness, and implementation. HRDF-certified trainer with engagements for a Big Four accounting firm, a leading global management consulting firm, and the world's largest ERP software company.

AI Strategy · AI Governance · Executive AI Training · Digital Transformation · ASEAN Markets · AI Implementation · AI Readiness Assessments · Responsible AI · Prompt Engineering · AI Literacy Programs
