The Great Debate: Exploring Ownership of Generated Data in the Digital Landscape

Leon Emmanuel ISHIMWE
5 min read · Jul 15, 2023

A pressing question demands our attention: who owns the content produced by image or text generators? The question is often overlooked, partly because these generators are still so new.

What is the problem?

The problem becomes clearer when we consider recent events, such as the strike in Hollywood, where actors and dubbing artists protested against the use of artificial intelligence (AI) in their profession. Their concerns are justified, and they raise a series of thought-provoking questions. It is worth acknowledging that company executives are responsible for increasing profits, so in most cases they will prioritize financial gain. The issues behind the protests can be summarized as follows:

  1. Suppose a dubbing company uses a tool like ElevenLabs to generate a voice that mimics mine from a mere three-second sample, and then uses that generated voice to dub an entire movie. Who owns the generated voice? Is it the dubbing company that generated it, ElevenLabs, which hosts the model that produced it, or me, the person whose voice served as the basis? Am I entitled to compensation? If so, should I be paid for the three seconds used to generate the voice, or for the full length of the movie dubbed with it? And what about future uses? Will I be paid once, or every time they use my voice? These questions pose a complex dilemma with no clear answers.
  2. Now let’s turn to actors and consider two scenarios. The first concerns deceased actors. Rest in peace, Chadwick Boseman, our Black Panther. If I generate a video or movie with the late Chadwick Boseman as the main character, is that even legal? Do I have to obtain permission or pay anyone to use his likeness? If so, whom should I pay, and how much? I may have used only a single picture of him to generate an entire movie. Should I pay for that single picture or for the whole movie? The second scenario concerns living actors. If I include them virtually in my movies, am I obligated to pay them? If so, how would payment work? Should I rent their virtual presence and compensate them for each movie in which they appear? Unscrupulous individuals may also exploit this situation. Suppose someone creates a movie using Boseman’s likeness as Black Panther. How would one go about suing them? With around 8 billion people in the world, it is entirely possible for two individuals to resemble each other. What concrete proof could establish that the face in the movie is Boseman’s, the one the public recognizes, and not that of a look-alike? Once again, we pay tribute to Chadwick Boseman’s unforgettable portrayal of the character.
  3. Moving on to images: when I give Midjourney a text prompt and it generates an image, to whom does that image belong? Does it belong to me, the person who wrote the prompt? To Midjourney, the platform that generated it? Or to the owners of the images used to train Midjourney’s models? If I generate an image that closely resembles someone else’s photograph, could that person sue me? What if we simply had the same idea and independently created similar images? That complicates matters further. Can Midjourney even detect whether an image it generates already exists elsewhere?
  4. Lastly, consider text generators powered by large language models. I will keep this one brief. If I generate a book using a chatbot like ChatGPT, who ultimately owns the rights to the book? Is it me, the user who wrote the prompts? Or does ownership lie with ChatGPT, the system that actually generated the text?

Possible solution

Now I’d like to share my personal opinion on this matter. We have encountered similar challenges before. When cameras were first invented, questions arose about who owned the copyright in the images they captured. Was it the company that built the camera? The photographer who took the picture? Or perhaps the owner of the subject being photographed?

My position is that whenever your image, voice, or text is used to train these models, you should receive compensation, and the amount should be set by you, not by the company that owns the model. This compensation relates specifically to the training phase. Once training is complete, it becomes difficult to argue for ongoing payment.

To draw a parallel, suppose I want to take a picture of a house. If the owner objects and says, “Hey, you can’t take a picture of my house,” and I take the picture anyway, I could face legal consequences and financial penalties. Similarly, if you can prove that your data was used to train a generator without your consent, taking legal action and seeking compensation is a valid course of action. If, however, the data owner explicitly states, “You may use my data for this price,” and the model builder can afford to pay that amount, the builder can go ahead and use the data. This is a mutually beneficial arrangement: the data owner is compensated, and the builder gets to use the data. And if the data owner grants permission without naming a price, the builder is free to use the data without payment.

House analogy

I mentioned earlier that compensation should be limited to the training phase. To return to the house example: once I have paid the owner for permission to take a picture, and I then edit that picture in Photoshop or any other software, I owe the owner nothing further. Our transaction concluded when I paid the fee for taking the picture. Some people may not agree with this view. In the future we may devise a mechanism, loosely comparable to how “copyleft” licenses attach conditions to every derived work, that gives data owners the right to demand compensation whenever their data contributes to newly generated content. Implementing such a system presents numerous challenges, however, particularly with generators, where it is hard to trace which training data influenced a given output. Consider also that construction workers are paid once for building a house and do not receive a share of the rent it earns afterwards.

In conclusion, ownership of the content produced by image or text generators is a complex and evolving issue. It echoes historical debates, such as the one over who owns the images captured by cameras. Compensation for data used during the training phase is justifiable and should be set by the data owner, while ongoing payment structures remain much harder to establish. Individuals and companies alike need to approach this issue with transparency, respect for intellectual property, and a willingness to adapt as the technology changes.
