Synthetic Image Detection


Introduction

It is almost impossible these days to scroll through any social media feed without seeing at least one generated image, or even a video. With the rise of generative models such as DALL-E, Stable Diffusion, and Midjourney, the quality of these images has improved drastically. While this is an exciting advancement in AI, it also raises concerns about the misuse of such technology, especially in the context of misinformation and deepfakes.

As Uncle Ben once said:

> "With great power comes great responsibility."

And now, that power lies in the hands of many. Fortunately, for the time being, these models are mostly used for brainrot memes and art generation, which are harmless in themselves, but there has also been a rise in fraudulent activity built on these models. From generating fake profiles on social media to creating misleading images for political propaganda, the potential for harm is significant.

This is why it is important to address these issues and to bring attention to them. By collectively working on prevention mechanisms and tools for oversight of these generative models, we can help make the internet a safer place. One such mechanism is the detection of synthetic images. By developing frameworks, methods, and models that can accurately identify generated images, we can mitigate the risks associated with their misuse. And yes, you heard me right: we use models to fight against the models themselves. Quite ironic.

Now, if you clicked to read more about this project, I can assume you're interested and want to know more about techniques for detecting generated images. I assure you that you came to the right place, sort of. In the following paragraphs, I'll be writing about my findings in the field called "Synthetic Image Detection" (SID), which I encountered while doing research at CTU Prague for my master's thesis (which I strongly recommend you read; the link will be at the end). The main focus will be on the interesting discoveries I made, without too much in-depth theory and math. I want to introduce the field of SID and bring attention to its usefulness. I also want this article to be simple enough for everyone to understand and learn something new from, while encouraging practical and intuitive thinking rather than pure theory.

With that said, I do encourage you to share your opinions and thoughts in the comment section below. The field itself is still new, and I can't pretend that I'm 100% correct about every single fact, so feel free to comment if you disagree with anything. Also, by the time I publish this article, there might be new discoveries in the field, so if you know about any new methods or techniques, please share them in the comments as well.

How do generative models work?

Before delving into the main topics, it would be smart to understand our opponent, or in our case, to understand how generative models work. There are many types of generative models, ranging from GANs (Generative Adversarial Networks) to VAEs (Variational Autoencoders) and Diffusion Models. Each of these models has its own architecture and way of generating images, but they all share a common idea: look at a bunch of data, learn its patterns, and generate new data that resembles the original distribution.
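To make this concrete, here is a toy sketch (plain NumPy, not part of any real model) where the "dataset" is just one-dimensional numbers, "learning the patterns" means estimating a mean and a standard deviation, and "generating" means sampling from the fitted distribution:

```python
import numpy as np

# Toy "generative model": the data comes from an unknown distribution.
# We "train" by estimating its mean and standard deviation, then
# "generate" by sampling from the fitted Gaussian.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # our "dataset of cats"

# Learning = estimating the distribution's parameters from the data.
mu_hat, sigma_hat = data.mean(), data.std()

# Generating = sampling new, never-before-seen points from the
# learned distribution.
new_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat)
```

Real image generators do the same thing in spirit, except the "distribution" lives over millions of pixel values instead of a single number.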

Let’s say you are a generative model, and you want to learn how to generate images of cats, but you don’t know what a cat looks like. So, you are given a dataset of cat images, and you start analyzing them. You look at the shapes, colors, textures, and all the patterns that make a cat. After analyzing enough images, you start to understand that a furry texture is common, that cats have pointy ears, and that they often have a certain shape of eyes.


(Some images of cats that you would look at if you were training to be a generative model.)

Now, to you that doesn’t sound too hard, right? You just look at the images and learn the patterns. What you’re doing implicitly is learning the distribution of the data. You learn that, among all possible images, the ones that look like cats have certain characteristics, and you learn to generate new images with those characteristics. In a way, you learn to “pick” the right characteristics when generating a new image.

For a machine, it’s not that simple. The machine doesn’t have eyes to see the images; it has to process them as numbers. So, it takes the pixel values of the images and feeds them into a neural network. The neural network then learns to recognize the patterns in the data and generates each pixel of a new image based on those patterns. It learns the distribution of the data, and generating new images is essentially sampling from that distribution. For simpler datasets like MNIST, which consists of 28x28-pixel images, that shouldn’t be too hard, but for more complex datasets like ImageNet, whose images are typically resized to 224x224 pixels for training, it becomes much more challenging. The model has to learn a much more complex distribution, and it has to generate a much larger number of pixels.
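A quick back-of-the-envelope sketch shows how fast the dimensionality grows (note: 224x224 is a common training crop size for ImageNet models, not a fixed resolution of the dataset itself):

```python
import numpy as np

# A grayscale MNIST image is a 28x28 grid of pixel intensities,
# i.e. one point in a 784-dimensional space.
mnist_image = np.zeros((28, 28), dtype=np.uint8)

# A 224x224 RGB crop (the typical input size for ImageNet models)
# is a point in a 150,528-dimensional space.
imagenet_crop = np.zeros((224, 224, 3), dtype=np.uint8)

mnist_dims = mnist_image.size      # 28 * 28
imagenet_dims = imagenet_crop.size  # 224 * 224 * 3
print(mnist_dims, imagenet_dims)
```

The model isn’t just generating more pixels; it has to capture a coherent distribution over a space roughly 200 times higher-dimensional.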

To tackle this problem, researchers have developed various architectures and techniques. Mostly, those techniques are based on the idea that, instead of learning the distribution of the data directly, we can learn to transform a simple distribution (like a Gaussian) into the complex distribution of the data. Each of the architectures I mentioned earlier has its own way of doing this transformation: GANs use a generator and a discriminator, VAEs use an encoder and a decoder, and Diffusion Models gradually add noise to the data and then learn to reverse that process.
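To give a flavor of the diffusion idea, here is a minimal NumPy sketch of the forward (noising) process only. The actual generative part, training a network to reverse these steps, is the hard bit and is omitted; the constants here are illustrative, not taken from any particular paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_noise(x, beta):
    """One forward-diffusion step: keep sqrt(1 - beta) of the signal
    and mix in sqrt(beta) worth of fresh Gaussian noise."""
    noise = rng.standard_normal(x.shape)
    return np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

x = rng.standard_normal((8, 8))  # a tiny stand-in "image"
for _ in range(200):             # after many steps, x is ~pure noise
    x = add_noise(x, beta=0.05)

# By now x is approximately N(0, 1) noise -- exactly the simple
# distribution the learned reverse process starts from when generating.
print(x.mean(), x.std())
```

Generation then runs this movie backwards: start from pure Gaussian noise and repeatedly apply the learned denoising step until an image emerges.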

That said, the field of generative models is vast and complex, and many more architectures and techniques have been developed; the ones I mentioned are just a few examples. There’s a lot of mathematics and theory behind these models, but I won’t go into it in this article. If you’re interested in the theory and mathematics behind generative models, I recommend checking out some of the resources I will link at the end of this article, as well as my master’s thesis. :)

Synthetic Image Detection

I’ll start by saying that this name is just a fancy, smart way to say “detection of generated images”. The name SID covers a large group of methods that can be used for the task of detecting generated images.

How Do Deep Learning Models See Generated Images?
