Visual User-Generated Content Verification in Journalism: An Overview

Over the past few years, social media has become an indispensable part of the news generation and dissemination cycle on the global stage. These digital channels, along with easy-to-use editing tools, have unfortunately created a medium for spreading mis-/disinformation containing visual content. Media practitioners and fact-checkers continue to struggle with scrutinising and debunking visual user-generated content (UGC) quickly and thoroughly, as verification of visual content requires a high level of expertise and can be exceedingly complex with the existing computational tools employed in newsrooms. The aim of this study is to present a forward-looking perspective on how visual UGC verification in journalism can be transformed by multimedia forensics research. We present a comprehensive overview of the five elements of UGC verification and propose multimedia forensics as the sixth element. In addition, different types of visual content forgeries and detection approaches proposed by the computer science research community are explained. Finally, we map the verification tools media practitioners rely on, along with their limitations, and outline future research directions for gaining the confidence of media professionals in using multimedia forensics tools in their day-to-day routine.


I. INTRODUCTION
In today's digital society, where everyone has a voice on a multitude of social media platforms, harnessing user-generated content (UGC) has become a daily routine in newsrooms. With the prevalence of smartphones with high-quality cameras and access to the Web, visual and textual information relating to trending and breaking news events on social media platforms such as Facebook, Twitter and YouTube has grown in scale and scope. The large volume of textual and visual content shared on social media enables journalists to gather information for publishing timely breaking news and rapid updates on trending events. In the context of journalism, UGC is associated with the term citizen journalism, i.e., citizens contributing to the process of collecting, reporting and distributing news-related information in times of crisis. When reporters cannot reach the scene efficiently, such as in countries with limited press freedom or in cases where events unfold quickly such as during a natural disaster, UGC becomes a key element in media coverage. Some well-known examples of events that were extensively covered by street journalism are the Arab Spring in the 2010s, Hurricane Sandy in 2012, and the 2019-2020 Iranian protests [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Ziyan Wu.
Social media and other similar web-based platforms are not only used by journalists and reporters to obtain updated information about the latest trending news stories around the globe [2], [3], but also provide a ground for newsrooms and media outlets to increase audience reach [1], [4]. According to a study by the International Centre For Journalists (ICFJ) in 2019 [5], two-thirds of news organisations distribute content in at least four formats, e.g., (1) social media posts, (2) news items on the website, (3) videos, and (4) messaging apps, with social media being the most widely used platform. Almost 80% of the surveyed news organisations were found to be using social media platforms to distribute their content [5]. With media outlets as well as ordinary people creating content on these digital channels, the number of people relying on social media for news consumption is increasing [6]. According to Pew Research Center's report on ''News Consumption Across Social Media in 2021'', more than half of Twitter users regularly used the platform to consume news in 2021 [6]. The report also found that social media users often rely on platforms such as Facebook, Twitter, YouTube, Reddit, LinkedIn, and TikTok to obtain up-to-date information about trending stories online.
Unfortunately, social media becoming an integral part of news distribution and consumption is a double-edged sword. On the one hand, the public and the media community get easy and instant access to local and global news in almost real time. On the other hand, these digital channels are being used to spread misinformation (when someone unintentionally shares misleading content) and disinformation (when someone knowingly shares misleading content) [7]. The fight against mis-/disinformation has been an ongoing effort by news outlets, fact-checkers, and social media companies. For journalists and news editors who have to leverage UGC, it is even more essential to effectively monitor, verify, and debunk fake/manipulated UGC shared on social media platforms, especially during breaking news events. Because of the importance of verification, major newsrooms such as The Associated Press or the BBC have dedicated teams focused on verifying UGC. It is evident from the ICFJ's 2019 report [5] that about one-third of the surveyed news organisations have dedicated fact-checking teams to verify and fact-check content. Moreover, non-profit fact-checking organisations such as FullFact and Faktisk have been established and are quickly growing. Some independent organisations offer UGC verification as a service. For instance, Storyful, a social media intelligence agency, offers ''verification'' as one of its services and collaborates with major newsrooms such as The New York Times and Reuters.
The significance of visual UGC verification becomes more evident when we look back at past incidents in which newsrooms and professional journalists failed to identify misleading photos/videos and shared them as reliable content related to a newsworthy event. For instance, during the devastating flooding in Queensland, Australia, in early 2019, photos of crocodiles on the streets of the flood-affected region were uploaded and shared on social media platforms. The well-known Australian news outlet Nine News published those photos as if they had been captured at the flood scene [8]. It was discovered later that the photos were originally taken in 2014 and showed American alligators in Florida, USA. But it was too late for the news agency to undo the damage to its reputation. Many other images alleged to be of the 2019 flooding, sometimes shared even by professional journalists, actually belonged to other events from different time periods and geographic contexts. Incidents of this kind, where professional journalists fail in UGC verification and share inauthentic or out-of-context information, promote the spread of the mis-/disinformation which journalists aim to fight, profoundly affect the level of trust in news, and severely dent the reputation of the journalist as well as the media outlet [3], [7].
Besides the increasing amount of visual UGC shared online every day, it is becoming ever more effortless to produce false and misleading content using inexpensive and user-friendly photo editing software tools such as Adobe Photoshop and GIMP. Along with the classical image manipulation techniques, a contemporary form of visual content forgery known as ''Deepfake'' media (fake multimedia produced using deep neural networks) has emerged in recent years. These technological advances at everyone's fingertips pose further challenges for newsrooms and media practitioners in verifying visual UGC.
Manual UGC verification is an extremely time-consuming task because the procedure typically entails interviewing eyewitnesses, checking the digital footprint of the source who shared the UGC item online, and gathering more details about the events being portrayed in the video/image (e.g., identifying the location within the image, the date and the time), so a considerable amount of time is spent before coming to conclusions [4]. Obviously, this is in stark contrast to the race against the clock in newsrooms and the necessity of debunking viral mis-/disinformation online. Therefore, digital verification using (semi) automated tools is crucial in reducing the time burden of visual UGC verification. According to the 2019 ICFJ study mentioned before, media practitioners have been attracted to these tools in recent years, and the trend of journalists and fact-checkers utilising computational tools to verify/fact-check UGC is rising [5]. However, only around 33% of journalists have been found to use such tools to assist them during the verification procedure [5]. The obstacles on the path to encouraging the usage of technology for visual UGC verification can be grouped into two categories. The first category is related to the lack of knowledge about the manual verification procedure among the computer scientists who develop these tools. It is essential for technology experts to fully comprehend the process of manual verification in order to align the tools with, and improve them for, the requirements in newsrooms. The second category of barriers concerns the journalists' points of view on these digital verification tools. Media practitioners ought to recognise the capabilities of the forensics techniques proposed by the computer science research community in detecting image/video forgeries. Moreover, a mapping of the verification tools employed in newsrooms and by journalists, along with their use cases, is fruitful in picturing the assets and liabilities of the existing technology for visual UGC verification.
FIGURE 1. The proposed six (5+1) elements of UGC verification. The first five elements are inspired by [9]: Provenance: checking if the same content has been shared before; Source: the person who captured the content initially; Date of capturing the content; Location where the content was captured; Motivation for capturing/sharing the content. We propose the sixth element, ''Multimedia Forensics'', to help in identifying whether the visual UGC item is manipulated or synthetic.
The main aim of this study is to highlight the merits of (semi) automated visual content verification. We contribute:
• A comprehensive overview of visual UGC verification in journalism. The five basic elements of visual UGC verification in journalism are described in detail. We seek to understand how the state of the art in multimedia forensics could enhance existing tools to facilitate the procedure journalists follow when they verify visual UGC.
• A sixth element of UGC verification, in addition to the existing five, which we call ''Multimedia Forensics'', together with a description of why we think it is necessary.
• An extensive study of visual content forensics from a technical perspective, presenting a number of different image/video forgery techniques and detection strategies along with examples of manipulated imagery from the news domain.
• A map of tools frequently employed by journalists and media practitioners for visual content verification, their use cases, and the limitations associated with them.
This paper is organised as follows. In Section II, a detailed analysis of the journalistic process for visual content verification is presented. Section III discusses various classes of visual content forgeries and the technologies developed to detect these forgeries. In Section IV, we present a comprehensive mapping of the tools and technologies available for visual content verification and the technologies being used in the media industry for visual content verification. Section V concludes with the findings and proposes future research directions.

II. THE 5+1 ELEMENTS OF VISUAL UGC VERIFICATION IN JOURNALISM
Major newsrooms generally have their own verification guidelines. The Associated Press (AP), for instance, has well-established standards that have not changed for years. These standards have made it possible for the organisation to deal successfully with social media content [4].
At BBC (British Broadcasting Corporation), after all the essential measures to verify the content are completed, the journalists disseminate their findings across all of the BBC's platforms using a system called Electronic News Production System (ENPS) [10].
First Draft News [9] proposed five elements constituting the investigative UGC verification process: (1) Provenance, (2) Source, (3) Date, (4) Location and (5) Motivation. In this section, we describe these five elements in detail, and present how journalists and fact-checkers base their investigations on these elements to verify UGC. In addition to these five elements, we describe our proposed sixth verification element, i.e., Multimedia Forensics. We illustrate these six elements in Figure 1. This figure does not belong to First Draft News and was created by us for the sake of this study.
Some real-world examples of media practitioners performing visual UGC verification by following the five steps mentioned above are described in [4]. We have also conducted a set of discussions with journalists and fact-checkers from media outlets including Bellingcat, Faktisk, Verdens Gang (VG), and Bergens Tidende (BT) to discern how their way of verifying UGC aligns with the procedure in Figure 1. These discussions revealed that the visual UGC verification workflow in practice is fairly similar to the specified steps, but that every single step is not always followed in the given order.
Besides these journalistic workflows and recommended UGC verification strategies, there are other comprehensive conceptual frameworks available for analysing and mitigating manipulative content or propaganda influence campaigns [12]. Some examples are (1) the Carnegie Mellon BEND Framework, (2) the ABCDE Framework [13], [14], [15], (3) the AMITT Framework [16], and (4) the Scotch Framework [17]. Analysts who attempt to interpret and mitigate mis-/disinformation employ these frameworks in real-world scenarios [12]. All of them propose somewhat similar fundamental steps which can be followed in order to uncover organised propaganda campaigns behind the spreading of mis-/disinformation online. We therefore suggest that following at least one of these frameworks, along with the six elements we present in this paper, will result in a more comprehensive visual UGC verification.

A. PROVENANCE
Provenance is considered the most important step in the UGC verification process [9]. Through provenance, it is established whether the piece of visual content is indeed the original one or has been shared online in the past. It is also worth checking whether it is a manipulated version of an image/video shared in the past. Sometimes, images are downloaded from the internet (e.g., social media platforms, websites) and then uploaded again, maybe on a different social media platform or website, at a later time. These are called scrapes [9] and make establishing provenance even more difficult.
A well-known technique journalists and fact-checkers employ to establish the provenance of visual UGC is to carry out a reverse image search. Reverse image search is the process of using search engines such as Google, TinEye, Yandex and others to find images that look similar to the one being queried. Browser extensions, for example RevEye, are also useful in finding similar-looking images online. Reverse image search is an extremely powerful tool for finding out whether a given image/video has been shared online before. If an older version of the queried image is found online with an earlier timestamp, this is an instant indication that the image may be re-purposed, presented out-of-context, or misleading [9]. Typically, the image with the highest resolution/size is considered to be the original image, which can help lead the journalists to its source [4].
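The two heuristics above (an earlier timestamp signals re-use, and the largest copy is probably the original) can be sketched as follows. This is a minimal illustration, assuming the search hits have already been collected by hand into simple records. The `Hit` record, its field names and the example URLs are hypothetical and do not correspond to the output format of any particular search engine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    """One reverse-image-search result (hypothetical record, filled in by hand)."""
    url: str
    width: int
    height: int
    first_seen: datetime  # earliest known publication time of this copy


def earlier_copies(hits, claimed_time):
    """Return hits published before the claimed capture time, oldest first."""
    older = [h for h in hits if h.first_seen < claimed_time]
    return sorted(older, key=lambda h: h.first_seen)


def likely_original(hits):
    """Pick the highest-resolution hit, breaking ties by earliest appearance."""
    if not hits:
        return None
    return min(hits, key=lambda h: (-(h.width * h.height), h.first_seen))


# Illustrative data only.
hits = [
    Hit("https://example.org/a", 640, 480, datetime(2019, 2, 4)),
    Hit("https://example.org/b", 2048, 1536, datetime(2014, 6, 1)),
    Hit("https://example.org/c", 1024, 768, datetime(2016, 3, 9)),
]
claimed = datetime(2019, 2, 1)

print([h.url for h in earlier_copies(hits, claimed)])
print(likely_original(hits).url)
```

Any hit returned by `earlier_copies` is a reason to treat the item as potentially re-purposed. The result of `likely_original` is only a starting point for tracing the source, since a large copy can also be an upscaled scrape.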
To establish video provenance, a strategy similar to reverse image search can be adopted. For example, an individual frame is extracted from the video and a reverse image search is then carried out for that specific frame. The InVID verification plugin [18] can be used to establish video provenance. The plugin is freely available in the form of a Chrome or Firefox browser extension. It makes video provenance easy by offering functionalities such as breaking down videos into individual frames, extracting video metadata, and using natural language processing algorithms to show any associated comments which can be helpful in verifying the video. The InVID verification plugin also has a magnifier which can be used to read any small text within a video frame, or to analyse other smaller details within the frame.
Besides these strategies, to look for any other relevant information, media practitioners sometimes also look into anonymous platforms, for example, Reddit, 4chansearch.com, Gab.ai, Discord channels, Facebook groups and other similar websites. Looking into these sources is helpful, as a variety of UGC, including memes and misinformation, sometimes originates from these places [9].

B. SOURCE
Verifying source refers to finding out who originally captured the content (image/video), whereas provenance refers to finding out who uploaded/shared the content for the first time online. This is important because sometimes the content creator and the uploader may be different, for example, if a person captures a video in Istanbul and sends it to another person (e.g., friend, family, colleague) in London who then uploads the video online. The primary source in this case is the person in Istanbul, who initially captured the video.
FIGURE 3. [...] [19]. On the top right, we show an example of a manipulated image shared by Iran's Defence Ministry using copy-move forgery [20]. On the left of the middle row, we present an example of image splicing forgery [8]. On the right of the middle row, we show an example of image cropping manipulation [21]. On the bottom left, we show an example of cheapfake media [22]: on the right is an image which was shared along with an out-of-context caption on Twitter, and on the left is a tweet from the photographer who originally captured the image, stating that the image has been miscaptioned. In the middle of the bottom row, we have an example of deepfake media [23]. On the bottom right, we present an example of image retouching [3]. More details on each of the examples presented in this figure can be found in the upcoming sections.
During the verification process, it is thus crucial for journalists and fact-checkers to identify the primary source of the content by checking whether the uploader is also the source of the content. Interviewing the uploader and establishing provenance can help in the swift and reliable verification of the UGC item, and can lead to the primary source of the image/video being verified. Typical questions journalists might ask to confirm the identity of the source include (1) when the image/footage was captured, (2) which acquisition device was used, (3) what the source saw at the scene of the event, (4) what the source was doing at the scene of the incident, (5) whether the source lives nearby, etc. [4]. In some cases, journalists request that the source send the image/footage via email, since email services do not compress the file or strip its metadata headers, which can help journalists in verifying the source. Journalists might also ask for additional supporting evidence, e.g., further images or footage, if any, to confirm whether the person was actually at the scene [4].
When interviews are not possible, journalists inspect the digital footprint of the uploader (to find out whether the uploader is the original source) by analysing the associated social media profiles and the kind of posts the person has created/shared in the past, checking whether the person has other social media accounts (LinkedIn, Twitter, Facebook, Skype, etc.), and searching the web for any other relevant information about the account (email addresses, phone numbers, web pages) [4], [9]. Investigating associated activities on the web can help in harvesting more details about the individual and reaching the genuine source of the image/video. If the person has a profile photo available, a reverse image search is conducted to retrieve more details.
There are also several tools for gathering more information about individuals on the web using person search engines such as ''Pipl'' or ''Spokeo'' [10]. For investigating a specific website, rather than a social media account, look-up tools such as ''WHOIS'', ''ViewDNS'' or other related domain name search engines are utilised. Tools like ''BotSentinel'' or ''Hoaxy'' are employed to detect social media bots. Another useful tool is Twitonomy, a Twitter analytics tool for acquiring detailed information about an account, for example, when the account was created, the associated tweet history, the percentage of retweeted tweets, the most used hashtags, to whom they reply the most, the average tweet count per day and other similar statistics. More information about the mentioned tools can be found in Table 3.
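To make the kind of account statistics listed above concrete, the following minimal sketch computes a few of them from a hand-collected list of posts. It does not use or reproduce Twitonomy. The function name `account_stats` and the record keys `is_retweet` and `hashtags` are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def account_stats(created, today, tweets):
    """Summarise an account from its creation date and a list of tweet dicts.

    Each tweet dict is a hypothetical record with the keys
    'is_retweet' (bool) and 'hashtags' (list of str).
    """
    days_active = max((today - created).days, 1)  # avoid division by zero
    total = len(tweets)
    retweets = sum(1 for t in tweets if t["is_retweet"])
    tags = Counter(tag.lower() for t in tweets for tag in t["hashtags"])
    return {
        "account_age_days": days_active,
        "tweets_per_day": total / days_active,
        "retweet_share": retweets / total if total else 0.0,
        "top_hashtags": [tag for tag, _ in tags.most_common(3)],
    }


# Illustrative data only.
tweets = [
    {"is_retweet": True, "hashtags": ["Flood"]},
    {"is_retweet": True, "hashtags": ["flood", "breaking"]},
    {"is_retweet": True, "hashtags": []},
    {"is_retweet": False, "hashtags": ["flood"]},
]
stats = account_stats(date(2023, 1, 1), date(2023, 1, 11), tweets)
print(stats)
```

A very young account, a high posting rate or a retweet share close to one does not prove anything on its own, but such values are typical reasons to examine the uploader more closely.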

C. DATE
Although every social media post has an associated timestamp which tells when the post was created, that timestamp does not tell when the content was actually captured. Besides this, some visuals (scrapes) are uploaded multiple times on different social media platforms and have different timestamps. Finding the true date of creation of a visual item is thus no easy task. Journalists are aware of this and therefore share the date and time of the capture along with the content while publishing. The InVID verification plugin [18] can be used to get the exact upload time in Coordinated Universal Time (UTC) format (if the associated Exif header data is still intact).
Exif data headers are informative in finding out the date and time of acquisition. However, if a piece of content is downloaded from social media platforms the Exif header information might not be available. That is because these platforms drop the information in the Exif header when content is uploaded to save storage space [24]. Journalists might ask the eyewitness or the person who uploaded/shared the content to email the original image/footage in order to verify the Exif information. However, the information in Exif header can be easily modified and therefore it needs to be handled with care.
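As a small illustration of how an Exif timestamp can be cross-checked, the sketch below parses the Exif date/time string (stored in the form `YYYY:MM:DD HH:MM:SS`) and tests whether the claimed capture time could precede a known upload time. The function names and the 14-hour tolerance are our own assumptions. The tolerance reflects that Exif times are usually local camera time without a time zone. A passing check does not prove authenticity, since the header can be edited.

```python
from datetime import datetime, timedelta, timezone

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # layout of the Exif DateTimeOriginal string


def parse_exif_datetime(value):
    """Parse an Exif date/time string; the result carries no time zone."""
    return datetime.strptime(value.strip(), EXIF_FORMAT)


def capture_precedes_upload(exif_value, upload_utc, max_utc_offset_hours=14):
    """Check that the Exif capture time is not later than the upload time.

    Exif times are usually local camera time without a zone, so the check
    allows for the largest plausible offset ahead of UTC (assumed tolerance).
    """
    captured = parse_exif_datetime(exif_value)
    upload_naive = upload_utc.astimezone(timezone.utc).replace(tzinfo=None)
    return captured <= upload_naive + timedelta(hours=max_utc_offset_hours)


# Illustrative values only.
upload = datetime(2019, 2, 4, 10, 0, tzinfo=timezone.utc)
print(capture_precedes_upload("2019:02:03 18:30:00", upload))
print(capture_precedes_upload("2019:02:06 09:00:00", upload))
```

A capture time that lies after the upload time, even allowing for time zones, indicates that either the Exif header or the claimed upload history is wrong.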
Journalists also make use of tools like ''Wolfram Alpha'' to check historical weather, or ''SunCalc'' to find the angle of the sun on a specific date/time at a particular location. Further information about the mentioned tools can be found in Table 3.
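The reasoning behind sun-angle checks can be sketched with a common textbook approximation of the solar elevation. This is not the algorithm used by SunCalc. It ignores the equation of time and atmospheric refraction, and it expects local solar time rather than clock time, so it is only suitable for rough plausibility checks. All function names are hypothetical.

```python
import math


def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle in degrees.

    solar_hour is local solar time (12.0 = sun at its highest), not clock time.
    """
    # Approximate solar declination for the given day of the year.
    decl = math.radians(
        -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    )
    lat = math.radians(latitude_deg)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


def shadow_length(object_height, elevation_deg):
    """Length of the shadow cast on flat ground; None if the sun is not up."""
    if elevation_deg <= 0:
        return None
    return object_height / math.tan(math.radians(elevation_deg))


# Illustrative query: latitude 60 N around the June solstice at solar noon.
elevation = solar_elevation_deg(60.0, 172, 12.0)
print(round(elevation, 1), shadow_length(2.0, elevation))
```

If the shadows visible in an image are much longer or shorter than this estimate for the claimed date, time and place, the claim deserves closer scrutiny.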

D. LOCATION
When it is possible to interview eyewitnesses, journalists ask direct questions in order to verify the date, time and location of an event [4]. For further confirmation, journalists and fact-checkers sometimes request more pictures/videos of the scene from the witnesses during the interview or right after it. Having multiple pictures/videos from the scene of the incident provides additional details about the location. When interviews are not possible, journalists use computational tools to infer the location by analysing the associated metadata headers. Exif headers can provide vital information for the verification task, for example, the brand and model of the capturing device, the timestamp at which the image/footage was captured, GPS coordinates, etc. Tools such as Photoshop or websites like ''Fotoforensics'' can be used to generate Exif reports [4].
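Exif stores GPS coordinates as degrees, minutes and seconds together with a hemisphere reference (the GPSLatitude/GPSLatitudeRef and GPSLongitude/GPSLongitudeRef tags), whereas online maps usually expect signed decimal degrees. The conversion can be sketched as below. The helper name `dms_to_decimal` is hypothetical, and the example values are illustrative rather than taken from a real file.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert Exif-style (degrees, minutes, seconds) to signed decimal degrees.

    dms: three numbers or (numerator, denominator) pairs, as stored in
         GPSLatitude / GPSLongitude.
    ref: 'N', 'S', 'E' or 'W' from GPSLatitudeRef / GPSLongitudeRef.
    """
    def as_fraction(x):
        return Fraction(*x) if isinstance(x, tuple) else Fraction(x)

    degrees, minutes, seconds = (as_fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return float(value)


# Illustrative values only.
latitude = dms_to_decimal(((39, 1), (57, 1), (9, 1)), "N")
longitude = dms_to_decimal((75, 9, 54), "W")
print(latitude, longitude)
```

The resulting pair can be pasted into an online map to see where the capturing device reported itself to be.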
UGC posted on social media platforms is often geotagged. However, the geotagged location might not be the same as the location in which the content was captured [9]. Journalists and fact-checkers obtain more information about the location within the image/video using available online software, e.g., Google Maps, Bing Maps, Apple Maps, Wikimapia, Google Earth, and others. Online maps are employed to identify surroundings, specific notable buildings, or other structures present in a shared image/video on an interactive map. The identification task becomes difficult when the buildings or surroundings in an image/video are damaged or destroyed in incidents such as airstrikes, bombings or natural disasters.
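Once a location has been inferred from landmarks on a map, it can be compared with the geotag of the post. The sketch below uses the standard haversine great-circle distance for this comparison. The 5 km tolerance, the function names and the example coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geotag_is_consistent(geotag, inferred, tolerance_km=5.0):
    """True if the geotag lies within tolerance_km of the inferred location."""
    return haversine_km(*geotag, *inferred) <= tolerance_km


# Illustrative coordinates (latitude, longitude) for two city centres.
geotag = (50.8503, 4.3517)      # roughly Brussels
inferred = (39.9526, -75.1652)  # roughly Philadelphia
print(round(haversine_km(*geotag, *inferred)), geotag_is_consistent(geotag, inferred))
```

A large distance does not by itself show that the content is fake, because geotags often describe where a post was shared rather than where the content was captured, but it tells the journalist which of the two claims needs further evidence.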
Location services like ''Geofeedia'' are also utilised by media professionals to establish the location from which a certain image was shared. To automatically extract text from signboards in images, journalists make use of optical character recognition (OCR) tools such as Tesseract. If there are shops present in the scene, their names can be searched on online maps, e.g., Google Maps or Bing Maps, to acquire further information. Google Translate or other similar translation services are used when the text on the signboards present in the image/video is in a different language. Other tools and services are also sometimes used to verify the location presented in the image/video, for example, weather services similar to Wolfram Alpha, shadow information (SunCalc), and temperature information. Figure 2 shows an example of how the signboards present in images can help journalists estimate the location. The images in Figure 2 were extracted from a viral video shared on Twitter. Image 1, on the left, contains two tweets shared on Twitter claiming that the video was captured in (1) Belgium and (2) France. However, when the fact-checkers investigated the video by extracting individual frames and focusing on the signboards, as shown in image 2, they found that the video was in fact captured in Philadelphia, United States. The journalistic process for finding the location of an incident captured in a video is the same as for an image. In addition, the audio associated with a video also provides valuable information about the location, for instance, through analysis of the language or the specific accent/dialect being spoken in the video. The BBC Monitoring Service helps its staff with analysing accents [3].

E. MOTIVATION
Finding out the motivation behind capturing the content and sharing it online is virtually impossible [9]. Journalists can ask basic questions about (1) the reason for being at the site of the incident, whether intentional or unintentional, (2) the person's social media footprint, (3) whether the person is an activist, and (4) whether the person works for the government or a political organisation [9]. By answering at least some of these questions, journalists and fact-checkers might end up having a sense of the motives.

F. MULTIMEDIA FORENSICS
Until now, we have described the five basic elements of UGC verification that journalists and fact-checkers typically employ, and the computational tools they use to carry out verification. However, we feel that the verification workflow can be further strengthened by adding an additional element to the UGC verification task. Thus, in this study, we propose the sixth element, ''Multimedia Forensics'', to strengthen the verification process.
FIGURE 4. [...] In Section III we discuss these forgeries and also present forensics techniques aimed at detecting these forgeries. Some content of this figure is adopted from [25].
The first five elements can help verify visual UGC which has been scraped from the web, manipulated and then shared again online. However, the tools (e.g., reverse image search, online maps or geo-location tools) used in the first five elements are not designed to verify manipulated content surfacing online for the very first time (until it is debunked, which will of course take some time). Through the sixth element, we suggest the use of image/video forensics tools which can help with detecting multimedia forgeries, for example copy-move or splicing.
Fact-checkers sometimes find themselves in difficulty while verifying newly surfaced visual UGC. According to the interviews we conducted, even after successfully identifying the location depicted in the image/video using digital tools, verifying whether the image/video is genuine or manipulated is not a simple task. It is true that there are multimedia forensics tools available for verifying visual UGC; however, at present, their widespread use within news media organisations is not evident.
There are a number of image/video forensics tools available online which can help uncover manipulated visual UGC, for example, FotoForensics, Forensically, Ghiro, DeDigi, WeVerify, InVID, and MeVer (a list of these tools can be found in Table 3). For deepfake media detection, web-based tools such as ''Deepware.ai'' and ''DuckDuckGoose.ai'' are available, which can be used to debunk newly surfaced deepfake media. Context-based visual UGC verification tools such as the Journalistic Decision Support System (JDSS) [26] and the Context Aggregation and Analysis Tool [27] are also available; these provide contextual information about a given UGC item in one place.
In the next section, we give an insight into some categories of multimedia forgeries and the available forensics solutions. Our aim is to show where the computer science research community stands in the fight against visual mis-/disinformation, what kind of tools/solutions are available and what is needed in the future.

III. STATE OF THE ART IN MULTIMEDIA FORENSICS
In this section, different visual content forgeries and forensics techniques proposed by the community of computer science researchers are presented. We also present some examples from the past where manipulated visuals were employed to spread mis-/disinformation online. See Figure 3 and Table 1 for reference. We categorise visual content forgeries into five categories, as shown in Figure 4, based on the degree of applied manipulation as proposed in [25]. These categories are:
• Similar: Images or videos visually similar to the original unaltered visual content, with variations only in resolution or format, are placed in this category;
• Enhanced/Retouched: Image enhancement is typically carried out globally on an entire image, for example boosting the colour of an image in order to make it look more pleasing, or enhancing the contrast/brightness/saturation etc. to make it look more attractive to the eye. Image enhancement can also be employed to make minor corrections in order to highlight or suppress certain artifacts within an image, or to make an image look more dramatic (as can be seen in the top left corner of Figure 3). Generally, enhancement is not performed with malicious intent, e.g., it is not employed to change the semantics of the image; however, we do have some examples (in the upcoming sections) where this operation is used with somewhat malicious intent. Similar to image enhancement, image retouching is also usually employed without any malicious intent behind it. For example, retouching is often used to eliminate imperfections from an image, such as removing blemishes or under-eye circles from a face. The idea behind this operation is also to make photos look better; however, although less frequently, it can be employed with malicious intent to hide or misrepresent the information being conveyed in the image. One difference between the two operations is that image retouching can alter local as well as global details within an image, whereas image enhancement only alters global details. Because these two operations are not very different, in this study we present Enhanced and Retouched as a single category;
• Doctored: Visual media altered using sophisticated editing techniques (e.g., copy-move, splicing) that change the semantics of the original visual content item and/or produce something different from the original data belong to this group;
• Deepfakes: A new class of fake media generated using deep neural networks is called deepfake media. Deepfake generation models can generate entirely new fake content as well as manipulate already existing content. They are not only capable of generating visual content, but can also generate audio and textual content. However, in this study we mainly focus on visual deepfakes; and
• Others: In this category we present two different types of visual content forgeries, i.e., (1) cheapfakes and (2) video forgeries. Cheapfakes refer to multimedia content produced using ''cheaper'' and more user-friendly tools such as Photoshop, Gimp, Final Cut Pro etc. (or, in some cases, no tool at all) [30].
VOLUME 11, 2023
FIGURE 5. The digital life-cycle of a visual content item (image, video) and the stages at which forensics operations can be applied to detect tampering the image/video might have undergone, represented using solid purple lines. Dotted purple lines show the stages from which helpful features can be acquired to detect different forgeries, e.g., CFA interpolation patterns are used to identify the make/model of the capturing device [28], and in [29] sensor noise patterns were employed to detect image splicing forgeries.
The following sections describe these categories. See [12] for an in-depth understanding of some of the concepts presented in this section.

A. SIMILAR
In visual UGC verification, images/videos that differ from the original only in compression or scaling appear unaltered to the human eye. Multimedia forensics tools analyse the underlying structure of the visual content by examining compression artifacts, sensor-related artifacts of the capturing device and the available metadata.
TABLE 1. A summary of multimedia problems presented in Section III. We also list suitable forensics techniques, as well as available tools to detect/debunk these forgeries. Some of the content in this table is inspired by [8]. An analysis of the tools can be found in Table 3.

1) SOURCE/CAMERA IDENTIFICATION
Source/camera identification concerns recovering information about the capturing device, e.g., its make and model. Sometimes this can be achieved by simply analysing the associated metadata; however, for images/videos shared online such information is often stripped to save storage. To cope with this, researchers propose to employ features inherent to the underlying properties of the image/video. Such features result from different phases of the digital image acquisition process that takes place inside digital cameras/capturing devices. A simplified overview of this process is shown in Figure 5. For the source camera identification task, experts analyse the image/video in question using metadata together with features such as sensor noise patterns, CFA interpolation artifacts, and compression artifacts, as depicted in Figure 5.
A straightforward technique to identify the source/camera of an image is to analyse its Exif (Exchangeable Image File Format) header. Some useful details about the image and the acquisition device are saved in the Exif header, for example, the make and model of the device, image resolution, exposure settings, date/time of acquisition, and other relevant details [37]. However, when an image is uploaded online or shared on a social media platform, the platform typically strips out the Exif header data to save memory [24]. Besides, the information present within the Exif header cannot be trusted in critical cases (e.g., police investigations, court proceedings) since it can be easily modified.
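As a rough illustration of why stripped metadata is easy to spot, the following Python sketch walks the header segments of a JPEG byte stream and reports whether an Exif APP1 segment is present. The function name `find_exif_segment` and the synthetic byte strings are hypothetical, introduced only for this example; a real workflow would normally rely on an established metadata library rather than on hand-written parsing.

```python
import struct

def find_exif_segment(jpeg_bytes: bytes):
    """Return the raw Exif payload of a JPEG byte string, or None if absent."""
    if jpeg_bytes[:2] != b"\xff\xd8":            # every JPEG starts with SOI
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:              # lost marker sync: give up
            break
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):               # SOS / EOI: header segments end here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]                   # TIFF-structured Exif data
        pos += 2 + length
    return None

def _segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG header segment (helper for the synthetic demo only)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# Synthetic streams: one with an Exif APP1 segment, one as a platform might leave it.
with_exif = (b"\xff\xd8" + _segment(0xE0, b"JFIF\x00")
             + _segment(0xE1, b"Exif\x00\x00MM\x00*") + b"\xff\xd9")
stripped = b"\xff\xd8" + _segment(0xE0, b"JFIF\x00") + b"\xff\xd9"

print(find_exif_segment(with_exif) is not None)   # Exif present
print(find_exif_segment(stripped) is not None)    # Exif stripped
```

The sketch only detects the presence of the header. It says nothing about whether the stored values are truthful, which is the second limitation noted above.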
To address this problem, researchers proposed several innovative solutions to infer information about the source/camera properties of a given image. A diverse set of features inherent to a capturing device based on the artifacts produced during image acquisition process including Sensor Pattern Noise, CFA (Color Filter Array) interpolation, JPEG compression artifacts etc. are employed [37], [38], [39].
In [39], JPEG compression statistics are employed for source camera identification. Since different camera manufacturers employ different compression strategies considering the trade-off between the image size and quality, the authors argue that it is possible to classify images based on JPEG compression artifacts.
In [28], the authors proposed to employ the CFA configuration and the associated demosaicing algorithm for source camera identification. Altogether, they proposed 34 different features and trained an SVM classifier to classify camera make and model. In [45], the authors proposed to cluster images from the same capturing device based on PRNU (Photo Response Non-Uniformity) noise residuals, using a correlation clustering approach. They argue that the noise residuals of images coming from the same device are more strongly correlated than those of images coming from unrelated devices, and that this property can be leveraged for the source camera identification task.
Below we provide a mathematical formulation of how sensor noise can help distinguish between images captured by different devices. To start the process, the images are denoised using any available denoising filter. The denoised version of each image is then subtracted from the original image as follows [46]:

$$W_k(x, y) = I_k(x, y) - \hat{I}_k(x, y) \tag{1}$$

In equation 1, $I_k(x, y)$ refers to the original images and $\hat{I}_k(x, y)$ to their denoised versions, where $k = 1, \ldots, N$. The noise residual $W_k(x, y)$ helps suppress the underlying content of the images and makes the PRNU noise estimation more effective [46]. The PRNU noise is then estimated as given in equation 2.
The PRNU $K(x, y)$ can then be used to determine the specific device used to capture an image $I(x, y)$, i.e., by comparing the estimated PRNU of the image $I(x, y)$ with the available PRNU estimates from a dataset of images captured using a number of different devices. PRNUs whose correlation exceeds a pre-defined threshold can be considered as resulting from the same device. Equation 3 presents the correlation $\rho$ as given in [46].
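The estimation and matching steps described above can be sketched in a few lines of Python. Since equations 2 and 3 are not reproduced in this text, the sketch assumes two forms that are common in the PRNU literature: the pixel-wise estimator $K = \sum_k W_k I_k / \sum_k I_k^2$ and a normalised correlation between the residual of the query image and $I \cdot K$. The 3x3 mean filter stands in for "any available denoising filter", and the synthetic "devices" are hypothetical test data, not real camera fingerprints.

```python
import random
from math import sqrt

def mean_filter3(img):
    """3x3 mean filter with edge clamping (a simple stand-in denoiser)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out

def noise_residual(img):
    """W = I - denoised(I), as in equation 1."""
    den = mean_filter3(img)
    return [[i - d for i, d in zip(ri, rd)] for ri, rd in zip(img, den)]

def estimate_prnu(images):
    """Assumed estimator: K = sum_k(W_k * I_k) / sum_k(I_k ** 2), pixel-wise."""
    h, w = len(images[0]), len(images[0][0])
    num = [[0.0] * w for _ in range(h)]
    den = [[0.0] * w for _ in range(h)]
    for img in images:
        res = noise_residual(img)
        for y in range(h):
            for x in range(w):
                num[y][x] += res[y][x] * img[y][x]
                den[y][x] += img[y][x] ** 2
    return [[n / d for n, d in zip(rn, rd)] for rn, rd in zip(num, den)]

def correlation(a, b):
    """Normalised (Pearson) correlation between two equally sized 2-D arrays."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(fa, fb))
    return cov / sqrt(sum((p - ma) ** 2 for p in fa) * sum((q - mb) ** 2 for q in fb))

def match_score(img, prnu):
    """Assumed similarity: correlation between the residual of img and img * K."""
    expected = [[i * k for i, k in zip(ri, rk)] for ri, rk in zip(img, prnu)]
    return correlation(noise_residual(img), expected)

def synthetic_device(seed, size=16):
    """Hypothetical sensor: a fixed multiplicative noise pattern per device."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.02, 0.02) for _ in range(size)] for _ in range(size)]

def shoot(pattern, brightness):
    """Flat-field 'photo' taken with the given sensor pattern."""
    return [[brightness * (1.0 + k) for k in row] for row in pattern]

cam_a, cam_b = synthetic_device(1), synthetic_device(2)
prnu_a = estimate_prnu([shoot(cam_a, c) for c in (90.0, 140.0, 200.0)])
prnu_b = estimate_prnu([shoot(cam_b, c) for c in (90.0, 140.0, 200.0)])
query = shoot(cam_a, 120.0)               # unknown image, actually from camera A
rho_a = match_score(query, prnu_a)
rho_b = match_score(query, prnu_b)
print(rho_a > rho_b)                      # the query matches camera A's fingerprint better
```

Flat-field images are used here because they make the fingerprint easy to isolate. With natural images the scene content leaks into the residual, which is why practical systems use stronger denoisers and many more reference images.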
In [47], a deep convolutional neural network (CNN) model was employed to carry out the source identification task for images captured using mobile devices. Study [48] proposed content-adaptive fusion residual networks for source camera identification on small-sized images. An efficient source camera identification method based on modified deep CNN (VGG 10 ) network was adapted in [49].
The source/camera identification methods can be divided into two categories: (1) Perfect Knowledge Methods and (2) Limited/Zero Knowledge Methods [24]. These methods are briefly described below:
1) Perfect Knowledge Methods: Perfect knowledge methods carry out the source identification task using a closed dataset containing reference camera fingerprints from a number of different camera makes and models.
2) Limited/Zero Knowledge Methods: Limited/Zero knowledge methods consider limited prior information about camera properties, or use small datasets with fewer details about the capturing devices.

2) IMAGE/VIDEO PROVENANCE
Image/video provenance concerns determining the last web/social media platform where the visual content was shared. Platform provenance analysis is an important step in visual content verification because it can help establish the full life cycle of the UGC item of interest. Various research studies have addressed both image and video provenance analysis using forensics techniques. Researchers rely on features obtained by signal processing methods, i.e., noise residuals and DCT coefficients, or on metadata information [50], [51], [52], [53]. A diverse set of machine learning and deep learning classifiers such as SVM, Logistic Regression, Decision Trees, Random Forests, and CNNs have been proposed in the literature for platform provenance analysis. Study [54] showed that, for smartphones, the JPEG headers are to a certain extent useful in identifying the operating system and the sharing application.
Study [52] proposed a social media platform provenance technique using ensembled convolutional neural network (CNN) architectures called, FusionNET. Authors employed diverse features for the provenance task, such as, (1) histogram of DCT coefficients, (2) noise residuals. Appending multiple features such as PRNU (Photo Response Non-Uniformity) to the DCT features improves classification accuracy. A video provenance network (VPN) which utilises both video and audio features is proposed in [55]. In study [56] a novel multi-branch CNN architecture called MultiFrame-Net was proposed to find the social network from which the video under analysis originated.

B. ENHANCED & RETOUCHED
Image enhancement and retouching operations manipulate the visual content in subtle ways. Contrast enhancement, sharpening and cropping operations fall into this category. In most cases these operations are not carried out with a malicious intent to deceive the audience, however, in some cases these operations can be employed to deceive by altering the semantics of the visual content.

1) ENHANCEMENT/RETOUCHING DETECTION
Image enhancement makes minor corrections to highlight or suppress certain artifacts within an image, often without any malicious intent. An example of image enhancement with a rather malicious intent is presented at the top left corner of Figure 3. The original image is on the left, and the colour enhanced image is on the right. The photographer darkened smoke to make destruction from an airstrike look more catastrophic [19]. After discovering the manipulation, Reuters news agency refused to work with the photographer who captured and enhanced this image.
Image retouching is similar to image enhancement to some degree. However, the retouching operation may be used to alter subtle global as well as local details within an image. In case of facial images, retouching operation might be employed to remove acne, blemishes, or scars. Normally, the retouching operation is harmless, as it does not conceal or misrepresent the information within an image [8]. A somewhat problematic image retouching operation is given at the bottom right of Figure 3 where the photographer removed his shadow from the photo [3]. The photographer was consequently dismissed for editing the photo [3].
In [57], a blind image forensic method was proposed that detects global contrast enhancement operations by analysing image histograms. Study [58] proposed a facial image retouching detection technique using spatial and spectral features obtained from PRNU noise fingerprints. The same authors suggested detecting retouched facial images using a differential detection system in [59]. The proposed system compares a (suspected) retouched image with a genuine reference image using a number of different features such as texture descriptors, deep face representations, and face landmark data. In [60], a deep CNN model was employed to automatically detect warping (retouching) operations applied to human faces using Adobe Photoshop. In [61], a deep CNN model was proposed to detect GAN-based synthetic image alterations. In addition to the CNN model, the authors employed two different algorithms for classification, namely (1) SVM and (2) thresholding.
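To give an intuition for histogram-based detection, the sketch below shows one artifact that global contrast stretching leaves behind: when a narrow range of 8-bit intensities is mapped onto the full range, many histogram bins inside the used range end up empty. This is a deliberately simplified illustration and not the detector of [57]; the function names and the synthetic pixel data are hypothetical.

```python
def histogram(pixels, levels=256):
    """Count how often each 8-bit intensity level occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def empty_bins_in_range(pixels):
    """Number of unused levels between the darkest and brightest used level."""
    counts = histogram(pixels)
    used = [level for level, c in enumerate(counts) if c]
    return sum(1 for level in range(used[0], used[-1] + 1) if counts[level] == 0)

def stretch_contrast(pixels):
    """Linear contrast stretch of the occupied range onto 0..255."""
    lo, hi = min(pixels), max(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

# Synthetic 'image': every level from 64 to 191 occurs four times.
original = [level for level in range(64, 192) for _ in range(4)]
enhanced = stretch_contrast(original)

print(empty_bins_in_range(original))   # smooth histogram, no gaps
print(empty_bins_in_range(enhanced))   # stretched histogram, many gaps
```

In real photographs the gaps are partly masked by noise and later re-compression, so practical detectors look at the frequency-domain signature of such peaks and gaps rather than counting empty bins.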

2) CROPPING DETECTION
Cropping is typically carried out to remove unnecessary parts around the borders of an image or a video frame. Cropping forgeries are not as common as other kinds of image forgeries (copy-move, splicing) and are mostly considered harmless. However, they can be employed to spread mis-/disinformation [62], [63] when there is a clear intent to deceive the audience by concealing information or distorting the facts presented within an image, i.e., to shroud objects or conceal the wider perspective [8]. For example, in 2017, during US president Donald Trump's inauguration ceremony, the White House pressured the US National Park Service to crop out empty spaces from images and publish cropped versions showing only the areas where the crowd was present [63]. For reference see Figure 3.
Cropping in JPEG images can be identified by detecting artifacts resulting from multiple JPEG compressions. Study [65] proposed to detect JPEG re-compression using histogram discontinuities, i.e., periodic artifacts resulting from the image re-quantisation process. In [66], a method was proposed to detect cropping and re-compression operations within JPEG images using the blocking artifact characteristics matrix (BACM). Cropping disturbs the symmetry of the BACM, which can therefore be employed as evidence to detect double-compressed, cropped JPEG images [67]. Study [68] devised a fully automated cropping forgery detector for asymmetrically cropped images by estimating the camera principal point. Analysing block artifact grids (BAG), which result from block processing during JPEG compression, is another approach to investigating image cropping forgeries [69]. In [70], a method was proposed to detect upscale cropping in surveillance videos using sensor pattern noise (SPN) features. A minimum average correlation energy Mellin radial harmonic (MACE-MRH) correlation filter was used to unveil indications of upscaling. By omitting the high-frequency components of the video under investigation and choosing the size of the local search window, this technique localises partially tampered regions effectively.
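The intuition behind block-grid analysis can be illustrated with a toy example. JPEG processes images in 8x8 blocks, so faint discontinuities tend to appear at columns that are multiples of 8; if columns are cropped from the left, that grid no longer starts at column 0. The sketch below estimates the horizontal grid phase from raw pixel differences. It is a simplified stand-in for the BAG/BACM methods cited above, and `grid_offset` and the synthetic blocky image are hypothetical.

```python
def grid_offset(img, block=8):
    """Return the column phase (0..block-1) with the strongest vertical edges.

    Phase 0 means block boundaries sit at columns 8, 16, ... as in an
    uncropped JPEG; any other value hints at a shifted (cropped) grid.
    """
    width = len(img[0])
    energy = [0.0] * block
    for row in img:
        for x in range(1, width):
            energy[x % block] += abs(row[x] - row[x - 1])
    return max(range(block), key=energy.__getitem__)

# Synthetic 'heavily compressed' image: each 8x8 block is a flat colour.
blocky = [[(37 * (x // 8) + 91 * (y // 8)) % 256 for x in range(32)]
          for y in range(16)]
cropped = [row[3:] for row in blocky]      # remove three columns on the left

print(grid_offset(blocky))    # grid aligned with the image origin
print(grid_offset(cropped))   # grid shifted by the crop
```

Only the horizontal direction is checked here, and a crop by a multiple of eight columns would go unnoticed. Real images also contain genuine edges, so practical methods first suppress image content before measuring the grid.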

C. DOCTORED
The majority of image forensics techniques developed over the last decades aim to reveal sophisticated image modifications (e.g., copy-move, splicing) that change the semantics of the original visual content and/or produce something different from it.

1) COPY-MOVE DETECTION
Copy-move image forgery is carried out by copying a specific region of an image and pasting the copied segment elsewhere in the same image [71]. It is used to hide something within an image or to increase the number of objects present in it. For instance, in the well-known fake Iranian missiles photo, copy-move forgery was used to cover a misfired missile with a copy of a fired missile. For reference, see Figure 3.
FIGURE 6. On the left, two different methods employed to create deepfakes: (A) encoder-decoder networks and (B) generative adversarial networks, or simply GANs [64]. We also illustrate the basic pipelines of forgery detectors employing deep networks for feature extraction and classification: (C) a basic multimodal cheapfake media detector, (D) a copy/move forgery detector and localizer, and (E) a deepfake detection system.
There are two families of copy-move forgery detection techniques: (1) block matching-based techniques and (2) keypoint matching-based techniques. Block matching-based techniques divide the image into smaller overlapping blocks. Features are extracted from the resulting blocks and matched in order to identify duplicated regions within the image [37]. These techniques employ Discrete Cosine Transform (DCT) features, among others [72]. To apply the DCT to an image, the image is first divided into $N \times N$ blocks (typically $N = 8$). Equation 4 shows how the DCT for the $(i, j)$-th entry of an image is computed [73].
Here, $p(x, y)$ denotes the $(x, y)$-th element of the image represented by the matrix $p$. For copy-move detection, the blocks are sorted and each is matched against the other blocks of the image to detect matching blocks, i.e., blocks whose correlation is greater than a specified threshold. The correlation for a pair of sorted blocks is given in equation 6, where $p_x$ and $p_y$ denote the two blocks and $n$ is the number of coefficients within a block [72].
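The block-matching procedure can be sketched as follows. Because equations 4 to 6 are not reproduced in this text, the sketch assumes the standard orthonormal 2-D DCT-II, i.e., $D(i, j) = \frac{2}{N} C(i) C(j) \sum_x \sum_y p(x, y) \cos\frac{(2x+1) i \pi}{2N} \cos\frac{(2y+1) j \pi}{2N}$ with $C(0) = 1/\sqrt{2}$ and $C(u) = 1$ otherwise, and a Pearson correlation as the similarity measure. The quantisation step, the threshold, and the synthetic test image are illustrative choices, not values taken from [72] or [73].

```python
from itertools import combinations, groupby
from math import cos, pi, sqrt

N = 8  # block size used in the text

# Pre-computed cosine basis: COS[i][x] = cos((2x + 1) * i * pi / (2N))
COS = [[cos((2 * x + 1) * i * pi / (2 * N)) for x in range(N)] for i in range(N)]

def dct_lowfreq(block, keep=4):
    """Return the keep x keep lowest-frequency 2-D DCT-II coefficients of a block."""
    coeffs = []
    for i in range(keep):
        for j in range(keep):
            ci = 1 / sqrt(2) if i == 0 else 1.0
            cj = 1 / sqrt(2) if j == 0 else 1.0
            total = sum(block[x][y] * COS[i][x] * COS[j][y]
                        for x in range(N) for y in range(N))
            coeffs.append((2 / N) * ci * cj * total)
    return coeffs

def pearson(a, b):
    """Pearson correlation of two coefficient vectors (guarded against zero variance)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    norm = sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return cov / norm if norm else float(a == b)

def find_copy_move(img, step=4.0, threshold=0.98):
    """Return pairs of block origins (row, col) that look like duplicated regions."""
    h, w = len(img), len(img[0])
    records = []
    for r in range(h - N + 1):
        for c in range(w - N + 1):
            block = [row[c:c + N] for row in img[r:r + N]]
            coeffs = dct_lowfreq(block)
            key = tuple(round(v / step) for v in coeffs)   # coarse, sortable feature
            records.append((key, (r, c), coeffs))
    records.sort(key=lambda rec: (rec[0], rec[1]))         # lexicographic sort
    matches = []
    for _, group in groupby(records, key=lambda rec: rec[0]):
        for (_, pa, ca), (_, pb, cb) in combinations(list(group), 2):
            apart = abs(pa[0] - pb[0]) >= N or abs(pa[1] - pb[1]) >= N
            if apart and pearson(ca, cb) > threshold:      # ignore overlapping blocks
                matches.append((pa, pb))
    return matches

# Synthetic textured image with an 8x8 patch copied from (2, 2) to (14, 14).
size = 24
image = [[(7 * r * r + 13 * c * c + 5 * r * c) % 251 for c in range(size)]
         for r in range(size)]
for dr in range(N):
    for dc in range(N):
        image[14 + dr][14 + dc] = image[2 + dr][2 + dc]

matches = find_copy_move(image)
print(((2, 2), (14, 14)) in matches)
```

Every overlapping block is transformed, which is what makes block-based methods expensive on full-resolution images. Keeping only a few low-frequency coefficients is one simple way of shrinking the feature space.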
However, these techniques are computationally expensive [38]. Some studies proposed to employ dimensionality reduction techniques such as Principal Component Analysis (PCA) to reduce the feature space resulting in lower computational complexity [71]. In [74], PCA was carried out on DCT features to detect copy-move forgeries, reducing the computational complexity while achieving a higher robustness against noise and compression. A similar strategy was followed in [75] which used Singular Value Decomposition (SVD) for dimensionality reduction. To detect copy-move forgeries using block matching methods, researchers exploit a diverse set of features including Discrete Cosine Transform (DCT) [74], [76], Discrete Wavelet Transform (DWT) [75], [77], and Fourier-Mellin Transform (FMT) [78].
Unlike the block-based techniques, keypoint-based techniques extract features from certain regions in the image having high entropy rather than the whole image [79], thus reducing the computational complexity. Keypoint-based techniques are typically robust against geometric transformations and rely on features such as Scale Invariant Feature Transform (SIFT) [80], [81], [82] and Speeded-Up Robust Features (SURF) [83], [84] for the detection task.

2) SPLICING DETECTION
In image splicing forgery, a segment/block from a given source image is copied and pasted inside the target image [71]. Spliced images exhibit various artifacts, for example differing noise patterns, multiple colour distributions, abnormal dynamic range and lighting inconsistencies. This happens because the image is spliced using segments from a source image whose noise, dynamic range and colour distribution differ from those of the target image, thus introducing irregularities within the target image's statistics [38]. An example of image splicing is given at the left side of the middle row in Figure 3. The photo shown on the right was spread far and wide on social media during the Australian bushfire crisis in 2019-2020. It was later found that the image was a spliced version of multiple other images, as we show in Figure 3.
To detect image splicing forgeries, a diverse set of features such as noise residuals, CFA interpolation artifacts, and JPEG compression artifacts are employed. The same formulation of PRNU noise as given in equation 2 can be used to analyse images for tampering. However, in this case, instead of estimating the PRNU $K(x, y)$ for images captured using different devices, $K(x, y)$ is estimated only for the set of images known to have been captured using the same device as $I(x, y)$. The correlation $\rho$ given in equation 3 can then be used to assess the authenticity of the image [46].
The use of sensor pattern noise (PRNU) fingerprints to detect image forgeries, including image splicing and copy-move forgeries, was proposed in [29]. Popescu and Farid analyse the Color Filter Array (CFA) interpolation inconsistencies introduced by tampering operations to detect image splicing and copy-move forgeries [91]. In [92], an approach was proposed to detect image splicing forgeries by detecting the JPEG ghost, which appears when the two images (source, target) are compressed using different quantisation amounts. In [93], Markov features acquired from DCT and DWT coefficients are used to train an SVM classifier to detect image splicing forgeries. In [94], a technique was proposed to detect image splicing forgeries by analysing lighting inconsistencies within the images.

D. DEEPFAKE DETECTION
In the previous sections, we have outlined classical image forgeries. Nowadays, with the availability of enormous compute power at low cost and the development of sophisticated deep learning models, the production of realistic fake multimedia content known as deepfakes is becoming prevalent. Deepfakes are not yet a widespread form of UGC mis-/disinformation; however, according to the journalists and fact-checkers we have consulted, they have the potential to become problematic in the future. Although deepfakes are not yet common as mis-/disinformation, they are already around us in the form of TikTok and Instagram filters, which people use to add different types of effects (makeup etc.) to their faces. These filters are also driven by deep neural networks.
According to [64], deepfakes can be defined as "Believable media generated by a deep neural network". The term deepfake combines the words "deep learning" and "fake", and refers to manipulating or producing realistic fake multimedia content, including images, videos, text and audio. Deep learning models such as Autoencoders [107] and Generative Adversarial Networks (GANs) [108] are typically used to generate realistic deepfakes.
Contemporary deepfake generation methods usually employ GANs. A generative adversarial network, or simply GAN, comprises two different networks, i.e., (1) a generator and (2) a discriminator [108]. As the name suggests, the GAN is trained in an adversarial manner, where the generator tries to fool the discriminator by generating plausible (fake) data samples similar to the training data. The discriminator, on the other hand, tries to distinguish the (fake) samples produced by the generator from the ones in the training set (real samples).
Simply put, the generator and the discriminator play the so-called min-max game [108], which is defined by equations 7 and 8. The discriminator is trained to maximise the function given in equation 7. The generator, in turn, is trained to minimise the function in equation 8, i.e., by generating more plausible data samples similar to the data distribution of the training set.
In these equations, $x$ refers to a real data sample, $z$ is the latent vector, $G(z)$ is the fake data produced by the generator $G$, $D(x)$ is the discriminator's prediction for a real sample, and $D(G(z))$ is the discriminator's prediction for fake data [108]. After being trained for a large number of epochs, the generator is able to fool the discriminator by generating extremely plausible fake data, as can be seen in Figure 7.
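The two objectives can be made concrete with a small numerical sketch. Equations 7 and 8 are not reproduced in this text, so the code assumes the original formulation of [108]: the discriminator maximises $\mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$, while the generator minimises $\mathbb{E}[\log(1 - D(G(z)))]$. The discriminator outputs below are made-up numbers chosen only to show how the values behave.

```python
from math import log

def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))]; D tries to maximise it."""
    real_term = sum(log(p) for p in d_real) / len(d_real)
    fake_term = sum(log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """Batch estimate of E[log(1 - D(G(z)))]; G tries to minimise it."""
    return sum(log(1.0 - p) for p in d_fake) / len(d_fake)

# Early in training: the discriminator separates real from fake confidently.
sharp_real, sharp_fake = [0.95, 0.90], [0.05, 0.10]
# Late in training: the generator fools the discriminator, which outputs 0.5.
fooled_real, fooled_fake = [0.5, 0.5], [0.5, 0.5]

d_sharp = discriminator_objective(sharp_real, sharp_fake)
d_fooled = discriminator_objective(fooled_real, fooled_fake)
g_sharp = generator_objective(sharp_fake)
g_fooled = generator_objective(fooled_fake)

print(d_sharp > d_fooled)   # a confident discriminator scores higher
print(g_fooled < g_sharp)   # a successful generator drives its objective down
```

When the discriminator can do no better than guessing, its objective settles at $2 \log 0.5$, which is the equilibrium value of the game in this formulation.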
Deepfakes extend further than visual content (images/videos). For example, [111] showed how generative networks can be employed to tamper with medical evidence such as MRI and CT scans. In 2019, the CEO of a UK-based energy firm was scammed out of $250k [112] using a voice cloning deepfake algorithm similar to the one proposed in [113]. Besides this, it has been shown that generative models are capable of generating synthetic news articles and tweets [114], [115].
Deepfakes have the potential to be used to spread mis-/disinformation online and disrupt peace. In 2019, a video went viral on social media in which Boris Johnson and Jeremy Corbyn were seen endorsing each other for Prime Minister [23]. For reference, see Figure 3. More recently, amidst the Russian invasion of Ukraine, a deepfake video of the Ukrainian president went viral on social media platforms [116].
[124], [125]. In most cases the proposed systems treat the deepfake detection task as an n-class classification problem (typically n = 2, e.g., fake or real). To train the classification model, the majority of the proposed systems mentioned above employ the cross-entropy loss as defined in equation 9.
Here, $X$ represents the training set and $y'[c]$ refers to the predicted probability of class $c$ for a given sample $x_i$. Figure 6 E presents a simple overview of a deepfake detection pipeline. For more general deepfake media detection, Zhang et al. [126] proposed to exploit the unique artifacts that result from the up-sampling operation present in most common GAN pipelines. In [127], several classical as well as deep learning-based fake content detectors [128], [129], [130], [131] were employed to detect GAN-generated images found on social media platforms. In [132], a technique was proposed that uses co-occurrence matrices extracted from the pixel domain of all three colour channels to train a deep convolutional neural network to detect GAN-generated images.
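As equation 9 is not reproduced in this text, the following sketch assumes the usual form of the cross-entropy loss for one-hot labels, $L = -\frac{1}{|X|} \sum_{x_i \in X} \sum_{c} y[c] \log y'[c]$, in which only the log-probability of the true class contributes for each sample. The predicted probabilities are invented for illustration, with class 0 standing for "real" and class 1 for "fake".

```python
from math import log

def cross_entropy(predicted, labels):
    """Mean cross-entropy over a batch.

    predicted: list of per-class probability lists (one list per sample)
    labels:    list of true class indices (the one-hot position of each sample)
    """
    total = 0.0
    for probs, label in zip(predicted, labels):
        total -= log(probs[label])       # only the true class term is non-zero
    return total / len(labels)

labels = [0, 1]                          # first sample is real, second is fake
good_predictions = [[0.9, 0.1], [0.2, 0.8]]
bad_predictions = [[0.3, 0.7], [0.6, 0.4]]

loss_good = cross_entropy(good_predictions, labels)
loss_bad = cross_entropy(bad_predictions, labels)
print(loss_good < loss_bad)              # better predictions give a lower loss
```

Minimising this quantity pushes the predicted probability of the correct class towards one for every training sample.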

E. OTHERS
1) CHEAPFAKE DETECTION
The term "Cheap Fakes" was coined in 2019 [30]. Cheapfakes are manipulated media produced to spread fake news and mis-/disinformation. Examples of cheapfakes include (1) photoshopped images, (2) videos whose frames have been slowed down, sped up and/or cut, and (3) genuine visual content that is recontextualised by presenting it along with falsified textual captions.
Cheapfakes are created using easily accessible editing tools such as Photoshop or GIMP, which makes them more prevalent online than deepfakes, which are produced using sophisticated deep learning tools and require technical expertise [133]. In the case of re-contextualised cheapfakes, genuine images are sometimes presented along with false/out-of-context textual captions, so that no editing tool is required to generate this type of mis-/disinformation. For example, shortly after the 2015 earthquake in Nepal, an image of two children, a brother and a sister, went viral on the internet with the claim that it had been captured in Nepal. The picture was originally captured in Vietnam in 2007. The image itself was not manipulated, but presented out of context [35]. The picture is shown at the bottom left corner of Figure 3.
Typical deep learning-based cheapfake detection systems comprise two different deep neural networks, i.e., (1) an image CNN to extract image features and (2) a text CNN for textual feature extraction. The extracted multimodal features are then fused together to obtain the final classification score. In [134], a self-supervised learning strategy was proposed to train neural network models to detect out-of-context captions associated with images. The authors also open-sourced a considerably large dataset comprising around 200K images and 450K captions for further research in the domain. A neural network-based system for multi-modal (image and text) fake news detection was proposed in [135]. "FauxWard", a novel framework based on a graph convolutional neural network, was proposed in [136]; it learns heterogeneous information extracted from the user comment network of a social media post in order to effectively detect misleading information shared online. In [137], Khattar et al. proposed an autoencoder-based fake news detection model relying on both textual and visual content.

2) VIDEO FORENSICS
Video forensics differs somewhat from image forensics because, unlike images, videos carry temporal information along with spatial information. Video forensics techniques are divided into two categories: (1) inter-frame techniques and (2) intra-frame techniques. Inter-frame video forensics techniques deal with temporal information. Intra-frame video forensics techniques closely resemble image forensics techniques, as they deal with individual frames and do not analyse the temporal information of the video. We briefly describe the two types of forgery below.

Inter-frame Video Forgery Detection
Inter-frame video forgery is carried out in the temporal domain, for example, (1) frame insertion, (2) frame deletion, (3) frame shuffling, and (4) frame duplication. Typically, the inter-frame forgeries are employed to tamper, twist, conceal, or falsify the information present inside a video.
A number of different techniques have been proposed by the scientific research community to detect inter-frame video forgeries using a diverse set of features, as described in [138], for example:
• Compression Artifacts: Compression-related artifacts/abnormalities are used to detect traces of forgery applied to the video.
• Noise Artifacts: Sensor noise fingerprints are analysed to detect traces of forgery.
• Motion Features: Forgery performed on a video may interfere with its motion features, changing the relation between adjacent frames. Motion-related features (optical flow etc.) can thus be used to detect inter-frame video forgeries.
• Statistical Features: Pixel-based or statistical feature-based methods analyse statistical properties of objects, pixel-level inconsistencies and correlations between different frames of the video.
• Machine Learning Techniques: Machine learning and deep learning models (which require a huge amount of training data) are employed. Deep learning models can automatically learn complex patterns from the data to detect forgery, without requiring any handcrafted features.
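As a toy illustration of how temporal inconsistencies betray inter-frame tampering, the sketch below compares consecutive frames: a near-zero difference suggests a duplicated frame, while a difference far above the typical inter-frame change suggests that frames were deleted (or foreign frames inserted) at that position. The thresholds and the synthetic brightness-ramp "video" are illustrative assumptions; practical detectors rely on richer features such as optical flow or compression traces.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def inspect_video(frames, dup_eps=1e-9, jump_factor=3.0):
    """Return (indices of suspected duplicate frames, indices of suspected cuts)."""
    diffs = [frame_difference(f, g) for f, g in zip(frames, frames[1:])]
    typical = sorted(diffs)[len(diffs) // 2]           # median inter-frame change
    duplicated = [i + 1 for i, d in enumerate(diffs) if d <= dup_eps]
    jumps = [i + 1 for i, d in enumerate(diffs)
             if typical > 0 and d > jump_factor * typical]
    return duplicated, jumps

def ramp_frame(level, size=4):
    """Flat frame of the given brightness (synthetic test data)."""
    return [[level] * size for _ in range(size)]

# Genuine clip: brightness rises smoothly by 2 per frame.
genuine = [ramp_frame(2 * t) for t in range(10)]
with_duplicate = genuine[:5] + [genuine[4]] + genuine[5:]   # frame 4 repeated
with_deletion = genuine[:5] + genuine[8:]                   # frames 5-7 removed

print(inspect_video(genuine))
print(inspect_video(with_duplicate))
print(inspect_video(with_deletion))
```

Static scenes and genuine scene cuts would trigger both checks, which is one reason why simple frame differencing is usually combined with the other feature families listed above.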

Intra-frame Video Forgery Detection
The intra-frame forgeries are carried out in the spatial domain, i.e., single frame present inside a video is manipulated using the image manipulation techniques, for example, copy/move, splicing or cropping etc. Intra-frame forgeries are used to add or remove a portion or an object from within one or multiple frames of any given video to conceal or misrepresent content of the video.
These forgeries are similar to image forgeries, since individual frames within a video are manipulated, and can thus be detected using the passive image forensics techniques described in the previous sections. However, some techniques take temporal features into account in order to detect intra-frame forgeries. For example, [139] proposed to employ inconsistencies in optical flow (which helps in tracking the movement of objects) in order to detect intra-frame copy-move video forgery.

F. ACTIVE FORENSICS
The forensics techniques presented in earlier sections are "passive" in nature, i.e., they do not require any prior information about the visual content being analysed [38]. Active forensics is another family of forensics techniques, which analyse visual content by examining specific watermarks or signatures embedded during the acquisition or processing stages.
A limitation of active approaches is that they fail to work when no prior information is available about the image being verified, for example, if information about the watermark/signature is not available, or if no watermark/signature is embedded in the image. Also, when images shared on social media platforms are uploaded/downloaded several times, the image compression rate is affected severely, which degrades the watermark/signature initially embedded in the image [140]. Furthermore, if the watermarks or signatures are added during the image acquisition phase, the camera must be equipped with a special watermarking chip or digital signature chip [38].
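The fragility described above can be illustrated with a minimal signing sketch. It uses a keyed hash (HMAC) from the Python standard library as a stand-in for a signature computed inside the camera; real systems would more likely use asymmetric signatures, and the key, function names and byte strings here are hypothetical.

```python
import hashlib
import hmac

def sign_at_capture(image_bytes: bytes, device_key: bytes) -> str:
    """Compute a keyed signature over the image bytes at acquisition time."""
    return hmac.new(device_key, image_bytes, hashlib.sha256).hexdigest()

def verify(image_bytes: bytes, device_key: bytes, signature: str) -> bool:
    """Check that the image bytes still match the signature issued at capture."""
    expected = hmac.new(device_key, image_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

device_key = b"hypothetical-device-secret"      # would live in a camera chip
original = bytes(range(256)) * 4                # stands in for the captured file
signature = sign_at_capture(original, device_key)

# A platform re-encoding the upload changes the bytes, even if only slightly.
recompressed = original[:-1] + b"\x00"

print(verify(original, device_key, signature))       # untouched file verifies
print(verify(recompressed, device_key, signature))   # re-encoded file does not
```

A failed check therefore cannot distinguish malicious editing from routine re-compression, and without the signature or key no check is possible at all. Both effects limit the usefulness of active approaches for content circulating on social media.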

G. CONTENT AUTHENTICITY INITIATIVE
The Content Authenticity Initiative (CAI) is a new project aimed at developing an end-to-end secure system for digital content (image/video) provenance and attribution. Through this initiative, several big tech companies such as Adobe and Microsoft are working collectively with major media houses, including the BBC, AFP and The Washington Post, to combat visual mis-/disinformation [141].
The initiative's goal is to include a layer of verifiable trust within all types of digital content, i.e., photos and videos, by employing provenance and attribution solutions. Although this initiative is still at an early stage, it could prove extremely useful in fighting visual mis-/disinformation online. The initial version of the CAI will appear in the beta version of Adobe Photoshop, Adobe's widely used photo editing software. Eventually, the CAI might help transform social media feeds or news websites by filtering out content which is "possibly" inauthentic.

IV. MAPPING
In this section, we present an overview of the verification tools media practitioners employ, the limitations associated with these tools, and the future prospect of visual UGC verification tools.
The employment of computational verification tools and resources is growing [5]. In 2017, ICFJ states that only 11% of the interviewed journalists and news managers were using some kind of computer tools to verify content shared on social media platforms. The figure in 2019, however, shows a remarkable increase in this number with around 33% of the interviewed journalists and news managers utilising computational tools and resources to verify UGC [5]. The ICFJ's 2019 report reveals that more than half of the surveyed journalists use digital fact-checking tools [5]. This upwards trend of using verification tools is due to the speed and the scale at which visual mis-/disinformation is disseminated. Table 2 presents the percentage of the UGC verification tools used in the media industry according to the ICFJ's 2019 report.
Journalists and fact-checkers typically use basic tools such as reverse image search, Exif data viewers and online maps, which have known limitations as discussed in Section II. A variety of multimedia forensics tools such as Forensically, FotoForensics, the WeVerify-InVID verification plugin, MeVer, DeDigi [142], and other similar tools are available online and can assist journalists in detecting possible tampering operations an image might have undergone. Some classical image forgeries such as copy-move and image splicing forgeries can be detected using these tools. However, such tools are not widely employed by media practitioners in visual UGC verification. One likely reason is that these tools require technical knowledge and training to be used properly. Moreover, most of the available multimedia forensics tools do not take any contextual information into account when used to verify a piece of visual UGC. These caveats might explain why most news media professionals are reluctant to trust automated verification tools.
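To give a sense of what a copy-move check involves, the following is a minimal sketch in Python; it is not the method used by any of the tools named above. It assumes a grayscale image given as a list of rows of integer intensities and only finds exactly duplicated blocks. The function name `find_duplicate_blocks` and its parameters `block` and `min_shift` are illustrative assumptions. Practical detectors typically rely on robust block features or keypoint matching so that matches survive compression and resizing.

```python
from collections import defaultdict


def find_duplicate_blocks(image, block=4, min_shift=4):
    """Naive exact-match copy-move cue (illustrative only).

    image: list of rows, each a list of integer pixel intensities.
    Returns pairs of (row, col) top-left corners whose block x block
    patches are identical and at least min_shift pixels apart.
    """
    height, width = len(image), len(image[0])
    positions_by_patch = defaultdict(list)

    for y in range(height - block + 1):
        for x in range(width - block + 1):
            patch = tuple(
                tuple(image[y + dy][x:x + block]) for dy in range(block)
            )
            # Flat regions (sky, walls) repeat naturally, so skip them.
            if len({value for row in patch for value in row}) == 1:
                continue
            positions_by_patch[patch].append((y, x))

    pairs = []
    for positions in positions_by_patch.values():
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                (y1, x1), (y2, x2) = positions[i], positions[j]
                # Ignore near-neighbours; they are rarely a pasted copy.
                if abs(y1 - y2) >= min_shift or abs(x1 - x2) >= min_shift:
                    pairs.append((positions[i], positions[j]))
    return pairs
```

A returned pair is only a cue for a human reviewer: identical regions can also occur in genuine images, for instance in repeated patterns.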
For deepfake detection, although some tools are available online, they mostly do not work as expected. The available deepfake detection tools provide only a binary ''real'' or ''fake'' answer, without any insight into why and how the decision was made.
The issue of contextual information was addressed in [143], which describes a system called Seriously Rapid Source Review (SRSR). SRSR provides contextual cues from different sources, allowing media practitioners to find and analyse sources relating to breaking news events [143]. A similar tool called the Journalistic Decision Support System (JDSS) [26] was developed under the REVEAL project [144]. JDSS is free to use and provides a diverse set of functionalities to crawl Twitter for useful content in order to carry out verification. In [145], also under the REVEAL project, a web-based image verification system featuring metadata visualisation and image tampering detection tools was proposed. In [27], a Context Aggregation and Analysis tool for verifying user-generated videos was proposed. The tool is claimed to be capable of automatically collecting and calculating several different contextual verification cues for a given video. The cues include (1) contextual information about the video (e.g., comments, thumbnails, Twitter context), and (2) whether the video has already been debunked in the past. The authors also used machine learning models trained on real and fake video data to automatically analyse a given video. Other popular projects succeeding REVEAL are InVID and WeVerify, which focus on building platforms to detect and verify visual content. These projects aim at developing tools for image/video metadata analysis, key-frame extraction, reverse image search, magnification, forensic analysis and contextual data analysis [18], [146], [147]. Under the InVID project, researchers have also developed a social media analytics dashboard to find and track trending stories across several social media websites [148].
Considering the importance of visual UGC verification and the lack of trustworthy tools and resources, the media industry is joining hands with the research community to address these issues [149]. It is true that no automated verification tool can verify a piece of visual content with 100% accuracy [150], but tools can make the verification process more efficient by reducing the burden on fact-checkers. Tools which provide contextual information about a given visual UGC item, while analysing its veracity using both content-based and context-based features, will be extremely helpful for media practitioners. With such tools, journalists and fact-checkers will have all of the required information from different sources in one place, which will enable them to carry out verification effectively. Such tools will also reduce the need to manually consult different online sources to gather more information, resulting in more efficient verification.
It should be stressed that the final decision is to be made by the person (journalist/fact-checker/editor) who is using the tool, and not by the automated tool itself. New tools should be tailored to provide all of the required information in one place and let the user decide whether the content is genuine, fake, tampered or re-contextualised. Table 3 presents a variety of available verification tools journalists and fact-checkers typically use to verify UGC, along with some of their limitations.
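The design principle described above, gathering content and context cues in one place while leaving the verdict to a person, can be sketched as a small data structure. This is a hypothetical illustration in Python, not the design of any tool listed in Table 3. The class names `Cue` and `EvidenceReport`, their fields, and the example cue sources are all assumptions; only the four verdict labels are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Verdict labels come from the text; everything else is hypothetical.
ALLOWED_VERDICTS = {"genuine", "fake", "tampered", "re-contextualised"}


@dataclass
class Cue:
    source: str   # where the cue came from, e.g. "reverse image search"
    kind: str     # "content" or "context"
    finding: str  # human-readable description shown to the reviewer


@dataclass
class EvidenceReport:
    item_id: str
    cues: List[Cue] = field(default_factory=list)
    verdict: Optional[str] = None      # never set automatically
    decided_by: Optional[str] = None   # the human who made the call

    def add_cue(self, source: str, kind: str, finding: str) -> None:
        """Attach one piece of evidence produced by a tool."""
        if kind not in ("content", "context"):
            raise ValueError("kind must be 'content' or 'context'")
        self.cues.append(Cue(source, kind, finding))

    def cues_of_kind(self, kind: str) -> List[Cue]:
        """Return only content-based or only context-based cues."""
        return [cue for cue in self.cues if cue.kind == kind]

    def record_decision(self, reviewer: str, verdict: str) -> None:
        """Store the verdict chosen by a named human reviewer."""
        if verdict not in ALLOWED_VERDICTS:
            raise ValueError("unknown verdict: " + verdict)
        self.verdict = verdict
        self.decided_by = reviewer
```

In this sketch the tools only ever call `add_cue`; the `verdict` field stays empty until a reviewer calls `record_decision`, which keeps the automated analysis and the editorial judgement visibly separate.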

V. CONCLUSION
In this paper, we presented an overview of visual UGC verification in journalism, i.e., we described in detail the 5 elements of UGC verification, along with the computational tools journalists and fact-checkers employ in order to verify visual UGC shared online. In addition to the 5 basic pillars of UGC verification, in this study we propose a 6th pillar, which we call ''Multimedia Forensics'', and which could potentially benefit news media professionals in verifying manipulated visual content. From a technical perspective, we also analysed a variety of visual content forgeries and the forensics techniques proposed by the computer science community to detect these forgeries. Finally, we presented a mapping of the available computational tools media professionals frequently employ in order to verify visual UGC, the available multimedia forensics tools which are not commonly used by journalists and fact-checkers, and the limitations of the available tools.
Based on our analysis of the journalistic UGC verification practices, we conclude that (semi-)automated verification tools are required in order to aid media professionals and newsrooms in their fight against an increasing amount of visual mis-/disinformation online. We also suggest that multimedia forensics tools should be incorporated into the basic journalistic verification workflows. In addition to that, to properly make use of forensics tools, journalists and fact-checkers should be trained.
From a computer science perspective, we believe that more user-friendly, explainable forensics tools are required in order to gain the confidence of media professionals in using multimedia forensics tools in their day-to-day routine. Additionally, most of the available multimedia forensics tools carry out content-based analysis only, and do not take contextual information into account while verifying a piece of visual UGC. We suggest that new forensics tools should be designed so that they can take advantage of the contextual information acquired from different sources relating to the UGC item being verified. We believe this will further enhance the verification process, and will increase the media industry's confidence in such tools, since users will then be able to see on what basis the tool has made a certain decision.
Generating visual mis-/disinformation and detecting it is an ongoing arms race. Researchers propose new solutions to detect manipulated visual content, and adversaries propose new techniques to evade the detection algorithms while generating increasingly realistic-looking fakes. We expect that this will eventually result in fake visual content so realistic that it can no longer be detected using passive techniques. We therefore think that active forensics techniques will be more useful in the future to detect fake content. Active forensics techniques require special signatures or watermarks to be inserted into the visual content at the time of creation. Such signatures or watermarks can then be used to check whether the content has been manipulated. The Content Authenticity Initiative, briefly discussed in Section III, is a step towards a contemporary form of active forensics, and we foresee it as a vital apparatus in the fight against visual mis-/disinformation in the future.
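The sign-at-creation, check-later idea behind active forensics can be illustrated with a minimal Python sketch. It is an assumption-laden stand-in, not the CAI's design, which is not detailed in this paper: it uses a keyed hash (HMAC-SHA256) from the Python standard library over the raw content bytes, and the function names `sign_content` and `is_unmodified` are hypothetical. A deployed provenance system would more plausibly use public-key signatures, so that anyone can verify content without holding a secret key.

```python
import hashlib
import hmac


def sign_content(content: bytes, key: bytes) -> str:
    """Compute a signature over the content bytes at creation time.

    HMAC-SHA256 is used here only as a simple stand-in for whatever
    signature scheme a real provenance system would employ.
    """
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def is_unmodified(content: bytes, key: bytes, signature: str) -> bool:
    """Check later whether the content still matches its signature.

    Any change to the bytes, even a single pixel value, yields a
    different signature, so the comparison fails.
    """
    expected = sign_content(content, key)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

For example, a signature computed over an image file's bytes when it is captured would stop validating as soon as the file is edited. Note that a byte-level check like this also fails after harmless re-encoding, which is one reason practical systems record edit history rather than rejecting every change.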
SOHAIL AHMED KHAN received the M.Sc. degree in cybersecurity and artificial intelligence from the University of Sheffield, U.K. He is currently pursuing the joint Ph.D. degree with MediaFutures and the University of Bergen. Prior to joining MediaFutures, he worked as a Research Assistant at the Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates. Before that, he worked as a Remote Research Assistant at the CYENS Centre of Excellence, Nicosia, Cyprus. His research interests include deep learning, computer vision, and multimedia forensics. He is also associated with MediaFutures Work Package 3, Media Content Analysis, and Production.
GHAZAAL SHEIKHI received the master's degree in biomedical engineering from the Amirkabir University of Technology, Teheran, Iran, and the Ph.D. degree in computer engineering (machine learning) from Eastern Mediterranean University, North Cyprus. She is currently a Postdoctoral Fellow at MediaFutures. Her research interests include machine learning, natural language processing, and textual content analysis.
ANDREAS L. OPDAHL received the Ph.D. degree from the Norwegian University of Science and Technology (NTNU), in 1992. He is currently a Professor in information systems development at the University of Bergen, Norway, where he heads the Research Group for Intelligent Information Systems (I2S). His research interests include ontologies and knowledge graphs, enterprise and IS modeling, and their applications to media production. He is the author, the coauthor, or a co-editor of more than 100 peer-reviewed and widely cited research papers. He is a member of IFIP WG5.8 on Enterprise Interoperability and WG8.1 on Design and Evaluation of Information Systems. He serves as an associate editor of renowned international journals and as an organizer of renowned international conferences and workshops.
FAZLE RABBI received the Doctor of Philosophy (Ph.D.) degree in software engineering from the University of Oslo. He is currently an Associate Professor at the University of Bergen. He has long and varied experience with software development in smaller and larger projects within a large spectrum of domain areas and technological solutions. His research interests include model-based software engineering, data mining, and machine learning, with emphasis on addressing the information science problems in healthcare applications, and software engineering related research: workflow modeling and its verification, metamodeling, building decision support systems, multi-agent systems, and process engineering.
SERGEJ STOPPEL received the Ph.D. degree in computer science in the field of visualization from the University of Bergen. He is currently the Chief Innovation Officer of Wolftech Broadcast Solutions, where he is driving the innovation of a collaborative news and media production tool that is used by more than 18,000 users on a daily basis. He is also working in areas of data science and analytics, deep learning, and natural language processing. He was awarded the EuroVis Best Ph.D. Award in 2019.
DUC-TIEN DANG-NGUYEN (Member, IEEE) is currently an Associate Professor in computer science at the Department of Information Science and Media Studies, University of Bergen. His research interests include multimedia forensics, lifelogging, multimedia retrieval, and computer vision. He is a member of MediaFutures WP3-Media Content Analysis and Production in Journalism and The Nordic Observatory for Digital Media and Information Disorder (NORDIS). He is the author or coauthor of more than 100 peer-reviewed and widely cited research papers. He is a PC member in a number of conferences in the fields of lifelogging, multimedia forensics, and pattern recognition. He has co-organized over 40 special sessions, workshops, and research challenges at ACM MM, ACM ICMR, NTCIR, ImageCLEF, and MediaEval during the last ten years. He is also the General Chair of MMM 2023.
VOLUME 11, 2023