Artificial intelligence generated content (AIGC) in medicine: A narrative review

Abstract: Recently, artificial intelligence generated content (AIGC) has been receiving increased attention and is growing exponentially. AIGC is generated by generative artificial intelligence (AI) models based on the intent extracted from human-provided instructions, and it can be produced quickly and automatically in large amounts and at high quality. Currently, medicine faces a shortage of medical resources and complex medical procedures. Due to its characteristics, AIGC can help alleviate these problems. As a result, the application of AIGC in medicine has gained increased attention in recent years. Therefore, this paper provides a comprehensive review of the recent state of studies involving AIGC in medicine. First, we present an overview of AIGC. Then, based on recent studies, the application of AIGC in medicine is reviewed from two aspects: medical image processing and medical text generation. The basic generative AI models, tasks, target organs, datasets and contributions of the studies are considered and summarized. Finally, we discuss the limitations and challenges faced by AIGC and propose possible solutions with relevant studies. We hope this review can help readers understand the potential of AIGC in medicine and obtain some innovative ideas in this field.


Introduction
The limited resources and complexity of medical procedures are common challenges worldwide in the field of medicine, while traditional methods of care require a high level of skill and can be time-consuming. Artificial intelligence (AI) is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. AI is an important driving force of a new round of scientific and technological revolution and industrial change. As a critical and fundamental technique of AI, machine learning (ML) studies how computers can either simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. ML can extract features from training data, learn patterns and quickly process new data in large quantities. Therefore, ML-based methods can automate the analysis of medical data and improve the efficiency of medical tasks. Thus, in the medical field, ML-based image classification, image segmentation and object detection algorithms have been widely used for tasks such as fracture detection and risk prediction [1], cancer staging [2], and lesion detection [3]. ML is a data-based learning process, and the operational effectiveness of ML algorithms often depends on the quality and quantity of data. However, the acquisition and labelling of medical data is a tedious process, and obtaining high-quality data places high demands on medical experts and medical devices. These factors can significantly limit the application of ML. At the same time, many medical tasks are not simply data analysis. For example, medical diagnoses involve generating medical reports based on medical data. Meanwhile, generative tasks are becoming more widespread in medicine since the concept of extended reality (XR) was introduced to the field.
With the rise of generative AI models, such as the generative pre-trained transformer (GPT) [4] and DALL-E [5], artificial intelligence generated content (AIGC) has started to gain widespread attention. Researchers have also become interested in applying AIGC to challenges in various fields, and medicine is one of the key application areas. AIGC is content with specified characteristics that is generated by generative AI based on various forms of human instructions and guidance. Generative AI can produce AI-generated content quickly and in large quantities to meet specific demands, which makes it a promising method for solving the data generation problems faced in medical AI.
Although AI-generated content has grown rapidly in recent years and is gaining attention in various fields, research into the application of generative AI to these fields is still in its infancy. AI-generated content has great potential for application in numerous fields, including medicine. For this reason, this paper reviews the application of AIGC in medicine. The application of generative AI models and AI-generated content in medicine is receiving increasingly wide attention, as shown in Figure 1, which takes studies on the application of generative adversarial networks (GAN) and variational autoencoders (VAE) in medicine as an example. This review begins with an overview of the basic concepts and techniques of AI for generating content. We present generative AI in three categories: text generation models, visual generation models and multimodal generation models. Furthermore, by collecting and examining studies on AI-generated content in medicine, this paper summarizes the application of AI-generated content to various tasks in medicine. There are reviews summarizing the application of generative AI to natural language processing (NLP) tasks in medicine [6]. In contrast, we specifically review the current state of research on AI-generated content based on the vision generation model (VGM) and the large language model (LLM) in the field of medicine. The VGM generates visual data such as images and videos, while the LLM is a kind of AI model with a large number of parameters that can understand and generate human language. Finally, based on the aforementioned information, we analyze the directions and challenges for the development of AI-generated content in the medical field.
Figure 1. Number of studies on the application of GAN and VAE in the field of medicine. The Web of Science database was searched for the keywords "GAN" + "medical" and "VAE" + "medical". For "GAN" + "medical", to avoid results from unrelated subject areas, the search was restricted to the four research directions with the highest number of papers and the greatest relevance to biomedical engineering: "Computer Science", "Engineering", "Mathematical Computational Biology" and "Radiology Nuclear Medicine Medical Imaging".

Overview of AIGC
Generative AI refers to training a model using a given set of human-provided instructions or data containing feature information, so that the trained model can generate data with specific features that meet the requirements expressed by the input instructions or data. This generated data is the AI-generated content. As shown in Figure 2, generative AI can perform a variety of generative tasks, which can be categorized into prompt-based and autonomously generated tasks based on the conditions of generation, and into unimodal and multimodal tasks based on the relationship between the types of data inputs and outputs.

Text generation models
Text generation models are trained to generate readable text content based on the content and structure of the input data, and are now widely used in dialogue systems, translation systems and other AI systems. Text generation models can be divided into decoder models and encoder-decoder models (Figure 3). Decoder models have been widely used for text generation tasks, while encoder-decoder models make use of contextual information and autoregressive properties to enhance the performance of the model in the task.

Decoder models
GPT is the most common text generation model based on the decoder architecture. Specifically, GPT is based on the transformer model: it predicts the next word from the preceding text and thereby generates coherent text. The later GPT-2 [7], GPT-3 [8] and GPT-4 [9] built on this idea by expanding the model parameters and using a combination of multiple datasets for training. Additionally, many text generation models have been proposed based on the GPT architecture. Gopher [10] used the RMSNorm layer to replace the LayerNorm layer in the GPT-3 architecture, while BLOOM [11] applied a full attention network instead of the sparse attention mechanism used in GPT-3. In InstructGPT [12], reinforcement learning from human feedback was introduced to further tune pre-trained GPT models and ultimately improve model performance.
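The autoregressive generation used by decoder models can be illustrated with a minimal sketch. The bigram count table below is only a toy stand-in for a transformer decoder (an assumption made for illustration, not GPT's architecture), and `train_bigram` and `generate` are hypothetical helper names. Only the loop structure, in which each new token is predicted from the text generated so far and then appended to it, mirrors the idea described above.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows another (toy stand-in for a decoder)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive decoding: repeatedly append the most likely next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:  # no known continuation: stop early
            break
        next_token = followers.most_common(1)[0][0]
        tokens.append(next_token)
    return " ".join(tokens)


# Hypothetical training sentences, invented for this example.
corpus = [
    "the scan shows a small lesion",
    "the scan shows a fracture",
    "the scan shows a small nodule",
]
model = train_bigram(corpus)
print(generate(model, "the scan", max_new_tokens=3))
```

A real decoder conditions on the whole preceding context rather than on the last token only, and usually samples from the predicted distribution instead of always taking the most likely token.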

Encoder-decoder models
Text-to-text transfer transformer (T5) [13] is a common text generation model based on an encoder-decoder structure, which uses a transformer-based encoder and decoder to cast the input and output of a variety of text generation tasks into a unified text format. Researchers have developed a number of text generation models based on the T5 model. The switch transformer [14] introduced the idea of "switching", using a simplified mixture-of-experts (MoE) routing algorithm to train T5 models in parallel, while Google's ExT5 [15], which was proposed in 2021, extended the scale of the T5 model to learn more natural language tasks across a larger number of domains. In addition to the T5 model, BART [16], which is another common encoder-decoder model, used a BERT-based bidirectional encoder and a GPT-based autoregressive decoder. Based on BART, DQ-BART [17] used distillation and quantization to reduce the model size while maintaining the original performance of the model.

Visual generation models
Visual generation models can generate image data with specific features and content based on the input data, and such models can perform a variety of image generation tasks such as style transfer and data augmentation. Commonly used visual generation models include GAN [18], VAE [19], flow [20] and diffusion models [21]. Figure 4 illustrates the underlying architecture of these network models, and this section briefly introduces these commonly used models and the current state of research.

Generative adversarial network (GAN)
GAN was an early proposal for a visual generation model and has gained widespread attention. GAN consists of a generator, which generates image data by learning the feature distribution of real data, and a discriminator, which determines whether the input data is real or generated. During the training process of GAN, the discriminator's ability to tell real data from generated data gradually improves, while the generator aims to generate pseudo-data that the discriminator judges to be real.
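The opposing objectives of the two networks can be made concrete with a small sketch. It assumes the discriminator outputs a probability that a sample is real, and it uses the binary cross-entropy form of the losses with the commonly used non-saturating generator loss. The function names and the numeric scores are hypothetical and serve only as illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples are labeled 1, generated samples 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

In training, the two losses are minimized alternately: the discriminator update lowers the first value, while the generator update lowers the second by pushing the scores of generated samples toward 1.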

Variational autoencoder (VAE)
Based on variational Bayesian inference, the variational autoencoder learns to generate data that is similar to the original data by mapping the data to a probability distribution. The variational autoencoder constructs a latent variable space based on the statistical characteristics of the input data (including mean and variance) and reconstructs the data from samples drawn randomly from the corresponding probability distribution in the latent space. Building on this idea, researchers have proposed skip connections for the random sampling process [36][37][38] to obtain probability distributions from different perspectives. To obtain smooth latent spaces with strong representational power, researchers have introduced regularization into encoders [39][40][41]. For large-scale images, researchers have introduced a hierarchical network architecture [42].
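The sampling step described above is commonly implemented with the reparameterization trick, and the regularization of the encoder is commonly a Kullback-Leibler term toward a standard normal prior. The sketch below assumes a diagonal Gaussian posterior parameterized by a mean and a log-variance per latent dimension; the function names and the encoder outputs are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)                  # fixed seed for reproducibility
mu, log_var = [0.5, -1.0], [0.0, -2.0]  # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoder outputs a zero mean and unit variance, so it pulls the latent distribution toward the prior from which new samples are later drawn.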

Flow
Flow models map input data into intermediate data by constructing a series of invertible and differentiable mappings, and then obtain generated data by applying the inverse mappings. Zheng et al. [43] and Hoogeboom et al. [44] introduced convolutional neural networks (CNN) into flow models. To address the problem of vanishing gradients, RevNets [45] and iRevNets [46] constructed a reversible network structure based on residual connections.
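One widely used building block with an exact inverse is the affine coupling layer. The sketch below is an illustration under stated assumptions rather than the method of any cited work: `scale_and_shift` is a hypothetical fixed function standing in for a learned network, and the input is assumed to have even length.

```python
import math


def scale_and_shift(x1):
    """Hypothetical fixed 'network' giving log-scale s and shift t from x1."""
    s = [0.5 * math.tanh(v) for v in x1]
    t = [2.0 * v for v in x1]
    return s, t


def coupling_forward(x):
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_and_shift(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # log |det Jacobian| of the forward mapping
    return x1 + y2, log_det


def coupling_inverse(y):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_and_shift(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2


x = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
print(y, log_det, x_rec)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, and the log-determinant needed for likelihood training is just the sum of the log-scales.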

Diffusion model
The diffusion model is an advanced generative model that has excelled on generative tasks in recent years. It gradually corrupts the data by adding noise in multiple stages and then learns the inverse process to generate data from random noise.
Diffusion models can be divided into three main types: the denoising diffusion probabilistic model (DDPM) consists of a forward process (i.e., a Markov chain) whose transition kernels are parameterized by a series of predetermined noise coefficients, and a backward process (i.e., a sequence of Gaussian transitions) with learnable parameters; the score matching formulation (SMFM) aims to solve the original data distribution estimation problem by approximating the gradient of the log-density of the data; and the score stochastic differential equation model (Score SDE) describes the diffusion and denoising score matching models in a unified continuous form based on stochastic differential equations. Based on the three types of diffusion models, researchers have applied methods including knowledge distillation [47,48], noise scale design [49][50][51][52], dynamic programming [53,54] and a combination with other generative models [55][56][57] to further enhance the performance of the diffusion models.
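The forward noising process of a DDPM-style model has a closed form that lets a noisy sample at any step be drawn directly from the clean data. The sketch below assumes a linear noise schedule; the schedule endpoints, the step count and the function names are illustrative choices, not values taken from the studies cited above.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]  # hypothetical clean data point
print(q_sample(x0, 10, alpha_bars, rng))   # still close to x0
print(q_sample(x0, 999, alpha_bars, rng))  # almost pure noise
```

As the step index grows, the signal coefficient shrinks toward zero, so the final samples are close to standard Gaussian noise; the learned backward process starts from such noise and removes it step by step.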

Multimodal models
Multimodal AI for generating content is currently an important research direction in generative AI. The aim of multimodal generative AI is to achieve connections and interactions between data of different modalities.
The most common type of multimodal AI-generated content is text-image-generated content, which can either be based on textual instructions to generate corresponding images or on input images to generate corresponding descriptive text. The text-image generation model mainly uses an encoder-decoder structure, which is based on the aforementioned unimodal generation model, where unimodal encoders or decoders are relatively well established. However, encoders or decoders for extracting information from multimodal data are more complex. The encoder structure of the text-image generation model uses either stitching multimodal data into the encoder (VisualBERT [58], UNITER [59]) or pairing multimodal data through the output of the corresponding unimodal encoder to a cross-alignment based encoder (LXMERT [60], ViLBERT [61]). There are two types of decoders for text-image generation models: text decoders that can use pre-trained large language models (Frozen [62]) and image decoders that can be designed based on the unimodal generation models mentioned in the previous section such as GAN (StyleCLIP [63]), diffusion models (GLIDE [64], Imagen [65]) and so on.

AIGC for medical image
Medical images are one of the key ways of obtaining physiological information about the human body and are the most important type of medical data. The development of medical imaging methods such as computed tomography (CT), magnetic resonance imaging (MRI) and various optical imaging methods has enriched access to medical images and improved their quality. However, the quality of medical images is limited by the physical characteristics of the imaging method. Furthermore, the information from different types of medical images can often be one-sided, and the costs and negative effects (such as high doses of radiation) of the image acquisition process cannot be ignored. The traditional approach to medical image tasks includes the techniques of digital image processing (DIP), which process medical images as 2D discrete signals with basic mathematical methods and cannot complete many generative tasks. To address these issues, AIGC has great potential in the field of medical imaging due to its generation efficiency, modality and feature transferability. In this section, we discuss AIGC in the tasks of medical image reconstruction, medical image translation and medical image data augmentation.

Medical image reconstruction
High-resolution images contain more information and more detail than lower-resolution images. Medical professionals can make more accurate and comprehensive diagnoses and develop more appropriate treatment plans based on high-resolution images, which can lead to better outcomes of medical treatment. However, the resolution of medical images is often subject to numerous limitations, such as physical limit constraints on optical imaging, the magnetic throughput and scanning time of the scanner during image acquisition, and considerations for radiation dose. Medical image reconstruction is the generation of corresponding high-resolution images based on low-resolution images by means of image generation techniques. CT is a mainstream medical imaging technique that uses X-rays to image objects layer by layer, thereby allowing for the non-invasive acquisition of structural features within the human body. However, CT is associated with radiation problems; although low-dose CT can mitigate this problem to some extent, its imaging resolution and quality can be greatly compromised. For this reason, reconstructing normal-dose CT images from low-dose CT images is a significant task. Researchers have experimented with both network structures and training methods based on GAN. Huang et al. [74] designed a U-Net-based discriminator structure and performed adversarial training in the gradient domain of the images, which validated the reliability of the method on abdominal and chest clinical CT image datasets. Li et al. [72] applied noise-encoding transfer learning (NETL) to GAN and achieved excellent results on the low-dose CT image reconstruction AAPM dataset [75]. Based on the feature that the diffusion model can simulate the noise addition process, Gao et al. [73] introduced a recovery network using contextual information to constrain the sampling process in the diffusion model and proposed CoreDiff; the experimental results on the AAPM dataset showed a superior performance of the model.
MRI detects the electromagnetic waves emitted by atomic nuclei under an applied gradient magnetic field to determine the location and type of the atomic nuclei that make up the object, from which a picture of the object's internal structure can be drawn. The use of compressed sensing (CS) methods to acquire CS-MRI scans can improve the speed of MRI image acquisition, but at the expense of image resolution. For this reason, it is also important to reconstruct high-resolution images from CS-MRI scans, and generative AI techniques have great potential for this task. Zhao et al. [76] introduced the swin transformer into a GAN structure and demonstrated the superiority of their generative AI approach by conducting experiments on MRI scans of the brain and knee region. Other generative AI architectures have also received attention. For example, Zhang et al. [77] studied MRI reconstruction tasks using conditional variational autoencoders (cVAE) and validated the effectiveness of their method on the BrainWeb dataset. Luo et al. [78] introduced joint uncertainty estimation based on a diffusion model, and their image reconstruction experiments on brain MRI scans demonstrated the effectiveness of their model.
Generating high-resolution images from low-resolution medical images is an important way to save costs and ensure safety while still obtaining the high-quality medical images that efficient medical care requires. The characteristics of AIGC have led to widespread interest and use in high-resolution medical image reconstruction tasks. Tables 1 and 2 summarize recent studies based on GAN and diffusion models for medical image reconstruction tasks, respectively.

Medical image translation
Different modalities of medical images contain different structural information and physiological characteristics of the human body due to different imaging principles. Therefore, multimodal medical images are of great significance for the accuracy and comprehensiveness of disease diagnoses and treatments. However, the cost of acquiring multimodal medical images directly from an imaging device is high, so the modal translation of unimodal medical images to obtain multimodal medical images has also become an important task in the field of medical image processing. AIGC has strong modal migration characteristics. Thus, a generative AI model becomes a critical tool for implementing modal translation in medical images.

Mathematical Biosciences and Engineering
Volume 21, Issue 1, 1672-1711.

Selected works | Year | Modalities | Organs (Datasets) | Contribution
 | | | | 2) Iteration between the numerical SDE solver and data consistency step to achieve reconstruction.
Gao et al. [73] | 2023 | CT | (AAPM [75]) | Introduction of a novel restoration network CLEAR-Net to mitigate accumulated errors by constraining the sampling process using contextual information among adjacent slices and calibrating the time step embedding feature using the latest prediction.
Gao et al. [103] | 2023 | CT | (AAPM [75]) | A noise estimation network to gradually convert a residual image to a Gaussian distribution based on a Markov chain with a low-dose image as the condition.
Ma et al. [191] ImpressionGPT ChatGPT 6 MIMIC-CXR [156] OpenI [157] Derive
CT images are essential for radiotherapy, as target delineation and dose calculation must be performed on the CT images. However, the low contrast of soft tissues makes it difficult to delineate target areas on CT images, particularly for critical organs such as the brain, liver and pelvis. MRI scans have excellent soft tissue resolution and do not produce ionizing radiation. However, MRI scans lack the electron density information that CT images can provide, and thus do not allow for the calculation of radiation doses. As a result, researchers have proposed aligning MRI scans with CT images to achieve information fusion. However, this method requires both MRI scans and CT images, which increases the economic cost. Generating corresponding CT images based on MRI scans is an important way to address this problem. Generative AI based on a variety of visual generation models has been intensively studied and widely used in such tasks. Zhao et al. [107] designed a hybrid CNN and transformer generator structure based on GAN to extract multi-level information from images. Additionally, they introduced a feature reconstruction loss, thus ensuring the sensitivity of the network to structural features of the image. Their experiments on the pelvic MR-CT multimodal dataset demonstrated the superiority of their method. On the other hand, Li et al. [108] embedded MRI and sampled CT information as prior knowledge in a diffusion model and introduced the null-space measurement inference (N-SMI) module into the inverse inference process for the CT image generation task (Figure 7). Similarly, they demonstrated the performance of their method on the pelvic MR-CT multimodal dataset. Additionally, AIGC is used in other cross-modal medical image generation efforts. Due to the radiation dose and cost of radioactive tracers, positron emission tomography (PET) imaging is rarely used in routine medical examinations. However, PET has an important role in the treatment of tumors and neurological diseases due to its high specificity and sensitivity. Therefore, Wei et al. [109] used multiple sequences of MRI scans to predict myelin content in PET images with conditional flexible self-attention generative adversarial networks (Figure 7), thereby measuring changes in myelin content in vivo, which is essential for understanding the mechanisms of multiple sclerosis.
In addition to modal translation between different types of medical images, medical images of the same type may also have different modalities containing different features or information. For instance, the common modalities of MRI scans include the T1, T2 and FLAIR modalities. T1 modalities can better show the anatomical structure of the imaging area, while T2 modalities are more sensitive to tissue lesions. Therefore, it is also important to convert between different modalities in MRI scans. Hu et al. [110] introduced unsupervised domain adaptation to perform the modal migration task based on a 2D variational autoencoder (Figure 6) and performed experiments on the BraTS 2019 dataset to generate T1-MRI using T2-MRI and FLAIR-MRI, in which their method performed excellently. Meng et al. [111] designed a multi-in multi-out conditional score network based on the diffusion model to reverse the diffusion process in the full modal space, thereby using conditional diffusion and a score-based reverse generation process to accomplish cross-modal image generation. Their approach achieved superior results in various modal MRI generation tasks on the BraTS 2019 dataset [114].
Multimodal medical image generation based on modal migration is an important task to assist in improving the efficiency and quality of medical care. Generative AI can learn the features of different modal images and achieve the effect of feature migration. Thus, AIGC has shown an excellent performance in the field of medical image translation. Table 3 shows the latest studies on AIGC for medical image translation based on different kinds of architecture.

Data augmentation for medical image
The acquisition of medical images is generally costly, and the application of AI to many medical tasks, such as medical diagnosis, requires large amounts of medical image data for training. This makes the task of generating medical image data critical to the application of AI in the medical field. Data augmentation of medical images generates artificial data with similar distributions and features from a small amount of existing medical image data. While traditional geometric and intensity transforms can accomplish this process, the distribution of data generated by these traditional methods is very limited. The distribution learning capability and the diversity of AI-generated content make generative AI techniques of great value for medical data augmentation.
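For comparison with generative approaches, the traditional geometric and intensity transforms mentioned above can be sketched in a few lines. The example assumes a grayscale image stored as a list of rows of integer pixel values; the function names and the tiny image are hypothetical.

```python
def horizontal_flip(image):
    """Geometric transform: mirror each row of a 2D grayscale image."""
    return [list(reversed(row)) for row in image]


def rotate_90(image):
    """Geometric transform: rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def scale_intensity(image, gain, max_value=255):
    """Intensity transform: multiply pixel values by a gain and clip to range."""
    return [[min(max_value, max(0, round(p * gain))) for p in row]
            for row in image]


image = [[10, 200],
         [0, 100]]  # hypothetical 2x2 grayscale image
augmented = [horizontal_flip(image), rotate_90(image), scale_intensity(image, 1.5)]
for variant in augmented:
    print(variant)
```

Every variant is a deterministic function of one existing image, which is why such transforms add little new anatomical or pathological variation to a dataset.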
Diagnoses based on medical images include the detection of lesion areas in medical images. However, the number of lesion images is often limited, which can have a significant impact on the training and performance of the AI. Thus, the synthesis of lesion images based on normal images is one of the important directions for the application of AIGC in the field of medical data augmentation. Generative adversarial networks that apply self-attention mechanisms have been widely used in lesion image generation tasks. Abdelhalim et al. [123] used this approach to generate fine-grained skin lesion images, while Ambita et al. [124] applied this type of network to synthesize CT scan images of COVID-19. Hajij et al. [125] used RealNVP, which is a commonly used normalizing flow model, and demonstrated its effectiveness in medical image data augmentation tasks on a CXR dataset [126].
Microscopic studies of diseased tissue by pathologists have been the cornerstone of cancer diagnoses and predictions. While deep learning methods have made significant progress in recent years in the analysis of pathological tissue images, the generation of high-quality pathological images by AI can further expand the volume of data, and thus facilitate the application of deep learning methods. Moghadam et al. [127] proposed the use of a diffusion probabilistic model for histopathology image generation and improved the performance of the model using a method called color normalization. They demonstrated the effectiveness of their model by testing it on low-grade glioma images from the TCGA dataset [3].
Medical image data augmentation is an important task to facilitate the development of medical AI, and AIGC shows great potential in this task with its diversity, high speed of generation and feature learning. Researchers have applied AIGC to medical image data augmentation tasks in a variety of modalities, as shown in Table 4.

AIGC for medical text
Medical text is an important carrier of medical data. Medical texts include medical diagnostic reports, medical research reports, medical terminology and so on. These kinds of medical text are also important elements in various medical tasks including medical diagnoses, medical education, doctor-patient communication, etc. The task of generating medical texts is important because producing various medical texts requires a lot of time and effort, which is a great burden on medical resources. Medical texts are often generated based on inputs such as medical images, contents of doctor-patient dialogues, etc., for goals such as diagnoses, summaries, explanations, etc. Based on all kinds of demands, the LLM can quickly and automatically generate various types of textual data. Thus, the LLM has received widespread attention and applications in the medical field. [128] Based on studies of various LLMs for medical text generation tasks, as shown in Table 5, AIGC can be applied in many medical domains, while Figure 8 displays the application of generative AI in medical text tasks. This section discusses AIGC in data augmentation, medical Q&A and medical summarization in detail.

Data augmentation for medical text
Access to text data, especially medical text data, is difficult due to privacy, ethical, and security reasons. In turn, the performance of an AI model is affected by the amount of training data. Thus, few-shot learning is an important method for training AI models. Data augmentation is a key technique for few-shot learning: it transforms a small amount of existing data to generate a large amount of data for training. The high sensitivity of medical texts makes data augmentation significant in medical text processing tasks.
AIGC enables medical text data augmentation, which in turn assists AI models in accomplishing a series of downstream tasks related to medical texts, such as text classification, text extraction, and even other text generation tasks. Inspired by InstructGPT, Dai et al. [141] applied the reinforcement learning from human feedback (RLHF) method in ChatGPT to augment medical text and further assist in improving the performance of the text classification task. Li et al. [142] used BioMedLM, which is a medical large language model based on GPT, as a data augmentation model to expand summarized content into corresponding medical text. Additionally, BioMedLM assisted PULSAR, which is a large language model they designed based on FlanT5, in the task of summarizing patients' problems based on medical records. Data augmentation of medical text is an important task for developing AI models in the medical field, as well as an important basis for applying the LLM to assist healthcare.

Medical question answering
Medical question answering comprises tasks in which corresponding questions are answered automatically given a specific context. It is one of the most important tasks in the field of medicine. Medical question answering can be categorized into two types: direct question and answer (Q&A) and reading comprehension. Direct Q&A refers to the generation of corresponding responses based on the content of the questioner's question and the internal knowledge of the model, such as the explanation of medical terms. On the other hand, reading comprehension analyzes and interprets provided material (e.g., analyses of medical conditions based on medical images, explanations of problems related to clinical notes, etc.). Medical question answering is based on a large amount of medical knowledge and information. However, human learning ability is limited, so completing a variety of medical question answering tasks requires a large number of highly qualified medical personnel. By combining various carriers of medical knowledge, LLMs can efficiently learn medical knowledge and information, and then complete various medical question answering tasks.
Various forms of datasets provide ample learning resources for the LLMs. The sources of datasets include medical exams, medical papers, and medical market surveys. MedQA [143] uses medical questions from the US Medical License Exam (USMLE) as a vehicle for medical knowledge, while MedMCQA [144] contains 194k multiple-choice questions from Indian medical entrance examinations (i.e., AIIMS/NEET) to store medical knowledge. Similarly drawing on exam content, MMLU [145] covers 57 subject domains, some of which concern medical knowledge. PubMedQA [146] provides 1k expert-labeled Q&A instances that use the content of abstracts on PubMed as context. On the other hand, LiveQA [147], MedicationQA [148], and HealthSearchQA [149] are datasets based on medical knowledge and questions frequently sought by users.
Various medical knowledge datasets in conjunction with the LLMs generate AIGC, which plays a great role in medical question answering.Singhal et al. [149] showed encouraging results on medical question answering based on PaLM [150] using a variety of datasets as training and test data.The LLMs can actually extract medical knowledge from these datasets and generate compliant AIGCs in medical question answering.Wu et al. [151] designed PMC-LLaMA using the large language model LLaMA [152], and they used the dataset S2ORC [153], which contains medical English papers, to train the large language model, and tested the model's performance in medical Q&A on PubMedQA, MedMCQA, and MedQA datasets.The model's excellent experimental performance demonstrated the reliability of AIGC in medical Q&A, as well as the ability of the LLM to extract knowledge information from data in different modalities.
Some research on medical applications based on the LLMs will focus on more specific medical domains.Thawkar et al. [154] proposed XrayGPT, which is a medical generative AI for radiology.Their model was based on Vicuna [155], which used the MIMIC-CXR dataset [156] that contained chest radiology images and corresponding medical reports, and OpenI [157] for training.XrayGPT ultimately accomplished the diagnosis and interpretation of radiology images.Following this, Zhou et al. [158] used a combination of SKINCON [159], which is a dataset that pairs skin lesion images with annotations, Dermnet, which is a dataset that pairs skin lesion images with corresponding disease categories, and a private dataset that combines skin lesion images with physicians' descriptions to train MiniGPT-4 [160].Additionally, they designed SkinGPT-4, which is a large language model that can diagnose, analyze and answer questions about skin diseases based on pictures.
AIGC plays an important role and excels in automated medical Q&A systems.The automation and high quality of medical Q&A can assist patients in obtaining the appropriate diagnosis faster and more accurately, as well as more timely and adequate communication during the medical process.Additionally, medical Q&A can provide students with an efficient way to acquire knowledge and resolve doubts, which is meaningful for enhancing the efficiency of medical education and alleviating the pressure on educational resources.

Summarization of medical text
Text summarization is a common natural language processing task. It is essentially the generation of short, easy-to-understand text from long input text. In medicine, summarizing text is difficult due to the diversity and complexity of medical language expressions and terminology. For this reason, AIGC in medical text summarization tasks has received increased attention.
Doctor-patient dialogues are an important foundation for physicians to make medical judgements. For patients, they are a critical way to obtain medical information. However, dialogues are often long and can contain redundant content. Thus, summarizing doctor-patient dialogues is significant for improving the efficiency of healthcare. A large number of datasets provide a basis for applying LLMs to this task. The iCliniq and HealthCareMagic datasets extract a large number of doctor-patient dialogues from the MedDialog dataset [161] and combine them with the corresponding summarizations. Questions from patients are often verbose due to their lack of medical knowledge. Therefore, Ben Abacha et al. constructed the MeQSum dataset [162] using medical experts' summarizations of 1000 patient health questions selected from a dataset distributed by the U.S. National Library of Medicine. To assist patients in understanding answers to medical questions, Savery et al. proposed the MEDIQA-ANS dataset [163], which contains the answers to 156 medical questions and their corresponding summarizations. Yuan et al. [164] designed a medical large language model, BioBART, based on the LLM BART [15], and trained the model using a series of datasets including those mentioned above. Their model provided a good summarization of the content of doctor-patient dialogues.
A medical report is an analysis of medical data and plays an important role in the healthcare process. Medical reports often contain a findings section describing the detailed observations, as well as an impression section summarizing the most representative observations (e.g., abnormal areas of the image). Summarizing the findings into the impression section is a crucial step. However, this task is very time-consuming and requires a high level of physician experience. As a result, automatic impression generation (AIG) has attracted a lot of attention from researchers. Hu et al. [191] designed a graph encoder based on the idea of graph neural networks (GNNs) to exploit both findings and extra knowledge. Their proposed model achieved excellent performance on the MIMIC-CXR and OpenI datasets. Ma et al. [192] created dynamic prompts through a similarity search. On this basis, they iteratively optimized the LLM and further used domain-specific data to fine-tune it. They proposed ImpressionGPT based on ChatGPT using the aforementioned approach. For the summarization of radiology reports, their model achieved state-of-the-art (SOTA) performance on the MIMIC-CXR and OpenI datasets.
Literature is an important carrier of medical knowledge, but its length and complex structure make it hard to extract and understand valid information. AIGC can perform well in the task of summarizing literature. Pang et al. [193] proposed a text summarization model that combines an encoder designed around top-down and bottom-up inference with a decoder initialized from BART. Their model performed excellently on multi-domain text summarization datasets, including literature datasets such as PubMed and arXiv. Due to the excellent performance of AIGC in multi-domain text summarization, some studies have also applied LLMs to medical literature summarization. Frisoni et al. [194] introduced the idea of GNNs into the large language model BART [15] and used reinforcement learning to optimize the network. They proposed Cogito Ergo Summ, which is the first single-document abstractive summarization model applied to the biomedical domain. It achieved near-SOTA performance on CDSR [195] with fewer parameters.
Medical text summarization is of great significance in the field of medicine. For example, the summarization of doctor-patient dialogues not only assists the doctor in making a diagnosis, but also makes it easier for patients to understand their condition. The summarization of medical reports can improve the efficiency of doctors in diagnosing, treating and following up with patients. In addition, the summarization of medical literature facilitates information retrieval and makes it possible to increase the utilization of medical education resources. With the development and application of LLMs, AIGC can automatically and quickly achieve medical text summarization, which is useful for many aspects of the medical field.

Limitations and challenges
Based on the current state of research, AIGC can play a great role in various medical tasks. However, many limitations and challenges remain in widely applying AIGC in medicine. This section discusses the current problems and potential solutions for the application of AIGC in medicine.

Hallucination and poor interpretability
Generative AI can generate a large quantity of medical content. However, this content is more like deduction or guesswork, so AIGC may seem plausible without being correct. This is especially true for medical image generation: the acquisition of medical images is generally based on certain biological and physical principles, whereas AIGC simply generates images through numerical computation. Because of this difference in generation methods, AIGC does not always reflect the detailed information of the imaged area as correctly as real images. In addition, AIGC has poor interpretability compared to real medical images. LLMs sometimes face similar problems when generating medical text. Hallucination in AIGC may provide fake or wrong information to doctors and patients, which can lead to severe and even life-threatening medical problems. A lack of interpretability makes doctors reluctant to trust AIGC in some important medical cases. For these reasons, hallucination and poor interpretability strongly limit the broad application of AIGC in medicine. RLHF is a possible solution to this problem. RLHF fine-tunes an AI agent by allowing ordinary people to provide social feedback, such as evaluative feedback, advice or instruction. However, RLHF still faces challenges in obtaining human feedback and in designing reward models and policies [196].
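To make the reward-model design challenge more concrete, the following is a minimal sketch of one common formulation, a pairwise preference loss of the kind used in InstructGPT-style pipelines. It is an illustration under stated assumptions, not the method of [196]: the linear reward model, the two response features (hypothetically, a factual-consistency score and a fluency score), and all function names are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (preferred response, rejected response)

def reward(w: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward model: r(x) = w . features(x)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def preference_loss(w: Sequence[float], pairs: List[Pair]) -> float:
    """Mean of -log sigmoid(r(preferred) - r(rejected)) over all labeled pairs."""
    total = 0.0
    for chosen, rejected in pairs:
        z = reward(w, chosen) - reward(w, rejected)
        # Stable evaluation of -log(sigmoid(z)).
        total += math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))
    return total / len(pairs)

def train_reward_model(pairs: List[Pair], dim: int,
                       lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit the reward weights by plain gradient descent on the preference loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            z = reward(w, chosen) - reward(w, rejected)
            coeff = -(1.0 - sigmoid(z)) / len(pairs)  # derivative of -log sigmoid(z)
            for j in range(dim):
                grad[j] += coeff * (chosen[j] - rejected[j])
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Toy human feedback: annotators prefer the more factually consistent answer.
# Feature vectors are [factual_consistency, fluency]; the values are made up.
feedback = [
    ([0.9, 0.5], [0.2, 0.6]),
    ([0.8, 0.4], [0.3, 0.9]),
    ([0.7, 0.7], [0.1, 0.7]),
]
weights = train_reward_model(feedback, dim=2)
```

In a full RLHF pipeline the learned reward would then guide policy optimization of the generative model; that step is omitted here. The sketch also shows why feedback quality matters: the reward model can only encode preferences that the labeled pairs express.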

Evolving knowledge
With more studies and findings, medical knowledge changes rapidly over time (i.e., it expands and is updated with deeper studies). In particular, new terminologies, updated medical concepts, innovative treatment schemes, new diagnostic standards and so on can emerge quickly. In order to generate correct and more credible medical content, AIGC needs not only to retain existing knowledge, but also to keep incorporating new knowledge. To address this problem, generative AI can build on existing models to continue learning new knowledge [197]. In some cases, continual learning does not perform as well as learning from scratch [198]; however, the costs of learning from scratch are very high. As a result, for the application of AIGC, it is important to clarify when continual learning or learning from scratch is applicable, or to identify the appropriate learning scenario for each module in the model.

Large scale
The AI models for generating AIGC are generally large in scale, with a large number of trainable parameters, a large dataset and a high demand for computational resources. The high complexity of medical generative tasks and the large amount of knowledge involved further increase the scale of generative AI models applied to medicine. However, large-scale AI models come with significant time and resource costs and higher deployment requirements. Thus, one of the key challenges in applying AIGC to medicine is choosing a model scale that ensures model performance while avoiding wasted resources. Hoffmann et al. [199] proposed a formal scaling law to predict model performance from the number of parameters and the size of the dataset. Building on the validation of this law, Aghajanyan et al. [200] investigated the relationship between different training tasks under multimodal training. These studies provide valuable insights into controlling the complexity of large models.
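As a rough illustration of how such a scaling law can inform the choice of model scale, the sketch below uses the parametric form L(N, D) = E + A/N^α + B/D^β, where N is the number of parameters and D the number of training tokens. The constants are approximately the fitted values reported for general-domain language models in [199] and are used here only as illustrative assumptions. They are not specific to medical models, and the function names are invented for this example.

```python
# Illustrative constants (assumption: approximately the fit reported in [199]).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters and n_tokens tokens."""
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA

def compute_optimal(flops: float) -> tuple:
    """Split a compute budget between parameters and tokens.

    Uses the common approximation FLOPs ~ 6 * N * D and minimizes the
    predicted loss under that constraint (closed-form solution).
    """
    k = flops / 6.0                                   # k = N * D
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * k ** (BETA / (ALPHA + BETA))
    d_opt = k / n_opt
    return n_opt, d_opt

budget = 1e21  # hypothetical training budget in FLOPs
n_opt, d_opt = compute_optimal(budget)
loss_at_optimum = predicted_loss(n_opt, d_opt)
```

Under these assumptions, spending the same budget on a model twice as large trained on half the tokens gives a higher predicted loss than the balanced allocation, which is the kind of resource waste the paragraph above refers to.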

Data bias
AIGC often suffers from data bias problems [201]. For instance, models trained on English text generate content that better matches the features of English. This problem is even more pronounced in medicine. People of different races and countries may have different physiological characteristics and medical standards due to geographical factors and differences in living habits. Additionally, gender has a considerable impact on the medical process. However, it is difficult for existing datasets to guarantee complete balance, and they often lack detailed information due to privacy concerns. This may lead to a significant bias in AIGC in favor of data that accounts for a larger proportion of the dataset. This kind of bias will seriously affect the accuracy of medical diagnoses, the efficiency of medical treatment, etc., which is a great challenge for the application and promotion of medical AIGC. To tackle such a problem for GNNs, Dong et al. [202] designed a novel bias metric and proposed a model-agnostic debiasing framework named EDITS. A similar idea can be applied to AIGC to address data bias.
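The proportion effect described above can be illustrated with a simple reweighting heuristic. The sketch below is not the EDITS framework of [202]; it is a much simpler, swapped-in technique (inverse-frequency sample weights) shown only to make the idea concrete. The group labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def group_proportions(groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of the dataset contributed by each group (e.g., sex or region)."""
    counts = Counter(groups)
    return {g: c / len(groups) for g, c in counts.items()}

def inverse_frequency_weights(groups: Sequence[Hashable]) -> List[float]:
    """Per-sample weights that give every group the same total weight.

    weight(sample) = total_samples / (number_of_groups * size_of_its_group),
    so the weights sum to the number of samples.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    return [total / (n_groups * counts[g]) for g in groups]

# Toy example: 8 records from group "A" and 2 from group "B".
records = ["A"] * 8 + ["B"] * 2
proportions = group_proportions(records)
weights = inverse_frequency_weights(records)
```

Such weights could be passed to a weighted training loss so that an under-represented group is not dominated by the majority. Reweighting only corrects for group size and does not address missing attributes or label bias.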

Ethical and legal concerns
AIGC has been developing rapidly in various fields and, at the same time, faces various ethical and legal issues. Medical information, both text and images, involves personal privacy and affects an individual's life and health. One of the key ethical and legal concerns faced by AIGC in medicine is the illegal dissemination and utilization of fake information. Combining Deepfake recognition technology [203] with generative AI provides an important way to deal with this problem. Additionally, AIGC in medicine faces privacy protection challenges. In response, federated learning is an effective solution that helps multiple organizations use and model data while meeting the requirements of user privacy protection, data security, and government regulations [204,205]. It is also important to consider the privacy risks and existing solutions across the whole life cycle of AI, including project planning, data collection, data preparation and model deployment [206]. Most importantly, the generation and application of AIGC need to be governed by established laws.
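The federated learning idea mentioned above can be sketched with federated averaging (FedAvg), one widely used algorithm. The source does not name a specific algorithm, so this choice, the toy least-squares model, the synthetic client data and all function names are assumptions made for illustration. Each client (e.g., a hospital) trains on its own records, and only model weights, never the records themselves, are sent to the server.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target value)

def local_update(weights: Sequence[float], data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """A few epochs of gradient descent on y ~ w . x using one client's data only."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]

def run_fedavg(clients: List[List[Sample]], dim: int, rounds: int = 20) -> List[float]:
    """Alternate local training and server-side averaging for a number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w

# Two hypothetical hospitals whose private data follow y = 2*x1 - 1*x2.
hospital_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
hospital_b = [([1.0, 1.0], 1.0), ([2.0, 0.0], 4.0), ([0.0, 2.0], -2.0)]
global_model = run_fedavg([hospital_a, hospital_b], dim=2)
```

Sharing weights instead of data reduces exposure of patient records, but it is not a complete privacy guarantee on its own; practical deployments typically combine it with additional safeguards.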

Conclusions
This review focuses on the current state of research on the application of AIGC in medicine. First, we briefly described generative AI models for generating AIGC from the perspective of different modalities. On this basis, we summarized the innovative research work in recent years on applying AIGC to various medical tasks from two aspects, medical image tasks and medical text tasks, while focusing on their datasets, methodologies, and innovations. Finally, we discussed the limitations and challenges faced by AIGC in the medical field, and proposed potential solutions and research directions in view of relevant studies. We hope that this review can provide readers with a better understanding of AIGC in medicine and inspire ideas for the further application of AIGC in the medical field. This review covered the most common applications of AIGC in the medical field; we will further explore and analyze AIGC in medicine in future work.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 2. Diagram of AIGC. Generative AI models can perform a variety of unimodal or multimodal tasks, including generating large amounts of data with similar characteristics based on existing data (e.g., generating a large number of images in the style of a given image), transforming the characteristics or modality of existing data on demand (e.g., transforming a photograph into a certain painting style), and generating multimodal data with corresponding characteristics based on the input (e.g., generating an image or audio from input text, or a description from an image).

Figure 3. Classification of text generation models: grey arrows indicate left-to-right information flow, while blue arrows indicate bidirectional information flow, TF indicates Transformer,   ,   , … ,   indicates fragments of input text, and   ,   , … ,   indicates fragments of output text.

Figure 4. Basic architecture diagram of various visual generation models, where  is the raw or real data and  is the generated data.

Figure 5. AIGC provided by generative AI can support many kinds of medical image tasks, including image translation (e.g., CT to MRI), image reconstruction (e.g., low-dose CT to normal-dose CT), and medical image augmentation (generation).

Figure 6. Generative artificial intelligence models for medical image modality migration. I. Generative adversarial network-based low-dose CT image reconstruction network (GAN-NETL [72]): using noise encoding networks to extract noise pattern information, while introducing transfer learning methods in network training to transfer noise patterns of synthetic LDCT images; II. Diffusion model-based MR image reconstruction network (CoreDiff [73]): a recovery network using contextual information to constrain the sampling process is introduced on the basis of the diffusion model.

Figure 7. Generative artificial intelligence models for medical image translation. I. MR-CT translation network based on a diffusion model (DDMM-Synth [109]: integration of both an implicit data distribution prior mapping from MRI to CT images and effective information derived from sparse sampled CT measurements); II. MR-PET translation network based on GAN (CF-SAGAN [108]: with conditional flexible self-attention); III. Multimodal MR image translation network based on VAE (VAE-UDA [106]: with unsupervised domain adaptation).

Figure 9. AIGC provided by generative AI can support many kinds of medical text tasks, including medical dialogue (unimodal or multimodal), text processing (summarization & analysis) and text generation (based on prompt).
1) Training a continuous time-dependent score function with denoising score matching;
1) Combining the low-rank structural-Hankel matrix with the diffusion model to generate the ideal sinogram from the low-dose projection data; 2) Introduction of penalized weighted least-squares (PWLS) and TV to achieve superior image quality.
3. Propose an approach for adapting condition weights automatically.
Note: 1 https://brain-development.org/ixi-dataset; 3 https://www.adni-info.org/. The letter after the related works represents which model the method is based on: (D): Diffusion Model, (G): Generative Adversarial Network (GAN), (V): Variational Autoencoder (VAE).
Volume 21, Issue 1, 1672-1711.