Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering
arXiv (2024)
Abstract
By employing massive Mobile AI-Generated Content (AIGC) Service Providers
(MASPs) equipped with powerful models, high-quality AIGC services become
accessible to resource-constrained end users. However, this advancement,
referred to as mobile AIGC, also introduces a significant challenge: users
must download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption
and potential transmission failures. In this paper, we apply cross-modal
Generative Semantic Communications (G-SemCom) in mobile AIGC to overcome
wireless bandwidth constraints. Specifically, we utilize a series of
cross-modal attention maps to indicate the correlation between user prompts and
each part of the AIGC output. In this way, the MASP can analyze the prompt context
and efficiently extract the most semantically important content (see the first
sketch after the abstract). Only this semantic information is transmitted, from
which users can recover the entire AIGC output at high quality while saving
mobile bandwidth. Since the transmitted
information not only preserves the semantics but also prompts the recovery, we
formulate a joint semantic encoding and prompt engineering problem to optimize
the bandwidth allocation among users. Particularly, we present a
human-perceptual metric named Joint Perceptual Similarity and Quality (JPSQ),
which fuses two learning-based measurements of semantic similarity and
aesthetic quality, respectively (second sketch below). Furthermore, we develop
the Attention-aware Deep Diffusion (ADD) algorithm, which learns attention maps
and leverages the diffusion process to enhance the agent's ability to explore
the environment (third sketch below).
Extensive experiments demonstrate that our proposal can reduce the bandwidth
consumption of mobile users by 49.4% while incurring only a negligible
difference in AIGC output quality. Moreover, the ADD algorithm outperforms
baseline DRL methods, achieving a 1.74x higher overall reward.
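
To make the attention-based semantic filtering concrete, below is a minimal sketch, assuming a diffusion-style generator that exposes cross-attention maps between prompt tokens and image regions. The function name, array shapes, and the `keep_ratio` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def semantic_filter(attn_maps: np.ndarray, image: np.ndarray, keep_ratio: float = 0.5):
    """Keep only the image regions most correlated with the prompt.

    attn_maps : (num_tokens, H, W) cross-attention maps, one per prompt
                token, read from the generator's attention layers (assumed).
    image     : (H, W, C) generated AIGC output.
    keep_ratio: fraction of pixels transmitted as semantic information.
    """
    # Aggregate the per-token maps into one saliency map over the image.
    saliency = attn_maps.mean(axis=0)
    saliency = (saliency - saliency.min()) / (np.ptp(saliency) + 1e-8)

    # Transmit only the most prompt-relevant pixels; the receiver
    # re-synthesizes the rest from the prompt and this semantic part.
    threshold = np.quantile(saliency, 1.0 - keep_ratio)
    mask = saliency >= threshold
    return image * mask[..., None], mask
```

Lowering `keep_ratio` trades recovered-output quality for bandwidth, which is precisely the per-user knob that the joint semantic encoding and prompt engineering problem tunes.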
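The abstract describes JPSQ only as a fusion of two learned measurements, so the combination below is a hedged sketch: `semantic_sim` stands for a learned prompt-image similarity (e.g., a normalized CLIP-style score) and `aesthetic_q` for a learned aesthetic predictor's output; the weighted sum and `alpha` are assumptions.

```python
def jpsq(semantic_sim: float, aesthetic_q: float, alpha: float = 0.5) -> float:
    """Illustrative JPSQ-style fusion (the paper's exact rule is not
    given in the abstract).

    semantic_sim: learned prompt-image similarity, normalized to [0, 1].
    aesthetic_q : learned aesthetic-quality score, normalized to [0, 1].
    alpha       : assumed weight balancing the two measurements.
    """
    return alpha * semantic_sim + (1.0 - alpha) * aesthetic_q
```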
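Finally, ADD pairs attention-map features with a diffusion-based DRL policy. The sketch below shows only the diffusion-as-policy idea, i.e., sampling an action (e.g., a bandwidth allocation) by iteratively denoising Gaussian noise conditioned on the state; the network sizes, step count, and simplified update rule are illustrative assumptions, not the paper's algorithm.

```python
import torch
import torch.nn as nn

class DiffusionPolicy(nn.Module):
    """Sketch of a diffusion-based policy: actions are sampled by
    iteratively denoising Gaussian noise, conditioned on the state
    (here, attention-map features). Shapes and steps are illustrative."""

    def __init__(self, state_dim: int, action_dim: int, steps: int = 10):
        super().__init__()
        self.steps = steps
        self.denoiser = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    @torch.no_grad()
    def sample(self, state: torch.Tensor) -> torch.Tensor:
        # Start from pure noise and denoise it into an action.
        action = torch.randn(state.shape[0], self.denoiser[-1].out_features)
        for t in reversed(range(self.steps)):
            t_emb = torch.full((state.shape[0], 1), t / self.steps)
            noise_pred = self.denoiser(torch.cat([state, action, t_emb], dim=-1))
            action = action - noise_pred / self.steps  # simplified Euler-style update
        return torch.tanh(action)  # bounded allocation action
```

Compared with a single-shot Gaussian policy head, sampling through a multi-step denoising chain can represent richer, multi-modal action distributions, which is consistent with the stronger environment exploration the abstract attributes to ADD.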