Dong Hwa University Library

Food Image Retrieval and Generation.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Food Image Retrieval and Generation./
Author:	Han, Fangda.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2022
Description:	118 p.
Notes:	Source: Dissertations Abstracts International, Volume: 83-08, Section: B.
Contained By:	Dissertations Abstracts International 83-08B.
Subject:	Computer science.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28962934
ISBN:	9798790631016

Thesis (Ph.D.)--Rutgers The State University of New Jersey, School of Graduate Studies, 2022.

This item must not be sold to any third party vendors.

The domain of analysis and synthesis of food images is gaining increasing research interest due to its widespread applications in cooking and diet management. For instance, retrieval of food images from textual prompts can help speed up the cooking process. Likewise, extracting nutritional information from meal images can help monitor daily nutrient intake, facilitating diet management. The computational synthesis of photo-realistic food images is complementary to food image analysis. As an essential element of modeling the food image data, the synthesis also enables novel applications such as augmenting cooking instructions with multimedia content and gamification of the food creation process to promote healthy eating habits amongst children. This dissertation focuses on developing computational tools for food image retrieval and generation.

Our food image retrieval algorithm leverages the auxiliary information capturing relationships between related text-image pairs to regularize the latent space of food instructions and food images. Specifically, we develop a Coherence Aware Module (CAM) to augment the traditional text-to-image retrieval pipeline. The CAM is then trained to predict the auxiliary coherence relations that systematically characterize possible forms of relationship between related text-image pairs. Capturing these coherence relations has the effect of regularizing the learning of latent space embeddings of text-image pairs, resulting in accurate retrieval. Moreover, we show how CAM can be used to refine queries during inference using the process of Selective Similarity Refinement (SSR). Both CAM and SSR lead to significant performance improvements in general text-image retrieval systems.

Next, we develop a food image generation algorithm to generate images conditioned on multiple ingredients. First, we propose CookGAN, a novel extension of StackGANv2 with an explicit regularization in the form of Cycle Consistent Constraint (Cyc-constraint). Specifically, Cyc-constraint utilizes the pre-trained retrieval system discussed above to regularize the generation process and helps the model in generating images that more accurately reflect the desired content. However, CookGAN suffers from image blurring due to the limitation of model capacity. To address this problem, we propose Multi-ingredient Pizza Generator (MPG), an image synthesis approach that extends the StyleGAN2 architecture using a controllable conditioning input paradigm. Specifically, the control of ingredients relies on Scalewise Label Encoder (SLE) which helps the model to be strongly conditioned on the input ingredients while maintaining StyleGAN2's excellent image quality. To verify the efficacy of MPG, we validate it on Pizza10, which is a carefully annotated dataset of multi-ingredient pizza images. We show that MPG can successfully generate photo-realistic pizza images with the desired ingredients.

However, while MPG can generate content-specific food images, it cannot control other image variation factors, such as the pizza shape, scale, or position, which are not available in the training data. To solve this problem, we propose Multi-attribute Pizza Generator (MPG2), together with Multi-Scale Multi-Attribute Encoder (MSMAE) and Attribute Regularizer (AR), targeting control of both ingredients and geometric attributes. We propose a cross-domain training schema to synthesize pizza images with the view attributes absent in the training dataset. This schema combines fully controllable computer graphics generated images (CGIs) with the partially annotated real-world data. To this end, we employ a view attribute regressor estimated on the CGI data to regularize the real-world food image generation process, thereby bridging the real-world and CGI training domains.

ISBN: 9798790631016

Subjects--Topical Terms:
523869 Computer science.

Subjects--Index Terms:
Computer vision
Food Image Retrieval and Generation.
LDR:04972nmm a2200373 4500
001 2351438
005 20221107085644.5
008 241004s2022 ||||||||||||||||| ||eng d
020 $a 9798790631016
035 $a (MiAaPQ)AAI28962934
035 $a AAI28962934
040 $a MiAaPQ $c MiAaPQ
100 1 $a Han, Fangda. $0 (orcid)0000-0002-8663-2185 $3 3691010
245 1 0 $a Food Image Retrieval and Generation.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 118 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-08, Section: B.
500 $a Advisor: Pavlovic, Vladimir.
502 $a Thesis (Ph.D.)--Rutgers The State University of New Jersey, School of Graduate Studies, 2022.
506 $a This item must not be sold to any third party vendors.
520 $a The domain of analysis and synthesis of food images is gaining increasing research interest due to its widespread applications in cooking and diet management. For instance, retrieval of food images from textual prompts can help speed up the cooking process. Likewise, extracting nutritional information from meal images can help monitor daily nutrient intake, facilitating diet management. The computational synthesis of photo-realistic food images is complementary to food image analysis. As an essential element of modeling the food image data, the synthesis also enables novel applications such as augmenting cooking instructions with multimedia content and gamification of the food creation process to promote healthy eating habits amongst children. This dissertation focuses on developing computational tools for food image retrieval and generation. Our food image retrieval algorithm leverages the auxiliary information capturing relationships between related text-image pairs to regularize the latent space of food instructions and food images. Specifically, we develop a Coherence Aware Module (CAM) to augment the traditional text-to-image retrieval pipeline. The CAM is then trained to predict the auxiliary coherence relations that systematically characterize possible forms of relationship between related text-image pairs. Capturing these coherence relations has the effect of regularizing the learning of latent space embeddings of text-image pairs, resulting in accurate retrieval. Moreover, we show how CAM can be used to refine queries during inference using the process of Selective Similarity Refinement (SSR). Both CAM and SSR lead to significant performance improvements in general text-image retrieval systems. Next, we develop a food image generation algorithm to generate images conditioned on multiple ingredients. First, we propose CookGAN, a novel extension of StackGANv2 with an explicit regularization in the form of Cycle Consistent Constraint (Cyc-constraint). Specifically, Cyc-constraint utilizes the pre-trained retrieval system discussed above to regularize the generation process and helps the model in generating images that more accurately reflect the desired content. However, CookGAN suffers from image blurring due to the limitation of model capacity. To address this problem, we propose Multi-ingredient Pizza Generator (MPG), an image synthesis approach that extends the StyleGAN2 architecture using a controllable conditioning input paradigm. Specifically, the control of ingredients relies on Scalewise Label Encoder (SLE) which helps the model to be strongly conditioned on the input ingredients while maintaining StyleGAN2's excellent image quality. To verify the efficacy of MPG, we validate it on Pizza10, which is a carefully annotated dataset of multi-ingredient pizza images. We show that MPG can successfully generate photo-realistic pizza images with the desired ingredients. However, while MPG can generate content-specific food images, it cannot control other image variation factors, such as the pizza shape, scale, or position, which are not available in the training data. To solve this problem, we propose Multi-attribute Pizza Generator (MPG2), together with Multi-Scale Multi-Attribute Encoder (MSMAE) and Attribute Regularizer (AR), targeting control of both ingredients and geometric attributes. We propose a cross-domain training schema to synthesize pizza images with the view attributes absent in the training dataset. This schema combines fully controllable computer graphics generated images (CGIs) with the partially annotated real-world data. To this end, we employ a view attribute regressor estimated on the CGI data to regularize the real-world food image generation process, thereby bridging the real-world and CGI training domains.
590 $a School code: 0190.
650 4 $a Computer science. $3 523869
650 4 $a Food science. $3 3173303
650 4 $a Artificial intelligence. $3 516317
650 4 $a Information science. $3 554358
653 $a Computer vision
653 $a Deep learning
653 $a Generative model
653 $a Machine learning
690 $a 0984
690 $a 0723
690 $a 0800
690 $a 0359
710 2 $a Rutgers The State University of New Jersey, School of Graduate Studies. $b Computer Science. $3 3428998
773 0 $t Dissertations Abstracts International $g 83-08B.
790 $a 0190
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28962934