
Machine learning for predicting aroma, flavour and preference of foods and beverages: Case studies with small and imbalanced datasets

Date
2026-06
Type
Journal Article
Abstract
Datasets for training and testing machine learning models to predict intensities of aroma and flavour attributes and preference ratings for foods and beverages from instrumental measurements are often limited in size due to the cost and effort of producing and analysing samples. A consequence can be poor and unreliable predictive performance of models. Case studies using published white wine, citrus peel and yoghurt datasets have been conducted to examine potential solutions. Selecting expressive instrumental measurements that meaningfully reflect sensory attributes or drivers of preference is important for accurate prediction. Using a categorical scale with a limited number of categories for sensory evaluation can help ensure sufficient samples are allocated to each category, improving model generalisability and reducing prediction bias. After data collection, accuracy can be increased by merging categories to blend out a portion of any sensory noise and balance samples more evenly across the remaining categories. The latter can be helpful if techniques such as Synthetic Minority Over-sampling Technique (SMOTE) prove unsatisfactory for addressing prediction bias in models trained on imbalanced datasets.
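The category-merging step described in the abstract can be illustrated with a minimal sketch. The ratings, the 1-5 scale, the mapping, and the helper names `merge_categories` and `imbalance_ratio` are all hypothetical and are not taken from the article or its datasets; the sketch only shows how merging adjacent categories can even out class counts before a model is trained.

```python
from collections import Counter


def merge_categories(labels, mapping):
    """Relabel each sample using a (hypothetical) old -> new category mapping."""
    return [mapping[label] for label in labels]


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up sensory ratings on a 1-5 categorical scale (illustrative only).
ratings = [1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]

# Assumed merge of adjacent categories: 1-2 -> "low", 3 -> "medium", 4-5 -> "high".
mapping = {1: "low", 2: "low", 3: "medium", 4: "high", 5: "high"}
merged = merge_categories(ratings, mapping)

before = imbalance_ratio(ratings)  # five original categories
after = imbalance_ratio(merged)    # three merged categories

print(Counter(ratings), before)
print(Counter(merged), after)
```

In this toy example the merged labels are spread more evenly across categories than the original ratings. Which categories to merge in practice would depend on the sensory scale and the data at hand.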
Rights
© 2026 The Authors. Published by Elsevier Ltd.
Creative Commons Rights
Attribution
Access Rights