2025 Volume 7 Issue 4 Published: 30 July 2025
  

  • Topic--Intelligent Sensing and Grading of Agricultural Product Quality
    HU Yan, WANG Yujie, ZHANG Xuechen, ZHANG Yiqiang, YU Huahao, SONG Xinbei, YE Sitan, ZHOU Jihong, CHEN Zhenlin, ZONG Weiwei, HE Yong, LI Xiaoli

    [Objective] Fu brick tea is a popular fermented dark tea, and its "Jin hua" fermentation process determines the quality, flavor, and function of the tea. Therefore, establishing a rapid and non-destructive detection method for the fungal fermentation stage is of great significance for improving quality control and processing efficiency. [Methods] Visible-near-infrared (VIS-NIR) and near-infrared (NIR) hyperspectral images of Fu brick tea were acquired during the fermentation stage, and the variation trends of key quality indexes, including moisture, free amino acids, tea polyphenols, and tea pigments (theaflavins, thearubigins, and theabrownines), were analyzed. Support vector machine (SVM) and convolutional neural network (CNN) models were combined to establish quantitative detection of the key quality indicators and qualitative identification of the fungal fermentation stage. To enhance model performance, the squeeze-and-excitation (SE) attention mechanism, which strengthens the adaptive weight adjustment of feature channels, was incorporated, resulting in the Spectra-SE-CNN model. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was used for feature dimensionality reduction, aiding the visualization of feature distributions during the fermentation process. To improve interpretability, the Grad-CAM technique was employed to visualize the CNN and Spectra-SE-CNN models, helping to identify the key regions the models focus on. [Results and Discussions] In the quantitative detection of Fu brick tea quality, the best models were all Spectra-SE-CNN, with R2p of 0.859 5, 0.852 5, and 0.838 3 for moisture, tea pigments, and tea polyphenols, respectively, indicating high correlation and modeling stability. These values suggest that the models were capable of accurately predicting these key quality indicators from hyperspectral data. However, the R2p for free amino acids was lower (0.670 2), which could be attributed to their relatively minor changes during the fermentation process or a weak spectral response, making this component more challenging to detect reliably with the current hyperspectral imaging approach. The Spectra-SE-CNN model significantly outperformed traditional CNN models, demonstrating the effectiveness of incorporating the SE attention mechanism. The SE attention mechanism enhanced the model's ability to extract and discriminate important spectral features, thereby improving both classification accuracy and generalization. This indicated that the Spectra-SE-CNN model excels not only in feature extraction but also in robustness to variations in the fermentation stage. Furthermore, t-SNE revealed a clear separation of the different fungal fermentation stages in the low-dimensional space, with distinct boundaries. This visualization highlighted the model's ability to distinguish subtle spectral differences during the fermentation process. The heatmaps generated by Grad-CAM emphasized key regions, such as the fermentation location and edges, providing valuable insights into the specific features the model deemed important for accurate predictions. This improved the model's transparency and helped validate the spectral features most influential in identifying the fermentation stages.
[Conclusions] A Spectra-SE-CNN model was proposed in this research, which incorporates the SE attention mechanism into a convolutional neural network to enhance spectral feature learning. This architecture adaptively recalibrates channel-wise feature responses, allowing the model to focus on informative spectral bands and suppress irrelevant signals (see the sketch following this abstract). As a result, the Spectra-SE-CNN achieved improved classification accuracy and training efficiency compared to CNN models, demonstrating the strong potential of deep learning in hyperspectral feature extraction. The findings validate that hyperspectral imaging (HSI) technology enables rapid, non-destructive, and high-resolution assessment of Fu brick tea during its critical fungal fermentation stage, and confirm the feasibility of integrating HSI with intelligent algorithms for real-time monitoring of the Fu brick tea fermentation process. Furthermore, this approach offers a pathway for broader applications of hyperspectral imaging and deep learning in intelligent agricultural product monitoring, quality control, and automation of traditional fermentation processes.
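    The channel recalibration at the core of Spectra-SE-CNN can be illustrated with a short PyTorch sketch: an SE block squeezes each convolutional channel to a scalar by averaging over the spectral bands, then excites it with a learned weight in (0, 1). The layer sizes, kernel widths, band count, and four-stage output below are illustrative assumptions, not the authors' published architecture; only the squeeze-excitation pattern follows the abstract.

    import torch
    import torch.nn as nn

    class SEBlock1d(nn.Module):
        """Squeeze-and-excitation: adaptively reweight feature channels."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):              # x: (batch, channels, bands)
            w = x.mean(dim=-1)             # squeeze: average over spectral bands
            w = self.fc(w).unsqueeze(-1)   # excitation: per-channel weights in (0, 1)
            return x * w                   # recalibrate channels

    class SpectraSECNN(nn.Module):
        """Toy 1D spectral CNN with one SE block (dimensions are assumptions)."""
        def __init__(self, n_stages: int = 4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                SEBlock1d(64),             # SE attention on convolutional channels
                nn.AdaptiveAvgPool1d(1),
            )
            self.head = nn.Linear(64, n_stages)

        def forward(self, x):              # x: (batch, 1, n_bands)
            return self.head(self.features(x).flatten(1))

    logits = SpectraSECNN()(torch.randn(8, 1, 224))   # 8 spectra, 224 hypothetical bands

    A regression variant for the quantitative quality indicators would simply replace the classification head with a single output unit.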

  • Topic--Intelligent Sensing and Grading of Agricultural Product Quality
    YANG Qilang, YU Lu, LIANG Jiaping

    [Objective] Asparagus officinalis L. is a perennial plant with a long harvesting cycle and a fast growth rate. The harvesting period of the tender stems is relatively concentrated, and their shelf life is very short. Therefore, harvested asparagus needs to be classified by specification within a short time and then packaged and sold. At present, however, grading basically depends on manual work, and distinguishing asparagus of different specifications by sensory inspection is difficult and requires considerable money and labor. To save labor costs, a stem-diameter-based asparagus grading algorithm was developed using deep learning and computer vision technology. YOLOv11 was selected as the baseline model and several improvements were made to it, yielding a lightweight model for accurate grading of post-harvest asparagus. [Methods] The dataset was obtained by photographing post-harvest asparagus with a cell phone at fixed camera positions. To improve the generalization ability of the model, the training set was augmented by increasing contrast, mirroring, and adjusting brightness. The augmented training set included a total of 2 160 images for training the model, and the test set and validation set included 90 and 540 images, respectively, for inference and validation. To enhance the performance of the model, the following four improvements were made to the baseline. First, the efficient channel attention (ECA) module was added to the twelfth layer of the YOLOv11 backbone network. The ECA module enhanced stem-diameter feature extraction by dynamically adjusting channel weights in the convolutional neural network and improved the recognition accuracy of the model (a minimal sketch of an ECA module follows this abstract). Second, the bi-directional feature pyramid network (BiFPN) module was integrated into the neck network. This module modified the original feature fusion method to automatically emphasize key asparagus features and improved grading accuracy through multi-scale feature fusion. Moreover, BiFPN dynamically adjusted the importance of each layer to reduce redundant computation. Next, the slim-neck module, consisting of GSConv and VoVGSCSP, was applied to optimize the neck network. The GSConv module replaced the traditional convolution, and the VoVGSCSP module replaced the C3k2 module. This optimization reduced computational costs and model size while improving recognition accuracy. Finally, the original YOLOv11 detection head was replaced with an EfficientDet head, which is lightweight and accurate; it was trained jointly with BiFPN to enhance multi-scale fusion and improve model performance. [Results and Discussions] To verify the validity of the individual modules introduced into the improved YOLOv11 and the superiority of the improved model's performance, ablation experiments and comparison experiments were conducted, respectively. A comparison of different attention mechanisms added to the baseline model showed that the ECA module performed better than the alternatives in the post-harvest asparagus grading task: YOLOv11-ECA achieved higher recognition accuracy with a smaller model size, supporting the choice of the ECA module.
Ablation experiments demonstrated that the improved YOLOv11 achieved 96.8% precision (P), 96.9% recall (R), and 92.5% mean average precision (mAP), with 4.6 GFLOPs, 1.67 × 10⁶ parameters, and a 3.6 MB model size. The asparagus grading test indicated that the localization boxes of the improved model were more accurate and had higher confidence. Compared with the original YOLOv11 model, the improved YOLOv11 increased precision, recall, and mAP by 2.6, 1.4, and 2.2 percentage points, respectively, while floating-point operations, parameter count, and model size were reduced by 1.7 GFLOPs, 9.1 × 10⁵, and 1.6 MB, respectively. Moreover, the various improvements increased the accuracy of the model while keeping it lightweight. In addition, comparative tests showed that the improved YOLOv11 performed better than SSD, YOLOv5s, YOLOv8n, YOLOv11, and YOLOv12. Overall, the improved YOLOv11 had the best overall performance but still had some shortcomings: its inference speed was not optimal, being inferior to those of YOLOv5s and YOLOv8n. The inference speeds of the improved YOLOv11 and the original YOLOv11 were compared statistically, and the results of the Wilcoxon signed-rank test showed that the improved YOLOv11 achieved a significant improvement in inference speed over the original model. [Conclusions] The improved YOLOv11 model demonstrated better recognition accuracy, fewer parameters and floating-point operations, and a smaller model size in the asparagus grading task. It can provide a theoretical foundation for intelligent post-harvest asparagus grading, and deploying it on asparagus grading equipment enables fast and accurate grading of post-harvest asparagus.
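    For reference, the snippet below sketches the general form of an efficient channel attention (ECA) module of the kind added to the backbone: a global average pool followed by a lightweight 1D convolution across channels, which avoids the dimensionality reduction used in SE blocks. The kernel size and feature-map shape are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class ECA(nn.Module):
        """Efficient channel attention: a k-sized 1D conv over pooled channels."""
        def __init__(self, k: int = 3):
            super().__init__()
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

        def forward(self, x):                     # x: (batch, C, H, W)
            w = x.mean(dim=(2, 3))                # global average pool -> (batch, C)
            w = self.conv(w.unsqueeze(1))         # local cross-channel interaction
            w = torch.sigmoid(w).transpose(1, 2)  # (batch, C, 1)
            return x * w.unsqueeze(-1)            # broadcast as (batch, C, 1, 1)

    feats = torch.randn(2, 256, 40, 40)           # a hypothetical backbone feature map
    out = ECA(k=3)(feats)                         # same shape, channels reweighted

    Because the only learned parameters are the k weights of the 1D convolution, the module adds almost no parameters or computation, consistent with the lightweight design goal described above.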

  • Information Processing and Decision Making
    WANG Yi, XUE Rong, HAN Wenting, SHAO Guomin, HOU Yanqiao, CUI Xitong

    [Objective] Maize is one of the most widely cultivated staple crops worldwide, and its aboveground biomass (AGB) serves as a crucial indicator for evaluating crop growth status. Accurate estimation of maize AGB is vital for ensuring food security and enhancing agricultural productivity. However, maize AGB is influenced by a multitude of dynamic factors and exhibits complex spatial and temporal variations that pose significant challenges to precise estimation. At present, most studies on maize AGB estimation rely primarily on single-source remote sensing data and conventional machine learning algorithms, which limits the accuracy and generalizability of the models. To overcome these limitations, a model architecture that integrates convolutional neural networks (CNN), long short-term memory networks (LSTM), and a self-attention (SA) mechanism was developed in this research to estimate maize AGB at the field scale. [Methods] The research utilized vegetation indices, crop parameters, and meteorological data collected under varying gradient water treatments in the experimental area. First, an optimized CNN-LSTM-SA model was constructed. The model employed two-dimensional convolutional layers to extract both spatial and temporal features, while utilizing max-pooling and dropout techniques to mitigate overfitting. The LSTM module was used to capture temporal dependencies in the data, and the SA mechanism was introduced to compute global attention weights, enhancing the representation of critical time steps (a minimal sketch of this architecture follows the abstract). Nonlinear activation functions were applied to mitigate multicollinearity among features, and a fully connected layer output the estimated AGB values. Second, the Pearson correlation coefficients between influencing factors and maize AGB were analyzed, and the importance of multi-source data was validated. Recursive feature elimination (RFE) was used to select the optimal input features, and the local interpretable model-agnostic explanations (LIME) method was employed to interpret individual samples. Finally, ablation experiments were conducted to assess the effects of incorporating CNN and SA into the model, with performance comparisons made against random forest (RF) and support vector machine (SVM) models. [Results and Discussions] The correlation analysis revealed that crop parameters exhibited strong correlations with AGB. Among the vegetation indices, the improved normalized difference red edge index (NDREI) demonstrated the highest correlation (r = 0.63). To address multicollinearity, the visible atmospherically resistant index (VARI), soil adjusted vegetation index (SAVI), and normalized difference red edge index (NDRE) were excluded from the analysis. The CNN-LSTM-SA model integrating crop parameters, vegetation indices, and meteorological data initially achieved a coefficient of determination (R2) of 0.89, a root mean square error (RMSE) of 129.38 g/m2, and a mean absolute error (MAE) of 65.99 g/m2. When only vegetation indices and meteorological data were included, the model yielded an R2 of 0.83, an RMSE of 161.36 g/m2, and an MAE of 89.37 g/m2; using a single vegetation index reduced model accuracy further. Based on multi-source data integration, RFE removed redundant features. After excluding the 2-meter average wind speed, the model reached its best performance, with an R2 of 0.92, an RMSE of 107.53 g/m2, and an MAE of 55.19 g/m2.
Using the LIME method to interpret feature contributions for individual maize samples, the analysis revealed that during the rapid growth stage, the model was primarily influenced by the current growth status and vegetation indices. For samples in the mid-growth stage, multi-day crop physiological characteristics had a substantial impact on model predictions, while in the late growth stage, higher vegetation index values showed a clear suppressive effect on the model outputs. During the mid-growth stage under varying moisture conditions, the model consistently demonstrated heightened sensitivity to low temperatures, moderate humidity levels, and optimal vegetation indices. The CNN-LSTM-SA model demonstrated more consistent fitting performance and accuracy across different growth stages and water conditions than the LSTM, LSTM-SA, and CNN-LSTM models, and it also exceeded the RF and SVM models on all evaluation metrics. [Conclusions] This study leveraged the feature extraction capabilities of CNN, the temporal modeling strength of LSTM, and the dynamic attention mechanism of SA to enhance the accuracy of maize AGB estimation from a spatiotemporal perspective. The approach not only reduced estimation errors but also improved model interpretability, providing valuable insights and references for the dynamic modeling of crop AGB.
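    The CNN-LSTM-SA arrangement described above can be sketched as follows, assuming the 2D convolution runs over a (time steps × features) input; this reading of the input layout and all dimensions are illustrative assumptions rather than the authors' exact architecture.

    import torch
    import torch.nn as nn

    class CNNLSTMSA(nn.Module):
        """CNN feature extraction -> LSTM temporal modeling -> self-attention -> AGB."""
        def __init__(self, n_features: int = 10, hidden: int = 64):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),             # pool along the feature axis only
                nn.Dropout(0.2),                  # mitigate overfitting
            )
            self.lstm = nn.LSTM(16 * (n_features // 2), hidden, batch_first=True)
            self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
            self.head = nn.Linear(hidden, 1)      # estimated AGB (g/m2)

        def forward(self, x):                     # x: (batch, time, features)
            h = self.cnn(x.unsqueeze(1))          # (batch, 16, time, features // 2)
            h = h.permute(0, 2, 1, 3).flatten(2)  # back to (batch, time, channels)
            h, _ = self.lstm(h)                   # temporal dependencies
            h, _ = self.attn(h, h, h)             # global attention over time steps
            return self.head(h[:, -1])            # regress from the last time step

    pred = CNNLSTMSA()(torch.randn(4, 12, 10))    # 4 samples, 12 dates, 10 features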

  • Information Processing and Decision Making
    XU Wenwen, YU Kejian, DAI Zexu, WU Yunzhi

    [Objective] Grape is one of the world's largest cash crops in terms of total production value, and accurate yield estimation is crucial for agricultural and economic development. However, at present, grape yield prediction is difficult and costly: detection of green grape varieties, whose berries are similar in color to the leaves, has limitations, and detection of grape bunches with small berries is ineffective. To solve these problems, a multimodal detection framework based on transfer learning was proposed, which aims to realize the detection and counting of different varieties of grapes, so as to provide reliable technical support for grape yield prediction and intelligent orchard management. [Methods] A multimodal grape detection framework based on transfer learning was proposed. The transfer learning utilized the feature representation capabilities of pretrained models, requiring only a small number of grape images for fine-tuning to adapt to the task. This approach not only reduced labeling costs but also enhanced the ability to capture grape features effectively. The multimodal framework adopted a dual-encoder-single-decoder structure consisting of three core modules: an image and text feature extraction and enhancement module, a language-guided query selection module, and a cross-modality decoder module. In the feature extraction stage, the framework employed models pretrained on public datasets for transfer learning, which significantly reduced training time and cost on the target task while effectively improving the capability to capture grape features. By introducing a feature enhancement module, the framework achieved cross-modality fusion between grape images and text, and an attention mechanism was implemented to enhance both image and text features, facilitating cross-modality feature learning between images and text. During the cross-modality query selection phase, the framework utilized a language-guided query selection strategy that enabled the filtering of queries from grape images. This strategy used the input text to guide object detection more effectively, selecting the features most relevant to the input text as queries for the decoder (see the sketch after this abstract). The cross-modality decoder combined the features from the grape image and text modalities to achieve more accurate modality alignment, thereby fusing grape image and text information more effectively and ultimately producing the corresponding grape prediction results. Finally, to comprehensively evaluate the model's performance, the mean average precision (mAP) and average recall (AR) were adopted as evaluation metrics for the detection task, while the counting task was quantified using the mean absolute error (MAE) and root mean square error (RMSE). [Results and Discussions] The method exhibited optimal performance in both detection and counting when compared with nine baseline models. Specifically, a comprehensive evaluation was conducted on the WGISD public dataset, where the method achieved an mAP50 of 80.3% in the detection task, a 2.7 percentage point improvement over the second-best model. Additionally, it reached 53.2% mAP and 58.2% mAP75, surpassing the second-best models by 13.4 and 22 percentage points, respectively, and achieved an mAR of 76.5%, a 9.8 percentage point increase over the next-best model.
In the counting task, the method achieved an MAE of 1.65 and an RMSE of 2.48, outperforming all other baseline models in counting effectiveness. Furthermore, experiments were conducted on a total of nine grape varieties from the WGISD dataset and field-collected data, resulting in an mAP50 of 82.5%, an mAP of 58.5%, an mAP75 of 64.4%, an mAR of 77.1%, an MAE of 1.44, and an RMSE of 2.19. These results demonstrated the model's strong adaptability and effectiveness across diverse grape varieties. Notably, the method not only performed well on large grape clusters but also showed superior performance on smaller clusters, achieving an mAPs of 74.2% in the detection task, a 9.5 percentage point improvement over the second-best model. Additionally, to provide a more intuitive assessment of model performance, grape images from the test set were selected for visual comparison analysis. The results revealed that the model's detection and counting outcomes for grape clusters closely matched the original annotations in the labeled dataset. Overall, the method demonstrated strong generalization and higher accuracy across grape varieties under various environmental conditions. This technology has the potential to be applied to estimating total orchard yield and reducing pre-harvest measurement errors, thereby effectively enhancing the precision management of vineyards. [Conclusions] The proposed method achieved higher accuracy and better adaptability in detecting five grape varieties than other baseline models, and demonstrated substantial practicality and robustness across nine different grape varieties. These findings suggest that the method has significant application potential in grape detection and counting tasks and could provide strong technical support for the intelligent development of precision agriculture and the grape cultivation industry.
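    As a sketch of the language-guided query selection step referenced above, the snippet below scores every image token against the encoded text prompt (e.g. "grape bunch") and keeps the highest-scoring tokens as decoder queries. The shapes, the dot-product scoring rule, and the query count are assumptions for illustration, not the paper's exact implementation.

    import torch

    def language_guided_query_selection(img_tokens, txt_tokens, num_queries=900):
        """img_tokens: (batch, N_img, d); txt_tokens: (batch, N_txt, d)."""
        sim = img_tokens @ txt_tokens.transpose(1, 2)   # token-to-token similarity
        scores = sim.max(dim=-1).values                 # best-matching text token
        top = scores.topk(num_queries, dim=1).indices   # most text-relevant tokens
        idx = top.unsqueeze(-1).expand(-1, -1, img_tokens.size(-1))
        return img_tokens.gather(1, idx)                # (batch, num_queries, d)

    img = torch.randn(2, 10000, 256)   # flattened multi-scale image features
    txt = torch.randn(2, 4, 256)       # encoded prompt tokens
    queries = language_guided_query_selection(img, txt)

    Selecting queries this way lets the text prompt steer which image regions the cross-modality decoder attends to, which is one way a single framework can adapt to different grape varieties through the prompt alone.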