Most Read

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    ZHAO Chunjiang, LI Jingchen, WU Huarui, YANG Yusen
    Smart Agriculture. 2024, 6(6): 63-71. https://doi.org/10.12133/j.smartag.SA202410008

    [Objective] In the era of digital agriculture, real-time monitoring and predictive modeling of crop growth are paramount, especially in autonomous farming systems. Traditional crop growth models, often constrained by their reliance on static, rule-based methods, fail to capture the dynamic and multifactorial nature of vegetable crop growth. Modeling the growth of vegetable crops within digital twin platforms has historically been hindered by the complex interactions among biotic and abiotic factors. This research addressed these challenges by leveraging the advanced reasoning capabilities of pre-trained large language models (LLMs) to simulate and predict vegetable crop growth with accuracy and reliability. [Methods] The methodology was structured in several distinct phases. Initially, a comprehensive dataset was curated to include extensive information on vegetable crop growth cycles, environmental conditions, and management practices. This dataset incorporated continuous data streams such as soil moisture, nutrient levels, climate variables, pest occurrence, and historical growth records. By combining these data sources, the study ensured that the model was well-equipped to understand and infer the complex interdependencies inherent in crop growth processes. Then, advanced techniques were employed for pre-training and fine-tuning LLMs to adapt them to the domain-specific requirements of vegetable crop modeling. A staged intelligent agent ensemble was designed to work within the digital twin platform, consisting of a central managerial agent and multiple stage-specific agents. The managerial agent was responsible for identifying transitions between distinct growth stages of the crops, while the stage-specific agents were tailored to handle the unique characteristics of each growth phase. This modular architecture enhanced the model's adaptability and precision, ensuring that each phase of growth received specialized attention and analysis. [Results and Discussions] The experimental validation of this method was conducted in a controlled agricultural setting at the Xiaotangshan Modern Agricultural Demonstration Park in Beijing. Cabbage (Zhonggan 21) was selected as the test crop due to its significance in agricultural production and the availability of comprehensive historical growth data. Over five years, the dataset collected included 4 300 detailed records, documenting parameters such as plant height, leaf count, soil conditions, irrigation schedules, fertilization practices, and pest management interventions. This dataset was used to train the LLM-based system and evaluate its performance using ten-fold cross-validation. The results of the experiments demonstrated the efficacy of the proposed system in addressing the complexities of vegetable crop growth modeling. The LLM-based model achieved 98% accuracy in predicting crop growth degrees and 99.7% accuracy in identifying growth stages. These metrics significantly outperform traditional machine learning approaches, including long short-term memory (LSTM), XGBoost, and LightGBM models. The superior performance of the LLM-based system highlights its ability to reason over heterogeneous data inputs and make precise predictions, setting a new benchmark for crop modeling technologies. Beyond accuracy, the LLM-powered system also excels in its ability to simulate growth trajectories over extended periods, enabling farmers and agricultural managers to anticipate potential challenges and make proactive decisions.
For example, by integrating real-time sensor data with historical patterns, the system can predict how changes in irrigation or fertilization practices will impact crop health and yield. This predictive capability is invaluable for optimizing resource allocation and mitigating risks associated with climate variability and pest outbreaks. [Conclusions] The study emphasizes the importance of high-quality data in achieving reliable and generalizable models. The comprehensive dataset used in this research not only captures the nuances of cabbage growth but also provides a blueprint for extending the model to other crops. In conclusion, this research demonstrates the transformative potential of combining large language models with digital twin technology for vegetable crop growth modeling. By addressing the limitations of traditional modeling approaches and harnessing the advanced reasoning capabilities of LLMs, the proposed system sets a new standard for precision agriculture. Several avenues also are proposed for future work, including expanding the dataset, refining the model architecture, and developing multi-crop and multi-region capabilities.
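The staged agent ensemble described above can be illustrated with a minimal sketch in Python. It assumes a generic chat-completion wrapper passed in as `call_llm`; the stage names, prompts, and routing rule below are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of a staged agent ensemble for growth-stage reasoning.
# `call_llm` is any user-supplied chat-completion wrapper; prompts and
# stage names are hypothetical, not taken from the paper.
from typing import Callable, Dict

STAGE_PROMPTS: Dict[str, str] = {
    "seedling": "You model cabbage seedling growth. Given sensor readings, predict daily height and leaf-count change.",
    "rosette":  "You model the rosette stage. Given sensor readings, predict the daily growth degree.",
    "heading":  "You model the heading stage. Given sensor readings, predict head firmness and growth degree.",
}

def managerial_agent(call_llm: Callable[[str], str], observation: str) -> str:
    """Ask the managerial agent which growth stage the crop is in."""
    prompt = (
        "Classify the current cabbage growth stage as one of "
        f"{list(STAGE_PROMPTS)} based on this observation:\n{observation}\n"
        "Answer with the stage name only."
    )
    return call_llm(prompt).strip().lower()

def stage_agent(call_llm: Callable[[str], str], stage: str, observation: str) -> str:
    """Delegate the prediction to the matching stage-specific agent."""
    system = STAGE_PROMPTS.get(stage, STAGE_PROMPTS["seedling"])
    return call_llm(f"{system}\nObservation:\n{observation}")

def predict_growth(call_llm: Callable[[str], str], observation: str) -> str:
    stage = managerial_agent(call_llm, observation)
    return stage_agent(call_llm, stage, observation)
```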

  • Information Processing and Decision Making
    LI Zusheng, TANG Jishen, KUANG Yingchun
    Smart Agriculture. 2025, 7(2): 146-159. https://doi.org/10.12133/j.smartag.SA202412003

    [Objective] The accuracy of identifying litchi pests is crucial for implementing effective control strategies and promoting sustainable agricultural development. However, the current detection of litchi pests is characterized by a high percentage of small targets, which makes it challenging for target detection models to balance accuracy and parameter count, thus limiting their application in real-world production environments. To improve the identification efficiency of litchi pests, a lightweight target detection model YOLO-LP (YOLO-Litchi Pests) based on YOLOv10n was proposed. The model aimed to enhance the detection accuracy of small litchi pest targets in multiple scenarios by optimizing the network structure and loss function, while also reducing the number of parameters and computational costs. [Methods] Images of two classes of litchi insect pests (Cocoon and Gall) were collected as datasets for modeling in natural scenarios (sunny, cloudy, post-rain) and laboratory environments. The original data were expanded through random scaling, random panning, random brightness adjustments, random contrast variations, and Gaussian blurring to balance the category samples and enhance the robustness of the model, generating a richer dataset named the CG dataset (Cocoon and Gall dataset). The YOLO-LP model was constructed after the following three improvements. Specifically, the C2f module of the backbone network (Backbone) in YOLOv10n was optimized and the C2f_GLSA module was constructed using the global-to-local spatial aggregation (GLSA) module to focus on small targets and enhance the differentiation between the targets and the backgrounds, while simultaneously reducing the number of parameters and computation. A frequency-aware feature fusion module (FreqFusion) was introduced into the neck network (Neck) of YOLOv10n and a frequency-aware path aggregation network (FreqPANet) was designed to reduce the complexity of the model and address the problem of fuzzy and shifted target boundaries. The SCYLLA-IoU (SIoU) loss function replaced the Complete-IoU (CIoU) loss function of the baseline model to optimize target localization accuracy and accelerate the convergence of the training process. [Results and Discussions] YOLO-LP achieved 90.9%, 62.2%, and 59.5% for AP50, AP50:95, and AP-Small50:95 on the CG dataset, respectively, which were 1.9%, 1.0%, and 1.2% higher than those of the baseline model. The number of parameters and the computational costs were reduced by 13% and 17%, respectively. These results suggested that YOLO-LP had high accuracy and a lightweight design. Comparison experiments with different attention mechanisms validated the effectiveness of the GLSA module. After the GLSA module was added to the baseline model, AP50, AP50:95, and AP-Small50:95 achieved the highest performance on the CG dataset, reaching 90.4%, 62.0%, and 59.5%, respectively. Experimental results comparing different loss functions showed that the SIoU loss function provided better fitting and faster convergence on the CG dataset. Ablation test results confirmed the validity of each improvement and showed that any combination of the three improvements performed significantly better than the baseline model. The performance of the model was optimal when all three improvements were applied simultaneously.
Compared to several mainstream models, YOLO-LP exhibited the best overall performance, with a model size of only 5.1 MB, 1.97 million parameters (Params), and a computational volume of 5.4 GFLOPs. Compared to the baseline model, the detection performance of YOLO-LP was significantly improved across the four scenarios. In the sunny day scenario, AP50, AP50:95, and AP-Small50:95 increased by 1.9%, 1.0%, and 2.0%, respectively. In the cloudy day scenario, AP50, AP50:95, and AP-Small50:95 increased by 2.5%, 1.3%, and 1.3%, respectively. In the post-rain scenario, AP50, AP50:95, and AP-Small50:95 increased by 2.0%, 2.4%, and 2.4%, respectively. In the laboratory scenario, only AP50 increased, by 0.7% over the baseline model. These findings indicated that YOLO-LP achieved higher accuracy and robustness in multi-scenario small target detection of litchi pests. [Conclusions] The proposed YOLO-LP model could improve detection accuracy and effectively reduce the number of parameters and computational costs. It performed well in small target detection of litchi pests and demonstrated strong robustness across different scenarios. These improvements made the model more suitable for deployment on resource-constrained mobile and edge devices. The model provided a valuable technical reference for small target detection of litchi pests in various scenarios.
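The augmentation pipeline named for the CG dataset (random scaling, panning, brightness, contrast, and Gaussian blurring) can be sketched with OpenCV and NumPy as below. The parameter ranges are assumed for illustration only, and bounding boxes would need to be transformed alongside the image.

```python
# Minimal sketch of the named augmentations; ranges are illustrative assumptions.
import cv2
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    # Random scaling, then crop/pad back to the original size.
    s = rng.uniform(0.8, 1.2)
    scaled = cv2.resize(img, (int(w * s), int(h * s)))
    canvas = np.zeros_like(img)
    ch, cw = min(h, scaled.shape[0]), min(w, scaled.shape[1])
    canvas[:ch, :cw] = scaled[:ch, :cw]
    img = canvas
    # Random panning (translation) with border replication.
    tx, ty = rng.integers(-w // 10, w // 10), rng.integers(-h // 10, h // 10)
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    img = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Random brightness/contrast: out = alpha * img + beta.
    alpha, beta = rng.uniform(0.8, 1.2), rng.uniform(-20, 20)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)
    # Occasional Gaussian blur with an odd kernel size.
    if rng.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), 0)
    return img
```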

  • Topic--Intelligent Sensing and Grading of Agricultural Product Quality
    YANG Qilang, YU Lu, LIANG Jiaping
    Smart Agriculture. 2025, 7(4): 84-94. https://doi.org/10.12133/j.smartag.SA202501024

    [Objective] Asparagus officinalis L. is a perennial plant with a long harvesting cycle and fast growth rate. The harvesting period of the tender stems is relatively concentrated, and their shelf life is very short. Therefore, harvested asparagus needs to be classified according to specifications within a short time and then packaged and sold. However, at this stage, the classification of asparagus specifications basically depends on manual work, and grading asparagus of different specifications by sensory judgment is difficult and requires considerable money and labor. To save labor costs, an algorithm for classification based on asparagus stem diameter was developed using deep learning and computer vision technology. YOLOv11 was selected as the baseline model and several improvements were made to propose a lightweight model for accurate grading of post-harvest asparagus. [Methods] The dataset was obtained by photographing post-harvest asparagus with a cell phone at fixed camera positions. In order to improve the generalization ability of the model, the training set was augmented by increasing contrast, mirroring, and adjusting brightness. The augmented training set included a total of 2 160 images for training the model, and the test set and validation set included 90 and 540 images, respectively, for inference and validation of the model. In order to enhance the performance of the improved model, the following four improvements were made to the baseline model. First, the efficient channel attention (ECA) module was added to the twelfth layer of the YOLOv11 backbone network. The ECA module enhanced asparagus stem diameter feature extraction by dynamically adjusting channel weights in the convolutional neural network and improved the recognition accuracy of the improved model. Second, the bi-directional feature pyramid network (BiFPN) module was integrated into the neck network. This module modified the original feature fusion method to automatically emphasize key asparagus features and improved the grading accuracy through multi-scale feature fusion. Moreover, BiFPN dynamically adjusted the importance of each layer to reduce redundant computations. Next, the slim-neck module was applied to optimize the neck network. The slim-neck module consisted of GSConv and VoVGSCSP. The GSConv module replaced the traditional convolution, and the VoVGSCSP module replaced the C3k2 module. This optimization reduced computational costs and model size while improving the recognition accuracy. Finally, the original YOLOv11 detection head was replaced with an EfficientDet Head, which had the advantages of light weight and high accuracy. This head was co-trained with BiFPN to enhance the effect of multi-scale fusion and improve the performance of the model. [Results and Discussions] In order to verify the validity of the individual modules introduced in the improved YOLOv11 model and the superiority of the improved model's performance, ablation experiments and comparison experiments were conducted, respectively. The results of the comparison test between different attention mechanisms added to the baseline model showed that the ECA module performed better than other attention mechanisms in the post-harvest asparagus grading task. YOLOv11-ECA achieved higher recognition accuracy with a smaller model size, confirming the reliability of selecting the ECA module.
Ablation experiments demonstrated that the improved YOLOv11 achieved 96.8% precision (P), 96.9% recall (R), and 92.5% mean average precision (mAP), with 4.6 GFLOPs, 1.67 × 10⁶ parameters, and a 3.6 MB model size. The results of the asparagus grading test indicated that the localization boxes of the improved model were more accurate and had a higher confidence level. Compared with the original YOLOv11 model, the improved YOLOv11 model increased the precision, recall, and mAP by 2.6, 1.4, and 2.2 percentage points, respectively, while the floating-point operations, parameter count, and model size were reduced by 1.7 GFLOPs, 9.1 × 10⁵, and 1.6 MB, respectively. Moreover, the various improvements to the model increased its accuracy while keeping the model lightweight. In addition, the results of the comparative tests showed that the performance of the improved YOLOv11 model was better than those of SSD, YOLOv5s, YOLOv8n, YOLOv11, and YOLOv12. Overall, the improved YOLOv11 had the best overall performance but still had some shortcomings. In terms of real-time performance, the inference speed of the improved model was not optimal and was inferior to that of YOLOv5s and YOLOv8n. The inference speeds of the improved YOLOv11 and the original YOLOv11 were further compared statistically, and the results of the Wilcoxon signed-rank test showed that the improved YOLOv11 achieved a significant improvement in inference speed compared to the original YOLOv11 model. [Conclusions] The improved YOLOv11 model demonstrated better recognition, fewer parameters and floating-point operations, and a smaller model size in the asparagus grading task. The improved YOLOv11 could provide a theoretical foundation for intelligent post-harvest asparagus grading. Deploying the improved YOLOv11 model on asparagus grading equipment enables fast and accurate grading of post-harvest asparagus.
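The paired Wilcoxon signed-rank comparison of inference speeds mentioned above can be reproduced in outline with SciPy; the timing arrays here are hypothetical placeholders rather than the paper's measurements.

```python
# Minimal sketch of a paired Wilcoxon signed-rank test on per-image inference times.
import numpy as np
from scipy.stats import wilcoxon

baseline_ms = np.array([6.1, 5.9, 6.3, 6.0, 6.2, 5.8, 6.4, 6.1])   # hypothetical timings
improved_ms = np.array([5.4, 5.2, 5.6, 5.3, 5.5, 5.1, 5.7, 5.4])   # hypothetical timings

stat, p_value = wilcoxon(baseline_ms, improved_ms)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Inference-speed difference is statistically significant.")
```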

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    GAO Qun, WANG Hongyang, CHEN Shiyao
    Smart Agriculture. 2024, 6(6): 168-179. https://doi.org/10.12133/j.smartag.SA202404005

    [Objective] In order to summarize exemplary cases of high-quality development in regional smart agriculture and contribute strategies for the sustainable advancement of the national smart agriculture cause, the spatio-temporal characteristics and key driving factors of smart farms in the Yangtze River Economic Belt were studied. [Methods] Based on data from 11 provinces (municipalities) spanning the years 2014 to 2023, a comprehensive analysis was conducted on the spatio-temporal differentiation characteristics of smart farms in the Yangtze River Economic Belt using methods such as kernel density analysis, spatial auto-correlation analysis, and the standard deviation ellipse, covering the overall spatial clustering characteristics, high-value and low-value clustering phenomena, centroid characteristics, and dynamic change trends. Subsequently, the geographic detector was employed to identify the key factors driving the spatio-temporal differentiation of smart farms and to discern the interactions between different factors. The analysis was conducted across seven dimensions: special fiscal support, industry dependence, human capital, urbanization, agricultural mechanization, internet infrastructure, and technological innovation. [Results and Discussions] Firstly, in terms of temporal characteristics, the number of smart farms in the Yangtze River Economic Belt steadily increased over the past decade. The year 2016 marked a significant turning point, after which the growth rate of smart farms accelerated noticeably. The development of the upper, middle, and lower reaches exhibited both commonalities and disparities. Specifically, the lower sub-regions had a higher overall development level of smart farms, with a fluctuating upward growth rate; the middle sub-regions were at a moderate level, showing a fluctuating upward growth rate and relatively even provincial distribution; the upper sub-regions had a low development level, with a stable and slow growth rate and an unbalanced provincial distribution. Secondly, in terms of spatial distribution, smart farms in the Yangtze River Economic Belt exhibited a dispersed agglomeration pattern. The results of global auto-correlation indicated that smart farms in the Yangtze River Economic Belt tended to be randomly distributed. The results of local auto-correlation showed that the predominant patterns of agglomeration were H-L and L-H types, with the distribution across provinces being somewhat complex; H-H type agglomeration areas were mainly concentrated in Sichuan, Hubei, and Anhui; L-L type agglomeration areas were primarily in Yunnan and Guizhou. The standard deviation ellipse results revealed that the mean center of smart farms in the Yangtze River Economic Belt had shifted from Anqing city in Anhui province in 2014 to Jingzhou city in Hubei province in 2023, with the spatial distribution showing an overall trend of shifting southwestward and a slow expansion toward the northeast and south. Finally, in terms of key driving factors, technological innovation was the primary critical factor driving the formation of the spatio-temporal distribution pattern of smart farms in the Yangtze River Economic Belt, with a factor explanatory degree of 0.311 1. Moreover, after interacting with other indicators, it continued to play a crucial role in the spatio-temporal distribution of smart farms, which aligned with the practical logic of smart farm development.
Urbanization and agricultural mechanization levels were the second and third largest key factors, with factor explanatory degrees of 0.292 2 and 0.251 4, respectively. The key driving factors for the spatio-temporal differentiation of smart farms in the upper, middle, and lower sub-regions exhibited both commonalities and differences. Specifically, the top two key driving factors identified in the upper region were technological innovation (0.841 9) and special fiscal support (0.782 3). In the middle region, they were technological innovation (0.619 0) and human capital (0.600 1), while in the lower region, they were urbanization (0.727 6) and technological innovation (0.425 4). The identification of key driving factors and the detection of their interactive effects further confirmed that the spatio-temporal distribution characteristics of smart farms in the Yangtze River Economic Belt were the result of the comprehensive action of multiple factors. [Conclusions] The development of smart farms in the Yangtze River Economic Belt is showing positive momentum, with both the total number of smart farms and the numbers in each sub-region experiencing stable growth. The development speed and level of smart farms in the sub-regions exhibit a differentiated characteristic of "lower reaches > middle reaches > upper reaches". At the same time, the overall distribution of smart farms in the Yangtze River Economic Belt is relatively balanced, with the degree of sub-regional distribution balance being "middle reaches (Hubei province, Hunan province, and Jiangxi province are balanced) > lower reaches (dominated by Anhui) > upper reaches (Sichuan stands out)". The coverage of smart farm site selection continues to expand, forming a "northeast-southwest" horizontal diffusion pattern. In addition, the spatio-temporal characteristics of smart farms in the Yangtze River Economic Belt are the result of the comprehensive action of multiple factors, with the explanatory power of factors ranked from high to low as follows: technological innovation > urbanization > agricultural mechanization > human capital > internet infrastructure > industry dependence > special fiscal support. Moreover, the influence of each factor is further strengthened after interaction. Based on these conclusions, suggestions are proposed to promote the high-quality development of smart farms in the Yangtze River Economic Belt. This study not only provides a theoretical basis and reference for the construction of smart farms in the Yangtze River Economic Belt and other regions, but also helps to grasp the current status and future trends of smart farm development.
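As a minimal illustration of the spatial auto-correlation analysis referred to above, the global Moran's I statistic can be computed as follows; the weight matrix and counts are hypothetical, and the paper's actual weighting scheme is not reproduced here.

```python
# Minimal sketch of global Moran's I for spatial auto-correlation of smart-farm counts.
import numpy as np

def morans_i(x: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I for values x and spatial weight matrix w."""
    n = x.size
    z = x - x.mean()
    s0 = w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical counts for 4 neighbouring regions and a binary contiguity matrix.
counts = np.array([12.0, 15.0, 3.0, 2.0])
weights = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
print(f"Moran's I = {morans_i(counts, weights):.3f}")
```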

  • Topic--Development and Application of the Big Data Platform for Grain Production
    ZHAO Peiqin, LIU Changbin, ZHENG Jie, MENG Yang, MEI Xin, TAO Ting, ZHAO Qian, MEI Guangyuan, YANG Xiaodong
    Smart Agriculture. 2025, 7(2): 106-116. https://doi.org/10.12133/j.smartag.SA202408009

    [Objective] Winter wheat yield is crucial for national food security and the standard of living of the population. Existing crop yield prediction models often show low accuracy under disaster-prone climatic conditions. This study proposed an improved hierarchical linear model (IHLM) based on a drought weather index reduction rate, aiming to enhance the accuracy of crop yield estimation under drought conditions. [Methods] The HLM was constructed using the maximum enhanced vegetation index-2 (EVI2max), meteorological data (precipitation, radiation, and temperature from March to May), and observed winter wheat yield data from 160 agricultural survey stations in Shandong province (2018-2021). To validate the model's accuracy, 70% of the data from Shandong province was randomly selected for model construction, and the remaining data was used to validate the accuracy of the yield model. The HLM treated the variation in meteorological factors as a key obstacle affecting crop growth and was improved by calculating relative meteorological factors, which helped reduce the impact of inter-annual differences in meteorological data. The accuracy of the HLM model was compared with that of the random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGBoost) models. The HLM model provided more intuitive interpretation and was especially suitable for processing hierarchical data, which helped capture the variability of winter wheat yield data under drought conditions. Therefore, a drought weather index reduction rate model from the agricultural insurance industry was introduced to further optimize the HLM model, resulting in the construction of the IHLM model. The IHLM model was designed to improve crop yield prediction accuracy under drought conditions. Since the precipitation differences between Henan and Shandong provinces were small, to test the transferability of the IHLM model, Henan province sample data was processed in the same way as in Shandong, and the IHLM model was applied to Henan province to evaluate its performance under different geographical conditions. [Results and Discussions] The accuracy of the HLM model, improved based on relative meteorological factors (rMF), was higher than that of RF, SVR, and XGBoost. The validation accuracy showed a Pearson correlation coefficient (r) of 0.76, a root mean squared error (RMSE) of 0.60 t/hm², and a normalized RMSE (nRMSE) of 11.21%. On the drought conditions dataset, the model was further improved by incorporating the relationship between the winter wheat drought weather index and the reduction rate of winter wheat yield. After the improvement, the RMSE decreased by 0.48 t/hm², and the nRMSE decreased by 28.64 percentage points, significantly enhancing the accuracy of the IHLM model under drought conditions. The IHLM model also demonstrated good applicability when transferred to Henan province. [Conclusions] The IHLM model developed in this study improved the accuracy and stability of crop yield predictions, especially under drought conditions. Compared to the RF, SVR, and XGBoost models, the IHLM model was more suitable for predicting winter wheat yield. This research can be widely applied in the agricultural insurance field, playing a significant role in the design of agricultural insurance products, rate setting, and risk management. It enables more accurate predictions of winter wheat yield under drought conditions, with results that are closer to actual outcomes.
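The validation metrics reported above (Pearson r, RMSE in t/hm², and nRMSE as a percentage of the observed mean) can be computed as in this minimal sketch; the yield arrays are placeholders, not the study's data.

```python
# Minimal sketch of the yield-model validation metrics.
import numpy as np

def evaluate(y_obs: np.ndarray, y_pred: np.ndarray) -> dict:
    r = np.corrcoef(y_obs, y_pred)[0, 1]                 # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))       # t/hm²
    nrmse = 100.0 * rmse / y_obs.mean()                  # % of mean observed yield
    return {"r": r, "RMSE": rmse, "nRMSE(%)": nrmse}

y_obs = np.array([5.8, 6.1, 4.9, 6.4, 5.2])   # hypothetical observed yields, t/hm²
y_pred = np.array([5.5, 6.3, 5.1, 6.0, 5.4])  # hypothetical predicted yields, t/hm²
print(evaluate(y_obs, y_pred))
```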

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    CHEN Junlin, ZHAO Peng, CAO Xianlin, NING Jifeng, YANG Shuqin
    Smart Agriculture. 2024, 6(6): 132-143. https://doi.org/10.12133/j.smartag.SA202408001

    [Objective] Plug tray seedling cultivation is a contemporary method known for its high germination rates, uniform seedling growth, shortened transplant recovery period, diminished pest and disease incidence, and enhanced labor efficiency. Despite these advantages, challenges such as missing or underdeveloped seedlings can arise due to seedling quality and environmental factors. To ensure uniformity and consistency of the seedlings, sorting is frequently necessary, and the adoption of automated seedling sorting technology can significantly reduce labor costs. Nevertheless, the overgrowth of seedlings within the plugs can affect the accuracy of detection algorithms. A method for grading and locating strawberry seedlings based on a lightweight YOLOv8s model was presented in this research to effectively mitigate the interference caused by overgrown seedlings. [Methods] The YOLOv8s model was selected as the baseline for detecting different categories of seedlings in the strawberry plug tray cultivation process, namely weak seedlings, normal seedlings, and plug holes. To improve the detection efficiency and reduce the model's computational cost, the layer-adaptive magnitude-based pruning (LAMP) score-based channel pruning algorithm was applied to compress the base YOLOv8s model. The pruning procedure involved using the dependency graph to derive the group matrices, followed by normalizing the group importance scores using the LAMP score, and ultimately pruning the channels according to these processed scores. This pruning strategy effectively reduced the number of model parameters and the overall size of the model, thereby significantly enhancing its inference speed while maintaining the capability to accurately detect both seedlings and plug holes. Furthermore, a two-stage seedling-hole matching algorithm was introduced based on the pruned YOLOv8s model. In the first stage, seedling and plug hole bounding boxes were matched according to their degree of overlap (Dp), resulting in an initial set of high-quality matches. This step helped minimize the number of potential matching holes for seedlings exhibiting overgrowth. Subsequently, before the second stage of matching, the remaining unmatched seedlings were ranked according to their potential matching hole scores (S), with higher scores indicating fewer potential matching holes. The seedlings were then prioritized during the second round of matching based on these scores, thus ensuring an accurate pairing of each seedling with its corresponding plug hole, even in cases where adjacent seedling leaves encroached into neighboring plug holes. [Results and Discussions] The pruning process inevitably resulted in the loss of some parameters that were originally beneficial for feature representation and model generalization, leading to a noticeable decline in model performance. However, through meticulous fine-tuning, the model's feature expression capabilities were restored, compensating for the information loss caused by pruning. Experimental results demonstrated that the fine-tuned model not only maintained high detection accuracy but also achieved significant reductions in FLOPs (86.3%) and parameter count (95.4%). The final model size was only 1.2 MB. Compared to the original YOLOv8s model, the pruned version showed improvements in several key performance metrics: precision increased by 0.4%, recall by 1.2%, mAP by 1%, and the F1-Score by 0.1%. The impact of the pruning rate on model performance was found to be non-linear.
As the pruning rate increased, model performance dropped significantly after certain crucial channels were removed. However, further pruning led to a reallocation of the remaining channels' weights, which in some cases allowed the model to recover or even exceed its previous performance levels. Consequently, it was necessary to experiment extensively to identify the optimal pruning rate that balanced model accuracy and speed. The experiments indicated that when the pruning rate reached 85.7%, the mAP peaked at 96.4%. Beyond this point, performance began to decline, suggesting that this was the optimal pruning rate for achieving a balance between model efficiency and performance, resulting in a model size of 1.2 MB. To further validate the improved model's effectiveness, comparisons were conducted with different lightweight backbone networks, including MobileNetv3, ShuffleNetv2, EfficientViT, and FasterNet, while retaining the Neck and Head modules of the original YOLOv8s model. Results indicated that the modified model outperformed these alternatives, with mAP improvements of 1.3%, 1.8%, 1.5%, and 1.1%, respectively, and F1-Score increases of 1.5%, 1.8%, 1.1%, and 1%. Moreover, the pruned model showed substantial advantages in terms of floating-point operations, model size, and parameter count compared to these other lightweight networks. To verify the effectiveness of the proposed two-stage seedling-hole matching algorithm, tests were conducted using a variety of complex images from the test set. Results indicated that the proposed method achieved precise grading and localization of strawberry seedlings even under challenging overgrowth conditions. Specifically, the correct matching rate for normal seedlings reached 96.6%, for missing seedlings 84.5%, and for weak seedlings 82.9%, with an average matching accuracy of 88%, meeting the practical requirements of the strawberry plug tray cultivation process. [Conclusions] The pruned YOLOv8s model successfully maintained high detection accuracy while reducing computational costs and improving inference speed. The proposed two-stage seedling-hole matching algorithm effectively minimized the interference caused by overgrown seedlings, accurately locating and classifying seedlings of various growth stages within the plug tray. The research provides a robust and reliable technical solution for automated strawberry seedling sorting in practical plug tray cultivation scenarios, offering valuable insights and technical support for optimizing the efficiency and precision of automated seedling grading systems.
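A minimal sketch of a two-stage seedling-to-hole matching by bounding-box overlap is shown below; the overlap measure (intersection area over seedling-box area) and the second-stage scoring rule are simplifying assumptions, not the paper's exact definitions of Dp and S.

```python
# Minimal sketch of two-stage matching of seedling boxes to plug-hole boxes.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2

def overlap_ratio(seedling: Box, hole: Box) -> float:
    """Intersection area divided by the seedling-box area (assumed overlap measure)."""
    ix1, iy1 = max(seedling[0], hole[0]), max(seedling[1], hole[1])
    ix2, iy2 = min(seedling[2], hole[2]), min(seedling[3], hole[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (seedling[2] - seedling[0]) * (seedling[3] - seedling[1])
    return inter / area if area > 0 else 0.0

def match(seedlings: List[Box], holes: List[Box], thr: float = 0.6) -> dict:
    free_holes = set(range(len(holes)))
    matches = {}
    # Stage 1: assign seedlings that clearly sit inside exactly one hole.
    for i, s in enumerate(seedlings):
        cands = [j for j in free_holes if overlap_ratio(s, holes[j]) >= thr]
        if len(cands) == 1:
            matches[i] = cands[0]
            free_holes.remove(cands[0])
    # Stage 2: handle the rest, starting with seedlings that have the fewest candidate holes.
    remaining = [i for i in range(len(seedlings)) if i not in matches]
    remaining.sort(key=lambda i: sum(overlap_ratio(seedlings[i], holes[j]) > 0
                                     for j in free_holes))
    for i in remaining:
        best = max(free_holes, key=lambda j: overlap_ratio(seedlings[i], holes[j]),
                   default=None)
        if best is not None and overlap_ratio(seedlings[i], holes[best]) > 0:
            matches[i] = best
            free_holes.remove(best)
    return matches
```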

  • Overview Article
    ZHANG Zhiyong, CAO Shanshan, KONG Fantao, LIU Jifang, SUN Wei
    Smart Agriculture. 2025, 7(3): 48-68. https://doi.org/10.12133/j.smartag.SA202305005

    [Significance] Estrus monitoring and identification in cows is a crucial aspect of breeding management in beef and dairy cattle farming. Innovations in precise sensing and intelligent identification methods and technologies for estrus in cows are essential not only for scientific breeding, precise management, and smart breeding at the population level, but also play a key supportive role in health management, productivity enhancement, and animal welfare improvement at the individual level. This review aims to provide a reference for scientific management and the study of modern production technologies in the beef and dairy cattle industry, as well as theoretical methodologies for the research and development of key technologies in precision livestock farming. [Progress] Based on describing the typical characteristics of normal and abnormal estrus in cows, this paper systematically categorizes and summarizes the recent research progress, development trends, and methodological approaches in estrus monitoring and identification technologies, focusing on the monitoring and diagnosis of key physiological signs and behavioral characteristics during the estrus period. Firstly, the paper outlines the digital monitoring technologies for three critical physiological parameters (body temperature, rumination, and activity levels) and their applications in cow estrus monitoring and identification. It analyzes the intrinsic reasons for performance bottlenecks in estrus monitoring models based on body temperature, compares the reliability issues faced by activity-based estrus monitoring, and addresses the difficulties in balancing model generalization and robustness design. Secondly, the paper examines estrus sensing and identification technologies based on three typical behaviors: feeding, vocalization, and sexual desire. It highlights the latest applications of new artificial intelligence technologies, such as computer vision and deep learning, in estrus monitoring and points out the critical role of these technologies in improving the accuracy and timeliness of monitoring. Finally, the paper focuses on multi-factor fusion technologies for estrus perception and identification, summarizing how different researchers combine various physiological and behavioral parameters using diverse monitoring devices and algorithms to enhance accuracy in estrus monitoring. It emphasizes that multi-factor fusion methods can improve detection rates and the precision of identification results, being more reliable and applicable than single-factor methods. The importance and potential of multi-modal information fusion in enhancing monitoring accuracy and adaptability are underlined. The current shortcomings of multi-factor information fusion methods are analyzed, such as the potential impact on animal welfare from parameter acquisition methods, the singularity of model algorithms used for representing multi-factor information fusion, and inadequacies in research on multi-factor feature extraction models and estrus identification decision algorithms. [Conclusions and Prospects] From the perspectives of system practicality, stability, environmental adaptability, cost-effectiveness, and ease of operation, several key issues are discussed that need to be addressed in further research on precise sensing and intelligent identification technologies for cow estrus within the context of high-quality development in digital livestock farming.
These include improving monitoring accuracy under weak estrus conditions, overcoming technical challenges of audio extraction and voiceprint construction amidst complex background noise, enhancing the adaptability of computer vision monitoring technologies, and establishing comprehensive monitoring and identification models through multi-modal information fusion. It specifically discusses the numerous challenges posed by these issues to current technological research and explains that future research needs to focus not only on improving the timeliness and accuracy of monitoring technologies but also on balancing system cost-effectiveness and ease of use to achieve a transition from the concept of smart farming to its practical implementation.

  • Topic--Development and Application of the Big Data Platform for Grain Production
    YANG Guijun, ZHAO Chunjiang, YANG Xiaodong, YANG Hao, HU Haitang, LONG Huiling, QIU Zhengjun, LI Xian, JIANG Chongya, SUN Liang, CHEN Lei, ZHOU Qingbo, HAO Xingyao, GUO Wei, WANG Pei, GAO Meiling
    Smart Agriculture. 2025, 7(2): 1-12. https://doi.org/10.12133/j.smartag.SA202409014

    [Significance] The explosive development of agricultural big data has accelerated agricultural production into a new era of digitalization and intelligence. Agricultural big data is the core element in promoting agricultural modernization and the foundation of intelligent agriculture. As a new productive force, big data enhances comprehensive intelligent management decision-making throughout the whole process of grain production. However, it faces problems such as an unclear management mechanism for grain production big data resources and the lack of a full-chain decision-making algorithm system and big data platform covering the whole process and all elements of grain production. [Progress] A grain production big data platform is a comprehensive service platform that uses modern information technologies such as big data, the Internet of Things (IoT), remote sensing, and cloud computing to provide intelligent decision-making support for the whole process of grain production, based on intelligent algorithms for data collection, processing, analysis, and monitoring related to grain production. In this paper, the progress and challenges in grain production big data and in monitoring and decision-making algorithms are reviewed, as well as big data platforms in China and worldwide. With the development of IoT and high-resolution multi-modal remote sensing technology, the massive agricultural big data generated by the "Space-Air-Ground" Integrated Agricultural Monitoring System has laid an important foundation for smart agriculture and promoted the shift of smart agriculture from model-driven to data-driven. However, there are still some issues in field management decision-making: the requirements for high spatio-temporal resolution and timeliness of information are difficult to meet, and algorithm migration and localization methods based on big data need to be studied. In addition, agricultural machinery operation and spatio-temporal scheduling algorithms that use remote sensing and IoT monitoring information to determine the appropriate operation time window and operation prescription need to be further developed, especially cross-regional scheduling algorithms for agricultural machinery during the summer harvest in China. Aiming to address the lack of bi-directional connectivity between monitoring and decision-making algorithms in grain production, as well as the insufficient integration of agricultural machinery and information perception, a framework for a grain production big data intelligent platform based on digital twins is proposed. The platform leverages multi-source heterogeneous grain production big data and integrates a full-chain suite of standardized algorithms, including data acquisition, information extraction, knowledge map construction, intelligent decision-making, and full-chain collaboration of agricultural machinery operations. It covers typical application scenarios such as irrigation, fertilization, pest and disease management, and emergency response to drought and flood disasters, all enabled by digital twin technology.
[Conclusions and Prospects] The suggestions and trends for the development of grain production big data platforms are summarized in three aspects: (1) Creating an open, symbiotic grain production big data platform, with core characteristics such as open interfaces for crop and environmental sensors, maturity grading and a cloud-native packaging mechanism for core algorithms, and highly efficient response to data and decision services; (2) Focusing on the typical application scenarios of grain production, taking the exploration of technology integration and bi-directional connectivity as the foundation, and intelligent service as the core of the development path for big data platform research; (3) Building a data-algorithm-service self-organizing regulation mechanism, integrating decision-making information with intelligent equipment operation, and providing standardized, compatible, and open service capabilities, which together can form new quality productive forces to ensure food security and green, efficient grain production.

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    LU Bibo, LIANG Di, YANG Jie, SONG Aiqing, HUANGFU Shangwei
    Smart Agriculture. 2024, 6(6): 109-120. https://doi.org/10.12133/j.smartag.SA202407007

    [Objective] Crop leaf area is an important indicator reflecting light absorption efficiency and growth conditions. This paper established a diverse Chinese yam image dataset and proposed a deep learning-based method for Chinese yam leaf image segmentation. This method can be used for real-time measurement of Chinese yam leaf area, addressing the inefficiency of traditional measurement techniques, and will provide more reliable data support for genetic breeding and growth and development research of Chinese yam, promoting the development and progress of the Chinese yam industry. [Methods] A lightweight segmentation network based on improved ENet was proposed. Firstly, based on ENet, the third stage was pruned to reduce redundant calculations in the model. This improved the computational efficiency and running speed, and provided a good basis for real-time applications. Secondly, PConv was used instead of the conventional convolution in the downsampling bottleneck structure and the conventional bottleneck structure; the improved bottleneck structure was named P-Bottleneck. PConv applied conventional convolution to only a portion of the input channels and left the rest of the channels unchanged, which reduced memory accesses and redundant computations for more efficient spatial feature extraction. PConv was used to reduce the amount of model computation while increasing the number of floating-point operations per second on the hardware device, resulting in lower latency. Additionally, the transposed convolution in the upsampling module was replaced with bilinear interpolation to enhance model accuracy and reduce the number of parameters. Bilinear interpolation could process images more smoothly, making the processed images more realistic and clear. Finally, a coordinate attention (CA) module was added to the encoder to introduce the attention mechanism, and the model was named CBPA-ENet. The CA mechanism not only focused on the channel information, but also keenly captured orientation and position-sensitive information. The position information was embedded into the channel attention to globally encode the spatial information, capturing the channel information along one spatial direction while retaining the position information along the other spatial direction. The network could effectively enhance the attention to important regions in the image, and thus improve the quality and interpretability of segmentation results. [Results and Discussions] Trimming the third part resulted in a 28% decrease in FLOPs, a 41% decrease in parameters, and a 9 f/s increase in FPS. Improving the upsampling method to bilinear interpolation not only reduced the floating-point operations and parameters, but also slightly improved the segmentation accuracy of the model, increasing FPS by 4 f/s. Using P-Bottleneck instead of the downsampling bottleneck structure and the conventional bottleneck structure reduced mIoU by only 0.04%, while reducing FLOPs by 22%, reducing parameters by 16%, and increasing FPS by 8 f/s. Adding the CA mechanism to the encoder increased FLOPs and parameters only slightly while improving the accuracy of the segmentation network. To verify the effectiveness of the improved segmentation algorithm, the classic semantic segmentation networks UNet, DeepLabV3+, and PSPNet and the real-time semantic segmentation networks LinkNet and DABNet were selected for training and validation. All six algorithms achieved high segmentation accuracy, among which UNet had the best mIoU and mPA, but its model size was too large.
The improved algorithm accounted for only 1% of the FLOPs and 0.41% of the parameters of UNet, while its mIoU and mPA were basically the same. Other classic semantic segmentation algorithms, such as DeepLabV3+, had accuracy similar to the improved algorithm, but their large model size and slow inference speed were not conducive to embedded development. Although the real-time semantic segmentation algorithm LinkNet had a slightly higher mIoU, its FLOPs and parameter count were still far greater than those of the improved algorithm. Although the PSPNet model was relatively small, it was still much larger than the improved model, and its mIoU and mPA were lower. The experimental results showed that the improved model achieved an mIoU of 98.61%. Compared with the original model, the number of parameters and FLOPs decreased significantly: the number of model parameters decreased by 51%, the FLOPs decreased by 49%, and the network operation speed increased by 38%. [Conclusions] The improved algorithm can accurately and quickly segment Chinese yam leaves, providing not only a more accurate means of determining Chinese yam phenotype data, but also a new method and approach for embedded research on Chinese yam. Using the model, the morphological feature data of Chinese yam leaves can be obtained more efficiently, providing a reliable foundation for further research and analysis.
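Two of the drop-in changes described above, partial convolution (PConv) and bilinear interpolation in place of transposed-convolution upsampling, can be sketched in PyTorch as follows; the channel fraction and tensor sizes are illustrative and not taken from the paper.

```python
# Minimal PyTorch sketch: PConv convolves only a fraction of the channels,
# and parameter-free bilinear upsampling replaces transposed convolution.
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Apply a 3x3 conv to the first 1/n_div of channels; pass the rest through unchanged."""
    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = channels // n_div
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, x.size(1) - self.dim_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

x = torch.randn(1, 64, 32, 32)
print(PConv(64)(x).shape)   # torch.Size([1, 64, 32, 32])
print(upsample(x).shape)    # torch.Size([1, 64, 64, 64])
```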

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    ZHANG Hui, HU Jun, SHI Hang, LIU Changxi, WU Miao
    Smart Agriculture. 2024, 6(6): 85-95. https://doi.org/10.12133/j.smartag.SA202406013

    [Objective] Spraying calcium can effectively prevent the occurrence of dry burning heart disease in Chinese cabbage, and accurately targeted calcium spraying can further improve the utilization rate of calcium. Since the sprayer needs to move rapidly in the field, over-application or under-application of the pesticide can occur. This study aimed to develop a targeted spray control system based on deep learning technology and to explore the relationship between the advance speed, spray volume, and coverage of the sprayer, thereby addressing the uneven application caused by varying sprayer speeds in the real scenario of calcium application to Chinese cabbage hearts. [Methods] The targeted spraying control system incorporated advanced sensors and computing equipment that were capable of obtaining real-time data regarding the location of crops and the surrounding environmental conditions. These data allowed dynamic adjustments to be made to the spraying system, ensuring that pesticides were delivered with high precision. To further enhance the system's real-time performance and accuracy, the YOLOv8 object detection model was improved. A Ghost-Backbone lightweight network structure was introduced, integrating remote sensing technologies along with the sprayer's forward speed and the frequency of spray responses. This combination resulted in a YOLOv8-Ghost-Backbone lightweight model specifically tailored for agricultural applications. The model operated on the Jetson Xavier NX controller, a high-performance, low-power computing platform designed for edge computing, which allowed the system to process complex tasks in real time directly in the field. The targeted spraying system was composed of two essential components: a pressure regulation unit and a targeted control unit. The pressure regulation unit was responsible for adjusting the pressure within the spraying system to ensure that the output remained stable under various operational conditions. Meanwhile, the targeted control unit played a crucial role in precisely controlling the direction, volume, and coverage of the spray to ensure that the pesticide was applied effectively to the intended areas of the plants. To rigorously evaluate the performance of the system, a series of intermittent spray tests were conducted. During these tests, the forward speed of the sprayer was gradually increased, allowing an assessment of how well the system responded to changes in speed. Throughout the testing phase, the response frequency of the electromagnetic valve was measured to calculate the corresponding spray volume for each nozzle. [Results and Discussions] The experimental results indicated that the overall performance of the targeted spraying system was outstanding, particularly under conditions of high-speed operation. By meticulously recording the response times of the three primary components of the system, valuable data were gathered. The average time required for image processing was determined to be 29.50 ms, while the transmission of decision signals took an average of 6.40 ms. The actual spraying process itself required 88.83 ms to complete. A thorough analysis of these times revealed that the total response of the spraying system lagged approximately 124.73 ms behind the electrical signal inputs. Despite the inherent delays, the system was able to maintain a high level of spraying accuracy by compensating for the response lag of the electromagnetic valve.
Specifically, when tested at a speed of 7.2 km/h, the difference between the actual spray volume delivered and the required spray volume, after accounting for compensation, was found to be a mere 0.01 L/min. This minimal difference indicates that the system met the standard operational requirements for effective pesticide application, thereby demonstrating its precision and reliability in practical settings. [Conclusions] This study developed and validated a deep learning-based targeted spraying control system that exhibited excellent performance in both spraying accuracy and response speed. The system serves as a significant technical reference for future endeavors in agricultural automation. Moreover, the research provides insights into how to maintain consistent spraying effectiveness and optimize pesticide utilization efficiency by dynamically adjusting the spraying system as the operating speed varies. The findings of this research will offer valuable experience and guidance for the implementation of agricultural robots in the precise application of pesticides, with a particular emphasis on parameter selection and system optimization.
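The lag-compensation arithmetic implied by the timing figures above can be restated in a few lines: at a given forward speed, the valve must be triggered early enough to cover the total response delay. The function name below is hypothetical; the numbers simply restate the reported measurements.

```python
# Minimal sketch of the lead distance needed to compensate for system response delay.
def trigger_lead_distance(speed_kmh: float, delay_ms: float) -> float:
    """Distance (m) the sprayer travels during the system's response delay."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms * delay_ms / 1000.0

total_delay_ms = 29.50 + 6.40 + 88.83   # image processing + signal transmission + spraying
print(f"Total delay: {total_delay_ms:.2f} ms")
print(f"Lead distance at 7.2 km/h: {trigger_lead_distance(7.2, total_delay_ms):.3f} m")
```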

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms (Part 2)
    ZHU Shunyao, QU Hongjun, XIA Qian, GUO Wei, GUO Ya
    Smart Agriculture. 2025, 7(1): 85-96. https://doi.org/10.12133/j.smartag.SA202410004

    [Objective] Plant leaf shape is an important part of the plant architectural model. Establishing a three-dimensional structural model of leaves may assist in simulating and analyzing plant growth. However, existing leaf modeling approaches lack interpretability, invertibility, and operability, which limits the estimation of model parameters, the simulation of leaf shape, the analysis and interpretation of leaf physiology and growth state, and model reuse. Aiming at the interoperability between three-dimensional structure representation and mathematical model parameters, this study paid attention to three aspects of wheat leaf shape parametric reconstruction: (1) parameter-driven model structure, (2) model parameter inversion, and (3) parameter dynamic mapping during growth. Based on this, a parameter-driven and point cloud inversion model for wheat leaf interoperability was proposed. [Methods] A parametric surface model of a wheat leaf with seven characteristic parameters was built using parametric modeling technology, and the forward parametric construction of the wheat leaf structure was realized. Three parameters, maximum leaf width, leaf length, and leaf shape factor, were used to describe the basic shape of the blade on the leaf plane. On this basis, two parameters, namely the angle between stem and leaf and the curvature degree, were introduced to describe the bending characteristics of the main vein of the blade in three-dimensional space. Two further parameters, namely the twist angle around the axis and the twist deviation angle around the axis, were introduced to represent the twisted structure of the leaf blade along the vein. The reverse parameter estimation module was built according to the surface model. The point cloud was divided by a uniform segmentation method along the Y-axis, and the veins were fitted by a least squares regression method. Then, the point cloud was re-segmented according to the fitted vein curve. Subsequently, the rotation angle was precisely determined through the segment-wise transform estimation method, with all parameters being optimally fitted using the RANSAC regression algorithm. To validate the reliability of the proposed methodology, a set of sample parameters was randomly generated, from which corresponding sample point clouds were synthesized. These sample point clouds were then subjected to estimation using the described method, and error analysis was carried out on the estimation results. Three-dimensional imaging technology was used to collect point clouds of Zhengmai 136, Yangmai 34, and Yanmai 1 samples. After noise reduction and coordinate registration, the model parameters were inverted and estimated, and the reconstructed point clouds were produced using the parametric model. The reconstruction error was validated by calculating the dissimilarity, represented by the Chamfer Distance, between the reconstructed point cloud and the measured point cloud. [Results and Discussions] The model could effectively reconstruct wheat leaves, and the average deviation of the point cloud-based parametric reconstruction results was about 1.2 mm, indicating high precision. Parametric modeling technology based on prior knowledge and point cloud fitting technology based on posterior data were integrated in this study to construct a digital twin model of a specific species at the 3D structural level.
Although some of the detailed characteristics of the leaves were moderately simplified, the geometric shape of the leaves could be highly restored with only a few parameters. This method was not only simple, direct, and efficient, but the obtained parameters also had explicit geometric meaning and were both editable and interpretable. In addition, the traditional practice of using only tools such as rulers to measure individual characteristic parameters of plant organs was abandoned in this study. High-precision point cloud acquisition technology was adopted to obtain three-dimensional data of wheat leaves, and pre-processing work such as point cloud registration, segmentation, and annotation was completed, laying a data foundation for subsequent research. [Conclusions] There is interoperability between the reconstructed model and the point cloud, and the parameters of the model can be flexibly adjusted to generate leaf clusters with similar shapes. The inversion parameters have high interpretability and can be used for consistent and continuous estimation of point cloud time series. This research is of great value to the simulation analysis and digital twinning of wheat leaves.
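The Chamfer Distance used above to score reconstruction error can be computed with SciPy k-d trees as in this minimal sketch; whether the two directions are averaged or summed in the paper is an assumption, and the point clouds below are placeholders.

```python
# Minimal sketch of a symmetric Chamfer Distance between two point clouds.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric mean nearest-neighbour distance between point sets p and q (N x 3)."""
    d_pq, _ = cKDTree(q).query(p)   # for each point in p, distance to nearest point in q
    d_qp, _ = cKDTree(p).query(q)
    return 0.5 * (d_pq.mean() + d_qp.mean())

reconstructed = np.random.rand(500, 3)                                   # placeholder cloud
measured = reconstructed + np.random.normal(scale=0.001, size=(500, 3))  # placeholder cloud
print(f"Chamfer distance: {chamfer_distance(reconstructed, measured):.4f}")
```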

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms(Part 1)
    FU Zhuojun, HU Zheng, DENG Yangjun, LONG Chenfeng, ZHU Xinghui
    Smart Agriculture. 2024, 6(6): 144-154. https://doi.org/10.12133/j.smartag.SA202409001

    [Objective] Apple Alternaria leaf spot can easily lead to premature defoliation of apple tree leaves, thereby affecting the quality and yield of apples. Consequently, accurate detection of the disease has become a critical issue in the precise prevention and control of apple tree diseases. Due to factors such as backlighting, traditional image segmentation-based methods for detecting disease spots struggle to accurately identify the boundaries of diseased areas against complex backgrounds. There is an urgent need to develop new methods for detecting apple Alternaria leaf spot, which can assist in the precise prevention and control of apple tree diseases. [Methods] A novel detection method named Deep Semi-Non-negative Matrix Factorization-based Mahalanobis Distance Anomaly Detection (DSNMFMAD) was proposed, which combines Deep Semi-Non-negative Matrix Factorization (DSNMF) with Mahalanobis distance for robust anomaly detection against complex image backgrounds. The proposed method began by utilizing DSNMF to extract low-rank background components and sparse anomaly features from the apple Alternaria leaf spot images. This enabled effective separation of the background and anomalies, mitigating interference from complex background noise while preserving the non-negativity constraints inherent in the data. Subsequently, Mahalanobis distance was employed, based on the Singular Value Decomposition (SVD) feature subspace, to construct a lesion detector. The detector identified lesions by calculating the anomaly degree of each pixel in the anomalous regions. The apple tree leaf disease dataset used was provided by PaddlePaddle AI-Studio. Each image in the dataset had a resolution of 512×512 pixels and was stored in RGB color as a JPEG file. The dataset was captured in both laboratory and natural environments. Under laboratory conditions, 190 images of apple leaves with spot-induced leaf drop were used, while 237 images were collected under natural conditions. Furthermore, the dataset was augmented with geometric transformations and random changes in brightness, contrast, and hue, resulting in 1 145 images under laboratory conditions and 1 419 images under natural conditions. These images reflect various real-world scenarios, capturing apple leaves at different stages of maturity and in diverse lighting conditions, angles, and noise environments. This diverse dataset ensured that the proposed method could be tested under a wide range of practical conditions, providing a comprehensive evaluation of its effectiveness in detecting apple Alternaria leaf spot. [Results and Discussions] DSNMFMAD demonstrated outstanding performance under both laboratory and natural conditions. A comparative analysis was conducted with several other detection methods, including GRX (Reed-Xiaoli detector), LRX (Local Reed-Xiaoli detector), CRD (Collaborative-Representation-Based Detector), LSMAD (LRaSMD-Based Mahalanobis Distance Detector), and the deep learning model Unet. The results demonstrated that DSNMFMAD exhibited superior performance in the laboratory environment, attaining a recognition accuracy of 99.8% and a detection speed of 0.087 2 s/image. The accuracy of DSNMFMAD exceeded that of GRX, LRX, CRD, LSMAD, and Unet by 0.2%, 37.9%, 10.3%, 0.4%, and 24.5%, respectively. Additionally, DSNMFMAD exhibited a substantially superior detection speed in comparison to LRX, CRD, LSMAD, and Unet, with improvements of 8.864, 107.185, 0.309, and 1.565 s, respectively.
In the natural environment, where a dataset of 1 419 images of apple Alternaria leaf spot was analyzed, DSNMFMAD demonstrated an 87.8% recognition accuracy, with an average detection speed of 0.091 0 s per image. In this case, its accuracy exceeded that of GRX, LRX, CRD, LSMAD, and Unet by 2.5%, 32.7%, 5%, 14.8%, and 3.5%, respectively. Furthermore, the detection speed was faster than that of LRX, CRD, LSMAD, and Unet by 2.898, 132.017, 0.224, and 1.825 s, respectively. [Conclusions] The DSNMFMAD proposed in this study was capable of effectively extracting anomalous parts of an image through DSNMF and accurately detecting the location of apple Alternaria leaf spot using a constructed lesion detector. This method achieved higher detection accuracy compared to the benchmark methods, even under complex background conditions, demonstrating excellent performance in lesion detection. This advancement could provide a valuable technical reference for the detection and prevention of apple Alternaria leaf spot.
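
    To illustrate the Mahalanobis-distance scoring step, the following minimal Python sketch shows how per-pixel anomaly scores can be computed in an SVD-derived feature subspace. It is not the authors' DSNMFMAD implementation (the DSNMF decomposition step is omitted), and all function and parameter names are illustrative.

import numpy as np

def mahalanobis_anomaly_scores(pixels, n_components=2):
    """Score each pixel by its Mahalanobis distance in an SVD feature subspace.

    pixels: (N, D) array, one row per pixel (e.g. RGB or other features).
    Returns an (N,) array of anomaly scores; larger means more anomalous.
    """
    # Center the data and take the leading right-singular vectors as the subspace.
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                 # (k, D) feature subspace

    # Project pixels into the subspace and estimate its covariance.
    proj = centered @ basis.T                 # (N, k)
    cov = np.cov(proj, rowvar=False) + 1e-6 * np.eye(n_components)
    cov_inv = np.linalg.inv(cov)

    # Mahalanobis distance of each projected pixel from the subspace mean.
    diff = proj - proj.mean(axis=0)
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Usage: flatten an H x W x 3 image to (H*W, 3), score, then threshold.
img = np.random.rand(64, 64, 3)               # stand-in for a leaf image
scores = mahalanobis_anomaly_scores(img.reshape(-1, 3)).reshape(64, 64)
lesion_mask = scores > np.percentile(scores, 99)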

  • Information Processing and Decision Making
    CHANGJian, WANGBingbing, YINLong, LIYanqing, LIZhaoxin, LIZhuang
    Smart Agriculture. 2025, 7(3): 185-198. https://doi.org/10.12133/j.smartag.SA202503033

    [Objective] Bee pollination is pivotal to plant reproduction and crop yield, making its identification and monitoring highly significant for agricultural production. However, practical detection of bee pollination poses various challenges, including the small size of bee targets, their low pixel occupancy in images, and the complexity of floral backgrounds. To scientifically evaluate pollination efficiency, accurately detect the pollination status of flowers, and provide reliable data to guide flower and fruit thinning in orchards, thereby supporting the scientific management of bee colonies and enhancing agricultural efficiency, a lightweight recognition model that can effectively overcome the above obstacles was proposed, advancing the practical application of bee pollination detection technology in smart agriculture. [Methods] A specialized bee pollination dataset was constructed comprising three flower types: strawberry, blueberry, and chrysanthemum. High-resolution cameras were used to record videos of the pollination process, which were then subjected to frame sampling to extract representative images. These initial images underwent manual screening to ensure quality and relevance. To address challenges such as limited data diversity and class imbalance, a comprehensive data augmentation strategy was employed. Techniques including rotation, flipping, brightness adjustment, and mosaic augmentation were applied, significantly expanding the dataset's size and variability. The enhanced dataset was subsequently split into training and validation sets at an 8:2 ratio to ensure robust model evaluation. The base detection model was built upon an improved YOLOv10n architecture. The conventional C2f module in the backbone was replaced with a novel cross stage partial network_multi-scale edge information enhance (CSP_MSEE) module, which synergizes the cross-stage partial connections from cross stage partial network (CSPNet) with a multi-scale edge enhancement strategy. This design greatly improved feature extraction, particularly in scenarios involving fine-grained structures and small-scale targets like bees. For the neck, a hybrid-scale feature pyramid network (HS-FPN) was implemented, incorporating a channel attention (CA) mechanism and a dimension matching (DM) module to refine and align multi-scale features. These features were further integrated through a selective feature fusion (SFF) module, enabling the effective combination of low-level texture details and high-level semantic representations. The detection head was replaced with the lightweight shared detail enhanced convolutional detection head (LSDECD), an enhanced version of the lightweight shared convolutional detection head (LSCD). It incorporated detail enhancement convolution (DEConv) from DEA-Net to improve the extraction of fine-grained bee features. Additionally, the standard convolution_groupnorm (Conv_GN) layers were replaced with detail enhancement convolution_groupnorm (DEConv_GN), significantly reducing model parameters and enhancing the model's sensitivity to subtle bee behaviors. This lightweight yet accurate model design made it highly suitable for real-time deployment on resource-constrained edge devices in agricultural environments. [Results and Discussions] Experimental results on the three bee pollination datasets (strawberry, blueberry, and chrysanthemum) demonstrated the effectiveness of the proposed improvements over the baseline YOLOv10n model.
The enhanced model achieved significant reductions in computational overhead, lowering the computational complexity by 3.1 GFLOPs and the number of parameters by 1.3 M. The computational cost of the improved model reached 5.1 GFLOPs, and the number of parameters was 1.3 M. These reductions contributed to improved efficiency, making the model more suitable for deployment on edge devices with limited processing capabilities, such as mobile platforms or embedded systems used in agricultural monitoring. In terms of detection performance, the improved model showed consistent gains across all three datasets. Specifically, the recall rates reached 82.6% for strawberry flowers, 84.0% for blueberry flowers, and 84.8% for chrysanthemum flowers. Corresponding mAP50 (mean average precision at an IoU threshold of 0.5) scores were 89.3%, 89.5%, and 88.0%, respectively. Compared to the original YOLOv10n model, these results marked respective improvements of 2.1% in recall and 1.7% in mAP50 on the strawberry dataset, 2.0% and 2.6% on the blueberry dataset, and 2.1% and 2.2% on the chrysanthemum dataset. [Conclusions] The proposed YOLOv10n-CHL lightweight bee pollination detection model, through coordinated enhancements at multiple architectural levels, achieved notable improvements in both detection accuracy and computational efficiency across multiple bee pollination datasets. The model significantly improved the detection performance for small objects while substantially reducing computational overhead, facilitating its deployment on edge computing platforms such as drones and embedded systems. This research could provide a solid technical foundation for the precise monitoring of bee pollination behavior and the advancement of smart agriculture. Nevertheless, the model's adaptability to extreme lighting and complex weather conditions remains an area for improvement. Future work will focus on enhancing the model's robustness in these scenarios to support its broader application in real-world agricultural environments.
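
    As a rough illustration of the shared lightweight head idea described above, the PyTorch sketch below reuses a single Conv+GroupNorm stem across all feature-pyramid levels. It is not the LSDECD module from the paper (the detail enhancement convolution is replaced with a plain convolution for brevity), and the channel sizes, class count, and pyramid resolutions are assumptions.

import torch
import torch.nn as nn

class SharedConvGNHead(nn.Module):
    """Toy detection head: one Conv+GroupNorm stem shared across pyramid levels,
    followed by small per-task 1x1 convolutions (classification and box regression)."""

    def __init__(self, in_channels=64, num_classes=1, num_groups=16):
        super().__init__()
        # Reusing the same stem weights at every level keeps the parameter
        # count low, which is the core idea behind a shared lightweight head.
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False),
            nn.GroupNorm(num_groups, in_channels),
            nn.SiLU(),
        )
        self.cls_out = nn.Conv2d(in_channels, num_classes, 1)  # class logits
        self.reg_out = nn.Conv2d(in_channels, 4, 1)            # box offsets

    def forward(self, pyramid_feats):
        outputs = []
        for feat in pyramid_feats:       # e.g. P3, P4, P5 feature maps
            x = self.stem(feat)          # identical stem weights at every level
            outputs.append((self.cls_out(x), self.reg_out(x)))
        return outputs

# Usage with three dummy pyramid levels, as for a 640x640 input.
feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
head = SharedConvGNHead()
cls_p3, reg_p3 = head(feats)[0]
print(cls_p3.shape, reg_p3.shape)   # (1, 1, 80, 80) and (1, 4, 80, 80)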

  • Topic--Intelligent Agricultural Knowledge Services and Smart Unmanned Farms (Part 2)
    QIZijun, NIUDangdang, WUHuarui, ZHANGLilin, WANGLunfeng, ZHANGHongming
    Smart Agriculture. 2025, 7(1): 44-56. https://doi.org/10.12133/j.smartag.SA202410022

    [Objective] Chinese kiwifruit texts exhibit unique dual-dimensional characteristics. Their complex semantic structure involves cross-paragraph dependencies, which makes it challenging to capture the full contextual relationships of entities within a single paragraph and necessitates models capable of robust cross-paragraph semantic extraction to comprehend entity linkages at a global level. However, most existing models rely heavily on local contextual information and struggle to process long-distance dependencies, thereby reducing recognition accuracy. Furthermore, Chinese kiwifruit texts often contain highly nested entities. This nesting and combination increase the complexity of grammatical and semantic relationships, making entity recognition more difficult. To address these challenges, a novel named entity recognition (NER) method, KIWI-Coord-Prune (kiwifruit-CoordKIWINER-PruneBi-LSTM), was proposed in this research, which incorporated dual-dimensional information processing and pruning techniques to improve recognition accuracy. [Methods] The proposed KIWI-Coord-Prune model consisted of a character embedding layer, a CoordKIWINER layer, a PruneBi-LSTM layer, a self-attention mechanism, and a CRF decoding layer, enabling effective entity recognition after processing input character vectors. The CoordKIWINER and PruneBi-LSTM modules were specifically designed to handle the dual-dimensional features in Chinese kiwifruit texts. The CoordKIWINER module applied adaptive average pooling in two directions on the input feature maps and utilized convolution operations to separate the extracted features into vertical and horizontal branches. The horizontal and vertical features were then independently extracted using the Criss-Cross Attention (CCNet) mechanism and Coordinate Attention (CoordAtt) mechanism, respectively. This module significantly enhanced the model's ability to capture cross-paragraph relationships and nested entity structures, thereby generating enriched character vectors containing more contextual information, which improved the overall representation capability and robustness of the model. The PruneBi-LSTM module was built upon the enhanced dual-dimensional vector representations and introduced a pruning strategy into Bi-LSTM to effectively reduce redundant parameters associated with background descriptions and irrelevant terms. This pruning mechanism maintained the dynamic sequence modeling capability of Bi-LSTM while enhancing computational efficiency and improving inference speed. Additionally, a dynamic feature extraction strategy was employed to reduce the computational complexity of vector sequences and further strengthen the learning capacity for key features, leading to improved recognition of complex entities in kiwifruit texts. Furthermore, the pruned weight matrices became sparser, significantly reducing memory consumption. This made the model more efficient in handling large-scale agricultural text-processing tasks, minimizing redundant information while achieving higher inference and training efficiency with fewer computational resources. [Results and Discussions] Experiments were conducted on the self-built KIWIPRO dataset and four public datasets: People's Daily, ClueNER, Boson, and ResumeNER. The proposed model was compared with five advanced NER models: LSTM, Bi-LSTM, LR-CNN, Softlexicon-LSTM, and KIWINER.
The experimental results showed that KIWI-Coord-Prune achieved F1-Scores of 89.55%, 91.02%, 83.50%, 83.49%, and 95.81% on the five datasets, respectively, outperforming all baseline models. Furthermore, controlled variable experiments were conducted to compare and ablate the CoordKIWINER and PruneBi-LSTM modules across the five datasets, confirming their effectiveness and necessity. Additionally, the impact of different design choices for the CoordKIWINER module was explored, including direct fusion, optimized attention mechanism fusion, and residual optimization through network structure adjustment. The experimental results demonstrated that the optimized attention mechanism fusion method yielded the best performance and was ultimately adopted in the final model. These findings highlight the significance of properly designing attention mechanisms to extract dual-dimensional features for NER tasks. Compared to existing methods, the KIWI-Coord-Prune model effectively addressed the issue of underutilized dual-dimensional information in Chinese kiwifruit texts. It significantly improved entity recognition performance for both overall text structures and individual entity categories. Furthermore, the model exhibited a degree of generalization capability, making it applicable to downstream tasks such as knowledge graph construction and question-answering systems. [Conclusions] This study presents a novel NER approach for Chinese kiwifruit texts that integrates dual-dimensional information extraction and pruning techniques to overcome challenges related to cross-paragraph dependencies and nested entity structures. The findings offer valuable insights for researchers working on domain-specific NER and contribute to the advancement of agriculture-focused natural language processing applications. However, two key limitations remain: 1) The balance between domain-specific optimization and cross-domain generalization requires further investigation, as the model's adaptability to non-agricultural texts has yet to be empirically validated; 2) the multilingual applicability of the model is currently limited, necessitating further expansion to accommodate multilingual scenarios. Future research should focus on two key directions: 1) Enhancing domain robustness and cross-lingual adaptability by incorporating diverse textual datasets and leveraging pre-trained multilingual models to improve generalization, and 2) Validating the model's performance in multilingual environments through transfer learning while refining linguistic adaptation strategies to further optimize recognition accuracy.
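
    The pruning idea can be illustrated, in a generic form, with PyTorch's built-in magnitude pruning applied to the recurrent weight matrices of a Bi-LSTM. This is not the paper's PruneBi-LSTM strategy; the dimensions and the 30% pruning ratio below are arbitrary placeholders.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A generic bidirectional LSTM over character embeddings, standing in for the
# sequence encoder of an NER model (dimensions are arbitrary).
bilstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1,
                 batch_first=True, bidirectional=True)

# Zero out the 30% smallest-magnitude entries of every weight matrix, then
# bake the masks in so the parameters stay sparse but keep their names.
weight_names = [name for name, _ in bilstm.named_parameters() if "weight" in name]
for name in weight_names:
    prune.l1_unstructured(bilstm, name=name, amount=0.3)
    prune.remove(bilstm, name)
bilstm.flatten_parameters()   # re-pack the weights after they were replaced

weights = [p for name, p in bilstm.named_parameters() if "weight" in name]
sparsity = sum((w == 0).float().mean().item() for w in weights) / len(weights)
print(f"approximate weight sparsity: {sparsity:.2f}")   # ~0.30

chars = torch.randn(4, 50, 128)     # (batch, sequence length, embedding dim)
encoded, _ = bilstm(chars)          # (4, 50, 512): forward + backward states
print(encoded.shape)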

  • Information Processing and Decision Making
    HULingyan, GUORuiya, GUOZhanjun, XUGuohui, GAIRongli, WANGZumin, ZHANGYumeng, JUBowen, NIEXiaoyu
    Smart Agriculture. 2025, 7(3): 131-142. https://doi.org/10.12133/j.smartag.SA202502008

    [Objective] Within the field of plant phenotyping feature extraction, the accurate delineation of small target boundaries and the adequate recovery of spatial details during upsampling operations have long been recognized as significant obstacles hindering progress. To address these limitations, an improved U-Net architecture was designed for greenhouse sweet cherry image segmentation. [Methods] Taking temporal phenotypic images of sweet cherries as the research subject, the U-Net segmentation model was employed to delineate the specific organ regions of the plant. This architecture was referred to as the U-Net integrating a self-supervised contrastive learning method for plant time-series images with priori distance embedding (PDE) pre-training and a graph convolutional network (GCN) skip connection for greenhouse sweet cherry image segmentation. To accelerate model convergence, the pre-trained weights derived from the PDE plant temporal image contrastive learning method were transferred to the segmentation model. Concurrently, a GCN local feature fusion layer was incorporated as a skip connection to optimize feature fusion, thereby providing robust technical support for the image segmentation task. Pre-training with the PDE plant temporal image contrastive learning method required the construction of image pairs corresponding to different phenological periods. A classification distance loss function, which incorporated prior knowledge, was employed to construct an Encoder with adjusted parameters. Pre-trained weights obtained from the PDE plant temporal image contrastive learning method were effectively transferred and applied to the semantic segmentation task, enabling the network to accurately learn semantic information and detailed textures of various sweet cherry organs. The Encoder module performed multi-scale feature extraction through convolutional and pooling layers. This process enabled the hierarchical processing of the semantic information embedded in the input image to construct representations that progress from low-level texture features to high-level semantic features. This allowed consistent extraction of semantic features from images across various scales and abstraction of the underlying information, enhancing feature discriminability and optimizing the modeling of complex targets. The Decoder module was employed to conduct upsampling operations, which facilitated the integration of features from diverse scales and the restoration of the original image resolution. This enabled effective reconstruction of spatial details and significantly improved the efficiency of model optimization. At the interface between the Encoder and Decoder modules, a GCN layer designed for local feature fusion was strategically integrated as a skip connection, enabling the network to better capture and learn the local features in multi-scale images. [Results and Discussions] Utilizing a set of evaluation metrics including accuracy, precision, recall, and F1-Score, an in-depth and rigorous assessment of the model's performance capabilities was conducted. The research findings revealed that the improved U-Net model achieved superior performance in semantic segmentation of sweet cherry images, with an accuracy of up to 0.955 0. Ablation experiment results further revealed that the proposed method attained a precision of 0.932 8, a recall of 0.927 4, and an F1-Score of 0.912 8.
The accuracy of the improved U-Net was higher by 0.069 9, 0.028 8, and 0.042 than that of the original U-Net, the U-Net with the PDE plant temporal image contrastive learning method, and the U-Net with GCN skip connections, respectively. Meanwhile, the F1-Score was higher by 0.078 3, 0.033 8, and 0.043 8, respectively. In comparative experiments against the DeepLabV3, Swin Transformer, and Segment Anything Model segmentation methods, the proposed model surpassed these models by 0.022 2, 0.027 6, and 0.042 2 in accuracy; 0.063 7, 0.147 1, and 0.107 7 in precision; 0.035 2, 0.065 4, and 0.050 8 in recall; and 0.076 8, 0.127 5, and 0.103 4 in F1-Score. [Conclusions] The PDE plant temporal image contrastive learning method and GCN techniques were incorporated to develop an advanced U-Net architecture specifically designed and optimized for the analysis of sweet cherry plant phenotypes. The results demonstrate that the proposed method is capable of effectively addressing the issues of boundary blurring and detail loss associated with small targets in complex orchard scenarios. It enables the precise segmentation of the primary organs and background regions in sweet cherry images, thereby enhancing the segmentation accuracy of the original model. This improvement provides a solid foundation for subsequent crop modeling research and holds significant practical importance for the advancement of agricultural intelligence.
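
    The transfer of pre-trained encoder weights into a segmentation network can be sketched in PyTorch as follows. The tiny encoder/decoder modules and the class count are placeholders, and the PDE pre-training and GCN skip connection themselves are omitted; only the weight-transfer mechanism is shown.

import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the contrastively pre-trained encoder (not the real one)."""
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

class TinySegNet(nn.Module):
    """Stand-in segmentation network: encoder plus a minimal upsampling head."""
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = TinyEncoder()
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.head(self.encoder(x))

# Pretend this encoder holds weights from contrastive pre-training, then copy
# them into the segmentation model; strict=False leaves the randomly
# initialized head untouched when the checkpoint covers only the encoder.
pretrained_encoder = TinyEncoder()
model = TinySegNet(num_classes=4)   # class count is a placeholder
missing, unexpected = model.encoder.load_state_dict(
    pretrained_encoder.state_dict(), strict=False)
print("missing:", missing, "unexpected:", unexpected)   # both empty here

masks = model(torch.randn(1, 3, 64, 64))
print(masks.shape)   # torch.Size([1, 4, 64, 64]) per-class logits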

  • Information Processing and Decision Making
    WANGYi, XUERong, HANWenting, SHAOGuomin, HOUYanqiao, CUIXitong
    Smart Agriculture. 2025, 7(4): 159-173. https://doi.org/10.12133/j.smartag.SA202412004

    [Objective] Maize is one of the most widely cultivated staple crops worldwide, and its aboveground biomass (AGB) serves as a crucial indicator for evaluating crop growth status. Accurate estimation of maize AGB is vital for ensuring food security and enhancing agricultural productivity. However, maize AGB is influenced by a multitude of dynamic factors, exhibiting complex spatial and temporal variations that pose significant challenges to precise estimation. At present, most studies on maize AGB estimation rely primarily on single-source remote sensing data and conventional machine learning algorithms, which limits the accuracy and generalizability of the models. To overcome these limitations, a model architecture that integrates convolutional neural networks (CNN), long short-term memory networks (LSTM), and a self-attention (SA) mechanism was developed in this research to estimate maize AGB at the field scale. [Methods] The research utilized vegetation indices, crop parameters, and meteorological data that were collected under varying gradient water treatments in the experimental area. First, an optimized CNN-LSTM-SA model was constructed. The model employed two-dimensional convolutional layers to extract both spatial and temporal features, while utilizing max-pooling and dropout techniques to mitigate overfitting. The LSTM module was used to capture temporal dependencies in the data. The SA mechanism was introduced to compute global attention weights, enhancing the representation of critical time steps. Nonlinear activation functions were applied to mitigate multicollinearity among features. A fully connected layer was used to output the estimated AGB values. Second, the Pearson correlation coefficients between influencing factors and maize AGB were analyzed, and the importance of multi-source data was validated. Recursive feature elimination (RFE) was used to select the optimal input features. The local interpretable model-agnostic explanations (LIME) method was employed to interpret individual samples. Finally, ablation experiments were conducted to assess the effects of incorporating CNN and SA into the model, with performance comparisons made against random forest (RF) and support vector machine (SVM) models. [Results and Discussions] The correlation analysis revealed that crop parameters exhibited strong correlations with AGB. Among the vegetation indices, the improved normalized difference red edge index (NDREI) demonstrated the highest correlation (r = 0.63). To address multicollinearity issues, the visible atmospherically resistant index (VARI), soil adjusted vegetation index (SAVI), and normalized difference red edge index (NDRE) were excluded from the analysis. The CNN-LSTM-SA model integrated crop parameters, vegetation indices, and meteorological data and initially achieved a coefficient of determination (R²) of 0.89, a root mean square error (RMSE) of 129.38 g/m², and a mean absolute error (MAE) of 65.99 g/m². When only vegetation indices and meteorological data were included, the model yielded an R² of 0.83, an RMSE of 161.36 g/m², and an MAE of 89.37 g/m². Using a single vegetation index further reduced model accuracy. Based on multi-source data integration, RFE removed redundant features. After excluding the 2-meter average wind speed, the model reached its best performance with an R² of 0.92, an RMSE of 107.53 g/m², and an MAE of 55.19 g/m².
Using the LIME method to interpret feature contributions for individual maize samples, the analysis revealed that during the rapid growth stage, the model was primarily influenced by the current growth status and vegetation indices. For samples in the mid-growth stage, multi-day crop physiological characteristics had a substantial impact on model predictions. In the late growth stage, higher vegetation index values showed a clear suppressive effect on the model outputs. During the mid-growth stage of maize under varying moisture conditions, the model consistently demonstrated heightened sensitivity to low temperatures, moderate humidity levels, and optimal vegetation indices. The CNN-LSTM-SA model demonstrated more consistent fitting performance and accuracy across different growth stages and water conditions compared to the LSTM, LSTM-SA, and CNN-LSTM models. It also exceeded the performance of the RF and SVM models in all evaluation metrics. [Conclusions] This study leveraged the feature extraction capabilities of CNN, the temporal modeling strength of LSTM, and the dynamic attention of the SA mechanism to enhance the accuracy of maize AGB estimation from a spatiotemporal perspective. The approach not only reduced estimation errors but also improved model interpretability. This research could provide valuable insights and references for the dynamic modeling of crop AGB.
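
    A simplified PyTorch sketch of a CNN-LSTM model with self-attention for sequence regression is given below. It uses 1D convolutions over the time axis rather than the paper's 2D convolutions, and the feature dimensions, sequence length, and mean pooling are assumptions made for illustration only.

import torch
import torch.nn as nn

class CNNLSTMSA(nn.Module):
    """Toy CNN-LSTM with self-attention for sequence regression (e.g. AGB).

    Input: (batch, time, features) such as daily vegetation indices, crop and
    weather variables; output: one scalar estimate per sample.
    """

    def __init__(self, n_features, hidden=64):
        super().__init__()
        # 1D convolution over the time axis extracts local temporal patterns.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                                   # x: (B, T, F)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)    # (B, T, hidden)
        h, _ = self.lstm(h)                                 # temporal dependencies
        a, _ = self.attn(h, h, h)                           # global attention weights
        return self.head(a.mean(dim=1)).squeeze(-1)         # pooled estimate

model = CNNLSTMSA(n_features=10)
agb_pred = model(torch.randn(8, 30, 10))   # 8 samples, 30 time steps, 10 features
print(agb_pred.shape)                       # torch.Size([8])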

  • Information Processing and Decision Making
    XUWenwen, YUKejian, DAIZexu, WUYunzhi
    Smart Agriculture. 2025, 7(4): 174-186. https://doi.org/10.12133/j.smartag.SA202504005

    [Objective] Grape is one of the world's largest cash crops in terms of total production value, and accurate estimation of its yield is crucial for agricultural and economic development. However, grape yield prediction is currently difficult and costly: detection of green grape varieties, whose berries are similar in color to the leaves, remains limited, and detection of grape bunches with small berries is ineffective. To solve the above problems, a multimodal detection framework based on transfer learning was proposed, aiming to realize the detection and counting of different grape varieties and thus provide reliable technical support for grape yield prediction and the intelligent management of orchards. [Methods] A multimodal grape detection framework based on transfer learning was proposed. This transfer learning utilized the feature representation capabilities of pretrained models, requiring only a small number of grape images for fine-tuning to adapt to the task. This approach not only reduced labeling costs but also enhanced the ability to capture grape features effectively. The multimodal framework adopted a dual-encoder-single-decoder structure, consisting of three core modules: the image and text feature extraction and enhancement module, the language-guided query selection module, and the cross-modality decoder module. In the feature extraction stage, the framework employed pretrained models from public datasets for transfer learning, which significantly reduced the training time and costs of the model on the target task while effectively improving the capability to capture grape features. By introducing a feature enhancement module, the framework achieved cross-modality fusion effects between grape images and text. Additionally, the attention mechanism was implemented to enhance both image and text features, facilitating cross-modality feature learning between images and text. During the cross-modality query selection phase, the framework utilized a language-guided query selection strategy that enabled the filtering of queries from grape images. This strategy allowed for more effective use of the input text to guide object detection, selecting features that were more relevant to the input text as queries for the decoder. The cross-modality decoder combined the features from grape images and text modalities to achieve more accurate modality alignment, thereby facilitating a more effective fusion of grape image and text information, ultimately producing the corresponding grape prediction results. Finally, to comprehensively evaluate the model's performance, the mean average precision (mAP) and average recall (AR) were adopted as evaluation metrics for the detection task, while the counting task was quantified using the mean absolute error (MAE) and root mean square error (RMSE) as assessment indicators. [Results and Discussions] This method exhibited optimal performance in both detection and counting when compared to nine baseline models. Specifically, a comprehensive evaluation was conducted on the WGISD public dataset, where the method achieved an mAP50 of 80.3% in the detection task, representing a 2.7 percentage point improvement over the second-best model. Additionally, it reached 53.2% mAP and 58.2% mAP75, surpassing the second-best models by 13.4 and 22 percentage points, respectively, and achieved an mAR of 76.5%, a 9.8 percentage point increase over the next-best model.
In the counting task, the method realized an MAE of 1.65 and an RMSE of 2.48, outperforming all other baseline models in counting effectiveness. Furthermore, experiments were conducted using a total of nine grape varieties from both the WGISD dataset and field-collected data, resulting in an mAP50 of 82.5%, an mAP of 58.5%, an mAP75 of 64.4%, an mAR of 77.1%, an MAE of 1.44, and an RMSE of 2.19. These results demonstrated the model's strong adaptability and effectiveness across diverse grape varieties. Notably, the method not only performed well in identifying large grape clusters but also showed superior performance on smaller grape clusters, achieving an mAP_s of 74.2% in the detection task, a 9.5 percentage point improvement over the second-best model. Additionally, to provide a more intuitive assessment of model performance, this study selected grape images from the test set for visual comparison analysis. The results revealed that the model's detection and counting outcomes for grape clusters closely aligned with the original annotation information from the label dataset. Overall, this method demonstrated strong generalization capabilities and higher accuracy under various environmental conditions for different grape varieties. This technology has the potential to be applied in estimating total orchard yield and reducing pre-harvest measurement errors, thereby effectively enhancing the precision management level of vineyards. [Conclusions] The proposed method achieved higher accuracy and better adaptability in detecting five grape varieties compared to other baseline models. Furthermore, the model demonstrated substantial practicality and robustness across nine different grape varieties. These findings suggested that the method developed in this study had significant application potential in grape detection and counting tasks. It could provide strong technical support for the intelligent development of precision agriculture and the grape cultivation industry, highlighting its promising prospects in enhancing agricultural practices.
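
    The language-guided query selection step can be illustrated with a short PyTorch sketch that scores image tokens by their cosine similarity to text tokens and keeps the top-scoring ones as decoder queries. This mirrors the general idea only, not the paper's exact implementation; the tensor shapes and query count are illustrative.

import torch
import torch.nn.functional as F

def language_guided_query_selection(image_tokens, text_tokens, num_queries=10):
    """Pick the image tokens most similar to the text prompt as decoder queries.

    image_tokens: (N_img, D) image features; text_tokens: (N_txt, D) text features.
    Returns (num_queries, D) selected queries and their indices.
    """
    img = F.normalize(image_tokens, dim=-1)
    txt = F.normalize(text_tokens, dim=-1)
    # Each image token is scored by its best match over the text tokens.
    sim = img @ txt.T                       # (N_img, N_txt) cosine similarities
    scores = sim.max(dim=-1).values         # (N_img,)
    top = scores.topk(num_queries).indices
    return image_tokens[top], top

# Usage with random stand-ins for encoded image patches and a short text prompt.
img_feats = torch.randn(400, 256)           # e.g. a 20x20 grid of patch features
txt_feats = torch.randn(3, 256)             # a few encoded text tokens
queries, idx = language_guided_query_selection(img_feats, txt_feats)
print(queries.shape)                         # torch.Size([10, 256])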

  • Topic--Intelligent Sensing and Grading of Agricultural Product Quality
    HUYan, WANGYujie, ZHANGXuechen, ZHANGYiqiang, YUHuahao, SONGXinbei, YESitan, ZHOUJihong, CHENZhenlin, ZONGWeiwei, HEYong, LIXiaoli
    Smart Agriculture. 2025, 7(4): 71-83. https://doi.org/10.12133/j.smartag.SA202505012

    [Objective] Fu brick tea is a popular fermented dark tea, and its "Jin hua" fermentation process determines the quality, flavor, and function of the tea. Therefore, the establishment of a rapid and non-destructive detection method for the fungal fermentation stage is of great significance for improving quality control and processing efficiency. [Methods] Visible-near-infrared (VIS-NIR) and near-infrared (NIR) hyperspectral images were acquired during the fermentation stage and, combined with key quality indexes such as moisture, free amino acids, tea polyphenols, and tea pigments (including theaflavins, thearubigins, and theabrownins), were used to analyze the variation trend of Fu brick tea. This study combined support vector machine (SVM) and convolutional neural network (CNN) models to establish quantitative detection of key quality indicators and qualitative identification of the fungal fermentation stage. To enhance model performance, the squeeze-and-excitation (SE) attention mechanism was incorporated, which strengthens the adaptive weight adjustment of feature channels, resulting in the development of the Spectra-SE-CNN model. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was used for feature dimensionality reduction, aiding in the visualization of feature distributions during the fermentation process. To improve the interpretability of the model, the Grad-CAM technique was employed for CNN and Spectra-SE-CNN visualization, helping to identify the key regions the model focuses on. [Results and Discussions] In the quantitative detection of Fu brick tea quality, the best models were all Spectra-SE-CNN, with R²p values of 0.859 5, 0.852 5, and 0.838 3 for moisture, tea pigments, and tea polyphenols, respectively, indicating high correlation and modeling stability. These values suggest that the models were capable of accurately predicting these key quality indicators based on hyperspectral data. However, the R²p for free amino acids was lower (0.670 2), which could be attributed to their relatively minor changes during the fermentation process or a weak spectral response, making it more challenging to detect this component reliably with the current hyperspectral imaging approach. The Spectra-SE-CNN model significantly outperformed traditional CNN models, demonstrating the effectiveness of incorporating the SE attention mechanism. The SE attention mechanism enhanced the model's ability to extract and discriminate important spectral features, thereby improving both classification accuracy and generalization. This indicated that the Spectra-SE-CNN model excelled not only in feature extraction but also in robustness to variations in the fermentation stage. Furthermore, t-SNE revealed a clear separation of the different fungal fermentation stages in the low-dimensional space, with distinct boundaries. This visualization highlighted the model's ability to distinguish between subtle spectral differences during the fermentation process. The heatmap generated by Grad-CAM emphasized key regions, such as the fermentation location and edges, providing valuable insights into the specific features the model deemed important for accurate predictions. This improved the model's transparency and helped validate the spectral features that were most influential in identifying the fermentation stages.
[Conclusions] A Spectra-SE-CNN model was proposed in this research, which incorporates the SE attention mechanism into a convolutional neural network to enhance spectral feature learning. This architecture adaptively recalibrates channel-wise feature responses, allowing the model to focus on informative spectral bands and suppress irrelevant signals. As a result, the Spectra-SE-CNN achieved improved classification accuracy and training efficiency compared to conventional CNN models, demonstrating the strong potential of deep learning in hyperspectral feature extraction. The findings validate that hyperspectral imaging (HSI) technology enables rapid, non-destructive, and high-resolution assessment of Fu brick tea during its critical fungal fermentation stage, and confirm the feasibility of integrating HSI with intelligent algorithms for real-time monitoring of the Fu brick tea fermentation process. Furthermore, this approach offers a pathway for broader applications of hyperspectral imaging and deep learning in intelligent agricultural product monitoring, quality control, and automation of traditional fermentation processes.
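
    The squeeze-and-excitation recalibration at the heart of the Spectra-SE-CNN can be sketched in PyTorch as follows. The surrounding convolutional layers, band count, and number of fermentation stages are placeholders rather than the paper's architecture; only the SE channel-reweighting mechanism itself is shown.

import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    """Squeeze-and-excitation over the channels of a 1D (spectral) feature map."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, C, L) spectral features
        w = self.fc(x.mean(dim=-1))          # squeeze: global average over bands
        return x * w.unsqueeze(-1)           # excite: reweight each channel

# A toy spectral classifier: Conv1d blocks with SE recalibration, then a
# linear head over pooled features; band and class counts are placeholders.
model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
    SEBlock1d(32),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    SEBlock1d(64),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 4),                        # e.g. 4 fermentation stages (assumed)
)
logits = model(torch.randn(2, 1, 224))       # 2 spectra with 224 bands
print(logits.shape)                          # torch.Size([2, 4])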