A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts

MAO Jin, CHEN Ziyang

Journal of Library and Information Science in Agriculture. 2022, 34(3): 15-27

Journal of Library and Information Science in Agriculture, 2022, Vol. 34, Issue 3: 15-27. DOI: 10.13998/j.cnki.issn1002-1248.21-0707
Invited Article


Abstract

[Purpose/Significance] Abstracts of scientific documents usually consist of parts that each serve a specific function. Using deep learning to recognize these structural functions supports in-depth text analysis of scientific literature. [Method/Process] This paper recasts structural function recognition of abstracts as a text classification problem. Structural functions are divided into four categories: Introduction, Methods, Results, and Conclusions (IMRC). Based on the content of each abstract sentence and its context features, classifiers are built with BERT, BERT-BiLSTM, BERT-TextCNN, and ERNIE to recognize the structural functions automatically. [Results/Conclusions] Experiments on a dataset of 3,130 articles in the eHealth field show that ERNIE scores higher than the other models on every metric, that BERT-TextCNN performs better on short sentences, and that BERT-BiLSTM performs better on long sentences. The study contributes to fine-grained functional understanding of abstract text, and the resulting structural parsing can serve in-depth mining of scientific literature and literature-based knowledge discovery.
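
The abstract frames the task as sentence-level text classification over sentence content plus context features. The following is a minimal sketch of how such classifier inputs might be assembled, not the authors' implementation. The `Example` fields, the choice of neighboring sentences as context, the relative-position feature, and the toy sentences are all assumptions made for illustration; the abstract does not say how context features are encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The four IMRC categories named in the abstract.
LABELS = ["Introduction", "Methods", "Results", "Conclusions"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


@dataclass
class Example:
    sentence: str    # the abstract sentence to classify
    context: str     # assumed context: previous and next sentence, joined
    position: float  # assumed feature: relative position in the abstract, 0.0 to 1.0
    label_id: int    # index into LABELS


def build_examples(sentences: List[Tuple[str, str]]) -> List[Example]:
    """Turn one abstract, given as (sentence, label) pairs, into classifier inputs."""
    examples = []
    n = len(sentences)
    for i, (text, label) in enumerate(sentences):
        prev_text = sentences[i - 1][0] if i > 0 else ""
        next_text = sentences[i + 1][0] if i < n - 1 else ""
        # Skip missing neighbors at the start and end of the abstract.
        context = " ".join(part for part in (prev_text, next_text) if part)
        position = i / (n - 1) if n > 1 else 0.0
        examples.append(Example(text, context, position, LABEL_TO_ID[label]))
    return examples


# Invented toy abstract, for illustration only.
abstract = [
    ("Structured abstracts help readers find information.", "Introduction"),
    ("We fine-tune a sentence classifier on labeled abstracts.", "Methods"),
    ("The classifier is evaluated on held-out abstracts.", "Results"),
    ("Sentence context helps identify structural functions.", "Conclusions"),
]
examples = build_examples(abstract)
```

Each `Example` could then be passed to a pretrained encoder as a sentence pair (sentence, context), with `label_id` as the classification target. That pairing is one possible design, not something the source confirms.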

Key words

deep learning / BERT / literature structure / function identification / text classification

Cite this article

MAO Jin, CHEN Ziyang. A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts. Journal of Library and Information Science in Agriculture. 2022, 34(3): 15-27 https://doi.org/10.13998/j.cnki.issn1002-1248.21-0707
