[Purpose/Significance] Deep learning language models are among the principal methods for improving machine language intelligence and have become an indispensable technique for the automatic processing and analysis of data resources and the intelligent mining of knowledge and information; however, some difficulties remain in applying them to technology development and application services in the library and information science (LIS) field. By systematically reviewing the research progress, technical principles, and application development methods of deep learning language models, this study aims to provide librarians and fellow practitioners with a theoretical basis and methodological paths for understanding and applying these models in depth. [Method/Process] This study systematically surveys the background, basic feature representation algorithms, and representative application development tools of deep learning language models; traces their evolution and technical principles; and analyzes the advantages, disadvantages, and applicability of each algorithm and tool. It then summarizes the challenges facing the application and development of deep learning language models and proposes two strategies for expanding their application capabilities. [Results/Conclusions] The major challenges include the large number of parameters and the difficulty of tuning for accuracy; the dependence on large volumes of accurate training data, which makes adaptation difficult; and potential intellectual property and information security issues. In the future, application capabilities can be expanded and improved by focusing on specific domains and on feature engineering.
Abstract
[Purpose/Significance] Deep learning language models are currently among the principal methods and advanced technologies for enhancing machine language intelligence, and they have become an indispensable technical means for the automatic processing and analysis of data resources and the intelligent mining of information and knowledge. However, some difficulties remain in using deep learning language models for technology development and application services in the library and information science (LIS) field. This study therefore systematically reviews the research progress, technical principles, and development methods of deep learning language models, with the aim of providing a reliable theoretical basis and feasible methodological paths for librarians and fellow practitioners to understand and apply these models in depth. [Method/Process] The data used in this study were collected from the Web of Science (WOS) Core Collection, the CNKI literature database, the arXiv preprint repository, the GitHub open-source software hosting platform, and open resources on the Internet. Based on these data, the paper first systematically investigates the background, basic feature representation algorithms, and representative application development tools of deep learning language models; reveals their evolution and technical principles; and analyzes the advantages, disadvantages, and applicability of each algorithm and development tool. Second, it analyzes in depth the challenges facing the development and application of deep learning language models and puts forward two strategic approaches for expanding their application capabilities. [Results/Conclusions] The major challenges facing the application and development of deep learning language models include the large number of parameters and the difficulty of tuning them for accuracy; the reliance on large volumes of accurate training data, which makes models hard to adapt; and potential intellectual property and information security issues. In the future, application capabilities can be expanded and improved from two directions: domain-specific development and feature engineering. For domain-specific development, attention should be paid to the collection and preparation of domain data, the selection of model architecture, the participation of domain experts, and optimization for specific tasks, so that the model's data sources are more reliable and secure and its application results are more accurate and practical. For feature engineering, the strategies include selecting appropriate features, feature pre-processing, feature selection, and feature dimensionality reduction; these strategies can improve the performance and efficiency of deep learning language models and make them better suited to specific tasks or domains. In summary, LIS institutions should leverage deep learning language modeling technologies, guided by the needs of scientific research and social development and building on the advantages of their existing literature resources and knowledge services, to carry out innovative professional or vertical-domain intelligent knowledge management and application services and to develop technologies and systems with independent intellectual property rights; this is their path to long-term sustainable development.
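As a concrete illustration of the domain-specific strategy mentioned above, the following minimal sketch shows how a pre-trained Chinese BERT model could be fine-tuned on a small domain corpus for text classification with the Hugging Face transformers and datasets libraries. This is not the procedure used in the study itself: the model name, the toy corpus, the label scheme, and all hyperparameters are illustrative assumptions.

```python
# Minimal fine-tuning sketch (illustrative only): adapt a pre-trained Chinese BERT
# to a hypothetical domain classification task. Requires: transformers, datasets, torch.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical domain corpus: a few labeled sentences stand in for real LIS data.
corpus = {
    "text": ["图书馆推出智能问答服务", "本文研究预训练语言模型", "今天天气晴朗", "周末去公园散步"],
    "label": [1, 1, 0, 0],   # 1 = domain-relevant, 0 = not relevant
}
dataset = Dataset.from_dict(corpus)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

def tokenize(batch):
    # Convert raw text into the input_ids / attention_mask tensors the model expects.
    return tokenizer(batch["text"], truncation=True, max_length=64)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-domain-demo",   # where checkpoints are written
    num_train_epochs=1,              # tiny run, demonstration only
    per_device_train_batch_size=2,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()  # fine-tunes all parameters on the toy domain data
```

In practice, the domain corpus would be curated with domain experts, split into training and evaluation sets, and the architecture and hyperparameters chosen for the specific task, as the abstract suggests.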
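The feature engineering strategy (feature pre-processing, feature selection, and dimensionality reduction) can likewise be sketched with a classical pipeline. The example below assumes scikit-learn and a toy corpus; it chains TF-IDF extraction, chi-square feature selection, and truncated SVD before a simple classifier. The feature counts and component numbers are arbitrary illustrative choices, not recommendations from the study.

```python
# Illustrative feature-engineering pipeline: pre-process text into TF-IDF features,
# select the most informative ones, reduce dimensionality, then classify.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus standing in for real documents; labels mark two hypothetical classes.
docs = [
    "deep learning language model for text classification",
    "pre-trained transformer models improve language understanding",
    "library catalogues and archival collections management",
    "public library reading promotion and reference services",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),               # pre-processing: text -> sparse TF-IDF matrix
    ("select", SelectKBest(chi2, k=8)),         # feature selection: keep 8 highest-scoring terms
    ("svd", TruncatedSVD(n_components=2)),      # dimensionality reduction: 8 terms -> 2 components
    ("clf", LogisticRegression(max_iter=1000)), # downstream classifier
])

pipeline.fit(docs, labels)
print(pipeline.predict(["fine-tuning a language model"]))  # e.g. [1]
```

Such a pipeline can also feed reduced features into a neural model; the point is that careful feature preparation makes the downstream model smaller, faster, and easier to adapt to a specific task or domain.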
Key words
deep learning / language model / neural network / pre-trained model / word embedding