
Construction of a Multimodal Dataset for Emergency Event Identification and Classification
Yifan ZHANG, Zuqin CHEN, Jike GE, Mingkun HE, Jie TAN
[Purpose/Significance] Rich Internet data offer a multi-dimensional view of emergencies, and multimodal emergency classification methods have emerged in response. However, existing multimodal emergency datasets are scarce and cover few categories, which limits the research they can support and slows subsequent work. Compared with previous public datasets, the dataset constructed in this paper has richer categories and more complete modalities. It addresses key gaps in the availability and diversity of multimodal emergency datasets: it expands the range of categories and provides a finer-grained classification within the natural disaster category, both of which are crucial for developing robust and accurate multimodal classification models. [Method/Process] A multimodal emergency event dataset (MEED) was constructed, containing data from five categories: accident disasters, public health, social security, natural disasters, and non-emergency events. The natural disaster data are divided into seven subcategories: geological disasters, biological disasters, drought disasters, marine disasters, meteorological disasters, earthquake disasters, and forest and grassland fires. [Results/Conclusions] Existing emergency classification methods were evaluated on a public emergency dataset (CrisisMMD) and on MEED. The results showed that MEED improved the performance of multimodal models by more than 10% compared with the currently available emergency datasets. This improvement highlights the value of MEED for research and applications in emergency management and response. The dataset enables researchers and practitioners to better understand the complexity of emergencies and to develop more effective prevention, mitigation, and response strategies.
The improvement in model performance also suggests that multimodal methods are a promising direction for analyzing emergency events, because they leverage the strengths of different types of data to achieve higher accuracy and reliability in classification tasks. The creation of MEED advances the field of emergency management by providing researchers with a valuable resource, and it may lead to more sophisticated tools for responding to emergencies. However, the dataset still has limitations. The number of emergencies reported on the Internet continues to grow, which requires us to update the dataset continuously to reflect new situations. The size of the dataset largely determines the performance of the classification model, and the class imbalance in the dataset constructed in this paper remains to be solved. In future research, we will continue to update and maintain the dataset to address these issues.
incidents / multimodal / dataset / deep learning / data acquisition / data annotations
Table 1 Existing single-modal and multimodal emergency event datasets
Dataset type | Dataset name | Number of labels |
---|---|---|
Single-modal | CEC-Corpus | 5 |
Single-modal | CEEC-Corpus | 6 |
Single-modal | DuEE1.0 | 8 |
Single-modal | HumAID | 5 |
Single-modal | TREC | 5 |
Multimodal | CrisisMMD | 4 |
Table 2 Number of MEED events by category
Category | Number of events |
---|---|
Accident disasters | 4 619 |
Public health | 4 271 |
Social security | 1 132 |
Natural disasters | 1 822 |
Non-emergency events | 3 337 |
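Table 3 Fine-grained breakdown of natural disaster events
Subcategory | Number of events |
---|---|
Geological disasters | 343 |
Biological disasters | 34 |
Drought disasters | 6 |
Marine disasters | 14 |
Meteorological disasters | 643 |
Earthquake disasters | 668 |
Forest and grassland fires | 114 |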
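The class imbalance noted in the abstract can be quantified directly from the counts in Tables 2 and 3. The following is a minimal sketch, not the authors' procedure: the function names `imbalance_ratio` and `inverse_frequency_weights` are hypothetical, and the weighting formula (total / (number of classes × class count)) is a common heuristic assumed here for illustration, not something the paper states it used.

```python
# Counts copied from Table 2 (top-level categories) and Table 3 (natural disaster subcategories).
category_counts = {
    "accident disasters": 4619,
    "public health": 4271,
    "social security": 1132,
    "natural disasters": 1822,
    "non-emergency events": 3337,
}
natural_disaster_counts = {
    "geological": 343,
    "biological": 34,
    "drought": 6,
    "marine": 14,
    "meteorological": 643,
    "earthquake": 668,
    "forest and grassland fires": 114,
}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(counts):
    """Assumed heuristic: weight = total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * count) for name, count in counts.items()}


# The subcategory counts should add up to the "natural disasters" row of Table 2.
subcategory_total = sum(natural_disaster_counts.values())

top_level_ratio = imbalance_ratio(category_counts)
fine_grained_ratio = imbalance_ratio(natural_disaster_counts)
weights = inverse_frequency_weights(category_counts)

print("subcategory total:", subcategory_total)
print("top-level imbalance ratio:", round(top_level_ratio, 2))
print("fine-grained imbalance ratio:", round(fine_grained_ratio, 2))
for name, weight in weights.items():
    print(f"{name}: weight {weight:.3f}")
```

Weights of this kind could be passed to a weighted loss during training. Whether that is enough for subcategories with only a handful of samples, such as drought disasters, is an open question that the sketch does not address.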
Table 4 Comparison between MEED and existing multimodal emergency event datasets
Dataset | Emergency events | Non-emergency events | Number of categories |
---|---|---|---|
CrisisMMD | 12 043 | 0 | 4 |
MEED | 11 844 | 3 337 | 5 |
Table 5 Parameter settings for different emergency classification methods
Classification method | Batch size | Optimizer |
---|---|---|
VGG-16 | 10 | Adam |
TextCNN | 128 | Adam |
BERT-base | 128 | Adam |
TextCNN + VGG-16 | 10 | Adam |
BERT + ViT | 10 | Adam |
Table 6 Performance of emergency classification methods on different datasets
Model | MEED Accuracy | MEED F1-Score | CrisisMMD Accuracy | CrisisMMD F1-Score |
---|---|---|---|---|
VGG-16 | 0.856 | 0.855 | 0.833 | 0.832 |
TextCNN | 0.858 | 0.860 | 0.808 | 0.809 |
BERT-base | 0.966 | 0.957 | 0.852 | 0.891 |
TextCNN + VGG-16 | 0.973 | 0.967 | 0.844 | 0.842 |
BERT + ViT | 0.979 | 0.973 | 0.851 | 0.853 |
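One way to read the "more than 10%" claim in the abstract against Table 6 is as the difference in accuracy between MEED and CrisisMMD for each model. The sketch below computes that difference in percentage points. This interpretation is an assumption, as are the helper name `accuracy_gap` and the grouping of models into single-modal and multimodal; the source does not spell out how the figure was derived.

```python
# (accuracy, F1) pairs copied from Table 6.
results = {
    "VGG-16": {"MEED": (0.856, 0.855), "CrisisMMD": (0.833, 0.832)},
    "TextCNN": {"MEED": (0.858, 0.860), "CrisisMMD": (0.808, 0.809)},
    "BERT-base": {"MEED": (0.966, 0.957), "CrisisMMD": (0.852, 0.891)},
    "TextCNN + VGG-16": {"MEED": (0.973, 0.967), "CrisisMMD": (0.844, 0.842)},
    "BERT + ViT": {"MEED": (0.979, 0.973), "CrisisMMD": (0.851, 0.853)},
}

# Assumed grouping: models that combine a text encoder with an image encoder.
MULTIMODAL = {"TextCNN + VGG-16", "BERT + ViT"}


def accuracy_gap(model):
    """Accuracy on MEED minus accuracy on CrisisMMD, in percentage points."""
    meed_accuracy, _ = results[model]["MEED"]
    crisis_accuracy, _ = results[model]["CrisisMMD"]
    return round((meed_accuracy - crisis_accuracy) * 100, 1)


gaps = {model: accuracy_gap(model) for model in results}

for model, gap in gaps.items():
    kind = "multimodal" if model in MULTIMODAL else "single-modal"
    print(f"{model:18s} {kind:12s} {gap:+.1f} points")
```

Under this reading, both multimodal models show a gap above 10 points, which is consistent with the abstract. The two datasets use different label sets, so the gap compares results across tasks and is not a controlled measurement of the same task.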
Fig. 2 Comparison of metrics for different classification methods on MEED and CrisisMMD