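Source code for the EMNLP 2019 paper "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (given scientific text such as biomedical literature, jointly extract its fact tuples and condition tuples, i.e., turn the literature into structured information)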

twjiang/MIMO_CFE

Joint Extraction of Fact and Condition Tuples from Scientific Text

Introduction

This repository contains the source code for the EMNLP 2019 paper "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (Paper).

Usage

1. Clone the Repository

git clone https://github.com/twjiang/MIMO_CFE.git

2. Download External Resources

  • The dumped MIMO model can be found here.

  • The word embeddings we use can be found here.

  • The pre-trained language model we use can be found here.

Put these files into the ./resources folder.

3. Install Requirements

This repo is tested on Python 3.6 and PyTorch 1.2.0.

Create Environment (Optional): Ideally, create a dedicated conda environment for the project.

conda create -n mimo python=3.6

conda activate mimo

pip install -r requirments.txt
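After installing, it can help to confirm that the active environment matches the tested versions. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the report keys are hypothetical, and the only facts taken from this README are the tested versions (Python 3.6, PyTorch 1.2.0).

```python
import importlib.util
import sys

# Versions stated in this README as tested.
TESTED_PYTHON = (3, 6)
TESTED_TORCH = "1.2.0"


def check_environment():
    """Compare the running Python/PyTorch versions with the tested ones.

    Hypothetical helper: returns a small report dict instead of raising,
    so it can also be run in an environment where torch is not installed.
    """
    report = {
        "python": "{}.{}".format(sys.version_info[0], sys.version_info[1]),
        "python_matches": tuple(sys.version_info[:2]) == TESTED_PYTHON,
        "torch": None,
        "torch_matches": False,
    }
    # Only import torch if it is actually installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["torch_matches"] = torch.__version__.startswith(TESTED_TORCH)
    return report


if __name__ == "__main__":
    print(check_environment())
```

A mismatch does not necessarily mean the code will fail; it only means the setup differs from the one the repository was tested on.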

4. Start a Demo Application

cd MIMO_service

python mimo_server.py  # Start a MIMO service

python client.py 

The output of the demo is shown below.

{
	'statements': {
		'stmt 1': {
			'text': 'Histone deacetylase inhibitor valproic acid ( VPA ) has been used to increase the reprogramming efficiency of induced pluripotent stem cell ( iPSC ) from somatic cells , yet the specific molecular mechanisms underlying this effect is unknown .',
			'fact tuples': [
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['VPA', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming'],
				['specific molecular mechanisms', 'NIL', 'is unknown', 'NIL', 'NIL']
			],
			'condition tuples': [
				['iPSC', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['induced pluripotent stem cell', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['specific molecular mechanisms', 'NIL', 'underlying', 'NIL', 'effect']
			],
			'concept_indx': [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32],
			'attr_indx': [14, 15, 35],
			'predicate_indx': [8, 9, 10, 11, 12, 24, 33, 36, 37]
		}
	}
}
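The output above can be post-processed with plain Python. The sketch below is illustrative only: it hard-codes a trimmed copy of the demo output, and the helper names (`tuple_to_dict`, `tokens_at`) and slot names in `FIELDS` are hypothetical. The five-slot order (subject concept, subject attribute, predicate, object concept, object attribute) and the reading of the `*_indx` lists as positions of whitespace-separated tokens are assumptions inferred from this one example, not documented behavior.

```python
# Trimmed copy of the demo output shown above (one fact tuple, one condition tuple).
demo_output = {
    "statements": {
        "stmt 1": {
            "text": (
                "Histone deacetylase inhibitor valproic acid ( VPA ) has been used to "
                "increase the reprogramming efficiency of induced pluripotent stem cell "
                "( iPSC ) from somatic cells , yet the specific molecular mechanisms "
                "underlying this effect is unknown ."
            ),
            "fact tuples": [
                ["VPA", "NIL", "has been used to increase",
                 "induced pluripotent stem cell", "reprogramming efficiency"],
            ],
            "condition tuples": [
                ["iPSC", "reprogramming efficiency", "from", "somatic cells", "NIL"],
            ],
            "concept_indx": [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32],
            "attr_indx": [14, 15, 35],
            "predicate_indx": [8, 9, 10, 11, 12, 24, 33, 36, 37],
        }
    }
}

# ASSUMPTION: slot order inferred from the example output, not from documentation.
FIELDS = ("subject_concept", "subject_attribute", "predicate",
          "object_concept", "object_attribute")


def tuple_to_dict(tup):
    """Name the five slots and map the 'NIL' placeholder to None."""
    return {name: (None if value == "NIL" else value)
            for name, value in zip(FIELDS, tup)}


def tokens_at(text, indices):
    """Look up whitespace-separated tokens by position (assumed indexing scheme)."""
    tokens = text.split()
    return [tokens[i] for i in indices]


stmt = demo_output["statements"]["stmt 1"]
facts = [tuple_to_dict(t) for t in stmt["fact tuples"]]
conditions = [tuple_to_dict(t) for t in stmt["condition tuples"]]
attr_tokens = tokens_at(stmt["text"], stmt["attr_indx"])

print(facts[0])
print(conditions[0])
print(attr_tokens)
```

For this statement, the attribute indices resolve to the tokens "reprogramming", "efficiency", and "effect", which is consistent with the attribute slots in the tuples.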

5. Train Your Own MIMO

Example commands for pretraining:

(all gates for LM, pretrain)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for POS, pretrain)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for LM and POS, pretrain)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM --pretrain

Example commands for training with multi-output:

(all gates for LM with multi-output)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM

(all gates for POS with multi-output)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM

(all gates for LM and POS, with multi-output)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM
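The commands above differ only in the `--config` string and the `--pretrain` flag, so they can be generated programmatically. The following is a minimal sketch, not part of the repository: `build_train_command` and `CONFIGS` are hypothetical names, only the flags shown in this section are used, and the check that `--config` is a 9-character string of 0s and 1s is an assumption inferred from the examples.

```python
import re


def build_train_command(config, model_name="MIMO_BERT_LSTM", pretrain=False, cuda=True):
    """Assemble the argument list for train.py from the flags shown in this README.

    ASSUMPTION: the config is a 9-character string of 0s and 1s, as in the examples.
    """
    if not re.fullmatch(r"[01]{9}", config):
        raise ValueError("config must be a 9-character string of 0s and 1s")
    cmd = ["python", "train.py"]
    if cuda:
        cmd.append("--cuda")
    cmd += ["--config", config, "--model_name", model_name]
    if pretrain:
        cmd.append("--pretrain")
    return cmd


# The three gate settings listed in this section.
CONFIGS = {
    "LM": "111000000",
    "POS": "000111000",
    "LM+POS": "111111000",
}

if __name__ == "__main__":
    for name, config in CONFIGS.items():
        # Pretraining command first, then the multi-output command.
        print(name, "pretrain:    ", " ".join(build_train_command(config, pretrain=True)))
        print(name, "multi-output:", " ".join(build_train_command(config)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run; the sketch only prints the commands so it has no side effects.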

Reference

@inproceedings{jiang-mimo,
    title = "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text",
    author = "Jiang, Tianwen and Zhao, Tong and Qin, Bing and Liu, Ting and Chawla, Nitesh V and Jiang, Meng",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
}
