Commit 1898eae

Update readme to make it more clear
1 parent 803946b commit 1898eae

1 file changed

Lines changed: 24 additions & 20 deletions

README.md

@@ -1,13 +1,12 @@
1-
# GreaseLM: Graph REASoning Enhanced Language Models
1+
# GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
22

3-
This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models".
3+
This repo provides the source code & data of our paper [GreaseLM: Graph REASoning Enhanced Language Models for Question Answering](https://arxiv.org/abs/2201.08860) (ICLR 2022 spotlight).
44

55
<p align="center">
66
<img src="./figs/greaselm.png" width="600" title="GreaseLM model architecture" alt="">
77
</p>
88

9-
## Usage
10-
### 1. Dependencies
9+
## 1. Dependencies
1110

1211
- [Python](<https://www.python.org/>) == 3.8
1312
- [PyTorch](<https://pytorch.org/get-started/locally/>) == 1.8.0
@@ -34,14 +33,17 @@ pip install torch-geometric==1.7.0 -f https://pytorch-geometric.com/whl/torch-1.
3433
```
3534

3635

37-
### 2. Download data
36+
## 2. Download data
3837

39-
Download all the raw data -- ConceptNet, CommonsenseQA, OpenBookQA -- by
38+
### Download and preprocess data yourself
39+
**Preprocessing the data yourself may take a long time. To download the preprocessed data directly, skip to the next subsection.**
40+
41+
Download the raw ConceptNet, CommonsenseQA, and OpenBookQA data by running
4042
```
4143
./download_raw_data.sh
4244
```
4345

44-
You can preprocess the raw data by running
46+
You can preprocess these raw data by running
4547
```
4648
CUDA_VISIBLE_DEVICES=0 python preprocess.py -p <num_processes>
4749
```
@@ -51,32 +53,34 @@ You can specify the GPU you want to use in the beginning of the command `CUDA_VI
5153
* Identify all mentioned concepts in the questions and answers
5254
* Extract subgraphs for each q-a pair
5355

54-
**TL;DR**. The preprocessing may take long; for your convenience, you can download all the processed data [here](https://drive.google.com/drive/folders/1T6B4nou5P3u-6jr0z6e3IkitO8fNVM6f?usp=sharing) into the top-level directory of this repo and run
55-
```
56-
unzip data_preprocessed.zip
57-
```
56+
The script to download and preprocess the [MedQA-USMLE](https://github.com/jind11/MedQA) data and the biomedical knowledge graph based on Disease Database and DrugBank is provided in `utils_biomed/`.
5857

59-
**Add MedQA-USMLE**. Besides the commonsense QA datasets (*CommonsenseQA*, *OpenBookQA*) with the ConceptNet knowledge graph, we added a biomedical QA dataset ([*MedQA-USMLE*](https://github.com/jind11/MedQA)) with a biomedical knowledge graph based on Disease Database and DrugBank. You can download all the data for this from [[here]](https://drive.google.com/file/d/1EqbiNt2ACXVrc9gmoXnzTEo9GJTe9Uor/view?usp=sharing). Unzip it and put the `medqa_usmle` and `ddb` folders inside the `data/` directory.
58+
### Directly download preprocessed data
59+
For your convenience, if you don't want to preprocess the data yourself, you can download all the preprocessed data [here](https://drive.google.com/drive/folders/1T6B4nou5P3u-6jr0z6e3IkitO8fNVM6f?usp=sharing). Download them into the top-level directory of this repo and unzip them. Move the `medqa_usmle` and `ddb` folders into the `data/` directory.
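
The unzip-and-move step can be scripted. The following is a minimal sketch, not part of this repo: the helper name `unpack_preprocessed` is hypothetical, and because the archive file names are not listed here, it simply extracts every `*.zip` it finds in the given directory. It assumes the archives unpack `medqa_usmle` and `ddb` as top-level folders.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_preprocessed(repo_root, folders=("medqa_usmle", "ddb")):
    """Unzip every *.zip in repo_root, then move `folders` into repo_root/data/.

    Hypothetical helper: archive names are not assumed, so all zips are extracted.
    Returns the list of folder names that were moved.
    """
    root = Path(repo_root)
    data_dir = root / "data"

    # Extract each downloaded archive in place.
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)

    # Create data/ after extraction, in case an archive already provides it.
    data_dir.mkdir(exist_ok=True)

    moved = []
    for name in folders:
        src, dst = root / name, data_dir / name
        # Move only folders that landed at the top level and are not yet in data/.
        if src.is_dir() and not dst.exists():
            shutil.move(str(src), str(dst))
            moved.append(name)
    return moved
```

Calling `unpack_preprocessed(".")` from the top-level directory of the repo returns the names of the folders it moved; folders that already exist under `data/` are left untouched.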
6060

61+
### Resulting file structure
6162

6263
The resulting file structure should look like this:
6364

6465
```plain
6566
.
6667
├── README.md
67-
└── data/
68-
├── cpnet/ (preprocessed ConceptNet)
69-
└── csqa/
68+
├── data/
69+
├── cpnet/ (preprocessed ConceptNet)
70+
├── csqa/
7071
├── train_rand_split.jsonl
7172
├── dev_rand_split.jsonl
7273
├── test_rand_split_no_answers.jsonl
7374
├── statement/ (converted statements)
7475
├── grounded/ (grounded entities)
7576
├── graphs/ (extracted subgraphs)
7677
├── ...
78+
├── obqa/
79+
├── medqa_usmle/
80+
└── ddb/
7781
```
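
To confirm that the download or preprocessing produced this layout, a small check can help. This is a sketch under stated assumptions: `EXPECTED` and `missing_paths` are hypothetical names, and the nesting of the `csqa` entries under `data/csqa/` is inferred from the path `data/csqa/statement/train.statement.jsonl` used elsewhere in this README.

```python
from pathlib import Path

# Paths taken from the tree above. The nesting under data/csqa/ is an
# assumption based on `data/csqa/statement/train.statement.jsonl`.
EXPECTED = [
    "data/cpnet",
    "data/csqa/train_rand_split.jsonl",
    "data/csqa/dev_rand_split.jsonl",
    "data/csqa/test_rand_split_no_answers.jsonl",
    "data/csqa/statement",
    "data/csqa/grounded",
    "data/csqa/graphs",
    "data/obqa",
    "data/medqa_usmle",
    "data/ddb",
]


def missing_paths(repo_root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths(".")` means every listed file and folder is present; otherwise the returned entries show what still needs to be downloaded or preprocessed.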
7882

79-
### 3. Training GreaseLM
83+
## 3. Training GreaseLM
8084
To train GreaseLM on CommonsenseQA, run
8185
```
8286
CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh csqa --data_dir data/
@@ -93,14 +97,14 @@ To train GreaseLM on MedQA-USMLE, run
9397
CUDA_VISIBLE_DEVICES=0 ./run_greaselm__medqa_usmle.sh
9498
```
9599

96-
### 4. Pretrained model checkpoints
100+
## 4. Pretrained model checkpoints
97101
You can download a pretrained GreaseLM model on CommonsenseQA [here](https://drive.google.com/file/d/1QPwLZFA6AQ-pFfDR6TWLdBAvm3c_HOUr/view?usp=sharing), which achieves an IH-dev acc. of `79.0` and an IH-test acc. of `74.0`.
98102

99103
You can also download a pretrained GreaseLM model on OpenBookQA [here](https://drive.google.com/file/d/1-QqyiQuU9xlN20vwfIaqYQ_uJMP8d7Pv/view?usp=sharing), which achieves a test acc. of `84.8`.
100104

101105
You can also download a pretrained GreaseLM model on MedQA-USMLE [here](https://drive.google.com/file/d/1x5nZEprV0Ht8IWViyz3d07uGLXtNjUN1/view?usp=sharing), which achieves a test acc. of `38.5`.
102106

103-
### 5. Evaluating a pretrained model checkpoint
107+
## 5. Evaluating a pretrained model checkpoint
104108
To evaluate a pretrained GreaseLM model checkpoint on CommonsenseQA, run
105109
```
106110
CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh csqa --data_dir data/ --load_model_path /path/to/checkpoint
@@ -112,13 +116,13 @@ Similarly, to evaluate a pretrained GreaseLM model checkpoint on OpenBookQA, run
112116
CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh obqa --data_dir data/ --load_model_path /path/to/checkpoint
113117
```
114118

115-
### 6. Use your own dataset
119+
## 6. Use your own dataset
116120
- Convert your dataset to `{train,dev,test}.statement.jsonl` in .jsonl format (see `data/csqa/statement/train.statement.jsonl`)
117121
- Create a directory in `data/{yourdataset}/` to store the .jsonl files
118122
- Modify `preprocess.py` and perform subgraph extraction for your data
119123
- Modify `utils/parser_utils.py` to support your own dataset
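
Before running preprocessing on a new dataset, it can be worth checking that the converted files are well-formed JSON Lines. The sketch below is not part of this repo: `check_statement_files` is a hypothetical helper that only verifies each non-empty line parses as a JSON object. It does not check the statement schema, for which `data/csqa/statement/train.statement.jsonl` remains the reference.

```python
import json
from pathlib import Path


def check_statement_files(dataset_dir, splits=("train", "dev", "test")):
    """Return {split: record count} for the {split}.statement.jsonl files.

    Hypothetical helper: raises FileNotFoundError for a missing file and
    ValueError for a line that is not a JSON object.
    """
    counts = {}
    for split in splits:
        path = Path(dataset_dir) / f"{split}.statement.jsonl"
        n_records = 0
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError as err:
                    raise ValueError(f"{path}:{lineno}: invalid JSON ({err})") from err
                if not isinstance(record, dict):
                    raise ValueError(f"{path}:{lineno}: expected a JSON object")
                n_records += 1
        counts[split] = n_records
    return counts
```

Pass the directory that holds your `.statement.jsonl` files; the returned counts give a quick sanity check on split sizes.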
120124

121-
## Acknowledgment
125+
## 7. Acknowledgment
122126
This repo is built upon the following work:
123127
```
124128
QA-GNN: Question Answering using Language Models and Knowledge Graphs
