# HOTMapper #

This repository contains HOTMapper, a tool that allows users to manage their historical data using a mapping protocol.

Below is a simple usage tutorial. For a more complete tutorial, or to learn more about all aspects of HOTMapper, please head to our [wiki page](https://gitlab.c3sl.ufpr.br/tools/hotmapper/wikis/home).

## Table of contents ##

[HOTMapper](#hotmapper)
 - [Data](#data)
 - [Requirements](#requirements)
 - [Installation](#installation)
 - [Command Line Interface](#command-line-interface-cli)
 - [Demo scenarios](#demo-scenarios)
     - [Demo scenario 1](#demo-scenario-1)
     - [Demo scenario 2](#demo-scenario-2)
 - [Publications](#publications)

## Data ##

The open data sources extracted and processed by the tool can be found on the [INEP](http://portal.inep.gov.br/web/guest/microdados) website, in the sections "Censo Escolar" and "Censo da Educação Superior".

To make it easier to run the tool, we have downloaded all the "Local Oferta" data into the `open_data` directory, so it is not necessary to search for the original sources.

**NOTE**: It is important to verify whether there is a column identifying the year of the dataset.

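One quick way to follow this note is to print the header of a CSV file one column per line and search it. The sketch below is a hypothetical helper, not part of HOTMapper: it assumes the file is separated by `|` (as the `open_data` files used in the demo scenarios are) and that the year column's name contains `ANO` (Portuguese for "year"), which may also match unrelated names, so check the output by eye.

```bash
# Hypothetical helper (not part of HOTMapper): print the header columns of a
# CSV file whose names contain "ano" (case-insensitive), with their positions.
# Assumes INEP-style naming for the year column, e.g. NU_ANO_CENSO.
# Usage: find_year_columns FILE [SEPARATOR]   (SEPARATOR defaults to "|")
find_year_columns() {
    local file="$1"
    local sep="${2:-|}"
    # Take the first line, drop Windows line endings, put one column per
    # line, then number the lines and keep the matching column names.
    head -n 1 "$file" | tr -d '\r' | tr "$sep" '\n' | grep -n -i 'ano'
}
```

For example, `find_year_columns open_data/DM_LOCAL_OFERTA_2010.CSV` prints lines such as `1:NU_ANO_CENSO`; if nothing is printed, no candidate year column was found.
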
## Requirements ##

* Python 3 (It's recommended to use a virtual environment, such as virtualenv)
* [MonetDB](https://www.monetdb.org/Downloads) (we plan to support other databases in the future)

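Before continuing, you can check that both requirements are reachable from your shell. The sketch below uses a hypothetical helper, not part of HOTMapper: it assumes Python 3 is available as `python3` and that your MonetDB installation provides the `mclient` command-line client, which the standard MonetDB packages typically include.

```bash
# Hypothetical helper: report which of the given commands are missing from
# PATH. With no arguments it checks python3 and mclient (MonetDB's client).
# Returns 0 when every command is found, 1 otherwise.
check_requirements() {
    local missing=0
    local cmd
    if [ "$#" -eq 0 ]; then
        set -- python3 mclient
    fi
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_requirements` with no arguments prints one line per command and returns a non-zero status if anything is missing.
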
## Installation ##

----
**NOTICE:**
We assume that Python 3.x is installed on the local computer and that all of the following commands that use Python will run Python 3.x.

----

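Because the commands in this README call `python` rather than `python3`, a quick way to confirm the assumption above is to ask the interpreter for its major version. This is a minimal sketch; `is_python3` is a hypothetical helper, not part of HOTMapper. Inside the virtual environment created in step 4, `python` should normally point to Python 3, since virtualenv is installed with `pip3` in step 1a.

```bash
# Hypothetical helper: succeed if the given interpreter (default: python)
# is Python 3.x, and fail otherwise, including when it is not installed.
is_python3() {
    "${1:-python}" -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' 2>/dev/null
}

if is_python3; then
    echo "python runs Python 3"
else
    echo "python is missing or is not Python 3; use python3 or activate the virtual environment" >&2
fi
```
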
1) Install virtualenv

1a) On Linux/macOS

```bash
$ sudo -H pip3 install virtualenv
```

1b) On Windows (with administrator privileges)

```cmd
$ pip install virtualenv
```

2) Clone this repository
```bash
$ git clone git@gitlab.c3sl.ufpr.br:tools/hotmapper.git
```
or

```bash
$ git clone https://github.com/C3SL/hotmapper.git
```

3) Go to the repository

```bash
$ cd hotmapper
```

4) Create a virtual environment
 
```bash
$ virtualenv env
```

5) Start the virtual environment

5a) On Linux/macOS

```bash
$ source env/bin/activate
```

5b) On Windows (with administrator privileges)

```cmd
$ .\env\Scripts\activate
```

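On Linux/macOS, you can check whether the environment is active by looking at the `VIRTUAL_ENV` variable, which virtualenv's activate script exports (and `deactivate` unsets). A minimal sketch with a hypothetical helper:

```bash
# Hypothetical helper: report whether a virtual environment is active.
# virtualenv's activate script sets VIRTUAL_ENV to the environment's path.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "not active"
        return 1
    fi
}
```

After `source env/bin/activate`, `venv_status` should print the path of the `env` directory inside your clone.
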
6) Install dependencies
 
```bash
$ pip install -r requirements.txt
```

## Command Line Interface (CLI) ##

The CLI uses the standard actions provided by `manage.py`, so every command follows this pattern:

```bash
$ python manage.py [COMMAND] [POSITIONAL ARGUMENTS] [OPTIONAL ARGUMENTS]
```

Where `COMMAND` can be one of the following:

* create: Creates a table using the mapping protocol.

```bash
$ python manage.py create <table_name>
```

**NOTICE:** HOTMapper uses the name of the mapping protocol as the name of the table.

* insert: Inserts a CSV file into an existing table.

```bash
$ python manage.py insert <full/path/for/the/file> <table_name> <year> [--sep separator] [--null null_value]
```

```
<full/path/for/the/file>: The absolute path of the file

<table_name>: The name of the table into which the file will be inserted

<year>: The column of the mapping protocol that HOTMapper should use to insert the data

[--sep separator]: The custom separator of the CSV file. Replace 'separator' with the token your file uses

[--null null_value]: Defines the value that will replace null values. Replace 'null_value' with the value you want to use

```

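`insert` expects an absolute file path. If you are working with a relative path (for example, from inside the repository directory), a small hypothetical helper such as `abs_path` below can turn it into an absolute one before calling `manage.py`. It is a sketch, not part of HOTMapper, and it does not normalize `..` components.

```bash
# Hypothetical helper: print an absolute version of a file path.
# Paths that already start with "/" are printed unchanged; relative paths
# are prefixed with the current working directory.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)
            local dir
            dir="$(pwd)"
            # Strip a trailing "/" so that running from "/" does not give "//".
            printf '%s/%s\n' "${dir%/}" "$1"
            ;;
    esac
}

# Example, run from inside the repository directory:
# python manage.py insert "$(abs_path open_data/DM_LOCAL_OFERTA_2010.CSV)" localoferta_ens_superior 2010 --sep="|"
```
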


* drop: Deletes a table from the database.

```bash
$ python manage.py drop <table_name>
```

**NOTICE:** The command does not handle foreign keys that point to the table being deleted.

* remap: Synchronizes a table with the mapping definition.

```bash
$ python manage.py remap <table_name>
```
This command should be run every time a mapping definition is updated.

The remap command can create new columns, delete existing columns, rename columns and change column types. Be aware that the larger the table, the more RAM the command uses.

* update_from_file: Updates the data in a table from a CSV file.

```bash
$ python manage.py update_from_file <csv_file> <table_name> <year> [--columns="column_name1","column_name2"] [--sep=separator]
```

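In the synopsis above, the shell joins `--columns="column_name1","column_name2"` into the single argument `--columns=column_name1,column_name2`, so the columns reach `manage.py` as one comma-separated list. If you keep the column names in a shell array, a hypothetical helper such as `join_columns` builds the same value; it is a sketch, not part of HOTMapper.

```bash
# Hypothetical helper: join column names with commas, producing the same
# value as --columns="column_name1","column_name2" after shell quote removal.
join_columns() {
    local IFS=,
    printf '%s\n' "$*"
}

# Example (FILEPATH is a placeholder, as in the demo scenarios):
# columns=(profissionalizante)
# ./manage.py update_from_file FILEPATH/MATRICULA_2013.CSV matricula 2013 --columns="$(join_columns "${columns[@]}")" --sep="|"
```
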
* generate_pairing_report: Generates reports that compare data from different years.

```bash
$ python manage.py generate_pairing_report [--output xlsx|csv]
```

The reports are created in the `pairing` folder.

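To find the most recently generated report, you can sort the contents of the `pairing` folder by modification time. `latest_report` is a hypothetical helper, and it assumes you run it from the directory where `manage.py` was executed, so that `pairing` is a relative path.

```bash
# Hypothetical helper: print the name of the newest file in a directory
# (default: pairing), using modification time to sort.
latest_report() {
    ls -t "${1:-pairing}" | head -n 1
}
```
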

* generate_backup: Creates or updates a file used to back up the database.

```bash
$ python manage.py generate_backup
```
## Demo scenarios ##

In this section, we explain how to execute the demo scenarios that were submitted to EDBT 2019. Demo scenario 1 uses the dataset "local oferta", which is included in the directory `open_data`. Demo scenario 2 uses the dataset "matricula", which can be downloaded from [INEP's website](http://portal.inep.gov.br/web/guest/microdados) in the section "Censo Escolar".

In both scenarios, we assume that you have activated the virtual environment as explained in step 5 of the [Installation](#installation) section.

### Demo scenario 1 ###

This section contains the commands used in scenario 1: the creation of a new table and the insertion of the corresponding data.

1) First, we create the table in the database by executing the following command:
```bash
$ ./manage.py create localoferta_ens_superior
```

2) Since we already have the mapping definition, we now insert the open data into the database by executing the following commands:

**NOTE:** FILEPATH is the **_full path_** of the directory that contains your clone of the repository. For example (in a Linux environment), if the repository was cloned into `/home/c3sl/hotmapper`, FILEPATH is `/home/c3sl` and the 2010 file is `/home/c3sl/hotmapper/open_data/DM_LOCAL_OFERTA_2010.CSV`.

a) To insert 2010:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2010.CSV localoferta_ens_superior 2010 --sep="|" 
```

b) To insert 2011:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2011.CSV localoferta_ens_superior 2011 --sep="|" 
```

c) To insert 2012:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2012.CSV localoferta_ens_superior 2012 --sep="|" 
```

d) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2013.CSV localoferta_ens_superior 2013 --sep="|" 
```

e) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2014.CSV localoferta_ens_superior 2014 --sep="|" 
```

f) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2015.CSV localoferta_ens_superior 2015 --sep="|" 
```

g) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2016.CSV localoferta_ens_superior 2016 --sep="|" 
```

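Since the seven insert commands differ only in the year, they can also be run as a loop. This is a minimal sketch, not part of HOTMapper: `insert_local_oferta` is a hypothetical helper that takes FILEPATH as its argument, reuses exactly the arguments of the commands above, and stops at the first failing insert. Run it from the repository directory so that `./manage.py` resolves.

```bash
# Hypothetical helper: insert every "Local Oferta" year from 2010 to 2016.
# $1 is FILEPATH, the directory that contains the hotmapper clone.
insert_local_oferta() {
    local filepath="$1"
    local year
    for year in 2010 2011 2012 2013 2014 2015 2016; do
        ./manage.py insert "$filepath/hotmapper/open_data/DM_LOCAL_OFERTA_${year}.CSV" \
            localoferta_ens_superior "$year" --sep="|" || return 1
    done
}

# Example, from inside the repository directory (its parent is FILEPATH):
# insert_local_oferta "$(cd .. && pwd)"
```

The same pattern can be adapted to the four "matricula" inserts of Demo scenario 2 by changing the year list, file name and table name.
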
### Demo scenario 2 ###

This section contains the commands used in scenario 2: the update of an existing table.

1) First, we create the table in the database by executing the following command:
```bash
$ ./manage.py create matricula
```

2) Since we already have the mapping protocol, we now insert the open data into the database by executing the following commands:

**NOTE:** FILEPATH is the **_full path_** of the directory where you saved the downloaded "matricula" files, for example (in a Linux environment) `/home/c3sl/HOTMapper/open_data`, so that the 2013 file is `/home/c3sl/HOTMapper/open_data/MATRICULA_2013.CSV`.

a) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2013.CSV matricula 2013 --sep="|" 
```

b) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2014.CSV matricula 2014 --sep="|" 
```

c) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2015.CSV matricula 2015 --sep="|" 
```

d) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2016.CSV matricula 2016 --sep="|" 
```

3) Change the matricula mapping protocol. You can use `matricula_remap.csv`: rename the current `matricula.csv` to something else, and rename `matricula_remap.csv` to `matricula.csv`. In that case, the only column that changes is "profissionalizante", because its `ELSE` branch now returns `9` instead of `0`.

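The renaming in step 3 can be done with two `mv` commands. The sketch below is hypothetical and not part of HOTMapper: the directory argument stands for wherever `matricula.csv` and `matricula_remap.csv` live in your clone, and `matricula_original.csv` is only an example name for the backup of the current protocol. The helper refuses to run if that backup already exists, so a second run cannot overwrite the original protocol.

```bash
# Hypothetical helper: back up the current matricula protocol and put the
# remap version in its place. $1 is the directory holding both CSV files.
swap_matricula_protocol() {
    local dir="$1"
    if [ -e "$dir/matricula_original.csv" ]; then
        echo "backup already exists: $dir/matricula_original.csv" >&2
        return 1
    fi
    mv "$dir/matricula.csv" "$dir/matricula_original.csv" &&
        mv "$dir/matricula_remap.csv" "$dir/matricula.csv"
}

# Example (MAPPING_DIR is a placeholder for your mapping directory):
# swap_matricula_protocol "$MAPPING_DIR"
```
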
4) Run the remap command

```bash
$ ./manage.py remap matricula
```
The above command updates the table `Fonte` and the schema of the table `matricula`.

5) Update the table

```bash
$ ./manage.py update_from_file FILEPATH/MATRICULA_2013.CSV matricula 2013 --columns="profissionalizante" --sep="|"
```

The above command will update the data in the table `matricula`.

## Publications ##
* Henrique Varella Ehrenfried, Eduardo Todt, Daniel Weingaertner, Luis Carlos Erpen de Bona, Fabiano Silva, and Marcos Didonet Del Fabro. Managing Open Data Evolution through Bi-dimensional Mappings. IEEE/ACM BDCAT ’19, pp. 159-162, December 2–5, 2019, Auckland, New Zealand. [Extended version available](http://www.inf.ufpr.br/didonet/articles/2019_HotMapper_report.pdf)
* Henrique Varella Ehrenfried, Rudolf Eckelberg, Hamer Iboshi, Eduardo Todt, Daniel Weingaertner, and Marcos Didonet Del Fabro. HOTMapper: Historical Open Data Table Mapper. EDBT 2019, Demo paper, pp. 550-553, March 2019, Lisbon, Portugal. [Open Proceedings available](http://openproceedings.org/2019/conf/edbt/EDBT19_paper_231.pdf)