# HOTMapper #

This repository contains HOTMapper, a tool that allows users to manage their historical data using a mapping protocol.

Below is a simple usage tutorial. For a more complete tutorial, or to learn more about all aspects of HOTMapper, please head to our [wiki page](https://gitlab.c3sl.ufpr.br/tools/hotmapper/wikis/home).

## Table of contents ##

 [HOTMapper](#hotmapper)
 - [Data](#data)
 - [Requirements](#requirements)
 - [Installation](#installation)
 - [Command Line Interface](#command-line-interface-cli)
 - [Demo scenarios](#demo-scenarios)
     - [Demo scenario 1](#demo-scenario-1)
     - [Demo scenario 2](#demo-scenario-2)

## Data ##

The Open Data sources extracted and processed by the tool can be found at [INEP](http://portal.inep.gov.br/web/guest/microdados), in the sections "Censo Escolar" and "Censo da Educação Superior".

To make it easier to execute the tool, we have downloaded all data from "Local Oferta" into the directory `open_data`, so it is not necessary to search for the original sources.

**NOTE**: It's important to verify that the dataset has a column identifying its year.
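
One quick way to check this is to print the header row of a file, one column name per line, and look for the year column. The sketch below is only an illustration: the helper name `list_columns` is hypothetical, and it assumes that the first line of the file is the header and that the separator is `|` unless another one is given.

```bash
# list_columns FILE [SEPARATOR]
# Prints each column name found in the header (first line) of FILE on its own line.
# Hypothetical helper: assumes the first line is the header; SEPARATOR defaults to "|".
list_columns() {
    file="$1"
    sep="${2:-|}"
    # Drop a possible Windows carriage return, then split the header on the separator.
    head -n 1 "$file" | tr -d '\r' | tr "$sep" '\n'
}
```

For example, `list_columns open_data/DM_LOCAL_OFERTA_2010.CSV` lists the columns of the 2010 file so that you can confirm a year column is present.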

## Requirements ##

* Python 3 (It's recommended to use a virtual environment, such as virtualenv)
* [MonetDB](https://www.monetdb.org/Downloads) (We plan to support other databases in the future)

## Installation ##

----

**NOTICE:** We assume that Python 3.x is installed on the local computer and that all the following commands that use Python will use Python 3.x.

----

1) Install virtualenv

1a) On Linux/macOS

```bash
$ sudo -H pip3 install virtualenv
```

1b) On Windows (with administrator privileges)

```cmd
$ pip install virtualenv
```


2) Clone this repository
```bash
$ git clone git@gitlab.c3sl.ufpr.br:tools/hotmapper.git
```
or

```bash
$ git clone https://github.com/C3SL/hotmapper.git
```

3) Go to the repository

```bash
$ cd hotmapper
```

4) Create a virtual environment
 
```bash
$ virtualenv env
```

5) Start the virtual environment

5a) On Linux/macOS

```bash
$ source env/bin/activate
```

5b) On Windows (with administrator privileges)

```cmd
$ .\env\Scripts\activate
```

6) Install dependencies
 
```bash
$ pip install -r requirements.txt
```

## Command Line Interface (CLI) ##

The CLI (Command Line Interface) uses the standard actions provided by `manage.py`, which means that every command is invoked using the following pattern:

```bash
$ python manage.py [COMMAND] [POSITIONAL ARGUMENTS] [OPTIONAL ARGUMENTS]
```

Where COMMAND can be:

* create: Creates a table using the mapping protocol.

```bash
$ python manage.py create <table_name>
```

**NOTICE:** HOTMapper will use the name of the protocol as the name of the table.


* insert: Inserts a CSV file into an existing table.
116 117

```bash
$ python manage.py insert <full/path/for/the/file> <table_name> <year> [--sep separator] [--null null_value]
```

```
<full/path/for/the/file>: The absolute path of the file

<table_name>: The name of the table where the file will be inserted

<year>: The column of the mapping protocol that HOTMapper should use to insert data

[--sep separator]: The custom separator of the CSV. To change it, replace 'separator' with the token your file uses

[--null null_value]: Defines what will replace the null value. Replace 'null_value' with the value you wish to use
```
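
Because `insert` expects an absolute file path, it can be handy to derive one from a relative path instead of typing it by hand. The following is a minimal sketch using only standard shell utilities; the helper name `abs_path` is hypothetical and is not part of HOTMapper.

```bash
# abs_path FILE
# Prints the absolute path of FILE. The directory that holds FILE must exist.
# Hypothetical helper, not part of HOTMapper.
abs_path() {
    # Resolve the directory part in a subshell so the caller's directory is unchanged.
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    # Avoid printing a double slash for files that live directly under "/".
    [ "$dir" = "/" ] && dir=""
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}
```

From the repository root it could be used as `python manage.py insert "$(abs_path open_data/DM_LOCAL_OFERTA_2010.CSV)" localoferta_ens_superior 2010 --sep="|"`.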



* drop: Deletes a table from the database.

```bash
$ python manage.py drop <table_name>
```

**NOTICE:** This command does not handle foreign keys that point to the table being deleted.

* remap: Synchronizes a table with the mapping definition.

```bash
$ python manage.py remap <table_name>
```
This command should be run every time a mapping definition is updated.

The remap command allows the creation of new columns, the removal of existing columns, the renaming of columns and the modification of column types. Be aware that the larger the table, the more RAM is used.

* update_from_file: Updates the data in the table.

```bash
$ python manage.py update_from_file <csv_file> <table_name> <year> [--columns="column_name1","column_name2"] [--sep=separator]
```

* generate_pairing_report: Generates reports to compare data from different years.

```bash
$ python manage.py generate_pairing_report [--output xlsx|csv]
```

The reports will be created in the folder "pairing".


* generate_backup: Creates or updates a file to back up the database.

```bash
$ python manage.py generate_backup
```

## Demo scenarios ##

This section explains how to execute the demo scenarios that were submitted to EDBT 2019. Demo scenario 1 uses the dataset "local oferta", which is included in the directory `open_data`. Demo scenario 2 uses the dataset "matricula", which can be downloaded from [INEP](http://portal.inep.gov.br/web/guest/microdados), in the section "Censo Escolar".

In both scenarios, we assume that you started the virtual environment as explained in step 5 of the Installation section.
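
If you are unsure whether the virtual environment is active in the current shell, a small guard such as the one below can be run before the demo commands. It is only a sketch: the function name `require_venv` is hypothetical, and it relies on the `VIRTUAL_ENV` variable that the activate script sets.

```bash
# require_venv
# Succeeds when a virtual environment is active in the current shell.
# Hypothetical helper: checks the VIRTUAL_ENV variable set by the activate script.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtual environment is active; run 'source env/bin/activate' first." >&2
        return 1
    fi
    echo "Using virtual environment: $VIRTUAL_ENV"
}
```

For example, `require_venv && ./manage.py create localoferta_ens_superior` only runs the command when the environment is active.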

### Demo scenario 1 ###

This section contains the commands used in scenario 1: the creation of a new table and the insertion of the corresponding data.


1) First, create the table in the database with the following command:
```bash
$ ./manage.py create localoferta_ens_superior
```

2) Since the mapping definition already exists, insert the open data into the database with the following commands:

**NOTE:** FILEPATH is the **_full path_** of the directory that contains the repository. For example, in a Linux environment, the complete path of the 2010 file could be `/home/c3sl/HOTMapper/open_data/DM_LOCAL_OFERTA_2010.CSV`



a) To insert 2010:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2010.CSV localoferta_ens_superior 2010 --sep="|"
```

b) To insert 2011:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2011.CSV localoferta_ens_superior 2011 --sep="|"
```

c) To insert 2012:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2012.CSV localoferta_ens_superior 2012 --sep="|"
```

d) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2013.CSV localoferta_ens_superior 2013 --sep="|"
```

e) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2014.CSV localoferta_ens_superior 2014 --sep="|"
```

f) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2015.CSV localoferta_ens_superior 2015 --sep="|"
```

g) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2016.CSV localoferta_ens_superior 2016 --sep="|"
```
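
The seven inserts above differ only in the year, so they can also be generated with a loop. The sketch below is a dry run that only prints the commands. `DATA_DIR` and `print_localoferta_inserts` are hypothetical names, and the default assumes the script is run from the repository root so that the data lives in `open_data`.

```bash
# Hypothetical variable: absolute path of the directory holding the DM_LOCAL_OFERTA files.
# The default assumes the current directory is the repository root.
DATA_DIR="${DATA_DIR:-$(pwd)/open_data}"

# print_localoferta_inserts
# Prints one insert command per year from 2010 to 2016. Nothing is executed.
print_localoferta_inserts() {
    year=2010
    while [ "$year" -le 2016 ]; do
        printf './manage.py insert %s/DM_LOCAL_OFERTA_%s.CSV localoferta_ens_superior %s --sep="|"\n' \
            "$DATA_DIR" "$year" "$year"
        year=$((year + 1))
    done
}

print_localoferta_inserts
```

Once the printed commands look right, they can be executed with `print_localoferta_inserts | sh`, provided that `DATA_DIR` contains no spaces or other characters that the shell would interpret.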

### Demo scenario 2 ###

This section contains the commands used in scenario 2: an update of an existing table.


1) First, create the table in the database with the following command:
```bash
$ ./manage.py create matricula
```

2) Since the mapping protocol already exists, insert the open data into the database with the following commands:

**NOTE:** FILEPATH is the **_full path_** of the directory that contains the open data files. For example, in a Linux environment, the complete path of the 2013 file could be `/home/c3sl/HOTMapper/open_data/MATRICULA_2013.CSV`

a) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2013.CSV matricula 2013 --sep="|"
```

b) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2014.CSV matricula 2014 --sep="|"
```

c) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2015.CSV matricula 2015 --sep="|"
```

d) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/MATRICULA_2016.CSV matricula 2016 --sep="|"
```

3) Change the mapping protocol of `matricula`. You can use `matricula_remap.csv`: rename the current `matricula.csv` to something else, then rename `matricula_remap.csv` to `matricula.csv`. In that case, the only column that changes is "profissionalizante": its `ELSE` branch now returns `9` instead of `0`.
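
The renaming described in step 3 can be sketched as a small helper. The function name `swap_mapping` and the backup name `matricula_original.csv` are hypothetical, and the directory that holds the mapping protocol files is passed as an argument because its location is not stated here.

```bash
# swap_mapping DIR
# Keeps a copy of the current matricula.csv and puts matricula_remap.csv in its place.
# Hypothetical helper: DIR is the directory that holds both mapping protocol files.
swap_mapping() {
    dir="$1"
    # Refuse to run unless both files are present, so nothing is half-renamed.
    [ -f "$dir/matricula.csv" ] && [ -f "$dir/matricula_remap.csv" ] || {
        echo "Expected matricula.csv and matricula_remap.csv in $dir" >&2
        return 1
    }
    mv "$dir/matricula.csv" "$dir/matricula_original.csv"
    mv "$dir/matricula_remap.csv" "$dir/matricula.csv"
}
```

Keeping the original file under a different name makes it easy to restore the previous mapping later.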

4) Run the remap command

```bash
$ ./manage.py remap matricula
```
The above command will update the table `Fonte` and the schema of the table `matricula`.

5) Update the table

```bash
$ ./manage.py update_from_file FILEPATH/MATRICULA_2013.CSV matricula 2013 --columns="profissionalizante" --sep="|"
```

The above command will update the data in the table `matricula`.