Commit 382f81d2 authored by Henrique Varella Ehrenfried

Update readme

parent c809f67a
......@@ -4,11 +4,11 @@ This respository was created in order to make available the HOTMapper, a tool th
## Data ##
The dataset "Matrícula" can be found at the link: [INEP](http://portal.inep.gov.br/web/guest/microdados) in the section "Censo Escolar".
The original open data set can be found at the link: [INEP](http://portal.inep.gov.br/web/guest/microdados) in the section "Censo Escolar" and "Censo da Educação Superior".
The dataset "Local Oferta" can be found at the same link, in the section "Censo da Educação Superior". Additionally, for convenience, all data from "Local Oferta" is available in the directory open_data.
Additionally, for convenience, all data from "Local Oferta" is available in the directory open_data.
**NOTE**: It's important that you ta
**NOTE**: It's important that you verify if there is a column identifying the year of the dataset;
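One way to check for a year column before inserting a file is to inspect its header row. The sketch below is only an illustration: the helper `has_column` is hypothetical (it is not part of HOTMapper), it assumes the first line of the file is the header, and the column name you pass in depends on the dataset you downloaded.

```bash
#!/usr/bin/env bash
# Sketch: report whether a delimited file has a given column in its header.
# Assumptions: the first line is the header row and the separator is a
# single character. has_column is a hypothetical helper, not a HOTMapper command.
has_column() {
  local file="$1" sep="$2" name="$3"
  # Split the header into one column name per line, drop quotes and
  # carriage returns, then look for an exact match.
  head -n 1 "${file}" | tr "${sep}" '\n' | tr -d '\r"' | grep -qxF "${name}"
}

# Example call (placeholder file and column name, adjust to your data):
#   has_column FILEPATH/DM_LOCAL_OFERTA_2010.CSV "|" "YEAR_COLUMN_NAME" \
#     && echo "year column found" || echo "year column missing"
```

The function returns a zero exit status when the column is present, so it can be used directly in an `if` or with `&&`.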
## Requirements ##
......@@ -74,7 +74,7 @@ $ .\env\Scripts/activate
$ pip install -r requirements.txt
```
## Interface de linha de comando ##
## Command Line Interface ##
The CLI (Command Line Interface) follows the standard of the manage.py package, which means that to invoke a command you should use the following pattern:
......@@ -131,6 +131,12 @@ You should use this command everytime a mapping protocol is updated.
The remap allows the creation of new columns, the dropping of existing columns, the renaming of columns and the change of column types. Be aware that the bigger the table, the bigger the usage of RAM.
* update_from_file: Update the data in the table
```bash
$ python manage.py update_from_file <csv_file> <table_name> <year> [--columns="column_name1","column_name2"] [--sep=separator]
```
* generate_pairing_report: generate reports to compare data from different years.
```bash
......@@ -144,4 +150,110 @@ The reports will be created in the folder "pairing"
```bash
$ python manage.py generate_backup
```
\ No newline at end of file
```
## Demo scenarios ##
In this section we explain how to run the demo. Demo scenario 1 uses the dataset "local oferta", which is included in the directory open_data. Demo scenario 2 uses the dataset "matricula", which can be downloaded from [INEP](http://portal.inep.gov.br/web/guest/microdados) in the section "Censo Escolar".
In both scenarios, we assume that you have started the virtual environment as explained in Section `Installation - 5`.
### Demo scenario 1: ###
This section contains the commands used in scenario 1, which is the creation of a new data source and the inclusion of the corresponding data.
1) First, we need to create the database. To do so, execute the following command:
```bash
$ ./manage.py create localoferta_ens_superior
```
2) Now, as we already have the mapping protocol, we need to insert the open data into the database. To do so, execute the following commands:
**NOTE:** FILEPATH is the **_full path_** to the directory where the open data table is located, for example (in a Linux environment): `/home/c3sl/HOTMapper/open_data/DM_LOCAL_OFERTA_2010`
a) To insert 2010:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2010.CSV localoferta_ens_superior 2010 --sep="|"
```
b) To insert 2011:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2011.CSV localoferta_ens_superior 2011 --sep="|"
```
c) To insert 2012:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2012.CSV localoferta_ens_superior 2012 --sep="|"
```
d) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2013.CSV localoferta_ens_superior 2013 --sep="|"
```
e) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2014.CSV localoferta_ens_superior 2014 --sep="|"
```
f) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2015.CSV localoferta_ens_superior 2015 --sep="|"
```
g) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2016.CSV localoferta_ens_superior 2016 --sep="|"
```
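The seven insert commands above differ only in the year, so they can be generated with a loop. The following is a minimal sketch, not part of HOTMapper: the `insert_year` helper and the `DRY_RUN` switch are hypothetical, and it assumes all CSV files sit in a single directory named by `FILEPATH` (if each year has its own directory, set the path inside the loop instead). By default it only prints the commands so you can review them first.

```bash
#!/usr/bin/env bash
# Sketch: print (or run) the scenario 1 insert commands for 2010 to 2016.
# Assumptions: FILEPATH holds every DM_LOCAL_OFERTA_<year>.CSV file;
# insert_year and DRY_RUN are hypothetical names used for illustration.
FILEPATH="${FILEPATH:-$PWD/open_data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

insert_year() {
  local year="$1"
  local csv="${FILEPATH}/DM_LOCAL_OFERTA_${year}.CSV"
  if [ "${DRY_RUN}" = "1" ]; then
    # Print the exact command documented above for this year.
    printf './manage.py insert %s localoferta_ens_superior %s --sep="|"\n' "${csv}" "${year}"
  else
    ./manage.py insert "${csv}" localoferta_ens_superior "${year}" --sep="|"
  fi
}

for year in 2010 2011 2012 2013 2014 2015 2016; do
  insert_year "${year}"
done
```

Run it with `DRY_RUN=0` from the directory that contains `manage.py` once the printed commands look right.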
### Demo scenario 2: ###
This section contains the commands used in scenario 2, which is the update of an existing data source.
1) First, we need to create the database. To do so, execute the following command:
```bash
$ ./manage.py create localoferta_ens_superior
```
2) Now, as we already have the mapping protocol, we need to insert the open data into the database. To do so, execute the following commands:
**NOTE:** FILEPATH is the **_full path_** to the directory where the open data table is located, for example (in a Linux environment): `/home/c3sl/HOTMapper/open_data/DM_LOCAL_OFERTA_2010`
a) To insert 2013:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2013.CSV localoferta_ens_superior 2013 --sep="|"
```
b) To insert 2014:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2014.CSV localoferta_ens_superior 2014 --sep="|"
```
c) To insert 2015:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2015.CSV localoferta_ens_superior 2015 --sep="|"
```
d) To insert 2016:
```bash
$ ./manage.py insert FILEPATH/DM_LOCAL_OFERTA_2016.CSV localoferta_ens_superior 2016 --sep="|"
```
3) Change the mapping protocol of matricula. You can use `matricula_remap.csv`: rename the current `matricula.csv` to something else, then rename `matricula_remap.csv` to `matricula.csv`. In that case, the only column that changes is "fundamental_af" in the year 2013.
4) Run the remap command:
```bash
$ ./manage.py remap matricula
```
The above command will update the table `Fonte` and the schema of the table matricula.
5) Update the table:
```bash
$ ./manage.py update_from_file /FILEPATH/2013_MATRICULA.csv matricula 2013 --columns="fundamental_af" --sep="|"
```
The above command will update the data in the table matricula.
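The file renaming described in step 3 can be scripted so that the original protocol is kept as a backup. This is a sketch under stated assumptions: `swap_protocol`, the `PROTOCOL_DIR` variable and the backup name `matricula_original.csv` are hypothetical, and the directory holding the mapping protocol files has to be supplied by you.

```bash
#!/usr/bin/env bash
# Sketch: replace matricula.csv with matricula_remap.csv, keeping a backup.
# Assumptions: swap_protocol, PROTOCOL_DIR and the backup file name are
# hypothetical; point PROTOCOL_DIR at the directory holding both CSV files.
swap_protocol() {
  local dir="$1"
  # Stop early if a file is missing or a previous backup would be overwritten.
  [ -f "${dir}/matricula.csv" ] || { echo "missing matricula.csv" >&2; return 1; }
  [ -f "${dir}/matricula_remap.csv" ] || { echo "missing matricula_remap.csv" >&2; return 1; }
  [ ! -e "${dir}/matricula_original.csv" ] || { echo "backup already exists" >&2; return 1; }
  mv "${dir}/matricula.csv" "${dir}/matricula_original.csv"
  mv "${dir}/matricula_remap.csv" "${dir}/matricula.csv"
}

# Only act when the caller has provided a directory.
if [ -n "${PROTOCOL_DIR:-}" ]; then
  swap_protocol "${PROTOCOL_DIR}"
fi
```

After the swap succeeds, continue with the `remap` and `update_from_file` commands from steps 4 and 5.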