|
|
|
# HOTMapper Tutorial #WIP
|
|
|
|
Here you'll learn how to set your environment, create your table, and insert data into it.
|
|
|
|
|
|
|
|
## Summary
|
|
|
|
* [1. Setting your environment](#1-setting-your-environment)
|
|
|
|
* [2. Adjusting your HOTMapper Settings](#2-adjusting-your-hotmapper-settings)
|
|
|
|
* [3. Creating your table definition](#3-creating-your-table-definition)
|
|
|
|
* [4. Creating your mapping protocol](#4-creating-your-mapping-protocol)
|
|
|
|
* [5. Creating the table in the database](#5-creating-the-table-in-the-database)
|
|
|
|
* [6. Inserting the data into the table](#6-inserting-the-data-into-the-table)
|
|
|
|
* [7. Creating, removing, renaming columns](#7-creating-removing-renaming-columns)
|
|
|
|
* [8. Updating the table data](#8-updating-the-table-data)
|
|
|
|
|
|
|
|
## 1. Setting your environment
|
|
|
|
In this section we'll look at how to get MonetDB working with hotmapper.
|
|
|
|
#### Important:
|
|
|
|
The MonetDB Apr2019 (11.33.3) version should be avoided, because it has a bug when inserting into a table with multiple constraints. It's already fixed in their nightly build so you should be fine using any superior version or the August2018 one.
|
|
|
|
|
|
|
|
#### Getting MonetDB
|
|
|
|
You can get your MonetDB installation following [their instructions](https://www.monetdb.org/Downloads).
|
|
|
|
|
|
|
|
Be sure to create and start your farm, if you didn't already, using:
|
|
|
|
~~~Bash
|
|
|
|
$ monetdbd create /path/to/my-dbfarm
|
|
|
|
$ monetdbd start /path/to/my-dbfarm
|
|
|
|
~~~
|
|
|
|
If you didn't set a systemctl command for your farm remember to start it every time you reboot your OS.
|
|
|
|
|
|
|
|
To create a new database, and remove it from maintenance mode(to avoid a very high memory usage):
|
|
|
|
~~~Bash
|
|
|
|
$ monetdb create hotmapper-demo
|
|
|
|
$ monetdb release hotmapper-demo
|
|
|
|
~~~
|
|
|
|
|
|
|
|
#### Getting HOTMapper dependencies
|
|
|
|
Let's create a new virtual environment and install the requirements.
|
|
|
|
|
|
|
|
Be sure that you have installed pip3 and python 3.5+.
|
|
|
|
|
|
|
|
Install python3-venv, if you don't have it, using the command bellow.
|
|
|
|
~~~Bash
|
|
|
|
$ sudo apt-get install python3-venv
|
|
|
|
~~~
|
|
|
|
|
|
|
|
Inside the hotmapper folder execute the following commands:
|
|
|
|
~~~Bash
|
|
|
|
$ python3 -m venv env
|
|
|
|
$ source env/bin/activate
|
|
|
|
$ pip install -r requirements.txt
|
|
|
|
~~~
|
|
|
|
|
|
|
|
## 2. Adjusting your HOTMapper Settings
|
|
|
|
Now that you have a working monetdb database it's needed to verify the HOTMapper settings to be sure it'll work with it.
|
|
|
|
|
|
|
|
First, using your favourite editor, open the file settings.py contained inside your HOTMapper folder.
|
|
|
|
|
|
|
|
Set the DATABASE variable to yours.
|
|
|
|
|
|
|
|
DATABASE = 'hotmapper_demo'
|
|
|
|
|
|
|
|
Verify the locations of your mapping protocols folder and table definitions. If you cloned the master repository it
|
|
|
|
should already be correct and the folders already exist.
|
|
|
|
|
|
|
|
MAPPING_PROTOCOLS_FOLDER = 'mapping_protocols'
|
|
|
|
TABLE_DEFINITIONS_FOLDER = 'table_definitions'
|
|
|
|
|
|
|
|
## 3. Creating your table definition
|
|
|
|
The HOTMapper tool uses two files to create a table, a json inside the table_definitions folder and a csv present on the
|
|
|
|
mapping_protocols folder. Both should have as name of the file: "table name + extension". Ex: table_test.json,
|
|
|
|
table_test.csv.
|
|
|
|
|
|
|
|
The table_definitions json has the objective of storing the table constraints and information about the source of the
|
|
|
|
table and it's description.
|
|
|
|
|
|
|
|
Let's create a json for a table_test:
|
|
|
|
* Go to the table_definitions folder and create a file named table_test.json.
|
|
|
|
* Open the file with your favourite editor and create the base object as shown bellow.
|
|
|
|
~~~json
|
|
|
|
{
|
|
|
|
"pairing_description": "A string with the description of what this table stores.",
|
|
|
|
"data_source": "A string with the source of the data.",
|
|
|
|
"pk": ["A list with the primary key columns."],
|
|
|
|
"foreign_keys": ["A list with the foreign key columns."]
|
|
|
|
}
|
|
|
|
~~~
|
|
|
|
* Let's fill the json with the information of table_test:
|
|
|
|
~~~json
|
|
|
|
{
|
|
|
|
"pairing_description": "Table with the graduation rates, which represent a percentage of students expected to graduate, used to learn the HOTMapper Tool",
|
|
|
|
"data_source": "Open data made by Sustainable Development Solutions Network",
|
|
|
|
"pk": ["cod_country", "year", "gender", "isced"],
|
|
|
|
"foreign_keys": []
|
|
|
|
}
|
|
|
|
~~~
|
|
|
|
|
|
|
|
## 4. Creating your mapping protocol
|
|
|
|
The map protocol file is used to store the information about the table columns and to make the link between the data
|
|
|
|
from the csv and the database. This file is stored inside the mapping_protocols folder and uses as name the "table name
|
|
|
|
\+ .csv"
|
|
|
|
|
|
|
|
This csv must contain the following columns:
|
|
|
|
* Var. Lab -- An unique identifier for the column, this represents a specific column for the HOTMapper and shouldn't
|
|
|
|
be changed after table creation.
|
|
|
|
* Novo Rótulo -- A description of this column.
|
|
|
|
* Nome Banco -- The name of the column in the database.
|
|
|
|
* Tipo de Dado -- Type of the column in the database.
|
|
|
|
* \*[YEAR] - Eg.: 2010, 2011, ... -- A temporal identifier to help insertion of data from various years, which can have a different mapping
|
|
|
|
from the standard one. It has the name of the column in the data csv.
|
|
|
|
|
|
|
|
Let's create a mapping protocol for the table_test:
|
|
|
|
|
|
|
|
* First create a file named table_test.csv inside the mapping_protocols folder.
|
|
|
|
* Now, in the row number 1, create the following headers for the columns A-G: Var.Lab, Novo Rótulo, Nome Banco,
|
|
|
|
Tipo de Dado, 2016.
|
|
|
|
* If you have any doubt, inside your mapping_protocols folder should have a file named empty_map_protocol.csv
|
|
|
|
which you can use as a model.
|
|
|
|
|
|
|
|
For this example, we'll insert data with the graduation rates as percentage of students expected to graduate, the file
|
|
|
|
is EAD_GRAD_RATES_2016.csv which is inside the folder: hotmapper/open_data/.
|
|
|
|
|
|
|
|
Data taken from the Organisation for Economic Co-operation and Development. (https://stats.oecd.org/Index.aspx?datasetcode=EAG_GRAD_ENTR_RATES#) accessed in 2019-08-26
|
|
|
|
|
|
|
|
Let's create the columns.
|
|
|
|
|
|
|
|
* First, let's fill the protocol for the cod_country.
|
|
|
|
* In Var.Lab we'll use as an unique identifier: CODECOUNTRY
|
|
|
|
* Novo Rótulo: Code of the country
|
|
|
|
* Nome Banco: cod_country
|
|
|
|
* Tipo de Dado: VARCHAR(3)
|
|
|
|
* 2016: COUNTRY
|
|
|
|
* Now let's follow a similiar formula for the columns: Country, Year, Value, ISC11_LEVEL_CAT, SEX. You should have
|
|
|
|
a protocol similar with the one bellow.
|
|
|
|
![hmp1](uploads/c2c7eee2748e3f64bda569e379bf5c57/hmp1.png)
|
|
|
|
* Now, lets create a column called sex_id where we'll store a TINYINT representing the sex. 1 = Female, 2 = Male.
|
|
|
|
* For that we'll use a case. For the hotmapper recognize that we wanna make an derivative we must start the mapping
|
|
|
|
column (2016) if a "~" (without quotes).
|
|
|
|
* So, in the 2016 column for sex_id the protocol will be: ~CASE WHEN ("SEX" = 'F') THEN 0 ELSE 1 END
|
|
|
|
![hmp2](uploads/b7a4d36d4ed6b7e757a835f56b4423c7/hmp2.png)
|
|
|
|
|
|
|
|
## 5. Creating the table in the database
|
|
|
|
|
|
|
|
To create the table in your database we'll execute the hotmapper command 'create'. In the following way:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py create <table_name>
|
|
|
|
~~~
|
|
|
|
So, for the table test:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py create table_test
|
|
|
|
~~~
|
|
|
|
|
|
|
|
## 6. Inserting the data into the table
|
|
|
|
|
|
|
|
To insert the csv data into the table, we'll use the hotmapper command 'insert' in the following way:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py insert <file_location> <table_name> <year> --sep=<optional, csv separator>
|
|
|
|
~~~
|
|
|
|
For the table test:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py insert ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"
|
|
|
|
~~~
|
|
|
|
|
|
|
|
## 7. Creating, removing, renaming columns
|
|
|
|
|
|
|
|
* To create a new column, first add it to the protocol the same way as the others.
|
|
|
|
* To remove a column, first delete it from the protocol.
|
|
|
|
* To rename a column, change the "Nome Banco" (name of the column) in the protocol.
|
|
|
|
|
|
|
|
After that, execute the hotmapper command 'remap' in the following way:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py remap <table_name>
|
|
|
|
~~~
|
|
|
|
For the table test:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py create table_test
|
|
|
|
~~~
|
|
|
|
|
|
|
|
Now you'll have to update the data in the table
|
|
|
|
|
|
|
|
## 8. Updating the table data
|
|
|
|
|
|
|
|
If your data csv suffered any modifications or you changed the table by adding, removing or renaming a column, you'll
|
|
|
|
have to update the data of the table. You'll can do that using the hotmapper command 'update_from_file' in a similar way
|
|
|
|
to the insert command:
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py update_from_file <file_location> <table_name> <year> --sep=<optional, csv separator>
|
|
|
|
~~~
|
|
|
|
~~~bash
|
|
|
|
$ ./manage.py update_from_file ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"
|
|
|
|
~~~ |
|
|
|
\ No newline at end of file |