Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • HOTMapper HOTMapper
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 25
    • Issues 25
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • HOTMapper
  • HOTMapperHOTMapper
  • Wiki
  • Tutorial

Tutorial · Changes

Page history
Renamed tutorial to /Tutorial authored Sep 19, 2019 by jvfpw18's avatar jvfpw18
Hide whitespace changes
Inline Side-by-side
Tutorial.md 0 → 100644
View page @ 422978d3
# HOTMapper Tutorial #WIP
Here you'll learn how to set your environment, create your table, and insert data into it.
## Summary
* [1. Setting your environment](#1-setting-your-environment)
* [2. Adjusting your HOTMapper Settings](#2-adjusting-your-hotmapper-settings)
* [3. Creating your table definition](#3-creating-your-table-definition)
* [4. Creating your mapping protocol](#4-creating-your-mapping-protocol)
* [5. Creating the table in the database](#5-creating-the-table-in-the-database)
* [6. Inserting the data into the table](#6-inserting-the-data-into-the-table)
* [7. Creating, removing, renaming columns](#7-creating-removing-renaming-columns)
* [8. Updating the table data](#8-updating-the-table-data)
## 1. Setting your environment
In this section we'll look at how to get MonetDB working with hotmapper.
#### Important:
The MonetDB Apr2019 (11.33.3) version should be avoided, because it has a bug when inserting into a table with multiple constraints. It's already fixed in their nightly build so you should be fine using any superior version or the August2018 one.
#### Getting MonetDB
You can get your MonetDB installation following [their instructions](https://www.monetdb.org/Downloads).
Be sure to create and start your farm, if you didn't already, using:
~~~Bash
$ monetdbd create /path/to/my-dbfarm
$ monetdbd start /path/to/my-dbfarm
~~~
If you didn't set a systemctl command for your farm remember to start it every time you reboot your OS.
To create a new database, and remove it from maintenance mode(to avoid a very high memory usage):
~~~Bash
$ monetdb create hotmapper-demo
$ monetdb release hotmapper-demo
~~~
#### Getting HOTMapper dependencies
Let's create a new virtual environment and install the requirements.
Be sure that you have installed pip3 and python 3.5+.
Install python3-venv, if you don't have it, using the command bellow.
~~~Bash
$ sudo apt-get install python3-venv
~~~
Inside the hotmapper folder execute the following commands:
~~~Bash
$ python3 -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
~~~
## 2. Adjusting your HOTMapper Settings
Now that you have a working monetdb database it's needed to verify the HOTMapper settings to be sure it'll work with it.
First, using your favourite editor, open the file settings.py contained inside your HOTMapper folder.
Set the DATABASE variable to yours.
DATABASE = 'hotmapper_demo'
Verify the locations of your mapping protocols folder and table definitions. If you cloned the master repository it
should already be correct and the folders already exist.
MAPPING_PROTOCOLS_FOLDER = 'mapping_protocols'
TABLE_DEFINITIONS_FOLDER = 'table_definitions'
## 3. Creating your table definition
The HOTMapper tool uses two files to create a table, a json inside the table_definitions folder and a csv present on the
mapping_protocols folder. Both should have as name of the file: "table name + extension". Ex: table_test.json,
table_test.csv.
The table_definitions json has the objective of storing the table constraints and information about the source of the
table and it's description.
Let's create a json for a table_test:
* Go to the table_definitions folder and create a file named table_test.json.
* Open the file with your favourite editor and create the base object as shown bellow.
~~~json
{
"pairing_description": "A string with the description of what this table stores.",
"data_source": "A string with the source of the data.",
"pk": ["A list with the primary key columns."],
"foreign_keys": ["A list with the foreign key columns."]
}
~~~
* Let's fill the json with the information of table_test:
~~~json
{
"pairing_description": "Table with the graduation rates, which represent a percentage of students expected to graduate, used to learn the HOTMapper Tool",
"data_source": "Open data made by Sustainable Development Solutions Network",
"pk": ["cod_country", "year", "gender", "isced"],
"foreign_keys": []
}
~~~
## 4. Creating your mapping protocol
The map protocol file is used to store the information about the table columns and to make the link between the data
from the csv and the database. This file is stored inside the mapping_protocols folder and uses as name the "table name
\+ .csv"
This csv must contain the following columns:
* Var. Lab -- An unique identifier for the column, this represents a specific column for the HOTMapper and shouldn't
be changed after table creation.
* Novo Rótulo -- A description of this column.
* Nome Banco -- The name of the column in the database.
* Tipo de Dado -- Type of the column in the database.
* \*[YEAR] - Eg.: 2010, 2011, ... -- A temporal identifier to help insertion of data from various years, which can have a different mapping
from the standard one. It has the name of the column in the data csv.
Let's create a mapping protocol for the table_test:
* First create a file named table_test.csv inside the mapping_protocols folder.
* Now, in the row number 1, create the following headers for the columns A-G: Var.Lab, Novo Rótulo, Nome Banco,
Tipo de Dado, 2016.
* If you have any doubt, inside your mapping_protocols folder should have a file named empty_map_protocol.csv
which you can use as a model.
For this example, we'll insert data with the graduation rates as percentage of students expected to graduate, the file
is EAD_GRAD_RATES_2016.csv which is inside the folder: hotmapper/open_data/.
Data taken from the Organisation for Economic Co-operation and Development. (https://stats.oecd.org/Index.aspx?datasetcode=EAG_GRAD_ENTR_RATES#) accessed in 2019-08-26
Let's create the columns.
* First, let's fill the protocol for the cod_country.
* In Var.Lab we'll use as an unique identifier: CODECOUNTRY
* Novo Rótulo: Code of the country
* Nome Banco: cod_country
* Tipo de Dado: VARCHAR(3)
* 2016: COUNTRY
* Now let's follow a similiar formula for the columns: Country, Year, Value, ISC11_LEVEL_CAT, SEX. You should have
a protocol similar with the one bellow.
![hmp1](uploads/c2c7eee2748e3f64bda569e379bf5c57/hmp1.png)
* Now, lets create a column called sex_id where we'll store a TINYINT representing the sex. 1 = Female, 2 = Male.
* For that we'll use a case. For the hotmapper recognize that we wanna make an derivative we must start the mapping
column (2016) if a "~" (without quotes).
* So, in the 2016 column for sex_id the protocol will be: ~CASE WHEN ("SEX" = 'F') THEN 0 ELSE 1 END
![hmp2](uploads/b7a4d36d4ed6b7e757a835f56b4423c7/hmp2.png)
## 5. Creating the table in the database
To create the table in your database we'll execute the hotmapper command 'create'. In the following way:
~~~bash
$ ./manage.py create <table_name>
~~~
So, for the table test:
~~~bash
$ ./manage.py create table_test
~~~
## 6. Inserting the data into the table
To insert the csv data into the table, we'll use the hotmapper command 'insert' in the following way:
~~~bash
$ ./manage.py insert <file_location> <table_name> <year> --sep=<optional, csv separator>
~~~
For the table test:
~~~bash
$ ./manage.py insert ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"
~~~
## 7. Creating, removing, renaming columns
* To create a new column, first add it to the protocol the same way as the others.
* To remove a column, first delete it from the protocol.
* To rename a column, change the "Nome Banco" (name of the column) in the protocol.
After that, execute the hotmapper command 'remap' in the following way:
~~~bash
$ ./manage.py remap <table_name>
~~~
For the table test:
~~~bash
$ ./manage.py create table_test
~~~
Now you'll have to update the data in the table
## 8. Updating the table data
If your data csv suffered any modifications or you changed the table by adding, removing or renaming a column, you'll
have to update the data of the table. You'll can do that using the hotmapper command 'update_from_file' in a similar way
to the insert command:
~~~bash
$ ./manage.py update_from_file <file_location> <table_name> <year> --sep=<optional, csv separator>
~~~
~~~bash
$ ./manage.py update_from_file ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"
~~~
\ No newline at end of file
Clone repository
  • Commands
  • Home
  • Installing
  • Known Bugs
  • Mapping Protocols
  • MonetDB
  • Settings
  • Table Definitions
  • Tutorial