Ground Motion Ground Truth is a collection of datasets of ground motion time histories and metadata, created specifically for big data and machine learning applications.
In this README you will find how to access and work with the data in Python. Other references (also mentioned throughout the README) include:
For any question / problem / enhancement please open a new Issue (see "Issues" on top of this web page).
In the following, [domain] refers to gfz.de
and [port] to 54646 (both redacted for basic security hygiene).
The datasets have been created with a public Python project hosted on GitHub. Please refer to that project if you are supplying new source data to be harmonized and shipped as a gmgt dataset.
Data is hosted on a private server at GFZ (section 2.6). To access the data, you must use SSH (Secure Shell Protocol): generate a pair of private and public keys, and send us only the public key via email.
Open the terminal and generate your SSH key (replace the email address below with your own):
ssh-keygen -t ed25519 -C "your_email@example.com"
and follow the instructions (if you are asked for a passphrase, you can leave it empty and continue). If you now type
ls -l $HOME/.ssh
(the usual directory for SSH keys, or wherever you decided to save the key) you should see something like this:
ls -l $HOME/.ssh/
drwx------ 6 user staff 192 Jan 19 16:02 .
drwxr-x---+ 31 user staff 992 Jan 20 17:44 ..
-rw-r--r-- 1 user staff 94 Jan 19 16:02 id_ed25519.pub
-rw------- 1 user staff 399 Jan 19 16:02 id_ed25519
... (other files not shown) ...
Take the file with the .pub extension (e.g., id_ed25519.pub) and send it to lemgo@[domain] (rizac@[domain] in CC),
asking for access to the casco server as the ethz user.
Do not share the file with the same name but no extension with anybody: it is the PRIVATE key.
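To double-check which file to attach to the email, the following minimal Python sketch lists only the public keys in a directory. The function name find_public_keys is hypothetical (it is not part of gmgt), and the default location $HOME/.ssh is an assumption that holds only if you accepted the path proposed by ssh-keygen.

```python
from pathlib import Path


def find_public_keys(ssh_dir: Path) -> list[Path]:
    """Return the public key files (*.pub) found in ssh_dir, sorted by name.

    Hypothetical helper, not part of gmgt. Private keys (same name, no
    extension) are never returned.
    """
    if not ssh_dir.is_dir():
        return []
    return sorted(p for p in ssh_dir.glob("*.pub") if p.is_file())


# Assumed default key location; adjust if you saved the key elsewhere
for key in find_public_keys(Path.home() / ".ssh"):
    print(key)
```

Any path printed here ends in .pub and is safe to send; the matching file without extension must stay on your machine.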
In your terminal, cd to the directory where you want to save the
files from the GFZ server (recommended name: gmgt/datasets) and type:
scp -P [port] -r ethz@casco.[domain]:/home/ethz/gmgt/datasets/*.hdf .
This will copy all dataset files into the current directory.
The process will take several minutes (on the order of tens of minutes,
not hours).
If you want to copy a specific dataset only
(it could be necessary in the validation phase, where we
will likely recreate the datasets from scratch), type:
scp -P [port] ethz@casco.[domain]:/home/ethz/gmgt/datasets/esm.hdf .
scp -P [port] ethz@casco.[domain]:/home/ethz/gmgt/datasets/kik.hdf .
scp -P [port] ethz@casco.[domain]:/home/ethz/gmgt/datasets/knet.hdf .
scp -P [port] ethz@casco.[domain]:/home/ethz/gmgt/datasets/ngawest2.hdf .
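If you prefer to script the download of a single dataset, the commands above can be assembled in Python. This is only a sketch: build_scp_command is a hypothetical helper (not part of gmgt), and the port and domain must be supplied by you, as they are placeholders here.

```python
DATASETS = ("esm", "kik", "knet", "ngawest2")
REMOTE_DIR = "/home/ethz/gmgt/datasets"


def build_scp_command(dataset, port, domain, dest="."):
    """Return the scp argument list for one dataset (hypothetical helper).

    The list can be passed to subprocess.run(cmd, check=True) once SSH
    access has been granted.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    remote = f"ethz@casco.{domain}:{REMOTE_DIR}/{dataset}.hdf"
    return ["scp", "-P", str(port), remote, dest]


# Placeholders kept as in the text above; replace them with the real values
cmd = build_scp_command("esm", port="[port]", domain="[domain]")
print(" ".join(cmd))
```

Building the command as a list (rather than a single string) avoids shell quoting problems when it is later handed to subprocess.run.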
The datasets directory will contain the following files:
ngawest2.hdf
esm.hdf
kik.hdf
knet.hdf
where each hdf file is a gmgt dataset, containing the
time histories (accelerometer recordings, in m/s^2) and the related
metadata,
all in a single hdf file.
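After copying, a quick way to confirm that the directory is complete is to compare it against the file list above. The sketch below assumes the four file names listed in this README; missing_datasets is a hypothetical helper, not part of gmgt, and it checks only for the presence of the files, not their content.

```python
from pathlib import Path

# File names as listed in this README
EXPECTED_FILES = ("ngawest2.hdf", "esm.hdf", "kik.hdf", "knet.hdf")


def missing_datasets(datasets_dir):
    """Return the expected dataset files not found in datasets_dir."""
    root = Path(datasets_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Assumes the script is run from inside the datasets directory
missing = missing_datasets(".")
if missing:
    print("Missing datasets:", ", ".join(missing))
else:
    print("All datasets found")
```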
Hint: For processing large datasets, we recommend running Python modules as scripts rather than Jupyter notebooks, which are better suited to illustrative examples and exploratory analysis; heavy computations run more efficiently in a script.
- Clone the repository:
  git clone https://github.com/rizac/gmgt.git
  cd gmgt
- If you already have a Python virtual environment set up, you can copy the file gmgt.py into your Python module, or even paste its content directly into your code. This is a very "quick and dirty" approach: it is fast, but you must be sure that all requirements are already installed.
- Otherwise, you can create a fresh virtual environment (inside the cloned gmgt directory, for instance):
  python3 -m venv .env  # create a venv. Please use Python 3.11+
  source .env/bin/activate  # Linux/macOS
  # .\.env\Scripts\activate  # Windows PowerShell (not tested)
  and then install this package (from within the gmgt directory):
  pip install -e .
  Then you can start coding (Jupyter, Python module) after activating the virtual environment each time (type deactivate to deactivate the venv). In your code, you just have to import:
  from gmgt import get_records
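Before starting to code, it can help to confirm that the active interpreter meets the Python 3.11+ requirement and that the gmgt module is importable. The following is a minimal sketch using only the standard library; check_environment is a hypothetical helper and is not provided by gmgt.

```python
import importlib.util
import sys


def check_environment(module_name="gmgt", min_version=(3, 11)):
    """Report whether Python is recent enough and module_name is importable."""
    return {
        "python_ok": sys.version_info >= min_version,
        # find_spec returns None when a top-level module cannot be located
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


print(check_environment())
```

If module_found is False, the virtual environment is probably not activated, or the package has not been installed in it yet.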
For illustrative purposes (or if you really want to stick to notebooks to process the data), we also provide a Python notebook