From 39d30708ec5e1f53ade8516e3ef5298f18e9b2d5 Mon Sep 17 00:00:00 2001
From: Tom Selier
Date: Sun, 22 Oct 2023 19:07:57 +0200
Subject: [PATCH] added dataset splitter to readme

---
 README.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a890566..2cad4e2 100644
--- a/README.md
+++ b/README.md
@@ -113,7 +113,7 @@ $ python ./src/helpers/test/decision_tree.py -i ./out/result-(date/time).csv -o
 ### Template extraction
 > :warning: **Please note:**
 > This tool uses the legacy format for datasets.
-> Images are sorted using folders, instead of by name
+> Images are sorted using folders, instead of by name.
 1. Images should have four standard Aruco markers clearly visible
 2. Run the template extraction tool with an input directory as argument
@@ -122,6 +122,18 @@ $ python ./src/experiments/template_extraction/script.py ./dataset
 ```
 3. The script generates new folders, ending with `_out`
 4. The paths to any failed images are saved in `skipped.txt`
+
+### Dataset splitting
+1. Ensure that the dataset is in `./res/dataset`
+2. Run the dataset splitter tool:
+```sh
+$ python ./src/experiments/dataset.py
+```
+3. Three new folders will be created, containing the following percentages of images:
+   - `./res/dataset/training`, 70%
+   - `./res/dataset/validation`, 20%
+   - `./res/dataset/testing`, 10%
+4. Images are split pseudorandomly, so the same datasets are produced on different machines.
 
 ---
 Arne van Iterson
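The reproducible 70/20/10 split described in step 4 of the new section can be sketched as follows. This is a hypothetical illustration, not the actual `./src/experiments/dataset.py` implementation: the seed value, the `.png` file layout, and the `split_dataset` helper name are all assumptions; the only facts taken from the patch are the folder names and the 70/20/10 ratios. Shuffling a sorted file list with a fixed-seed `random.Random` is what makes the split pseudorandom yet identical across machines.

```python
import random
import shutil
from pathlib import Path


def split_dataset(root: str, seed: int = 0) -> None:
    """Split images under `root` into training/validation/testing (70/20/10)."""
    # Sort first so the input order is identical on every machine,
    # then shuffle with a fixed seed so the split is reproducible.
    images = sorted(Path(root).glob("*.png"))
    random.Random(seed).shuffle(images)

    # Integer arithmetic avoids float-rounding surprises in the boundaries.
    n = len(images)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    splits = {
        "training": images[:n_train],
        "validation": images[n_train:n_train + n_val],
        "testing": images[n_train + n_val:],
    }

    # Copy each image into its split folder under the dataset root.
    for name, files in splits.items():
        out_dir = Path(root) / name
        out_dir.mkdir(exist_ok=True)
        for img in files:
            shutil.copy2(img, out_dir / img.name)
```

Copying (rather than moving) leaves the original flat dataset intact, so the split can be regenerated with a different seed if needed.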