diff --git a/README.md b/README.md
index 370882c..fa1cc54 100644
--- a/README.md
+++ b/README.md
@@ -134,18 +134,18 @@ Please refer to our [paper](https://www.ykilcher.com/OA_Paper_2023_04_15.pdf) fo
 
 ### Dataset Structure
 
-This dataset contains message trees which each have an inital prompt message as root which can have
-multiple child messages as replies which itself again can have multiple replies. 
+This dataset contains message trees. Each message tree has an initial prompt message as the root node, 
+which can have multiple child messages as replies, and these child messages can have multiple replies. 
 
-All messages have a role property which can either be "assistant" or "prompter". The roles in 
-conversation threads from prompt to leaf node are stricly alternating between "prompter" and "assistant".
+All messages have a role property, which can be either "assistant" or "prompter". The roles in 
+conversation threads from prompt to leaf node strictly alternate between "prompter" and "assistant".
 
-This version of the dataset contains data collected on the [open-assistant.io](https://www.open-assistant.io/) website until April, 12 2023.
+This version of the dataset contains data collected on the [open-assistant.io](https://www.open-assistant.io/) website until April 12, 2023.
 
 ### JSON Example: Message
 
-For readability the following JSON examples are shown formatted with indentation on multiple lines.
-Objects are stored without indentation on a single lines in the actual jsonl files.
+For readability, the following JSON examples are shown formatted with indentation on multiple lines.
+Objects are stored without indentation (on single lines) in the actual jsonl files.
 
 ```json
 {
@@ -179,7 +179,7 @@ Objects are stored without indentation on a single lines in the actual jsonl fil
 
 ### JSON Example: Conversation Tree
 
-For readability only a subset of the message properties is shown here.
+For readability, only a subset of the message properties is shown here.
 
 ```json
 {
@@ -236,7 +236,7 @@ details about the data structure and Python code to read and write jsonl files c
 ## Main Dataset Files
 
 Conversation data is provided either as nested messages in trees (extension `.trees.jsonl.gz`) 
-or as flat list (table) of messages (extension `.messages.jsonl.gz`).
+or as a flat list (table) of messages (extension `.messages.jsonl.gz`).
 
 ### Ready For Export Trees
 
@@ -245,7 +245,7 @@ or as flat list (table) of messages (extension `.messages.jsonl.gz`).
 2023-04-12_oasst_ready.messages.jsonl.gz    88,838 messages
 ```
 Trees in `ready_for_export` state without spam and deleted messages including message labels.
-The oasst_ready-trees file is normally sufficient for supervised fine-tuning (SFT) & reward model (RM) training.
+The oasst_ready-trees file is usually sufficient for supervised fine-tuning (SFT) & reward model (RM) training.
 
 
 ### All Trees
@@ -254,7 +254,7 @@ The oasst_ready-trees file is normally sufficient for supervised fine-tuning (SF
 2023-04-12_oasst_all.trees.jsonl.gz         66,497 trees with 161,443 total messages
 2023-04-12_oasst_all.messages.jsonl.gz     161,443 messages
 ```
-All trees including those in states `prompt_lottery_waiting` (trees that consist of only one message, namely the inital prompt),
+All trees, including those in states `prompt_lottery_waiting` (trees that consist of only one message, namely the initial prompt),
 `aborted_low_grade` (trees that stopped growing because the messages had low quality), and `halted_by_moderator`.
 
 
@@ -263,19 +263,19 @@ All trees including those in states `prompt_lottery_waiting` (trees that consist
 ```
 2023-04-12_oasst_spam.messages.jsonl.gz
 ```
-Messages which were deleted or have a negative review result (`"review_result": false`).
-Beside low quality a frequent reason for message deletion is a wrong language tag.
+These are messages which were deleted or have a negative review result (`"review_result": false`).
+Besides low quality, a frequent reason for message deletion is a wrong language tag.
 
 ```
 2023-04-12_oasst_prompts.messages.jsonl.gz
 ```
-All non-deleted initial prompt messages with positile spam review result of trees in `ready_for_export` or `prompt_lottery_waiting` state.
+These are all non-deleted initial prompt messages with a positive spam review result from trees in `ready_for_export` or `prompt_lottery_waiting` state.
 
 ### Using the Huggingface Datasets
 
-While HF datasets is ideal for tabular datasets it is not a natuaral fit for nested data structures like the OpenAssistant conversation trees.
-Nevertheless we make all messages which can alse be found in the file `2023-04-12_oasst_ready.trees.jsonl.gz` available as parquet train/validation 
-split which is directly loadable by the [Huggingface Datasets](https://pypi.org/project/datasets/).
+While HF datasets is ideal for tabular datasets, it is not a natural fit for nested data structures like the OpenAssistant conversation trees.
+Nevertheless, we make all messages which can also be found in the file `2023-04-12_oasst_ready.trees.jsonl.gz` available in parquet as train/validation splits. 
+These are directly loadable by [Huggingface Datasets](https://pypi.org/project/datasets/).
 
 To load the oasst1 train & validation splits use:
 
@@ -290,8 +290,7 @@ The messages appear in depth-first order of the message trees.
 
 Full conversation trees can be reconstructed from the flat messages table by using the `parent_id` 
 and `message_id` properties to identify the parent-child relationship of messages. The `message_tree_id` 
-and `tree_state` properties (only present in flat messages files) can be used to find all
-all messages of a message tree or to select trees by their state.
+and `tree_state` properties (only present in flat messages files) can be used to find all messages of a message tree or to select trees by their state.
 
 ### Languages