add tokenizer

ankur310794 2022-01-04 06:29:28 +00:00
parent d39f4f8a18
commit d71cdce5b4
5 changed files with 50005 additions and 0 deletions

merges.txt (new file, 50001 additions)

File diff suppressed because it is too large.

special_tokens_map.json (new file, 1 addition)

@@ -0,0 +1 @@
+{"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", "unk_token": "<|endoftext|>", "pad_token": "<|endoftext|>"}

tokenizer.json (new file, 1 addition)

File diff suppressed because one or more lines are too long.

tokenizer_config.json (new file, 1 addition)

@@ -0,0 +1 @@
+{"unk_token": "<|endoftext|>", "bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", "add_prefix_space": false, "model_max_length": 1024, "special_tokens_map_file": null, "name_or_path": "./models/", "tokenizer_class": "GPT2Tokenizer"}

vocab.json (new file, 1 addition)

File diff suppressed because one or more lines are too long.
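The five files above are the standard on-disk serialization of a GPT-2 byte-level BPE tokenizer: `vocab.json` and `merges.txt` hold the BPE model, `tokenizer.json` is the fast-tokenizer bundle, and the two small config files shown in full above wire up the special tokens. A minimal sketch that parses those two one-line JSON files with only the standard library, illustrating that GPT-2 reuses a single `<|endoftext|>` token for all four special-token roles:

```python
import json

# The two one-line JSON files added by this commit, verbatim.
special_tokens_map = json.loads(
    '{"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", '
    '"unk_token": "<|endoftext|>", "pad_token": "<|endoftext|>"}'
)
tokenizer_config = json.loads(
    '{"unk_token": "<|endoftext|>", "bos_token": "<|endoftext|>", '
    '"eos_token": "<|endoftext|>", "add_prefix_space": false, '
    '"model_max_length": 1024, "special_tokens_map_file": null, '
    '"name_or_path": "./models/", "tokenizer_class": "GPT2Tokenizer"}'
)

# GPT-2 maps bos/eos/unk/pad all onto the single <|endoftext|> token.
assert all(v == "<|endoftext|>" for v in special_tokens_map.values())

print(tokenizer_config["tokenizer_class"], tokenizer_config["model_max_length"])
```

With the `transformers` library installed and these five files in one directory, `GPT2Tokenizer.from_pretrained("./models/")` (or `AutoTokenizer.from_pretrained`) would load the full tokenizer; `./models/` is the path recorded in the commit's `name_or_path` field, not something this sketch creates.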