From 2f66b4ad412aec7a85f40a6aa12a30853640ac91 Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav <vaibhavs10@gmail.com>
Date: Fri, 28 Apr 2023 17:00:44 +0200
Subject: [PATCH] add model card

---
 README.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..44ea705
--- /dev/null
+++ b/README.md
@@ -0,0 +1,41 @@
+# Model Card: Bark
+
+This is the official codebase for running the text to audio model, from Suno.ai.
+
+The following is additional information about the models released here. 
+
+## Model Details
+
+Bark is a series of three transformer models that turn text into audio.
+
+### Text to semantic tokens
+ - Input: text, tokenized with [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+
+### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec Codec](https://github.com/facebookresearch/encodec) from facebook
+
+### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec
+
+### Architecture
+|           Model           | Parameters | Attention  | Output Vocab size |  
+|:-------------------------:|:----------:|------------|:-----------------:|
+|  Text to semantic tokens  |    80 M    | Causal     |       10,000      |
+| Semantic to coarse tokens |    80 M    | Causal     |     2x 1,024      |
+|   Coarse to fine tokens   |    80 M    | Non-causal |     6x 1,024      |
+
+
+### Release date
+April 2023
+
+## Broader Implications
+We anticipate that this model's text to audio capabilities can be used to improve accessbility tools in a variety of languages. 
+Straightforward improvements will allow models to run faster than realtime, rendering them useful for applications such as virtual assistants. 
+ 
+While we hope that this release will enable users to express their creativity and build applications that are a force
+for good, we acknowledge that any text to audio model has the potential for dual use. While it is not straightforward
+to voice clone known people with Bark, they can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark, 
+we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository). 
\ No newline at end of file