diff --git a/README.md b/README.md
index 09a4bda..3888c1a 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,8 @@ Our model hasn't been fine-tuned through reinforcement learning from human feedb
 
 Phi-2 was integrated in `transformers` version 4.37. If you need to use an earlier version, you need to pass `trust_remote_code=True` to the `from_pretrained()` function.
 
+Phi-2 is known to have an attention overflow issue when running in FP16. If you encounter this issue, please enable or disable autocast on the [PhiAttention.forward()](https://huggingface.co/microsoft/phi-2/blob/main/modeling_phi.py#L306) function.
+
 ## Intended Uses
 
 Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format.
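
For readers on a `transformers` release earlier than 4.37, a minimal sketch of the `trust_remote_code=True` load the context line above refers to (the prompt and generation settings are illustrative, not from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# On transformers < 4.37, trust_remote_code=True downloads and runs the custom
# modeling code from the Hub repo; from 4.37 on, Phi-2 ships with the library
# and the flag can be dropped.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer("Write a short poem about code.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```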
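On the FP16 attention overflow, the added lines point at toggling autocast on `PhiAttention.forward()` inside `modeling_phi.py` itself. As a rough user-side approximation that avoids editing the remote file, one can wrap generation in a `torch.autocast` context; this is a sketch under that assumption, not the exact fix the patch describes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda")

# Sketch of the workaround: let autocast manage the attention math instead of
# running it in raw FP16. Flipping `enabled` here approximates enabling or
# disabling the autocast behavior of PhiAttention.forward() in modeling_phi.py.
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.float16):
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```

Since the patch says "enable or disable," which setting helps likely depends on the setup; trying both is the practical test.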