What do you do when your input text is longer than BERT's maximum of 512 tokens? Longformer and BigBird are two closely related models that employ a technique called Sparse Attention to address this limit.
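To make the difference concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the google/bigbird-roberta-base checkpoint) of encoding a document longer than 512 tokens with BigBird, where a standard BERT model would have to truncate:

```python
# A minimal sketch, assuming Hugging Face `transformers` (with PyTorch) is
# installed and the `google/bigbird-roberta-base` checkpoint is available.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = AutoModel.from_pretrained("google/bigbird-roberta-base")

# Build a document long enough to exceed BERT's 512-token limit.
long_text = "Sparse attention lets us handle much longer documents. " * 200

# BigBird accepts sequences up to 4,096 tokens, so nothing is cut off at 512.
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=4096)
print(inputs["input_ids"].shape)        # well beyond 512 tokens

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```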
In my video lecture (divided into 9 bite-size pieces), I provide the context for Sparse Attention and explain in detail how it works.
I've also created an eBook covering the same material if you prefer that medium!
To put things into practice, there is also a Colab Notebook that applies BigBird to a dataset of longer text sequences.
Video Tutorial   +   eBook   +   Example Code
Why does BERT have a limitation on sequence length to begin with?
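The short answer: full self-attention compares every token with every other token, so its memory and compute grow with the square of the sequence length (and BERT's learned position embeddings only cover 512 positions). A rough back-of-the-envelope sketch of that quadratic growth, with illustrative numbers rather than measurements:

```python
# Why full self-attention gets expensive: each layer builds an (n x n)
# attention score matrix per head, so cost grows quadratically with length n.
for n in [512, 1024, 2048, 4096]:
    scores_per_head = n * n   # entries in one attention matrix
    print(f"n = {n:5d} -> {scores_per_head:,} attention scores per head per layer")

# n =   512 ->    262,144 attention scores per head per layer
# n =  1024 ->  1,048,576 attention scores per head per layer
# n =  2048 ->  4,194,304 attention scores per head per layer
# n =  4096 -> 16,777,216 attention scores per head per layer
```

Sparse Attention replaces this full n-by-n pattern with a mix of local windows, a handful of global tokens, and (in BigBird) random connections, so the cost grows roughly linearly with n, which is what makes 4,096-token inputs practical for Longformer and BigBird.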
NLP Base Camp Members have complete access to this tutorial and all of my NLP content!