sklearn doesn’t ship an implementation of skipgrams. This post covers how to use the skipgrams function from nltk with sklearn’s CountVectorizer and TfidfVectorizer.
We are going to create a skipgram tokenizer that can be passed to the tokenizer parameter of the vectorizer.
First, create a basic tokenizer that splits the original strings into tokens.
The tokenizer can be a plain .split() call or a function that also filters out non-alphabetic characters. We can use a tokenizer as below
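For illustration, here is a minimal sketch of such a tokenizer using only the standard library. The name basic_tokenizer and the choice to lowercase the text and keep only runs of ASCII letters are assumptions made for this example; a plain text.split() would do just as well if no filtering is needed.

```python
import re


def basic_tokenizer(text):
    """Split a string into lowercase alphabetic tokens.

    Hypothetical helper for illustration: digits, punctuation and
    whitespace all act as separators and are dropped.
    """
    return re.findall(r"[a-z]+", text.lower())


print(basic_tokenizer("The quick, brown fox!"))
```

Because it takes a string and returns a list of tokens, a function like this has the shape that the vectorizer’s tokenizer parameter expects.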