Use Skipgrams with sklearn CountVectorizer and TfidfVectorizer
sklearn does not provide an implementation of skipgrams. This post covers how to use the skipgrams function in nltk with sklearn’s CountVectorizer and TfidfVectorizer.
We are going to create a skipgram tokenizer that can be passed to the tokenizer parameter of the vectorizer.
Create a basic tokenizer that can split the original strings into tokens. The tokenizer can be just .split() or a function that filters out non-alphabetic characters, etc. We can use the tokenizer as below.
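Here is a minimal sketch of how the two pieces might fit together. It assumes nltk and sklearn are installed. The names basic_tokenizer and skipgram_tokenizer, the sample documents, and the choice of n=2 and k=1 are illustrative and not part of either library. Skipgram tuples are joined into strings here because the vectorizer expects string tokens.

```python
import re

from nltk.util import skipgrams
from sklearn.feature_extraction.text import CountVectorizer


def basic_tokenizer(text):
    """Split text into lowercase alphabetic tokens (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def skipgram_tokenizer(text, n=2, k=1):
    """Return skipgrams of n tokens with up to k skipped tokens, each joined into one string."""
    tokens = basic_tokenizer(text)
    return [" ".join(gram) for gram in skipgrams(tokens, n, k)]


# Hypothetical sample documents, for illustration only.
docs = ["the quick brown fox", "the lazy dog"]

# token_pattern=None avoids the warning that the pattern is unused
# when a custom tokenizer is supplied.
vectorizer = CountVectorizer(tokenizer=skipgram_tokenizer, token_pattern=None)
X = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))
```

TfidfVectorizer accepts the same tokenizer parameter, so the skipgram tokenizer should work there without changes. To use different values of n and k, one option is to wrap the function with functools.partial before passing it to the vectorizer.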