Use Skipgrams with sklearn CountVectorizer and TfidfVectorizer

sklearn does not ship an implementation of skipgrams. This post covers how to use the skipgram function in nltk with sklearn’s CountVectorizer and TfidfVectorizer. We are going to create a skipgram tokenizer that can be passed to the tokenizer parameter of the vectorizer. First, create a basic tokenizer that splits the original strings into tokens. The tokenizer can be just .split() or a function that filters out non-alphabetic characters, etc. We can use the tokenizer as below [Read More]
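
The idea in this excerpt can be sketched roughly as follows. This is only an illustration, not the post's own code: the helper names basic_tokenizer, skipgrams and skipgram_tokenizer are hypothetical, and the skipgrams helper here is a small standard-library stand-in written to behave like nltk's function, so that the example runs without extra packages.

```python
import re
from itertools import combinations


def basic_tokenizer(text):
    """Lowercase the text and keep only runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def skipgrams(tokens, n, k):
    """Yield n-token tuples that allow up to k skipped tokens in total.

    Hypothetical stand-in for the nltk function: each token acts as the head,
    and the remaining n - 1 items are drawn from the next n + k - 1 tokens.
    """
    for i, head in enumerate(tokens):
        window = tokens[i + 1 : i + n + k]
        for tail in combinations(window, n - 1):
            yield (head,) + tail


def skipgram_tokenizer(text, n=2, k=1):
    """Return skipgrams as space-joined strings, one string per feature."""
    tokens = basic_tokenizer(text)
    return [" ".join(gram) for gram in skipgrams(tokens, n, k)]


print(skipgram_tokenizer("The quick brown fox"))
```

Assuming sklearn is installed, a callable like skipgram_tokenizer is the kind of function that can be passed as the tokenizer argument of CountVectorizer or TfidfVectorizer, since that parameter accepts any callable that maps a string to a sequence of tokens.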

Django development on OSX using emacs

I started following the Django tutorials by arun. I wanted to learn about the related classes and quickly look up the docs for the things that I code, so I wanted an environment that is friendly to newbies, with code completion and easy documentation lookup. I searched and found a huge list of options. Since I was beginning with Django, I wanted an environment that is easy to set up. My first choice was pycharm, but unfortunately the community edition of pycharm doesn’t support Django - see this. [Read More]