Showing posts from March, 2017

Create a custom Transformer in PySpark ML

Via: http://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml

import nltk
from pyspark import keyword_only  ## < 2.0 -> pyspark.ml.util.keyword_only
from pyspark.ml import Transformer
from pyspark.ml.param.shared import HasInputCol, HasOutputCol, Param
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

class NLTKWordPunctTokenizer(Transformer, HasInputCol, HasOutputCol):

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None, stopwords=None):
        super(NLTKWordPunctTokenizer