matchzoo.preprocessors.units.frequency_filter
Module Contents
class matchzoo.preprocessors.units.frequency_filter.FrequencyFilter(low: float = 0, high: float = float('inf'), mode: str = 'df')

Bases: matchzoo.preprocessors.units.stateful_unit.StatefulUnit
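Frequency filter unit. It is a stateful unit: fit computes a frequency statistic for every term in a corpus, and transform keeps only the tokens whose statistic lies in the half-open interval [low, high).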
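Parameters:
- low – Lower bound on the statistic, inclusive. Defaults to 0.
- high – Upper bound on the statistic, exclusive. Defaults to float('inf'), i.e. no upper bound.
- mode – The statistic to filter on: 'tf' (term frequency), 'df' (document frequency), or 'idf' (inverse document frequency). Defaults to 'df'.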
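Because low is inclusive and high is exclusive, a term whose statistic equals low is kept while one whose statistic equals high is dropped. The sketch below illustrates only that acceptance rule, assuming the per-term statistic has already been computed into a dict; the helper keep_terms is hypothetical and not part of MatchZoo.

```python
import math


def keep_terms(stats: dict, low: float = 0, high: float = math.inf) -> set:
    """Hypothetical helper: return the terms whose value lies in [low, high).

    `stats` is assumed to map each term to its tf, df, or idf value.
    """
    # low is inclusive, high is exclusive
    return {term for term, value in stats.items() if low <= value < high}


# 'B' (== low) is kept, 'C' (== high) is dropped, 'A' (< low) is dropped.
print(sorted(keep_terms({'A': 1, 'B': 2, 'C': 3}, low=2, high=3)))  # ['B']
```

With the default high of infinity, only the lower bound has any effect.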
Examples:

>>> import matchzoo as mz

To filter based on term frequency (tf):

>>> tf_filter = mz.preprocessors.units.FrequencyFilter(
...     low=2, mode='tf')
>>> tf_filter.fit([['A', 'B', 'B'], ['C', 'C', 'C']])
>>> tf_filter.transform(['A', 'B', 'C'])
['B', 'C']
To filter based on document frequency (df):

>>> df_filter = mz.preprocessors.units.FrequencyFilter(
...     low=2, mode='df')
>>> df_filter.fit([['A', 'B'], ['B', 'C']])
>>> df_filter.transform(['A', 'B', 'C'])
['B']
To filter based on inverse document frequency (idf):

>>> idf_filter = mz.preprocessors.units.FrequencyFilter(
...     low=1.2, mode='idf')
>>> idf_filter.fit([['A', 'B'], ['B', 'C', 'D']])
>>> idf_filter.transform(['A', 'B', 'C'])
['A', 'C']
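fit(self, list_of_tokens: typing.List[typing.List[str]])

Fit the unit on list_of_tokens, a list of tokenized documents: compute the statistic selected by mode for every term and record which terms satisfy low <= value < high.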
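Fitting can be pictured as "count, then threshold". The following is a minimal sketch, assuming the fitted state is just the set of accepted terms; the function fit_valid_terms is hypothetical, and only the tf and df modes are shown, with counts computed via collections.Counter.

```python
import collections
import math
import typing


def fit_valid_terms(list_of_tokens: typing.List[typing.List[str]],
                    low: float = 0,
                    high: float = math.inf,
                    mode: str = 'df') -> typing.Set[str]:
    """Hypothetical fit sketch: count a per-term statistic, keep terms in [low, high)."""
    if mode not in ('tf', 'df'):
        raise ValueError(f"this sketch only supports 'tf' and 'df', got {mode!r}")
    stats = collections.Counter()
    for tokens in list_of_tokens:
        # tf counts every occurrence; df counts each term at most once per document
        stats.update(tokens if mode == 'tf' else set(tokens))
    return {term for term, value in stats.items() if low <= value < high}


print(sorted(fit_valid_terms([['A', 'B'], ['B', 'C']], low=2, mode='df')))  # ['B']
```

The two calls in the tests reproduce the kept-term sets implied by the tf and df examples above.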
transform(self, input_: list)

Transform a list of tokens by filtering out the tokens that were not kept during fit. The remaining tokens keep their original order.
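Once the accepted terms are known, transforming is an order-preserving filter. A sketch, assuming valid_terms is the set of terms kept during fitting; the function name filter_tokens is hypothetical.

```python
import typing


def filter_tokens(input_: typing.List[str], valid_terms: typing.Set[str]) -> typing.List[str]:
    """Keep only tokens in valid_terms, preserving their order and any repeats."""
    return [token for token in input_ if token in valid_terms]


print(filter_tokens(['C', 'A', 'B', 'C'], {'B', 'C'}))  # ['C', 'B', 'C']
```

A set is used for valid_terms so each membership check is constant time on average.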
classmethod _tf(cls, list_of_tokens: list)
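The tf example (B appears twice and C three times across the corpus, so both pass low=2) suggests that term frequency is a raw occurrence count over all documents. A sketch under that reading; term_frequency is a hypothetical name, not the library's private helper.

```python
import collections
import typing


def term_frequency(list_of_tokens: typing.List[typing.List[str]]) -> collections.Counter:
    """Count every occurrence of every term across all documents."""
    stats = collections.Counter()
    for tokens in list_of_tokens:
        stats.update(tokens)
    return stats


print(dict(term_frequency([['A', 'B', 'B'], ['C', 'C', 'C']])))  # {'A': 1, 'B': 2, 'C': 3}
```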
classmethod _df(cls, list_of_tokens: list)
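Likewise, the df example (B occurs in both documents, A and C in one each) indicates that document frequency counts how many documents contain a term, so repeats inside one document count once. A sketch; document_frequency is a hypothetical name.

```python
import collections
import typing


def document_frequency(list_of_tokens: typing.List[typing.List[str]]) -> collections.Counter:
    """Count, for each term, the number of documents that contain it."""
    stats = collections.Counter()
    for tokens in list_of_tokens:
        stats.update(set(tokens))  # deduplicate within a document first
    return stats


print(sorted(document_frequency([['A', 'B', 'B'], ['B', 'C']]).items()))  # [('A', 1), ('B', 2), ('C', 1)]
```

The result is sorted before printing because iteration order over a set of strings is not stable between interpreter runs.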
classmethod _idf(cls, list_of_tokens: list)
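This page does not document the idf formula. One smoothed variant that is consistent with the idf example (with low=1.2, B, present in both documents, is dropped while A and C are kept) is idf(t) = ln((1 + N) / (1 + df(t))) + 1, where N is the number of documents. Treat that formula as an assumption; the sketch below only illustrates the idea, and inverse_document_frequency is a hypothetical name.

```python
import collections
import math
import typing


def inverse_document_frequency(list_of_tokens: typing.List[typing.List[str]]) -> typing.Dict[str, float]:
    """Assumed smoothed idf: ln((1 + N) / (1 + df)) + 1, with N the number of documents."""
    num_docs = len(list_of_tokens)
    df = collections.Counter()
    for tokens in list_of_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + num_docs) / (1 + count)) + 1 for term, count in df.items()}


idf = inverse_document_frequency([['A', 'B'], ['B', 'C', 'D']])
print(round(idf['A'], 3), round(idf['B'], 3))  # 1.405 1.0
```

Under this assumed formula, a threshold of low=1.2 keeps A, C and D and drops B, which agrees with the idf example.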