### Feature engineering: data extraction

Input dictionaries:

{'city': '北京', 'temperature': 100}, {'city': '上海', 'temperature': 20}, {'city': '杭州', 'temperature': 90}

The dictionaries after one-hot encoding:

{'city=北京': 1.0, 'temperature': 100.0}, {'city=上海': 1.0, 'temperature': 20.0}, {'city=杭州': 1.0, 'temperature': 90.0}
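The mapping from the input dictionaries to the encoded ones can be sketched in plain Python. This is only an illustration of the idea: `one_hot_dicts` is a hypothetical helper written for this note, not part of scikit-learn, and it assumes every string value is categorical and every other value is numeric.

```python
def one_hot_dicts(records):
    """Expand string-valued fields into 'key=value' indicator features."""
    encoded = []
    for record in records:
        row = {}
        for key, value in record.items():
            if isinstance(value, str):
                # Categorical field: one indicator feature per distinct value.
                row[f"{key}={value}"] = 1.0
            else:
                # Numeric field: keep the value, converted to float.
                row[key] = float(value)
        encoded.append(row)
    return encoded


records = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 20},
    {'city': '杭州', 'temperature': 90},
]
encoded = one_hot_dicts(records)
for row in encoded:
    print(row)
```

Each string field becomes its own `key=value` feature with value 1.0, while numeric fields pass through unchanged.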

## DictVectorizer syntax:

DictVectorizer(sparse=True,…)

DictVectorizer.fit_transform(X)

X: a dictionary, or an iterable of dictionaries

DictVectorizer.inverse_transform(X)

X: an array or a sparse matrix

DictVectorizer.get_feature_names() (replaced by get_feature_names_out() in scikit-learn 1.0; the old name was removed in 1.2)

DictVectorizer.transform(X)
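As a rough illustration of how `fit_transform`, `transform`, and `inverse_transform` relate, the sketch below fits on two records and then transforms a third. The split into `train` and new data is an assumption made for this example; the source does not prescribe one.

```python
from sklearn.feature_extraction import DictVectorizer

train = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 20},
]
vectorizer = DictVectorizer(sparse=False)
X_train = vectorizer.fit_transform(train)  # learns the feature columns

# transform() reuses the learned columns; a city not seen during fitting
# has no column, so only its temperature is kept.
X_new = vectorizer.transform([{'city': '杭州', 'temperature': 90}])

# inverse_transform() maps each row back to a dict of its non-zero features.
restored = vectorizer.inverse_transform(X_train)

print(X_train)
print(X_new)
print(restored)
```

Features that were not encountered during fitting are ignored by `transform`, which is why the new record ends up with zeros in both city columns.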

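1: Instantiate the DictVectorizer class

2: Call the fit_transform method to pass in the data and convert it. Note the return format: a sparse matrix by default, or a NumPy array when sparse=False.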

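``````
from sklearn.feature_extraction import DictVectorizer


def dictvec():
    """Extract features from dictionary data."""
    # Instantiate the vectorizer; sparse=False returns a dense NumPy array
    vectorizer = DictVectorizer(sparse=False)
    data = vectorizer.fit_transform([
        {'city': '北京', 'temperature': 100},
        {'city': '上海', 'temperature': 80},
        {'city': '广州', 'temperature': 70},
    ])
    # Feature (column) names; get_feature_names() was removed in scikit-learn 1.2
    names = vectorizer.get_feature_names_out().tolist()
    print(names)
    print(data)


if __name__ == '__main__':
    dictvec()
``````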
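To make the point about the return format concrete, here is a small sketch comparing the default sparse result with the dense one on the same records. The variable names are chosen for illustration only.

```python
from sklearn.feature_extraction import DictVectorizer

data = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 80},
    {'city': '广州', 'temperature': 70},
]

# sparse=True is the default, so this returns a SciPy sparse matrix.
sparse_result = DictVectorizer().fit_transform(data)
# sparse=False returns a dense NumPy array instead.
dense_result = DictVectorizer(sparse=False).fit_transform(data)

print(type(sparse_result))
print(sparse_result)            # lists only the non-zero cells
print(sparse_result.toarray())  # same values as dense_result
print(dense_result)
```

The sparse form stores only non-zero entries, which tends to matter once one-hot encoding produces many mostly-zero columns.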

``````
CountVectorizer(max_df=1.0, min_df=1, …)
``````

``````
CountVectorizer.fit_transform(X, y)
X: an iterable of text strings (a single bare string is rejected)
Returns: a sparse matrix
CountVectorizer.inverse_transform(X)
X: an array or a sparse matrix
``````

``````
CountVectorizer.get_feature_names()
Returns: the list of words (use get_feature_names_out() in scikit-learn 1.0 and later)
``````
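The `min_df` parameter and `inverse_transform` can be sketched as follows. Setting `min_df=2` is an assumption made for this example so that the filtering is visible; the signature above shows the default `min_df=1`, which keeps every term.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["life is short,i like python", "life is too long,i dislike python"]

# min_df=2: keep only terms that occur in at least two documents.
cv = CountVectorizer(min_df=2)
counts = cv.fit_transform(docs)   # SciPy sparse matrix
vocab = sorted(cv.vocabulary_)    # vocabulary_ maps term -> column index
print(vocab)

# inverse_transform() lists, per document, the terms with a non-zero count.
terms_per_doc = cv.inverse_transform(counts)
print(terms_per_doc)
```

An integer `min_df` is an absolute document count, while a float is read as a proportion of documents; `max_df` works the same way as an upper bound.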

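``````
from sklearn.feature_extraction.text import CountVectorizer


def countvec():
    """
    Convert text into count features.
    :return: None
    """
    # Instantiate the vectorizer
    cv = CountVectorizer()
    data = cv.fit_transform(["life is short,i like python",
                             "life is too long,i dislike python"])
    # get_feature_names() was removed in scikit-learn 1.2
    print(cv.get_feature_names_out().tolist())
    # fit_transform returns a sparse matrix; convert it for display
    print(data.toarray())


if __name__ == '__main__':
    countvec()
``````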

``````
['dislike', 'is', 'life', 'like', 'long', 'python', 'short', 'too']
[[0 1 1 1 0 1 1 0]
 [1 1 1 0 1 1 0 1]]
``````
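The word "i" is missing from the feature list because the default `token_pattern` of `CountVectorizer` only matches tokens of two or more characters. The sketch below swaps in a pattern that also accepts single-character tokens; the pattern is an assumption chosen for illustration, not something the original example uses.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["life is short,i like python", "life is too long,i dislike python"]

# Assumed pattern for this example: one or more word characters,
# so single-character tokens such as "i" are kept.
cv = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
data = cv.fit_transform(docs)
vocab = sorted(cv.vocabulary_)   # columns are ordered by sorted term
print(vocab)
print(data.toarray())
```

With this pattern the vocabulary gains an `i` column, and both documents have a count of 1 in it.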