【Scikit】实现Multi-label text classification代码模板

Refer to: https://stackoverflow.com/a/10527953

code:

# -*- coding: utf-8 -*-

import numpy as np

from sklearn.pipeline import Pipeline

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.svm import LinearSVC

from sklearn.feature_extraction.text import TfidfTransformer

from sklearn.multiclass import OneVsRestClassifier

from sklearn.preprocessing import MultiLabelBinarizer

X_train = np.array(["new york is a hell of a town",

                    "new york was originally dutch",

                    "the big apple is great",

                    "new york is also called the big apple",

                    "nyc is nice",

                    "people abbreviate new york city as nyc",

                    "the capital of great britain is london",

                    "london is in the uk",

                    "london is in england",

                    "london is in great britain",

                    "it rains a lot in london",

                    "london hosts the british museum",

                    "new york is great and so is london",

                    "i like london better than new york"])

y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],

                ["new york"],["london"],["london"],["london"],["london"],

                ["london"],["london"],["new york","london"],["new york","london"]]

X_test = np.array(['nice day in nyc',

                   'welcome to london',

                   'london is rainy',

                   'it is raining in britian',

                   'it is raining in britian and the big apple',

                   'it is raining in britian and nyc',

                   'hello welcome to new york. enjoy it here and london too'])

target_names = ['New York', 'London']

mlb = MultiLabelBinarizer()

Y = mlb.fit_transform(y_train_text)

classifier = Pipeline([

    ('vectorizer', CountVectorizer()),

    ('tfidf', TfidfTransformer()),

    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)

predicted = classifier.predict(X_test)

all_labels = mlb.inverse_transform(predicted)

for item, labels in zip(X_test, all_labels):

    print('{0} => {1}'.format(item, ', '.join(labels)))

Output:

nice day in nyc => new york

welcome to london => london

london is rainy => london

it is raining in britian => london

it is raining in britian and the big apple => new york

it is raining in britian and nyc => london, new york

hello welcome to new york. enjoy it here and london too => london, new york

【Scikit】实现Multi-label text classification代码模板的更多相关文章

[Bayes] Maximum Likelihood estimates for text classification
Naïve Bayes Classifier. We will use, specifically, the Bernoulli-Dirichlet model for text classifica ...
论文阅读：《Bag of Tricks for Efficient Text Classification》
论文阅读:<Bag of Tricks for Efficient Text Classification> 2018-04-25 11:22:29 卓寿杰_SoulJoy 阅读数 954 ...
论文翻译——Character-level Convolutional Networks for Text Classification
论文地址 Abstract Open-text semantic parsers are designed to interpret any statement in natural language ...
论文解读（XR-Transformer）Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
Paper Information Title:Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text C ...
给label text 上色 && 给textfiled placeholder 上色
1.给label text 上色: NSInteger stringLength = ; stringLength = model.ToUserNickName.length; NSMutableAt ...
[转] Implementing a CNN for Text Classification in TensorFlow
Github上的一个开源项目,文档讲得极清晰 Github - https://github.com/dennybritz/cnn-text-classification-tf 原文- http:// ...
[Tensorflow] RNN - 04. Work with CNN for Text Classification
Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...
Implementing a CNN for Text Classification in TensorFlow
参考: 1.Understanding Convolutional Neural Networks for NLP 2.Implementing a CNN for Text Classificati ...
论文列表——text classification
https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...

随机推荐

各种组件的js 获取值 / js动态赋值
jQuery获取Select选择的Text和Value:语法解释:1. $("#select_id").change(function(){//code...}); //为Se ...
动态ip、静态ip、pppoe拨号的区别
pppoe拨号 pppoe拨号上网,又叫做ADSL拨号上网.宽带拨号上网.指现在有很多我的E家用户,送的无线猫,阉割了PPPOE拨号功能,必须要从电脑上拨号才能上网.还有大街上的WIFI热点也很多,如 ...
微信小程序 scroll-view 实现锚点跳转
在微信小程序中,使用 scroll-view 实现长页面的标记跳转,官方文档中没有例子演示,锚点标记主要是使用<scroll-view> 的 scroll-into-view 属性. 实现 ...
oracle 常用 sql
判断字段值是否为空( mysql 为 ifnull(,)): nvl (Withinfocode,'') as *** 两字段拼接: (1)concat(t.indate, t.intime) as ...
How to make an IntelliJ IDEA plugin in less than 30 minutes
Sometimes it is a nice thing to extend an editor to have it do some new stuff, like being able to re ...
WPF腾讯视频通话开发
一.IntPtr.HandleC#中的IntPtr类型称为“平台特定的整数类型”,它们用于本机资源,如窗口句柄. 1.WPF窗口句柄IntPtr wnip = new System.Windows.I ...
leetcode笔记：Validate Binary Search Tree
一. 题目描写叙述 Given a binary tree, determine if it is a valid binary search tree (BST). Assume a BST is ...
Android 组件系列-----Activity生命周期
本篇随笔将会深入学习Activity,包括如何定义多个Activity,并设置为默认的Activity.如何从一个Activity跳转到另一个Activity,还有就是详细分析Activity的生命周 ...
[k8s]jenkins部署在k8s集群
$ cat jenkins-pvc.yaml kind: PersistentVolumeClaim apiVersion: v1 metadata: name: jenkins-pvc spec: ...
MYSQL 线程池
https://www.jianshu.com/p/88e606eca2a5 https://www.percona.com/doc/percona-server/LATEST/performance ...

【Scikit】实现Multi-label text classification代码模板

【Scikit】实现Multi-label text classification代码模板的更多相关文章

随机推荐

热门专题