Conclusion first: when you train a model imported via from lightgbm import LGBMClassifier, you are using LightGBM's scikit-learn-style LGBMClassifier class (defined in the library's sklearn.py). There is no need to specify the classes explicitly: the fit method computes the number of classes (_n_classes) on its own and sets the _objective parameter based on that count. In short, it automatically determines whether the task is binary or multiclass classification, so nothing special needs to be declared.
The reason for stressing from lightgbm import LGBMClassifier is to distinguish it from calling the library via import lightgbm as lgb; I personally recommend the former.
The code below is taken from lightgbm's sklearn.py, specifically from class LGBMModel(_LGBMModelBase)... and class LGBMClassifier(_LGBMClassifierBase, LGBMModel)...
# class LGBMModel(_LGBMModelBase) ......
"""
objective : str, callable or None, optional (default=None)
    Specify the learning task and the corresponding learning objective or
    a custom objective function to be used (see note below).
    Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
"""

# class LGBMClassifier(_LGBMClassifierBase, LGBMModel) ......
def fit(self, X, y,
        # ......
        ):
    # ......
    self._classes = self._le.classes_
    self._n_classes = len(self._classes)
    if self._n_classes > 2:
        # Switch to using a multiclass objective in the underlying LGBM instance
        ova_aliases = {"multiclassova", "multiclass_ova", "ova", "ovr"}
        if self._objective not in ova_aliases and not callable(self._objective):
            self._objective = "multiclass"
    # ......
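The effect of this logic can also be checked from user code: after fitting, the estimator exposes the objective it settled on through the objective_ property and the detected class count through n_classes_. Below is a minimal sketch of such a check (my own example, assuming a standard lightgbm/scikit-learn installation; variable names like clf_multi are mine):

# Minimal check (illustrative): fit on 3-class and on 2-class labels and read back the objective
from lightgbm import LGBMClassifier
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)

clf_multi = LGBMClassifier().fit(X, y)             # labels 0/1/2 -> 3 classes
print(clf_multi.objective_, clf_multi.n_classes_)  # expected: multiclass 3

y_bin = [1 if label > 0 else 0 for label in y]
clf_bin = LGBMClassifier().fit(X, y_bin)           # labels 0/1 -> 2 classes
print(clf_bin.objective_, clf_bin.n_classes_)      # expected: binary 2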
1. Implementing multiclass classification easily
1.1 Importing the third-party libraries and the dataset
# Import third-party libraries: the classification model, the dataset, the train/test split helper, and evaluation metrics
from lightgbm import LGBMClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score, \
    classification_report
import lightgbm as lgb
# Load sklearn's iris dataset to serve as the training and validation data for the model
data = datasets.load_iris()
# Split the data 70/30 into a training set and a validation set; the four return values are, in order,
# training features, validation features, training labels, validation labels
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=123)
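The multiclass example presumably goes on to train and evaluate a default model on this split; here is a hedged sketch of that step (the names model_multi and pred_multi are mine, and the exact scores will vary with the library version and random seed):

# Sketch (assumed continuation): train a default LGBMClassifier on the 3-class split and evaluate it
model_multi = LGBMClassifier()
model_multi.fit(X_train, y_train)

pred_multi = model_multi.predict(X_test)
print("objective:", model_multi.objective_)             # expected: multiclass
print("accuracy:", accuracy_score(y_test, pred_multi))
print(classification_report(y_test, pred_multi))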
The same pipeline can also be turned into a binary task simply by relabeling the targets as 0 and 1:
# Data split
X_train, X_test, y_train, y_test = ......
# (continued) Relabel both the training and validation targets as 0 and 1
y_train = [1 if y > 0 else 0 for y in y_train]
y_test = [1 if y > 0 else 0 for y in y_test]
# Count the number of classes in the training data
n_classes = len(set(y_train))
# (continued) Model with default parameters
model = LGBMClassifier()
......
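To close the binary case, a minimal sketch of the remaining steps (again my own continuation, not the original author's code), reusing the variables defined above to confirm the objective switched to binary:

# Sketch (assumed continuation of the block above): fit on the 0/1 labels and check the chosen objective
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print("n_classes:", n_classes)          # 2
print("objective:", model.objective_)   # expected: binary
print("accuracy:", accuracy_score(y_test, y_pred))
print("roc_auc:", roc_auc_score(y_test, y_proba))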