This article shares concrete Python code for K-fold cross-validation, for your reference. The details are as follows.
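We train a KNN classifier on the iris data and use K-fold cross-validation to find the best value of K. Two different Ks appear here. The K in "K-fold" is the number of folds (5 in the code below). The K being tuned is KNN's number of neighbours (`n_neighbors`).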
```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold  # used for K-fold cross-validation

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X.shape, y.shape)

# Candidate values of K (number of neighbours) to search; 8 values here
ks = [1, 3, 5, 7, 9, 11, 13, 15]

# 5-fold cross-validation. For each fold, KFold yields the indices of the
# training data and the indices of the validation data.
# Suppose the samples are [1,3,5,6,11,12,43,12,44,2], 10 samples in total.
# kf could then yield (training indices first, validation indices second):
# [0,1,3,5,6,7,8,9],[2,4]
# [0,1,2,4,6,7,8,9],[3,5]
# [1,2,3,4,5,6,7,8],[0,9]
# [0,1,2,3,4,5,7,9],[6,8]
# [0,2,3,4,5,6,8,9],[1,7]
n_splits = 5
kf = KFold(n_splits=n_splits, random_state=2001, shuffle=True)

# Best K found so far and its accuracy
best_k = ks[0]
best_score = 0

# Loop over every candidate K
for k in ks:
    curr_score = 0
    for train_index, valid_index in kf.split(X):
        # Train on this fold's training data, score on its validation data
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_index], y[train_index])
        curr_score = curr_score + clf.score(X[valid_index], y[valid_index])
    # Average accuracy over the folds
    avg_score = curr_score / n_splits
    if avg_score > best_score:
        best_k = k
        best_score = avg_score
    print("current best score is: %.2f" % best_score, "best k: %d" % best_k)

print("after cross validation, the final best k is: %d" % best_k)
```
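The comment in the code describes what `KFold.split` yields. The following minimal sketch makes that concrete using NumPy only. The helper name `kfold_indices` is hypothetical, and this is not scikit-learn's actual implementation. It shuffles the indices once, cuts them into `n_splits` folds whose sizes differ by at most one, and uses each fold in turn as the validation set. With the same seed it will not reproduce `KFold`'s exact splits, because the random generators differ.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed=None):
    """Hypothetical helper: yield (train_idx, valid_idx) pairs for K-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # shuffle once, like shuffle=True
    # The first (n_samples % n_splits) folds get one extra sample
    fold_sizes = np.full(n_splits, n_samples // n_splits)
    fold_sizes[: n_samples % n_splits] += 1
    start = 0
    for size in fold_sizes:
        # This fold is the validation set; everything else is training data
        valid_idx = np.sort(indices[start:start + size])
        train_idx = np.sort(np.concatenate([indices[:start], indices[start + size:]]))
        yield train_idx, valid_idx
        start += size


# 10 samples, 5 folds: each pair has 8 training and 2 validation indices
for train_idx, valid_idx in kfold_indices(10, 5, seed=2001):
    print(train_idx, valid_idx)
```

Every index appears in exactly one validation fold. Each model is therefore evaluated once on every sample, and the per-fold accuracies can be averaged.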
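scikit-learn can also run the inner fold loop for you. The sketch below passes the same `KFold` object to `sklearn.model_selection.cross_val_score`. That function returns one score per fold, which is accuracy by default for a classifier. Because `KFold` is given an integer `random_state`, every call sees the same splits, so the mean scores should agree with the hand-written loop. The `mean_scores` dictionary is just an illustrative name.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
ks = [1, 3, 5, 7, 9, 11, 13, 15]
kf = KFold(n_splits=5, random_state=2001, shuffle=True)

# Mean 5-fold accuracy for each candidate K
mean_scores = {}
for k in ks:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=kf)
    mean_scores[k] = scores.mean()

# max() returns the first K among ties, like the strict ">" comparison
best_k = max(ks, key=lambda k: mean_scores[k])
print("after cross validation, the final best k is: %d" % best_k)
```

`GridSearchCV(KNeighborsClassifier(), {"n_neighbors": ks}, cv=kf)` wraps the same search in one object. By default it also refits the best model on the full dataset.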