Program Tip

How to plot a ROC curve in Python

programtip 2020. 12. 13. 10:28

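I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate and the false positive rate, but I cannot figure out how to plot them correctly with matplotlib and calculate the AUC value. How can I do that?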

Here are two ways you can try, assuming model is an sklearn predictor:

import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot
import pandas as pd  # needed for the DataFrame below
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')

Or try:

ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))
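To see what roc_curve and auc return before plotting anything, here is a minimal sketch on a four-sample toy dataset. The labels and scores are made up for illustration and stand in for y_test and model.predict_proba(X_test)[:, 1]; they are not from the original answer.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy data: true labels and predicted scores for the positive class
y_test = np.array([0, 0, 1, 1])
preds = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per threshold; the first threshold is a sentinel
# above every score, which yields the (0, 0) corner of the curve
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

for f, t, th in zip(fpr, tpr, threshold):
    print(f"threshold={th:.2f}  fpr={f:.2f}  tpr={t:.2f}")
print(f"AUC = {roc_auc:.2f}")  # AUC = 0.75
```

The fpr and tpr arrays are exactly what gets passed to plt.plot(fpr, tpr) in method I.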

This is the simplest way to plot a ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots a ROC curve for every class, so you also get several neat-looking curves.

import scikitplot as skplt
import matplotlib.pyplot as plt

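y_true = ...  # ground truth labels (placeholder, supply your own)
y_probas = ...  # predicted probabilities generated by an sklearn classifier (placeholder)
# Note: newer scikit-plot releases deprecate plot_roc_curve in favor of skplt.metrics.plot_roc
skplt.metrics.plot_roc_curve(y_true, y_probas)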
plt.show()

Here is a sample curve generated by plot_roc_curve. I used the sample digits dataset from scikit-learn, so there are 10 classes. One ROC curve is plotted for each class.

Disclaimer: this uses the scikit-plot library, which I built.

It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as:

import matplotlib.pyplot as plt
import numpy as np

x = ...  # false_positive_rate (placeholder, supply your own array)
y = ...  # true_positive_rate (placeholder, supply your own array)

# This is the ROC curve
plt.plot(x,y)
plt.show() 

# This is the AUC
auc = np.trapz(y,x)
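The trapezoidal rule behind np.trapz can be written out by hand, which makes it clear why the points must be ordered by increasing false positive rate. The following is a pure-Python sketch; the function name trapezoid_auc and the sample points are illustrative assumptions, not part of the original answer.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through (fpr, tpr) points.

    Assumes the points are already sorted by increasing fpr.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]                # step along the x axis
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0  # average of the two y values
        area += width * mean_height
    return area


# Hypothetical ROC points for illustration
x = [0.0, 0.0, 0.5, 0.5, 1.0]  # false_positive_rate
y = [0.0, 0.5, 0.5, 1.0, 1.0]  # true_positive_rate

auc = trapezoid_auc(x, y)
print(auc)  # 0.75
```

Vertical segments (where fpr does not change) contribute zero width, so only the horizontal and diagonal steps add area.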

AUC curve for binary classification using matplotlib

from sklearn import svm, datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

Load the breast cancer dataset

breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target

Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)

Model

clf = LogisticRegression(penalty='l2', C=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

Accuracy

print("Accuracy", metrics.accuracy_score(y_test, y_pred))

AUC curve

y_pred_proba = clf.predict_proba(X_test)[::,1]
fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()

Here is Python code for computing the ROC curve (as a scatter plot):

import matplotlib.pyplot as plt
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])

# false positive rate
fpr = []
# true positive rate
tpr = []
# Iterate thresholds from 0.0, 0.01, ... 1.0
thresholds = np.arange(0.0, 1.01, .01)

# get number of positive and negative examples in the dataset
P = sum(y)
N = len(y) - P

# iterate through all thresholds and determine fraction of true positives
# and false positives found at this threshold
for thresh in thresholds:
    FP=0
    TP=0
    for i in range(len(score)):
        if (score[i] > thresh):
            if y[i] == 1:
                TP = TP + 1
            if y[i] == 0:
                FP = FP + 1
    fpr.append(FP/float(N))
    tpr.append(TP/float(P))

plt.scatter(fpr, tpr)
plt.show()

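The previous answers assume that you actually calculated TP/Sens yourself. Doing this by hand is a bad idea: it is easy to make mistakes in the calculations, so use a library function for all of this.

The plot_roc example in scikit-learn does exactly what you need: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The essential part of the code is: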

  for i in range(n_classes):
      fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
      roc_auc[i] = auc(fpr[i], tpr[i])
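The loop above indexes y_test[:, i], so it expects the labels as a one-hot (binarized) matrix with one column per class, and y_score as a matching matrix of per-class scores. A minimal self-contained sketch follows; the labels and scores are invented for illustration, and label_binarize is one way (an assumption here, not stated in the answer) to build the one-hot matrix.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# Hypothetical 3-class problem: integer labels and per-class scores
y_labels = np.array([0, 1, 2, 0, 1, 2])
y_score = np.array([
    [0.80, 0.10, 0.10],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
    [0.70, 0.20, 0.10],
    [0.30, 0.25, 0.45],  # a class-1 sample scored highest for class 2
    [0.20, 0.20, 0.60],
])

# One column per class: 1 where the sample belongs to that class, else 0
y_test = label_binarize(y_labels, classes=[0, 1, 2])
n_classes = y_test.shape[1]

# One-vs-rest ROC curve and AUC for each class
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i in range(n_classes):
    print(f"class {i}: AUC = {roc_auc[i]:.3f}")
```

Each fpr[i], tpr[i] pair can then be drawn with its own plt.plot call to get one curve per class.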

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = ...  # true labels (placeholder, supply your own)
y_probas = ...  # predicted results (placeholder, supply your own)
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)

# Print ROC curve
plt.plot(fpr,tpr)
plt.show() 

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

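I have written a simple function for ROC curves and included it in a package. I have only just started practicing machine learning, so please let me know if there is a problem with this code!

See the GitHub readme file for more details! :)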

https://github.com/bc123456/ROC

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
    '''
    a function to plot the ROC curve for train labels and test labels.
    Use the best threshold found in train set to classify items in test set.
    '''
    fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label =True)
    sum_sensitivity_specificity_train = tpr_train + (1-fpr_train)
    best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
    best_threshold = thresholds_train[best_threshold_id_train]
    best_fpr_train = fpr_train[best_threshold_id_train]
    best_tpr_train = tpr_train[best_threshold_id_train]
    y_train = y_train_prob > best_threshold

    cm_train = confusion_matrix(y_train_true, y_train)
    acc_train = accuracy_score(y_train_true, y_train)
    # AUC is computed from the predicted scores, not the thresholded labels
    auc_train = roc_auc_score(y_train_true, y_train_prob)

    print('Train Accuracy: %s ' % acc_train)
    print('Train AUC: %s ' % auc_train)
    print('Train Confusion Matrix:')
    print(cm_train)

    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(121)
    curve1 = ax.plot(fpr_train, tpr_train)
    curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
    ax.text(best_fpr_train, best_tpr_train, s = '(%.3f,%.3f)' %(best_fpr_train, best_tpr_train))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Train), AUC = %.4f'%auc_train)

    fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label =True)

    y_test = y_test_prob > best_threshold

    cm_test = confusion_matrix(y_test_true, y_test)
    acc_test = accuracy_score(y_test_true, y_test)
    # AUC is computed from the predicted scores, not the thresholded labels
    auc_test = roc_auc_score(y_test_true, y_test_prob)

    print('Test Accuracy: %s ' % acc_test)
    print('Test AUC: %s ' % auc_test)
    print('Test Confusion Matrix:')
    print(cm_test)

    tpr_score = float(cm_test[1][1])/(cm_test[1][1] + cm_test[1][0])
    fpr_score = float(cm_test[0][1])/(cm_test[0][0]+ cm_test[0][1])

    ax2 = fig.add_subplot(122)
    curve1 = ax2.plot(fpr_test, tpr_test)
    curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
    ax2.text(fpr_score, tpr_score, s = '(%.3f,%.3f)' %(fpr_score, tpr_score))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Test), AUC = %.4f'%auc_test)
    plt.savefig('ROC', dpi = 500)
    plt.show()

    return best_threshold

Sample ROC graph generated by this code
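The threshold selection inside plot_ROC maximizes sensitivity + specificity, which picks the same point as maximizing tpr - fpr (Youden's J statistic). Isolated from the plotting, that step can be sketched as below. The toy data is made up; note also that roc_curve treats a sample as positive when its score is greater than or equal to the threshold, so this sketch uses >= when applying it.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical training labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# tpr + (1 - fpr) and tpr - fpr differ by a constant, so they share an argmax
j_statistic = tpr - fpr
best_idx = int(np.argmax(j_statistic))  # first index wins when several tie
best_threshold = thresholds[best_idx]

# Apply the threshold with >= to match how roc_curve counts positives
y_pred = (y_prob >= best_threshold).astype(int)

print("best threshold:", best_threshold)
print("predictions:", y_pred.tolist())
```

On this toy data two thresholds tie for the best J value, and np.argmax returns the first (highest) one; a different tie-breaking rule may suit other applications better.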

You can also follow the official scikit-learn documentation:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

Based on multiple comments from Stack Overflow, the scikit-learn documentation and some other sources, I made a Python package to plot ROC curves (and other metrics) in a really simple way.

To install the package: pip install plot-metric (more info at the end of the post)

To plot a ROC curve (the example comes from the documentation):

Binary classification

Let's load a simple dataset and make a train and test set:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

Train a classifier and predict on the test set:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]

You can now use plot_metric to plot the ROC curve:

import matplotlib.pyplot as plt  # needed for plt.figure and plt.show below
from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()

Result:

You can find more examples on GitHub and in the documentation of the package:

Github : https://github.com/yohann84L/plot_metric
Documentation : https://plot-metric.readthedocs.io/en/latest/

Reference URL: https://stackoverflow.com/questions/25009284/how-to-plot-roc-curve-in-python
