# Newton's Method to Solve Logistic Regression

## Background and Motivation

Diabetes is a chronic disease affecting millions of people worldwide, and early diagnosis is critical for preventing severe complications. The Pima Indians Diabetes dataset is a widely used public benchmark for studying diabetes onset. It contains medical measurements (e.g., glucose level, BMI, age) for 768 adult female patients of Pima heritage, along with a label indicating whether each patient developed diabetes. A predictive model built on this data can support early detection and timely intervention. Logistic regression is a natural choice for this binary classification task: it is widely used for two-class problems and outputs a probability rather than only a class label. We formulate diabetes prediction as an optimization problem: find the model parameters that maximize the likelihood of the observed outcomes or, equivalently, minimize the negative log-likelihood. Newton's method, a second-order technique, is well suited here because the negative log-likelihood is smooth and convex, and it typically converges in far fewer iterations than first-order methods such as gradient descent, at the cost of forming and solving a linear system with the Hessian at each iteration.

## Dataset and Features

The dataset (originally from the National Institute of Diabetes and Digestive and Kidney Diseases) provides eight diagnostic features for each patient: number of pregnancies, plasma glucose concentration, blood pressure, skinfold thickness, insulin level, BMI, diabetes pedigree function, and age. The target variable, stored in the `Outcome` column, is binary (1 if the patient showed signs of diabetes within 5 years, 0 otherwise). The goal is to learn the weight vector w (including an intercept term) of the logistic regression model that best predicts the probability of diabetes from these features.
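
For reference, one standard way to write the model and its objective is shown below. The notation is introduced here and is not fixed by the assignment: x_i is the feature vector of patient i with a constant 1 appended for the intercept, y_i ∈ {0, 1} is the label, p_i is the predicted probability, and n is the number of patients.

```latex
p_i = \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}},
\qquad
\ell(w) = -\sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
```

Minimizing the negative log-likelihood ℓ(w) is equivalent to maximizing the likelihood of the observed labels.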


## Tasks

- Formulate the logistic regression model and the optimization problem whose solution gives the parameters (model coefficients).

- Apply Newton's method (naive or Fisher scoring) to the problem and derive the steps in detail.

- Implement Newton's method in Python (write your own code), fit it to the diabetes data, and validate the fitted model.
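
As a rough illustration of the second and third tasks, the sketch below runs Newton's method on the negative log-likelihood of a logistic regression model. It uses the gradient Xᵀ(p − y) and the Hessian XᵀSX, where S is diagonal with entries p_i(1 − p_i); for logistic regression the Hessian does not depend on y, so naive Newton and Fisher scoring produce the same update. The names `sigmoid` and `newton_logistic`, the tiny `ridge` term, the stopping rule, and the synthetic data are all assumptions made for this example, and no line search or step damping is included.

```python
import numpy as np

def sigmoid(z):
    # Logistic function written via tanh, which avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def newton_logistic(X, y, max_iter=25, tol=1e-8, ridge=1e-10):
    """Minimize the logistic negative log-likelihood with Newton's method.

    X: (n, d) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 labels.
    ridge: small diagonal term (an assumption) to keep the solve stable.
    """
    n, d = X.shape
    w = np.zeros(d)
    for it in range(max_iter):
        p = sigmoid(X @ w)                    # predicted probabilities
        grad = X.T @ (p - y)                  # gradient of the objective
        s = p * (1.0 - p)                     # diagonal of S
        H = X.T @ (X * s[:, None]) + ridge * np.eye(d)  # Hessian X^T S X
        step = np.linalg.solve(H, grad)       # Newton direction
        w = w - step
        if np.linalg.norm(step) < tol:        # stop when the update is tiny
            break
    return w, it + 1

# Synthetic check (not the diabetes data): labels drawn from a known model
rng = np.random.default_rng(0)
n = 500
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([-0.5, 1.0, -2.0])
y_demo = (rng.random(n) < sigmoid(X_demo @ w_true)).astype(float)

w_hat, n_iter = newton_logistic(X_demo, y_demo)
grad_norm = np.linalg.norm(X_demo.T @ (sigmoid(X_demo @ w_hat) - y_demo))
print("estimated weights:", w_hat)
print("iterations:", n_iter, "gradient norm:", grad_norm)
```

A near-zero gradient norm at the returned weights is one simple validation check; comparing the coefficients against an established library fit is another.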

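```python
import pandas as pd

# Load the Pima Indians Diabetes dataset from a public GitHub copy
url = "https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv"
df = pd.read_csv(url)
df  # display the DataFrame as the cell output
```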

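| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 10 | 101 | 76 | 48 | 180 | 32.9 | 0.171 | 63 | 0 |
| 764 | 2 | 122 | 70 | 27 | 0 | 36.8 | 0.340 | 27 | 0 |
| 765 | 5 | 121 | 72 | 23 | 112 | 26.2 | 0.245 | 30 | 0 |
| 766 | 1 | 126 | 60 | 0 | 0 | 30.1 | 0.349 | 47 | 1 |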
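
Before fitting, the table has to be turned into a design matrix with an intercept column. The sketch below shows one possible way to do that. The helper name `build_design_matrix` is hypothetical, and standardizing the features is an assumption made here (it tends to keep the Hessian well conditioned), not a step the assignment requires. To stay self-contained, the example uses only the first five rows shown in the output above rather than downloading the file.

```python
import numpy as np
import pandas as pd

def build_design_matrix(df, target="Outcome"):
    """Return (X, y, names): standardized features with a leading intercept column."""
    y = df[target].to_numpy(dtype=float)
    feats = df.drop(columns=[target]).astype(float)
    mu = feats.mean()
    sd = feats.std(ddof=0).replace(0.0, 1.0)   # guard against constant columns
    Z = ((feats - mu) / sd).to_numpy()         # zero mean, unit variance per column
    X = np.column_stack([np.ones(len(df)), Z]) # prepend the intercept column
    return X, y, ["intercept"] + list(feats.columns)

# First five rows of the dataset, copied from the displayed output
sample = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8, 1, 0],
        "Glucose": [148, 85, 183, 89, 137],
        "BloodPressure": [72, 66, 64, 66, 40],
        "SkinThickness": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
        "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Age": [50, 31, 32, 21, 33],
        "Outcome": [1, 0, 1, 0, 1],
    }
)

X_s, y_s, names = build_design_matrix(sample)
print(X_s.shape)   # 5 rows, 8 features plus the intercept
print(names)
```

Calling the same helper on the full `df` would give the matrix and label vector that a Newton solver needs; five rows are far too few to fit nine coefficients, so the sample here only demonstrates the preprocessing.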
