Logistic Regression

^Description

This program implements the logistic regression algorithm for classification introduced by [Cox, 1958; Truett et al., 1967] and described in [Amini, 2015; p.73-75]. It is a supervised learning model used to explain how the features of an instance affect its binary {0,1} categorical output. The main hypothesis is that, for each observation $\mathbf{x}\in\mathbb{R}^d$, the logarithm of the ratio of posteriors is a linear combination of its features:
$\ln\left(\frac{P(Y=1\mid \mathbf{x})}{P(Y=0\mid \mathbf{x})}\right)=\underbrace{\langle\mathbf{x},\bar{\boldsymbol{w}}\rangle+w_0}_{= h_{\boldsymbol{w}} (\mathbf{x})}$ where $\boldsymbol{w}=(\bar{\boldsymbol{w}},w_0)\in \mathbb{R}^d\times \mathbb{R}$ are the model parameters, usually learned by maximizing the complete log-likelihood of the data. By shifting the output space to {-1,1}, maximizing the log-likelihood becomes equivalent to minimizing the logistic surrogate of the 0/1 loss, which on a training set $S=\left((\mathbf{x}_i,y_i)\right)_{i=1}^m$ is written $\hat{\mathcal{L}}(\boldsymbol{w})=\frac{1}{m}\sum_{i=1}^m \ln(1+e^{-y_i h_{\boldsymbol{w}} (\mathbf{x}_i)})$. Efficient first- and second-order optimization techniques are generally applied to carry out this minimization [Hastie et al. 2009]. This program uses the conjugate gradient technique.
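As an illustration of the objective above, the following minimal Python sketch evaluates the logistic loss and its gradient for labels in {-1,1}. It is not taken from the program's source: the function names (`score`, `log1pexp`, `sigmoid`, `logistic_loss`, `logistic_gradient`) and the list-based data layout are assumptions made for this example only. A gradient of this kind is what a first-order method such as conjugate gradient would consume.

```python
import math


def score(w_bar, w0, x):
    """Linear score h_w(x) = <x, w_bar> + w0."""
    return sum(wj * xj for wj, xj in zip(w_bar, x)) + w0


def log1pexp(z):
    """Numerically stable ln(1 + e^z)."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(w_bar, w0, X, y):
    """Empirical logistic loss (1/m) * sum_i ln(1 + exp(-y_i h_w(x_i)))."""
    total = sum(log1pexp(-yi * score(w_bar, w0, xi)) for xi, yi in zip(X, y))
    return total / len(X)


def logistic_gradient(w_bar, w0, X, y):
    """Gradient of the loss with respect to (w_bar, w0)."""
    m = len(X)
    g_bar = [0.0] * len(w_bar)
    g0 = 0.0
    for xi, yi in zip(X, y):
        # d/dh ln(1 + e^{-y h}) = -y * sigmoid(-y h), averaged over m examples
        c = -yi * sigmoid(-yi * score(w_bar, w0, xi)) / m
        for j, xj in enumerate(xi):
            g_bar[j] += c * xj
        g0 += c
    return g_bar, g0
```

With all parameters set to zero, every score is 0 and the loss equals $\ln 2 \approx 0.693$ whatever the data, which gives a quick sanity check for an implementation.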

^{Download and Installation}

The program is free for scientific use only. It was developed on Linux with gcc, and the source code is available from:
http://ama.liglab.fr/~amini/LR/LogisticRegression.tar.bz2

After downloading the file and unpacking it:

 > bzip2 -cd LogisticRegression.tar.bz2 | tar xvf -

you need to compile the program in the new directory LogisticRegression/

 > make

After compilation, two executables are created:

LogisticRegression-learn (for training the model)
LogisticRegression-test (for testing it)

^{Training and testing}

Each example in the training and test files is represented by its class label (+1 or -1) followed by its plain vector representation. The directory LogisticRegression/example/ contains four files (training and test sets) taken from the UCI repository.
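To make the file layout concrete, here is a small Python sketch that reads one example per line. It assumes that fields are separated by whitespace and that every value after the label is one feature. The text above only states that the label comes first, so the separator, the sample line, and the names `parse_example` and `load_examples` are assumptions for illustration, not part of the distributed program.

```python
def parse_example(line):
    """Split one line into (label, feature_vector).

    Assumed layout: '<label> <x_1> <x_2> ... <x_d>', whitespace separated.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a label followed by at least one feature")
    label = float(fields[0])
    if label not in (1.0, -1.0):
        raise ValueError("class label must be +1 or -1")
    return int(label), [float(v) for v in fields[1:]]


def load_examples(lines):
    """Parse all non-empty lines and check that every vector has the same dimension."""
    labels, vectors = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, x = parse_example(line)
        if vectors and len(x) != len(vectors[0]):
            raise ValueError("inconsistent dimension")
        labels.append(label)
        vectors.append(x)
    return labels, vectors
```

For instance, under these assumptions the hypothetical line `+1 0.5 -0.25 1` would be read as label 1 with the three-dimensional vector (0.5, -0.25, 1.0).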

^{Train the model:}

 > LogisticRegression-learn [options] input_file parameter_file

Options are:

-e (float)	Precision (default 1e-4),
-d ({0,1})	Display (default 0),
-?	Help

^{Test the model:}

 > LogisticRegression-test input_file parameter_file
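Under the model hypothesis stated in the description, the posterior follows from the linear score as $P(Y=1\mid\mathbf{x}) = 1/(1+e^{-h_{\boldsymbol{w}}(\mathbf{x})})$, so a natural decision rule predicts +1 when the score is positive. The Python sketch below illustrates this rule. It assumes the weights have already been loaded into memory (the layout of parameter_file is not described here), and the names `posterior_positive` and `predict`, as well as the tie-breaking choice at a score of exactly 0, are assumptions of this example.

```python
import math


def posterior_positive(w_bar, w0, x):
    """P(Y=1 | x) = 1 / (1 + exp(-h_w(x))) with h_w(x) = <x, w_bar> + w0."""
    h = sum(wj * xj for wj, xj in zip(w_bar, x)) + w0
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)  # avoids overflow for very negative scores
    return e / (1.0 + e)


def predict(w_bar, w0, x):
    """Return +1 if the posterior of the positive class exceeds 0.5, else -1."""
    return 1 if posterior_positive(w_bar, w0, x) > 0.5 else -1
```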

^Example

 > LogisticRegression-learn -d 1 -e 0.01 example/IONO-Train Params-LR-IONO 

The training set contains 210 examples in dimension 34

Iteration:0 Loss:0.707320

Iteration:5 Loss:0.450916

Iteration:10 Loss:0.331135

Iteration:15 Loss:0.274300

Iteration:20 Loss:0.239811

Iteration:25 Loss:0.221960

Iteration:30 Loss:0.207585

Iteration:35 Loss:0.196435

Precision:0.923077 Recall:0.970588 F1-measure:0.946237 Error=0.071429

 > LogisticRegression-test example/IONO-Test Params-LR-IONO

Prediction on the test set containing 141 examples in dimension 34

Precision:0.865979 Rappel:0.943820 mesure-F:0.903226 Erreur=0.127660
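The four figures reported above (precision, recall, F-measure, error rate) can be computed from true and predicted labels as in the following Python sketch. The function name `binary_metrics` is hypothetical, and the confusion counts used in the usage note are inferred by arithmetic from the printed numbers; they are not output of the program.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1, error) for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # true positives
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    error = (fp + fn) / len(pairs)  # fraction of misclassified examples
    return precision, recall, f1, error
```

One set of counts consistent with the test output is 84 true positives, 13 false positives, 5 false negatives and 39 true negatives (141 examples in total): these give a precision of 84/97, a recall of 84/89, an F-measure of 168/186 and an error of 18/141, which round to the values printed above.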

^Disclaimer

This program is publicly available for research use only. It should not be distributed for commercial use and the author is not responsible for any (mis)use of this algorithm.

^Bibliography

[Amini, 2015] Massih-Reza Amini. Apprentissage Machine: de la théorie à la pratique. Eyrolles, 2015.

[Cox, 1958] David Roxbee Cox. The regression analysis of binary sequences (with discussion). Journal of the Royal Statistical Society. Series B, 20: 215-242, 1958.

[Hastie et al. 2009] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

[Mohri et al. 2012] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.

[Truett et al., 1967] Jeanne Truett, Jerome Cornfield, and William Kannel. A multivariate analysis of the risk of coronary heart disease in Framingham. Journal of Chronic Diseases, 20 (7): 511-524, 1967.