## Logistic Regression

Distribution 1.0
17 July 2015

Massih R. Amini

Université Grenoble Alpes
Laboratoire d'Informatique de Grenoble

### Description

This program implements the logistic regression model for classification introduced by [Cox, 1958; Truett et al., 1967] and described in [Amini, 2015; pp. 73-75]. Logistic regression is a supervised learning model that explains how the features of an instance affect its binary categorical output in {0,1}. Its main hypothesis is that, for each observation $\mathbf{x}\in\mathbb{R}^d$, the logarithm of the ratio of the posterior probabilities (the log-odds) is an affine function of its features:
$\ln\left(\frac{P(Y=1\mid \mathbf{x})}{P(Y=0\mid \mathbf{x})}\right)=\underbrace{\langle\mathbf{x},\bar{\boldsymbol{w}}\rangle+w_0}_{= h_{\boldsymbol{w}} (\mathbf{x})}$
where $\boldsymbol{w}=(\bar{\boldsymbol{w}},w_0)\in \mathbb{R}^d\times \mathbb{R}$ are the model parameters, usually learned by maximizing the conditional log-likelihood of the training data. By shifting the output space to {-1,1}, maximizing the log-likelihood becomes equivalent to minimizing the logistic surrogate of the 0/1 loss, which, on a training set $S=\left((\mathbf{x}_i,y_i)\right)_{i=1}^m$, reads
$\hat{\mathcal{L}}(\boldsymbol{w})=\frac{1}{m}\sum_{i=1}^m \ln(1+e^{-y_i h_{\boldsymbol{w}} (\mathbf{x}_i)})$
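The equivalence stated above takes one step to check. Inverting the log-odds model gives the posteriors below (class 0 renamed -1); assuming the training examples are independent, the negative conditional log-likelihood divided by $m$ is exactly $\hat{\mathcal{L}}(\boldsymbol{w})$, and its gradient is the quantity a first-order method needs:

$$
P(Y=1\mid\mathbf{x})=\frac{1}{1+e^{-h_{\boldsymbol{w}}(\mathbf{x})}},\qquad
P(Y=-1\mid\mathbf{x})=1-P(Y=1\mid\mathbf{x})=\frac{1}{1+e^{h_{\boldsymbol{w}}(\mathbf{x})}},
\qquad\text{so}\qquad P(Y=y\mid\mathbf{x})=\frac{1}{1+e^{-y\,h_{\boldsymbol{w}}(\mathbf{x})}},\; y\in\{-1,1\}
$$

$$
-\frac{1}{m}\ln\prod_{i=1}^m P(Y=y_i\mid\mathbf{x}_i)=\frac{1}{m}\sum_{i=1}^m \ln\left(1+e^{-y_i h_{\boldsymbol{w}}(\mathbf{x}_i)}\right)=\hat{\mathcal{L}}(\boldsymbol{w}),
\qquad
\nabla\hat{\mathcal{L}}(\boldsymbol{w})=-\frac{1}{m}\sum_{i=1}^m \frac{y_i}{1+e^{y_i h_{\boldsymbol{w}}(\mathbf{x}_i)}}\begin{pmatrix}\mathbf{x}_i\\ 1\end{pmatrix}
$$

The last component of the gradient is the derivative with respect to the bias $w_0$.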
Efficient first- and second-order optimization techniques are generally used to carry out this minimization [Hastie et al. 2009]. This program is based on the conjugate gradient method.

The program is free for scientific use only. It was developed on Linux with gcc, and the source code is available from:
http://ama.liglab.fr/~amini/LR/LogisticRegression.tar.bz2

> bzip2 -cd LogisticRegression.tar.bz2 | tar xvf -

Then compile the program in the newly created directory LogisticRegression/:

> make

After compilation, two executables are created:

• LogisticRegression-learn (for training the model)
• LogisticRegression-test (for testing it)

### Training and testing

The directory LogisticRegression/example/ contains four files (training and test sets) from the UCI repository. Each example in these files is represented by its class label (+1 or -1) followed by its plain vector representation.
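Reading one line of such a file could look like the C sketch below. Since the format is not specified beyond the description above, it assumes that the label and the feature values are separated by whitespace and that every example lists all $d$ values; the function name `parse_example` is hypothetical and is not taken from the program's source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parser for one line "label v1 v2 ... vd" (whitespace-separated,
   dense). Stores the label in *label and up to max_d feature values in x.
   Returns the number of values read, or -1 if the label is missing or is
   not +1/-1. */
int parse_example(const char *line, double *label, double *x, int max_d) {
    char *end;
    assert(line && label && x && max_d >= 0);
    *label = strtod(line, &end);
    if (end == line || (*label != 1.0 && *label != -1.0)) return -1;
    int d = 0;
    const char *p = end;
    while (d < max_d) {
        double v = strtod(p, &end); /* skips leading whitespace */
        if (end == p) break;        /* no further number on this line */
        x[d++] = v;
        p = end;
    }
    return d;
}
```

Calling it on each line read with `fgets` and checking that every call returns the same count yields the dimension $d$ of the data set.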

Train the model:
> LogisticRegression-learn [options] input_file parameter_file

Options are:
• -e (float) Precision (default 1e-4)
• -d ({0,1}) Display (default 0)
• -? Help

Test the model:
> LogisticRegression-test input_file parameter_file

### Example

> LogisticRegression-learn -d 1 -e 0.01 example/IONO-Train Params-LR-IONO
The training set contains 210 examples in dimension 34
Iteration:0 Loss:0.707320
Iteration:5 Loss:0.450916
Iteration:10 Loss:0.331135
Iteration:15 Loss:0.274300
Iteration:20 Loss:0.239811
Iteration:25 Loss:0.221960
Iteration:30 Loss:0.207585
Iteration:35 Loss:0.196435
Precision:0.923077 Recall:0.970588 F1-measure:0.946237 Error=0.071429
> LogisticRegression-test example/IONO-Test Params-LR-IONO
Prediction on the test set containing 141 examples in dimension 34
Precision:0.865979 Rappel:0.943820 mesure-F:0.903226 Erreur=0.127660
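The figures printed above can be recomputed from a confusion matrix. The sketch below assumes, as the reported values suggest but the source does not state, that +1 is the positive class and that an example is assigned to +1 when $h_{\boldsymbol{w}}(\mathbf{x})>0$, i.e. when $P(Y=1\mid\mathbf{x})>1/2$; the type `Counts` and all function names are hypothetical.

```c
#include <assert.h>

/* Confusion-matrix counts; +1 is taken as the positive class (assumption). */
typedef struct { int tp, fp, fn, tn; } Counts;

/* Assumed decision rule: +1 iff h_w(x) > 0, i.e. P(Y=1|x) > 1/2. */
static int predict_label(double h) { return h > 0.0 ? 1 : -1; }

/* Tallies predictions from the scores h[i] = h_w(x_i) against labels y[i]. */
static Counts confusion(const double *h, const int *y, int m) {
    Counts c = {0, 0, 0, 0};
    assert(m >= 0);
    for (int i = 0; i < m; i++) {
        int pred = predict_label(h[i]);
        if (pred == 1 && y[i] == 1) c.tp++;
        else if (pred == 1) c.fp++;
        else if (y[i] == 1) c.fn++;
        else c.tn++;
    }
    return c;
}

/* Each ratio is NaN when its denominator is zero (0.0 / 0). */
static double precision(Counts c) { return (double)c.tp / (c.tp + c.fp); }
static double recall(Counts c) { return (double)c.tp / (c.tp + c.fn); }
static double f1_measure(Counts c) {
    /* Harmonic mean of precision and recall: 2TP / (2TP + FP + FN). */
    return 2.0 * c.tp / (2.0 * c.tp + c.fp + c.fn);
}
static double error_rate(Counts c) {
    return (double)(c.fp + c.fn) / (c.tp + c.fp + c.fn + c.tn);
}
```

For instance, the counts TP = 84, FP = 13, FN = 5, TN = 39 (141 examples) reproduce the test-set line above: precision 84/97, recall 84/89, F1 168/186 and error 18/141.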

### Disclaimer

This program is publicly available for research use only. It should not be distributed for commercial use and the author is not responsible for any (mis)use of this algorithm.

### Bibliography

[Amini, 2015] Massih-Reza Amini. Apprentissage Machine: de la théorie à la pratique. Eyrolles, 2015.

[Cox, 1958] David Roxbee Cox. The regression analysis of binary sequences (with discussion). Journal of the Royal Statistical Society, Series B, 20: 215-242, 1958.

[Hastie et al. 2009] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

[Mohri et al. 2012] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.

[Truett et al., 1967] Jeanne Truett, Jerome Cornfield, and William Kannel. A multivariate analysis of the risk of coronary heart disease in Framingham. Journal of Chronic Diseases, 20(7): 511-524, 1967.