How to cite this WIREs title:
WIREs Data Mining Knowl Discov

An introduction to Majorization‐Minimization algorithms for machine learning and statistical estimation



MM (majorization–minimization) algorithms are an increasingly popular tool for solving optimization problems in machine learning and statistical estimation. This article introduces the MM algorithm framework in general and via three commonly considered example applications: Gaussian mixture regressions, multinomial logistic regressions, and support vector machines. Specific algorithms for these three examples are derived and numerical demonstrations are presented. Theoretical and practical aspects of MM algorithm design are discussed. WIREs Data Mining Knowl Discov 2017, 7:e1198. doi: 10.1002/widm.1198
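To illustrate the MM framework described in the abstract, here is a minimal sketch (not drawn from the article itself) of an MM algorithm for a simple estimation problem: minimizing the least-absolute-deviations objective Σ|x_i − θ|, whose minimizer is the sample median. Each |r| is majorized at the current residual r₀ by the quadratic r²/(2|r₀|) + |r₀|/2, so each MM step reduces to a weighted least-squares update. The function name and the small smoothing constant `eps` are illustrative choices, not from the source.

```python
import numpy as np

def mm_median(x, n_iter=100, eps=1e-8):
    """MM sketch: minimize sum_i |x_i - theta|.

    Each absolute value |r| is majorized at the current residual r0 by the
    quadratic r**2 / (2*|r0|) + |r0|/2, which touches |r| at r = r0 and lies
    above it elsewhere. Minimizing the sum of these quadratics gives a
    weighted-average update, so the objective decreases monotonically.
    """
    theta = float(np.mean(x))  # any starting point works
    for _ in range(n_iter):
        # weights from the majorizer; eps guards against division by zero
        w = 1.0 / (np.abs(x - theta) + eps)
        theta = float(np.sum(w * x) / np.sum(w))  # weighted least-squares step
    return theta

x = np.array([1.0, 2.0, 3.0, 10.0, 100.0])
print(mm_median(x))  # converges toward the sample median, 3.0
```

This is the same surrogate-then-minimize pattern the article develops for its three examples: replace an awkward objective by a tangent majorizer that is easy to minimize, then iterate.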

This article is categorized under:

  • Algorithmic Development > Statistics
  • Technologies > Machine Learning
  • Technologies > Statistical Fundamentals
Examples of majorizers for objectives of the form O(θ) = |θ|^d at the point υ = 1/2, for d = 1, 1.5. The solid lines indicate the objectives and the dashed lines indicate the majorizers.
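The majorizers depicted in this figure can be checked numerically. Assuming the standard quadratic majorizer of f(θ) = |θ|^d (valid for 0 < d ≤ 2) at an anchor point υ ≠ 0, namely g(θ | υ) = (d/2)|υ|^(d−2) θ² + (1 − d/2)|υ|^d, the sketch below verifies the two defining MM properties: g dominates f everywhere and is tangent to f at υ. The grid range and tolerance are illustrative choices.

```python
import numpy as np

# Quadratic majorizer of f(theta) = |theta|**d at anchor v (0 < d <= 2):
#   g(theta | v) = (d/2)*|v|**(d-2)*theta**2 + (1 - d/2)*|v|**d
d, v = 1.5, 0.5
theta = np.linspace(-2.0, 2.0, 4001)

f = np.abs(theta) ** d
g = (d / 2) * abs(v) ** (d - 2) * theta ** 2 + (1 - d / 2) * abs(v) ** d

# Domination: g(theta | v) >= f(theta) on the whole grid
print(bool(np.all(g >= f - 1e-12)))
# Tangency: g(v | v) equals f(v)
print(abs((d / 2) * abs(v) ** d + (1 - d / 2) * abs(v) ** d - abs(v) ** d) < 1e-12)
```

These two properties (domination and tangency) are exactly what guarantees the monotone descent of MM iterations: minimizing g drives f at least as low as the current point.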
Separating hyperplane for the classification problem of separating the Setosa and Versicolor irises by their sepal length and width. The dashed line indicates the SVM‐obtained separating hyperplane. Circles and triangles indicate Setosa and Versicolor irises, respectively.
Negative log‐likelihood risk versus log10 of the iteration number for the MLR model fitted to the iris data set. The solid line indicates the MM algorithm‐obtained sequence, and the dashed line indicates the sequence obtained from the multinom function.
Example of an instance of Case 2 of the sampling experiments of Quandt. Solid lines indicate generative mean functions according to the g = 2 different components of the mixture, and dashed lines indicate fitted mean functions ŷ_c(x) for c = 1, 2.

