What is Apache Mahout?

The goal of the Apache Mahout™ project is to build scalable machine learning libraries.

Mahout currently has

  • Collaborative Filtering
  • User and Item based recommenders
  • K-Means, Fuzzy K-Means clustering
  • Mean Shift clustering
  • Dirichlet process clustering
  • Latent Dirichlet Allocation
  • Singular value decomposition
  • Parallel Frequent Pattern mining
  • Complementary Naive Bayes classifier
  • Random forest decision tree based classifier
  • High-performance Java collections (previously Colt collections)
  • A vibrant community
  • and much more to come this summer, thanks to Google Summer of Code

By scalable we mean:

Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized so that non-distributed algorithms also perform well.

Scalable to support your business case. Mahout is distributed under the commercially friendly Apache Software License.

Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.
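To make the map/reduce point above more concrete, here is a minimal single-machine sketch of one k-means iteration split into a "map" step (assign each point to its nearest centroid) and a "reduce" step (average the points in each group). It uses only the plain JDK and does not call Hadoop or any Mahout API; the class name KMeansStep and its methods are hypothetical names chosen for illustration, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout or Hadoop.
public class KMeansStep {

    /** "Map" step: returns the index of the centroid nearest to the point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                dist += diff * diff; // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** "Reduce" step: averages all points assigned to one centroid. */
    static double[] mean(List<double[]> points) {
        double[] sum = new double[points.get(0).length];
        for (double[] p : points) {
            for (int d = 0; d < sum.length; d++) {
                sum[d] += p[d];
            }
        }
        for (int d = 0; d < sum.length; d++) {
            sum[d] /= points.size();
        }
        return sum;
    }

    /** One iteration: map each point to a cluster id, group by id, reduce each group. */
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] updated = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> members = groups.get(c);
            // A centroid with no assigned points keeps its previous position.
            updated[c] = (members == null) ? centroids[c].clone() : mean(members);
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centroids = {{1, 1}, {9, 9}};
        // prints [[0.0, 1.0], [10.0, 11.0]]
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
    }
}
```

In a distributed setting the grouping by cluster id would be done by the framework's shuffle between the two steps; here a HashMap stands in for it.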

Mahout currently supports four main use cases. Recommendation mining takes users' behavior and from it tries to find items users might like. Clustering takes, for example, text documents and groups them into sets of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like, and can then assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart contents) and identifies which individual items usually appear together.
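As a rough illustration of the last use case, the sketch below counts how often each pair of items appears together across a few shopping carts and keeps the pairs that reach a minimum count. This is naive pair counting in plain Java, not Mahout's Parallel Frequent Pattern mining implementation; the class name PairCounts, the "+" pair key, and the minSupport parameter are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Mahout.
public class PairCounts {

    /** Counts, for every unordered pair of items, how many groups contain both. */
    static Map<String, Integer> countPairs(List<List<String>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> group : groups) {
            // Deduplicate and sort so each pair is counted once per group, in a stable order.
            List<String> items = new ArrayList<>(new TreeSet<>(group));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Keeps only the pairs that occur in at least minSupport groups. */
    static Map<String, Integer> frequentPairs(List<List<String>> groups, int minSupport) {
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : countPairs(groups).entrySet()) {
            if (e.getValue() >= minSupport) {
                frequent.put(e.getKey(), e.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<String>> carts = Arrays.asList(
            Arrays.asList("bread", "butter", "milk"),
            Arrays.asList("bread", "butter"),
            Arrays.asList("milk", "tea"));
        // prints {bread+butter=2}
        System.out.println(frequentPairs(carts, 2));
    }
}
```

Counting every pair grows quadratically with group size, which is one reason dedicated frequent pattern algorithms exist for larger data sets.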

Interested in helping? See the wiki or join the mailing lists.

Mahout News

16 June 2012 - Apache Mahout 0.7 released

Apache Mahout has reached version 0.7. All developers are encouraged to begin using version 0.7. Highlights include:

  • Outlier removal capability in K-Means, Fuzzy K-Means, Canopy and Dirichlet clustering
  • New clustering implementation for K-Means, Fuzzy K-Means, Canopy and Dirichlet using cluster classifiers
  • Collections and Math API consolidated
  • (Complementary) Naive Bayes refactored and cleaned
  • Watchmaker and old Naive Bayes dropped
  • Many bug fixes, refactorings, and other small improvements

Changes in 0.7 are detailed in the release notes.

Downloads of all releases are available from Apache mirrors.

6 Feb 2012 - Apache Mahout 0.6 released

Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:

  • Improved Decision Tree performance and added support for regression problems
  • New LDA implementation using Collapsed Variational Bayes 0th Derivative Approximation
  • Reduced runtime of LanczosSolver tests
  • K-Trusses, Top-Down and Bottom-Up clustering, Random Walk with Restarts implementation
  • Reduced runtime of dot product between vectors
  • Added MongoDB and Cassandra DataModel support
  • Increased efficiency of parallel ALS matrix factorization
  • SSVD enhancements
  • Performance improvements in RowSimilarityJob, TransposeJob
  • Added numerous clustering display examples
  • Many bug fixes, refactorings, and other small improvements

Changes in 0.6 are detailed in the release notes.

Downloads of all releases are available from Apache mirrors.

9 Oct 2011 - Mahout in Action released

At last, the book Mahout in Action is available in print. Sean Owen, Robin Anil, Ted Dunning and Ellen Friedman thank the community (especially those who were reviewers) for input during the process and hope it is enjoyable.

Find it at your favorite bookstore, or order print and eBook copies from Manning -- use discount code "mahout37" for 37% off.

27 May 2011 - Apache Mahout 0.5 released

Apache Mahout has reached version 0.5. All developers are encouraged to begin using version 0.5, as again much has changed and been fixed since version 0.4. Many APIs have been changed, added or removed, and they will continue to change before version 1.0. Highlights of version 0.5 include:

  • Improved Lanczos solver: graceful restarts, better scalability
  • LDA improvements: document-topic distribution output, graceful restarts
  • Stochastic Singular Value Decomposition implementation
  • Incremental SVD implementation
  • Alternating Least Squares with Weighted Regularization collaborative filtering implementation, both distributed and non-distributed
  • SVDRecommender enhancements
  • Initial work at merging clustering and classification infrastructure
  • Better control over candidate item selection in item-based recommenders
  • Significant removal of deprecated or dead code
  • Many bug fixes, refactorings and other small improvements

Changes in 0.5 are detailed in the release notes.

Downloads of all releases are available from Apache mirrors.