Welcome to Apache Lucene

The Apache Lucene™ project develops open-source search software, including:

  • Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
  • Solr™ is a high-performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
  • Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
  • PyLucene is a Python port of the Core project.

Lucene™ News

18 June 2013 - Apache Lucene 4.3.1 and Apache Solr™ 4.3.1 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.3.1 and Apache Solr 4.3.1.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

Highlights of the Lucene release include:

  • Lucene 4.3.1 includes 12 bug fixes and 1 optimization, including fixes for a serious bug that can cause deadlock.

Highlights of the Solr release include:

  • Solr 4.3.1 includes 24 bug fixes. The list includes a lot of SolrCloud bug fixes around Shard Splitting as well as some fixes in other areas.

  • Lucene 4.3.1 bug fixes and optimizations.

6 May 2013 - Apache Lucene 4.3.0 and Apache Solr™ 4.3.0 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.3.0 and Apache Solr 4.3.0.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

Highlights of the Lucene release include:

  • Significant performance improvements for minShouldMatch BooleanQuery due to skipping resulting in up to 4000% faster queries.

  • A new SortingAtomicReader which allows sorting an index based on a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged.

  • DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution and in how filters are applied.

  • AnalyzingSuggester and FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i " will no longer suggest "Isla de Muerta", for example.

  • Lucene Spatial Module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to typical Intersects.

  • PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields.

  • New SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).

  • Added new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.

  • DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.

  • Various bugfixes and optimizations since the 4.2.1 release.
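
To illustrate how a cost estimate like the one in the DocIdSetIterator/Scorer item above can guide query execution, here is a minimal Python sketch that intersects sorted document-ID lists by leading with the cheapest one. It is not Lucene code: the PostingList class and the intersect function are hypothetical, and cost() is simply the list length, standing in for the upper bound the real API reports.

```python
from bisect import bisect_left


class PostingList:
    """Hypothetical stand-in for an iterator over sorted document IDs."""

    def __init__(self, doc_ids):
        self.doc_ids = sorted(doc_ids)

    def cost(self):
        # Upper bound on the number of documents this list can match.
        return len(self.doc_ids)

    def contains(self, doc_id):
        # Binary search plays the role of skipping ahead in the list.
        i = bisect_left(self.doc_ids, doc_id)
        return i < len(self.doc_ids) and self.doc_ids[i] == doc_id


def intersect(lists):
    """AND together posting lists, driving iteration from the cheapest one."""
    if not lists:
        return []
    ordered = sorted(lists, key=lambda p: p.cost())
    lead, others = ordered[0], ordered[1:]
    return [d for d in lead.doc_ids if all(o.contains(d) for o in others)]


rare = PostingList([7, 42, 900])
common = PostingList(range(0, 1000))
medium = PostingList(range(0, 1000, 7))
result = intersect([common, medium, rare])
```

Because the three-entry list leads, only three candidates are checked against the larger lists, regardless of the order in which the clauses were supplied.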

Highlights of the Solr release include:

  • Tired of maintaining core information in solr.xml? Now you can configure Solr to automatically find cores by walking an arbitrary directory.

  • Shard Splitting: You can now split SolrCloud shards to expand your cluster as you grow.

  • The read side schema REST API has been improved and expanded upon: all schema information is now available and the full live schema can now be returned in JSON or XML. Groundwork is included for the upcoming write side of the schema REST API.

  • Spatial queries can now search for indexed shapes by "IsWithin", "Contains" and "IsDisjointTo" relationships, in addition to typical "Intersects".

  • Faceting now supports local parameters for faceting on the same field with different options.

  • Significant performance improvements for minShouldMatch (mm) queries due to skipping resulting in up to 4000% faster queries.

  • Various new highlighting configuration parameters.

  • A new solr.xml format that is closer to that of solrconfig.xml. The example still uses the old format, but 4.4 will ship with the new format.

  • Lucene 4.3.0 bug fixes and optimizations.

Solr 4.3.0 also includes many other new features as well as numerous optimizations and bugfixes.

3 April 2013 - Apache Lucene 4.2.1 and Apache Solr™ 4.2.1 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.2.1 and Apache Solr 4.2.1.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

Highlights of the Lucene release include:

  • Lucene 4.2.1 includes 9 bug fixes and 3 optimizations, including a fix for a serious bug that could result in the loss of an index.

Highlights of the Solr release include:

  • Solr 4.2.1 includes 38 bug fixes and 2 optimizations. The list includes a lot of SolrCloud bug fixes around the Collections API as well as many fixes around Directory management. There are many fixes in other areas as well.

  • Lucene 4.2.1 bug fixes and optimizations.

11 March 2013 - Apache Lucene 4.2 and Apache Solr™ 4.2 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.2 and Apache Solr 4.2.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

Highlights of the Lucene release include:

  • Lucene 4.2 has a new default codec (Lucene42Codec) with a more efficient docvalues format (sorted bytes in FST, less addressing overhead, improved numeric compression) and smaller term vectors (LZ4-compressed terms dictionaries and payloads, delta-encoded positions and offsets using blocks of packed integers).

  • The doc values external API, codec API, and implementations have been simplified: the codec is no longer responsible for buffering doc values; the numerous types have been consolidated down to only three (NUMERIC, BINARY, SORTED); PerFieldDocValuesFormat lets you set a different format for each field; and the doc values and FieldCache APIs were unified.

  • Significant refactoring and performance enhancements to the facet module, resulting in overall ~3.8X speedup in one case (single Date field faceting).

  • DrillDownQuery in the facet module now supports multi-select.

  • A new DrillSideways class enables counting facet labels and counts for both hits and near-misses in a single query. See http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html

  • An additional docvalues type (SORTED_SET) was added that supports multiple values.

  • FSTs are a bit smaller, and the FST package supports FSTs over 2GB in size.

  • A new LiveFieldValues class lets you get live or real-time values for any indexed doc / field. See http://blog.mikemccandless.com/2013/01/getting-real-time-field-values-in-lucene.html

  • Added a new classification module.

  • Various bugfixes and optimizations since the 4.1 release.
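
The "delta-encoded positions and offsets using blocks of packed integers" mentioned in the codec item above can be pictured with a small Python sketch. This is only an illustration of the general idea, not the Lucene42Codec on-disk format: the function names are hypothetical, and the sketch assumes a non-empty block of increasing, non-negative values.

```python
def delta_encode(positions):
    """Turn increasing positions into gaps from the previous value."""
    deltas, previous = [], 0
    for p in positions:
        deltas.append(p - previous)
        previous = p
    return deltas


def delta_decode(deltas):
    """Rebuild the original positions by summing the gaps."""
    positions, total = [], 0
    for d in deltas:
        total += d
        positions.append(total)
    return positions


def bits_required(values):
    """Bits per value needed to pack a whole block at one fixed width."""
    return max(1, max(values).bit_length())


positions = [5, 130, 260, 391, 515, 640, 770]
deltas = delta_encode(positions)
raw_bits = bits_required(positions)
packed_bits = bits_required(deltas)
```

Storing gaps instead of absolute values keeps the numbers small, so each value in the block fits in fewer bits than the raw positions would need.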

Highlights of the Solr release include:

  • A read side REST API for the schema. Always wanted to introspect the schema over http? Now you can. Looks like the write side will be coming next.

  • DocValues have been integrated into Solr. DocValues can be loaded up a lot faster than the field cache and can also use different compression algorithms as well as in RAM or on Disk representations. Faceting, sorting, and function queries all get to benefit. How about the OS handling faceting and sorting caches off heap? No more tuning 60 gigabyte heaps? How about a snappy new per segment DocValues faceting method? Improved numeric faceting? Sweet.

  • Collection Aliasing. Got time based data? Want to re-index in a temporary collection and then swap it into production? Done. Stay tuned for Shard Aliasing.

  • Collection API responses. The Collections API was still very new in 4.0, and while it improved a fair bit in 4.1, responses were certainly needed but missed the cutoff. Initially, we decided to make the Collections API super fault tolerant, which made responses tougher to do. No one wants to hunt through log files to see how things turned out. Done in 4.2.

  • Interact with any collection on any node. Until 4.2, you could only interact with a node in your cluster if it hosted at least one replica of the collection you wanted to query/update. No longer - query any node, whether it has a piece of your intended collection or not and get a proxied response.

  • Allow custom shard names so that new host addresses can take over for retired shards. Working on Amazon without elastic ips? This is for you.

  • Lucene 4.2 optimizations such as compressed term vectors.

Solr 4.2 also includes many other new features as well as numerous optimizations and bugfixes.

22 January 2013 - Apache Lucene 4.1 and Apache Solr™ 4.1 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.1 and Apache Solr 4.1.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

Highlights of the Lucene release include:

  • Lucene 4.1 has a new default codec (Lucene41Codec) based on the previously-experimental "Block" indexing format for improved performance, but also incorporating the functionality of "Appending" and "Pulsing".

  • The default codec incorporates the optimization of Pulsing: terms that appear in only one document (such as primary key/id fields) just store the document id in the term dictionary instead of a pointer to this document id in a separate file.

  • The default codec incorporates an efficient compressed stored fields implementation that compresses chunks of documents together with LZ4. (see http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

  • Lucene no longer seeks when writing files (all fields are written in an append-only way). This means it works by default with append-only streams, HDFS, etc.

  • New suggest implementations: AnalyzingSuggester, where the underlying form (computed from a Lucene Analyzer) used for suggestions is separate from the returned text (see http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html), and FuzzySuggester, which additionally allows for inexact matching on the input.

  • Near-realtime support was added to the facet module. (see http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html)

  • New Highlighter (postingshighlighter) added to the highlighter module. (see http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html)

  • Added FilterStrategy to FilteredQuery for more flexibility in filtered query execution.

  • Added CommonTermsQuery to speed up queries with very highly frequent terms. Term frequencies are efficiently detected at query time - no index time preparation required.

  • Several bugfixes and optimizations since the 4.0 release.
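
The Pulsing optimization described above (a term that appears in only one document stores that document ID directly in the term dictionary) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Lucene41Codec format: the function names, the "inline"/"pointer" tags, and the list standing in for the separate postings file are all hypothetical.

```python
def build_term_dictionary(postings_by_term):
    """Return (term_dict, postings_file) for a toy inverted index.

    Terms that occur in exactly one document keep the document ID inline;
    all other terms store an offset into the separate postings list.
    """
    term_dict, postings_file = {}, []
    for term in sorted(postings_by_term):
        doc_ids = postings_by_term[term]
        if len(doc_ids) == 1:
            term_dict[term] = ("inline", doc_ids[0])
        else:
            term_dict[term] = ("pointer", len(postings_file))
            postings_file.append(list(doc_ids))
    return term_dict, postings_file


def lookup(term, term_dict, postings_file):
    """Resolve a term to its document IDs."""
    kind, value = term_dict[term]
    if kind == "inline":
        return [value]            # answered from the dictionary alone
    return postings_file[value]   # follow the pointer to the postings


index = {"id:1": [0], "id:2": [1], "lucene": [0, 1]}
term_dict, postings_file = build_term_dictionary(index)
```

For primary-key style fields such as the two id terms here, every lookup is answered from the dictionary entry without touching the postings list.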

Highlights of the Solr release include:

SolrCloud enhancements (see http://wiki.apache.org/solr/SolrCloud):

  • Simple multi-tenancy through enhanced document routing:
    • The "compositeId" router is the default for collections with hash based routing (i.e. when numShards=N is specified on collection creation).
    • Documents with ids sharing the same domain/prefix, e.g. 'customerB!', will be routed to the same shard, allowing for efficient querying. At query time, one can specify a "shard.keys" parameter that lists the domains, e.g. 'shard.keys=customerB!', and controls what shards the query is routed to.
    • Collections that do not specify numShards at collection creation time use custom sharding and default to the "implicit" router. Document updates received by a shard will be indexed to that shard, unless a "shard" parameter or document field names a different shard.
  • Short circuiting for distributed search if a request only needs to query a single shard.
  • Allow creating more than one shard per instance with the Collection API.
  • Allow access to the collections API through CloudSolrServer without referencing an existing collection.
  • Collection API: Support for specifying a list of Solr addresses to spread a new collection across.
  • New and improved auto host detection strategy.
  • Numerous bug fixes and general hardening - it's recommended that all Solr 4.0 SolrCloud users upgrade to 4.1.
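
The prefix-based routing described above can be pictured with a minimal Python sketch: documents whose IDs share the part before '!' land on the same shard. The announcement does not describe the hash function or how the "compositeId" router combines the prefix with the rest of the ID, so the CRC32 hash, the modulo step, and the shard_for helper below are assumptions made purely for illustration.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from the part of the ID before '!', if present."""
    prefix, sep, _ = doc_id.partition("!")
    route_key = prefix if sep else doc_id
    # CRC32 is an arbitrary, stable hash chosen for this sketch only.
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


NUM_SHARDS = 4
ids = ["customerB!doc1", "customerB!doc2", "customerB!invoice-77"]
shards = {shard_for(i, NUM_SHARDS) for i in ids}
```

Since all three IDs hash on the same 'customerB' prefix, a query restricted to that tenant only needs to visit one shard.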

New features:

  • The majority of Solr's features, including replication, now work with custom Directory and DirectoryFactory implementations.
  • Indexed term offsets, specifiable via a 'storeOffsetsWithPositions' flag on field definitions in the schema. Useful for highlighters.
  • Solr QParsers may now be directly invoked in the lucene query syntax via localParams and without the query magic field hack. Example: foo AND {!term f=myfield v=$qq}
  • Solr now parses request parameters (from URL or sent with POST using content-type application/x-www-form-urlencoded) in its dispatcher code. It no longer relies on special configuration settings in Tomcat or other web containers to enable UTF-8 encoding, which is mandatory for correct Solr behaviour. Solr now works out of the box with e.g. Tomcat, JBoss,...
  • Directory IO rate limiting based on the IO context.
  • Distributed search support for MoreLikeThis.
  • Multi-core: On-demand core loading and LRU-based core unloading after reaching a user-specified maximum number.
  • The new Solr 4 spatial fields now work with the {!geofilt} and {!bbox} query parsers. The score local-param works too.
  • Extra statistics to RequestHandlers - 5 & 15-minute reqs/sec rolling averages; median, 75th, 95th, 99th, 99.9th percentile request times.
  • PostingsHighlighter support (see http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html)
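
As a sketch of how the localParams example above might be sent over HTTP, the following Python snippet builds a request URL in which the term parser reads its value from a separate qq parameter. The host, port, core name, and the sample value are assumptions; only the q string comes from the example in the list.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical host and core name; adjust to the actual deployment.
base_url = "http://localhost:8983/solr/collection1/select"

params = {
    # Query string from the announcement: the term parser takes its value from $qq.
    "q": "foo AND {!term f=myfield v=$qq}",
    # Referenced parameter, passed on its own rather than embedded in q.
    "qq": "some raw value: with (special) chars",
}

# urlencode percent-escapes braces, '$', '!' and spaces for the URL.
query_string = urlencode(params)
request_url = base_url + "?" + query_string

# Decoding the query string gives back the original parameters.
decoded = parse_qs(query_string)
```

Keeping the value in its own parameter means only URL encoding has to be handled on the client side; the value never has to be spliced into the q string.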

Admin UI improvements:

  • Internet Explorer is now supported
  • Enhanced readability of XML query response display in Query UI
  • Many improvements to DataImportHandler UI
  • Core creation and deletion now updates the main/left list of cores
  • Admin Cores UI now redirects to newly created core details
  • Deleted documents are calculated/displayed
  • Allow multiple Items to stay open on Plugins-Page

Storage improvements (thanks to the new Lucene 4.1 codec):

DataImportHandler contrib module backwards-compatibility breaks:

  • These default to the "root" Locale, rather than the JVM default locale as before.
    • NumberFormatTransformer & DateFormatTransformer
    • "formatDate" evaluator
    • "dataimport.properties" file "last_index_time" property
  • These default to UTF-8 encoding, rather than the JVM default encoding as before.
    • FileDataSource & FieldReaderDataSource
  • These may require code changes to custom plug-ins
    • The EvaluatorBag class was eliminated and its public/protected methods were moved to the Evaluator abstract class.
    • The experimental DIHPropertiesWriter interface was renamed DIHProperties, changed to an abstract class and given new signature.

Solr 4.1 also includes numerous optimizations and bugfixes.

25 December 2012 - Apache Lucene 3.6.2 and Apache Solr 3.6.2 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 3.6.2 and Apache Solr 3.6.2.

This release is a bug fix release for version 3.6.1. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-3x-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-3x-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 3.6.2 Release Highlights:

  • Fixed ArrayIndexOutOfBoundsException when the in-memory terms index requires more than 2.1GB of RAM (billions of terms).

  • Fixed a bug in contrib/queryparser's parsing of boolean queries.

  • Fixed BooleanScorer2 to return the correct freq() when using the scorer visitor API.

  • Fixed IndexWriter RAM accounting bug that would cause it to flush too early when using many different field names.

  • Several other minor bugfixes: scoring bugs when using a custom coord(), a rare IndexWriter thread-safety issue, and fixes to the faceting and highlighting modules.

Solr 3.6.2 Release Highlights:

  • Fixed ConcurrentModificationException during highlighting, if all fields were requested.

  • Fixed edismax queryparser to apply minShouldMatch to implicit boolean queries.

  • Several bugfixes to the DataImportHandler.

  • Bug fixes from Apache Lucene 3.6.2.

12 October 2012 - Lucene Core 4.0 and Solr 4.0 Available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.0 and Apache Solr 4.0.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Noteworthy changes since Lucene 4.0-BETA:

  • A new "Block" PostingsFormat offering improved search performance and index compression. This will likely become the default format in a future release.
  • All non-default codec implementations were moved to a separate codecs module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
  • Payloads can be optionally stored on the term vectors.
  • Many bugfixes and optimizations.

Noteworthy changes since Solr 4.0-BETA:

  • New spatial field types with polygon support.
  • Various Admin UI improvements.
  • SolrCloud-related performance optimizations in writing the transaction log, PeerSync recovery, Leader election, and ClusterState caching.
  • Numerous bug fixes and optimizations.

14 August 2012 - Lucene Core 4.0-BETA and Solr 4.0-BETA Available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.0-BETA and Apache Solr 4.0-BETA.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Highlights of the Lucene release include:

  • IndexWriter.tryDeleteDocument can sometimes delete by document ID, for higher performance in some applications.

  • New experimental postings formats: BloomFilteringPostingsFormat uses a bloom filter to sometimes avoid disk seeks when looking up terms, DirectPostingsFormat holds all postings as simple byte[] and int[] for very fast performance at the cost of very high RAM consumption.

  • CJK analysis improvements: JapaneseIterationMarkCharFilter normalizes Japanese iteration marks, added unigram+bigram support to CJKBigramFilter.

  • Improvements to the Scorer navigation API (Scorer.getChildren) to support all queries, useful for determining which portions of the query matched.

  • Analysis improvements: factories for creating Tokenizer, TokenFilter, and CharFilter have been moved from Solr to Lucene's analysis module, less memory overhead for StandardTokenizer and Snowball filters.

  • Improved highlighting for multi-valued fields.

  • Various other API changes, optimizations and bug fixes.
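
The idea behind using a bloom filter to sometimes avoid disk seeks, as in the BloomFilteringPostingsFormat item above, can be sketched in Python. This is a generic Bloom filter in front of a toy term set, not the Lucene implementation: the class names, the filter size, the number of hash functions, and the SHA-256 based hashing are all assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, term):
        # Derive several bit positions per term from seeded hashes.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))


class TermIndex:
    """Toy term dictionary that counts how often the slow lookup runs."""

    def __init__(self, terms):
        self.terms = set(terms)
        self.bloom = BloomFilter()
        self.slow_lookups = 0
        for t in self.terms:
            self.bloom.add(t)

    def contains(self, term):
        if not self.bloom.might_contain(term):
            return False          # definitely absent: skip the "disk seek"
        self.slow_lookups += 1    # possibly present: do the real lookup
        return term in self.terms


index = TermIndex(["id-1", "id-2", "id-3"])
found = index.contains("id-2")
lookups_after_hit = index.slow_lookups
missing = index.contains("no-such-term")
```

The filter can report false positives but never false negatives, so the answer stays correct either way; the saving is that most lookups for absent terms never reach the slow path.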

Highlights of the Solr release include:

  • Added a Collection management API for Solr Cloud.

  • Solr Admin UI now clearly displays failures related to initializing SolrCores.

  • Document updates can create the document if it doesn't already exist, or you can require that the document already exist.

  • Full delete-by-query support for Solr Cloud.

  • Default to NRTCachingDirectory for improved near-realtime performance.

  • Improved Solrj client performance with Solr Cloud: updates are only sent to leaders by default.

  • Various other API changes, optimizations and bug fixes.

22 July 2012 - Apache Lucene 3.6.1 and Apache Solr 3.6.1 available

The Lucene PMC is pleased to announce the availability of Apache Lucene 3.6.1 and Apache Solr 3.6.1.

This release is a bug fix release for version 3.6.0. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-3x-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-3x-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 3.6.1 Release Highlights:

  • The concurrency of MMapIndexInput.clone() was improved, fixing a performance regression relative to Lucene 3.5.0.

  • MappingCharFilter was fixed to return correct final token positions.

  • QueryParser now supports +/- operators with any amount of whitespace.

  • DisjunctionMaxScorer now implements visitSubScorers().

  • Changed the visibility of Scorer#visitSubScorers() to public, otherwise it's impossible to implement Scorers outside the Lucene package. This is a small backwards break, affecting a few users who implemented custom Scorers.

  • Various analyzer bugs were fixed: Kuromoji no longer produces an invalid token graph when UNK tokens with punctuation are decompounded, SynonymFilter no longer emits an invalid position length, Hunspell dictionaries that use aliasing now load correctly, and streams are closed consistently when loading Hunspell affix files.

  • Various bugs in FST components were fixed: the offline sorter's minimum buffer size, an integer overflow in the sorter, and FSTCompletionLookup failing to close its sorter.

  • Fixed a synchronization bug in handling taxonomies in facet module.

  • Various minor bugs were fixed: BytesRef/CharsRef copy methods with nonzero offsets and subSequence off-by-one, TieredMergePolicy returned wrong-scaled floor segment setting.

Solr 3.6.1 Release Highlights:

  • The concurrency of MMapDirectory was improved, fixing a performance regression relative to Solr 3.5.0. The regression affected users on 64-bit platforms (Linux, Solaris, Windows) and those explicitly using MMapDirectoryFactory.

  • ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are triggered on commit.

  • Charset problems were fixed with HttpSolrServer, caused by an upgrade to a new Commons HttpClient version in 3.6.0.

  • Grouping was fixed to return correct count when not all shards are queried in the second pass. Solr no longer throws Exception when using result grouping with main=true and using wt=javabin.

  • Config file replication was made less error prone.

  • Data Import Handler threading fixes.

  • Various minor bugs were fixed.

3 July 2012 - Lucene Core 4.0-ALPHA and Solr 4.0-ALPHA Available

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.0-ALPHA and Apache Solr 4.0-ALPHA.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Highlights of the Lucene release include:

  • The index formats for terms, postings lists, stored fields, term vectors, etc are pluggable via the Codec api. You can select from the provided implementations or customize the index format with your own Codec to meet your needs.

  • Similarity has been decoupled from the vector space model (TF/IDF). Additional models such as BM25, Divergence from Randomness, Language Models, and Information-based models are provided (see http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).

  • Added support for per-document values (DocValues). DocValues can be used for custom scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.

  • When indexing via multiple threads, each IndexWriter thread now flushes its own segment to disk concurrently, resulting in substantial performance improvements (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).

  • Per-document normalization factors ("norms") are no longer limited to a single byte. Similarity implementations can use any DocValues type to store norms.

  • Added index statistics such as the number of tokens for a term or field, number of postings for a field, and number of documents with a posting for a field: these support additional scoring models (see http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).

  • Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes instead of every n'th term. This is not only more time- and space-efficient, but can also sometimes avoid going to disk at all for terms that do not exist. Alternative term dictionary implementations are provided and pluggable via the Codec API.

  • Indexed terms are no longer UTF-16 char sequences, instead terms can be any binary value encoded as byte arrays. By default, text terms are now encoded as UTF-8 bytes. Sort order of terms is now defined by their binary value, which is identical to UTF-8 sort order.

  • Substantially faster performance when using a Filter during searching.

  • File-system based directories can rate-limit the IO (MB/sec) of merge threads, to reduce IO contention between merging and searching threads.

  • Added a number of alternative Codecs and components for different use-cases: "Appending" works with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire terms+postings as an FST read into RAM (see http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html), "Pulsing" inlines the postings for low-frequency terms into the term dictionary (see http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html), "SimpleText" writes all files in plain-text for easy debugging/transparency (see http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html), among others.

  • Term offsets can be optionally encoded into the postings lists and can be retrieved per-position.

  • A new AutomatonQuery returns all documents containing any term matching a provided finite-state automaton (see http://www.slideshare.net/otisg/finite-state-queries-in-lucene).

  • FuzzyQuery is 100-200 times faster than in past releases (see http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).

  • A new spell checker, DirectSpellChecker, finds possible corrections directly against the main search index without requiring a separate index.

  • Various in-memory data structures such as the term dictionary and FieldCache are represented more efficiently with less object overhead (see http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).

  • All search logic is now required to work per segment, IndexReader was therefore refactored to differentiate between atomic and composite readers (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).

  • Lucene 4.0 provides a modular API, consolidating components such as Analyzers and Queries that were previously scattered across Lucene core, contrib, and Solr. These modules also include additional functionality such as UIMA analyzer integration and a completely reworked spatial search implementation.
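
The change in term sort order noted above (terms are now ordered by their binary value, identical to UTF-8 order) can be seen in a small Python comparison. The sketch assumes that the earlier order compared UTF-16 code units, and the three sample terms are chosen only because a supplementary character makes the two orders diverge.

```python
# One ASCII term, one BMP term near the top of the range, one supplementary term.
terms = ["\uff5e", "\U0001f600", "a"]

# Assumed old order: compare UTF-16 code units (big-endian bytes preserve that order).
utf16_order = sorted(terms, key=lambda t: t.encode("utf-16-be"))

# New order: compare the UTF-8 bytes of each term.
utf8_order = sorted(terms, key=lambda t: t.encode("utf-8"))

# Python compares str values by code point, which matches the UTF-8 byte order.
code_point_order = sorted(terms)
```

In UTF-16 the supplementary character is stored as surrogate code units that compare lower than U+FF5E, so it sorts earlier there, while the UTF-8 byte order follows Unicode code points.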

Highlights of the Solr release include:

The largest set of features goes by the development code-name “Solr Cloud” and involves bringing easy scalability to Solr. See http://wiki.apache.org/solr/SolrCloud for more details.

  • Distributed indexing designed from the ground up for near real-time (NRT) and NoSQL features such as realtime-get, optimistic locking, and durable updates.

  • High availability with no single points of failure.

  • Apache Zookeeper integration for distributed coordination and cluster metadata and configuration storage.

  • Immunity to split-brain issues due to Zookeeper's Paxos distributed consensus protocols.

  • Updates sent to any node in the cluster are automatically forwarded to the correct shard and replicated to multiple nodes for redundancy.

  • Queries sent to any node automatically perform a full distributed search across the cluster with load balancing and fail-over.

Solr 4.0-alpha includes more NoSQL features for those using Solr as a primary data store:

  • Update durability – A transaction log ensures that even uncommitted documents are never lost.

  • Real-time Get – The ability to quickly retrieve the latest version of a document, without the need to commit or open a new searcher.

  • Versioning and Optimistic Locking – combined with real-time get, this allows read-update-write functionality that ensures no conflicting changes were made concurrently by other clients.

  • Atomic updates - the ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again.
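
The read-update-write pattern enabled by versioning and optimistic locking, as described above, can be sketched with a toy in-memory store in Python. This is not the Solr API: the DocumentStore class, the integer version counter, and the VersionConflict exception are hypothetical, and the announcement does not say how Solr exposes versions or reports conflicts.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version."""


class DocumentStore:
    """Toy in-memory store; each document carries a version counter."""

    def __init__(self):
        self._docs = {}   # doc_id -> (version, fields)

    def get(self, doc_id):
        version, fields = self._docs[doc_id]
        return version, dict(fields)

    def put(self, doc_id, fields, expected_version=None):
        current_version = self._docs.get(doc_id, (0, {}))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected {expected_version}, found {current_version}")
        self._docs[doc_id] = (current_version + 1, dict(fields))
        return current_version + 1


store = DocumentStore()
store.put("book-1", {"stock": 10})

# Two clients read the same version of the document.
version_a, doc_a = store.get("book-1")
version_b, doc_b = store.get("book-1")

# Client A writes first and succeeds.
doc_a["stock"] -= 1
store.put("book-1", doc_a, expected_version=version_a)

# Client B's write is rejected because the version has moved on.
doc_b["stock"] -= 1
try:
    store.put("book-1", doc_b, expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True
```

On a conflict, client B would re-read the document and retry, so neither client silently overwrites the other's change.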

There are many other features coming in Solr 4, such as

  • Pivot Faceting – Multi-level or hierarchical faceting where the top constraints for one field are found for each top constraint of a different field.

  • Pseudo-fields – The ability to alias fields, or to add metadata along with returned documents, such as function query values and results of spatial distance calculations.

  • A spell checker implementation that can work directly from the main index instead of creating a sidecar index.

  • Pseudo-Join functionality – The ability to select a set of documents based on their relationship to a second set of documents.

  • Function query enhancements including conditional function queries and relevancy functions.

  • New update processors to facilitate modifying documents prior to indexing.

  • A brand new web admin interface, including support for SolrCloud.

The Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field. Apache Lucene, Apache Solr, Apache PyLucene, Apache Open Relevance Project and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.