News
-- October 2008
- Flavio starts his PhD thesis, co-supervised by Universidade do Minho and TELECOM & Management SudParis (France).
-- February 2008
- Helder Pereira joined the project, working on the SIME application (XML markup of SIME resources).
-- February 2008
- A group of four INT students started a project on "query by example from XML documents" using a Web interface and CGI technology.
-- November 2007
- Marco Freire joined the project, taking on the master's project subject "interactive elements selection interface".
-- August 2007
- Flavio Xavier Ferreira joined the project, participating in the ICEIS 08 paper. The SIME project is now our case study.
-- May 2007
- The Webist07 paper was presented by Alda Lopes Gancarski in Barcelona.
- The IADIS AC07 paper was presented by Pedro Henriques on 18/2/07 in Salamanca.
-- PedroRangelHenriques - 02 Jan 2007
- A project on visualizing information retrieval results was carried out with a student from Université Paris 6.
-- PedroRangelHenriques - June 2006
- Page launched (basic information recorded and previous work listed).
-- PedroRangelHenriques - 06 Mar 2006
Interactive information access in XML documents associated with ontologies
Description
Traditional IR consists of retrieving from a collection the documents
relevant to a query, while returning as few non-relevant documents as
possible. Moreover, the resulting documents
should be ranked by their relevance to the query.
A query is a natural language expression describing the desired
subject. To take advantage of the structural
information in XML documents, query formats for structured document
retrieval were enriched to access specific
parts of documents. The user can thus access those parts based on
content and structural restrictions. Examples of such
queries are those defined by the XPath language and by XQuery,
proposed by the W3C as the standard XML
query language. To include the similarity search operations of traditional
IR in XPath, some works developed relevance
computation methods, like the ones presented in the INEX workshop.
XQuery and XPath are being extended with the possibility of
associating a score (or relevance measure) with an expression that
checks whether some phrase exists in the content of
some element or attribute. This functionality is included in a
language that complements XPath and XQuery, the
Full-Text language proposed by the W3C.
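As an illustration of combining a structural restriction with a scored content restriction, here is a minimal Python sketch. It is not XQuery Full-Text and not the project's own implementation: the sample collection, the `score` function (a toy term-frequency measure) and the `ranked_search` helper are all assumptions made for this example, and only the limited XPath subset of the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection, invented for this illustration.
SAMPLE = """
<articles>
  <article>
    <section>XML retrieval with attribute grammars and XML queries.</section>
  </article>
  <article>
    <section>Presenting results of XML search to users.</section>
  </article>
  <article>
    <section>Users select interesting elements from intermediate results.</section>
  </article>
</articles>
"""

def score(element, term):
    """Toy relevance score: fraction of the element's words equal to the term."""
    words = "".join(element.itertext()).lower().split()
    if not words:
        return 0.0
    matches = sum(1 for w in words if w.strip(".,;:") == term.lower())
    return matches / len(words)

def ranked_search(root, path, term):
    """Apply a structural restriction (path), then rank hits by content score."""
    hits = [(score(e, term), e) for e in root.findall(path)]
    hits = [(s, e) for s, e in hits if s > 0]  # drop non-relevant elements
    return sorted(hits, key=lambda pair: pair[0], reverse=True)

root = ET.fromstring(SAMPLE)
results = ranked_search(root, ".//section", "xml")
for s, e in results:
    print(f"{s:.2f}  {e.text}")
```

Only `section` elements are considered (the structural restriction), and among them only those mentioning the term are returned, ordered by the toy score.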
However, constructing structured queries is not always an easy process
because, among other reasons, the user may not
have a deep knowledge of the query language, or may not know a priori
exactly what to search for. Moreover, after
specifying a query, the user may get a final result that is not
what was expected. To solve this problem,
IXDIRQL was defined as an extension to XPath, not only with
textual similarity operations, but also with an
interactive/iterative paradigm for building queries. With this
paradigm, each operation specified by the user leads to
an intermediate result which the user can access. This helps the user
choose the next operation, change an
operation already introduced in the query, or select, using
selection operations, the interesting subsets of
intermediate results, until reaching the adequate query and thus the
desired result. If intermediate results are large, the user is able to
select a number of interesting elements sufficient to satisfy the
information need. This avoids continuing the query with a large number of
unnecessary elements to process, and further results are easier to analyse.
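The interactive/iterative paradigm described above can be sketched in a few lines of Python. This is only an illustration of the idea, not IXDIRQL itself: the `InteractiveQuery` class and its `apply` and `select` methods are hypothetical names, and the data are titles taken from this page's publication list.

```python
class InteractiveQuery:
    """Keeps every intermediate result so the user can inspect it between steps."""

    def __init__(self, items):
        self.results = [list(items)]  # results[0] is the initial collection

    def apply(self, operation):
        """Run one operation on the latest intermediate result and store the output."""
        self.results.append(list(operation(self.results[-1])))
        return self.results[-1]

    def select(self, positions):
        """Selection operation: keep only the elements the user picked by position."""
        current = self.results[-1]
        self.results.append([current[i] for i in positions])
        return self.results[-1]

titles = [
    "IXDIRQL: an Interactive XML Data and Information Retrieval Query Language",
    "Presenting the Results of Relevance-Oriented Search over XML Documents",
    "Interactive Information Retrieval from Structured Documents Represented by Attribute Grammars",
    "A processing environment for the IXDIRQL XML query language",
]

session = InteractiveQuery(titles)
step1 = session.apply(lambda items: [t for t in items if "XML" in t])
# The user inspects step1 and keeps only the first and third elements.
step2 = session.select([0, 2])
# The query continues on the reduced set only.
step3 = session.apply(sorted)
print(step3)
```

Because the selection happens before the last operation, that operation processes two elements instead of three, which is the saving the paragraph above refers to.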
A prototype to process IXDIRQL
queries was created and used by real users. This made it possible to verify not
only its correct behavior, but also that users correctly
understood and used the selection operations with respect to some
pre-defined information needs.
Our project aims at extending the interactive/iterative paradigm of query construction to XQuery.
For that, XQuery and Full-Text are augmented with selection operations. Once the query language is defined, we intend to
build an adequate processing system.
The editing environment for the extended XQuery (XQuery++) must allow the user
to access intermediate results of query
operations. It should also be coupled with incremental
processing of query operations. This means
that, each time a new operation is inserted or an existing one is
changed, the system does not recalculate all the
query operations. Instead, it first calculates the intermediate
result of the new or changed operation; then,
it recalculates the intermediate results that depend
on that result and, finally, the final result of the query.
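A minimal sketch of such incremental processing, assuming for simplicity that the query is a linear chain of operations where each one depends only on the result of the previous one. The `IncrementalPipeline` class and its method names are hypothetical and are not part of XQuery++; the `evaluations` counter exists only to show which work is skipped.

```python
class IncrementalPipeline:
    """Caches one intermediate result per operation and recomputes only what changed."""

    def __init__(self, source):
        self.source = list(source)
        self.operations = []
        self.cache = []        # cache[i] is the result after operations[i]
        self.evaluations = 0   # number of operation runs so far

    def _recompute_from(self, index):
        """Discard cached results from `index` onward and rebuild them in order."""
        del self.cache[index:]
        for op in self.operations[index:]:
            previous = self.cache[-1] if self.cache else self.source
            self.cache.append(op(previous))
            self.evaluations += 1

    def add(self, op):
        """Append a new operation; only this operation is evaluated."""
        self.operations.append(op)
        self._recompute_from(len(self.operations) - 1)

    def change(self, index, op):
        """Replace an operation; it and the operations after it are re-evaluated."""
        self.operations[index] = op
        self._recompute_from(index)

    def result(self):
        return self.cache[-1] if self.cache else self.source

p = IncrementalPipeline(range(10))
p.add(lambda xs: [x for x in xs if x % 2 == 0])   # [0, 2, 4, 6, 8]
p.add(lambda xs: [x * x for x in xs])             # [0, 4, 16, 36, 64]
p.add(lambda xs: [x for x in xs if x > 10])       # [16, 36, 64]
p.change(1, lambda xs: [x * 10 for x in xs])      # first operation is not re-run
print(p.result(), p.evaluations)
```

Changing the second operation re-runs only the second and third operations, so five evaluations are performed in total instead of the six a full recomputation would need.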
Once the XQuery++ processing system is built, we want to enrich it in order to take advantage of ontological information associated with documents.
In fact, more and more documents are associated with ontologies acting as metadata that describe them.
Thus, the answer to an information need should be based not only on the documents, but also on the associated ontologies.
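To make the idea concrete, the following Python sketch answers a request using both XML content and ontology-like metadata. Everything in it is an assumption for illustration: the museum records, the triples, the `about` and `subClassOf` predicates, and the helper names are invented, and plain tuples stand in for a real RDF store and query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents, invented for this illustration.
DOCS = ET.fromstring("""
<museum>
  <record id="r1"><text>Passport issued in 1912.</text></record>
  <record id="r2"><text>Letter sent to the family.</text></record>
  <record id="r3"><text>Ship ticket.</text></record>
</museum>
""")

# Hypothetical metadata as (subject, predicate, object) triples.
TRIPLES = [
    ("r1", "about", "Passport"),
    ("r2", "about", "Letter"),
    ("r3", "about", "Ticket"),
    ("Passport", "subClassOf", "TravelDocument"),
    ("Ticket", "subClassOf", "TravelDocument"),
]

def subclasses(concept, triples):
    """Return the concept plus every concept declared, transitively, as its subclass."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == "subClassOf" and o in found and s not in found:
                found.add(s)
                changed = True
    return found

def records_about(concept, docs, triples):
    """Find XML records whose metadata links them to the concept or a subclass of it."""
    wanted = subclasses(concept, triples)
    ids = {s for s, p, o in triples if p == "about" and o in wanted}
    return [r for r in docs.findall("record") if r.get("id") in ids]

matches = records_about("TravelDocument", DOCS, TRIPLES)
print([r.get("id") for r in matches])
```

None of the record texts contains the words "travel document", so a search over the XML content alone would miss them; the metadata is what connects the request to the records.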
Partners
Research Team
Publications
- Information access from XML using semantics and context: application to the Portuguese Emigration Museum, Flavio Xavier Ferreira, Alda Lopes Gançarski, Pedro Rangel Henriques, National Portuguese Conference on Information Systems, Conferencia Nacional de Sistemas de Informacao 2008 (CAPSI08), Setubal, Portugal, October 2008.
- Iterative XML search based on data and associated semantics, Alda Lopes Gançarski, Flavio Xavier Ferreira, Pedro Rangel Henriques, International Conference on Enterprise Information Systems (ICEIS'08), Poster Session, Barcelona, Spain, June 2008.
- Analyzing the structure of scientific articles to improve information retrieval (IADIS AC07)
- Using data together with metadata to improve XML information access (WEBIST 2007)
- A formal definition of selection operations that extend XQuery with interactive query construction (WEBIST 2006)
- AG-based interactive system to retrieve information from XML documents (IEE Software Journal April 2006)
- A processing environment for the IXDIRQL XML query language (IADIS MCCSIS 2005)
- Extending XQuery with selection operations to allow for interactive construction of queries (Elpub 2005)
- Presenting the Results of Relevance-Oriented Search over XML Documents (ACM DocEng 2004)
- Interactive Information Retrieval from Structured Documents Represented by Attribute Grammars (ACM DocEng 2003)
- IXDIRQL: an Interactive XML Data and Information Retrieval Query Language (Elpub03)
Ongoing Work
Case study
Our case study is the SIME project ("Sistema de Informação do Museu da Emigração"). More information about the SIME project will soon be available.
Student project proposals (originally in Portuguese)
- 1. Project identification (Pepl01) Trajectory illustrator in Web 2.0
2. Context: Information system for the Emigration Museum (SIME)
3. Project description
A great deal of information on any given topic is currently available on the web,
but it is scattered across many sites.
Sometimes a web application needs to
draw on resources hosted in different places. This is the case for the
Emigration Museum, which needs to illustrate the journey of an
individual who leaves home to work in a distant
country.
Concretely, given the places of departure and arrival and the
intermediate places the emigrant passed through, the goal is to create a web
application (a page) that, from this list of places, recreates the
trajectory followed by the emigrant, showing for each place
information (historical, cultural) collected from Wikipedia and
images collected from an existing online image repository
(e.g. Flickr).
This application must use Web 2.0 technologies to build the pages
(e.g. route map, image gallery) and should try to use existing services
to collect the information (e.g. the Flickr API).
4. Proposer / Supervisor
Pedro Rangel Henriques + Flávio Ferreira (+ Alda Lopes)
5. Work areas
Information Retrieval + Web 2.0 + AJAX
- 1. Project identification (Pepl02) Development of a system (processor and Web interface) for XQuery+SPARQL querying over XML and metadata.
2. Context: Information retrieval system for the Emigration Museum (ME) within the SIME project.
3. Project description
The ME is a virtual museum whose holdings consist of a vast collection
of XML documents of different types.
Visits to this collection are supported by ontologies that give it
a cohesive meaning (semantics).
To enrich the services provided by the museum, the goal is now to offer
a semantic search function that gives access to the information in these
document holdings according to the user's interests.
To that end, the idea is to combine the search facilities for XML documents with those of semantic search.
XQuery is a query language for retrieving information
from XML files (just as SQL does for a database).
RDF is a language used to associate semantic descriptions with information
sources (for example, XML documents).
Like XQuery, SPARQL is a query language, in this case over RDF files.
The aim of this project is to develop a processor for the combined
XQuery+SPARQL language that gives access to information using both the sources (XML)
and the metadata about them (RDF).
A Web interface will also be developed, in which the user
specifies queries in XQuery+SPARQL and can view the results.
The languages and technologies to be used are left open and will be decided
at the project kick-off meeting.
4. Proposer / Supervisor
Pedro Rangel Henriques + Alda Lopes + Flávio Ferreira
5. Work areas
XML / XQuery + RDF / SPARQL + Web-Engineering
Visits
- March 2008 at UM. Planning: 1. Flavio's thesis proposal. 2. ICEIS08 article camera-ready version & poster. 3. CAPSI08 article. 4. PAI/Pessoa project proposal. (Travel Report, in French)
- Dec 2007 at UM. Subject: 1. Paper ICEIS08. 2. Master project of Marco Freire. 3. Flavio's thesis subject proposal (start).
- April 2007 at UM. Subject: Analysis of ongoing work: articles, students, ...
- August 2006 at UM. Subject: 1. FCT project: we did not submit it, but a detailed description has been written and will be used later. 2. Definition of the next articles to be written about ongoing work.
- April 2006 at UM. Subject: defining a proposal for an FCT-funded project.
- 12 Feb - 11 March 2006, LIP6, Paris. (Travel Report)