...collaborate on
News

-- October 2008

  • Flavio starts his PhD thesis, co-supervised by Universidade do Minho and TELECOM & Management SudParis (France).

-- February 2008

  • Helder Pereira joined the project, working on the SIME application (XML markup of SIME resources).

-- February 2008

  • A group of 4 INT students started a project on "query by example from XML documents" using a Web interface and CGI technology.

-- November 2007

  • Marco Freire joined the project, taking on the master's project subject "interactive elements selection interface".

-- August 2007

  • Flavio Xavier Ferreira joined the project, contributing to the ICEIS 08 paper. The SIME project is now our case study.

-- May 2007

  • The Webist07 paper was presented by Alda Lopes Gancarski in Barcelona.
  • The IADIS AC07 paper was presented by Pedro Henriques on 18/2/07 in Salamanca.

-- PedroRangelHenriques - 02 Jan 2007

  • A piece of work on visualizing information retrieval results was carried out with a student from the University of Paris 6.

-- PedroRangelHenriques - June 2006

  • Page launched (recording of the basic information and listing of previous work)

-- PedroRangelHenriques - 06 Mar 2006


Interactive information access in XML documents associated with ontologies

Description

Traditional IR consists of retrieving from a collection the documents relevant to a query while returning as few non-relevant documents as possible. Moreover, the resulting documents should be ranked by their relevance to the query. A query is a natural language expression describing the desired subject. To take advantage of the structural information in XML documents, query formats for structured document retrieval were enriched to access specific parts of documents, so the user can access those parts based on content and structural restrictions. Examples of such queries are those defined by the XPath language and by XQuery, which the W3C proposes as the standard XML query language. To include the similarity search operations of traditional IR in XPath, some works developed relevance computation methods, like the ones presented at the INEX workshop. XQuery and XPath are being extended with the possibility of associating a score (or relevance measure) with an expression that checks whether some phrase occurs in the content of some element or attribute. This functionality is included in a language that complements XPath and XQuery, the Full-Text language proposed by the W3C.

However, constructing structured queries is not always easy because, among other reasons, the user may not have a deep knowledge of the query language, or may not know a priori exactly what to search for. Moreover, after specifying a query, the user may get a final result that is not what was expected. To solve this problem, IXDIRQL was defined as an extension to XPath, not only with textual similarity operations, but also with an interactive/iterative paradigm for building queries. With this paradigm, each operation specified by the user leads to an intermediate result which the user can access. This helps the user choose the next operation, change an operation already introduced in the query, or select, using selection operations, the interesting subsets of intermediate results, until reaching the adequate query and thus the desired result. If intermediate results are large, the user is able to select just enough interesting elements to satisfy the information need. This avoids continuing the query with a large number of unnecessary elements to process, and further results are easier to analyse. A prototype to process IXDIRQL queries was created and used by real users, which made it possible to verify not only its correct behavior, but also the correct understanding and use of selection operations with respect to some pre-defined information needs.
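The interactive/iterative paradigm described above can be illustrated with a small sketch. It is not IXDIRQL syntax and not the prototype's implementation: the class `InteractiveQuery`, its methods `path`, `about` and `select`, and the sample document are all hypothetical, and the textual similarity operation is replaced by a plain substring test. The point is only that every operation leaves an intermediate result the user can inspect, and that a selection operation narrows that result before the query continues.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection, invented for this illustration.
DOC = """
<library>
  <book><title>XML retrieval basics</title><year>2004</year></book>
  <book><title>Cooking with herbs</title><year>2001</year></book>
  <book><title>Structured retrieval with XPath</title><year>2006</year></book>
</library>
"""


class InteractiveQuery:
    """Keeps every intermediate result so the user can inspect it."""

    def __init__(self, root):
        self.steps = [("start", [root])]

    @property
    def current(self):
        # Result of the most recent operation.
        return self.steps[-1][1]

    def path(self, expr):
        # Structural step: apply an ElementTree path to each current element.
        result = [hit for elem in self.current for hit in elem.findall(expr)]
        self.steps.append((f"path {expr}", result))
        return self

    def about(self, phrase):
        # Crude stand-in for textual similarity: substring match on the text.
        result = [
            elem for elem in self.current
            if phrase.lower() in "".join(elem.itertext()).lower()
        ]
        self.steps.append((f"about {phrase!r}", result))
        return self

    def select(self, *positions):
        # Selection operation: keep only the elements the user picked.
        result = [self.current[i] for i in positions]
        self.steps.append((f"select {positions}", result))
        return self


query = InteractiveQuery(ET.fromstring(DOC))
query.path("book").about("retrieval").select(0).path("title")

# The user can review the size of every intermediate result.
for label, result in query.steps:
    print(label, "->", len(result), "element(s)")

titles = [elem.text for elem in query.current]
```

Here the user inspects the two books matching "retrieval", keeps only the first one with `select(0)`, and continues the query on that reduced set.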

Our project aims at extending the interactive/iterative paradigm of query construction to XQuery. For that, XQuery and Full-Text are augmented with selection operations. Once the query language is defined, we intend to build an adequate processing system. The editing environment for the extended XQuery (XQuery++) must allow the user to access intermediate results of query operations. In addition, it should be associated with an incremental processing of query operations. This means that, each time a new operation is inserted or an existing one is changed, the system does not recalculate all the query operations. Instead, it first calculates the intermediate results of the new or changed operation; then it recalculates the intermediate results that depend on them, and finally the result of the whole query. Once the XQuery++ processing system is built, we want to enrich it to take advantage of ontological information associated with documents. In fact, more and more documents are associated with ontologies acting as metadata that describe them. Thus, the answer to an information need should be based not only on the documents, but also on the associated ontologies.
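As a rough sketch of the incremental processing idea, the following code caches the intermediate result of each operation in a linear chain and, when an operation is added or changed, recomputes only that operation and the ones after it. The class `IncrementalPipeline` and its methods are hypothetical names chosen for this illustration, and plain Python functions over lists stand in for XQuery++ operations; a real processor would track dependencies between query operations rather than assume a simple chain.

```python
class IncrementalPipeline:
    """Chain of operations with cached intermediate results (illustrative)."""

    def __init__(self, source):
        self.source = source
        self.ops = []         # list of (name, function) pairs
        self.results = []     # results[i] is the cached output of ops[i]
        self.evaluations = 0  # counts how many operations were actually run

    def _recompute_from(self, index):
        # Discard stale results; everything before `index` is reused as is.
        del self.results[index:]
        data = self.results[index - 1] if index > 0 else self.source
        for _, func in self.ops[index:]:
            data = func(data)
            self.evaluations += 1
            self.results.append(data)

    def append(self, name, func):
        # A new operation only needs its own result computed.
        self.ops.append((name, func))
        self._recompute_from(len(self.ops) - 1)

    def replace(self, index, name, func):
        # A changed operation invalidates itself and its dependents only.
        self.ops[index] = (name, func)
        self._recompute_from(index)

    @property
    def final(self):
        return self.results[-1] if self.results else self.source


pipeline = IncrementalPipeline(list(range(1, 11)))
pipeline.append("even", lambda xs: [x for x in xs if x % 2 == 0])
pipeline.append("square", lambda xs: [x * x for x in xs])
pipeline.append("small", lambda xs: [x for x in xs if x < 50])
first_result = pipeline.final

# Changing the second operation reuses the cached result of "even".
pipeline.replace(1, "double", lambda xs: [2 * x for x in xs])
second_result = pipeline.final
```

In this run five operation evaluations are performed in total, whereas re-running the whole chain after every edit would need nine.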

Partners

Research Team

Publications

Ongoing Work

Case study

Our case study is the SIME project ("Sistema de Informação do Museu da Emigração"). More information about the SIME project will soon be available.

Student project proposals (originally in Portuguese)

  • 1. Project identification (Pepl01): Trajectory illustrator in Web 2.0
    2. Context: Information system for the Museu da Emigração (SIME)
    3. Project description: There is currently a great deal of information on the web about particular topics, but it is scattered across several places. Sometimes there is a need to build a web application that uses resources available at different locations. This is the case for the Museu da Emigração, where there is a need to illustrate the route of an individual who leaves his homeland to work in a distant country.

    Concretely, given the place of departure, the place of arrival and the intermediate places passed through, the goal is to create a web application (a page) that, from this list of places, recreates the trajectory followed by the emigrant, showing for each place information (historical, cultural) gathered from Wikipedia and images gathered from an existing online image repository (e.g. Flickr). The application must use Web 2.0 technologies to build the pages (e.g. route map, image gallery) and should try to use existing services to collect the information (e.g. the Flickr API).

    4. Proposer / Supervisor: Pedro Rangel Henriques + Flávio Ferreira (+ Alda Lopes)
    5. Work areas: Information Retrieval + Web 2.0 + AJAX

  • 1. Project identification (Pepl02): Development of a system (processor and Web interface) for XQuery+SPARQL querying over XML and metadata.
    2. Context: Information retrieval system for the Museu da Emigração (ME), within the SIME project.
    3. Project description: The ME is a virtual museum whose holdings consist of a vast collection of XML documents of different types. Visits to this collection are supported by ontologies that give them a cohesive meaning (semantics). To enrich the services provided by the museum, the aim now is to offer a semantic search function that gives access to the information in these document holdings according to the user's interests. To that end, the idea is to combine the search facilities over XML documents with those of semantic search. XQuery is a query language that retrieves information from XML files (in the same way that SQL does from a database). RDF is a language used to associate semantic descriptions with information sources (for example XML documents). Like XQuery for XML, SPARQL is a query language over RDF files.

    The aim of this project is to develop a processor for the composite language XQuery+SPARQL that gives access to information using the sources (XML) and the metadata about them (RDF). A Web interface is also to be developed, where the user specifies queries in XQuery+SPARQL and can view the results. The languages and technologies to be used are left open and will be defined at the project kick-off meeting.

    4. Proposer / Supervisor: Pedro Rangel Henriques + Alda Lopes + Flávio Ferreira
    5. Work areas: XML / XQuery + RDF / SPARQL + Web-Engineering

Visits

  • March 2008 at UM. Planning: 1. Flavio's thesis proposal. 2. ICEIS08 article camera-ready version & poster. 3. CAPSI08 article. 4. PAI/Pessoa project proposal. (Travel Report, in French)
  • Dec 2007 at UM. Subject: 1. Paper ICEIS08. 2. Master project of Marco Freire. 3. Flavio's thesis subject proposal (start).
  • April 2007 at UM. Subject: Analysis of ongoing work: articles, students, ...
  • August 2006 at UM. Subject: 1. FCT project: we did not submit it, but a detailed description has been written and will be used later. 2. Definition of the next articles to be written about ongoing work.
  • April 2006 at UM. Subject: to define a proposal for an FCT-funded project.
  • 12 Feb - 11 March 2006, LIP6, Paris. (Travel Report)

Bibliography (bibtex style)

r70 - 03 Oct 2008 - 13:08:07 - PedroRangelHenriques