
Language Specification and Processing Group

Thoughts of the Day

Members

Group Description

Despite all the technological advances related to multimedia, most relevant digital data in a computer system still circulates in textual format: computer programs, specifications, documents and other information resources have their content in this format. To be read and interpreted by a machine, a text must be a valid sentence of a known language, so that a meaning can be associated with it. The path towards this meaning leads through syntax and then semantics.

The "Language Processing" area is concerned with approaches, methods and tools for the "specification of languages" and the "systematic development of their processors".

The syntax and semantics of a language are formally defined by a grammar; in this area we base our work on regular grammars (or regular expressions), context-free grammars, translation or attribute grammars, affix grammars, stochastic-dependency or transition-network grammars, graph grammars, annotation schemas, etc.

Language processors are built systematically and automatically using tools, called generators, that read a grammar and produce the analysers/translators.
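As an illustration of the grammar-to-processor idea, here is a minimal sketch of the kind of analyser a generator could derive from a small grammar. It is written by hand here, not produced by any of the group's tools. The grammar, the names `tokenize`, `Parser` and `evaluate`, and the choice of Python are all assumptions made for this example. The lexical level is a regular expression, the syntactic level is a context-free grammar (`Expr -> Term (('+'|'-') Term)*`, `Term -> Factor (('*'|'/') Factor)*`, `Factor -> NUMBER | '(' Expr ')'`), and the value computed by each method plays the role of a synthesized attribute.

```python
import re

# Lexical level (regular grammar): an integer, or any other non-blank character.
TOKEN = re.compile(r"\s*(?:(\d+)|(\S))")

def tokenize(text):
    """Split the input into ('num', int) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif op in "+-*/()":
            tokens.append(("op", op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    return tokens

class Parser:
    """Recursive-descent analyser: one method per grammar nonterminal."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Return the current token, or an end-of-input marker.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def take(self, kind, value=None):
        # Consume the current token if it matches, otherwise report an error.
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, found {tok[1]!r}")
        self.pos += 1
        return tok[1]

    def expr(self):
        # Expr -> Term (('+'|'-') Term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take("op")
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # Term -> Factor (('*'|'/') Factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take("op")
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # Factor -> NUMBER | '(' Expr ')'
        if self.peek() == ("op", "("):
            self.take("op", "(")
            value = self.expr()
            self.take("op", ")")
            return value
        return self.take("num")

def evaluate(text):
    """Check that text is a sentence of the grammar and compute its value."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    parser.take("eof")  # reject trailing input
    return value
```

For example, `evaluate("2 * (3 + 4) - 5")` accepts the sentence and computes its meaning, while `evaluate("2 +")` raises `SyntaxError` because the text is not a valid sentence of the language. A generator automates exactly this step: the method bodies follow mechanically from the grammar rules.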

Although we keep the same approach, different kinds of texts require specialized methods and tools, so the area is split into three main research directions:

  • 1) completely structured texts, that belong to "formal languages" (FLP);
  • 2) semi-structured texts, that belong to "annotation languages" (ALP);
  • 3) unstructured texts, that belong to "natural languages" (NLP).

Inside the "LP@di/cctc.uminho.pt" group:

1) Pedro Rangel Henriques, Maria João Varanda Pereira, Daniela da Cruz, Nuno Oliveira, and Mario Berón work on the first direction (formal language processing).

At the moment, the research topics are:

  • textual and visual languages;
  • program analysis and transformation;
  • program visualization/animation;
  • program comprehension.

2) José Carlos Ramalho, Pedro Rangel Henriques, Alda Gançarski, Ricardo, Cristiana, and Renato work on the second direction (semi-structured document processing).

At the moment, the research topics are:

  • document markup and XML standards;
  • archiving and digital libraries;
  • semantic web and virtual museums;
  • web-services and location aware services.

3) José João Dias de Almeida, Alberto Simões, and Nuno Carvalho work on the third direction (natural language processing).

At the moment, the research topics are:

  • Text classification and Automatic summarization;
  • Discourse analysis and Part-of-speech tagging;
  • Machine Translation;
  • Information extraction (IE) (including named entity recognition and relationship extraction);
  • Information retrieval (IR) and text mining.

Notice that "definition and support of Domain Specific Languages" and "ontology processing" are transversal research trends on which all the subgroups are working. In a similar way, it is clear that "ETL process" (extract-transform-load) is a target, an application area, common to all the subgroups.

The Master's course on Language Engineering, UCE30-EL (Engenharia de Linguagens), offered by the "LP@di/cctc.uminho.pt" group and composed of the four modules below, directly reflects this status quo and this organization: Grammar Engineering; Software Analysis and Transformation; Document Processing; Natural Language Processing.

The events and journals that group members co-founded and help organize also corroborate the three research directions described:

  • SLaTE, International Symposium on Languages, Applications and Technology, which integrates the earlier events:
    • XATA, XML -- Aplicações e Tecnologias Associadas;
    • CoRTA, Compilers, programming languages, Related Technologies and Applications;

  • Linguamática, Revista para o Processamento Automático das Línguas Ibéricas (ISSN: 1647-0818)

Membership of international conference organizing and program committees reinforces the characterization above:

  • ICPC'2010 -- IEEE conference organization at Braga
  • ICPC'2011 -- PC member (also SC member)
  • SCAM'2011 -- PC member

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Research Projects & Tools

FCT-funded R&D projects:

  • PTDC/CLE-LLI/108948/2008 — Per-Fide, Portuguese in parallel with six other languages

  • PTDC/PSI-PCO/104679/2008 — Procura Palavras: a computerized tool for the evaluation of objective and subjective psycholinguistic indexes for the European Portuguese

  • PTDC/EIA-CCO/108995/2008 -- CROSS: An Infrastructure for Certification and Re-engineering of Open Source Software; supported by FCT; 2010-2012.

  • Quixote: Desenvolvimento de modelos do domínio do problema para inter-relacionar as vistas operacional e comportamental em sistemas de software; bilateral Argentina-Portugal joint-research project supported by FCT (Departamento das Relações Europeias, Bilaterais e Multilaterais); team leader; 2010-2011.

  • AsCoP: Assessing Comprehension of Domain Specific Programs; bilateral Slovenia-Portugal joint-research project supported by FCT (Departamento das Relações Europeias, Bilaterais e Multilaterais); team leader; 2010-2011.

  • PCVIA: Program Comprehension by Visual Inspection and Animation

  • DSLPC: Program Comprehension for Domain Specific Languages; bilateral Slovenia-Portugal cooperation project

  • Hermes: Aprendizagem e Povoamento de Ontologias a partir de Fontes Textuais; bilateral Brasil-Portugal joint-research project supported by Programa CAPES-FCT2009 (Departamento das Relações Europeias, Bilaterais e Multilaterais); 2010-2012.

  • OntXQuery: An Information Retrieval System, interactive and iterative/incremental, based on XQuery++ to search XML documents over the Semantic Web using ontologies (http://wiki.di.uminho.pt/wiki/bin/view/OntXQuery/WebHome)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Selected publications:

Journal papers:

- Miguel Ferreira, Ana Alice Baptista, José Carlos Ramalho. An intelligent decision support system for digital preservation. International Journal on Digital Libraries; Springer Berlin / Heidelberg; ISSN: 1432-5012 (Print) 1432-1300 (Online); 05; 2007; URI in RepositoriUM: http://hdl.handle.net/1822/6648

- Giovani Rubert Librelotto, Renato Preigschadt de Azevedo, José Carlos Ramalho, and Pedro Rangel Henriques. Topic maps constraint languages: understanding and comparing. Int. Journal Reasoning-based Intelligent Systems, 1(3/4):173–181, Jun 2009.

- Daniela da Cruz and Pedro Rangel Henriques. Advances in Computer Science and IT, chapter Slicing techniques to derive the User Interface Abstract Model, pages 249–276. In-Teh, December 2009

- Nuno Oliveira, Maria João Varanda Pereira, Pedro Rangel Henriques, Daniela da Cruz, and Bastian Cramer. VisualLisa: A visual environment to develop attribute grammars. ComSIS – Computer Science and Information Systems Journal, Special issue on Advances in Languages, Related Technologies and Applications, 7(2):266–289, May 2010

- Daniela da Cruz, Mario Berón, Pedro Rangel Henriques, and Maria João Varanda Pereira. Code inspection approaches for program visualization. Acta Electrotechnica et Informatica, 9(3):32–42, Jul-Sep 2009. ISSN: 1335-8243

- Maria João Varanda Pereira, Marjan Mernik, Daniela da Cruz, and Pedro Rangel Henriques. Program comprehension for domain-specific languages (invited paper). ComSIS – Computer Science and Information Systems Journal, Special Issue on Compilers, Related Technologies and Applications, 5(2):1–17, Dec 2008. ISSN: 1820-0214

- Daniela da Cruz, Pedro Rangel Henriques, and Maria João Varanda. Alma versus ddd. ComSIS – Computer Science and Information Systems Journal, Special Issue on Compilers, Related Technologies and Applications, 5(2):119–136, Dec 2008. ISSN: 1820-0214

- Daniela da Cruz, Pedro Rangel Henriques, and Maria João Varanda. Constructing program animations using a pattern-based approach. ComSIS – Computer Science and Information Systems Journal, Special Issue on Advances in Programming Languages, 4(2):97–114, Dec 2007. ISSN: 1820-0214

International Conference papers:

- José João Almeida, Alberto Simões. Automatic parallel corpora and bilingual terminology extraction from parallel websites. In Reinhard Rapp, Pierre Zweigenbaum, and Serge Sharoff, editors, 3rd Workshop on Building and Using Comparable Corpora, pages 50–55. Valletta, Malta, May 2010.

- Alberto Simões, José João Almeida. Bilingual terminology extraction based on translation patterns. Procesamiento del Lenguaje Natural, 41:281–288. Sept 2008.

- Christoph Becker, Michael Kraxner, Andreas Rauber, Miguel Ferreira, Ana Alice Baptista, José Carlos Ramalho. Distributed Preservation Services: Integrating Planning and Actions. ECDL2008 - European Conference on Research and Advanced Technology for Digital Libraries; Aarhus, Denmark; URI in RepositoriUM: http://hdl.handle.net/1822/8239. 2008

- José Carlos Ramalho, Miguel Ferreira, Luís Faria. RODA and CRiB: a service-oriented digital repository. International Conference on Preservation of Digital Objects, iPRES 2008, 5. London, 2008

- José Carlos Ramalho, Miguel Ferreira, Luís Faria, Rui Castro. Relational database preservation through XML modelling. Extreme Markup Languages 2007, Montréal - Canada; URI in RepositoriUM: http://hdl.handle.net/1822/7120; August 2007.

- José Bernardo Barros, Daniela da Cruz, Pedro Rangel Henriques, and Jorge Sousa Pinto. Assertion-based slicing and slice graphs. In SEFM’10—8th IEEE International Conference on Software Engineering and Formal Methods, pages 93–102, Pisa, Italy, Sept 2010. IEEE Computer Society, Conference Publishing Services (CPS)

- Daniela da Cruz, Pedro Rangel Henriques, and Jorge Sousa Pinto. Contract-based slicing. In Tiziana Margaria and Bernhard Steffen, editors, Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2010, FLDVES track), LNCS (Lecture Notes in Computer Science), volume 6415, pages 106–120, Crete, Greece, Oct 2010. Springer

- Daniela da Cruz, Pedro Rangel Henriques, and Jorge Sousa Pinto. GamaSlicer: an Online Laboratory for Program Verification and Analysis. In LDTA 2010, Language Descriptions, Tools and Applications, Paphos, Cyprus, March 2010

- Daniela da Cruz and Pedro Rangel Henriques. Exploring, visualizing and slicing the soul of XML documents. In Proceedings of 25th Symposium On Applied Computing - Document Engineering (SAC-DE), 2010

- Nuno Oliveira, Nuno Rodrigues, Pedro Rangel Henriques, and Luís Soares Barbosa. Pattern language for architectural analysis. In SBLP 2010 14th Brazilian Symposium in Programming Languages, volume 2, pages 167–180, Salvador, Brazil, Sep 2010. SBC: Brazilian Computer Society (ISSN: 2175-5922)

- Nuno Oliveira, Maria João Varanda Pereira, Daniela da Cruz, and Pedro Rangel Henriques. Visualization of domain-specific programs’ behavior. In Proceedings of VISSOFT’09, 5th IEEE International Workshop on Visualizing Software for Understanding and Analysis, Edmonton, Canada, pages 37–40. IEEE Computer Society, Sept 2009

- Nuno Oliveira, Pedro Rangel Henriques, Daniela da Cruz, Maria João Varanda Pereira, Marjan Mernik, Tomaz Kosar, and Matej Crepinsek. Applying program comprehension techniques to Karel robot programs. In Proceedings of the International Multiconference on Computer Science and Information Technology – 2nd Workshop on Advances in Programming Languages (WAPL’2009), pages 697–704, Mragowo, Poland, October 2009. IEEE Computer Society Press

- Tomaz Kosar, Marjan Mernik, Matej Crepinsek, Pedro Rangel Henriques, Daniela da Cruz, Maria João Varanda Pereira, and Nuno Oliveira. Influence of domain-specific notation to program understanding. In Proceedings of the International Multiconference on Computer Science and Information Technology – 2nd Workshop on Advances in Programming Languages (WAPL’2009), pages 673–680, Mragowo, Poland, October 2009. IEEE Computer Society Press

- Daniela da Cruz, Pedro Henriques, and Jorge Sousa Pinto. Code analysis: Past and present. In Luis Barbosa, Antonio Cerone, and Siraj Shaikh, guest editors, Proceedings of the Third Int. Workshop on Foundations and Techniques for Open Source Software Certification (OpenCert 2009), volume X (2009). Electronic Communications of the EASST, March 2009.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some basic and important concepts:

- Language Workbench: a term coined by Martin Fowler (Fowler, 2005). Language workbenches are tools for implementing new languages together with their IDEs. Besides easing the development of languages, they also make language-oriented programming environments practical.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Working material

LogoLISS:

Calculadora:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Conferences and journals in the LP area:

  • Target Conferences:
    • (A) European Association of Computational Linguistics (EACL)
    • (A) International Conference on Computational Linguistics (COLING)
    • (A) ACM-SIGACT Symposium on Principles of Programming Languages (POPL)
    • (A) ACM-SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
    • (A) IEEE International Conference on Software Maintenance (ICSM)
    • (A) International Conference on Software Engineering (ICSE)
    • (A) Automated Software Engineering Conference (ASE)
    • (A) IEEE International Requirements Engineering Conference (RE)
    • (A) IEEE Symposium on Visual Languages and Human-Centric Computing (was VL)
    • (A) International Colloquium on Automata Languages and Programming (ICALP)
    • (A) European Symposium on Programming
    • (A) IEEE Information Visualization Conference
    • (A) International Symposium on Automated Technology for Verification and Analysis (ATVA)
    • (A) International Symposium on Software Testing and Analysis (ISSTA)

    • (B) Conference of the European Association for Machine Translation (EAMT)
    • (B) Language Descriptions, Tools and Applications (LDTA)
    • (B) International Conference on Software Engineering and Formal Methods (SEFM)
    • (B) ACM Symposium on Applied Computing (SAC)
    • (B) Practical Aspects of Declarative Languages
    • (B) European Conference on Digital Libraries (ECDL)
    • (B) ACM Conference on Digital Libraries (JCDL)
    • (B) International Workshop on Requirements Engineering: Foundation for Software Quality (REFSQ)
    • (B) Workshop on Logic, Language, Information and Computation
    • (B) Databases and Programming Language
    • (B) International Conference on Software Language Engineering (SLE)
    • (B) ACM SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE)
    • (B) IEEE International Workshop on Visualizing Software for Understanding and Analysis (VisSoft)
    • (B) ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL)

    • (C) Geographic Information Science
    • (C) AGILE (Association of Geographic Information Laboratories for Europe) International Conference on Geographic Information Science
    • (C) Free and Open Source Software for Geomatics conference (FOSS4G)
    • (C) International Workshop on Program Comprehension (now ICPC)
    • (C) IEEE International Workshop on Source Code Analysis and Manipulation (SCAM)
    • (C) ACM SIGPLAN Workshop on Types in Language Design and Implementation (was TIC)
    • (C) International Conference on Language and Automata Theory and Applications
    • (C) International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA)
    • (C) Domain-Specific Languages for Software Engineering
    • (C) Domain Specific Aspect Languages
    • (C) Forum on Specification, Verification and Design Languages
    • (C) Applications of Natural Language to Data Bases
    • (C) Extreme Markup Languages
    • (C) International Conference on Advanced Language Processing and Web Information Technology (ALP&WIT)
    • (C) International Symposium on Static Analysis
    • (C) Workshop on Algorithmic Aspects of Advanced Programming Languages
    • (C) Knowledge Domain Visualisation (KDV)
    • (C) Knowledge Visualization and Visual Thinking
    • (C) Program Visualization Workshop
    • (C) Visual Languages and Formal Methods
    • (C) Visualization in Software Engineering (ViSE)