Chair of Foundations of Programming

Seminar on Natural Language Processing in the summer term 2011

Introduction

We will study the interplay between automated translation of human languages, in particular syntax-based statistical machine translation, and the theory of tree automata, tree transducers, and related models.

Objectives

In brief: studying literature, giving a scientific talk, and writing a scientific essay. Every student is expected to give a talk (about 45 minutes) on their topic. Students who are not taking this course as a proseminar are also expected to write a seminar essay (about 15 pages).

Prerequisites

It is advisable, though not compulsory, to be familiar with the material from one of our lectures on Machine Translation or Tree Automata. Basic knowledge of formal language theory is also recommended.

Organization

During the first meeting, topics are assigned to the students. Please attend this meeting if you want to participate. A second meeting, around one month later, is intended to encourage your literature study. Further meetings will be scheduled individually between each student and his/her supervisor as needed. At the end of the lecture period, a day or two will be reserved for all the talks.

Important note: Students who fail to adhere to deadlines (i.e., who hand in too little or too late) will be excluded from the seminar. Any deviations from the deadlines have to be negotiated with the supervisor in advance.

Topics

No. | Title | Materials | Literature | Supervisor | Student
1 | Types of Corpora and Their Availability (Proseminar) | | [1–3] | Dietze |
2 | Binarization: Linguistic and Algorithmic Perspectives (Proseminar) | slides | [4,5] | Büchse | Langner
3 | Approximative Parsing: Ideas and Evaluation (Proseminar) | slides | [21–23] | Büchse | Thamm
4 | Incremental Dependency Parsing | slides | [6] | Büchse | Leyva
5 | Dynamic Programming for Linear-time Incremental Dependency Parsing | slides | [7,8] | Büchse | Timany
6 | Nonparametric Models: The Infinite PCFG | | [9] | Büchse |
7 | Alignments | report, slides | [10–13] | Dietze | Wu
8 | Synchronous Tree-Sequence-Substitution Grammars | | [14,15] | Büchse | Kashefi Pour
9 | Parsing Problems based on Probabilistic Synchronous Tree-Insertion Grammars | slides | [16] | Stüber | Teichmann
10 | Log-linear Models: Origin, Training and Applications | report, slides | [17–19] | Büchse | Ye
11 | Variational Decoding | report, slides | [20] | Büchse | Theß

Literature

Some downloads only work from within the university network.

[1] http://en.wikipedia.org/wiki/Text_corpus
[2] http://en.wikipedia.org/wiki/Parallel_text_alignment
[3] http://en.wikipedia.org/wiki/Treebank
[4] Andreas Maletti, 2010. Why synchronous tree substitution grammars? In Proc. NAACL 2010. pdf
[5] Wei Wang, Kevin Knight, and Daniel Marcu, 2007. Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy. In Proc. EMNLP-CoNLL 2007. pdf
[6] Joakim Nivre, 2004. Incrementality in Deterministic Dependency Parsing. In Incremental Parsing: Bringing Engineering and Cognition Together. Workshop at ACL 2004. pdf
[7] Liang Huang and Kenji Sagae, 2010. Dynamic Programming for Linear-time Incremental Dependency Parsing. In Proc. ACL 2010. pdf
[8] slides for [7] from the talk at ACL 2010 pdf
[9] Percy Liang, Slav Petrov, Michael Jordan, and Dan Klein, 2007. The Infinite PCFG using Hierarchical Dirichlet Processes. In Proc. EMNLP 2007. pdf
[10] Stephan Vogel, Hermann Ney, and Christoph Tillmann, 1996. HMM-Based Word Alignment in Statistical Translation. In Proc. COLING 1996. pdf
[11] Franz Josef Och and Hermann Ney, 2003. A Systematic Comparison of Various Statistical Alignment Models. In Comp. Ling. 29(1):19–51. pdf
[12] Franz Josef Och and Hermann Ney, 2000. Improved Statistical Alignment Models. In Proc. ACL 2000. pdf
[13] Jason Riesa and Daniel Marcu, 2010. Hierarchical Search for Word Alignment. In Proc. ACL 2010. pdf
[14] David Chiang, 2010. Learning to translate with source and target syntax. In Proc. ACL 2010. pdf
[15] Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li, 2008. A Tree Sequence Alignment-based Tree-to-Tree Translation Model. In Proc. ACL-HLT 2008. pdf
[16] Rebecca Nesson, Stuart M. Shieber, and Alexander Rush, 2006. Induction of probabilistic synchronous tree-insertion grammars for machine translation. In Proc. AMTA 2006. pdf
[17] Franz Josef Och and Hermann Ney, 2002. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. In Proc. ACL 2002. pdf
[18] David Chiang, 2007. Hierarchical phrase-based translation. In Comp. Ling. 33(2):201–228. pdf
[19] Steve DeNeefe and Kevin Knight, 2009. Synchronous Tree Adjoining Machine Translation. In Proc. EMNLP 2009. pdf
[20] Zhifei Li, Jason Eisner, and Sanjeev Khudanpur, 2009. Variational Decoding for Statistical Machine Translation. In Proc. ACL 2009. pdf
[21] John Carroll, Ted Briscoe, and Antonio Sanfilippo, 1998. Parser Evaluation – a Survey and a New Proposal. In Proc. of the 1st International Conference on Language Resources and Evaluation, Granada, Spain, pp. 447–454. pdf
[22] John Carroll, Anette Frank, Dekang Lin, Detlef Prescher, and Hans Uszkoreit, 2002. Beyond Parseval – Towards Improved Evaluation Measures for Parsing Systems. In Proc. of the Workshop Beyond PARSEVAL – Towards improved evaluation measures for parsing systems at the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Gran Canaria. pdf
[23] Taavet Kikas and Margus Treumuth, 2007. Automatic Parsing Evaluation. Student report on NLP practical pdf

Schedule

The following table outlines the structure of the seminar over the course of the semester. Meetings will take place in room INF 3027. See the contact details at the end of this page for your supervisor's contact information.

date event
April 8, 2011, 13:00 first meeting: topic assignment
May 6, 13:30 second meeting
May individual meetings with your supervisor (as needed) to discuss questions on the literature and the essay
June 5 latest date to hand in draft of essay for supervision (optional)
June individual meetings with your supervisor (as needed) to discuss your draft
June 26 hand in final version of essay (will be distributed to all participants via e-mail)
week of June 27 individual meetings to discuss slides for the talk
July 2 anyone who has not been excluded from the seminar by this date will definitely give their talk
July 7 final meeting: talks

Talks Schedule

All talks take place on Thursday, July 7. Each talk may take at most 35 minutes. This is a hard limit. Speakers can use their own laptops or the one provided on site. Speakers have to set up the respective laptop 15 minutes before their session begins.

08:00–08:15 Opening
Morning Session.
08:15 Langner: Binarization (Proseminar)
09:00 Thamm: Approximative Parsing (Proseminar)
09:45 (Short Break)
10:00 Leyva Galano: Incremental Dependency Parsing
10:45 Timani: Dynamic Programming for Linear-time Incremental Dependency Parsing
11:30–13:00 Lunch Break
Afternoon Session.
13:00 Wu: A Brief Survey on Word-Alignment Models
13:45 Teichmann: Parsing Problems based on PSTIGs
14:30 (Short Break)
14:45 Cheng Ye: Log-linear Models
15:30 Arne Theß: Variational Decoding
16:15–16:30 Closing

Getting Help

We have some information on writing articles available online. In general, if you have questions, do not hesitate to contact your supervisor. The earlier you address your problems, the easier the solutions will be.

Last modified: 5th Mar 2012, 10.39 AM
Author: Dipl.-Inf. Matthias Büchse

Contact
Prof. Dr.-Ing. habil.
Heiko Vogler

Phone: +49 (0) 351 463-38232
Fax: +49 (0) 351 463-37959

Dipl.-Inf.
Matthias Büchse

Phone: +49 (0) 351 463-38237

Dr. rer. nat.
Torsten Stüber

Phone: +49 (0) 351 463-39057

Dipl.-Inf.
Toni Dietze

Phone: +49 (0) 351 463-38469