Chair of Foundations of Programming

Seminar zur Verarbeitung natürlicher Sprache (Seminar on Natural Language Processing), winter term 2010/11

This seminar will be held in German. If you do not speak German and are interested in a seminar, please have a look at our offerings for next semester.

Introduction

We will study the interplay between automated translation of human languages, in particular syntax-based statistical machine translation, and the theory of tree automata, tree transducers, and related models.

Objectives

In brief: studying the literature, giving a scientific talk, and writing a scientific essay. Every student is expected to give a talk (about 45 minutes) on their topic. Students who are not taking this course as a proseminar are also expected to write a seminar essay (about 15 pages). After the seminar, every student should have a grasp of the central statement of each presented topic. If the seminar is included in an oral examination, this material may be examined.

Prerequisites

It is advisable, though not compulsory, to have knowledge from one of our lectures about Machine Translation or Tree Automata. Basic formal language theory is also recommended.

Organization

During the first meeting, topics are assigned to the students. Please attend this meeting if you want to participate. A second meeting, around one month later, is intended to encourage your literature study. Further meetings will be scheduled individually between each student and his/her supervisor as needed. At the end of the lecture period, a day or two will be reserved for all the talks.

Important note: Students who fail to adhere to deadlines (i.e., who hand in too little or too late) will be excluded from the seminar. Any deviations from the deadlines have to be negotiated with the supervisor in advance.

Talks Schedule

All talks take place on Thursday, February 3. Each talk may take at most 45 minutes. This is a hard limit. Speakers can use their own laptops or the one provided on site. Speakers have to set up the respective laptop 15 minutes before their session begins.

10:00–10:15 Opening
Morning Session. Chair: Toni Dietze
10:15 Lars Engel: n-Grams (Proseminar); slides
11:15–12:15 Lunch Break
Afternoon Session. Chair: Toni Dietze
12:15 Anja Fischer: Beweis der NP-Schwere des Decoding (Proof of the NP-Hardness of Decoding); essay, slides
13:15 Stefan Prasse: State-Split Grammars; essay, slides
14:15–14:30 Closing

Schedule

The following table outlines the structure of the seminar over the course of the semester. Meetings will take place in room INF 3027. See the panel at the right for your supervisor's contact information.

Date | Event
October 15, 1:00 pm | First meeting: assignment of topics
November 5, 1:00 pm | Second meeting
November, December | Individual meetings with the supervisor (as needed) to clarify questions about the literature or the essay
December 10, 23:59 CET | Last opportunity to submit a draft of the essay for feedback (optional)
December 31, 23:59 CET | Submission of the final version of the essay (it will be distributed by e-mail to all participants)
Week of January 17 | Individual meetings to discuss the slides for the talk
January 24 | Anyone who has not been excluded by this date will definitely take part in the talks
February 3, 10:00 am | Last meeting: talks

Topics

No. | Title | Literature | Supervisor | Student
1 | Language models: n-gram models (Proseminar) | [5] | Dietze | Lars Engel
2 | Syntax-based language models: State-Split Grammars | [2,3] | Dietze | Stefan Prasse
3 | Synchronous Tree-Sequence-Substitution Grammars | [1,4] | |
4 | Parsing problems for Probabilistic Synchronous Tree-Insertion Grammars | [7] | |
5 | Binarization | [14,15,16] | |
6 | Alignments | [6,10,11,12] | |
7 | Syntactic Realignment Models | [13] | |
8 | Evaluation of MT systems | [17,18,19] | |
9 | Availability of corpora (Proseminar) | [20,21,22] | |
10 | Decoding: NP-completeness | [23,24,25] | Dietze | Anja Fischer
11 | Decoding: Variational Decoding | [25] | |

Literature

[1] Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, Sheng Li. A Tree Sequence Alignment-based Tree-to-Tree Translation Model. Proc. ACL-HLT 2008.
[2] Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. Proc. COLING/ACL 2006 (Main Conference).
[3] Slav Petrov, Dan Klein. Learning and Inference for Hierarchically Split PCFGs. AAAI 2007 (Nectar Track).
[4] David Chiang. Learning to translate with source and target syntax. Proc. ACL 2010, pages 1443–1452.
[5] Daniel Jurafsky, James H. Martin. Speech and Language Processing. Pearson Education, 2009.
[6] Jason Riesa, Daniel Marcu. Hierarchical Search for Word Alignment. Proc. ACL 2010.
[7] R. Nesson, S. M. Shieber, A. Rush. Induction of probabilistic synchronous tree-insertion grammars for machine translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA 2006), Boston, Massachusetts, 8–12 August 2006.
[8] Liang Huang, David Chiang. Better k-best parsing. In Parsing '05: Proceedings of the Ninth International Workshop on Parsing Technology, pages 53–64, Morristown, NJ, USA, 2005. Association for Computational Linguistics.
[9] Adam Pauls, Dan Klein. K-best A* parsing. In ACL-IJCNLP '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 958–966, Morristown, NJ, USA, 2009. Association for Computational Linguistics.
[10] Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Proc. ACL 1996.
[11] Franz Josef Och, Hermann Ney. A Systematic Comparison of Various Statistical Alignment Models. Proc. ACL 2003.
[12] Franz Josef Och, Hermann Ney. Improved Statistical Alignment Models. Proc. ACL 2000.
[13] Jonathan May, Kevin Knight. Syntactic Re-Alignment Models for Machine Translation. Proc. EMNLP 2007.
[14] Andreas Maletti. Why synchronous tree substitution grammars? Proc. NAACL 2010.
[15] Wei Wang, Kevin Knight, Daniel Marcu. Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy. Proc. EMNLP-CoNLL 2007.
[16] Liang Huang, Hao Zhang, Daniel Gildea, Kevin Knight. Binarization of Synchronous Context-Free Grammars. Computational Linguistics, 35(4).
[17] Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311–318, 2002.
[18] Joseph P. Turian, Luke Shen, I. Dan Melamed. Evaluation of machine translation and its evaluation. In MT Summit, 2003.
[19] Satanjeev Banerjee, Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgements. In ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, 2005.
[20] http://en.wikipedia.org/wiki/Text_corpus
[21] http://en.wikipedia.org/wiki/Parallel_text_alignment
[22] http://en.wikipedia.org/wiki/Treebank
[23] Kevin Knight. Squibs and discussions: Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4), 1999.
[24] Francisco Casacuberta, Colin de la Higuera. Computational Complexity of Problems on Probabilistic Grammars and Transducers. LNCS, 2000.
[25] Zhifei Li, Jason Eisner, Sanjeev Khudanpur. Variational Decoding for Statistical Machine Translation. Proc. ACL 2009.

Getting Help

We have some information on writing articles available online. In general, if you have questions, do not hesitate to contact your supervisor. The earlier you address your problems, the easier the solutions will be.

Last modified: 14th Feb 2011, 7.57 AM
Author: Dr. rer. nat. Matthias Büchse

Contact
Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged Heiko Vogler

Phone: +49 (0) 351 463-38232
Fax: +49 (0) 351 463-37959
