Info
- Participants list :
- Accommodation
- Flamenco Hotel Zamalek
- Venue
- Swiss Club - they have a map on the website.
- Venue directions
- Go to Kitkat Square, then go into Sudan St. and take the third left; the Swiss Club is at the end of that street.
- Dates
- Wed 20th - Sat 23rd May 2009
Schedule
- 9:30
- Pickup from Hotel Flamenco
- 10:00
- Sessions Start
- 11:00
- Coffee Break
- 11:30
- Sessions Continue
- 2:00
- Lunch Break
- 3:00
- Sessions Continue
- 6:00
- Sessions End (Soft ending, participants can stop when they get tired or can continue after 6 if they want to.)
- 8:00
- Dinner
Tasks
Suggest a task for the code sprint. Please add a detailed description of the problem you want addressed, the projects you expect to use the code in, and the programming language that should be used.
Arabic text normalization
- Description
- Normalize text for searching by stripping away tashkil (diacritics) and kashida, and by folding common spelling mistakes such as the wrong hamza form, confusion over ya2 at the end of a word, and ha2 written in place of ta2 marbouta.
- Useful for
- wiki word in mediawiki, improving search in drupal
- Language
- PHP
- Coders
- Alaa, Khaled Al-Shamaa, Amr Mostafa
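The normalization described above could be sketched roughly as follows. The task itself asks for PHP; this Python version is only an illustration, and the folding table (which letters are merged into which) is an assumption made for the example, not an agreed rule set.

```python
import re

# Tashkil: fathatan..sukun (U+064B..U+0652) plus superscript alef (U+0670).
TASHKIL = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida

# Assumed folding table for search purposes; adjust to taste.
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbouta -> ha
})


def normalize_for_search(text):
    """Return a search key: no tashkil, no kashida, folded letter variants."""
    text = TASHKIL.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(FOLD)
```

The same function would be applied to both the indexed text and the query, so that spelling variants collapse to one key. Folding every ta marbouta and alef maqsura (rather than only word-final ones) is a simplification.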
Common word forms and their relationship (stemming heuristics)
- Description
- Stemming can be used to improve searching, indexing, automated terminology tagging, etc. However, it tends to introduce ambiguities. For example, the words الاشتراكية and اشتراك would almost always be used in completely different contexts despite sharing a root and having a very low Levenshtein distance. Would it be possible to list the most commonly used word forms and cluster the ones that are most likely to be related?
- Useful for
- full text indexing, automatic classification, trend detection etc.
- Languages
- Python
- Coders
- Alaa, Amr Mostafa
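To make the Levenshtein point concrete, here is a minimal edit-distance sketch in Python. It is a plain dynamic-programming version written for illustration; the function name is ours, not part of any existing project.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]
```

For the two words in the description, the distance is 4: dropping the prefix ال and the suffix ية turns one into the other. A distance threshold alone therefore cannot tell related forms from unrelated ones, which is why the task proposes a list of common forms.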
Common proper nouns database
- Description
- When the word محمد appears it is almost always a name, a word like مبروك is only highly likely to be a name, and a word like أمل can sometimes be a person's name but most often is not. A database of common proper nouns (perhaps with a weight representing the statistical likelihood of the word appearing as a common noun) would keep them from being stemmed, parsed for semantics, etc. It can also be useful for narrowing down the words used for Calais-style auto-classification.
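One possible shape for such a database is a weighted lookup, sketched below in Python. The table name, the function, the threshold, and above all the weights are hypothetical placeholders chosen to mirror the three examples in the description; they are not measured statistics. The weight here is the likelihood of the word being a name, i.e. the complement of the common-noun weight mentioned above.

```python
# Hypothetical weights: estimated probability that a token is used as a
# proper noun. The numbers are placeholders, not measured statistics.
NAME_LIKELIHOOD = {
    "\u0645\u062D\u0645\u062F": 0.99,        # محمد
    "\u0645\u0628\u0631\u0648\u0643": 0.80,  # مبروك
    "\u0623\u0645\u0644": 0.30,              # أمل
}


def should_stem(word, threshold=0.5, table=NAME_LIKELIHOOD):
    """Return False when the word is likely a proper noun and should be left as-is."""
    return table.get(word, 0.0) < threshold
```

A stemmer or semantic parser would call `should_stem` before touching a token; unknown words default to being processed normally.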
Arabic wikipedia as poor man's Calais
- Description
- Can Wikipedia be used as a repository of proper nouns, place names, names of historical figures, etc. to build a rich tagging and terminology feature?
- Useful for
- auto tagging for Drupal and Wordpress
- Languages
- PHP
- Coders
- Alaa, Slim
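As a rough illustration of the idea, the sketch below tags a text by looking up word sequences in a set of article titles. It is written in Python rather than the PHP proposed for the task, and it assumes the titles have already been extracted into a plain set; how to obtain them from Wikipedia is left open. The function name and the `max_len` parameter are ours.

```python
def tag_entities(text, titles, max_len=4):
    """Return titles found in text, preferring the longest match at each position."""
    tokens = text.split()
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in titles:
                tags.append(candidate)
                i += n
                break
        else:
            i += 1  # no title starts at this token
    return tags
```

For example, with `titles = {"Cairo", "Swiss Club"}`, the sentence "We met at the Swiss Club in Cairo" yields `["Swiss Club", "Cairo"]`. Real text would need normalization and punctuation handling before the lookup.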
Yamli bookmarklet
- Description
- The availability of an Arabic keyboard can sometimes be an issue. Yamli solves this and has an API. So why not a Yamli bookmarklet, so that any text field on any web page becomes an Arabic input with one click?
- Useful for
- Arabic script typing on the web
- Languages
- JavaScript
- Coders
- Slim, Amr Mostafa
Hunspell and Tatweel (a.k.a. kashida)
- Description: Currently hunspell ignores Arabic tatweel regardless of context, so words like ـاـلمرــقمــ are considered correct (see SF bug #1868922).
- Useful for
- Arabic spell checking
- Language
- C++
- Coders :
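A context-aware rule for tatweel might look like the sketch below. It is written in Python only to illustrate the check; it is not Hunspell code and uses no Hunspell API. The rule is an assumption: a tatweel is accepted only between two Arabic letters, and only when the letter before it connects to the following letter. The list of non-connecting letters is also assumed, and the input is expected to be unvocalized.

```python
TATWEEL = "\u0640"

# Assumed list of letters that never connect to the following letter:
# hamza, the alef forms, waw (with and without hamza), ta marbouta,
# dal, thal, ra, zay and alef maqsura.
NON_FORWARD_JOINING = set(
    "\u0621\u0622\u0623\u0624\u0625\u0627\u0629"
    "\u062F\u0630\u0631\u0632\u0648\u0649"
)


def is_arabic_letter(ch):
    """True for characters in the basic Arabic letter range, excluding tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def tatweel_placement_ok(word):
    """Check that every tatweel sits between two letters that can be joined."""
    for i, ch in enumerate(word):
        if ch != TATWEEL:
            continue
        # Skip over neighbouring tatweels to find the real letters on each side.
        j = i - 1
        while j >= 0 and word[j] == TATWEEL:
            j -= 1
        k = i + 1
        while k < len(word) and word[k] == TATWEEL:
            k += 1
        if j < 0 or k >= len(word):
            return False  # tatweel at the start or end of the word
        if not (is_arabic_letter(word[j]) and is_arabic_letter(word[k])):
            return False
        if word[j] in NON_FORWARD_JOINING:
            return False  # previous letter cannot be stretched forward
    return True
```

A spell checker could strip tatweel only when this check passes, and flag the word otherwise.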
Documentation of best practices in RTL design
- Description
- Create a how-to guide for designers and developers on building a bidirectional design and converting an LTR layout to RTL, with a real example and a step-by-step guide to making a Drupal theme (why not Hamoud's design?). Also develop a CSS base framework to simplify bidi design.
- Useful for
- Designing with RTL support
- Language
- CSS
- Coders
- Mohammed. S. HJIOUIJ, Djihed Afifi (how do I sign in this wiki?)
I suggest making the document a semi-official how-to guide that covers any programming language and includes recommendations for designs that are either direction-neutral or easily flippable. Highlights will include sections on HTML and web design, GUI design in the popular GUI toolkits, and possibly documentation of the known RTL bugs in major UI libraries.
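The "easily flippable" idea can be shown with a very small Python sketch that mirrors direction-sensitive keywords in a stylesheet. This is only a first approximation written for this page; the function name is ours, and a real converter would also have to handle four-value shorthands, background positions, and file names that happen to contain "left" or "right".

```python
import re

# Direction-sensitive keywords and their mirror images.
_SWAP = {"left": "right", "right": "left", "ltr": "rtl", "rtl": "ltr"}
_TOKEN = re.compile(r"\b(left|right|ltr|rtl)\b")


def flip_css(css):
    """Swap left/right and ltr/rtl in a single pass over the CSS text."""
    return _TOKEN.sub(lambda m: _SWAP[m.group(1)], css)
```

For instance, `flip_css("float: left; margin-right: 4px;")` gives `"float: right; margin-left: 4px;"`. Because the substitution happens in one pass, a swapped keyword is never swapped back.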
Bidirectional LaTeX package for LuaTeX
- Description
- LuaTeX is the next-generation TeX engine. It has all of pdfTeX's bells and whistles, embedded Lua scripting, and Omega's Unicode and multi-directional extensions. Though all the basic primitives needed for Arabic typesetting are (almost) there, a higher-level LaTeX package is needed.
- Useful for
- typesetting bidirectional documents in LaTeX
- Language
- TeX, Lua
- Coders
- Khaled
OCR
- Description
- This is indeed a huge task. There is no OSS OCR engine that supports Arabic. There are many research papers and articles that give algorithms for implementing various methods. It is worth noting that OCR for arbitrary handwriting is quite different from OCR for typewritten text (such as a newspaper).
If we decide to go for this, I plan to put together a small presentation on what is needed to implement an OCR engine.
- Language: C (Suggested).
- Coders: Djihed
A Web 2.0 interface for translation
- Background
- Some background on this task is in this post:
http://djihed.com/linux/bringing-all-translation-management-tools-together
- Description
- This task is mostly concerned with the online translation component. Pootle and Launchpad's Rosetta are similar, but this would be a heavily Ajax-driven online translation tool with Web 2.0 features and "statistics" that could bring translation into the social web frenzy. The tool would aggregate the translation texts of all projects into a single one-stop place and invite the public at large to participate. Features would include push committing (auto-committing when necessary), deadline management, term discussions, download/upload, etc.
- Language
- PHP or Python and SQL
- Coders
- Djihed
Improve ArPHP WordPress plug-in
- Background
http://wordpress.org/extend/plugins/ar-php/ http://www.ar-php.org
- Description
- I would like to integrate more of my ArPHP library's features into the WordPress blogging tool:
http://www.ar-php.org/fetures_php_arabic.html
These include better Arabic search, Arabic auto-summarization, and Arabic dates.
- Language
- PHP
- Coder
- Khaled Al-Shamaa
Adapt Nafees Nastaleeq and Riqa fonts for Arabic
- Description
- The Nafees Nastaleeq and Riqa fonts are free (GPL) fonts by CRULP in the Nastaʿlīq and Ruq'ah Arabic calligraphic styles. They mainly support Urdu and Persian but lack some characters needed for Arabic.
- Useful for
- typesetting Arabic (mainly titles, posters etc.)
- Language
- OpenType
- Coders
- Khaled

Bidi & RTL documentation
Among the problems we identified earlier, we talked about:
- Documentation of best practices in RTL design
- Bidi
I think both are closer to design than to programming, so I want to make sure they can be worked on during this marathon before I choose to take them on.
Of course they can be worked on.
First, you can work on anything that interests you, including the ideal way to cook molokhia (the gasp is the most important part).
What matters to us as coordinators is continuity, and that has two sides: first, that participants are keen to keep working together after the marathon ends and that we attract new participants and contributors; and finally, that we convince donors, funders, and hosts of the effectiveness of events like the marathon and of the geeks' group, so that we can repeat the experience.
To achieve continuity we need a useful, practical output that can be produced in four days by no more than four people.
So if we say, for example, that the design issues will be handled by drafting a how-to document with practical examples, that serves the purpose and then some.