Code sprint planning

Info

Participants list:
Accommodation
Flamenco Hotel Zamalek
Venue
Swiss Club - they have a map on the website.
Venue directions
Go to Kitkat Square, then go into Sudan St. and take the third left; at the end of that street you will find the Swiss Club.
Dates
Wed 20th - Sat 23rd May 2009

Schedule

9:30
Pickup from Hotel Flamenco
10:00
Sessions Start
11:00
Coffee Break
11:30
Sessions Continue
2:00
Lunch Break
3:00
Sessions Continue
6:00
Sessions End (Soft ending, participants can stop when they get tired or can continue after 6 if they want to.)
8:00
Dinner

Tasks

Suggest a task for the code sprint. Please add a detailed description of the problem you want addressed, the projects you expect to use the code in, and the programming language that should be used.

Arabic text normalization

Description
Normalize text for searching: strip tashkil (diacritics) and kashida (tatweel), and normalize common spelling mistakes such as a wrong hamza, confusion over ya2 at the end of a word, ha2 written instead of ta2 marbouta, etc.
Useful for
Wiki words in MediaWiki, improving search in Drupal
Language
PHP
Coders
Alaa, Khaled Al-Shamaa, Amr Mostafa
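
The normalization steps described above can be sketched as follows. This is a minimal illustration in Python rather than the PHP targeted by the task, and it is not code from any existing project. The name `normalize_for_search` is hypothetical, and the folding directions (hamza-carrying alefs to bare alef, final alef maqsura to ya2, final ta2 marbouta to ha2) are assumptions; a real implementation might pick different canonical forms or cover more cases, such as hamza on waw or ya2.

```python
import re

# Tashkil: U+064B (fathatan) through U+0652 (sukun), plus U+0670 (superscript alef).
TASHKIL = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida
# Alef with madda, hamza above, or hamza below: folded to bare alef (U+0627).
HAMZA_ON_ALEF = re.compile("[\u0622\u0623\u0625]")
# Lookahead that succeeds only when the next character is not an Arabic
# letter (U+0621..U+064A), which is how this sketch detects the end of a word.
END_OF_WORD = "(?![\u0621-\u064A])"
FINAL_ALEF_MAQSURA = re.compile("\u0649" + END_OF_WORD)
FINAL_TA_MARBUTA = re.compile("\u0629" + END_OF_WORD)


def normalize_for_search(text):
    """Return a search key for text (hypothetical helper, assumptions above)."""
    text = TASHKIL.sub("", text)               # strip diacritics
    text = text.replace(TATWEEL, "")           # strip kashida
    text = HAMZA_ON_ALEF.sub("\u0627", text)   # wrong or missing hamza on alef
    text = FINAL_ALEF_MAQSURA.sub("\u064A", text)  # final alef maqsura -> ya2
    text = FINAL_TA_MARBUTA.sub("\u0647", text)    # final ta2 marbouta -> ha2
    return text


# Ahmad written with hamza and kashida gives the same key as the bare spelling.
print(normalize_for_search("\u0623\u062D\u0645\u0640\u0640\u062F")
      == normalize_for_search("\u0627\u062D\u0645\u062F"))
```

The same function has to be applied to both the indexed text and the search query; otherwise the folded forms will not match.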

Common word forms and their relationship (stemming heuristics)

Description
Stemming can be used to improve searching, indexing, automated terminology tagging, etc. However, it tends to introduce ambiguities. For example, the words الاشتراكية and اشتراك would almost always be used in completely different contexts despite sharing a root and having a very low Levenshtein distance. Would it be possible to list the most commonly used word forms and cluster the ones that are most likely to be related?
Useful for
full text indexing, automatic classification, trend detection etc.
Languages
Python
Coders
Alaa, Amr Mostafa
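
To make the Levenshtein remark concrete, here is a small sketch of the standard dynamic-programming edit distance applied to the two example words. The function name `levenshtein` is only illustrative, and the sketch deliberately stops at measuring distance: it shows why edit distance alone cannot decide whether two forms are related, which is the ambiguity the task wants to address.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# The two words from the description, spelled out as code points.
ISHTIRAKIYA = "\u0627\u0644\u0627\u0634\u062A\u0631\u0627\u0643\u064A\u0629"  # الاشتراكية
ISHTIRAK = "\u0627\u0634\u062A\u0631\u0627\u0643"  # اشتراك

# Dropping the two-letter prefix and the two-letter suffix is enough.
print(levenshtein(ISHTIRAKIYA, ISHTIRAK))  # 4
```

A clustering step would therefore need more signal than the distance itself, for example how often the two forms occur in the same documents.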

Common proper nouns database

Description
When the word محمد appears, it is almost always a name; a word like مبروك is only highly likely to be a name; and a word like أمل can sometimes be a person's name but most often is not. A database of common proper nouns (perhaps with a weight representing the statistical likelihood of the word appearing as a common noun) would keep such words from being stemmed, parsed for semantics, etc. It could also be useful for narrowing down the words used for Calais-style auto-classification.
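
One possible shape for such a database is a plain mapping from word to weight, consulted before stemming. The sketch below is only an illustration: the names `COMMON_NOUN_WEIGHT` and `should_stem` are hypothetical, and the weights are placeholder values chosen to mirror the three examples above, not figures measured from any corpus.

```python
# Hypothetical weights: likelihood that the word is used as a common noun
# rather than as a name. Placeholder values, not measured from any corpus.
COMMON_NOUN_WEIGHT = {
    "\u0645\u062D\u0645\u062F": 0.01,        # محمد: almost always a name
    "\u0645\u0628\u0631\u0648\u0643": 0.20,  # مبروك: highly likely a name
    "\u0623\u0645\u0644": 0.70,              # أمل: usually not a name
}


def should_stem(word, cutoff=0.5):
    """Return True if word should go through stemming.

    Words missing from the database are treated as ordinary vocabulary
    (weight 1.0), so only listed probable names are protected.
    """
    weight = COMMON_NOUN_WEIGHT.get(word, 1.0)
    return weight >= cutoff


for word in COMMON_NOUN_WEIGHT:
    print(word, should_stem(word))
```

Keeping a weight instead of a yes/no flag lets each consumer choose its own cutoff: a search indexer might protect only near-certain names, while an auto-tagger might accept weaker candidates.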

Arabic wikipedia as poor man's Calais

Description
Can Wikipedia be used as a repository of proper nouns, place names, names of historical figures, etc. to build a rich tagging and terminology feature?
Useful for
Auto-tagging for Drupal and WordPress
Languages
PHP
Coders
Alaa, Slim

Yamli bookmarklet

Description
An Arabic keyboard is not always available. Yamli solves this and has an API. So why not a Yamli bookmarklet that turns any text field on any web page into an Arabic input with one click?
Useful for
Arabic script typing on the web
Languages
JavaScript
Coders
Slim, Amr Mostafa

Hunspell and Tatweel (a.k.a. kashida)

Description
Currently Hunspell ignores Arabic tatweel regardless of context, so words like ـاـلمرــقمــ are considered correct (see SF bug #1868922).
Useful for
Arabic spell checking
Language
C++
Coders
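
As a rough illustration of what "taking context into account" could mean, the sketch below flags a tatweel that cannot be joining two letters: one at the start or end of a word, or one that follows a letter that does not connect to the next letter (such as alef, dal, or waw). This is a Python sketch of one possible rule, not Hunspell code or Hunspell's actual behavior; the name `has_misplaced_tatweel` and the rule itself are assumptions, and it expects input without diacritics.

```python
TATWEEL = "\u0640"

# Letters that never connect to the following letter: hamza, the alef
# variants, waw with hamza, alef, ta2 marbouta, dal, thal, ra, zay, waw.
NON_CONNECTING = set(
    "\u0621\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648"
)


def is_arabic_letter(ch):
    """True for letters in the basic Arabic block, excluding the tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def has_misplaced_tatweel(word):
    """Return True if any tatweel in word fails the context rule above."""
    for i, ch in enumerate(word):
        if ch != TATWEEL:
            continue
        # Look past neighbouring tatweels to the nearest real characters.
        before = word[:i].rstrip(TATWEEL)
        after = word[i + 1:].lstrip(TATWEEL)
        prev_ch = before[-1] if before else ""
        next_ch = after[0] if after else ""
        if not (prev_ch and next_ch):
            return True  # tatweel at the start or end of the word
        if not (is_arabic_letter(prev_ch) and is_arabic_letter(next_ch)):
            return True  # tatweel next to a non-letter
        if prev_ch in NON_CONNECTING:
            return True  # previous letter cannot join forward
    return False


print(has_misplaced_tatweel("\u0645\u062D\u0640\u0640\u0645\u062F"))  # False
print(has_misplaced_tatweel("\u0640\u0627\u0640\u0644"))              # True
```

A spell checker using a rule like this would strip tatweels only from words that pass the check and reject the rest, instead of ignoring the character everywhere.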

Documentation of best practices in RTL design

Description
Create a how-to guide for designers and developers on building bidirectional designs and converting an LTR layout to RTL, with a real example and a step-by-step guide to making a Drupal theme (why not Hamoud's design?). In addition, develop a base CSS framework to simplify bidi design.
Useful for
Designing with RTL support
Language
CSS
Coders
Mohammed S. HJIOUIJ, Djihed Afifi (how do I sign in this wiki?)

I suggest making the document a semi-official how-to guide that covers any programming language and includes recommendations for designs that are either direction-neutral or easily flippable. Highlights would include sections on HTML and web design, GUI design in the popular GUI toolkits, and possibly documentation of the known RTL bugs in major UI libraries.

Bidirectional LaTeX package for LuaTeX

Description
LuaTeX is the next-generation TeX engine. It has all of pdfTeX's bells and whistles, embedded Lua scripting, and Omega's Unicode and multi-directional extensions. Although all the basic primitives needed for Arabic typesetting are (almost) there, a higher-level LaTeX package is still needed.
Useful for
typesetting bidirectional documents in LaTeX
Language
TeX, Lua
Coders
Khaled

OCR

Description
This is indeed a huge task. There is no OSS OCR engine that supports Arabic. There are a multitude of research papers and articles that give algorithms for implementing various methods. It is worth noting that OCR for arbitrary handwriting is quite different from OCR for typeset text (such as a newspaper).
If we decide to go for this, I plan to put together a small presentation on what is needed to implement an OCR engine.
Language
C (suggested)
Coders
Djihed

A Web 2.0 interface for translation

Background
Some background to this task is in this post:

http://djihed.com/linux/bringing-all-translation-management-tools-together

Description
This task is mostly concerned with the online translation component. Pootle and Launchpad's Rosetta are similar, but this would be a heavily ajaxified online translation tool with Web 2.0 features and "statistics" that could bring translation into the social web frenzy. The tool would aggregate the translation texts of all projects in a single one-stop place and invite the public at large to participate. Features would include push committing (auto-committing when necessary), deadline management, term discussions, download/upload, etc.
Language
PHP or Python and SQL
Coders
Djihed

Improve the ArPHP WordPress plug-in

Background

http://wordpress.org/extend/plugins/ar-php/
http://www.ar-php.org

Description
I would like to integrate more of my ArPHP library's features into the WordPress blogging tool:

http://www.ar-php.org/fetures_php_arabic.html
Features like better Arabic search, Arabic auto-summarization, and Arabic dates.

Language
PHP
Coder
Khaled Al-Shamaa

Adapt Nafees Nastaleeq and Riqa fonts for Arabic

Description
The Nafees Nastaleeq and Riqa fonts are free (GPL) fonts by CRULP in the Nastaʿlīq and Ruq'ah Arabic calligraphic styles. They mainly support Urdu and Persian but lack some characters needed for Arabic.
Useful for
typesetting Arabic (mainly titles, posters etc.)
Language
OpenType
Coders
Khaled

Add your suggestions here

Comment from محمد الساحلي

Bidi & RTL documentation

Among the problems we identified earlier, we talked about:

- Documentation of best practices in RTL design

- Bidi

I see both as closer to design than to programming, so I want to make sure they can be worked on during this code sprint before I choose to take them on.

Comment from Alaa Abd El Fattah

Of course they can be worked on

First, you can work on anything that matters to you, including the ideal way to cook molokhia (the most important part is the gasp).

What matters to us as organizers is continuity, and that has two sides: first, that participants are keen to keep working together after the sprint ends and that we attract new participants and contributors; and finally, that we convince donors, funders, and hosts of the effectiveness of events like the sprint and the geeks' group, so that we can repeat the experience.

To achieve continuity, we need a useful, practical output that can be produced in four days by no more than four people.

So if we say, for example, that the design issues will be handled by writing a howto-style document with practical examples, that serves the purpose and then some.
