I had sent an email to IRIS regarding Urdu/Farsi OCR software and hardware. From the little I know about the similarities between the Farsi and Urdu alphabets, if the software can read and recognize Urdu text as Farsi, we can still accomplish a lot and may need little to no extra editing or proof-reading. Here is their reply:
urdu OCR? from a technical perspective it looks like a formidable task. but then, things that looked impossible at one time are commonplace today. the challenges of urdu OCR are summed up quite well in the introduction of this paper: Online Urdu OCR. a similar quick guide in presentation format can be found here: Urdu OCR.
----
you can read more research papers (these are links to PDFs):
Optical Character Recognition System for Urdu (Naskh Font) Using Pattern Matching Technique
Recognition of Printed Urdu Script
Urdu Morphology, Orthography and Lexicon Extraction
Word Segmentation for Urdu OCR systems
OCR For Printed Urdu Script Using Feed Forward Neural Network
Urdu Nastaliq OCR
Improving Nastaliq Specific Pre-Recognition process for Urdu OCR
----
and if you haven't switched off already, here are some more:
Guide to OCR for Indic Scripts from SpringerLink (paid content); however, you can read extracts of the book here on googlebooks. there are, however, other ways one can find a pdf lying stray on some bylane across the info-highway...
Implementation Challenges for Nastaliq Character Recognition from SpringerLink (again, paid content; read extracts here).
something relevant for book digitization amateurs/enthusiasts/hobbyists:
http://www.diybookscanner.org/
scan this book!
many great articles if you are thinking of managing a library: here.
diy bookscanning at instructables
----
for those with deep pockets: atiz, and it comes with its own digitization software.
an article on atiz kirtas
----
if, however, you are not interested in investing in the above (or do not have the time or money) and will make do with ordinary flatbed scanners, here are some tools to help you in the process:
http://scantailor.sourceforge.net/
opensource image to pdf: http://home.netspeed.com.au/astawowc/outpdf/index.html
----
help can be found here: http://diybookscanner.org/forum/
relevant parts on scanning etc on: http://www.pgdp.net/phpBB2/
http://djvu.sourceforge.net/
http://www.imagemagick.org/script/index.php (for linux users: this link or this) (for win: this, or this)
well! better to scan from your own library than to work on recommendations, as you may not be able to get those books. anyway, i want to get the following books scanned and uploaded. those which i have in my library are tagged with (y), those i don't have are tagged with (n), but most of those i have and mention below are difficult to scan until i unbind them.
jilā al-şudūr by Molanā Ashraf Siyālvī (y)
tafsīrāt e aĥmadiyyah (y)
sawāneĥ Imām Abū Ĥanifah by Abū al-Ĥasan Zayed Farūqi (y)
al-Ŝawārim al-Hindiyyah (n)
Maţālé al-Masarrāt sharaĥ Dalāil al-Khayrāt (y)
al-Kalimat al-Úlyā (n) (searching for a long time, unable to get a copy)
Tarjamah of Jawāhir al-Baĥĥār 5 vols (y)
sīrat an-Nabi by Államah Nūr Bakhs Tawakkalī (y)
so do i, but it is quite a time-consuming job, taking perhaps 30 to 40 times longer than scanning. plus it cannot be a one-man job: we need good typists familiar with urdu software (the gmail language translation gadget can be an alternative), we need proof readers, and most important of all, financial support to hire a dedicated team and buy equipment.
I personally like to read from a hard copy, but unfortunately the reading habit is drastically on the decline, especially among Muslims in the Indian sub-continent. Books go out of print very soon, and only very famous books are printed in runs above 10,000 copies. The Internet is therefore an excellent way to preserve them and share them with the masses, especially when the new generation is so inclined towards finding everything on the Internet; I fear that soon people will start looking on the Internet for belongings they have misplaced, for example their glasses.
Though I have been doing this only for the sake of Sunni Muslims, there are many questions that need to be answered, for example:
a) does the canon law permit scanning and publishing on the Internet books that are published under a copyright law enforced by a government?
b) what is the ruling on intellectual property: can a person/institution claim that a particular book is his/their intellectual property and that nobody is allowed to reproduce, reprint, scan, or use it in part in any other form?
c) most of the books fall into the compilation (tālīf) category; if copyright does apply under the canon law, to what extent is it valid for compilation works?
d) how valid are a publisher's concerns about its investment in the light of shariáh? Perhaps publishing works on the Internet does not have a direct effect on their investment, because the hard copies will and eventually do sell out; I say this because I have observed that many famous books which are available on the Internet are still in print.
Also, I usually scan only those books that are either out of print or have gone through many cycles of printing.
e Makataba e Ahl e Sunnat has been doing a great job; they have published more than 1400 books collected from various sources.
I think we can all do this job very easily together. All we have to do is:
a) buy a normal scanner, which would cost you a hundred dollars at most
b) scan the book in your leisure time; for example, scan only 10 pages every day, which would take only 10 to 15 minutes
c) upload it here on sunniport, or scribd, or give it to one of the brothers who are already doing it
d) leave a post about your work so that we avoid duplicates and save time.
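To get a feel for what the pace in step b) means for a whole book, here is a minimal Python sketch. The function name `scan_schedule` and the 400-page book length are assumptions made up for illustration; only the 10 pages per day and the 10 to 15 minutes per session come from the post above.

```python
import math


def scan_schedule(total_pages, pages_per_day=10, minutes_per_session=(10, 15)):
    """Estimate how long a book takes to scan at a fixed daily pace.

    Returns (days, min_hours, max_hours), where the hours are the total
    time spent at the scanner over all sessions.
    """
    if total_pages < 0 or pages_per_day <= 0:
        raise ValueError("total_pages must be >= 0 and pages_per_day > 0")
    # A partly filled last day still counts as one session.
    days = math.ceil(total_pages / pages_per_day)
    low_minutes, high_minutes = minutes_per_session
    return days, days * low_minutes / 60, days * high_minutes / 60


if __name__ == "__main__":
    # 400 pages is a hypothetical book length, not a figure from the thread.
    days, low_hours, high_hours = scan_schedule(400)
    print(f"{days} days, roughly {low_hours:.1f} to {high_hours:.1f} hours in total")
```

At this pace a book of that assumed size is done in a little over a month of short daily sessions, which is why splitting titles among several volunteers and posting what you are working on helps.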
I know it's a lot easier said than done and a major task, but I would really like to see typed textual content on the web, as that shows up in textual keyword search results and will get picked up by the search engines. With scanning, the resulting file is (normally) either a jpg or some other image format, or a pdf built out of jpgs (as opposed to a pdf built out of a doc or text files). Other than the title of the book, for which text is provided as a file name, the scanned images are not picked up by the web crawlers as content for keyword searches. (I know we can tag the images, but if we really want to write out that many tags, we might as well type the page.)
With textual content, the books will appear more readily in search results. For example, if I search for "ridawiyya + taharah + pani" (in Urdu), I might well get a result that points to a specific page in Fatawa Ridawiyya, if it has been uploaded in textual form. I think we Urdu content providing desis should give up this addiction to image files and scans, now that Urdu keyboards and phonetic keyboards are much more readily available, just like the Arabic keyboards.
Please note I am not putting down the good efforts of any brothers/organizations. May Allah reward them for their efforts; it is still a major feat they are accomplishing, and they are helping other brothers and sisters. I am just making a point about what can be better and more efficient and maximize the benefit.
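The point about typed pages being searchable can be illustrated with a minimal Python sketch of a page-level keyword index. This is only a toy, not how any real search engine works; the function names and the sample pages are invented placeholders (romanized keywords, not quotations from any book), and real Urdu text would need proper normalization that is skipped here.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of (book, page_number) locations containing it.

    `pages` is a dict of {(book, page_number): typed_text}.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        for word in text.lower().split():
            index[word].add(location)
    return index


def search(index, query):
    """Return the sorted locations that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Only pages that have all the query words survive the intersection.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Invented placeholder pages: keywords only, not real book text.
SAMPLE_PAGES = {
    ("sample-book", 12): "taharah pani masail",
    ("sample-book", 40): "pani safar",
}

if __name__ == "__main__":
    index = build_index(SAMPLE_PAGES)
    print(search(index, "taharah pani"))
```

A scanned jpg gives an index like this nothing to work with beyond its file name, while a typed page contributes every word on it, so a query can land on the exact page.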
Nafseislam.com
Assalaamu 'Alaykum. Brother Qaasim has got a good point. I have been through many websites. The best ones are www.razanw.org, www.faizaneraza.org and www.nafseislam.com. Nafseislam.com was suggested by one of the disciples of my Shaykh. It is a really good website. I say so because I could easily find a very large collection of books on a wide range of topics, as well as beautiful speeches. I have personally talked to the owner of this website (who lives in Kuwait, and is a Mureed of Huzoor Ameer-e-Ahle Sunnat). He is in need of volunteers too, to help his collection expand. I hope this will be a very "Happy Reading". ALLAH HAAFIZ
Many brothers and sisters say that they do not have access to books. I think it would be a great idea to scan books and upload them on the internet, like our brother Noori has been doing. We could start to compile a list of books that should be scanned, in sha Allah, for both the 'Ulamaa and the 'Awaam. Could brothers and sisters please give their views?