Japanese Dictionary for Windows

The JMDict Project (or its predecessors; I have fond memories of JWPCE) has been helping Japanese learners in many ways for many years. EDict and its multilingual offspring JMDict are Japanese dictionaries, now stored in large XML files that are available to everybody under free licenses (unfortunately the share-alike clauses of the non-English data sets differ). Most of the free dictionary apps you can find for learning Japanese are to some extent based on that data. The project has been, and still is, so useful that you probably cannot give it enough credit. I use the data for experimenting on different platforms: it is a considerable amount of data but still manageable, and a down-to-earth interface for a dictionary is not too tedious to build.

I would like to discourage anybody from getting into programming for Windows by going it alone with Win32. It is a bad choice both for writing apps and for users: you waste a lot of time looking up and coding stuff that will end up looking dated or lacking functionality people have come to expect over the last two decades. I started over from scratch with MFC after accepting that Win32 just won't let any signals leave an edit box while the focus is inside. I simply wanted the Enter key to trigger a search so users don't need to reach for their mouse every time, but no. Just no. Stack Overflow knows of wild and rigid hacks from back in the day that seem to have worked for some, but not for me.

With MFC you are fine just clicking stuff in property windows most of the time, and customary behaviour is mostly the default. The remaining issues that you have to research and hard-code are doable, and just so much easier than in Win32. However, the actual code is so full of opaque boilerplate that it's really not worth showing to anybody. So much for the interface.

As for the data structure, the C++ standard library has a container called std::multimap. It allows multiple values per key (a word can have many meanings) and enables fast lookups because it keeps itself sorted as you build it. Essentially it orders keys by comparing them (with < by default). I understand how you can say B is bigger than A, but how did they decide which Kanji is small or big? Anyway, in the end you can search for complete matches and for words that start with what you are looking for (just keep walking to the next larger key as long as it has the same beginning), and that's good enough. There is one multimap that relates unique id numbers as keys to the corresponding words as values, and one for each language that relates words as keys to their id numbers as values.
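The two kinds of maps and the two kinds of search described above could look roughly like the sketch below. It is not the app's actual code: the type and function names are made up for illustration, and it assumes the keys are UTF-8 std::string values. Under that assumption the Kanji question has a plain answer: std::string compares byte by byte, and UTF-8 byte order follows Unicode code point order, so a Kanji is "bigger" simply when its code point is higher.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical names; the post does not show the app's real types.
using EntryId  = std::uint32_t;
using WordToId = std::multimap<std::string, EntryId>;  // one of these per language
using IdToWord = std::multimap<EntryId, std::string>;  // id -> every word of that entry

// Complete match: all ids stored under exactly this key.
std::vector<EntryId> find_exact(const WordToId& index, const std::string& query) {
    std::vector<EntryId> ids;
    auto range = index.equal_range(query);
    for (auto it = range.first; it != range.second; ++it) {
        ids.push_back(it->second);
    }
    return ids;
}

// "Starts with" match: jump to the first key that is not smaller than the
// query, then walk forward while the key still begins with the query.
std::vector<EntryId> find_prefix(const WordToId& index, const std::string& query) {
    std::vector<EntryId> ids;
    for (auto it = index.lower_bound(query);
         it != index.end() && it->first.compare(0, query.size(), query) == 0;
         ++it) {
        ids.push_back(it->second);
    }
    return ids;
}

// Resolve an id back to its words through the id-keyed map.
std::vector<std::string> words_for(const IdToWord& by_id, EntryId id) {
    std::vector<std::string> words;
    auto range = by_id.equal_range(id);
    for (auto it = range.first; it != range.second; ++it) {
        words.push_back(it->second);
    }
    return words;
}
```

A lookup would then call find_prefix on the map of the query language and pass each returned id to words_for. Because keys sharing a prefix sit next to each other in sorted order, the walk stops at the first key that no longer matches.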

Messing with character sets can be so painful. I wish I had found the /utf-8 setting for Visual Studio right away. It does the whole job. Use nothing but UTF-8. Converting the XML to a big multimap is not really that hard: with Notepad++ you search and replace tags (and clean out or escape ";" and the like) until you have your code, then delete everything you don't need. Only for eliminating blank lines was the file too big for my machine, so I had to delete them gradually (replace every 6 consecutive blank lines with 5, then every 5 with 4, etc.).

Then you have a giant file in your project, and that is an issue in more than one way. If you have a 32-bit Windows 7 machine (that is, max 4 GB RAM) that runs out of memory when compiling, you might want to "enable 3GB" (command prompt as admin: bcdedit /set IncreaseUserVa 3072, then reboot). But don't forget to disable it after you're done (bcdedit /deletevalue IncreaseUserVa, then reboot again).

In Visual Studio, switch on /bigobj. If necessary, have the patience to slice the file with your multimap-building code in halves and manage the references, over and over again, until it compiles.
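One way to picture the slicing is sketched below, under the assumption that each slice becomes a function filling the same multimap. The function names, the file names in the comments and the three sample entries are invented for illustration; the real generated code holds the whole dictionary and is not shown in this post.

```cpp
#include <cstdint>
#include <map>
#include <string>

using IdToWord = std::multimap<std::uint32_t, std::string>;

// Declarations that would go into a shared header, so that the loader
// can reference every slice.
void fill_part_1(IdToWord& dict);
void fill_part_2(IdToWord& dict);

// --- would be its own file, e.g. dict_part_1.cpp ---
// Non-ASCII literals like these are where /utf-8 matters with MSVC.
void fill_part_1(IdToWord& dict) {
    dict.insert({1, "猫"});    // made-up id and sample entry
    dict.insert({1, "ねこ"});  // second word stored under the same id
}

// --- would be its own file, e.g. dict_part_2.cpp ---
void fill_part_2(IdToWord& dict) {
    dict.insert({2, "犬"});    // made-up id and sample entry
}

// --- would be the loader that the rest of the app calls ---
IdToWord build_dictionary() {
    IdToWord dict;
    fill_part_1(dict);
    fill_part_2(dict);
    return dict;
}
```

When a slice still refuses to compile, it gets cut in two again, and the only places that need touching are the shared declarations and the loader.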

Building the .exe does take quite a while, but the bottom line is that it can still be done with relative ease. C++ scales amazingly well. By contrast, Java IDEs won't even look at files that size.

App Screenshot
Download the App

Sorry for the inconvenience, but when looking for an English verb you have to include the "to " at the beginning ("do" will not yield "to do"). For desktop I somehow like this one better than the Java/SQLite app that I made as well. It's nice to have a single .exe and no setup, working directory, etc. It starts within 3 seconds on my machine, and the results pop up so fast that you don't really have to worry about multi-threading. The downside is that you cannot look for stuff in the middle or at the end of a word, and that makes taking a look at SQL worthwhile.