## 8bit archeology returns

While working on Märchen Maze translation I checked if other Namcot games use the same image encoding format. It appears that not only some Namcot games use it but there are Pack-In-Video games as well. Maybe the same subcontractor studio was involved.
Anyway, here is the list of the games I checked so far.

The encoding is pretty simple. The image is split in 32 bytes blocs. Each bloc can be encoded using one of he following 4 techniques :

1. 0 filled : the bloc is filled with 0. A plain old memset is enough.
memset(out, 0, 32)
2. Sparse : the bloc contains 0 bytes. The first 4 bytes indicates if the corresponding bloc byte is 0 or read from the data buffer.
uint8_t *input; // pointer to encoded input data.
uint8_t *out;   // pointer to output data.
// ...
// sparse_decode
int i, j;
uint8_t *data   = input+4;
uint8_t  bit;
for(i=0; i<4; i++)
{
for(j=0; j<8; j++)
{
*out++ = (bit & 1) ? *data++ : 0x00;
bit >>= 1;
}
}

3. XOR Sparse : Just like the “sparse” encoding but the bloc was previously processed using a sequence of xor on each consecutive 16 bytes in order to increase the number of 0 in the bloc.
sparse_decode(out, input);
// xor
for(i=0; i<16; i+=2)
{
out[i+0x02] ^= out[i+0x00];
out[i+0x03] ^= out[i+0x01];
out[i+0x12] ^= out[i+0x10];
out[i+0x13] ^= out[i+0x11];
}

4. Raw : the bloc data is stored unencoded, just call
memcpy(out, data, 32)

The encoding type is packed in 4 bits stored separately from the image data. This means that a byte stores the encoding type for 4 blocs.

The pseudo code for the whole decoding routine looks like this.

uint8_t *encoding; // pointer to the bloc encoding type array.
uint8_t *data;     // pointer to the encoded image data.
uint8_t *out;      // pointer to the output image data.
size_t bloc_count;
uint8_t current_bloc_encoding;
int i;

for(i=0; i<bloc_count; i++)
{
current_bloc_encoding = (encoding[i/4] >> (i%4)) & 0x03;
switch(current_bloc_encoding)
{
case 0: // zero filled.
memset(out, 0, 32);
break;
case 1: // sparse.
sparse_decode(out, data);
break;
case 2: // XOR sparse.
sparse_decode(out, data);
xor(out);
break;
case 3: // raw data.
memcpy(out, data, 32);
data += 32;
break;
}
out += 32;
}


## Inside Croissant Mountain, Evil Takes Its Form.

Let’s say you are going to work on some meaningless (not to say useless) piece of string processing code. One good way to test it is to apply it on a huge set of strings. You can take some huge random text. You may also want to use a lexicon for problems like palindrome, isogram, blanagram or any funky word list generation.
Let’s pick a language at random… French!
Your favorite search engine will probably returns the word list genrated from the Francais-Gutenberg lexicon by Christophe Pallier (link). It’s a 3.6MB file containing 336531 french words.

Another alternative is to use the lexicon from the ‘Association des Bibliophiles Universels’ (link). Unfortunately there’s a file per letter. So you need to remove the licence and footer before concatenating them.

#!/bin/bash
for i in {a..z}
do
wget http://abu.cnam.fr/cgi-bin/donner-dico-uncompress?liste_$i -O src/list_$i
tail -n +60 src/list_$i | head -n -2 | cut -f1 >> words.DICO.txt done  The generated file contains 289576 words. Finally, you can use the Morphalou lexicon from the Centre National de Resources Textuelles et Léxicales (lien). Unfortunately, the lexicon is a 155MB XML file containing the list of inflected forms for each entry. I will not describe the XML structure here. Long story short, we will extract all the ortography tags and remove duplicates. grep -in orthography Morphalou-2.0.xml | perl -pe 's/.*<orthography.*>(.*)<\/orthography>/$1/' | awk ' !x[$0]++' > words.Morphalou.txt  This wonderful one-liner generates a file containing 410940 words. The awk ‘ !x[$0]++’ is blatantly ripped from this stackoverflow answer.

## A penguin, an USB device and a video game console walk into a bar.

Some months ago I bought a cheap usb video capture device (dx.com). Using it with vlc under Windows is pretty straightforward. What about Linux?
First things first, if you are running a Debian Wheezy with a 3.2.x kernel you have nothing to do. The em28xx module is already here. You can use vlc or mplayer. Unfortunatly I didn’t managed to configure vlc properly. After some tweeking and intense internet browsing, I get the following parameters for mplayer:

mplayer tv:// -tv driver=v4l2:width=720:height=480:outfmt=uyvy:norm=NTSC-M:device=/dev/video1:input=1:fps=30:alsa:amode=1:forcechan=2:audiorate=48800:adevice=plughw.2,0:forceaudio:immediatemode=0 -ao sdl -vo sdl

The input lag is rather small but it can be troublesome if you intend to record longplays. On the other hand, it’s a must-have if you are a coder. No need for an extra screen! This gadget saved my life at Datastorm 2014. Our intro was nearly done and I borrowed Flubba’s usb EasyCap device to test it on the PC Engine hardware. It was running like a charm under Mednafen but it failed miserably on the hardware (blame crappy bank management code). Hopefully it was all fixed before the deadline. This little usb gizmo was also very helpful when I was working on the Everdrive SD card driver or on the Memory Base 128.

## 10 tons histogram.

What is an image histogram?
Well, it is fairly simple. Consider we have a grayscale image. Its histogram is an array that stores the number of occurence of each grey level. It gives us a representation of the tone distribution.

Source image and its histogram.

If an image is dark, there will be a peak near the lowest tone values. Or the other hand brighter image will have peaks near the maximum tone values.

Dark image

Bright image with associated histogram.

The image histogram can also give us informations about its contrast. An image with a low contrast will have a peak around its median value and most of the histogram values will be concentrated around that peak. Alternatively a highly contrasted image will have peaks around its minimal and maximal tone value as long as sparse low values in between.

Low contrast image

High contrast image

It seems logical that if we stretch the histogram over the whole tone range, we will increase the image constrast. This operation is called histogram equalization. Before going any further, let’s introduce some basic definitions. The first one is the probability density function:

$$P^{[l_s,l_f]}_l = \frac{h_l}{H^{[l_s,l_f]}}$$

With $$0 \le l_s < l_f \le l_{max}[/latex]. Usually $l_{max} = 255$. [latex]H^{[m,n]}$$ is introduced for brevity. It’s the sum of the histogram value over the range $$[m,n]$$.

$$H^{[m,n]} = \sum_{i=m}^{n}h_i$$

Next is the probability distribution function.

$$C^{[l_s,l_f]}_l = \frac{\sum_{i=l_s}^{l}h_i}{H^{[l_s,l_f]}}$$

Which can be rewritten as:

$$C^{[l_s,l_f]}_l = \sum_{i=l_s}^{l}P^{[l_s,l_f]}_i$$

The mean of the histogram values is also called the image brightness.

$$l_m = \frac{\sum_{i=l_s}^{l_f}i \times h_i}{H^{[l_s,l_f]}}$$

To put it simply,

$$l_m = \sum_{i=l_s}^{l_f}i P^{[l_s,l_f]}_i$$

The aim of histogram equalization is to produce a histogram where each tone has the same number of occurence (uniform distribution). This is equivalent to saying that any tone $$l$$ has the same density.

$$P’^{[l_s,l_f]}_{l} = \frac{1}{l_f – ls}$$

Which gives us the ideal probability distribution function.

$$C’^{[l_s,l_f]}_{l} = \sum_{i=l_s}^{l}P’^{[l_s,l_f]}_i = \frac{l-ls}{lf – ls}$$

A way to acheive this is to find $$l’$$ that minimizes the difference between $$C’^{[l_s,l_f]}_{l’}$$ and $$C^{[l_s,l_f]}_{l}$$. In other words, we want to find $$l’$$ so that:

$$C^{[l_s,l_f]}_{l} = C’^{[l_s,l_f]}_{l’}$$

Which gives us:

$$l’ = l_s + \big\langle(l_f – l_s) C^{[l_s,l_f]}_l \big\rangle$$

Where $$\langle x \rangle$$ is the nearest integer to $$x$$.

This technique is sometimes referred as the classical histogram equalization or CHE. The drawback of this method is that it changes the image brightness.

The following source code is performing CHE with $$ls = 0$$ and $$l_f = l_{max}$$.
This code is using the CImg library.

    // We will load a dark grayscale version of the
// classical lena image.
CImg<unsigned char> source;

unsigned long int lut[256];
unsigned long int histogram[256];

// Build image histogram.
memset(histogram, 0, sizeof(unsigned long int)*256);
cimg_forXY(source, x, y)
{
histogram[source(x,y,0,0)]++;
}

// Compute probality distribution function.
unsigned long int sum = 0;
for(int i=0; i<256; i++)
{
sum += histogram[i];
lut[i] = floor(255.0 * sum / (source.width() * source.height()) + 0.5);
}

// Perform CHE
cimg_forXY(source, x, y)
{
source(x,y,0,0) = lut[(int)source(x, y, 0, 0)];
}



Result of CHE source code.

## Roger Quartermain et les parchemins perdus de l’ère Heisei.

This is a french translation of Roger Quartermain and the lost script of the Heisei era originally published on ROMHack forum(a french forum about videogame fan translations).

Roger est heureux. Il a récemment fait l’acquisition à vil prix d’un jeu PC-Engine CDROM, “Nishimura Kyotaro Mystery.: Hokutosei No Onna”. Selon toute vraisemblance, il s’agirait d’un jeu d’enquête policière comme J.B. Harold Murder Club, Jake Hunter, Ace Attorney, Le Manoir de Mortevielle ou encore Maupiti Island.
Il fut publié à l’aube de l’année 1990 par Naxat (maintenant Kaga Create).
Alors qu’il glisse délicatement le disque dans sa bonne vieille DUO, il se rappelle que certains jeux offrent la possibilité de changer la langue. Par exemple, la version japonaise de J.B Harold Murder Club peut être joué en anglais. Malheureusement une telle option est introuvable dans Hokutosei No Onna… Qu’à cela ne tienne! Roger Quatermain attrape ses bottes, son chapeau et plonge vaillamment dans les entrailles sombres et fumeuses du jeu.
L’introduction montre un pauvre quidam se faire poignarder, deux jeunes femmes insouciantes se rendant à une gare et un personnage louche en train de les espionner.

S’agissant d’un jeu CDRom, ce dernier utilise la fonction du BIOS pour afficher le texte. Roger saisit ses vieilles notes et les parcourt frénétiquement en quête d’informations sur une quelconque routine d’affichage de texte. Et la voici! ex_fnt à l’adresse $e060. Il dégaine alors son émulateur favori et positionne un point d’arrêt à l’adresse indiquée. Le point d’arrêt est atteint après l’introduction sur ce qui semble être l’écran de sélection des sauvegardes à charger. Contrairement à ce qu’il pouvait laisser entendre Roger n’est pas intéressé par ex_fnt mais par le code appelant cette routine. Pour ce faire il place des points d’arrêts sur toutes les instructions rts d’ex_fnt. Une brève pression sur la touche R l’amène sur$f1e2. Une autre pression sur la touche S l’envoie sur l’appelant.

Il se retrouve donc à $566b. D’après les notes, le symbole shift-jis est chargé depuis$41, $42 vers$f8 et $f9. La logique lui dicte donc de placer un point d’arrêt en écriture sur les adresses$41 et $42. Le point d’arrêt est levé en$5b20. Le code est somme toute assez simple.

5b1c: ldy #$00 lda ($3c), Y
5b20: sta $41 iny lda ($3c), Y
iny
sta $42 L’étape suivante consiste à chercher où le pointeur$3c est initialisé. Roger place donc un autre point d’arrêt $3c et$3d.

5a18: lda #$90 bra$5a22
lda #$40 bra$5a22
lda #$80 5a22: sta$c0
5a24: sty $3d cpy #$00
beq $5a2f stx$3c
smb5 $c0 rts Malheureusement le code appelant est tout sauf trivial. Par contre la bonne nouvelle est que le texte n’est pas compressé. En notant attentivement les valeurs stockées en$41 et $42, Roger réussit à extraire la chaîne: 40 81 40 81 40 81 C7 82 CC 82 54 8B E4 88 59 8C 96 8E C5 82 6E 8E DF 82 DC 82 B7 82 A9 82 48 81 0D 00 FF 00 Le boutisme doit être inversé. Heureusement il a sous le coude un script Perl fort utile, swap_endian.pl. Les 4 derniers caractères ressemblent à des codes de contrôle. Sa grande et longue expérience lui permet de déduire que 00 0D doit symboliser un saut de ligne et 00 FF la fin du texte. Voici la chaîne décodée.  どの亀井刑事で始めますか？ Mais tout cela ne lui dit pas où le pointeur$3c est initialisé. Roger sait que sur CDRom le code est d’abord transféré en RAM avant d’être exécuté. Cela signifie aussi que le code peut être auto-modifiable. Et manque de chance, c’est le cas ici. Roger doit encore mettre un point d’arrêt à l’emplacement où le saut est effectué et suivre le fil d’exécution jusqu’à atteindre l’emplacement où le pointeur est initialisé. Plusieurs heures passent et il échoue finalement en $724d 724d: ldx #$fc
ldy #$74 jmp$5a18

Il redémarre le jeu en ne laissant qu’un point d’arrêt sur $41 et$42. Comme prévu il se retrouve là où il a découvert la première chaine. Il décide donc de continuer son chemin et se retrouve en $acb4. acae: lda$4f
asl A
tay
lda ($50), Y acb4: sta$41
iny
lda ($50), Y beq$acd8
sta $42 lda$2922
sta $3f lda$2923
sta $40 lda #$01
jsr $5655 Le pointeur$50 est initialisé en $ac92. ac8a: tay lda$312a, Y
asl
ac8e: tay
ac90: lda ($9f), Y sta$50
iny
lda ($9f), Y sta$51

Ainsi donc le pointeur est initialisé à partir d’une table. Ses valeurs sont

• b1b6
• bdb6
• cdb6
• d9b6
• e7b6

Par chance les indexes pour ces pointeurs sont 2,3,4,5,6. Roger cherche dans l’iso la chaîne b1 b6 bd b6 cb b6 d9 b6 e7 b6. Il la trouve à l’offset $11f49. L’endroit semble baigner dans une soupe de pointeurs. Quelques octets plus loin il reconnait du texte en shit-jis. Il commence en$11f53. La “soupe” doit donc contenir des valeurs ressemblant à $53Xf. Il remonte et trouve$53af à $11d45. Tadam, il a sa première table et son premier bloc de texte. Comme tout les pointeurs sont en ordre croissant, Roger en conclue que la liste contenant les noms des options est stockée de$11f53 à $126f4. Il remarque d’autres symboles shift-jis à la suite. Ils vont de$126f5 à $132f5. Une fois les options affichées, Il passe sur un autre écran où un vieux briscard de la crim’ lui taille le bout de gras. Une fois encore il met un point d’arrêt sur$41 et $42. Mais à cet instant il se rappelle de$5a18. Cela commence par:

5a18: lda #$90 bra$5a22
5a1c: lda #$40 bra$5a22
5a20: lda #$80 5a22: sta$c0

Il peut donc y arriver depuis $5a18,$5a1c ou $5a20. Cela lui donne 3 points d’arrêt.$5a20 est le grand gagnant et comme il s’en doutait il y arrive par un bout de code auto-généré ($59ab). Une longue et pénible route se prépare à l’horizon. Et pénible elle est. Néanmoins, Il découvre que$5a20 est atteint depuis $6c5e. 6c5e: ldy$a5
lda ($a8), Y sta$00
iny
lda ($a8), Y sta$01
iny
sty $a5 and$00
cmp #$ff beq$6c79
ldx $00 ldy$01
jmp $5a20 La première phrase du vieux briscard se trouve en$89c9 ($1339c9 dans l’iso). Cette adresse provient d’une table pointée par$a8. Lui même initialisé en $707e. 707e: lda #$00
sta $00 lda #$80
sta $01 lda ($00), Y
sta $a8 iny lda ($00), Y
sta $a9 stz$a5
stz $a6 lda #$03
jmp $619c Le pointeur est stocké en$13308c. Roger est perplexe. Il ne s’agit pas d’une simple table de pointeur. Elle contient d’autres données. Que signifient-elles? Mais il est exténué et il y a Extra Sangsues sur le câble.
Il s’étire, ferme l’émulateur et son portable et se prépare pour 90 minutes de pur puissance.

## Harry Dumper and the order of the Griffin

The Magic Griffin is a game copier. From the menu and the case, it seems to have been made by a taiwanese company called Front Fareast and JSI.
I don’t know if the Front Fareast listed on wikipedia is the same [link]. And I’ve found nothing about JSI.
About the hardware. You plug it in the Hucard slot and the joypad port. Oh I forgot, you can only use it on a PC Engine/CoreGrafx. On the front you have a standard Hucard slot and 2 joypad ports. Only the right one is working. I don’t know what’s the left one is used for. There are 2 parallel ports on the right. One for an external floppy disk drive and another one labeled COM port. I only tested the floppy disk.

What’s inside?

An Altera chip, a Motorola chip, some RAM, some TTL chips and a UV EEPROM (on the onther side). I haven’t identified the chips yet.
So what happens when you power it on? Well you see this:

It may work like the other flash cards (TotoTek PCE Pro, Krizz Everdrive). This means that it acts like a standard Hucard. But some special addresses are used to communicate with the special hardware. For example the Everdrive can access the SD card via some addressess that are mapped to the SPI registers. I pretty sure the code is hosted on the UV EEPROM.

• RUN FILE
• RUN IC CARD
• SAVE IC CARD
• RENAME FILE
• DELETE FILE
• FORMAT DISK

RUN FILE: Run a file from the floppy disk.

RUN IC CARD: Run the Hucard.
Unfortunately I was unable to run any card. It either tells me that there’s no Hucard or it will freeze. This surely comes from oxidized card connector.

SAVE IC CARD: Save Hucard to floppy disk.
It’s the menu that would have sent you to jail in the 90’s 😀 You name the file, press I and it will start dumping the Hucard on the floppy disk. Unfortunately the controls are awful. Pressing RUN will spit a whole lot of chars. Same for SEL, it’s more likely that it’ll delete the whole filename. You can’t write a single char. So you’ll end up with filenames like ‘ssssooooolll’ or ‘bbbbooommbbb’ 🙂
As I said before I was unable to run any Hucard. Despite the fact that it seems to have dump the game, I was unable to run any of the game I’ve dumped. I’ll check the dumped files as soon as I get my hands on an USB Floppy drive. But I’m pretty sure I’ll have a bunch of 0xff.

RENAME FILE: Rename one of the file on the floppy disk.

DELETE FILE: Delete a file from the floppy disk.

FORMAT DISK: Format floppy disk.
You can either format a HD (1.44M) or DD (720K) floppy drive. Be carefull, it’ll start formatting the disk once you enter the menu.

That’s all.
I’ll try to clean/fix the connector. And once I receive the USB floppy drive I’ll check the dumped files and verify which filesystem the disk was formatted.
The coolest stuff will be to dump the UV EEPROM. So if someone knows how to safely read it, I’ll be more than pleased to hear it.

Note: This was originally posted on the PC-Engine dev forum [here].

## Roger Quartermain and the lost script of the Heisei era.

Roger is happy. He bought a cheap PC-Engine CDRom game called “Nishimura Kyotaro Mystery.: Hokutosei No Onna”. It seems to be a detective game like J.B. Harold Murder Club, Jake Hunter, Ace Attorney, Le Manoir de Mortevielle or Maupiti Island. It was published by Naxat (now Kaga Create) in 1990.
He remembers that some games have a language option. For example, the japanese version of J.B Harold Murder Club can be played in english. Unfortunately such option was nowhere to be found in Hokutosei No Onna… Nevermind! Roger Quartermain grabs his boots, his hat and jumps into the game dark bowels!
The introduction seems to be about some dude stabbed in his room, 2 girls going to a train station and another guy who is apparently stalking them.

As it’s a CDRom game, it must use the BIOS function for displaying text. Roger took his old notes and search for any text related routine. Here it is! ex_fnt located at $e060. He draw his favorite emulator and set a breakpoint to it. It fires just after the introduction when what seems to be the savegame selection screen appears. Roger is not really interested in the ex_fnt routine but more in the calling environment. So he sent breakpoints to the potential rts. A gentle pression to the R key sends him to$f1e2. Another push to the S key brings him the the caller.

Here it is at $566b. According to the notes, the shift-jis is loaded from$41, $42 to$f8 and $f9. The logical next step is to put a write breakpoint to$41 and $42. The write breakpoint is triggered at$5b20. The code is pretty simple.

5b1c: ldy #$00 lda ($3c), Y
5b20: sta $41 iny lda ($3c), Y
iny
sta $42 The next step is to search where the$3c pointer. Once again Roger has to set a new write breakpoint at $3c and$3d.

5a18: lda #$90 bra$5a22
lda #$40 bra$5a22
lda #$80 5a22: sta$c0
5a24: sty $3d cpy #$00
beq $5a2f stx$3c
smb5 $c0 rts Unfortunately the caller is not that obvious to discover. But the good news is that the text is not compressed. By keeping track of the values stored at$41 and $42, Roger manages to extract the string 40 81 40 81 40 81 C7 82 CC 82 54 8B E4 88 59 8C 96 8E C5 82 6E 8E DF 82 DC 82 B7 82 A9 82 48 81 0D 00 FF 00 The endiannes must be swapped. He hopefully has some handsome perl script at hand swap_endian.pl. The last 4 characters looks like some control code. His long time experience enables him to deduce that 00 0D is some kind of newline code and 00 FF may be for end of text. Here’s the string.  どの亀井刑事で始めますか？ But this doesn’t tell Roger where the$3c pointer is set. Roger knows that on CDRom systems, the data is first transfered to RAM before being executed. This means that the code can be self-modifiable. Unluckily this is the case here. Roger has to set a breakpoint where the jump is done and run it until he finally reaches the pointer initialization code. Time passes and he ends up at $724d 724d: ldx #$fc
ldy #$74 jmp$5a18

He restarts the game with only the write breakpoint at $41 and$42. As expected he ends up at the same code as before for the first sentence. The next run lands at a different location acb4.

acae: lda $4f asl A tay lda ($50), Y
acb4: sta $41 iny lda ($50), Y
beq $acd8 sta$42
lda $2922 sta$3f
lda $2923 sta$40
lda #$01 jsr$5655

The pointer $50 is initialized at ac92. ac8a: tay lda$312a, Y
asl
ac8e: tay
ac90: lda ($9f), Y sta$50
iny
lda ($9f), Y sta$51

So the pointer is set up from a table. He gets the values

• b1b6
• bdb6
• cdb6
• d9b6
• e7b6

Hopefully the indices for this pointers are 2,3,4,5,6. Roger searchs in the iso for b1 b6 bd b6 cb b6 d9 b6 e7 b6. He ends up at the offset $11f49. The place is filled with what looks like pointers. Some bytes after the string he recognises byte swapped shit-jis values. It starts at$11f53. So The pointer soup must contain some value equals to $53Xf. He scrolls up and finds$53af at $11d45. Tadam, he has his first table and string bloc. As all the pointers are in increasing order, Roger deduces that the option strings are stored from$11f53 to $126f4. But just after this, he notices some other shift-jis symbols. It goes from$126f5 to $132f5. Once the options are display, he jumps to another screen where some grumpy old cop is talking to you. Once again, he set his breakpoints to$41 and $42. But then he remembers$5a18. It starts with

5a18: lda #$90 bra$5a22
5a1c: lda #$40 bra$5a22
5a20: lda #$80 5a22: sta$c0

So he can get here from $5a18,$5a1c or $5a20. This gives him 3 more breakpoints.$5a20 is the one and as expected it comes from RAM initialized code ($59ab). He prepares for a painful ride. And painful it is. Nevertheless, he finds out that that$5a20 is reached from $6c5e. 6c5e: ldy$a5
lda ($a8), Y sta$00
iny
lda ($a8), Y sta$01
iny
sty $a5 and$00
cmp #$ff beq$6c79
ldx $00 ldy$01
jmp $5a20 The first sentence of the grumpy cop comes from$89c9 (iso offset $1339c9). This address comes from a table pointed by$a8. It’s setup at $707e. 707e: lda #$00
sta $00 lda #$80
sta $01 lda ($00), Y
sta $a8 iny lda ($00), Y
sta $a9 stz$a5
stz $a6 lda #$03
jmp $619c The pointer is store at$13308c. Roger is perplex. It’s not just a simple pointer table. There is some extra data. He needs to figure out how it works but he is tired and there’s Night of the Creeps on TV.
He streches, closes the emulator and his laptop and prepares himself for 90 minutes of pure 80’s badassery.

## Hole in the sky.

I first came across the à trous algorithm when I was looking for a good edge avoiding filter for SSAO. Back then I was trying to code a demo. I had depth of field, some kind of hdr, spherical harmonics ambient lighting and screen space ambient transfer. I tried several techniques like the bilateral filter described in this paper (4.2 Bilateral Upsampling). I don’t remember why but I have issues with this filter so I went on a quest for a better filter. Long story short, thanks to Ke-Sen Huang’s page I read a paper called Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering (paper, slides).
I won’t make a lecture about wavelet transforms but this is how it roughly works.
We start with the source image I. At each iteration iwe will compute 2 images, $$d_i$$ and $$c_i$$. The first one will hold the filtered output and the second the details. This can be translated as:

• $$c_0 = I$$
• $$c_{i+1}(p) = \sum_{k \in H} h_i(k) c_i(p+2^ik)$$
• $$d_i = c_i – c_{i+1}$$

With $$h_i$$ being the filter. And $$H$$ the size of our filter. When the desired iteration $$n$$ is reached, the set of detail images and the last filtered image form our wavelet transform.

• $$w(I) = \{d_0, d1, \cdots, d_{n-1}, c_n\}$$

As you can see, the space between each sample doubles at each iteration. Hence the name “à-trous” (with holes). Moreover the dimensions of the filtered images do not change. As opposed to Haar wavelet transform where $$c_{i+1}$$ is half the size of $$c_i$$.
The next step is called the synthesis. We simply add the elements of the $$w(I)$$ to produce the final image.

• $$I’ = c_n + \sum_{0 \leq i < n}d_i$$

The wavelet filter can be seen as a pyramidal algorithm. The wavelet transform is the “pull” phase. And the synthesis the “push” phase (see this paper for an overview of “push pull” algorithm). As the scaling function is a $$B_3$$ spline, the filter $$h$$ is :

• $$\left(\frac{1}{16},\frac{1}{4},\frac{3}{8},\frac{1}{4},\frac{1}{16}\right)$$ in $$\mathbb{R}$$
• $$\begin{pmatrix} \frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256} \\ \frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\ \frac{3}{128} & \frac{3}{32} & \frac{9}{64} & \frac{3}{32} & \frac{3}{128} \\ \frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\ \frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256} \end{pmatrix}$$ in $$\mathbb{R}^2$$

If we apply this to image filtering, we will have:

1. $$c_0 = I$$
2. $$\forall i \in [0,n[$$ and $$\forall p \in [0,S_x] \times [0,S_y[$$
• $$c_{i+1}(p) = \sum_{0 \leq y < 5} \sum_{0 \leq x < 5} h_i(x,y) c_i\left(p+ 2^i(x,y)\right)$$
• $$d_i(p) = c_i – c_{i+1}(p)$$
3. $$I’=c_n + \sum_{0 \leq i < n}d_i$$

Edge awareness is acheived by adding a Gaussian distribution based weighing function of the colorimetric difference between the sample and source pixel (just like any standard bilateral filter).

• $$\omega_{\sigma}(u,v) = e^{\frac{\left\|c_i(u) – c_i(v)\right\|^2}{\sigma}}$$

2will be:

• $$c_{i+1}(p) = \frac{1}{W} \sum \sum \omega_{\sigma}\left(c_i(p),c_i\left(p+2^i(x,y)\right)\right)h_i(x,y) c_i(p+2^i(x,y))$$
• with $$W = \sum\sum \omega_{\sigma}\left(c_i(p),c_i\left(p+2^i(x,y)\right)\right)h_i(x,y)$$

Denoising can be acheived by thresholding the detail images.

• $$d’_i = max\left(0, |d_i|-\tau\right) \cdot sign(d_i)$$

More details can be found in “Edge-Optimized À-TrousWavelets for Local Contrast
Enhancement with Robust Denoising” by Johannes Hanika, Holger Dammertz and Hendrik Lensch (paper, slides).
Implementing it in OpenGL/glsl is pretty straightforward. The detail textures will be stored in a texture array. And as we only need the last filtered image for the synthesis, we will ping-pong between 2 textures.
Here is how the texture array is created:

glBindTexture(GL_TEXTURE_2D_ARRAY, texId);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGB32F, imageWidth, imageHeight, LEVEL_MAX, 0, GL_RGB, GL_UNSIGNED_BYTE, 0);
for(i=0; i<LEVEL_MAX; i++)
{
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, i, imageWidth, imageHeight, 1, GL_RGB, GL_UNSIGNED_BYTE, imageData);
}
// Set texture wrap and filtering here.
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);


You attach a layer just like a standard 3D texture.

glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, texId, 0, layerIndex);

You access it in a shader using a sampler2DArray. Here is an example based on the synthesis shader:

#extension GL_EXT_texture_array : enable
uniform sampler2D filtered;
uniform sampler2DArray details;

out vec4 outData;

void main()
{
int levelMax = int(textureSize(uDetails, 0).z);
vec4 data = texelFetch(filtered, ivec2(gl_FragCoord.xy), 0);;

for(int i=0; i<levelMax; i++)
{
data += texelFetch(details, ivec3(gl_FragCoord.xy, i), 0);
}

outData = data;
}


I uploaded the source code of a little demo here. It will filter the infamous mandrill image using a edge preserving à-trous wavelet transform. I intentionally used “extreme” parameters to show the effects of the edge avoiding filter. It comes with a Microsoft Visual Studion 2010 project. Note that you will need GLFW and GLEW. You will also have to edit the ContribDir item in platform\windows\wavelet.vcxproj.props.
On the left you have the original image. And on the right the filtered image with $$\sigma=1.125$$ and $$\tau=0.05125$$.

## You see this? This… is my boomstick!

Yeah a new post!

Some years ago (more like a decade than a year), I bought a Turbo Stick. It’s the standard 2 buttons stick for NEC PC Engine console. Unfortunately the stick makes an awfull squealing nose. Forget 1 life run with this plastic beast. After 5 minutes all I wanted to do was to use it as a soccer ball. According to the Wataru’s PC Engine hardware list, 4 six buttons sticks were released. They are not small and given shipping costs I decided to build one. So I bought a stick and some buttons from Sparkfun. Later a friend gave me a Sanwa stick and buttons.

There are several places on the internet where to find PC Engine pad schematics. The main source are Emanuele Bettidi‘s pad schematics, PCE Wiki, NFG PC engine/TG 16 article. I first built a 2 buttons circuit with no auto fire. But there are a few games that requires a 6 buttons stick (Street Fighter 2 for example). The Avnue Pad 6 comes with auto fire and slow mode. The later is a particular ugly hack. The slow mode is acheived by toggling the pause rapidly. It’s as if you drank too much coffee and go berserk on the RUN button. Here are the guts shots and schematic of an Avenue Pad 6. I spent hours trying to make PCB board in Eagle for the “complete” version (with auto fire and slow mode). I didn’t find a local source for the DTC114Y. A replacement was found on the interweb. It was told that this component can be replaced by a 2N3904 and 2 resistors. This is rather straightforward if you look at the DTC114Y datasheet. Anyway, in the end I didn’t liked the way the board looked. The most logical step was to remove the rapid fire and slow mode (schematic). And by the way, I think I’ll never use slow mode and barely ever used auto-fire.

Here are the current version of the schematic and pcb board:

Here’s a 3d view of the nearly version of the board (made with WebGerber)

I need to add mounting holes in the board. Make some board. And last but not least, build the enclosure!

## Alice’s bubbles (Part 3)

Here comes level names. This was the most painful part of the whole translation hacking. First, they are made of sprites. Second, there is a limited number of them. And last but not least, the table that tells which sprites to use for a given level name can’t be expanded.
The LSB of the pattern indices for the first level are stored at $5196 (file offset) and$519C for the second. After putting a read breakpoint on $*4f96 (remember that there’s usually a 512 bytes header to PC Engine roms) and$*4f9C, I ended up at $8E3F. A quick look at the code show me there is a address table stored at$8F81 where each address starts 1 byte before the LSB pattern index list. The first byte indicates the number of sprites. Here’s the complete index list:

 8F95 : 05 00 02 04 06 08 8F9B : 05 0E 02 12 06 08 8FA1 : 05 18 0A 16 06 08 8FA7 : 05 14 00 16 06 08 8FAD : 05 1C 02 20 06 08 8FB3 : 04 18 1A 06 08 8FB8 : 04 22 24 06 08 8FBD : 05 02 10 18 06 08 8FC3 : 04 26 28 06 08

At this point, I managed to avoid using compressed data. Unfortunatelly the whole sprites were compressed. In fact, there are 4 “decompression” schemes. The first one is a direct transfer to VRAM using the tia instruction. The second is using a simple RLE. The third one and the most tricky mixes RLE and some XOR. The last one simply copies the data to VRAM using a handcrafted loop.

f796:   LDA <$7d ORA <$7e
BEQ f7e1
JSR f888
CMP #$00 BNE f7ac ; #1 Transfer to VRAM using tia TIA$3700, $0002,$0020
BRA f7d2
f7ac:   CMP #$02 BNE f7bc ; #2 RLE JSR f817 TIA$3700, $0002,$0020
BRA f7d2
f7bc:   CMP #$03 BNE f7cf ; #3 RLE + XOR JSR f817 JSR f857 TIA$3700, $0002,$0020
BRA f7d2
f7cf: ; #4 Loop transfer
JSR f803

f7d2:   LDA <$7d SBC #$01
STA <$7d LDA <$7e
SBC #$00 STA <$7e
BRA f796

f7e1:

The RLE routine:

f817:   LDA [$78] STA <$83
LDY #$04 LDA #$09
STA <$81 STZ <$80
CLX
LDA #$20 STA <$82

f828:   DEC <$81 BNE f83a PHY LDA #$08
STA <$81 INC <$80
LDY <$80 LDA [$78], Y
STA <$83 PLY f83a: ROR <$83 ; the rle counter
BCS f849
STZ $3740, X INX DEC <$82
BNE f828
JMP f7f6

f849:   LDA [$78], Y STA$3740, X
INY
INX
DEC <$82 BNE f828 JMP f7f6 The “XOR” routine: f857: PHX PHY LDY #$07
CLX

f85c:   LDA $3740, X EOR$3742, X
STA $3742, X LDA$3741, X
EOR $3743, X STA$3743, X
LDA $3750, X EOR$3752, X
STA $3752, X LDA$3751, X
EOR $3753, X STA$3753, X

INX
INX

DEY
BNE f85c

PLY
PLX
RTS

The XOR is used here to maximize the number of repetitions in order to make the RLE more efficient. Well, that’s how I understood it. I didn’t make any benchmark to verify if it was really the case. The encoder was written pretty quickly. I reworked the font so that it more or less matches the ASCII set. This simplified the script encoder and the display routine.

Back to level names! I got help from a bunch of people to translate them. The last character of each level can be translated to country. Unfortunately that was far too big so I switched to “land”.

• Candy Land
• Toy Land
• Greenery Land
• Ice Land
• Time Land
• Water Land
• Sky Land
• Mirror Land
• Queen Land

Here’s the actual “font” made of 16×16 sprites.
The zigzag lines on the last 16×16 sprite are there to make the compressed data fits into 504 bytes. I must also keep an empty sprite for spacing.

I had the sprite index list done pretty quickly. But when I started working on the index pointer table, I realized that there were 9 entries. 9? Did I forgot one?
Yes, I forgot one. The water land… And I didn’t have enough room to put the missing letters (bloody size limit).

After a cautious review, it was decided to rename “Greenery land” into “Green land”. As There were already a “Queen land”, there’s now enough room for the missing “Water”.

The only issue remaining was that the sprite index list is 42 bytes long, and mine was 43. As each level name ends with the same sequence I can use the last index as the beginning of the next one (each sequence starts with the length of the index list). So I swapped the sprite for “nd” with the beginning of the one whose index was equal to a sequence length, namely “To” which was at index 6. Why 6? First because it’s the length of the longest index list and also because sprite indexes are always even.

Here’s the original index list (the fist 2 hexadecimal values are the values of the pointer table):
 8F95 : 05 00 02 04 06 08 8F9B : 05 0E 02 12 06 08 8FA1 : 05 18 0A 16 06 08 8FA7 : 05 14 00 16 06 08 8FAD : 05 1C 02 20 06 08 8FB3 : 04 18 1A 06 08 8FB8 : 04 22 24 06 08 8FBD : 05 02 10 18 06 08 8FC3 : 04 26 28 06 08

The new one:
 8F95 : 05 00 06 04 24 06 8F9B : 04 02 04 24 06 8FA0 : 05 08 0A 0C 24 06 8FA6 : 04 0E 10 24 06 8FAB : 05 12 14 28 24 06 8FB1 : 05 20 22 26 24 06 8FB7 : 04 16 04 24 06 8FBB : 06 18 1A 1C 28 24 06 8FC2 : 05 1E 0A 0C 24 06

Shortly after beating the evil level names I received the first version of the translated script. Unfortunately the first tests revealed that the current text layout was unreadeable. There were only 2 or 3 words per line. So I had to crawl brack into the rom and change the text box position and size. I was only some couples of values to change.

So the 24th of July 2011, the english patch was released. Nearly 2 years after I started working on it… Anyway, some days after the release a couple of angry frenchmen complained that we didn’t release a french patch. In order to silence those annoying creeters I had to modify the font and the script encoder in order to add accents. That was not a big deal. But I let the french translators handle the level names (with some kind of sadistic grin). After some weeks of hard work, they decided to give up and use the english names. The french translation was released in December 2011.

• English translation
• French translation