IPS delivery (part 2)

Edit: The version listed in this post is buggy. Please consider using version 0.02 [link] or the current development version [link] instead.

Here it is! The gtk version of the ips patcher I wrote earlier.
It’s based on GTK. And I use Glade to create the interface.

I thought I’d have to implement zillions of callbacks but in the all I had to to is attach the IPS structure to the main window, implement a callback and add a thread.

This was also the occasion to clean up and simplify the IPS code.

You can download it [here].
Feel free to contact me or leave a comment if you have any request or problem.

Here are the mandatory screenshots!

  • Main window :

    IPS patcher main window

  • Error window (it’s a little too big in my opinion 🙂 ) :

    IPS patcher error window

  • Success window :

    IPS patcher success window

IPS delivery

Edit: The version listed in this post is buggy. Please consider using version 0.02 [link] or the current development version [link] instead.

Today I wanted to try the translation of Metal Max Returns. This game looks really cool. As I’m super lazy, I ran the windows version of snes9x through wine. I haven’t tested the 64bit version yet. I realized that there’s no IPS patcher under linux. So once again I had to run a win32 IPS patcher (arkana ips) using wine.

I was wondering where there wasn’t any IPS patcher under linux… Maybe the IPS format is a real mess. In fact it’s the total opposite. It’s really simple.

I coded my own patcher in less than 2 hours (including testing). Here’s the quick non-documented code :

Here’s how you use it :
Usage : ips-patch ipsfile inputrom [outputrom]
ipsfile is the filename of the IPS patch to apply.
inputrom is the filename of the rom to patch.
outputrom is the filename of the patched rom. It’s an optional argument. If you don’t specify it, the input rom will be saved in inputrom.sav and then overwritten.

I’ll try to add some GUI later if I have enough motivation 🙂

Double Bit Dragon

In my quest for demo effects I’m studying zoom. 2 days ago while looking at the code for right shifting signed bytes, I realized that this technique can be used to “double” a byte.

Here’s a little schema showing byte “doubling”:

b0 b1 b2 b3 b4 b5 b6 b7

b0 b0 b1 b1 b2 b2 b3 b3    b4 b4 b5 b5 b6 b6 b7 b7

How can signed byte bit right shifting can help us?
Signed numbers are represented in two’s complement. I won’t explain it here. But when you are shifting a negative value to the right the most significant bit is replaced by 0. However this bit is always 1 for negative values. A way to solve this issue is to use the right bit rotation instruction (ROR) instead of the right bit shift instruction (LSR). ROR shifts the bits on position to the right. The carry flag is shifted to bit 7 and the bit 0 is shifted to the carry flag. We must copy the value of the 7th bit to the carry flag. We’ll use the CMP for this. The carry flag is set if the value in the accumulator is equal or greater to the compared value. So comparing the accumulator to #$80 (128) will do the trick.

We’ll use this technique to “double” the bits. First we shit the source byte to the right. The least significant bit will be in the carry flag. Then we rotate the accumulator to the right. The carry flag is then shifted to the 7th bit of the accumulator. We compare the accumulator to #$08. The carry flag is now equal to the 7th bit of A. Rotate the accumulator to the right one more time. Et voila! 🙂

This macro reads a bit from the source byte and adds it twice to the destination.

doubleBit .macro
    lsr <__src
    ror A
    cmp #$80
    ror A
          .endm

And finally here’s the code to “double” a 8 pixels long line. You’ll have to repeat it for each lines. If you want to double a tile on pc-engine you’ll have to call this of code for each “line” (32 times).

    cla

    doubleBit    ; bit 0
    doubleBit    ; bit 1
    doubleBit    ; bit 2
    doubleBit    ; bit 3
    sta <__dest

    doubleBit    ; bit 4
    doubleBit    ; bit 5
    doubleBit    ; bit 6
    doubleBit    ; bit 7
    sta <__dest+1

We now have the basis to make a complete zoom-in routine! 🙂

8bitranger! Tornado blast!

Some months (nearly a year) ago, i made a post about etripator. It’s a pc-engine rom disassembler… At that time the code was pretty ugly. Except the csv file reading and other utility functions, all the work was made in a single C file and (worst of all) in the main function. Well some people started to test it and asked for new features.
You can easily imagine that as soon as i added new features, the all code became a real mess. If i didn’t want it to become unmaintainable and buggy as hell, i had to clean it up and start thinking seriously about its design. That’s what i did during my summer vacation. After a week of mad coding and testing, what first started as a toy looks more like a real usable program now.

Ladies and gentlemen! I’m pleased to announce the first official release of etripator!

The previous test releases can be found here.

I would like to thank tomaitheous, Charles MacDonald, B.T Garner, David Shadoff, all the punks on #utopiasoft and necstasy.

Expect more releases. Mainly because it was not heavily tested and it needs some polishing (especially the documentation. The ReadMe file I wrote sucks.).

Zero right ahead!

Yes! Zero wouldn’t be considered as the default/standard return error value. I learnt the hard way today.

I had some kind of weird NULL pointer bug. After some investigations, i found that the deserialization function was failing at some point leaving the whole structure uninitialized. First i blamed myself for not calling the standard structure initialization function when the deserialization function failed. Then i looked at the data file.
Everything seemed to be ok. So i went on a step by step execution with the debugger. 5 minutes later i found the culprit. It was zlib in the kitchen with ten tone hammer. It was strange.

Let’s say that i compressed a buffer of N bytes into M bytes. When i read back the N bytes from the compressed buffer, i get the whole data before the end of the compressed buffer. Leaving me with 5 extra bytes.
On the next run i try to read P bytes. But i still have those 5 bytes to decompress. So i feed them to inflate (the zlib decompression routine). It returns a nice Z_STREAM_END. At this point, nothing wrong as i finished to process the previous compressed buffer. But (and that’s a big but     …     sorry    ) no bytes were output.
In fact it’s an excellent news as we already get the N bytes back. So this bytes are … what… padding bytes ?
Anyway, the fact that i got 0 bytes back without any zlib or system error was a relief. But the “dezip” function is returning 0 on error or the size of the decompressed data when everything works…
So… I was considering that the “dezip” function failed even if everything was alright.

The whole issue was fixed in 5 minutes by changing the default error value from 0 to -1….

I spent the whole day tracking this bug…

The Attack of the Patatoids!

Last day i was reading the wikipedia article about the Pioneer program. I don’t know how, but after following some links i ended on the Hayabusa mission article.
The Hayabusa space probe was launch in 2003 to study the Itokawa asteroid. The most ambitious part of the mission was about landing on the astoreid to collect samples. From what i understood, the samples would not have been collected by drilling the asteroid but from the debris made by the collision between the asteroid and a projectile launched from the space probe. Unfortunately, the attempts to land on the asteroid failed. Nevertheless, it seems that the Hayabusa managed to collect some samples.
The Hayabusa may be bak to earth around 2010.

Using the data collected, Robert Gaskell from the Planetary Science Institute generated 3D models of the Itokawa asteroid. The model is available in 3 formats (plate-vertex table, STL ASCII and triangle) and in 4 resolutions (from 49,152 to 3,145,728 triangles).
The plate-vertex table format is similar to ply2 format. So, i took the 3M triangles version. Translated it into ply2 and launched my crappy raytracer on it.
I was surprised by the rendering speed. A 256×256 image was rendered in less than 1 minute. Nevertheless, i had a nasty bug 🙁

Buggy rendering of Itokawa asteroid

As you may have noticed there’s a bug on the left part. Here’s a close up.

buggy correct
Buggy rendering of Itokawa asteroid (close up) Correct rendering of Itokawa asteroid (correct)

The “correct” image was obtained after 3 hours of computation by disabling BVH. So the problem came from the BVH code. After almost rewriting the BVH building and intersection code, i still had the same bug. Some heavy code review later, i realized that the wrong ray intersection was returned. In fact it was the informations of the last intersection instead of the one from the closest intersection point. It’s a very stupid bug. Well at least i cleaned up the BVH code.
So for your pleasure here is a nice 512×512 image of the Itokawa asteroid.

512×512 image of Itokawa asteroid

And some stats:

  • BVH building time : 0h 0min 38.943722 sec
  • Rendering time : 0h 0min 7.411200 sec

As i said before, the code is greatly unoptimized but it manages to render a 3M triangles model in less than one minute. I’ll try to speed up the BVH building code, use a clever cost function and monitor memory usage (while running the top command, i noticed that more than 70% of the RAM (1go) was used.)

Bonus : dark side of the patatoid!

Dark side of the Itokawa asteroid

Having fun with OpenGL

This weekend i played a little bit with OpenGL. First, i helped my brother with some issues using the GL_ARB_texture_rectangle extensions. I never used it before. It’s very easy to use. You just have to use GL_TEXTURE_RECTANGLE_ARB instead of GL_ GL_TEXTURE_2D when you call openGL texture functions (or when you want to enable / disable texturing). The tricky part is that you have to specify that the pixel elements are stored one after the one. Also, textures coordinates values (u,v) are between [0,twidth]x[0,theight] instead of [0,1]x[0,1].

// Enable non rectangular textures
glEnable( GL_TEXTURE_RECTANGLE_ARB );
// Generate texture id
glGenTextures( 1, &id);
// Pixel component are stored linearly
glPixelStorei( GL_UNPACK_ALIGNMENT, 1 );
// And now store the texture data
glBindTexture( GL_TEXTURE_RECTANGLE_ARB, id);
glTexParameteri( GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexImage2D( GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, textureWidth, textureHeight, 0, GL_RGB, GL_UNSIGNED_BYTE, data );

I must admit that i only used it to display a rectangular texture on screen. I found several examples on the web where it was used to avoid to scale the framebuffer when you save it to a texture.

As one of the GGI gsoc student is working on libggigl, i decided to code a little GLX example so that i don’t look stupid if he had questions about it 🙂
Here it is!
There’re no functions, no structures. Everything is the main. That’s what I call pigforce coding.
If I have time, I’ll try to play with offscreen rendering with GLX.

PCEngine tracker status : I’m writing code for hsync and vsync interrupts. It’s mostly a cleanup/study of mkit code. I hope to finish them this week and then start with tile/map management.

Captain’s clog

Aouch… 3 months since last post.
A lot of things happened. First of all i changed job (who said lame excuse?). Then GGI was selected for google summer of code. Guess who is the GGI admin on gsoc.. me :). 2 projects from the idea list were “slotted”. This means that google will fund them. The project i was about to mentor wasn’t selected. Anyway, i hope everything will be alright because some of the selected projects are part of the 3.0 roadmap.
Still about GGI, my current task is to implement Xdbe helper for X target. The task is not easy as it seems. My first try was kinda simplistic and … buggy. Instead of increasing X target speed, it slowed it down. And worst of all, i totally misunderstood the goal of Xdbe helper. It’s not just about swapping the window drawable. In fact, it’s the last and final part of the job. For the moment in X target, all the drawing operation are performed on the window drawable. Multiple buffering is performed by using window clipping. Let’s say we want a visual of 320×240 pixels and 3 frames (in order to do triple buffering, or whatever). We create a parent window with a size of 320 by 240. Then we create a child window of 320 x 240*3. We display a given frame by moving the child window so that the required area becomes visible.
For example, if we want to view the second frame, we move the child window by (0,240).
Back to Xdbe. We don’t need to use window auto-clipping anymore. We can use a single window with a backbuffer attach to it. If we want multiple frames, we can create a pixmap the same way we created the child window or create a pixmap per frame. All the drawing operations will be perfomed on the pixmap. Then on flush, we copy the pixmap data to the backbuffer and swap the window. It’s not as easy as it sounds because i’ll have to reorganize (or completly rewrite) the X target.

On the crbn/ray tracing front. There’s not much coding going on but i’m reading a lot… Ok. Let me rephrase it… I have a lot of papers and books to read 🙂

The pcengine tracker development was resumed this week. I hope to release a test rom for the instrument editor before the end of the month.

And now something different. Yesterday (and today) i rewrote the gallery with javascript and css. The old one was in perl/cgi and had a small oneliner. I still have the code lying around. I’d better burn it for the sake of mankind 🙂 However the only dynamic part in the new one is the image display in javascript, and as i now have this wonderfull blog, i threw the onliner away.
I put a link (“Doodles, drawings, sketch dump…”) to it in the section “My other stuffs”.

Typo negative

I cleaned up the 5×7 font rendering routine. It now uses mkit macros and variables. Here’s a little example of how to use it:

; set string length
lda    #28
sta    <_al

; set string pointer (don't forget to map it before)
lda    #low(string)
sta    <_si
lda    #high(string)
sta    <_si+1

; set vram address where the font will be rendered
lda    #$00
sta    <_di
lda    #$10
sta    <_di+1

; Here we go!
jsr    print5x7

; You can set BAT using the zp var _block containing
; the number of 8x8 bloc written

The code contained in the archive doesn’t need mkit as all the needed macros and variable declarations are included. If you want to use the font rendering with mkit, you’ll only have to include print5x7.asm and font.inc. font.inc contains both the standard 8×8 font, the 5×7 font and datas needed by the font routine.

You can download the archive [HERE].

Shrinked the glyph

I made some mockups for the tracker interface using tile studio. It helped me realize that the standard 8×8 font might be too big. Here’re test images for the instrument and waveform editors.


Instrument editor interface

Waveform editor interface

As you can see, the 8×8 font used in the instrument editor menu is too big. The screen is 320×224 wide and i need to have an area of 256 pixels for the sequence editor. So one way to space is to use a smaller font. I decided to use 5×7 font. But there’s a problem. It’s easy to use 8×8 font as pcengine stores background graphics using 8×8 tiles. And these tiles are displayed using an array (a map). So if you want to print text, all you have to do is to store the font in VRAM and then set the map so that it points to the correct 8×8 tile. The code looks looks more or less like this (in fact it’s a little more complicated or pcengine) :

for(i=0; i<strlen(str); ++i)
{
    tile[x+i][y] = mapBaseAddress + 8 * ((x+i) + (y * W));
}

On the other hand, you can’t use the background tile map for 5×7 as it doesn’t fit the 8×8 boundary 🙁 So all we have to do is to render each text in vram. We’ll have to brutally concatenate bits.
In order to keep things a little bit optimized, i decided to unroll the concatenation loop for 8 iterations. Why? Erm… A little example will explain it better than i will.

byte 0: a0 b0 c0 d0 e0 a1 b1 c1
byte 1 d1 e1 a2 b2 c2 d2 e2 a3
byte 2: b3 c3 d3 e3 a4 b4 c4 d4
byte 3: e4 a5 b5 c5 d5 e5 a6 b6
byte 4: c6 d6 e6 a7 b7 c7 d7 e7

This table represents the output of the concatenations of 8 five bits values. We may repeat this for every 8 characters in the string and we’ll be done … 🙂 The following 2 images show the same string rendered with 8×8 and 5×7 fonts (clic them for a x2 zoom).


8x8 font example

5x7 font example

Here’s a marvelous rom demonstrating this breakthrough in computer science 🙂