ZX Spectrum Assembly, Pong – 0x0A Optimisation
In this ZX Spectrum Assembly chapter, we will implement several optimisations.
Translation by Felipe Monge Corbalán
Table of contents
- ReprintPoints
- ScanKeys
- Cls
- MoveBall
- ReprintLine
- GetPointSprite
- PrintPoints and ReprintPoints
- Ball strike bug at the bottom of the paddle
- ZX Spectrum Assembly, Pong
- Useful links
ReprintPoints
Yes, the ball is a bit slow. This is largely because the marker (the scoreboard) is repainted on every iteration of the main loop, which is not necessary.
The marker only needs to be repainted when the ball erases it. With this change the ball gains speed, because each iteration of the main loop takes less processing time.
As usual, we create the folder Step10 and copy the files controls.asm, game.asm, main.asm, sprite.asm and video.asm from the folder Step09.
The first thing to do is to locate the area of the screen where the ball clears the marker by defining a series of constants in sprite.asm, under the constant POINTS_P2:
POINTS_X1_L: EQU $0c
POINTS_X1_R: EQU $0f
POINTS_X2_L: EQU $10
POINTS_X2_R: EQU $13
POINTS_Y_B: EQU $14
The meaning of these constants, in order of appearance, is as follows
- POINTS_X1_L: column in which the ball starts to leave player one’s marker from the left.
- POINTS_X1_R: column in which the ball starts to leave the marker of the first player from the right.
- POINTS_X2_L: column in which the ball starts to leave player two’s marker from the left.
- POINTS_X2_R: column in which the ball starts to leave player two’s marker from the right.
- POINTS_Y_B: third, line and scanline at which the ball starts to leave the marker at the bottom.
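As a reading aid, POINTS_Y_B can be split into its fields. This sketch assumes that GetPtrY returns the vertical position in A packed as TTLLLSSS (third, line, scanline); that packing is an assumption here and should be checked against the GetPtrY implemented in an earlier step:

```asm
; Assumed packing of the value returned by GetPtrY: TTLLLSSS
; $14 = 00 010 100
;       TT LLL SSS  -> third 0, line 2, scanline 4
POINTS_Y_B: EQU $14
```

Under that assumption, CP POINTS_Y_B leaves the carry flag set only while the ball's value is below $14, that is, while the ball is above that scanline on the screen.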
Once the constants have been defined, we will modify the PrintPoints and ReprintPoints routines in video.asm, starting by locating the printPoint_print tag, which we will replace with PrintPoint.
Within the PrintPoints routine we find that there are three calls to printPoint_print, which we will replace with PrintPoint.
Compile, load in the emulator and check that we haven’t broken anything.
Next, we will delete the ReprintPoints routine as we will be reimplementing it from start to finish.
ReprintPoints:
ld hl, (ballPos)
call GetPtrY
cp POINTS_Y_B
ret nc
We load the position of the ball into HL, LD HL, (ballPos), get the third, line and scanline of that position, CALL GetPtrY, and compare the result with the position where the ball starts to erase the marker from below, CP POINTS_Y_B. If there is no carry, the ball passes under the marker and the routine exits, RET NC.
If we have carry, depending on the X coordinate of the ball, it could erase the marker.
ld a, l
and $1f
cp POINTS_X1_L
ret c
jr z, reprintPoint_1_print
We load the row and column of the ball’s position in A, LD A, L, keep the column, AND $1F, and compare it with the X coordinate where player one’s marker starts to be erased from the left, CP POINTS_X1_L. If there is a carry, the ball passes to the left of the marker and exits, RET C. If the two coordinates match, the ball will erase player one’s marker and jump to reprint it, JR Z, reprintPoint_1_print.
If we don’t go out or jump, we continue with the checks:
cp POINTS_X2_R
jr z, reprintPoint_2_print
ret nc
We compare the X coordinate of the ball with the coordinate where player two’s marker starts to be erased from the right, CP POINTS_X2_R. If they are the same, it jumps to reprint player two’s marker, JR Z, reprintPoint_2_print. If it does not jump and there is no carry, the ball passes to the right of the marker and the routine exits, RET NC.
If we don’t jump or go out, we continue with the checks:
reprintPoint_1:
cp POINTS_X1_R
jr c, reprintPoint_1_print
jr nz, reprintPoint_2
We compare the X coordinate of the ball with the coordinate where player one’s marker starts to be erased from the right, CP POINTS_X1_R. If there is a carry, the ball erases the marker and we jump to reprint it, JR C, reprintPoint_1_print. If the coordinates are not the same, the ball passes to the right of player one’s marker and we jump to check whether it erases player two’s marker, JR NZ, reprintPoint_2. If they are the same, execution falls through to reprintPoint_1_print.
If it erases player one’s marker, repaint it:
reprintPoint_1_print:
ld a, (p1points)
call GetPointSprite
push hl
We load the points of player one into A, LD A, (p1points), get the address of the sprite, CALL GetPointSprite, and keep the value, PUSH HL.
We start by painting the first digit, the tens:
ld e, (hl)
inc hl
ld d, (hl)
ld hl, POINTS_P1
call PrintPoint
pop hl
We load the low part of the address of the first digit sprite into E, LD E, (HL), point HL to the high part of the address, INC HL, load it into D, LD D, (HL), load the address where player one’s marker will be painted into HL, LD HL, POINTS_P1, paint the first digit, CALL PrintPoint, and get the value of HL, POP HL.
Finally we paint the second digit:
inc hl
inc hl
ld e, (hl)
inc hl
ld d, (hl)
ld hl, POINTS_P1
inc l
jr PrintPoint
We point HL to the address of the second digit sprite, INC HL, INC HL, load the low part into E, LD E, (HL), point HL to the high part, INC HL, load it into D, LD D, (HL), load the address to paint player one’s marker into HL, LD HL, POINTS_P1, point HL to the address where the second digit will be painted, INC L, and paint the digit and exit, JR PrintPoint.
You might ask, how do we get out? There is no RET!
You might think that instead of JR PrintPoint we should have written:
call PrintPoint
ret
And it does work, but it is not necessary. Besides, the way we have implemented it saves time and bytes.
PrintPoint’s last instruction is a RET, and that RET is the one we want to use to exit, which is why we use JR instead of CALL and RET. It also helps that we have nothing left to retrieve from the stack. If we did, the results would be unpredictable.
Below we see the difference in time and bytes between one way and the other:
Instruction | Clock cycles | Bytes |
CALL PrintPoint | 17 | 3 |
RET | 10 | 1 |
JR PrintPoint | 12 | 2 |
We have saved fifteen clock cycles and two bytes.
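The same pattern applies to any routine whose last action is a call to another routine. A minimal sketch, using the hypothetical labels Inner, Outer_slow and Outer_fast, which are not part of the game:

```asm
; Hypothetical routines, for illustration only.
Inner:
    ld a, $01
    ret                 ; Returns to whoever called Outer_slow/Outer_fast

; CALL + RET: 17 + 10 = 27 clock cycles, 4 bytes
Outer_slow:
    call Inner
    ret

; JR: 12 clock cycles, 2 bytes
Outer_fast:
    jr Inner            ; Inner's RET returns directly to our caller
```

This only works when the stack holds exactly what it held on entry to the outer routine. JR can only reach targets roughly 128 bytes away; for a target further away, JP does the same job in ten clock cycles and 3 bytes.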
We have also changed the way we repaint.
We used to repaint the markers by doing OR with whatever we had painted in that area, and now we paint the marker directly. The result is that when we paint the marker, we erase the ball, which can cause some flickering. As this flickering is also present in the original arcade, we will leave it as it is, or you can change it.
Let’s see how we repaint player 2’s marker:
reprintPoint_2:
cp POINTS_X2_L
ret c
At this point we just need to check that the ball does not pass between the markers without erasing them. We compare with the left boundary of player two’s marker, CP POINTS_X2_L, and if there is a carry, the ball passes to the left of it and the routine exits, RET C.
If it doesn’t, we have to repaint player two’s marker, which is almost identical to what we do with player one’s marker, so we’ll mark the differences without going into detail:
reprintPoint_2_print: ; Change!
ld a, (p2points) ; Change!
call GetPointSprite
push hl
; 1st digit
ld e, (hl)
inc hl
ld d, (hl)
ld hl, POINTS_P2 ; Change!
call PrintPoint
pop hl
; 2nd digit
inc hl
inc hl
ld e, (hl)
inc hl
ld d, (hl)
ld hl, POINTS_P2 ; Change!
inc l
jr PrintPoint
The final version of the routine looks like this:
; -------------------------------------------------------------------
; Repaint the scoreboard.
; Each number is 1 byte wide by 16 bytes high.
; Alters the value of the AF, BC, DE and HL registers.
; -------------------------------------------------------------------
ReprintPoints:
ld hl, (ballPos) ; HL = ball position
call GetPtrY ; Third, line and scanline
cp POINTS_Y_B ; Compare with position Y where starts
; deleting marker
ret nc ; No carry? Passes underneath
; If the ball arrives here it could erase marker,
; depending on X position.
ld a, l ; A = line and ball column
and $1f ; A = column
cp POINTS_X1_L ; Compare with the position where deletes
; marker player 1 from the left
ret c ; Carry? pass left
jr z, reprintPoint_1_print ; Same? Delete, repaint
; Continue with the checks
cp POINTS_X2_R ; Compare X ball coordinate with position
; where the marker is deleted 2 on the right
jr z, reprintPoint_2_print ; Equal? Repaint marker
ret nc ; No carry? pass right
; Remaining checks to find out if it clears marker 1
reprintPoint_1:
cp POINTS_X1_R ; Compare X ball coordinate with position
; where the marker is deleted 1 on the right
jr c, reprintPoint_1_print ; Carry? Delete, repaint
jr nz, reprintPoint_2 ; != 0? passes through right
; Repaint player 1's marker
reprintPoint_1_print:
ld a, (p1points) ; A = score player 1
call GetPointSprite ; Address of sprite to paint
push hl ; Preserves HL
ld e, (hl) ; E = lower part of address
inc hl ; HL = upper part
ld d, (hl) ; D = upper part
ld hl, POINTS_P1 ; HL = address where to paint
call PrintPoint ; Paint first digit
pop hl ; Retrieve HL
inc hl
inc hl ; HL = sprite second digit
ld e, (hl) ; E = lower part of address
inc hl ; HL = upper part
ld d, (hl) ; D = upper part
ld hl, POINTS_P1 ; HL = address where to paint marker 1
inc l ; HL = address where to paint second digit
jr PrintPoint ; Paint digit and exit
; Other checks to find out if it deletes the marker 2
reprintPoint_2:
cp POINTS_X2_L ; Compare X ball coordinate with position
; where the marker is deleted 2 on the left
ret c ; Carry? Pass left
; Repaint player 2's marker
reprintPoint_2_print:
ld a, (p2points) ; A = player score 2
call GetPointSprite ; Address of the sprite to be painted
push hl ; Preserves HL
ld e, (hl) ; E = lower part of address
inc hl ; HL = upper part
ld d, (hl) ; D = upper part
ld hl, POINTS_P2 ; HL = address where to paint marker 2
call PrintPoint ; Paints first digit
pop hl ; Retrieve HL
inc hl
inc hl ; HL = sprite second digit
ld e, (hl) ; E = lower part of address
inc hl ; HL = upper part
ld d, (hl) ; D = upper part
ld hl, POINTS_P2 ; HL = address where to paint marker 2
inc l ; HL = address where to paint second digit
jr PrintPoint ; Paints digit and exits through PrintPoint's RET
We compile, load the emulator and see the result.
We can see that the ball is now moving faster, even when it should be moving slower. If you look closely, when player two scores and the ball has to go out to the right, you can see part of the ball on the left of the screen for a moment.
If we remember, when we score a point, the ball leaves the court of the player who scored the point. This leads us to the conclusion that the problem lies in the SetBallRight routine, and more specifically in the first line:
ld hl, $4d7f
According to this line, we position the ball at third 1, scanline 5, line 3, column 31.
In addition, two lines below, we change the rotation of the ball, setting it to minus one:
ld a, $ff
ld (ballRotation), a
If we look for the sprite that corresponds to this rotation, we see that it is the following:
db $00, $78 ; +7/$07 00000000 01111000 -1/$ff
So we paint column 31 empty, and in column 32 we paint $78. But column 32 does not exist: there are 32 columns in total, but they go from 0 to 31. If we paint there, we paint in column 0 of the next row.
Having seen this, the solution is simple. We change the first line of the SetBallRight routine to position the ball in column 30:
ld hl, $4d7e
We compile it, load it into the emulator and see how it solves the problem.
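To see why column 31 was the problem, it helps to split both addresses into their fields. The breakdown below assumes the standard ZX Spectrum screen address layout, 010TTSSS LLLCCCCC; it is written as assembler comments, and only the last line is real code:

```asm
; $4d7f = 01001101 01111111
;   H = 010 01 101  -> third 1, scanline 5
;   L = 011 11111   -> line 3, column 31 (last column)
;
; $4d7e = 01001101 01111110
;   H = 010 01 101  -> third 1, scanline 5
;   L = 011 11110   -> line 3, column 30
;
; The ball sprite is two bytes wide, so it occupies the column in HL
; and the next one: 30 and 31, instead of 31 and column 0 of the
; following line ($4d80 = line 4, column 0).
    ld hl, $4d7e
```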
And now we’re going to change the speed of the ball so that it doesn’t run so fast.
The ball configuration is stored in ballSetting, in the sprite.asm file:
; Ball speed and direction.
; bits 0 to 3: movements of the ball to change the Y position.
; Values: f = half-flat, 2 = half-diagonal, 1 = diagonal
; bits 4 and 5: ball speed: 1 very fast, 2 fast, 3 slow
; bit 6: X direction: 0 right / 1 left
; bit 7: Y direction: 0 up / 1 down
ballSetting: db $31 ; 0011 0001
As we can see in the comments, the speed of the ball is configured in bits 4 and 5. Slowing the ball down might seem as simple as making speed 2 very fast, 3 fast, and so on, but two bits can only hold values from 0 to 3, and the rest of the bits are occupied.
We will borrow one of the bits used for the tilt of the ball. This will allow us to reduce the speed of the ball. In return, when the ball goes flat, it will go a little steeper:
; Ball speed and direction.
; bits 0 to 2: Ball movements to change the Y position.
; Values: 7 = half-flat, 2 = half-diagonal, 1 = diagonal
; bits 3 to 5: ball speed: 2 very fast, 3 fast, 4 slow
; bit 6: X direction: 0 right / 1 left
; bit 7: Y direction: 0 up / 1 down
ballSetting: db $21 ; 0010 0001
And now there are five routines we need to change:
- CheckCrossY in game.asm: here we assign the inclination and speed of the ball depending on which part of the paddle it hits.
- MoveBallY in game.asm: here we check if the accumulated ball movements are the ones we need to change the Y-coordinate.
- SetBallLeft and SetBallRight in game.asm: here we reset the ball configuration.
- Loop in main.asm: at the start of this routine we check if we have reached the number of loop iterations we need to move the ball.
We start with CheckCrossY found in game.asm. We locate the checkCrossY_1_5 tag and then the OR $31 line:
or $31 ; Up, speed 3, diagonal
According to the new definition, we set speed 4 and diagonal tilt:
00 100 001
Bits three to five specify the speed, bits zero to two specify the pitch. The OR $31 line should look like this:
or $21
Locate the checkCrossY_2_5 tag and set speed 3, semi-diagonal tilt:
00 011 010
We modify the line:
or $22 ; Up, speed 2, half-diagonal
And we leave it as:
or $1a
Locate the checkCrossY_3_5 tag and set speed 2, semi-flat tilt:
00 010 111
We modify the line:
or $1f ; Up/Down, speed 1, half-flat
And we leave it as:
or $17
Locate the checkCrossY_4_5 tag and set speed 3, semi-diagonal tilt:
10 011 010
We modify the line:
or $a2 ; Down, speed 2, semi diagonal
And we leave it as:
or $9a
Locate the checkCrossY_5_5 tag and set speed 4, diagonal tilt:
10 100 001
We modify the line:
or $b1 ; Down, speed 3, diagonal
And we leave it as:
or $a1
That completes the most tedious part of the modification.
From the MoveBallY routine, modify the second line:
and $0f
And we leave it as:
and $07
With $0f we would get the tilt and the first bit of velocity. With $07 we only get the tilt.
We modify the reset of the ball configuration, which is in the SetBallLeft and SetBallRight routines.
In SetBallLeft we change the line:
or $31 ; Direction X right, speed 3, diagonal
And we leave it as:
or $21
In SetBallRight we change the line:
or $71 ; Direction X left, speed 3, diagonal
And we leave it as:
or $61
Finally, let’s modify the code of the Loop tag of main.asm.
On the second line we find four RRCA instructions. We remove one of them, so that A is rotated three times and the ball speed ends up in bits 0 to 2.
; rrca ; Delete!
rrca
rrca
rrca
As we now have three bits for the speed instead of the previous two, we modify the following line, which reads:
and $03
And we leave it as:
and $07
We compile, load into the emulator and find that the ball speed is now more tolerable, at the expense of the slope.
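With the new layout, the two fields of ballSetting can be extracted as shown below. This is a sketch that puts the two masks side by side; it is not a copy of MoveBallY or Loop:

```asm
; ballSetting = YXSSSTTT
;   Y = Y direction, X = X direction, SSS = speed, TTT = tilt
    ld a, (ballSetting)
    and $07             ; A = tilt (bits 0 to 2)

    ld a, (ballSetting)
    rrca
    rrca
    rrca                ; Speed moves from bits 3-5 to bits 0-2
    and $07             ; A = speed
```

For the initial value $21 (0010 0001), the first fragment leaves 1 in A (diagonal) and the second leaves 4 (slow).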
ScanKeys
Now it is time to optimise the ScanKeys routine, as announced in step 2.
In ScanKeys we have several BIT instructions, two BIT $00, A and two BIT $01, A. With BIT instructions we evaluate the state of a particular bit in a register without changing its value; the BIT instruction occupies 2 bytes and takes eight clock cycles.
Let’s replace the BIT instructions with AND instructions, which also occupy 2 bytes but take seven clock cycles, saving one clock cycle each. We replace the instructions BIT $00, A with AND $01 and the instructions BIT $01, A with AND $02. With this change we save four clock cycles. AND does change the value of register A, but that is not important in this case.
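The change can be sketched as follows. The label bit0_is_zero and the surrounding context are hypothetical; the sketch only assumes that A already holds the value to be tested:

```asm
; Before: 8 clock cycles, 2 bytes, A keeps its value
    bit $00, a          ; Z = 1 if bit 0 of A is 0
    jr z, bit0_is_zero  ; Hypothetical label

; After: 7 clock cycles, 2 bytes, A = A AND $01
    and $01             ; Z = 1 if bit 0 of A is 0
    jr z, bit0_is_zero  ; Same decision as before

bit0_is_zero:
    ret
```

Both instructions leave the Z flag in the same state, so the conditional jumps that follow them do not change. The difference is that after AND the register holds only the masked bit, so a second test on the same reading would need A to be reloaded first.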
Cls
In step 3 we commented that the Cls routine could be optimised by saving eight clock cycles and 4 bytes.
Let’s remember what the routine currently looks like:
; -------------------------------------------------------------------
; Clean screen, ink 7, background 0.
; Alters the value of the AF, BC, DE and HL registers.
; -------------------------------------------------------------------
Cls:
; Clean the pixels on the screen
ld hl, $4000 ; HL = start VideoRAM
ld (hl), $00 ; Clear pixels from that address
ld de, $4001 ; DE = next VideoRAM address
ld bc, $17ff ; 6143 repetitions
ldir ; Clears VideoRAM pixels
; Sets the ink to white and the background to black.
ld hl, $5800 ; HL = start area attributes
ld (hl), $07 ; White ink, black background
ld de, $5801 ; DE = next address area attributes
ld bc, $2ff ; 767 repetitions
ldir ; Assigns value to attribute area
ret
The first part of the routine cleans the pixels, while the second part assigns the colours to the screen. It is in this second part that we do the optimisation.
Once the first LDIR has been executed, HL holds $57FF and DE holds $5800. Loading a 16-bit value into a register pair takes ten clock cycles and 3 bytes, so LD HL, $5800 and LD DE, $5801 together consume twenty clock cycles and 6 bytes.
As we can see, HL and DE hold one less than the values we need to assign the attributes to the screen, so all we need to do is increment each by one, and that is where the optimisation comes from: we replace LD HL, $5800 and LD DE, $5801 with INC HL and INC DE. Incrementing a 16-bit register pair takes six clock cycles and occupies one byte, so the total cost is twelve clock cycles and 2 bytes, as opposed to the current twenty clock cycles and 6 bytes, saving eight clock cycles and 4 bytes.
The final version of the routine is:
; -------------------------------------------------------------------
; Clean screen, ink 7, background 0.
; Alters the value of the AF, BC, DE and HL registers.
; -------------------------------------------------------------------
Cls:
; Clean the pixels on the screen
ld hl, $4000 ; HL = start VideoRAM
ld (hl), $00 ; Clear pixels from that address
ld de, $4001 ; DE = next VideoRAM address
ld bc, $17ff ; 6143 repetitions
ldir ; Clears VideoRAM pixels
; Sets the ink to white and the background to black
inc hl ; HL = start attribute area
ld (hl), $07 ; White ink, black background
inc de ; DE = next address attribute area
ld bc, $2ff ; 767 repetitions
ldir ; Assigns value to attribute area
ret
MoveBall
In step 5 we said that we could save 5 bytes and two clock cycles, which we will do by modifying five lines of the MoveBall routine in game.asm. Let’s replace the five JR moveBall_end lines with RET; JR takes 2 bytes and twelve clock cycles, while RET takes one byte and ten clock cycles.
As we can see, there is only one instruction at the moveBall_end tag, RET, so we can replace each JR moveBall_end with RET.
We said that we save two clock cycles, the difference between one JR and one RET. Each time MoveBall is called only one of the JR instructions is executed, so the saving applies once per call and not five times, although we do save 5 bytes. The real gain per call is in fact larger: the old path executed the JR and then the RET at moveBall_end, twenty-two clock cycles in total, while the new path executes only the RET, ten clock cycles.
The JR to be replaced are found as the last line of the labels:
- moveBall_right.
- moveBall_rightLast.
- moveBall_rightChg.
- moveBall_left.
- moveBall_leftLast.
The moveBall_end tag can be removed, although it takes up no space anyway, but the RET that follows it must stay.
ReprintLine
In step 6 we said that we could save 5 bytes and twenty-two clock cycles, which we will achieve by modifying eight lines of the ReprintLine routine in the video.asm file.
Locate the reprintLine_loopCont tag and move it three lines down, just above the CALL NextScan line.
Locate the line LD C, LINE and delete the following three lines:
jr reprintLine_loopCont
reprintLine_00:
ld c, ZERO
Locate JR C, reprintLine_00 and JR Z, reprintLine_00 and replace reprintLine_00 with reprintLine_loopCont.
Find the position of the reprintLine_loopCont tag and delete LD C, LINE four lines above it. Two lines below the deleted line we replace OR C with OR LINE.
What have we done?
The final objective of the routine is to repaint the erased part of the centre line without erasing the ball. To do that, we read the pixels already on the screen and mix them with the part of the line to be painted. And that is the point: if the part of the line to be repainted is the ZERO part (blank), mixing it in changes nothing, so there is no need to repaint it.
The final version of the routine is as follows:
; -------------------------------------------------------------------
; Repaint the centre line.
; Alters the value of the AF, B and HL registers.
; -------------------------------------------------------------------
ReprintLine:
ld hl, (ballPos) ; HL = ball position
ld a, l ; A = row and column
and $e0 ; A = line
or $10 ; A = row and column 16 ($10)
ld l, a ; L = A. HL = Initial position
ld b, $06 ; Repaints 6 scanlines
reprintLine_loop:
ld a, h ; A = third and scanline
and $07 ; A = scanline
; If it is on scanlines 0 or 7, it paints ZERO.
; If you are on scanlines 1, 2, 3, 4, 5 or 6, paint LINE.
cp $01 ; Scanline 1?
jr c, reprintLine_loopCont ; Scanline < 1, skip
cp $07 ; Scanline 7?
jr z, reprintLine_loopCont ; Scanline = 7, skip
ld a, (hl) ; A = pixels current position
or LINE ; Add LINE
ld (hl), a ; Paints current position
reprintLine_loopCont:
call NextScan ; Get next scanline
djnz reprintLine_loop ; Until B = 0
ret
GetPointSprite
In step 8 we commented that we could save 2 bytes and a few clock cycles by implementing GetPointSprite in a different way; we will do this without using a loop.
Currently this routine takes longer the higher the score of the players. As long as the maximum score is fifteen, there is no problem, but if it is ninety-nine or two hundred and fifty-five, we have a problem; we saw this in the tests when the game did not stop at fifteen points.
As we can see in the definition of the sprites, each one is 4 bytes away from the other, so we make a loop starting from the address of Zero and adding 4 bytes for each point of the player whose marker we are going to paint. This is the same as multiplying the player’s points by 4 and adding the result to the address of the Zero sprite. Done this way, the routine always takes the same time whether the points are zero or ninety-nine, and we save 2 bytes and a few clock cycles.
In GetPointSprite we get the score in A and return the address of the sprite to paint in HL.
How do we multiply by four, since the Z80 does not have a multiply instruction?
Multiplying is nothing more than adding a number as many times as the multiplier says. In other words, multiplying 2 by 4 is equal to:
2*4 = 2+2+2+2 = 8
We could do this with a loop, but we are going to make it even simpler, because to multiply a number by 4 we only need two additions:
3*4: 3+3 = 6 and 6+6 = 12
That is, we add the number to itself, do the same with the result, and we have a multiplication by 4. If we added this result to itself, we would have a multiplication by 8. We can continue in this way to multiply by 16, 32, 64, etc.; in other words, n additions multiply the number by 2^n.
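As an illustration of the doubling idea, here is a minimal sketch of a multiplication by 8. MulA8 is a hypothetical helper, not part of the game, and it is only valid while the result fits in 8 bits (A no greater than 31):

```asm
; Hypothetical helper: A = A * 8
MulA8:
    add a, a            ; A = A * 2
    add a, a            ; A = A * 4
    add a, a            ; A = A * 8
    ret
```

Each ADD A, A takes four clock cycles and one byte, so the cost grows with the number of doublings and not with the value being multiplied.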
There are two ways to implement GetPointSprite without further modification: with a scoreboard of up to sixty-one points or with a scoreboard of up to ninety-nine points.
Let’s take the first one, with a scoreboard of up to sixty-one points (61 * 4 = 244 = 1 byte).
; -------------------------------------------------------------------
; Gets the corresponding sprite to paint on the marker.
; Input: A -> score.
; Output: HL -> address of the sprite to be painted.
; Alters the value of the AF, BC and HL registers.
; -------------------------------------------------------------------
GetPointSprite:
; UP TO 61 POINTS
ld hl, Zero ; HL = address sprite Zero
; Each sprite is 4 bytes from the previous one
add a, a ; A = A * 2
add a, a ; A = A * 2 ( A * 4)
ld b, ZERO
ld c, a ; BC = A
add hl, bc ; HL = HL + BC = sprite to be painted
ret
In this case, the maximum score would be sixty-one, which multiplied by 4 gives two hundred and forty-four, a result that occupies only one byte, so we can use register A to multiply by 4. This routine occupies 10 bytes and takes fifty clock cycles.
If a game of ZX-Pong is too short at sixty-one points, we can do it at ninety-nine. The routine would occupy the same number of bytes as the previous one, but it would take sixty-four clock cycles because we use a 16-bit register pair for the additions (99 * 4 = 396 = 2 bytes).
; -------------------------------------------------------------------
; Gets the corresponding sprite to paint on the marker.
; Input: A -> score.
; Output: HL -> Address of the sprite to be painted.
; Alters the value of the AF, BC and HL registers.
; -------------------------------------------------------------------
GetPointSprite:
; UP TO 99 WITHOUT CHANGING MARKER PRINT ROUTINE
ld h, ZERO
ld l, a ; HL = points
; Each sprite is 4 bytes from the previous one.
add hl, hl ; HL = HL * 2
add hl, hl ; HL = HL * 2 (HL * 4)
ld bc, Zero ; BC = sprite address Zero
add hl, bc ; HL = HL + BC (sprite to be painted)
ret
If we want a score higher than ninety-nine, we have to modify the marker printing routines, which only print two digits, and bear in mind that these GetPointSprite implementations would no longer be valid (we would have to rethink everything, even the way we declare the sprites).
PrintPoints and ReprintPoints
But hey, we just implemented ReprintPoints at the beginning of this chapter!
Well, we’ve actually added a part to repaint the marker only when necessary, but we’ve inherited some things from the original implementation.
In step 8 we said that we could save 2 bytes and twelve clock cycles by modifying the PrintPoints routine. Well, we are in luck, because we will actually save 33 bytes and one hundred and seventy-eight clock cycles; the changes to be made in PrintPoints are also made in ReprintPoints.
On the third line of PrintPoints we find PUSH HL, and this is the first line we are going to move, because it preserves the value of the HL register earlier than necessary. We cut this line and paste it three lines down, just before loading into HL the memory address where player one’s points are painted, LD HL, POINTS_P1; this is the instruction that forces us to preserve HL.
After the call that paints the digit, we retrieve the value of HL, POP HL, and increment HL twice to point to the low part of the address of the second digit. Since we now preserve HL after it has been positioned on the high part of the address of the first digit, we remove one of these two INC HL, saving 1 byte and six clock cycles.
The same modification is made when we paint player two’s marker and in the ReprintPoints routine. In total we save 4 bytes and twenty-four clock cycles.
Spirax told me about another optimisation we could do, where we could remove four INC L instructions, saving 4 bytes and sixteen clock cycles.
In both PrintPoints and ReprintPoints, when we draw the second digit of the markers, we do the following.
ld hl, POINTS_P1
inc l
ld hl, POINTS_P2
inc l
As we do in both PrintPoints and ReprintPoints, we actually do four INC L, and we can avoid it this way:
ld hl, POINTS_P1 + 1
ld hl, POINTS_P2 + 1
In this way we point HL directly to the position where the second digit is drawn and save the INC L.
And now we are going to save 25 bytes and one hundred and thirty-eight more clock cycles, thanks again to Spirax.
At the end of ReprintPoints is the reprintPoint_2_print tag, and just above it is the RET C instruction. Let’s delete the reprintPoint_2_print tag and everything below it until the end of the routine. After RET C we insert JR printPoint_2_print.
In the previous implementation, PrintPoints and ReprintPoints painted differently: ReprintPoints did an OR with the pixels on the screen. This is no longer the case, so we can reuse the code in PrintPoints that paints player two’s marker to repaint it, which saves 25 bytes and one hundred and thirty-eight clock cycles.
The printPoint_2_print tag does not exist yet, so we add it. We look for the PrintPoints tag and see that it first paints the marker for player one and, when it has finished, paints the marker for player two, which starts just below the second CALL PrintPoint. It is there, just below the second CALL PrintPoint, that we add the printPoint_2_print tag.
Locate reprintPoint_1; two lines above it is the line JR Z, reprintPoint_2_print. We replace it with:
jr z, printPoint_2_print
Thank you very much Spirax!
The final look and feel of the routines is as follows:
; -------------------------------------------------------------------
; Paint the scoreboard.
; Each number is 1 byte wide by 16 bytes high.
; Alters the value of the AF, BC, DE and HL registers.
; -------------------------------------------------------------------
PrintPoints:
ld a, (p1points) ; A = points player 1
call GetPointSprite ; Sprite to be painted on marker
; 1st digit of player 1
ld e, (hl) ; E = low part 1st digit address
inc hl ; HL = high part address
ld d, (hl) ; D = high part address
push hl ; Preserves HL
ld hl, POINTS_P1 ; HL = address where to paint points player 1
call PrintPoint ; Paint 1st digit marker player 1
pop hl ; Retrieves HL
; 2nd digit of player 1
inc hl ; HL = low part 2nd digit address
ld e, (hl) ; E = lower part 2nd digit address
inc hl ; HL = high part address
ld d, (hl) ; D = high part address
; Spirax
ld hl, POINTS_P1 + 1 ; HL = address where to paint
; 2nd digit points player 1
call PrintPoint ; Paint 2nd digit marker player 1
printPoint_2_print:
; 1st digit of player 2
ld a, (p2points) ; A = points player 2
call GetPointSprite ; Sprite to be painted on marker
ld e, (hl) ; E = low part 1st digit address
inc hl ; HL = high side
ld d, (hl) ; D = upper part
push hl ; Preserves HL
ld hl, POINTS_P2 ; HL = address where to paint points player 2
call PrintPoint ; 1st digit of marker player 2
pop hl ; Retrieves HL
; 2nd digit of player 2
inc hl ; HL = low part 2nd digit address
ld e, (hl) ; E = lower part
inc hl ; HL = high part
ld d, (hl) ; D = upper part
; Spirax
ld hl, POINTS_P2 + 1 ; HL = address where to paint 2nd digit
; Paint the second digit of player 2's marker.
PrintPoint:
ld b, $10 ; Each digit 1 byte by 16 (scanlines)
printPoint_printLoop:
ld a, (de) ; A = byte to be painted
ld (hl), a ; Paints the byte
inc de ; DE = next byte
call NextScan ; HL = next scanline
djnz printPoint_printLoop ; Until B = 0
ret
; -------------------------------------------------------------------
; Repaint the scoreboard.
; Each number is 1 byte wide by 16 bytes high.
; Alters the value of the AF, BC, DE and HL registers.
; -------------------------------------------------------------------
ReprintPoints:
ld hl, (ballPos) ; HL = ball position
call GetPtrY ; Third, line and scanline of ball position
cp POINTS_Y_B ; Compare lower limit marker
ret nc ; No carry? Pass underneath
ld a, l ; A = line and column ball position
and $1f ; A = column
cp POINTS_X1_L ; Compare left boundary marker 1
ret c ; Carry? Pass on the left
jr z, reprintPoint_1_print ; 0?, Is in left margin jumps to paint
cp POINTS_X2_R ; Compare right boundary marker 2
jr z, printPoint_2_print ; 0? It's in the right margin jumps to paint
ret nc ; No carry? Pass right
reprintPoint_1:
cp POINTS_X1_R ; Compare limit marker 1
jr c, reprintPoint_1_print ; Carry? passes through marker 1
; jumps to paint
jr nz, reprintPoint_2 ; != 0? Passes right, jumps to check marker 2
reprintPoint_1_print:
ld a, (p1points) ; A = points player 1
call GetPointSprite ; Sprite to be painted on marker
; 1st digit
ld e, (hl) ; E = lower part 1st digit address
inc hl ; HL = high side
ld d, (hl) ; D = upper part
push hl ; Preserves HL
ld hl, POINTS_P1 ; HL = address where to paint points player 1
call PrintPoint ; Paint 1st digit marker player 1
pop hl ; Retrieves HL
; 2nd digit
inc hl ; HL = low part 2nd digit address
ld e, (hl) ; E = lower part
inc hl ; HL = high part
ld d, (hl) ; D = upper part
ld hl, POINTS_P1 + 1 ; HL = address where to paint 2nd digit
; points player 1
jr PrintPoint ; Paint 2nd digit marker player 1
reprintPoint_2:
cp POINTS_X2_L ; Compare left boundary marker 2
ret c ; Carry? Pass on the left
; Spirax
jr printPoint_2_print ; Paint marker player 2
We compile, load the emulator and check that everything still works.
Thanks a lot Spirax.
Ball strike bug at the bottom of the paddle
It’s time to fix a bug we’ve had since we introduced the ability to change the speed and angle of the ball depending on the part of the paddle it hits. When the ball hits the last scanline of the paddle, it doesn’t change its inclination, speed or vertical direction. Why is this?
The reason is the way we have implemented collision detection. Before evaluating in which part of the paddle it hits, we evaluate if it hits the paddle, and here is the error; when it hits the last scanline of the paddle, it exits the routine, indicating with the active Z flag that there is a collision, but without evaluating in which part of the paddle it hits.
We locate the CheckCrossY tag in the game.asm file, sixteen lines down we find this.
ret nc ; Carry? No, ball passes underneath
; or collide in the last scanline.
; The latter case activates flag Z
If we read the comments, we get out of the routine if there is no carry. If there is no carry, the result is greater than or equal to zero. If the result is zero, we leave the routine with the Z flag activated (there is a collision) and without evaluating where the ball hit the paddle. If the result is greater than zero, we leave the routine with the Z flag deactivated (no collision).
To solve this, we will add a second check and a new label to jump to.
The current code for the part we want to touch is as follows.
ret nc ; Carry? No, ball passes underneath
; or collide in the last scanline.
; The latter case activates flag Z
; Depending on collision location, inclination and speed
ld a, c ; A = penultimate paddle scanline
Let’s add a line before RET NC and a tag before LD A, C, so that the code looks like this:
jr z, checkCrossY_eval ; 0?, collision in last scanline
ret nc ; No carry? Pass underneath
; Depending on collision location, inclination and speed
checkCrossY_eval:
ld a, c ; A = penultimate paddle scanline
Even this JR Z, checkCrossY_eval could be changed to JR Z, checkCrossY_5_5 as we know that the ball has hit the bottom of the paddle (try both ways).
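The order of the two checks matters. A generic sketch of the flag logic, assuming the comparison is a CP of A against B and using the hypothetical labels Compare3 and handle_hit (the real comparison and labels are those of CheckCrossY):

```asm
Compare3:
    cp b                ; Z = 1 if A = B, carry = 1 if A < B
    jr z, handle_hit    ; A = B: handled before RET NC can leave
    ret nc              ; A > B: no carry and Z = 0, nothing to do
                        ; A < B: carry, execution continues below
handle_hit:
    ld a, c             ; Evaluation starts here in both cases
    ret
```

With RET NC alone, the A = B case also has no carry, so the routine would return before reaching handle_hit, which is the bug described above.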
We compile, load the emulator and see that we have fixed the bug.
ZX Spectrum Assembly, Pong
In the next ZX Spectrum Assembly chapter, we will implement the sound effects.
Download the source code from here.
Useful links
ZX Spectrum Assembly, Pong by Juan Antonio Rubio García.
Translation by Felipe Monge Corbalán.
This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Any comments are always welcome.