Cloning VisiCalc
WTF is this?#
This is the result of my attempts (so far) to recreate VisiCalc (1978), the world’s first spreadsheet software from nearly 50 years ago, from scratch.
I’ve taken from scratch rather seriously – what you see above is the product of several thousand lines of 6502 assembly code, running on an emulation of the Apple ][ platform with a cycle-accurate simulation of the 8-bit 6502 microprocessor, both of which I coded from scratch in C++. That means that in theory this would run without issue on a real Apple ][ from the 1970s. Because modern computers are so vastly more powerful than their ancestors, it requires a miniscule amount of resource to emulate a whole vintage computer in the browser – it probably loaded so quickly that you didn’t even notice it booting up and autolaunching WilliCalc.
It’s not yet finished. There are bugs, and obvious limitations. But in core functionality it has gotten reasonably close to 1978’s killer app, and further development would require exponentially more time. In any case, I think at this stage I’ve made my point.
I encourage you to have a play with it. It’s amazing to think that people once used this to do their work, and even more amazing that such a capability was once considered revolutionary. Detailed instructions can be found in the next section. After that there is a fuller discussion of the technical challenges encountered.
NB: decimal division isn’t yet working properly – that’s next on the to do list
readme#
Click on the WilliCalc frame above, and it will subsequently respond to your keyboard inputs (until you click away). As this is keyboard only, that means that mobile users are not welcome.
Navigation#
- Navigate with the arrow keys. You can also use
Ctrl + T
andCtrl + B
to jump to the top and bottom of the displayed cells.
Input#
- Enter text or numbers using the keyboard. WilliCalc accepts strings, integers, and decimals. You can use up to 8 integer and 2 decimal digits. Use a leading
"-"
for negative numbers. Eg:HELLO!
1
63382709
-837.28
- Formulae are entered in the format
=A1+A2
(no spaces yet!). WilliCalc currently handles:- Addition:
=B23+K17
- Subtraction:
=E2-I101
- Multiplication:
=M3*M4
- Division:
=A8/P99
- 2D Range Sum:
=F5:Q19
- use this to add up all the values in a column, row, or combination of both!
- Addition:
- Use
Ctrl + C
to clear the selected cell
How it works#
WilliCalc is running on an emulation of the Apple ][+. That project began here, with a simulation of the 6502 microprocessor.
After messing around with assembly language, I got it into my head that I should see how far I could get cloning VisiCalc. I’d already seen how challenging assembly language is, so was under no illusions that this would be easy. And it wasn’t. But what I genuinely never expected was to make this much progress.
At the outset, the language was a total enigma. The lack of functions, scopes, or objects felt totally insurmountable. But after a few hours things started to click. Spend enough time programming at this low level, and you start to invent forms of abstraction. You figure out how to approximate pointers, function parameters, even classes. A little longer and it almost becomes… comfortable.
There’s also something incredibly zen about working in assembly. There are no frameworks, no tooling, no modules, no dependencies, no cloud infrastructure, no deployment, no syntactic sugar. You don’t need to be online. Chat-GPT doesn’t have a fucking clue. It’s just a few pages of documentation and total focus. Realising something from just 56 instructions and a measly 3 registers is the ultimate logical challenge. If you love programming, but have fallen out of love with your current tools, I would highly recommend setting yourself a more arcane objective.
Far too much went into WilliCalc, over many iterations, to try and document here. When even a for-loop must be implemented from first principles, suffice it to say that the simplest features - like displaying which cell is highlighted - merit entire dissertations. Instead, I’ll try and give a flavour of what it is like to develop on this platform.
Key challenges#
Input#
At the heart of WilliCalc is an input listening loop. The program repeatedly checks a memory address, does something if it finds a new keyboard input, and then goes back to checking for input.
.proc inputListeningLoop
lda CHAR_INPUT_ADDR ; Load input character
cmp #0
beq mainLoop ; Branch on zero to main loop
cmp #$8D ; Clear on CR
beq process_input
cmp #$92
beq refresh_screen ; CTRL+R
cmp #$95 ; right arrow
beq right_arrow
cmp #$88 ; left arrow
beq left_arrow
cmp #$89 ; up arrow - ctrl + i
beq up_arrow
cmp #$8A ; down arrow - ctrl + k
beq down_arrow
cmp #$82 ; jmp to bottom - ctrl + b
beq jmp_to_bottom
cmp #$94 ; jmp to top - ctrl + t
beq jmp_to_top
cmp #$83 ; clear cell - ctrl + c
beq clear_cell
cmp #$9b ; clear cell - esc
beq clear_cell
cmp #$8C ; draw line - ctrl + l
beq draw_line
...
no_input:
clear_char_input
jmp mainLoop ; Return to start of main loop
.endproc
If a user wants to enter a number into a cell, they will press enter after typing some numbers. In order to store those numbers in memory, they need to be “packed” into a vintage data format called a Binary-Coded Decimal, or BCD.
All the code below does is store a number:
.proc asciiToPackedBCD
; stores edit line input back to cell storage
input_ptr = screen::addr
output_ptr = cell::heap_addr
sign = cell::sign
digits = DIGITS
; need this as a temp variable
; because there is no indirect addressing mode for ROL
; unless we can think of a better way of doing things below without ROL
; buffer and cell_arr share same schema
; first byte is header byte
buffer = buffer::string_buffer
; decimal counter
decimals = ctrl::ctrl_a
start:
lda #$ff ; set decimal to -1
sta decimals
; clear buffer
lda #<buffer
sta clear8Bytes::ptr
lda #>buffer
sta clear8Bytes::ptr + 1
jsr clear8Bytes
ldx #digits
; set start index to 0, or 1 to skip "-"
ldy sign
for_each_char:
check_decimal_counter:
; if we already have 2 decimals, go to end
lda decimals
cmp #2
beq set_header_byte
load_char:
lda (input_ptr),Y
and #%00111111
check_null_char_reached:
cmp #$20
beq set_header_byte
check_decimal_point_reached:
cmp #$2E
bne no_decimal_point_reached
decimal_point_reached:
; if decimal point is reached, go to next digit
; set decimals to 0
inc decimals
iny
bne for_each_char
no_decimal_point_reached:
load_nibbles:
.repeat 4
asl A ; move to high nibble
.endrepeat
.repeat 4
asl A ; could be better I expect!
.repeat 5, i ; number of integer bytes
rol buffer + 1 + i
.endrepeat
.endrepeat
check_increment_decimals:
lda decimals
bmi do_not_increment_decimals
increment_decimals:
inc decimals
do_not_increment_decimals:
iny
dex
bne for_each_char
set_header_byte:
lda decimals
bmi decimals_is_negative
decimals_is_positive:
; subtract 1 from Y to account for the decimal point
dey
jmp decimal_shift
decimals_is_negative:
lda #0
sta decimals
decimal_shift:
lda #0
ldx decimals
shift_bits_for_unused_decimals:
cpx #2
beq after_decimal_shift
; TO DO - make this more efficient
.repeat 4
asl A
.repeat 5, i ; number of integer bytes
rol buffer + 1 + i
.endrepeat
.endrepeat
inx
jmp shift_bits_for_unused_decimals
after_decimal_shift:
set_length:
tya ; Y contains length
; subtract the number of decimal digits (0-2) from the length
sec
sbc decimals
; do not subtract sign from length
; if negative, length is 1 greater
; useful for printing, to make room for the -1
; sec
; sbc sign ; subtract sign (1) from length
ora #NUMERIC_TYPE
sta buffer ; save acc to header
set_sign:
lda sign
sta buffer + SIGN_BYTE
copy_bytes_to_cell:
ldy #0
.repeat 8
lda buffer,Y
sta (output_ptr),Y
iny
.endrepeat
end:
rts
.endproc
Output#
We obviously have to do the inverse when displaying the number on the screen. Unfortunately, printing to the screen is its own challenge. The solution I came up with was to abstract this into helper functions that output a given character to a given row/col screen address. If you squint hard enough, you might see object-oriented programming.
.proc setScreenAddressFromScreenLineCol
addr = screen::addr
line = screen::line
col = screen::col
line_0_l = screen_lines::line_0_l
line_0_h = screen_lines::line_0_h
start:
lda line
asl A
tay
lda line_0_l,Y
sta addr
lda line_0_h,Y
sta addr + 1
ldy col
end:
rts
.endproc
.proc setScreenChar
addr = screen::addr
char = screen::char
start:
jsr setScreenAddressFromScreenLineCol
lda char
sta (addr),Y
end:
rts
.endproc
Data storage#
I went through many iterations of the memory schema, and have several more in mind. Currently WilliCalc uses dynamic allocation, which anticipates malloc
and free
in C, and is therefore ahead of its time for this platform.
When you input data to a cell, quite a lot happens:
- If the cell already had data in it, the data is cleared.
- If memory had already been allocated for the cell, it is released.
- If the cell is a formula, we clear and release the extra memory used to store the calculated value.
- Memory is allocated for the cell value, which could be a string, number, or formula - these all take 8-bytes, including a 1-byte header.
- User input is parsed into the allocated memory location, and the header is set.
- If the cell is a formula, memory is also allocated for the calculated value, the pointer to which is stored in the cell value.
- The calculated value is then calculated and stored.
- Recursive formula recalculation logic is triggered to look for any cells that need to be updated.
- The displayed cells are refreshed.
Because all of our data units are the same length (8-bytes), we can manage all of the memory with a single variable: next_free_addr
. Whenever we allocate memory, we will increment this variable until we find a blank location. And whenever we deallocate memory, we will set this to the address that was deallocated.
That means that new data will always use the lowest available memory location, which avoids leaks and also has the handy side-effect of automatically filling holes in the heap. Neat!
.proc allocate
start:
incr_memory_addr:
lda memory::next_free_addr
clc
adc #CELL_BYTES
sta memory::next_free_addr
lda memory::next_free_addr + 1
adc #0
sta memory::next_free_addr + 1
; check header byte at memory address
ldy #0
lda (memory::next_free_addr),Y
bne incr_memory_addr ; if header == 0, loc is unused
; need to add overflow / limit protection here
end:
rts
.endproc
Arithmetic#
Proper arithmetic on a system that can only natively handle positive integers up to 255 is… non-trivial.
Here’s how we achieve multiplication in three easy steps, reminiscent of how you learnt to multiply large numbers in primary school:
- First we need to clear the
destination
, and copyoperand_1
to themultiplicand_buffer
. For maximum flexibility, the operand and destination addresses are set by pointers in the zeropage.
.proc packedBCDMultPtrs
operand_0 = operations::ptr_0
operand_1 = operations::ptr_1
destination = operations::ptr_2
; need to use a buffer because of lack of indirect bit shifting!
partial_result_buffer = buffer::data_buffer_1
multiplicand_buffer = buffer::data_buffer_2
digit_counter = ctrl::ctrl_c
var_a = ctrl::ctrl_d
var_b = ctrl::ctrl_e
error_code = operations::error_code
BYTES = 5
start:
sed
lda #0
ldy #7 - 1 ; clear all data bytes
clear_destination:
sta (destination),Y
dey
bpl clear_destination
ldx #BYTES - 1
ldy #1 ; to skip header byte
copy_operand_1_to_multiplicand_buffer:
lda (operand_1),Y
sta multiplicand_buffer,Y
iny
dex
bpl copy_operand_1_to_multiplicand_buffer
- Separately multiply
multiplicand_0
by each digit ofmultiplicand_1
(using repeated addition), set the order of magnitude by bitshifting, and then add it to thedestination
.
lda #0
sta digit_counter
for_each_digit_of_operand_1:
lda multiplicand_buffer + 1 ; to skip header
and #%00001111 ; this is working properly
beq after_multiplication ; skip to next round if 0
sta var_a
lda #0
ldx #7
clear_partial_result:
sta partial_result_buffer,X
dex
bpl clear_partial_result
ldx var_a
; while x > 0
add_operand_0_to_partial_result:
beq set_partial_result_order_of_magnitude
stx var_a
ldx #BYTES - 1 + 1 ; + 1 for .00
ldy #1
clc
for_each_partial_result_byte:
lda (operand_0),Y
adc partial_result_buffer,Y
sta partial_result_buffer,Y
iny
dex
bpl for_each_partial_result_byte
check_addition_overflow:
bcs overflow_error
ldx var_a
dex
jmp add_operand_0_to_partial_result
set_partial_result_order_of_magnitude:
ldx digit_counter
multiply_by_ten:
beq add_partial_result_to_destination
stx var_a
; 4 bitshifts = 10x
ldy #4
rotate_partial_result:
sty var_b
ldy #BYTES - 1 + 1 ; + 1 for .00
ldx #1 ; to skip header
clc
rotate_each_partial_result_byte:
rol partial_result_buffer,X
inx
dey
bpl rotate_each_partial_result_byte
check_partial_result_overflow:
bcs overflow_error
ldy var_b
dey
bne rotate_partial_result
ldx var_a
dex
jmp multiply_by_ten
add_partial_result_to_destination:
ldx #BYTES - 1 + 1 ; + 1 for .00
ldy #1
clc
for_each_destination_byte:
lda (destination),Y
adc partial_result_buffer,Y
sta (destination),Y
iny
dex
bpl for_each_destination_byte
check_overflow:
bcs overflow_error
after_multiplication:
ldy #4
rotate_multiplicand_buffer:
ldx #BYTES + 1 ; + 1 for .00
clc
rotate_each_multiplicand_byte:
ror multiplicand_buffer,X
dex
bpl rotate_each_multiplicand_byte
dey
bne rotate_multiplicand_buffer
increment_digit_counter:
inc digit_counter
lda digit_counter
cmp #(BYTES * 2)
bne for_each_digit_of_operand_1
- Finally, we need to divide the answer by 100 to get our decimal places, and set the sign from the operand signs.
divide_result_by_100:
temp = ctrl::ctrl_c
lda #0
sta temp
ldy #BYTES + 1
shift_each_byte:
lda (destination),Y
tax
lda temp
sta (destination),Y
stx temp
dey
bne shift_each_byte
set_sign:
jsr multDivSetSign
jmp end
overflow_error:
lda #OVERFLOW
sta error_code
end:
cld
rts
.endproc
You’ll notice that we check for overflow errors at several stages of the operation.
Division is unfortunately around twice as complicated.
Recursive recalculation#
This was one of the most satisfying things to get working. I chose to implement a breadth-first search in order to accurately and efficiently perform recursive recalculations. Here’s how it works:
- When you enter a value into a cell, WillCalc enters the recursive recalculation subroutine.
- We find all the formulae that reference the updated value and add them to a queue.
- We take the first formula from the queue and recalculate it.
- Then we find all formulae that reference the updated value, and are not already in the queue, and add them to the queue.
- If the queue is not empty, go back to step 3.
.proc updateFormulaeBFS
iteration_counter = ctrl::ctrl_a
line_counter = formula::var_a
col_counter = formula::var_b
curr_dependency_line = formula::curr_dependency_line
curr_dependency_col = formula::curr_dependency_col
line_queue = formula_queue_line
col_queue = formula_queue_col
ptr_l = formula::queue_l
ptr_r = formula::queue_r
start:
lda #0
sta iteration_counter
; iterate through entire grid
initialise_queue:
lda #0
sta ptr_l
lda #1
sta ptr_r
; set queue[0] to value that was just updated
ldx #0
lda cell::line
clc
adc window::line_scroll
sta line_queue,X
lda cell::col
clc
adc window::col_scroll
sta col_queue,X
while_ptr_l_less_than_ptr_r:
lda ptr_l
cmp ptr_r
bcs bfs_complete
break_on_256th_iteration:
inc iteration_counter
beq bfs_complete
jsr getNextDependencyInQueue
set_line_counter:
lda #N_ROWS - 1
sta line_counter
lda #0
sta cell::line
for_each_page:
check_row_for_formulae:
ldx cell::line
lda formulae_per_line,X
beq at_end_of_line
lda #N_COLS - 1
sta col_counter
lda #0
sta cell::col
for_each_col:
check_col_for_formulae:
ldx cell::col
lda formulae_per_col,X
beq after_process_formula
; set cell line, col
jsr setCellAddressFromLineCol
; check if formula
ldy #0
lda (cell::heap_addr),Y
and #TYPE_BITS
CMP #FORMULA_TYPE
bne after_process_formula
cell_is_formula:
jsr processFormula
after_process_formula:
inc cell::col
dec col_counter
bpl for_each_col
at_end_of_line:
inc cell::line
dec line_counter
bpl for_each_page
jmp while_ptr_l_less_than_ptr_r
bfs_complete:
; debugprint_imm #$90
; lda iteration_counter
; debugprint_acc
reset_cursor_to_selected:
lda cursor::cell::line_0
sta cell::line
lda cursor::cell::col_0
sta cell::col
end:
rts
.endproc
Future Improvements#
Here’s what I’ve got in mind to really take WilliCalc to the next level:
- Though cell data is dynamically allocated, the whole sheet is currently stored as an array of blank pointers. This is great for quick indexing, but means a lot of memory is wasted on blank cells. Instead I would like to use Balanced Binary Trees to store the rows, each of which is a Balanced Binary Tree of columns, each of which is dynamically allocated memory address. This is a complicated way of having my cake and eating it.
- I would like to add support for floating point numbers. I think most of the hard work has already been done with the implementation of decimals.
- Would be great to have save and load functionality, if only so I can easily load up test sheets.