DESKTOP USERS: click on the frame above and it will recognise your keyboard inputs -- then you can use WilliCalc!
MOBILE USERS: sorry, no touchscreens in 1978 -- you won't be able to get past the splash screen.



WTF is this?#

This is the result of my attempts (so far) to recreate VisiCalc (1978), the world’s first spreadsheet software from nearly 50 years ago, from scratch.

I’ve taken from scratch rather seriously – what you see above is the product of several thousand lines of 6502 assembly code, running on an emulation of the Apple ][ platform with a cycle-accurate simulation of the 8-bit 6502 microprocessor, both of which I coded from scratch in C++. That means that in theory this would run without issue on a real Apple ][ from the 1970s. Because modern computers are so vastly more powerful than their ancestors, it requires a miniscule amount of resource to emulate a whole vintage computer in the browser – it probably loaded so quickly that you didn’t even notice it booting up and autolaunching WilliCalc.

It’s not yet finished. There are bugs, and obvious limitations. But in core functionality it has gotten reasonably close to 1978’s killer app, and further development would require exponentially more time. In any case, I think at this stage I’ve made my point.

I encourage you to have a play with it. It’s amazing to think that people once used this to do their work, and even more amazing that such a capability was once considered revolutionary. Detailed instructions can be found in the next section. After that there is a fuller discussion of the technical challenges encountered.

NB: decimal division isn’t yet working properly – that’s next on the to do list


readme#

Click on the WilliCalc frame above, and it will subsequently respond to your keyboard inputs (until you click away). As this is keyboard only, that means that mobile users are not welcome.

  • Navigate with the arrow keys. You can also use Ctrl + T and Ctrl + B to jump to the top and bottom of the displayed cells.

Input#

  • Enter text or numbers using the keyboard. WilliCalc accepts strings, integers, and decimals. You can use up to 8 integer and 2 decimal digits. Use a leading "-" for negative numbers. Eg:
    • HELLO!
    • 1
    • 63382709
    • -837.28
  • Formulae are entered in the format =A1+A2 (no spaces yet!). WilliCalc currently handles:
    • Addition: =B23+K17
    • Subtraction: =E2-I101
    • Multiplication: =M3*M4
    • Division: =A8/P99
    • 2D Range Sum: =F5:Q19 - use this to add up all the values in a column, row, or combination of both!
  • Use Ctrl + C to clear the selected cell

How it works#

WilliCalc is running on an emulation of the Apple ][+. That project began here, with a simulation of the 6502 microprocessor.

After messing around with assembly language, I got it into my head that I should see how far I could get cloning VisiCalc. I’d already seen how challenging assembly language is, so was under no illusions that this would be easy. And it wasn’t. But what I genuinely never expected was to make this much progress.

At the outset, the language was a total enigma. The lack of functions, scopes, or objects felt totally insurmountable. But after a few hours things started to click. Spend enough time programming at this low level, and you start to invent forms of abstraction. You figure out how to approximate pointers, function parameters, even classes. A little longer and it almost becomes… comfortable.

There’s also something incredibly zen about working in assembly. There are no frameworks, no tooling, no modules, no dependencies, no cloud infrastructure, no deployment, no syntactic sugar. You don’t need to be online. Chat-GPT doesn’t have a fucking clue. It’s just a few pages of documentation and total focus. Realising something from just 56 instructions and a measly 3 registers is the ultimate logical challenge. If you love programming, but have fallen out of love with your current tools, I would highly recommend setting yourself a more arcane objective.

Far too much went into WilliCalc, over many iterations, to try and document here. When even a for-loop must be implemented from first principles, suffice it to say that the simplest features - like displaying which cell is highlighted - merit entire dissertations. Instead, I’ll try and give a flavour of what it is like to develop on this platform.


Key challenges#

Input#

At the heart of WilliCalc is an input listening loop. The program repeatedly checks a memory address, does something if it finds a new keyboard input, and then goes back to checking for input.

.proc inputListeningLoop
    lda CHAR_INPUT_ADDR     ; Load input character

    cmp #0
    beq mainLoop            ; Branch on zero to main loop

    cmp #$8D                ; Clear on CR
    beq process_input

    cmp #$92
    beq refresh_screen       ; CTRL+R

    cmp #$95                ; right arrow
    beq right_arrow
    cmp #$88                ; left arrow
    beq left_arrow
    cmp #$89                ; up arrow - ctrl + i
    beq up_arrow
    cmp #$8A                ; down arrow - ctrl + k
    beq down_arrow

    cmp #$82                ; jmp to bottom - ctrl + b
    beq jmp_to_bottom
    cmp #$94                ; jmp to top - ctrl + t
    beq jmp_to_top
    
    cmp #$83                ; clear cell - ctrl + c
    beq clear_cell
    cmp #$9b                ; clear cell - esc
    beq clear_cell
    cmp #$8C                ; draw line - ctrl + l
    beq draw_line
    
    ...

    no_input:
        clear_char_input

        jmp mainLoop            ; Return to start of main loop
.endproc

If a user wants to enter a number into a cell, they will press enter after typing some numbers. In order to store those numbers in memory, they need to be “packed” into a vintage data format called a Binary-Coded Decimal, or BCD.

All the code below does is store a number:

.proc asciiToPackedBCD
    ; stores edit line input back to cell storage
    
    input_ptr = screen::addr
    output_ptr = cell::heap_addr
    sign = cell::sign

    digits = DIGITS

    ; need this as a temp variable
    ; because there is no indirect addressing mode for ROL
    ; unless we can think of a better way of doing things below without ROL
    ; buffer and cell_arr share same schema
    ; first byte is header byte
    buffer = buffer::string_buffer

    ; decimal counter
    decimals = ctrl::ctrl_a

    start:
        lda #$ff            ; set decimal to -1
        sta decimals

        ; clear buffer
        lda #<buffer
        sta clear8Bytes::ptr
        lda #>buffer
        sta clear8Bytes::ptr + 1
        jsr clear8Bytes

        ldx #digits

        ; set start index to 0, or 1 to skip "-"
        ldy sign

    for_each_char:

        check_decimal_counter:
            ; if we already have 2 decimals, go to end
            lda decimals
            cmp #2
            beq set_header_byte

        load_char:
            lda (input_ptr),Y
            and #%00111111

        check_null_char_reached:
            cmp #$20
            beq set_header_byte

        check_decimal_point_reached:
            cmp #$2E
            bne no_decimal_point_reached

        decimal_point_reached:
            ; if decimal point is reached, go to next digit
            ; set decimals to 0
            inc decimals
            iny
            bne for_each_char

        no_decimal_point_reached:

        load_nibbles:
            .repeat 4
                asl A           ; move to high nibble
            .endrepeat

            .repeat 4               
                asl A               ; could be better I expect!
                .repeat 5, i        ; number of integer bytes
                    rol buffer + 1 + i
                .endrepeat
            .endrepeat

        check_increment_decimals:
            lda decimals
            bmi do_not_increment_decimals

        increment_decimals:
            inc decimals

        do_not_increment_decimals:

        iny
        dex
        bne for_each_char

    set_header_byte:

        lda decimals
        bmi decimals_is_negative

        decimals_is_positive:
            ; subtract 1 from Y to account for the decimal point
            dey
            jmp decimal_shift

        decimals_is_negative:
            lda #0
            sta decimals

        decimal_shift:
            lda #0
            ldx decimals
            shift_bits_for_unused_decimals:
                cpx #2
                beq after_decimal_shift

                ; TO DO - make this more efficient
                .repeat 4
                    asl A
                    .repeat 5, i            ; number of integer bytes
                        rol buffer + 1 + i
                    .endrepeat
                .endrepeat
                
                inx
                jmp shift_bits_for_unused_decimals

        after_decimal_shift:

        set_length:
            tya                     ; Y contains length
            ; subtract the number of decimal digits (0-2) from the length
            sec
            sbc decimals

        ; do not subtract sign from length
        ; if negative, length is 1 greater
        ; useful for printing, to make room for the -1
        ; sec
        ; sbc sign                ; subtract sign (1) from length
        ora #NUMERIC_TYPE
        sta buffer              ; save acc to header
    
    set_sign:
        lda sign
        sta buffer + SIGN_BYTE

    copy_bytes_to_cell:
        ldy #0

        .repeat 8
            lda buffer,Y
            sta (output_ptr),Y
            iny
        .endrepeat

    end:
        rts

.endproc

Output#

We obviously have to do the inverse when displaying the number on the screen. Unfortunately, printing to the screen is its own challenge. The solution I came up with was to abstract this into helper functions that output a given character to a given row/col screen address. If you squint hard enough, you might see object-oriented programming.

.proc setScreenAddressFromScreenLineCol
    addr = screen::addr
    line = screen::line
    col = screen::col
    line_0_l = screen_lines::line_0_l
    line_0_h = screen_lines::line_0_h
    start:
        lda line
        asl A
        tay
        lda line_0_l,Y
        sta addr
        lda line_0_h,Y
        sta addr + 1
        ldy col
    end:
        rts
.endproc

.proc setScreenChar
    addr = screen::addr
    char = screen::char
    start:
        jsr setScreenAddressFromScreenLineCol
        lda char
        sta (addr),Y
    end:
        rts
.endproc

Data storage#

I went through many iterations of the memory schema, and have several more in mind. Currently WilliCalc uses dynamic allocation, which anticipates malloc and free in C, and is therefore ahead of its time for this platform.

When you input data to a cell, quite a lot happens:

  1. If the cell already had data in it, the data is cleared.
  2. If memory had already been allocated for the cell, it is released.
  3. If the cell is a formula, we clear and release the extra memory used to store the calculated value.
  4. Memory is allocated for the cell value, which could be a string, number, or formula - these all take 8-bytes, including a 1-byte header.
  5. User input is parsed into the allocated memory location, and the header is set.
  6. If the cell is a formula, memory is also allocated for the calculated value, the pointer to which is stored in the cell value.
  7. The calculated value is then calculated and stored.
  8. Recursive formula recalculation logic is triggered to look for any cells that need to be updated.
  9. The displayed cells are refreshed.

Because all of our data units are the same length (8-bytes), we can manage all of the memory with a single variable: next_free_addr. Whenever we allocate memory, we will increment this variable until we find a blank location. And whenever we deallocate memory, we will set this to the address that was deallocated.

That means that new data will always use the lowest available memory location, which avoids leaks and also has the handy side-effect of automatically filling holes in the heap. Neat!

.proc allocate
    start:
        incr_memory_addr:
            lda memory::next_free_addr
            clc
            adc #CELL_BYTES
            sta memory::next_free_addr
            lda memory::next_free_addr + 1
            adc #0
            sta memory::next_free_addr + 1

            ; check header byte at memory address
            ldy #0
            lda (memory::next_free_addr),Y
            bne incr_memory_addr                ; if header == 0, loc is unused 

            ; need to add overflow / limit protection here

    end:
        rts

.endproc

Arithmetic#

Proper arithmetic on a system that can only natively handle positive integers up to 255 is… non-trivial.

Here’s how we achieve multiplication in three easy steps, reminiscent of how you learnt to multiply large numbers in primary school:

  1. First we need to clear the destination, and copy operand_1 to the multiplicand_buffer. For maximum flexibility, the operand and destination addresses are set by pointers in the zeropage.
.proc packedBCDMultPtrs
    operand_0 = operations::ptr_0
    operand_1 = operations::ptr_1
    destination = operations::ptr_2

    ; need to use a buffer because of lack of indirect bit shifting!
    partial_result_buffer = buffer::data_buffer_1
    multiplicand_buffer = buffer::data_buffer_2

    digit_counter = ctrl::ctrl_c
    var_a = ctrl::ctrl_d
    var_b = ctrl::ctrl_e

    error_code = operations::error_code

    BYTES = 5

    start:
        sed

    lda #0
    ldy #7 - 1                      ; clear all data bytes
    clear_destination:
        sta (destination),Y
        dey
        bpl clear_destination

    ldx #BYTES - 1
    ldy #1                          ; to skip header byte
    copy_operand_1_to_multiplicand_buffer:
        lda (operand_1),Y
        sta multiplicand_buffer,Y
        iny
        dex
        bpl copy_operand_1_to_multiplicand_buffer
  1. Separately multiply multiplicand_0 by each digit of multiplicand_1 (using repeated addition), set the order of magnitude by bitshifting, and then add it to the destination.
    lda #0
    sta digit_counter

    for_each_digit_of_operand_1:
        lda multiplicand_buffer + 1     ; to skip header
        and #%00001111                  ; this is working properly
        beq after_multiplication        ; skip to next round if 0
        sta var_a
        
        lda #0
        ldx #7
        clear_partial_result:
            sta partial_result_buffer,X
            dex
            bpl clear_partial_result

        ldx var_a

        ; while x > 0
        add_operand_0_to_partial_result:
            beq set_partial_result_order_of_magnitude
            stx var_a

            ldx #BYTES - 1 + 1      ; + 1 for .00
            ldy #1
            clc
            for_each_partial_result_byte:
                lda (operand_0),Y
                adc partial_result_buffer,Y
                sta partial_result_buffer,Y
                iny
                dex
                bpl for_each_partial_result_byte

            check_addition_overflow:
                bcs overflow_error
            
            ldx var_a
            dex
            jmp add_operand_0_to_partial_result

        set_partial_result_order_of_magnitude:
            ldx digit_counter
            multiply_by_ten:
                beq add_partial_result_to_destination
                stx var_a

                ; 4 bitshifts = 10x
                ldy #4
                rotate_partial_result:
                    sty var_b
            
                    ldy #BYTES - 1 + 1      ; + 1 for .00
                    ldx #1      ; to skip header
                    clc
                    rotate_each_partial_result_byte:
                        rol partial_result_buffer,X
                        inx
                        dey
                        bpl rotate_each_partial_result_byte

                    check_partial_result_overflow:
                        bcs overflow_error

                    ldy var_b
                    dey
                    bne rotate_partial_result

                ldx var_a
                dex
                jmp multiply_by_ten

        add_partial_result_to_destination:
            ldx #BYTES - 1 + 1      ; + 1 for .00
            ldy #1
            clc
            for_each_destination_byte:
                lda (destination),Y
                adc partial_result_buffer,Y
                sta (destination),Y
                iny
                dex
                bpl for_each_destination_byte

            check_overflow:
                bcs overflow_error

        after_multiplication:
            ldy #4
            rotate_multiplicand_buffer:
                ldx #BYTES + 1      ; + 1 for .00
                clc
                rotate_each_multiplicand_byte:
                    ror multiplicand_buffer,X
                    dex
                    bpl rotate_each_multiplicand_byte
                dey
                bne rotate_multiplicand_buffer

            increment_digit_counter:
                inc digit_counter
                lda digit_counter

                cmp #(BYTES * 2)
                
                bne for_each_digit_of_operand_1
  1. Finally, we need to divide the answer by 100 to get our decimal places, and set the sign from the operand signs.
    divide_result_by_100:
        temp = ctrl::ctrl_c
        lda #0
        sta temp

        ldy #BYTES + 1
        shift_each_byte:
            lda (destination),Y
            tax
            lda temp
            sta (destination),Y
            stx temp
            dey
            bne shift_each_byte

    set_sign:
        jsr multDivSetSign
        jmp end

    overflow_error:
        lda #OVERFLOW
        sta error_code
    
    end:
        cld
        rts
.endproc

You’ll notice that we check for overflow errors at several stages of the operation.

Division is unfortunately around twice as complicated.


Recursive recalculation#

This was one of the most satisfying things to get working. I chose to implement a breadth-first search in order to accurately and efficiently perform recursive recalculations. Here’s how it works:

  1. When you enter a value into a cell, WillCalc enters the recursive recalculation subroutine.
  2. We find all the formulae that reference the updated value and add them to a queue.
  3. We take the first formula from the queue and recalculate it.
  4. Then we find all formulae that reference the updated value, and are not already in the queue, and add them to the queue.
  5. If the queue is not empty, go back to step 3.
.proc updateFormulaeBFS

    iteration_counter = ctrl::ctrl_a

    line_counter = formula::var_a
    col_counter = formula::var_b

    curr_dependency_line = formula::curr_dependency_line
    curr_dependency_col = formula::curr_dependency_col

    line_queue = formula_queue_line
    col_queue = formula_queue_col

    ptr_l = formula::queue_l
    ptr_r = formula::queue_r

    start:
        lda #0
        sta iteration_counter

        ; iterate through entire grid
        initialise_queue:
            lda #0
            sta ptr_l
            lda #1
            sta ptr_r

            ; set queue[0] to value that was just updated
            ldx #0
            lda cell::line
            clc
            adc window::line_scroll
            sta line_queue,X

            lda cell::col
            clc
            adc window::col_scroll
            sta col_queue,X

        while_ptr_l_less_than_ptr_r:
            lda ptr_l
            cmp ptr_r
            bcs bfs_complete

            break_on_256th_iteration:
                inc iteration_counter
                beq bfs_complete

            jsr getNextDependencyInQueue

            set_line_counter:            
                lda #N_ROWS - 1
                sta line_counter
                lda #0
                sta cell::line

            for_each_page:

                check_row_for_formulae:
                    ldx cell::line
                    lda formulae_per_line,X
                    beq at_end_of_line
                
                lda #N_COLS - 1
                sta col_counter
                lda #0
                sta cell::col
                
                for_each_col:

                    check_col_for_formulae:
                        ldx cell::col
                        lda formulae_per_col,X
                        beq after_process_formula

                    ; set cell line, col
                    jsr setCellAddressFromLineCol
                    ; check if formula
                    ldy #0
                    lda (cell::heap_addr),Y
                    and #TYPE_BITS
                    CMP #FORMULA_TYPE
                    bne after_process_formula

                    cell_is_formula:
                        jsr processFormula

                    after_process_formula:
                        inc cell::col
                        dec col_counter
                        bpl for_each_col

                at_end_of_line:
                    inc cell::line
                    dec line_counter
                    bpl for_each_page

            jmp while_ptr_l_less_than_ptr_r

        bfs_complete:
            ; debugprint_imm #$90
            ; lda iteration_counter
            ; debugprint_acc

    reset_cursor_to_selected:
        lda cursor::cell::line_0
        sta cell::line
        lda cursor::cell::col_0
        sta cell::col

    end:
        rts
.endproc

Future Improvements#

Here’s what I’ve got in mind to really take WilliCalc to the next level:

  • Though cell data is dynamically allocated, the whole sheet is currently stored as an array of blank pointers. This is great for quick indexing, but means a lot of memory is wasted on blank cells. Instead I would like to use Balanced Binary Trees to store the rows, each of which is a Balanced Binary Tree of columns, each of which is dynamically allocated memory address. This is a complicated way of having my cake and eating it.
  • I would like to add support for floating point numbers. I think most of the hard work has already been done with the implementation of decimals.
  • Would be great to have save and load functionality, if only so I can easily load up test sheets.