Every now and again, an aspiring young hacker mentions being bored and wanting a project to get on with, or wanting to learn a new programming language without knowing what to program. I've often had ideas for projects - some small, some large - so I've collected a bunch here to serve as inspiration. I'm happy to discuss any and all of these; check the contact link at the bottom for details.
Starters
- A program that solves an equation. There are simple ones like finding quadratic roots and converting units, or you could go for equations of motion or chemical reactions.
- A simple statistical system. It should maintain a table of data and provide typical statistical measures. Think about how to extend it to apply appropriate significance tests based on the type of data.
- Simple text-handling. Give statistics about a piece of text - word length frequencies, word frequencies, sentence length, use of punctuation - and perhaps some manipulation functions to extract sentences. You could even try something like translating to and from Morse code.
- Working with arbitrary data. Given a file, provide a human-readable view, perhaps 16 bytes to a row on a standard terminal or in a GUI. Give some simple converters to extract data in human-readable form, e.g. for integers and floating-point numbers.
- Graphical text symmetry. Some letters have symmetries: E, B, C have a horizontal axis, T, M, W have a vertical axis, O, H, I have multiple axes. Some letters are reflections or rotations of others, most of the time: p, b, q, d, u, n, s, z (sort of), a, e (again, sort of). Write some code that reports on the symmetries of a text string.
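To give a flavour of the "working with arbitrary data" starter, here is a minimal Python sketch of a 16-bytes-per-row view plus two converters. The function names (`hex_dump`, `read_u32_le`, `read_f64_le`) and the choice of little-endian byte order are my own assumptions for illustration, not a prescribed design.

```python
import struct

def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII view of `data`, `width` bytes per row."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(rows)

def read_u32_le(data: bytes, offset: int) -> int:
    """Interpret four bytes at `offset` as an unsigned little-endian integer."""
    return struct.unpack_from("<I", data, offset)[0]

def read_f64_le(data: bytes, offset: int) -> float:
    """Interpret eight bytes at `offset` as a little-endian IEEE 754 double."""
    return struct.unpack_from("<d", data, offset)[0]

if __name__ == "__main__":
    sample = b"Hello, world!\n" + struct.pack("<I", 1234) + struct.pack("<d", 1.5)
    print(hex_dump(sample))
    print(read_u32_le(sample, 14), read_f64_le(sample, 18))
```

A natural next step is to let the user pick the byte order and type interactively, which leads on to the structured data editor idea further down.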
Low-level
- Code that determines the limits of its own types - this may not be practical in some languages.
- A numerical library that works on characters and uses manual techniques e.g. explicit carry, long division. Optionally show the working. Try to get it to handle decimal points correctly. Bonuses for working in multiple bases.
- Generic UTF-8 conversion and character-by-character iteration.
- Doubly-linked, non-embedded linked list. Try to get the memory overhead as low as possible.
- A sparse matrix implementation.
- A representation for arbitrary mathematical graphs (nodes and arcs).
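As a starting point for the character-based numerical library, the sketch below adds two non-negative integers held as digit strings, using an explicit carry just as you would on paper. It is only a sketch under my own assumptions: the name `add_digit_strings`, the lowercase digit alphabet and the limit of base 36 are illustrative choices, and decimal points, signs and the other operations are left as the actual project.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed alphabet, up to base 36

def add_digit_strings(a: str, b: str, base: int = 10) -> str:
    """Add two non-negative integers given as digit strings, right to left."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        # Missing digits in the shorter number count as zero.
        da = DIGITS.index(a[i]) if i >= 0 else 0
        db = DIGITS.index(b[j]) if j >= 0 else 0
        if da >= base or db >= base:
            raise ValueError("digit not valid in this base")
        carry, digit = divmod(da + db + carry, base)
        result.append(DIGITS[digit])
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"

if __name__ == "__main__":
    print(add_digit_strings("999", "1"))       # decimal
    print(add_digit_strings("ff", "1", 16))    # hexadecimal
```

Showing the working could be as simple as recording the carry at each column and printing it above the sum.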
Algorithms
- Mergesort, quicksort - implement, visualise, compare.
- A comparator for lexicographical sorting of book titles.
- For recursive sorts, there is a cutoff point where iteration becomes preferable. Determine this point for random, near-sorted, near-unsorted data.
- A random text generator that is targeted at demonstrating typographical features.
- A random password generator that makes memorable passwords.
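For the book-title comparator, one possible starting point in Python is a sort key rather than a two-argument comparator, since that is the idiomatic form there. The rules below (ignore a leading English article, ignore case, accents and punctuation) are my own assumptions about what "lexicographical" should mean for titles; real library filing rules are more involved, and `title_sort_key` is a made-up name.

```python
import unicodedata

LEADING_ARTICLES = ("a", "an", "the")  # assumption: English titles only

def title_sort_key(title: str) -> tuple:
    """Build a sort key that ignores case, accents, punctuation and a leading article."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = stripped.casefold().split()
    # Skip a leading article unless it is the whole title.
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    # Keep only letters and digits within each word.
    cleaned = ("".join(c for c in w if c.isalnum()) for w in words)
    return tuple(w for w in cleaned if w)

if __name__ == "__main__":
    titles = ["The Hobbit", "A Tale of Two Cities", "Émile", "Dune"]
    for t in sorted(titles, key=title_sort_key):
        print(t)
```

Comparing word by word, rather than on the raw string, means "Dune" sorts before "Dune Messiah" regardless of what punctuation follows.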
ASCII Graphics
- Get coloured blocks on the terminal and use them to display a zoomed-in bitmap.
- Use blocks and other characters to present statistical data.
- Use blocks to display a Mandelbrot set.
- Draw an IFS (iterated function system) - extra bonus for using detail within the characters for the smaller parts of the fractal.
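The Mandelbrot idea can be roughed out in a few lines. This sketch uses a ramp of plain ASCII characters instead of coloured blocks, so it is a step short of what the suggestion asks for; the viewport, the iteration limit and the name `mandelbrot_rows` are arbitrary choices of mine.

```python
def mandelbrot_rows(width=72, height=24, max_iter=40):
    """Return the Mandelbrot set as text rows, darker characters for slower escape."""
    shades = " .:-=+*#%@"
    rows = []
    for y in range(height):
        im = -1.2 + 2.4 * y / (height - 1)
        row = []
        for x in range(width):
            re = -2.2 + 3.2 * x / (width - 1)
            c = complex(re, im)
            z = 0j
            n = 0
            # Iterate z -> z*z + c until it escapes or we give up.
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            # Scale the iteration count onto the shade ramp.
            row.append(shades[n * (len(shades) - 1) // max_iter])
        rows.append("".join(row))
    return rows

if __name__ == "__main__":
    print("\n".join(mandelbrot_rows()))
```

Swapping the shade lookup for terminal colour codes is then a small, separate change.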
System Stuff
- Visualise the computer's boot process.
- Visualise the individual packages within the filesystem. For example, all the files belonging to the Eclipse install.
- Turn the content of another suggestion from the list into a shared library with suitable language bindings.
Language Bits
- A processor that accepts only a variant of C with := instead of = and produces plain C.
- A processor that accepts a variant of C with explicit state-machine syntax of some kind and produces plain C.
- A processor that takes C that uses significant whitespace to indicate block scopes and converts it to standard C.
Graphics
- Render mathematical functions.
- Render statistical charts.
- Give a 3D visualisation of computer processes / memory usage.
Applications
- Take a basic hex-editor concept and create a structured data editor.
- A human-readable data language and a round-trip compiler to translate between binary data files and human-readable equivalents.
- A data format type system and type-checker for the human-readable data language.
- An automaton simulator.
- An address book system.
- A disk-space manager that identifies regenerable and temporary files.
- A specification language, library and visualisation for various kinds of random behaviours.
- A calculator for astronomical phenomena (sunrise, eclipse, conjunction, transit etc.)
Large Projects
- A language and system for generic access to user settings and management of settings migration.
- A system to determine, for a given setting in a given application, the corresponding configuration file entry. This could be partially automated using a testing strategy.
- A system to find all of the documentation for a given program.
- A processor that determines the quote-level in emails or usenet posts or similar systems, and converts to format=flowed.
- A generic feedback system to support user-involved bug-reporting.
- A modelling system for decision points in programs and a processor to convert between design-time, compile-time, link-time and run-time decisions.
Huge Projects
- A virtual processor simulator with pluggable and configurable behaviours and hardware device simulations all supported by formal models.
- A system that determines the syntax of a program's options.
- A system that, given the name of a tool or package and a specific version, creates a chroot jail or virtual machine that runs that particular version for debugging and development.
- A graphical depiction of the computer context (e.g. network, hosts, printers, software installations, OSs, users). Bonus for providing configuration of same through the interface.
- A 3D scenery renderer that creates random desktop backgrounds.
- A simple tool to report details of the current development status of a tool compared to the version you have installed and the behaviour you're experiencing, to help in deciding what to do about a bug/missing feature.
- A representation of typical encoding mismatches, e.g. describing a process and the encodings used for the data. The system should identify where corrective action can be taken.
- A representation of the intent of error-handling mechanisms, plus a checker that determines whether the intent is met. A simple aspect of intent is "Who is this error message for?"
- A representation of in-band control information (escape sequences) and an associated general framework for highlighting, translating, identifying encoding gaps.
- A representation of expected results from application profiling to highlight strange behaviours, e.g. "rare", "frequent", "more frequent than" and complexity with respect to input size.
- A terminal emulator that directly supports high-level concepts as exposed e.g. via ncurses, and a replacement implementation of same to drive such an emulator.