Imagine you have a special machine called a "Hasher." This machine takes any kind of information you give it, like a sentence or a picture, and turns it into a secret code. The code is like a jumbled-up version of the original information, and it always has the same fixed length, let's say 10 digits.
Now, let's think about some important things this Hasher machine can do:
  1. It's like a representative: The code it generates represents the original information. Just like if you have a picture of a cat, the code represents that cat. But you can't really tell what the original information was just by looking at the code.
  2. It's hard to go backwards: If you have the code, it's very difficult to figure out the original information. It's like having a secret language that only the Hasher machine knows how to decode. So even if someone gets hold of the code, they can't easily find out what the original information was.
  3. No two things are the same: The Hasher machine makes sure that if you give it two different pieces of information, it will always produce different codes for them. It's like having two different pictures of different animals and the Hasher machine always gives you different codes for each animal.
Now, let's apply this to a real-world scenario relating to something you might get. Imagine you have a school with lots of students, like this one, and each student has a locker. The lockers are labeled with unique codes, just like the codes from the Hasher machine.
Teachers want to use the lockers to store something important, like the students' grades. But they want to keep the grades private, so they put the grades in envelopes and lock them in the lockers using the codes. The codes are generated by the Hasher machine.
Here's how the Hasher machine's properties relate to this scenario:
  1. Representing the message: The code on each locker represents the student's grades. Teachers can tell which locker belongs to which student without actually looking inside the locker.
  2. Going backwards: Even if someone tries to peek inside the lockers or tries different codes, they can't figure out what the grades are. Only the teachers, who know the original grades, can open the lockers and see the grades.
  3. No two lockers are the same: Each student has their own locker with a unique code. If you try to open one student's locker using another student's code, you won't find the correct grades. The lockers and their codes ensure that each student's grades are kept separate.
In short, a cryptographic hash function (like the Hasher machine) is a way to transform information into secret codes with specific properties. It helps protect the privacy and integrity of data, just like lockers and codes can keep students' grades secure in a school.
P.S. I am working on pictures.
US Grade 8? I would likely describe it in terms of how websites or computers store passwords safely, since by that age a lot of them will (sadly) have been on computers a hell of a lot and know the importance of a password. If I am assuming it's the USA school system, though, I am not sure what level of tech knowledge schools would teach.
I had the fortunate opportunity to teach 13-16 year olds about a cryptography challenge during an event; this is a brief rundown of what I did.
I described this mostly as:
"Hashing is a method to store strings of data confidentially and/or prove the correctness of data (integrity). This process involves a (cryptographic) technique where an input (such as a text, picture, or file) is turned into an output: a jumbled-up, random-looking string of text (called the hash). There are different hashing techniques, and each one is called an 'algorithm'.
Hashes have a fixed length, meaning if the algorithm specifies to output a 32 character string, any input that is hashed with that algorithm will output a 32 character string.
To note:
  1. The hash will always be the same if the input is the same. It is a transformation of the original information and doesn't change or randomize it every time you do it. The same input only gives a different hash if a different algorithm is used.
  2. It is not possible to work out the input from the hash alone; you can only recognize a hash if you already know (or can guess) the input that produced it. (Yes yes, rainbow tables etc., but it doesn't need to be mentioned that much)
  3. In practice, hashing will produce a different output whenever the input is different. With a good algorithm, you will not come across two different inputs with the same hash.
In a real-world example, hashing is used to store your password safely. Have you ever used POPULAR_WEBSITE? Now let's say you are registering an account on POPULAR_WEBSITE and you type in the password you want to use...
(A presentation or diagram would be good here)
When you enter that password, that website then hashes it, and saves the hash with your account on a list where everyone's accounts are stored. (If they know databases it would be better to explain that)
Whenever you want to sign in to that account, you type in your password and the website hashes what you typed in the password box, then checks whether that hash is the same as the hash saved with your account. If it matches, it signs you in, and if it doesn't, it tells you the password is wrong. This is how websites know you typed the wrong password, even if they don't know your password.
Hashing provides confidentiality and integrity by:
  • Being able to store your password without the person storing it knowing what your password actually is.
  • Being able to tell between a correct and incorrect one by the hashes generated from both values."
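For anyone who wants to show the register/sign-in flow as code, here is a minimal Python sketch. The names `accounts`, `register` and `login` are hypothetical, and the in-memory dictionary stands in for the website's account list. One deliberate deviation from the simplified description: instead of a single plain hash, the sketch uses the standard library's salted, slow PBKDF2 function, which is closer to what a careful site would do.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "account list"; a real site would use a database.
accounts = {}

def hash_password(password: str, salt: bytes) -> bytes:
    # Assumption: PBKDF2-HMAC-SHA256 with 100,000 rounds stands in for "the website hashes it".
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username: str, password: str) -> None:
    # A fresh random salt per account means equal passwords get different hashes.
    salt = os.urandom(16)
    accounts[username] = (salt, hash_password(password, salt))

def login(username: str, password: str) -> bool:
    if username not in accounts:
        return False
    salt, stored_hash = accounts[username]
    # Hash what was typed and compare it with the stored hash in constant time.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))
```

Calling `register("alice", "hunter2")` stores only a salt and a hash; `login("alice", "hunter2")` then succeeds and any other password fails, without the password itself ever being saved.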
(If they were understanding by this point, you could explain something like collisions or something like weak/well known hash digests like 'password123' to SHA or whatever, personally I'd avoid this for younger kids though.)
I'd then demonstrate a tool like CyberChef on their computer or your own, which can show them how hashing works in the real world.
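If CyberChef is not available, a few lines of Python can show the same properties. This is only a sketch: SHA-256 is my choice of algorithm and `sha256_hex` is a made-up helper name.

```python
import hashlib

def sha256_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the text and return the digest as hex characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = sha256_hex("cat")
b = sha256_hex("cat")
c = sha256_hex("cats")
big = sha256_hex("cat" * 10_000)

print(a == b)            # True: same input, same hash
print(len(a), len(big))  # 64 64: fixed length, however long the input is
print(a == c)            # False: a one-letter change gives a different hash
```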
"only the Hasher machine knows how to decode"
I would say: "not even the Hasher machine can decode"
"Now, let's apply this to a real-world scenario relating to something you might get. Imagine you have a school with lots of students, like this one, and each student has a locker. The lockers are labeled with unique codes, just like the codes from the Hasher machine.
Teachers want to use the lockers to store something important, like the students' grades. But they want to keep the grades private, so they put the grades in envelopes and lock them in the lockers using the codes. The codes are generated by the Hasher machine. Here's how the Hasher machine's properties relate to this scenario:
Representing the message: The code on each locker represents the student's grades. Teachers can tell which locker belongs to which student without actually looking inside the locker."
I don't get this example. The code on each locker represents the student's grade? Isn't the code on their locker independent from any grade they get? I assume you are talking about a code which identifies that this is their locker?
Thanks for the question; all questions help me perfect this. So:
The code is a representation of the grade. So by looking at the code the teacher knows the grade for the student.
So for example
Locker  Grade  Grade (MD5 Hash)
1       C      0b90224d10e44e42f4891b2e5a2d4d16
2       A      7fc56270e7a70fa81a5935b72eacbe29
3       B      e9d71f5ee7c92d6dc9e92ffdad17b8bd
Anyone who does not know the code just sees the hash and has no idea what is happening. The teacher, however, knows the codes and which grades they are linked to. The lockers have names on them, so the teacher can see that locker 1 (let's say John) has a grade of C.
Ah, okay, now I see.
However, two things:
  1. Any student with the same grade will have the same hash
  2. A smart child may realize they can just put every grade into the Hasher machine and compare the hashes. This assumes that the Hasher machine is publicly accessible (which should be the case if you want to have a good analogy to hash functions).
Because of this, I would say this example is not a good use case for hashing, to be honest. Non-deterministic encryption with a secret key should be used here.
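Both problems are easy to demonstrate. The sketch below assumes the locker code is simply the MD5 of the grade letter (the thread does not say exactly what was hashed) and that grades run from A to F; `hasher` and `guess_grade` are hypothetical names.

```python
import hashlib

# Assumption: the full list of possible grades is small and publicly known.
GRADES = ["A", "B", "C", "D", "E", "F"]

def hasher(text: str) -> str:
    # The publicly accessible "Hasher machine": plain MD5, no secret involved.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def guess_grade(locker_code: str):
    # Try every possible grade and compare hashes.
    for grade in GRADES:
        if hasher(grade) == locker_code:
            return grade
    return None

code_john = hasher("C")
code_jane = hasher("C")

print(code_john == code_jane)  # True: same grade, same code on both lockers
print(guess_grade(code_john))  # C: recovered in at most six tries
```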
If you have any suggestions I would appreciate another analogy
You can fix your example by using salt || MD5(grade || salt || secret) as the code on the lockers.
salt is just a random value which makes sure that every hash will be different.
secret is a secret value which only teachers know. This prevents students from just trying every grade and comparing hashes.
With || I mean concatenation.
This may be too complicated for 8th graders now though.
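A rough Python sketch of that construction, for anyone who wants to try it. The 8-byte salt length, the grade list and the function names are my assumptions. MD5 is kept only because the formula names it; in real code the standard keyed construction would be HMAC with a modern hash.

```python
import hashlib
import os
from typing import Optional

SALT_LEN = 8  # assumption: 8 random bytes of salt per locker

def locker_code(grade: str, secret: bytes, salt: Optional[bytes] = None) -> str:
    # code = salt || MD5(grade || salt || secret), written out as hex
    if salt is None:
        salt = os.urandom(SALT_LEN)
    digest = hashlib.md5(grade.encode("utf-8") + salt + secret).digest()
    return (salt + digest).hex()

def read_grade(code: str, secret: bytes,
               grades=("A", "B", "C", "D", "E", "F")) -> Optional[str]:
    # A teacher who knows the secret splits off the salt and tries each grade.
    raw = bytes.fromhex(code)
    salt, digest = raw[:SALT_LEN], raw[SALT_LEN:]
    for grade in grades:
        if hashlib.md5(grade.encode("utf-8") + salt + secret).digest() == digest:
            return grade
    return None
```

Two lockers holding the same grade now get different codes because their salts differ, and a student who does not know `secret` cannot rebuild the hash for any grade.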
I've often heard people describe hashes with the "fingerprints" metaphor and, thanks, but I hate it. Fingerprints still match if you move your finger slightly, which isn't the case for hashes. Also, fingerprinting is literally another discipline in CS.
So in case you're still working on that text, please don't build this one in :D
Any 5-year-old in 8th grade already knows this stuff
That actually made me laugh out loud
"it will always produce different codes for them."
I think you shouldn't hide the theoretical possibility of hash collisions.
The locker example is very confusing. I suggest teaching them how to do a trusted coin flip over chat. One kid chooses 0 or 1 and any salt, then hashes them together and sends the hash to the other kid. That kid chooses 0 or 1 and publishes their choice. The first kid then publishes what they hashed, and the second kid verifies the hash. Both proceed with the XOR of the two choices as the final outcome. Make them actually do it with GtkHash or something (just don't use MD5, it's not secure enough against the kids these days).
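The coin-flip steps can be sketched in Python roughly like this. SHA-256, the 16-byte salt and the function names are my assumptions; in class, each kid would of course run their own half on their own machine.

```python
import hashlib
import hmac
import secrets

def commit(choice: int, salt: bytes) -> str:
    # Hash the salt and the choice (0 or 1) together; only this hash is sent first.
    return hashlib.sha256(salt + bytes([choice])).hexdigest()

def verify(commitment: str, choice: int, salt: bytes) -> bool:
    # Recompute the hash from the revealed values and compare.
    return hmac.compare_digest(commitment, commit(choice, salt))

def coin_flip() -> int:
    # Kid 1 picks a bit and a random salt, then sends only the commitment.
    salt = secrets.token_bytes(16)
    choice1 = secrets.randbelow(2)
    commitment = commit(choice1, salt)

    # Kid 2 cannot read choice1 from the hash, so they publish their own bit.
    choice2 = secrets.randbelow(2)

    # Kid 1 reveals (choice1, salt); kid 2 checks it against the commitment.
    if not verify(commitment, choice1, salt):
        raise ValueError("revealed values do not match the commitment")

    # The XOR of both choices is the final outcome.
    return choice1 ^ choice2
```

The salt matters here: without it, kid 2 could simply hash 0 and 1 and see which one matches the commitment.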
8 year olds know basic arithmetic. You can explain the notion of multiplication being easier than factoring without getting into any discussion of field theory.
You can even time them on multiplication vs factoring to show the challenge of brute force.
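The same timing exercise can be run on a computer. In this sketch the two primes are just ones I picked (the 10,000th and 100,000th primes), and trial division stands in for "brute force"; real factoring algorithms are far smarter, but the gap is already visible.

```python
import time

def smallest_factor(n: int) -> int:
    # Brute force: try 2, then every odd number up to the square root of n.
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n  # no divisor found, so n itself is prime

# Assumption: two example primes chosen for illustration.
p, q = 104729, 1299709

start = time.perf_counter()
n = p * q  # the easy direction: one multiplication
multiply_seconds = time.perf_counter() - start

start = time.perf_counter()
found = smallest_factor(n)  # the hard direction: tens of thousands of divisions
factor_seconds = time.perf_counter() - start

print(f"multiplying took {multiply_seconds:.6f} s")
print(f"factoring took   {factor_seconds:.6f} s, found {found} x {n // found}")
```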
Wait... Grade 8 in the UK is 13/14 year olds.
Yes. I was tired and misread it. They'll definitely understand that factoring a product of two primes is a lot harder than multiplying them. I think this aspect is critical to understanding why public-private key pair cryptography works, rather than just stating "these are the pieces and what they look like in action".
"is critical understanding why public-private key pair cryptography works"
Kinda sorta... the thing is, RSA is losing ground in new designs, partly because factoring has gotten easier over the years, so RSA keys have to be very large to stay safe. A lot of it is elliptic curves nowadays, which do not require any secret prime numbers.
Ah, I guess I need to take OP's course then 😂
Seems pretty good to me, but I have a decent understanding of the concept already, so I'm probably a little biased. Keep it up though, this is good work!
I asked ChatGPT for a similar explanation using small numbers.
Imagine: A=0 B=1 C=2 D=3 ...
You hash your message by dividing the seed (3) by each letter's number and using the remainder.
Example message: CBD
C: 3 / 2 = 1 with remainder of 1
B: 3 / 1 = 3 with remainder of 0
D: 3 / 3 = 1 with remainder of 0
So the hash is 100, which is: BAA
You can verify the hash using the seed (3) and the original message. But you can't easily determine the original message from the hash. This is a bad example because real hashes collide far less often.
At this point it is explained that hash functions are much more complex but this is a very simple example.
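For completeness, a small Python sketch of the toy scheme, following the worked arithmetic (3 divided by each letter's number, remainder kept). The names are made up, and rejecting the letter A is my own assumption, since A maps to 0 and dividing by 0 is undefined.

```python
SEED = 3

def letter_value(ch: str) -> int:
    # A=0, B=1, C=2, D=3, ...
    return ord(ch) - ord("A")

def toy_hash(message: str) -> str:
    out = []
    for ch in message:
        value = letter_value(ch)
        if value == 0:
            # Assumption: the toy scheme simply cannot hash the letter A.
            raise ValueError("'A' maps to 0, and dividing by 0 is undefined")
        # Keep the remainder of SEED divided by the letter's number.
        out.append(chr(ord("A") + SEED % value))
    return "".join(out)

print(toy_hash("CBD"))  # BAA
print(toy_hash("CDD"))  # BAA: a different message with the same hash
```

The second line shows why this is only a toy: different messages collide constantly, which a real hash function is designed to make practically impossible to find.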