investigation_encoded_1
Problem
We have recovered a binary and 1 file: image01. See what you can make of it. It's also found in /problems/investigation-encoded-1_4_eb9020306ac9fe150ac9b9ca32ee1cc6 on the shell server. NOTE: The flag is not in the normal picoCTF{XXX} format.
Solution
Running the command file output shows that it is not an image:

output: Non-ISO extended-ASCII text, with no line terminators

The raw bytes can be inspected with xxd -g 1 output.

Running the mystery binary produces:

./flag.txt not found

Let's create a fake flag:

echo picoCTF{fake_flag} > flag.txt

Running the binary again produces:

Error, I don't know why I crashed

Decompile the binary file using Ghidra (cheat sheet).
It reads the flag, sets some global variables and calls encode().
encode() iterates through the flag character by character and, assuming the character is valid, converts it to lowercase. It then reads two values from a mysterious global matrix variable. The first looks like a starting index, and the second looks like a length counted from that starting index. For end - start iterations (one per index in that range), it reads a value from getValue() and calls save() with that value. After all the characters in the flag have been processed (once *flag_index, the loop variable, is equal to or greater than flag_size), the function calls save(0) a number of times and returns.

A note on the * symbols in the decompiled code: numbers[1] can also be written as *(numbers + 1), where the * operator dereferences the pointer address numbers + 1. In this context, "dereference" can be read as "read the value pointed to by". (source)

isValid() accepts only uppercase letters, lowercase letters, and space. The first run with the fake flag failed because echo appends a newline character, which is not accepted. Let's try without the newline (and without the underscore):

echo -n picoCTF{fakeflag} > flag.txt

Note that { and } are not in the accepted set either. The output is shorter than the input, and I could not decode the cipher by hand.

getValue() performs some bit manipulation on its input using the secret global variable and returns 0 or 1, because the return expression ends with a bitwise AND (& 1).

save() is called from encode() with the value returned by getValue(). It builds up a byte in the buffChar global variable and writes it to the output file once enough bits have arrived to form a full byte. The bitwise OR and the multiplication by '\x02' effectively append each incoming bit to the buffer: the OR sets the lowest bit, and multiplying by 2 shifts the buffer one place to the left. For example, if 1 is the first bit input (so buffChar is 0), buffChar becomes 1 and is then multiplied by 2, making it 10 in binary. If the next bit is also 1, the OR of 10 and 1 gives 11, which is multiplied by 2 (10 in binary) to get 110.

The save(0) loop at the end of encode() now makes sense. Since remain is global, both encode() and save() can read it, and save() uses it to track how far the current byte is from complete. However many bits are needed to reach the next byte boundary are appended as 0 (hence the save(0) calls). This pads the output to a whole number of bytes so that the last byte is actually written out.

Find the bit sequences that correspond to each letter using the decode.py script.
decode.py is a Python version of the getValue() and encode() functions. Its encode() contains only the components needed to encode each lowercase ASCII letter and then print the resulting encoding.

decode.py also uses a modified version of the matrix variable. The differences are in its final three lines. In the original one-byte-per-element view of matrix, every byte is surrounded by 0x00 except for some in the last 3 rows. Unless those adjacent non-zero bytes (such as 0x06, 0x01) are combined, the letters "y" and "z" do not produce the correct values. When changing the values, do not remove a byte. Set it to 0x00 instead. The bytes must also be combined from right to left, so 0x06, 0x01 becomes 0x0106, or just 0x106.

I believe this comes down to how C reads typed data. In the decompiled encode(), matrix is read through int casts, so each read pulls in 4 bytes at once. Because the machine is little-endian, those 4 bytes are interpreted least significant byte first. getValue(), by contrast, reads from secret using byte and char values, which hold only one byte each, so only one byte is read at a time. Python does not work this way: indexing a list returns only the value at that position, nothing more.

Run solve.py to get:

Flag: encodedsrdotzdkhx

This script contains a copy of the dictionary, produced by decode.py, that maps encoded bit sequences to ASCII characters. It loops through the bits of the original output file, which contains the encoded flag. The file's hex representation can be obtained with:

xxd -c1 output | cut -d" " -f2 | tr -d "\n"

On each iteration, the script adds the current bit to a buffer and checks whether the buffer decodes to an ASCII character using the dictionary. If not, it adds another bit. If so, it appends that letter to the flag_decoded variable and clears the buffer.
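As a rough illustration of how the decode.py and solve.py steps fit together, here is a self-contained Python sketch. It is not the author's code. All names in it are hypothetical. SECRET and MATRIX_RAW are made-up toy values covering three letters; the real secret and matrix have to be copied out of the binary. Several details are also assumptions: the (start, length) order inside each 8-byte matrix entry, the most-significant-bit-first order of getValue(), and the slot given to space. The sketch parses matrix entries as little-endian 4-byte ints and rebuilds the letter-to-bits table. It then packs bits the way save() does and decodes greedily in the way described for solve.py.

```python
import struct

# Toy values for illustration only (NOT the real ones): the real secret and
# matrix must be copied out of the binary, e.g. from Ghidra.
SECRET = bytes([0xB7])  # bits 1011 0111
# Assumed layout: one 8-byte entry per letter holding two little-endian
# 4-byte ints, (start, length). Toy entries: a=(0, 2), b=(2, 3), c=(5, 3).
MATRIX_RAW = struct.pack("<6i", 0, 2, 2, 3, 5, 3)


def parse_matrix(raw: bytes) -> list[tuple[int, int]]:
    """Read each 8-byte entry as two little-endian 4-byte ints."""
    return [struct.unpack_from("<ii", raw, off) for off in range(0, len(raw) - 7, 8)]


def get_value(secret: bytes, i: int) -> int:
    """Bit i of secret, most significant bit first (assumed bit order)."""
    return (secret[i >> 3] >> (7 - (i & 7))) & 1


def is_valid(ch: str) -> bool:
    """Letters and space, as described for isValid()."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z") or ch == " "


def letter_index(ch: str) -> int:
    # Assumption: space gets its own entry right after 'z'.
    return 26 if ch == " " else ord(ch) - ord("a")


def pack_bits(bits: list[int]) -> bytes:
    """Mimic save(): OR each bit in, shift left, emit a byte every 8 bits."""
    out = bytearray()
    buff, remain = 0, 7

    def save(bit: int) -> None:
        nonlocal buff, remain
        buff |= bit
        if remain == 0:
            out.append(buff)
            buff, remain = 0, 7
        else:
            buff *= 2
            remain -= 1

    for bit in bits:
        save(bit)
    while remain != 7:  # like the save(0) padding loop at the end of encode()
        save(0)
    return bytes(out)


def encode(flag: str, matrix, secret: bytes) -> bytes:
    bits = []
    for ch in flag:
        if not is_valid(ch):
            raise ValueError(f"invalid character {ch!r}")
        start, length = matrix[letter_index(ch.lower())]
        bits.extend(get_value(secret, i) for i in range(start, start + length))
    return pack_bits(bits)


def build_codebook(matrix, secret: bytes) -> dict[str, str]:
    """Map each entry's bit string back to its letter (the decode.py step)."""
    book = {}
    for idx, (start, length) in enumerate(matrix):
        letter = " " if idx == 26 else chr(ord("a") + idx)
        code = "".join(str(get_value(secret, i)) for i in range(start, start + length))
        book[code] = letter
    return book


def decode(data: bytes, codebook: dict[str, str]) -> str:
    """Greedy decoding: grow a buffer bit by bit until it matches a code."""
    decoded, buf = [], ""
    for byte in data:
        for k in range(7, -1, -1):
            buf += str((byte >> k) & 1)
            if buf in codebook:
                decoded.append(codebook[buf])
                buf = ""
    return "".join(decoded)  # leftover padding bits are ignored


if __name__ == "__main__":
    matrix = parse_matrix(MATRIX_RAW)
    book = build_codebook(matrix, SECRET)
    encoded = encode("cab", matrix, SECRET)
    print(encoded.hex(), decode(encoded, book))  # prints: f6 cab
```

parse_matrix(bytes([0x06, 0x01, 0, 0, 0x02, 0, 0, 0])) returns [(262, 2)], and 262 is 0x106. This is the same right-to-left combination described for the matrix above. Greedy decoding only works when no code in the table is a prefix of another. The toy codes 10, 110 and 111 satisfy that by construction.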
Flag
encodedsrdotzdkhx