investigation_encoded_1
We have recovered a binary and 1 file: image01. See what you can make of it. It's also found in /problems/investigation-encoded-1_4_eb9020306ac9fe150ac9b9ca32ee1cc6 on the shell server. NOTE: The flag is not in the normal picoCTF{XXX} format.
Running file output shows that it is not an image: output: Non-ISO extended-ASCII text, with no line terminators. Its hex dump, from xxd -g 1 output:
Running the mystery binary produces ./flag.txt not found. Let's create a fake flag: echo picoCTF{fake_flag} > flag.txt. Running the binary again produces Error, I don't know why I crashed.
Decompile the binary file. The main function reads the flag, sets some global variables and calls encode().
encode() function decompiled:
encode() iterates through the flag character by character and, assuming the character is valid, converts it to lowercase. It then reads two values from a mysterious global variable, matrix. The first value looks like a starting index and the second like a length counted from that index. For end - start iterations, it reads a value from getValue() and calls save() with that value.
After all the characters in the flag have been processed (once *flag_index, the loop variable, is equal to or greater than flag_size), the function calls save(0) a number of times and returns.
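The control flow described above can be sketched in Python. This is a minimal sketch, not the binary's code. It assumes that matrix is a list of 26 (start, length) pairs, one per letter a-z, and that the bit source bit_at stands in for getValue(). Space handling is left out, and the toy values are made up.

```python
def encode_bits(flag, matrix, bit_at):
    """Sketch of encode(): for each letter, look up an assumed (start, length)
    pair and emit `length` bits starting at bit index `start`, then pad with
    zero bits to a whole byte (the trailing save(0) calls)."""
    bits = []
    for c in flag:
        if not (c.isascii() and c.isalpha()):
            # mimics the binary's crash message for a rejected character
            raise ValueError(f"Error, I don't know why I crashed ({c!r})")
        start, length = matrix[ord(c.lower()) - ord("a")]
        for i in range(start, start + length):   # the end - start iterations
            bits.append(bit_at(i))
    bits += [0] * ((-len(bits)) % 8)             # pad to a multiple of 8 bits
    return bits

toy_secret_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]     # made-up bit source
toy_matrix = [(0, 3), (3, 2)] + [(5, 4)] * 24     # made-up, 26 entries
print(encode_bits("Ab", toy_matrix, lambda i: toy_secret_bits[i]))
# [1, 0, 1, 1, 0, 0, 0, 0]
```

"A" is lowercased and emits bits 0-2 (1, 0, 1). "b" emits bits 3-4 (1, 0). Three zero bits then pad the result to one byte.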
Explanation of the *s: the expression numbers[1] can also be written as *(numbers + 1), where the * operator is said to dereference the pointer address numbers + 1. In this case, "dereference" can be read as "read the value pointed to by".
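The equivalence can be checked from Python with the standard ctypes module, which here stands in for a C array. The following is a small illustrative sketch. Note that numbers + 1 advances by one whole element (sizeof(int) bytes), not by one byte.

```python
import ctypes

# A C-style array of three ints, like `int numbers[3] = {10, 20, 30};`
numbers = (ctypes.c_int * 3)(10, 20, 30)

# numbers[1]: ordinary indexing
by_index = numbers[1]

# *(numbers + 1): start at the array's base address, step forward one element
# (sizeof(int) bytes), then read ("dereference") the int stored there
base = ctypes.addressof(numbers)
by_deref = ctypes.c_int.from_address(base + 1 * ctypes.sizeof(ctypes.c_int)).value

print(by_index, by_deref)  # 20 20
```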
isValid() function decompiled:
Valid characters are uppercase and lowercase letters, and space. The first run with the fake flag did not work because echo appends a newline character, and the flag also contained an underscore. Neither character is accepted. Let's try again without the newline and without the underscore: echo -n picoCTF{fakeflag} > flag.txt. The output is shorter than the input, and I cannot manually decode the cipher.
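The validity rule above can be written as a short Python check. This sketch follows the stated rule (ASCII letters in either case, plus space). The helper first_invalid is hypothetical and only locates the character that would make the binary crash.

```python
import string

def is_valid(c: str) -> bool:
    """Sketch of isValid(): ASCII letters (either case) and space are accepted."""
    return c in string.ascii_letters or c == " "

def first_invalid(flag: str):
    """Hypothetical helper: (index, char) of the first rejected character, or None."""
    for i, c in enumerate(flag):
        if not is_valid(c):
            return i, c
    return None

print(first_invalid("fakeflag\n"))  # (8, '\n')  <- echo's trailing newline
print(first_invalid("fake_flag"))   # (4, '_')
print(first_invalid("fake flag"))   # None
```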
getValue() decompiled:
This function performs some bit manipulation on the input using the secret global variable and returns 0 or 1, because of the bitwise AND (& 1) at the end of the return statement.
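My reading of the bit manipulation is that getValue(i) returns bit number i of secret, counting from the most significant bit of the first byte. This is an assumption, not a transcription of the decompiled code. Under that assumption, a minimal Python version is:

```python
def get_value(secret: bytes, i: int) -> int:
    """Assumed behaviour of getValue(): bit i of `secret`, MSB first."""
    byte = secret[i >> 3]       # i // 8: the byte that holds bit i
    shift = 7 - (i & 7)         # position inside that byte, counted from the MSB
    return (byte >> shift) & 1  # the final & 1 leaves only 0 or 1

toy_secret = bytes([0b10110000, 0b00000001])  # made-up, not the real secret
print([get_value(toy_secret, i) for i in range(16)])
# [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```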
save() decompiled (the value from getValue() is passed to save() in encode()):
This function builds up a byte in the buffChar global variable and writes it to the output file once enough bits have come in to form a byte. The bitwise OR and the multiplication by '\x02' effectively append the incoming bit to the buffer. For example, if 1 is the first bit input (so buffChar is 0), buffChar is set to 1 and multiplied by 2, making it 10 in binary. Next, a 1 is input, so the bitwise OR of 10 and 1 gives 11, which is then multiplied by 2 (10 in binary) to get 110.
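The buffering logic can be modelled in a few lines of Python. This sketch assumes that remain starts at 7, so the eighth bit triggers a write. A bytearray stands in for the output file, and the class name is hypothetical.

```python
class BitSaver:
    """Sketch of save() with its globals: buff_char (bit buffer) and
    remain (bits left before a full byte; starting value 7 is assumed)."""

    def __init__(self):
        self.buff_char = 0
        self.remain = 7
        self.output = bytearray()  # stands in for the `output` file

    def save(self, bit: int) -> None:
        self.buff_char |= bit      # OR the new bit into the lowest position
        if self.remain == 0:       # eighth bit arrived: write the byte, reset
            self.output.append(self.buff_char)
            self.buff_char = 0
            self.remain = 7
        else:                      # x2 shifts left to make room for the next bit
            self.buff_char *= 2
            self.remain -= 1

s = BitSaver()
for bit in [0, 1, 0, 0, 0, 0, 0, 1]:  # 0b01000001
    s.save(bit)
print(s.output)  # bytearray(b'A')
```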
This section of encode() now makes sense:
Since remain is global, both encode() and save() know about it. However many bits are still needed to complete the current byte are appended as 0 (hence the save(0) calls). This pads the output to a whole number of bytes so that the last byte is actually written to the file.
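The number of trailing save(0) calls is simple arithmetic. If n bits have been emitted, padding to the next byte boundary takes (-n) mod 8 zero bits. The helper name below is hypothetical.

```python
def pad_bits(n_bits: int) -> int:
    """Zero bits needed to round n_bits up to a whole number of bytes."""
    return (-n_bits) % 8

for n in (8, 13, 17, 24):
    print(n, "->", pad_bits(n))
# 8 -> 0
# 13 -> 3
# 17 -> 7
# 24 -> 0
```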
Get matrix and secret using radare2:
The means unsigned (no bit for negative or positive).
Find the sequences that correspond to each letter using the script:
This script is a Python version of the getValue() and encode() functions. Its encode() function contains only the parts needed to encode each lowercase ASCII letter and print the resulting encoding.
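A script of this kind might look like the following sketch. It is not the original script. It assumes that getValue() reads secret MSB first and that matrix is a list of 26 (start, length) pairs for a-z. The matrix and secret values are made up.

```python
import string

def get_value(secret: bytes, i: int) -> int:
    # assumed getValue(): bit i of secret, most significant bit first
    return (secret[i >> 3] >> (7 - (i & 7))) & 1

def build_codebook(matrix, secret):
    """Map each lowercase letter to the bit string encode() would emit for it."""
    book = {}
    for letter, (start, length) in zip(string.ascii_lowercase, matrix):
        book[letter] = "".join(str(get_value(secret, i))
                               for i in range(start, start + length))
    return book

toy_secret = bytes([0b10111000, 0b11100000])          # made-up
toy_matrix = [(0, 1), (1, 3), (4, 5)] + [(0, 2)] * 23  # made-up, 26 entries
codebook = build_codebook(toy_matrix, toy_secret)
print(codebook["a"], codebook["b"], codebook["c"])     # 1 011 10001
```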
Additionally, my script uses a modified version of the matrix variable, as shown below:
As you can see, the differences are in the final three lines. In the original one-byte-per-entry representation of matrix, all of the bytes are surrounded by 0x00 except for some in the last three rows. Without combining these bytes (non-zero bytes right next to each other, like 0x06, 0x01), the letters "y" and "z" do not produce the correct values. When changing the values, do not remove a byte; set it to 0x00 instead. Additionally, the bytes should be combined from right to left. For instance, 0x06, 0x01 becomes 0x0106, or just 0x106.
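Combining bytes right to left is exactly a little-endian read, and Python's built-in int.from_bytes does this directly. The helper name below is hypothetical.

```python
def combine_le(raw: bytes) -> int:
    """Combine bytes right to left (little-endian): 06 01 00 00 -> 0x106."""
    return int.from_bytes(raw, "little")

raw = bytes([0x06, 0x01, 0x00, 0x00])
print(list(raw))              # [6, 1, 0, 0]  <- byte-by-byte view
print(hex(combine_le(raw)))   # 0x106         <- read as one 4-byte value
```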
I'm not sure why the above is the case, but I believe it has to do with how C++ reads typed values from memory. I think that the int and long in the lines below force 4 bytes to be read.
These 4 bytes are read in reverse because the data is stored little-endian (least significant byte first). In the getValue() function, the data is read using byte and char, which can each hold only one byte, so only one byte at a time is read from secret. Python does not work the same way: indexing a list of bytes returns only the single value at that position, so multi-byte values have to be combined by hand.
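Instead of patching the byte list by hand, a raw matrix dump could be parsed as 4-byte values with the standard struct module. This is a sketch that assumes, as the int casts suggest, that every entry is a 4-byte little-endian signed int and that entries come in pairs. The helper name and toy bytes are hypothetical. The pair order simply follows the dump.

```python
import struct

def parse_matrix(raw: bytes):
    """Hypothetical helper: read consecutive 4-byte little-endian ints
    ("<i") and group them into pairs in dump order."""
    ints = [v for (v,) in struct.iter_unpack("<i", raw)]
    return list(zip(ints[0::2], ints[1::2]))

raw = bytes([0x06, 0x01, 0x00, 0x00, 0x0C, 0x00, 0x00, 0x00])  # toy bytes
print(parse_matrix(raw))  # [(262, 12)]
```

Here 262 is 0x106, the same value obtained above by combining 0x06, 0x01 by hand.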
Run the decoding script to get the flag: encodedsrdotzdkhx. This script has a copy of the dictionary mapping encoded bit sequences to ASCII letters, as produced by the letter-encoding script. It loops through the bits of the original output file, which contains the encoded flag. The hex representation can be obtained with xxd -c1 output | cut -d" " -f2 | tr -d "\n". In each iteration, the script adds the current bit to a buffer and checks whether that buffer decodes to an ASCII character using the dictionary. If not, it adds another bit; if so, the script appends that letter to the flag_decoded variable and clears the buffer.
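The greedy loop described above can be sketched as follows. It takes the hex string from the xxd pipeline and a codebook dictionary. The codebook here is a made-up, prefix-free toy, not the real one. Greedy matching is only unambiguous if no codeword is a prefix of another, and any leftover bits are assumed to be zero padding.

```python
def decode(hex_str: str, codebook: dict) -> str:
    """Grow a bit buffer until it matches a codeword, emit the letter, reset."""
    reverse = {bits: letter for letter, bits in codebook.items()}
    bits = "".join(f"{b:08b}" for b in bytes.fromhex(hex_str))
    flag_decoded, buffer = "", ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            flag_decoded += reverse[buffer]
            buffer = ""
    return flag_decoded  # bits left in `buffer` are treated as padding

toy_codebook = {"a": "10", "b": "110", "c": "111"}  # made-up, prefix-free
print(decode("dc", toy_codebook))  # bc   (0xdc = 110 111 00)
```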