Applied Reverse Engineering (HACS408E)

Course Introduction

--

Grading

Assignment Percentage %
Homework 50%
Quizzes 10%
Participation 10%
Team Presentation 10%
Final Project 20%
Total 100%

Labs

The majority of time in class will be spent on labs. This is an applied course. The weekly homeworks (50% of your grade) will require you to demonstrate mastery of the techniques taught through the labs.

Team Presentation

In lieu of a midterm exam students will present their reverse engineering efforts against malware samples assigned to each group.

Final Project

In lieu of a final exam students will work on the NSA Codebreaker Challenge during the last two weeks of class.

Ethics

  • Do not attempt to use what you learn in this class to commit illegal acts.

  • You will learn things in this course that you potentially can use to 'steal' intellectual property, and it is not our intent that you become criminals.

  • Use your best judgement, what you choose to do with this knowledge is on you.

Legal

  • In the United States even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful as long as it has been legitimately obtained.

  • Reverse engineering of computer software in the US often falls under both contract law as a breach of contract. This is because most EULAs specifically prohibit it.

  • A person who is in legal possession of a program, is permitted to reverse-engineer and circumvent its protection if this is necessary in order to achieve "interoperability".

Questions?

Survey

Introduction to Reverse Engineering

What is Reverse Engineering

The process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object.

Scope of this class

This class is focused on the reverse engineering of software and related formats.

Linux Binary Analysis

  • ELF
  • Static/Dyanmic
  • Malware
  • Firmware

Some Windows

  • PE vs ELF
  • Calling conventions`

Class VM (Kali + Additional Tools)

Installed

  • Binaryninja 32-bit (demo)
  • IDA 7 Free (64/32)
  • Radare2 *
  • Edb - GUI for gdb
  • Ollydebug * (wine32 required)
  • Shellen (python3)
  • Readelf
  • Objdump
  • Binwalk

C Programming

General-purpose, imperative, low-level, memory management, compiled

"C was originally developed at Bell Labs by Dennis Ritchie between 1972 and 1973 to make utilities running on Unix" -Wiki

Control Statements

  • If-statements
  • While-loops
  • Do-whiles
  • For-loops
  • Switch-statements
  • Gotos
if ( 1 && !0 ){
    printf("Non-zero is true\n");
} else {
    printf("!0 is 1\n");
}
while ("Is this non-zero?") {
    printf("Who thinks yes?\n");
} 
int i; /* C99 */
for (; i < 10; i++) {
    print("What is wrong with this?\n");
}
/* What might this snipped be used for? */
switch(ast_node->type) {
    case INT:
        return atoi(ast_node->data);
    case PLUS:
        int a = process(ast_node->left);
        int b = process(ast_node->right);
        return a + b
    default:
        printf("There was an error\n");
        exit(1); /* What does exit 1 mean? */
}
C:
B:
    if (a) {
        goto A;
    }
A: 
    if (b) {
        goto B;
    }
    a = ~a;
    b = ~b;
    goto C;

Pointers & Dynamic Memory

What are pointers used for in C?

  • referencing memory on the stack or heap
  • referencing arrays
  • extra return values

What are pointers used for in Java?

  • Objects, Objects, Objects
//stack memory, how many bytes are used?
int a[10] = {42, 0}; 

//pointer to stack memory
int *pa = a;

//pointer to heap memory
int *b = malloc(sizeof(int)*10); 

//pointer to heap memory pointing to pointers,
// how many bytes are used?
int **c = malloc(sizeof(int*)*3);

c[0] = b;
c[1] = b;
c[2] = a+1;

**(c+2) = 72;

printf("What is c[2][0]? %d\n", c[2][0]);
printf("What is c[2][-1]? %d\n", c[2][-1]);

Referencing memory on the stack or heap

Arrays are contiguous blocks of memory. Static 2D Arrays are contiguous, but not Dynamic 2D Arrays

Pointers & Dynamic Memory

These data structures and how they are represented in memory are important! 

int foo(int x, int y, int *z) {
    if (x > 0 && y > 0){
        *z = x*y;
        return SUCCESS;
    }
    return FAILURE;
}

int main() {
    int a = -1;
    if (foo(1, 2, &a) == FAILURE) {
        return 1;
    }

    printf("%d\n", a);
    return a;
}

How do you return multiple items from a function?

Sometimes one return value is not enough, so pointer as arguments can help receive more data from a function. 

Pointers & Dynamic Memory

Stack and Heap

What is the Stack?

 

 

 

 

 

What is the Heap?

*if you know where to look

  • Memory used to separate function frames for local memory usage
  • Starts at a high address and grows down
  • Dynamic memory that is globally accessible*
  • Must be allocated & freed manually

C experts now?

/* 1. What is wrong here? */
int *baz(){
    int i = 10;
    return &i;
}
/* 2. Is this valid? */
int *bar() {
    static int i = 5;
    return &i;
}    
/* 3. Is this valid? */
int *foo() {
    void *yeet = malloc(sizeof(double)*10);
    return (int *)yeet;
}

/* 4. Is this valid? */
int main() {
    char *yeett = (char *)foo();
    printf(yeett);
    
    /* 5. Missing something? */
    return 0;
}

Why do we care about details like these?

What does the stack look like?

int *foo(int c, int d) {
    char e;
    void *yeet = malloc(sizeof(c)*d);
    /* Stop! */
    return (int *)yeet;
}

int main(int argc, char *argv[]) {
    int a = 5, 3;
    int b = 7;
    char *bar = foo((b,a),b);
    
    return 0;
}

High

Low

argv

argc

ret addr

old base

5

7

5

7

ret addr

old base

junk

heap addr

x86 Assembly

Assembly

Low-level programming language that is translated into the the architecture's byte-code. Here we will use the x86_64 architecture.

What is x86_64?

64-bit architecture that supports 32-bit. Used by most modern computers.

Registers x86

  • eax - accumulator
  • ebx - base
  • ecx - counter
  • edx - data
  • edi - destination
  • esi - source
  • esp - stack pointer
  • ebp - base stack     
          frame pointer 
  • eip* - instruction pointer
  • Flags - set from instructions

High speed memory used to store information temporarily

* not accessible like the other registers

The names do not matter for the use of the registers, but sometimes are hints to how they are used.

Registers x64

Same as x86 but now we have more and larger registers! 

Heres the big picture, but we don't need all these!

Floating Point Registers

Flags

And a bunch of other stuff...

Sizes

  • rax   - 64-bits, 8-bytes, quad-word (qword) 
  • eax   - 32-bits, 4-bytes, double-word (dword)
  • ax    - 16-bits, 2-bytes, word   
  • al/ah - 8-bits,  1-byte,  byte

- eax is the lower 32-bits of rax

- ax is the lower 16-bits of eax and rax

- And so on

- This is true for ebx, ecx, edx, and the numbered registers as well. 

- Not all registers have byte sized references, such as esp and ebp

Intel vs AT&T

We will focus on Intel syntax, but know that AT&T syntax exists.

 

Main difference is in the source and destination operand order

 

edi - destination
esi - source
mov edi, esi

Intel

mov %esi, %edi

AT&T

In both examples, the contents of the esi register are copied to the edi register

Instruction types

If interested in disassembly then this diagram is useful to you!

We will use libraries that handle disassembly for us, but you should be familiar with the concept.

mov & push/pop

mov eax, 0x01        ;put 1 into eax
mov [eax], 0x01      ;put 1 into the address in eax
mov eax, [esi]       ;put contents of address (esi)

push eax             ;put contents of eax on top of stack
push 0x01            ;put 1 on top of stack
                     ; and inc the stack pointer

pop eax              ;put contents top of the stack into eax,
                     ; and dec the stack pointer 

Displacement

[] indicates a access to memory*

[base + index*size + offset]
; size can only be 1,2,4,8

[arr + esi*4 + 0]     ;array of int

*does not mean the memory is actually accessed

What could the offset be used for?

lea

lea eax, ecx   ;invalid
lea eax, [ecx] ;valid, equivalent to mov eax, ecx

lea eax, [ecx + edx]   ;mov eax, ecx + edx*1 (implicit 1)
lea eax, [ecx + edx*3] ;invalid, valid numbers are 1,2,4,8

lea eax, [eax + edx*4] ;can be thought of as 
                       ; eax = (DWORD *)eax[edx] why?

Displacement

lea does not access memory with the displacement operator! It only does the pointer arithmetic with no dereference! 

Branching

jmp addr     ;addr could be a register
             ; with an address or a label

this_is_a_label:

call addr    ; functions are just labels (addresses), with a calling convention
ret          ; using the correct calling convention, 
             ;  ret returns from the called function
syscall      ; more commonly seen as 'int' for interrupt
je addr  ; or jz  -- if zero flag is set
jg addr  ; or ja  -- if greater - signed or unsigned 
jl addr  ; or jb  -- if less    - signed or unsigned
jge addr ;        -- if greater or equal to
jle addr ;        -- if less or equal to
js addr  ;        -- if sign bit is set (if negative)

Conditional branching

Flags

carry    -- used to indicate carry in arithmetic operation                    
zero     -- if a value is zero or comparison equals 0
sign     -- if negative
overflow -- if overflow occurred

Each flag is set from certain instructions

int *foo(c,d) {
    char e;
    void *yeet = malloc(sizeof(c)*d);
    /* Stop! */
    return (int *)yeet;
}

int main(int argc, char *argv[]) {
    int a = 5, 3;
    int b = 7;
    char *bar = foo((b,a),b);
    
    return 0;
}
foo:
    push ebp
    mov ebp, esp

    sub esp, 8            ;make room
    mov ecx, [ebp + 4]    ;get c
    mov edx, [ebp + 8]    ;get d
    
    mov eax, 4     ;sizeof(int)    
    mul edx        ;sizeof(int)*d
    
    push eax       ;arg to malloc
    call malloc    
    add esp, 4     ;clean up arg
    mov [esp], eax ;store in yeet
    
    add esp, 8     ;clean up locals
    pop ebp
    ret

main:
    push ebp
    mov ebp, esp
    
    push 5       ;a
    push 7       ;b
    sub esp, 4   ;bar
    
    mov eax, [esp]      ;get b
    mov ebx, [esp + 4]  ;get a

    push ebx         ;d
    push eax         ;c
    call foo
    add esp, 8       ;clean up args  
    mov [esp], eax   ;store in bar

    add esp, 12      ;clean up locals
    mov eax, 0       ;return 0
    pop ebp
    ret