
First give a summary of the paper consisting of 4 to 5 sentences. What problems
does the paper want to solve? How does it solve them? What are the conclusions
and contributions?

This paper presents a technique that is
more precise than existing techniques for binary diffing, i.e.,
determining the differences between two binary executables. To protect a software
program from attackers who intend to reverse engineer it, for example when
analyzing malware variants that encrypt files, a number of techniques have been
invented that focus on analyzing semantic similarity
instead of apparent, evident differences in syntax. Existing
methods model the semantics of code snippets and compare runtime behaviors
by analyzing a program to determine what inputs cause each part of the program to
execute. However, none of these techniques provides the expected amount of
precision.


The proposed method pays attention to
detail while comparing two binary executables. It computes sets of program
statements and performs symbolic evaluation, i.e., analyzing a
program to see how each part of the program behaves for a given input.
Using this method, binary diffing is improved in two ways:
first, by deciding whether the behaviors of the two binary executables are
equivalent; second, by determining the similarities or differences that affect
a straight-line code sequence (one branch for entry and one for exit).

To determine which system call arguments
match, symbolic execution and dynamic analysis are performed.
First, both programs are run with the same inputs to collect logged
traces of their system call sequences. The lists of system call
arguments are then aligned to extract the matching sequences. Once the lists are
aligned, backward slicing is performed to identify the specific instructions that
affect each argument through data and control flow. The
next step is to calculate the weakest precondition for every slice of the
program; the weakest precondition is a formula capturing the data and control
flow that affects the calculation of an argument. To avoid the
complications that cryptographic functions would cause,
the possible cryptographic functions are identified and decomposed out of the
equivalence checking algorithm. A verifier then checks whether
the two weakest preconditions are equivalent. The
last step calculates a similarity score by matching the
cryptographic functions identified in the previous step.
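The alignment step above can be sketched in code. The following is a minimal, hypothetical illustration of aligning two logged system call sequences with a longest common subsequence; the syscall names, trace format, and function name are assumptions for illustration, not BinSim's actual implementation.

```python
def align_syscalls(trace_a, trace_b):
    """Align two logged system call sequences with a longest common
    subsequence, returning the list of matched (index_a, index_b) pairs."""
    n, m = len(trace_a), len(trace_b)
    # dp[i][j] = length of the LCS of trace_a[:i] and trace_b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if trace_a[i - 1] == trace_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack through the table to recover the matched call pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if trace_a[i - 1] == trace_b[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))

# Each matched pair would then be backward-sliced and its weakest
# precondition checked for equivalence, as the text describes.
trace_a = ["NtCreateFile", "NtWriteFile", "NtClose"]
trace_b = ["NtCreateFile", "NtQueryInformationFile", "NtWriteFile", "NtClose"]
print(align_syscalls(trace_a, trace_b))  # → [(0, 0), (1, 2), (2, 3)]
```

The unmatched call at index 1 of the second trace is simply skipped, which is how extra behavior in one variant does not derail the comparison of the shared calls.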

BinSim consists of two stages: the
first is online trace logging and the second is offline
comparison. The data produced by
trace logging has three parts, including an instruction log, which contains the
operand values and the x86 opcode of each executed instruction, and a memory
log, which stores the addresses of memory accesses. To capture the trace of the
real payload, BinSim uses an unpacking module, which prevents traces from
different unpacking routines from being recorded.
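As a rough illustration of what such trace records might hold, here is a hedged sketch using Python dataclasses; the field names and layout are assumptions for illustration, not BinSim's on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class InstructionRecord:
    address: int           # program counter of the executed instruction
    opcode: str            # x86 mnemonic, e.g. "mov"
    operand_values: tuple  # concrete operand values at execution time

@dataclass
class MemoryRecord:
    address: int           # memory address accessed
    is_write: bool         # distinguishes reads from writes

@dataclass
class Trace:
    instructions: list = field(default_factory=list)
    memory: list = field(default_factory=list)

# Appending one record of each kind, as the online logger might:
log = Trace()
log.instructions.append(InstructionRecord(0x401000, "mov", (0x10,)))
log.memory.append(MemoryRecord(0x7FFE0000, is_write=False))
```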

Preprocessing, weakest precondition
calculation, and segment equivalence checking are the three components of offline
analysis. When trace data arrives, BinSim lifts the x86 instructions to
Vine IL; when performing backward slicing, Vine IL helps track
the data flow. The sequences of system calls are then aligned and the matched
calls are identified. After that, dynamic slicing is performed to backtrack the
instructions that have data dependencies. The slicing terminates when one of the
following situations occurs: the value is a constant, the value is read from
initialized data, or the value is the output of a previous system call. The
control and data dependencies are represented by a program dependence graph, on
which the dynamic slicing method depends. Compared to slicing of source
code, dynamic slicing of binaries must cope with indirect memory accesses, which
can derail the tracking of data flow and corrupt the weakest precondition
calculation. The presented solution splits the handling of control and data
dependencies into three steps: the first performs value-based slicing,
considering just the data flow; the second tracks control dependencies; and the
last removes the fake dependencies that result from an obfuscator's code
dispatcher.
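The backward slicing and its three termination conditions can be sketched as follows. This is a minimal assumption-laden model, not BinSim's algorithm: each trace entry is a dict listing the names it defines and the values it uses, and each use is tagged with a kind so that constants, initialized data, and earlier syscall outputs end a dependency chain.

```python
def backward_slice(trace, target_value):
    """Return the indices of trace entries that the target value depends
    on, walking data dependencies backwards through the trace."""
    slice_indices = []
    worklist = {target_value}          # values still to be explained
    for idx in range(len(trace) - 1, -1, -1):
        entry = trace[idx]
        if not (set(entry["defs"]) & worklist):
            continue                   # this entry defines nothing we need
        slice_indices.append(idx)
        worklist -= set(entry["defs"])
        for used in entry["uses"]:
            # Terminate a chain at constants, initialized data, or the
            # output of a previous system call (the three conditions above).
            if used["kind"] in ("const", "init_data", "syscall_out"):
                continue
            worklist.add(used["name"])
    return list(reversed(slice_indices))

# Toy trace: eax = 5; ebx = f(eax); arg0 = g(ebx)
trace = [
    {"defs": ["eax"], "uses": [{"kind": "const", "name": "0x5"}]},
    {"defs": ["ebx"], "uses": [{"kind": "reg", "name": "eax"}]},
    {"defs": ["arg0"], "uses": [{"kind": "reg", "name": "ebx"}]},
]
print(backward_slice(trace, "arg0"))  # → [0, 1, 2]
```

The chain stops at the constant in entry 0, so nothing earlier would be pulled into the slice even in a longer trace.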

To check whether two calls are semantically
equivalent, the equivalence of the weakest preconditions of their
arguments is checked. In the block-centric method the equivalence check is
limited to a single basic block, while the BinSim approach captures the logic of
the instructions across basic blocks. For better performance, a
HashMap structure is maintained to store the results of previous comparisons.
Similarity is represented by scores from 0.5 to 1.0, from which a specific
result is derived: 1.0 stands for a passed signature, i.e., the two calls
have passed the equivalence check; 0.7 indicates that the two calls belong
to the same cryptographic function; and 0.5 means that the two calls meet
neither of the above criteria.
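The scoring rule and the memoizing HashMap can be sketched like this. The predicates `wp_equivalent` and `same_crypto_function` are hypothetical stand-ins for the real solver-backed checks, supplied here as parameters for illustration.

```python
cache = {}  # memoizes previous comparisons, as the HashMap in the text does

def similarity_score(call_a, call_b, wp_equivalent, same_crypto_function):
    key = (call_a, call_b)
    if key in cache:          # reuse an earlier comparison result
        return cache[key]
    if wp_equivalent(call_a, call_b):
        score = 1.0           # passed the equivalence check
    elif same_crypto_function(call_a, call_b):
        score = 0.7           # same cryptographic function
    else:
        score = 0.5           # neither criterion met
    cache[key] = score
    return score

# Example with trivial stand-in predicates:
score = similarity_score("callA", "callB",
                         wp_equivalent=lambda a, b: False,
                         same_crypto_function=lambda a, b: True)
print(score)  # → 0.7
```

A second call with the same pair returns the cached 0.7 without re-running the predicates, which is the point of keeping the map.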

BinSim is a prototype built with the
above-mentioned methodology and tested against malware samples, including
crypto-ransomware, as well as against obfuscated binaries. The
experimental results show that BinSim succeeds in
identifying the similarities and differences between binaries that are
protected against reverse engineering, and it improves the precision of existing
binary diffing tools. To compare two binaries and detect their similarity,
symbolic evaluation and dynamic slicing are used. The evaluation shows that
BinSim is a valuable addition to the software security field. In comparison with
existing binary diffing methods, this approach finds similarities or
differences across basic blocks and checks the equivalence of the two
weakest preconditions.

For analysts investigating suspicious
executables, this method is a compelling technique. BinSim introduces System
Call Sliced Segment Equivalence Checking, a concept that slices the program,
executes the slices symbolically, and then infers whether there is any similarity.
Because it detects similarities and differences across multiple
basic blocks, it overcomes the block-centric limitation. Performing program
slicing on obfuscated executables is complicated: because
memory is accessed indirectly, the output generated by program slicing can
be contaminated. The slicing algorithm in BinSim is improved to ensure precision.

What do you like about the paper? In your opinion, what’s important about this
paper?


What don’t you like about the paper? What are the limitations or potential
problems?

While performing dynamic malware
analysis, analysts face certain limitations when using the method introduced in
this research. Path coverage may be incomplete, and the malware's dependence on
the runtime environment increases. BinSim can only detect similarities and
differences while the binaries are executing, not while they are dormant. When
analyzing similar malware variants, the results can contain alternative
signatures; this difference in the signatures of similar
malware can reduce BinSim's similarity percentage. The dynamic slicing
performed in BinSim works across basic blocks instead of using block-centric
evaluation, which can increase the number of dependencies; this can lead to far
greater memory consumption than usual and make the whole process
more complicated. In the worst case, the dependencies can spread across the
entire instruction sequence. To bypass this methodology, attackers must have
in-depth knowledge of program analysis, including the analysis of data and
control flow inside procedures, and they will need considerable effort for
customization and engineering.
Another limitation concerns the approximate matching that BinSim uses for
cryptographic functions. An attacker can defeat this matching by devising a
custom, unknown cryptographic algorithm, although that is not an easy task.
Another limitation is the use of stalling code in malware: a sample can sleep
for an arbitrary amount of time and resume afterwards, and detecting such
stalling code is not a simple task. One approach to detecting stalling code is
to run multiple instances of the malware, but there is yet another issue with
that: the stalling code can change every time the process starts, which makes
it even more difficult to detect. One more limitation is the malware's
detection of the sandbox environment. This is again a crucial issue that should
be addressed: when the malware detects the sandbox environment, its reaction
distorts the results and the system calls fail to match. To fix
sandbox detection, the first priority should be understanding the evasion
techniques used by the malware, and understanding these techniques is quite a
bulky task. A number of tools are available to detect changes in malware
behavior, but they need additional information, including the set of system
calls used to invoke the behavior of the malware, which requires a lot of
resources and time. To overcome these limitations, there are some solutions
that will reduce the chances of attackers being able to bypass the analysis.
The platform on which the analysis runs should be transparent, so that the
malware cannot perform its platform-dependent operations. Also, input
generation should be automated. Another limitation lies in the unpacking
methods, which can be bypassed by engineering new packing methods that extend
the concepts of parallel unpacking. A solution to this issue could be a whole
new architecture that allows tracing the events as they occur, taking the
functionality to a whole new level.