News | Recommended Books | Recommended Links | Compilation | Decompilation | Decompilers |
Disassemblers | Legal Aspects | C | Java | Clipper and Foxpro | Etc |
Decompilation is actually is pretty underdeveloped area, but many methods developed for compilation and especially for the optimization of object code are directly applicable to the problem.
Students that wish to study this area are strongly advised to learn basic theory of compilation and, especially, classic code optimization methods. Generally one needs to augment pattern-based constructs recovery with control flow and data flow analysis.
The control flow analysis is much better suited for recovery of control structures than pure pattern matching.
Data flow analysis makes it possible to detect major structures and even somewhat understand their usage.
Another way to extract useful information is slicing. Peephole optimization can used as decompilation method as well.
Dr. Nikolai Bezroukov
freshmeat.net Project details for uncc
uncc is a C decompiler which helps reverse engineers and programmers improve their understanding of assembly code.
Homepage:
http://freshmeat.net/redir/uncc/41369/url_homepage/www.uncc.info
Tar/GZ:
http://freshmeat.net/redir/uncc/41369/url_tgz/uncc-0.1.2.1.tar.gz
Pyreverse v. 0.2
About: pyreverse is a set of tools for reverse engineering Python code. It features dependency analysis tools, documentation generation, and XMI generation for importation in a UML modeling tool. A special module can be used to generate files readable by Argo UML.
Changes: Support was added for links between classes, exceptions raised in functions, and package diagrams. Some documentation with a full example was added.
[Dec 28, 2001] Decompilation page and link to a decompiler by Satish Kumar. Contains a beta version of DisC - Decompiler for TurboC and a small intro to the problem of decompilation using Intel assembler fragments of small C programs as an example. Compare to Decompilation of Binary Programs - dcc
[Dec 18, 2000] Clipper and FoxPro decompilation
[Oct 10, 2000] Filter Factory Plug-in Decompiler This little 16bit DOS program generates a ".afs" of any ".8bf" file compiled by the Filter Factory of Adobe Photoshop (PC version only).
[Sept 15, 2000] Java decompilers
C. Cifuentes and K. Gough. Decompilation of binary programs. Software -- Practice and Experience, 25(7):811--829, July 1995.
C. Cifuentes, Interprocedural data-flow decompilation.
Journal of Programming Languages Vol 4 No 2 (1996) pp 77--99
Of Graphs and Coffi
Grounds: Decompiling Java - Patrick Lam (1998)
......expressions, must be combined with the control-flow restructuring. Finally, phase III of decompilation must be implemented; we must eliminate goto for arbitrary control-flow graphs. This will probably be done using work previously done with the Compilers and Concurrency group, described in [Ero95] [EH94]. 6 Summary This paper discusses techniques used in the control-flow structuring phase of the Sculpt decompiler. In particular, the three main phases of control-flow structuring were described. Stage I recognized simple structures in the control-flow graph which could be locally reduced; among ......
[EH94] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the 1994 International Conference on Computer Languages, pages 229--240, May 1994.
Making Graphs Reducible with Controlled Node Splitting - Johan Janssen (1997) H (Correct)
......control flow graphs is the fact that data flow analysis, that is an essential part of any compiler, can be done more efficiently [3]. Related work: The problem of converting irreducible flow graphs to reducible flow graphs can be tackled at the front-end or at the back-end of the compiler. In [4] and [5] methods for normalizing the control flow graph of a program at the front-end are given. These methods rewrite an intermediate program in a normalized form. During normalization irreducible flow graphs are converted to reducible ones. To make a graph reducible, code has to be duplicated, ......
[4] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the 1994 International Conference on Computer Languages, pages 229--240, Toulouse, France, May 1994.
GOTO Removal Based On Regular Expressions - Paul Morris Ronald
......our GOTO removal program served as a bridge to extend the range of the software. We have also applied this work to the process of "levitating" IBM 370 assembly language code to higher-level forms (Morris et al., 1996). Previous work (Ashcroft et al., 1972, Peterson et al., 1973, Ramshaw, 1988, Erosa et al., 1994), while highly developed and abstract in nature, treats GOTO removal as an isolated problem in the area of programming languages. This paper is based on the observation that GOTO removal for flowcharts resembles the problem of converting finite-state transition networks to regular expressions, and ......
......preferred over maintaining the existing structure. The algorithm presented here is applicable to nonreducible programs, and meets the lesser path-equivalence standard. Subjectively, this standard appears to produce more readable results than approaches that introduce auxiliary variables, such as (Erosa et al., 1994), and is easier to relate to the source presented for restructuring. These considerations are important for reverse-engineering. While the algorithm discussed in this paper has worked well in practical contexts, we feel its most significant feature is the link to important concepts in computer ......
Erosa, A. M. and Hendron, L. J. (1994). `Taming Control Flow: A structured Approach to Eliminating Goto Statements,' Proc. IEEE International Conference on Computer Languages, 229--240.
freshmeat.net Decompilers and Disassemblers -- 11 entries as of 01/14/2002
Decompiler - Room 42 Computers & The Internet Resource Center
Java Code Engineering & Reverse Engineering - Miscellaneous - Meurrens 980131
Pin-Outs.Com Programming Languages Java Decompilers_and_Disassemblers
Java Virtual Machine (JVM) and Bytecodes Web Resources
Open Directory:
Cracking The Art of Reverse Engineering
The dcc decompiler decompiles .exe files from the (i386, DOS) platform to C programs. The final C program contains assembler code for any subroutines that are not possible to be decompiled at a higher level than assembler.
The analysis performed by dcc is based on traditional compiler optimization techniques and graph theory. The former is capable of eliminating registers and intermediate instructions to reconstruct high-level statements; the later is capable of determining the control structures in each subroutine.
Please note that at present, only C source is produced; dcc cannot (as yet) produce C++ source.
The structure of a decompiler resembles that of a compiler: a front-, middle-, and back-end which perform separate tasks. The front-end is a machine-language dependent module that reads in machine code for a particular machine and transforms it into an intermediate, machine-independent representation of the program. The middle-end (aka the Universal Decompiling Machine or UDM) is a machine and language independent module that performs the core of the decompiling analysis: data flow and control flow analysis. Finally, the back-end is high-level language dependent and generates code for the program (C in the case of dcc).
In practice, several programs are used with the decompiler to create the high-level program. These programs aid in the detection of compiler and library signatures, hence augmenting the readability of programs and eliminating compiler start-up and library routines from the decompilation analysis.
SAS2000: Efficient Inference of Static Types for Java Bytecode | back |
Authors: Etienne Gagnon, Laurie Hendren and Guillaume Marceau
Date: January 2000Abstract
Even though Java bytecode has a significant amount of type information embedded in it, there are no explicit types for local variables. However, knowing types for local variables is very useful for both program optimization and decompilation. In this paper, we present an efficient and practical algorithm for inferring static types for local variables in a 3-address, stackless, representation of Java bytecode.By decoupling the type inference problem from the low level bytecode representation, and abstracting it into a constraint system, we show that there exists verifiable bytecode that cannot be statically typed. Further, we show that, without transforming the program, the static typing problem is NP-hard. In order to develop a practical approach we have developed an algorithm that works efficiently for the usual cases and then applies efficient program transformations to simplify the hard cases.
Our solution is an multi-stage algorithm. In the first stage, we propose an efficient algorithm that infers static types for most bytecode found in practice. In case this stage fails, the second stage is applied. It consists of a simple and efficient variable splitting operation that renders most bytecode typeable using the algorithm of stage one. Finally, for completeness of the algorithm, we present a final stage that efficiently transforms and infers types for all remaining bytecode (such bytecode is likely to be a contrived example, and not code produced from a compiler).
We have implemented this algorithm in the Soot framework. Our experimental results show that all of the 17,000 methods used in our tests were successfully typed, 99.8% of those required only the first stage, 0.2% required the second stage, and no methods required the third stage.
Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?) - Todd A. Proebsting, Scott A..
Abstract: This paper presents our technique for
automatically decompiling Java bytecode into Java source. Our technique
reconstructs source-level expressions from bytecode, and reconstructs
readable, high-level control statements from primitive goto- like branches.
Fewer than a dozen simple coderewriting rules reconstruct the high-level
statements. 1 Introduction Decompilation transforms a low-level language into
a high-level language. The Java Virtual Machine (JVM) specifies a low-level
bytecode language for a stack-based machine [LY97]. This language defines 203
operators, with most of the control flow specified by simple explicit
transfers and labels. Compiling a Java class yields a class file that contains
type information and bytecode. The JVM requires a significant amount of
type... (Correct
Abstract)
......since the output of the decompiler should be as readable as possible.
Her technique structures old FORTRAN programs for readability. As a result,
her technique may leave some goto's in the resulting programs, which is not
allowed in Java. Other techniques for eliminating goto's have been proposed
[EH94, Amm92, AKPW83, AM75]. These
techniques may change the structure of the program, and may add condition
variables, or create subroutines. 8 Conclusion In this paper, we present a
technique for decompiling Java bytecode into Java source. Our decompiler,
Krakatoa, produces syntactically legal Java source from legal,
......
[EH94] Ana M. Erosa and Laurie J. Hendren.
Taming control flow: A structured approach to eliminating goto
statements. pages 229--240. International Conference on Computer
Languages, May 1994.
Ir. Marc W.F. Meurrens: "Java Bytecode Engineering and Reverse Engineering" -- very interesting page [Link updated Jan 16, 2000]
Structuring Decompiled Graphs - Cifuentes (ResearchIndex)
Details Context 1.
F.E. Allen. Control flow analysis. SIGPLAN Notices, 5(7):1--19, July
1970.
Details Context 2.
F.E. Allen. A basis for program optimization. In Proc. IFIP Congress,
pages 385--390, Amsterdam, Holland, 1972. North-Holland Pub.Co.
Details Context 3.
F.E. Allen. Interprocedural data flow analysis. In Proc. IFIP Congress,
pages 398--402, Amsterdam, Holland, 1974. North-Holland Pub.Co.
Details Context 4.
F.E. Allen and J. Cocke. Graph theoretic constructs for program control flow
analysis. Technical Report RC 3923 (No. 17789), IBM, Thomas J. Watson
Research Center, Yorktown Heights, New York, July 1972.
Details Context 5.
F.E. Allen and J. Cocke. A program data flow analysis procedure.
Communications of the ACM, 19(3):137--147, March 1976.
Details Context 6.
Z. Ammarguellat. A control-flow normalization algorithm and its
complexity. IEEE Transactions on Software Engineering, 18(3):237--251, March
1992.
Details Context 7.
B.S. Baker. An algorithm for structuring flowgraphs. Journal of the ACM,
24(1):98--120, January 1977.
Details
Context
8. C. Bohm and G. Jacopini. Flow diagrams, Turing machines and languages with
only two formation rules. Communications of the ACM, 9(5):366--371, May
1966.
Details
Context
9. C. Cifuentes. Reverse Compilation Techniques. PhD dissertation,
Queensland University of Technology, School of Computing Science, July
1994.
Details
Context
10. C. Cifuentes. Interprocedural dataflow decompilation. In print:
Journal of Programming Languages, 1996.
Details Context 11.
C. Cifuentes and K.J. Gough. Decompilation of binary programs. Software --
Practice and Experience, 25(7):811--829, July 1995.
Details Context 12.
J. Cocke. Global common subexpression elimination. SIGPLAN Notices,
5(7):20-- 25, July 1970.
Details Context 13.
A.M. Erosa and L.J. Hendren. Taming control flow: A structured approach to
eliminating goto statements. In Proceedings of the International Conference
on Computer Languages, Universit'e Paul Sabatier, Toulouse, France, May 1994.
IEEE Computer Society.
Details Context 14.
M.S. Hecht. Flow Analysis of Computer Programs. Elsevier North-Holland,
Inc, 52 Vanderbilt Avenue, New York, New York 10017, 1977.
Details Context 15.
B.C. Housel. A Study of Decompiling Machine Languages into High-Level Machine
Independent Languages. PhD dissertation, Purdue University, Computer
Science, August 1973.
Details Context 16.
G.L. Steele Jr. and G.J. Sussman. Design of a LISP-based microprocessor.
Communications of the ACM, 23(11):628--645, November 1980.
Details Context 17.
D.E. Knuth and R.W. Floyd. Notes on avoiding go to statements.
Information Processing Letters, 1(1):23--31, 1971.
Details Context 18.
S.R. Kosaraju. Analysis of structured programs. Journal of Computer and
System Sciences, 9(3):232--255, 1974.
Details Context 19.
U. Lichtblau. Decompilation of control structures by means of graph
transformations. In Proceedings of the International Joint Conference on
Theory and Practice of Software Development (TAPSOFT), Berlin,
1985.
Details Context 20.
U. Lichtblau. Recognizing rooted context-free flowgraph languages in
polynomial time. In G. Rozenberg H. Ehrig, H.J. Kreowski, editor, Graph
Grammars and their application to Computer Science, number 532 in Lecture Notes
in Computer Science, pages 538--548. Springer-Verlag, 1991.
Details
Context
21. J. McCarthy. Recursive functions of symbolic expressions and their
computation by machine, part I. Communications of the ACM, 3(4):184--195,
April 1960.
Details Context 22.
G. Oulsnam. Unravelling unstructured programs. The Computer Journal,
25(3):379--387, 1982.
Details Context 23.
D.J. Pavey and L.A. Winsborrow. Demonstrating equivalence of source code and
PROM contents. The Computer Language, 36(7):654--667, 1993.
Details Context 24.
L. Ramshaw. Eliminating go to's while preserving program structure.
Journal of the ACM, 35(4):893--920, October 1988.
Details Context 25.
M. Sharir. Structural analysis: A new approach to flow analysis in optimizing
compilers. Computer Languages, 5:141--153, 1980.
Details Context 26.
R.L. Sites, A. Chernoff, M.B. Kirk, M.P. Marks, and S.G. Robinson. Binary
translation. Communications of the ACM, 36(2):69--81, February
1993.
Details Context 27.
M.H. Williams. Generating structured flow diagrams: the nature of
unstructuredness. The Computer Journal, 20(1):45--50, 1977.
Details Context 28.
M.H. Williams and G.Chen. Restructuring Pascal programs containing goto
statements. The Computer Journal, 28(2):134--137, 1985.
Details Context 29.
M.H. Williams and H.L. Ossher. Conversion of unstructured flow diagrams to
structured form. The Computer Journal, 21(2):161--167, 1978. This article
was processed using the L A T E X macro package with LLNCS style
Details Context
[AKPW83] J.R. Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. Conver-
sion of control dependence to data dependence. pages 177--189,
1983.
Details Context
[AM75] E. Ashcroft and Z. Manna. Translating programs schemas to
while-schemas. SIAM Journal of Computing, 4(2):125-- 146, 1975.
Details Context
[Amm92] Zahira Ammarguellat. A control-flow normalization algorithm and its
complexity. IEEE Transactions on Software Engineering, 18(2):237--250,
1992.
Details Context
[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles,
Techniques, and Tools. Addison-Wesley, Reading, Massachusetts,
1986.
Details Context
[Bak77] Brenda S. Baker. An algorithm for structuring flowgraphs. Journal
of the Association for Computing Machinery, 24(1):98--120, January
1977.
Details
Context
[Cif93] Cristina Cifuentes. A structuring algorithm for decompilation. In
Proceedings of the XIX Conferencia Latinoamericana de Informatica, pages
267--276, Buenos Aires, Argentina, August 1993.
Details Context
[EH94] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured
approach to eliminating goto statements. pages 229--240. International
Conference on Computer Languages, May 1994.
Details Context
[LY97] Tim Lindholm and Frank Yellin. The Java Virtual Machine
Specification. The Java Series. Addison-Wesley, 1997.
Details Context
[PKT73] W.W. Peterson, T. Kasami, and N. Tokura. On the capabilities of while,
repeat and exit statements. Communications of the ACM, 16(8):503--512,
1973.
Details Context
[Ram88] Lyle Ramshaw. Eliminating go to's while preserving program
structure. Journal of the Association for Computing Machinery,
35(4):893--920, October 1988.
Details
Context
[Sri96] KB Sriram. Hashjava. url: http://www.sbktech.org/hashjava.html,
1996.
Details Context
[vV96] Hanpeter van Vliet. Mocha. current url:
http://www.brouhaha.com/~eric/ computers/mocha-b1.zip, 1996.
Details Context
[Wil77] M.H. Williams. Generating structured flow diagrams: The nature of
unstructuredness. Computer Journal, 20(1):45--50, 1977.
Details Context
[Wil97] U. G. Wilhelm. Cryptographically protected objects, May 1997. A
french version appeared in the Proceedings of RenPar'9, Lausanne, CH.
http://lsewww.epfl.ch/~wilhelm/ CryPO.html.
Details Context
[WO78] M.H. Williams and H.L. Ossher. Conversion of unstructured flow diagrams
to structured. Computer Journal, 21(2):161--167, 1978. A Additional Rewriting
Rules We anticipate using a few other tree rewriting rules that might improve
readability of our code. The anticipated rules build more natural
for-loops.