# `HeadersAnalyzer`

This tool runs at compile time and generates binaries that our emulator uses at run time.

> Note that this documentation might be outdated.

## How does it work?

There are two types of functions: exported functions and callbacks.

By exported functions, we mean functions exported from iOS `.dylib`s, i.e., functions that emulated apps can call. For those functions, we generate wrappers that take an emulation context (i.e., registers and stack of the virtual machine, e.g., `uc_engine *`) and a function pointer, extract the function's arguments from the context, and call the function.

By callbacks, we mean functions that exist in the emulated code and that our compiled code can call (e.g., when some event occurs). We manually identify those callbacks in our source code and wrap each call to them in a helper function or macro that, instead of calling the function directly, checks whether the function is inside the emulated app. If it isn't, it is called normally. If it is, however, the helper calls a wrapper that we generate at compile time. This wrapper takes the function's arguments and the emulation context, copies the arguments into registers and stack of the virtual machine, and calls the function inside the VM.

Here's how our compile-time code generation utility works:

1. It analyzes TBD files of iOS `.dylib`s to find exported functions.
2. Via Clang, it analyzes iOS header files (from `/deps/apple-headers/iPhoneOS11.1.sdk/`) to find those exported functions' signatures. It gets debugging information for those functions as if we were inside them when they are called (this will be useful when calling from emulated to compiled code). It also gets debugging information for callback functions, this time the other way around, i.e., as if we were outside of them when they are called (this will be useful when calling from compiled to emulated code).
3. It analyzes our compiled `.dll`s and their `.pdb`s to find exported functions and their signatures, respectively.
   It maps iOS functions to our functions and verifies that they have compatible signatures.
4. It also generates a map that is used at run time to quickly map function names imported in the emulated app to function addresses exported from our compiled `.dll`s (on the `.dll` side, this mapping uses ordinals rather than names, for better performance).
5. Via LLDB, it generates the wrappers mentioned above. It feeds function signatures to LLDB and asks it where the arguments are located in registers and on the stack. It then puts the arguments there or extracts them from there (when generating wrappers from compiled to emulated code, or from emulated to compiled code, respectively).
6. We also parse `WinObjC` headers (i.e., those used to produce our `.dll`s) to find which functions are implemented and which are not. This is easy to determine because unimplemented functions are marked as deprecated with `__attribute((deprecated))`. They also have documentation comments with more details, which we parse, too.

### Analyzing TBD files

This is implemented with the help of Apple's TAPI library, which can read TBD files. See [our documentation about it](../../docs/tapi.md) for details.

Note that our goal is to later generate `.dylib`s that look exactly as described in the TBD files from the iOS SDK. To be able to do that, we are interested in several things in the TBD files. First is the list of exported symbols, of course. We simply export those symbols from our wrapper `.dylib` (see below). Second is the list of re-exported symbols. Again, we simply re-export those symbols the same way the real `.dylib` would (and our dynamic loader handles them the same way `dyld` would). Third is the list of Objective-C classes. Not only do we export symbols for them (`OBJC_CLASS_$_` and `OBJC_METACLASS_$_`), but we also note them so that we can later generate wrappers for all of their methods.

### Analyzing iOS headers

> Note that a better approach would be to analyze debugging symbols of iOS `.dylib`s.
> That way, we would get all the necessary type information and could be 100% sure that it is correct (i.e., that it matches the `.dylib`s).
> When analyzing headers, we might get wrong information if we don't configure Clang exactly the same way Apple did when building the `.dylib`s.
> Unfortunately, it seems that there are no debugging symbols available for Apple's `.dylib`s.

#### Clang command line

To analyze headers, we run Clang on iOS headers as if we were building some iOS `.dylib`. In fact, we already know how to do this: we ported `libobjc.A.dylib`. Of course, we built it for the Windows platform, but the sources we based our build commands on were targeted at macOS/iOS. So, here we can base our Clang command line on those sources (it probably doesn't differ that much; it's mainly about the `-target` option). See `analyze_ios_headers.txt` for the command-line arguments.

#### Possible implementations

We need LLVM IR and debugging information for all the functions, so that we can later figure out how and where their arguments lie in memory and registers. To do that, we execute `EmitLLVMAction` along with the `-g` option (to emit debug info). That won't preserve undefined functions, though (i.e., functions that only have declarations, no body). Since we analyze headers, those are almost exclusively the functions we have, and we want them emitted too (possibly with empty bodies, as we only care about signatures).

This could be done by rewriting the AST before emitting LLVM IR so that every function gets an empty body (possibly with some throw statement, so that even functions that should return something are valid). Or we could lower the functions manually to LLVM representation and then get debug info for them. Or we could just somehow "use" each function's type, since that's all we care about anyway, for example, by emitting a global variable initialized with the function's address. But that would also require an AST-rewriting step or emitting textual code.
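To make the last idea concrete, here is a minimal C++ sketch of "using" a function's type through a global variable initialized with its address. `hypothetical_sum` and `use_hypothetical_sum` are made-up names, not part of the iOS headers or of our tool. Because the global is a non-`const` variable with external linkage, the compiler has to emit it along with a reference to the function, and with `-g` its debug info should describe the pointer type, which spells out the function's signature.

```cpp
#include <cassert>

// A declaration as it might appear in an analyzed header. The name is
// made up for illustration; the header itself provides no body.
int hypothetical_sum(int a, int b);

// The "use" trick: a global variable initialized with the function's
// address. Its type, int (*)(int, int), carries the complete signature.
// It is deliberately not `const`: a `const` namespace-scope variable has
// internal linkage in C++ and could be dropped when nothing uses it.
int (*use_hypothetical_sum)(int, int) = &hypothetical_sum;

// Present only so that this standalone sketch links; during header
// analysis, the function would stay a bare declaration.
int hypothetical_sum(int a, int b) { return a + b; }
```

Calling through the pointer, e.g. `use_hypothetical_sum(2, 3)`, behaves like a direct call; for header analysis, though, only the pointer's type matters.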
**TODO: This doesn't get us inherited methods, so we don't have wrappers for them.**

#### Our approach

We chose the simplest (and, we believe, also the cleanest) solution: we used Clang's `LangOpts.EmitAllDecls` and made it emit really all declarations. Before our changes, this option only emitted functions that *had bodies* but would otherwise have been discarded because no other function referenced them. After our changes, this option also emits functions without bodies. Search for the tag `[emit-all-decls]` in Clang's code to find those changes.

**TODO: Note that currently, we actually emit *definitions* even if they were only *declarations* in the source code. Those fake definitions probably have invalid bodies, but we don't care since we are only interested in signatures. We do that just for simplicity, as it required fewer modifications of Clang's code. But it would really be better if we didn't emit any bodies, i.e., if declarations were emitted as declarations. Also, the current approach would probably be a problem if the analyzed files contained both a definition and a declaration of the same function, because we would turn the declaration into a definition and then there would be two definitions of the same thing. Finally, we should probably name the option differently and not extend the existing `EmitAllDecls`, since that option actually does a different thing. Our option (let's call it `DeclareEverything`) includes all declarations in the resulting LLVM IR, whereas the existing option `EmitAllDecls` really just makes *definitions* (duh) that would otherwise be discarded visible in the resulting LLVM IR.**

### Generating wrappers

There are two possible approaches to this. One would be to leverage debugging information to get a mapping from function arguments to registers and stack offsets. This is what a debugger has to know: if we are debugging and set a breakpoint inside a function, it has to show us the values of the function's arguments. That said, a debugger may well use the other approach, too.
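The first approach boils down to one question per parameter: is it in a register, or at which offset on the stack? The following C++ sketch answers it for a deliberately simplified 32-bit ARM convention and then uses the answer in an emulated-to-compiled wrapper that, as described for exported functions, takes an emulation context and a function pointer. Assumptions: every argument is a 32-bit integer or pointer, the first four are passed in `r0` to `r3`, and the rest occupy consecutive 4-byte stack slots starting at `sp` (real AAPCS rules for 64-bit values, floating point, and structs are ignored). `FakeContext` is a hypothetical stand-in for a real emulation context such as `uc_engine *`, and `locateArgs`, `readArg`, `wrap_i32x6`, and `native_sum6` are made-up names. In the actual tool, the locations would come from debugging information via LLDB rather than from a hard-coded rule.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where one argument lives when the callee starts executing.
struct ArgLocation {
  bool inRegister;    // true: core register; false: stack slot
  std::size_t index;  // register number, or byte offset from sp
};

// Simplified rule (assumption): all arguments are 32-bit integers or
// pointers; the first four go to r0-r3, the rest to 4-byte stack slots.
std::vector<ArgLocation> locateArgs(std::size_t argCount) {
  std::vector<ArgLocation> locs;
  for (std::size_t i = 0; i < argCount; ++i) {
    if (i < 4)
      locs.push_back({true, i});
    else
      locs.push_back({false, (i - 4) * 4});
  }
  return locs;
}

// Hypothetical emulation context: four argument registers plus the
// words at the top of the emulated stack (stackWords[0] is at sp).
struct FakeContext {
  std::array<std::uint32_t, 4> r{};
  std::vector<std::uint32_t> stackWords;
};

// Reads one argument from the location computed above.
std::uint32_t readArg(const FakeContext &ctx, const ArgLocation &loc) {
  if (loc.inRegister)
    return ctx.r[loc.index];
  assert(loc.index % 4 == 0 && loc.index / 4 < ctx.stackWords.size());
  return ctx.stackWords[loc.index / 4];
}

// A compiled (native) function with more arguments than registers.
std::int32_t native_sum6(std::int32_t a, std::int32_t b, std::int32_t c,
                         std::int32_t d, std::int32_t e, std::int32_t f) {
  return a + b + c + d + e + f;
}

// Signature shared by all functions this particular wrapper can call.
using I32x6Fn = std::int32_t (*)(std::int32_t, std::int32_t, std::int32_t,
                                 std::int32_t, std::int32_t, std::int32_t);

// A sketch of a generated emulated-to-compiled wrapper: extract the
// arguments from the context, call the function, and store the result
// in r0, where the emulated caller expects it.
void wrap_i32x6(FakeContext &ctx, I32x6Fn fn) {
  const std::vector<ArgLocation> locs = locateArgs(6);
  std::int32_t args[6];
  for (std::size_t i = 0; i < 6; ++i)
    args[i] = static_cast<std::int32_t>(readArg(ctx, locs[i]));
  ctx.r[0] = static_cast<std::uint32_t>(
      fn(args[0], args[1], args[2], args[3], args[4], args[5]));
}
```

For example, with `r0` to `r3` holding 1, 2, 3, 4 and the two stack words holding 5 and 6, `wrap_i32x6(ctx, native_sum6)` leaves 21 in `ctx.r[0]`. Only the wrapper is signature-specific; the location rule is shared, which mirrors the idea of asking LLDB for locations once per signature.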
This other approach is to use Clang to generate wrappers in ARM and in i386 for every function. The