vec-data-reader

所属分类:其他
开发工具:Clojure
文件大小:0KB
下载次数:0
上传日期:2020-09-14 17:54:21
上 传 者sh-1993
说明:  创建Clojure标记的文本时可能会发生意外的行为,该文本返回一个作为基元向量的值,
(Possibly unexpected behavior when one creates a Clojure tagged literal that returns a value that is a vector of primitives,)

文件列表:
deps.edn (185, 2020-09-14)
src/ (0, 2020-09-14)
src/data_readers.clj (76, 2020-09-14)
src/vec_data_reader/ (0, 2020-09-14)
src/vec_data_reader/vdr.clj (5142, 2020-09-14)

# vec-data-reader Just some code for testing the behavior of a Clojure data reader for a tagged literal that returns a Clojure persistent vector of primitive values, in this case bytes. That is, it returns the same type as a call like `(vector-of :byte 1 2 3)`. There is some behavior that is not immediately obvious why it occurs, where if you define such a data reader, and use it in your Clojure code to contain a literal, and do not quote it, when such a vector is evaluated by `eval`, it returns a vector with type `clojure.lang.PersistentVector`, not the original type. This repository was created to record experiments and thinking I did while trying to answer this question: https://ask.clojure.org/index.php/9567/possible-create-tagged-literal-reader-for-clojure-core-vec ## Usage Below is a sample REPL session that was started from the root directory of this repository using the Clojure CLI tool, with the command named `clojure`. ```clojure user=> (require '[vec-data-reader.vdr :as vdr]) nil user=> (def bv0 (vdr/hex-string-to-clojure-core-vec-of-byte "0123456789abcdef007f80ff")) #'user/bv0 user=> (type bv0) clojure.core.Vec user=> (def bv1 (read-string "#my.ns/byte-vec \"0123456789abcdef007f80ff\"")) #'user/bv1 user=> (type bv1) clojure.core.Vec user=> (def bv2 #my.ns/byte-vec "0123456789abcdef007f80ff") #'user/bv2 user=> (type bv2) clojure.lang.PersistentVector user=> (def bv3 '#my.ns/byte-vec "0123456789abcdef007f80ff") #'user/bv3 user=> (type bv3) clojure.core.Vec user=> bv0 [1 35 69 103 -119 -85 -51 -17 0 127 -128 -1] user=> bv1 [1 35 69 103 -119 -85 -51 -17 0 127 -128 -1] user=> bv2 [1 35 69 103 -119 -85 -51 -17 0 127 -128 -1] user=> bv3 [1 35 69 103 -119 -85 -51 -17 0 127 -128 -1] user=> (type (eval bv0)) clojure.lang.PersistentVector user=> (type (eval bv1)) clojure.lang.PersistentVector user=> (type (eval bv2)) clojure.lang.PersistentVector user=> (type (eval bv3)) clojure.lang.PersistentVector user=> (class #my.ns/byte-vec "0123456789abcdef007f80ff") clojure.lang.PersistentVector ;; See [Note 1] below for some details on why the exception below ;; occurs, and why there is an object with 'reify' in its name ;; involved. ;; Note that this same error occurs regardless of whether one defines ;; the print-dup method for class clojure.core.Vec, or not. user=> (class '#my.ns/byte-vec "0123456789abcdef007f80ff") Syntax error compiling fn* at (REPL:1:1). Can't embed object in code, maybe print-dup not defined: clojure.core$reify__8311@625abb97 user=> (pst) Note: The following stack trace applies to the reader or compiler, your code was not executed. CompilerException Syntax error compiling fn* at (1:1). #:clojure.error{:phase :compile-syntax-check, :line 1, :column 1, :source "NO_SOURCE_PATH", :symbol fn*} clojure.lang.Compiler.analyzeSeq (Compiler.java:7115) clojure.lang.Compiler.analyze (Compiler.java:6789) clojure.lang.Compiler.eval (Compiler.java:7174) clojure.lang.Compiler.eval (Compiler.java:7132) clojure.core/eval (core.clj:3214) clojure.core/eval (core.clj:3210) clojure.main/repl/read-eval-print--9086/fn--9089 (main.clj:437) clojure.main/repl/read-eval-print--9086 (main.clj:437) clojure.main/repl/fn--9095 (main.clj:458) clojure.main/repl (main.clj:458) clojure.main/repl-opt (main.clj:522) clojure.main/main (main.clj:667) Caused by: RuntimeException Can't embed object in code, maybe print-dup not defined: clojure.core$reify__8311@4795ded0 clojure.lang.Util.runtimeException (Util.java:221) clojure.lang.Compiler$ObjExpr.emitValue (Compiler.java:4893) clojure.lang.Compiler$ObjExpr.emitValue (Compiler.java:4808) clojure.lang.Compiler$ObjExpr.emitConstants (Compiler.java:4934) clojure.lang.Compiler$ObjExpr.compile (Compiler.java:4612) clojure.lang.Compiler$FnExpr.parse (Compiler.java:4106) clojure.lang.Compiler.analyzeSeq (Compiler.java:7105) clojure.lang.Compiler.analyze (Compiler.java:6789) nil ;; Same-looking exception message if you replace 'class' with 'type' or 'inc' user=> (defn doit [x] (print (class x)) x) user=> ``` [Note 1] Thanks to Kevin Downey (aka hiredman on Clojurians Slack) for details on what is happening here. When the Clojure compiler compiles Clojure to JVM byte code, it embeds code for constructing objects that represent those Clojure values that appear as literals in the Clojure source. One place in the Clojure compiler where this is done is in method `emitValue` of class `Compiler$ObjExpr` in source file `Compiler.java` in the Clojure implementation. That is the method where a couple of the lines in the stack trace occur: ``` clojure.lang.Compiler$ObjExpr.emitValue (Compiler.java:4893) clojure.lang.Compiler$ObjExpr.emitValue (Compiler.java:4808) ``` (These line numbers come from using version 1.10.1 of Clojure, so those line numbers are relevant for that version of the Clojure source code.) The second of those lines is lower on the call stack, thus that call occurs first in time during compiler execution. It occurs when a value satisfies the condition `(value instanceof IType)`, where `clojure.lang.IType` is a "marker interface" for all objects created by Clojure's `deftype` macro. Clojure's primitive vectors have type `clojure.core.Vec`, and that class is created using `deftype` in the source file `gvec.clj`. In the `emitValue` code handling objects created via `deftype`, the behavior is basically to iterate through all fields of the JVM object, and emit the value of each of its fields. One of the fields of `clojure.core.Vec` is `am`, for "array manager", and its value is the return value from a `reify` call in macro `mk-am` in file `gvec.clj`. That is where the object with "reify" in its name comes from. So the time order of compiler events includes: do `emitValue` on an instance of `clojure.core.Vec`, which involves iterating through each of its fields and calling `emitValue` on them. When it gets to the `am` field, which has a value returned by `reify`, `emitValue` tries all of its cases, finally reaching the default `else` case at the end of a long `if-then-else-if` daisy chain, which tries to call `RT.printString(value)` on that object returned by `reify`. That call to `RT.printString(value)` throws an exception, which is caught in `emitValue` and results in the message "Can't embed object in code, maybe print-dup not defined: " followed by the object with "reify" in its name. Because there is an explicit case for objects implementing the `IType` interface in `emitValue`, that is higher priority than the one that calls `RT.printString`, it seems that the only way to prevent objects with class `clojure.core.Vec` from attempting to call `emitValue` on all of its fields would be to change this `emitValue` method in the compiler. That is, defining a `print-dup` method for objects with class `clojure.core.Vec` will not avoid the current `emitValue` behavior. Defining a `print-dup` method for objects with classes of objects returned by `reify` _might_ help, but only if the value printed, then read and evaluated, actually returned a usable object, similar to one returned from the original `reify` call, and that seems potentially tricky to do. Here is a "skeleton" of the `emitValue` method (of class `Compiler$ObjExpr`) and which conditions it checks on the `value` passed to it: ```java if(value == null) else if(value instanceof String) else if(value instanceof Boolean) else if(value instanceof Integer) else if(value instanceof Long) else if(value instanceof Double) else if(value instanceof Character) else if(value instanceof Class) else if(value instanceof Symbol) else if(value instanceof Keyword) else if(value instanceof Var) else if(value instanceof IType) else if(value instanceof IRecord) else if(value instanceof IPersistentMap) else if(value instanceof IPersistentVector) else if(value instanceof PersistentHashSet) else if(value instanceof ISeq || value instanceof IPersistentList) else if(value instanceof Pattern) else ``` The last `else` branch is the only one that contains a call to `RT.printString`. It seems that if one wanted to change the Clojure compiler to allow objects created by `deftype` to have more control over how their literal values were emitted in JVM byte code, one way would be to, in the `(value instanceof IType)` branch, check if the class had an implemention of `print-dup`, and if it did, use that. Only if it did not, then fall back to the current behavior for such objects. Note: I have not _tried_ that approach yet, and there could easily be problems with it that I have not thought of. Thinking about it more, there are definitions of the multi-method `print-dup` in Clojure for all of the classes and interfaces listed as the output of the last expression below. Notes: When declaring a multi-function in Clojure using `defmulti` like `print-dup`, the Var `print-dup` has a value that is of class `clojure.lang.MultiFn`. When one later declares methods for that multi-function using `defmethod`, a key/value pair is added to a private field named `methodTable` of that object, which can be retrieved using Clojure's `methods` function. In that map, the key is the multi-method's dispatch value, which is the class of the first argument for `print-dup`, and the value is the function that is the body of the `defmethod` call. ```clojure $ clojure Clojure 1.10.1 user=> (class print-dup) clojure.lang.MultiFn user=> (->> (methods print-dup) keys (map #(if (nil? %) "nil" (str %))) sort pprint) ("class clojure.lang.BigInt" "class clojure.lang.Keyword" "class clojure.lang.LazilyPersistentVector" "class clojure.lang.Namespace" "class clojure.lang.PersistentHashMap" "class clojure.lang.PersistentHashSet" "class clojure.lang.PersistentVector" "class clojure.lang.Ratio" "class clojure.lang.Symbol" "class clojure.lang.Var" "class java.lang.Boolean" "class java.lang.Character" "class java.lang.Class" "class java.lang.Double" "class java.lang.Long" "class java.lang.Number" "class java.lang.String" "class java.math.BigDecimal" "class java.sql.Timestamp" "class java.util.Calendar" "class java.util.Date" "class java.util.UUID" "class java.util.regex.Pattern" "interface clojure.lang.Fn" "interface clojure.lang.IPersistentCollection" "interface clojure.lang.IPersistentList" "interface clojure.lang.IPersistentMap" "interface clojure.lang.IRecord" "interface clojure.lang.ISeq" "interface java.util.Collection" "interface java.util.Map" "nil") nil ``` Note the class `clojure.lang.PersistentVector` and the interface `clojure.lang.IPersistentCollection` have `print-dup` methods defined for them. ```clojure (defn emitValue-branch-used [value] (cond (nil? value) "null" (instance? String value) "String" (instance? Boolean value) "Boolean" (instance? Integer value) "Integer" (instance? Long value) "Long" (instance? Double value) "Double" (instance? Character value) "Character" (instance? Class value) "Class" (instance? clojure.lang.Symbol value) "clojure.lang.Symbol" (instance? clojure.lang.Keyword value) "clojure.lang.Keyword" (instance? clojure.lang.Var value) "clojure.lang.Var" (instance? clojure.lang.IType value) "clojure.lang.IType (interface that is implemented by all classes created via deftype)" (instance? clojure.lang.IRecord value) "clojure.lang.IRecord (interface that is implemented by all classes created via defrecord)" (instance? clojure.lang.IPersistentMap value) "clojure.lang.IPersistentMap" (instance? clojure.lang.IPersistentVector value) "clojure.lang.IPersistentVector" (instance? clojure.lang.PersistentHashSet value) "clojure.lang.PersistentHashSet" (instance? clojure.lang.ISeq value) "clojure.lang.ISeq" (instance? clojure.lang.IPersistentList value) "clojure.lang.IPersistentList" (instance? java.util.regex.Pattern value) "java.util.regex.Pattern" :else "other")) user=> (def inst1 #inst "2020-09-14T01:00:00") #'user/inst1 user=> inst1 #inst "2020-09-14T01:00:00.000-00:00" user=> (emitValue-branch-used inst1) "other" user=> (emitValue-branch-used [1 2 3]) "clojure.lang.IPersistentVector" user=> (emitValue-branch-used (vector-of :byte 1 2 3)) "clojure.lang.IType (interface that is implemented by all classes created via deftype)" ``` ## Analysis of current behavior The root cause of the "Can't embed object in code, maybe `print-dup` not defined" with an object that has "reify" and a bunch of hex digits in its printed representation, is the following combination of factors: (1) Clojure primitive vectors are defined with `deftype`. (2) For all types defined via `deftype`, there is an `emitValue` Java method inside of Clojure's `Compiler.java` source file that has many cases for deciding how to embed a literal value in JVM byte code. You can search that file for the first occurrence of "IType", which is a Java interface that Clojure `deftype`-created types all implement, in order to later recognize that they were objects of a class created via deftype. When such an object is a literal inside of Clojure code, `emitValue` attempts to create JVM byte code that can construct the original value when that JVM byte code is later executed, and for `deftype`-created objects, it always tries to iterate through all fields of the object, and emit code for the field and its value. (3) Clojure primitive vectors have a field `am`, short for "array manager", that is an object created by calling Clojure's `reify` function. This object is used to implement several Java methods on "leaves" of the tree used to represent Clojure primitive vectors, one such object for each different primitive type. The JVM byte code for dealing with arrays of each primitive type is different. Rich Hickey in the `gvec.clj` code was probably going for run-time efficiency here by not detecting the primitive type at run time and doing a multi-way branch on every operation, but instead having an object that already had baked into it code for dealing with that vector's primitive type. (4) `emitValue`, when called with an object that is the return of a `reify` call, tries to call `RT.printString` on it, which would work if a `print-dup` method were defined to handle such objects. However, implementing a `print-dup` that produced readable representations of all possible objects returned by `reify` would be very tricky, since such objects can have arbitrary references to other JVM objects with internal state, or can have internal state themselves. ## Possible approaches to creating a literal for primitive vectors that can be embedded in compiled Clojure code What could be done about this? There are probably many alternatives I haven't thought of, but here are a few potential approaches, most of which would require changing Clojure's implementation in some way. ### Approach #1a Change Clojure's primitive vector implementation so that all of its field values were immutable values with printable representations, i.e. no objects returned from `reify`, nor any function references. Since primitive vectors are trees with O(log_32 n) depth, the representation created via `emitValue` would reflect that tree structure, but it seems like it could be made to work correctly. This would likely lead to some lower run-time performance of operations on primitive vectors, since there would need to be a run-time multiway branch, e.g. `case`, to handle the different primitive types in leaf nodes. ### Approach #1b Create a new implementation of Clojure primitive vectors that uses `deftype`, but has the changes suggested in Approach #1a above. No changes to Clojure's implementation would be required, since it would be a third party implementation that can make its own implementation choices. ### Approach #2 Change the `emitValue` method in `Compiler.java` so that for `deftype`-created objects, it somehow checked whether there was a `print-dup` method for that object's class first, and used it if it was available, falling back to the current approach if there was not. That would be somewhat tricky in this case, because Clojure primitive vectors implement the `clojure.lang.IPersistentCollection` interface, which already has a `print-dup` method that will not work for primitive vectors. One possibility is not to simply call `print-dup` and see what happens, but to check whether the `print-dup` multimethod has an implementation for _exactly_ the class of the object one is trying to do `emitValue` on, e.g. `clojure.core.Vec` for primitive vectors. Such an exact class check for multimethod implementations seems against the philosophy of multimethods in Clojure, and seems a bit hackish. Another cleaner variation on this idea would be to define a new `emittable` interface in Clojure's implementation, and if a `deftype`-created class implemented it, then `emitValue` would use the `emit` method of that interface on objects that implemented it. ### Approach #3 Create a separate Clojure primitive vector implementation that does not use `deftype`, nor `defrecord`, and falls into the last `else` case of the long if-then-else daisy chain of Clojure's `emitValue`. This seems difficult, or maybe impossible, to me, without changing the `emitValue` method, because it currently has a case for `clojure.lang.IPersistentVector` before the last `else`, and it would be very strange to try creating a Clojure primitive vector implementation that did not implement that interface. ### Summary of approaches Of the ones I have thought about, Approach #1b, or the last variant of approach #2, seem possibly workable. Approach #1b requires no changes to Clojure's implementation. Approach #2 definitely does. Approach #3 probably isn't really a viable alternative, for reasons stated above. ## License Copyright 2020 Andy Fingerhut This program and the accompanying materials are made available under the terms of the Eclipse Public License 1.0 which is available at https://www.eclipse.org/org/documents/epl-v10.html ## Scratch experiments ```clojure (require '[clj-java-decompiler.core :refer [decompile disassemble]]) ;; What do calls to Clojure multi-methods look like in JVM byte code? (defmulti andymultifn (fn [x arg2] (class x))) (defmethod andymultifn clojure.lang.IPersistentCollection [x ^java.io.Writer w] (.write w "andymultifn clojure.lang.IPersistentCollection")) (defmethod andymultifn clojure.lang.PersistentVector [x ^java.io.Writer w] (.write w "andymultifn clojure.lang.PersistentVector")) (defmethod andymultifn clojure.core.Vec [x ^java.io.Writer w] (.write w "andymultifn clojure.core.Vec")) (andymultifn [1 2 3] *out*) ;; andymultifn clojure.lang.PersistentVector, as expected (andymultifn {1 2 3 4} *out*) ;; andymultifn clojure.lang.IPersistentCollection, as expected (andymultifn (vector-of :byte 1 2 3) *out*) ;; If you do not define a method for clojure.core.Vec, then output is: ;; andymultifn clojure.lang.IPersistentCollection ;; Note that the output does _not_ contain ;; clojure.lang.PersistentVector, because clojure.core.Vec is not a ;; subclass of clojure.lang.PersistentVector. ;; If you do define a method for clojure.core.Vec, then output is: ;; andymultifn clojure.core.Vec (defn f1 [x] (andymultifn x)) ;; These outputs are not very int ... ...

近期下载者

相关文件


收藏者