The tree manipulation functions are map, filter, mv, cp, rm, merge, diff, clip, and mapsubtrees, used in combination with other functions.
A lot of tree manipulation involves pattern matching, so we recommend you read the section on pattern matching first.
map and filter
map can be used to apply a function to every node in a file tree, to create a new file tree. This function should return a File or FileTree object.
filter can be used to keep only the nodes that satisfy a given predicate function.
Both map and filter take a walk keyword argument, which can be either FileTrees.prewalk or FileTrees.postwalk; these perform pre-order and post-order traversal of the tree, respectively. By default both functions operate on FileTree (subtree) nodes as well as File nodes. You can pass in dirs=false to work only on the File nodes.
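A minimal sketch of the dirs=false behaviour described above, on a small in-memory tree. It assumes that name and rename are available from FileTrees to read and change a node's name, and that maketree accepts plain strings as file names; the directory and file names are made up for illustration.

```julia
using FileTrees

# A small in-memory tree; nothing is read from disk.
tree = maketree("data" => ["2019" => ["green.csv", "notes.txt"],
                           "2020" => ["yellow.csv"]])

# Keep only File nodes whose name ends in ".csv".
# With dirs=false the predicate is not applied to FileTree (directory) nodes.
csvs = filter(f -> endswith(name(f), ".csv"), tree; dirs=false)

# Return a renamed File for every File node, visiting nodes in pre-order.
# (name and rename are assumed helpers, see the note above.)
upper = map(f -> rename(f, uppercase(name(f))), tree;
            dirs=false, walk=FileTrees.prewalk)
```

Without dirs=false, the same functions would also be called on the 2019 and 2020 subtrees, so the predicate or mapping function would need to handle FileTree nodes too.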
merge
merge(t1::FileTree, t2::FileTree; combine)
Merge two FileTrees. If files at the same path contain values, the combine callback will be called with their values to produce a new value.
If one of the dirs does not have a value, its corresponding argument will be NoValue().
If any of the values is lazy, the output value is lazy as well.
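As a rough illustration of the combine callback, the sketch below merges two small in-memory trees that share one path. The file names and values are invented for the example, and both trees are given the same root name on the assumption that merge lines up children by name.

```julia
using FileTrees

# Two trees with the same root name; "b" exists in both.
t1 = maketree("dir" => [(name="a", value=1), (name="b", value=2)])
t2 = maketree("dir" => [(name="b", value=10), (name="c", value=3)])

# combine is called with the two values stored at the clashing path "b".
merged = merge(t1, t2; combine=+)
```

Here + stands in for whatever operation makes sense for the stored values, for example vcat for tables.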
diff and rm
diff(t1::FileTree, t2::FileTree)
For each node in t2, remove the node in t1 at the same path if it exists. Returns the difference tree.
rm(t::FileTree, pattern::Union{Glob, String, AbstractPath, Regex})
Remove nodes that match pattern from the file tree.
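The two signatures above can be tried on small in-memory trees. This is only a sketch: the file names and values are illustrative, and the Regex form of rm is used because it appears in the signature above.

```julia
using FileTrees

t1 = maketree("dir" => [(name="a", value=1), (name="b", value=2)])
t2 = maketree("dir" => [(name="b", value=10), (name="c", value=3)])

# Remove from t1 every node that has a counterpart at the same path in t2 (here "b").
only_in_t1 = diff(t1, t2)

# Remove the nodes of t1 whose path matches the regular expression.
without_a = rm(t1, r"^a$")
```

Both calls return new trees; t1 and t2 themselves are not modified.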
mv and cp
The signature of mv is mv(tree::FileTree, r::Regex, s::SubstitutionString; combine).
For every file in tree whose path matches the regular expression r, rewrite its path as decided by s. All paths are matched with the delimiter / on all platforms (including Windows).
mv and cp not only allow you to move or copy nodes within a FileTree but also merge many files by copying them to the same path. combine is a callback that is called with the values of two files when a file is moved to an already existing or already created path. By default it is set to error on name clashes where either of the nodes has a non-null value.
s can be a SubstitutionString, which is conveniently constructed using the s"" string macro.
Within the string, sequences of the form \N refer to the Nth capture group in the regex, and \g<groupname> refers to a named capture group with the name groupname.
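To see how a substitution string rewrites a single path, the sketch below applies one with Base Julia's replace on a plain string. It illustrates only the \N and \g<groupname> syntax, not how mv is implemented internally; the group names year and month are chosen for this example.

```julia
# A path as it would appear relative to the tree root, with / as the delimiter.
p = "2019/01/yellow.csv"

# \1 and \2 refer to the first and second capture groups.
numbered = replace(p, r"^([^/]*)/([^/]*)/yellow.csv$" => s"yellow/\1/\2.csv")

# \g<name> refers to a named capture group.
named = replace(p, r"^(?<year>[^/]*)/(?<month>[^/]*)/yellow.csv$" =>
                   s"yellow/\g<year>/\g<month>.csv")

println(numbered)  # yellow/2019/01.csv
println(named)     # yellow/2019/01.csv
```

Named groups tend to be easier to read once a pattern has more than two or three captures.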
Example:
using FileTrees
tree = FileTree("taxi-data")
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  ├─ green.csv
│  │  └─ yellow.csv
│  └─ 02/
│     ├─ green.csv
│     └─ yellow.csv
└─ 2020/
   ├─ 01/
   │  ├─ green.csv
   │  └─ yellow.csv
   └─ 02/
      ├─ green.csv
      └─ yellow.csv
# first move */*/yellow.csv to yellow/*/*.csv
t2 = mv(tree, r"^([^/]*)/([^/]*)/yellow.csv$", s"yellow/\1/\2.csv")
# move */*/green.csv to green/*/*.csv
mv(t2, r"^([^/]*)/([^/]*)/green.csv$", s"green/\1/\2.csv")
taxi-data/
├─ green/
│  ├─ 2019/
│  │  ├─ 01.csv
│  │  └─ 02.csv
│  └─ 2020/
│     ├─ 01.csv
│     └─ 02.csv
└─ yellow/
   ├─ 2019/
   │  ├─ 01.csv
   │  └─ 02.csv
   └─ 2020/
      ├─ 01.csv
      └─ 02.csv
It's also possible to just move all the yellow files into a single yellow.csv file.
mv(tree, r"^([^/]*)/([^/]*)/yellow.csv$", s"yellow.csv")
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  └─ green.csv
│  └─ 02/
│     └─ green.csv
├─ 2020/
│  ├─ 01/
│  │  └─ green.csv
│  └─ 02/
│     └─ green.csv
└─ yellow.csv
This works when there is no value loaded into the tree, but it probably shouldn't. Let's see what happens when the yellow files have some values loaded in them:
using CSV, DataFrames
dfs = FileTrees.load(tree) do file
    DataFrame(CSV.File(path(file)))
end;
mv(dfs, r".*yellow.csv$", s"yellow.csv")
yellow.csv clashed with an existing file name at path taxi-data/./yellow.csv.
Pass `combine=f` to define how to combine them.
Oops! The error says to pass in combine=f, where f combines the values of the two clashing files. In our case we want to concatenate the DataFrames, so let's pass in vcat.
mv(dfs, r".*yellow.csv$", s"yellow.csv", combine=vcat)
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  └─ green.csv (9×20 DataFrame)
│  └─ 02/
│     └─ green.csv (9×20 DataFrame)
├─ 2020/
│  ├─ 01/
│  │  └─ green.csv (9×20 DataFrame)
│  └─ 02/
│     └─ green.csv (9×20 DataFrame)
└─ yellow.csv (36×18 DataFrame)
As you can see, the final yellow.csv file has a value that is a combination of all the yellow.csv values.
We can do the same with the green files:
df1 = mv(dfs, r".*yellow.csv$", s"yellow.csv", combine=vcat)
df2 = mv(df1, r".*green.csv$", s"green.csv", combine=vcat)
taxi-data/
├─ green.csv (36×20 DataFrame)
└─ yellow.csv (36×18 DataFrame)
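The examples above all use mv. As a hedged sketch, and assuming cp accepts the same arguments as mv (the text gives only the signature of mv), copying leaves the original files in place and adds the rewritten paths alongside them. The tree here is a small in-memory stand-in for the taxi data.

```julia
using FileTrees

# In-memory stand-in for one month of the taxi-data tree.
tree = maketree("taxi-data" => ["2019" => ["01" => ["green.csv", "yellow.csv"]]])

# Copy */*/yellow.csv to yellow/*/*.csv; the source files are expected to stay where they are.
copied = cp(tree, r"^([^/]*)/([^/]*)/yellow.csv$", s"yellow/\1/\2.csv")
```

As with mv, a combine callback would be needed if several copied files with values landed on the same destination path.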
mapsubtrees
mapsubtrees(f, tree, pattern) lets you apply a function to every node in tree whose path matches pattern, which is either a Glob or a Regex (see also pattern matching).
f gets the subtree itself and may return a subtree, which replaces the one it matched. It can return nothing to delete the node in the output tree. Returning any other value empties the subtree and sets the value of the node to the returned value.
This last behavior makes it equivalent to Julia's mapslices but on trees.
Suppose you have a nested tree of values, and you would like to join the data in the second level of the tree using vcat but the first level of the tree using hcat. This can be done in two stages: first use mapsubtrees to collapse the second level tree into a single value which is the vcat of all the values in each subtree. Then combine those results with an hcat.
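As a plain-array analogy for the two stages, the sketch below performs the same reduction on nested vectors using only Base Julia. It does not use FileTrees; the nested vector simply stands in for a two-level tree of (i, j) values.

```julia
# Outer index i plays the role of the first tree level, inner index j the second.
nested = [[(i, j) for j in 1:5] for i in 1:5]

# Stage 1: collapse each inner group into a single vector with vcat.
columns = [reduce(vcat, group) for group in nested]

# Stage 2: place the per-group results side by side with hcat.
result = reduce(hcat, columns)

println(size(result))   # (5, 5)
println(result[2, 1])   # (1, 2)
```

Each inner group becomes one column of the final matrix, which is the shape the tree-based version is aiming for.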
To demonstrate this let's create a nested tree with a nice structure:
tree = maketree("dir"=>
[string(i)=>[(name=string(j), value=(i,j)) for j in 1:5] for i=1:5])
dir/
├─ 1/
│  ├─ 1 ((1, 1))
│  ├─ 2 ((1, 2))
│  ├─ 3 ((1, 3))
│  ├─ 4 ((1, 4))
│  └─ 5 ((1, 5))
├─ 2/
│  ├─ 1 ((2, 1))
│  ├─ 2 ((2, 2))
│  ├─ 3 ((2, 3))
│  ├─ 4 ((2, 4))
│  └─ 5 ((2, 5))
├─ 3/
│  ├─ 1 ((3, 1))
│  ├─ 2 ((3, 2))
│  ├─ 3 ((3, 3))
│  ├─ 4 ((3, 4))
│  └─ 5 ((3, 5))
├─ 4/
│  ├─ 1 ((4, 1))
│  ├─ 2 ((4, 2))
│  ├─ 3 ((4, 3))
│  ├─ 4 ((4, 4))
│  └─ 5 ((4, 5))
└─ 5/
   ├─ 1 ((5, 1))
   ├─ 2 ((5, 2))
   ├─ 3 ((5, 3))
   ├─ 4 ((5, 4))
   └─ 5 ((5, 5))
Step 1: reduce level 2 onwards:
vcated = mapsubtrees(tree, glob"*") do subtree
    reducevalues(vcat, subtree)
end
dir/
├─ 1/ (5-element Array{Tuple{Int64,Int64},1})
├─ 2/ (5-element Array{Tuple{Int64,Int64},1})
├─ 3/ (5-element Array{Tuple{Int64,Int64},1})
├─ 4/ (5-element Array{Tuple{Int64,Int64},1})
└─ 5/ (5-element Array{Tuple{Int64,Int64},1})
Step 2: reduce intermediate results:
reducevalues(hcat, vcated, dirs=true)
5×5 Array{Tuple{Int64,Int64},2}:
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5)
This can also be done lazily!
vcated = mapsubtrees(tree, glob"*") do subtree
    reducevalues(vcat, subtree, lazy=true)
end
dir/
├─ 1/ (Thunk(vcat, (Thunk(vcat, ...), Thunk(vcat, ...))))
├─ 2/ (Thunk(vcat, (Thunk(vcat, ...), Thunk(vcat, ...))))
├─ 3/ (Thunk(vcat, (Thunk(vcat, ...), Thunk(vcat, ...))))
├─ 4/ (Thunk(vcat, (Thunk(vcat, ...), Thunk(vcat, ...))))
└─ 5/ (Thunk(vcat, (Thunk(vcat, ...), Thunk(vcat, ...))))
final = reducevalues(hcat, vcated, dirs=true)
Thunk(hcat, (Thunk(hcat, ...), Thunk(hcat, ...)))
exec(final)
5×5 Array{Tuple{Int64,Int64},2}:
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5)