In map or mapsubtrees you can "splat" one file into multiple files by returning maketree("." => fs), where fs is a vector of files, for example (name=..., value=...) named tuples. The new files are placed in the same directory as the file they replace.
A convenience function will come in handy:
splatfiles(fs) = maketree("." => fs)
splatfiles (generic function with 1 method)
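The argument to splatfiles is just a vector of (name=..., value=...) named tuples, one per output file. Below is a minimal plain-Julia sketch of building such a vector. It does not use FileTrees, and chunk_entries is a hypothetical helper that cuts a vector into fixed-size pieces and names each piece.

```julia
# Hypothetical helper (not part of FileTrees): split `xs` into chunks of at most
# `n` elements and describe each chunk as a (name=..., value=...) named tuple,
# the same entry shape the splatfiles examples pass to maketree.
function chunk_entries(xs::AbstractVector, n::Integer; prefix::AbstractString = "part")
    n > 0 || throw(ArgumentError("chunk size must be positive"))
    chunks = Iterators.partition(xs, n)
    return [(name = string(prefix, "-", i, ".txt"), value = collect(c))
            for (i, c) in enumerate(chunks)]
end

fs = chunk_entries(1:7, 3)
# Inside a map/mapsubtrees callback you would then return splatfiles(fs).
```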
using FileTrees
using DataFrames, CSV
taxi_dir = FileTree("taxi-data")
dfs = FileTrees.load(taxi_dir) do file
    DataFrame(CSV.File(path(file)))
end
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  ├─ green.csv (9×20 DataFrame)
│  │  └─ yellow.csv (9×18 DataFrame)
│  └─ 02/
│     ├─ green.csv (9×20 DataFrame)
│     └─ yellow.csv (9×18 DataFrame)
└─ 2020/
   ├─ 01/
   │  ├─ green.csv (9×20 DataFrame)
   │  └─ yellow.csv (9×18 DataFrame)
   └─ 02/
      ├─ green.csv (9×20 DataFrame)
      └─ yellow.csv (9×18 DataFrame)
Split each yellow file into one file per RatecodeID:
yellowdfs = dfs[r"yellow\.csv$"]
expanded_tree = mapsubtrees(yellowdfs, glob"*/*/yellow.csv") do df
    map(groupby(get(df), :RatecodeID) |> collect) do group
        (name=string("yellow-ratecode-", group.RatecodeID[1], ".df"), value=DataFrame(group))
    end |> splatfiles
end
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  ├─ yellow-ratecode-1.df (7×18 DataFrame)
│  │  └─ yellow-ratecode-2.df (2×18 DataFrame)
│  └─ 02/
│     └─ yellow-ratecode-1.df (9×18 DataFrame)
└─ 2020/
   ├─ 01/
   │  ├─ yellow-ratecode-1.df (8×18 DataFrame)
   │  └─ yellow-ratecode-5.df (1×18 DataFrame)
   └─ 02/
      └─ yellow-ratecode-1.df (9×18 DataFrame)
You can save these files if you wish.
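The callback in the RatecodeID example groups one DataFrame by a column and emits one named entry per group. The same grouping logic can be sketched in plain Julia, assuming the rows are a vector of named tuples rather than a DataFrame. split_by_key is a hypothetical name, the rows are toy values, and the file names only mirror the yellow-ratecode-<id>.df pattern.

```julia
# Hypothetical plain-Julia analogue of the groupby + map step:
# group `rows` (a vector of named tuples) by the column `key` and return one
# (name=..., value=...) entry per distinct key, ordered by key.
function split_by_key(rows::AbstractVector, key::Symbol, prefix::AbstractString)
    groups = Dict{Any, Vector{eltype(rows)}}()
    for row in rows
        # get! inserts an empty vector the first time a key is seen
        push!(get!(groups, getproperty(row, key), eltype(rows)[]), row)
    end
    return [(name = string(prefix, k, ".df"), value = groups[k])
            for k in sort!(collect(keys(groups)))]
end

# Toy rows, made up for illustration
rows = [(RatecodeID = 1, fare = 7.0),
        (RatecodeID = 2, fare = 52.0),
        (RatecodeID = 1, fare = 14.0)]

entries = split_by_key(rows, :RatecodeID, "yellow-ratecode-")
```

Sorting the keys keeps the order of the generated entries deterministic in this sketch, since Dict iteration order is not.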
If the value field of a file passed to splatfiles is a Thunk, the resulting file holds a lazy value that is not computed until you call exec.
A thunk can be created with the syntax lazy(f)(x...); the result is a Thunk that represents the not-yet-computed result of calling f(x...).
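To illustrate the idea of a thunk (not how FileTrees implements Thunk), here is a minimal plain-Julia sketch. MiniThunk, mini_lazy and mini_exec are hypothetical names that mimic the lazy(f)(x...) / exec pattern: creating the thunk only records the function and its arguments, and nothing runs until it is executed.

```julia
# Hypothetical stand-in for a thunk: stores a function and its arguments
# without calling the function.
struct MiniThunk
    f::Any
    args::Tuple
end

# mini_lazy(f) returns a function that captures its arguments in a MiniThunk,
# mirroring the shape of lazy(f)(x...).
mini_lazy(f) = (args...) -> MiniThunk(f, args)

# mini_exec evaluates a thunk, first executing any thunks among its arguments;
# non-thunk values are returned unchanged.
mini_exec(x) = x
mini_exec(t::MiniThunk) = t.f(map(mini_exec, t.args)...)

# Count calls so we can see when the work actually happens
calls = Ref(0)
counted_repr(x) = (calls[] += 1; repr(x))

t = mini_lazy(counted_repr)([1, 2, 3])
before = calls[]   # still 0: building the thunk ran nothing
s = mini_exec(t)   # counted_repr runs here
```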
yellowdfs = dfs[r"yellow\.csv$"]
expanded_tree = mapsubtrees(yellowdfs, glob"*/*/yellow.csv") do df
    map(groupby(get(df), :payment_type) |> collect) do group
        id = group.payment_type[1]
        (name=string("yellow-ptype-", id, ".df"), value=lazy(repr)(group))
    end |> splatfiles
end
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  ├─ yellow-ptype-1.df (FileTrees.Thunk)
│  │  └─ yellow-ptype-2.df (FileTrees.Thunk)
│  └─ 02/
│     ├─ yellow-ptype-1.df (FileTrees.Thunk)
│     └─ yellow-ptype-2.df (FileTrees.Thunk)
└─ 2020/
   ├─ 01/
   │  ├─ yellow-ptype-1.df (FileTrees.Thunk)
   │  └─ yellow-ptype-2.df (FileTrees.Thunk)
   └─ 02/
      ├─ yellow-ptype-1.df (FileTrees.Thunk)
      └─ yellow-ptype-2.df (FileTrees.Thunk)
exec(expanded_tree)
taxi-data/
├─ 2019/
│  ├─ 01/
│  │  ├─ yellow-ptype-1.df (2850-codeunit String)
│  │  └─ yellow-ptype-2.df (2565-codeunit String)
│  └─ 02/
│     ├─ yellow-ptype-1.df (1708-codeunit String)
│     └─ yellow-ptype-2.df (3703-codeunit String)
└─ 2020/
   ├─ 01/
   │  ├─ yellow-ptype-1.df (3420-codeunit String)
   │  └─ yellow-ptype-2.df (1995-codeunit String)
   └─ 02/
      ├─ yellow-ptype-1.df (3420-codeunit String)
      └─ yellow-ptype-2.df (1995-codeunit String)
exec(expanded_tree) |> files |> first |> get |> print
5×18 SubDataFrame
Row │ VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID payment_type fare_amount extra mta_tax tip_amount tolls_amount improvement_surcharge total_amount congestion_surcharge
│ Int64 String31 String31 Int64 Float64 Int64 String1 Int64 Int64 Int64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Missing
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1 2019-01-01 00:46:40 2019-01-01 00:53:20 1 1.5 1 N 151 239 1 7.0 0.5 0.5 1.65 0.0 0.3 9.95 missing
2 │ 1 2019-01-01 00:59:47 2019-01-01 01:18:59 1 2.6 1 N 239 246 1 14.0 0.5 0.5 1.0 0.0 0.3 16.3 missing
3 │ 2 2018-12-21 13:48:30 2018-12-21 13:52:40 3 0.0 1 N 236 236 1 4.5 0.5 0.5 0.0 0.0 0.3 5.8 missing
4 │ 1 2019-01-01 00:21:28 2019-01-01 00:28:37 1 1.3 1 N 163 229 1 6.5 0.5 0.5 1.25 0.0 0.3 9.05 missing
5 │ 1 2019-01-01 00:32:01 2019-01-01 00:45:39 1 3.7 1 N 229 7 1 13.5 0.5 0.5 3.7 0.0 0.3 18.5 missing