:orphan:

AutoML configuration
====================

The AutoML job is configured with a single JSON document. The annotated example below lists every supported key; JSON itself does not allow comments, so the ``//`` lines are documentation only and must be removed before submitting.

.. code-block:: javascript

    {
        "train_data": {
            // Uploaded dataset file type. Can be `csv` or `parquet`.
            "file_type": "csv",
            // Uploaded dataset file name.
            "file_name": "dados.csv",
            // Separator for the csv file.
            "sep": ","
        },
        // Model class; for now it can only be `classification`.
        "model_flow": "classification",
        // Name of the target column in the uploaded dataset.
        "target": "TARGET",
        // Columns that need to be encoded as categorical. Default is an
        // empty list (we will then try to detect categorical columns).
        "cat_cols": ["ID"],
        // How many pipeline combinations we will test. Default is 1.
        "iterations": 10,
        // Metric used to find the best model. For classification the options
        // are `auc`, `precision`, `recall`, `f1`, `gini`, `ks`.
        // Default is `auc`.
        "metric": "ks",
        // How the training, validation and test datasets are split. Options
        // are `random`, `stratified` (random, but trying to keep the same
        // proportion of data between splits) and `oot` (validation is
        // random, but the test split is taken by date). Default is `random`.
        "split_type": "random",
        // Proportion of the validation dataset relative to the full dataset.
        // Default is 0.2.
        "val_size": 0.2,
        // Proportion of the test dataset relative to the full dataset. Only
        // used when `split_type` is `random` or `stratified`. Default is 0.1.
        "holdout_size": 0.1,
        // Column used to stratify the split (keeping the same proportion
        // between splits). Only used when `split_type` is `stratified`.
        // Default is the target column.
        "stratify_col": "TARGET",
        // Column used to find the most recent records. Only used when
        // `split_type` is `oot`.
        "date_col": "DATE",
        // Fraction of the most recent data to use as the test dataset. When
        // `split_type` is `oot`, either this or `split_date` must be
        // provided. Default is 0.2.
        "oot_split_size": 0.1,
        // Date used to filter the test dataset. When `split_type` is `oot`,
        // either this or `oot_split_size` must be provided.
        "split_date": "2020-01-01",
        "stages": {
            // Algorithms to test. Options are `logeg`, `catboost`,
            // `xgboost`, `lightgbm`, `rf`, `dt`. Default is to use all.
            "models": ["lightgbm"],
            // Missing-value imputation methods to test. Options are `mean`,
            // `median`, `tail` (replace missing data with a value at the
            // left tail of the distribution), `random` and `none` (only
            // works if the algorithm already handles missing data). Only
            // used if the data has missing values. Default is to use all.
            "missing": ["mean"],
            // Outlier-removal methods to test. Options are `iqr`, `rare`
            // and `none`. Default is to use all.
            "cleaner": ["iqr"],
            // Categorical-encoder methods to test. Options are `rankcount`,
            // `catboost`, `count` and `dt`. Only used if the data has
            // categorical columns. Default is to use all.
            "encoding": ["catboost"],
            // Scaler methods to test. Options are `norm`, `robust`,
            // `minmax`, `binarizer` and `none`. Default is to use all.
            "preprocess": ["none"],
            // Target-balancing methods to test. Options are `smote`,
            // `random_under` and `none`. Only used if the minority class of
            // the target is less than 10% of the data. Default is to use all.
            "unbalance": ["none"]
        }
    }
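Several of the keys above constrain each other (for example, `oot` splits require either ``oot_split_size`` or ``split_date``, and the validation and test fractions must leave data for training). The sketch below shows one way to check those rules before submitting a config. It is illustrative only: the ``validate_config`` helper and its messages are our own and not part of the AutoML API.

.. code-block:: python

    # Illustrative pre-submission check for the configuration documented
    # above. Function name and messages are hypothetical, not part of any API.
    VALID_SPLITS = {"random", "stratified", "oot"}
    VALID_METRICS = {"auc", "precision", "recall", "f1", "gini", "ks"}

    def validate_config(cfg: dict) -> list:
        """Return a list of problems found in an AutoML config dict."""
        problems = []
        split = cfg.get("split_type", "random")
        if split not in VALID_SPLITS:
            problems.append("unknown split_type: %r" % split)
        if cfg.get("metric", "auc") not in VALID_METRICS:
            problems.append("unknown metric: %r" % cfg.get("metric"))
        # `oot` needs either a fraction or a cut-off date for the test split.
        if split == "oot" and "oot_split_size" not in cfg and "split_date" not in cfg:
            problems.append("split_type 'oot' needs oot_split_size or split_date")
        # val_size plus holdout_size must leave some data for training;
        # holdout_size only applies to `random` and `stratified` splits.
        val = cfg.get("val_size", 0.2)
        hold = cfg.get("holdout_size", 0.1) if split in {"random", "stratified"} else 0.0
        if val + hold >= 1.0:
            problems.append("val_size + holdout_size leave no training data")
        return problems

    config = {
        "train_data": {"file_type": "csv", "file_name": "dados.csv", "sep": ","},
        "model_flow": "classification",
        "target": "TARGET",
        "iterations": 10,
        "metric": "ks",
        "split_type": "oot",
        "date_col": "DATE",
        "oot_split_size": 0.1,
    }
    print(validate_config(config))  # -> []

With ``val_size`` 0.2 and ``holdout_size`` 0.1 (the documented defaults), 70% of the rows remain for training; the check above simply rejects combinations where that remainder drops to zero.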