-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Set culture to culture invariant in LightGBM #454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
protected LightGbmTrainerBase(IHostEnvironment env, LightGbmArguments args, PredictionKind predictionKind, string name) | ||
{ | ||
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); [](start = 12, length = 63)
This is nice, and solves all the current problems, but are you sure this is a separate thread we spawn and not are user thread?
I can image users surprise if they code just stop working because somewhere in separate library you setup current thread culture to en-us :) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is current thread at the start of the argument parsing on line 71.
In reply to: 199274365 [](ancestors = 199274365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reactivating. Just to be clear since it's not obvious, changing a user's culture is not an appropriate solution.
In reply to: 199275306 [](ancestors = 199275306,199274365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me give you example:
TrainAndPredictIrisModelWithStringLabelTest
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("fr-FR");
double v = 1.5f;
string t = v.ToString();
put this in begining of test
replace learner with LightGBM and check what you get same 1.5 as parse result
v = double.Parse(t);
pipeline.Add(new LightGbmClassifier());
run training:
PredictionModel<IrisDataWithStringLabel, IrisPrediction> model = pipeline.Train<IrisDataWithStringLabel, IrisPrediction>();
v = double.Parse(t);
check value in v
In reply to: 199275306 [](ancestors = 199275306,199274365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proper solution is
a) change all places where we work with Option argument from ToString(CultureInfo("en-US")) preferably move culture to const variable
b) report this issue to LightGBM folks: https://github.com/Microsoft/LightGBM/blob/ba3e1ff24fd43ed33d7703881d8efad83a5ed452/include/LightGBM/utils/common.h#L174 people who do they own parsing mechanism, have special place in hell.
c) put comments which describe what this is temporary solution and ToString can be reverted to normal state as soon as LightGBM folks fix their parser.
It's a few lines of manual work, but it's a right way.
In reply to: 199277210 [](ancestors = 199277210,199275306,199274365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
microsoft/LightGBM#1481
here is issue to track
In reply to: 199279475 [](ancestors = 199279475,199277210,199275306,199274365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening the issue on LightGBM repo.
In reply to: 199284806 [](ancestors = 199284806,199279475,199277210,199275306,199274365)
using Microsoft.ML.Runtime.Internal.Utilities; | ||
using Microsoft.ML.Runtime.Training; | ||
using Microsoft.ML.Runtime.FastTree.Internal; | ||
using System.Globalization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
order #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @codemzs thanks for looking at this. Solve this another way please -- changing a user's culture is not something we should do in code called by the user...
using System.Threading.Tasks; | ||
using System.Collections.Generic; | ||
using System.Globalization; | ||
using System.Linq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you really need linq? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah there are references to it in this file. Last()
In reply to: 199288120 [](ancestors = 199288120)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reactivating. This is actually pretty bad. We ought to change that code to get the last index's value. I know it's not your fault, but, let's not let bad code stand, if we know where it is.
In reply to: 199289028 [](ancestors = 199289028,199288120)
protected override void CheckAndUpdateParametersBeforeTraining(IChannel ch, RoleMappedData data, float[] labels, int[] groups) | ||
{ | ||
Options["objective"] = "regression"; | ||
Options["objective"] = "regression".ToString(CultureInfo.InvariantCulture); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"regression".ToString(CultureInfo.InvariantCulture); [](start = 35, length = 52)
I have so many questions... #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res[GetArgName(nameof(MaxCatThreshold))] = MaxCatThreshold.ToString(); | ||
res[GetArgName(nameof(CatSmooth))] = CatSmooth.ToString(); | ||
res[GetArgName(nameof(CatL2))] = CatL2.ToString(); | ||
res["label_gain"] = CustomGains.ToString(CultureInfo.InvariantCulture); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CustomGains.ToString [](start = 32, length = 20)
Anything that is a string, there is of course no need to call this. Hopefully it is obvious from intellisense and whatnot what those things are. #Resolved
protected override void GetDefaultParameters(IChannel ch, int numRow, bool hasCategorical, int totalCats, bool hiddenMsg=false) | ||
{ | ||
base.GetDefaultParameters(ch, numRow, hasCategorical, totalCats, true); | ||
int numLeaves = int.Parse(Options["num_leaves"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int.Parse(Options["num_leaves"]); [](start = 28, length = 33)
If you're doing it one way, you must also do it the other way. #Resolved
There's a fairly obvious way this can be improved. |
Thanks, Tom! This is a great feedback and indeed results in a more maintainable code. In reply to: 401497686 [](ancestors = 401497686) |
|
||
public sealed class Arguments : TransformInputBase | ||
{ | ||
public Arguments() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public Arguments() [](start = 12, length = 18)
I think something terribly wrong happen with your merge #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am aware and fixing it via interactive rebase.
In reply to: 199600627 [](ancestors = 199600627)
{ | ||
base.GetDefaultParameters(ch, numRow, hasCategorical, totalCats, true); | ||
int numLeaves = int.Parse(Options["num_leaves"]); | ||
int numLeaves = int.Parse(string.Format(CultureInfo.InvariantCulture, "{0}", Options["num_leaves"])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int.Parse(string.Format(CultureInfo.InvariantCulture, "{0}", Options["num_leaves"])); [](start = 28, length = 85)
why can't you just call int.Parse(Options["num_leaves"],CultureInfo.InvariantCulture); ? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work. You will need to convert object to string first and even then what advantage does this give over what I have now? I'm also concerned that we will convert to string without invariant and then try to parse with invariant flag.
In reply to: 199603671 [](ancestors = 199603671)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would work if your Options["num_leaves"] was Invarianted string. But in your case Option["num_leaves"] contains already int value which you hide behind object. So all you need to is just a cast.
int numLeaves = (int)Options["num_leaves"];
In reply to: 199605092 [](ancestors = 199605092,199603671)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will work but here I don't want to make an assumption that num_leaves is int, what if tomorrow someone inserts num_leaves as string or some other type? I'd rather have a generic way to translate to invariant string and then parse. What do you think?
In reply to: 199607828 [](ancestors = 199607828,199605092,199603671)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a proposal which will benefit all of us and remove any concerns regarding culture. Don't do parsing at all. Just set numLeaves as out parameter in the function and set it properly in base call.
In reply to: 199611982 [](ancestors = 199611982,199607828,199605092,199603671)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case casting to int is fine. I spoke with Tom and he mentioned we can make this hard assumptions if the datastructure/code is internal.
In reply to: 199630347 [](ancestors = 199630347,199611982,199607828,199605092,199603671)
if (earlyStoppingRound > 0 && (parameters["metric"] == "auc" | ||
|| parameters["metric"] == "ndcg" | ||
|| parameters["metric"] == "map")) | ||
if (earlyStoppingRound > 0 && (parameters["metric"].ToString() == "auc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameters["metric"].ToString() [](start = 43, length = 31)
What are these originally?
If they're strings, just use a direct cast. That communicates that you expect them to actually be strings. ToString()
suggests there's some conversion going on. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it makes to cast to string because a string is represented as a string unlike int, double, etc that can be represented in their original type or as strings.
In reply to: 199613266 [](ancestors = 199613266)
private static List<string> ConstructCategoricalFeatureMetaData(int[] categoricalFeatures, int rawNumCol, ref CategoricalMetaData catMetaData) | ||
{ | ||
List<int> catBoundaries = GetCategoricalBoundires(categoricalFeatures, rawNumCol); | ||
catMetaData.NumCol = catBoundaries.Count() - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count() [](start = 47, length = 7)
Ah very good. Yes that is terrible. There really should be a warning or something on Count()
... :P #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (earlyStoppingRound > 0 && (parameters["metric"] == "auc" | ||
|| parameters["metric"] == "ndcg" | ||
|| parameters["metric"] == "map")) | ||
if (earlyStoppingRound > 0 && ((string)parameters["metric"] == "auc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing is being casted three times. it would be nice to do something like this
var metric = (string)parameters["metric"]
if (earlyStoppingRound > 0 && (metric == "auc"
|| metric == "ndcg"
|| metric == "map"))
``` #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good! I have left a small suggestion.
* Make options culture agnostic.
fixes #440