Skip to content

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Jun 29, 2018

fixes #440


protected LightGbmTrainerBase(IHostEnvironment env, LightGbmArguments args, PredictionKind predictionKind, string name)
{
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); [](start = 12, length = 63)

This is nice, and solves all the current problems, but are you sure this is a separate thread we spawn and not are user thread?
I can image users surprise if they code just stop working because somewhere in separate library you setup current thread culture to en-us :) #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is current thread at the start of the argument parsing on line 71.


In reply to: 199274365 [](ancestors = 199274365)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivating. Just to be clear since it's not obvious, changing a user's culture is not an appropriate solution.


In reply to: 199275306 [](ancestors = 199275306,199274365)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me give you example:
TrainAndPredictIrisModelWithStringLabelTest

Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("fr-FR");
double v = 1.5f;
string t = v.ToString();

put this in begining of test

replace learner with LightGBM and check what you get same 1.5 as parse result

v = double.Parse(t);
pipeline.Add(new LightGbmClassifier());

run training:
PredictionModel<IrisDataWithStringLabel, IrisPrediction> model = pipeline.Train<IrisDataWithStringLabel, IrisPrediction>();
v = double.Parse(t);

check value in v


In reply to: 199275306 [](ancestors = 199275306,199274365)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proper solution is
a) change all places where we work with Option argument from ToString(CultureInfo("en-US")) preferably move culture to const variable
b) report this issue to LightGBM folks: https://github.com/Microsoft/LightGBM/blob/ba3e1ff24fd43ed33d7703881d8efad83a5ed452/include/LightGBM/utils/common.h#L174 people who do they own parsing mechanism, have special place in hell.
c) put comments which describe what this is temporary solution and ToString can be reverted to normal state as soon as LightGBM folks fix their parser.

It's a few lines of manual work, but it's a right way.


In reply to: 199277210 [](ancestors = 199277210,199275306,199274365)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

microsoft/LightGBM#1481
here is issue to track


In reply to: 199279475 [](ancestors = 199279475,199277210,199275306,199274365)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening the issue on LightGBM repo.


In reply to: 199284806 [](ancestors = 199284806,199279475,199277210,199275306,199274365)

using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Runtime.Training;
using Microsoft.ML.Runtime.FastTree.Internal;
using System.Globalization;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order #Resolved

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @codemzs thanks for looking at this. Solve this another way please -- changing a user's culture is not something we should do in code called by the user...

using System.Threading.Tasks;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need linq? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah there are references to it in this file. Last()


In reply to: 199288120 [](ancestors = 199288120)

Copy link
Contributor

@TomFinley TomFinley Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivating. This is actually pretty bad. We ought to change that code to get the last index's value. I know it's not your fault, but, let's not let bad code stand, if we know where it is.


In reply to: 199289028 [](ancestors = 199289028,199288120)

protected override void CheckAndUpdateParametersBeforeTraining(IChannel ch, RoleMappedData data, float[] labels, int[] groups)
{
Options["objective"] = "regression";
Options["objective"] = "regression".ToString(CultureInfo.InvariantCulture);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"regression".ToString(CultureInfo.InvariantCulture); [](start = 35, length = 52)

I have so many questions... #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

99 questions but ... aint one?


In reply to: 199290138 [](ancestors = 199290138)

res[GetArgName(nameof(MaxCatThreshold))] = MaxCatThreshold.ToString();
res[GetArgName(nameof(CatSmooth))] = CatSmooth.ToString();
res[GetArgName(nameof(CatL2))] = CatL2.ToString();
res["label_gain"] = CustomGains.ToString(CultureInfo.InvariantCulture);
Copy link
Contributor

@TomFinley TomFinley Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CustomGains.ToString [](start = 32, length = 20)

Anything that is a string, there is of course no need to call this. Hopefully it is obvious from intellisense and whatnot what those things are. #Resolved

protected override void GetDefaultParameters(IChannel ch, int numRow, bool hasCategorical, int totalCats, bool hiddenMsg=false)
{
base.GetDefaultParameters(ch, numRow, hasCategorical, totalCats, true);
int numLeaves = int.Parse(Options["num_leaves"]);
Copy link
Contributor

@TomFinley TomFinley Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int.Parse(Options["num_leaves"]); [](start = 28, length = 33)

If you're doing it one way, you must also do it the other way. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Jun 29, 2018

There's a fairly obvious way this can be improved. Options is a map of string to string. Because of that, you have to format the message every time a parameter is set. If instead it was changed to a map from string to object, you could have had the conversion to string be in exactly one place, rather than literally every single place you are assigning a value to this. Then whatever is responsible for actually passing the strings to LightGBM should be responsible for doing string.Format(CultureInfo.InvariantCulture, "{0}", foo) for each value... that is, make it part of a loop, rather than doing it every single time. #Resolved

@codemzs codemzs changed the title Set culture to en-us in LightGBM Set culture to culture invariant in LightGBM Jul 2, 2018
@codemzs
Copy link
Member Author

codemzs commented Jul 2, 2018

Thanks, Tom! This is a great feedback and indeed results in a more maintainable code.


In reply to: 401497686 [](ancestors = 401497686)


public sealed class Arguments : TransformInputBase
{
public Arguments()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public Arguments() [](start = 12, length = 18)

I think something terribly wrong happen with your merge #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am aware and fixing it via interactive rebase.


In reply to: 199600627 [](ancestors = 199600627)

{
base.GetDefaultParameters(ch, numRow, hasCategorical, totalCats, true);
int numLeaves = int.Parse(Options["num_leaves"]);
int numLeaves = int.Parse(string.Format(CultureInfo.InvariantCulture, "{0}", Options["num_leaves"]));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int.Parse(string.Format(CultureInfo.InvariantCulture, "{0}", Options["num_leaves"])); [](start = 28, length = 85)

why can't you just call int.Parse(Options["num_leaves"],CultureInfo.InvariantCulture); ? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't work. You will need to convert object to string first and even then what advantage does this give over what I have now? I'm also concerned that we will convert to string without invariant and then try to parse with invariant flag.


In reply to: 199603671 [](ancestors = 199603671)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work if your Options["num_leaves"] was Invarianted string. But in your case Option["num_leaves"] contains already int value which you hide behind object. So all you need to is just a cast.
int numLeaves = (int)Options["num_leaves"];


In reply to: 199605092 [](ancestors = 199605092,199603671)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will work but here I don't want to make an assumption that num_leaves is int, what if tomorrow someone inserts num_leaves as string or some other type? I'd rather have a generic way to translate to invariant string and then parse. What do you think?


In reply to: 199607828 [](ancestors = 199607828,199605092,199603671)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a proposal which will benefit all of us and remove any concerns regarding culture. Don't do parsing at all. Just set numLeaves as out parameter in the function and set it properly in base call.


In reply to: 199611982 [](ancestors = 199611982,199607828,199605092,199603671)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case casting to int is fine. I spoke with Tom and he mentioned we can make this hard assumptions if the datastructure/code is internal.


In reply to: 199630347 [](ancestors = 199630347,199611982,199607828,199605092,199603671)

if (earlyStoppingRound > 0 && (parameters["metric"] == "auc"
|| parameters["metric"] == "ndcg"
|| parameters["metric"] == "map"))
if (earlyStoppingRound > 0 && (parameters["metric"].ToString() == "auc"
Copy link
Contributor

@TomFinley TomFinley Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameters["metric"].ToString() [](start = 43, length = 31)

What are these originally?

If they're strings, just use a direct cast. That communicates that you expect them to actually be strings. ToString() suggests there's some conversion going on. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it makes to cast to string because a string is represented as a string unlike int, double, etc that can be represented in their original type or as strings.


In reply to: 199613266 [](ancestors = 199613266)

private static List<string> ConstructCategoricalFeatureMetaData(int[] categoricalFeatures, int rawNumCol, ref CategoricalMetaData catMetaData)
{
List<int> catBoundaries = GetCategoricalBoundires(categoricalFeatures, rawNumCol);
catMetaData.NumCol = catBoundaries.Count() - 1;
Copy link
Contributor

@TomFinley TomFinley Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Count() [](start = 47, length = 7)

Ah very good. Yes that is terrible. There really should be a warning or something on Count()... :P #Resolved

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs requested a review from zeahmed July 2, 2018 20:56
if (earlyStoppingRound > 0 && (parameters["metric"] == "auc"
|| parameters["metric"] == "ndcg"
|| parameters["metric"] == "map"))
if (earlyStoppingRound > 0 && ((string)parameters["metric"] == "auc"
Copy link
Contributor

@zeahmed zeahmed Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing is being casted three times. it would be nice to do something like this

var metric = (string)parameters["metric"]
if (earlyStoppingRound > 0 && (metric == "auc"
                  || metric == "ndcg"
                  || metric == "map"))
``` #Resolved

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good! I have left a small suggestion.

@codemzs codemzs merged commit 7c536ae into dotnet:master Jul 2, 2018
@codemzs codemzs deleted the lightgbm branch July 2, 2018 22:16
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong Culture when passing arguments to LightGBM
4 participants