Linear Regression on a String

Question

This challenge is a little tricky but rather simple. Given a string s:

meta.codegolf.stackexchange.com

Use the position of the character in the string as an x coordinate and the ascii value as a y coordinate. For the above string, the resultant set of coordinates would be:

Next, calculate both the slope and the y-intercept of that set using linear regression. Here's the set above plotted:

Which results in a best fit line of (0-indexed):

y = 0.014516129032258x + 99.266129032258

Here's the 1-indexed best-fit line:

y = 0.014516129032258x + 99.251612903226

So your program would return:

f("meta.codegolf.stackexchange.com") = [0.014516129032258, 99.266129032258]

Or (Any other sensible format):

f("meta.codegolf.stackexchange.com") = "0.014516129032258x + 99.266129032258"

Or (Any other sensible format):

f("meta.codegolf.stackexchange.com") = "0.014516129032258\n99.266129032258"

Or (Any other sensible format):

f("meta.codegolf.stackexchange.com") = "0.014516129032258 99.266129032258"

If the reason for your output format isn't obvious, just explain it.

Some clarifying rules:

- Strings may be 0-indexed or 1-indexed; both are acceptable.
- Output may be on new lines, as a tuple, as an array or any other format.
- Precision of the output is arbitrary but should be enough to verify validity (min 5).

This is code-golf; the lowest byte count wins.
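For readers who want the formula behind the numbers above, here is a minimal, ungolfed sketch in plain Python. It assumes ordinary least squares, where the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept is mean_y - slope * mean_x. The name `fit_line` and the `start` parameter are illustrative only and not part of the challenge.

```python
def fit_line(s, start=0):
    """Return (slope, intercept) of the least-squares line through the
    (position, character code) points of s.

    `start` is the index given to the first character (0 or 1).
    Assumes len(s) >= 2; a shorter string gives a division by zero.
    """
    n = len(s)
    xs = range(start, start + n)
    ys = [ord(ch) for ch in s]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(fit_line("meta.codegolf.stackexchange.com"))     # 0-indexed
print(fit_line("meta.codegolf.stackexchange.com", 1))  # 1-indexed
```

Switching from 0-indexing to 1-indexing leaves the slope unchanged and lowers the intercept by exactly the slope, which matches the two best-fit lines quoted in the question.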

Do you have any link / formula to calculate the slope and the y-intercept? — Rod, 18 hours ago
Dear Unclear-voters: While I agree that it is nice to have the formula, it is by no means necessary. Linear regression is a well-defined thing in the mathematical world, and the OP may want to leave finding the equation up to the reader. — Nathan Merrill, 18 hours ago
Is it okay to return the actual equation of the best-fit line, such as 0.014516129032258x + 99.266129032258? — Greg Martin, 18 hours ago
This challenge's title has put this wonderful song in my head for the rest of the day — Luis Mendo, 15 hours ago

Timtech · Answer 1 · 2017-01-09 19:04:31Z

TI-Basic, 51 (+ 141) bytes

Strings are 1-based in TI-Basic.

Input Str1
seq(I,I,1,length(Str1->L1
32+seq(inString(Str2,sub(Str1,I,1)),I,1,length(Str1->L2
LinReg(ax+b)

Like the other example, this outputs the equation of the best fit line, in terms of X. Also, in Str2 you need to have this string, which is 141 bytes in TI-Basic:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

This cannot be part of the program because two characters in TI-Basic cannot be automatically added to a string. One is the STO-> arrow, but this is not a problem because it is not part of ASCII. The other is the string literal ("), which can be stringified only by typing it into a Y= equation and using Equ>String(.
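The lookup trick is easier to see outside TI-Basic. Below is a rough Python sketch of the same idea, not the TI-Basic program itself: a table of the printable characters from `!` to `~` stands in for Str2, and the character code is recovered as 32 plus the 1-based position in that table. It assumes that `inString(` returns 0 when the character is not found, which is why a space maps to 32. `TABLE` and `code_via_table` are made-up names.

```python
# Printable ASCII from '!' (33) to '~' (126), standing in for Str2.
TABLE = "".join(chr(code) for code in range(33, 127))


def code_via_table(ch):
    """Mimic 32+inString(Str2,ch): 32 plus a 1-based position, or 32+0 if absent."""
    position = TABLE.find(ch) + 1  # str.find returns -1 when ch is absent
    return 32 + position


print([code_via_table(ch) for ch in "a b!"])  # [97, 32, 98, 33]
```

The table has to be contiguous for this to work: one missing character shifts every code after it.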

I was seriously wondering if anyone would bust out their old calculators for this :). I had my old TI-83 in mind when I thought this up. – carusocomputing 17 hours ago

@carusocomputing Hey, nice! I like the TI-Basic programming language a lot and I use it for many of my code golfs. If only it supported ASCII... – Timtech 15 hours ago

Two comments: 1, you can stringify " by prompting for it as user input in a program as well, which doesn't help you here, but I just wanted to point that fact out. 2, I don't recognize some of those characters as existing on the calculator. I could be wrong, but for example, where do you get @ and ~? As well as #, $, and &. – Patrick Roberts 1 hour ago

rahnema1 · Answer 2 · 2017-01-09 19:54:19Z

Octave, 29 26 24 20 bytes

@(s)s/[!!s;1:nnz(s)]

Try it Online!

We have the model

y= intercept *x^0 + slope * x
 = intercept * 1  + slope * x

Here y is the ASCII value of string s

To find the parameters intercept and slope, we can form the following equation:

s = [intercept slope] * [1; x]

so

[intercept slope] = s/[1; x]

!!s converts a string to a vector of ones with the same length as the string.
The vector of ones is used for estimation of the intercept.
1:nnz(s) is the range of values from 1 to the number of elements of the string, used as x.
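To make the matrix division less magical, here is a hedged Python sketch of what solving that system in the least-squares sense amounts to. Octave's `/` operator uses its own solver; this sketch swaps in the normal equations, p = (s * D') * inverse(D * D'), which give the same result for this problem. The function name `solve_row_system` is hypothetical.

```python
def solve_row_system(s, design):
    """Least-squares solution p = (intercept, slope) of p * design = s,
    where design is a pair of rows (ones, xs).

    Uses the normal equations with an explicit 2x2 inverse.
    """
    ones, xs = design
    ys = [ord(ch) for ch in s]
    # Entries of the symmetric 2x2 matrix D * D'.
    g11 = sum(u * u for u in ones)
    g12 = sum(u * x for u, x in zip(ones, xs))
    g22 = sum(x * x for x in xs)
    # Entries of the 1x2 row s * D'.
    r1 = sum(y * u for y, u in zip(ys, ones))
    r2 = sum(y * x for y, x in zip(ys, xs))
    det = g11 * g22 - g12 * g12
    intercept = (r1 * g22 - r2 * g12) / det
    slope = (r2 * g11 - r1 * g12) / det
    return intercept, slope


s = "meta.codegolf.stackexchange.com"
design = ([1] * len(s), list(range(1, len(s) + 1)))  # like [!!s; 1:nnz(s)]
print(solve_row_system(s, design))
```

The result comes back as (intercept, slope), in the same order as the rows of the design matrix, and uses 1-based positions like the Octave answer.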

Previous answer

@(s)ols(s'+0,[!!s;1:nnz(s)]')

To test, paste the following code into Octave Online:

(@(s)ols(s'+0,[!!s;1:nnz(s)]'))('meta.codegolf.stackexchange.com')

This is a function that accepts a string as input and applies ordinary least squares estimation to the model y = x*b + e.

The first argument of ols is y; to get it, we transpose the string s and add 0 to convert it to its ASCII codes.

dfernan · Answer 3 · 2017-01-09 21:26:31Z

Python, 82 80 bytes

-2 bytes thanks to @Mego

Using scipy:

import scipy
lambda s:scipy.stats.linregress(range(len(s)),list(map(ord,s)))[:2]

Unnamed lambdas are allowed, so you can drop the f=. – Mego 16 hours ago

@DigitalTrauma numpy.linalg.lstsq apparently differs in arguments to scipy.stats.linregress and is more complex. – dfernan 15 hours ago

Greg Martin · Answer 4 · 2017-01-09 18:24:14Z

Mathematica, 31 bytes

Fit[ToCharacterCode@#,{1,x},x]&

Unnamed function taking a string as input and returning the actual equation of the best-fit line in question. For example, f=Fit[ToCharacterCode@#,{1,x},x]&; f["meta.codegolf.stackexchange.com"] returns 99.2516 + 0.0145161 x.

ToCharacterCode converts an ASCII string to a list of the corresponding ASCII values; indeed, it defaults to UTF-8 more generally. (Kinda sad, in this context, that one function name comprises over 48% of the code length....) And Fit[...,{1,x},x] is the built-in for computing linear regression.

Thanks for the example of the 1-indexed line, didn't have to calculate it because of you haha. — carusocomputing, 17 hours ago

busukxuan · Answer 5 · 2017-01-09 18:59:10Z

Sage, 76 bytes

var('m','c')
y(x)=m*x+c
f=lambda x:find_fit(zip(range(len(x)),map(ord,x)),y)

Hardly any golfing, probably longer than a golfed Python answer, but yeah...

Luis Mendo · Answer 6 · 2017-01-10 01:37:24Z

MATL, 8 bytes

n:G3$1ZQ

1-based string indexing is used.

Try it online!

Explanation

n:     % Input string implicitly. Push [1 2 ... n] where n is string length.
       % These are the x values
G      % Push the input string. A string is an array of chars, which is
       % equivalent to an array of ASCII codes. These are the y values
3$     % The next function will use 3 inputs
1      % Push 1
ZQ     % Fit polynomial of degree 1 to those x, y data. The result is an
       % array with the polynomial coefficients. Implicitly display

Billywob · Answer 7 · 2017-01-10 10:27:00Z

R, 46 45 bytes

x=1:nchar(y<-scan(,""));lm(utf8ToInt(y)~x)$co

Reads input from stdin and for the given test case returns (one-indexed):

(Intercept)           x 
99.25161290  0.01451613

Patrick Roberts · Answer 8 · 2017-01-10 10:29:25Z

Node.js, 84 bytes

Using regression:

s=>require('regression')('linear',s.split``.map((c,i)=>[i,c.charCodeAt()])).equation

Demo

// polyfill, since this is clearly not Node.js
function require(module) {
  return window[module];
}
// test
["meta.codegolf.stackexchange.com"].forEach(function test(string) {
  console.log(string);
  console.log(this(string));
},
// submission
s=>require('regression')('linear',s.split``.map((c,i)=>[i,c.charCodeAt()])).equation
);

<script src="https://cdn.rawgit.com/Tom-Alexander/regression-js/master/src/regression.js"></script>

Markus Jarderot · Answer 9 · 2017-01-10 11:14:32Z

JavaScript, 151 148 bytes

s=>([a,b,c,d,e]=[].map.call(s,c=>c.charCodeAt()).reduce(([a,b,c,d,e],y,x)=>[a+1,b+x,c+x*x,d+y,e+x*y],[0,0,0,0,0]),[k=(e*a-b*d)/(c*a-b*b),(d-k*b)/a])
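Since the golfed one-liner above comes without an explanation, here is an ungolfed Python reading of it. This is an interpretation of the code, not the answerer's own description: the reduce appears to accumulate five running sums in a single pass, and the final expression turns them into the slope and intercept. The variable names a to e and k mirror the JavaScript; `fit_running_sums` is a made-up name.

```python
def fit_running_sums(s):
    """Single pass over s, accumulating the same five sums as the reduce above.

    Assumes len(s) >= 2; otherwise the denominator is zero.
    """
    a = b = c = d = e = 0
    for x, ch in enumerate(s):  # x is the 0-based position
        y = ord(ch)
        a += 1      # number of points
        b += x      # sum of x
        c += x * x  # sum of x^2
        d += y      # sum of y
        e += x * y  # sum of x*y
    k = (e * a - b * d) / (c * a - b * b)  # slope
    return k, (d - k * b) / a              # (slope, intercept)


print(fit_running_sums("meta.codegolf.stackexchange.com"))
```

Because no means are needed up front, the whole fit takes one loop over the string, which is what makes the reduce-based form possible.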

J, 11 bytes

3&u:%.1,.#\

This uses one-based indexing.

Try it online!

Explanation

3&u:%.1,.#\  Input: string S
         #\  Get the length of each prefix of S
             Forms the range [1, 2, ..., len(S)]
      1,.    Pair each with 1
3&u:         Get the ASCII value of each char in S
    %.       Matrix divide

Renzeee · Answer 11 · 2017-01-10 09:59:38Z

Haskell, 154 bytes

import Statistics.LinearRegression
import Data.Char
import Data.Vector
g x=linearRegression(generate(Prelude.length x)i)$i.ord<$>fromList x
i=fromIntegral

It is far too long for my liking because of the imports and long function names, but well. I couldn't think of any other golfing method, although I'm no expert in the area of golfing imports.
