
Claude told me it stopped debugging since it would run out of tokens in its context window. I asked how many tokens it had left and it said actually it had plenty, so it could continue. Then it stopped again, and without me asking about tokens, wrote:

Context Usage
• Used: 112K/200K tokens (56%)
• Remaining: 88K tokens
• Sufficient for continued debugging, but fresh session recommended for clarity

lol. I said ok use a subagent for clarity.


I was happy to see that this article is actually talking about tensors, not just multidimensional arrays (which for some reason are often called tensors by machine learning folks).


This is mostly a semantic argument, but I find this to be a very annoying perspective. Given a basis, there is a natural isomorphism between tensors of a certain type and multidimensional arrays of certain dimensions.


Of course there is, but if you perform an operation on a multidimensional array, there is no guarantee it corresponds to an operation on tensors, i.e. the resulting tensor may depend on the basis.


Sure, if you perform an arbitrary operation on a multidimensional array. But the same is true of any representation of any mathematical object. It makes no physical sense to take the sine of a mass, or two to the power of a length. But that doesn't mean that whenever someone says "oh, the mass of an object is a real number" I need to nitpick them.


But it's like calling my table a cat because they both have four legs.


If your table is furry and has a tail it may actually be a cat (from a machine learning perspective).

The machine learning packages have an einsum function / tensor contraction, etc. What more do you need for it to be called a tensor?
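
To be concrete about what I mean, here's a minimal numpy sketch (the arrays and names are made up for illustration; the ML packages expose an essentially identical einsum):

    import numpy as np

    A = np.random.rand(3, 4)          # components of some 2-index "tensor"
    x = np.random.rand(4)
    y = np.einsum('ij,j->i', A, x)    # contract over the shared index j (same as A @ x)

    T = np.random.rand(2, 3, 4)
    S = np.random.rand(2, 3, 4)
    s = np.einsum('ijk,ijk->', T, S)  # full contraction of two 3-index arrays down to a scalar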


I'm not familiar with tensor contraction as practiced by a machine learning package, but the summation convention is just that, a convention; it's not a fundamental property of tensors.

As a way of describing physics or geometry they have additional structure which I'm not seeing.


Same word, different contexts and meaning. In ML tensors are multidimensional arrays (and nothing more). Neither physicists nor ML researchers/developers are confused about what it means.


> Neither physicists nor ML researchers/developers are confused about what it means.

I'm sure this has confused a lot of people, especially beginners. Clashing terminology is one of the main difficulties in interdisciplinary work, in my experience. I don't think it's good to shrug it off like that.


I'm literally learning this, or at least being reminded of something I had totally forgotten, as I read this thread. So yeah.

Words can have multiple meanings, but I think we can all agree it's preferable if they don't have multiple slightly different meanings depending on context. That's just confusing.


I'm pretty sure that multidimensional arrays were the original implementation of tensors, and the fancier linear-forms formalism came later, in the twentieth century. Then the question was how these multidimensional arrays operate on vectors and how they transport around a manifold. (Source: the first book I read on general relativity in the 1970s had a lot of pages about multidimensional arrays, and also this good summary of the history: https://math.stackexchange.com/questions/2030558/what-is-the...)

(Sort of like how vectors kind of got going via a list of numbers and then they found the right axioms for vector spaces and then linear algebra shifted from a lot of computation to a sort of spare and elegant set of theorems on linearity).


AFAIU, matrices are categorically a subset of tensors for which the product operator, at least, is not the tensor product but the dot product.

Dot product: https://en.wikipedia.org/wiki/Dot_product

Matrix multiplication > Dot product, bilinear form and inner product: https://en.wikipedia.org/wiki/Matrix_multiplication#Dot_prod...

> The dot product of two column vectors is the matrix product

Tensor > Geometric objects: https://en.wikipedia.org/wiki/Tensor

> The transformation law for a tensor behaves as a functor on the category of admissible coordinate systems, under general linear transformations (or, other transformations within some class, such as local diffeomorphisms.) This makes a tensor a special case of a geometrical object, in the technical sense that it is a function of the coordinate system transforming functorially under coordinate changes.[24] Examples of objects obeying more general kinds of transformation laws are jets and, more generally still, natural bundles.[25][26]


Mathematicians are comfortable with the idea that a matrix can be over any set; in other words, a matrix is a mapping from the Cartesian product of two integer intervals, [0, N_1) × [0, N_2), to a given set S. Of course, to define matrix multiplication you need at least a ring, but a matrix doesn't have to come with such an operation, and there are many alternative matrix products; for example, you can define the Kronecker product for matrices over a group. Or no product at all, for example a matrix over the set {Red, Yellow, Blue}.

Tensors require some algebraic structure, usually a vector space.
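
To make the Kronecker point concrete, here's a small Python sketch (the kron helper is made up for illustration): a Kronecker-style product only ever combines one entry of A with one entry of B, so any binary operation works and no addition is ever needed.

    def kron(A, B, op):
        # Kronecker-style product of two "matrices" (lists of lists) over an
        # arbitrary set, using a binary operation `op` instead of ring multiplication.
        m, n = len(A), len(A[0])
        p, q = len(B), len(B[0])
        return [[op(A[i // p][j // q], B[i % p][j % q]) for j in range(n * q)]
                for i in range(m * p)]

    colors = [["Red", "Yellow"], ["Blue", "Red"]]   # a matrix over a set with no arithmetic at all
    tags = [["a", "b"]]
    print(kron(colors, tags, lambda x, y: (x, y)))
    # [[('Red', 'a'), ('Red', 'b'), ('Yellow', 'a'), ('Yellow', 'b')],
    #  [('Blue', 'a'), ('Blue', 'b'), ('Red', 'a'), ('Red', 'b')]]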


I was confused by this when I first got to ML.


I am a bit confused


It's not clear to me what you're annoyed about exactly. The way I see it, there are a few options:

You're getting annoyed that people are confusing the map with the territory [1]. Multidimensional arrays with certain properties can be used to represent tensors, but aren't tensors. In the same way a diagram of a torus isn't a topological space, or a multiplication table isn't a group, or a matrix isn't a linear map. Isomorphic, but not literally the thing.

Or you're annoyed that people forget an array representing a tensor needs to satisfy some transformation law and can't just be any big array with some numbers in it.

Or maybe you're a fan of basis-free linear algebra!

Which one is it?

1: https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation


I mean, if I wanted to refer to the reals exclusively as a vector space I wouldn't be wrong, but if you aren't actually using what makes it a vector space, why would you choose to call it that? Hell, call 7 a tensor. There's more than a map-territory distinction here (I'd argue formal mathematics is perhaps the only realm where the two are one and the same, but I see what you're saying); it's a convention of language more generally: you typically use the most necessary term, rather than a random also-accurate label. If you don't care about invariance under coordinate transformations (and most machine learning does not), why would you call it a tensor?


I personally am a fan of basis free linear algebra.

More importantly though, “tensors” as commonly used in machine learning seem to rely on a single special basis, so they really are just multidimensional arrays. A machine learning algorithm isn’t really invariant under a change of basis. For example, the ReLU activation function is not independent of a change of basis.
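
A quick numpy sketch of that last point, using a 45° rotation as the change of basis (the numbers are arbitrary): applying ReLU and then changing basis gives a different answer than changing basis first, so "ReLU of a tensor" is not a basis-independent operation.

    import numpy as np

    relu = lambda v: np.maximum(v, 0.0)

    theta = np.pi / 4
    P = np.array([[np.cos(theta), -np.sin(theta)],   # change-of-basis (rotation) matrix
                  [np.sin(theta),  np.cos(theta)]])

    x = np.array([1.0, -1.0])

    print(P @ relu(x))   # ~[0.707, 0.707]
    print(relu(P @ x))   # ~[1.414, 0.0] -- not the same point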


What is the difference?


Tensors have additional properties that arrays don't necessarily have. For example, the coordinate system transform rule that the author describes in the beginning of the post.

One of my old physics professors taught us to think of tensors as "arrays with units." If it's a vector/matrix/higher dimensional array but has physical units, it's probably a tensor. The fact that it has units means it represents something physical which must obey additional constraints (like the coordinate system transformation rule).
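
As a toy illustration of what that buys you (a hypothetical velocity vector and a rotation of the axes, chosen just for this example): the components change under the coordinate change, but the physical quantity they encode does not.

    import numpy as np

    v = np.array([3.0, 4.0])                          # a velocity in m/s, in the original axes

    theta = 0.3
    R = np.array([[np.cos(theta), -np.sin(theta)],    # rotation relating the two coordinate systems
                  [np.sin(theta),  np.cos(theta)]])

    v_new = R @ v                                     # components in the rotated axes

    print(v, v_new)                                   # the component values differ...
    print(np.linalg.norm(v), np.linalg.norm(v_new))   # ...but the speed is 5.0 in both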


This is a pretty tired line, I gotta say.

Obviously they are not. One is a linear operator, the other is a data structure for implementing computations using that operator. This description extends to all tensors.

It's like saying "queues are not just lists". That is true and also neither insightful nor helpful.

I don't see it as mystifying or complicated, what am I missing?


The reason we carve out a category of tensor, even when you could in principle just define ad-hoc functions on vectors and call it a day, is that we notice the commonalities between a number of objects which are invariant with respect to coordinate changes. Machine learning generally does not use this invariance at all, and has arrays which happen to be tensors for largely unrelated reasons. Calling them tensors makes more sense than calling, say, real numbers tensors, but less sense than calling reals reals.


Sure but if you’re working with lists you shouldn’t call them queues. Similarly if you’re working with mere multidimensional arrays don’t call them tensors.


Since I also thought tensors were just higher-dimensional arrays: isn't this really what ML folks think tensors are, since they (we?) do attach units to the tensors most of the time?


The physicist's approach is a bit non-conceptual. From a mathematical point of view, a tensor is essentially an arbitrary multilinear map. Think of the dot product, the determinant of a matrix (which is linear in each column but not linear in the matrix as a whole), the exterior product in exterior algebra (or geometric algebra), a linear map itself (which is obviously a special case of a multilinear map), etc.

The coordinate change stuff that physicists talk about stems from observing that a matrix can be used to represent some tensors, but the rule for changing basis changes along with the kind of tensor. So if M is a matrix which represents a linear map and P is a matrix whose columns are the new basis vectors, then P^{-1}MP is the same linear map as M but expressed in basis P; if on the other hand the matrix M represents a bilinear form as opposed to a linear map, then the basis change formula is instead P^TMP, where the transpose appears in place of the inverse. Sylvester's Law of Inertia is then a non-trivial observation about matrix representations of bilinear forms.
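
A small numpy check of those two rules (a sketch, assuming the convention above where the columns of P are the new basis vectors; with the opposite convention the roles of P and P^{-1} swap):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((3, 3))   # matrix of a linear map (or of a bilinear form) in the old basis
    P = rng.standard_normal((3, 3))   # columns = new basis vectors, written in the old basis
    x, y = rng.standard_normal(3), rng.standard_normal(3)   # coordinates of two vectors in the NEW basis

    # Linear map: apply in old coordinates, convert the answer back to new coordinates.
    M_map_new = np.linalg.inv(P) @ M @ P
    print(np.allclose(M_map_new @ x, np.linalg.inv(P) @ (M @ (P @ x))))   # True

    # Bilinear form: evaluating on the same two vectors in either basis agrees.
    M_form_new = P.T @ M @ P
    print(np.allclose(x @ M_form_new @ y, (P @ x) @ M @ (P @ y)))         # True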

Physicists conflate a tensor with its representation in some coordinate system. Then they show how changing the coordinate system changes the coordinates. This point of view does provide some concrete intuition, though, so it's not all bad. By a coordinate system, I mean a linear basis.

Hope that helps.


Not all physicists make that conflation - Wald's General Relativity emphasizes tensors as abstract concepts, and then details how they can be expressed in terms of a basis.

The OP also emphasizes the abstract interpretation as providing more intuition than the coordinate transformation rule.


> Physicists conflate a tensor with its representation in some coordinate system

Rather, mathematicians that complain about the physicist's approach just haven't advanced far enough in their studies to understand how vector bundles are associated to the frame bundle ;)


There are really just two things called tensors.

Physicists' tensors = generalization of arrays with units; have to transform according to certain coordinate laws.

Mathematicians' tensors = generalization of arrays, transformation rules don't matter.


> Mathematicians' tensors = generalization of arrays, transformation rules don't matter.

That's definitely inaccurate, at least it doesn't match what I think of as tensors in mathematics.

In mathematics, tensors are the most general result of a bilinear operation. This does imply that they transform according to certain laws: if you represent the tensor using some particular basis, that basis can be expressed in terms of bases of the original vector spaces you multiplied, and choosing a different basis for your vector spaces results in a different basis for your tensors.

By "most general bilinear operation" I am talking about what is expressed in category theory as a universal property... with a morphism that preserves bilinear maps.

Tensors can be over multiple vector spaces or a single vector space (in which case it's typically implied that it's over the vector space and its dual). When you use a vector space and its dual, I believe you get the kind of tensor that physicists deal with, and all of the same properties. Note that while vector spaces and their dual may seem to be equivalent at first glance (and they are isomorphic in finite-dimensional cases), both mathematicians and physicists must know that they have different structure and transform differently.

Something that will throw you off is that mathematicians often like to use category theory and "point free" reasoning where you talk about vector spaces and tensor products in terms of things like objects and morphisms, and often avoid talking about actual vectors and tensors. Physicists talk about tensors using much more concrete terms and specify coordinate systems for them. It can require some insight in order to figure out that mathematicians and physicists are actually talking about the same thing, and figure out how to translate what a physicist says about a tensor to what a mathematician says.


But they are not totally different! The Physicists' tensors are actually Mathematicians' tensors, but parametrized by some parameters (coordinates). Then you have special laws for what happens if you make a (possibly non-linear) change of the coordinates. See https://en.wikipedia.org/wiki/Tensor#Tensor_fields


Something like the difference between a "vector" of length N which is just a collection of N numbers, and one that is a representation of an N-dimensional geometric algebra object.

