Pointer-to-array overlapping end of array

Question

Is this code correct?

int arr[2];

int (*ptr)[2] = (int (*)[2]) &arr[1];

ptr[0][0] = 0;

Obviously ptr[0][1] would be invalid by accessing out of bounds of arr.
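A minimal sketch of the size arithmetic behind that remark, using only sizeof so that no pointer is cast or dereferenced. The helper name bytes_past_end is hypothetical and not part of the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): number of bytes by which an
   int[2] starting at &arr[1] would extend past the end of int arr[2]. */
static size_t bytes_past_end(void)
{
    int arr[2];
    size_t start = 1 * sizeof arr[0];     /* byte offset of arr[1] within arr */
    size_t end = start + sizeof(int[2]);  /* end offset of an int[2] placed there */

    assert(end > sizeof arr);             /* the int[2] does not fit */
    return end - sizeof arr;              /* overhang beyond the real array */
}
```

Calling bytes_past_end() yields sizeof(int): only the first element of the overlaid int[2] lies inside arr.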

Note: There's no doubt that ptr[0][0] designates the same memory location as arr[1]; the question is whether we are allowed to access that memory location via ptr. Here are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way.

Note 2: Also consider **ptr = 0;. As Marc van Leeuwen pointed out, ptr[0] is equivalent to *(ptr + 0), and ptr + 0 seems to fall foul of the pointer arithmetic section. Writing *ptr instead avoids that.

Note: I have tagged both C and C++ however each standard is a bit different in this area, so please indicate which language your answer applies to :) — Matt McNabb, 15 hours ago
No, it is not correct in general to treat an object as though it was an object of some other type. Clearly arr[1] is an int and not an int[2], nor does it form the initial part of such an object. — Kerrek SB, 15 hours ago
@KerrekSB it's OK to write, for example, long x; int *i = (int *)&x; so long as we do not actually read or write through i . (C11 6.3.2.3/7) I suppose a related question would be: is it OK to write int i; long *x = (long *)&i; if this isn't an alignment violation on the system, and we do not do any arithmetic or dereferencing on x after this? — Matt McNabb, 15 hours ago
Related question: is this line of code correct: int (*ptr)[2] = (int (*)[2])&arr[0]; or simply int (*ptr)[2] = (int (*)[2])arr; — juhist, 15 hours ago
Your intention might be clearer if you add a typedef or two to get around C's horrid syntax. I think what you're asking is equivalent to the question of whether, given typedef char charquad[4]; an expression like charquad *t = (charquad*)malloc(2); would be legitimate; I would think it would be equivalent to typedef struct { char a,b,c,d;} charquad2; charquad2 *t2 = (charquad2*)malloc(2);, which would I think be invalid if code ever accessed t2->c or t2->d, but valid if it did not. — supercat, 13 hours ago

user4709452 · Answer 1 · 2015-03-24 22:46:33Z


Yes, this is correct code. Quoting N4140 for C++14:

[expr.sub]/1 ... The expression E1[E2] is identical (by definition) to *((E1)+(E2))

[expr.add]/5 ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

There is no overflow here. &*(*(ptr)) == &ptr[0][0] == &arr[1].

For C11 (N1570) the rules are the same. §6.5.2.1 and §6.5.6


If you think that "the pointer operand and the result point to elements of the same array object, or one past the last element of the array object", then which is that array object? Note that the cited phrase is weird, but it does not say "unless evaluation produces an overflow, the behavior is defined". – Marc van Leeuwen 3 hours ago

@MarcvanLeeuwen I'm wary of applying the quote in 5.7 too broadly. For example, for some large X, char *ptr = (char *)&X; ptr = ptr + 2;. In this case neither ptr nor ptr + 2 points to an element of an array object, yet this code should be OK. It seems to me that the quote should be taken as the spirit of the law, not the letter of the law. – Matt McNabb 2 hours ago

@MattMcNabb: if the Standard gives only the spirit of the law, then where shall one find the letter of the law? Also, your example is unconvincing, since every object has an underlying representation as an array of char values, which can serve as the "array object" for the quote. But for pointers to types other than char that is of no help. – Marc van Leeuwen 2 hours ago

@MarcvanLeeuwen c&v of "every object has an underlying representation as an array of char"? – Matt McNabb 2 hours ago

@MattMcNabb I guess this is said in 1.8[intro.object] "An object is a region of storage", together with 1.7[intro.memory] "The fundamental storage unit in the C++ memory model is the byte". Together with some text that specifies that bytes can be reliably manipulated as char values, which I am sure is said somewhere. As always, there remains much to be desired for clarity. But I read this as saying that objects are represented by a contiguous sequence (region) of bytes that provide the storage for the object. – Marc van Leeuwen 2 hours ago
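A minimal sketch of the byte-wise view discussed in these comments, written in C and assuming the common reading that an object's representation may be walked through an unsigned char pointer. The helper name round_trip_through_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of an int out
   through an unsigned char pointer, then rebuilds the value with memcpy. */
static int round_trip_through_bytes(int value)
{
    unsigned char bytes[sizeof value];
    const unsigned char *src = (const unsigned char *)&value;

    /* Reads src[0] .. src[sizeof value - 1], all inside the int object. */
    for (size_t i = 0; i < sizeof value; ++i)
        bytes[i] = src[i];

    int copy;
    memcpy(&copy, bytes, sizeof copy);   /* rebuild the int from its bytes */
    assert(copy == value);
    return copy;
}
```

This only illustrates access through a character type; it says nothing about pointers to other types, which is the point under dispute.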


qeadz · Answer 2 · 2015-03-25 00:34:21Z

Not an answer but a comment that I can't seem to word well without being a wall of text:

Arrays are guaranteed to store their contents contiguously, so that they can be 'iterated over' using a pointer. If I can take a pointer to the beginning of an array and successively increment that pointer until I have accessed every element, then surely that implies that the array can be accessed as a series of whatever type it is composed of.

Surely the combination of: 1) Array[x] stores its first element at address 'array'; 2) Successive increments of a pointer to it are sufficient to access the next item; 3) Array[x-1] obeys the same rules.

Then it should be legal to at least look at the address 'array' as if it were type array[x-1] instead of type array[x].

Furthermore, given the points about being contiguous and how pointers to elements in the array have to behave, surely it must be legal to then group any contiguous subset of array[x] as array[y] where y < x and its upper bound does not exceed the extent of array[x].
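A minimal sketch of the uncontested part of this argument: walking a contiguous sub-range of an array through an element pointer and a count, with no pointer-to-array cast involved. The helper names sum_ints and sum_middle_pair are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums count ints starting at first, touching only
   first[0] .. first[count - 1]. */
static int sum_ints(const int *first, size_t count)
{
    assert(first != NULL);
    int total = 0;
    for (const int *p = first; p != first + count; ++p)
        total += *p;
    return total;
}

/* Sums the sub-range arr[1] .. arr[2] of a four-element array. */
static int sum_middle_pair(void)
{
    int arr[4] = {10, 20, 30, 40};
    return sum_ints(arr + 1, 2);
}
```

Whether the same sub-range may also be viewed through an int (*)[2] is exactly what the thread is debating; this sketch sidesteps that by staying with int *.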

Not being a language-lawyer this is just me spouting some rubbish. I am very interested in the outcome of this discussion though.

EDIT:

On further consideration of the original code, it seems to me that arrays are themselves very much a special case in many regards. They decay to a pointer, and I believe can be aliased as per what I just said earlier in this post.

So without any standardese to back up my humble opinion, an array can't really be invalid or 'undefined' as a whole if it doesn't really get treated as a whole uniformly.

What does get treated uniformly are the individual elements. So I think it only makes sense to talk about whether accessing a specific element is valid or defined.

In the case of this example, if I understood it right, y does exceed the extent because it is the second and third elements of a two-element array. — Samuel Edwin Ward, 14 hours ago
I agree with all your thoughts so long as the group chosen is fully within the bounds of the array (e.g. int b[3]; int (*p)[2] = (int(*)[2])&b[1];) however what I'm unsure about is whether it is a problem that original example "looks" out of bounds. — Matt McNabb, 14 hours ago
@MattMcNabb ok true - I ignored the bit which extends out of bounds because it wasn't accessed and then promptly forgot about it when I was considering Kerrek's comment to which this was going to be a reply before it got elevated to an answer for character count reasons. — qeadz, 13 hours ago

Otomo · Answer 3 · 2015-03-24 22:24:49Z


It depends on what you mean by "correct". You are doing a cast on the ptr to arr[1]. In C++ this will probably be a reinterpret_cast. C and C++ are languages which (most of the time) assume that the programmer knows what he is doing. That this code is buggy has nothing to do with the fact that it is valid C/C++ code.

You are not violating any rules in the standards (as far as I can see).


My opinion also. If it segfaults, that's an OS issue, and aren't a lot of security hacks basically using pointers to access memory beyond declared array bounds? – jamesqf 8 hours ago


Anton Savin · Answer 4 · 2015-03-24 22:32:49Z


For C++ (I'm using draft N4296) [dcl.array]/7 says in particular that if the result of subscripting is an array, it's immediately converted to pointer. That is, in ptr[0][0] ptr[0] is first converted to int* and only then second [0] is applied to it. So it's perfectly valid code.

For C (C11 draft N1570) 6.5.2.1/3 states the same.
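A minimal sketch of the array-to-pointer conversion this answer relies on, written in C. It deliberately uses a pointer obtained with &arr, so it shows only the conversion and not the contested cast from the question. The helper name write_via_pointer_to_array is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: writes arr[1] through a genuine pointer to the array. */
static int write_via_pointer_to_array(void)
{
    int arr[2] = {0, 0};
    int (*p)[2] = &arr;   /* points at the whole int[2] object */

    /* sizeof does not trigger the conversion, so p[0] has type int[2] here. */
    assert(sizeof p[0] == 2 * sizeof(int));

    int *first = p[0];    /* here p[0] converts to int *, pointing at arr[0] */
    first[1] = 42;        /* same element as p[0][1] and arr[1] */
    return arr[1];
}
```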


I don't see 6.5.2.1/3 stating the same. In my opinion, it applies to multidimensional arrays, not pointers to arrays. – juhist 14 hours ago

For C++ the OP's code looks like an aliasing violation ([basic.lval]). – Kerrek SB 14 hours ago


@KerrekSB I don't think so, because again, no access is made through int[2] object – Anton Savin 14 hours ago


Marc van Leeuwen · Answer 5 · 2015-03-25 09:35:22Z

Let me give a dissenting opinion: this is (at least in C++) undefined behaviour, for much the same reason as in the other question that this question linked to.

First let me clarify the example with some typedefs that will simplify the discussion.

typedef int two_ints[2];
typedef int* int_ptr;
typedef two_ints* two_ints_ptr;

two_ints arr;

two_ints_ptr ptr = (two_ints_ptr) &arr[1];

int_ptr temp = ptr[0]; // the two_ints value ptr[0] gets converted to int_ptr
temp[0] = 0;

So the question is this: there is no object of type two_ints whose address coincides with that of arr[1] (in the sense that the address of arr coincides with that of arr[0]), and therefore no object to which ptr[0] could possibly point. Can one nonetheless convert the value of that expression to one of type int_ptr (here given the name temp) that does point to an object (namely the integer object also called arr[1])?

The point where I think behaviour is undefined is in the evaluation of ptr[0], which is equivalent (per 5.2.1[expr.sub]) to *(ptr+0); more precisely the evaluation of ptr+0 has undefined behaviour.

I'll cite my copy of the C++ standard draft, which is not official [N3337], but probably the language has not changed; what bothers me slightly is that the section number does not at all match the one mentioned in the accepted answer of the linked question. Anyway, for me it is §5.7[expr.add]:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce overflow; otherwise the behavior is undefined.

Since the pointer operand ptr has type pointer to two_ints, the "array object" mentioned in the cited text would have to be an array of two_ints objects. However there is only one such object here, the fictive array whose unique element is arr that we are supposed to conjure up in such situations (as per: "pointer to nonarray object behaves the same as a pointer to the first element of an array of length one..."), but clearly ptr does not point to its unique element arr. So even though ptr and ptr+0 are no doubt equal values, neither of them point to elements of any array object at all (not even a fictive one), nor one past the end of such an array object, and the condition of the cited phrase is not met. The consequence is (not that overflow is produced, but) that behavior is undefined.
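A minimal sketch of the "array of length one" rule cited above, written in C. It forms one-past-the-end pointers only from pointers that really do point at an object of the matching type, and never dereferences them. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int two_ints[2];

/* Hypothetical helper: a lone int behaves like the sole element of an
   array of length one, so p + 1 may be formed and compared. */
static ptrdiff_t one_past_single_int(void)
{
    int x = 0;
    int *p = &x;
    int *end = p + 1;        /* one past the end: valid to form, not to dereference */
    assert(end != p);
    return end - p;
}

/* Hypothetical helper: the same rule for a pointer to the only two_ints object. */
static ptrdiff_t one_past_single_array(void)
{
    two_ints arr = {0, 0};
    two_ints *pa = &arr;
    two_ints *end = pa + 1;  /* one past the fictive one-element array of two_ints */
    assert(end != pa);
    return end - pa;
}
```

Both functions return 1. The pointer in the question, by contrast, is not obtained from &arr, which is why the argument above says the rule does not cover it.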

So behavior is already undefined before the indirection operator * is applied. I would not argue for undefined behavior from the latter evaluation, even though the phrase "the result is an lvalue referring to the object or function to which the expression points" is hard to interpret for expressions that do not refer to any object at all. But I would be lenient in interpreting this, since I think dereferencing a pointer past an array should not itself be undefined behavior (for instance if used to initialise a reference).

This would suggest that if instead of ptr[0][0] one wrote (*ptr)[0] or **ptr, then behaviour would not be undefined. This is curious, but it would not be the first time the C++ standard surprises me.
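A minimal sketch of the three spellings, written in C and using a pointer obtained with &arr so that all of them are uncontroversially valid. It shows only that they designate the same element, not that they are equally defined for the question's cast. The helper name spellings_agree is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the three spellings designate the same int. */
static int spellings_agree(void)
{
    int arr[2] = {0, 0};
    int (*ptr)[2] = &arr;   /* valid pointer to the whole array */

    int *a = &ptr[0][0];    /* *(ptr + 0), then conversion to int *, then [0] */
    int *b = &(*ptr)[0];    /* no pointer arithmetic on ptr itself */
    int *c = &**ptr;        /* likewise */

    **ptr = 7;
    assert(arr[0] == 7);
    return a == b && b == c && a == &arr[0];
}
```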

OK. *p is in fact different to *(p+0) as you say; but then (also as you point out) **ptr would avoid that objection based on pointer arithmetic. ISTR that in C99 there was a clause that an lvalue must designate an object when it is evaluated (which would rule out *ptr) however that was changed for C11 because it also ruled out a bunch of what was meant to be legal behaviour. — Matt McNabb, 3 hours ago

juhist · Answer 6 · 2015-03-25 08:59:17Z

Trying to answer here why the code works on commonly used compilers:

int arr[2];

int (*ptr)[2] = (int (*)[2]) &arr[1];

printf("%p\n", (void*)ptr);
printf("%p\n", (void*)*ptr);
printf("%p\n", (void*)ptr[0]);

All lines print the same address on commonly used compilers. So, ptr is an object for which *ptr represents the same memory location as ptr on commonly used compilers, and therefore ptr[0] is really a pointer to arr[1], and therefore ptr[0][0] is arr[1]. So, the code assigns a value to arr[1].

Now, let's suppose a perverse implementation where a pointer to an array (NOTE: I'm saying pointer to an array, i.e. &arr which has the type int(*)[2], not arr which means the same as &arr[0] and has the type int*) is the pointer to the second byte within the array. Then dereferencing ptr is the same as subtracting 1 from ptr using char* arithmetic. For structs and unions, it is guaranteed that a pointer to such a type is the same as a pointer to its first member, but in casting pointer to array into pointer no such guarantee was found for arrays (i.e. that a pointer to an array would be the same as a pointer to the first element of the array) and as a matter of fact @FUZxxl planned to file a defect report about the standard. For such a perverse implementation, *ptr i.e. ptr[0] would not be the same as &arr[1]. On RISC processors, it would as a matter of fact cause problems due to data alignment.
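A minimal sketch of the struct guarantee mentioned above, written in C: a pointer to a structure, suitably converted, points to its initial member. The type pair and the helper name struct_starts_at_first_member are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int first; int second; };

/* Hypothetical helper: returns 1 if a converted pointer to the struct
   equals a pointer to its first member and reads that member's value. */
static int struct_starts_at_first_member(void)
{
    struct pair s = {1, 2};
    int *via_struct = (int *)&s;   /* points at the initial member s.first */

    assert(offsetof(struct pair, first) == 0);
    return via_struct == &s.first && *via_struct == 1;
}
```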

Some additional fun:

int arr[2] = {0, 0};
int *ptr = (int*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);

Should that code work? It prints 5.

Even more fun:

int arr[2] = {0, 0};
int (*ptr)[3] = (int(*)[3])&arr;
ptr[0][0] = 6;
printf("%d\n", arr[0]);

Should this work? It prints 6.

This should obviously work:

int arr[2] = {0, 0};
int (*ptr)[2] = &arr;
ptr[0][0] = 7;
printf("%d\n", arr[0]);

There's no doubt that ptr[0][0] designates the same memory location as arr[1]; the question is whether we are allowed to access that memory location via ptr. Here are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way. — Matt McNabb, 15 hours ago
OK. The 5 example is also clearly correct (although I have seen someone argue that ptr[1] = 5; would be incorrect) — Matt McNabb, 14 hours ago
The 6 example seems to be essentially the same as my question. Of course, compiler output is no guarantee of correctness. — Matt McNabb, 14 hours ago
Well, to me it isn't apparent that the 5 example is correct. But perhaps I should consider posting another question at Stack Overflow then. — juhist, 14 hours ago
