Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added wait for ACK after a I2C transaction #250

Merged
merged 3 commits into from
Aug 30, 2023

Conversation

boramonideep
Copy link
Contributor

By creating this pull request you agree to the terms in CONTRIBUTING.md.
https://github.com/Infineon/.github/blob/master/CONTRIBUTING.md
--- DO NOT DELETE ANYTHING ABOVE THIS LINE ---

CONTRIBUTING.md also tells you what to expect in the PR process.

Description
Added wait in I2C transaction

Context
Added a wait state until ACK is received after sending a START or REP_START. This was absent before so it leads to a intermittent erroneous reading from I2C sensors.

@@ -245,6 +245,8 @@ uint8_t TwoWire::requestFrom(uint8_t address, uint8_t quantity, uint32_t iaddres
XMC_I2C_CH_MasterStart(XMC_I2C_config->channel, (txAddress << 1), XMC_I2C_CH_CMD_READ);
}

while((XMC_I2C_CH_GetStatusFlag(XMC_I2C_config->channel) & XMC_I2C_CH_STATUS_FLAG_ACK_RECEIVED) == 0U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm... I have no context info about this issue, but adding a blocking assert there entails that we are very sure that ack is received.
In a i2c communication, how can you ensure that the other end is sending an ack?
if it is doesn´t send and ack, I would expect there should be a way to handle that error without the system getting stuck forever.
Such blocking assertions are present in any other Wire function or built-in library?

@techpaul
Copy link
Contributor

I have so MANY concerns over this fix that it BREAKS the I2C specification and probably shows either a hardware design flaw in the USIC for IIC or conceptual problem or issue with sensors or sensors being driven incorrectly.

The FIRST ACK/NACK after a Start or Repeated Start, is VERY important as it tells you if the device is there or NOT and to abort. If someone is trying to emaulate the Linux utility i2cdetect to scan all I2C addresses as they have multiple devices, this will FAIL at first address of no device, LOCKING the system.

I have driven many I2C devices over the years as bit-banging software, master controllers under software control even dedicated FPGA designs that emulate I2C devices, currently fixing a 600MHz processor from a Slave device using bit banging software to use interrupts and work reliably. Many systems I deal with have multiple I2C busses, each with multiple peripherals and these days many instances of level translators as well.

This sounds the WRONG way to fix an issue. Personally I would like to know which sensors (probably SLOW) are being driven, at what speed and many other factors like wire length and pull up resistor values (often people use 4k7 when length of cable dictates a 2k). Along with scope shots of the issues.

If the ACK is taking a long time to appear the USIC is PROBABLY NOT handling I2C Wait states (clock stretching correctly, and the clock for the shift register is NOT as it should be driven from the pin but from internal clock source ONLY.

These may be sub 100 kHz devices being driven at 400 kHz.

What happens to folks who have MULTIPLE I2C devices.

Current device I am dealing with has a master controller that has a hardware bug that means it writes address and data at 80-90 kHz clock rate, but does READ bytes at DOUBLE clock rate see scope shot
TEK00001

@boramonideep
Copy link
Contributor Author

Taking into account all the concerns, a timeout provision is implemented which will quit the loop incase a valid ACK is not received within a certain time interval.

Copy link
Member

@jaenrig-ifx jaenrig-ifx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of formal testing 👍

@techpaul
Copy link
Contributor

In the absence of formal testing 👍

I personally would like to know what sensors under what configuration and scopeshots of timing.

@techpaul
Copy link
Contributor

Yes the timeout is ARBITRARY and probably uses a value based on 12MHz AVR on the UNO, so wil have different timings due to CPU speed of XMC1 series at 32 MHz and XMC4700 at 144 MHz.

The whole function/method requestFrom( ) has lots of assumptions and MAGIC NUMBERS, that assumes the function will only be used on SMALL EEPROM or RAM devices. Most notable is isize parameter checks that CORRUPT data passed in by assuming only 3 bytes can be sent. There is NO specification for that, and the same process could be used for an ADC subsystem and be longer than 3 bytes. Instead of corrupting data onto I2C bus it should error with parameter error BEFORE sending anything on the I2C bus. The limit is by Magic Number not a constant in xmc_i2c_conf.h, with better notes and different values for WIRE_COMMUNICATION_TIMEOUT based on CPU Freq.

A better construct would be a while loop with tests for flags inside the loop.

Non-understanding of I2C and available flags is what worries me here. Without knowing WHICH devices this fails on I suspect the issue is NOT a Wire library issue but trying too quickly to do this function when this device is internally busy doing something else.. An example of this would be an EEPROM, which on many devices must NOT be accessed in any way whilst it is busy doing a byte/page erase function and may well do a long wait state/Clock stretch if accessed, rather than say it is not on the bus, or give garbled data or cause even internal address counters to be overwritten whilst using them for erase operations. This is a higher level operation issues NOT Wire library.

Your first description said the first byte after Start or Repeated Start, but your implementation is for the first byte of an INTERNAL address which is the second byte of the transmission. This is because the Start or repeated start methods send the start condition and I2C address byte.,

This timeout method is repeated not called in many places and I see random checking of ACK/NACK flags or possible error codes of got NACK when expected ACK, even on I2C address phase.

I fail to see when READING multiple bytes from a slave, that you need to check for RECEIVE ACK or NACK on DATA phase as this is sent by the I2C master the processor. You have RBIF and other flags for receive buffer full.

Looking at some of this code I am amazed some of it works

I will ask AGAIN what DEVICES are you having problems with to show it is a Wire library problem nd not higher operational errors?

@techpaul
Copy link
Contributor

As a comparison the problems in Wire and then seen same in Serail and SPI with pin setting not following the specfication causing problems was highlighted and tested better in issue #234

@techpaul
Copy link
Contributor

Explanation on worries with mod using timeout wrong and incorrect flags..

Most of the Wire library assumes everything always works without wait states
or mising devices or wrong address passed in. Most tests appear to have been
done on only ONE device at a time and only ONE device on the bus.

There are no COMMON error codes for

  • NACK when expecting ACK
  • Timeout
  • STUCK Bus
  • Invalid parameters

Many checks are done in wrong order so device or bus can get stuck easily.
Very few checks on NACK when addressing devices. Parameters should be checked BEFORE doing ny bus actions.

There is NO indication in data sheets, manuals for USIC on how a master handles RECEIVING Wait states (wrongly called Byte Stretching) and fanciful wish statements of slave devices sending error interrupts that do NOT exist on 99.99% of I2C Slave devices.

A simplified check for ACK checks for ACK OR NACK bits to be set as this means one of those states has actived after the SCL line has been released to go from Low to High on the 9th clock cycle. You need to check for both and if NACK received when expecting ACK error and reset bus (often STOP).

A timeout condition means SCL has not transitioned to HIGH level, for many reasons.

IF SCL STILL "STUCK" LOW THE ONLY WAY TO GET THE BUS UNSTUCK IS A RESET OR
POWER CYCLE OF THE SLAVE DEVICE. BEFORE ANY further I2C transactions can occur. You CANNOT SEND OR FORCE A STOP symbol/condition if SCL held LOW.

If SCL is STUCK LOW NOTHING else can happen, and this happens OUTSIDE of the micro, no amount bit twiddling will change that. Checking higher levels are not controlling the device BADLY can stop these situations occuring.

When master writing to slave data bytes (they may be internal address or other control) receiving a NACK from SLAVE when address was ACKed then means

  • Device being driven wrong
  • Device has hardware fault
  • Bus fault
  • Fault could be a different slave

When Master RECEIVES bytes from Slave the hardware should be automatically ACKing the data bytes in hardware as the controller should have been set to ACK data with exception of Master read from slave on LAST Byte of transaction CAN be a NACK but rareley implemented in Slave devices, as no action other than not incrementing an internal counter has any meaning. for last byte. Often Master always send ACK when READING as easier to the send STOP symbol/condition next.

A better construct that should be in a COMMON function to check for ACK is

// Check Receive NACK/ACK on 
//	Master write bytes (including address) and 
// 	Slave Mode READ bytes
//
// Return 1 = ACK
//        0 = NACK
//        -1 = BUS ERROR / timeout
//	 -2 = errored transmission bus now idle

do
  {
  StatusFlag = XMC_I2C_CH_GetStatusFlag(XMC_I2C_config->channel);
  if( StatusFlag & ( XMC_I2C_CH_STATUS_FLAG_ACK_RECEIVED
			 + XMC_I2C_CH_STATUS_FLAG_ACK_RECEIVED ) )
    break;
  ]
while( timeout-- );
if( timeout == 0 )
  // check and process BUS STUCK error
  // if bus recovered
  StatusFlag = -2;
  // else bus still stuck
  StatusFlag = -1;
else
  StatusFlag &= XMC_I2C_CH_STATUS_FLAG_ACK_RECEIVED;	// ACK/NACK
return StatusFlag;

Note better handling of timeout would be unsigned long calculations based on microseconds() values and a timeout value in microseconds in Wire.h allowing for different CPU Freq. A nominal value of 5 to 10 ms would be a good starting point.

@boramonideep
Copy link
Contributor Author

Thank you for your concerns @techpaul and the probable flaws and pitfalls have been noted.

However, the changes to the BSP are part of an internal process and will be executed incrementally in the future.

The issue was diagnosed as a part of the development process of Arduino libraries for Infineon’s XENSIV 3D magnetic sensors families, shown here: https://www.infineon.com/cms/en/product/sensor/magnetic-sensors/magnetic-position-sensors/3d-magnetics/

As the XMC devices run on different frequencies, the execution of timeout would be variable. Hence, a time-based delay logic is added to make the execution fairer for all the range of frequencies.

The checks currently in the Wire library are done based on the documentation given here in Chapter 14.5.5.2. wherein all the relevant flags are enlisted as bitfields.

The test framework is being improved to provide robust testing with multiple I2C devices.

@boramonideep boramonideep merged commit 901967a into Infineon:develop Aug 30, 2023
7 checks passed
@boramonideep boramonideep deleted the wire_fix branch August 30, 2023 08:28
@techpaul
Copy link
Contributor

techpaul commented Sep 2, 2023

Oh well marketing rarely listens and only repeats scripts even if they do not undersatnd them, or it totally contradicts any eveidence presented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants